A tokenised program file on a disk device has the following format.
Magic byte: FF
Bytes | Format | Meaning |
---|---|---|
2 | Unsigned 16-bit little-endian integer. | Memory location of the line following the current one. This is used internally by GW-BASIC but ignored when a program is loaded. |
2 | Unsigned 16-bit little-endian integer. | The line number. |
Variable | Tokenised BASIC, see below. | The contents of the line. |
1 | 00 (NUL byte) | End of line marker. |
A 1A byte is written to mark the end of file. This is optional; the file will be read without problems if it is omitted.
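As a rough illustration of the layout above, the following Python sketch checks the magic byte and reads the fixed four-byte header of the first program line. The function name `read_line_header` and the sample bytes are made up for this example. The sketch deliberately does not scan the tokenised body for the closing 00, because a 00 byte can also occur inside a numeric constant.

```python
import struct

def read_line_header(data, pos):
    """Return (next_line_address, line_number, body_start) for the line at pos."""
    # Two unsigned 16-bit little-endian integers: next-line address, line number.
    next_address, line_number = struct.unpack_from("<HH", data, pos)
    return next_address, line_number, pos + 4

# Hypothetical fragment: magic byte, one line "10 END", end-of-file marker.
sample = bytes([0xFF, 0x7A, 0x12, 0x0A, 0x00, 0x81, 0x00, 0x1A])

if sample[0] != 0xFF:
    raise ValueError("not a tokenised program file")
next_address, line_number, body_start = read_line_header(sample, 1)
print(line_number, hex(sample[body_start]))
```

The stored next-line address is returned but not used, in line with the note that it is ignored on load.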
The printable ASCII characters in the range 20–7E are used for string literals, comments, variable names, and elements of statement syntax that are not reserved words. Reserved words are represented by their reserved word tokens and numeric literals are represented by numeric token sequences.
Numeric literals are stored in tokenised programs according to the following representation. All numbers are positive; negative numbers are stored simply by preceding the number with EA, the token for -.
Class | Bytes | Format |
---|---|---|
Indirect line numbers | 3 | 0E followed by an unsigned 16-bit little-endian integer. |
Octal integers | 3 | 0B followed by an unsigned 16-bit little-endian integer. |
Hexadecimal integers | 3 | 0C followed by an unsigned 16-bit little-endian integer. |
Positive decimal integers less than 11 | 1 | Tokens 11–1B represent 0–10. |
Positive decimal integers less than 256 | 2 | 0F followed by an unsigned 8-bit integer. |
Other decimal integers | 3 | 1C followed by a two's complement signed 16-bit little-endian integer. GW-BASIC recognises a negative number stored this way, but never writes one itself: it stores a negative number as the positive number preceded by EA. |
Single precision floating-point number | 5 | 1D followed by a four-byte single in Microsoft Binary Format. |
Double precision floating-point number | 9 | 1F followed by an eight-byte double in Microsoft Binary Format. |
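The integer classes in the table above can be decoded with a small dispatch on the leading token. This is a minimal Python sketch, not PC-BASIC's own code; the function name `decode_integer_token` is hypothetical, and the floating-point tokens 1D and 1F are left out on purpose.

```python
import struct

def decode_integer_token(data, pos=0):
    """Decode the integer constant at data[pos]; return (value, bytes_consumed)."""
    token = data[pos]
    if 0x11 <= token <= 0x1B:
        # One-byte tokens 11..1B stand for the values 0..10.
        return token - 0x11, 1
    if token == 0x0F:
        # 0F followed by an unsigned 8-bit integer.
        return data[pos + 1], 2
    if token in (0x0B, 0x0C, 0x0E):
        # Octal, hexadecimal, indirect line number: unsigned 16-bit little-endian.
        (value,) = struct.unpack_from("<H", data, pos + 1)
        return value, 3
    if token == 0x1C:
        # Other decimal integers: signed 16-bit little-endian.
        (value,) = struct.unpack_from("<h", data, pos + 1)
        return value, 3
    raise ValueError("not an integer constant token: %02X" % token)

print(decode_integer_token(b"\x0f\xff"))
```

Returning the number of bytes consumed lets a caller step over the constant, which matters because the payload bytes may themselves be 00.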
Most keywords in PC-BASIC are reserved words. Reserved words are represented in a tokenised program by a single- or double-byte token. The complete list is below.
All function names and operators are reserved words and all statements start with a reserved word (which in the case of LET is optional). However, the converse is not true: not all reserved words are statements, functions, or operators. For example, TO and SPC( only occur as part of a statement syntax. Furthermore, some keywords that form part of statement syntax are not reserved words: examples are AS, BASE, and ACCESS. Keywords that are not reserved words are spelt out in full text in the tokenised source.
A variable or user-defined function name must not be identical to a reserved word. The list below is an exhaustive list of reserved words that can be used to determine whether a name is legal.
Token | Reserved word |
---|---|
81 | END |
82 | FOR |
83 | NEXT |
84 | DATA |
85 | INPUT |
86 | DIM |
87 | READ |
88 | LET |
89 | GOTO |
8A | RUN |
8B | IF |
8C | RESTORE |
8D | GOSUB |
8E | RETURN |
8F | REM |
90 | STOP |
91 | PRINT |
92 | CLEAR |
93 | LIST |
94 | NEW |
95 | ON |
96 | WAIT |
97 | DEF |
98 | POKE |
99 | CONT |
9C | OUT |
9D | LPRINT |
9E | LLIST |
A0 | WIDTH |
A1 | ELSE |
A2 | TRON |
A3 | TROFF |
A4 | SWAP |
A5 | ERASE |
A6 | EDIT |
A7 | ERROR |
A8 | RESUME |
A9 | DELETE |
AA | AUTO |
AB | RENUM |
AC | DEFSTR |
AD | DEFINT |
AE | DEFSNG |
AF | DEFDBL |
B0 | LINE |
B1 | WHILE |
B2 | WEND |
B3 | CALL |
B7 | WRITE |
B8 | OPTION |
B9 | RANDOMIZE |
BA | OPEN |
BB | CLOSE |
BC | LOAD |
BD | MERGE |
BE | SAVE |
BF | COLOR |
C0 | CLS |
C1 | MOTOR |
C2 | BSAVE |
C3 | BLOAD |
C4 | SOUND |
C5 | BEEP |
C6 | PSET |
C7 | PRESET |
C8 | SCREEN |
C9 | KEY |
CA | LOCATE |
CC | TO |
CD | THEN |
CE | TAB( |
CF | STEP |
D0 | USR |
D1 | FN |
D2 | SPC( |
D3 | NOT |
D4 | ERL |
D5 | ERR |
D6 | STRING$ |
D7 | USING |
D8 | INSTR |
D9 | ' |
DA | VARPTR |
DB | CSRLIN |
DC | POINT |
DD | OFF |
DE | INKEY$ |
E6 | > |
E7 | = |
E8 | < |
E9 | + |
EA | - |
EB | * |
EC | / |
ED | ^ |
EE | AND |
EF | OR |
F0 | XOR |
F1 | EQV |
F2 | IMP |
F3 | MOD |
F4 | \ |
FD81 | CVI |
FD82 | CVS |
FD83 | CVD |
FD84 | MKI$ |
FD85 | MKS$ |
FD86 | MKD$ |
FD8B | EXTERR |
FE81 | FILES |
FE82 | FIELD |
FE83 | SYSTEM |
FE84 | NAME |
FE85 | LSET |
FE86 | RSET |
FE87 | KILL |
FE88 | PUT |
FE89 | GET |
FE8A | RESET |
FE8B | COMMON |
FE8C | CHAIN |
FE8D | DATE$ |
FE8E | TIME$ |
FE8F | PAINT |
FE90 | COM |
FE91 | CIRCLE |
FE92 | DRAW |
FE93 | PLAY |
FE94 | TIMER |
FE95 | ERDEV |
FE96 | IOCTL |
FE97 | CHDIR |
FE98 | MKDIR |
FE99 | RMDIR |
FE9A | SHELL |
FE9B | ENVIRON |
FE9C | VIEW |
FE9D | WINDOW |
FE9E | PMAP |
FE9F | PALETTE |
FEA0 | LCOPY |
FEA1 | CALLS |
FEA5 | PCOPY |
FEA7 | LOCK |
FEA8 | UNLOCK |
FF81 | LEFT$ |
FF82 | RIGHT$ |
FF83 | MID$ |
FF84 | SGN |
FF85 | INT |
FF86 | ABS |
FF87 | SQR |
FF88 | RND |
FF89 | SIN |
FF8A | LOG |
FF8B | EXP |
FF8C | COS |
FF8D | TAN |
FF8E | ATN |
FF8F | FRE |
FF90 | INP |
FF91 | POS |
FF92 | LEN |
FF93 | STR$ |
FF94 | VAL |
FF95 | ASC |
FF96 | CHR$ |
FF97 | PEEK |
FF98 | SPACE$ |
FF99 | OCT$ |
FF9A | HEX$ |
FF9B | LPOS |
FF9C | CINT |
FF9D | CSNG |
FF9E | CDBL |
FF9F | FIX |
FFA0 | PEN |
FFA1 | STICK |
FFA2 | STRIG |
FFA3 | EOF |
FFA4 | LOC |
FFA5 | LOF |
The following additional reserved words are activated by the option syntax={pcjr|tandy}.
Token | Reserved word |
---|---|
FEA4 | NOISE |
FEA6 | TERM |
The tokens 10, 1E and 0D are known to be used internally by GW-BASIC. They should not appear in a correctly stored tokenised program file.
Floating point numbers in GW-BASIC and PC-BASIC are represented in Microsoft Binary Format (MBF), which differs from the IEEE 754 standard used by practically all modern software and hardware. Consequently, binary files generated by either BASIC are fully compatible with each other and with some applications contemporary to GW-BASIC, but not easily interchanged with other software. QBASIC, for example, uses IEEE floats.
MBF differs from IEEE in the position of the sign bit and in using only 8 bits for the exponent, both in single- and in double-precision. This makes the range of allowable numbers in an MBF double-precision number smaller, but their precision higher, than for an IEEE double: an MBF single has 23 bits of precision, while an MBF double has 55 bits of precision. Both have the same range.
Unlike IEEE, the Microsoft Binary Format does not support signed zeroes, subnormal numbers, infinities or not-a-number values.
MBF floating point numbers are represented in bytes as follows:
Single: M3 M2 M1 E0
Double: M7 M6 M5 M4 M3 M2 M1 E0
Here, E0 is the exponent byte and the other bytes form the mantissa, in little-endian order so that M1 is the most significant byte. The most significant bit of M1 is the sign bit, followed by the most significant bits of the mantissa: M1 = s0 f1 f2 f3 f4 f5 f6 f7. The other bytes contain the less-significant mantissa bits: M2 = f8 f9 fA fB fC fD fE fF, and so on.
The value of the floating-point number is $v = 0$ if $E_0 = 0$ and $v = (-1)^{s_0} \times \text{mantissa} \times 2^{E_0 - 128}$ otherwise, where the mantissa is formed as a binary fraction $\text{mantissa} = 0.1 f_1 f_2 f_3 \ldots$
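The formula can be checked with a short conversion routine. The Python sketch below handles single precision only and assumes the four bytes arrive with the least significant mantissa byte first and the exponent byte last (M3 M2 M1 E0). The function name `mbf_single_to_float` is made up for this example.

```python
def mbf_single_to_float(data):
    """Convert 4 bytes (assumed order M3 M2 M1 E0) to a Python float."""
    if len(data) != 4:
        raise ValueError("an MBF single is four bytes long")
    m3, m2, m1, e0 = data
    if e0 == 0:
        # A zero exponent byte means the value is zero.
        return 0.0
    sign = -1.0 if m1 & 0x80 else 1.0
    # Put the implicit leading 1 where the sign bit was: 24 mantissa bits.
    mantissa = ((m1 | 0x80) << 16) | (m2 << 8) | m3
    # mantissa / 2**24 is the binary fraction 0.1 f1 f2 f3 ...
    return sign * mantissa * 2.0 ** (e0 - 128 - 24)

print(mbf_single_to_float(bytes([0x00, 0x00, 0x20, 0x84])))
```

The same approach should extend to doubles by using seven mantissa bytes and a shift of 56 instead of 24, although a Python float cannot hold all 56 mantissa bits exactly.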
The protected format is an encrypted form of the tokenised format. GW-BASIC would refuse to show the source code of such files. This protection scheme could easily be circumvented by changing a flag in memory. Deprotection programs have circulated widely for decades and the decryption algorithm and keys were published in a mathematical magazine.
A protected program file on a disk device has the following format.
Magic byte: FE
0B 0A 09 08 07 06 05 04 03 02 01
1E 1D C4 77 26 97 E0 74 59 88 7C
A9 84 8D CD 75 83 43 63 24 83 19 F7 9A
0D 0C 0B 0A 09 08 07 06 05 04 03 02 01
A 1A byte is written to mark the end of file. This is optional; the file will be read without problems if it is omitted. Since the end-of-file marker of the tokenised program is included in the encrypted content, a protected file is usually one byte longer than its unprotected equivalent.
BSAVE file format
A memory-dump file on a disk device has the following format.
Magic byte: FD
Bytes | Format | Meaning |
---|---|---|
2 | Unsigned 16-bit little-endian integer. | Segment of the memory block. |
2 | Unsigned 16-bit little-endian integer. | Offset of the first byte of the memory block. |
2 | Unsigned 16-bit little-endian integer. | Length of the memory block in bytes. |
A 1A byte is written to mark the end of file. This is optional; the file will be read without problems if it is omitted.
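A memory-dump header of this shape can be unpacked in one call. The Python sketch below assumes that the three 16-bit fields directly follow the FD byte; the function name `parse_bsave_header` and the sample values are hypothetical.

```python
import struct

def parse_bsave_header(data):
    """Return (segment, offset, length) from the start of a memory-dump file."""
    if len(data) < 7 or data[0] != 0xFD:
        raise ValueError("not a memory-dump file")
    # Three unsigned 16-bit little-endian integers after the FD byte.
    segment, offset, length = struct.unpack_from("<HHH", data, 1)
    return segment, offset, length

# Hypothetical header: 4000 bytes saved from segment &hB800, offset 0.
sample = bytes([0xFD, 0x00, 0xB8, 0x00, 0x00, 0xA0, 0x0F])
segment, offset, length = parse_bsave_header(sample)
print(hex(segment), offset, length)
```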
Files on cassette are stored as frequency-modulated sound. The payload format of files on cassette is the same as for files on a disk device, but the headers are different and the files may be split into chunks.
A 1-bit is represented by a single 1 ms wave period (1000 Hz). A 0-bit is represented by a single 0.5 ms wave period (2000 Hz).
A byte is sent as 8 bits, most significant first. There are no start or stop bits.
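To make the bit encoding concrete, this small Python sketch maps a byte to the wave period lengths it would produce, most significant bit first. The function name `byte_to_periods_ms` is an illustrative choice, and the sketch only computes durations; it does not generate audio.

```python
def byte_to_periods_ms(value):
    """Return the eight wave period lengths (in ms) for one byte, MSB first."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a single byte value")
    # A 1-bit lasts 1 ms (1000 Hz); a 0-bit lasts 0.5 ms (2000 Hz).
    return [1.0 if (value >> bit) & 1 else 0.5 for bit in range(7, -1, -1)]

# &h16 is 00010110 in binary.
print(byte_to_periods_ms(0x16))
```

With these timings a byte of FF takes 8 ms, so a run of 256 such bytes lasts 2048 ms.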
A file is made up of two or more records. Each record has the following format:
Length | Format | Meaning |
---|---|---|
256 bytes | All FF | 2048 ms pilot wave at 1000 Hz, used for calibration. |
1 bit | 0 | Synchronisation bit. |
1 byte | 16 (SYN) | Synchronisation byte. |
256 bytes | Data block. | |
2 bytes | Unsigned 16-bit big-endian integer | CRC-16-CCITT checksum. |
31 bits | 30 1s followed by a 0. | End of record marker. |
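For reference, a bitwise CRC-16 with the CCITT polynomial can be written as below. This is a generic Python sketch, not taken from PC-BASIC: the initial value of FFFF is an assumption, and whether the value stored on tape is inverted or otherwise post-processed is not stated here, so treat the parameters as something to verify against real recordings. The function name `crc16_ccitt` is hypothetical.

```python
import struct

def crc16_ccitt(data, crc=0xFFFF):
    """Bitwise CRC-16 with polynomial 0x1021, MSB first (initial value assumed)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

block = bytes(256)  # hypothetical all-zero data block
# The checksum field is big-endian, unlike most other fields in these formats.
checksum_bytes = struct.pack(">H", crc16_ccitt(block))
print(checksum_bytes.hex())
```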
Tokenised, protected and BSAVE files consist of a header record followed by a single record which may contain multiple 256-byte data blocks, each followed by the 2 CRC bytes.
Plain text program files and data files consist of a header record followed by multiple single-block records.
Bytes | Format | Meaning |
---|---|---|
1 | A5 | Header record magic byte |
8 | 8 characters | Filename. |
1 | File type. 00 for data file, 01 for memory dump, 20 or A0 for protected, 40 for plain text program, 80 for tokenised program. | |
2 | Unsigned 16-bit little-endian integer | Length of next data record, in bytes. |
2 | Unsigned 16-bit little-endian integer | Segment of memory location. |
2 | Unsigned 16-bit little-endian integer | Offset of memory location. |
1 | 00 | End of header data |
239 | All 01 | Filler |
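The fields above add up to exactly 256 bytes, so a header block can be unpacked with a single fixed layout. The following Python sketch illustrates this; the names `HEADER`, `FILE_TYPES` and `parse_header_block` are made up, and the sample filename and addresses are arbitrary.

```python
import struct

# magic, filename, file type, length, segment, offset, end marker (17 bytes)
HEADER = struct.Struct("<B8sBHHHB")

FILE_TYPES = {
    0x00: "data file", 0x01: "memory dump", 0x20: "protected",
    0xA0: "protected", 0x40: "plain text program", 0x80: "tokenised program",
}

def parse_header_block(block):
    """Parse the 256-byte data block of a cassette header record."""
    if len(block) != 256:
        raise ValueError("a header block is 256 bytes long")
    magic, name, ftype, length, segment, offset, end = HEADER.unpack_from(block)
    if magic != 0xA5 or end != 0x00:
        raise ValueError("not a header record")
    return {
        "name": name.decode("ascii"),
        "type": FILE_TYPES.get(ftype, "unknown"),
        "length": length, "segment": segment, "offset": offset,
    }

# Hypothetical header for a tokenised program of 300 bytes.
sample = (bytes([0xA5]) + b"HELLO   " + bytes([0x80])
          + struct.pack("<HHH", 300, 0x0060, 0x081E) + b"\x00" + b"\x01" * 239)
print(parse_header_block(sample))
```

The filler bytes are not checked, since nothing in the table says a reader must validate them.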
Bytes | Format | Meaning |
---|---|---|
1 | 8-bit unsigned integer | Number of payload bytes in last record, plus one. If zero, the next record is not the last record. |
255 | Payload data. If this is the last record, any unused bytes are filled by repeating the last payload byte. | |
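One way to read this block layout is sketched below in Python. It assumes that a count byte of zero marks a full block of 255 payload bytes with more records to follow, and that a non-zero count marks the last record; the function name `block_payload` is hypothetical.

```python
def block_payload(block):
    """Return (payload, is_last) for a 256-byte block of a text or data file."""
    if len(block) != 256:
        raise ValueError("a data block is 256 bytes long")
    count = block[0]
    if count == 0:
        # Assumed meaning: all 255 bytes are payload and more records follow.
        return block[1:], False
    # The count byte holds the number of payload bytes plus one.
    return block[1:count], True

# Hypothetical last block holding "ABC", padded by repeating the last byte.
last = bytes([4]) + b"ABC" + b"C" * 252
print(block_payload(last))
```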
PC-BASIC uses a number of file formats to support its emulation of legacy hardware, which are documented in this section. These file formats are not used by GW-BASIC or contemporary software.
The HEX file format for bitfonts was developed by Roman Czyborra for the GNU Unifont package. PC-BASIC uses an extended version of this file format to store its fonts.
A HEX file is an ASCII text file, consisting of lines terminated by LF.
Each line of this file is one of the following:
#
character.
Unicode-codepage mappings are stored in UCP files.
A UCP file is an ASCII text file, consisting of lines terminated by LF.
Each line of this file is one of the following:
#
character.
A CAS file is a bit-level representation of cassette data introduced by the PCE emulator.
CAS files produced by PC-BASIC start with the characters PC-BASIC tape followed by an EOF character (&h1A). This sequence is followed by seven 0 bits, followed by the tape contents. The seven zero bits are intended to ensure that the tape contents are byte-aligned; the eighth bit is supplied by the synchronisation bit following the pilot wave.
Note that PC-BASIC does not require the introductory sequence to read a CAS-file correctly, nor does it require the contents of a CAS-file to be byte-aligned. However, new files produced by PC-BASIC follow this convention.
Depending on context, PC-BASIC will treat a code point in the control characters range as a control character or as a glyph defined by the active codepage, which by default is codepage 437. Code points of &h80 or higher are always interpreted as a codepage glyph.
This is a list of the American Standard Code for Information Interchange (ASCII). ASCII only covers 128 characters and defines the code point ranges &h00–&h1F and &h7F as control characters which do not have a printable glyph assigned to them. This includes such values as the Carriage Return (CR) character that ends a program line.
In the context of this documentation, character &h1A (SUB) will usually be indicated as EOF since it plays the role of end-of-file marker in DOS.
| | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0_ | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
1_ | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
2_ |   | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3_ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4_ | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5_ | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6_ | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7_ | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
This table shows the characters that are produced by the 256 single-byte code points when the DOS Latin USA codepage 437 is loaded, which is the default. Other codepages can be loaded to assign other characters to these code points.
Code point &h00 cannot be redefined.
Redefining a code point in the range &h20–&h7E will result in a different glyph being shown on the screen, but the character will continue to be treated as the corresponding ASCII character. It will retain its ASCII value when transcoded into UTF-8. This happens, for example, with the Yen sign (¥), which is assigned to ASCII code point &h5C in codepage 932: in that codepage it is treated as if it were a backslash (\).
| | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0_ |   | ☺ | ☻ | ♥ | ♦ | ♣ | ♠ | • | ◘ | ○ | ◙ | ♂ | ♀ | ♪ | ♫ | ☼ |
1_ | ► | ◄ | ↕ | ‼ | ¶ | § | ▬ | ↨ | ↑ | ↓ | → | ← | ∟ | ↔ | ▲ | ▼ |
2_ |   | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3_ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4_ | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5_ | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6_ | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7_ | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | ⌂ |
8_ | Ç | ü | é | â | ä | à | å | ç | ê | ë | è | ï | î | ì | Ä | Å |
9_ | É | æ | Æ | ô | ö | ò | û | ù | ÿ | Ö | Ü | ¢ | £ | ¥ | ₧ | ƒ |
A_ | á | í | ó | ú | ñ | Ñ | ª | º | ¿ | ⌐ | ¬ | ½ | ¼ | ¡ | « | » |
B_ | ░ | ▒ | ▓ | │ | ┤ | ╡ | ╢ | ╖ | ╕ | ╣ | ║ | ╗ | ╝ | ╜ | ╛ | ┐ |
C_ | └ | ┴ | ┬ | ├ | ─ | ┼ | ╞ | ╟ | ╚ | ╔ | ╩ | ╦ | ╠ | ═ | ╬ | ╧ |
D_ | ╨ | ╤ | ╥ | ╙ | ╘ | ╒ | ╓ | ╫ | ╪ | ┘ | ┌ | █ | ▄ | ▌ | ▐ | ▀ |
E_ | α | ß | Γ | π | Σ | σ | µ | τ | Φ | Θ | Ω | δ | ∞ | φ | ε | ∩ |
F_ | ≡ | ± | ≥ | ≤ | ⌠ | ⌡ | ÷ | ≈ | ° | ∙ | · | √ | ⁿ | ² | ■ |   |
PC-BASIC uses PC/XT scancodes, which originated on the 83-key IBM Model F keyboard supplied with the IBM PC 5150. The layout of this keyboard was quite distinct from modern standard keyboards with 101 or more keys, but keys on a modern keyboard produce the same scancode as the key with the same function on the Model F. For example, the key that (on a US keyboard) produces the \ was located next to the left Shift key on the Model F keyboard and has scancode &h2B. The (US) backslash key still has this scancode, even though it is now usually found above the Enter key.
To further complicate matters, keyboards for different locales have their layout remapped in software rather than in hardware, which means that they produce the same scancode as the key that on a US keyboard is in the same location, regardless of which character they actually produce.
Therefore, the A on a French keyboard will produce the same scancode as the Q on a UK or US keyboard. The aforementioned US \ key is identified with the key that is generally found to the bottom left of Enter on non-US keyboards. For example, on my UK keyboard this is the # key. Non-US keyboards have an additional key next to the left Shift which on the UK keyboard is the \. Therefore, while this key is in the same location and has the same function as the Model F \, it has a different scancode.
In the table below, the keys are marked by their function on a US keyboard, but it should be kept in mind that the scancode is linked to the position, not the function, of the key.
Key | Scancode |
---|---|
Esc | 01 |
1 ! | 02 |
2 @ | 03 |
3 # | 04 |
4 $ | 05 |
5 % | 06 |
6 ^ | 07 |
7 & | 08 |
8 * | 09 |
9 ( | 0A |
0 ) | 0B |
- _ | 0C |
= + | 0D |
Backspace | 0E |
Tab | 0F |
q Q | 10 |
w W | 11 |
e E | 12 |
r R | 13 |
t T | 14 |
y Y | 15 |
u U | 16 |
i I | 17 |
o O | 18 |
p P | 19 |
[ { | 1A |
] } | 1B |
Enter | 1C |
Ctrl | 1D |
a A | 1E |
s S | 1F |
d D | 20 |
f F | 21 |
g G | 22 |
h H | 23 |
j J | 24 |
k K | 25 |
l L | 26 |
; : | 27 |
' " | 28 |
` ~ | 29 |
Left Shift | 2A |
\ \| | 2B |
z Z | 2C |
x X | 2D |
c C | 2E |
v V | 2F |
b B | 30 |
n N | 31 |
m M | 32 |
, < | 33 |
. > | 34 |
/ ? | 35 |
Right Shift | 36 |
keypad * PrtSc | 37 |
Alt | 38 |
Space | 39 |
Caps Lock | 3A |
F1 | 3B |
F2 | 3C |
F3 | 3D |
F4 | 3E |
F5 | 3F |
F6 | 40 |
F7 | 41 |
F8 | 42 |
F9 | 43 |
F10 | 44 |
Num Lock | 45 |
Scroll Lock Pause | 46 |
keypad 7 Home | 47 |
keypad 8 ↑ | 48 |
keypad 9 Pg Up | 49 |
keypad - | 4A |
keypad 4 ← | 4B |
keypad 5 | 4C |
keypad 6 → | 4D |
keypad + | 4E |
keypad 1 End | 4F |
keypad 2 ↓ | 50 |
keypad 3 Pg Dn | 51 |
keypad 0 Ins | 52 |
keypad . Del | 53 |
SysReq | 54 |
\ \| (Non-US 102-key) | 56 |
F11 | 57 |
F12 | 58 |
Left Logo (Windows 104-key) | 5B |
Right Logo (Windows 104-key) | 5C |
Menu (Windows 104-key) | 5D |
ひらがな/カタカナ Hiragana/Katakana (Japanese 106-key) | 70 |
\ _ (Japanese 106-key) | 73 |
変換 Henkan (Japanese 106-key) | 79 |
無変換 Muhenkan (Japanese 106-key) | 7B |
半角/全角 Hankaku/Zenkaku (Japanese 106-key) | 29 |
¥ \| (Japanese 106-key) | 7D |
한자 Hanja (Korean 103-key) | F1 |
한/영 Han/Yeong (Korean 103-key) | F2 |
\ ? ° (Brazilian ABNT2) | 73 |
keypad . (Brazilian ABNT2) | 7E |
Alongside scancodes, most keys also carry a character value the GW-BASIC documentation calls extended ASCII. Since this is a rather overloaded term, we shall use the abbreviation e-ASCII exclusively for these values. The values returned by the INKEY$ function are e-ASCII values. e-ASCII codes are one or two bytes long; single-byte codes are simply ASCII codes whereas double-byte codes consist of a NUL character plus a code indicating the key pressed. Some, but certainly not all, of these codes agree with the keys' scancodes.
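The one-or-two-byte structure can be told apart by length and the leading NUL. The Python sketch below works on a byte string standing in for what INKEY$ would return; the function name `split_eascii` and the labels it returns are illustrative only.

```python
def split_eascii(code):
    """Classify an e-ASCII code given as a bytes object of length 1 or 2."""
    if len(code) == 1:
        # Single-byte codes are plain ASCII codes.
        return ("ascii", code[0])
    if len(code) == 2 and code[0] == 0:
        # Double-byte codes are a NUL followed by a key code.
        return ("extended", code[1])
    raise ValueError("not an e-ASCII code")

print(split_eascii(b"a"), split_eascii(b"\x00\x3b"))
```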
Unlike scancodes, e-ASCII codes of unmodified keys and those of keys modified by Shift, Ctrl or Alt are all different.
Unmodified, Shifted and Ctrled e-ASCII codes are connected to a key's meaning, not its location. For example, the e-ASCII code for Ctrl+a is the same on a French and a US keyboard. By contrast, the Alted codes are connected to the key's location, like scancodes. The US keyboard layout is used in the table below.
Key | e-ASCII | e-ASCII Shift | e-ASCII Ctrl | e-ASCII Alt |
---|---|---|---|---|
Esc | 1B | 1B | 1B | |
1 ! | 31 | 21 | | 00 78 |
2 @ | 32 | 40 | 00 03 | 00 79 |
3 # | 33 | 23 | | 00 7A |
4 $ | 34 | 24 | | 00 7B |
5 % | 35 | 25 | | 00 7C |
6 ^ | 36 | 5E | 1E | 00 7D |
7 & | 37 | 26 | | 00 7E |
8 * | 38 | 2A | | 00 7F |
9 ( | 39 | 28 | | 00 80 |
0 ) | 30 | 29 | | 00 81 |
- _ | 2D | 5F | 1F | 00 82 |
= + | 3D | 2B | | 00 83 |
Backspace | 08 | 08 | 7F | 00 8C |
Tab | 09 | 00 0F | 00 8D | 00 8E |
q Q | 71 | 51 | 11 | 00 10 |
w W | 77 | 57 | 17 | 00 11 |
e E | 65 | 45 | 05 | 00 12 |
r R | 72 | 52 | 12 | 00 13 |
t T | 74 | 54 | 14 | 00 14 |
y Y | 79 | 59 | 19 | 00 15 |
u U | 75 | 55 | 15 | 00 16 |
i I | 69 | 49 | 09 | 00 17 |
o O | 6F | 4F | 0F | 00 18 |
p P | 70 | 50 | 10 | 00 19 |
[ { | 5B | 7B | 1B | |
] } | 5D | 7D | 1D | |
Enter | 0D | 0D | 0A | 00 8F |
a A | 61 | 41 | 01 | 00 1E |
s S | 73 | 53 | 13 | 00 1F |
d D | 64 | 44 | 04 | 00 20 |
f F | 66 | 46 | 06 | 00 21 |
g G | 67 | 47 | 07 | 00 22 |
h H | 68 | 48 | 08 | 00 23 |
j J | 6A | 4A | 0A | 00 24 |
k K | 6B | 4B | 0B | 00 25 |
l L | 6C | 4C | 0C | 00 26 |
; : | 3B | 3A | | |
' " | 27 | 22 | | |
` ~ | 60 | 7E | | |
\ \| | 5C | 7C | 1C | |
z Z | 7A | 5A | 1A | 00 2C |
x X | 78 | 58 | 18 | 00 2D |
c C | 63 | 43 | 03 | 00 2E |
v V | 76 | 56 | 16 | 00 2F |
b B | 62 | 42 | 02 | 00 30 |
n N | 6E | 4E | 0E | 00 31 |
m M | 6D | 4D | 0D | 00 32 |
, < | 2C | 3C | | |
. > | 2E | 3E | | |
/ ? | 2F | 3F | | |
PrtSc | | | 00 72 | 00 46 |
Space | 20 | 20 | 20 | 00 20 |
F1 | 00 3B | 00 54 | 00 5E | 00 68 |
F2 | 00 3C | 00 55 | 00 5F | 00 69 |
F3 | 00 3D | 00 56 | 00 60 | 00 6A |
F4 | 00 3E | 00 57 | 00 61 | 00 6B |
F5 | 00 3F | 00 58 | 00 62 | 00 6C |
F6 | 00 40 | 00 59 | 00 63 | 00 6D |
F7 | 00 41 | 00 5A | 00 64 | 00 6E |
F8 | 00 42 | 00 5B | 00 65 | 00 6F |
F9 | 00 43 | 00 5C | 00 66 | 00 70 |
F10 | 00 44 | 00 5D | 00 67 | 00 71 |
F11 (Tandy) | 00 98 | 00 A2 | 00 AC | 00 B6 |
F12 (Tandy) | 00 99 | 00 A3 | 00 AD | 00 B7 |
Home | 00 47 | 00 47 | 00 77 | |
End | 00 4F | 00 4F | 00 75 | |
PgUp | 00 49 | 00 49 | 00 84 | |
PgDn | 00 51 | 00 51 | 00 76 | |
↑ | 00 48 | 00 48 | | |
← | 00 4B | 00 87 | 00 73 | |
→ | 00 4D | 00 88 | 00 74 | |
↓ | 00 50 | 00 50 | | |
keypad 5 | 35 | 35 | 05 | |
Ins | 00 52 | 00 52 | | |
Del | 00 53 | 00 53 | | |
PC-BASIC (rather imperfectly) emulates the memory of real-mode MS-DOS. This means that memory can be addressed in segments of 64 KiB. Each memory address is given by the segment value and the 0–65535 byte offset with respect to that segment. Note that segments overlap: the actual memory address is found by segment*16 + offset. The maximum memory size that can be addressed by this scheme is thus 1 MiB, which was the size of the conventional and upper memory in real-mode MS-DOS.
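The address calculation and the overlap of segments can be illustrated with a few lines of Python. The function name `linear_address` is made up for this sketch.

```python
def linear_address(segment, offset):
    """Return the actual memory address for a real-mode segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    return segment * 16 + offset

# Two different segment:offset pairs that name the same byte.
print(hex(linear_address(0xB800, 0x0000)), hex(linear_address(0xB000, 0x8000)))
```

Both calls give the same address, which is what "segments overlap" means in practice: a new segment starts every 16 bytes.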
Areas of memory with a special importance are:
Segment | Name | Purpose |
---|---|---|
&h0000 | Low memory | Holds machine information, among other things |
&h13AD (may vary) | Data segment | Program code, variables, arrays, strings |
&hA000 (EGA) &hB000 (MDA) &hB800 (CGA) | Video segment | Text and graphics on visible and virtual screens |
&hC000 | -- | RAM font definition, among other things |
&hF000 | Read-only memory | ROM font definition, among other things |
The data segment is organised as follows. The addresses may vary depending on the settings of various options; given here are the default values for GW-BASIC 3.23.
Offset | Size (bytes) | Function |
---|---|---|
&h0000 | 3429 | Interpreter workarea. Unused in PC-BASIC; can be adjusted with the --reserved-memory option. |
&h0D65 | (max-files+1) * 322 | File blocks: one for the program plus one for each file allowed by --max-files |
&h126D | 3 + c | Program code. An empty program uses 3 bytes. |
&h1270 + c | v | Scalar variables. |
&h1270 + c + v | a | Array variables. |
&hFDFC - s | s | String variables, filled downward from &hFDFC |
&hFDFC | 512 | BASIC stack, size set by CLEAR statement. |
&hFFFE | | Top of data segment, set by CLEAR statement. |
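The offsets in this table follow from the sizes of the areas before them. The Python sketch below recomputes them; the function name `data_segment_layout` and its parameter names are hypothetical, and the default of 3 for `max_files` is inferred from the table (the file blocks span &h126D - &h0D65 = 1288 bytes, which is 4 * 322), not taken from a stated setting.

```python
def data_segment_layout(code_size=0, scalars_size=0, strings_size=0,
                        max_files=3, workarea=3429):
    """Recompute data segment offsets from the sizes of the preceding areas."""
    file_blocks = workarea
    program = file_blocks + (max_files + 1) * 322
    scalars = program + 3 + code_size   # an empty program uses 3 bytes
    arrays = scalars + scalars_size
    strings = 0xFDFC - strings_size     # strings are filled downward from &hFDFC
    return {"file_blocks": file_blocks, "program": program,
            "scalars": scalars, "arrays": arrays, "strings": strings}

layout = data_segment_layout()
print({name: hex(value) for name, value in layout.items()})
```

With the defaults this reproduces &h0D65, &h126D and &h1270 from the table.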