-
Notifications
You must be signed in to change notification settings - Fork 0
/
z80float.doc
339 lines (278 loc) · 14.7 KB
/
z80float.doc
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
Adding a Floating Point Processor to a CP/M system
--------------------------------------------------
August 31, 1990 Update
One of my fondest memories is from an old edition of Byte, where
the venerable 8080A is rated against an IBM 370, and comes out
looking pretty good, except for floating point operations. Here's
where the 370 gets to eat our dust, fellow 8 bitters.
For the price of two I/O ports, and a couple of hundred dollars
in cash, you can add a hardware floating point processor to your
CP/M system. The hardware interface is dead simple; on my Ampro
little board, it consists of the single chip floating point
processor and a 74LS00 NAND gate. The software interface is
equally simple; think of the chip as an RPN calculator, where you
push operands on the stack, execute an operation, pop the result
off, and you've got the picture.
The chip in question is the AMD 9511A Arithmetic Processor Unit,
also known as the Intel 8231A (to save my fingers and your eyes,
from here on in I'll refer to it at the APU). It runs around $200
Canadian, and it's worth every penny. What can it do? Here's the
rundown:
o 32 bit floating point format,
o 32 and 16 bit integer formats,
o add, subtract, multiply, divide in all formats,
o trig and inverse trig operation in floating point format,
o square root, logarithms (natural and base 10), and
exponentiation,
o converts between float and 32/16 bit integer formats,
o 4 level floating point or 32 bit integer stack,
o 8 level 16 bit integer stack,
o overflow, underflow, range, divide by zero error indicators
There's also a sister chip, the AMD 9512/Intel 8232, that handles
floating point only, with single (32 bit) and double (64 bit)
precisions. It has the same pin-out at the 9511/8231, but only
does the basic four operations: addition, subtraction,
multiplication and division.
Before you start
----------------
A few things to think about should be mentioned right off the
top. The APU isn't a scientific calculator; the data range in the
floating point format is somewhat restricted. The largest
floating point value that can be represented is 9.22E+18; the
smallest is 5.42E-20. To put this into some kind of perspective,
the single precision IEEE floating point format goes a magnitude
of two bigger and smaller. The range of a scientific calculator
extends from 9.9E+99 down to 1.0E-99.
So if you can't stand to lose any of the range on your floating
point numbers, this may not be for you.
Second, not all languages are good candidates for adding an APU.
At the very least, you'll need to be able to replace some of the
runtime library. If there isn't a obvious runtime library, as is
the case with BASIC and Pascal, you're pretty much out of luck.
Also, the closer that the floating point format used by your
language is to the format used by the APU, the better. Something
like Cromemco C, which stores it's floating point numbers in BCD,
or Turbo Pascal, which uses 6 bytes, are probably best avoided.
Some fairly good candidates are:
o Turbo Modula-2 (trig and higher math functions successfully
replaced)
o Microsoft FORTRAN-80 (commercial packages in the early 80s
reported success with replacing all floating point
operations, and long integer as well)
o Aztec C (although the exponent is base 256 rather than base
2, the availability of the source to entire runtime library
raises the possibility of rewriting the whole works).
The Hardware
------------
Hooking up the APU is very easy. You need to supply the
following:
o +12 and +5 volt supplies
o a birectional 8 bit data bus
o a 2 to 4MHz TTL level clock (depending on the speed of the
APU chip you buy)
o an active high reset line
o an active low chip select
o an active low read strobe
o an active low write strobe
o optionally, a connection to the /WAIT line on the Z80
Nothing very special is required, but there are a few oddities
for a chip supposed designed by Intel for 8080/5 compatibility.
First is the active high reset line. Most CPUs, including the
8080, 8085 and Zilog's Z80 has active low resets. You'll need an
inverter to change the sense of this line. Second is the /PAUSE,
or READY output from the APU. While this is of the correct
polarity to connect directly to the Z80s /WAIT line, it would
have been nice if it had been an open collector output (if this
is the only slow device in your system it makes no difference -
hook it up direct).
Something to watch out for when your designing your interface is
the timing of your chip select and write strobes. The trailing
edge of the write pulse _must_ occur at least 60ns before the
chip select is deasserted. When I first hooked the APU up the
expansion board on my Ampro Little Board, I could not write to
the chip because the chip select was derived from the read and
write strobes as well as the decoded address. The solution I
chose was to permanently ground the chip select pin on the APU,
and OR the write strobe with the chip selected generated by the
PAL on the expansion board (so the APU is always selected, but
only sees a read or write pulse when the port address is
correct).
The Software
------------
I found it convenient to separate the software modules into an
APU communications module written in assembler, and a math
function module written in a high level language (Modula-2). The
communications module, APUlib, handles pushing and popping
operands to the APU stack, executing APU commands and returning
statuses (stati?), and translating between the IEEE floating
point format used by Turbo Modula-2 and the internal APU format.
The math library, MathLib, calls the routines in APUlib to
perform the various trig, log, and exponentiation functions
defined in the library.
APUlib contains six routines callable from Modula-2. There are
two routines for each data type. The only difference between the
two is the number of operands pushed on the stack, one or two.
o APUfp1 pushes one floating point operand on the APU stack
(after conversion), executes an APU command, pops and
converts a floating point result from the stack;
o APUfp2 pushes two floating point operands on the APU stack
(after conversion), executes an APU command, pops and
converts a floating point result from the stack;
o APUdi1 pushes one 32 bit integer operand on the APU stack,
executes an APU command, and pops a 32 bit result from the
stack
o APUdi2 pushes two 32 bit integer operands on the APU stack,
executes an APU command, and pops a 32 bit result from the
stack
o APUsi1 pushes one 16 bit integer operand on the APU stack,
executes an APU command, and pops a 16 bit result from the
stack
o APUsi2 pushes two 16 bit integer operands on the APU stack,
executes an APU command, and pops a 16 bit result from the
stack
All of the routines return an integer error code, indicating the
success or failure of the APU operation. Zero indicates success,
non-zero, failure. The specific code returned is the 4 error bits
from the APU status register, shifted left one bit.
All of the routines pass their operands and result by reference
(ie a pointer to the data in HL). The floating point routines
ALTER THE FORMAT OF THE OPERANDS IN PLACE. This is normally not a
good idea! The design of MathLib makcs it possible; all MathLib
floating point parameter are passed by value (ie a temporary copy
is made on the stack of the variable from the caller). It is this
temporary stack copy that is altered.
MathLib is an replacement and extension of the Turbo Modula-2
MathLib module. All the MathLib functions have been duplicated,
and new routines that represent hardware APU functions (Tan,
Arccos, Arcsin, Log, Pwr, etc.) have been added.
No attempt has been made in MathLib to limit incoming floating
point numbers to the subrange of values that can be handled by
the APU. This hasn't proved to be a problem with the application
I've been working with, but keep it in mind. If anyone knows of a
fast way to handle it, perhaps they could add it to the next
release.
To use APUlib, and MathLib, you must first compile the .DEF and
converted from .REL format to .MCD format with the REL utility.
Delete the existing MATHLIB.MCD and MATHLIB.SYM from your
SYSLIB.LIB, and add in APULIB.MCD and .SYM, and MATHLIB.MCD and
automatically use the new APU based functions. .COM files will
need to be relinked, of course.
Remember that for programs using overlays, APULIB will have to be
included with MATHLIB.
What you can expect
-------------------
If your application makes use of lots of trigonometric functions,
logarithms, or square roots, you will notice the increase in
speed. Will you EVER notice! A graphic sky chart generator
program written in Turbo Modula-2, SKY, used to take 3m32s
seconds to read in and perform trig on a database of 2000+ stars.
Displaying the data took 0m53s. Just by replacing MathLib, read
time when down to 0m55s, and display time to 0m22s!
As a experiment, I created four new functions in MathLib: Fadd,
Fsub, Fdiv and Fmul, to replace the floating point addition,
subtraction, division and multiplication operators. After
altering a great deal of source code to use the function calls, I
discovered that in some cases, the new code took longer! I didn't
persue the matter, but it appears at first blush that the
overhead of calling the MathLib function, then the APULib
routine, is greater than the time required to perform the
operation in software. I'm not sure that I believe this, but I
haven't disassembled the code to see just what is going on. But
if anyone knows where in TM-2 that the basic REAL and
LONGINT, for that matter, operators are handled, I would be _very_
interested to hear from them.
APU Floating Point Format
-------------------------
The format used by the APU to store a float point number is a
normalized 24 bit mantissa, 7 bit 2's complement exponent, and 1
bit mantissa sign. In memory, it looks like this
adr+0 - least significant byte of mantissa
adr+1 - middle byte of mantissa
adr+2 - most significant byte of mantissa
adr+3 - 2s complement exponent (bits 0-6) & mantissa sign (bit 7)
The most significant bit of adr+2 is always one (normalized
format, no hidden most significant bit), except for the value
zero, which is represented by four bytes of zero.
This is also the order in which floating point numbers are pushed
into the APU; from least significant byte to most significant.
When popping results from the stack, the most significant byte
comes off first.
To convert the APU floating point format to and from IEEE format
involves two steps: converted a 7 bit 2s complement exponent to
an 8 bit exponent with a bias of 127, and shifting the least
significant bit of the exponent into the most significant bit of
the mantissa. In the IEEE format, the most significant bit of the
mantissa is assumed to be one, and it's place is used for the
least significant bit of the exponent. The exponent itself is
biased by 127; an exponent of 0 represent the value zero. The
purpose of biasing the exponent is to make comparing two floating
point numbers easier (simply byte by byte comparison can be
used).
There _really_ should be a check in the pushf routine in APUlib
to check for IEEE exponents that cannot be converted to APU
exponents (8 bits into 7 bits). There's no problem in popf, as 7
bits can always be expanded to 8.
Conversion to and from other floating point formats will involve
the same steps. If the number comes in not normalized (there is
no assumed mantissa msb set, or the msb of the mantissa is not
set), it must be normalized. Any bias on the exponent must be
removed, and if the exponent is not base 2, the mantissa and
exponent must be adjusted so that it is. Honestly, I'm not sure
how to do this, because I haven't had to do it yet!
adjusted
Using the APU with another language
-----------------------------------
To use the APU with another language, several things need to be
done:
o the floating point conversions in pushf and popf will need to
be changed to reflect the floating point format used in the
language
o the parameter passing mechanism used by the language will
dictate changes to the APUlib routines; TM-2 passes
everything on the stack, while FORTRAN-80 has a different
scheme
o the math routines in the language's link library will have to
be replaced with APU equivalents
o your applications must be recompiled with the new link
library
I realize this is _extremely_ sketchy, but every language will
present different problems, and will require different solutions.
All I can tell you is _what_ I think you'll have to do, and
you'll have to carry the ball from there and figure out _how_ to
do it.
Final notes
-----------
A search of the SIG/M and CPMUG user libraries turned up
virtually nothing in the way of floating point software, hardware
descriptions, or documentation. Nor have I seen much in my
current travels on the Z-nodes. I hope this library goes a little
way towards filling this void.
On the off chance you're that one in a thousand person who wants
to use APUlib with Turbo Modula-2, but can't use the REL utility
because you can't generate a system with a large enough TPA for
REL to run, I've included APULIB.MCD and APULIB.SYM. Normally,
you would have to reassemble APULIB.MAC with your I/O ports, but
I've gone through the .MCD file and made a note of all the places
where the I/O port addresses occur (the current values are 08 for
the data port, and 09 for the status port). If worse comes to
worse, you can always patch the .MCD file with your favourite hex
editor.
Data port offsets (in hex): 0060, 0082, 00A0, 00C0, 00CD, 00FE
Control port offsets(in hex): 0130, 0132
If you load APULIB.MCD into memory with DDT or the like, add 100
hex to the offsets to get their locations in memory.
My sincere thanks to David Goodenough for pointing me towards
this chip.
And of course, APUlib and MathLib, both original works by me, are
released to the public domain. Do with them as you like, people.
Happy APUing!
Wayne Hortensius
August 31, 1990
8/31/90 Update notes
In spite of a couple of mistakes in APULIB.MAC (one benign and
one nasty), I've been happily using these routines for the
better part of a year now. Apart from fixing the two mistakes,
some code speedups have also been incorporated in this release.
My UUCP address is absent from this release as Canada Remote
Systems has bit the dust, and so I now longer have access to
Usenet.