Re: [f-cpu] by the way
Yann Guidon wrote:
by the way one other question:
IEEE defines 4 rounding modes (round to nearest, to +infinity, to
-infinity and to zero), so i put an input in the architecture:
RoundMode: in std_ulogic_vector(1 downto 0);
-- 00: nearest, 01: zero, 10: +inf, 11: -inf
In the fctools package, I currently use this encoding:
00 -> to nearest/even
01 -> toward zero
10 -> toward -infinity
11 -> toward +infinity
(as far as I remember).
According to IEEE 754, there is a mode register. As far as I understand,
that does not mean that its value can't be overridden in the instruction.
But how will it be implemented at a higher level? by a special register
changed by an instruction? or directly from the instruction?
i'm not sure yet, it's a possibility.
one consideration is patents: i guess that most implementations
use a "special configuration register" for this, and it's likely to
be covered by patents, so we usually choose the other solution.
i'll have to check that, but i have no time at this very moment.
it also depends on what room remains in the opcodes.
If there is room, the encoding should be:
00 -> use value from the control word instead
01 -> toward zero
10 -> toward -infinity
11 -> toward +infinity
Or even better, make the translation user programmable (just like the
size bits). For the FPU, this means that the two bits select one of four
rounding modes defined in the FPU control register. That should probably
look like this:
0-2 rounding mode 0
3-5 rounding mode 1
6-8 rounding mode 2
9-11 rounding mode 3
12-63 to be specified
That would allow us to choose from 8 different rounding modes (IEEE 754
knows only 4 modes, but there are others, in particular round-half-up
and round-half-down). For FC0, the third mode bit could be made a
reserved bit (always 0).
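A minimal sketch of that lookup, in Python rather than VHDL, assuming the layout from the table above (slot n lives in bits 3n..3n+2 of the control register). The function names are made up for illustration only:

```python
# Hypothetical model of the proposed layout: four 3-bit rounding-mode
# slots in bits 0-11 of the FPU control register.
SLOT_WIDTH = 3
SLOT_MASK = (1 << SLOT_WIDTH) - 1


def select_rounding_mode(fpu_control: int, selector: int) -> int:
    """Return the 3-bit rounding mode stored in slot `selector` (0..3)."""
    if not 0 <= selector <= 3:
        raise ValueError("selector must be a 2-bit value")
    return (fpu_control >> (selector * SLOT_WIDTH)) & SLOT_MASK


def set_rounding_slot(fpu_control: int, selector: int, mode: int) -> int:
    """Return a copy of the control word with slot `selector` set to `mode`."""
    if not 0 <= selector <= 3:
        raise ValueError("selector must be a 2-bit value")
    if not 0 <= mode <= SLOT_MASK:
        raise ValueError("mode must fit in 3 bits")
    shift = selector * SLOT_WIDTH
    return (fpu_control & ~(SLOT_MASK << shift)) | (mode << shift)
```

The two bits in the instruction would only pick the slot; what each slot means is up to whoever wrote the control register.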
The main reason for choosing this scheme is that a program can be re-run
with different rounding modes in effect without recompiling it. See W.
Kahan's papers (http://http.cs.berkeley.edu/~wkahan/ieee754status/) for
reasons why this is a Good Thing (tm).
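To illustrate what "re-run with different rounding modes" buys you, here is a small sketch using Python's decimal module as a stand-in for the FPU. The precision, the test computation and the helper names are my own assumptions; the point is only that the computation itself is left untouched while the mode changes around it:

```python
from decimal import (Decimal, localcontext, ROUND_HALF_EVEN, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_CEILING)


def harmonic_sum(n):
    """Sum 1/1 + 1/2 + ... + 1/n in whatever decimal context is active."""
    total = Decimal(0)
    for k in range(1, n + 1):
        total += Decimal(1) / Decimal(k)
    return total


def run_under_all_modes(computation, precision=6):
    """Run the same computation once per rounding mode, without changing it."""
    results = {}
    for mode in (ROUND_HALF_EVEN, ROUND_DOWN, ROUND_FLOOR, ROUND_CEILING):
        with localcontext() as ctx:
            ctx.prec = precision   # deliberately short, to make rounding visible
            ctx.rounding = mode
            results[mode] = computation()
    return results


results = run_under_all_modes(lambda: harmonic_sum(50))
# The spread between the smallest and largest result hints at how sensitive
# the computation is to rounding.
spread = max(results.values()) - min(results.values())
```

A large spread relative to the result would be a hint that the code deserves a closer look.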
The FPU control register should not be a special register, by the way.
It should be a separate FP execution unit that executes a special instruction
fpcntrl [r3, ]r2, r1
with semantics similar to these:
r1 := fpu_control;
fpu_control := (fpu_control & r3) | (r2 & ~r3);
where r3 allows setting only selected bits (default: all bits). That is,
you can modify and save the control word with a single instruction, and
restore it with another single instruction. That would not be possible
with a special register (unless we implement a similar instruction for
special registers, which may be a good thing, too).
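A rough Python model of those semantics, taking the formula above literally. Note the assumption: as written, bits that are set in r3 keep their old value and bits that are clear are taken from r2, so "all bits" corresponds to r3 = 0 here. The class and method names are invented, and the 64-bit width follows the 0-63 bit numbering used earlier:

```python
MASK64 = (1 << 64) - 1  # control word assumed to be 64 bits wide


class FpuControlUnit:
    """Toy model of an execution unit that owns the FPU control word."""

    def __init__(self, initial=0):
        self.fpu_control = initial & MASK64

    def fpcntrl(self, r2, r3=0):
        """Model `fpcntrl [r3, ]r2, r1`: return the old word (r1), merge in r2.

        Assumption: r3 = 0 (the default here) replaces every bit, because
        the formula keeps the bits that are set in r3.
        """
        old = self.fpu_control
        self.fpu_control = ((old & r3) | (r2 & ~r3)) & MASK64
        return old


unit = FpuControlUnit(0b100)
saved = unit.fpcntrl(0b010)      # save the old word and install a new one
unit.fpcntrl(saved)              # restore with a second single instruction
```

Save-and-modify is one call and restore is another, which is the property the special-register approach would lack.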
Other FP units should get a copy of the control register as an
additional (hidden) operand. That way, every instruction will work with
the current control register value, even if there is an instruction
fmul r1, r2, r3
fpcntrl r4, r5, r6
that changes the register's value behind the fmul instruction's back. Of
course the compiler must not reorder instructions in this case. For ISO
C99, "#pragma STDC FENV_ACCESS ON" should do the trick.
Another thing: how do you plan to manage exceptions when they occur?
I mean, when an exception occurs (ieee flag clear), the unit will output
an exception flag, and the ieee standard wants it to stay raised
until a special instruction clears it...
Wait a minute.
when an exception is detected, the pipeline is not filled with
the following instructions, but a handling code is fetched and executed.
There are two kinds of exceptions: faults and traps. A fault occurs
before the instruction is started, a trap when it is finished. The IEEE
exceptions Overflow, Underflow and Inexact are likely to be traps,
Divide-by-0 and Invalid may be either kind. But since they depend on the
instructions' operands, I suggest making them traps in FC0, too. The
only way to realize faults in FC0 is to mark a faulting operand in the
scoreboard (could be used for signalling NaNs, for example).
"flags", particularly for FP and carry etc. create potential problems
in more sophisticated architectures : the famous bottlenecks.
so it is not likely that a single flag will be used.
IEEE 754 mandates a single flag (per exception), AFAIK. That doesn't
mean that we can't implement an (additional) per-register exception flag
in the scoreboard. When e.g.
fadd r1, r2, r3
raises an exception, r3's flag would be set. We could also use the
remaining condition code (which is currently referred to as "nan") to check
that bit in a program:
jmp.e r3, r2[, r1]
would jump to [r2] if the exception bit for r3 is set. Exception flags
could also be made accumulative, that is, they could propagate from the
operands to the result (maybe depending on another FP control flag).
Additionally, the instruction *using* the exceptional value could
trigger a fault. That would be easier to implement than a trap in the
instruction that *produces* it.
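A toy model of the per-register flag idea, to make the propagation concrete. Everything here is an assumption for illustration: the 64-entry register file, the way an exception is detected (overflow to infinity, or an invalid operation producing NaN), and the `accumulate` switch standing in for the FP control flag mentioned above:

```python
import math


class RegisterFile:
    """Registers with one exception bit each, as sketched for the scoreboard."""

    def __init__(self, size=64, accumulate=True):
        self.value = [0.0] * size
        self.exc = [False] * size
        self.accumulate = accumulate  # hypothetical FP control flag

    def write(self, r, v):
        """Load a fresh value; a plain write clears the exception bit."""
        self.value[r] = v
        self.exc[r] = False

    def fadd(self, r1, r2, r3):
        """fadd r1, r2, r3: r3 := r1 + r2, destination last."""
        a, b = self.value[r1], self.value[r2]
        result = a + b
        # Overflow: infinite result from finite operands.
        raised = math.isinf(result) and not (math.isinf(a) or math.isinf(b))
        # Invalid: NaN result from non-NaN operands (e.g. inf + -inf).
        raised = raised or (math.isnan(result)
                            and not (math.isnan(a) or math.isnan(b)))
        if self.accumulate:
            # Flags propagate from the operands to the result.
            raised = raised or self.exc[r1] or self.exc[r2]
        self.value[r3] = result
        self.exc[r3] = raised

    def jmp_e(self, r3):
        """Condition tested by `jmp.e r3, r2`: is r3's exception bit set?"""
        return self.exc[r3]
```

With `accumulate` on, a result computed from a flagged register stays flagged, so one check at the end of a computation is enough.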
however, the register set will also contain a "property bit"
that indicates an error condition and, if the IEEE compliance
flag is not used, the result would be NaN or another "special
Why not implement the IEEE default action when the "compliance bit" is
not set? That is,
Invalid -> return a (quiet) NaN
Divide-by-0 -> return +/- infinity
Overflow -> return +/- infinity
Underflow -> return a denormalized value
Inexact -> return the rounded result
And, of course, raise the exceptions' flags in the (global) FPU status
register. That way, we get minimal IEEE compliance from the beginning.
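The list above translates almost directly into code. This is only a sketch: the status bit positions and the names are invented, and the "rounded" value is simply passed in by the caller rather than computed:

```python
import math

# Hypothetical bit assignment for the (global) FPU status register.
STATUS_BITS = {
    "invalid": 1 << 0,
    "div_by_zero": 1 << 1,
    "overflow": 1 << 2,
    "underflow": 1 << 3,
    "inexact": 1 << 4,
}


class FpuStatus:
    """Sticky exception flags: set by the hardware, cleared only on request."""

    def __init__(self):
        self.flags = 0

    def raise_flag(self, name):
        self.flags |= STATUS_BITS[name]

    def test(self, name):
        return bool(self.flags & STATUS_BITS[name])

    def clear(self):
        self.flags = 0


def default_result(status, exception, sign=1.0, rounded=None):
    """Apply the default action from the list above and raise the flag."""
    status.raise_flag(exception)
    if exception == "invalid":
        return math.nan                         # quiet NaN
    if exception in ("div_by_zero", "overflow"):
        return math.copysign(math.inf, sign)    # +/- infinity
    # underflow, inexact: deliver the rounded (possibly denormalized) result
    return rounded
```

The flags stay raised until `clear()` is called, which matches the "stay raised until a special instruction clears it" requirement discussed earlier in the thread.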
this is necessary to restore the flag after a task switch.
Oh well, I forgot the f*cking task switch. Is it really necessary to
jump directly from one task to another without intervention of the OS?
In that case, we must save/restore all FPU control and status bits
(including bits from the scoreboard) to/from the CMB. Or we have to
"monitor" FPU instructions the way IA-32 processors do it: the first FP
instruction after a task switch will fault, giving the OS an opportunity
to save the old task's FP status and load the new task's one. The
advantage is that tasks which don't do FP at all (which probably is the
majority) will not trigger the fault.
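A small simulation of that lazy scheme, just to show the bookkeeping. The class, its fields and the shape of the saved state are all assumptions; a real OS would save whatever the FPU (and scoreboard) actually holds:

```python
class Cpu:
    """Toy model of lazy FPU state switching."""

    def __init__(self):
        self.fpu_state = {"control": 0, "status": 0}
        self.fpu_owner = None      # task whose state is currently in the FPU
        self.fpu_enabled = False   # when False, the next FP instruction faults
        self.current = None
        self.saved = {}            # task -> saved FPU state
        self.fault_count = 0

    def task_switch(self, task):
        """Switch tasks without touching the FPU state at all."""
        self.current = task
        self.fpu_enabled = (self.fpu_owner == task)

    def fp_instruction(self, new_control=None):
        """Execute an FP instruction; optionally write the control word."""
        if not self.fpu_enabled:
            self._fpu_fault()
        if new_control is not None:
            self.fpu_state["control"] = new_control
        return self.fpu_state["control"]

    def _fpu_fault(self):
        """OS handler: save the old owner's state, load the current task's."""
        self.fault_count += 1
        if self.fpu_owner is not None:
            self.saved[self.fpu_owner] = dict(self.fpu_state)
        fresh = {"control": 0, "status": 0}
        self.fpu_state = dict(self.saved.get(self.current, fresh))
        self.fpu_owner = self.current
        self.fpu_enabled = True
```

A task that never issues an FP instruction never reaches `_fpu_fault`, so switching to it and back costs nothing on the FPU side.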
Then, a conditional instruction can jump to specific code in this case:
if r25.nan jump r12
other ideas welcome.
Well, lots of them. User-mode exception handlers would be a good thing,
for example: Save the faulting instruction, its address, its operands'
values and its (default) result to special registers, then branch to the
address of the handler (defined in another special register). Since the
OS will not be involved, FP exception handling could be rather fast.