
Re: [f-cpu] by the way



Hi F-gang,

Yann Guidon wrote:
hi,

gaetan@xeberon.net wrote:

by the way one other question:
IEEE defines 4 rounding modes (round to nearest, to +infinity, to -infinity and to zero), so I put an input in the architecture:
RoundMode: in std_ulogic_vector(1 downto 0); -- 00 : nearest, 01: zero, 10: +inf, 11: -inf
In the fctools package, I currently use this encoding:

	00 -> to nearest/even
	01 -> toward zero
	10 -> toward -infinity
	11 -> toward +infinity

(as far as I remember).

But how will it be implemented higher? special register changed by an instruction? or directly from the instruction?
According to IEEE 754, there is a mode register. AFAI understand, that does not mean that its value can't be overridden in the instruction.

i'm not sure yet, it's a possibility.
one parameter is the patents : i guess that most implementations
use a "special configuration register" for this, and it's likely to
be covered by patents, so we usually chose the other solution.
Or both.

i'll have to check that, but i have no time at this very moment.
it also depends on what room remains in the opcodes.
If there is room, the encoding should be:

00 -> use value from the control word instead
01 -> toward zero
10 -> toward -infinity
11 -> toward +infinity
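
A minimal C sketch of how such a field could be resolved, assuming the control word holds a 2-bit mode in the fctools encoding (00 = nearest/even). The names rm_field and effective_rounding_mode are made up for illustration and are not part of any F-CPU source:

```c
/* Hypothetical 2-bit rounding field of an FP instruction, as proposed above. */
enum rm_field {
    RM_USE_CONTROL = 0,  /* 00: take the mode from the control word */
    RM_TO_ZERO     = 1,  /* 01: toward zero */
    RM_TO_NEG_INF  = 2,  /* 10: toward -infinity */
    RM_TO_POS_INF  = 3   /* 11: toward +infinity */
};

/* Returns the rounding mode an instruction actually uses.
 * control_mode is the 2-bit mode held in the control word
 * (assumed to use the fctools encoding, 00 = nearest/even). */
static unsigned effective_rounding_mode(unsigned insn_field, unsigned control_mode)
{
    insn_field &= 3u;
    control_mode &= 3u;
    return insn_field == RM_USE_CONTROL ? control_mode : insn_field;
}
```

With this encoding an instruction can only get nearest/even through the control word, because 00 no longer names a mode of its own.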

Or even better, make the translation user programmable (just like the size bits). For the FPU, this means that the two bits select one of four rounding modes defined in the FPU control register. That should probably look like this:

bits meaning
0-2 rounding mode 0
3-5 rounding mode 1
6-8 rounding mode 2
9-11 rounding mode 3
12-63 to be specified

That would allow us to choose from 8 different rounding modes (IEEE 754 knows only 4 modes, but there are others, in particular round-half-up and round-half-down). For FC0, the third mode bit could be made a reserved bit (always 0).
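
The lookup could be sketched in C roughly like this. It assumes that bit 0 is the least significant bit of a 64-bit control word. The function names are hypothetical, and the numeric values of the 8 modes are left open because the layout above does not define them:

```c
#include <stdint.h>

/* Assumed layout from the table above: four 3-bit slots in bits 0-11. */
#define RM_SLOT_BITS 3u
#define RM_SLOT_MASK 0x7u

/* Look up the rounding mode selected by an instruction's 2-bit field. */
static unsigned select_rounding_mode(uint64_t fpu_control, unsigned selector)
{
    unsigned shift = (selector & 3u) * RM_SLOT_BITS;
    return (unsigned)((fpu_control >> shift) & RM_SLOT_MASK);
}

/* Build a control word with the four slots filled in (bits 12-63 left zero). */
static uint64_t make_rounding_slots(unsigned m0, unsigned m1, unsigned m2, unsigned m3)
{
    return  (uint64_t)(m0 & RM_SLOT_MASK)
         | ((uint64_t)(m1 & RM_SLOT_MASK) << 3)
         | ((uint64_t)(m2 & RM_SLOT_MASK) << 6)
         | ((uint64_t)(m3 & RM_SLOT_MASK) << 9);
}
```

Re-running a program under a different rounding mode then only means loading different slot values. The 2-bit fields in the instructions stay as compiled.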

The main reason for choosing this scheme is that a program can be re-run with different rounding modes in effect without recompiling it. See W. Kahan's papers (http://http.cs.berkeley.edu/~wkahan/ieee754status/) for reasons why this is a Good Thing (tm).

The FPU control register should not be a special register, by the way. It should be a separate FP execution unit that executes a special instruction:

fpcntrl [r3, ]r2, r1

with semantics similar to these:

r1 := fpu_control;
fpu_control := (fpu_control & r3) | (r2 & ~r3);

where r3 makes it possible to set only selected bits (default: all bits). That is, you can modify and save the control word with a single instruction, and restore it with another single instruction. That would not be possible with a special register (unless we implement a similar instruction for special registers, which may be a good thing, too).
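
As a software model, the semantics above could be written in C as follows. This reads the formula literally: bits set in r3 are kept, bits clear in r3 are taken from r2. It also assumes that an omitted r3 behaves like an all-zero mask. The function name and parameter names are only illustrative:

```c
#include <stdint.h>

/* Software model of the proposed fpcntrl instruction.
 * keep_mask plays the role of r3: bits set in it are preserved,
 * bits clear in it are taken from new_bits (r2).
 * Returns the previous control word (the value written to r1). */
static uint64_t fpcntrl(uint64_t *fpu_control, uint64_t keep_mask, uint64_t new_bits)
{
    uint64_t old = *fpu_control;
    *fpu_control = (old & keep_mask) | (new_bits & ~keep_mask);
    return old;
}
```

A caller would save and modify with one call, e.g. saved = fpcntrl(&cw, ~UINT64_C(0xF), 0x5) to change only the low four bits, and restore with fpcntrl(&cw, 0, saved).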

Other FP units should get a copy of the control register as an additional (hidden) operand. That way, every instruction will work with the current control register value, even if there is an instruction sequence like

fmul r1, r2, r3
fpcntrl r4, r5, r6

that changes the register's value behind the fmul instruction's back. Of course the compiler must not reorder instructions in this case. For ISO C99, "#pragma STDC FENV_ACCESS ON" should do the trick.

Another thing: how do you plan to manage exceptions when they occur? I mean, when an exception occurs (IEEE flag clear), the unit will output an exception flag, and the IEEE standard wants it to stay raised until a special instruction clears it...
Yep.

when an exception is detected, the pipeline is not filled with
the following instructions, but a handling code is fetched and executed.
Wait a minute.

There are two kinds of exceptions: faults and traps. A fault occurs before the instruction is started, a trap when it is finished. The IEEE exceptions Overflow, Underflow and Inexact are likely to be traps, Divide-by-0 and Invalid may be either kind. But since they depend on the instructions' operands, I suggest making them traps in FC0, too. The only way to realize faults in FC0 is to mark a faulting operand in the scoreboard (could be used for signalling NaNs, for example).

"flags", particularly for FP and carry etc. create potential problems
in more sophisticated architectures : the famous bottlenecks.
so it is not likely that a single flag will be used.
IEEE 754 mandates a single flag (per exception), AFAIK. That doesn't mean that we can't implement an (additional) per-register exception flag in the scoreboard. When e.g.

fadd r1, r2, r3

raises an exception, r3's flag would be set. We could also use the remaining condition code (which is currently referred to as "nan") to check that bit in a program:

jmp.e r3, r2[, r1]

would jump to [r2] if the exception bit for r3 is set. Exception flags could also be made accumulative, that is, they could propagate from the operands to the result (maybe depending on another FP control flag).
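
A small C model of that per-register bit, including the optional propagation. The struct layout, the register count and the accumulate flag are assumptions made for the sketch, not an existing FC0 interface:

```c
#include <stdbool.h>

#define NUM_REGS 64  /* assumption: size of the register set */

struct scoreboard {
    bool exc[NUM_REGS];  /* hypothetical per-register exception bit */
    bool accumulate;     /* hypothetical FP control flag: propagate operand bits */
};

/* Record the outcome of a two-operand FP instruction (destination last,
 * as in "fadd r1, r2, r3"). raised says whether the operation itself
 * raised an exception. */
static void fp_writeback(struct scoreboard *sb, int src1, int src2, int dst, bool raised)
{
    bool inherited = sb->accumulate && (sb->exc[src1] || sb->exc[src2]);
    sb->exc[dst] = raised || inherited;
}

/* Model of "jmp.e r3, r2": returns true if the jump would be taken. */
static bool jmp_e_taken(const struct scoreboard *sb, int reg)
{
    return sb->exc[reg];
}
```

With accumulation switched on, a program can run a whole chain of operations and test only the final result register.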

Additionally, the instruction *using* the exceptional value could trigger a fault. That would be easier to implement than a trap in the instruction that *produces* it.

however, the register set will also contain a "property bit"
that indicates an error condition and, if the IEEE compliance
flag is not used, the result would be NaN or another "special pseudo-value".
Why not implement the IEEE default action when the "compliance bit" is not set? That is,

Invalid -> return a (quiet) NaN
Divide-by-0 -> return +/- infinity
Overflow -> return +/- infinity
Underflow -> return a denormalized value
Inexact -> return the rounded result

And, of course, raise the exceptions' flags in the (global) FPU status register. That way, we get minimal IEEE compliance from the beginning.
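
In C, the default actions plus sticky flags might be modelled like this. The sketch follows the table literally and ignores that IEEE 754 makes the overflow result depend on the rounding mode. The enum values, the status word layout and the function names are invented for illustration:

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical bit assignment in the global FPU status register. */
enum exc {
    EXC_INVALID   = 1u << 0,
    EXC_DIVBYZERO = 1u << 1,
    EXC_OVERFLOW  = 1u << 2,
    EXC_UNDERFLOW = 1u << 3,
    EXC_INEXACT   = 1u << 4
};

/* Default (non-trapping) result for an exception.
 * rounded is the result the unit computed after rounding (possibly
 * denormalized); negative gives the sign for the infinity cases. */
static double default_result(enum exc e, double rounded, int negative)
{
    switch (e) {
    case EXC_INVALID:   return NAN;                            /* quiet NaN */
    case EXC_DIVBYZERO:
    case EXC_OVERFLOW:  return negative ? -INFINITY : INFINITY;
    case EXC_UNDERFLOW:
    case EXC_INEXACT:   return rounded;
    }
    return rounded;
}

/* Raise the sticky flag and deliver the default result. The flag stays
 * set until software clears the status word explicitly. */
static double fp_signal(uint32_t *status, enum exc e, double rounded, int negative)
{
    *status |= (uint32_t)e;
    return default_result(e, rounded, negative);
}
```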

this is necessary to restore the flag after a task switch.
Oh well, I forgot the f*cking task switch. Is it really necessary to jump directly from one task to another without intervention of the OS? In that case, we must save/restore all FPU control and status bits (including bits from the scoreboard) to/from the CMB. Or we have to "monitor" FPU instructions the way IA-32 processors do it: the first FP instruction after a task switch will fault, giving the OS an opportunity to save the old task's FP status and load the new task's one. The advantage is that tasks which don't do FP at all (which probably is the majority) will not trigger the fault.
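
The "monitor" variant can be sketched as a small C model. All structure and function names here are hypothetical, and the real saved state would also include the scoreboard bits mentioned above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fp_state { uint64_t control; uint64_t status; };

struct task { struct fp_state saved; };

struct cpu {
    struct fp_state fpu;     /* live FPU control/status */
    bool fpu_enabled;        /* false: next FP instruction faults */
    struct task *fpu_owner;  /* task whose state is live in the FPU */
    struct task *current;
};

/* Task switch: no FP state is copied here. */
static void task_switch(struct cpu *c, struct task *next)
{
    c->current = next;
    /* The FPU stays usable only if the next task already owns its contents. */
    c->fpu_enabled = (c->fpu_owner == next);
}

/* Called before each FP instruction; returns true if it faulted into the OS. */
static bool fp_instruction(struct cpu *c)
{
    if (c->fpu_enabled)
        return false;
    if (c->fpu_owner != NULL)
        c->fpu_owner->saved = c->fpu;  /* save the old task's FP state */
    c->fpu = c->current->saved;        /* load the new task's FP state */
    c->fpu_owner = c->current;
    c->fpu_enabled = true;
    return true;
}
```

A task that never executes an FP instruction never reaches the save/load path, so switching to and from it costs nothing extra.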

Then, a conditional instruction can jump to a specific code on this case :
 if r25.nan jump r12

other ideas welcome.
Well, lots of them. User-mode exception handlers would be a good thing, for example: Save the faulting instruction, its address, its operands' values and its (default) result to special registers, then branch to the address of the handler (defined in another special register). Since the OS will not be involved, FP exception handling could be rather fast.

Michael.
