
Re: [f-cpu] F-CPU vs ALPHA



hi !

Michael Riepe wrote:
> On Tue, Aug 14, 2001 at 10:43:12AM +0200, Yann Guidon wrote:
> >
> > i wrote this stuff tonight.
> Some remarks:
cool :-)

> [...]
> > //  There is no integer divide instruction.
> > Michael is (was ?) trying to make a divide unit, i understand his pain.
> > but the instruction is defined in the opcode map anyway, even if it must
> > be emulated.
> 
> I'm still working on it.  The real pain is that the IDU duplicates so
> many things that are already present (parts of the INC, SHL and ASU
> units are needed for operand normalization and result postprocessing)
> but I can't drop them and make the divider core a separate instruction
> because it uses too many registers (3r4w).

3R4W ? can you explain, or write a draft, about this unit ?
i'm surprised.

if there is a painful problem, maybe we can go the route of half-emulation,
like with the emulation of multiplies with a sequence of partial
instructions ?

> [...]
> > //  The floating operate instructions include four complete sets of VAX and
> > //  IEEE arithmetic, plus conversions between float and integer.
> >
> > Only IEEE is going to be supported, in 32, 40, 64, 80 bit modes.
> > 32 and 64 bit versions are preferred of course.
> 
> We don't have any registers that can hold 80-bit floating-point
> values,
i told you, i was tired (me too :-))
on top of that, it goes against the idea of SIMD.

> and I'd rather add a `tiny float' data type (1+4+11 bits)
?

> for low-precision operations than implement 40-bit FP (which is IMHO
> rather useless, and a waste of space).
are tiny floats really useful ?

> > //  There is no floating square root instruction.
> > We intend to provide "seed" generation for accelerating Newton-Raphson
> > computations of divide and SQRT.
> In the FC0, we can probably get away with some bit-shuffling (hardwired).
> For sqrt(x), divide the exponent by 2 (right shift); for 1/x, invert it.

believe me, if this were so simple, others would be doing it already.
The problem is that this trick alone does not provide enough precision.

The strategy in most computers is not to run the NR iterations inside
a loop, but to emit a fixed number of unrolled instructions.
The "seed" must be constructed in such a way that when the specified number
of iterations is executed, we get at least a certain precision/accuracy.

For example, imagine we have a table that yields a "seed" with 8 accurate
bits (in addition to a correct exponent). Each NR iteration roughly doubles
the number of accurate bits, so a single run will give 16 accurate bits, then
another will give 32 bits of precision. If you work with IEEE floats,
the compiler will generate 2 optimized unrolled iteration bodies.
If you work with IEEE long reals, it will generate 3 iterations.
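
A minimal sketch of the table-seeded scheme described above, in Python. It
uses the reciprocal (1/x) rather than SQRT to keep it short, and everything
named here (SEED_BITS, SEED_TABLE, recip_seed, recip, rel_error) is
hypothetical, not part of any F-CPU specification. The table has 128 entries
indexed by the top mantissa bits, which is enough for about 8 accurate bits;
the iteration count is fixed, as a compiler would unroll it.

```python
import math

SEED_BITS = 7  # assumption: 2**7 table entries give about 8 accurate bits

# One entry per mantissa interval of [0.5, 1): reciprocal of the midpoint.
SEED_TABLE = [1.0 / (0.5 + (i + 0.5) / 2 ** (SEED_BITS + 1))
              for i in range(2 ** SEED_BITS)]

def recip_seed(x):
    """Seed for 1/x (x > 0): table lookup on the mantissa, negated exponent."""
    m, e = math.frexp(x)                       # x = m * 2**e, 0.5 <= m < 1
    i = int((m - 0.5) * 2 ** (SEED_BITS + 1))  # top mantissa bits as index
    return math.ldexp(SEED_TABLE[i], -e)

def recip(x, steps):
    """Apply a fixed number of Newton-Raphson steps y <- y * (2 - x*y)."""
    y = recip_seed(x)
    for _ in range(steps):                     # fixed count: no convergence test
        y = y * (2.0 - x * y)
    return y

def rel_error(x, steps):
    """Relative error |1 - x*y| of the approximation after `steps` steps."""
    return abs(1.0 - x * recip(x, steps))
```

Since the relative error is squared at each step, rel_error(x, 0) stays
below 2**-8, rel_error(x, 1) below 2**-16 and rel_error(x, 2) below 2**-32
for any positive x, which matches the 8/16/32 progression above.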

If you "only" do an exponent adjustment,
 1) you won't get a guaranteed minimum accuracy for the approximation
 2) you will thus have to put one NR iteration in a while(){} loop,
    and if you work with SIMD floats, the slow convergence of one number
    will keep the whole SIMD packet from being used even though the other
    numbers are "ready" (converged).
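
The two points above can be illustrated with a small Python sketch, again
for 1/x and with hypothetical names (exponent_only_seed, iterations_needed,
packet_iterations, TOLERANCE). The seed only negates the exponent, so the
number of NR steps depends on the mantissa, and a SIMD packet has to wait
for its slowest lane. The 2**-24 tolerance is an assumption standing in for
"single precision".

```python
import math

TOLERANCE = 2.0 ** -24  # assumption: roughly single-precision accuracy

def exponent_only_seed(x):
    """Seed for 1/x (x > 0) that only negates the exponent, no table."""
    _, e = math.frexp(x)            # x = m * 2**e, 0.5 <= m < 1
    return math.ldexp(1.0, -e)

def iterations_needed(x, tolerance=TOLERANCE):
    """Count NR steps until the relative error drops below the tolerance."""
    y = exponent_only_seed(x)
    count = 0
    while abs(1.0 - x * y) >= tolerance:   # the while(){} loop from point 2
        y = y * (2.0 - x * y)
        count += 1
    return count

def packet_iterations(lanes):
    """A SIMD packet is only usable once its slowest lane has converged."""
    return max(iterations_needed(x) for x in lanes)
```

With these definitions, iterations_needed(0.999) is 2 while
iterations_needed(1.0) is 5, so a packet holding both values needs 5 steps.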

of course, if you don't mind, you can "forget" or "simplify" the seed LUT,
but the program/compiler must be adapted accordingly.

> [...]
> > //  Instead, the byte shifting and masking is done in Alpha with normal 64-bit
> > //  register-to-register instructions, crafted to keep the sequences short.
> >
> > hmmm, this family of instructions is often very useful anyway !
> 
> Indeed... but why limit it to bytes?  The ia64 (aka Itanium) has dedicated
> insert/extract instructions that work with arbitrary bit fields.

concerning the f-cpu, as long as we don't have a final specification of the SHL
unit, we can't be sure.
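
For reference, the arbitrary bit-field extract/insert operations mentioned
in the quoted text can be modelled as below. This is only a sketch of the
general semantics in Python: the function names, the argument order and the
64-bit word size are assumptions, and it describes neither the exact ia64
instructions nor anything decided for the F-CPU SHL unit.

```python
def extract_field(word, pos, width):
    """Return the `width`-bit field of `word` that starts at bit `pos`."""
    mask = (1 << width) - 1
    return (word >> pos) & mask

def insert_field(word, value, pos, width, word_bits=64):
    """Return `word` with the `width`-bit field at bit `pos` set to `value`."""
    mask = ((1 << width) - 1) << pos
    full = (1 << word_bits) - 1            # keep the result in one register
    return ((word & ~mask) | ((value << pos) & mask)) & full
```

Byte extraction is then just the special case width = 8 with pos a multiple
of 8, e.g. extract_field(0x1122334455667788, 8, 8) gives 0x77.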

> [...]
> > //  If precise arithmetic exceptions are desired, trap barrier instructions can
> > //  be explicitly inserted in the program to force traps to be delivered at
> > //  specific points.
> >
> > no need for them. But there is a "barrier" instruction that can do that anyway
> > (flushes the pipeline, "serialise" the issue, wait until all operations are completed...)
> 
> Shouldn't we drop the serialize instruction and make it a special register?

that would be an interesting idea. we have already defined "serialize" in the instruction
set, however. but using the SR would probably help remove the small overhead
caused by the instruction. It frees one opcode byte ("only").
i'll check what i can do.

> [...]
> > I am still wondering if PALcode is such a good idea.
> > We have long been used to rewriting the trap handlers for each new computer.
> > Maybe this idea came from the VMS transition constraint, but there is
> > no need of this stuff in the F-CPU.
> We can implement a `PALcall' SR if we have to.
and what would we do with it ? :-)

the PALcode is probably a good idea for the Alpha but from an F-CPU POV, it goes
against our logic.
In the DEC world, you buy one expensive ALPHA computer and every 6 months or so,
the maintenance service sends you a pile of CDs with updates, including PALcode
changes/patches. Nice, because there are several ALPHA families and even more
members (e.g. 21064, 21064A, 21066, 21068, ...) so Digital can manage to make
one PALcode revision for every member.
In the F-CPU world, the CPU itself is a "commodity" : you can buy a bunch of
(cheap or not, according to what you can access, want, have, and can pay)
chips, which can have different versions and come from a lot of different
vendors/funders. In the current PC industry, a new CPU version comes every 6 months
on average; maybe the F-CPU will shorten this even further (due to increased
competition and open sourcing). Under these conditions, it is not realistic
to have PALcode : if one vendor goes out of business and takes the PALcode
away (even though he should release the source code under GPL in the "ideal
case"), you won't be able to use the chip. The PALcode becomes like a key
and if you don't have it, you won't be able to make your computer work.
On top of that, i imagine that it's the place where companies that don't
want to play the "open" game will put "proprietary features" in order
to keep others captive.

Maybe some points are wrong but i believe that PALcode is not a good
idea here. Maybe i don't completely understand what the PALcode philosophy
is, but IMHO it does what the OS should do. PALcode "hides" stuff from the
OS and enables the vendor to include "undocumented" or "chip-specific"
features which break the architecture standards.

And now, i think i'll update my text with these comments :-)

>  Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
>  "All I wanna do is have a little fun before I die"
WHYGEE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/