
Re: (m) Re: [f-cpu] Re: Floating-Point?



On Fri, Aug 17, 2001 at 04:24:25AM +0200, Yann Guidon wrote:
[...]
> > Mapped to (2's complement signed) integer, the order is as follows
> > (assuming IEEE `single' format):
> > 
> >         s  eeeee  fff  meaning
> >         =====================
> >         0     ff   >0  NAN
> >         0     ff   =0  +INF
> >         0  01-fe  any  positive normal
> >         0     00   >0  positive subnormal
> >         0     00   =0  +0
> >         1     ff   >0  NAN
> >         1     ff   =0  -INF
> >         1  01-fe  any  negative normal
> >         1     00   >0  negative subnormal
> >         1     00   =0  -0
> > 
> > That's obviously not correct.
> Where? Negative normal/subnormal? +0/-0?
> we'll have to ask some experts...

The point is: when compared as 2's complement signed integers, the
order of FP values would be

	-0 < -x < -INF < +0 < +x < +INF

which is certainly not correct.
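Here is a small C sketch of that point. It assumes `float' is the 32-bit IEEE single format. Reinterpreting the bits as a signed integer reproduces the wrong order -0 < -x < -INF < +0. The sketch then applies the usual fix: flip the 31 low-order bits of negative values, after which signed integer comparison agrees with FP order. NaNs land at the extremes, and -0 sorts just below +0. The helper names are made up for illustration.

```c
#include <assert.h>
#include <math.h>     /* INFINITY */
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(int32_t), "assumes 32-bit IEEE single");

/* Reinterpret the bits of a float as a 2's complement integer.
   memcpy avoids the strict-aliasing problems of a pointer cast. */
static int32_t float_bits(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

/* Map a float to an integer key whose signed order matches FP order.
   Negative values get their 31 low-order bits flipped. This reverses
   their order among themselves and keeps them below all positive values:
   -NaN < -INF < -x < -0 < +0 < +x < +INF < +NaN. */
static int32_t float_order_key(float f)
{
    int32_t i = float_bits(f);
    return i < 0 ? i ^ INT32_MAX : i;
}
```

The fix costs one compare and one XOR, which is why it is popular for sorting floats with integer hardware.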

> >  I didn't find .zero or a negation suffix, though.
> it's a higher-level trick :-)
> the syntax defines the usual "move src, dest"
> and "cmove cond.y == x, src, dest" for conditional moves.

Sorry, but that's plain ugly :(

> i have chosen to explicitly use "cmove" because i feared
> a shift-reduce problem.

Not in my parser :)

The trick is not to add grammar rules for individual instructions, but
to use a general rule like

    instruction : opcode
                | opcode operand_list

    operand_list : operand
                 | operand_list ',' operand

    operand : '$' expression
            | register

or something like that, and let the semantic actions do the rest (asm
directives have individual rules, but not machine instructions).

BTW: to avoid ambiguities, either expressions or register names must be
prefixed with a special character; otherwise it's impossible to tell the
difference between the symbol `r63' and the register `r63' -- and symbols
like `r63' *must* be allowed (otherwise, compilers may have trouble).
Prefixing the expression is the lesser evil, IMHO (register names are
used more often than expressions -- and I never liked the `%eax' syntax ;).
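To make the general rule concrete, here is a minimal C sketch of what the semantic actions might see. Each line becomes an opcode plus a list of operands. Each operand is classified either as a register (`r0'..`r63') or as a `$'-prefixed expression, so `$r63' is the symbol and `r63' the register. All names are hypothetical. Expressions are assumed to contain no blanks or commas; a real parser would take them from the expression grammar instead.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum operand_kind { OPERAND_REGISTER, OPERAND_EXPRESSION, OPERAND_INVALID };

struct operand {
    enum operand_kind kind;
    int reg;              /* register number if OPERAND_REGISTER */
    const char *expr;     /* expression text after '$' if OPERAND_EXPRESSION */
};

#define MAX_OPERANDS 4

struct instruction {
    char opcode[16];
    int noperands;
    struct operand operands[MAX_OPERANDS];
};

/* Return 0..63 if s is exactly "r<n>" with n in range, else -1. */
static int parse_register(const char *s)
{
    if (s[0] != 'r' || !isdigit((unsigned char)s[1]))
        return -1;
    int n = 0;
    for (const char *p = s + 1; *p; p++) {
        if (!isdigit((unsigned char)*p))
            return -1;
        n = n * 10 + (*p - '0');
        if (n > 63)
            return -1;
    }
    return n;
}

/* '$' introduces an expression; anything else must be a register name. */
static struct operand classify_operand(const char *tok)
{
    struct operand op = { OPERAND_INVALID, -1, NULL };
    assert(tok != NULL);
    if (tok[0] == '$') {
        op.kind = OPERAND_EXPRESSION;
        op.expr = tok + 1;
    } else if ((op.reg = parse_register(tok)) >= 0) {
        op.kind = OPERAND_REGISTER;
    }
    return op;
}

/* Split "opcode op1, op2, ..." in place (line is modified). The grammar
   only knows "opcode operand_list"; per-opcode checks come later. */
static int parse_instruction(char *line, struct instruction *insn)
{
    char *tok = strtok(line, " \t");
    if (tok == NULL || strlen(tok) >= sizeof insn->opcode)
        return -1;
    strcpy(insn->opcode, tok);
    insn->noperands = 0;
    while ((tok = strtok(NULL, " \t,")) != NULL) {
        if (insn->noperands == MAX_OPERANDS)
            return -1;
        insn->operands[insn->noperands++] = classify_operand(tok);
    }
    return 0;
}
```

The operand count and kinds for each opcode would then be checked in the semantic action, or by a table lookup, not by extra grammar rules.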

> "y" can be nothing, ".lsb" and ".msb"
> and "x" can be 0 or 1.

What about NaN?  `r3.nan == 0' would be pretty confusing, and
`r3 == nan' is yet another syntactic special case.

> There is an inconsistency for
> "cmove r==1" (i'll fix that with inverting everything :*D)
> but otherwise it is written "r.lsb==0", "r==0", "r.msb==0"
> and idem for 1. the x value gives the negation directly ;-)

Why not `r != 0', `r.lsb != 0' and `r.msb != 0'?  Or use a syntax similar
to C (or VHDL):

	 !=0      =0     =0 (alt.)
	r       !r       not r
	r.nan   !r.nan   not r.nan
	r.lsb   !r.lsb   not r.lsb
	r.msb   !r.msb   not r.msb
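Here is a possible C sketch for parsing and evaluating those conditions: `r3', `!r3.lsb', `not r3.msb' and `r3.nan'. The `.nan' test assumes the register holds an IEEE double (exponent all ones, fraction non-zero). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum cond_field { COND_WHOLE, COND_LSB, COND_MSB, COND_NAN };

struct condition {
    int reg;               /* register number 0..63 */
    enum cond_field field; /* which part of the register is tested */
    bool negate;           /* true for `!r' or `not r' */
};

/* Parse "r3", "!r3.lsb", "not r3.msb", ... Returns 0 on success. */
static int parse_condition(const char *s, struct condition *c)
{
    assert(s != NULL && c != NULL);
    c->negate = false;
    if (*s == '!') {
        c->negate = true;
        s++;
    } else if (strncmp(s, "not ", 4) == 0) {
        c->negate = true;
        s += 4;
    }
    if (*s != 'r')
        return -1;
    char *end;
    long n = strtol(s + 1, &end, 10);
    if (end == s + 1 || n < 0 || n > 63)
        return -1;
    c->reg = (int)n;
    if (*end == '\0')
        c->field = COND_WHOLE;
    else if (strcmp(end, ".lsb") == 0)
        c->field = COND_LSB;
    else if (strcmp(end, ".msb") == 0)
        c->field = COND_MSB;
    else if (strcmp(end, ".nan") == 0)
        c->field = COND_NAN;
    else
        return -1;
    return 0;
}

/* Evaluate the condition against the register's 64-bit contents. */
static bool eval_condition(const struct condition *c, uint64_t value)
{
    bool t;
    switch (c->field) {
    case COND_LSB: t = value & 1; break;
    case COND_MSB: t = value >> 63; break;
    case COND_NAN:
        t = ((value >> 52) & 0x7FF) == 0x7FF
            && (value & ((UINT64_C(1) << 52) - 1)) != 0;
        break;
    default:       t = value != 0; break;
    }
    return t != c->negate;
}
```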

> and the absence of specifier means the whole register, hence
> zero or not.
> i know, it's lame, but it works :-) and it's less confusing than
> -nm for example.

If you really intend to do that, please do it a little differently: use
`if <condition> <instruction>', where <instruction> can be move or jmp.
It's much cleaner that way (from both the stylistic and the syntactic
point of view).  If your asm directives don't start with "." (that is,
`.if' instead of `if'), you could use `cond[itional]' as the keyword.
Or use ia64-style syntax:

	(r3.lsb) move r2, r1
	(!r3) jmp r2, r1

Anyway, this is still a special case that has to be handled in the
parser :(

[...]
> >  I could
> > also use `n' for `NAN', but that makes the instruction encoder a little
> > more complex and is probably misleading anyway.  But I'll think it over.
> > [...]
> 
> 'i' for 'i'nvalid ?

Also possible.  Thank you :)

> > > >         - Can we please drop the `a' from `jmpa'?
> > > probably. i don't remember where it comes from, probably from Mathias.
> > It was meant to indicate an (A)bsolute jump.  Since that's the only
> > one the F-CPU knows, the suffix is redundant (and it looks too much
> > like `jump always').
> ok let's go.

Done :)

> > > >         - When calling functions through pointers, it would be nice to
> > > >           be able to tell the F-CPU *a priori* that a register contains a
> > > >           code address.  While this can be done with an explicit prefetch
> > > >           (load to r0) for data pointers, there is no way to specify that
> > > >           a register contains a code address that the CPU will have to
> > > >           visit soon.
> > > what about loadaddr(i) ?
> > Not useful.
> that's sad.
> 
> Maybe... we can use one or two bits from the add/sub instructions
> so they validate and prefetch the resulting pointer ?

Umh... not really.  There is not always an add/sub instruction in the
chain (especially not when a function's address is read from memory).

> >  Imagine a C++ `member' function -- the first (hidden)
> > argument is a pointer to the class, the class contains a pointer to
> > the virtual method table, and the VMT contains pointers to all the
> > members.
> i can understand English, French and German,
> i know a few words in several European languages such as
> Italian, Spanish, Portuguese, Czech, Russian, Polish, Latvian...
> but i still believe that C++ comes from Mars.

Or worse ;)  But C++ is not the only language that does things like
that.

> >  To call another member, you have to
> > 
> >         // let r1 point to the current instance
> >         load r1, r2             // get pointer to VMT (usually stored at offset 0)
> >         add $offset, r2, r3     // VMT slot address
> >         load r3, r4             // get member's address
> >         // argument passing omitted
> >         jmp r4, r5              // call member
> 
> what ugly code :-(
> is all this necessary ???

I think so.  How would *you* do it?

> There is one "programming trick" in FC0 :
> you use a "software barrel of 8 registers" for data and another
> "barrel" for instructions. when you want to access data,
> use the post-incremented form whenever possible, with an increment
> such that it points to the data that you will access in 8 L/S
> instructions. the code above is extremely short-sighted and
> utterly inefficient.

Since it is lacking any context, there is no other way, IMHO.  Of course
you can keep the VMT pointer in a register if you use it more than once.
If you call a function from inside a loop, you can also keep its address
in a register (and you probably won't have the prefetch problem when
the loop is executed the second time).  But that's not always the case.
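The four-instruction sequence quoted above is roughly what a compiler emits for a C++ virtual call. Here is a hand-written C equivalent. It assumes the VMT pointer sits at offset 0 of the object, which is typical but not mandated by the language. The struct layout and names are illustrative only. Note that nothing in the chain marks `fn' as a code address until the call itself.

```c
#include <assert.h>
#include <stddef.h>

struct object;
typedef int (*method_fn)(struct object *self, int arg);

struct vmt {
    method_fn slots[2];           /* one code pointer per member */
};

struct object {
    const struct vmt *vmt;        /* offset 0, as in the asm sequence */
    int value;
};

/* Mirrors: load r1,r2 / add $offset,r2,r3 / load r3,r4 / jmp r4,r5 */
static int call_member(struct object *self, size_t slot, int arg)
{
    const struct vmt *v = self->vmt;          /* load r1, r2 */
    assert(slot < sizeof v->slots / sizeof v->slots[0]);
    const method_fn *p = &v->slots[slot];     /* add $offset, r2, r3 */
    method_fn fn = *p;                        /* load r3, r4 */
    return fn(self, arg);                     /* jmp r4, r5: first use of fn as code */
}

static int get_value(struct object *self, int arg) { (void)arg; return self->value; }
static int add_value(struct object *self, int arg) { return self->value + arg; }

static const struct vmt example_vmt = { { get_value, add_value } };
```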

> > Both r2 (data pointer) and r4 (code pointer) are loaded from memory,
> > and r3 (also a data pointer) is calculated from r2 and a constant
> > (which probably has to be loadcons'd if it is too large).  But there
> > is no loadaddr[i] in that sequence, and the CPU has no way to tell
> > that r4 points to a function that is going to be called real soon
> > (that is, its code should be prefetched to avoid a stall).
> maybe we need a sort of loadaddr(i) without PC ?
> i have the feeling that i'm missing something here...

It need not be a load (or move) operation, just a no-op that hints the
CPU to prefetch code.  A little like `cachemm', but more lightweight
and `for-the-moment' (cachemm is supposed to change the caching
parameters permanently, isn't it?).

> > > >           The same is true when an absolute code address is
> > > >           obtained via loadcons (which will probably be the common idiom
> > > >           when a function in another object file is called, unless jump
> > > >           tables are used -- which points us back to the `code pointer
> > > >           in register' problem, again).
> > > if the data/code is not explicitly prefetched, the code will still work,
> > > but with the "late fetch" penalty : the CPU will perform the "fetch"
> > > operation automatically while stalling the decode stage.
> > The point is that one cannot prefetch code.  `load r4, r0' will prefetch
> > the code into the D-cache, not the I-cache.
> something seems broken here.

The question is what: the argument, or the design? ;)

> > > >           To cut a long story short: I'd like to have an instruction
> > > >           that explicitly `tags' a register as a pointer, and probably
> > > >           initiates a prefetch cycle (for code or data, depending on
> > > >           the instruction's flags).  It may or may not move data from
> > > >           one register to another (one idea I had was a `pointer move'
> > > >           instruction); if it does, it might be a good idea to let it
> > > >           participate in address calculation (i.e. let it be able to
> > > >           add two operands, like the `lea' instruction on Intel CPUs).
> > > this is what loadaddr is meant to do.
> > 
> > But it only works with PC-relative addressing.  While that's fine for
> > conditional branches, loops and local function calls, inter-module calls
> > cannot use it because the target address is resolved at link time (and
> > what's worse: it may be too far away for a 16-bit displacement unless
> > you limit the text segment size to 64 KB -- which is not a realistic
> > value at all).
> 
> what's your conclusion, doctor ?

Add an instruction that says `i will soon jump to <reg>'.  Nothing
more, nothing less.
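To put a number on the displacement problem, here is a minimal C check. It assumes the 16-bit displacement is signed and counts bytes relative to the current PC; if it counted 32-bit instruction words instead, the reach would be four times larger. Either way, a target that is resolved only at link time cannot be guaranteed to fall in range. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can `target' be reached from `pc' with a signed 16-bit byte displacement? */
static bool fits_disp16(uint64_t pc, uint64_t target)
{
    int64_t delta = (int64_t)(target - pc);
    return delta >= INT16_MIN && delta <= INT16_MAX;
}

/* Return the displacement field; only valid if fits_disp16() holds. */
static int16_t displacement16(uint64_t pc, uint64_t target)
{
    assert(fits_disp16(pc, target));
    return (int16_t)(int64_t)(target - pc);
}
```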

> > > >         - Let's clarify the suffix order, e.g. like this (? means the
> > > >           suffix is currently unused, and its name is unassigned):
> > [...]
> > > wow, what a lot of work :-)
> > 
> > You should see the complete flex source for the encoder; this is only
> > a small snippet ;)
> 
> i WANT to see it ;-)

Don't worry, you will :)

[...]
> > > >           Since there are some unused flags, another variant might be
> > > >           interesting: `storem r2, r1', where r2 is used as a mask
> > > >           (bit <n> == 1 means "load/store register <n>"), and r1 is the
> > > >           address of the source/destination memory area (which must be
> > > >           big enough to hold all registers, just like the CMB).
> > >
> > > this mask idea is interesting. It reminds me of the 6809, by the way :-)
> > > however it means that 4x loadcons might be necessary (in arbitrary cases)
> > > to backup the whole (non-contiguous) register set.
> > 
> > You can still use loadm/storem if you have only two or three contiguous
> > register blocks to save/restore.  The mask is useful when a) the registers
> > to save are too scattered or b) not known at compile time (emulators,
> > debuggers, ...), and you don't want to loop over the whole register bank
> > (that is, 63 times) and loadm/storem a single register each time.
> 
> in such a 'scattered' case, why not use a loop with a get/put
> to the register set ?

Loops are evil ;)

Of course, you can also use regular load/store with auto-increment to
save/restore registers (1 instruction per copy, plus setup).

> > > >           Maybe it would be wiser to put the memory address into the
> > > >           rightmost operand in *all* memory operations (load, store,
> > > >           cachemm, loadm and storem).  Some instructions will always
> > > >           have the wrong operand order, though.
> > > right. but i still prefer to leave the "pointer" field in the middle,
> > > because it is the most usual case where it makes sense (at least for myself).
> > Ok, then let it stay that way.  After all, it's a matter of taste, and
> > it's MUCH easier to create machine code if the order of arguments
> > in an assembler instruction corresponds to the order of slots in the
> > instruction word.
> right. But using BISON, one can easily swap operand orders.
> replace "$2" with "$4" and vice versa.

I'm doing that differently.  There are too many instruction variants to
handle each of them in its own grammar rule, and I wanted the parser to
be more general anyway (that is, portable to other architectures).
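One way to keep operand order out of the grammar is to make it data. Each instruction format carries a table that maps assembler operand <i> to an instruction-word slot. The C sketch below uses a made-up 32-bit layout, which is NOT the real F-CPU encoding, purely to show the idea.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 31..24 opcode, 23..18 slot A,
   17..12 slot B, 11..6 slot C, 5..0 flags. */
enum { SLOT_A, SLOT_B, SLOT_C, NSLOTS };
static const unsigned slot_shift[NSLOTS] = { 18, 12, 6 };

struct insn_format {
    uint8_t opcode;
    int noperands;
    int slot_of[NSLOTS];   /* asm operand i goes into slot slot_of[i] */
};

static uint32_t encode(const struct insn_format *fmt, const int regs[], unsigned flags)
{
    uint32_t word = (uint32_t)fmt->opcode << 24 | (flags & 0x3F);
    for (int i = 0; i < fmt->noperands; i++) {
        assert(regs[i] >= 0 && regs[i] < 64);
        word |= (uint32_t)regs[i] << slot_shift[fmt->slot_of[i]];
    }
    return word;
}
```

Swapping two operands then means editing a table entry rather than a grammar rule or a `$2'/`$4' pair.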

> > A simulator/debugger can really benefit from an elaborated binary format
> > like ELF.  It will have access to symbol names, line numbers and symbolic
> > debugging information (if the compiler/assembler supports it)...
> > 
> > Hexdumps? Nein danke :)
> 
> "elaborated" sometimes becomes "utterly complex"...

ELF is pretty clean once you're used to it (except the part that is
responsible for dynamic linking -- but we don't need that from the
beginning).  And it's the binary format of choice for Linux.
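As an example of how little code the basics take, here is a minimal C check a simulator could run before loading a file. It uses the `Elf64_Ehdr' definitions from the Linux/glibc <elf.h>. It assumes the file has the host's byte order; a real loader would also check e_ident[EI_DATA] and e_machine.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <string.h>

/* Does buf start with a 64-bit ELF executable header? */
static bool is_elf64_exec(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;
    assert(buf != NULL || len == 0);
    if (len < sizeof eh)
        return false;
    memcpy(&eh, buf, sizeof eh);          /* avoid unaligned access */
    return memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0
        && eh.e_ident[EI_CLASS] == ELFCLASS64
        && eh.e_type == ET_EXEC;
}
```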

> > [...]
> > > expansion/reduction is another problem but i think that the SHL unit can do this,
> > > too.
> > 
> > FP expansion is trivial, but FP reduction may trigger exceptions (or at
> > least need rounding), and therefore has to be handled separately.
> couldn't reduction be handled in an FP unit such as fadd?

There will probably be a `normalize-and-round' subunit that is shared
by the FP units (or duplicated, for speed).

[...]
> > If there is enough room in the SHL unit, we can add a little logic that
> > does it in one operation.  I suggest we define the `widen' instruction
> > as follows:
> 
> the 6809 defined a funny opcode : "sex" for "sign extension" :*)
> now i understand much more about the meaning of Life :-D

Try `man sex' (if you have emacs installed ;)

> >         [s]widenb[s][.b|.d|.q] r2, r1   //  8->xx
> >         [s]widenw[s][.b|.d|.q] r2, r1   // 16->xx
> >         [s]widenq[s][.b|.d|.q] r2, r1   // 32->xx
> >         [s]widen[s][.b|.d|.q]  r2, r1   // 64->xx
> > 
> > that is, [.b|.d|.q] refers to the new size, `s-' means SIMD (as usual),
> > and `-s' activates sign extension.  We need only a single opcode (the
> > source size can be encoded in the flag bits -- since the instruction
> > uses only two registers and no immediate operand, we have plenty of them).
> exactly.

Minor correction: the syntax should be [s]widen[s][b|d|q][.b|.d|.q],
that is, -s suffix before -b/-d/-q (source size).  I already added that
to my encoder :)
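Here is a scalar C model of the corrected syntax. It assumes b/d/q/none mean 8/16/32/64 bits for both the source size and the new size, and it ignores the SIMD (`s-') form. The source field is masked, optionally sign-extended (the `-s' suffix), then truncated to the new size. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of [s]widen[s][b|d|q][.b|.d|.q] for one 64-bit register. */
static uint64_t widen(uint64_t src, unsigned src_bits, unsigned dst_bits,
                      bool sign_extend)
{
    assert(src_bits == 8 || src_bits == 16 || src_bits == 32 || src_bits == 64);
    assert(dst_bits >= src_bits && dst_bits <= 64);

    uint64_t v = src;
    if (src_bits < 64) {
        uint64_t low_mask = (UINT64_C(1) << src_bits) - 1;
        v &= low_mask;                          /* keep the source field */
        if (sign_extend && (v >> (src_bits - 1)) & 1)
            v |= ~low_mask;                     /* replicate the sign bit */
    }
    if (dst_bits < 64)
        v &= (UINT64_C(1) << dst_bits) - 1;     /* truncate to the new size */
    return v;
}
```

In this model, the 6809's `sex' (sign-extend B into D, 8 to 16 bits) corresponds to widen(x, 8, 16, true).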

CU on the bazaar,
-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/