Re: (m) Re: [f-cpu] Re: Floating-Point?
On Fri, Aug 17, 2001 at 04:24:25AM +0200, Yann Guidon wrote:
[...]
> > Mapped to (2's complement signed) integer, the order is as follows
> > (assuming IEEE `single' format):
> >
> > s eeeee fff meaning
> > =====================
> > 0 ff >0 NAN
> > 0 ff =0 +INF
> > 0 01-fe any positive normal
> > 0 00 >0 positive subnormal
> > 0 00 =0 +0
> > 1 ff >0 NAN
> > 1 ff =0 -INF
> > 1 01-fe any negative normal
> > 1 00 >0 negative subnormal
> > 1 00 =0 -0
> >
> > That's obviously not correct.
> Where? Negative normal/subnormal? +0/-0?
> we'll have to ask some experts...
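The point is: when compared as 2's complement signed integers, the
order of FP values would be
-0 < -x < -INF < +0 < +x < +INF
(with the negative values ordered by *increasing* magnitude, and the NaNs
sorting just above -INF and +INF, respectively), which is certainly not
correct.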
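To make that concrete, a small C sketch (not part of any F-CPU tool; the function names are made up). It reads a float's bit pattern as a signed integer, which reproduces the broken order above, and shows the usual fix: for negative patterns, flip the 31 exponent/fraction bits so that larger magnitudes sort lower. For non-NaN values the resulting key compares like the floats themselves, except that -0 sorts just below +0.

```c
#include <stdint.h>
#include <string.h>

/* Raw IEEE single bit pattern, read as a 2's complement signed integer. */
static int32_t float_bits(float f) {
    int32_t i;
    memcpy(&i, &f, sizeof i);   /* well-defined way to reinterpret the bits */
    return i;
}

/* Monotonic key: for negative patterns (sign bit set), flip the 31
   exponent/fraction bits so larger magnitudes give smaller keys.
   For non-NaN inputs, a < b implies key(a) < key(b); -0 maps just
   below +0. */
static int32_t float_order_key(float f) {
    int32_t i = float_bits(f);
    return i < 0 ? (i ^ INT32_MAX) : i;
}
```

Compared naively, float_bits(-0.0f) < float_bits(-1.0f) < float_bits(-2.0f), which is exactly the inverted negative half of the table; the key restores -2 < -1 < -0 < +0.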
> > I didn't find .zero or a negation suffix, though.
> it's a higher-level trick :-)
> the syntax defines the usual "move src, dest"
> and "cmove cond.y == x, src, dest" for conditional moves.
Sorry, but that's plain ugly :(
> i have chosen to explicitly use "cmove" because i feared
> a shift-reduce problem.
Not in my parser :)
The trick is not to add grammar rules for individual instructions, but
to use a general rule like
instruction : opcode
| opcode operand_list
operand_list : operand
| operand_list ',' operand
operand : '$' expression
| register
or something like that, and let the semantic actions do the rest (asm
directives have individual rules, but not machine instructions).
BTW: to avoid ambiguities, either expressions or register names must be
prefixed with a special character; otherwise it's impossible to tell the
difference between the symbol `r63' and the register `r63' -- and symbols
like `r63' *must* be allowed (otherwise, compilers may have trouble).
Prefixing the expression is the lesser evil, IMHO (register names are
used more often than expressions -- and I never liked the `%eax' syntax ;).
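The `one generic rule plus semantic actions' idea, sketched in C rather than yacc. This is only an illustration with hypothetical helper names, not code from my encoder. It assumes the `$'-prefixed expression convention suggested above, so `r63' is always a register and a symbol of that name has to be written `$r63'.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum operand_kind { OPND_REGISTER, OPND_EXPRESSION, OPND_INVALID };

/* One generic rule for every instruction: "opcode [operand {, operand}]".
   Splits `line` in place and returns the number of operands found.
   Checking count and kinds is left to the per-opcode semantic action. */
static int split_instruction(char *line, char **opcode, char *ops[], int max_ops) {
    char *p = line;
    int n = 0;

    while (isspace((unsigned char)*p)) p++;
    *opcode = p;
    while (*p && !isspace((unsigned char)*p)) p++;
    if (*p) *p++ = '\0';

    while (*p && n < max_ops) {
        while (isspace((unsigned char)*p)) p++;
        if (!*p) break;
        char *comma = strchr(p, ',');
        char *end = comma ? comma : p + strlen(p);
        char *next = comma ? comma + 1 : end;
        ops[n++] = p;
        while (end > p && isspace((unsigned char)end[-1])) end--;  /* trim trailing blanks */
        *end = '\0';
        p = next;
    }
    return n;
}

/* Assumed convention: '$' prefixes an expression; a bare "r0".."r63"
   is always a register. */
static enum operand_kind classify_operand(const char *s, int *regno) {
    if (s[0] == '$')
        return s[1] ? OPND_EXPRESSION : OPND_INVALID;
    if (s[0] == 'r' && isdigit((unsigned char)s[1])) {
        char *end;
        long n = strtol(s + 1, &end, 10);
        if (*end == '\0' && n <= 63) {
            *regno = (int)n;
            return OPND_REGISTER;
        }
    }
    return OPND_INVALID;
}
```

For example, `add $offset, r2, r3' splits into `add' plus three operands, and the semantic action for `add' would then check that it got one expression and two registers.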
> "y" can be nothing, ".lsb" and ".msb"
> and "x" can be 0 or 1.
What about NaN? `r3.nan == 0' would be pretty confusing, and
`r3 == nan' is yet another syntactic special case.
> There is an inconsistency for
> "cmove r==1" (i'll fix that with inverting everything :*D)
> but otherwise it is written "r.lsb==0", "r==0", "r.msb==0"
> and idem for 1. the x value gives the negation directly ;-)
Why not `r != 0', `r.lsb != 0' and `r.msb != 0'? Or use a syntax similar
to C (or VHDL):
!=0 =0 =0 (alt.)
r !r not r
r.nan !r.nan not r.nan
r.lsb !r.lsb not r.lsb
r.msb !r.msb not r.msb
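The table is really just `field x negation'. A tiny C evaluator makes that explicit; it assumes 64-bit registers and that `.nan' treats the register as an IEEE double (both assumptions, and the names are only illustrative).

```c
#include <stdbool.h>
#include <stdint.h>

enum cond_field { COND_WHOLE, COND_NAN, COND_LSB, COND_MSB };

/* True if the 64-bit pattern is an IEEE double NaN
   (exponent all ones, fraction non-zero). */
static bool is_nan64(uint64_t r) {
    return ((r >> 52) & 0x7FF) == 0x7FF && (r & 0x000FFFFFFFFFFFFFull) != 0;
}

/* The un-negated column of the table: r, r.nan, r.lsb, r.msb. */
static bool field_set(uint64_t r, enum cond_field f) {
    switch (f) {
    case COND_WHOLE: return r != 0;
    case COND_NAN:   return is_nan64(r);
    case COND_LSB:   return (r & 1) != 0;
    case COND_MSB:   return (r >> 63) != 0;
    }
    return false;
}

/* `r.f' tests the field; `!r.f' (or `not r.f') inverts the result. */
static bool cond_holds(uint64_t r, enum cond_field f, bool negate) {
    return field_set(r, f) != negate;
}
```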
> and the absence of specifier means the whole register, hence
> zero or not.
> i know, it's lame, but it works :-) and it's less confusing than
> -nm for example.
If you really intend to do that, please do it a little differently: use
`if <condition> <instruction>', where <instruction> can be move or jmp.
It's much cleaner that way (from both the stylistic and the syntactic
point of view). If your asm directives don't start with "." (that is,
`.if' instead of `if'), you could use `cond[itional]' as the keyword.
Or use ia64-style syntax:
(r3.lsb) move r2, r1
(!r3) jmp r2, r1
Anyway, this is still a special case that has to be handled in the
parser :(
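For what it's worth, the special case is small. Here is a hedged C sketch of peeling off an ia64-style guard before handing the rest of the line to the generic instruction rule; the guard syntax (`(!r3.lsb)', fields lsb/msb/nan) is only the proposal above, not settled F-CPU syntax, and the names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* A parsed guard such as "(!r3.lsb)"; field is "", "lsb", "msb" or "nan". */
struct guard {
    int reg;
    bool negate;
    char field[4];
};

/* Returns a pointer to the guarded instruction text, or NULL if `line`
   does not start with a well-formed guard. */
static const char *parse_guard(const char *line, struct guard *g) {
    const char *s = line;
    if (*s++ != '(')
        return NULL;
    g->negate = (*s == '!');
    if (g->negate)
        s++;
    if (*s++ != 'r' || !isdigit((unsigned char)*s))
        return NULL;
    char *end;
    long n = strtol(s, &end, 10);
    if (n > 63)
        return NULL;
    g->reg = (int)n;
    s = end;
    g->field[0] = '\0';
    if (*s == '.') {
        s++;
        size_t len = strcspn(s, ")");
        if (len != 3 || (strncmp(s, "lsb", 3) && strncmp(s, "msb", 3) && strncmp(s, "nan", 3)))
            return NULL;
        memcpy(g->field, s, 3);
        g->field[3] = '\0';
        s += 3;
    }
    if (*s++ != ')')
        return NULL;
    while (isspace((unsigned char)*s))
        s++;
    return s;   /* e.g. "move r2, r1", ready for the generic rule */
}
```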
[...]
> > I could
> > also use `n' for `NAN', but that makes the instruction encoder a little
> > more complex and is probably misleading anyway. But I'll think it over.
> > [...]
>
> 'i' for 'i'nvalid?
Also possible. Thank you :)
> > > > - Can we please drop the `a' from `jmpa'?
> > > probably. i don't remember where it comes from, probably from Mathias.
> > It was meant to indicate an (A)bsolute jump. Since that's the only
> > one the F-CPU knows, the suffix is redundant (and it looks too much
> > like `jump always').
> ok let's go.
Done :)
> > > > - When calling functions through pointers, it would be nice to
> > > > be able to tell the F-CPU *a priori* that a register contains a
> > > > code address. While this can be done with an explicit prefetch
> > > > (load to r0) for data pointers, there is no way to specify that
> > > > a register contains a code address that the CPU will have to
> > > > visit soon.
> > > what about loadaddr(i) ?
> > Not useful.
> that's sad.
>
> Maybe... we can use one or two bits from the add/sub instructions
> so they validate and prefetch the resulting pointer?
Umh... not really. There is not always an add/sub instruction in the
chain (especially not when a function's address is read from memory).
> > Imagine a C++ `member' function -- the first (hidden)
> > argument is a pointer to the class, the class contains a pointer to
> > the virtual method table, and the VMT contains pointers to all the
> > members.
> i can understand English, French and German,
> i know a few words in several European languages such as
> Italian, Spanish, Portuguese, Czech, Russian, Polish, Latvian...
> but i still believe that C++ comes from Mars.
Or worse ;) But C++ is not the only language that does things like
that.
> > To call another member, you have to
> >
> > // let r1 point to the current instance
> > load r1, r2 // get pointer to VMT (usually stored at offset 0)
> > add $offset, r2, r3 // VMT slot address
> > load r3, r4 // get member's address
> > // argument passing omitted
> > jmp r4, r5 // call member
>
> what ugly code :-(
> is all this necessary ???
I think so. How would *you* do it?
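For reference, this is roughly what the sequence corresponds to in C. The struct layout is illustrative: the VMT pointer sits at offset 0, as is typical, but the actual layout is up to the compiler. The four steps are marked; note the two dependent loads, the second of which yields a code address the CPU cannot know about in advance.

```c
struct shape;

/* Virtual method table: one function pointer per member (illustrative). */
struct shape_vmt {
    double (*area)(const struct shape *self);
    double (*perimeter)(const struct shape *self);
};

/* Every instance starts with its VMT pointer, i.e. it lives at offset 0. */
struct shape {
    const struct shape_vmt *vmt;
    double w, h;
};

static double rect_area(const struct shape *s) { return s->w * s->h; }
static double rect_perimeter(const struct shape *s) { return 2 * (s->w + s->h); }

static const struct shape_vmt rect_vmt = { rect_area, rect_perimeter };

/* Calling another member: load VMT pointer, form slot address,
   load code pointer, jump to it. */
static double call_perimeter(const struct shape *self) {
    const struct shape_vmt *vmt = self->vmt;                         /* load r1, r2 */
    double (*const *slot)(const struct shape *) = &vmt->perimeter;   /* add $offset, r2, r3 */
    double (*fn)(const struct shape *) = *slot;                      /* load r3, r4 */
    return fn(self);                                                 /* jmp r4, r5 */
}
```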
> There is one "programming trick" in FC0:
> you use a "software barrel of 8 registers" for data and another
> "barrel" for instructions. When you want to access data,
> use the post-incremented form whenever possible, with an increment
> such that it points to the data that you will access in 8 L/S
> instructions. The code above is extremely short-sighted and
> utterly inefficient.
Since it is lacking any context, there is no other way, IMHO. Of course
you can keep the VMT pointer in a register if you use it more than once.
If you call a function from inside a loop, you can also keep its address
in a register (and you probably won't have the prefetch problem when
the loop is executed the second time). But that's not always the case.
> > Both r2 (data pointer) and r4 (code pointer) are loaded from memory,
> > and r3 (also a data pointer) is calculated from r2 and a constant
> > (which probably has to be loadcons'd if it is too large). But there
> > is no loadaddr[i] in that sequence, and the CPU has no way to tell
> > that r4 points to a function that is going to be called real soon
> > (that is, its code should be prefetched to avoid a stall).
> maybe we need a sort of loadaddr(i) without PC?
> i have the feeling that i miss something here...
It need not be a load (or move) operation, just a no-op that hints the
CPU to prefetch code. A little like `cachemm', but more lightweight
and `for-the-moment' (cachemm is supposed to change the caching
parameters permanently, isn't it?).
> > > > The same is true when an absolute code address is
> > > > obtained via loadcons (which will probably be the common idiom
> > > > when a function in another object file is called, unless jump
> > > > tables are used -- which points us back to the `code pointer
> > > > in register' problem, again).
> > > if the data/code is not explicitly prefetched, the code will still work,
> > > but with the "late fetch" penalty : the CPU will perform the "fetch"
> > > operation automatically while stalling the decode stage.
> > The point is that one cannot prefetch code. `load r4, r0' will prefetch
> > the code into the D-cache, not the I-cache.
> something seems broken here.
The question is what: the argument, or the design? ;)
> > > > To cut a long story short: I'd like to have an instruction
> > > > that explicitly `tags' a register as a pointer, and probably
> > > > initiates a prefetch cycle (for code or data, depending on
> > > > the instruction's flags). It may or may not move data from
> > > > one register to another (one idea I had was a `pointer move'
> > > > instruction); if it does, it might be a good idea to let it
> > > > participate in address calculation (i.e. let it be able to
> > > > add two operands, like the `lea' instruction on Intel CPUs).
> > > this is what loadaddr is meant to do.
> >
> > But it only works with PC-relative addressing. While that's fine for
> > conditional branches, loops and local function calls, inter-module calls
> > cannot use it because the target address is resolved at link time (and
> > what's worse: it may be too far away for a 16-bit displacement unless
> > you limit the text segment size to 64 KB -- which is not a realistic
> > value at all).
>
> what's your conclusion, doctor?
Add an instruction that says `i will soon jump to <reg>'. Nothing
more, nothing less.
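On the `too far away' point above: this is the kind of range check an assembler or linker has to make before it can use a PC-relative form; when it fails, it has to fall back to loadcons plus a register jump. A sketch in C with a hypothetical function name; whether the F-CPU displacement is signed, and whether it counts bytes or 32-bit instruction words, are assumptions here, hence the parameters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Can a PC-relative reference from `pc` reach `target` with a signed
   `bits`-wide displacement counted in units of (1 << scale_shift) bytes? */
static bool pcrel_fits(int64_t pc, int64_t target, unsigned bits, unsigned scale_shift) {
    int64_t unit = (int64_t)1 << scale_shift;
    int64_t disp = target - pc;
    if (disp % unit != 0)
        return false;                          /* target not aligned to the unit */
    int64_t scaled = disp / unit;              /* exact, also for negative disp */
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    return scaled >= lo && scaled <= hi;
}
```

With a signed 16-bit byte displacement, a target 32 KB away is already out of reach, which is why inter-module calls resolved at link time cannot count on it.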
> > > > - Let's clarify the suffix order, e.g. like this (? means the
> > > > suffix is currently unused, and its name is unassigned):
> > [...]
> > > wow, what a work :-)
> >
> > You should see the complete flex source for the encoder; this is only
> > a small snippet ;)
>
> i WANT to see it ;-)
Don't worry, you will :)
[...]
> > > > Since there are some unused flags, another variant might be
> > > > interesting: `storem r2, r1', where r2 is used as a mask
> > > > (bit <n> == 1 means "load/store register <n>"), and r1 is the
> > > > address of the source/destination memory area (which must be
> > > > big enough to hold all registers, just like the CMB).
> > >
> > > this mask idea is interesting. It reminds me of the 6809, by the way :-)
> > > however it means that 4x loadcons might be necessary (in arbitrary cases)
> > > to backup the whole (non-contiguous) register set.
> >
> > You can still use loadm/storem if you have only two or three contiguous
> > register blocks to save/restore. The mask is useful when a) the registers
> > to save are too scattered or b) not known at compile time (emulators,
> > debuggers, ...), and you don't want to loop over the whole register bank
> > (that is, 63 times) and loadm/storem a single register each time.
>
> in such a 'scattered' case, why not use a loop with a get/put
> to the register set ?
Loops are evil ;)
Of course, you can also use regular load/store with auto-increment to
save/restore registers (1 instruction per copy, plus setup).
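The intended semantics of the mask variant, written out as a reference model in C (hypothetical names; real hardware would not iterate like this, the loop only defines the effect). It assumes the CMB-like layout described above, with one 64-bit slot per register in the memory area, so register n always maps to slot n.

```c
#include <stdint.h>

/* storem r2, r1: bit n of `mask` (r2) set means "store register n"
   into slot n of `area` (the memory block addressed by r1). */
static void storem_mask(const uint64_t regs[64], uint64_t mask, uint64_t area[64]) {
    for (unsigned n = 0; mask != 0; n++, mask >>= 1)
        if (mask & 1)
            area[n] = regs[n];
}

/* loadm counterpart: restore only the registers selected by `mask`. */
static void loadm_mask(uint64_t regs[64], uint64_t mask, const uint64_t area[64]) {
    for (unsigned n = 0; mask != 0; n++, mask >>= 1)
        if (mask & 1)
            regs[n] = area[n];
}
```

The loop stops after the highest set bit, so a sparse mask like r3/r5/r63 touches only those three slots.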
> > > > Maybe it would be wiser to put the memory address into the
> > > > rightmost operand in *all* memory operations (load, store,
> > > > cachemm, loadm and storem). Some instructions will always
> > > > have the wrong operand order, though.
> > > right. but i still prefer to leave the "pointer" field in the middle,
> > > because it is the most usual case where it makes sense (at least for myself).
> > Ok, then let it stay that way. After all, it's a matter of taste, and
> > it's MUCH easier to create machine code if the order of arguments
> > in an assembler instruction corresponds to the order of slots in the
> > instruction word.
> right. But using BISON, one can easily swap operand orders.
> replace "$2" with "$4" and vice versa.
I'm doing that differently. There are too many instruction variants to
handle each of them in its own grammar rule, and I wanted the parser to
be more general anyway (that is, portable to other architectures).
> > A simulator/debugger can really benefit from an elaborated binary format
> > like ELF. It will have access to symbol names, line numbers and symbolic
> > debugging information (if the compiler/assembler supports it)...
> >
> > Hexdumps? Nein danke :)
>
> "elaborated" sometimes becomes "utterly complex"...
ELF is pretty clean once you're used to it (except the part that is
responsible for dynamic linking -- but we don't need that from the
beginning). And it's the binary format of choice for Linux.
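As an example of how little is needed to get started: a simulator can identify an ELF file, its word size and its byte order from the first 16 bytes (the e_ident array). A minimal C sketch using the standard ELF constant values, with made-up function and struct names.

```c
#include <stddef.h>
#include <string.h>

enum { EI_CLASS_IDX = 4, EI_DATA_IDX = 5, EI_NIDENT_LEN = 16 };

struct elf_ident { int bits; int big_endian; };

/* Returns 0 and fills *id if buf starts with a valid ELF identification. */
static int elf_identify(const unsigned char *buf, size_t len, struct elf_ident *id) {
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };
    if (len < EI_NIDENT_LEN || memcmp(buf, magic, 4) != 0)
        return -1;
    switch (buf[EI_CLASS_IDX]) {
    case 1: id->bits = 32; break;        /* ELFCLASS32 */
    case 2: id->bits = 64; break;        /* ELFCLASS64 */
    default: return -1;
    }
    switch (buf[EI_DATA_IDX]) {
    case 1: id->big_endian = 0; break;   /* ELFDATA2LSB */
    case 2: id->big_endian = 1; break;   /* ELFDATA2MSB */
    default: return -1;
    }
    return 0;
}
```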
> > [...]
> > > expansion/reduction is another problem but i think that the SHL unit can do this,
> > > too.
> >
> > FP expansion is trivial, but FP reduction may trigger exceptions (or at
> > least need rounding), and therefore has to be handled separately.
> couldn't reduction be handled in a FP unit such as fadd ?
There will probably be a `normalize-and-round' subunit that is shared
by the FP units (or duplicated, for speed).
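To illustrate the asymmetry: widening float to double is always exact, but reducing double to float can be exact, rounded, or out of range. A small C sketch (illustrative names) that classifies a reduction without touching the floating-point environment:

```c
#include <float.h>

enum narrow_result { NARROW_EXACT, NARROW_ROUNDED, NARROW_OUT_OF_RANGE };

/* What happens when a double is reduced to single precision?
   (The opposite direction, float -> double, is always exact.) */
static enum narrow_result classify_narrowing(double d) {
    if (d != d)
        return NARROW_EXACT;          /* NaN stays NaN (payload bits may be lost) */
    if (d > DBL_MAX || d < -DBL_MAX)
        return NARROW_EXACT;          /* +/-INF maps to +/-INF */
    if (d > FLT_MAX || d < -FLT_MAX)
        return NARROW_OUT_OF_RANGE;   /* rounds to +/-FLT_MAX or overflows to +/-INF,
                                         depending on magnitude and rounding mode */
    float f = (float)d;               /* in range, so the conversion is well-defined C */
    return (double)f == d ? NARROW_EXACT : NARROW_ROUNDED;
}
```

Tiny values such as 1e-50 also come out as NARROW_ROUNDED (they underflow to zero), which is one more case the normalize-and-round logic has to cover.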
[...]
> > If there is enough room in the SHL unit, we can add a little logic that
> > does it in one operation. I suggest we define the `widen' instruction
> > as follows:
>
> the 6809 defined a funny opcode: "sex" for "sign extension" :*)
> now i understand much more about the meaning of Life :-D
Try `man sex' (if you have emacs installed ;)
> > [s]widenb[s][.b|.d|.q] r2, r1 // 8->xx
> > [s]widenw[s][.b|.d|.q] r2, r1 // 16->xx
> > [s]widenq[s][.b|.d|.q] r2, r1 // 32->xx
> > [s]widen[s][.b|.d|.q] r2, r1 // 64->xx
> >
> > that is, [.b|.d|.q] refers to the new size, `s-' means SIMD (as usual),
> > and `-s' activates sign extension. We need only a single opcode (the
> > source size can be encoded in the flag bits -- since the instruction
> > uses only two registers and no immediate operand, we have plenty of them).
> exactly.
Minor correction: the syntax should be [s]widen[s][b|d|q][.b|.d|.q],
that is, -s suffix before -b/-d/-q (source size). I already added that
to my encoder :)
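A scalar reference model of that definition in C might help when writing the simulator (hypothetical names; source and destination sizes are plain bit counts here, 64 meaning no suffix, and SIMD variants would apply the same operation per element).

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low `bits` bits set (bits in 1..64). */
static uint64_t low_mask(unsigned bits) {
    return bits >= 64 ? UINT64_MAX : ((uint64_t)1 << bits) - 1;
}

/* Take the low `from_bits` of src, zero- or sign-extend them (-s suffix),
   then keep `to_bits` of the result (the .b/.d/.q destination size). */
static uint64_t widen(uint64_t src, unsigned from_bits, unsigned to_bits, bool sign_extend) {
    uint64_t v = src & low_mask(from_bits);
    if (sign_extend && from_bits < 64 && ((v >> (from_bits - 1)) & 1))
        v |= ~low_mask(from_bits);    /* replicate the sign bit upward */
    return v & low_mask(to_bits);
}
```

If I read the corrected syntax right, `widensb.d' (sign-extend a byte to 16 bits) would be widen(x, 8, 16, true).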
CU on the bazaar,
--
Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
"All I wanna do is have a little fun before I die"