
Re: [f-cpu] Re: Floating-Point?



Michael Riepe wrote:
> 
> On Wed, Aug 15, 2001 at 10:22:11AM +0200, Yann Guidon wrote:
> [...]
> > > SIMD is IMHO not reasonable for the FP units.
> > in what context are you speaking ?
> 
> I mean: I think it's unreasonable to build *variable-size* FP units.
> There are too many special cases to consider -- rounding, exceptions,
> infinities and NANs, ... (ok, go blame IEEE for it ;)
> 
> > > A reasonable approach is
> > > to build a set of pipelined 64-bit FP units, and then issue the 32-bit
> > > operations in two consecutive cycles.
> > that's vectoring, then. Scheduling might become more complex,
> > in situations such as chaining for example.
> 
> Not if it's "hidden" inside the EU.
> 
> > I have nothing to object to that, but
> >  - 1) currently we have no FP unit
> >  - 2) SIMD already works well (when it does)
> >  - 3) vectoring will be used in another core because FC0 would require too many changes
> >  - 4) if you have 1 FP unit, the hardest is done : you can duplicate it :-P
> 
> If you have enough room.  Do you have an idea how big the FP unit will be?
> 

It's big, but not as big as a 256 KB SRAM cache. So it will be the
user's choice. Duplicating a big unit could save a lot of computing
power (because you make a single pass).

For your 16-bit FP processing, wouldn't it be better to use logarithmic
numbers? For audio processing that could be enough (no DC stuff).
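
To illustrate the log-number idea, here is a minimal Python sketch. It
assumes a hypothetical layout (a sign bit plus a signed fixed-point log2
with 8 fraction bits); that layout is only an assumption for the example,
not something taken from the F-CPU manual. Multiplication turns into an
integer addition of the logs. Zero has no logarithm and would need a
special code, and there is no range check for 16 bits here.

```python
import math

FRAC_BITS = 8  # assumed number of fraction bits in the stored log2 (hypothetical)

def lns_encode(x):
    """Return (sign, fixed-point log2 of |x|) for a nonzero number."""
    if x == 0:
        raise ValueError("zero has no logarithm; a real format needs a special code")
    sign = 1 if x < 0 else 0
    log = round(math.log2(abs(x)) * (1 << FRAC_BITS))
    return sign, log

def lns_decode(value):
    """Convert a (sign, log) pair back to a Python float."""
    sign, log = value
    magnitude = 2.0 ** (log / (1 << FRAC_BITS))
    return -magnitude if sign else magnitude

def lns_mul(a, b):
    """Multiply two LNS values: XOR the signs, add the logs."""
    return a[0] ^ b[0], a[1] + b[1]

product = lns_mul(lns_encode(0.5), lns_encode(-0.25))
print(lns_decode(product))
```

Addition and subtraction are the expensive operations in such a format,
so whether it pays off depends on the mix of operations in the audio code.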

As for the following text, it would be nice to update the manual
quickly, if whygee is OK with it.

nicO

> > > BTW: I think we need another instruction that converts 32-bit FP to 64-bit
> > > and vice versa (and maybe also does the mix/expand/sdup thingy for FP).
> >
> > geez, the instruction set in the current version of the manual needs a big rework...
> 
> Yep.  There are a handful of inconsistencies, typing errors, missing
> parts etc. in it.  Major things I've found so far:
> 
>         - The manual doesn't state whether `modi' is a signed operation.
>           I suggest it should be signed (like `divi')
> 
>         - Complement `abs' with `nabs' (negative absolute) for
>           symmetry, and to avoid the `sign surprise' when the argument
>           is -2**(chunksize-1)
> 
>         - The syntax for the rounding mode (`l2int', `f2int') is not
>           specified. I suggest to use the following syntax:
> 
>                 l2int[r|t|f|c]
>                 f2int[r|t|f|c][x]
> 
>           with these meanings:
> 
>                 -r (round)      round to nearest (default)
>                 -t (trunc)      round towards zero
>                 -f (floor)      round towards -infinity
>                 -c (ceil)       round towards +infinity
> 
>         - `int2f' and `int2l' also need rounding modes because both
>           conversions may result in precision loss if the integer operand
>           has a large value.
> 
>         - `bitop[s|c|x|t]i' should be `bitopi[s|c|x|t]' (`i' is NOT a suffix!)
> 
>         - Assign four opcodes for bitop[i] and increase the imm6 operand
>           to imm8 (for consistency with the rop2, shift, rot, bitrevi and
>           loadcons[x] instructions).  Since bitop[i] is a ROP2 instruction,
>           change the function encoding to match that of rop2, that is:
> 
>                 fun  rop2  bitop
>                 ================
>                 000  and   btst
>                 001  andn  bclr
>                 010  xor   bchg
>                 011  or    bset
>                 100  nor   --
>                 101  xnor  --
>                 110  orn   --
>                 111  nand  --
> 
>           I guess we can get the missing four instructions for free,
>           but they aren't really useful.
> 
>         - The description of the ROP2 is obsolete (and the syntax for
>           combine/mux is unspecified).  I suggest -o and -a suffixes for
>           combine, and a new `mux' instruction.
> 
>         - For the `andn' and `orn' instructions, the manual must
>           clearly state which operand is inverted.  IMHO, `andni' and
>           `orni' will be almost useless if we invert the leftmost
>           (== immediate) operand (but not completely useless, because
>           the upper bits differ when the chunk size is 16 or more).
> 
>           On the other hand, we could add a flag for sign extension of the
>           immediate operand and invert the middle (== register) operand.
>           Since the function bits have moved to the opcode field, there
>           should be a free flag.
> 
>         - There is no explicit `not' instruction, but users can write
>           `nor r0, r2, r1', `xnor r0, r2, r1' or similar.  Since this
>           may not be obvious, F-CPU assemblers should recognize `not
>           r2, r1' and convert it to one of the other forms internally.
>           The `not' instruction should, however, be documented in the
>           Instruction Set Manual.
> 
>         - In `bitrev[i]', use the formula `r1 = bit_reverse(r2) >> (size-r3-1)'.
>           That will change the useful range for r3 to [size-1;0].  In the
>           current version, it's [size;1] which is pretty ugly.
> 
>           Another possible variant is `r1 = bit_reverse(r2) >> r3', with
>           the same useful range but a nicer default (r3 == r0) which
>           makes the 2-operand short form `bitrev r2, r1' meaningful,
>           but that may cause trouble when the register size is increased
>           beyond 64 bits :(
> 
>         - `flog' and `fexp' should both take only two operands.
>           Remember that (a**b)**c = a**(b*c) = a**(c*b) = (a**c)**b.
>           That is, with a simple multiplication (before fexp / after
>           flog) you get any base you want, and the FP unit probably
>           works better with a fixed base.
> 
>         - We need a level-1 floating-point compare instruction;
>           `cmpl'/`cmple' may work with LNS (if there are no NANs),
>           but not with FP.
> 
>         - The arguments of `store[f]' are reversed (dest, src).  It's
>           ok that way (because it mirrors the `load' instruction) but
>           there should be a BIG FAT WARNING in the manual.
> 
>         - Some immediate instructions may benefit from a non-linear
>           encoding of the immediate operand (for example, 6 bits value +
>           2 bits left-shift).  At least this is an option for `loadi'
>           and `storei'.
> 
>         - The naming of the memory hierarchies in the `cachemm'
>           instruction is ambiguous (in particular, the -c and -l suffixes).
>           We can still use numeric suffixes [0-7], however.
> 
>           Again, the arguments are reversed (`cachemm addr,count').
> 
>         - In the description of `move', remove the reference to `nop'.
>           BTW: there is no need to give `cmove' a separate name and
>           opcode.  If there is a condition suffix, it's a conditional move
>           (3-operand form), otherwise it's unconditional (2 operands):
> 
>                 move[s]{cond} r3, r2, r1
>                 move[s]           r2, r1
> 
>         - We need to clarify the syntax of the `condition' suffixes for
>           `move' and `jmpa'.  I suggest
> 
>                 000  -z   (zero)
>                 001       (unassigned)
>                 010  -m   (msb == 1)
>                 011  -l   (lsb == 1)
>                 100  -nz  (not zero)
>                 101       (unassigned)
>                 110  -nm  (msb == 0)
>                 111  -nl  (lsb == 0)
> 
>         - Assemblers must accept `loadcons[x] large-number' and emit a
>           suitable series of loadcons.n (or loadconsx.n) instructions
>           instead.  This is necessary for external symbol references
>           (which are resolved at link time).  Assemble-time constants
>           may be shortened to less than 64 bits, however, and if the
>           user explicitly requests `loadcons.0' or `loadconsx.0', the
>           assembler should of course do what (s)he wants (and complain
>           if the value is too large).
> 
>         - Can we please drop the `a' from `jmpa'?
> 
>           As with `move', the presence of the condition suffix indicates
>           the form of the instruction:
> 
>                 jmp[a]{cond} r3, r2 [, r1]
>                 jmp[a]           r2 [, r1]
> 
>         - When calling functions through pointers, it would be nice to
>           be able to tell the F-CPU *a priori* that a register contains a
>           code address.  While this can be done with an explicit prefetch
>           (load to r0) for data pointers, there is no way to specify that
>           a register contains a code address that the CPU will have to
>           visit soon.  The same is true when an absolute code address is
>           obtained via loadcons (which will probably be the common idiom
>           when a function in another object file is called, unless jump
>           tables are used -- which points us back to the `code pointer
>           in register' problem, again).
> 
>           To cut a long story short: I'd like to have an instruction
>           that explicitly `tags' a register as a pointer, and probably
>           initiates a prefetch cycle (for code or data, depending on
>           the instruction's flags).  It may or may not move data from
>           one register to another (one idea I had was a `pointer move'
>           instruction); if it does, it might be a good idea to let it
>           participate in address calculation (i.e. let it be able to
>           add two operands, like the `lea' instruction on Intel CPUs).
> 
>         - Let's clarify the suffix order, e.g. like this (? means the
>           suffix is currently unused, and its name is unassigned):
> 
>                 add[c|s|?]
>                 sub[b|f|?]
>                 mul[h][s]
>                 div[m][s]
>                 mac[l|h][s]             # I suggest to allow `macl' as an alias for `mac'.
>                 scan[n][r]
>                 bitop[s|c|x|t]
>                 bitopi[s|c|x|t]
>                 mix[l|h]
>                 expand[l|h]
>                 {rop2}[a|o]
>                 {rop2i}[a|o]
>                 load[f][e][0-7]
>                 loadi[f][e][0-7]
>                 store[f][e][0-7]
>                 storei[f][e][0-7]
>                 cachemm[f|p][l][c][0-7]
>                 move[s][n][z|?|m|l]
>                 jmpa[n][z|?|m|l]
>                 serialize[s][x][m]
> 
>         - Some instructions (e.g. `mac' and `addsub') could have
>           variants with an immediate operand.
> 
>         - The loadm/storem has a surprising operand order
>           (start,src/dest,count), and it's not clear whether the
>           register *numbers* or the register *contents* serve as the
>           start/count values.  I suggest the former, and I would also
>           change the operands to (firstreg, lastreg, memaddr) which is
>           much easier to grok for humans.
> 
>           Since there are some unused flags, another variant might be
>           interesting: `storem r2, r1', where r2 is used as a mask
>           (bit <n> == 1 means "load/store register <n>"), and r1 is the
>           address of the source/destination memory area (which must be
>           big enough to hold all registers, just like the CMB).
> 
>           Maybe it would be wiser to put the memory address into the
>           rightmost operand in *all* memory operations (load, store,
>           cachemm, loadm and storem).  Some instructions will always
>           have the wrong operand order, though.
> 
>         - And finally, the most important point: the new `nop' instruction
>           is still undocumented ;)
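The `bitrev' formula from the list above, `r1 = bit_reverse(r2) >>
(size-r3-1)', can be checked with a small Python model. This is only a
sketch: the function names and the default size of 64 are assumptions for
illustration, not F-CPU code. With this formula, r3 = k-1 yields the low
k bits of r2 in reversed order.

```python
def bit_reverse(value, size):
    """Mirror the bits of a size-bit word (bit 0 <-> bit size-1)."""
    result = 0
    for i in range(size):
        if (value >> i) & 1:
            result |= 1 << (size - 1 - i)
    return result

def bitrev(r2, r3, size=64):
    """Model of r1 = bit_reverse(r2) >> (size - r3 - 1), for r3 in [0, size-1]."""
    if not 0 <= r3 <= size - 1:
        raise ValueError("r3 outside the useful range [0, size-1]")
    r2 &= (1 << size) - 1  # keep only size bits of the operand
    return bit_reverse(r2, size) >> (size - r3 - 1)

print(bin(bitrev(0b1101, 3)))  # low 4 bits of r2, reversed
```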
> 
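The `sign surprise' from the `abs'/`nabs' item in the list above can be
shown with a small Python model of two's-complement chunks. This is a
sketch only; `wrap', `abs_chunk' and `nabs_chunk' are made-up helper names,
not F-CPU instructions or library functions.

```python
def wrap(value, bits):
    """Reduce value modulo 2**bits and read it as a signed two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def abs_chunk(x, bits):
    """Absolute value within a bits-wide chunk; overflows for -2**(bits-1)."""
    return wrap(-x if x < 0 else x, bits)

def nabs_chunk(x, bits):
    """Negative absolute value; every result fits, so nothing overflows."""
    return wrap(x if x < 0 else -x, bits)

print(abs_chunk(-128, 8))   # still negative: the sign surprise
print(nabs_chunk(-128, 8))  # representable, no surprise
print(nabs_chunk(127, 8))
```

The asymmetry comes from the range of a chunk: there is one more negative
value than there are positive ones, so -|x| always fits while |x| may not.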
> In case you wonder: I needed a break from VHDL coding (I couldn't
> even write C any more!), so I decided to play with something totally
> different for a while.  The result is a flex-based instruction encoder
> that recognizes almost any instruction the F-CPU will have (with the
> exceptions mentioned above).  I'll probably also build an assembler
> around it. (I finally found a real use for my libelf library! Yeah! ;)
> 
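As a rough idea of what such an encoder has to recognize, here is a small
Python sketch for the `move[s][n][z|m|l]' suffixes, using the condition
codes proposed in the list above (bit 2 = negate; 00 = z, 10 = m, 11 = l).
It uses Python's `re' module rather than flex, and `parse_move' and its
return format are hypothetical, chosen only for this example.

```python
import re

# move, optional `s' flag, then an optional condition: [n] followed by z, m or l
MOVE_RE = re.compile(r"^move(s)?(?:(n)?([zml]))?$")

# low two bits of the proposed 3-bit condition field
COND_BITS = {"z": 0b00, "m": 0b10, "l": 0b11}

def parse_move(mnemonic):
    """Split a move mnemonic into its `s' flag and 3-bit condition code (or None)."""
    match = MOVE_RE.match(mnemonic)
    if match is None:
        raise ValueError("not a valid move mnemonic: " + mnemonic)
    s_flag, negate, cond = match.groups()
    code = None
    if cond is not None:
        code = (0b100 if negate else 0) | COND_BITS[cond]
    return {"s": s_flag is not None, "cond": code}

print(parse_move("movenz"))
print(parse_move("moves"))
```

A mnemonic without a condition suffix comes back with `cond' set to None,
which matches the proposal that the plain form is the unconditional
2-operand move.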
> > Sure, there needs to be an expansion/reduction code for FP
> > but SDUP works for SIMD FP if the packets have the same boundaries.
> 
> That's a different kind of operation.
> 
> --
>  Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
>  "All I wanna do is have a little fun before I die"
> *************************************************************
> To unsubscribe, send an e-mail to majordomo@seul.org with
> unsubscribe f-cpu       in the body. http://f-cpu.seul.org/