Re: [f-cpu] Re: Floating-Point?
Michael Riepe wrote:
>
> On Wed, Aug 15, 2001 at 10:22:11AM +0200, Yann Guidon wrote:
> [...]
> > > SIMD is IMHO not reasonable for the FP units.
> > in what context are you speaking ?
>
> I mean: I think it's unreasonable to build *variable-size* FP units.
> There are too many special cases to consider -- rounding, exceptions,
> infinities and NANs, ... (ok, go blame IEEE for it ;)
>
> > > A reasonable approach is
> > > to build a set of pipelined 64-bit FP units, and then issue the 32-bit
> > > operations in two consecutive cycles.
> > that's vectoring, then. Scheduling might become more complex,
> > in situations such as chaining for example.
>
> Not if it's "hidden" inside the EU.
>
> > I have nothing to object to that, but
> > - 1) currently we have no FP unit
> > - 2) SIMD already works well (when it does)
> > - 3) vectoring will be used in another core because FC0 would require too many changes
> > - 4) if you have 1 FP unit, the hardest is done : you can duplicate it :-P
>
> If you have enough room. Do you have an idea how big the FP unit will be?
>
It's big, but not as big as a 256 KB SRAM cache, so it will be the user's
choice. Duplicating a big unit could save a lot of computing power (because
you do a single run).
For your 16-bit FP processing, wouldn't it be better to use log numbers?
For audio processing that could be enough (no DC stuff).
As for the following text, it would be nice to update the manual quickly,
if whygee is OK with it.
nicO
> > > BTW: I think we need another instruction that converts 32-bit FP to 64-bit
> > > and vice versa (and maybe also does the mix/expand/sdup thingy for FP).
> >
> > geez, the instruction set in the current version of the manual needs a big rework...
>
> Yep. There are a handful of inconsistencies, typing errors, missing
> parts etc. in it. Major things I've found so far:
>
> - The manual doesn't state whether `modi' is a signed operation.
> I suggest it should be signed (like `divi').
>
> - Complement `abs' with `nabs' (negative absolute) for
> symmetry, and to avoid the `sign surprise' when the argument
> is -2**(chunksize-1)
>
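A minimal sketch of the `sign surprise', assuming two's-complement chunks that wrap on overflow (the helper names `wrap`, `abs_chunk` and `nabs_chunk` are made up for illustration, not taken from the manual):

```python
def wrap(value, bits):
    """Reduce value modulo 2**bits and read it as a signed two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def abs_chunk(x, bits):
    # Absolute value computed inside a chunk of the given width.
    return wrap(-x if x < 0 else x, bits)

def nabs_chunk(x, bits):
    # Negative absolute value: the result is always <= 0 and always fits.
    return wrap(x if x < 0 else -x, bits)

print(abs_chunk(-128, 8))   # still negative: +128 does not fit in 8 bits
print(nabs_chunk(127, 8))   # -127, representable for every input
```

Every magnitude has a negative representation in the chunk, which is why `nabs` cannot overflow while `abs` can.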
> - The syntax for the rounding mode (`l2int', `f2int') is not
> specified. I suggest to use the following syntax:
>
> l2int[r|t|f|c]
> f2int[r|t|f|c][x]
>
> with these meanings:
>
> -r (round) round to nearest (default)
> -t (trunc) round towards zero
> -f (floor) round towards -infinity
> -c (ceil) round towards +infinity
>
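The four proposed suffixes could be modelled roughly like this. It is only a sketch: the `f2int` function below is a Python stand-in, and using Python's `round()` (which breaks ties to even) for `-r` is an assumption, since the proposal does not say how ties are handled.

```python
import math

# Suffix letter -> rounding function (stand-ins for the proposed modes).
ROUNDING = {
    "r": round,        # round to nearest; ties go to even in Python
    "t": math.trunc,   # round towards zero
    "f": math.floor,   # round towards -infinity
    "c": math.ceil,    # round towards +infinity
}

def f2int(x, mode="r"):
    """Convert a float to an integer using the selected rounding mode."""
    return ROUNDING[mode](x)

for mode in "rtfc":
    print(mode, f2int(2.7, mode), f2int(-2.7, mode))
```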
> - `int2f' and `int2l' also need rounding modes because both
> conversions may result in precision loss if the integer operand
> has a large value.
>
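To illustrate the precision loss, here is a small sketch assuming IEEE-754 single and double formats (`int2f32` is a hypothetical helper, not an F-CPU instruction). A single has a 24-bit significand and a double has 53 bits, so larger integers must be rounded:

```python
import struct

def int2f32(n):
    """Round an integer to the nearest IEEE single and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", float(n)))[0]

print(int2f32(2**24))       # exactly representable
print(int2f32(2**24 + 1))   # rounded: the odd value does not fit in 24 bits
print(float(2**53 + 1))     # same effect for a 64-bit integer and a double
```

With round-to-nearest, both odd values collapse onto the neighbouring even one; a different rounding mode would pick the other neighbour, which is why the mode matters here.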
> - `bitop[s|c|x|t]i' should be `bitopi[s|c|x|t]' (`i' is NOT a suffix!)
>
> - Assign four opcodes for bitop[i] and increase the imm6 operand
> to imm8 (for consistency with the rop2, shift, rot, bitrevi and
> loadcons[x] instructions). Since bitop[i] is a ROP2 instruction,
> change the function encoding to match that of rop2, that is:
>
> fun rop2 bitop
> ================
> 000 and btst
> 001 andn bclr
> 010 xor bchg
> 011 or bset
> 100 nor --
> 101 xnor --
> 110 orn --
> 111 nand --
>
> I guess we can get the missing four instructions for free,
> but they aren't really useful.
>
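The shared encoding can be sketched as a lookup table in which each bitop is the ROP2 function with the same code, applied to a one-bit mask. This assumes 64-bit registers and that `andn` inverts the mask operand (the manual still has to settle that); the Python names are illustrative only.

```python
MASK64 = (1 << 64) - 1

# fun code -> ROP2 operation on (value, mask); only the four codes bitop uses.
ROP2 = {
    0b000: lambda a, b: a & b,    # and   -> btst
    0b001: lambda a, b: a & ~b,   # andn  -> bclr (assumes the mask is inverted)
    0b010: lambda a, b: a ^ b,    # xor   -> bchg
    0b011: lambda a, b: a | b,    # or    -> bset
}
BITOP = {"btst": 0b000, "bclr": 0b001, "bchg": 0b010, "bset": 0b011}

def bitop(name, value, bit):
    """Apply a bit operation by running the matching ROP2 function on 1 << bit."""
    return ROP2[BITOP[name]](value, 1 << bit) & MASK64

print(bin(bitop("bset", 0b1010, 2)))
```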
> - The description of the ROP2 is obsolete (and the syntax for
> combine/mux is unspecified). I suggest -o and -a suffixes for
> combine, and a new `mux' instruction.
>
> - For the `andn' and `orn' instructions, the manual must
> clearly state which operand is inverted. IMHO, `andni' and
> `orni' will be almost useless if we invert the leftmost
> (== immediate) operand (but not completely useless, because
> the upper bits differ when the chunk size is 16 or more).
>
> On the other hand, we could add a flag for sign extension of the
> immediate operand and invert the middle (== register) operand.
> Since the function bits have moved to the opcode field, there
> should be a free flag.
>
> - There is no explicit `not' instruction, but users can write
> `nor r0, r2, r1', `xnor r0, r2, r1' or similar. Since this
> may not be obvious, F-CPU assemblers should recognize `not
> r2, r1' and convert it to one of the other forms internally.
> The `not' instruction should, however, be documented in the
> Instruction Set Manual.
>
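A quick check that both forms give a bitwise NOT, assuming r0 always reads as zero and registers are 64 bits wide (the function names are illustrative):

```python
MASK64 = (1 << 64) - 1
R0 = 0  # assumption: r0 always reads as zero

def nor(a, b):
    return ~(a | b) & MASK64

def xnor(a, b):
    return ~(a ^ b) & MASK64

def not_(x):
    """What an assembler could emit for `not': a nor with r0."""
    return nor(R0, x)

x = 0x0123456789ABCDEF
print(hex(not_(x)), hex(xnor(R0, x)))
```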
> - In `bitrev[i]', use the formula `r1 = bit_reverse(r2) >> (size-r3-1)'.
> That will change the useful range for r3 to [size-1;0]. In the
> current version, it's [size;1] which is pretty ugly.
>
> Another possible variant is `r1 = bit_reverse(r2) >> r3', with
> the same useful range but a nicer default (r3 == r0) which
> makes the 2-operand short form `bitrev r2, r1' meaningful,
> but that may cause trouble when the register size is increased
> beyond 64 bits :(
>
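A sketch of the first formula, assuming 64-bit registers. With this definition, r3 = n-1 reverses the low n bits of r2 and drops everything above them:

```python
def bit_reverse(x, size=64):
    """Mirror all `size' bits of x: bit i moves to bit size-1-i."""
    result = 0
    for i in range(size):
        if (x >> i) & 1:
            result |= 1 << (size - 1 - i)
    return result

def bitrev(r2, r3, size=64):
    # Proposed semantics: r1 = bit_reverse(r2) >> (size - r3 - 1)
    return bit_reverse(r2, size) >> (size - r3 - 1)

print(bin(bitrev(0b0011, 3)))   # low 4 bits reversed
```

The useful range of r3 is then 0 (reverse one bit) up to size-1 (reverse the whole register).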
> - `flog' and `fexp' should both take only two operands.
> Remember that (a**b)**c = a**(b*c) = a**(c*b) = (a**c)**b.
> That is, with a simple multiplication (before fexp / after
> flog) you get any base you want, and the FP unit probably
> works better with a fixed base.
>
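The identity can be tried out as follows, assuming the fixed base is 2 (the message does not say which base the hardware would use; `flog2` and `fexp2` are stand-ins for the two-operand instructions):

```python
import math

def flog2(x):
    return math.log2(x)

def fexp2(x):
    return 2.0 ** x

def log_base(x, base):
    # One multiplication after flog: log_b(x) = log2(x) * (1 / log2(b))
    return flog2(x) * (1.0 / flog2(base))

def exp_base(x, base):
    # One multiplication before fexp: b**x = 2**(x * log2(b))
    return fexp2(x * flog2(base))

print(log_base(1000.0, 10.0), exp_base(3.0, 10.0))
```

For a given base, the scale factor is a constant that software can compute once.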
> - We need a level-1 floating-point compare instruction;
> `cmpl'/`cmple' may work with LNS (if there are no NANs),
> but not with FP.
>
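A small demonstration of why an integer compare is not enough, assuming IEEE-754 singles whose bit patterns are compared as signed 32-bit integers (`bits_as_int32` is a hypothetical helper):

```python
import struct

def bits_as_int32(x):
    """Reinterpret the IEEE-754 single encoding of x as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", x))[0]

# Positive values happen to order correctly...
print(bits_as_int32(1.0) < bits_as_int32(2.0))
# ...but negative values come out in reverse order,
print(bits_as_int32(-2.0) < bits_as_int32(-1.0))
# and +0.0 / -0.0 compare equal as floats while their bit patterns differ.
print(0.0 == -0.0, bits_as_int32(0.0) == bits_as_int32(-0.0))
```

NaNs add a further case: every ordered FP comparison with a NaN is false, whereas an integer compare always yields some order.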
> - The arguments of `store[f]' are reversed (dest, src). It's
> ok that way (because it mirrors the `load' instruction) but
> there should be a BIG FAT WARNING in the manual.
>
> - Some immediate instructions may benefit from a non-linear
> encoding of the immediate operand (for example, 6 bits value +
> 2 bits left-shift). At least this is an option for `loadi'
> and `storei'.
>
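One possible reading of the "6 bits value + 2 bits left-shift" idea, purely as a sketch. The field layout (shift count in the top two bits, unsigned value in the low six) is an assumption; the message does not fix it.

```python
def decode_imm(field):
    """Decode an 8-bit field: low 6 bits are the value, top 2 bits a left-shift count."""
    value = field & 0x3F
    shift = (field >> 6) & 0x3
    return value << shift

def encode_imm(n):
    """Return an 8-bit field that decodes to n, or None if n is not representable."""
    if n < 0:
        return None
    for shift in range(4):
        if n % (1 << shift) == 0 and (n >> shift) < 64:
            return (shift << 6) | (n >> shift)
    return None

print(decode_imm(encode_imm(504)))   # 63 << 3, beyond the plain 6-bit range
print(encode_imm(65))                # odd and > 63: not encodable
```

The range grows from 0..63 to 0..504, at the cost of only reaching multiples of 2, 4 or 8 above 63. That suits aligned offsets for `loadi` and `storei`.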
> - The naming of the memory hierarchies in the `cachemm'
> instruction is ambiguous (in particular, the -c and -l suffixes).
> We can still use numeric suffixes [0-7], however.
>
> Again, the arguments are reversed (`cachemm addr,count').
>
> - In the description of `move', remove the reference to `nop'.
> BTW: there is no need to give `cmove' a separate name and
> opcode. If there is a condition suffix, it's a conditional move
> (3-operand form), otherwise it's unconditional (2 operands):
>
> move[s]{cond} r3, r2, r1
> move[s] r2, r1
>
> - We need to clarify the syntax of the `condition' suffixes for
> `move' and `jmpa'. I suggest
>
> 000 -z (zero)
> 001 (unassigned)
> 010 -m (msb == 1)
> 011 -l (lsb == 1)
> 100 -nz (not zero)
> 101 (unassigned)
> 110 -nm (msb == 0)
> 111 -nl (lsb == 0)
>
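The encoding reads as "bit 2 negates, bits 1..0 select the test". A sketch of an evaluator under that reading, assuming 64-bit registers, and assuming that -l tests lsb == 1 and -nl tests lsb == 0, by analogy with -m / -nm:

```python
def condition_holds(code, value, size=64):
    """Evaluate a 3-bit condition code against an unsigned register value."""
    negate = bool(code & 0b100)
    select = code & 0b011
    if select == 0b00:
        result = value == 0                          # -z / -nz
    elif select == 0b10:
        result = bool((value >> (size - 1)) & 1)     # -m / -nm
    elif select == 0b11:
        result = bool(value & 1)                     # -l / -nl
    else:
        raise ValueError("unassigned condition code")
    return result != negate

print(condition_holds(0b100, 5))   # -nz on a non-zero value
```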
> - Assemblers must accept `loadcons[x] large-number' and emit a
> suitable series of loadcons.n (or loadconsx.n) instructions
> instead. This is necessary for external symbol references
> (which are resolved at link time). Assemble-time constants
> may be shortened to less than 64 bits, however, and if the
> user explicitly requests `loadcons.0' or `loadconsx.0', the
> assembler should of course do what (s)he wants (and complain
> if the value is too large).
>
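How an assembler might split a large constant, as a sketch only. It assumes each loadcons.n carries a 16-bit immediate for bits 16n..16n+15 of a 64-bit register; neither the chunk width nor the helper names come from the message.

```python
def split_loadcons(value, chunk_bits=16, word_bits=64):
    """Split a constant into (n, immediate) pairs, one per loadcons.n slot."""
    mask = (1 << chunk_bits) - 1
    value &= (1 << word_bits) - 1
    return [(n, (value >> (n * chunk_bits)) & mask)
            for n in range(word_bits // chunk_bits)]

def join_loadcons(parts, chunk_bits=16):
    """Rebuild the constant from the pairs, e.g. to check the split."""
    result = 0
    for n, imm in parts:
        result |= imm << (n * chunk_bits)
    return result

for n, imm in split_loadcons(0x0123456789ABCDEF):
    print(f"slot {n}: {imm:#06x}")
```

For a link-time symbol, the assembler would have to emit all slots and leave relocations for the linker to fill in. For an assemble-time constant, it could drop slots it can prove redundant.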
> - Can we please drop the `a' from `jmpa'?
>
> As with `move', the presence of the condition suffix indicates
> the form of the instruction:
>
> jmp[a]{cond} r3, r2 [, r1]
> jmp[a] r2 [, r1]
>
> - When calling functions through pointers, it would be nice to
> be able to tell the F-CPU *a priori* that a register contains a
> code address. While this can be done with an explicit prefetch
> (load to r0) for data pointers, there is no way to specify that
> a register contains a code address that the CPU will have to
> visit soon. The same is true when an absolute code address is
> obtained via loadcons (which will probably be the common idiom
> when a function in another object file is called, unless jump
> tables are used -- which points us back to the `code pointer
> in register' problem, again).
>
> To cut a long story short: I'd like to have an instruction
> that explicitly `tags' a register as a pointer, and probably
> initiates a prefetch cycle (for code or data, depending on
> the instruction's flags). It may or may not move data from
> one register to another (one idea I had was a `pointer move'
> instruction); if it does, it might be a good idea to let it
> participate in address calculation (i.e. let it be able to
> add two operands, like the `lea' instruction on Intel CPUs).
>
> - Let's clarify the suffix order, e.g. like this (? means the
> suffix is currently unused, and its name is unassigned):
>
> add[c|s|?]
> sub[b|f|?]
> mul[h][s]
> div[m][s]
> mac[l|h][s] # I suggest to allow `macl' as an alias for `mac'.
> scan[n][r]
> bitop[s|c|x|t]
> bitopi[s|c|x|t]
> mix[l|h]
> expand[l|h]
> {rop2}[a|o]
> {rop2i}[a|o]
> load[f][e][0-7]
> loadi[f][e][0-7]
> store[f][e][0-7]
> storei[f][e][0-7]
> cachemm[f|p][l][c][0-7]
> move[s][n][z|?|m|l]
> jmpa[n][z|?|m|l]
> serialize[s][x][m]
>
> - Some instructions (e.g. `mac' and `addsub') could have
> variants with an immediate operand.
>
> - The loadm/storem instructions have a surprising operand order
> (start,src/dest,count), and it's not clear whether the
> register *numbers* or the register *contents* serve as the
> start/count values. I suggest the former, and I would also
> change the operands to (firstreg, lastreg, memaddr) which is
> much easier to grok for humans.
>
> Since there are some unused flags, another variant might be
> interesting: `storem r2, r1', where r2 is used as a mask
> (bit <n> == 1 means "load/store register <n>"), and r1 is the
> address of the source/destination memory area (which must be
> big enough to hold all registers, just like the CMB).
>
> Maybe it would be wiser to put the memory address into the
> rightmost operand in *all* memory operations (load, store,
> cachemm, loadm and storem). Some instructions will always
> have the wrong operand order, though.
>
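A sketch of the mask variant, assuming 64 registers of 8 bytes each and a fixed slot per register in the memory area (that would explain why the area must hold all registers, but it is an assumption; the helper names are hypothetical):

```python
def registers_from_mask(mask, num_regs=64):
    """List the register numbers whose bit is set in the mask."""
    return [n for n in range(num_regs) if (mask >> n) & 1]

def storem_offsets(mask, reg_bytes=8, num_regs=64):
    """Map each selected register to its byte offset from the base address in r1."""
    return {n: n * reg_bytes for n in registers_from_mask(mask, num_regs)}

print(registers_from_mask(0b1010))   # registers 1 and 3
print(storem_offsets(0b1010))
```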
> - And finally, the most important point: the new `nop' instruction
> is still undocumented ;)
>
> In case you wonder: I needed a break from VHDL coding (I couldn't
> even write C any more!), so I decided to play with something totally
> different for a while. The result is a flex-based instruction encoder
> that recognizes almost any instruction the F-CPU will have (with the
> exceptions mentioned above). I'll probably also build an assembler
> around it. (I finally found a real use for my libelf library! Yeah! ;)
>
> > Sure, there needs to be an expansion/reduction code for FP
> > but SDUP works for SIMD FP if the packets have the same boundaries.
>
> That's a different kind of operation.
>
> --
> Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
> "All I wanna do is have a little fun before I die"
> *************************************************************
> To unsubscribe, send an e-mail to majordomo@seul.org with
> unsubscribe f-cpu in the body. http://f-cpu.seul.org/