[f-cpu] manual 0.2.5 quirks

I'm currently busy with something different (in case you're nosy:
symbolic verification of the F-CPU execution units) but I stopped to
have a look at the PDF version of the latest manual...

... and I still found some quirks in the instruction set part:

- Register naming is inconsistent in sub and div:

	sub r3, r2, r1 => r1 = r2 - r3
	div r3, r2, r1 => r1 = r3 / r2

  The `sub' variant is more logical.

- The current ASU sets the `borrow' register to 1, not -1.
  See vhdl/eu_asu/iadd.vhdl, line 344ff.

- The description of `subi' is wrong (registers swapped):

	subi imm8, r2, r1 => r2 = r1 - imm8

  In general, <op>i and <op> should be consistent. That is, the
  instruction is supposed to calculate `r1 = r2 - imm8'.

- Registers swapped in `mod' description. Should read:

	r1 = r2 % r3

  BTW: What this instruction really computes is the *remainder* of the
  division r2 / r3, *not* the modulus (this makes a difference when the
  operands are signed numbers). It therefore should be called `rem',
  not `mod'. Analogously, `divm' should be called `divr', or maybe
  `divrem' (just like `addsub'). And while we're at it, `sort' could
  have `minmax' as an alias (since that's what it does: compute the
  minimum and maximum of its operands at the same time).
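
  To illustrate the remainder/modulus distinction, here is a minimal
  Python sketch. It assumes the divider truncates the quotient toward
  zero, which is what calling the result a remainder implies. The helper
  names `trunc_rem' and `floor_mod' are made up for this example and do
  not appear in the manual.

```python
def trunc_rem(a, b):
    """Remainder of a / b with the quotient truncated toward zero.

    The result takes the sign of the dividend a (what `rem' would return).
    """
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return a - q * b


def floor_mod(a, b):
    """Floored modulus: the result takes the sign of the divisor b."""
    return a % b  # Python's % operator already floors


if __name__ == "__main__":
    for a, b in [(7, 3), (-7, 3), (7, -3), (-7, -3)]:
        print(a, b, trunc_rem(a, b), floor_mod(a, b))
```

  The two functions agree whenever both operands have the same sign and
  differ only for mixed signs with a nonzero remainder.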

- The `alternate result register' is still named `r1+1' throughout the
  manual. Didn't we change that to `r1^1'?

- The description of `cmpl' is missing:

	r1 = r2 < r3 ? -1 : 0

- With all compare/max/sort instructions, the reference to IEEE
  floating-point is misleading. Even if the instructions work with
  signed integers, they will NEVER compare IEEE floats correctly.
  Adding signed integer comparison, on the other hand, should not be
  too hard (remember: just invert both sign bits before comparison).
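
  The sign-bit trick can be sketched in a few lines of Python. This is
  only a model under assumptions: a 64-bit chunk, two's-complement
  operands, and an existing unsigned comparator. `cmpl_signed' is a
  hypothetical name for a signed variant, not an instruction from the
  manual.

```python
SIZE = 64                      # assumed chunk width in bits
MASK = (1 << SIZE) - 1
SIGN = 1 << (SIZE - 1)


def to_bits(x):
    """Two's-complement bit pattern of a signed integer, as an unsigned value."""
    return x & MASK


def signed_less(a_bits, b_bits):
    """Signed a < b using only an unsigned comparison.

    Flipping the sign bit of both operands maps the signed range onto the
    unsigned range while preserving the order.
    """
    return (a_bits ^ SIGN) < (b_bits ^ SIGN)


def cmpl_signed(a_bits, b_bits):
    """Hypothetical signed cmpl: all ones (-1) if a < b, else 0."""
    return MASK if signed_less(a_bits, b_bits) else 0
```

  A plain unsigned compare would rank the pattern for -1 above the
  pattern for 0. With both sign bits inverted the order comes out right.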

- The description of `cmple' is missing as well (it performs:
  r1 = r2 <= r3 ? -1 : 0).

- Syntax for `scmpli' and `scmplei' is wrong. Where did the immediate
  operand go? The descriptions are missing, too.

- The `bitop' instruction is still listed with the bit shuffling ops,
  and contains a reference to the SHL unit. We *could* make this true, but
  that means we'll have to add a pipeline stage to the SHL unit (or route
  the output to the ROP2 unit, which will take an extra Xbar cycle).

  IIRC we decided that the function (F) is encoded in the opcode,
  and that the immediate for bitopi should be 8 bits wide. The correct
  descriptions are:

	bitop    r3, r2, r1 => r1 = F(r2, 1 << r3)
	bitopi imm8, r2, r1 => r1 = F(r2, 1 << imm8)

- The `bitrev' instruction performs:

	bitrev r3, r2, r1 => r1 = reverse(r2) >> (size - r3 % size - 1)

  or, if you like that better:

	bitrev r3, r2, r1 => r1 = reverse(r2) >> (~r3 % size)

  That is, you always get ((r3 % size) + 1) result bits.

  The two-operand form (r3 = 0) makes no sense; it's essentially the
  same as `andi 1, r2, r1'.

  The -o suffix is unsupported unless we add a pipeline stage to the
  SHL unit (or similar, see `bitop').

  The SIMD variant `sbitrev' is undocumented. Is it really useless?
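
  A small Python model of the two `bitrev' formulas, to check that they
  agree. It is a sketch under assumptions: the chunk size is 64 bits
  (any power of two works for the masked `~r3' form), and r2 fits in one
  chunk. The function names are made up for this example.

```python
SIZE = 64  # assumed chunk width in bits (a power of two)


def reverse(x, size=SIZE):
    """Reverse the order of the low `size` bits of x."""
    r = 0
    for _ in range(size):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r


def bitrev(r3, r2, size=SIZE):
    """r1 = reverse(r2) >> (size - r3 % size - 1)"""
    return reverse(r2, size) >> (size - r3 % size - 1)


def bitrev_alt(r3, r2, size=SIZE):
    """Same result using the (~r3 % size) form of the shift count."""
    mask = (1 << size) - 1
    return reverse(r2, size) >> ((~r3 & mask) % size)
```

  With r3 = 0 the model returns only bit 0 of r2, which matches the
  remark about `andi 1, r2, r1'. With r3 = size - 1 it returns the
  fully reversed chunk.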

- The double-word shifts are missing. We currently have

	dshiftl r3, r2, r1 =>
		r1   = r2 << r3
		r1^1 = r2 >> (size - r3)	(*)

	dshiftr r3, r2, r1 =>
		r1   = (unsigned)r2 >> r3
		r1^1 = r2 << (size - r3)	(*)

	dshiftra r3, r2, r1 =>
		r1   = (signed)r2 >> r3
		r1^1 = r2 << (size - r3)	(*)

	dbitrev r3, r2, r1 =>
		r1   = reverse(r2) >> (size - r3 % size - 1)
		r1^1 = reverse(r2) << (r3 % size + 1)	(*)

  (immediate and SIMD versions also available).

  (*) result will be zero if the shift count equals the chunk size.
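
  The double-word shifts can be modelled in Python as follows. This is
  a sketch under assumptions: a 64-bit chunk, an unsigned bit pattern in
  r2, and a shift count already in the range 0..63. Each function
  returns the pair (r1, r1^1). Python's big integers give the zero
  result for the (*) case (r3 = 0) without special handling.

```python
SIZE = 64                 # assumed chunk width in bits
MASK = (1 << SIZE) - 1


def to_signed(x):
    """Interpret a SIZE-bit pattern as a two's-complement integer."""
    return x - (1 << SIZE) if x >> (SIZE - 1) else x


def dshiftl(r3, r2):
    """Return (r1, r1^1): the shifted chunk and the bits shifted out on top."""
    return (r2 << r3) & MASK, r2 >> (SIZE - r3)


def dshiftr(r3, r2):
    """Return (r1, r1^1): logical right shift and the bits shifted out below."""
    return r2 >> r3, (r2 << (SIZE - r3)) & MASK


def dshiftra(r3, r2):
    """Like dshiftr, but r1 is filled with copies of the sign bit."""
    return (to_signed(r2) >> r3) & MASK, (r2 << (SIZE - r3)) & MASK
```

  In this model, concatenating r1^1 (high) and r1 (low) from `dshiftl'
  gives the full double-width value r2 << r3.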

- The sshift*/srot*/sbitrev ops are available in `full-SIMD' and
  `half-SIMD' modes. The latter performs an implicit `sdup' on the shift
  count. The manual should state which is which (and also mention the
  other variant if we're going to support it).
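
  The difference between the two modes can be shown with a toy Python
  model. Everything here is an assumption made for illustration:
  registers are lists of 8-bit chunks (least significant first), shift
  counts are reduced modulo the chunk width, and `sdup' broadcasts the
  least significant chunk. The function names are hypothetical.

```python
def sshiftl_full(counts, values, width=8):
    """Full-SIMD: every chunk is shifted by its own count."""
    mask = (1 << width) - 1
    return [(v << (c % width)) & mask for v, c in zip(values, counts)]


def sdup(chunks):
    """Assumed behaviour of sdup: copy chunk 0 into every chunk."""
    return [chunks[0]] * len(chunks)


def sshiftl_half(counts, values, width=8):
    """Half-SIMD: one shift count, duplicated across all chunks first."""
    return sshiftl_full(sdup(counts), values, width)


if __name__ == "__main__":
    print(sshiftl_full([1, 2, 3, 4], [1, 1, 1, 1]))
    print(sshiftl_half([1, 2, 3, 4], [1, 1, 1, 1]))
```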

- In the drawings for mix/expand, it's still not clear whether `source #1'
  is r2 and `source #2' is r3, or vice versa. I suggest that the least
  significant chunk of the result should always come from r2 (that is,
  source #1 is r2 and source #2 is r3).

- In the description of the logic operators, the registers are named
  inconsistently with the rest of the manual (r3 is destination).
  F is encoded in the opcode in 3-bit form, and the immediate for
  `logici' is 8 bits wide (see `bitop' above). It's still not clear which
  operand is inverted when `andn' or `orn' is performed (I suggest r3,
  for symmetry). And finally: there is no `not' instruction.

- Floating-point compare is completely broken. With IEEE floats, two
  numbers can be less, equal or greater with respect to each other,
  but they can also be *unordered* (if one or both of them is NaN).
  To make things even more complicated, +0.0 and -0.0 compare equal,
  although they have different representations, while NaNs *never*
  compare equal even if their representations are identical.
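
  These cases are easy to reproduce in Python, whose floats are IEEE
  doubles on common platforms. The sketch below is illustrative only:
  `fcompare' is a hypothetical four-way comparison, not a proposal for
  the instruction encoding, and `bits' just exposes the raw 64-bit
  pattern.

```python
import math
import struct


def bits(x):
    """Raw 64-bit pattern of an IEEE double, as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def as_signed(pattern):
    """Reinterpret a 64-bit pattern as a two's-complement integer."""
    return pattern - (1 << 64) if pattern >> 63 else pattern


def fcompare(a, b):
    """Four-way IEEE comparison: 'lt', 'eq', 'gt' or 'unordered'."""
    if math.isnan(a) or math.isnan(b):
        return "unordered"
    if a < b:
        return "lt"
    if a > b:
        return "gt"
    return "eq"
```

  Comparing the bit patterns as signed integers ranks -2.0 above -1.0,
  so an integer compare cannot stand in for a float compare once
  negative values are involved.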

- `fdiv' calculates r1 = r2 / r3, NOT r1 = r3 / r2.

- `fsqrt' has only two operands: r1 = fsqrt(r2).

- `flog'/`fexp' still have a three-operand form which requires you to
  preload the logarithm's base into r3. If we *really* need three-operand
  forms, they should calculate something like

	r1 = log2(r2) / r3	(flog)

	r1 = exp2(r2 * r3)	(fexp)

  with two-operand forms that implicitly set r3 to 1.0. If you set r3 to
  log2(n), you'll get log<n>(x) and <n>**x (for n > 0.0) without making
  the implementation of log() and exp() more complex than necessary.
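
  The identity behind this suggestion can be checked numerically. In
  the Python sketch below, `flog' and `fexp' are plain functions that
  model the proposed semantics; the default r3 = 1.0 stands in for the
  two-operand forms.

```python
import math


def flog(r2, r3=1.0):
    """Proposed flog: r1 = log2(r2) / r3."""
    return math.log2(r2) / r3


def fexp(r2, r3=1.0):
    """Proposed fexp: r1 = exp2(r2 * r3)."""
    return 2.0 ** (r2 * r3)


if __name__ == "__main__":
    base = math.log2(10.0)        # preload r3 with log2(n), here n = 10
    print(flog(1000.0, base))     # approximately log10(1000)
    print(fexp(3.0, base))        # approximately 10 ** 3
```

  Only a base-2 logarithm and a base-2 exponential are needed. The
  change of base costs one division or one multiplication.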

- `faddsub' is supposed to calculate r2 + r3 and r2 - r3 (NOT r3 - r2).

- `move r0, r0' is no longer an alias for `nop'. There is a real `nop'
  instruction (opcode 0) now, and `move' has a different opcode.
  The textual description claims that `r2 is copied to r3', but the
  target is r1.

- While I suggested that `loadcons imm, r1' should be an assembler
  shortcut for a sequence of `loadcons.n imm16, r1' instructions, I didn't
  mean that the constant to load *must* be 64 bits wide. If it's only
  32 bits wide, the assembler might use a shorter instruction sequence
  (e.g. a loadcons followed by a loadconsx). The difference between
  `loadcons imm, r1' and `loadconsx imm, r1' should be that the assembler
  sign-extends the constant in the latter case.
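
  An assembler could pick the sequence length roughly like this. The
  Python sketch only counts how many 16-bit chunks a 64-bit constant
  needs with zero extension versus sign extension. It does not model the
  actual loadcons/loadconsx encodings, and both function names are
  hypothetical.

```python
MASK64 = (1 << 64) - 1


def chunks_zero_extended(value):
    """Number of 16-bit chunks needed if the upper bits are filled with zeros."""
    value &= MASK64
    n = 1
    while value >> (16 * n):
        n += 1
    return n


def chunks_sign_extended(value):
    """Number of 16-bit chunks needed if the top chunk is sign-extended."""
    value &= MASK64
    for n in range(1, 4):
        width = 16 * n
        low = value & ((1 << width) - 1)
        if low >> (width - 1):              # sign bit set: extend with ones
            low |= MASK64 ^ ((1 << width) - 1)
        if low == value:
            return n
    return 4                                # all four chunks are required


if __name__ == "__main__":
    for c in (0x1234, 0x12345678, -1, 0x8000):
        print(hex(c & MASK64), chunks_zero_extended(c), chunks_sign_extended(c))
```

  Small negative constants are where sign extension pays off: -1 needs
  four chunks when zero-extended but only one when sign-extended.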

 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"