Re: [f-cpu] (!) a few noteworthy things

On Mon, Jun 17, 2002 at 03:31:15AM +0200, Yann Guidon wrote:
> - the SIMD flag still creates problems.
> Partial writes to a register are handled but bypass conditions are
> a major headache, and this has a big impact on the "zero flags".
> We should not forget the potential troubles that this choice
> can make on future architectures. Here are the existing possibilities :
>  a) specify that the high part is unchanged
>    (only the low byte/word/dword/etc. is updated)
>   --> this is the current approach.

- requires partial writes
- requires additional instructions for zero/sign extension
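
To make the two drawbacks concrete, here is a rough C model of a), not
a description of the actual register set. The size encoding (0 = 8-bit
up to 3 = 64-bit) and the names low_mask, write_a and zero_extend are
made up for illustration.

```c
#include <stdint.h>

/* Hypothetical size encoding: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit, 3 = 64-bit. */
uint64_t low_mask(unsigned size)
{
    return (size >= 3) ? UINT64_MAX : (UINT64_C(1) << (8u << size)) - 1;
}

/* Option a): only the low part is replaced, so the write needs the old
 * register contents (or per-part write enables): a partial write. */
uint64_t write_a(uint64_t old_reg, uint64_t result, unsigned size)
{
    uint64_t low = low_mask(size);
    return (old_reg & ~low) | (result & low);
}

/* The extra instruction a) needs before the value can be used at full width. */
uint64_t zero_extend(uint64_t reg, unsigned size)
{
    return reg & low_mask(size);
}
```

A byte write of 0x12 into a register holding all ones leaves
0xFFFFFFFFFFFFFF12 behind, and only the separate zero_extend step
turns that into 0x12.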

>  b) specify that the high part is cleared --> simpler solution

+ requires no partial writes
+ saves one instruction for zero extension
+ cheap to implement:

	signal X, Y, Mask : std_ulogic_vector(63 downto 0);
	-- upper parts pass only in SIMD mode or when the operand
	-- size (U) covers them; the low byte always passes
	Mask <= (
		63 downto 32 => SIMD or U(2),
		31 downto 16 => SIMD or U(1),
		15 downto  8 => SIMD or U(0),
		others => '1'
	);
	-- note that Mask is available from the decoder
	-- there's only an AND (or maybe MUX) inside the signal path
	Y <= X and Mask;
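
For those who prefer to check the behaviour in software first, the same
masking can be modelled in a few lines of C. This is only a sketch: the
size encoding (0 = 8-bit up to 3 = 64-bit) and the names size_mask and
write_b are assumptions, and the `size' argument stands in for the U
bits that the decoder provides.

```c
#include <stdint.h>

/* Hypothetical size encoding: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit, 3 = 64-bit. */
uint64_t size_mask(unsigned size, int simd)
{
    if (simd || size >= 3)
        return UINT64_MAX;                    /* nothing is cleared */
    return (UINT64_C(1) << (8u << size)) - 1; /* keep the low 8, 16 or 32 bits */
}

/* Option b): the whole register is written, with the high part forced to zero. */
uint64_t write_b(uint64_t result, unsigned size, int simd)
{
    return result & size_mask(size, simd);
}
```

The old register contents never enter the computation, which is why no
partial write is needed.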

>  c) specify that the high part is sign-extended
>     (sign extension might create troubles like those of the
>      current solution)

+ requires no partial writes
+ saves one instruction for sign extension
- more complex than b) because there are multiple sign bits to choose from
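
A small C model shows where the extra cost of c) comes from. Again this
is a sketch under assumptions: the size encoding (0 = 8-bit up to
3 = 64-bit) and the name write_c are invented here.

```c
#include <stdint.h>

/* Hypothetical size encoding: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit, 3 = 64-bit. */
uint64_t write_c(uint64_t result, unsigned size)
{
    if (size >= 3)
        return result;                       /* full width: nothing to extend */

    unsigned width = 8u << size;             /* 8, 16 or 32 */
    uint64_t low   = (UINT64_C(1) << width) - 1;

    /* The sign bit sits at position 7, 15 or 31 depending on the size.
     * In hardware this selection is the extra mux that b) does not need. */
    uint64_t sign  = (result >> (width - 1)) & 1;

    return sign ? (result | ~low) : (result & low);
}
```

Compared with b), the fill value for the high part is no longer a
constant but depends on a data bit whose position depends on the size.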

>  d) specify that the SIMD flag has no effect at all and the
>    high part is updated with the rest of the word (just like a
>    normal SIMD operation would do)

+ all the world is SIMD :)
+ requires no partial writes
+ even cheaper to implement than b)
- requires additional instruction for zero/sign extension
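
To illustrate what d) means for a byte operation, here is a C sketch of
an 8-bit add that always treats the register as eight lanes. The name
simd_add8 is made up, and the carry-blocking trick is a common software
idiom, not a description of the F-CPU adder.

```c
#include <stdint.h>

/* Add eight 8-bit lanes at once; carries do not cross lane boundaries. */
uint64_t simd_add8(uint64_t a, uint64_t b)
{
    const uint64_t H = UINT64_C(0x8080808080808080); /* top bit of each lane */

    /* Add the low 7 bits of every lane, then patch in the top bits with XOR
     * so that a carry out of one lane cannot reach the next one. */
    return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);
}
```

With a = 0x01FF and b = 0x0001 the low lane wraps to 0x00 and the next
lane still holds 0x01, so the register ends up as 0x0100. Whoever wants
the low byte as a full-width integer has to mask or sign-extend it
afterwards, which is the extra instruction mentioned above.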

>  e) specify that the flag returns an "undefined/reserved" behaviour
>    for the MSB (could be both dangerous and safe, it would force
>    compilers to generate valid pointers all the time)

+ even cheaper to implement than b)
- worst solution ever

> Also don't forget that usually, the MSB is not critical :
> when you operate on bytes or short ints, all the operations
> on that variable will have the corresponding/correct size flag
> and the rest of the register won't matter ...
> However it is important to consider the implication on the Xbar
> and the decoding logic, when bypass is required. d) and e) simplify
> the design because we don't have to choose subword results.

> personal notes :
>  a) is possible but a bit complex.
>  b) is simpler but still requires a mux (so a) would be the same)
>  c) is a bit like b but the sign must be propagated :
>      more complex because we must choose between at least
>     3 sign bits (corresponding to a 8, 16 and 32-bit result)
>  d) is plain simple and would be a choice except that it would confuse compilers
>  e) is a "failsafe" solution that would allow the implementor to choose between
>     a), b), c) and d) on a case-by-case basis. This is some more pressure on the
>     compiler but i guess it's still manageable.
> As long as the debate is not closed, e) would be a safe bet before a) is completely
> supported and implemented. However it would become a problem, for example when
> the result is a byte and the next operation needs an int -> the unknown parts
> should be explicitly extended...

e) will allow implementors to build F-CPUs that work like a), b), c), d),
or any other way. As soon as those versions exist, programmers will use
this particular `feature' (trust me - they *will*), and the resulting
code will no longer be compatible between F-CPU versions.  Therefore,
we have to avoid e). Since I don't like a), and c) is more expensive
than b), and d) is what we have in SIMD mode, I prefer b).

On the other hand, turning SIMD on unconditionally *is* tempting.
It would free one flag and streamline the instruction set (the s- prefix
will no longer be needed). That is, my second choice is d).

What about f): keep the SIMD bit, but make d) the default and b) an
optional behaviour? That is, when the SIMD flag is cleared, a `conforming'
F-CPU must either mask the result or trigger an `invalid instruction'
trap (this can be handled inside the decoder).  From the design and
specification point of view, this solution is much cleaner than e).
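
As a sketch of what f) would require from an implementation, here is a
C model. Everything in it is hypothetical: the size encoding (0 = 8-bit
up to 3 = 64-bit), the has_mask switch that says whether a given chip
implements b), and the names f_result and write_f. It also assumes that
a full-width operation never needs to trap, because masking would not
change its result.

```c
#include <stdint.h>

struct f_result {
    int      trap;   /* 1 = `invalid instruction' trap raised by the decoder */
    uint64_t value;  /* value written to the register when trap == 0 */
};

/* has_mask: nonzero if this (hypothetical) implementation supports b). */
struct f_result write_f(uint64_t result, unsigned size, int simd, int has_mask)
{
    struct f_result r = { 0, result };

    if (simd || size >= 3)
        return r;                 /* d): the whole word is updated */

    if (!has_mask) {
        r.trap  = 1;              /* no masking hardware: refuse the instruction */
        r.value = 0;
        return r;
    }

    r.value = result & ((UINT64_C(1) << (8u << size)) - 1);   /* b) */
    return r;
}
```

Either way a program sees the masked result or a trap, never an
implementation-specific value, which is the difference from e).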

I suggest we choose f) but make every reasonable effort to implement b).

Did you think about the new loadcons[p] I suggested?

 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"