Re: [f-cpu] (!) a few noteworthy things
On Mon, Jun 17, 2002 at 03:31:15AM +0200, Yann Guidon wrote:
[...]
> - the SIMD flag still creates problems.
> Partial writes to a register are handled but bypass conditions are
> a major headache, and this has a big impact on the "zero flags".
> We should not forget the potential troubles that this choice
> can make on future architectures. Here are the existing possibilities :
> a) specify that the high part is unchanged
> (only the low byte/word/dword/etc. is updated)
> --> this is the current approach.
- requires partial writes
- requires additional instructions for zero/sign extension
> b) specify that the high part is cleared --> simpler solution
+ requires no partial writes
+ saves one instruction for zero extension
+ cheap to implement:
    signal X, Y, Mask : std_ulogic_vector(63 downto 0);
    ...
    Mask <= (
        63 downto 32 => SIMD or U(2),
        31 downto 16 => SIMD or U(1),
        15 downto 8 => SIMD or U(0),
        others => '1'
    );
    -- note that Mask is available from the decoder
    -- there's only an AND (or maybe MUX) inside the signal path
    Y <= X and Mask;
> c) specify that the high part is sign-extended
> (sign extension might create troubles like those of the
> current solution)
+ requires no partial writes
+ saves one instruction for sign extension
- more complex than b) because there are multiple sign bits to
consider
> d) specify that the SIMD flag has no effect at all and the
> high part is updated with the rest of the word (just like a
> normal SIMD operation would do)
+ all the world is SIMD :)
+ requires no partial writes
+ even cheaper to implement than b)
- requires additional instructions for zero/sign extension
> e) specify that the flag returns an "undefined/reserved" behaviour
> for the MSB (could be both dangerous and safe, it would force
> compilers to generate valid pointers all the time)
+ even cheaper to implement than b)
- worst solution ever
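To make the differences between a) to d) concrete, here is a rough software model in C. It is only a sketch: `size_bytes` (1, 2, 4 or 8), `old` (the previous register content) and `res` (the raw 64-bit output of the unit) are names I made up for illustration, not anything from the F-CPU sources.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low `size_bytes` bytes (hypothetical size encoding). */
uint64_t low_mask(unsigned size_bytes)
{
    assert(size_bytes == 1 || size_bytes == 2 ||
           size_bytes == 4 || size_bytes == 8);
    return size_bytes >= 8 ? UINT64_MAX
                           : (UINT64_C(1) << (8 * size_bytes)) - 1;
}

/* a) high part unchanged: needs the old register value (partial write). */
uint64_t writeback_a(uint64_t old, uint64_t res, unsigned size_bytes)
{
    uint64_t m = low_mask(size_bytes);
    return (old & ~m) | (res & m);
}

/* b) high part cleared: a single AND with a mask from the decoder. */
uint64_t writeback_b(uint64_t res, unsigned size_bytes)
{
    return res & low_mask(size_bytes);
}

/* c) high part sign-extended: the sign bit position depends on the size. */
uint64_t writeback_c(uint64_t res, unsigned size_bytes)
{
    uint64_t m = low_mask(size_bytes);
    uint64_t sign = UINT64_C(1) << (8 * size_bytes - 1);
    uint64_t low = res & m;
    return (low & sign) ? (low | ~m) : low;
}

/* d) the whole word is written, exactly as the unit produced it. */
uint64_t writeback_d(uint64_t res)
{
    return res;
}
```

Note that only a) takes the old register value as an input, which is where the partial writes and the bypass trouble come from, and only c) has to select one of several sign bits.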
> Also don't forget that usually, the MSB is not critical :
> when you operate on bytes or short ints, all the operations
> on that variable will have the corresponding/correct size flag
> and the rest of the register won't matter ...
>
> However it is important to consider the implication on the Xbar
> and the decoding logic, when bypass is required. d) and e) simplify
> the design because we don't have to choose subword results.
[...]
> personal notes :
> a) is possible but a bit complex.
> b) is simpler but still requires a mux (so a) would be the same)
> c) is a bit like b but the sign must be propagated :
> more complex because we must choose between at least
> 3 sign bits (corresponding to a 8, 16 and 32-bit result)
> d) is plain simple and would be a choice except that it would confuse compilers
> e) is a "failsafe" solution that would allow the implementor to choose between
> a), b), c) and d) on a case-by-case basis. This puts some more pressure on the
> compiler but I guess it's still manageable.
>
> As long as the debate is not closed, e) would be a safe bet before a) is completely
> supported and implemented. However it would become a problem, for example when
> the result is a byte and the next operation needs an int -> the unknown parts
> should be explicitly extended...
e) will allow implementors to build F-CPUs that work like a), b), c), d),
or any other way. As soon as those versions exist, programmers will use
this particular `feature' (trust me - they *will*), and the resulting
code will no longer be compatible between F-CPU versions. Therefore,
we have to avoid e). Since I don't like a), and c) is more expensive
than b), and d) is what we have in SIMD mode, I prefer b).
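A small C model of how code can come to depend on one implementation under e). This is a sketch under my own assumptions: `policy_t`, `add8` and `zext8` are hypothetical names, and the SIMD byte add is modelled with the usual carry-free lane trick.

```c
#include <assert.h>
#include <stdint.h>

/* Two of the behaviours an implementor might pick under e). */
typedef enum { HIGH_CLEARED /* like b) */, HIGH_SIMD /* like d) */ } policy_t;

/* Byte-wise add on eight lanes; no carry crosses a lane boundary. */
uint64_t simd_add8(uint64_t x, uint64_t y)
{
    const uint64_t H = UINT64_C(0x8080808080808080);
    return ((x & ~H) + (y & ~H)) ^ ((x ^ y) & H);
}

/* A non-SIMD byte add as seen in the full 64-bit destination register. */
uint64_t add8(uint64_t x, uint64_t y, policy_t p)
{
    uint64_t r = simd_add8(x, y);
    switch (p) {
    case HIGH_CLEARED: return r & 0xFF;
    case HIGH_SIMD:    return r;
    }
    assert(!"unknown policy");
    return 0;
}

/* Explicit zero extension: what portable code would have to insert. */
uint64_t zext8(uint64_t r)
{
    return r & 0xFF;
}
```

With the inputs 0x0102 and 0x0101, `add8` gives 0x03 under the first policy and 0x0203 under the second. A program that uses the result as a 64-bit value without calling something like `zext8` first works on one chip and fails on the other.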
On the other hand, turning SIMD on unconditionally *is* tempting.
It would free one flag and streamline the instruction set (the s- prefix
will no longer be needed). That is, my second choice is d).
What about f): keep the SIMD bit, but make d) the default and b) an optional
behaviour? That is, when the SIMD flag is cleared, a `conforming'
F-CPU must either mask the result or trigger an `invalid instruction'
trap (this can be handled inside the decoder). From the design and
specification point of view, this solution is much cleaner than e).
I suggest we choose f) but make any reasonable effort to implement b).
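Roughly, f) would behave like the following C model. Again this is only a sketch: `has_masking` (whether this particular chip implements b)), `exec_status_t` and `execute_f` are names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { EXEC_OK, EXEC_TRAP_INVALID } exec_status_t;

/* Mask covering the low `size_bytes` bytes (hypothetical size encoding). */
uint64_t low_mask(unsigned size_bytes)
{
    assert(size_bytes == 1 || size_bytes == 2 ||
           size_bytes == 4 || size_bytes == 8);
    return size_bytes >= 8 ? UINT64_MAX
                           : (UINT64_C(1) << (8 * size_bytes)) - 1;
}

/*
 * f) SIMD flag set:     write the whole word, as in d).
 *    SIMD flag cleared: mask the result as in b) if the chip supports it,
 *                       otherwise raise an `invalid instruction' trap and
 *                       leave the destination untouched.
 */
exec_status_t execute_f(bool simd_flag, bool has_masking,
                        unsigned size_bytes, uint64_t res, uint64_t *dest)
{
    if (simd_flag) {
        *dest = res;
        return EXEC_OK;
    }
    if (!has_masking)
        return EXEC_TRAP_INVALID;   /* decided in the decoder */
    *dest = res & low_mask(size_bytes);
    return EXEC_OK;
}
```

Either way, a program never sees an unspecified high part: it gets the d) result, the b) result, or a trap.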
Did you think about the new loadcons[p] I suggested?
--
Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
"All I wanna do is have a little fun before I die"