
[f-cpu] Late answer on "



I just resurrected it from my HDD...

> This may come a little late in the design and development process, but...

fortunately not :-)

> I suggest we drop the `partial write' feature.
It's what we are discussing in the other thread.

> In many cases, it makes
> no sense to keep the upper part of a register when only a fraction of
> it is written. E.g. when you load a byte from memory and want to use
> it as a full-width integer (or boolean - remember that jmp/move always
> look at the *whole* register), you either have to
> 
>     - `zero out' the destination register
>     - load the LSByte
> 
> or
> 
>     - load the LSByte
>     - mask off any other bit
> 
> Additionally, partial writes make the register set and the scoreboard
> logic (zero detection) much more complex than necessary - in fact, it
> makes some promising register set implementations almost impossible. The
> only thing it makes easier is constant loading. In fact that is the *only*
> operation that needs the partial write ability *at all*. Doesn't it?
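A minimal Python sketch of the problem described above, assuming 64-bit registers modelled as integers; the helper names are hypothetical. It shows that a partial byte write leaves stale upper bits behind, and that both two-step workarounds end up with the value that a single zero-extending load would produce.

```python
# Hypothetical model: a 64-bit register held in a Python int.
MASK64 = (1 << 64) - 1

def partial_write_byte(reg, byte):
    """Partial write: only the low byte changes; the upper 56 bits survive."""
    return ((reg & ~0xFF) & MASK64) | (byte & 0xFF)

def zero_extending_load_byte(byte):
    """Full-width write: the byte is zero-extended to the whole register."""
    return byte & 0xFF

old = 0xDEADBEEF_00000000
# Loading a zero byte with a partial write leaves the register non-zero,
# so a test on the *whole* register (as jmp/move do) would see "true".
stale = partial_write_byte(old, 0x00)

loaded = 0x01
# Workaround 1: zero out the destination, then load the LSByte.
r1 = partial_write_byte(0, loaded)
# Workaround 2: load the LSByte, then mask off every other bit.
r2 = partial_write_byte(old, loaded) & 0xFF
# Without partial writes, one zero-extending load gives the same value.
r3 = zero_extending_load_byte(loaded)
```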

At the time it was decided (3 years ago? I don't remember),
it seemed like a good idea...

> Currently, we need 8 constant loading instructions: loadcons and loadconsx
> with four variants each (in order to be able to load up to 256 bits -
> if there will ever be an F-CPU with registers wider than 256 bits, we
> will need *even more* instructions!).
Don't worry: with 512-bit registers, it's faster to "loadcons" the pointer
to the immediate data, or to do a shift...

> On the other hand, two slightly
> different instructions would be sufficient for *all* word sizes:
> 
>     loadcons $imm17, reg    // similar to the original `loadconsx'
>     => reg := sign_extend(imm17)
> 
>     loadconsp $imm16, reg   // `p' means `partial'
>     => reg := shift_left(reg, 16) | imm16
> 
> Values between -65536 and 65535, inclusively, can be loaded with a
> single instruction, 32-bit values need two instructions, and so on.
> This solution is more general than the original loadcons[x] instructions
> and IMHO also much more elegant.
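The two proposed instructions can be modelled in a few lines of Python. This is a sketch assuming 64-bit registers; `load_constant` and `run` are hypothetical helpers that choose and execute an instruction sequence, not part of any F-CPU tool. The sequence is built by peeling 16-bit chunks off the bottom of the value until the remainder fits in the signed 17-bit range of `loadcons`.

```python
# Hypothetical model of the proposed loadcons / loadconsp, 64-bit registers.
MASK = (1 << 64) - 1

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def loadcons(imm17):
    """reg := sign_extend(imm17), kept as a 64-bit pattern."""
    return sign_extend(imm17, 17) & MASK

def loadconsp(reg, imm16):
    """reg := shift_left(reg, 16) | imm16; bits shifted out are lost."""
    return ((reg << 16) | (imm16 & 0xFFFF)) & MASK

def load_constant(value):
    """Choose an instruction sequence that builds the signed integer `value`."""
    chunks = []
    # Peel 16-bit chunks off the bottom until the rest fits in a signed imm17.
    while not -65536 <= value <= 65535:
        chunks.append(value & 0xFFFF)
        value >>= 16          # arithmetic shift: the sign is preserved
    seq = [("loadcons", value & 0x1FFFF)]
    seq += [("loadconsp", c) for c in reversed(chunks)]
    return seq

def run(seq):
    """Execute a sequence on a single register and return its final value."""
    reg = 0
    for op, imm in seq:
        reg = loadcons(imm) if op == "loadcons" else loadconsp(reg, imm)
    return reg
```

For example, `load_constant(0x12345678)` returns a `loadcons` of 0x1234 followed by a `loadconsp` of 0x5678, matching the "two instructions for 32-bit values" claim. Each `loadconsp` reads the result of the previous instruction, which is the latency concern raised in the reply.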

Do you mean that you include the SHL in the pipeline?
In that case, "strings" of consecutive loadcons will have a terrible
latency! The purpose of the previous version was clearly to allow
the programmer to issue 4 loadcons in a row, in 4 cycles.

> Since we need 8 bits for the opcode and 6 bits for the destination
> register, we can encode all variants using only a single opcode (compared
> to 8 opcodes for loadcons[x]):
Given how useful loadcons is, allocating 8 opcodes to it is not
completely unjustified.

>          8   + 1 + 1 +   16  +  6  = 32 bits
>     +--------+---+---+-------+-----+
>     | opcode | P | S | imm16 | reg |
>     +--------+---+---+-------+-----+
> 
>         P=0 => load full register; S is the sign bit
>         P=1 => load least significant 16 bits of the register; S is ignored
> 
> In case you didn't notice it: the same encoding is used by `loadaddri[d]'.
Thanks for the remark, but `loadaddri[d]' doesn't use SHL...
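A sketch of packing and unpacking the proposed 32-bit format. ASSUMPTION: the diagram is read left to right as most to least significant bits (opcode in bits 31..24, P in 23, S in 22, imm16 in 21..6, reg in 5..0); the real F-CPU bit order may differ, and the helper names are hypothetical.

```python
# ASSUMED layout: opcode 31..24 | P 23 | S 22 | imm16 21..6 | reg 5..0

def encode(opcode, p, s, imm16, reg):
    """Pack the five fields into one 32-bit instruction word."""
    if not (0 <= opcode < 1 << 8 and p in (0, 1) and s in (0, 1)
            and 0 <= imm16 < 1 << 16 and 0 <= reg < 1 << 6):
        raise ValueError("field out of range")
    return (opcode << 24) | (p << 23) | (s << 22) | (imm16 << 6) | reg

def decode(word):
    """Split a 32-bit instruction word back into its fields."""
    return {
        "opcode": (word >> 24) & 0xFF,
        "P": (word >> 23) & 1,
        "S": (word >> 22) & 1,
        "imm16": (word >> 6) & 0xFFFF,
        "reg": word & 0x3F,
    }

def immediate(fields):
    """P=0: S:imm16 is a signed 17-bit value. P=1: imm16 as-is, S ignored."""
    if fields["P"]:
        return fields["imm16"]
    imm17 = (fields["S"] << 16) | fields["imm16"]
    return imm17 - (1 << 17) if fields["S"] else imm17
```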

> Implementing the new `loadcons' is simple: the decoder sign-extends the
> immediate value and sends it along. `loadconsp' is a little more tricky
> because it needs a `feedback loop' from one of the register set's read
> ports to one of the write ports. Fortunately, the left shift and the
> `or' operations take almost no time (we need an extra mux, the rest is
> just a bunch of wires).

I am more and more reluctant to perform shifts on the Xbar.
I thought we could do some bit-reversing there, for example,
but in practice it's too difficult to manage. And how would you
handle the bypasses?... I don't want this to become yet
another naughty hack.

It is possible to operate on the immediates because the decode stage
leaves enough time to amplify the signals (scalar, sign extension or
SIMD modes), but later it becomes too difficult.
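To illustrate why work on immediates fits in the decode stage: each expanded bit is just a copy of one input bit, so the cost is fan-out (the "amplification" mentioned above) rather than logic. The mode names and the lane model below are hypothetical, assuming 64-bit registers and assuming that "SIMD mode" means replicating the immediate into every lane.

```python
# Hypothetical decode-stage immediate expansion for 64-bit registers.
WIDTH = 64
MASK = (1 << WIDTH) - 1

def expand_immediate(imm16, mode, lane_bits=16):
    """Expand a 16-bit immediate; every output bit copies one input bit (or 0)."""
    imm16 &= 0xFFFF
    if mode == "scalar":        # zero-extend: upper bits are tied to 0
        return imm16
    if mode == "sign_extend":   # bit 15 fans out to all upper bits
        return imm16 | (MASK ^ 0xFFFF) if imm16 & 0x8000 else imm16
    if mode == "simd":          # the low lane_bits bits are copied into every lane
        lane = imm16 & ((1 << lane_bits) - 1)
        out = 0
        for i in range(WIDTH // lane_bits):
            out |= lane << (i * lane_bits)
        return out
    raise ValueError(f"unknown mode: {mode}")
```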

> Without partial writes, other (non-SIMD) instructions that operate on
> partial words shall set the upper bits of the result to zero (= simple
> AND operation). Sign extension can either be performed by `move[s]' or
> by a separate `sext' instruction; the `widen' instruction is no longer
> necessary (it was an ugly kludge anyway).
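A minimal sketch of the proposed rule, assuming 64-bit registers: an operation on a partial word masks its result (the "simple AND"), and sign extension becomes a separate step. `add_partial` and `sext` are hypothetical Python helpers here, not the actual instruction definitions.

```python
# Hypothetical model: non-SIMD partial-word ops without partial writes.
MASK = (1 << 64) - 1

def add_partial(a, b, size_bits):
    """Add the low `size_bits` of a and b; the upper result bits are zero."""
    m = (1 << size_bits) - 1
    return ((a & m) + (b & m)) & m      # the final AND clears the upper bits

def sext(value, size_bits):
    """Sign-extend the low `size_bits` of value to the full register width."""
    m = (1 << size_bits) - 1
    value &= m
    if value >> (size_bits - 1):        # sign bit set: fill the upper bits with 1s
        value |= MASK ^ m
    return value
```

An 8-bit add of 0x7F and 0x01 yields 0x80 with the upper 56 bits cleared; passing that through `sext(..., 8)` turns it into the 64-bit pattern for -128.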
> 
> Ok, that had to be said.
Do you feel better now? :-))

> Now it's your turn...

I don't want to use the "shift" approach. I don't know about the ALPHA,
but even MIPS uses a specific instruction (lui, "load upper immediate")
to load the upper bits of a register with a constant.

The "relative" approach increases the dependencies between the operations,
while the "absolute" way does not impose any order. I remember that Cedric
used loadcons optimisations to build a specific constant in his RC5 code...
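The ordering argument can be checked with a small Python sketch, assuming 64-bit registers and ASSUMING that the "absolute" variant n writes imm16 into bits 16n to 16n+15 (the exact F-CPU variant numbering is not given here); function names are hypothetical. All 24 issue orders of the four absolute loads give the same value, while the relative `loadconsp` chain only works in one order.

```python
from itertools import permutations

MASK = (1 << 64) - 1

def loadcons_abs(reg, n, imm16):
    """ASSUMED "absolute" variant n: imm16 goes to bits 16n..16n+15."""
    shift = 16 * n
    return ((reg & ~(0xFFFF << shift)) & MASK) | ((imm16 & 0xFFFF) << shift)

def loadconsp(reg, imm16):
    """Relative variant: reg := shift_left(reg, 16) | imm16."""
    return ((reg << 16) | (imm16 & 0xFFFF)) & MASK

chunks = [(3, 0x0123), (2, 0x4567), (1, 0x89AB), (0, 0xCDEF)]

# Absolute: try every issue order of the four instructions.
results = set()
for order in permutations(chunks):
    reg = 0
    for n, imm in order:
        reg = loadcons_abs(reg, n, imm)
    results.add(reg)

def run_relative(order):
    """Relative: each loadconsp consumes the previous result."""
    reg = 0
    for _, imm in order:
        reg = loadconsp(reg, imm)
    return reg

rel_forward = run_relative(chunks)
rel_reversed = run_relative(list(reversed(chunks)))
```

Here every permutation yields 0x0123456789ABCDEF, whereas reversing the `loadconsp` chain yields 0xCDEF89AB45670123.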

The "old" loadcons can still be done without partial writes, as you
said, with another MUX in the CDP. OK.
But remember that a shifter takes a certain amount of silicon area,
much more than a simple mux, and it depends on the number of wires to cross.


My conclusion: partial writes are being abandoned, but
the "old" loadcons is still useful and easy to do.
I don't even think that there will be a problem:
it's just like a "move" instruction, but with a modified
datapath.
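A sketch of the "old" loadcons as a move-like operation without partial writes, assuming 64-bit registers split into four 16-bit lanes; the function name is hypothetical. The old value comes in through a normal read port, one 2:1 mux per lane picks either the old lane or the immediate, and the register set only ever sees a full-width write.

```python
# Hypothetical model: old-style loadcons built from a read port + per-lane mux.
LANES = 4

def loadcons_mux(old_reg, n, imm16):
    """Replace 16-bit lane n of old_reg with imm16; return the full-width result."""
    result = 0
    for lane in range(LANES):
        old_lane = (old_reg >> (16 * lane)) & 0xFFFF
        chosen = (imm16 & 0xFFFF) if lane == n else old_lane   # the extra mux
        result |= chosen << (16 * lane)
    return result                                              # full-width write
```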

>  Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
WHYGEE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~