
Re: [f-cpu] back to VHDL



hi !

Michael Riepe wrote:
> On Sun, Feb 10, 2002 at 05:30:12PM +0100, Yann Guidon wrote:
> [...]
> > by "align", I mean: we get an N-bit word from memory,
> > but when we request a smaller chunk (a byte, for example),
> > it needs to be "aligned" so that the requested data appears
> > on the LSBs, whether or not the address is aligned on N bits.
> 
> That will only work in a load instruction, not in a store. On the other
> hand, if we have the hardware to do byte stores (that is, select lines for
> every single byte and so on), we can use it for byte loads as well.

Here we are talking about the bidirectionality of the unit, which is a
separate problem. One way to design the LSU is to use byte-wide write enables
and "dirty" bits; then all we have to do is shift the written word
so that it appears at the correct "alignment". If we don't
use the SHL, this shift spans 256 bits (to one of 32 byte positions).
That makes it a rather large unit, and it can't be bidirectional,
as far as I remember from my first attempts. So it's going
to be large (it uses a lot of surface) and not very fast. That's why I have always
counted LSU accesses as 2 cycles (1 for the shift and 1 for the actual read/write).
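The store path described above (byte-wide write enables plus per-byte "dirty" bits, with the written data shifted into place across a 256-bit line) can be modelled in a few lines. This is a behavioural sketch in Python, not F-CPU VHDL: the helper names `store_lanes` and `merge_line`, little-endian lane numbering and naturally aligned accesses are assumptions made for illustration.

```python
LINE_BYTES = 32  # 256-bit line: the data can land at one of 32 byte positions


def store_lanes(data: int, addr: int, size: int) -> tuple[int, int]:
    """Shift the low `size` bytes of `data` into their byte lanes of a
    256-bit line and build the one-bit-per-byte write-enable mask.

    Hypothetical helper; assumes little-endian lane numbering and a
    naturally aligned access (addr is a multiple of size).
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    if addr % size:
        raise ValueError("unaligned stores are not supported")
    offset = addr % LINE_BYTES                 # byte lane of the first byte
    data &= (1 << (8 * size)) - 1              # drop bytes beyond `size`
    shifted = data << (8 * offset)             # the wide shift (0..31 bytes)
    enables = ((1 << size) - 1) << offset      # one write enable per byte
    return shifted, enables


def merge_line(line: int, dirty: int, shifted: int, enables: int) -> tuple[int, int]:
    """Write only the enabled bytes into `line` and mark them dirty."""
    for lane in range(LINE_BYTES):
        if (enables >> lane) & 1:
            byte_mask = 0xFF << (8 * lane)
            line = (line & ~byte_mask) | (shifted & byte_mask)
    return line, dirty | enables
```

For example, storing the half-word 0xBEEF at address 34 selects lanes 2 and 3 (enable mask 0xC) and leaves the other 30 bytes of the line untouched.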

> > I have no intention of providing "unaligned" access
> > (when a word of 2^N bytes has a pointer whose N LSBs are not cleared),
> > because what would happen if the word crossed a page boundary?...
> 
> *boom* Yes, I know.

Yeah, great, heh heh heh ...
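The reason for refusing unaligned accesses can be checked numerically: an access of 2^N bytes whose N low address bits are clear stays inside a single page as long as the page size is a multiple of the access size, whereas an unaligned one can straddle two pages and thus need two translations. The sketch assumes a 4 KiB page purely for illustration; `is_aligned` and `crosses_page` are hypothetical helper names.

```python
PAGE_SIZE = 4096  # hypothetical page size, for illustration only


def is_aligned(addr: int, size: int) -> bool:
    """True when the log2(size) least significant address bits are all clear
    (size is assumed to be a power of two)."""
    return (addr & (size - 1)) == 0


def crosses_page(addr: int, size: int, page_size: int = PAGE_SIZE) -> bool:
    """True when the bytes addr .. addr+size-1 fall into two different pages."""
    return addr // page_size != (addr + size - 1) // page_size


# Exhaustive check over a few pages: no aligned 8-byte access ever crosses.
assert not any(crosses_page(a, 8) for a in range(0, 4 * PAGE_SIZE, 8))
```

An unaligned 8-byte access at address 4092, by contrast, covers bytes 4092 to 4099, so `crosses_page(4092, 8)` is `True`.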

> > > We should duplicate the byterev part and add it to the LSU (for
> > > loade/storee), but the rest is IMHO overkill.
> > if byterev is already done in one unit, we could spare the duplication...
> 
> We could also move the bytewise stuff (byterev, mix, expand and sdup)
> out of the SHL unit. That will make the bitwise operations faster (they're
> more timing-critical anyway) because the output mux can become smaller.

So there would be 3 different units?
 - 1 bytewise SIMD unit (word slices)
 - 1 bitwise unit (shifts, rotations...)
 - 1 for the LSU (in fact 2: one for each direction)
It's going to take some room...
But if you can make SDUP/EXPAND etc. take only 1 cycle, why not.
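On the cost of duplicating byterev into the LSU (as proposed for `loade`/`storee` in the quoted exchange): reversing the bytes of a fixed-width word is a fixed permutation, so in hardware it is essentially wiring plus a mux that selects the reversed or the straight word. The Python sketch below only models that permutation; the function name `byterev` and the 64-bit default width are illustrative assumptions, not the F-CPU instruction definition.

```python
def byterev(word: int, nbytes: int = 8) -> int:
    """Reverse the byte order of an `nbytes`-wide word (64-bit by default).

    Byte i of the input becomes byte nbytes-1-i of the output; no arithmetic
    is involved, which is why a hardware version is mostly routing.
    """
    out = 0
    for i in range(nbytes):
        byte = (word >> (8 * i)) & 0xFF          # extract input byte i
        out |= byte << (8 * (nbytes - 1 - i))    # place it in the mirrored lane
    return out
```

For instance, `byterev(0x0102030405060708)` returns `0x0807060504030201`, and applying it twice gives back the original word.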

> > > In general, data should be properly aligned. Applications that violate
> > > this rule are supposed to do their bit fiddling on their own.
> > of course, but what happens if we request one byte, half-word or word
> > from a 64-bit word? This is where the SHL can help. Maybe I was not
> > specific enough...
> 
> But using the bit-shifter would be overkill. And it would slow down the
> SHL -- we need an additional input mux if we want to use it for both
> `sub-word' loads and generic shift operations. And it doesn't help
> with sub-word stores at all.

I agree that, if we want to use the SHL for the LSU, we need MUXes at the SHL input.
The result will then go to both the Xbar and the LSU (so it works for both loads and stores).
But I don't see why "it doesn't help with sub-word stores at all."
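To illustrate the proposal of using the SHL for sub-word accesses: with an input mux choosing between a register operand and the memory word, one logical shifter can serve both directions. A sub-word load shifts the memory word right by 8 × (byte offset) so the requested data lands on the LSBs; a sub-word store shifts the register value left by the same amount so it lands in its byte lanes, where the write enables take over. This is a behavioural sketch under assumptions (64-bit words, little-endian byte numbering, aligned accesses); `shl_datapath`, `subword_load` and `subword_store` are hypothetical names.

```python
MASK64 = (1 << 64) - 1


def shl_datapath(use_mem: bool, reg: int, mem_word: int, amount: int, left: bool) -> int:
    """Hypothetical shared shifter: the extra input mux picks the operand,
    then one logical shift is applied. The result would feed both the Xbar
    (register write-back) and the LSU."""
    operand = mem_word if use_mem else reg       # the additional input mux
    shifted = operand << amount if left else operand >> amount
    return shifted & MASK64


def subword_load(mem_word: int, addr: int, size: int) -> int:
    """Load direction: shift right so the requested bytes appear on the LSBs."""
    offset = addr % 8
    value = shl_datapath(True, 0, mem_word, 8 * offset, left=False)
    return value & ((1 << (8 * size)) - 1)       # zero-extend the chunk


def subword_store(reg: int, addr: int, size: int) -> tuple[int, int]:
    """Store direction: shift left into the byte lanes, plus byte enables."""
    offset = addr % 8
    data = reg & ((1 << (8 * size)) - 1)
    value = shl_datapath(False, data, 0, 8 * offset, left=True)
    return value, ((1 << size) - 1) << offset
```

With `word = 0x8877665544332211`, `subword_load(word, 6, 2)` gives `0x8877` and `subword_store(0xBEEF, 6, 2)` gives `(0xBEEF000000000000, 0xC0)`: the shift amount is the same, only the direction differs. Whether the extra input mux slows the SHL too much is a timing question this model cannot answer.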

See you,
>  Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
WHYGEE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/