
Shifting theory (was Re: [f-cpu] VHDL and delay estimation)



hi,

Michael Riepe wrote:

> Too much. Shifting itself takes d=LN/t=LN (wire delay not counted) but the successive calculation of `ss' is too heavy. Its delay becomes higher with every step of i, for a total of approximately d=16/t=16 when LN=6.

The wiring delays are an annoying issue,
particularly for shifters...

Currently, I use a method to approximate the "cost" of
wiring, but it's not an absolute measure:
the delay is roughly proportional to the number of wires that
must be crossed.
For example, if you have to perform an AND between
vector(x) and vector(y), the "cost" is |x-y|.

In practice it should be |x-y|/2, with the gate placed halfway
between the two sources:

          AND
           /\
          /  \
         /    \
        /      \
        |      |
********¤******¤*********  (vector)
        x      y

But that's the "big O" story.
The goal is to reduce the relative distance.
For a shifting block, it is not always possible to place the gate
at the middle point between the sources.
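
As a rough illustration of the estimate above, here is a minimal Python sketch. The function names and the `gate_pos` parameter are made up for this example; the only thing taken from the text is that the cost is the number of bit positions a wire has to cross. It is a toy model, not a timing tool.

```python
def wire_cost(x, y, gate_pos=None):
    """Longest run, in bit positions crossed, from either source to the gate.

    By default the gate is assumed to sit on top of source x, so the wire
    coming from y has to cross |x-y| positions.
    """
    if gate_pos is None:
        gate_pos = x
    return max(abs(gate_pos - x), abs(gate_pos - y))


def midpoint_cost(x, y):
    """Same estimate with the gate placed halfway between the sources."""
    return wire_cost(x, y, gate_pos=(x + y) / 2)


if __name__ == "__main__":
    print(wire_cost(8, 15))      # gate at x: the longest wire crosses |x-y|
    print(midpoint_cost(8, 15))  # gate at the midpoint: |x-y|/2
```

When the gate cannot be placed at the midpoint, passing the real `gate_pos` gives a value somewhere between the two extremes.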

There are some topologies, such as the one Michael implemented,
where all the shifting stages have the same critical datapath (CDP)
but all paths have different lengths. I consider this under-performant,
even though it's easier to lay out. Shifters that have constant wire lengths
in each stage have a balanced CDP, so there is no single worst-case path
against which to optimise the layout.

Another issue is loss in the wires:
when a wire crosses more than 8 to 16 "bits",
a small buffer/inverter can boost the signal and reduce
the transmission time. Usually, this is performed
automatically by good synthesizers (and they will
often even optimise it out), and it will slow down RTL
simulation. However, this issue is increasingly dominant in
modern silicon technologies and we must be careful
if we want to reach even higher clock speeds...
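
To get a feeling for how many buffers that means, here is a small sketch. It assumes (my assumption, not a rule from any synthesizer) that a wire is simply cut into segments of at most `max_run` bit positions, with one buffer between consecutive segments. `max_run` would be somewhere in the 8 to 16 range mentioned above, depending on the technology.

```python
def repeaters_needed(span, max_run=8):
    """Buffers on a wire crossing `span` bit positions (span >= 1),
    if no unbuffered segment may cross more than `max_run` positions."""
    if span < 1 or max_run < 1:
        raise ValueError("span and max_run must be >= 1")
    # ceil(span / max_run) segments, one buffer between each pair of segments
    return (span - 1) // max_run


if __name__ == "__main__":
    for span in (8, 9, 32):
        print(span, repeaters_needed(span), repeaters_needed(span, max_run=16))
```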

So, for mostly combinatorial blocks, such as addition,
there is not much to worry about, but shift/rotation is another matter:
it seems to require only muxes, but N wires of length L
require a surface of N x L. This comment applies to internal buses too.
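
The N x L remark can be turned into a quick back-of-the-envelope count. The sketch below assumes a plain logarithmic rotator (my choice for the example, not necessarily the topology used in EU_SHL): LN stages, N = 2**LN bits, and in stage i every one of the N wires spans 2**i bit positions. Units are arbitrary "bit pitches".

```python
def stage_spans(ln):
    """Wire span of each stage of a log rotator: 1, 2, 4, ... 2**(ln-1)."""
    return [1 << i for i in range(ln)]


def wiring_area(ln):
    """Per-stage wiring surface N x L, with N = 2**ln wires of length L."""
    n = 1 << ln
    return [n * span for span in stage_spans(ln)]


if __name__ == "__main__":
    areas = wiring_area(6)          # LN=6, i.e. 64 bits
    print(areas, sum(areas))
```

For LN=6 the last stage alone accounts for about half of the total, which is why the wide stages deserve the most attention.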

For shifting, such as the normalisation, the "trick" is to create
a "generic" shifting stage. Pipelining can then be enabled
with an if ... generate that implements the pipeline latch.
The condition of the if can be refined later, but we can
already start with ((level modulo X) == 0), where X is a
chip-specific parameter that indicates how many shifts per stage.
Later, "level" can be replaced with a more accurate cost estimation.
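
Here is a small Python model of that latch-placement rule, only to show which levels would get a latch; the real thing would be the condition of the if ... generate. It assumes levels are numbered from 1 (with 0-based numbering the first level would always get a latch). The second function is one possible reading of the "more accurate cost estimation": the per-level costs and the `budget` are hypothetical inputs, not values from the F-CPU sources.

```python
def latch_levels(num_levels, shifts_per_stage):
    """Levels (1-based) followed by a pipeline latch: (level modulo X) == 0."""
    return [lvl for lvl in range(1, num_levels + 1)
            if lvl % shifts_per_stage == 0]


def latch_levels_by_cost(level_costs, budget):
    """Latch after a level once the cost accumulated since the
    previous latch reaches `budget`."""
    latched, acc = [], 0
    for lvl, cost in enumerate(level_costs, start=1):
        acc += cost
        if acc >= budget:
            latched.append(lvl)
            acc = 0
    return latched


if __name__ == "__main__":
    print(latch_levels(6, 2))                                # X = 2
    print(latch_levels(6, 3))                                # X = 3
    print(latch_levels_by_cost([1, 2, 4, 8, 16, 32], 8))     # cost-driven
```

With wire-length-based costs, the latches cluster around the wide stages instead of being evenly spaced, which is the point of replacing "level" by a cost.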

But before I implement this in my version of EU_SHL,
I'd need an up-to-date source code base.
Does anybody want to install subversion on seul.org?

Michael.
YG

*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/