
Re: [f-cpu] Barrel Shifter



hi !

> "Richard E. Hartny" wrote:
> The conversation today seems related to the Barrel Shifter.
right :-)
But since Michael has done the first work, i will let him go on and
concentrate on the synthesis problems. I will soon study where "Alliance"
is written so i bet that i'll have a clue about how it works at the end
of the year :-)

>  Keep in mind that my design is M2M with a 32 Bit Shifter.  The Shifter
> uses 4-input Multiplexers and requires three logic levels, plus a 32 bit,
> two input mux on both the input and output.
it sounds like your design is a regular array that shifts by 0,1,2,3 on
the first stage, 0,4,8,12 on the second and 0 or 16 at the last stage (+ bit reversing).
I think that each gate has a fanout of 4. However it creates a problem with 64 bits,
especially at the last stage (or the first, depending on the direction) : imagine
all the wires crossing... it creates capacitive and inductive problems.
I am seeking a different approach, probably in a different kind of literature.
I don't rely on compilers to do the work for me.
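
To make the structure described above concrete, here is a minimal behavioural sketch in Python (not HDL, and not Richard's actual netlist). It assumes the array only shifts left and that the right shift is obtained by bit-reversing the word before and after the array; the names `mux`, `shift_left_array` and `barrel_shift` are hypothetical.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1

def bit_reverse(x, width=WIDTH):
    """Reverse the bit order of a width-bit word (the 2-input muxes on input/output)."""
    r = 0
    for _ in range(width):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

def mux(inputs, sel):
    """Behavioural model of a multiplexer: pick one of the inputs."""
    return inputs[sel]

def shift_left_array(x, amount):
    """Three mux stages: shift by 0..3, then 0/4/8/12, then 0/16."""
    s0 = amount & 3          # low two bits of the shift count
    s1 = (amount >> 2) & 3   # next two bits
    s2 = (amount >> 4) & 1   # top bit
    x = mux([(x << k) & MASK for k in (0, 1, 2, 3)], s0)
    x = mux([(x << k) & MASK for k in (0, 4, 8, 12)], s1)
    x = mux([(x << k) & MASK for k in (0, 16)], s2)
    return x

def barrel_shift(x, amount, right=False):
    """Logical shift; a right shift reuses the left-shift array via bit reversal."""
    x &= MASK
    if right:
        x = bit_reverse(x)
    x = shift_left_array(x, amount & (WIDTH - 1))
    if right:
        x = bit_reverse(x)
    return x
```

The sketch says nothing about wiring or timing; it only shows that the three shift fields (2 + 2 + 1 bits) cover every count from 0 to 31.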

A book by J. Ullman, "Computational Aspects of VLSI", shows that the layout
of this circuit creates problems with area. Yes, wires use area :
it is linear with their length and (by inference) it increases with the number
of bit positions you want to skip. Shifting is O(n^2) in area. It may still be
sustainable at 32 bits but the complex 64-bit shuffler can't do that
(unless you don't care about the area and the speed, of course).
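
A rough way to see the quadratic growth, under a deliberately simple model that is my assumption and not Ullman's derivation: take a shifter built from binary stages (shift by 1, 2, 4, ...), and charge each of the n wires in a stage a horizontal span equal to that stage's shift distance. The function name `total_wire_span` is hypothetical.

```python
def total_wire_span(width):
    """Sum of horizontal wire spans, in bit pitches, over all binary shift stages.

    Assumed model: a stage that shifts by s needs `width` wires of span s.
    """
    total = 0
    shift = 1
    while shift < width:
        total += width * shift
        shift *= 2
    return total

for n in (32, 64):
    print(n, total_wire_span(n))
# 32 992
# 64 4032
```

In this model the total is n*(n-1), so going from 32 to 64 bits multiplies the wire span by about 4, not by 2.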

I am looking at a scheme that does the work in more steps than usual,
hence faster and with fewer wires.

>  The two input mux's are for Bit Reversal to use the array for both
> left and right shifts.  A Normalize function is burried in there for
> BOTH the Normalize and Arithmetic Shift Left.
> 
> With the above in mind-----the array has an execute time of 13.0 NS.  This is
> 77 MHZ if talking about stand alone time.  It requires 181 Logic Cells
> including Buffers for circuit loading of Max 8 loads.
it sounds ok.

> For your info - an And/Or function is a Two input mux.
however there is a problem : FPGAs based on 4-input LUTs can't
do certain things (i have had this problem at META systems).
The alternative that Michael proposes (as long as it is an
"alternative" only, that can be switched by the user) is sensible
in that case : with 4-input LUTs, and provided the clocking is "clean",
you perform two ANDs plus one OR with the first gates (1-bit MUX)
and the rest is a balanced tree of ORs. If we used multiplexers, the
balanced tree would be 2x deeper (and slower).
But there is still a problem : decoding ! you have to generate 1 select per
input, adding to the delay. If you multiplex 8 inputs, it's OK :
each 1st-level LUT decodes the 3-bit address and the 1-bit data,
and you have a balanced tree of 3 gates. But with wider MUXes, the
overhead becomes significant.
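
The 8-input case can be sketched behaviourally like this (Python, not HDL). The assumption is that every function below fits in one 4-input LUT: `lut_select` takes 3 address bits plus 1 data bit, and `or4` takes at most 4 inputs. The names are hypothetical.

```python
def lut_select(index, addr, data_bit):
    """First-level LUT: decode the 3-bit address and gate one data bit (4 inputs)."""
    return data_bit & (1 if (addr & 7) == index else 0)

def or4(bits):
    """One LUT used as an OR gate; it may not take more than 4 inputs."""
    assert len(bits) <= 4
    result = 0
    for b in bits:
        result |= b
    return result

def mux8_and_or(data, addr):
    """8-to-1 multiplexer as decoded AND terms followed by a tree of 3 OR gates."""
    terms = [lut_select(i, addr, data[i]) for i in range(8)]
    return or4([or4(terms[:4]), or4(terms[4:])])
```

Only one AND term can be non-zero for a given address, so the OR tree simply passes the selected bit through. With more than 8 inputs the address no longer fits in the first-level LUT next to the data bit, which is the decoding overhead mentioned above.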

>  And - - -
> 
>         An 8 Bit Shifter = 1 logic level
>         An 16 Bit Shifter = 2 logic levels
>         An 32 Bit Shifter = 3 logic levels
>         An 64 Bit Shifter = 4 logic levels
>         An 128 Bit Shifter = 5 logic levels
> 
> The above uses the Quicklogic data from the QL6600 printout.

from the above, i presume that you use pseudo-TTL138 devices,
not 4-input muxes.
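
A quick check of that guess, as a sketch only: count the mux levels needed so that the product of the fan-ins covers the word width, and compare with the quoted table. The model ignores the bit-reversal muxes and the buffers, and `mux_levels` is a hypothetical helper.

```python
def mux_levels(width, fan_in):
    """Smallest number of levels such that fan_in ** levels >= width."""
    levels = 0
    reach = 1
    while reach < width:
        reach *= fan_in
        levels += 1
    return levels

# Logic levels quoted in the message above, indexed by shifter width.
QUOTED = {8: 1, 16: 2, 32: 3, 64: 4, 128: 5}

for width, quoted in QUOTED.items():
    print(width, quoted, mux_levels(width, 4))
```

With 4-input muxes this model gives 2, 2, 3, 3, 4 levels for 8 to 128 bits. It agrees with the quoted table only at 32 bits, so the other rows must come from a different building block.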

> Regards
> Dick Hartney
WHYGEE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/