
Re: [f-cpu] the wrong way (or not?)



hi,

Kim Enkovaara wrote:
> On Tue, 28 Aug 2001, Yann Guidon wrote:
> 
> > LEON serving as "service processor" for the F-CPU is
> > cool but there must be a way to handle the bandwidth and
> > clock speed difference : the F-CPU works with 256-bit wide
> > cache lines and is clocked at least 2x faster (with the
> > same silicon process, as a rough estimate). You'll have
> > to put two or four LEONs in your desktop box to make a
> > balanced system.
> 
> How do you know how well F-CPU will clock before the RTL code is done :)
> Or has someone made already some rough estimations about the logic
> deepness in each pipeline stage and estimations about the critical paths
> inside pipeline. That could give some hints about the speed.

Concerning synthesis, Nicolas is best placed to answer.
However, the FC0 is designed from the start around the critical datapath:
each pipeline stage must have very low complexity and logic depth.
This eases the design and helps the clock speed.

> btw. for FPGA emulation there might be a need to do the ALU differently.
> In FPGA architecture ALU must be designed around the 4-input LUT (many
> architectures use 4 input). This limitation leads to a design where the
> logical and arithmetic ALUs are implemented separately etc. In this way
> the logic deepness can be one logic level less (at least). But I'm not an
> expert on this, there have been some discussions about this in
> comp.arch.fpga or comp.lang.vhdl or verilog areas.

Several years ago, I used MUX-based FPGAs (Actel A1020).
The "gates" are three 2-input multiplexers, and they can be arranged
to perform a wide range of logic functions (sequential or combinatorial).
Logic functions are often limited to 4 inputs in this context.
So yes, I'm already used to living with this limitation.
I have also used 4-input LUTs with the custom chips at META.
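
To illustrate how three 2-input multiplexers can be wired into ordinary
logic functions, here is a minimal Python sketch. It is a simplified model
(two first-level muxes feeding a third), not the exact A1020 logic module,
and all the names (mux2, mux_cell, and2, or2, xor2, mux4) are made up for
the example.

```python
from itertools import product

def mux2(sel, a, b):
    """2-input multiplexer: returns a when sel is 0, b when sel is 1."""
    return b if sel else a

def mux_cell(d0, d1, d2, d3, sel_a, sel_b, sel_out):
    """Simplified cell (an assumption): two muxes feeding a third one."""
    first = mux2(sel_a, d0, d1)
    second = mux2(sel_b, d2, d3)
    return mux2(sel_out, first, second)

def and2(a, b):
    # a selects between constant 0 and b; the second mux is unused.
    return mux_cell(0, b, 0, 0, a, 0, 0)

def or2(a, b):
    # a selects between b and constant 1; the second mux is unused.
    return mux_cell(b, 1, 0, 0, a, 0, 0)

def xor2(a, b):
    # First mux yields b, second yields NOT b, and a picks one of them.
    return mux_cell(0, 1, 1, 0, b, b, a)

def mux4(d, s1, s0):
    # Sharing s0 between the two first-level muxes gives a 4-to-1 mux.
    return mux_cell(d[0], d[1], d[2], d[3], s0, s0, s1)

# Print the truth tables of the three 2-input functions.
for a, b in product((0, 1), repeat=2):
    print(a, b, and2(a, b), or2(a, b), xor2(a, b))
```

Tying inputs to constants or to the same signal is what turns the one
generic cell into different functions; anything that needs more variables
has to be split over several cells, which costs one more logic level.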

Concerning the FC0, the original goal is to have a maximum of
6 four-input gates in the critical datapath (CDP). Add to that the fanout
problems and the wire lengths. If you consider a CDP of 6 logic gates,
you see that you can't do much during a clock cycle: about
one half of what other CPUs do.

This morning I polished the latest version of the ROP2 unit and I'll
release it ASAP. The critical datapath contains (in order):
AND3, OR4, AND/OR8, MUX4. This is in "VHDL" coding style, without
using gates or synthesis. Through optimisation and boolean simplifications
(choosing to use inverted levels at the proper place), I think that
the criterion will be met.
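
As a rough sanity check of the 6-gate budget, here is a small Python sketch
that counts logic levels for the ROP2 path listed above. It rests on
assumptions that are not from the actual design: every stage is mapped
naively to trees of 4-input gates, AND/OR8 is treated as a plain 8-input
function, MUX4 is built as AND-OR with the inverted select levels available
for free, and fanout and wire delays are ignored. All names are
hypothetical.

```python
BUDGET = 6  # maximum number of four-input gates in the critical datapath

def levels_for_fan_in(n_inputs, max_fan_in=4):
    """Levels of max_fan_in-input gates needed to reduce n_inputs to one."""
    levels = 0
    signals = n_inputs
    while signals > 1:
        signals = -(-signals // max_fan_in)  # ceiling division
        levels += 1
    return levels

def mux_levels(n_data, max_fan_in=4):
    """AND-OR multiplexer: AND each data bit with its decoded select, then OR."""
    select_bits = max(1, (n_data - 1).bit_length())
    and_levels = levels_for_fan_in(1 + select_bits, max_fan_in)
    or_levels = levels_for_fan_in(n_data, max_fan_in)
    return and_levels + or_levels

# Critical path of the ROP2 unit, in order, as listed in the message.
ROP2_STAGES = {
    "AND3": levels_for_fan_in(3),
    "OR4": levels_for_fan_in(4),
    "AND/OR8": levels_for_fan_in(8),
    "MUX4": mux_levels(4),
}

total_levels = sum(ROP2_STAGES.values())
fits_budget = total_levels <= BUDGET

for name, levels in ROP2_STAGES.items():
    print(name, levels)
print("total:", total_levels, "budget:", BUDGET, "fits:", fits_budget)
```

Under these assumptions the count lands exactly on the budget, with no
margin left, so the real result depends on how the stages are merged and
where the inverted levels are placed.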

Concerning your answer to Ben, I agree: I have nothing to add.

Concerning the LEON, I have just read a few source files from
the technology-dependent ("megacells") part. The PDF manual is well done,
but on the other hand the source files (those that I have read) are not
commented at all :-/

I am now concerned about the register set and its asynchronous write
cycle. It is not clear yet, but if we use asynchronous banks, we need
a local 2x clock to generate the write strobe. Otherwise a write will
require at least 2 cycles and we won't meet the goal of 2 writes / cycle.
Using synchronous banks just "moves" the problem to the next cycle.
Help!

> Mr. Kim Enkovaara
WHYGEE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~