Re: [f-cpu] F-CPU architecture...
Tobias Bergmann wrote:
it puts some constraints on the LFSR algo but
it makes it more challenging and interesting :-)
You mean Reseeding?
I work on dynamic reseeding atm. Maybe something can be reused for the
do you have any reference or URL on this subject ?
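
For readers who have not met the term, here is a minimal sketch of what LFSR reseeding means in general: the pattern generator is reloaded with a new seed after a fixed number of patterns. This is not the dynamic reseeding scheme mentioned above (no details of it are given in this thread). The 16-bit width, the tap positions (16, 14, 13, 11), the seeds and the function names are all assumptions chosen for illustration.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR by one step.

    Assumed polynomial: x^16 + x^14 + x^13 + x^11 + 1 (a common
    maximal-length choice, not taken from the F-CPU sources).
    """
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def reseeded_patterns(seeds, patterns_per_seed):
    """Yield test patterns, reloading the LFSR from each seed in turn."""
    for seed in seeds:
        state = seed & 0xFFFF
        if state == 0:
            # The all-zero state is a fixed point: the LFSR would be stuck.
            raise ValueError("LFSR seed must be non-zero")
        for _ in range(patterns_per_seed):
            yield state
            state = lfsr16_step(state)


if __name__ == "__main__":
    # Hypothetical seeds; a real flow would compute them from the fault list.
    for pattern in reseeded_patterns([0xACE1, 0x0001], 3):
        print(f"{pattern:016b}")
```

The seeds are the only thing that has to be stored, which is why reseeding is attractive when the on-chip pattern memory is small.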
Well your power supply has to be dimensioned for this worst case as
well. Makes it more expensive for no good reason.
hmmm not sure.
we'll have to "measure" the average and max activity ...
Usually power during random test is approx 4x the power in system mode
at same freq.
where does this figure come from ?
for FC0, i would expect 2x max when compared with optimized code.
But the ratio depends on whether you look at a low power design or
high performance design. So we have to obtain it for F-CPU.
i had thought about defining our own VHDL data types
(instead of std_logic) so we can implement our own coverage tools.
It can also serve to create stats about activity etc...
but that would be very heavy and may not remain accurate
when we implement the core in ASIC or FPGA.
sometimes, synthesis can radically change the netlist and the low-level
If I'm not mistaken then SIGNS gets that functionality soon or already
No need to spend precious F-CPU-time on it.
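
As an illustration of the "average and max activity" statistics discussed above, here is a minimal sketch that counts bit toggles between consecutive cycles of a recorded signal trace. It assumes the trace has already been dumped from a simulator as one integer per cycle; the function names and the example trace are hypothetical, and this is not how SIGNS or any F-CPU tool does it.

```python
def toggle_counts(trace):
    """Number of bits that change between each pair of consecutive cycles."""
    return [bin(a ^ b).count("1") for a, b in zip(trace, trace[1:])]


def activity_stats(trace, width):
    """Return (average, peak) switching activity as a fraction of `width` bits."""
    counts = toggle_counts(trace)
    if not counts:
        return 0.0, 0.0
    average = sum(counts) / (len(counts) * width)
    peak = max(counts) / width
    return average, peak


if __name__ == "__main__":
    # Hypothetical 4-bit bus sampled over four cycles.
    trace = [0b0000, 0b1111, 0b1110, 0b1110]
    average, peak = activity_stats(trace, width=4)
    print(f"average activity: {average:.2f}, peak activity: {peak:.2f}")
```

Running the same measurement on a random-test trace and on a trace of optimized code would give a first estimate of the ratio debated above, with the caveat already raised: RTL-level toggles may not match the synthesized netlist.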
Oh I forgot to mention: A colleague of mine is writing an OS tool for
circuit simulation, synthesis, ATPG, fault sim, ...
It's called signs:
that will also interest Michael Riepe.
at first quick look, it seems very useful for us.
I'm not rich but I have quite nice FPGAs at work.
such as ? :-)
A couple of prototype boards with Virtex-something and an Emu-machine
with 3 large FPGAs.
I don't synthesize usually so I don't care much about exact
size/speed/etc. But I can have a look. And I know we ordered a bigger
one for next year.
What I remember is that we can handle designs of approx 10MGates.
hmm that should be enough ;-P
that is the best point to start. x86 proves that we can always scale up
and the F-CPU model has some headroom.
scalability is good.
that was the goal ;-P
How large would the effort be to add SMT to the FC0 core? I'm thinking
of approx. 3-fold SMT.
better use core duplication.
yes, single-thread performance is quite poor for FC0 because of
FC0 works best in loops that are unrolled and interleaved, like what
would be done with a 2- or 3-way superscalar design.
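
To make the point about unrolled and interleaved loops concrete, here is a toy model of a single-issue, in-order pipeline where each operation depends on the previous one in its own chain. It is only a sketch: the result latency of 3 cycles and the function name are assumptions for illustration, not FC0 figures.

```python
def cycles_needed(num_ops, latency, chains):
    """Cycles to complete `num_ops` operations on a single-issue in-order core.

    Operations are distributed round-robin over `chains` independent
    dependency chains; an operation can issue only once the previous
    result of its own chain is available.
    """
    ready = [0] * chains   # cycle at which each chain's last result is ready
    next_issue = 0         # earliest cycle the issue slot is free
    for i in range(num_ops):
        chain = i % chains
        issue = max(next_issue, ready[chain])
        ready[chain] = issue + latency
        next_issue = issue + 1
    return max(ready)


if __name__ == "__main__":
    # Hypothetical latency of 3 cycles per operation.
    print("one dependent chain:     ", cycles_needed(12, latency=3, chains=1))
    print("three interleaved chains:", cycles_needed(12, latency=3, chains=3))
```

With one chain the pipeline stalls on every operation; with as many independent chains as the latency, one operation issues every cycle. Interleaving threads with SMT would fill the same slots, but in hardware instead of in the compiler.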
SMT would be a natural choice but all the rest would explode,
particularly the register set's size which, IMHO, is the biggest limitation
if we want to increase the frequency ...
The register set read latency is absolutely critical for the FC0's
(as noted in the register renaming post) so adding a pipeline stage or two
would make it even worse.
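
A back-of-the-envelope sketch of why the register set grows with SMT: each hardware thread needs its own architectural registers. The default of 64 registers of 64 bits is an assumption made here for the arithmetic, and the function name is hypothetical. The count covers storage bits only, not the read/write ports or the extra decoding, which are what hurt the read latency.

```python
def register_file_bits(threads, registers=64, width=64):
    """Storage bits needed when every thread has a private register set."""
    return threads * registers * width


if __name__ == "__main__":
    for threads in (1, 2, 3):
        bits = register_file_bits(threads)
        print(f"{threads} thread(s): {bits} bits ({bits // 8} bytes)")
```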
Another problem with SMT is the increased memory access contention.
On the frontline, the L0 memory buffers (the Fetcher and the LSU)
would need to be scaled up as well (more lines, hence larger units, so they
On top of that, a single thread can put the memory controller on its
SMT should help to interleave the access to this vital resource.
however, what happens when one thread completely saturates
the bandwidth with "vector computations" ?
All the threads disturb each other anyway.
For VSP, it is a realistic approach (VSP is slower than an SDRAM chip,
so we have plenty of bandwidth headroom). OTOH F-CPU is memory-bound.
The way i "solve" the memory bandwidth problem is by going
"multichip" (coherent NUMA) instead of "multicore".
As long as we put the available transistors on a die to good use. :-)
doesn't this always depend on the end user's application ? :-)