
Re: [f-cpu] F-CPU architecture...

To: f-cpu@xxxxxxxx
Subject: Re: [f-cpu] F-CPU architecture...
From: Yann Guidon <whygee@xxxxxxxxx>
Date: Sat, 27 Aug 2005 17:57:50 +0200
Delivered-to: archiver@seul.org
Delivered-to: f-cpu-outgoing@seul.org
Delivered-to: f-cpu@seul.org
Delivery-date: Sat, 27 Aug 2005 11:42:58 -0400
In-reply-to: <43103453.2010106@gmx.de>
Organization: Freedom CPU Project
References: <20050826012228.78844.qmail@web54513.mail.yahoo.com> <430E799B.5090103@mr511.de> <430E7F99.5010904@f-cpu.org> <430EE68A.6070605@gmx.de> <430F335C.8020602@f-cpu.org> <430F3702.1050807@gmx.de> <430F48F9.9060202@f-cpu.org> <430FB40A.4090206@gmx.de> <430FCE75.8030802@f-cpu.org> <43103453.2010106@gmx.de>
Reply-to: f-cpu@xxxxxxxx
Sender: owner-f-cpu@xxxxxxxx
User-agent: Mozilla/5.0 (Windows; U; Win95; en-US; rv:1.4.1) Gecko/20031008

Tobias Bergmann wrote:

Hi Yann,


hello,

it puts some constraints on the LFSR algo but
it makes it more challenging and interesting :-)

You mean Reseeding?


not specifically.

I'm working on dynamic reseeding at the moment. Maybe something can be reused for the F-CPU.
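For readers unfamiliar with the terms: an LFSR pattern generator produces pseudo-random test vectors, and reseeding means loading a new seed from time to time so the generator jumps to a different part of its sequence. The sketch below is a minimal Python model that assumes a 16-bit Fibonacci LFSR with the maximal-length polynomial x^16 + x^14 + x^13 + x^11 + 1. The function names, seeds and pattern counts are hypothetical. Real reseeding schemes, including the dynamic reseeding mentioned above, compute their seeds so that specific test cubes are produced, and this sketch does not attempt that.

```python
from typing import Iterable, Iterator


def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR by one step.

    Taps correspond to x^16 + x^14 + x^13 + x^11 + 1 (maximal length),
    so any non-zero seed cycles through all 65535 non-zero states.
    """
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr_period(seed: int) -> int:
    """Count the steps until the LFSR returns to `seed`."""
    state, steps = lfsr16_step(seed), 1
    while state != seed:
        state, steps = lfsr16_step(state), steps + 1
    return steps


def patterns_with_reseeding(seeds: Iterable[int], patterns_per_seed: int) -> Iterator[int]:
    """Yield test patterns, loading a new seed every `patterns_per_seed` patterns.

    The seeds are arbitrary placeholders here (hypothetical); a real
    reseeding scheme would derive them from the test cubes it must cover.
    """
    for seed in seeds:
        if not 0 < seed < (1 << 16):
            raise ValueError("seed must be a non-zero 16-bit value")
        state = seed
        for _ in range(patterns_per_seed):
            yield state
            state = lfsr16_step(state)


if __name__ == "__main__":
    print(lfsr_period(0xACE1))  # 65535 for a maximal-length 16-bit LFSR
    print([hex(p) for p in patterns_with_reseeding([0xACE1, 0x1234], 3)])
```

The all-zero state is excluded because an LFSR of this kind never leaves it.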


do you have any reference or URL on this subject?

Well, your power supply has to be dimensioned for this worst case as well, which makes it more expensive for no good reason.
hmmm not sure.
we'll have to "measure" the average and max activity ...
Usually, power during random testing is approx. 4x the power in system mode at the same frequency.


where does this figure come from?
for FC0, I would expect 2x max when compared with optimized code.

But the ratio depends on whether you look at a low power design or high performance design. So we have to obtain it for F-CPU.


sure.

I had thought about defining our own VHDL data types
(instead of std_logic) so we can implement our own coverage tools.
It can also serve to create stats about activity etc...
but that would be very heavy and may not remain accurate
when we implement the core in ASIC or FPGA.
sometimes, synthesis can radically change the netlist and the low-level
architecture.
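As a rough illustration of the activity statistics mentioned above, here is a Python model (not VHDL) of an instrumented signal that counts bit toggles on every assignment. `ToggleCounter` is a hypothetical name; a real implementation would live in the simulator or in a custom VHDL type. Comparing random patterns with a simple incrementing counter also hints at why random test patterns dissipate more power than typical functional activity: a random vector flips about half of the bits on each update.

```python
import random


class ToggleCounter:
    """Hypothetical instrumented signal: tracks how many bits flip per update."""

    def __init__(self, width: int, value: int = 0) -> None:
        self.width = width
        self.mask = (1 << width) - 1
        self.value = value & self.mask
        self.toggles = 0  # total number of bit flips seen so far
        self.updates = 0  # number of assignments

    def assign(self, new_value: int) -> None:
        new_value &= self.mask
        # Each set bit of (old XOR new) is one toggling wire.
        self.toggles += bin(self.value ^ new_value).count("1")
        self.value = new_value
        self.updates += 1

    def activity(self) -> float:
        """Average fraction of bits that toggled per update (0.0 if never updated)."""
        if self.updates == 0:
            return 0.0
        return self.toggles / (self.updates * self.width)


if __name__ == "__main__":
    rng = random.Random(1)
    rand_sig = ToggleCounter(32)
    for _ in range(10_000):
        rand_sig.assign(rng.getrandbits(32))

    count_sig = ToggleCounter(32)
    for v in range(1, 10_001):
        count_sig.assign(v)

    print(f"random patterns : {rand_sig.activity():.3f}")   # close to 0.5
    print(f"counter patterns: {count_sig.activity():.3f}")  # much lower
```

The ratio between the two figures depends entirely on the chosen stimuli, so it says nothing about FC0 itself; it only shows how such a statistic could be gathered at the RTL level, before synthesis reshapes the netlist.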

If I'm not mistaken, SIGNS already has that functionality or will get it soon. No need to spend precious F-CPU time on it.


great !

Oh, I forgot to mention: a colleague of mine is writing an OS tool for circuit simulation, synthesis, ATPG, fault simulation, ... It's called signs: http://www.iti.uni-stuttgart.de/~bartscgr/signs/wiki/index.php/Main_Page


that will also interest Michael Riepe.
at first glance, it seems very useful for us.

I'm not rich but I have quite nice FPGAs at work.
such as? :-)
A couple of prototype boards with Virtex-something and an emulation machine with 3 large FPGAs. I don't usually synthesize, so I don't care much about exact size/speed/etc. But I can have a look. And I know we ordered a bigger one for next year. As far as I remember, we can handle designs of approx. 10 MGates.


hmm that should be enough ;-P

that is the best point to start. x86 proves that we can always scale up
and the F-CPU model has some headroom.

scalability is good.


that was the goal ;-P

How large would the effort be to add SMT to the FC0 core? I'm thinking of approx. 3-fold SMT.

better to use core duplication. yes, single-thread performance is quite poor for FC0 because of inter-instruction dependencies. FC0 works best in loops that are unrolled and interleaved, as would be done for a 2- or 3-way superscalar design.
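To see why unrolling and interleaving matter, the following Python sketch counts cycles for a run of additions on a simplified single-issue, in-order pipeline. It assumes each result becomes usable `latency` cycles after issue and that the additions are spread round-robin over `ways` independent accumulators. The latency value is a placeholder, not a measured FC0 figure, and the function name is hypothetical.

```python
def cycles_in_order(n_adds: int, ways: int, latency: int) -> int:
    """Cycles until the last result is ready on a simplified in-order, single-issue pipeline.

    Addition i targets accumulator i % ways and depends on the previous
    addition into that same accumulator (a read-after-write dependency).
    A result can be used `latency` cycles after its instruction issues.
    """
    if ways < 1 or latency < 1:
        raise ValueError("ways and latency must be >= 1")
    ready = [0] * ways  # cycle at which each accumulator's latest value is available
    next_issue = 0      # earliest cycle the next instruction may issue (one per cycle)
    for i in range(n_adds):
        acc = i % ways
        issue = max(next_issue, ready[acc])  # stall until the operand is ready
        ready[acc] = issue + latency
        next_issue = issue + 1
    return max(ready)


if __name__ == "__main__":
    n, lat = 12, 3  # placeholder values, not FC0 figures
    for ways in (1, 2, 3):
        print(ways, cycles_in_order(n, ways, lat))  # 36, 19 and 14 cycles
```

With one accumulator every addition waits for the previous one (n x latency cycles); once there are as many independent accumulators as cycles of latency, the pipeline issues one addition per cycle, which is the same parallelism a 2- or 3-way superscalar scheduler would exploit.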

SMT would be a natural choice, but everything else would explode, particularly the size of the register set, which, IMHO, is the biggest limitation if we want to increase the frequency... The register set read latency is absolutely critical for FC0's performance (as noted in the register renaming post), so adding a pipeline stage or two would make it even worse.
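A back-of-the-envelope model of how SMT inflates the register set, assuming 64 architectural registers of 64 bits per thread (FC0-like figures, treated here as assumptions) and using the depth of a binary 2:1 mux tree as a crude proxy for read latency. Port count, wiring and physical layout are ignored, and `regfile_cost` is a hypothetical helper, so the numbers are only indicative.

```python
import math
from typing import NamedTuple


class RegFileCost(NamedTuple):
    entries: int       # total registers across all hardware threads
    storage_bits: int  # entries * width
    mux_levels: int    # depth of a binary 2:1 mux tree selecting one entry


def regfile_cost(threads: int, regs_per_thread: int = 64, width_bits: int = 64) -> RegFileCost:
    """Rough register-file cost for `threads`-way SMT (hypothetical first-order model)."""
    entries = threads * regs_per_thread
    return RegFileCost(
        entries=entries,
        storage_bits=entries * width_bits,
        mux_levels=math.ceil(math.log2(entries)),
    )


if __name__ == "__main__":
    for t in (1, 2, 3):
        print(t, regfile_cost(t))
```

In this model, going from one thread to the 3-fold SMT asked about above triples the storage (4096 to 12288 bits) and adds two levels to the read path (6 to 8), which is the kind of growth that hurts when the read latency is already on the critical path.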

Another problem with SMT is increased memory access contention.
On the front line, the L0 memory buffers (the Fetcher and the LSU)
would need to be scaled up as well (more lines, hence larger and
slower units).

On top of that, a single thread can put the memory controller on its virtual knees. SMT should help to interleave accesses to this vital resource. However, what happens when one thread completely saturates the bandwidth with "vector computations"? All the threads disturb each other anyway.
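The contention problem can be illustrated with a round-robin arbiter for a single memory port, modelled in Python below. Each cycle, the arbiter grants the port to the next requesting thread after the last one served. The function name and request patterns are hypothetical, and the model does not queue requests: a thread that is not served must present its request again in a later cycle.

```python
from collections import Counter
from typing import List, Optional, Set


def round_robin_grants(requests: List[Set[int]], n_threads: int) -> List[Optional[int]]:
    """Grant one requester per cycle, rotating priority after each grant.

    requests[c] is the set of thread ids asking for the memory port in cycle c.
    Returns the granted thread id per cycle, or None for an idle cycle.
    """
    last = n_threads - 1  # so that thread 0 has priority in the first cycle
    grants: List[Optional[int]] = []
    for req in requests:
        granted = None
        for offset in range(1, n_threads + 1):
            t = (last + offset) % n_threads
            if t in req:
                granted = last = t
                break
        grants.append(granted)
    return grants


if __name__ == "__main__":
    # Three threads that all stream accesses every cycle.
    reqs = [{0, 1, 2}] * 9
    print(Counter(round_robin_grants(reqs, n_threads=3)))
```

Round-robin is fair, but fairness only divides the port: a thread streaming "vector" accesses always takes its full share, so every other thread sees its available bandwidth shrink accordingly.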

For VSP, it is a realistic approach (VSP is slower than an SDRAM chip,
so we have plenty of bandwidth headroom). OTOH, F-CPU is memory-bound.

The way I "solve" the memory bandwidth problem is by going
"multichip" (coherent NUMA) instead of "multicore".
As long as we put the available transistors on a die to good use. :-)


doesn't this always depend on the end user's application? :-)

bis besser,
Tobias

YG

*************************************************************
To unsubscribe, send an e-mail to majordomo@xxxxxxxx with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/
