
Re: [f-cpu] F-CPU architecture...


Michael Riepe wrote:

Hi F-gang,
Yann Guidon wrote:

Well, we don't need so many pins for I/O. A unidirectional 1 Gbps lane will give us throughputs of up to 100 MB/s.

too slow :-P

What about 2.5 Gbps (InfiniBand speed), then? ;-)

Whatever we pick will always be too slow by some standard, almost by definition. But that's not really the point while we are dealing with the core only.

If we use e.g. 16 lanes in either direction (1.6 GB/s), then we need 64 pins on the F-CPU and 256 on the G-chip (differential signals assumed).
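
The figures above can be checked with a few lines of arithmetic. This is only a sketch: it assumes 10 line bits per payload byte (as with 8b/10b coding, which is what makes 1 Gbps come out as 100 MB/s), two wires per differential lane, and that the 256 pins on the G-chip correspond to four identical F-CPU links. All function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Payload bytes per second on one lane.
 * Assumption: 10 line bits carry one payload byte (8b/10b-style coding). */
static uint64_t lane_bytes_per_sec(uint64_t line_bits_per_sec)
{
    return line_bits_per_sec / 10;
}

/* Payload bytes per second in ONE direction over several lanes. */
static uint64_t link_bytes_per_sec(uint64_t line_bits_per_sec, unsigned lanes)
{
    return lane_bytes_per_sec(line_bits_per_sec) * lanes;
}

/* Pins for a link with `lanes` lanes in each direction:
 * 2 directions, 2 wires per differential lane. */
static unsigned link_pins(unsigned lanes_per_direction)
{
    return lanes_per_direction * 2u * 2u;
}

/* How many such links fit on a hub chip with `hub_pins` link pins. */
static unsigned links_on_hub(unsigned hub_pins, unsigned pins_per_link)
{
    assert(pins_per_link != 0);
    return hub_pins / pins_per_link;
}
```

With these assumptions, 16 lanes at 1 Gbps give 1.6 GB/s per direction on 64 pins, a 256-pin G-chip serves four F-CPUs, and a 2.5 Gbps lane would carry 250 MB/s.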

Roughly. A larger FPGA will be more expensive, but it will handle more F-CPUs
and so give better performance.

Are there really FPGAs that can handle signals in the 1 GHz range or above?

ask nicO :-P


Whether the DMA starts idle or waits for data from the I/O port is simply a matter of toggling control bits, IMHO.

well, DMA is usually an /explicit/ command from SW.

Maybe I should not call it DMA engine but "data moving engine" - in either case, it's an independent entity that can transfer data on its own.

cache refill is an automatic function that is expected by the core.

Well, yes. And who fills the cache with data that does not reside in memory?

So now we have definition troubles. what is what and how .....

And the interface to the DMA engine is not yet defined.
I wish we had something like what the SHARC offers
(DMA block descriptors are a linked list stored in RAM),
but this requires further thinking.

It doesn't have to be a list.

In the SHARC, it is. And it is pretty practical: you can make the list cyclic if you want, for example.
It's really a great piece of engineering. Read the docs :-)
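
To make the linked-list idea concrete, here is a minimal software model of chained block descriptors. It is not the SHARC's actual descriptor layout, which is not reproduced here; the struct, its fields and `dma_chain_run` are hypothetical, and `memcpy` stands in for the hardware transfer. The `max_blocks` limit exists only so that a cyclic chain terminates in a test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block descriptor: one transfer plus a link to the next one. */
struct dma_block {
    const void       *src;
    void             *dst;
    size_t            len;
    struct dma_block *next;  /* NULL ends the chain; a link back to an
                                earlier block makes the chain cyclic */
};

/* Walk the chain, copying each block, for at most max_blocks blocks.
 * Returns the number of blocks actually transferred. */
static size_t dma_chain_run(const struct dma_block *b, size_t max_blocks)
{
    size_t done = 0;
    while (b != NULL && done < max_blocks) {
        assert(b->len == 0 || (b->src != NULL && b->dst != NULL));
        memcpy(b->dst, b->src, b->len);  /* stand-in for the real transfer */
        b = b->next;                     /* "load a new pointer from memory" */
        done++;
    }
    return done;
}
```

A linear chain stops at the NULL link; setting the last block's `next` to the first block gives the cyclic behaviour mentioned above, at the cost of one extra memory read per block to fetch the next pointer.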

An array will work fine, too, and requires less hardware - incrementing a pointer is easier than loading a new one from memory.

The way I see the DMA engine, it:
- is used by application SW to move large blocks from one address to another
- is configurable at will
- sends an IRQ or sets a flag when one block is finished
- has a queue of tasks (or a linked list, or whatever)

You speak about an array, but how and where ?

A ring buffer would also be possible with minimal overhead.

Same as above: located where, and with what kind of access? A local DMA engine should be accessible from outside, and vice versa (it should be possible to trigger a remote DMA engine).

And it has the advantage that you can easily add new requests while DMA is still running.

A linked list allows that too; there are suitable flags.

All we need is four parameters: address and size of the DMA descriptor table, and "head" and "tail" offsets into the table. The "head" will be maintained by the DMA engine, and when head and tail are the same (simple XOR), DMA stops until the OS moves the tail to a new position (after writing new descriptors, of course).
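
The four-parameter scheme can be modelled in software roughly as follows. This is a sketch under stated assumptions: the descriptor layout (`src`, `dst`, `len`) is invented, head and tail are entry indices rather than byte offsets, the table size is a power of two so wrap-around is a simple mask, and one slot is kept empty so that "head equals tail" always means "nothing to do". Real hardware would also need the descriptor write to be visible in memory before the tail update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor layout. */
struct dma_desc {
    uint64_t src, dst, len;
};

/* The four parameters: table address, table size, head, tail. */
struct dma_ring {
    struct dma_desc *table;  /* address of the descriptor table        */
    uint32_t         size;   /* number of entries, power of two        */
    uint32_t         head;   /* next entry to execute, moved by engine */
    uint32_t         tail;   /* next free entry, moved by the OS       */
};

/* The engine stops when head and tail are the same (simple XOR). */
static bool dma_ring_idle(const struct dma_ring *r)
{
    return (r->head ^ r->tail) == 0;
}

/* OS side: write the descriptor first, then move the tail. */
static bool dma_ring_post(struct dma_ring *r, struct dma_desc d)
{
    assert(r->size >= 2 && (r->size & (r->size - 1)) == 0);
    uint32_t next = (r->tail + 1) & (r->size - 1);
    if (next == r->head)
        return false;            /* table full, one slot stays empty */
    r->table[r->tail] = d;
    r->tail = next;
    return true;
}

/* Engine side: fetch the descriptor at head and advance. */
static bool dma_ring_step(struct dma_ring *r, struct dma_desc *out)
{
    if (dma_ring_idle(r))
        return false;            /* wait until the OS moves the tail */
    *out = r->table[r->head];
    r->head = (r->head + 1) & (r->size - 1);
    return true;
}
```

New requests can be posted while earlier ones are still being processed, since the OS only ever writes the tail and the engine only ever writes the head.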

Other parameters are needed :
- priority (multiple DMAs should be able to take place out of order and simultaneously)
- trigger IRQ or not
and a few others i forget.
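
As an illustration of what those extra parameters could look like, the sketch below adds a priority and an "IRQ on completion" flag to each request and picks the highest-priority pending one, so requests can complete out of order. Field names, flag values and the tie-breaking rule (lowest index wins) are all assumptions made for the example; nothing here is a defined F-CPU interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-request flags. */
enum {
    DMA_F_PENDING     = 1u << 0,  /* request not yet executed       */
    DMA_F_IRQ_ON_DONE = 1u << 1   /* raise an IRQ when it completes */
};

/* Hypothetical request: transfer parameters plus priority and flags. */
struct dma_request {
    uint64_t src, dst, len;
    uint8_t  priority;            /* larger value = more urgent */
    uint8_t  flags;
};

/* Index of the pending request with the highest priority, or -1 if none.
 * Ties go to the lowest index. */
static int dma_pick_next(const struct dma_request *q, size_t n)
{
    assert(q != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!(q[i].flags & DMA_F_PENDING))
            continue;
        if (best < 0 || q[i].priority > q[best].priority)
            best = (int)i;
    }
    return best;
}

/* Mark a request finished; returns nonzero if an IRQ should be raised. */
static int dma_complete(struct dma_request *r)
{
    r->flags &= (uint8_t)~DMA_F_PENDING;
    return (r->flags & DMA_F_IRQ_ON_DONE) != 0;
}
```

A request without the IRQ flag would simply have its pending bit cleared, which matches the "sets a flag" alternative to an interrupt.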

Look at the SHARC and the Cell "SPU"s which also have DMA engines.

