
Re: [f-cpu] Where are the LSU and fetcher descriptions??



> Hi.
>
> Still in the investigation process, I read some of the vhdl code found in
> snapshot_jws_30_07_2002.tar.bz2 and snapshot_yg_29_07_2002.tbz.
> I see no description of the LSU and only a very simple description of
> the fetcher. What is the status of these units?

Not very well described :)

> Where can I find some docs to start?
>

Hmm, the F-CPU manual has some hints in the instruction descriptions.

I am studying these 2 very complex areas more deeply. I now have many
ideas for implementing them. Now I should learn how to make a clean brain
dump.

Here is my "result".

The LSU has many pitfalls: multiple levels of cache, virtual memory
management, DMA management, the DRAM controller. I found many simple ideas
in the classic book (from D. Patterson): a very simple and fast L1 cache,
and a very complex/clever L2 cache. The problem is where to put DMA:
between the L2 and L1 caches (so L2 is part of the DRAM controller),
between L2 and the controller, or as part of one big block together with
L2 and the controller. (I/O does not need to be cached, but we should take
care of aliasing.)

The L1 cache uses Address Space Number fields to avoid a cache flush on a
context switch. L2 is a big victim buffer, to avoid L1/L2 data
duplication. L2 could also be used as a prefetch buffer. L1 has a very low
access latency; L2 must have a very low miss rate. L2 must be physically
mapped to avoid duplication when 2 processes map the same physical pages.
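
To make the L1/L2 split above concrete, here is a minimal behavioural
sketch, not F-CPU code. All names (L1Cache, VictimL2, Hierarchy), the
sizes and the direct-mapped L1 are my assumptions for illustration. It
only models reads: L1 tags carry the ASN so nothing is flushed on a
context switch, and L2 is indexed by physical line and only holds lines
evicted from L1.

```python
from collections import OrderedDict


class L1Cache:
    """Direct-mapped L1; the tag includes the address space number (ASN)."""

    def __init__(self, n_lines=4):
        self.n_lines = n_lines
        self.lines = {}  # index -> (asn, vline, pline, data)

    def lookup(self, asn, vline):
        entry = self.lines.get(vline % self.n_lines)
        if entry is not None and entry[:2] == (asn, vline):
            return entry[3]
        return None

    def fill(self, asn, vline, pline, data):
        """Install a line and return the entry it displaced, if any."""
        index = vline % self.n_lines
        evicted = self.lines.get(index)
        self.lines[index] = (asn, vline, pline, data)
        return evicted


class VictimL2:
    """Physically indexed victim buffer: holds only lines evicted from L1."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.lines = OrderedDict()  # pline -> data, oldest first

    def insert(self, pline, data):
        self.lines[pline] = data
        self.lines.move_to_end(pline)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # drop the oldest victim

    def take(self, pline):
        """Remove and return a line, so it never lives in L1 and L2 at once."""
        return self.lines.pop(pline, None)


class Hierarchy:
    def __init__(self, page_table, memory):
        self.page_table = page_table  # (asn, vline) -> pline
        self.memory = memory          # pline -> data
        self.l1 = L1Cache()
        self.l2 = VictimL2()
        self.log = []

    def read(self, asn, vline):
        data = self.l1.lookup(asn, vline)
        if data is not None:
            self.log.append("L1 hit")
            return data
        pline = self.page_table[(asn, vline)]
        data = self.l2.take(pline)
        if data is not None:
            self.log.append("L2 hit")
        else:
            data = self.memory[pline]
            self.log.append("memory")
        evicted = self.l1.fill(asn, vline, pline, data)
        if evicted is not None:
            self.l2.insert(evicted[2], evicted[3])
        return data
```

Switching from ASN 1 to ASN 2 and back leaves the lines of ASN 1 in place
as long as they were not displaced, and a displaced line comes back from
L2 instead of memory. Writes and aliases inside L1 are deliberately left
out.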

We could use fields in the instruction to select different policies in the
L1 cache (write allocate or not, write-through/write-back). This policy
could also be influenced by VM fields.
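
As a rough illustration of per-access policies, the sketch below passes
the two policy bits along with each store. The names (PolicyCache,
effective_policy, page_force_wt, page_no_alloc) are hypothetical; how
instruction fields and VM fields would really be encoded and combined is
not defined here, the combination rule is only one possible choice.

```python
def effective_policy(instr_write_allocate, instr_write_through,
                     page_force_wt=False, page_no_alloc=False):
    """Combine instruction hints with (hypothetical) page-level overrides."""
    write_allocate = instr_write_allocate and not page_no_alloc
    write_through = instr_write_through or page_force_wt
    return write_allocate, write_through


class PolicyCache:
    """Toy cache of single words whose write policy is chosen per store."""

    def __init__(self, memory):
        self.memory = memory  # addr -> value, the backing store
        self.lines = {}       # addr -> value, cached copies
        self.dirty = set()    # cached addrs not yet written to memory

    def write(self, addr, value, write_allocate, write_through):
        if addr in self.lines or write_allocate:
            self.lines[addr] = value
            if write_through:
                self.memory[addr] = value
                self.dirty.discard(addr)
            else:
                self.dirty.add(addr)  # write-back: memory updated later
        else:
            # Miss without write allocate: bypass the cache entirely.
            self.memory[addr] = value

    def flush(self):
        """Write every dirty line back to memory."""
        for addr in sorted(self.dirty):
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()
```

For example, a store issued with write allocate and write-back stays in
the cache until flush(), while the same store to a page marked
page_force_wt reaches memory immediately.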

I have also thought about F-bus2, which looks like the CAN bus. AMBA buses
are very complex because they deal with split and retry, so the idea is to
split all requests, as on the CAN bus. It complicates the "client" a lot
but is a must to hide latency and maximise the use of the bus.

This is split into 2 parts: a request bus which ASKs for an address, and a
PUT which writes data to an address. So a PUT is like a write. A read
needs an ASK then a PUT. You could imagine a lot of tricky systems with
it, because you could duplicate data, and a cache could say "I have it"
before the DRAM controller fetches the data.
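
The ASK/PUT idea can be sketched as a small message-passing model. This is
only my illustration, not an F-bus2 definition: the class names, the
queue, and the rule that agents are asked in attachment order (standing in
for "the cache is faster than the DRAM") are all assumptions.

```python
from collections import deque


class Bus:
    """Split-transaction bus: every transfer is an ASK or a PUT message."""

    def __init__(self):
        self.agents = []
        self.queue = deque()
        self.trace = []  # (kind, addr, sender name) in bus order

    def attach(self, agent):
        agent.bus = self
        self.agents.append(agent)

    def post(self, kind, addr, data=None, sender=None):
        self.queue.append((kind, addr, data, sender))

    def run(self):
        while self.queue:
            kind, addr, data, sender = self.queue.popleft()
            self.trace.append((kind, addr, sender.name))
            others = [a for a in self.agents if a is not sender]
            if kind == "ASK":
                # The first agent that owns the data answers with a PUT.
                for agent in others:
                    if agent.on_ask(addr):
                        break
            else:
                for agent in others:  # everyone sees a PUT
                    agent.on_put(addr, data)


class Client:
    def __init__(self, name):
        self.name = name
        self.pending = set()
        self.received = {}

    def read(self, addr):
        self.pending.add(addr)
        self.bus.post("ASK", addr, sender=self)

    def write(self, addr, data):
        self.bus.post("PUT", addr, data, sender=self)

    def on_ask(self, addr):
        return False

    def on_put(self, addr, data):
        if addr in self.pending:
            self.pending.discard(addr)
            self.received[addr] = data


class Store:
    """Used for both the snooping cache and the DRAM controller."""

    def __init__(self, name, contents, accept_all):
        self.name = name
        self.contents = dict(contents)
        self.accept_all = accept_all  # DRAM stores every PUT, a cache only known lines

    def on_ask(self, addr):
        if addr in self.contents:
            self.bus.post("PUT", addr, self.contents[addr], sender=self)
            return True
        return False

    def on_put(self, addr, data):
        if self.accept_all or addr in self.contents:
            self.contents[addr] = data
```

With a cache attached before the DRAM, two reads can both be outstanding
before any data returns, and the cache answers the first one itself:

```python
bus = Bus()
cpu = Client("cpu")
cache = Store("cache", {0x40: 7}, accept_all=False)
dram = Store("dram", {0x40: 7, 0x80: 9}, accept_all=True)
for agent in (cpu, cache, dram):
    bus.attach(agent)
cpu.read(0x40)
cpu.read(0x80)
bus.run()
```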

The fetcher is connected to L0 and L1.
* Historical hack: to accelerate jumps, there is a direct link between the
register numbers of the register set and the cache lines of L0. This is
trivial for the Icache, but there are strong aliasing issues with the
Dcache.
* Instruction formats are not well defined today. There are many of them;
they should be analysed in the new manual.
* There are 2 kinds of instruction scheduler: one uses a FIFO to calculate
when the result will arrive, in order to avoid conflicts; a simpler one
freezes the shortest pipeline in case of a write conflict.
* There is no reorder buffer, but exceptions must be thrown in order, so
there is a common pipeline for all instructions to calculate them. This is
a major problem for the floating-point unit, which will be either very
slow or non-IEEE-compliant.
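
The first kind of scheduler in the list above can be sketched as a shift
register of reserved write-back slots. This is my own simplified reading,
assuming a single register-file write port, in-order issue of at most one
instruction per cycle, and latencies between 1 and max_latency; the names
are made up for the example.

```python
from collections import deque


class WritebackScheduler:
    """Tracks which future cycles already have a result due at the write port."""

    def __init__(self, max_latency=8):
        # slots[i] is True when a result is already due i + 1 cycles from now.
        self.slots = deque([False] * max_latency)

    def try_issue(self, latency):
        """Reserve the write port `latency` cycles ahead, or refuse on conflict."""
        if self.slots[latency - 1]:
            return False
        self.slots[latency - 1] = True
        return True

    def tick(self):
        """Advance one cycle: every reservation moves one slot closer."""
        self.slots.popleft()
        self.slots.append(False)


def schedule(latencies, max_latency=8):
    """Return the issue cycle of each instruction, issued in program order."""
    sched = WritebackScheduler(max_latency)
    cycle = 0
    issue_cycles = []
    for latency in latencies:
        while not sched.try_issue(latency):
            sched.tick()  # stall: the write port is taken at that cycle
            cycle += 1
        issue_cycles.append(cycle)
        sched.tick()
        cycle += 1
    return issue_cycles
```

With latencies [3, 2, 1] every instruction would otherwise reach the write
port in the same cycle as its predecessor, so schedule([3, 2, 1]) spreads
the issues out to cycles 0, 2 and 4. The second kind of scheduler would
detect the same conflict late and freeze the shorter pipeline instead.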

Regards,
nicO

> greets,
> --
> Pierre
> *************************************************************
> To unsubscribe, send an e-mail to majordomo@seul.org with
> unsubscribe f-cpu       in the body. http://f-cpu.seul.org/
>
