[f-cpu] LSU or cache L0

One of the ideas to speed up memory access is to use a kind of L0
cache (called the LSU by whygee).

It's a kind of associative memory, like any cache. But here the idea is
to tag the cached memory content with the REGISTER number, bypassing the
memory address. So in the case of a taken jump, the data is already
there and the memory access could be hidden.
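
As a rough illustration, here is a minimal Python sketch of such a
register-tagged buffer. It is only a toy model under my own assumptions:
the class name, the 4-word line length and the flat list standing in for
RAM are all hypothetical, not taken from the F-CPU manual. The lookup is
indexed by register number (no associative search on addresses); the
stored base address is only used to check that the line still covers
the requested word.

```python
class RegisterTaggedL0:
    """Toy model: one buffered line per register, indexed by register number."""

    def __init__(self, memory, num_regs=64, line_words=4):
        self.memory = memory              # flat list of words standing in for RAM
        self.line_words = line_words      # hypothetical line length, in words
        self.lines = [None] * num_regs    # each entry: (base_address, words)
        self.memory_accesses = 0          # counts trips to main memory

    def load(self, reg, address):
        """Load the word at `address`; `reg` is the pointer register used."""
        entry = self.lines[reg]
        if entry is not None:
            base, words = entry
            if base <= address < base + self.line_words:
                return words[address - base]   # hit: memory is not touched
        # Miss: fetch the line containing `address` and tag it with `reg`.
        base = address - (address % self.line_words)
        words = self.memory[base:base + self.line_words]
        self.memory_accesses += 1
        self.lines[reg] = (base, words)
        return words[address - base]


memory = list(range(100, 132))
l0 = RegisterTaggedL0(memory)
first = l0.load(5, 9)     # miss: the line at addresses 8..11 is fetched
second = l0.load(5, 10)   # hit: served from the line tagged with r5
```

The second load through r5 is served without going to memory, which is
the latency that could be hidden.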

For program flow, there is no problem. But for data there is a very big
one: aliases.

Two different registers could point to the same data location, and the
two copies could become incoherent! Whygee proposes that each line of
the L0 cache could be associated with 2 or 4 registers. But I think it's
not enough. It would introduce a very strong and dangerous coding rule:
no more than 2 or 4 aliases! Compiler writers will have a headache
guaranteeing that!
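
To make the alias problem concrete, here is a small Python sketch. It
is a toy model with assumptions of my own: one cached word per register,
a write-through store, and hypothetical helper names `load` and `store`.
Registers r1 and r2 both point to address 8; a store through r2 leaves
the copy tagged with r1 stale.

```python
memory = [0] * 16
l0 = {}   # register number -> (address, cached value)

def load(reg, address):
    """Return the cached word for `reg` if it matches, else fetch it."""
    if reg in l0 and l0[reg][0] == address:
        return l0[reg][1]
    value = memory[address]
    l0[reg] = (address, value)
    return value

def store(reg, address, value):
    """Update the entry tagged with `reg` and write through to memory."""
    l0[reg] = (address, value)
    memory[address] = value   # write-through is an assumption of this sketch

load(1, 8)             # r1 caches the old value (0) of address 8
store(2, 8, 42)        # r2 is an alias of r1 and writes 42
stale = load(1, 8)     # r1 still sees 0: the two copies are incoherent
```

Memory holds 42 but the load through r1 returns 0, unless the hardware
tracks every register that aliases the line.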

One of the easiest ways to manage this cache is a simple memory bank:
64 entries (one for each register), each holding 2 cache lines (so
double buffering and prefetch could be done). It's only 2 KB of memory,
which is not a lot, and we don't need too many access ports on it ;p
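
For what it's worth, the arithmetic behind that figure can be checked
quickly. The line length is not stated above; the 16 bytes below are
only what falls out if 2 KB is taken as 2048 bytes, so treat it as an
inferred value.

```python
NUM_REGS = 64            # one entry per register
LINES_PER_REG = 2        # two lines per entry, for double buffering
TOTAL_BYTES = 2 * 1024   # assumption: 2 KB means 2048 bytes

total_lines = NUM_REGS * LINES_PER_REG
line_bytes = TOTAL_BYTES // total_lines   # inferred line length in bytes
```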

But another trick could be used. In the manual, we can read that each
load and store operation is made with a "stream" number (3 bits, 8
streams). It's "à la" Cray, but without further explanation.

In fact Cray computers are ncc-NUMA (non-cache-coherent memory access).
So data coherency must be handled by software. It is hard to manage
correctly, but it speeds things up a lot.

So the streams aren't coherent with each other. The order of accesses to
main memory through different streams could be exchanged, inverted and
so on. We guarantee to the hardware that the program will not do stupid
things such as a read after write to the same memory location through
different streams (otherwise, to keep the caches coherent, the
load/store unit would have to compare all addresses to avoid incoherent
behavior).

So in our case, instead of a 64-line memory, we need only an 8-line
memory (with longer lines if you want). So there is no more coherency
problem. If the compiler has trouble with pointer analysis, it will use
the same stream to avoid aliasing problems.
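
Here is a minimal Python sketch of the stream-indexed variant. Again
it is a toy model: the class name `StreamL0`, the one-word lines and the
write-through store are my own assumptions. It shows the intended
compiler rule: two accesses that may alias go through the same stream
and stay coherent, while splitting them across streams breaks the
promise made to the hardware.

```python
class StreamL0:
    """Toy model: one buffered word per stream (8 streams, 3-bit number)."""

    def __init__(self, memory, num_streams=8):
        self.memory = memory
        self.lines = [None] * num_streams   # each entry: (address, value)

    def load(self, stream, address):
        entry = self.lines[stream]
        if entry is not None and entry[0] == address:
            return entry[1]                 # hit in this stream's line
        value = self.memory[address]
        self.lines[stream] = (address, value)
        return value

    def store(self, stream, address, value):
        self.lines[stream] = (address, value)
        self.memory[address] = value        # write-through, for simplicity


# Possible aliases kept on the same stream: the store is seen by the load.
same = StreamL0([0] * 16)
same.load(0, 8)
same.store(0, 8, 42)
coherent = same.load(0, 8)

# Aliases split across streams: stream 0 keeps its stale copy.
split = StreamL0([0] * 16)
split.load(0, 8)
split.store(1, 8, 42)
stale = split.load(0, 8)
```

In the first case the load returns 42; in the second it returns the old
0, which is exactly the situation the software has to rule out.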

That was for handling data. For program code, the previous trick could
be used safely.

Comments ?