Re: [f-cpu] LSU or cache L0



On Thu, 10 Jan 2002, nicO wrote:
> One idea to speed up memory access is to use a kind of L0 cache
> (called the LSU by whygee).
> 
> It's a kind of associative memory, like any cache. But here the idea is
> to tag the cached memory content with a REGISTER number, bypassing the
> memory address. So after a taken jump the data is still there, and the
> memory access could be hidden.
> 
> For program flow, there is no problem. But for data there is a very big
> one: aliases.
> 
> Two different registers could point to the same data location, and the
> two copies could become incoherent! Whygee proposes that each line of
> the L0 cache could be associated with 2 or 4 registers. But I think
> that's not enough. It would introduce a very strict and dangerous coding
> rule: no more than 2 or 4 aliases! Compiler writers will have headaches
> guaranteeing that!
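
To make the alias problem concrete, here is a toy model of a register-indexed buffer. It is only a sketch: the class and method names are made up for illustration, and the write-through store is an assumption, not something the proposal specifies.

```python
# Toy model of a register-indexed L0 buffer (all names are hypothetical).
class RegisterTaggedL0:
    def __init__(self, memory):
        self.memory = memory      # backing store: address -> value
        self.lines = {}           # register number -> (address, cached value)

    def load(self, reg, addr):
        # Fill the line owned by `reg`; later reads through `reg` skip memory.
        self.lines[reg] = (addr, self.memory[addr])
        return self.lines[reg][1]

    def read(self, reg):
        # Lookup is by register number only; no address comparison is made.
        return self.lines[reg][1]

    def store(self, reg, value):
        # Update the line owned by `reg` and write through to memory (assumed).
        addr, _ = self.lines[reg]
        self.lines[reg] = (addr, value)
        self.memory[addr] = value


memory = {0x100: 1}
l0 = RegisterTaggedL0(memory)
l0.load(reg=5, addr=0x100)   # r5 points at 0x100
l0.load(reg=9, addr=0x100)   # r9 aliases the same location
l0.store(reg=5, value=42)    # write through r5

stale = l0.read(reg=9)       # r9's line was never updated
fresh = memory[0x100]
print(stale, fresh)          # prints "1 42"
```

The line owned by r9 keeps the old value because nothing links it to r5's line. Associating 2 or 4 registers with each line would fix this case, but only as long as the number of aliases stays within that limit.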
> 
> One of the easiest ways to manage this cache is a simple memory bank: 64
> entries (one for each reg) of 2 cache lines each (so double buffering
> and prefetch could be done). It's only 2 KB of memory, which is not a
> lot, and we don't need too many access ports on it ;p
> 
> But another trick could be used. In the manual, we can read that each
> load and store operation is made with a "stream" number (3 bits, 8
> streams). It's "à la" Cray, but there is no further explanation.
> 
> In fact Cray computers are ncc-NUMA (non-coherent memory access). So
> data coherency has to be handled by software; it is hard to manage
> correctly, but it speeds up the job a lot.
> 
> So the streams aren't coherent with each other. The order of accesses to
> main memory through different streams could be swapped, inverted and so
> on. We guarantee to the hardware that there will be no stupid things
> such as a read after write to the same memory location (otherwise, for
> the caches to be coherent, the load/store unit must compare all
> addresses to avoid incoherent behavior).
> 
> So in our case, instead of using a 64-line memory, we need only an
> 8-line memory (with longer lines if you want). So there is no more
> coherency problem. If the compiler has trouble with pointer analysis, it
> will use the same stream to avoid aliasing problems.
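
A rough sketch of the 8-line, stream-indexed variant, to show what the software guarantee buys. The names are hypothetical, and both the per-line address tag and the write-through store are assumptions of mine; the proposal only fixes the 3-bit stream number.

```python
NUM_STREAMS = 8  # 3-bit stream number, as described in the manual


# Toy model of a stream-indexed buffer (all names are hypothetical).
class StreamL0:
    def __init__(self, memory):
        self.memory = memory               # backing store: address -> value
        self.lines = [None] * NUM_STREAMS  # stream -> (address, value) or None

    def load(self, stream, addr):
        line = self.lines[stream]
        if line is None or line[0] != addr:
            # Miss: refill this stream's single line from memory.
            self.lines[stream] = (addr, self.memory[addr])
        return self.lines[stream][1]

    def store(self, stream, addr, value):
        # Only this stream's line is updated; other streams are not checked.
        self.lines[stream] = (addr, value)
        self.memory[addr] = value


memory = {0x100: 1, 0x200: 7}
l0 = StreamL0(memory)

# Possible aliases kept on the same stream stay coherent.
l0.load(stream=3, addr=0x100)
l0.store(stream=3, addr=0x100, value=42)
same_stream = l0.load(stream=3, addr=0x100)

# The same location touched through two streams breaks the guarantee.
l0.load(stream=1, addr=0x200)
l0.store(stream=2, addr=0x200, value=8)
cross_stream = l0.load(stream=1, addr=0x200)

print(same_stream, cross_stream)  # prints "42 7"
```

The second case returns the stale 7 because stream 1 never sees the store made on stream 2. That is exactly the situation the compiler has to rule out, or avoid by falling back to a single stream when pointer analysis fails.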
> 
> That was for handling data. For program code, the previous trick could
> be used safely.
> 
> Comments?

It looks like someone should set up some compiler directive paper.
The more complex the compiler gets, the longer you will wait
for one to come that is capable of doing all this stuff. A new
processor is nothing without compiler/debugger support. The
latest example was one from Germany called Hyperstone, which
had mixed-size instruction lengths and was thus able to avoid
a pipeline stall at context switch with just a 2-word cache.

JG

*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/