
Re: [f-cpu] virtually or physically-addressed cache ?



hi,

Marco Al wrote:
> From: "Christophe" <christophe.avoinne@laposte.net>
> > Virtual or physical addressing ?
> > -------------------------------------------
> > (1) virtually-addressed caches (virtual tags)
> >
> > + do address translation only on a cache miss
> > + faster for hits because no address translation
> 
> Another plus to this method: it's the only way to go with software-managed
> address translation, AFAICS.

?????

Currently I am designing a physically-addressed cache (2).
It adds a translation cycle, but I don't see any other problem.
And since the access is pipelined and the pipeline splits the operation
(addressing and access), the TLB lookup is not part of the visible
software latency (unless you program badly).

I do not understand the (3) version.
I prefer to stick with (2) because, in case of simultaneous access
to the memories by different devices, the overhead is reduced.
More precisely, the memory hierarchy in a multi-CPU system is
globally/physically addressed but distributed among the CPUs:
each CPU has a "private" RAM range (allocated during bootup),
and when it wants to access a location outside of its private range,
it goes through the interconnection network.

The CPU can thus be seen as a "node" where several data flows
converge: interconnect, L1, L2, private RAM, the CPU core and even
(why not?) an HDD interface or some kind of I/O. All these flows must
work with the same addressing system, or problems will arise.

In SMP (usually dual-CPU) PCs, it makes sense to have snooped
caches because only one CPU can be master of the SMP bus at a time. But
when there are tens or thousands of CPUs (some HPC monsters have 10K
CPUs), a MESI-like protocol is not possible because there is
no bus to snoop! A bus is impossible to set up: we need a "network",
with packets containing the destination address in the header,
so that each packet can be routed through the network. Usually it is a
"fat tree", an "omega network", a "hypercube" or whatever.

When a CPU node receives a packet that requests a memory access
(read or write), the CPU knows that it is the destination, because
otherwise the packet would have been routed elsewhere. It then has to
know where the requested address is mapped: L1, L2, SDRAM?
Working with "physical" addresses removes a TLB and all the risks
of aliasing, as well as removing any need for countermeasures.

Concerning the cost of physical addressing from the CPU's point of view,
I think I have found a way to reduce it to that of flat addressing
(i.e. in "kernel mode", when there is no TLB access). This also simplifies
the pipeline scheduler because there is no need to modify the timing:
kernel mode and virtually-addressed mode take the same time.
The big disadvantage is that it works better with only one page granularity :-/

more on this later.

> Marco
WHYGEE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/