
Re: [f-cpu] virtually or physically-addressed cache ?



hello,

Michael Riepe wrote:
> On Thu, Feb 28, 2002 at 03:09:23PM +0100, Christophe wrote:
> > Virtual or physical addressing ?
> > -------------------------------------------
> > (1) virtually-addressed caches (virtual tags)
> >
> > + do address translation only on a cache miss
> > + faster for hits because no address translation
> > - cache flushing on a context switch (example: without flushing, local data
> > segments will get an erroneous hit on virtual addresses that were cached
> > before the virtual address space changed).
> > - synonym problem (several different virtual addresses cannot map to the same
> > physical addresses without being duplicated in the cache).
> Rejected. This one causes severe problems.

However, it is interesting to see that this is a characteristic point of
divergence between the SW and the HW guys.

> > (2) physically-addressed caches (physical tags)
> >
> > - do virtual-to-physical address translation on every access
> 
> Not necessarily. The TLB lookup can be started as soon as loadaddr (or
> one of its variants) is called, and doesn't have to be repeated in all
> cases (e.g. with a postincremented pointer, you'll perform a range check
> first and only do a full lookup if the range check fails).

right.
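
To make the range-check idea concrete, here is a small Python model (a sketch
only, not the F-CPU design). It assumes 4KB pages, takes "range check" to mean
"same page as the last translation", and uses a plain dict as a stand-in for
the TLB. The class name, the dict and the counter are all invented for the
illustration.

```python
PAGE_BITS = 12  # assumption: 4KB pages


class CachedTranslation:
    """Remembers the last translation and redoes the lookup only on a page change."""

    def __init__(self, page_table):
        self.page_table = page_table  # hypothetical stand-in for the TLB
        self.last_vpage = None        # virtual page of the cached translation
        self.last_ppage = None        # physical page of the cached translation
        self.full_lookups = 0         # counts how often the "TLB" is consulted

    def translate(self, vaddr):
        vpage = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if vpage != self.last_vpage:
            # Range check failed: do the full lookup and cache its result.
            self.full_lookups += 1
            self.last_ppage = self.page_table[vpage]
            self.last_vpage = vpage
        return (self.last_ppage << PAGE_BITS) | offset


# A pointer post-incremented by 4 bytes that crosses one page boundary.
unit = CachedTranslation({0x1: 0x80, 0x2: 0x90})
physical = [unit.translate(a) for a in (0x1FF8, 0x1FFC, 0x2000)]
```

In this run only two of the three accesses need a full lookup: the first one,
and the one that crosses into the next page.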

> > - increase in hit time, because the virtual address must be translated
> > before accessing the cache
> See above - the LSU may cache the results of the TLB lookup.

The caching does not reduce the latency, so on that point you can be proved
wrong. However, you are right in saying that the LSU caches the TLB lookup:
the Load/Store Unit has to store both the physical and the logical address
of each line.

But one can remark that the logical and physical addresses differ only
in their most significant bits (the page number), so we only store the
diverging part. And this part is necessarily also present in the TLB!
This means that there is a link of some sort between the LSU and the TLB,
which reduces both the amount of memory (and address comparators) and
the latency.

--------------------------------------------------------------------------------
Here I will explain how the LSU and the TLB work hand in hand, both
in physical and virtual mode.

I will assume (for simplicity) that we have a 4KB page system only
(this is only for the explanation, because several page sizes are
wanted).

When a new address is available (from the Xbar or the ASU output),
it is sent to both the LSU and the TLB.

The new address (physical or virtual, it does not matter with this scheme)
is split into 2 parts: the index in the page (a 12-bit number) and
the page number (the rest of the address field, whose width is configurable
and implementation-dependent [you can configure it to 16 bits, 32 bits or
anything you need]).

The index is then truncated: because the LSU lines are 256 bits (32 bytes)
long, only the upper 7 of its 12 bits are kept. This part of the address is
compared with the corresponding part of each of the 8 LSU lines. This is a
rather simple part, no?
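
The split described above can be written down as a few lines of Python (a
sketch for illustration only). It assumes byte addressing, so that a 256-bit
line covers 32 bytes and 5 offset bits are dropped; the constant and function
names are mine.

```python
PAGE_BITS = 12                                   # 4KB pages, as assumed above
LINE_OFFSET_BITS = 5                             # assumption: 256-bit line = 32 bytes
LINE_INDEX_BITS = PAGE_BITS - LINE_OFFSET_BITS   # the 7 bits that are kept


def split_address(addr):
    """Return (page_number, line_index) for a physical or virtual address."""
    page_number = addr >> PAGE_BITS               # the "MSB" part
    page_index = addr & ((1 << PAGE_BITS) - 1)    # 12-bit index in the page
    line_index = page_index >> LINE_OFFSET_BITS   # truncated to 7 bits
    return page_number, line_index


page_number, line_index = split_address(0x12345678)
# page_number == 0x12345, line_index == 0x33
```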

Then it becomes a bit more complex. The remaining part of the address is
sent to both the LSU and TLB for comparison.
 - For the LSU lines, it is simple : if there is a hit on both the LSB
   (compared above) and the MSB, there is a hit on the line.
 - For the TLB : if there is a hit on a TLB entry, then we have to know
   which LSU line is "connected" to this entry. There are 8 flags,
   one for each LSU line. Each of these flags will validate the corresponding
   LSB hit on the LSU lines.

Finally, there is a MUX that selects the "hit" output, depending on the operating mode
(virtual or physical addressing).
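
While waiting for the drawing, the hit logic can be modelled in Python (a
behavioural sketch only, not real hardware). Two things are assumptions: that
the MUX picks the TLB-validated hits in virtual mode and the direct LSU
comparison in physical mode, and that each line stores a single page number.
All class and function names are invented for the illustration.

```python
from dataclasses import dataclass, field
from typing import List

NUM_LINES = 8  # the 8 LSU lines


@dataclass
class LsuLine:
    valid: bool = False
    line_index: int = 0    # the 7-bit "LSB" part
    page_number: int = 0   # the "MSB" part stored with the line


@dataclass
class TlbEntry:
    page_number: int       # page number matched by this entry
    # One flag per LSU line: True if that line is "connected" to this entry.
    line_flags: List[bool] = field(default_factory=lambda: [False] * NUM_LINES)


def line_hits(lines, tlb, page_number, line_index, virtual_mode):
    """Return one hit flag per LSU line."""
    # LSB comparison, shared by both paths.
    lsb_hit = [ln.valid and ln.line_index == line_index for ln in lines]

    # LSU path: a hit needs both the LSB and the MSB to match.
    lsu_path = [lsb and ln.page_number == page_number
                for lsb, ln in zip(lsb_hit, lines)]

    # TLB path: the flags of the matching entry validate the LSB hits.
    flags = [False] * NUM_LINES
    for entry in tlb:
        if entry.page_number == page_number:
            flags = entry.line_flags
            break
    tlb_path = [lsb and flag for lsb, flag in zip(lsb_hit, flags)]

    # The final MUX (the mode-to-path mapping is an assumption).
    return tlb_path if virtual_mode else lsu_path


lines = [LsuLine() for _ in range(NUM_LINES)]
lines[2] = LsuLine(valid=True, line_index=0x33, page_number=0xABC)
entry = TlbEntry(page_number=0x12345)
entry.line_flags[2] = True

virtual_hit = line_hits(lines, [entry], 0x12345, 0x33, virtual_mode=True)
physical_hit = line_hits(lines, [entry], 0xABC, 0x33, virtual_mode=False)
```

In both calls only line 2 reports a hit. The software model computes the two
paths one after the other, whereas the hardware would evaluate them in
parallel.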

As you can guess, when several page sizes are needed, the mechanism becomes
a bit more complex. It is still feasible, but at what cost?

I should make a drawing, now.
--------------------------------------------------------------------------------

> > + no cache flushing on a context switch
> > + no synonym problem (several different virtual addresses can span the same
> > physical addresses : a much better hit ratio between processes)
> Looks fine.
these are also important factors for me.

<snip>

> > Conclusion - which one is better ?
> I prefer (2), for the moment.
i'll stick with that, too.
but of course, this could evolve at an unknown rate.

>  Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
WHYGEE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~