
Re: [f-cpu] F-CPU fetch unit



hi,

Bogdan Petrisor wrote:

hi, sorry for the delay, there were some crazy days at work...


no worries here.
look at what i've done recently :
http://whygee.club.fr/drosephylia
http://whygee.club.fr/LiIon/pack_gateway/index.html
http://whygee.club.fr/LiIon/references/index.html
http://whygee.club.fr/LiIon/pack_ultraportable/index.html
(end of french shameless autopromotion)

ok, so let's see if I got this straight.
The fetcher has N_LINES independent lines. The PC is output from the fetcher and it represents the
virtual (or physical) address of the currently delivered instruction. Each one of the lines can
get an instruction from the cache based on the page address returned by the TLB and the internal
counter from each line. Any loop, call, or jump is marked in the responsible line and that line
should keep the specific physical address for a while.

you seem to have understood the general picture, bravo :-)
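
To make that picture concrete, here is a minimal sketch of one Fetcher line. It is not the FC0 design: the 4 KiB page size, the 32-bit instruction width, the `marked` flag and the toy `tlb` dictionary are all assumptions made up for this illustration.

```python
from dataclasses import dataclass

PAGE_BITS = 12    # assumption: 4 KiB pages
INSTR_BYTES = 4   # assumption: fixed 32-bit instructions
PAGE_MASK = (1 << PAGE_BITS) - 1


@dataclass
class FetcherLine:
    page_base: int        # physical page address, as returned by the TLB
    offset: int = 0       # the line's internal counter (byte offset in the page)
    marked: bool = False  # hypothetical flag: a loop, call or jump refers to this line

    def address(self):
        """Physical address of the instruction this line currently points at."""
        return self.page_base | self.offset

    def advance(self):
        """Step to the next instruction; return True when the page boundary is crossed."""
        self.offset = (self.offset + INSTR_BYTES) & PAGE_MASK
        return self.offset == 0  # True means a new TLB lookup is needed


# Toy TLB (hypothetical): virtual page number -> physical page number.
tlb = {0x00400: 0x12345}

vaddr = 0x00400FF8
line = FetcherLine(page_base=tlb[vaddr >> PAGE_BITS] << PAGE_BITS,
                   offset=vaddr & PAGE_MASK)
first = line.address()
crossed = [line.advance(), line.advance()]
```

The point of the split is that only the page part depends on the TLB, so a line can keep stepping through a page with its own counter and needs a new translation only when `advance()` reports a page crossing.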

Is the line flushed after a timeout, or is
there a specific instruction for that?

it works like a cache line : Least Recently Used replacement policy (LRU).
This is less complex than for a cache because of the small number of lines (8 lines => 8x 3 bit counters)
[Cache memories are too large to handle large counters so they simplify the mechanism but lose predictability]
This way, when for example more than 8 loops are nested, the outer loop's starting instructions
are flushed from the Fetcher. The inside loop runs more often, so it minimizes penalties.
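
A small sketch of that replacement policy, assuming one 3-bit age counter per line (0 = most recently used, 7 = oldest). The class and method names are invented for this example; the real Fetcher logic may differ.

```python
N_LINES = 8  # number of Fetcher lines, so each age fits in 3 bits


class LruCounters:
    """One age counter per line: 0 = most recently used, N_LINES-1 = oldest."""

    def __init__(self, n_lines=N_LINES):
        # Start with distinct ages so exactly one line is always the oldest.
        self.age = list(range(n_lines))

    def touch(self, line):
        """Mark `line` as just used and age every line that was younger."""
        old = self.age[line]
        for i in range(len(self.age)):
            if self.age[i] < old:
                self.age[i] += 1
        self.age[line] = 0

    def victim(self):
        """Return the least recently used line (the one with the highest age)."""
        return self.age.index(max(self.age))


lru = LruCounters()
owner = [None] * N_LINES   # which loop's starting instructions each line holds

# Nine nested loops, outermost (0) first: each loop start claims a line.
for loop in range(9):
    line = lru.victim()
    owner[line] = loop
    lru.touch(line)
```

After the ninth loop has claimed a line, loop 0 (the outermost) is the one that got evicted, while loops 1 to 8 are still held. The ages always remain a permutation of 0..7, which is why 3 bits per line are enough.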


Other processors contain a call stack and other structures,
but FC0 implements it in a more "visible" and flexible way.
you can "return" from a function without needing an explicit "return" instruction (conditional jumps work !)
However, teaching compilers how to use that is not the easiest part.


Also it seems that there is a need to sometimes instruct the fetcher to begin fetching from an
address that is in the register set. Is this right so far?


right.
This happens so that said register is "associated" or "mapped" to a Fetcher's line.
The allocation is quite complex because all the above features work together at the same time,
and unwanted side effects may prove difficult to remove.
But the latency of filling that line must be hidden by an explicit prefetch.
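
As a rough illustration of why the prefetch matters, here is a toy model of the register-to-line association. Everything in it is hypothetical: the `Fetcher` class, `prefetch()`, `jump()` and the round-robin allocation (a stand-in for the LRU policy) are invented for this sketch and are not FC0 instructions or signals.

```python
class Fetcher:
    def __init__(self, n_lines=8):
        self.line_addr = [None] * n_lines  # address each line currently holds
        self.reg_to_line = {}              # register number -> line index
        self.next_line = 0                 # round-robin stand-in for the LRU policy

    def prefetch(self, reg, target):
        """Associate `reg` with a line and start fetching `target` into it."""
        line = self.next_line
        self.next_line = (self.next_line + 1) % len(self.line_addr)
        # Any register previously mapped to the reused line loses its mapping.
        self.reg_to_line = {r: l for r, l in self.reg_to_line.items() if l != line}
        self.line_addr[line] = target
        self.reg_to_line[reg] = line
        return line

    def jump(self, reg, target):
        """Return True if the target is already in a line (no fetch latency)."""
        line = self.reg_to_line.get(reg)
        if line is not None and self.line_addr[line] == target:
            return True
        self.prefetch(reg, target)  # late allocation: the latency is exposed
        return False


f = Fetcher()
cold = f.jump(5, 0x1000)      # no prefetch was issued for register 5
f.prefetch(6, 0x2000)         # explicit prefetch well before the jump
warm = f.jump(6, 0x2000)      # the line is already associated with register 6
```

Here `cold` is False (the jump has to wait for the line to be filled) and `warm` is True (the earlier prefetch hid the latency).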


The Fetcher does not replace entries in the TLB.
It fills the cache and its lines with data that belong to the current process, as indicated in the TLB.


yes, and the TLB is controlled by the OS. So basically the fetcher only has a red interface to the
TLB?


red ? you mean : read ? if so, right, it's obvious.
I don't see the need for the Fetcher or the LSU to write to the TLB
(except for updating some statistics like average use of a page for
profiling or dynamic Virtual Memory optimisation).

I also imagine that the kernel should have R/W access to the TLB,
so the TLB state can be saved for later reuse when a context switch happens.
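
A minimal sketch of that access split, assuming a software model where the TLB is a plain mapping. The `Tlb` class and its method names are invented for this example; `MappingProxyType` only plays the role of a read-only port.

```python
from types import MappingProxyType


class Tlb:
    def __init__(self):
        self._entries = {}  # virtual page -> physical page
        # Read-only view: what the Fetcher and the LSU would see.
        self.read_port = MappingProxyType(self._entries)

    # The methods below model the kernel's R/W access.
    def map(self, vpage, ppage):
        self._entries[vpage] = ppage

    def save(self):
        """Snapshot the TLB state, e.g. before a context switch."""
        return dict(self._entries)

    def restore(self, state):
        """Reload a previously saved state."""
        self._entries.clear()
        self._entries.update(state)


tlb = Tlb()
tlb.map(0x400, 0x12345)          # process A's mapping
fetcher_view = tlb.read_port

state_a = tlb.save()
tlb.restore({0x400: 0x54321})    # switch to process B
seen_b = fetcher_view[0x400]
tlb.restore(state_a)             # switch back to process A
seen_a = fetcher_view[0x400]

try:
    fetcher_view[0x400] = 0      # the read port rejects writes
    write_ok = True
except TypeError:
    write_ok = False
```

The Fetcher-side view follows whatever the kernel loads, but it cannot modify an entry itself.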

The "execution units" (in the "execution pipeline") are quite straightforward to design,
they have an obvious interface (data in and out, plus all the necessary flags) and they
can stand alone. However, as you see, the rest is not easy at all, because of their interactions
and collective work.


But the execution units are not all finished.


hey, somebody's got to start the hard part eventually :)


That's one reason why i started to play with the VSP :
the core is much simpler than FC0 but the memory interface
is quite similar and keeps most of the complex features.

read you soon,
YG
