
Re: [f-cpu] project sources and report



Pierre Tardy wrote:

http://fc.isima.fr/~tardy/fcpu

still reading the report in PDF.

- check the chapter and page numbers where they are used as references. i've seen
an unresolved "see chapter XX" and, on page 16, "figure ?? page ??"

- s/prefetcher/fetcher/ :-)

- around page 19, some interesting thoughts about
register/LSU line associativity. Concerning the pointer copy issue,
my basic idea is that the core is already complex enough
and it would be hard to implement something reliable that
would guess whether the LSU line association is copied. Think
for example about the special cases to handle in the case
of an ongoing SRB, if a "new" task copies data from a
register belonging to an "old" task : crash guaranteed.

there is a simple solution however : IIRC there is an instruction
that adds an immediate to a register that is believed to be a pointer.
the "load/store" instructions already perform the read_pointer-add-check_TLB-validate
operations but the same pointer is both source and destination.
i have to check if we have included the "pointer add" instruction,
and in this case, adding 0 to the source will indeed copy the pointer,
and the usual mechanisms will check the result (taking care of any race
condition, such as the line being invalidated just after the source pointer is read).

It is a compromise between performance, ease of development/simplicity
and scalability : the following F-CPU cores could implement completely different
methods to handle memory and pointers, so we must take care not to
include features that could be difficult to implement on other kinds of cores.

Adding pointer-specific instructions is a good example of the difficult
choices that we have to make : for example, adding them will probably
be useful to make fast and relatively simple cores in the future, that won't
have to keep track of whether a register is a pointer or not.
OTOH, including full support for these instructions in compilers
can be difficult and, if it's not done, the instructions have no reason
to be implemented in the core (typical chicken-and-egg problem).
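
To make the "add 0 to copy a pointer" idea concrete, here is a minimal toy
sketch in Python. It is not FC0 code : the ToyCore class, the register -> line
table, the page size and the "TLB" (a plain set of mapped pages) are all
assumptions made up for illustration. It only shows that a plain move carries
the value alone, while a pointer add with immediate 0 re-runs the
read-add-check-validate sequence on the result.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

PAGE_BYTES = 4096   # assumed page size, for illustration only
LINE_BYTES = 32     # one 256-bit LSU line


@dataclass
class ToyCore:
    """Hypothetical model: registers plus a register -> LSU line association."""
    mapped_pages: Set[int]
    regs: Dict[int, int] = field(default_factory=dict)
    line_of: Dict[int, Optional[int]] = field(default_factory=dict)

    def move(self, dst: int, src: int) -> None:
        """Plain copy: only the value travels, the association is dropped."""
        self.regs[dst] = self.regs[src]
        self.line_of[dst] = None

    def pointer_add(self, dst: int, src: int, imm: int) -> None:
        """Read pointer, add, check the (toy) TLB, then validate the result."""
        addr = self.regs[src] + imm
        if addr // PAGE_BYTES not in self.mapped_pages:
            raise MemoryError(f"address {addr:#x} is not mapped")
        self.regs[dst] = addr
        # The association is recomputed from the result, never copied from src,
        # so a stale source association cannot leak into dst.
        self.line_of[dst] = addr - addr % LINE_BYTES


core = ToyCore(mapped_pages={1})
core.regs[1] = PAGE_BYTES + 40
core.pointer_add(2, 1, 0)   # "add 0" copies the pointer through the usual checks
core.move(3, 1)             # plain move: value only, no association
```

In this model the core never has to guess whether an association was copied :
either the instruction is a pointer add and the result is checked like any
other address, or it is an ordinary move and no association exists.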

- you could/should have spoken a bit about the case where
data resides in I-cache. Ok, it is often the case for self-modifying-code
and "SMC is bad" but it also happens in trap handlers, where the
handler has to know what type of instruction triggered the error,
hence the ability to read and write in I-cache. In FC0, this is
done by routing the specified cache line through the common bus
shared by the LSU, the Fetcher,
the I-Cache, the D-Cache and the external memory interface.
Of course there's some penalty (several cycles) but it shouldn't
happen often anyway.

- Page 21 : the global architecture does not correspond to the FC0's
structure as shown on page 4.
In particular you show a 2-ported cache but in practice,
only a one-port cache is implemented. In FC0, the only multiported memories
are the LSU and the fetcher. Since they are smaller,
they are faster and can have more ports.
They are connected to the core, to the L1
AND to the external memory : it's the "central hub".

Furthermore, the "L0" (Fetcher and LSU) act as a "line construction buffer" :
the external memory usually has 64 or 128-bit wide words, but L0 and L1
are typically 256 bits wide. In FC0, the I-L1 is connected ONLY to the Fetcher,
and the D-L1 is connected ONLY to the LSU, while page 21
shows a more hierarchical system.

Both FC0's L1 arrays are simple (standard) cache arrays with
only 1 read port and 1 write port, each 256-bit wide and in almost direct
view with the core. To fill an L1 line, the algorithm is simple :
the specified L0 receives data chunks from the external memory interface,
chunk by chunk, and assembles them in a free line (in practice, this data
is immediately needed so the line allocation is not a worry).
Once completed, the whole line is sent to the L1.
Notice that the L1 has no word-wide validity tags, only line-wide tags.
The only finer-grained validity tags are byte-wide in the LSU and 32-bit wide in the Fetcher.

This simplifies coherency management a lot. On one hand, a whole
line must be transferred through the memory interface (except if non-cacheable
locations are read). This can be seen as overkill but most computers
behave like this and we don't have to manage the case where a line is partially
valid, which becomes a nightmare in a multi-CPU computer.

An important point to remember : the LSU and Fetcher ALWAYS
contain the most up-to-date version of the data they hold.
They are in direct contact with the core and are full-width
in order to remove any worry about validity and width.
It also allows the LSU to read and write I-Cache and, if needed,
update the Fetcher.
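
The L1 line fill described above (chunks assembled in an L0 line, then the
whole line sent to L1) can be sketched as follows. This is a rough Python
model, not the FC0 implementation : LineBuffer, L1Cache and fill_line are
hypothetical names, memory is a plain bytes object, and only the widths
(64 or 128-bit chunks, 256-bit lines) come from the text.

```python
LINE_BITS = 256                 # L0/L1 line width given in the text
LINE_BYTES = LINE_BITS // 8


class LineBuffer:
    """Hypothetical L0 line (LSU or Fetcher) used as a line construction buffer."""

    def __init__(self, chunk_bits: int) -> None:
        if chunk_bits <= 0 or chunk_bits % 8 != 0 or LINE_BITS % chunk_bits != 0:
            raise ValueError("chunk width must divide the line width")
        self.chunk_bytes = chunk_bits // 8
        self.data = bytearray()

    def receive(self, chunk: bytes) -> bool:
        """Append one chunk; return True once the line is complete."""
        if len(chunk) != self.chunk_bytes or len(self.data) >= LINE_BYTES:
            raise ValueError("unexpected chunk")
        self.data += chunk
        return len(self.data) == LINE_BYTES


class L1Cache:
    """Hypothetical L1 array: whole lines only, so validity is line-wide."""

    def __init__(self) -> None:
        self.lines = {}

    def write_line(self, line_addr: int, line: bytes) -> None:
        if len(line) != LINE_BYTES:
            raise ValueError("L1 accepts complete lines only")
        self.lines[line_addr] = line


def fill_line(memory: bytes, line_addr: int, chunk_bits: int, l1: L1Cache) -> int:
    """Assemble one line chunk by chunk in L0, then send it whole to L1."""
    buf = LineBuffer(chunk_bits)
    chunks = 0
    complete = False
    while not complete:
        start = line_addr + chunks * buf.chunk_bytes
        complete = buf.receive(memory[start:start + buf.chunk_bytes])
        chunks += 1
    l1.write_line(line_addr, bytes(buf.data))
    return chunks


memory = bytes(range(64))                      # two 256-bit lines of toy data
l1 = L1Cache()
chunks_64 = fill_line(memory, 32, 64, l1)      # 64-bit external bus
chunks_128 = fill_line(memory, 0, 128, l1)     # 128-bit external bus
```

Since write_line refuses anything but a complete line, the L1 in this model
can never hold a partially valid line, which is the whole point of building
the line in L0 first.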

- page 23 : "Use the LOOP, Luke !"
cmpg/jmpnz is an incredible overkill when the loop count is known !

- page 26 : the [2] link is wrong :-P
replace with http://f-cpu.seul.org/whygee/pres-isima/f-cpu_isima.html

I have not dug further,
and it's time to eat.
Have fun,
YG

