Re: [f-cpu] CAS in FC0
Michael Riepe wrote:
> On Tue, Mar 19, 2002 at 03:29:28AM +0100, Yann Guidon wrote:
> > in fact, the real problem is to think of CAS as an "instruction".
> > Things become _so_easy_ if we use a combination of instructions !!!
> > We do not need any "CAS instruction" but a variant of the usual
> > load and store instructions, just like the locked versions in ALPHA
> > (and other CPUs).
> > * "load locked" will "tag" the line it selected for reading. it's just a bit
> > that the issue logic has to set after decoding the instruction.
> Or a bunch of bits (e.g. one per byte, or maybe 64-bit word, inside the line).
maybe 64-bit words, but not bytes. byte-wise "dirty" bits are already heavy enough :-/
Plus, if the narrowest external interface is 32 bits wide, then the LSU needs
only 32-bit-wise dirty bits. Write enables will still be byte-wise, however,
because we can't always avoid byte operations.
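To make the difference between the two granularities concrete, here is a
small Python sketch (not FC0 code). It computes, for one store, the byte-wise
write enables and the 32-bit-wise dirty bits. The 32-byte line size and the
name write_masks are assumptions chosen only for this illustration.

```python
LINE_BYTES = 32   # assumed line size, for illustration only
WORD_BYTES = 4    # 32-bit dirty granularity, as suggested above


def write_masks(offset, size):
    """Return (byte_write_enables, word_dirty_bits) for a store of
    `size` bytes starting at byte `offset` inside one line."""
    if size <= 0 or offset < 0 or offset + size > LINE_BYTES:
        raise ValueError("store must fit inside one line")
    # One write-enable bit per byte that is actually written.
    byte_we = ((1 << size) - 1) << offset
    # One dirty bit per 32-bit word that the store touches.
    first = offset // WORD_BYTES
    last = (offset + size - 1) // WORD_BYTES
    dirty = ((1 << (last - first + 1)) - 1) << first
    return byte_we, dirty
```

A single byte store still sets exactly one write enable, but it marks the
whole 32-bit word around it as dirty.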
> > What about the scheduling ? it's almost as fast, if there are all the
> > bypass networks !
> > load_tag [r1],r2
> > xor r2,r3,r4
> > if r4==0 store_locked [r1],r3
> > (or something like that)
I should have been more precise :
it is as fast (in terms of cycle count) as a "CAS" instruction
that does everything itself. Having more instructions does not
make the overall operation slower : because all units communicate
through the "Xbar", issuing the right instructions at the right
time does the same thing. More specifically :
load_tag [r1],r2
xor r2,r3,r4
*** at least 1 unused cycle in FC0 ***
if r4==0 store_locked [r1],r3
if you remember the last email, i pointed out that there is a little
problem with the hypothetical CAS instruction : the loaded data
needs an additional cycle (alignment/shift/endianness) and the
data from the register set is already in ROP2 (or whatever) when
the loaded data finally arrives on the Xbar. Inserting a buffer
on one of ROP2's operands is not a solution.
The problem disappears by itself when we use the split version :
the loaded data and the register (for comparison) appear at the
same time on the Xbar and ROP2 needs no modification, there
is no bizarre scheduling rule to implement. However, ROP2 takes
at least 1 cycle before completion and the result bits must be ORed
together so we know whether the result is zero. The OR is performed
during the register write-back cycle and influences the decoding
and issue of the next instruction, so i think that the gap
is around 3 cycles between the xor and the store.
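As a reminder of what the xor plus the OR actually compute, here is a tiny
Python sketch (a software model, not a description of the ROP2 hardware).
The 64-bit width and the function name are assumptions for the example.

```python
WIDTH = 64  # assumed register width for this sketch


def xor_then_or_reduce(a, b, width=WIDTH):
    """Model of the compare step: xor the two operands, then OR all
    result bits together. Returns True when the operands are equal."""
    mask = (1 << width) - 1
    diff = (a ^ b) & mask          # what the xor would write to r4
    any_bit = 0
    for i in range(width):         # OR-reduction over every result bit
        any_bit |= (diff >> i) & 1
    return any_bit == 0            # True means "r4 == 0": the store may issue
```

The xor alone does not give the condition; it is the reduction over all the
bits that does, which is why it costs time before the conditional store can
be issued.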
> If a task switch (or IRQ service) occurs between load_tag and
> store_locked, the tag may change behind the program's back (it might
> be cleared and set again - the ABA problem). In order to avoid that,
> a task switch should reset all tags.
i get it, it sounds obvious...
> Oh btw: store_locked must return an indication that the store succeeded
> (that is, it's a 3r1w instruction), and of course it has to clear the tag.
as i wrote before, _all_ stores clear the tag, so it's not a problem :-)
Concerning the tag write : i should reread my posts more carefully before hitting "send".
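To check that the rules discussed so far fit together, here is a toy Python
model (one memory word, one tag bit). It is only a sketch of the semantics
as i understand them from this thread : load_tag sets the tag, every store
clears it, a task switch resets it, and store_locked reports whether it
succeeded. The class and the "interfere" hook are invented for the example.

```python
class TaggedLine:
    """Toy model of one memory word guarded by a single tag bit."""

    def __init__(self, value=0):
        self.value = value
        self.tag = False

    def load_tag(self):
        self.tag = True            # "load locked" tags the line
        return self.value

    def store(self, value):
        self.value = value
        self.tag = False           # every store clears the tag

    def store_locked(self, value):
        ok = self.tag              # succeeds only if the tag survived
        if ok:
            self.value = value
        self.tag = False           # cleared whether it succeeded or not
        return ok

    def task_switch(self):
        self.tag = False           # a task switch resets all tags


def cas(line, expected, new, interfere=None):
    """CAS built from the split sequence: load_tag, xor, store_locked."""
    old = line.load_tag()
    if interfere is not None:
        interfere(line)            # something happens in between
    if old ^ expected != 0:        # the xor / "if r4==0" test
        return False
    return line.store_locked(new)
```

With this model, an intervening "store 5, then store 1 again" makes the
store_locked fail even though the value looks unchanged, and so does a
task switch between the load and the store.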
I still have to make clear how we are going to support CAS from a
remote CPU through the I/O interface. It's not as easy as some people think.
However, i might have "yet another brilliant idea" which will help
solve some remaining problems... but you'll have to wait a bit,
i want to be _really_ sure that i'm not writing or thinking shit.
who knows ;-) However i hope i reassured Christophe about my
"understanding" of the problem. I hope i don't look as out of place
and off-topic as i did before, according to some of his past posts :-))
I wish that some people now understand that if i seem to contradict
them, it's not because i'm hiding in an ivory tower. I can find some
solutions and compromises but it can only happen if the others have
the same desire and patience.
> Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>