[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [f-cpu] CAS in FC0



hello,

Christophe wrote:
> From: Michael Riepe
> > On Tue, Mar 19, 2002 at 03:29:28AM +0100, Yann Guidon wrote:
> > [...]
> > > in fact, the real problem is to think of CAS as an "instruction".
> > > Things become _so_easy_ if we use a combination of instructions !!!
> > > We do not need any "CAS instruction" but a variant of the usual
> > > load and store instructions, just like the locked versions in ALPHA
> > > (and other CPUs).
> > >
> > > * "load locked" will "tag" the line it selected for reading. it's just a bit
> > > that the issue logic has to set after decoding the instruction.
> > Or a bunch of bits (e.g. one per byte, or maybe 64-bit word, inside the line).
> >
> > > * "store conditional" will proceed if two conditions are met : the specified
> > > condition is true (we can check for LSB, MSB, zero...) and the "tag" is still
> > > set.
> 
> Which condition ? your tag should tell us if someone else has written to this
> line before your store conditional. So there is no condition to test
> beforehand.

you forget the purpose of the comparison. On monday, you wrote :
> - a single word compare-and-swap : cmpxchg reg1,[reg2]
> 
> atomic {
>     if (eax == *reg2) {
>         *reg2 = reg1;
>         zf = 1; /* zero flag */
>     } else {
>         eax = *reg2; // update the requested value since it has changed
>         zf = 0;
>     }
> }

so the store to memory is conditioned by the result of the comparison
AND the integrity of the dirty flag.
Unless i don't understand your C code.

> Your tag is set with the instruction "load tag", but when someone write at the
> same word, you must clear this tag so the "store conditional" would be warned
> that someone else takes priority. Any "store" (conditional or not) at this word
> must clear this tag.
yes, that's it. but the store for this case has to check 2 conditions,
instead of only one.


> > > I often wondered if it was worth it to add a conditional store, but now
> > > i'm convinced. Except that we need a version that checks the "tag".
> 
> I'm pleased to hear you :). What you are speaking about is in fact "ll/sc"
> (load-linked/store-conditional), what i spoke about in the french mailing-list.

I was thinking aloud about the rationale Intel gave about
the agressive predication in the IA64 family.
They say that it reduces the pressure on the loops (no need to
unroll), but the prolog and epilog require some special treatment.
Having a conditional store simplifies the design of the prolog and epilog
(just use the loop counter as a condition for the store), because
it removes the risk of triggering traps or page misses at the extremities
of the loop.

OTOH, if someone uses post-incremented instructions, there is not
much risk either but the increment field is the same as the condition,
so we can't use them at the same time. But it's useful anyway in
some "tight" cases (reduces the need for jumps)

Using a more sophisticated conditional store makes the initial idea
plausible and even useful.

> > > What about the scheduling ? it's almost as fast, if there are all the
> > > bypass networks !
> > >
> > >   load_tag [r1],r2
> > >   xor  r2,r3,r4
> > >   if r4==0 store_locked [r1],r3
> > >
> > > (or something like that)
> 
> And store_locked should return the tag in a register to know if store has be
> done. This register can be zero if tag is clear (someone else has already
> modified our word).

yup. However, notice that most fields in the locked store are used :-/

> how to implement a software CAS :
> 
> // int CAS (int * /* pointer */ r1,int /* old value */ r2,int /* new value */
> r3)
> loadaddr 0f,r5
> 0:

this can be also written as :
   loopentry r5

> move r2,r4
> load_tagged [r1],r2         // r2 : read value and set tag for [r1]
> xor r2,r4,r6
> jump.nz r6,r0,r5               // the value is not that we expect !

oops, the pointers are in the middle, so it should be
  if r6==0 jump r5; /* YG "clear" syntax */

There is also another form : loop is more useful for creating
a timeout, with other parameters.

> store_tagged r3,[r1],r6   // r3 : value to write into [r1] if tag still set
> (tag is cleared afterward)
> jump.z r6,r0,r5                 // tag was cleared before our store !
so r6 is the result ?
And you spare a conditional store by inserting another jump
in the middle of the loop ?

> nxor r2,r0,r3                     // r3 != 0 if [r1] == r2 until writing r3
> into [r1]
what is this instruction ?

> jump r0,r63                     // return to caller with result in R1
> 
> how to implement a PUSH :
> 
> loadaddr 0f,r5
> move r2,r3 // our node address to push
> 0:

idem : "loopentry r5" is here for this purpose.

> load_tagged [r1],r2       // read top
> store r2,[r3]                    // node->link = top
> store_tagged r3,[r1],r4 // top = node
> jump.nz r4,r0,r63             // return to caller if ok
> jump r0,r5

mmm to me, it looks more like an element that is inserted
in a linked list, but who cares, it seems to do the job.

> > If a task switch (or IRQ service) occurs between load_tag and
> > store_locked, the tag may change behind the program's back (it might
> > be cleared and set again - the ABA problem). In order to avoid that,
> > a task switch should reset all tags.
> 
> Yes, that's the main backdraft.

but there is certainly a nice solution.

WHYGEE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
np : Bill Laswell / Jah Wobble / Nils Peter Molvaer in RADIOAXOM.
wow maaan......
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/