[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[f-cpu] Preemption-safe locking and non-blocking (lock-free) algorithms



Please read this attached PDF file if you want have a global sight of what are
preemption-safe locking and non-blocking algorithms.

Some notes about the ABA problem : if a process reads a value A in a shared
location, computes a new value, and then attempts a compare-and-swap operation,
the compare-and-swap may succeed when it should not, if between the read and
the compare-and-swap some other process(es) change the A to a B and then back
to an A. A common solution is to use a version stamp, that is a counter of
modification along with the data to be modified. It is the reason why I use it
in my example of push and pop functions in my last email.

But if you use a stack for free elements for example (i.e, contents of element
don't matter), a simple CAS for "top" is enough in so far the ABA problem
doesn't invalidate.

Intel pentium have :

- a single word compare-and-swap : cmpxchg reg1,[reg2]

atomic {
    if (eax == *reg2) {
        *reg2 = reg1;
        zf = 1; /* zero flag */
    } else {
        eax = *reg2; // update the requested value since it has changed
        zf = 0;
    }
}

As you can see, it chooses in fact to update either the memory location when
the requested value matches or update the requested value if not so you don't
need to reread it for the next try and save a read memory access bu reusing the
one in the READ-MODIFY-WRITE operation :

    if requested value is equals to memory content

so an atomic push function for free integer location would be shorter because
we don't need to reread requested value :

void atomic_push (int *top,int *node) {
    // we suppose %top and %node are registers
    register int requested_top asm ("%eax");
    asm {
        mov            (%top),%eax    // requested_top = *top;

0:     mov            %eax,(%top)    // *((int **)node) =  requested_top
        cmpxchg    %node,(%top)
        jnz    0b
    }
}

but a spinlock would be if used in a kernel mode (in a user mode, it is
impossible to disable interrupt so we would need to enter a kernel trap) :

void spinlock (int *pointer) {
    // we suppose %pointer is a register
    register int requested_value asm ("%eax");
    register int new_value;
    register int eflags;
    asm {
        pushfd
        mov            $1,%new_value
        cli                                            // disable interrupts
0:     mov            $0,%eax            // requested_value = 0;
        cmpxchg    %new_value,(%pointer)
        jnz    0b
        popfd                                    // reenable interrupts
    }
}

- a limited double word compare-and-swap : cmpxchg8b [reg1]

atomic {
    if (eax == ecx && edx == ebx) {
        *(reg1+0) = ecx;
        *(reg1+1) = ebx;
        zf = 1; /* zero flag */
    } else {
        // update the requested values since they have changed
        eax = *(reg1+0);
        edx = *(reg1+1);
        zf = 0;
    }
}

so an atomic push function would be shorter because we don't need to reread
requested value :

struct stack { struct stack_node *top; int version; };

void atomic_push (struct stack *stack,struct stack_node *node) {
    // we suppose %stack and %node are registers
    register int requested_top asm ("%eax");
    register int requested_version asm ("%edx");
    register int new_top asm ("%ecx");
    register int new_version asm ("%ebx");
    asm {
        mov            0(%stack),%eax  // requested_top = stack->node;
        mov            4(%stack),%edx  // requested_version = stack->version;
        mov            %node,%ecx

0:     mov            %eax,(%node)    // node->link =  requested_top
        lea              1(%edx),%ebx    // new_version = 1 + requested_version
        cmpxchg8b (%stack)
        jnz    0b
    }
}

- a word fetch-and-add : xadd reg1,[reg2]

atomic {
    eax == *reg2;
    *reg2 = eax + reg1;
}



What could be my proposal for a modified CAS :

cas r1,r2,[r3] :
IN :
    r1 : requested value, r2 : new value, r3 : pointer
OUT :
    r1 : new requested value, r2 : 0 if CAS successes

READ:
   - read the actual value in memory location.
MODIFY:
   - if the actual value equals to the requested value, set the requested value
with the new value else set the requested value with the actual value
   - new value is set to true if it equals to requested value.
WRITE:
   - write the requested value in memory location.

atomic {
    *pointer = (requested_value = ((actual_value = *pointer) ==
requested_value) ? new_value : actual_value);
    new_value ^= request_value; // new_value set to 0 will tell us that CAS
successes
}

which looks as if you do :

atomic {
    if (*pointer == requested_value)
        *pointer = requested_value = new_value;
    else
        requested_value = *pointer;
    new_value ^= requested_value;
}

That way, we do not need to reread the requested value. The new value if not
changing between retries can be saved in another register :

        move          top,r3
        move          node,r4
        load            [r3],r1        // requested_top = *top;
        loadaddr    0f,r16
0:     move          r4,r2
        store           r1,[r3]        // *((int **)node) =  requested_top
        cas              r1,r2,[r3]
        jump.nz       r2,r0,r16



*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/