[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: spinlocks (was Re: [f-cpu] another DATE report)

On Mon, Mar 18, 2002 at 01:54:27PM +0100, Christophe wrote:
> > > Indeed, to avoid this problem when task switching, the usual workaround is
> to
> > > disable interrupts before acquiring and after releasing the spinlock. But
> such
> > > a thing is acceptable if time is very short between locking and unlocking,
> that
> > > is, not using a blocking function in-between.
> >
> > huh ? if CAS2 requires the task to turn off IRQs, then i don't see why it is
> superior
> > to CAS. Are you sure there is no such routine that does what you describe
> with a
> > series of simple CAS and without touching the IRQs ?
> !?!? what is this absurdity !? noooooooooo !!! I never told that !!!! Im not
> speaking about CAS2 !!!! but the SPINLOCK as a replacement for CAS2 as PROPOSED
> by M.Rieppe !!!!!!!

With a single "p", please. On the other hand, you may also call me Michael.

> Please reread above : "well, there is a problem with the spinlock version".  So
> the problem comes from spinlock not from CAS2 since i stated : "Such a thing
> cannot happen with CAS2".
> by the way, spinlock is a BLOCKING synchronisation and CAS/CAS2 is a
> non-blocking synchronisation. indeed, I explain how to make a spinlock with a
> CAS, that is turning a non-blocking synchronisation in a preemption-safe lock
> (if we disable interrupts too in the loop).
> in M.Rieppe version, no task switching must occur (because of the problem
> above) so unless using a more complicated preemption-safe mechanism, we disable
> all interrupts - even the IRQs which are not concerned by this synchronisation
> :
> void atomic_push (struct stack *stack,struct stack_node *element) {
>     disable_interrupts ();
>     while (!CAS(&stack->spinlock, 0, 1)); // unknown delay

With the CAS2 solution, the delay isn't known either -- you never know
how many times the CPU has to process the retry loop.

>     element->link = stack->top;
>     stack->top = element;
>     stack->version++;
>     stack->spinlock = 0;
>     enable_interrupts ();
> }
> In my CAS version, a task switch can occur without any problem and all IRQs can
> run freely too.

At the cost of having to retry the update operation if the CAS2 fails.

> In fact if we
> don't need a version stamping to prevent us from the ABA problem, a CAS would
> be enough and more efficient than a spinlock.

Depends. Consider the following examples:

	with_spinlock() {
		disable_irq_and_lock();	// may delay indefinitely long
		x = long_computation(x);

	with_cas() {
		do {
			oldx = x;
			newx = long_computation(oldx);
		while (!CAS(&x, oldx, newx));	// may repeat indefinite times

First of all, in with_cas() we silently assume that `oldx = x;' is itself
an atomic operation, and that the probably complex data structures pointed
to by `x' don't change while long_computation() runs -- which may not be
the case. That is, you may need to turn off IRQs and, in case you have an
MP system, put a spinlock around the loop body -- even with CAS (or CAS2).
And of course you must not schedule.

Second, the delay argument: with_spinlock() runs for t(lock) + t(compute)
+ t(unlock), where t(lock) is indefinite. with_cas() takes n * (t(compute)
+ t(cas)), where n (the number of tries) is indefinite. In the optimal
case (no lock/miscompare), n becomes 1, and the times are:

		t = t(lock) + t(compute) + t(unlock)
		t = t(compute) + t(cas)

Assuming that t(lock) >= t(cas), the spinlock is slightly slower. But
that doesn't matter much if t(compute) is one or more orders of magnitude
bigger than t(cas).

If n equals 2, the times are

	    t = t(lock) + t(compute) + t(unlock)
		t = 2*(t(compute) + t(cas))

Assuming that all tasks perform similar computations, we get

	0 < t(lock) <= t(compute)

or an average of t(lock) = t(compute) / 2.  That is, the spinlock variant
may be faster in this case, because it checks the lock more frequently
than the CAS solution.

Note that this does not mean that I don't like CAS. Au contraire -- we
definitely need an atomic read-modify-write operation, and this one
looks pretty good to me, although it doesn't solve all problems.

CAS2, on the other hand, is too heavy for a hardware implementation.
We can't handle 6 input operands, no matter how hard we try.

 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/