
Rep:Re: [f-cpu] CAS in FC0



-----Original Message-----
From: "Christophe" <christophe.avoinne@laposte.net>
To: <f-cpu@seul.org>
Date: 21/03/02
Subject: Re: [f-cpu] CAS in FC0


----- Original Message -----
From: Michael Riepe <michael@stud.uni-hannover.de>
To: <f-cpu@seul.org>
Sent: Thursday, March 21, 2002 12:05 PM
Subject: Re: [f-cpu] CAS in FC0


> > > One more point to consider: Do we really need an extra load_tag?
> > > Tagging could as well be done for all loads.
> >
> > That is what I said to Yann. If we have no equivalent of llp/scp, I
> > suppose we don't need an extra load_tag in fact.
> >
> > One PDF file talks about a load conditional, that is, a load which can
> > fail if locked (like our store conditional). I don't know the advantage.
> > I should reread this PDF.
>
> Me too.
>

Here is the extract where I read about a load conditional:

---
Hardware Contention Control

As a further extension, a processor can provide a conditional load instruction
or Cload. The Cload instruction is a load instruction that succeeds only if the
location being loaded does not have an advisory lock set on it, setting the
advisory lock when it does succeed.

With Cload available, the version number is loaded initially using Cload rather
than a normal load. If the Cload operation fails, the thread waits and retries,
up to some maximum, and then uses the normal load instruction and proceeds.
This waiting avoids performing the update concurrently with another process
updating the same data structure. It also prevents potential starvation when
one operation takes significantly longer than other operations, causing these
other frequently occurring operations to perpetually abort the former. It
appears particularly beneficial in large-scale shared memory systems where the
time to complete a DCAS-governed operation can be significantly extended by
wait times on memory because of contention, increasing the exposure time for
another process to perform an interfering operation. Memory references that
miss can take 100 times as long, or more, because of contention misses. Without
Cload, a process can significantly delay the execution of another process by
faulting in the data being used by the other process and possibly causing its
DCAS to fail as well.

The cost of using Cload in the common case is simply testing whether the Cload
succeeded, given that a load of the version number is required in any case.

Cload can be implemented using the cache-based advisory locking mechanism
implemented in ParaDiGM [8]. Briefly, the processor advises the cache
controller that a particular cache line is "locked". Normal loads and stores
ignore the lock bit, but the Cload instruction tests and sets the cache-level
lock for a given cache line or else fails if it is already set. A store
operation clears the bit. This implementation costs an extra 3 bits of cache
tags per cache line plus some logic in the cache controller. Judging by our
experience with ParaDiGM, Cload is quite feasible to implement.

---
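
To make the extract concrete for myself, here is a minimal single-threaded
sketch in C of the behaviour it describes: Cload tests and sets the lock bit,
normal loads ignore it, a store clears it, and the caller retries Cload up to
some maximum before falling back to a normal load. The names line_t, cload and
load_version are hypothetical, and real hardware would do the test-and-set
atomically in the cache controller, which this model does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one cache line: a data word plus one advisory lock bit. */
typedef struct {
    uint64_t data;
    bool     locked;
} line_t;

/* Normal load: ignores the lock bit. */
static uint64_t load(const line_t *l)
{
    return l->data;
}

/* Normal store: writes the data and clears the lock bit. */
static void store(line_t *l, uint64_t v)
{
    l->data = v;
    l->locked = false;
}

/* Cload: fails if the line is already locked, otherwise sets the lock
 * and returns the loaded value through *out. */
static bool cload(line_t *l, uint64_t *out)
{
    if (l->locked)
        return false;
    l->locked = true;
    *out = l->data;
    return true;
}

/* Load the version number with Cload, retrying up to max_tries times,
 * then fall back to a normal load and proceed.
 * Returns the number of failed Cload attempts. */
static int load_version(line_t *l, int max_tries, uint64_t *out)
{
    int failures = 0;
    while (failures < max_tries) {
        if (cload(l, out))
            return failures;
        failures++;            /* a real thread would wait here before retrying */
    }
    *out = load(l);            /* give up on the advisory lock */
    return failures;
}
```

With an unlocked line, load_version succeeds at once and leaves the lock set.
On a line that stays locked, it exhausts its retries and still returns the
value, which is why the lock is only advisory.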

OK, so a load conditional lets us know whether someone else is already using
our memory location. That seems fine between two processors, where one can
wait for the other to finish. But what about different tasks on the same
processor?

task A -: load_locked [r1],r2,r3 ; OK: [r1] is not locked, so we lock it
<<< task switch >>>
task B -: loopentry r4
task B -: load_locked [r1],r2,r3 ; failure! someone else is using it
task B -: if r3 == 0 jump r0,r4

I dislike it.

In fact, to handle this properly, we would need two tags: one for an
inter-processor lock and another for an intra-processor lock.

The load conditional tests the inter-processor lock and sets both the
inter- and intra-processor locks.
The normal load just sets the intra-processor lock.
The normal store clears both the inter- and intra-processor locks.
The store conditional tests both the inter- and intra-processor locks and
clears both.
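
As a rough sketch of these four rules, assuming a plain single-threaded C model
of one tagged memory line (tagged_line_t and the function names are made up for
illustration): I read "tests" for the store conditional as "succeeds only if
both tags are still set", in the usual LL/SC sense, and I let it clear both
tags whether or not it succeeds. Both points are assumptions, not something
settled above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one memory line with the two proposed tags. */
typedef struct {
    uint64_t data;
    bool     inter;   /* inter-processor lock tag */
    bool     intra;   /* intra-processor lock tag */
} tagged_line_t;

/* Load conditional: tests the inter-processor lock, sets both locks. */
static bool load_cond(tagged_line_t *l, uint64_t *out)
{
    if (l->inter)
        return false;          /* another processor holds the line */
    l->inter = true;
    l->intra = true;
    *out = l->data;
    return true;
}

/* Normal load: just sets the intra-processor lock. */
static uint64_t load_normal(tagged_line_t *l)
{
    l->intra = true;
    return l->data;
}

/* Normal store: clears both locks. */
static void store_normal(tagged_line_t *l, uint64_t v)
{
    l->data = v;
    l->inter = false;
    l->intra = false;
}

/* Store conditional: tests both locks and clears both.
 * ASSUMPTION: it succeeds only if both tags are still set, and it
 * clears them whether or not the store happens. */
static bool store_cond(tagged_line_t *l, uint64_t v)
{
    bool ok = l->inter && l->intra;
    if (ok)
        l->data = v;
    l->inter = false;
    l->intra = false;
    return ok;
}
```

In this reading, a store conditional that follows only a normal load fails,
because the inter-processor tag was never set, and any intervening normal
store makes a pending store conditional fail.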

I'm not sure what I'm talking about :) so don't flame me ;)


>>> I don't like relying on the memory subsystem to track the state of a
memory location. Usually the VM is used to handle that, but it is too slow
most of the time. If we rely on it, it will be difficult to implement on a
NUMA system (for example, two F-CPUs, each with its own private DRAM). In
that case, we would need extra wires on the bus to pass the information to
the other system. We could implement such a thing in Wishbone, BUT it would
then become impossible to build bridges to other buses that don't have those
lines.

With a CAS cycle (CAS2), it's easier, because the bridge only has to lock
the bus to make its 2 (4) transfers.
nicO
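
To illustrate the transfer counts, here is a minimal C sketch that models a bus
with a lock flag and counts transfers. Everything in it (bus_t, bus_cas,
bus_cas2) is hypothetical, and I assume CAS2 means a compare-and-swap on two
separate words. A successful CAS is one read plus one write, and a successful
CAS2 is two reads plus two writes, all under a single bus lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bus model: a tiny memory, a lock flag, a transfer counter. */
typedef struct {
    uint64_t mem[4];
    bool     locked;
    int      transfers;
} bus_t;

static uint64_t bus_read(bus_t *b, int addr)
{
    b->transfers++;
    return b->mem[addr];
}

static void bus_write(bus_t *b, int addr, uint64_t v)
{
    b->transfers++;
    b->mem[addr] = v;
}

/* CAS: lock the bus, read, compare, write on match, unlock. */
static bool bus_cas(bus_t *b, int addr, uint64_t expected, uint64_t desired)
{
    b->locked = true;
    bool ok = (bus_read(b, addr) == expected);
    if (ok)
        bus_write(b, addr, desired);
    b->locked = false;
    return ok;
}

/* CAS2 (assumed here to be a two-word CAS): two reads, then two writes
 * if both words match, all under the same bus lock. */
static bool bus_cas2(bus_t *b, int a1, int a2,
                     uint64_t e1, uint64_t e2,
                     uint64_t d1, uint64_t d2)
{
    b->locked = true;
    uint64_t v1 = bus_read(b, a1);
    uint64_t v2 = bus_read(b, a2);
    bool ok = (v1 == e1 && v2 == e2);
    if (ok) {
        bus_write(b, a1, d1);
        bus_write(b, a2, d2);
    }
    b->locked = false;
    return ok;
}
```

The point of the model is that a bridge needs nothing beyond the ability to
hold the bus lock across these transfers, with no per-line state to forward to
the other side.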



*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/

 