
Re: Rep:[f-cpu] Hot issue : external LSU ?



* 'CAS' can be written very easily with a 'locked load/store' pair:
- read the value with 'locked load' and compare it with the old value;
if they do not match, CAS fails.
- store the new value with 'locked store'; if that fails, CAS fails too.
- otherwise CAS succeeds.
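
The three steps above can be sketched as a small software model. This is only an illustration, not F-CPU code: the `Memory` class, its single lock token, and the rule that an ordinary `store` to the locked address drops the token are all assumptions made for the example.

```python
class Memory:
    """Toy single-CPU model (hypothetical): one lock token, one address."""

    def __init__(self):
        self.cells = {}
        self.locked_addr = None  # address currently holding the lock token

    def locked_load(self, addr):
        # Read the value and place the lock token on this address.
        self.locked_addr = addr
        return self.cells.get(addr, 0)

    def store(self, addr, value):
        # Ordinary store. Assumption: writing the locked address drops the token.
        if self.locked_addr == addr:
            self.locked_addr = None
        self.cells[addr] = value

    def locked_store(self, addr, value):
        # Succeeds only if the token is still on this address.
        if self.locked_addr != addr:
            return False
        self.cells[addr] = value
        self.locked_addr = None
        return True


def cas(mem, addr, old, new):
    """Compare-and-swap built from the locked load/store pair."""
    if mem.locked_load(addr) != old:
        return False                      # value does not match: CAS fails
    return mem.locked_store(addr, new)    # fails too if the lock was lost
```

Note that `cas` has to return the result of `locked_store`, which is the reason the store needs to write its outcome somewhere the program can test.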

* 'CAS2' is a little more difficult, but it can be done. If you want a
good one, you need a pairable 'locked load/store' (ll/sc, llp/scp), but I
think very few people want to get into such considerations, because it is
quite difficult to handle in hardware:

'll' sets only the first lock bit.
'llp' sets a second lock bit.
'scp' checks that both lock bits are set, then clears the second lock bit.
'sc' checks that the first lock bit is set and the second lock bit is
cleared, then clears the first lock bit.

As you can see, we need a way to link two lock bits together even though
they are not in the same entry.

Mandatory order of execution :
ll -> llp -> scp -> sc.
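
A minimal sketch of the two-bit protocol, to show why the order matters. The `PairedLock` class and `cas2` helper are hypothetical names for this example. The model only tracks the two lock bits; it lets 'scp' write immediately and says nothing about how real hardware would make the two stores commit together, which is exactly the hard part.

```python
class PairedLock:
    """Toy model (hypothetical) of the ll/llp/scp/sc lock bits."""

    def __init__(self, cells):
        self.cells = dict(cells)
        self.bit1 = False   # set by ll, cleared by a successful sc
        self.bit2 = False   # set by llp, cleared by a successful scp
        self.addr1 = None
        self.addr2 = None

    def ll(self, addr):
        # Sets only the first lock bit.
        self.bit1, self.addr1 = True, addr
        self.bit2, self.addr2 = False, None
        return self.cells[addr]

    def llp(self, addr):
        # Sets the second lock bit.
        self.bit2, self.addr2 = True, addr
        return self.cells[addr]

    def scp(self, addr, value):
        # Needs both bits set; clears the second one on success.
        if not (self.bit1 and self.bit2 and addr == self.addr2):
            return False
        self.cells[addr] = value
        self.bit2 = False
        return True

    def sc(self, addr, value):
        # Needs the first bit set and the second cleared; clears the first.
        if not (self.bit1 and not self.bit2 and addr == self.addr1):
            return False
        self.cells[addr] = value
        self.bit1 = False
        return True


def cas2(m, a1, old1, new1, a2, old2, new2):
    """Double compare-and-swap in the mandatory order ll -> llp -> scp -> sc."""
    if m.ll(a1) != old1:
        return False
    if m.llp(a2) != old2:
        return False
    if not m.scp(a2, new2):
        return False
    return m.sc(a1, new1)
```

In this model an 'sc' issued before 'scp' fails because the second bit is still set, and an 'scp' without a preceding 'llp' fails because the second bit was never set.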




----- Original Message -----
From: "Nicolas Boulay" <nicolas.boulay@ifrance.com>
To: <f-cpu@seul.org>
Sent: Friday, August 30, 2002 7:20 PM
Subject: Rep:[f-cpu] Hot issue : external LSU ?


I will reread this carefully later, but just one question: could
lstore/cload be replaced by CAS and CAS2?

nicO

-----Original Message-----
From: "Christophe Avoinne" <christophe.avoinne@laposte.net>
To: <f-cpu@seul.org>
Date: 30/08/02
Subject: [f-cpu] Hot issue : external LSU ?

Hmm, well, this discussion seems to be very hot.

The best I can do is to lay out more clearly the different approaches I
see (each may or may not have its drawbacks):

First :
------
The locked load/store cannot share the same instructions as the normal
load/store. Why? Because lstore is not a simple conditional store: we
really need to get the test result into a register to check whether the
locked write occurred, because numerous algorithms need this information
after executing a locked store. So you must forget the idea of putting the
locked load/store in a conditional load/store format.

Second :
---------
Most data does not need to be shared between CPUs, so normal load/store
without any synchronisation can run at full speed. In fact, it is up to
the software programmer not to abuse locked load/store operations, since
there is no real way to speed them up (they will always be slower than
normal load/store operations).

---

In the case of a uniprocessor :
------------------------------

Locked load/store can be done easily in the LSU with a bit acting like a
token. The locked load puts a token in an LSU entry, and the first locked
store is the one that takes the token from that entry. It should be easy
enough. The only trouble is that you cannot handle an array of locks that
way, because of the byte width of an LSU entry (how many bytes does an LSU
entry represent?). The quickest illustration: if I 'locked load' two
contiguous words, I would in fact set the token in the same LSU entry. So
if I then 'locked store' the two contiguous words, the second one would
fail (not what we expect, but it only slows things down). But if you have
an array of large nodes, with the lock words well separated, this trouble
should disappear.
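
To make the granularity issue concrete, here is a rough sketch with one token per LSU entry. The `Lsu` class is hypothetical, and the 32-byte entry width and 8-byte word spacing are purely assumptions for the example, since the actual width of an LSU entry is the open question.

```python
class Lsu:
    """Toy LSU (hypothetical): one token bit per entry of `entry_bytes` bytes."""

    def __init__(self, entry_bytes=32):   # entry width is an assumption
        self.entry_bytes = entry_bytes
        self.tokens = set()               # indices of entries holding a token
        self.mem = {}

    def _entry(self, addr):
        # Every address inside the same entry maps to the same token.
        return addr // self.entry_bytes

    def locked_load(self, addr):
        self.tokens.add(self._entry(addr))
        return self.mem.get(addr, 0)

    def locked_store(self, addr, value):
        entry = self._entry(addr)
        if entry not in self.tokens:
            return False                  # the token was already taken
        self.tokens.remove(entry)         # the first locked store takes it
        self.mem[addr] = value
        return True
```

With these numbers, locked loads of addresses 0 and 8 share entry 0, so only the first of the two locked stores succeeds. Addresses 0 and 64 fall in different entries and both stores succeed.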

I can see a mistake in the idea of using a separate address space for
locking. You should not read 'locked load/store' as a semaphore
'acquire/release', which is not exactly the purpose of CAS and CAS2. Just
consider an atomic stack: you want to push an element onto it or pop one
from it atomically. You just need to change the top pointer with a CAS
(hence the need for 'locked load/store' to access the same address space).
With a semaphore, you would instead have to acquire it first, then modify
the top pointer, then release it, which does not give exactly the same
behaviour (it is a blocking solution).
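
A sketch of the atomic stack, assuming a CAS primitive is available. Python has no hardware CAS, so the hypothetical `AtomicRef` class below fakes one with an internal `threading.Lock` that merely stands in for the atomicity the hardware would provide. The point is the shape of `push` and `pop`: no semaphore is held across the update, a failed CAS is simply retried.

```python
import threading


class AtomicRef:
    """Stand-in for a memory word updated by CAS (hypothetical helper)."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()   # models hardware atomicity only

    def load(self):
        return self._value

    def cas(self, old, new):
        with self._guard:
            if self._value is not old:
                return False
            self._value = new
            return True


class Node:
    def __init__(self, item, next_node):
        self.item = item
        self.next = next_node


class Stack:
    def __init__(self):
        self.top = AtomicRef(None)

    def push(self, item):
        while True:
            old = self.top.load()
            # Only the top pointer changes, and only if nobody moved it.
            if self.top.cas(old, Node(item, old)):
                return

    def pop(self):
        while True:
            old = self.top.load()
            if old is None:
                return None               # empty stack
            if self.top.cas(old, old.next):
                return old.item
```

A task that loses the race just loops again; it never waits on another task holding a lock.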

Another problem: imagine you need to update an entry in a shared user
array. A locked load/store could be used to make sure that no other CPU or
task is doing something else with the same entry in the meantime (very
fine-grained synchronisation). Using semaphores would force you to
associate one semaphore with each entry...

So, please don't confuse 'locked load/store' with the semaphore concept,
and don't think that a separate address space is the solution for an
'll/sc' counterpart. Your acquire/release suggestion is another solution
for another locking purpose.

---

In the case of a multi-processor :
---------------------------------

Having an LSU for each processor leads to a coherency problem. Sharing
locked entries among all the LSUs is not a good idea, especially for CPUs
which never access those locked memory locations. Besides, it is difficult
and slow to propagate such information between LSUs.

This is why I was wondering whether an external LSU shared by all CPUs
could be a solution. Take it just as a suggestion that can be turned down
or improved.

Two cases:
- locked entries are kept both in internal and external LSUs;
- locked entries are only kept in external LSU;

I don't think coherency between internal and external LSUs is a real
problem (I may be wrong).

Locked entries are kept in both LSUs :

- normal load/store don't bother with the external LSU. The internal LSU
accesses memory directly (we can expect this is what the software
programmer will use most of the time).
- locked load sets a lock in the internal (for intra-cpu locking) and
external (for inter-cpu locking) LSU entries, no matter their contents.
- locked store checks this lock in the internal (for intra-cpu locking)
AND external (for inter-cpu locking) LSU entries.

Having this lock bit in the internal LSU removes the need for an external
LSU on a uniprocessor (which only needs intra-cpu locking), so the
external LSU is just an option that adds inter-cpu locking capability.

If an external LSU is not present :
- locked load sets a lock in the internal LSU entry (for intra-cpu locking).
- locked store checks this lock in the internal LSU entry (for intra-cpu
locking).
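
The "both LSUs" case, with the external LSU optional, could look roughly like this. Everything here is an assumption made for illustration: the `Cpu` and `ExternalLsu` classes, locks tracked per address rather than per entry, and the rule that a successful locked store takes the token away from every other CPU.

```python
class ExternalLsu:
    """Shared by all CPUs (hypothetical): who holds a lock on each address."""

    def __init__(self):
        self.holders = {}   # addr -> set of CPU ids with an inter-cpu lock


class Cpu:
    def __init__(self, cpu_id, memory, external=None):
        self.cpu_id = cpu_id
        self.memory = memory          # dict shared between CPUs
        self.external = external      # None models a uniprocessor
        self.internal = set()         # intra-cpu locks in this CPU's own LSU

    def locked_load(self, addr):
        self.internal.add(addr)                       # intra-cpu lock
        if self.external is not None:                 # inter-cpu lock
            self.external.holders.setdefault(addr, set()).add(self.cpu_id)
        return self.memory.get(addr, 0)

    def locked_store(self, addr, value):
        ok = addr in self.internal                    # check internal lock
        self.internal.discard(addr)
        if self.external is not None:
            holders = self.external.holders.get(addr, set())
            ok = ok and self.cpu_id in holders        # AND external lock
            if ok:
                holders.clear()       # first locked store takes the token
            else:
                holders.discard(self.cpu_id)
        if ok:
            self.memory[addr] = value
        return ok
```

Building a `Cpu` without an `external` argument gives the uniprocessor behaviour: only the internal lock is set and checked.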

Locked entries are only kept in the external LSU :

- normal load/store don't bother with the external LSU. The internal LSU
accesses memory directly (we can expect this is what the software
programmer will use most of the time). In fact the internal LSU has no
locked LSU entry (it is insensitive to locked load/store).
- locked load/store always operate on external LSU entries instead of
internal LSU entries.
- locked load sets a lock in the external (for intra/inter-cpu locking)
LSU entry.
- locked store checks this lock in the external (for intra/inter-cpu
locking) LSU entry.

You need to have an external LSU even for a uniprocessor.

An external LSU entry is really shared between CPUs without duplication,
so there is no coherency problem.

A mixture :
-----------
Locked load/store can come in a mixture of variants, selected just by
suffixes: intra-cpu locking only (uses only the internal LSU), inter-cpu
locking only (uses only the external LSU), or both intra/inter-cpu
locking. That way you can even use intra-cpu-only locks for threads on a
multiprocessor, for faster execution than the usual intra/inter-cpu locks
(I'm thinking about locks which are only relevant to threads of the same
process running on one CPU only).
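
One way to picture the suffix idea is a scope flag that selects which LSU a locked access touches. The `Scope` names and the two helper functions are invented for this sketch, and each LSU is reduced to a plain set of locked addresses.

```python
from enum import Flag, auto


class Scope(Flag):
    """Hypothetical instruction suffixes selecting the locking scope."""
    INTRA = auto()          # internal LSU only
    INTER = auto()          # external LSU only
    BOTH = INTRA | INTER    # internal and external LSU


def locked_load(scope, addr, internal, external):
    # Set a lock only in the LSUs selected by the suffix.
    if Scope.INTRA in scope:
        internal.add(addr)
    if Scope.INTER in scope:
        external.add(addr)


def locked_store(scope, addr, internal, external):
    # Check (and release) the lock only in the LSUs selected by the suffix.
    ok = True
    if Scope.INTRA in scope:
        ok = ok and addr in internal
        internal.discard(addr)
    if Scope.INTER in scope:
        ok = ok and addr in external
        external.discard(addr)
    return ok
```

An `INTRA` access never touches the shared external set, which is where the expected speed advantage for threads confined to one CPU would come from.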

As a result :
------------

The main idea behind the external LSU is to keep normal load/store from
depending on global locking. An application should mostly use normal
load/store. Threads in a process could use intra-cpu locked load/store if
necessary. Intra/inter-cpu locked load/store would only be used when
coherency between CPUs is needed. Inter-cpu locked load/store can be used
in situations where we know only one task accesses the data but coherency
among CPUs is still needed.

That's all folks.






*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/

