
Re: SR [was:Re: [f-cpu] delayed issue]



hi,

nico wrote:

On Wed, 05 Mar 2003 17:12:28 +0100
Yann Guidon <whygee@f-cpu.org> wrote:
<...>

Sure, but on one side you have a 6-gates-thin multiplier and on the other
side you try to put a 3r2w register bank of 64 entries in the same
slot size...

don't worry ....
the decoding logic (data ready, unit ready, etc.) will probably need some more pipe stages
in "high-speed" versions (where the pipelines are correctly sliced).
It's just a matter of splitting the stages correctly.
Remember, the first FC0-OOOC had no jump latency and now it is one.
If the register set is really slow, we can't do much better, but it's not worth
adding complex renaming buffers: the register access time will not be faster.
Enhancing the decoder (through a longer pipeline) is a better solution.

Sure, but then you will have a jump penalty.

it's like communicating vessels ....
nothing is for free.
either you pipeline the decoding logic and increase the core frequency
 and the peak performance at the cost of more jump/branch latency,
or you keep it simple and slow.
Now compare this to the many more cycles of penalty for other architectures.
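
To put rough numbers on this trade-off, here is a minimal sketch in Python. Everything in it is an assumption made for illustration (the function name, the 20% taken-jump fraction, the clock rates and the penalties); none of it is measured FC0 data. It only shows that a deeper decoder can still win overall when the clock gain outweighs the extra cycle lost on each taken jump.

```python
def run_time_ns(instructions, jump_fraction, jump_penalty_cycles, clock_mhz):
    """Rough run time of an in-order core issuing one instruction per cycle.

    Each taken jump costs `jump_penalty_cycles` extra cycles.
    All parameters are hypothetical; this is not FC0 data.
    """
    cycles = instructions * (1 + jump_fraction * jump_penalty_cycles)
    return cycles * 1000.0 / clock_mhz  # one cycle lasts 1000/clock_mhz ns


# Assumed workload: 1000 instructions, 20% of them taken jumps.
short_pipe = run_time_ns(1000, 0.2, jump_penalty_cycles=1, clock_mhz=100)
long_pipe = run_time_ns(1000, 0.2, jump_penalty_cycles=2, clock_mhz=150)

print("short decoder:", short_pipe, "ns")
print("long decoder: ", long_pipe, "ns")
print("deeper pipeline wins:", long_pipe < short_pipe)
```

With other assumed numbers (a smaller clock gain, or more jumps) the comparison can go the other way, which is the point: nothing is for free.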

Additionally, more complexity means more silicon area,
longer wires => more heat/dissipation,
more expense and probably a slower chip.
And control logic is certainly the least easy thing
to test in a chip. This is why i'm satisfied with
the current FC0.

not me :) Not when you see the 3r2w regfile because of 1 or 2
instructions (like MAC). Not when you see the mess of "special
registers" that should be memory mapped (with conditional memory
movement, like not buffered, if needed). Not when you see the
trap/exception mess.

1) 3R2W is necessary also for load and store instructions, otherwise
it's not possible to perform pointer update in the instruction.

Sure, but it's not really a true speed-up and it adds RAW dependencies.

??????

oh, and what about your 'code density' argument ?
if you don't allow post-increment, then you need more instructions to compute the addresses.
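
The code density point can be sketched with a small count, written here in Python. The function name and the array-walk scenario are made up for illustration. It assumes that a load with pointer update writes two registers (the loaded data and the new pointer), while the alternative needs a separate add for each access.

```python
def walk_array(n_words, post_increment):
    """Return (instructions issued, max register writes per instruction)
    for loading n_words consecutive words. Hypothetical model only."""
    if post_increment:
        # One load per word: it writes the loaded data AND the updated pointer.
        return n_words, 2
    # One load (writes the data) plus one explicit add (writes the pointer).
    return 2 * n_words, 1


with_update = walk_array(8, post_increment=True)
without_update = walk_array(8, post_increment=False)
print("with pointer update:   ", with_update)
print("without pointer update:", without_update)
```

Under these assumptions the loop body doubles in size without post-increment, and the price of keeping it short is the second register write in the same instruction.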

Hmm... I am thinking about a "true" 4r2w (on LIW) but with 1r1w split register
banks. The first problem is the use of a 3r2w register bank when 90% of the
instructions are 2r1w.

but not all instructions have the same latency !
so there are many cases where a "long" instruction will prevent short ones from completing.
The 2nd write port is here to remove 90% of the compiler's pressure to detect when
this situation occurs.
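
A toy model of that situation, in Python. It is only a sketch under stated assumptions: one instruction issued per cycle, in order, each writing a single result at (issue cycle + latency), and issue stalling whenever all write ports are already taken in that writeback cycle. The function name and the latency sequence are invented for the example and do not describe the real FC0 scheduler.

```python
def count_stalls(latencies, write_ports):
    """Count issue stalls caused by write-port conflicts (toy model)."""
    reserved = {}  # writeback cycle -> number of write ports already reserved
    cycle = 0
    stalls = 0
    for latency in latencies:
        # Delay issue until a write port is free in the writeback cycle.
        while reserved.get(cycle + latency, 0) >= write_ports:
            cycle += 1
            stalls += 1
        reserved[cycle + latency] = reserved.get(cycle + latency, 0) + 1
        cycle += 1  # the next instruction issues one cycle later at the earliest
    return stalls


# Hypothetical mix: a 3-cycle operation followed by shorter ones.
program = [3, 1, 1, 2, 1]
one_port = count_stalls(program, write_ports=1)
two_ports = count_stalls(program, write_ports=2)
print("stalls with 1 write port: ", one_port)
print("stalls with 2 write ports:", two_ports)
```

With a single write port, either the hardware stalls or the compiler has to reorder the code to avoid the collisions. In this sequence the second port absorbs them.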

2) If you map SRs to memory, you will face race conditions and synchronisation problems,
and protection will not be enforced on a register or register group
granularity basis.

It's exactly the same problem as for I/O registers, and it's soon solved.

SRs are not for I/O (because it would become a bottleneck).
The problem, hence the solution, is not the same.

The problem is buffering and ordering, like for memory-mapped
I/O registers.

so if you want to buffer and order access to protection and configuration related resources,
how much silicon resources are you wasting for the buffers and control logic ?
unless you have already a complex OOOE core, it's not worth it. KISS.

There is no need for specific buses; only the use of direct
addressing is of interest.

???

Using set/get is like a load/store using direct addressing.

oh god ....

3) what mess ?

The kind of linked list that can't take nested interrupts that you speak
about regularly. (shadow registers could be far easier...)

???

SR was just for reading some constants, then you use it for system
control (like for trap handling).

SR was never meant "only" for "reading constants".

Each time there is some dust on the design,
poof, it gets swept into the SRs, like under a wide carpet.
????

SRs are slow because they are serialised.

of course. these are critical resources that control the machine,
and they are not expected to be used every 3 instructions.
Now, if you wanted to make them "faster" then the core would become more
complex and never be finished.

SRs can't be preserved across a context switch
because it will be a mess. So only reads are permitted.
nope.
A few registers are preserved through SRB and a "manual" IRQ routine
can backup other things manually.


We could also put the trap pointer register, TLB ... But register mapped
seems so much easier (don't forget that SRs are direct mapped).

do you mean that you want to have a pair of registers, one being the index and the other
the data, only to access the SRs ?
it's useless because there are 2 forms of GET and PUT instructions,
one with a 16-bit index in the opcode and another where the index
is stored in a register.
Or there was a communication problem.
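
The two access forms can be modelled in a few lines of Python. This is a sketch, not the FC0 definition: the class and function names are hypothetical, the SR number 0x0010 and the value stored in it are arbitrary, and masking the register-held index to 16 bits is an assumption made so that both forms address the same space.

```python
SR_INDEX_MASK = 0xFFFF  # 16-bit index, as in the immediate form described above


class SpecialRegisters:
    """Toy SR file: index -> value. Unwritten SRs read as 0 (assumption)."""

    def __init__(self):
        self.values = {}

    def get(self, index):
        return self.values.get(index & SR_INDEX_MASK, 0)

    def put(self, index, value):
        self.values[index & SR_INDEX_MASK] = value


def get_immediate(srs, imm16):
    # Form 1: the SR index is a constant encoded in the opcode.
    return srs.get(imm16)


def get_indirect(srs, regs, rn):
    # Form 2: the SR index is read from general-purpose register rn.
    return srs.get(regs[rn])


srs = SpecialRegisters()
srs.put(0x0010, 42)   # arbitrary SR number and value
regs = [0] * 64       # 64-entry register bank
regs[5] = 0x0010      # r5 holds the SR index

print(get_immediate(srs, 0x0010))
print(get_indirect(srs, regs, 5))
```

Both calls reach the same SR, so a dedicated index/data register pair would add nothing that these two forms do not already cover.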

YG

PS: time to write VHDL again.
These sterile discussions annoy me...
