
Re: [f-cpu] delayed issue



hi,

cyrano@nerim.net wrote:

Yann Guidon <whygee@f-cpu.org> wrote:

hi !

nico wrote:

oops sorry i used the wrong address !

http://www.ai.mit.edu/projects/aries/Documents/Memos/ARIES-09.pdf
(it speaks about coffee)

I should watch.

you'd better do it now :-)

Simplicity also has advantages that silicon efficiency on its own
does not; simpler architectures are faster to design,
easier to test, less prone to errors and friendlier to
compilers.

This is why FC0 has no renamed registers, OOOE,
and other sophisticated control stuff.

OOOC includes some difficulties, too,
but i don't think it will consume 100k transistors for FC0,
and there are fewer pipeline stages dedicated to scheduling.

In fact, the whole point of OOOC in FC0 is that
the scheduling is partly performed in parallel with a non-OOOC
standard pipeline. This makes jump and loop overhead small
and further reduces the need for prediction and speculative
execution.

Sure, but on one side you have a multiplier sliced into 6-gate-deep stages, and on the other side you try to fit a 3r2w register bank of 64 entries into the same slot size...

don't worry ....
the decoding logic (data ready, unit ready, etc.) will probably need some more pipe stages
in "high-speed" versions (where the pipelines are correctly sliced).
It's just a matter of splitting the stages correctly.
Remember, the first FC0-OOOC had no jump latency and now it is one.
If the register set is really slow, we can't do much better, but it's not worth
adding complex renaming buffers: the registers' access time will not get faster.
Enhancing the decoder (through a longer pipeline) is a better solution.

Additionally, more complexity means more silicon area
and longer wires, hence more heat dissipation,
a more expensive and probably slower chip.
And control logic is certainly the least easy thing
to test in a chip. This is why i'm satisfied with
the current FC0.

not me :) Not when you see the 3r2w register file needed because of 1 or 2
instructions (like MAC). Not when you see the mess of "special registers"
that should be memory mapped (with conditional memory movement, such as
unbuffered, if needed). Not when you see the trap/exception mess.

1) 3R2W is also necessary for load and store instructions; otherwise it's not possible to perform the pointer update in the same instruction.

Sure, but it's not really a true speed-up and it adds a RAW dependency.

??????

oh, and what about your 'code density' argument ?
if you don't allow post-increment, then you need more instructions to compute the addresses.
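
To make the port and code-density trade-off in point 1 concrete, here is a minimal sketch. It is a hypothetical model, not taken from the F-CPU manual: the instruction names, the port counts (which assume the pointer step comes from a register) and the copy-loop example are all illustrative assumptions, and the loop branch is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction with the register ports it uses per issue."""
    name: str
    reads: int   # register read ports
    writes: int  # register write ports


# With post-increment: one instruction moves data and updates the pointer.
LOAD_POSTINC = Instr("load+postinc", reads=2, writes=2)    # reads ptr, step; writes data, ptr
STORE_POSTINC = Instr("store+postinc", reads=3, writes=1)  # reads data, ptr, step; writes ptr

# Without post-increment: the pointer update is a separate add.
LOAD = Instr("load", 1, 1)    # reads ptr; writes data
STORE = Instr("store", 2, 0)  # reads data, ptr
ADD = Instr("add", 2, 1)      # reads ptr, step; writes ptr


def ports_needed(instrs):
    """Widest (read, write) port usage over a set of instructions."""
    return max(i.reads for i in instrs), max(i.writes for i in instrs)


def copy_loop_length(n, post_increment):
    """Instructions issued to copy n words (loop branch not counted)."""
    if post_increment:
        body = [LOAD_POSTINC, STORE_POSTINC]
    else:
        body = [LOAD, ADD, STORE, ADD]
    return n * len(body)
```

Under these assumptions, the post-increment forms are what push the register file to 3 reads and 2 writes, while dropping them falls back to 2R1W but doubles the instruction count of the copy loop.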


2) If you map SRs to memory, you will face race conditions and synchronisation problems,
and protection will not be enforced on a register or register group granularity basis.

It's exactly the same problem as for I/O registers, and it's quickly solved.

SRs are not for I/O (because it would become a bottleneck).
The problem, hence the solution, is not the same.


It's used by SPARC and, i'm pretty sure, for ATM.

ATM ?

There is no need for a specific bus; only the use of direct addressing is of interest.

???

3) what mess ?

The kind of linked list that can't handle nested interrupts, which you talk about regularly. (Shadow registers could be far easier...)

???

nicO

YG

nicO

YG again

*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/