[f-cpu] Execution unit input/output
I am trying to list all the I/Os needed by an execution unit (EU)
("->" is an input to the unit, "<-" an output).
-> 2 (3?) data inputs, register-wide (or twice this width if we transform 3r2w
<- 1 (2?) data outputs, register-wide or twice register-wide
<- write address: if the unit has "crazy" (irregular) timing, the released
data must be accompanied by its register bank address; it would be too
complicated to duplicate this in
-> write address, to be released along with the data
<- running: tells that a computation is in progress; useful for power
management, and possibly for debugging too.
-> freeze, to stop the unit from releasing data (used to avoid
contention on the register bank write port)
-> enable: tells the unit to take its inputs into account (the 2 or 3
data inputs are wired to every unit, but only one unit should run).
<- ready: tells that the unit can accept more input. It is used by the
scheduler, and it allows units with "crazy" timing, like a pipelined loop
(imagine 4 data items in flight at the same time, with the first result
released only after 30 cycles; every multicycle unit could be built this
way).
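To make the handshake above concrete, here is a minimal cycle-level sketch in Python (not HDL) of one EU with the ready, running, enable and freeze signals, releasing each result together with its write address. It is a sketch under assumptions: the class and method names are hypothetical, the operation is a placeholder add, every operation has the same fixed latency, and results leave in issue order. The usage replays the "4 data in flight, first result after 30 cycles" case.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Optional, Tuple


@dataclass
class ExecUnit:
    """Toy cycle-level model of one EU and its handshake signals (hypothetical names)."""
    latency: int   # cycles between accepting operands and releasing the result
    capacity: int  # how many operations may be in flight at once
    op: Callable[[int, int], int] = lambda a, b: a + b  # placeholder operation
    # In-flight entries, oldest first: [cycles_left, result, write_address]
    _pipe: Deque[List[int]] = field(default_factory=deque, init=False, repr=False)

    @property
    def ready(self) -> bool:
        # <- ready: the scheduler may send another operation
        return len(self._pipe) < self.capacity

    @property
    def running(self) -> bool:
        # <- running: some computation is in progress (power management, debug)
        return len(self._pipe) > 0

    def clock(self, enable: bool, a: int = 0, b: int = 0, write_addr: int = 0,
              freeze: bool = False) -> Optional[Tuple[int, int]]:
        """Advance one cycle; return (result, write_addr) when a result is released."""
        for entry in self._pipe:  # age every operation that is still computing
            if entry[0] > 0:
                entry[0] -= 1
        released = None
        # -> freeze: a finished result waits instead of taking the write port
        if not freeze and self._pipe and self._pipe[0][0] == 0:
            _, result, addr = self._pipe.popleft()
            released = (result, addr)
        # -> enable: operands go to every unit; only the enabled one latches them
        if enable and self.ready:
            self._pipe.append([self.latency, self.op(a, b), write_addr])
        return released


eu = ExecUnit(latency=30, capacity=4)  # "4 in flight, first result after 30 cycles"
pending = list(range(6))               # six hypothetical operations to issue
issue_cycles, releases = [], []
for cycle in range(40):
    if pending and eu.ready:           # scheduler samples ready before enabling
        op_id = pending.pop(0)
        issue_cycles.append(cycle)
        out = eu.clock(True, a=op_id, b=100, write_addr=op_id)
    else:
        out = eu.clock(False)
    if out is not None:
        releases.append((cycle, out))
print(issue_cycles)
print(releases)
```

Because the scheduler samples ready before asserting enable, the fifth operation only issues once a slot has been freed in an earlier cycle; a real scheduler could also account for a result leaving in the same cycle.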
Now the most controversial (and new) part. The idea is to make canceling
instructions easier. Why do so? For 90% of the code (good code) this work
is straightforward, but many tricky cases still have to be considered. A
long time ago, I read that future processors must do all possible work
outside the main stream, asynchronously (prefetch, ...). Then the
in-order core can be as short as possible (this is already done with
things like the LSU, ...). So the idea is to do as much work as possible
and, in case of error, go back to the correct point.
The behavior of the core must be simple so that it is easy to predict;
that way, good code is easier to write. The main point is jump
prediction. The usual hint (backward branches taken, forward branches not
taken) does a correct job 70% of the time (with today's compilers). If we
had a cmove instruction to handle "if" clauses, and if the compiler took
this into account (and it should!), we could get very close to 100%.
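A sketch of the static "backward taken, forward not taken" hint, scored on a made-up branch trace (the addresses and the loop are hypothetical, not measurements). Removing the forward "if" branch from the trace models the compiler turning it into a cmove, which leaves only the well-predicted loop branch.

```python
def btfn_predict(branch_pc: int, target_pc: int) -> bool:
    """Static hint: backward branch -> predict taken, forward branch -> not taken."""
    return target_pc < branch_pc


def accuracy(trace) -> float:
    """Fraction of (branch_pc, target_pc, taken) entries that the hint predicts right."""
    hits = sum(btfn_predict(pc, target) == taken for pc, target, taken in trace)
    return hits / len(trace)


# Made-up trace: a 10-iteration loop containing one "if".
LOOP_BRANCH = (0x40, 0x10)  # backward branch closing the loop
IF_BRANCH = (0x20, 0x30)    # forward branch skipping the "if" body
trace = []
for i in range(10):
    trace.append((*IF_BRANCH, i % 3 == 0))  # "if" body skipped on i = 0, 3, 6, 9
    trace.append((*LOOP_BRANCH, i < 9))     # loop branch taken except on the last pass

with_if_branch = accuracy(trace)
# If the compiler turns the "if" into a cmove, only the loop branch remains.
with_cmove = accuracy([t for t in trace if t[:2] != IF_BRANCH])
print(with_if_branch, with_cmove)
```

On this toy trace the data-dependent forward branch causes most of the misses, which is the case a cmove removes entirely.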
The problem with branch prediction is the misses, so we must handle the
canceling job carefully. Whygee claims everywhere that he has found a way
to spend only 1 cycle on branches. I don't understand how that could be
possible.
Take a 7-stage DSP: the first 2 stages only wait for the data from memory
(pipelined access), then we need a cycle to write (or not) the program
counter. So we lose at least 3 cycles.
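The cost of those cycles can be estimated with simple arithmetic. This sketch assumes the 3 cycles are paid on every mispredicted branch (the post says "at least 3") and nothing on a correct prediction; the branch frequency passed to `extra_cpi` is a free, hypothetical parameter.

```python
def lost_cycles_per_branch(accuracy: float, penalty: int) -> float:
    """Average cycles wasted per branch if each misprediction costs `penalty` cycles."""
    return (1.0 - accuracy) * penalty


def extra_cpi(branch_fraction: float, accuracy: float, penalty: int) -> float:
    """Extra cycles per instruction when `branch_fraction` of instructions are branches."""
    return branch_fraction * lost_cycles_per_branch(accuracy, penalty)


for acc in (0.70, 0.90, 0.99):
    print(f"accuracy {acc:.0%}: {lost_cycles_per_branch(acc, 3):.2f} cycles lost per branch")
```

With the 70% static hint, a 3-cycle penalty already averages close to one lost cycle per branch, which is why both better prediction (cmove) and cheap canceling matter.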
The other point is what the cache controller must do before releasing
data (read a ~64 KB memory, then check, plus some muxes): I think we are
far away from 6 gates deep (and far from 12, too!).
So I propose to number the instructions with a counter of "few" bits
(6-8?). Then we could ask a unit to cancel all data previous to a given
point, so the scheduler doesn't need to stall every unit at the decode
stage to prevent "possible" problems that occur very rarely.
-> number of the discarded instruction
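A sketch of the wrapping instruction tags and of that cancel input, using serial-number comparison (the same idea as RFC 1982) so the order stays correct when the counter wraps from 63 to 0. Assumptions: TAG_BITS = 6 (the post suggests 6-8), fewer than 32 instructions in flight per unit, and the cancel input carrying the tag of the first discarded instruction, which is dropped together with everything issued after it; the post leaves that choice open. The function names are hypothetical.

```python
TAG_BITS = 6             # the post suggests 6-8 bits; 6 is an assumption here
TAG_MOD = 1 << TAG_BITS  # 64 distinct tags


def next_tag(tag: int) -> int:
    """The decoder's small instruction counter simply wraps around."""
    return (tag + 1) % TAG_MOD


def issued_at_or_after(a: int, b: int) -> bool:
    """Serial-number comparison: True if tag `a` is `b` or was issued after it.
    Only meaningful while fewer than TAG_MOD // 2 instructions are in flight."""
    return (a - b) % TAG_MOD < TAG_MOD // 2


def cancel(in_flight, discarded_tag):
    """-> cancel input: keep only results older than the first discarded instruction."""
    return [tag for tag in in_flight if not issued_at_or_after(tag, discarded_tag)]


tag, in_flight = 60, []
for _ in range(8):
    in_flight.append(tag)
    tag = next_tag(tag)
print(in_flight)              # [60, 61, 62, 63, 0, 1, 2, 3]
print(cancel(in_flight, 63))  # [60, 61, 62]
print(cancel(in_flight, 1))   # [60, 61, 62, 63, 0]
```

Each unit only compares tags locally, so the scheduler can broadcast one tag on a misprediction instead of stalling every unit at decode.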
Hope this helps!