Re: [f-cpu] freeze signal


nicO wrote:
> Michael Riepe wrote:
> >
> > On Mon, Oct 08, 2001 at 01:42:54PM +0000, nicolas.boulay@ifrance.com wrote:
> > > Is it possible to add a signal to the entities to
> > > completely freeze their outputs? It's different from
> > > enable. The freeze signal must stop the pipeline, or
> > > at least the output port of the unit, so that no new
> > > data comes out on the result port.
> > Currently, the EUs contain no input or output registers at all; they're
> > supposed to be added at the next higher level.
> So how do you build the pipeline? I am missing something!

When you "assemble" the EU slices together, you implement
the "glue" with registers in the top level. For example, the
ROP2 EU is made of two main files: one does the predecoding
(+ fanout) and overlaps the Xbar read stage, and the other
file "does" the ROP2/mux operation.

When you create a testbench, you can "stick" the units together
with wires, to avoid the latency management. When you implement
the pipeline, you simply put pipe registers in the middle.

However, some of the code designed by Michael works differently:
it lets you specify the desired pipeline depth.

> > > We absolutely need this kind of thing to handle units
> > > with a latency greater than 1 (to manage the fact that
> > > 2 results could be ready at the same time)
> >
> > The scheduler has to take care of that.  It must delay the instruction
> > if there is no free "transport slot".
> Yep, you must insert a delay, so you must at least stop the current flow
The "current flow" can't be stopped once it is "issued" (sent to the EUs
and inserted in the writeback queue). The need for a "slot" is detected at
the decode stage, and the instruction is held at the issue stage and before
(decode and earlier stages).
Otherwise, we would need to insert heavier and slower pipe registers
all over the chip and reduce the frequency :-/

> or you could try to predict the case (but I am waiting for an algorithm
> for that).
It's a simple lookup table.
For example:
- when you see opcode = OP_ADD with 64-bit data and no carry (2r1w)
- when you see that all the required registers are available (either
in R7 or on the Xbar)
- when you see that there is at least one free slot available in 3 cycles
in the future

=> then you can issue the instruction (request another instruction from
the fetcher + validate the ASU + mark the destination register as "dirty"
+ allocate the slot in the scheduling queue).

IF opcode=OP_ADD AND (a source or destination register is not ready
 OR there is no free slot in 3 cycles) THEN: delay 1 cycle and try again.

I don't know if you accept this as an "algorithm" but it's what
it does, and you can see that we need 1 LUT and 1 scheduling queue.
For the first FC0s, I am merging the "register scoreboard" with
the queue, but I will use another technique for superscalar cores.
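
The rule above can be sketched in Python as follows. This is only a
behavioural illustration under assumptions: the class IssueStage, its
method names, the queue depth of 8 and the OP_ROP2 latency of 1 are all
invented for the example; only "OP_ADD needs a free slot 3 cycles in
the future" comes from the description above. Fetching the next
instruction and validating the ASU are not modelled.

```python
# Hypothetical opcode LUT: opcode -> latency in cycles.
# Only the OP_ADD value (3) is taken from the message; OP_ROP2 is assumed.
OPCODE_LATENCY = {"OP_ADD": 3, "OP_ROP2": 1}


class IssueStage:
    """Toy model: a 'dirty' register set plus a writeback scheduling queue."""

    def __init__(self, depth=8):
        self.dirty = set()            # registers with a pending result
        self.slots = [None] * depth   # slots[i]: register written back in i+1 cycles

    def try_issue(self, opcode, sources, dest):
        """Return True if issued, False if the instruction must wait a cycle."""
        latency = OPCODE_LATENCY[opcode]
        # A source or destination register is not ready: delay.
        if any(reg in self.dirty for reg in (*sources, dest)):
            return False
        # The writeback slot 'latency' cycles ahead is already taken: delay.
        if self.slots[latency - 1] is not None:
            return False
        self.dirty.add(dest)              # mark the destination as "dirty"
        self.slots[latency - 1] = dest    # allocate the slot in the queue
        return True

    def tick(self):
        """Advance one cycle; return the register written back, if any."""
        done = self.slots.pop(0)
        self.slots.append(None)
        if done is not None:
            self.dirty.discard(done)
        return done
```

A caller would simply retry a refused instruction after the next
tick(), which is the "delay 1 cycle and try again" case.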

> In the first case, I need a signal to stop the pipeline (or to stop
> producing output) to insert an empty slot.
The "slot" is inserted at the decode/issue stage. The rest of the
units can do their work without caring, as long as their latency
is deterministic (and can be encoded in the opcode LUT, which
also tells the issue stage which flags are meaningful for the decision
whether to issue or not).
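
As a rough Python illustration of an opcode LUT that carries both the
latency and the "which fields matter" information: the entry layout,
the field names (src1, src2, dest) and the opcodes OP_NOT and OP_NOP
are assumptions made up for this sketch, not the actual F-CPU encoding.
The slot_free argument stands for whatever the scheduling queue
provides.

```python
# Hypothetical LUT: latency plus the instruction fields that matter per opcode.
LUT = {
    "OP_ADD": {"latency": 3, "reads": ("src1", "src2"), "writes": ("dest",)},
    "OP_NOT": {"latency": 1, "reads": ("src1",), "writes": ("dest",)},
    "OP_NOP": {"latency": 0, "reads": (), "writes": ()},
}


def registers_to_check(opcode, fields):
    """Return only the register numbers the LUT marks as meaningful."""
    entry = LUT[opcode]
    return [fields[name] for name in entry["reads"] + entry["writes"]]


def can_issue(opcode, fields, dirty, slot_free):
    """Decide whether to issue, looking only at the meaningful fields.

    dirty is the set of registers with a pending result; slot_free(n)
    says whether the writeback slot n cycles ahead is free.
    """
    entry = LUT[opcode]
    if any(reg in dirty for reg in registers_to_check(opcode, fields)):
        return False
    # An instruction that writes no register needs no writeback slot.
    return not entry["writes"] or slot_free(entry["latency"])
```

With this layout the execution units never need a freeze input: the
decision is made once, at issue, from the table and the queue state.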

> nicO
WHYGEE, going to sleep, now :-)