
Re: Rep:Re: Re: [f-cpu] No latches, please !



hello,

Michael Riepe wrote:
> 
> On Sat, Feb 16, 2002 at 04:27:19AM +0100, Yann Guidon wrote:
> [...]
> > there are "shadow flags" which indicate whether a register is zero or
> > not. This is one of the hottest/nastiest things in the register set "entity".
> > in the 64-bit implementation, it takes 5 bits, one bit per "slice"
> > (there are 2*8 bit slices and 3*16-bit slices). the bits are updated
> > depending on the write mask, it is read on every cycle so we know
> > whether a condition is true or false : it is a 2W1R bank.
> 
> Yep, I remember.
> 
> The shadow flags are used when we have to check an operand for zero-ness,
> i.e. the scheduler will read it when it encounters a CJUMP, CMOVE or DIV
> instruction. If the register's value is being calculated at that time,
> the scheduler has to wait until the result is ready. Bypassing the
> register write isn't possible because it's not yet clear whether the
> next instruction is executed at all -- the condition may be false,
> or there might be a division-by-zero exception.

it is not exactly this way, but rather close.

just like the register set, the flags are read speculatively, for every
instruction. The opcode decoding (performed in parallel) tells the next
stage (decode) whether the flag is needed or not.

The bypass for the z-flag (which is one worrying part) has to provide
the issue logic with the flag values of all the slices, ORed together.
For each slice, the right value has to be selected, so a MUX is necessary.
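
to make the bypass concrete, here is a minimal behavioural sketch in Python
(not the F-CPU VHDL). The slice order in SLICES, the flag polarity
(1 = slice is non-zero) and the names slice_flags, bypassed_nonzero and
bypass_valid are assumptions made for illustration only.

```python
# Assumed slice layout as (lsb, width): two 8-bit slices, then three 16-bit
# slices, 64 bits in total. The actual ordering in the design may differ.
SLICES = [(0, 8), (8, 8), (16, 16), (32, 16), (48, 16)]


def slice_flags(value):
    """Return one 'non-zero' bit per slice: the OR of all bits in the slice."""
    return [((value >> lsb) & ((1 << width) - 1)) != 0 for lsb, width in SLICES]


def bypassed_nonzero(stored_flags, result, write_mask, bypass_valid):
    """Model of the z-flag bypass seen by the issue logic.

    stored_flags : the 5 flags currently held in the flag bank
    result       : the value being written back this cycle
    write_mask   : 5 booleans, True for each slice that the write updates
    bypass_valid : True when the write targets the register being tested
    """
    new_flags = slice_flags(result)
    # One 2-to-1 MUX per slice: take the new flag if the slice is written.
    selected = [
        new if (bypass_valid and written) else old
        for old, new, written in zip(stored_flags, new_flags, write_mask)
    ]
    # Final OR over the 5 selected bits: True means "register is not zero".
    return any(selected)
```

for example, with stored flags computed from 0xFF, an 8-bit write of 0 and
bypass_valid set, the function returns False (the register is seen as zero),
while the stored flags alone would still say "not zero".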

> > The problem arises when there is a bubble in the pipeline, which stalls
> > waiting for a result which conditions the issue of the currently decoded
> > instruction. During the R7 writeback cycle, the bits in a slice are ORed
> > together and sent to the 2W1R bank of 5*63 bits (each of the 5 bits is
> > conditioned by the write mask). The 1R output of the 5-bit vector is ORed
> > and gives the needed bit. If we use a transparent latch, the flow-through
> > time of the cell uses one gate delay.
> 
> I'd love to see a timing diagram.

i'll start with a structural diagram so we are sure that everybody is talking
about the same thing.
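
in the meantime, the writeback path quoted above can be sketched as a small
behavioural model in Python. This is only an illustration under assumptions:
the slice order, the class name ShadowFlagBank, its method names and the
choice of skipping register 0 (so that 63 entries are actually used) are
hypothetical, and the two write ports are modelled as two successive calls.

```python
# Assumed slice layout as (lsb, width): 2 x 8-bit and 3 x 16-bit slices.
SLICE_LAYOUT = [(0, 8), (8, 8), (16, 16), (32, 16), (48, 16)]


class ShadowFlagBank:
    """Behavioural model of the shadow flag bank (5 bits per register)."""

    def __init__(self, num_regs=64):
        # One list of 5 'non-zero' bits per register, all cleared at start.
        self.flags = [[False] * len(SLICE_LAYOUT) for _ in range(num_regs)]

    def writeback(self, reg, value, write_mask):
        """Update the flags of the slices selected by write_mask."""
        if reg == 0:
            return  # assumption: register 0 has no entry in the bank
        for i, (lsb, width) in enumerate(SLICE_LAYOUT):
            if write_mask[i]:
                # OR of all the bits of the slice
                self.flags[reg][i] = ((value >> lsb) & ((1 << width) - 1)) != 0

    def is_nonzero(self, reg):
        """Read port: OR of the 5-bit vector, True if the register is not 0."""
        return any(self.flags[reg])
```

a 64-bit write sets all 5 mask bits; a narrower write only touches the low
slices and leaves the other flags as they were, which is why each bit is
conditioned by the write mask.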

> > Otherwise, using a FF, there is the need to create a bypass path :
> > it introduces a MUX for each of the 5 bits because we have to choose
> > which of the old or new flag is read, depending on the write mask
> > and other flags. Although you might not like it, using a transparent latch
> > is an elegant solution and compromise. Using synchronous logic increases
> > the complexity of the logic and my brain has limited computational power :-(
> 
> IMHO, it's just the opposite: a mix of FFs and latches will increase
> the complexity.

why ? it is bound to the decode/issue logic stages, where the execution
pipeline is synchronised with the instruction flow.

>  Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
WHYGEE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~