
Re: [f-cpu] Re: FC0 XBAR

To: f-cpu@seul.org
Subject: Re: [f-cpu] Re: FC0 XBAR
From: nicO <nicolas.boulay@ifrance.com>
Date: Wed, 08 Aug 2001 20:42:19 -0400
Delivery-Date: Wed, 08 Aug 2001 14:34:12 -0400
References: <Pine.LNX.3.96.1010802091539.8403B-100000@redwood.oekomm.de> <3B692A6B.CDCF99DA@f-cpu.org> <20010802174213.14481@thrai.stud.uni-hannover.de> <3B699C6A.76073B65@f-cpu.org> <20010802224432.49658@thrai.stud.uni-hannover.de> <3B6A047F.521AE157@f-cpu.org>
Reply-To: f-cpu@seul.org
Sender: owner-f-cpu@seul.org

Yann Guidon wrote:
> 
> hi ! (again... i gotta sleep soon, so don't worry !)
> 
> Michael Riepe wrote:
> > On Thu, Aug 02, 2001 at 08:31:06PM +0200, Yann Guidon wrote:
> > [...]
> > > > IMHO, reading and writing "in the same cycle" means that *by the end
> > > > of the cycle* (when the clock signal rises) the new data is stored in
> > > > the register, while the reader gets a copy of its *previous* contents
> > > > at the same time.
> > >
> > > that's what i feared :-(
> >
> > I thought that's what the Xbar bypass is good for -- avoid the delay?
> 
> "avoid the delay" from one unit to another.
> however, it seems that i had overlooked the
> register set's inherent latency.
> 
> > [...]
> > > Then there is another problem! Since the result is really written
> > > during the next cycle but available only the cycle after, people who
> > > want to optimise the instruction flow but who don't do it "perfectly"
> > > are hit by a 1-cycle delay. Obviously, delaying is not a good thing:
> > > people will try to schedule the instructions close to the pipeline
> > > features, but there is a large chance that, under the pressure of
> > > the program, available resources and other stuff, the delay is hit.
> >
> > Well, let's try an example to make this clearer.  Let's say we want to
> > add 3 numbers:
> >
> >         add r5, r1, r2  ; temporary result in r5
> >         add r4, r3, r5  ; final result in r4
> 
> this is written in 'classical' risc / x86 fashion, it seems :-)
> 
> > Now, the time-table (without bypassing) is:
> >
> >         cycle 0: read r1 and r2 (pass values through Xbar)
> >         cycle 1: stage 1 of ASU is working
> >         cycle 2: stage 2 of ASU is working
> >         cycle 3: pass result through Xbar (write r5 at end of cycle)
> >         cycle 4: read r3 and r5 (pass values through Xbar)
> >         cycle 5: stage 1 of ASU is working
> >         cycle 6: stage 2 of ASU is working
> >         cycle 7: pass result through Xbar (write r4 at end of cycle)
> >
> > Note that r5 is written at the end of cycle 3, but read in cycle 4; that
> > is, the new value is read (and passes the Xbar again).  With bypassing,
> > cycle 3 and 4 will overlap, resulting in a 1-cycle speed-up.
> >
> > Or did I miss something?
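
The timetable above can be replayed with a small toy model. This is only a
sketch under assumptions: the 4-cycle shape (Xbar read, ASU stage 1, ASU
stage 2, Xbar write) comes from the example, while the function name
`schedule`, the tuple layout and the in-order single-issue behaviour are
hypothetical and not taken from the FC0 sources.

```python
# Toy model of the quoted timetable. Stage names follow the example;
# the rest (names, data layout, single in-order issue) is assumed.
STAGES = ["read (Xbar)", "ASU stage 1", "ASU stage 2", "write (Xbar)"]

def schedule(program, bypass):
    """Return a list of (read_cycle, write_cycle), one pair per instruction.

    program: list of (dest, src1, src2) register names.
    bypass:  if True, a consumer may pick its operand off the Xbar in the
             same cycle the producer's result passes through it.
    """
    ready = {}        # register -> first cycle in which its new value can be read
    timing = []
    next_issue = 0    # at most one instruction starts per cycle
    for dest, src1, src2 in program:
        start = max([next_issue] + [ready.get(s, 0) for s in (src1, src2)])
        write_cycle = start + len(STAGES) - 1
        # Without bypass the value is readable the cycle after the write;
        # with bypass it is readable during the write cycle itself.
        ready[dest] = write_cycle if bypass else write_cycle + 1
        timing.append((start, write_cycle))
        next_issue = start + 1
    return timing

program = [("r5", "r1", "r2"),   # add r5, r1, r2
           ("r4", "r3", "r5")]   # add r4, r3, r5

print(schedule(program, bypass=False))  # second add reads in cycle 4, writes in cycle 7
print(schedule(program, bypass=True))   # second add reads in cycle 3, writes in cycle 6
```

With bypassing, the read of r5 overlaps the write-back of the first add, which
is the 1-cycle speed-up described above.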
> 
> i think that we agree about what bypassing the RegSet in the 'Xbar'
> (a set of muxes) means.
> 
> however, there is another case that worries me :
> suppose that you want to add more than 5 registers, for example 20.
> This could work for any combination of other operations, of course
> (this is not specific to additions, i care about the latency).
> 
> So we have a burst of register values all over the place. The scheduler
> will take care to organise that cleanly. In order to have the fastest
> execution possible, one will "organise" the instruction ordering
> so independent operations are interleaved. That's the "usual job"
> when one optimises for RISC.
> 
> Now imagine that the register number is exhausted, or some pressure
> like that. imagine that the instruction is issued one cycle after
> the necessary source data is present on the Xbar for bypass. The instruction
> will have to wait yet another cycle, until the register set memorises and
> gives the new value.


Could you go into more detail? I don't see a problem. The number of
possible bypasses is fixed, so you can never exhaust the Xbar. What's
wrong?
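
To pin down the case in question, here is one possible reading of the
scenario quoted above, as a sketch only. It assumes the result is visible on
the Xbar for exactly one cycle and that the register set needs some extra
cycles before it can deliver the freshly written value. The function
`earliest_read` and the `regset_latency` knob are hypothetical, not part of
any FC0 specification.

```python
def earliest_read(write_cycle, wanted_cycle, regset_latency=1):
    """Earliest cycle >= wanted_cycle in which a consumer can read a result.

    write_cycle:    cycle in which the result passes through the Xbar
                    (the only cycle where the bypass can catch it).
    wanted_cycle:   cycle in which the consumer would like to read.
    regset_latency: assumed extra cycles before the register set can
                    deliver the newly written value (hypothetical knob).
    """
    if wanted_cycle <= write_cycle:
        return write_cycle                       # caught on the bypass
    first_regset_read = write_cycle + 1 + regset_latency
    return max(wanted_cycle, first_regset_read)  # must come from the register set

# Result crosses the Xbar in cycle 3.
for wanted in (3, 4, 5):
    got = earliest_read(3, wanted)
    print(f"wants cycle {wanted}: reads in cycle {got}, stall = {got - wanted}")
```

Under these assumptions, only the consumer that arrives exactly one cycle
late (wanted cycle 4) stalls. With `regset_latency=0` the hole disappears,
which corresponds to the reading where a value written at the end of one
cycle can be read in the next.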

> For me, this situation (wait) is not tolerable because i guess that, most
> of the "desirable" time (when the code is optimised at a decent level),
> the 1-cycle penalty might occur often enough that optimisation might not
> be worth it. If your optimisation yields poor speedup, you'll drop it and
> i don't want to encourage that...
> 
> yes, i know, it's a bit complex. i hope i'll find a decent solution.
> 
> >  Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
> WHYGEE
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> *************************************************************
> To unsubscribe, send an e-mail to majordomo@seul.org with
> unsubscribe f-cpu       in the body. http://f-cpu.seul.org/
