
Re: [f-cpu] FC0's RTL scheduler



hi !

Michael Riepe wrote:
> 
> On Wed, Dec 19, 2001 at 03:56:06AM +0100, Yann Guidon wrote:
> [...]
> > maybe we can "cheat" with the scheduler : swap the register number,
> > instead of the data, and validate one or another ? it can save one cycle
> > at the "cost" of missing some issue opportunities but they can be recovered
> > with a careful coding guideline. what do you think ?
> 
> Not sufficient. E.g. for a 16-bit operation, the result chunks come out
> like this:
> 
>         high part:      33221100
>         low  part:      33221100
> 
> For `macl.16' the low part *should* be "11110000" ("33332222" for
> `mach.16'), according to the manual. That's one of the reasons why I
> never liked the original definition of the mac instruction.

if you can't make it, then don't worry : we'll modify the spec.
however, in some circumstances the overhead of the pack/unpack
instructions that shuffle the subwords can become a limiting factor...
it's one way or the other, and each way has its limits...
if we had both it would be cool :-)
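
To make the chunk layouts quoted above concrete, here is a rough Python
sketch of the "mix" step. It assumes unsigned operands and four 16-bit
lanes per 64-bit register, with lane 0 in the least significant chunk; the
function names are made up for illustration and are not part of any F-CPU
code.

```python
MASK16 = 0xFFFF

def mul16_parts(a, b):
    """a, b: lists of four unsigned 16-bit lanes, lane 0 first.

    Returns (high, low): the upper and lower 16 bits of each 32-bit
    product, i.e. the "33221100" layout on both output parts.
    """
    products = [x * y for x, y in zip(a, b)]
    high = [(p >> 16) & MASK16 for p in products]
    low = [p & MASK16 for p in products]
    return high, low

def mix_macl16(high, low):
    """Layout "11110000": full 32-bit products of lanes 0 and 1.

    Chunks are listed least significant first.
    """
    return [low[0], high[0], low[1], high[1]]

def mix_mach16(high, low):
    """Layout "33332222": full 32-bit products of lanes 2 and 3."""
    return [low[2], high[2], low[3], high[3]]

def pack64(chunks):
    """Pack four 16-bit chunks (least significant first) into one integer."""
    value = 0
    for i, chunk in enumerate(chunks):
        value |= (chunk & MASK16) << (16 * i)
    return value

a = [0x1234, 0xFFFF, 2, 3]
b = [0x5678, 0xFFFF, 4, 5]
high, low = mul16_parts(a, b)
macl = pack64(mix_macl16(high, low))
mach = pack64(mix_mach16(high, low))
```

Each mixed result takes chunks from both the high and the low part, which
is why neither output port alone is enough for `macl.16'/`mach.16'.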

> > And can you explain a bit how the output ports work ?
> > how many do you need ? And what about the normal operations ?
> 
> You need one or two of them for each operation. The low part is used for
> both `mul' and `mulh', the high part for `mulh' only (and for `macl'/`mach'
> you'll have to take both and mix them -- or I'll have to add more ports).
> 
> Results always arrive on their corresponding ports, that is, 32-bit
> results will be available at the 32-bit outputs after 5 cycles. Due to
> the `forked' pipeline and the different latencies, multiple results
> may appear at the same time, e.g. at t=0 you may get a 64-bit
> result (instruction issued at t=-6), a 32-bit result (started at
> t=-5), and upper or lower parts of 16-bit and 8-bit computations.
> Of course the scheduler can avoid that if the Xbar is unable to handle
> the traffic.
of course. I'm dealing with that now.
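
For illustration, a small Python sketch of the kind of check the scheduler
could make before issuing a multiply. The 6- and 5-cycle latencies come
from the numbers quoted above; the 16- and 8-bit latencies and the
`max_results_per_cycle' limit are placeholders I made up, not values from
the design.

```python
from collections import defaultdict

# Cycles from issue to result. 64 and 32 follow the quoted text;
# the 16- and 8-bit entries are hypothetical placeholders.
LATENCY = {64: 6, 32: 5, 16: 4, 8: 4}

def arrivals(issued):
    """issued: list of (issue_cycle, width) pairs.

    Returns a dict mapping each arrival cycle to the widths due then.
    """
    table = defaultdict(list)
    for cycle, width in issued:
        table[cycle + LATENCY[width]].append(width)
    return dict(table)

def can_issue(issued, cycle, width, max_results_per_cycle):
    """True if a new operation would not overload its arrival cycle."""
    due = cycle + LATENCY[width]
    already_due = arrivals(issued).get(due, [])
    return len(already_due) < max_results_per_cycle

in_flight = [(-6, 64), (-5, 32)]
same_time = arrivals(in_flight)
```

With the two operations above, both results land at t=0, so a third
operation due in the same cycle is only accepted if the Xbar (as modelled
by `max_results_per_cycle') can take three results at once.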

> I guess I'll have to delay the 8- and 16-bit low parts by 1 cycle (to
> make high and low parts arrive at the same time). Otherwise, scheduling
> becomes a nightmare.
it IS possible and I don't think it will be a nightmare,
but it will certainly be more complex. However,
from the programming point of view, it is more logical
to have both results at the same time (IMHO).
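
A toy Python model of that one-cycle delay, only to show the effect on
arrival times. It assumes the low part of a narrow operation is ready
exactly one cycle before its high part; `DelayRegister' and `align' are
names invented for this sketch.

```python
class DelayRegister:
    """One-cycle delay: a value clocked in now comes out on the next clock."""

    def __init__(self):
        self._held = None

    def clock(self, value):
        out, self._held = self._held, value
        return out

def align(low_stream, high_stream):
    """Pair each cycle's delayed low part with that cycle's high part.

    Both streams hold one entry per cycle; None means "nothing this cycle".
    """
    reg = DelayRegister()
    return [(reg.clock(lo), hi) for lo, hi in zip(low_stream, high_stream)]

# Low parts arrive one cycle ahead of the matching high parts.
low_stream = ["L0", "L1", "L2", None]
high_stream = [None, "H0", "H1", "H2"]
paired = align(low_stream, high_stream)
```

After the delay, every cycle carries either a matching low/high pair or
nothing, so the scheduler only has to track one arrival time per operation.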

At the end of your design, could you please write an updated version
of the programming model of the multiply unit ? (for the manual).

>  Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
WHYGEE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~