Re: [f-cpu] dmove [Was: reg. rotation]



On Thu, 28 Nov 2002 14:01:08 +0100 (CET)
devik <devik@cdi.cz> wrote:

> > > I was thinking that we currently need a mechanism to read r1 and
> > >r1^1 and to write r2 and r2^2 at the same cycle,
> >
> > so this will require to discard the condition field.
> > do you remember that the "move" instructions are conditional ?
> 
> Why discard it? There would only be a different opcode, which
> would change it from 2r1w to 3r2w.
> However, the original intention was to make an "unconditional" 2r2w
> move and use all 24 bits as 4 register numbers.
> 

The instruction words are 32 bits. The three 6-bit register fields are
at fixed positions, so a register address can be used directly without
any shift to extract it.
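
To make the fixed-field point concrete, here is a minimal sketch. The bit
layout (8-bit opcode, 6 bits of flags/condition, then three 6-bit register
fields in the low 18 bits) and the names decode_regs/encode are my
assumptions for illustration only, not the encoding from the manual.

```python
# Hypothetical layout (an assumption, not the official F-CPU encoding):
# bits 31..24 opcode, 23..18 flags/condition, 17..12 r3, 11..6 r2, 5..0 r1.
REG_BITS = 6
REG_MASK = (1 << REG_BITS) - 1  # 0x3F, enough to address 64 registers


def encode(opcode, flags, r3, r2, r1):
    """Build a 32-bit instruction word from its fields."""
    return (opcode << 24) | (flags << 18) | (r3 << 12) | (r2 << 6) | r1


def decode_regs(word):
    """Return (r1, r2, r3); the field positions never depend on the opcode."""
    r1 = word & REG_MASK
    r2 = (word >> REG_BITS) & REG_MASK
    r3 = (word >> (2 * REG_BITS)) & REG_MASK
    return r1, r2, r3


word = encode(0x12, 0, 7, 5, 3)
print(decode_regs(word))
```

In software the extraction still needs shifts and masks; in hardware fixed
positions mean the 6 wires of each field go straight to the register bank,
whatever the opcode is.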

32 bits are really tight for a real 3r2w. I had proposed a 4r2w
version. Your double move is only 2 moves paired, as is done in a
superscalar CPU.

"My" 4r2w version uses 64-bit words with true 4r2w (6 address fields). So
there is no more register pairing: you only add one read port to the
register bank, but about 90% of the time you could execute twice the
number of instructions (most F-CPU instructions are 2r1w, and some
instructions such as MAC are tricky to use). It is a "little" VLIW. So it
is either 2*32-bit instructions or one 64-bit instruction. The 64-bit
version must behave as 3 instructions (the outputs of 2 units go to a
third one).
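
A rough sketch of the port and bit budget for the 2*32-bit case. The Op
tuple and bundle_ports are hypothetical names for illustration, and the
6-bit register address is assumed from the 6-bit fields above.

```python
from collections import namedtuple

# Hypothetical 2r1w operation: two source registers, one destination.
Op = namedtuple("Op", "name src_a src_b dst")

REG_BITS = 6  # assumed width of one register address field


def bundle_ports(op0, op1):
    """Count register-bank ports used when two 2r1w ops issue together."""
    if op0.dst == op1.dst:
        raise ValueError("both slots write the same register")
    reads = [op0.src_a, op0.src_b, op1.src_a, op1.src_b]
    writes = [op0.dst, op1.dst]
    return len(reads), len(writes)


reads, writes = bundle_ports(Op("add", 1, 2, 3), Op("sub", 4, 5, 6))
address_bits = (reads + writes) * REG_BITS
print(reads, writes, address_bits)  # prints: 4 2 36
```

So the 6 address fields take 36 of the 64 bits, and none of the 6
registers has to be derived from another one by pairing.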

For whygee, it's for FC1. :)

I must write a draft of it.

> > it's not orthogonal with the move instructions.
> 
> I accept this argument (if you read the original post,
> I was worried about it). But the other arguments... see below.
> Also, you already have an immediate form of the majority of insns,
> which is again not strictly orthogonal. But you optimize for the
> common case - dmove is the same kind of animal.
> 
> > try to think about the decoding+issue logic
> > of a more complex implementation of F-CPU,
> > for example one that executes 2 (or more)
> > instructions per cycle.
> 
> And? The insn would be like the "sort" insn. Or should
> all 2r2w (or 3r2w if any) be dropped from FC1?
> 
> > 2) you can split the "double move" into 2 simple instructions.
> >    There are a lot of scheduling issues with this :
> >       - first the destination will have to be paired.
> >         it is probable that in certain (annoyingly
> >         useful) situations, it is not possible.
> 
> With the unconditional variant with 4 regs this is not a problem IMHO.
> 
> >       - the 2 sources are not likely to be ready/available
> >         at the same clock cycle. This means that a MOVE
> >         of one data can be easily blocked by another operand
> >         that is not ready.
> 
> This is THE SAME for all 2r insns. The only difference from
> a 2r2w mul or sort is the different outcome of dmove.
> The compiler has to schedule it like other 2r2w insns.
> Only it'd be twice as fast if used appropriately.
> 
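The two positions here can be put side by side with a toy model. This is
only a sketch: the ready cycles are made-up numbers and both functions are
hypothetical, they do not describe any actual F-CPU scheduler.

```python
def simple_moves(ready_a, ready_b):
    """Two single moves: each issues when its own source is ready.

    Returns (issue cycle of move A, issue cycle of move B, issue slots used).
    """
    return ready_a, ready_b, 2


def double_move(ready_a, ready_b):
    """One paired move: it waits for the later of its two sources.

    Returns (issue cycle of move A, issue cycle of move B, issue slots used).
    """
    issue = max(ready_a, ready_b)
    return issue, issue, 1


# Sources ready at different cycles: the paired move delays operand A.
print(simple_moves(2, 7), double_move(2, 7))
# Sources ready at the same cycle: same timing, half the issue slots.
print(simple_moves(4, 4), double_move(4, 4))
```

The first case is the blocking concern, the second is the "twice as fast
if used appropriately" case; which one dominates depends on how well the
compiler can line up the two sources.
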
> > Clearly the intent is to increase coding density
> > at the cost of scheduling and flexibility.
> 
> Yes. I agree that increasing density and doubling throughput
> was the main reason. But not at the cost of scheduling or flexibility.
> With a multi-issue FC1 with enough ports it will still make code faster
> when placed appropriately by the compiler.
> 
> I'd really like to understand where the problem with the instruction is,
> other than orthogonality. I'm not an egoist, I don't want the insn that
> much, but I'm sad if I don't understand where the problem is :-(
> 
> devik
> 
> *************************************************************
> To unsubscribe, send an e-mail to majordomo@seul.org with
> unsubscribe f-cpu       in the body. http://f-cpu.seul.org/