
Re: [f-cpu] dmove [Was: reg. rotation]



> > I was thinking that we currently need a mechanism to read r1 and
> > r1^1 and to write r2 and r2^2 in the same cycle,
>
> so this will require to discard the condition field.
> do you remember that the "move" instructions are conditional ?

Why discard it? There would only be a different opcode, which
would change it from 2r1w to 3r2w.
However, the original intention was to make an "unconditional" 2r2w
move and use all 24 bits as 4 register numbers.
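
A minimal sketch of how such an encoding could look. Six bits per
register number follows from 24 bits / 4 fields; the 8-bit opcode on top,
the field order, the opcode value and the function names are assumptions
made up for illustration, not the actual F-CPU format.

```python
REG_BITS = 6                      # 24 bits / 4 register fields
REG_MASK = (1 << REG_BITS) - 1
DMOVE_OPCODE = 0x7F               # placeholder value, not a real F-CPU opcode


def encode_dmove(src_a, src_b, dst_a, dst_b, opcode=DMOVE_OPCODE):
    """Pack an assumed 8-bit opcode and four 6-bit register numbers into one word."""
    regs = (src_a, src_b, dst_a, dst_b)
    if any(not 0 <= r <= REG_MASK for r in regs):
        raise ValueError("register numbers must fit in 6 bits")
    word = opcode & 0xFF
    for r in regs:
        # Shift previous fields up and append the next register number.
        word = (word << REG_BITS) | r
    return word


def decode_dmove(word):
    """Inverse of encode_dmove: returns (opcode, src_a, src_b, dst_a, dst_b)."""
    fields = []
    for _ in range(4):
        fields.append(word & REG_MASK)
        word >>= REG_BITS
    # Fields come out lowest-first, i.e. in reverse order of encoding.
    dst_b, dst_a, src_b, src_a = fields
    return word & 0xFF, src_a, src_b, dst_a, dst_b
```

Note that no bits are left over for a condition field in this layout,
which is why this variant has to be unconditional.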

> it's not orthogonal with the move instructions.

I accept this argument (if you read the original post,
I was worried about it). But as for the other arguments, see below.
Also, you already have an immediate form of the majority of insns,
which again is not strictly orthogonal. But you optimize for the
common case - dmove is the same kind of animal.

> try to think about the decoding+issue logic
> of a more complex implementation of F-CPU,
> for example one that executes 2 (or more)
> instructions per cycle.

And? The insn would be like the "sort" insn. Or should
all 2r2w (or 3r2w, if any) insns be dropped from FC1?

> 2) you can split the "double move" into 2 simple instructions.
>    There are a lot of scheduling issues with this :
>       - first the destination will have to be paired.
>         it is probable that in certain (annoyingly
>         useful) situations, it is not possible.

With the unconditional variant with 4 regs this is not a problem, IMHO.
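
To illustrate the difference, here is a small behavioural sketch. The
register file is modelled as a plain list; the function names and the
read-before-write ordering are assumptions for illustration only. The
paired form derives the second register of each pair by flipping the
lowest bit (r^1), the 4-register form takes all four numbers freely.

```python
def pair_of(reg):
    # Partner register in the paired scheme: flip the lowest bit (r ^ 1).
    return reg ^ 1


def dmove(regs, src_a, src_b, dst_a, dst_b):
    """Unconditional double move with four independent register numbers.

    Both sources are read before either destination is written, so the
    sources and destinations may overlap (the dst_a == dst_b case is left
    undefined in this sketch).
    """
    a, b = regs[src_a], regs[src_b]
    regs[dst_a] = a
    regs[dst_b] = b


def dmove_paired(regs, src, dst):
    """Paired variant: moves (src, src^1) to (dst, dst^1)."""
    dmove(regs, src, pair_of(src), dst, pair_of(dst))
```

With four free register numbers, dmove(regs, 3, 10, 10, 3) swaps r3 and
r10. The paired form cannot express that, because r3 pairs only with r2
and r10 only with r11.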

>       - the 2 sources are not likely to be ready/available
>         at the same clock cycle. This means that a MOVE
>         of one data can be easily blocked by another operand
>         that is not ready.

This is THE SAME for all 2r insns. The only difference from a
2r2w mul or sort is the different outcome of dmove.
The compiler has to schedule it like the other 2r2w insns.
It would just be twice as fast if used appropriately.
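
A toy model of both sides of this argument, assuming a hypothetical
single-issue core where an instruction issues as soon as its operands are
ready and the issue slot is free. Latencies, ports and everything else
are ignored, and the function names are made up for the sketch.

```python
def issue_cycles_single_moves(ready_a, ready_b):
    """Issue cycles of two separate moves, scheduled earliest-ready first."""
    first, second = sorted((ready_a, ready_b))
    issue_first = first
    # The second move needs its operand AND a free issue slot.
    issue_second = max(second, issue_first + 1)
    return issue_first, issue_second


def issue_cycle_dmove(ready_a, ready_b):
    """A dmove, like any 2r insn, waits for the later of its two operands."""
    return max(ready_a, ready_b)
```

If both operands are ready in cycle 5, the two single moves issue in
cycles 5 and 6 while the dmove issues once in cycle 5. If the operands
are ready in cycles 2 and 9, the single moves issue in cycles 2 and 9 and
the dmove in cycle 9: the first datum is held back, but the pair
completes no later and uses one issue slot instead of two.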

> Clearly the intent is to increase coding density
> at the cost of scheduling and flexibility.

Yes. I agree that increasing density and doubling throughput
was the main reason. But not at the cost of scheduling or flexibility.
With a multi-issue FC1 with enough ports it will still be faster
when placed appropriately by the compiler.

I'd really like to understand what the problem with the instruction is,
other than orthogonality. I'm not an egoist and I don't want the insn
that much, but I'm sad if I don't understand where the problem is :-(

devik
