
Re: [f-cpu] Supported Instructions



hi !

Christophe wrote:
> ----- Original Message -----
> From: Yann Guidon <whygee@f-cpu.org>
> To: <f-cpu@seul.org>
> Sent: Saturday, April 06, 2002 7:02 PM
> Subject: Re: [f-cpu] Supported Instructions
> 
> > The emulation must be done in SW, so a few shifts and masks
> > do the job. I wonder what you imagined.
> 
> Are you really sure? You are too optimistic about such a thing.
> 
> Well, let us try an example:
> 
> mac r1,r2,r3 // r3 = r3 + r1*r2
> 
> So you think getting such fields is only a matter of a few shifts and masks?
> Now that I have their fields, what am I supposed to do with them?
> 
> Code with a switch and direct register access, like this?
> 
> switch (reg3) {
>     case 0:
>         switch (reg2) {
>             case 0:
>                 switch (reg1) {
>                     case 0: { save r1; r1 = r0*r0; r0 += r1; restore r1; break; }
>                     ...
>                 }
>                 break;
>             case 1:
>                 switch (reg1) {
>                     case 0: { save r2; r2 = r0*r1; r0 += r2; restore r2; break; }
>                     ...
>                 }
>                 break;
>             ... // 64 registers !!!!
>         }
>         break;
>     case 1:
>         switch (reg2) {
>             case 0:
>                 switch (reg1) {
>                     case 0: { save r2; r2 = r0*r0; r1 += r2; restore r2; break; }
>                     ...
>                 }
>                 break;
>             case 1:
>                 switch (reg1) {
>                     case 0: { save r2; r2 = r0*r1; r1 += r2; restore r2; break; }
>                     ...
>                 }
>                 break;
>             ...
>         }
>         break;
>     ...
> }
> 
> Of course not !

I appreciate your good sense :-)

> OK, you said registers are in fact saved in the CMB, so we should get something
> like:
> 
> cmb->r[reg3] += cmb->r[reg1] * cmb->r[reg2];

So you understand how it works: cool :-)

> where cmb is the pointer to the CMB of the task that triggered the invalid
> instruction trap, and reg1, reg2 and reg3 are the three register fields of our
> "mac" instruction.

The "shifts" I was speaking about were the ones used to extract reg3, reg2 and reg1
from the instruction word.

(mixed C/asm, sorry)

inst = *trap_IP;
reg1 = (inst >> 12) & 63;
reg2 = (inst >> 6) & 63;
reg3 = inst & 63;
data1 = fp(0);
data2 = fp(0);  // clear the registers
data3 = fp(0);
if reg1!=0 load [reg1+CMB], data1; // I'm too lazy to do the prefetch
if reg2!=0 load [reg2+CMB], data2; // I'm too lazy to do the prefetch
if reg3!=0 load [reg3+CMB], data3; // I'm too lazy to do the prefetch

So now we have the data and we can start the emulation code.

And we must not forget to increment the IP.

There is a problem, however: if the code is faster than the SRB mechanism,
we'll read the old value from the CMB. Worse, we could interrupt the SRB
to resume the task, but the result may reside in a register that has not
been "touched" by our routine. The effect is that, upon return, the result
might not be updated.

This means that if the SRB is used, there must be a means to control the
SRB tags: on entry, the tags must synchronize the reg1/reg2/reg3
entries of the CMB, so we don't read the old value.
This could be avoided in an "elegant" way (these values are known
at the decode stage, so we just have to "latch" them somewhere),
but the "where" is a problem (there is not enough "bandwidth"
for communicating with the SRs or the LSU).

Updating the SRB tags to force the read of the result register
is not as difficult.

> But don't forget we must also read the opcode first to know what we have to do.
> Of course, if our opcode has a few fixed bits, a lookup table will help the job
> much more.

good sense again :-)

> > FP itself is maybe 10x faster than emulated instructions, and FP is
> > pipelinable.
> 
> Anyway, there are plenty of memory accesses and branches, so don't think an FPU
> will be just 10x faster than an FPU emulator !!!

I was just being "conservative" ;-)

> > So if you don't have an FPU, it's _necessarily_ slow. Come on.
> 
> NECESSARILY, so we should agree that F-CPU brings us nothing new
> or wonderful for the emulation of missing instructions.

This was not the intent. Emulation is a means to provide binary compatibility
between different implementations, not a means to increase speed.
Have a look at other CPU families and at how they handle traps:
in the case of MIPS, it's "rather bare" and it works anyway.

> (In fact I don't see the point of emulation: if you need floating-point
> operations, use an F-CPU with an embedded FPU.)

good sense rules forever :-)

But this becomes more of a problem during TLB misses.

Let me think about this for a while, while I'm installing a new LFS :-)

WHYGEE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/