
Re: [f-cpu] freeze signal



hello,

nicO wrote:
> Yann Guidon wrote:
> <...>
> > > or you could try to predict the case (but i am still waiting for an
> > > algorithm for that).
> > it's a simple lookup table.
> > IE :
> > - when you see opcode = OP_ADD with 64-bit data and no carry (2r1w)
> > - when you see that all the required registers are available (either
> > in R7 or on the Xbar)
> > - when you see that there is at least one free slot available in 3 cycles
> > in the future
> 
> The last point is the problem: how do you do that? To me it's an
> allocation (scheduling) algorithm, a typical NP-hard problem.

In the current state of the core, the complexity is not overwhelming.
With extremely long units (a 20-cycle multiplier, for example) the problem
would get heavy, but our multiplier takes "only" 6 cycles, which is
reasonable. I guess that we will introduce the FP units much later.

> > => then you can issue the instruction (ask another instruction from the
> > fetcher + validate the ASU + mark the destination register as "dirty"
> > + allocate the slot in the scheduling queue).
> 
> So you continue or stop the flow :D

The flow is stopped at the decoding and issue stage if the issue conditions
are not met.

> > IF opcode=OP_ADD AND ( source or destination register is not ready
> >  OR no free slot in 3 cycles) then : delay 1 cycle and try again.
> >
> > I don't know if you accept this as an "algorithm" but it's what
> > it does and you can see that we need 1 LUT and 1 scheduling queue.
> > For the first FC0s, i am merging the "register scoreboard" with
> > the queue, but i will use another technique for superscalar cores.
> 
> Yes, it's an algorithm, but not precise enough!

the LUT contains data about :
 - whether the instruction requires 0, 1 or 2 write slots
 - which of the source registers 1, 2 and 3 the instruction "uses"
 - the latency
 - whether the instruction uses the condition
 - *************************** pointer
 - ****************** is valid
etc.

The LUT takes 8 or 10 bits as input and outputs a few tens of wires,
each selecting the associated resource. During the issue cycle,
all the conditions are ORed (combined) together and the result
drives the issue signal. When this signal is active:
- the next instruction is fetched
- the slot is allocated in the scheduling FIFO/queue
- the appropriate signal is sent to the EU
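
A minimal C sketch of that combination step may help. It is not taken from
the F-CPU sources: the names lut_bits, issue_state and can_issue are
hypothetical, and only four of the LUT outputs are modelled. Each term is
"resource needed but not available"; the terms are ORed into a stall
condition and the issue signal is its complement.

```c
#include <stdbool.h>

/* Hypothetical subset of the LUT outputs: what the opcode needs. */
typedef struct {
    bool need_src0;
    bool need_src1;
    bool need_cond;
    bool need_w1;
} lut_bits;

/* Hypothetical snapshot of what is available during the issue cycle. */
typedef struct {
    bool src0_ready;
    bool src1_ready;
    bool cond_ready;
    bool slot_free;
} issue_state;

/* Returns true when the instruction may be issued this cycle. */
static bool can_issue(lut_bits need, issue_state have)
{
    /* One term per resource: required by the opcode AND not available. */
    bool stall = (need.need_src0 && !have.src0_ready)
              || (need.need_src1 && !have.src1_ready)
              || (need.need_cond && !have.cond_ready)
              || (need.need_w1   && !have.slot_free);
    return !stall;
}
```

A resource that the opcode does not need never contributes to the stall,
which is why the LUT bits act as a mask on the availability flags.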

i guess that it is still cryptic for you. Unfortunately i'm currently
sick, tired and have a big headache, so i can't make drawings.
I hope to meet you in a few weeks.

> > > In the first case, i need a signal to stop the pipeline (or to stop
> > > producing output) to insert an empty slot.
> > The "slot" is inserted at decode/issue stage. the rest of the
> > units can do their work without caring, as long as their latency
> > is deterministic (and can be encoded in the opcode LUT which
> > also tells the issue stage which flags are meaningful for the decision
> > whether to issue or not).
> 
> So you enter all the information in the lookup table (which registers
> are in use at what level, the different latencies of the units, the
> different throughputs, the type of unit, ...). I think it's a bit too much!

The LUT is hardwired: it cannot say which registers are currently in use.
That is the purpose of the scheduling queue. I don't think it's too much
either, because if the instruction set is well designed (once all the
opcodes are allocated), the decoding should be rather easy (no more than
4 logic levels).

> nicO
WHYGEE (krank wie ein Tier)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PS: on the CD i last gave you, there is a file named
distro/f-cpu/QDCPOC/qdcpoc2.h which contains the following
description:


/********************
   The opcode LUT :
 ********************/

/* this structure describes the properties and
   needed information for each opcode. It is
   only a first preliminary version; a lot of
   things are missing. */
typedef struct {

  /* latency :
  indicates which queue entry will be filled */
  unsigned int latency_zero : 1;      /* nop (could be removed ?) */
  unsigned int latency_direct : 1;    /* move, loadcons */
  unsigned int latency_cycle_1 : 1;   /* rop2, inc */
  unsigned int latency_cycle_2 : 1;   /* ASU */
  unsigned int latency_multiply : 1;  /* imul (more ports ?) */
  unsigned int latency_idiv : 1;      /* idiv */

  /* opcode format :
  indicates which fields are necessary before we issue the instruction */
  unsigned int need_src0 : 1;
  unsigned int need_src1 : 1;
  unsigned int need_src2 : 1;
  unsigned int need_rw2  : 1;
  unsigned int need_cond : 1;

  /* queue reservation : */
  unsigned int need_w1 : 1; /* indicate that we want 1 write slot
  (in this case, src2 is written to either slot0 or slot1) */
  unsigned int need_w2 : 1; /* indicate that we want 2 simultaneous write slots
  (in this case, src2 and rw2 are written to the 2 slots) */

  /* use of the constants :
   it directly drives the Xbar */
  unsigned int op_imm8 : 1;
  unsigned int op_imm16 : 1;

  /* more will be added as special cases are discovered */

} opcode_lookup_type;

opcode_lookup_type opcode_LUT[256], opcode_type;
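
As a hedged illustration of how such a table could be filled and read, the
sketch below uses a trimmed copy of the structure with C99 designated
initializers. The opcode numbers, the table name demo_LUT and the helper
functions are hypothetical, and the flags given to each opcode are my
guesses from the comments above (for example, that a move needs one source
and one write slot), not values from qdcpoc2.h.

```c
/* Trimmed copy of opcode_lookup_type: only the fields used here. */
typedef struct {
    unsigned int latency_zero : 1;
    unsigned int latency_direct : 1;
    unsigned int latency_cycle_2 : 1;
    unsigned int need_src0 : 1;
    unsigned int need_src1 : 1;
    unsigned int need_w1 : 1;
    unsigned int need_w2 : 1;
} lut_entry;

/* Hypothetical opcode numbers, for illustration only. */
enum { OP_NOP = 0, OP_MOVE = 1, OP_ADD = 2 };

/* Entries that are not listed stay all-zero. */
static const lut_entry demo_LUT[256] = {
    [OP_NOP]  = { .latency_zero = 1 },
    [OP_MOVE] = { .latency_direct = 1, .need_src0 = 1, .need_w1 = 1 },
    [OP_ADD]  = { .latency_cycle_2 = 1, .need_src0 = 1, .need_src1 = 1,
                  .need_w1 = 1 },                     /* 2r1w */
};

/* Number of write slots to reserve in the queue: 0, 1 or 2. */
static int write_slots(const lut_entry *e)
{
    if (e->need_w2) return 2;
    if (e->need_w1) return 1;
    return 0;
}

/* Number of source registers that must be ready before issue. */
static int source_count(const lut_entry *e)
{
    return (int)e->need_src0 + (int)e->need_src1;
}
```

Since the table is constant, a hardware version reduces to pure
combinational logic on the opcode bits.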