Re: [f-cpu] new cjump instruction

To: <f-cpu@seul.org>
Subject: Re: [f-cpu] new cjump instruction
From: devik <devik@cdi.cz>
Date: Mon, 14 Apr 2003 10:29:31 +0200 (CEST)
Delivered-to: archiver@seul.org
Delivered-to: f-cpu-outgoing@seul.org
Delivered-to: f-cpu@seul.org
Delivery-date: Mon, 14 Apr 2003 04:30:36 -0400
In-reply-to: <20030413132105.23cb9d48.nico@seul.org>
Reply-to: f-cpu@seul.org
Sender: owner-f-cpu@seul.org

A few ideas from the software point of view. A 12-bit "replacement"
part would force every jump to a fixed point within the page unless
the linker fixes it up.
GCC could generate page-aligned loop bodies, but that would need a
rewrite of the BB (basic block) reorg pass and would sometimes lead
to suboptimal branches.
Regarding MD's note about assembler labels: gcc has a provision for
handling dynamic code size so that jumps can be fixed up in the last
compile stage.
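
To make the page constraint concrete, here is a minimal sketch of how such a
"replacement" jump would form its target, and of the check a linker or
assembler would need before using it. The 4 KiB page size follows from the
12 bits discussed in this thread; the function names are hypothetical and
not part of any F-CPU tool.

```python
PAGE_BITS = 12                      # assumed: the 12 LSBs are the offset inside the page
PAGE_MASK = (1 << PAGE_BITS) - 1    # 0xFFF


def replace_jump_target(pc: int, imm12: int) -> int:
    """Keep the page of pc and replace its 12 LSBs with the immediate (no adder)."""
    if not 0 <= imm12 <= PAGE_MASK:
        raise ValueError("immediate does not fit in 12 bits")
    return (pc & ~PAGE_MASK) | imm12


def needs_fixup(pc: int, label_addr: int) -> bool:
    """True when the label lies in another page, so this jump cannot reach it."""
    return (pc >> PAGE_BITS) != (label_addr >> PAGE_BITS)
```

For example, a jump at 0x12345678 with immediate 0x040 lands at 0x12345040,
while a label at 0x12346000 is out of reach from 0x12345FFC even though it is
only one instruction away.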

On the other hand, register-based jumps are a pain in code with
many branches, as they consume 1/3 of all insns.
I'd really like to see some kind of fast forward jump
with a range of about 8-32 insns.
For loops, a register-based jump is not so bad because the target
can be loaded once and used many times. That doesn't hold for
small "switches" or for frequent conditional branches linked
end to end (as in the GCC source code generated from .md files).
Maybe a jump within the insn buffers?
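
As an illustration of the short forward jump, the sketch below shows the test
an assembler could apply to choose between such a jump and a register-based
one. The 4-byte instruction size and the 5-bit displacement field (up to 31
instructions skipped) are assumptions picked to match the 8-32 insn range
above, not an existing F-CPU encoding; `short_forward_disp` is a hypothetical
name.

```python
INSN_BYTES = 4      # assumed: fixed-size 32-bit instructions
DISP_BITS = 5       # assumed: 5-bit field holding the number of skipped instructions


def short_forward_disp(pc: int, target: int):
    """Return the encoded displacement, or None if a register-based jump is needed."""
    delta = target - (pc + INSN_BYTES)      # distance from the next instruction
    if delta < 0 or delta % INSN_BYTES:
        return None                         # backward or misaligned target
    skipped = delta // INSN_BYTES           # instructions jumped over
    if skipped >= (1 << DISP_BITS):
        return None                         # too far for the short form
    return skipped
```

With these assumptions, a branch at 0x1000 can reach targets from 0x1004 to
0x1080 directly; anything further away, or any backward target, falls back to
loading the address into a register first.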

devik

On Sun, 13 Apr 2003, nico wrote:

> On Sat, 12 Apr 2003 21:16:51 +0200
> Yann Guidon <whygee@f-cpu.org> wrote:
>
> > hi,
> >
> > nico wrote:
> >
> > >On Sat, 12 Apr 2003 02:54:07 +0200
> > >Yann Guidon <whygee@f-cpu.org> wrote:
> > >
> > >>huh, i don't think that it's a good answer ....
> > >>
> > >>on top of that, this technique poses new problems in FC0's pipeline.
> > >>sure, addresses are computed fast, but what about their validation,
> > >>their fetch, their lookup in the buffers ......
> > >>
> > >>
> > >
> > >Validation is useful because you are inside a page.
> > >
> > >
> > validation seems ok BUT how do you validate that the new address is
> > already present and available ? you were the first to mention that
> > associative logic is slow ....
> >
>
> it is. But where do you see any associative logic ?
>
> > >fetches are lightning fast.
> > >
> > i do not agree.
>
> I would say: it will be the fastest in this case compared to the others.
>
> >
> > >You have an entire clock cycle to do it (no
> > >adder or register read before accessing L1 caches).
> > >
> > >
> > worse :
> > you can check a 64-input associative logic (corresponding to the
> > registers) faster than the "hit" logic of the cache (shorter wires
> > etc....), just in case the other LSU logic says "no" (hey : one has to
> > multiplex the 12-bit LSB path from the instruction with the XBAR's
> > output which leads to the LSU address comparators) [a multiplexer is
> > not a 'free gate' today]
> >
> > So let's imagine there is a "dedicated fast path" for this 12-bit
> > address to the L1 cache, and it works in 1 or 2 cycles (well, this
> > adds one other address comparator that will be active EVERY DAMN
> > CYCLE).
> > Then, data must be sent from the cache to the LSU (if absent from the
> > LSU), which takes easily one cycle. Then goes the usual pipeline thing
> > : decode, then issue.
> > so in fact it is not faster.
> > Of course the classical jump is more complex, but it is more flexible.
> >
>
> All of this is part of the beginning of the pipeline (fetch stages).
> There is nothing about it in the manual or elsewhere.
>
> So I use the usual paradigm of a memory bus (address/data). So fetch
> sends an address and receives data from the L1 cache memory. The
> address must be available as soon as possible. The 12 LSBs don't need
> any adder, so it's the fastest. (Besides that, a fast L1 has 2 or 3
> cycles of latency but a throughput of 1.)
>
> If you want to add multiple buffers and a bunch of decoders, that's up
> to you (this also adds latency). But you will always need a memory bus
> somewhere, and it will be the slowest part.
>
> > >But the real problem is for the compiler. What is the opinion of the
> > >compiler writers?
> > >
> > it's a useless constraint ....
>
> no, it's not useless. It permits a 0-cycle jump (without jump prediction).
> So unrolling loops will be far less interesting, and you will save L1
> cache space. In the worst case (1 function per .c file), such a file
> could avoid using it. Tight loops could be much faster.
>
> So devik, what do you think of it ?
>
> >
> > >nicO
> > >
> > >
> > YG
> >
> > *************************************************************
> > To unsubscribe, send an e-mail to majordomo@seul.org with
> > unsubscribe f-cpu       in the body. http://f-cpu.seul.org/
>

