
Re: [f-cpu] use of 1r1w regfile for our 3r2w regfile



> > > But is that good or not? A 4x 1r1w regfile will be ~30% faster than a
> > > true 3r2w.
> >
> > The worst case is, by definition, always bad (or even worse ;).
>
> There is a misunderstanding. The worst case is the case where it is slower.
> 3r2w SRAM memories are ~30% slower than 1r1w SRAM. But if we use 1r1w SRAM
> banks, there are some collisions that "could" be avoided by the compiler
> (otherwise you lose one cycle).
>
> I can't say which is worse. It depends entirely on the compiler.

I'm afraid, nicO, that you started the discussion with a question which
has no answer unless an exhaustive simulation of both hw and sw is
done.
When I did gcc tests for the single-issue f-cpu, I found that there are
on average 1.4 insns in progress at any time (for well-scheduled code),
with peaks of 3 insns at a time (often slow add+move+mul).

So one could say that with 4 banks it should be OK. Unfortunately,
read targets are not tightly correlated with write targets: if you
force gcc to alter register allocation to minimize read port bank overlaps,
it forces write port bank overlaps.
1.4 * 3 (ports per 2r1w insn) = 4.2, so there is a chance that they will
fit into 4 banks, but there could be ties from earlier code which has
already dictated bank assignments to prevent its own stalls.
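
To make that arithmetic concrete, here is a rough Python sketch. It is not
taken from the gcc tests: the bank mapping (register number modulo 4), the
rule that each 1r1w bank serves one read and one write per cycle, and the
names bank() and extra_cycles() are all assumptions made up for illustration.

```python
from collections import Counter

NUM_BANKS = 4  # four 1r1w banks, as proposed in this thread


def bank(reg):
    """Assumed mapping: register number modulo the number of banks."""
    return reg % NUM_BANKS


def extra_cycles(reads, writes):
    """Cycles lost when accesses issued in one cycle collide on a bank.

    Assumption: each bank can serve one read and one write per cycle,
    and every additional access to the same bank costs one more cycle.
    """
    worst = 0
    for regs in (reads, writes):
        # Reading the same register twice needs only one port access.
        per_bank = Counter(bank(r) for r in set(regs))
        if per_bank:
            worst = max(worst, max(per_bank.values()) - 1)
    return worst


# Average port demand from the figures above: 1.4 insns in flight,
# each touching 3 register ports (2 reads + 1 write).
avg_port_demand = 1.4 * 3
```

With this mapping, extra_cycles([1, 5], [3]) reports one lost cycle because
r1 and r5 share a bank, while extra_cycles([1, 2], [3]) reports none. The
average demand comes out at about 4.2 port accesses for 4 banks, which is why
the fit is marginal.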

I can imagine 4 1r1w banks being 30% faster because of their natural
speed, or they could be much slower because much of the code is
"overconstrained" and you can't allocate banks as freely as you
might imagine.
I'd suggest you look at our gcc port: it is not "so" hard, and it
will give you stronger insight into compiler limits and also give
you a tool to quickly evaluate/validate your ideas.
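
For a feel of where the speed trade-off tips, here is a back-of-envelope
sketch in Python. It assumes "30% slower" means the 3r2w cycle time is 1.3x
the 1r1w cycle time and that each bank collision costs exactly one extra
cycle. The function names and the collision rates tried are hypothetical,
not measurements.

```python
def relative_time_banked(collision_rate, stall_cycles=1):
    """Average time per insn with 1r1w banks, in units of the 1r1w cycle.

    collision_rate is the assumed fraction of insns that hit a bank
    collision; each collision adds stall_cycles extra cycles.
    """
    return 1.0 + collision_rate * stall_cycles


def relative_time_monolithic(slowdown=0.30):
    """Average time per insn with a true 3r2w regfile (no collisions)."""
    return 1.0 + slowdown


def break_even_collision_rate(slowdown=0.30, stall_cycles=1):
    """Collision rate at which both designs take the same time."""
    return slowdown / stall_cycles
```

Under these assumptions the banked file wins as long as fewer than about 30%
of insns collide, and loses beyond that. Whether real code stays under that
rate is exactly what depends on the compiler.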

BTW,
I just created a design which uses an SDRAM module and found an interesting
thing. The beast can deliver data at a 133 MHz clock (of course there
are still row precharge and other latencies, but they are hidden by banking),
the same rate my Xilinx design can run at.
When one builds a cpu within an fpga, bursting SDRAMs are relatively
so fast that one can eliminate all caches but L0 (or L1, depending on
naming) ...

devik
