Re: [f-cpu] Register set revised
> >oh well. As a strongly SW-oriented guy, I only started to realize
> >such consequences just now. The architecture seemed rather
> >simple to me, but recently I started to see how complex it
> >can be at the logic level.
> >
> >
> heh :-)
>
> we did some serious work to make people believe that "F-CPU is simple",
> just to get them interested and induce them to do some work ;-)
It works! :-)
> >it would be interesting. unfortunately it is far from being
> >simple to test more than 2 split sources ...
> >
> really ?
> i think that for our case, that is : decoding whether 2 writes occur to
> [...]
you misunderstood me: I wanted to say that it is more complex
to convince GCC to optimize register allocation so that it minimizes
hazards from simultaneous writes to the same bank.
We can teach the scheduler to move some insns into different
cycles to avoid it, but there is still an opportunity
to do something at the register allocation stage.
Maybe we could rename temporary registers in a post-processing pass
after scheduling, whenever we detect a bank collision.
> outputs and 1 write input. welcome to the HW world ;-)
it is not SO bad with my HW experience :) I know that from the
HW side it is not so complex.
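A minimal sketch of the post-scheduling rename idea, in Python for brevity. It assumes 64 registers interleaved over 4 banks by the low two bits and R0 hard-wired to zero; both are assumptions for illustration, not the actual revised F-CPU register set. Each scheduled insn is modeled as a hypothetical (dest, write-back cycle) pair, and only destinations are rewritten.

```python
# Hypothetical register file: 64 registers split into 4 banks by the
# low two bits of the register number (an assumption for illustration).
NUM_REGS = 64
NUM_BANKS = 4


def bank_of(reg):
    """Bank that register `reg` lives in under the assumed interleaving."""
    return reg % NUM_BANKS


def collides(a, b):
    """Two scheduled writes (dest, wb_cycle) collide if they hit the
    same bank in the same write-back cycle."""
    return a[1] == b[1] and bank_of(a[0]) == bank_of(b[0])


def rename_collisions(insns, in_use):
    """For each collision, move the later write's destination (assumed
    to be a renamable temporary) to a free register in a bank that no
    other insn writes in that cycle.

    Only destinations are rewritten here; a real pass would also have to
    rewrite every later read of the old register. R0 is skipped on the
    assumption that it is hard-wired to zero.
    Returns the number of renamed insns.
    """
    renamed = 0
    for i in range(len(insns)):
        for j in range(i + 1, len(insns)):
            if not collides(insns[i], insns[j]):
                continue
            _, cycle = insns[j]
            # Banks already written by the other insns in this cycle.
            busy = {bank_of(d) for k, (d, c) in enumerate(insns)
                    if k != j and c == cycle}
            for r in range(1, NUM_REGS):
                if r not in in_use and bank_of(r) not in busy:
                    insns[j] = (r, cycle)
                    in_use.add(r)
                    renamed += 1
                    break
    return renamed
```

For example, with `insns = [(4, 1), (8, 1), (5, 2)]`, R4 and R8 both land in bank 0 in cycle 1, so the pass moves the second write to R1 (bank 1). The greedy first-free choice is the simplest option; a smarter pass would prefer a register that also avoids collisions in neighbouring cycles.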
By the way, have you seen the ETRAX chip, especially its CRIS CPU?
It has a nicely compact instruction set :) Well, I agree it is not
good for superscalar designs, but it is nice to see compiled
code become smaller than on x86 (though not as small as on ARM).
As you hinted, I looked at other CPUs usable for a small
Linux system, and ETRAX is nice: $40 when you buy a single piece, direct
SDRAM/SRAM/EDO connection, 100 Mbit Ethernet, sync/async serial, wide
buses, 100 MIPS ...
> i did quite a bit of Pentium MMX coding and even discussed with
> one of the architects. They had to make some early design decisions
> and ISA definitions based on the then available resources and limitations.
> This is why the speed increase of MMX (only 2x on average compared
> to normal "scalar" code) is so marginal. The widening MAC
> is only one example: it takes 3 cycles to compute and it can be pipelined,
> but the "butterfly" eats up more cycles. that's dumb!
> This is one of the reasons why 3r2w is desirable.
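A small Python model of the point above, assuming the "widening MAC" means MMX PMADDWD (four signed 16-bit products summed pairwise into two 32-bit lanes). Turning that into one dot product needs an extra horizontal add, and a DCT butterfly yields two results from one operand pair; the names `dot4` and `butterfly` are illustrative, not real instructions.

```python
def to_i32(x):
    """Wrap a Python int to a signed 32-bit value (two's complement)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def pmaddwd(a, b):
    """Model of MMX PMADDWD: four signed 16-bit pairs are multiplied at
    32-bit precision and adjacent products are summed into two lanes."""
    assert len(a) == len(b) == 4
    return [to_i32(a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1])
            for i in range(2)]


def dot4(a, b):
    """4-element dot product. The final lo + hi add is the extra
    horizontal step: MMX has no horizontal add, so it costs separate
    instructions (e.g. a PSRLQ shift followed by PADDD)."""
    lo, hi = pmaddwd(a, b)
    return to_i32(lo + hi)


def butterfly(x, y):
    """DCT-style butterfly: one operand pair, two results (sum and
    difference). With a single write port this takes two insns; a
    second write port would let one insn deliver both."""
    return x + y, x - y
```

The one overflow case of PMADDWD (all four inputs equal to -32768) wraps to 0x80000000, which `to_i32` reproduces.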
I agree that IDCT in MMX is quite large with all these PPACKxxx insns.
I'm interested in 8x8 DCT because it is quite useful (JPEG, MPEG, ...).
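For checking any SIMD DCT, a naive reference helps. This is a sketch of the separable 8x8 DCT-II with JPEG-style scaling (rows, then columns) in plain Python floats. It is O(N^4) and is not a fast algorithm; fast versions factor the inner sums into butterflies.

```python
import math

N = 8


def dct_1d(v):
    """8-point DCT-II with JPEG scaling:
    X[k] = C(k)/2 * sum_n v[n] * cos((2n+1) k pi / 16),
    where C(0) = 1/sqrt(2) and C(k) = 1 otherwise."""
    out = []
    for k in range(N):
        c = 1 / math.sqrt(2) if k == 0 else 1.0
        s = sum(v[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                for n in range(N))
        out.append(c / 2 * s)
    return out


def dct_8x8(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[y][x] for y in range(N)]) for x in range(N)]
    # cols[x] holds column x; transpose back so result[v][u] is row-major.
    return [[cols[x][y] for x in range(N)] for y in range(N)]
```

A flat block of ones gives a DC coefficient of 8 and (up to rounding) zero for all 63 AC coefficients.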
By the way, F-CPU is missing a SAD (sum of absolute differences) insn,
which is very useful for video. And it can't be coded efficiently on
F-CPU; at the very least, an insn summing the bytes inside a register
would be needed.
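To show why a byte-sum insn matters, here is a Python sketch of SAD over eight bytes packed in one 64-bit word (register width assumed). The per-byte |a-b| uses the MMX-era trick sat(a-b) | sat(b-a), with `sub_sat_u8` emulating a PSUBUSB-like saturating subtract lane by lane. `hsum_bytes` is the SWAR "sum of bytes inside a register" that an insn would do in one step.

```python
MASK64 = (1 << 64) - 1


def hsum_bytes(x):
    """Sum of the eight unsigned bytes in a 64-bit word, SWAR style:
    add neighbouring bytes, then 16-bit halves, then 32-bit halves.
    The total is at most 8 * 255 = 2040, so no lane overflows."""
    x &= MASK64
    x = (x & 0x00FF00FF00FF00FF) + ((x >> 8) & 0x00FF00FF00FF00FF)
    x = (x & 0x0000FFFF0000FFFF) + ((x >> 16) & 0x0000FFFF0000FFFF)
    x = (x & 0x00000000FFFFFFFF) + (x >> 32)
    return x


def sub_sat_u8(a, b):
    """Per-byte unsigned saturating subtract (like MMX PSUBUSB),
    emulated lane by lane."""
    r = 0
    for i in range(8):
        da = (a >> (8 * i)) & 0xFF
        db = (b >> (8 * i)) & 0xFF
        r |= max(da - db, 0) << (8 * i)
    return r


def sad8(a, b):
    """Sum of absolute differences of eight byte pairs.
    |a-b| per byte equals sat(a-b) | sat(b-a), since one side is 0."""
    absdiff = sub_sat_u8(a, b) | sub_sat_u8(b, a)
    return hsum_bytes(absdiff)
```

Without a byte-sum insn, the three shift/mask/add steps of `hsum_bytes` cost several insns per 8 pixels, on top of the two subtracts and the OR.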
> i have seen an incredible 32-bit core on fpgacpu.org,
> i think it's the XSOC. It is very small and the instruction
> set is really limited, but it can compile some things.
> it has no protection at all, but it fits into a few hundred
> cells....
Sounds interesting. I should at least take a look at it :)
> i have found some cheap Europa-format i486-DX66 boards in Berlin
> and that's probably what you need (like a PC104 board or something
> like that). And because it's almost a PC, development is much faster.
I like ETRAX (above): it natively supports Linux and is an
almost complete server system drawing 1 W of power at 1.8 V ...
devik