
Re: [f-cpu] Register set revised



> if you want to make small CPUs, F-CPU is not a good target.
> Furthermore, there already exist a lot of 32-bit CPUs
> that are better suited for low power and small footprints.
> The fpgacpu.org site is a good place to look, for example,
> because it gives several good tricks.

Oh well. As a strongly software-oriented guy, I have only just
started to realize such consequences. The architecture seemed
rather simple to me, but recently I started to see how complex
it can be at the logic level.

> i would rather split the register set in more sub-banks,
> in order to increase the associativity and reduce the hit
> to a smaller fraction. Maybe you should try with 16 or 8 banks,
> and give us comparative results. In this case, my gut feeling
> is that the hit is only marginal.

It would be interesting. Unfortunately, it is far from
simple to test more than 2 split sources ...
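
As a rough first estimate (not a substitute for a real test of the
split-source datapath), one can count how often the two source operands
of a 2r1w instruction land in the same bank. The sketch below is only a
toy model under loud assumptions: 64 registers, a hypothetical mapping
bank = reg mod N, and uniformly random, distinct source registers. Real
compiler output is not uniformly random, so the figures are only an
order-of-magnitude guide. The function names are made up for this
illustration.

```python
import random


def conflict_rate(num_banks, num_regs=64, trials=100_000, seed=1):
    """Monte Carlo estimate of how often two distinct source registers
    fall into the same bank, assuming bank = reg % num_banks."""
    rng = random.Random(seed)
    conflicts = 0
    for _ in range(trials):
        # Pick two distinct source registers uniformly at random (assumption).
        a, b = rng.sample(range(num_regs), 2)
        if a % num_banks == b % num_banks:
            conflicts += 1
    return conflicts / trials


def exact_conflict_rate(num_banks, num_regs=64):
    """Closed form for the same model, valid when num_banks divides num_regs:
    once the first register is fixed, per_bank - 1 of the remaining
    num_regs - 1 registers share its bank."""
    per_bank = num_regs // num_banks
    return (per_bank - 1) / (num_regs - 1)


if __name__ == "__main__":
    for banks in (2, 4, 8, 16):
        print(banks, conflict_rate(banks), exact_conflict_rate(banks))
```

Under this model the same-bank fraction is 31/63 (about 49%) with
2 banks, 7/63 (about 11%) with 8 banks and 3/63 (about 5%) with
16 banks, which at least points in the same direction as the
"only marginal" gut feeling quoted above.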

> >I also saw MAC a few times. An MMX-like pmadd might be better
> >because it is 2r1w and imposes no RAW interdependency between
> >subsequent ones. But yes, it might be seen as "different"
> >because it changes chunk size on the fly. On the other hand, it
> >supports widening multiply.
> >
> widening multiply means that there is no need for scheduling a couple of MAC
> instructions AND later combining the result back into a single register,
> hence at least 2 clock cycles are saved. It's particularly important
> in these small, computation-intensive loops for 3D, sound and video...

Yes, I agree. I overlooked that MAC is widening; I had
earlier concluded that it is not.
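
To make the comparison concrete, here is a small Python model of the
two operations. The pmaddwd function follows the MMX PMADDWD semantics
(four signed 16-bit lanes per operand, two signed 32-bit results). The
mac function is only my assumption of what a widening 16x16 -> 32-bit
multiply-accumulate would do; it is not taken from the F-CPU manual,
and the helper names are invented for this sketch.

```python
def to_signed(value, bits):
    """Wrap an integer to a signed two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pmaddwd(a, b):
    """MMX-style packed multiply-add: two source operands, one result (2r1w).
    Each 16x16 product is widened to 32 bits, then adjacent pairs are summed."""
    assert len(a) == len(b) == 4
    products = [to_signed(x, 16) * to_signed(y, 16) for x, y in zip(a, b)]
    return [to_signed(products[0] + products[1], 32),
            to_signed(products[2] + products[3], 32)]


def mac(acc, x, y):
    """Assumed widening MAC: reads the accumulator as a third source,
    so each MAC has a RAW dependency on the previous one."""
    return to_signed(acc + to_signed(x, 16) * to_signed(y, 16), 32)


if __name__ == "__main__":
    a, b = [1, 2, 3, 4], [5, 6, 7, 8]

    # Two independent partial sums from a single 2r1w instruction.
    print(pmaddwd(a, b))  # [17, 53]

    # The MAC chain: every step needs the result of the step before it.
    acc = 0
    for x, y in zip(a, b):
        acc = mac(acc, x, y)
    print(acc)  # 70
```

Both routes give the same dot product (17 + 53 = 70). The difference is
where the dependency sits: the pmadd form leaves one final add of the
two partial sums, while the MAC form serializes on the accumulator.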

> >With ROP2 available only from two banks, the compiler can always
> >rename r3 to whichever register is available as a source for xor.
> >
> ouch, that's ugly ....

Yes, isn't it? ;-) OK, back to being serious. I sometimes try
to think in unconventional ways and often reinvent the wheel ...

> my opinion is to perform this "transparently" within the core,
> using more banks to reduce contention and inserting "penalty cycles"
> automatically to ensure that other cores can implement the register
> set in a way that is more suitable to their particular case.

You are right. My latest screams into the dark were caused
by my interest in having an ultra-simple design with reasonable
performance for SoC use. OpenRISC 1200 is still too
complex for me :-\ I didn't even find a suitable one on
fpgacpu.org.
I'd like to have small Linux servers in an FPGA to act as boundary
routers, net-connected video grabbers, etc.

> >MR emulator. I'll do that first and then back my claims with
> >some SPECint numbers ..
> >
> are you really going to run SPEC2K ? .... :-)

If someone is willing to privately donate a copy ;-)
But I would probably measure a mix of utilities such as gcc and
grep, and compare against an Intel system with a known SPECint score.

devik
