
Re: [f-cpu] Manual 0.2.6



On Thu, Aug 01, 2002 at 05:11:25PM +0200, Yann Guidon wrote:
> hi,
> 
> just a little detail :
> 
> Cedric BAIL wrote:
> > The idea is to have this capability for every register size (8, 16, 32, 64).
> > We need these instructions to load registers that are wider than 64 bits (the
> > shifter only works on 64 bits, and inter-chunk shifting isn't its job).
> 
> Beware: Michael did not implement inter-chunk shifts, but that is not a fixed rule.
> I would like to implement a different shifter structure which could shift
> 64 bits, even between two neighbouring chunks.

It's not a problem to extend the shifter to 128, 256 or even more
bits. But it will affect the latency, of course. The same is true for
the ASU and IMU units; the only EU that is mostly immune to word size
changes is ROP2.
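
To see why a wider shifter is more than a wider copy of the 64-bit one, here is a minimal Python sketch of a full-width left shift on a 256-bit register. It assumes the register is modelled as four 64-bit chunks, least significant first; the function names are hypothetical and this is not the F-CPU shifter design. The point it shows: for any shift count that is not a multiple of 64, every result chunk combines bits from two neighbouring source chunks, so a full-width shifter needs wiring across chunk boundaries.

```python
# Sketch: a 256-bit register modelled as four 64-bit chunks,
# chunk 0 holding the least significant bits (assumed layout,
# for illustration only).
MASK64 = (1 << 64) - 1

def to_chunks(value, n_chunks=4):
    """Split an integer into n_chunks 64-bit chunks, least significant first."""
    return [(value >> (64 * i)) & MASK64 for i in range(n_chunks)]

def from_chunks(chunks):
    """Reassemble chunks (least significant first) into one integer."""
    return sum(c << (64 * i) for i, c in enumerate(chunks))

def shl_wide(chunks, n):
    """Full-width logical left shift by n bits, 0 <= n < 64 * len(chunks).

    Result chunk i takes its upper bits from source chunk i - q and,
    when r != 0, its lowest r bits from source chunk i - q - 1.
    """
    q, r = divmod(n, 64)
    out = []
    for i in range(len(chunks)):
        hi = chunks[i - q] if i - q >= 0 else 0
        lo = chunks[i - q - 1] if i - q - 1 >= 0 else 0
        word = (hi << r) & MASK64
        if r:  # bits crossing the chunk boundary from the lower neighbour
            word |= lo >> (64 - r)
        out.append(word)
    return out
```

For example, shifting `to_chunks(1 << 63)` left by one bit moves that bit out of chunk 0 and into the lowest bit of chunk 1, which a per-chunk (SIMD) shifter would simply drop.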

A 256-bit F-CPU core that supports full-word operations will have longer
pipelines or more delay per pipeline stage (or both). Unless we use
variable latency EUs, almost all operations will be slower than in a
64-bit version.

The question is whether e.g. a 256-bit add, multiply or shift instruction
is really useful. Most of the time, applications use small integers that
fit into 32 bits (or fewer), and IEEE single or double floats. That is,
wider registers will be used for SIMD operations a lot, but rarely for
`wide' operations.
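
To make the SIMD versus `wide' distinction concrete, here is a minimal Python model of both kinds of add on a 256-bit register, again assuming four 64-bit chunks, least significant first (function names are hypothetical). The only difference is the carry chain across chunk boundaries: in the wide add each chunk depends on the carry out of the previous one. Real adders use carry-lookahead or carry-select rather than a ripple chain, so this illustrates the cross-chunk dependency, not the actual circuit.

```python
MASK64 = (1 << 64) - 1

def simd_add64(a, b):
    """SIMD add: each 64-bit chunk wraps independently, no carry between chunks."""
    return [(x + y) & MASK64 for x, y in zip(a, b)]

def wide_add(a, b):
    """Full-width add: the carry out of chunk i feeds chunk i + 1.

    Ripple model; the final carry out of the top chunk is discarded,
    so the result wraps modulo 2**(64 * len(a)).
    """
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & MASK64)
        carry = s >> 64
    return out
```

With `a = [MASK64, 0, 0, 0]` and `b = [1, 0, 0, 0]`, `simd_add64` gives `[0, 0, 0, 0]` while `wide_add` gives `[0, 1, 0, 0]`; the results differ only because a carry crossed a chunk boundary.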

How do we move data to and from wide registers? Data that is stored in memory
can be handled with load/store, immediate data will be handled by the
loadcons* instructions, and the mix, expand, byterev and sdup instructions
(which may handle chunks wider than 64 bits without significant performance
loss) can be used to move data between different chunks.

If we're going to implement a `monster shift' anyway, I suggest a `chunk
shift' instruction that operates on full-size registers (that is, there
will be no SIMD mode) and always shifts 64-bit quantities:

	cshiftl r3, r2, r1	// r1 = r2 << (64 * r3)
	cshiftr r3, r2, r1	// r1 = r2 >> (64 * r3)

This can probably be integrated with the SHL execution unit.
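
The semantics of the proposed cshiftl/cshiftr can be sketched in Python as follows, assuming a register is a list of 64-bit chunks, least significant first (a hypothetical model, not an F-CPU spec). Because the shift distance is always a multiple of 64, no bits are split across chunk boundaries: each result chunk is a copy of exactly one source chunk or zero, so the operation reduces to a chunk-level multiplexer. The sketch assumes a non-negative count and follows the formula literally, truncated to the register width, so counts of len(chunks) or more yield zero; the post does not say how hardware would treat such counts.

```python
# Sketch of the proposed chunk shifts; a register is a list of
# 64-bit chunks, least significant first (assumed layout).
def cshiftl(chunks, count):
    """r1 = r2 << (64 * r3): move whole chunks toward the MSB, zero-fill."""
    k = len(chunks)
    return [chunks[i - count] if 0 <= i - count < k else 0 for i in range(k)]

def cshiftr(chunks, count):
    """r1 = r2 >> (64 * r3): move whole chunks toward the LSB, zero-fill."""
    k = len(chunks)
    return [chunks[i + count] if 0 <= i + count < k else 0 for i in range(k)]
```

For example, `cshiftl([1, 2, 3, 4], 1)` returns `[0, 1, 2, 3]` and `cshiftr([1, 2, 3, 4], 1)` returns `[2, 3, 4, 0]`.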

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"