
Re: [f-cpu] I'm still in the warmup phase ;-)

hi again,

Beat Steiner wrote:

Other suggestion I would like to make:
One register has already been reserved for constant 0

Note that it is there for "cultural reasons";
it is still used in some cases, however,
such as "always" conditions or memory pointer prefetch.

In the same way, a register for +1 and -1 could be reserved.
This way, increment and decrement could be replaced by additions.
The -1 constant (0xFFFFFFFFFFFFFFFF) is also interesting in conjunction with EXOR
to negate a value. Maybe the +1 register is not even necessary (subtract -1 instead).

All these cases are covered by the ASU, the INC and the ROP2 unit.
bit negation alone is performed in many ways,
such as xorn r0, rs, rd.

The manual has such a nice increment unit. It would be almost a pity to replace it with add units.
You can see that the nature of the operations is different, so INC does not replace
the ASU but supplements it. Otherwise it would be a waste.

But two add units may be better than one add and one increment unit. Is the find-first-lsb relevant
to any software? Hash/compression algos or such?
it is relevant to several algorithms, such as normalise in software FP,
bitstream manipulation (hence compression) or signal quantisation.
It exists in the 386+ CPUs but is usually microprogrammed,
resulting in a non-constant execution time. Worse, this microprogram
is split into many micro-ops in recent x86s.

Do we need more pseudo-registers for memory access?
What do you mean here?

Is there already a doc available about the f-bus?
This interface is still in limbo.
We can't design everything at once, either.

It would be nice to issue read requests asynchronously with respect to the returned data. Responses must be prioritized
over requests.

Example (T=start time, CC=clock cycle):
T+00CC: R01 := [0x000000da00012000]
T+01CC: R02 := [0x0000fe00263785000]
T+16CC: Result for R01 := [0x000000da00012000] drops in and is stored in R01
T+20CC: Result for R02 := [0x0000fe00263785000] drops in and is stored in R02

The second read does not wait for the first one to complete; its request is issued right away.
Every memory module will need a queue for that. Banking of RAM chips is no longer
required, and the modules may all have different sizes (and even speeds).

Is such an OOO (out-of-order) bus design already patented as DDR, RAMBUS, Chipkill or whatever?
Hmm, I don't think DDR is OOO, but some other buses (non-memory ones, rather NUMA)
are inherently OOO. This trend started long ago with message-passing networks, since
a read request is just a "read" message that is answered by a "data block" message.

Sorry for the notation above. But I somehow prefer R01 := R02 + R01 to ADD R01, R02.
but F-CPU is a "3-address" computer :-)

Some things to remember:
* Most important design rule: keep it simple.
"Make it work, then make it work fast".

* Most computer users need data movers rather than computers (i.e. machines that perform calculations).
But have you heard about the "memory bandwidth barrier"?
F-CPU is designed to be good at computing, it's a fact.
but we can't do much about the communication bandwidth, which
depends on technology (and price) and not architecture.
We can allow the implementers to use very wide buses
(i.e. up to 256 bits for FC0), but in practice that's not
realistic outside the silicon. Do you get the picture?

btw, welcome aboard :-)


To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/