
Re: [f-cpu] GCC and jmpz vs. jmpl



hi,

Michael Riepe wrote:

On Thu, Jan 09, 2003 at 10:54:31AM +0100, devik wrote:

for inc/dec/neg/cmp.....
Who added next cycle ??

The INC unit needs 2 cycles because it became too complex.
Life's a bitch, you know...

hehehe, what irony ! The Incrementer unit is named after
inc/dec. But now inc/dec are superfluous because they do
the same job as addi $1 in the same time - addi can
even be faster for 8-bit chunks.

Not my fault. Whoever designed the original INC unit underestimated the
cost of a SIMD-enabled `and' reduce tree.

is it caused by the SIMD ?
without SIMD, i think it's ok.
and i seem to remember that
adding SIMD was not that expensive.

But if INC is going to last 2 cycles,
then it should be possible to shove a binary
encoder at the end of the unit....

that would be cool and this would help
for finding the first char in a strcmp, for example :

; we get here because the loop exited with a non-zero match
; in r1 ( bytes either 00 or FF)
LSB1 r1, r2 ; the mask is first priority encoded, to remove trailing bits,
; and the position of the LSB that is set is put into r2
shri 3, r2, r2 ; shift 3 LSB out (because we deal with bytes)
or r2, r3, r3 ; the pointer in r3 (where the match occurred) is adjusted :
; it is normally 64-bit (or more) aligned, so the OR adds the offset.

i just realized that this nice code seems to be independent from the register size :-)
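
To check the idea, here is a small Python model of the three instructions above. It is only a sketch under assumptions that the mail does not settle: LSB1 is taken to return the rank (bit index) of the lowest set bit, the lowest-addressed byte is taken to sit in the least significant bits of the register, and the mask is non-zero. The names lsb1_rank and match_address are made up for this illustration.

```python
def lsb1_rank(mask: int) -> int:
    """Rank (bit index) of the lowest set bit; models LSB1 under the
    assumption that it outputs a rank, not a single isolated bit."""
    if mask == 0:
        raise ValueError("the loop only exits with a non-zero mask")
    # mask & -mask isolates the lowest set bit (the priority encoding step),
    # bit_length() - 1 turns that single bit into its index.
    return (mask & -mask).bit_length() - 1


def match_address(mask: int, aligned_ptr: int) -> int:
    """Model of: LSB1 r1, r2 / shri 3, r2, r2 / or r2, r3, r3."""
    bit_rank = lsb1_rank(mask)        # LSB1 r1, r2
    byte_offset = bit_rank >> 3       # shri 3, r2, r2 (8 bits per byte)
    return aligned_ptr | byte_offset  # or r2, r3, r3 (pointer is aligned)


# 64-bit register, bytes 2 and 5 matched: the lowest one wins.
print(hex(match_address(0x0000FF0000FF0000, 0x1000)))  # 0x1002
# 128-bit register, byte 9 matched: same code, wider mask.
print(hex(match_address(0xFF << 72, 0x2000)))          # 0x2009
```

The OR only behaves like an addition because the low bits of an aligned pointer are zero, which is also why nothing in the model depends on the register width.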

BTW, the binary encoding is already in my mind because the
LSB0/1 and MSB0/1 instructions were not always precise about this :
is the output the rank of the bit, or simply a single bit ?
maybe the binary encoder can be bypassed with a flag bit
in the instruction ? So we can have the raw priority encoded
result, or the binary encoding, on demand. Of course,
because the binary encoding is just a bunch of ORs,
the priority encoding is necessary to ensure that at most
one bit is set at the input of the binary encoder.
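
A quick Python sketch of why the priority encoding has to come first. The helper names (or_encoder, isolate_lsb) are invented for this example, and the model is purely behavioural: it ORs together the indices of all set input bits, which is what a bare OR-based encoder computes.

```python
def or_encoder(bits: int) -> int:
    """OR together the indices of every set input bit."""
    out = 0
    index = 0
    while bits:
        if bits & 1:
            out |= index
        bits >>= 1
        index += 1
    return out


def isolate_lsb(bits: int) -> int:
    """Priority encoding: keep only the lowest set bit."""
    return bits & -bits


raw = 0b0110  # bits 1 and 2 are set
print(or_encoder(raw))               # 3, the OR of 1 and 2: not a valid rank
print(or_encoder(isolate_lsb(raw)))  # 1, the rank of the lowest set bit
```

With two bits set the indices get merged into a value that points at neither of them, so the raw mask cannot be fed to the ORs directly.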


YG

PS : i hope this mail is not too confusing ....
<headache mode="on">

PS2 : here is some pseudo-VHDL that describes
a (non-SIMD) binary encoder :

generic : N

in : vector ((2**N)-1 downto 0);
out : vector (N-1 downto 0);

assert bitcount(in) < 2

for i in 0 to N-1 do -- scan all the output bits
  temp = 0;
  for j in 0 to (2**N)-1 do -- scan all the input bits
    if (((j >> i) & 1) == 1) then -- does input j contribute to output bit i ?
      temp = temp OR in(j);
    endif
  endfor
  out(i) <= temp;
endfor

The logic depth corresponds (for a 64-bit input) to a 32-input OR,
because each output bit is the OR of half of the input bits,
which is ok. It's probably a bit heavier for SIMD but not much,
just a few ANDs here and there.
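
The same encoder as a runnable Python model, to check the loop and the fan-in figure. This is a behavioural sketch, not a hardware description: binary_encode and or_fan_in are names chosen for this example, and the input is assumed to be already priority encoded (at most one bit set), as the assert requires.

```python
def binary_encode(one_hot: int, n: int) -> int:
    """Encode a (2**n)-bit input with at most one bit set into n bits."""
    assert bin(one_hot).count("1") < 2, "priority encode the input first"
    out = 0
    for i in range(n):               # scan all the output bits
        temp = 0
        for j in range(2 ** n):      # scan all the input bits
            if (j >> i) & 1:         # input j contributes to output bit i
                temp |= (one_hot >> j) & 1
        out |= temp << i
    return out


def or_fan_in(n: int, i: int) -> int:
    """Number of input bits ORed together to produce output bit i."""
    return sum(1 for j in range(2 ** n) if (j >> i) & 1)


print(binary_encode(1 << 37, 6))             # 37
print([or_fan_in(6, i) for i in range(6)])   # [32, 32, 32, 32, 32, 32]
```

Note that an all-zero input and an input with only bit 0 set both encode to 0, so the "nothing found" case has to be detected separately.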

