[f-cpu] Manual problems and strchr optimized routine
- To: <f-cpu@seul.org>
- Subject: [f-cpu] Manual problems and strchr optimized routine
- From: devik <devik@cdi.cz>
- Date: Thu, 28 Nov 2002 16:51:14 +0100 (CET)
- Delivery-Date: Thu, 28 Nov 2002 10:58:15 -0500
- Reply-To: f-cpu@seul.org
- Sender: owner-f-cpu@seul.org
Hi,
some remarks on manual 0.2.7b from the point of view of a casual reader:
** logic
The manual should explain what is meant by andn. It is
described on page 54 but is missing from the OPs reference page.
** cand/cor
It should be mentioned that it sets ALL bits of the word to
the result of the and/or of its bits. Because of the problem with
the mux insn I couldn't even work this out from the examples :(
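To make clear how I read cand/cor, here is a small C sketch. It models only my understanding from the examples (all bits of the result are set to the and/or of the operand's bits), not the manual's definition. The names cand64, cor64 and cand_lanes are made up for illustration, and the lane-wise variant is an assumption about what a size flag would do.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics: the result is all ones if every bit of x is 1. */
static uint64_t cand64(uint64_t x) { return x == UINT64_MAX ? UINT64_MAX : 0; }

/* Assumed semantics: the result is all ones if any bit of x is 1. */
static uint64_t cor64(uint64_t x) { return x != 0 ? UINT64_MAX : 0; }

/* Hypothetical lane-wise form: the same AND reduction applied
   independently to each lane of `width` bits (8, 16, 32 or 64). */
static uint64_t cand_lanes(uint64_t x, unsigned width) {
    assert(width == 8 || width == 16 || width == 32 || width == 64);
    uint64_t r = 0;
    for (unsigned pos = 0; pos < 64; pos += width) {
        uint64_t ones = (width == 64) ? UINT64_MAX : ((uint64_t)1 << width) - 1;
        uint64_t lane = (x >> pos) & ones;
        if (lane == ones)
            r |= ones << pos;   /* every bit of this lane was 1 */
    }
    return r;
}
```

With byte lanes, only the lanes that are entirely 0xFF survive, which is what makes such a reduction useful for per-byte equality tests.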
** mux
The description doesn't make sense as an English sentence to me.
Also, from the ROP2-YG-2001201.tgz sources it seems that mux is
done bitwise, so what is the size flag for here?
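The reasoning behind the size-flag question can be sketched in C. This assumes mux is a plain bitwise select. The operand order, the mask polarity and the names mux_bitwise and mux_lanes are my own guesses for illustration, not taken from the manual or the ROP2 sources.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise select: take the bit from a where mask is 1, from b where it is 0. */
static uint64_t mux_bitwise(uint64_t mask, uint64_t a, uint64_t b) {
    return (a & mask) | (b & ~mask);
}

/* The same select done lane by lane, for lanes of `width` bits. */
static uint64_t mux_lanes(uint64_t mask, uint64_t a, uint64_t b, unsigned width) {
    assert(width == 8 || width == 16 || width == 32 || width == 64);
    uint64_t r = 0;
    for (unsigned pos = 0; pos < 64; pos += width) {
        uint64_t ones = (width == 64) ? UINT64_MAX : ((uint64_t)1 << width) - 1;
        uint64_t lane = ones << pos;             /* bits belonging to this lane */
        r |= ((a & mask) | (b & ~mask)) & lane;  /* select within the lane only */
    }
    return r;
}
```

Under that assumption mux_lanes returns the same value as mux_bitwise for every width, so a size flag would have no visible effect.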
-------
I also tried writing a strchr function to learn more about
the f-cpu ISA. The function is attached and seems to have only
one stall while testing 1.3 characters per cycle.
If someone looks at it, please tell me whether
I coded it optimally or whether I have misunderstood fc0
scheduling ...
thanks,
devik
; FC0 optimized strchr by devik@cdi.cz
; It should be faster than a byte-wide loop from 10 bytes upwards
; as it tests 16 characters every 12 cycles
strchr:
andi 7,a0,t10 ; byte offset
bseti 4,r0,t9 ; prepare constant 16
shiftir 3,t10,t11 ; offset*8 for mask creation
msub t10,a0,a0 ; align a0 to 64bit boundary
bset t11,r0,t11 ; create mask seed
sdup.b a1,t0 ; make char-compare mask
loadaddri fnd-$,t16 ; address of fnd label
madd t9,a0,t3 ; prepare secondary pointer
dec t11,t13 ; make first-test mask
loadi1 8,a0,t1 ; first (incomplete) block
loadi1 16,t3,t2 ; cycle prolog
scand.xnor.b t0,t1,t6 ; test for chars
scand.xnor.b r0,t1,t7 ; and zeros
loadi1 16,a0,t1 ; cycle prolog
or t6,t7,t8 ; found anything ?
bseti 3,t9,t14 ; prepare constant 24 = 16(t9)+8
and t8,t13,t8 ; keep only interesting part [stall]
bseti 5,r0,t15 ; prepare constant 32
jmp.nz t8,t16 ; jmp to fnd:
loopentry t5
loadi1 16,a0,t1 ; big endian loads
scand.xnor.b t0,t2,t8
scand.xnor.b r0,t2,t9
scand.xnor.b t0,t1,t6
scand.xnor.b r0,t1,t7
or t8,t9,t11
or t6,t7,t10
loadi1 16,t3,t2
or t10,t11,t12
loadi1 16,a0,t1
jmp.z t12,t5
move.z t10,t8,t6
move.z t10,t9,t7 ; select correct half
fnd:
cmpl t6,t7,t8 ; found \0 or char ?
msb1 t6,t7 ; prepare byte offset
move r0,rv
jmp.nz t8,ra ; \0 found, return NULL
shiftir 3,t7,t7 ; finish byte offset
move.z t10,t14,t15 ; final constant in t15 (24 or 32)
sub t7,t15,t7 ; final offset
madd a0,t7,rv ; update pointer [stall]
jmp ra ; finish (madd still running)
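
For readers without the manual at hand, the idea behind the routine can be sketched in portable C. This is a simplified model, not a translation. It handles one 64-bit word per iteration instead of two, and it rescans the matching word byte by byte instead of masking the first block and using msb1, which keeps it independent of byte order. The names dup_byte, zero_lanes and strchr_words are hypothetical, and the zero-byte test is the usual exact bit trick standing in for what scand.xnor.b appears to do. It assumes the aligned 8-byte words covering the string are readable, including the bytes before the start and after the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate one byte into all 8 lanes (the role sdup.b seems to play). */
static uint64_t dup_byte(unsigned char c) { return 0x0101010101010101ULL * c; }

/* Returns 0x80 in every byte lane of v that is zero, 0 elsewhere. */
static uint64_t zero_lanes(uint64_t v) {
    const uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;
    return ~(((v & low7) + low7) | v | low7);
}

/* Word-at-a-time strchr. Precondition: every aligned 8-byte word that
   overlaps the string (up to and including its terminator) is readable. */
const char *strchr_words(const char *s, int ch) {
    assert(s != NULL);
    unsigned char c = (unsigned char)ch;
    size_t off = (size_t)((uintptr_t)s & 7);                 /* byte offset */
    const unsigned char *p = (const unsigned char *)s - off; /* align down */
    uint64_t pat = dup_byte(c);

    for (;; p += 8, off = 0) {
        uint64_t w;
        memcpy(&w, p, sizeof w);                  /* aligned 64-bit load */
        /* Any lane equal to c, or any zero lane? */
        if ((zero_lanes(w ^ pat) | zero_lanes(w)) == 0)
            continue;
        /* Rescan this word. Bytes before s are skipped, so a hit that
           came only from them is ignored and the loop simply goes on. */
        for (size_t i = off; i < 8; i++) {
            if (p[i] == c)
                return (const char *)(p + i);
            if (p[i] == 0)
                return NULL;
        }
    }
}
```

Like the C library strchr, searching for 0 returns a pointer to the terminator. Testing two words per iteration, as the assembly does, only changes how the final offset is selected.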