[f-cpu] Manual problems and strchr optimized routine

Some remarks on manual 0.2.7b from the point of view of a casual reader:

** logic
The manual should explain what is meant by andn. The explanation
can be found on page 54 but is missing from the OPs reference page.

** cand/cor
It should be mentioned that it sets ALL bits of the word to
the result of the and/or of the bits. Because of the problem with
the mux insn, I didn't even understand it from the examples :(
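
To make that reading concrete, here is a small C model of it. It is only a sketch of how I understand the combine operations, not the manual's definition; the helper names (cand64, cor64, scand_xnor_b) are made up for illustration, and the byte-wise variant is my guess at what `scand.xnor.b` in the strchr routine does.

```c
#include <stdint.h>

/* Combine-AND: every result bit is the AND of all input bits.
   (Hypothetical helper; models the reading given above.) */
static uint64_t cand64(uint64_t x) {
    return x == UINT64_MAX ? UINT64_MAX : 0;
}

/* Combine-OR: every result bit is the OR of all input bits. */
static uint64_t cor64(uint64_t x) {
    return x != 0 ? UINT64_MAX : 0;
}

/* Assumed meaning of a byte-wise combine-AND of XNOR: each byte lane
   becomes 0xFF when the two input bytes are equal, 0x00 otherwise. */
static uint64_t scand_xnor_b(uint64_t a, uint64_t b) {
    uint64_t x = ~(a ^ b);              /* XNOR: 1 where bits agree */
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t lane = (x >> (8 * i)) & 0xFF;
        if (lane == 0xFF)               /* AND over the 8 bits of the lane */
            r |= (uint64_t)0xFF << (8 * i);
    }
    return r;
}
```

With this model, comparing a word against a byte replicated into all lanes marks the lanes that hold that byte.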

** mux
The description doesn't make sense as an English sentence to me.
Also, from the ROP2-YG-2001201.tgz sources it seems that mux is
done bitwise, so what is the size flag for here?
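
For reference, this is what I mean by a bitwise mux, written as C. It is a sketch of the behaviour I infer from the sources, not the documented one; the function name and the operand order are assumptions.

```c
#include <stdint.h>

/* Bitwise multiplexer: for each bit position take the bit from `a`
   where `sel` is 1 and from `b` where `sel` is 0.
   (Hypothetical name; operand order is an assumption.) */
static uint64_t mux64(uint64_t sel, uint64_t a, uint64_t b) {
    return (a & sel) | (b & ~sel);
}
```

Every bit is selected independently here, so the result would be the same for any lane width, which is why the size flag puzzles me.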

I also tried to write a strchr function to learn more about
the f-cpu ISA. The function is attached and seems to have only
one stall while testing 1.3 characters per cycle.
If someone looks at it, please tell me whether
I coded it optimally or whether I misunderstand fc0 scheduling
...

; FC0 optimized strchr by devik@cdi.cz
; It should be faster than a byte-wide loop from 10 bytes upwards
; as it tests 16 characters every 12 cycles
		andi 7,a0,t10		; byte offset
		bseti 4,r0,t9		; prepare constant 16
		shiftir 3,t10,t11	; offset*8 for mask creation
		msub t10,a0,a0		; align a0 to 64bit boundary
		bset t11,r0,t11		; create mask seed
		sdup.b a1,t0		; make char-compare mask
		loadaddri fnd-$,t16	; address of fnd label
		madd t9,a0,t3		; prepare secondary pointer
		dec t11,t13			; make first-test mask

		loadi1 8,a0,t1		; first (incomplete) block
		loadi1 16,t3,t2		; cycle prolog
		scand.xnor.b t0,t1,t6	; test for chars
		scand.xnor.b r0,t1,t7	; and zeros
		loadi1 16,a0,t1		; cycle prolog
		or t6,t7,t8			; found anything ?
		bseti 3,t9,t14		; prepare constant 24 = 16(t9)+8
		and t8,t13,t8		; keep only interesting part [stall]
		bseti 5,r0,t15		; prepare constant 32
		jmp.nz t8,t16		; jmp to fnd:		

		loopentry t5
		loadi1 16,a0,t1	; big endian loads
		scand.xnor.b t0,t2,t8
		scand.xnor.b r0,t2,t9
		scand.xnor.b t0,t1,t6
		scand.xnor.b r0,t1,t7
		or t8,t9,t11
		or t6,t7,t10
		loadi1 16,t3,t2
		or t10,t11,t12
		loadi1 16,a0,t1
		jmp.z t12,t5
		move.z t10,t8,t6
		move.z t10,t9,t7	; select correct half
		cmpl t6,t7,t8		; found \0 or char ?
		msb1 t6,t7			; prepare byte offset
		move r0,rv
		jmp.nz t8,ra		; \0 found, return NULL

		shiftir 3,t7,t7		; finish byte offset
		move.z t10,t14,t15	; final constant in t15 (24 or 32)
		sub t7,t15,t7		; final offset
		madd a0,t7,rv		; update pointer [stall]
		jmp ra				; finish (madd still running)
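
For readers who do not know the ISA, the idea of the routine above can be modeled in plain C. This is a rough sketch of the technique only (align the pointer down, mask off the bytes before the start in the first block, then test whole words for the character and for \0); it is not a translation of the scheduling. It handles 8 bytes per step instead of 16, the names load_be64, eq_bytes and strchr_model are made up, and it assumes that reading a whole aligned 8-byte word around the string is safe, as the assembly also assumes.

```c
#include <stddef.h>
#include <stdint.h>

/* Load 8 bytes as a big-endian word: first byte in memory = top lane. */
static uint64_t load_be64(const unsigned char *p) {
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w = (w << 8) | p[i];
    return w;
}

/* 0xFF in every byte lane where a and b hold the same byte, else 0x00. */
static uint64_t eq_bytes(uint64_t a, uint64_t b) {
    uint64_t x = a ^ b, r = 0;
    for (int i = 0; i < 8; i++)
        if (((x >> (8 * i)) & 0xFF) == 0)
            r |= (uint64_t)0xFF << (8 * i);
    return r;
}

char *strchr_model(const char *s, int c) {
    unsigned skip = (unsigned)((uintptr_t)s & 7);     /* byte offset */
    const unsigned char *p = (const unsigned char *)s - skip; /* align down */
    uint64_t pattern = 0x0101010101010101ULL * (unsigned char)c; /* c in all lanes */
    uint64_t keep = UINT64_MAX >> (8 * skip);         /* first-test mask */

    for (;;) {
        uint64_t w = load_be64(p);
        uint64_t hit_c = eq_bytes(w, pattern) & keep; /* lanes holding c */
        uint64_t hit_0 = eq_bytes(w, 0) & keep;       /* lanes holding \0 */
        if (hit_c | hit_0) {
            /* Walk the lanes in memory order; the first hit decides. */
            for (int i = 0; i < 8; i++) {
                uint64_t lane = (uint64_t)0xFF << (56 - 8 * i);
                if (hit_c & lane) return (char *)(p + i);
                if (hit_0 & lane) return NULL;
            }
        }
        keep = UINT64_MAX;                            /* later words: keep all */
        p += 8;
    }
}
```

The lane walk at the end stands in for the msb1/shift sequence of the assembly; searching for \0 itself returns a pointer to the terminator, as the C library strchr does.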