[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[f-cpu] strchr() again
Hi F-gang,
here is my version of strchr(), in `mr-as' syntax as usual. It's similar
to deviks version, but I use my optimized 5-cycle main loop, and I handle
some things differently. As a result, the prologue is slightly shorter.
I hope that I got the matching conditions right - there may be several
matches with both the target character and 0, and only the very first
one is valid. If there only is a match with zero, the function will take
the emergency exit (note that the sequence of the jump instructions is
important). The epilogue contains a little speculative execution, in
order to avoid stalls (if you can still find some, or you find a bug,
please tell me).
If there is no match, the routine takes approximately 13 + 5/8 * n
instructions (n = string length). An empty string is scanned in 16
cycles. If a match is found, there are approximately 20 + 5/8 * n
instructions to execute (n = relative position of the match). Compared
to a bytewise search, which will take at least 3 cycles per byte (load,
compare and jump), the overhead amortizes for n >= 6...9 bytes, depending
on whether there's a match and where it is located. At n >= 35...54 bytes,
speed increases above 1.0 bytes/cycle, asymptotically approaching 1.6
bytes/cycle (for a 64-bit CPU - double width, double throughput!).
.align 32 // align to cache line boundary
.global strchr
strchr:
// prologue
andni $7, r1, r3 // aligned `long*' pointer
andi $7, r1, r4 // byte offset
loadi $8, r3, r5 // load first data word
shiftli $3, r4, r7 // 8 * byte offset
orn r0, r0, r8 // -1L
sdup.b r2, r6 // duplicate character value
shiftl r7, r8, r9 // mask
loadaddri $found-.-4, r31
move r0, r1 // preload NULL result
// first compare
xnor.and.b r6, r5, r10 // compare with character
xnor.and.b r0, r5, r11 // compare with 0
loadi $8, r3, r5 // load second data word
and r9, r10, r12 // mask off unused bytes
and r9, r11, r13 // mask off unused bytes
jmpnz r12, r31 // found character
jmpnz r13, r63 // found 0, terminate
// main loop
loopentry r30
xnor.and.b r6, r5, r12 // compare with character
xnor.and.b r0, r5, r13 // compare with 0
loadi $8, r3, r5 // load next data word
jmpnz r12, r31 // found character
jmpz r13, r30 // loop unless a 0 occurred
jmp r63 // terminate
// XXX: align the epilogue? A single NOP would do it.
// epilogue
found:
// Found a matching character, but there may also be a matching
// zero. Check which one comes first (if both are in the same
// place, we must have been looking for the 0).
lsb1 r12, r14 // position of first matching byte
lsb1 r13, r15 // position of first zero byte (if any)
subi $16, r3, r18 // roll back pointer (it's 2 words ahead)
shiftri $3, r14, r17 // offset of matching byte
cmple r15, r14, r16 // match is valid if r14 <= r15 (or r15 == 0)
add r17, r18, r1 // pointer to matching byte
jmpz r15, r63 // return pointer if there was no matching 0
movez r16, r0, r1 // if terminating 0 came before the match
jmp r63
For ultra-fast execution, we can define a variant that requires strict
8-byte alignment. We may also combine both versions (check the alignment
at the start of the general routine, and branch to the slow version if
it's not ok). The epilogue is the same, therefore I omit it here (the
epilogue code may even be shared by both versions). The main loop also
is the same, but I wanted to avoid the unnecessary loadaddr and jmp
instructions required for jumping to the existing loop :)
aligned_strchr:
// prologue
loadi $8, r1, r5 // load first data word
sdup.b r2, r6 // duplicate character value
loadaddri $found-.-4, r31
move r1, r3 // move pointer away
move r0, r1 // preload NULL result
// XXX: better align the main loop
// (can be done by moving the entry point above)
// main loop
loopentry r30
xnor.and.b r6, r5, r12 // compare with character
xnor.and.b r0, r5, r13 // compare with 0
loadi $8, r3, r5 // load next data word
jmpnz r12, r31 // found character
jmpz r13, r30 // loop unless a 0 occurred
jmp r63 // terminate
Guess what? This was fun! :)
--
Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
"All I wanna do is have a little fun before I die"
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu in the body. http://f-cpu.seul.org/