Christopher Layne wrote: > On Sat, Feb 17, 2007 at 04:04:04PM -0500, Nick Mathewson wrote: >> On Fri, Feb 16, 2007 at 10:12:20PM -0500, Watson Ladd wrote: >> [...] >>> 2: Won't strlcpy run in time proportional to n? strncpy can't be much >>> faster as both functions need to loop from 0..n, load the character and >>> check if its null, and then copy it. The big difference is strncpy keeps >>> on going. >> You're confused about how strlcpy works when n is much smaller than >> the length of the source string. Look at the return value of strlcpy: >> it's the length of the original string. To find the length of the >> original string, strlcpy basically has to walk over the whole thing. >> So while the running time of strncpy(dst,src,siz) is O(siz), >> strlcpy(dst,src,siz) is O(strlen(src)). That's a big problem for >> tor_strndup: its most performance-relevant use case is copying a small >> portion of a much longer string, as when we're parsing router >> descriptors. In this use case, strncpy is obviously better. > > Technically though shouldn't strlcpy() not have to do this? It already > knows thelength, as it is provided to it. The only anecdote to this is > that it will supercede an 'actual' length if it detects a '\0' along the > copy. This isn't typically bad as it's already walking the string > anyways. It provides the length of the source string, not the destination string. This lets you compute the space you need, but is slower. I made the same mistake. > > e.g. OpenBSD copy (same as Tor's): > /* > * Copy src to string dst of size siz. At most siz-1 characters > * will be copied. Always NUL terminates (unless siz == 0). > * Returns strlen(src); if retval >= siz, truncation occurred. > */ > size_t strlcpy(char *dst, const char *src, size_t siz) > { > register char *d = dst; > register const char *s = src; > register size_t n = siz; > > if (n == 0) > return(strlen(s)); > while (*s != '\0') { > if (n != 1) { > *d++ = *s; > n--; > } > s++; > } > *d = '\0'; > > return(s - src); /* count does not include NUL */ > } > > Provided the string 'onion\0routing\0', with n being 13, only 6 loop > cycles will actually happen. No, OpenBSD strlcpy is different. It returns strlen(src) which Tor copy doesn't. > >> This isn't one of those hypothetical armchair optimizations, by the >> way: we profiled Tor before and after, and found out that the time >> spent copying strings went way down after we made the change. (I'm >> afraid I don't still have the numbers; this change came in the >> 0.0.9pre6 develoment, was back in November of 2004.) > > Unfortunately this may be due to the fact that strlcpy() vs strncpy()s > implementation may have been one of C vs asm. :-/ > > I've always been on the fence about the whole strlcpy vs strncpy vs > snprintf vs sprintf("%.*s", len, str) issue. For one, I tend to not > use many str* functions in general. Secondarily, the "bonus" of strncpy > zero-filling the remaining end of the dst buffer as some kind of security > benefit that others have brought up in the past, I find to be completely > baseless as it's depending on defensive programming and the possibility of > a buffer overflow happening in the first place. > > Also, when it comes to strlcpy, you're actually better off using: > > /* provided we've allocated dst as s_len + 1 bytes */ > > l = MIN(d_len, s_len); > memcpy(dst, src, l); dst[l] = '\0'; I don't think we need s_len+1 bytes. I think we need l+1 bytes as we could be copying a small section of a giant string, so allocating s_len+1 bytes will be too much most of the time. Isn't the basis for using stncpy that we are only copying a small section of huge strings? > > The rationale here being that you've handled the length issues, you already > know the length (typically) of the source ahead of time (how else would you > pre-allocate for dst? if it's an auto or static buffer, then you know the > length as well). > > In most asm implementations of str* functions the maximum length they will > operate on, chunk-wise, is a 4-byte integer. The reason being that they > will do the old 'is there a null in 4 bytes' trick, check it's alignment, > and if everything looks good, copy off the equivalent space of an int in > one call. But the bonus of using memcpy() is that memcpy() knows the entire > length ahead of time - and if it can make use of, provided the asm backend > code is there, it will use higher order optimizations such as mmx and/or > sse ops to copy much more than 4 bytes at once. It also doesn't have to > deal with even checking for nul's at all. It's entire goal is to copy data > as fast as possible, especially since it's use is widespread inside the > kernel and networking code. > > This is from Solaris. It does not do any mmx or amd-specific tricks to > make things even faster, but it is relatively optimized and nothing out > of the ordinary. > > memcpy.s: > 40 ENTRY(memcpy) > 41 movl %edi,%edx / save register variables > 42 pushl %esi > 43 movl 8(%esp),%edi / %edi = dest address > 44 movl 12(%esp),%esi / %esi = source address > 45 movl 16(%esp),%ecx / %ecx = length of string > 46 movl %edi,%eax / return value from the call > 47 > 48 shrl $2,%ecx / %ecx = number of words to move > 49 rep ; smovl / move the words > 50 > 51 movl 16(%esp),%ecx / %ecx = number of bytes to move > 52 andl $0x3,%ecx / %ecx = number of bytes left to move > 53 rep ; smovb / move the bytes > 54 > 55 popl %esi / restore register variables > 56 movl %edx,%edi > 57 ret > 58 SET_SIZE(memcpy) > > > strncpy.s: > 65 ENTRY(strncpy) > 66 pushl %edi / save register variables > 67 pushl %esi > 68 > 69 movl 16(%esp), %eax / %eax = source string address > 70 movl 12(%esp), %edi / %edi = destination string address > 71 movl 20(%esp), %esi / %esi = number of bytes > 72 > 73 testl $3, %eax / if %eax not word aligned > 74 jnz .L1 / goto .L1 > 75 .L8: > 76 cmpl $4, %esi / if number of bytes < 4 > 77 jb .L4 / goto .L4 > 78 .align 4 > 79 .L2: > 80 movl (%eax), %edx / move 1 word from (%eax) to %edx > 81 movl $0x7f7f7f7f, %ecx > 82 andl %edx, %ecx / %ecx = %edx & 0x7f7f7f7f > 83 addl $4, %eax / next word > 84 addl $0x7f7f7f7f, %ecx / %ecx += 0x7f7f7f7f > 85 orl %edx, %ecx / %ecx |= %edx > 86 andl $0x80808080, %ecx / %ecx &= 0x80808080 > 87 cmpl $0x80808080, %ecx / if null byte in this word > 88 jne .L3 / goto .L3 > 89 movl %edx, (%edi) / copy this word to (%edi) > 90 subl $4, %esi / decrement number of bytes by 4 > 91 addl $4, %edi / next word > 92 cmpl $4, %esi / if number of bytes >= 4 > 93 jae .L2 / goto .L2 > 94 jmp .L4 / goto .L4 > 95 .L3: > 96 subl $4, %eax / post-incremented > 97 .align 4 > 98 .L4: > 99 / (number of bytes < 4) or (a null byte found in the word) > 100 cmpl $0, %esi / if number of bytes == 0 > 101 je .L7 / goto .L7 (finished) > 102 movb (%eax), %dl / %dl = a byte in (%eax) > 103 decl %esi / decrement number of bytes by 1 > 104 movb %dl, (%edi) / copy %dl to (%edi) > 105 incl %eax / next byte > 106 incl %edi / next byte > 107 cmpb $0, %dl / compare %dl with a null byte > 108 je .L5 / if %dl is a null, goto .L5 > 109 jmp .L4 / goto .L4 > 110 .align 4 > 111 .L1: > 112 / %eax not aligned > 113 cmpl $0, %esi / if number of bytes == 0 > 114 je .L7 / goto .L7 (finished) > 115 movb (%eax), %dl / %dl = a byte in (%eax) > 116 decl %esi / decrement number of bytes by 1 > 117 movb %dl, (%edi) / copy %dl to (%edi) > 118 incl %edi / next byte > 119 incl %eax / next byte > 120 cmpb $0, %dl / compare %dl with a null byte > 121 je .L5 / if %dl is a null, goto .L5 > 122 testl $3, %eax / if %eax word aligned > 123 jz .L8 / goto .L8 > 124 jmp .L1 / goto .L1 (not word aligned) > 125 .align 4 > 126 .L5: > 127 movl %esi, %ecx / %ecx = length to copy null bytes > 128 xorl %eax, %eax / clear %eax > 129 shrl $2, %ecx / %ecx = words to copy null bytes > 130 rep ; sstol / rep;sstol is optimal > 131 andl $3, %esi / %esi = leftover bytes > 132 .L6: > 133 cmpl $0, %esi / if number of bytes == 0 > 134 jz .L7 / goto .L7 (finished) > 135 movb $0, (%edi) / move a null byte to (%edi) > 136 decl %esi / decrement number of bytes by 1 > 137 incl %edi / next byte > 138 jmp .L6 / goto .L6 > 139 .align 4 > 140 .L7: > 141 movl 12(%esp), %eax / return the destination address > 142 popl %esi / restore register variables > 143 popl %edi > 144 ret > 145 SET_SIZE(strncpy) > > > Both functions are in optimized assembly - but you don't even have be to be > very knowledgable about asm (which I am not that much) to know that memcpy() > will be faster than strncpy(), period. > > > > So consider this simple little test program on Linux 2.6.20 w/ gcc-4.1.1: > > $ cat strn.c > #include <string.h> > #include <stdlib.h> > > static char buf[] = "an example string to use for copying bytes"; > /* word offsets: 0 1 2 3 4 5 6 7 8 9 0 */ > > static size_t strlcpy(char *dst, const char *src, size_t siz) > { > register char *d = dst; > register const char *s = src; > register size_t n = siz; > > if (n == 0) > return(strlen(s)); > while (*s != '\0') { > if (n != 1) { > *d++ = *s; > n--; > } > s++; > } > *d = '\0'; > > return(s - src); /* count does not include NUL */ > } > > int main(int argc, char **argv) > { > char *q; > int i, l = sizeof buf - 1; > > if (argc <= 1) > return EXIT_FAILURE; > if ((q = malloc(l * sizeof *q + 1)) == NULL) > return EXIT_FAILURE; > > switch (*argv[1]) { > case 'm': > for (i = 16777216; i--; ) { > /* one doesn't typically copy nul */ > memcpy(q, buf, l); q[l] = '\0'; > } > break; > case 'l': > for (i = 16777216; i--; ) > strlcpy(q, buf, l); > break; > case 'n': > for (i = 16777216; i--; ) > strncpy(q, buf, l); > break; > case 's': > for (i = 16777216; i--; ) > strcpy(q, buf); > break; > default: break; > } > > free(q); > > return 0; > } > > $ gcc -static -O3 -W -Wall -pedantic -o strn strn.c > $ for i in l m s n; do TIMEFORMAT="$i r: %3lR u: %3lU s: %3lS"; for n in 0 1 2 3 4 5 6 7; do time ./strn $i; done; echo; done > l r: 0m2.593s u: 0m2.380s s: 0m0.004s > l r: 0m2.491s u: 0m2.436s s: 0m0.004s > l r: 0m2.494s u: 0m2.408s s: 0m0.000s > l r: 0m2.425s u: 0m2.364s s: 0m0.012s > l r: 0m2.494s u: 0m2.408s s: 0m0.008s > l r: 0m2.432s u: 0m2.380s s: 0m0.000s > l r: 0m2.530s u: 0m2.420s s: 0m0.000s > l r: 0m2.440s u: 0m2.388s s: 0m0.000s > > m r: 0m0.546s u: 0m0.508s s: 0m0.008s > m r: 0m0.526s u: 0m0.508s s: 0m0.000s > m r: 0m0.545s u: 0m0.512s s: 0m0.000s > m r: 0m0.545s u: 0m0.512s s: 0m0.004s > m r: 0m0.545s u: 0m0.508s s: 0m0.000s > m r: 0m0.534s u: 0m0.508s s: 0m0.000s > m r: 0m0.547s u: 0m0.508s s: 0m0.004s > m r: 0m0.526s u: 0m0.508s s: 0m0.004s > > s r: 0m1.738s u: 0m1.688s s: 0m0.000s > s r: 0m1.760s u: 0m1.692s s: 0m0.004s > s r: 0m1.813s u: 0m1.712s s: 0m0.004s > s r: 0m1.736s u: 0m1.700s s: 0m0.004s > s r: 0m1.764s u: 0m1.700s s: 0m0.004s > s r: 0m1.757s u: 0m1.696s s: 0m0.004s > s r: 0m1.745s u: 0m1.688s s: 0m0.008s > s r: 0m1.749s u: 0m1.692s s: 0m0.000s > > n r: 0m1.913s u: 0m1.860s s: 0m0.004s > n r: 0m1.945s u: 0m1.872s s: 0m0.000s > n r: 0m1.920s u: 0m1.868s s: 0m0.000s > n r: 0m1.951s u: 0m1.876s s: 0m0.004s > n r: 0m1.973s u: 0m1.868s s: 0m0.036s > n r: 0m1.918s u: 0m1.864s s: 0m0.000s > n r: 0m1.922s u: 0m1.864s s: 0m0.004s > n r: 0m1.935s u: 0m1.864s s: 0m0.004s > > Regardless of the differences between str, strn, and strl, memcpy just blows > them out of the water. I even included the terminating nul for realistic use. > > -cl > >
Attachment:
signature.asc
Description: OpenPGP digital signature