Re: [tor-bugs] #12464 [Tor]: When Tor 0.2.6.x is closer to done, profile relays running Tor 0.2.6.x and optimize accordingly
#12464: When Tor 0.2.6.x is closer to done, profile relays running Tor 0.2.6.x and
optimize accordingly
------------------------+-------------------------------------------------
Reporter: nickm | Owner: dgoulet
Type: defect | Status: assigned
Priority: normal | Milestone: Tor: 0.2.6.x-final
Component: Tor | Version:
Resolution: | Keywords: tor-relay performance 026-triaged-1
Actual Points: | Parent ID:
Points: |
------------------------+-------------------------------------------------
Comment (by yawning):
Replying to [comment:8 tmpname0901]:
> The reference to __memcpy_sse2_unaligned() above reminds me that data
> should always be aligned for more efficient read/write.
>
> There are tools (Valgrind?) that can report this. For x86(_64), buffers
> should always be aligned to at least mod 16.
Note: Discussing Intel and compatible processors here.
To be pedantic:
> **Assembly/Compiler Coding Rule 46. //(H impact, H generality)** Align
> data on natural operand size address boundaries. If the data will be
> accessed with vector instruction loads and stores, align the data on
> 16-byte boundaries.//
The performance hit for non-vector access comes when an access straddles a
cache line boundary (64 bytes), unless the processor is an iPotato86 that
should have been retired a long time ago (P1, PMMX, AMD <= K8). Vector
access needs to be 16-byte aligned, unless you are using AVX, where most
instructions no longer require it (so, looking towards the bright future,
this will matter less and less).
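
To make the cache line point concrete, here is a minimal sketch (not code
from Tor) of how one could check whether a given access crosses a line
boundary. The helper name `straddles_cache_line()` and the
`CACHE_LINE_SIZE` constant are made up for illustration, and the 64-byte
line size is an assumption that holds for the processors discussed here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes (hypothetical constant, not from Tor). */
#define CACHE_LINE_SIZE 64u

/* Return true if an access of `len` bytes starting at address `addr`
 * touches more than one cache line. Zero-length accesses touch none. */
static bool
straddles_cache_line(uintptr_t addr, size_t len)
{
  if (len == 0)
    return false;
  /* Index of the line holding the first and the last byte accessed. */
  uintptr_t first_line = addr / CACHE_LINE_SIZE;
  uintptr_t last_line = (addr + len - 1) / CACHE_LINE_SIZE;
  return first_line != last_line;
}
```

For example, an 8-byte load at offset 56 stays within one line, while the
same load at offset 60 spans two lines and is the case that pays the
penalty.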
The memcpy call that I imagine dominates memcpy's runtime is the one in
buffers.c:`write_to_buf()`, invoked from `connection_write_to_buf()` by
the `RELAY_COMMAND_DATA` handler logic (a quick skim shows that everything
else is infrequent, should be reasonably aligned, or copies too little
to be interesting).
Here the destination will only be nicely aligned when a new chunk is
allocated for the buffer (or if the data already in the buffer happens to
end on a 16-byte boundary), and the source will never be aligned correctly
(`cell->payload + RELAY_HEADER_SIZE`).
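
As a rough illustration of why adding a header offset to a pointer breaks
vector alignment, the sketch below measures how far a pointer sits past
the previous 16-byte boundary. This is not Tor code: `misalignment()`,
`VECTOR_ALIGN` and `EXAMPLE_HEADER_SIZE` are hypothetical names, and the
11-byte value is only an assumed stand-in for `RELAY_HEADER_SIZE`.

```c
#include <stddef.h>
#include <stdint.h>

/* Alignment wanted for SSE-style vector loads and stores. */
#define VECTOR_ALIGN 16u
/* Hypothetical stand-in for RELAY_HEADER_SIZE; the value is an assumption. */
#define EXAMPLE_HEADER_SIZE 11u

/* Return how many bytes `p` lies past the previous `align`-byte boundary.
 * A result of 0 means `p` is aligned; `align` must be nonzero. */
static size_t
misalignment(const void *p, size_t align)
{
  return (size_t)((uintptr_t)p % align);
}
```

With a payload that itself starts on a 16-byte boundary,
`misalignment(payload + EXAMPLE_HEADER_SIZE, VECTOR_ALIGN)` comes out as
11, so a copy starting there cannot use aligned vector loads on the source
side without first handling the leading bytes separately.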
I'm currently of the opinion that before messing with this, faster crypto
will gain more mileage for our development time.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/12464#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online