
Re: [tor-bugs] #15901 [Tor]: apparent memory corruption -- very difficult to isolate



#15901: apparent memory corruption -- very difficult to isolate
---------------------------+--------------------------------
     Reporter:  starlight  |      Owner:
         Type:  defect     |     Status:  new
     Priority:  critical   |  Milestone:  Tor: 0.2.7.x-final
    Component:  Tor        |    Version:  Tor: 0.2.5.12
   Resolution:             |   Keywords:
Actual Points:             |  Parent ID:
       Points:             |
---------------------------+--------------------------------

Comment (by starlight):

 A theory as to the possible cause came to mind:

 This could be a race-condition bug in OpenSSL.

 They just fixed a race with 1.0.2b, though this is
 not likely the one as session tickets are disabled
 in `tor`.  But it gave me the idea.

 Have held back with this, but now seems reasonable
 to note that one big difference in the setup here
 is that OpenSSL is compiled with gcc 4.9.2 and
 -flto (link-time-optimization).  I believe this
 difference is why the bug shows up here and not
 elsewhere.

 When building with LTO, gcc blurs the boundaries
 between functions by treating separate modules as
 one big ball of code and inlining functions that
 normally would not be eligible.  This can expose
 race conditions where a pointer should be declared
 volatile or protected by a mutex and is not, by
 causing the pointer to be cached for long
 intervals in a register across newly inlined
 function boundaries.  Such a pointer, instead of
 going out of scope and being forced back to
 memory by a function return, stays in a register,
 possibly through a large iteration loop.  The
 result is a latent race-condition bug raising its
 evil antenna.
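
 A minimal sketch of the effect in C, using made-up
 names (`payload`, `ready`, `is_ready`); nothing
 here is taken from `tor` or OpenSSL.  It shows the
 safe form, where the flag is a C11 atomic, so the
 load has to be repeated on every pass even after
 the accessor has been inlined.  With a plain
 `bool` the compiler would be free to read the
 flag once and keep it in a register for the whole
 loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state, for illustration only. */
static int payload;        /* plain data, published through `ready` */
static atomic_bool ready;  /* atomic, so it is re-read on every pass */

/* Small accessor of the kind LTO may inline across module boundaries.
 * If `ready` were a plain `bool`, the inlined load could legally be
 * hoisted out of the caller's loop and cached in a register. */
static bool is_ready(void)
{
    return atomic_load_explicit(&ready, memory_order_acquire);
}

static void *consumer(void *arg)
{
    int *out = arg;
    while (!is_ready())
        ;  /* spin; the atomic load forces a fresh read each iteration */
    *out = payload;  /* acquire above pairs with the release store below */
    return NULL;
}

/* Hands one value from the calling thread to a consumer thread. */
int run_handoff_demo(void)
{
    pthread_t tid;
    int seen = 0;
    int rc;

    payload = 0;
    atomic_store(&ready, false);
    rc = pthread_create(&tid, NULL, consumer, &seen);
    assert(rc == 0);

    payload = 42;
    atomic_store_explicit(&ready, true, memory_order_release);

    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    return seen;
}
```

 Calling `run_handoff_demo()` returns the value the
 consumer thread observed.  A mutex around both the
 write and the read would work equally well; the
 point is only that some synchronization has to
 tell the compiler the value can change.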

 To be fair, the bug could be in `tor` and not
 OpenSSL, but I suspect OpenSSL on instinct.

 I'll be disabling the second crypto thread in
 `tor` to test this, though this check can only be
 conclusive if it fails.

 A more direct way of finding such issues is to run
 `tor` + `libssl.so` + `libcrypto.so` compiled with
 TSAN (thread sanitizer).  Unfortunately TSAN is a
 massive CPU hog and, though not as bad as the
 Valgrind Helgrind tool, still makes it impossible
 to run a relay live this way.
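
 For anyone who has not used TSAN, here is a small
 stand-alone sketch of the kind of code it checks.
 The names (`counter`, `bump`,
 `count_with_two_threads`) are hypothetical and
 unrelated to `tor`.  As written the counter is
 guarded by a mutex; removing the lock/unlock pair
 creates an unsynchronized access that TSAN should
 report as a data race.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared counter guarded by a mutex. */
static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *bump(void *arg)
{
    long n = *(const long *)arg;
    for (long i = 0; i < n; i++) {
        /* Dropping this lock/unlock pair leaves `counter` unprotected,
         * which is the sort of access a TSAN build flags. */
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two threads that each increment the counter `per_thread` times. */
long count_with_two_threads(long per_thread)
{
    pthread_t a, b;
    int rc;

    counter = 0;
    rc = pthread_create(&a, NULL, bump, &per_thread);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, bump, &per_thread);
    assert(rc == 0);
    rc = pthread_join(a, NULL);
    assert(rc == 0);
    rc = pthread_join(b, NULL);
    assert(rc == 0);
    return counter;
}
```

 With gcc or clang, a file like this would be built
 with `-fsanitize=thread -g -pthread`.  Instrumenting
 `tor` and OpenSSL is the same idea on a larger
 scale, though the exact configure options for each
 are not covered here.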

 So I'm hoping someone who works with the Tor test
 network and ASAN builds can spend some time trying
 it with TSAN.  I may still take a crack at it at
 some point and run the relay for a few minutes
 with a TSAN build, but the test environment seems
 a better approach for this.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/15901#comment:16>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online