[tor-bugs] #15901 [Tor]: apparent memory corruption from control channel request processing -- very difficult to isolate
#15901: apparent memory corruption from control channel request processing -- very
difficult to isolate
-----------------------+-------------------------------
 Reporter:  starlight  |          Owner:
     Type:  defect     |         Status:  new
 Priority:  normal     |      Milestone:
Component:  Tor        |        Version:  Tor: 0.2.5.12
 Keywords:             |  Actual Points:
Parent ID:             |         Points:
-----------------------+-------------------------------
I have encountered what appears to be a memory
corruption bug. Have reproduced it two times
since the initial incident.
Do not believe this is a remotely exploitable
bug as it only happens on the next consensus
download after a pair of locally written
control-channel scripts are run manually.
Am looking for advice on how to isolate the
issue after spending significant effort on it
thus far.
1) first occurred with 0.2.4.26 shortly after
building OpenSSL 1.0.1m shared library to
replace 1.0.1j; perhaps is related to
this but concurrently had substantially
increased relay bandwidth, which also might
have led to the conditions that triggered
the issue
2) saw start-up warning regarding difference
between OpenSSL build and runtime versions.
Never believed this was an issue and it
proved to be irrelevant
3) tried running 0.2.4.26 built with ASAN
but did not reproduce
4) built 0.2.5.12 and the bug recurred
5) ran 0.2.5.12 built with ASAN along
with OpenSSL 1.0.1m ASAN and libevent 2.0.21
ASAN and did not reproduce; tried with
libraries non-ASAN and did not reproduce
6) back to 0.2.5.12 standard build, but with
minor patch to enable core files and have
stdout+stderr directed to files; reproduced
the problem again and obtained a good
core file via SIGSEGV; core file is
fully intact and accessible with 'gdb';
thus far have chosen not to delve into
the core
7) tried running again with
MALLOC_CHECK_=3 MALLOC_PERTURB_=85
but did not reproduce the problem
8) have gone back to (6) configuration,
but so far no bug; relay has received
ever higher consensus bandwidth and
traffic since (6) and so perhaps the
sweet spot for producing the bug is
no longer present
Have a vague suspicion that the problem
is tied to a race condition between the
main thread and the crypto thread.
The setup here is unique in a variety of
ways and I believe these differences
are the reason I see this problem where
others have not. Seems prudent at this
point to not include much detail.
At some point I deleted the cached-*
files--possibly this has prevented
reproducing the issue since but
I can't remember offhand. Have daily
network backups and can recover
various generations of these files.
In the two events where a "clean"
shutdown was obtained, `unparsable-desc`
files were written and these were
retained. In the case of the
SIGSEGV termination this file did
not appear.
Please note the
{{{
ISO time "2015-XX-XX v6Dp0:05" was unparseable
}}}
messages in the first incident. It appears to
me that four bytes of memory were overwritten
here. Unfortunately I was not patient enough
to wait for the second consensus attempt the
second and third times this happened so it's
unclear if this happens consistently. Perhaps
instrumenting this string with debug code
might lead to isolating the problem.
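As a rough offline aid for the observation above, the
sketch below compares a logged time string against the
19-byte "YYYY-MM-DD HH:MM:SS" layout and reports which
byte offsets do not fit. The names `ISO_TEMPLATE`,
`corrupt_offsets` and `damaged_span` are made up for this
illustration and are not part of Tor; treating "X" as a
hand-redacted digit is likewise an assumption based on the
"2015-XX-XX" in the message quoted above.

```python
# Assumed layout of the logged ISO time: "YYYY-MM-DD HH:MM:SS" (19 bytes).
ISO_TEMPLATE = "dddd-dd-dd dd:dd:dd"  # 'd' = decimal digit, else a literal

DIGITS = "0123456789"


def corrupt_offsets(value, redacted="X"):
    """Return the offsets in `value` that do not fit ISO_TEMPLATE.

    `redacted` marks digits blanked out by hand before posting a log
    (an assumption for this report); those positions are skipped.
    """
    if len(value) != len(ISO_TEMPLATE):
        raise ValueError("expected %d characters, got %d"
                         % (len(ISO_TEMPLATE), len(value)))
    bad = []
    for offset, (want, got) in enumerate(zip(ISO_TEMPLATE, value)):
        if want == "d" and got == redacted:
            continue  # redacted digit, nothing to check
        ok = (got in DIGITS) if want == "d" else (got == want)
        if not ok:
            bad.append(offset)
    return bad


def damaged_span(value, redacted="X"):
    """Return (first, last, length) covering all bad offsets, or None."""
    bad = corrupt_offsets(value, redacted)
    if not bad:
        return None
    return bad[0], bad[-1], bad[-1] - bad[0] + 1


if __name__ == "__main__":
    # Prints (11, 14, 4) for the string from the first incident.
    print(damaged_span("2015-XX-XX v6Dp0:05"))
```

For the logged string, offsets 11, 13 and 14 fail the
check; offset 12 holds "6", which passes as a digit by
coincidence. The covering span is offsets 11 through 14,
consistent with the four-byte estimate. Running the same
check over later log lines would show whether the damage
always lands at the same offsets.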
I would appreciate any advice or help
that might lead to isolating the bug,
ideally by triggering it with the
ASAN build running and thus getting
directly to the problem. Hopefully
someone closely familiar with the
relay code might notice something
indicating a direction to pursue.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/15901>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online