On Thu, Jun 26, 2014 at 8:36 AM, Robin <imer@xxxxxxx> wrote:
Hey,
I'm still investigating the issues from my other email and came across
another thing.
This might, again, be due to memory corruption on my side - I'm hoping it
isn't though.
I implemented a simple timer-event class which basically creates an event
with a timeout and calls a std::function once that timeout is hit.
Every now and then I get crashes in my ping event.
The ping event is a lambda which captures the "this" pointer of my socket
and just pings the client every 30 seconds.
The crashes are always related to the ping event having a nullptr as the
socket pointer.
So I did some recording in gdb and came to the conclusion it gets set to 0
in the destructor of the event class, which gets called when a socket
disconnects.
Backtrace can be seen here:
http://puu.sh/9KSs8/927aa257e2.txt
The destructor just calls event_del() if the event is running, followed by
event_free().
It rarely happens: I had to wait about 5 hours for that crash, that's 10k
connections minimum.
This looks to me more like a symptom of whatever other event_base
corruption issue is going on than it does like an independent bug,
though I can't know for sure...
This is the kind of thing that could totally benefit from some example
code to show the problem. I'm afraid that I can't be too sure of
what's going on in between the layers of C++ glue in your backtrace.
Could it be the event doesn't get canceled when it is pending?
Like, the event "queue" could look something like this:
1. Disconnect socket x
2. Socket y receive
4. Socket x send
5. Event z timeout
event_del() really is supposed to cancel an event. Generally
speaking, you need to event_del() or event_free() any event that uses
X as its data pointer before you free X.
(Things get a _little_ hairy if that event is running in one thread
and you event_del() it in another: Libevent 2.0 handles that
differently from the latest Libevent 2.1 alphas. Could that be going
on here?)
One possible temporary workaround, to help verify whether this is the
bug or whether it's something else, would be to use a weak reference
to the socket rather than a pointer to the socket itself.
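A minimal sketch of that weak-reference idea, assuming the socket is owned by a std::shared_ptr. The names Socket, ping() and make_ping_callback() are hypothetical stand-ins, not taken from the original code or from Libevent. The callback holds a std::weak_ptr instead of a raw "this", so a timer that fires after the socket is gone can detect that instead of dereferencing a dead pointer:

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for the socket class described above.
struct Socket {
    int pings_sent = 0;
    void ping() { ++pings_sent; }
};

// Builds the timer callback. It captures a weak_ptr rather than a raw
// pointer, so it does not keep the socket alive and can tell when the
// socket has already been destroyed.
// The callback returns true if a ping was sent, false if the socket was gone.
std::function<bool()> make_ping_callback(const std::shared_ptr<Socket>& sock) {
    assert(sock && "caller must pass a live socket");
    std::weak_ptr<Socket> weak = sock;
    return [weak]() -> bool {
        if (std::shared_ptr<Socket> s = weak.lock()) {
            s->ping();      // socket still alive: safe to use
            return true;
        }
        return false;       // timer outlived the socket
    };
}
```

If the callback ever returns false in the real program (log it there), the timer fired after the socket was destroyed, which would point at the event not being cancelled rather than at unrelated memory corruption.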
hope this helps,