
Re: The WindowsBufferProblems



Yes, it definitely still exists. Strictly speaking you don't need to
reboot, because Tor is able to cope with failed reads, writes, and
connects. But in practice you might as well, because once the NPP
(nonpaged pool) is full, socket operations are basically useless.
Something magical happened in 0.1.1.x that keeps WSAENOBUFS from
occurring on select(); I still haven't figured out exactly why.
Programming for Windows is very strange, sort of like playing Jenga
blindfolded.

The solution, as mentioned on the wiki, is to implement overlapped I/O
in libevent. However, changing the socket paradigm would have serious
repercussions in the other libraries, so I see it as a last resort.

My current working solution is to hack around with libevent so that it
uses the built-in socket notification routines (see WSAEventSelect()
and WSAWaitForMultipleEvents()). This is proving quite difficult,
since these functions are only intended for use with a small number of
sockets and, if not implemented properly, become just as inefficient
as select().
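
To make the "small number of sockets" limit concrete:
WSAWaitForMultipleEvents() accepts at most WSA_MAXIMUM_WAIT_EVENTS (64)
event handles per call. Assuming one event object per socket (one
possible design, not necessarily the one being attempted here), a
server with many connections has to split them into batches. The
helper name batch_for_wait is hypothetical, and Python is used only
to illustrate the arithmetic:

```python
WSA_MAXIMUM_WAIT_EVENTS = 64  # Winsock limit on handles per wait call


def batch_for_wait(sockets, limit=WSA_MAXIMUM_WAIT_EVENTS):
    """Split sockets into groups small enough for one wait call each.

    Hypothetical helper: assumes one event object per socket.
    """
    sockets = list(sockets)
    return [sockets[i:i + limit] for i in range(0, len(sockets), limit)]


# 1000 connections need 16 separate waits (15 full batches plus one of 40).
batches = batch_for_wait(range(1000))
print(len(batches), len(batches[-1]))
```

Polling each batch in turn from one thread is where the select()-like
inefficiency would creep back in.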

An issue which seems to be seriously dampening my progress is
mysterious "time skips". I seem to be the only one encountering them,
but they occur on two different test machines on two different ISPs in
two different cities.  They are of the form "[notice] Your clock just
jumped 176 seconds forward; assuming established circuits no longer
work."
Have you noticed anything like this in your logs?
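
One possible reading of that notice, sketched below: if the message is
derived from the gap between two consecutive runs of a once-per-second
housekeeping callback (an assumption about Tor's internals, not
something verified here), then a main loop that stalls in a blocking
call looks exactly like a wall clock that really jumped. The function
name and the 100-second threshold are hypothetical.

```python
JUMP_WARN_THRESHOLD = 100  # seconds; hypothetical value for illustration


def seconds_jumped(last_tick, now, threshold=JUMP_WARN_THRESHOLD):
    """Return the apparent forward jump in seconds, or 0 if below threshold.

    last_tick and now are wall-clock timestamps taken on consecutive
    runs of a callback that is expected to fire about once per second.
    """
    elapsed = int(now - last_tick)
    return elapsed if elapsed >= threshold else 0


# A callback delayed by 176 seconds is indistinguishable from a clock jump.
print(seconds_jumped(1000.0, 1176.0))
```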

I did some tracing and determined that connect() was blocking in
the Tor win32 socketpair implementation for an unknown reason (even
early in execution, when the NPP wasn't yet being strained). I
implemented my own socketpair over the weekend using non-blocking
sockets, hoping it would resolve the issue, but alas it is still
occurring; exactly where it is blocking I have yet to find.
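
For readers unfamiliar with the technique, here is a minimal sketch of
a socketpair emulated over loopback TCP with a non-blocking connect.
It is not the replacement described above (that code is not shown in
this message), and it is written in Python only for brevity. The
function name and timeout are assumptions.

```python
import errno
import os
import select
import socket

# Codes a non-blocking connect may return while the handshake is pending.
IN_PROGRESS = (0, errno.EINPROGRESS, errno.EWOULDBLOCK,
               getattr(errno, "WSAEWOULDBLOCK", errno.EWOULDBLOCK))


def loopback_socketpair(timeout=5.0):
    """Return (connector, acceptor): two connected loopback TCP sockets."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    connector = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    acceptor = None
    try:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick one
        listener.listen(1)
        connector.setblocking(False)     # connect must never block
        rc = connector.connect_ex(listener.getsockname())
        if rc not in IN_PROGRESS:
            raise OSError(rc, os.strerror(rc))
        # Wait with a bound for the pending connection, then accept it.
        if not select.select([listener], [], [], timeout)[0]:
            raise TimeoutError("no incoming connection")
        acceptor, peer = listener.accept()
        # Wait until the connecting side reports completion.
        if not select.select([], [connector], [], timeout)[1]:
            raise TimeoutError("connect did not complete")
        err = connector.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err:
            raise OSError(err, os.strerror(err))
        # Guard against some other local process connecting first.
        if peer != connector.getsockname():
            raise OSError("unexpected peer on loopback listener")
        return connector, acceptor
    except BaseException:
        connector.close()
        if acceptor is not None:
            acceptor.close()
        raise
    finally:
        listener.close()
```

Every wait in the sketch is bounded by the timeout, so a stall shows
up as an exception rather than a silent hang.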

Thanks for looking into this. Feel free to get in touch with me
personally if you want to compare notes.

-Mike


On 6/14/06, Ge van Geldorp <ge@xxxxxx> wrote:
After reading the pleas for help on the Tor website on the "WSAENOBUFS"
problem
(http://wiki.noreply.org/noreply/TheOnionRouter/WindowsBufferProblems and
http://bugs.noreply.org/flyspray/index.php?do=details&id=98) I decided to
dig a bit into it. Unfortunately, after two weeks of trying, I still haven't
been able to reproduce the problem.

I have tried with both clean installs of XP Professional SP2 and XP Home SP2
(no registry tweaks or other "fancy" stuff) and Tor 0.1.1.20. Since
indications are that the problem is related to nonpaged pool exhaustion and
the amount of nonpaged pool is related to the amount of RAM installed, I
tried with low RAM amounts (96MB and 128MB, resulting in maximum nonpaged
pool sizes of 38MB and 50MB resp.). Tor would run happily for days.

The Wiki says "Running a Tor server on a vanilla XP install does not
(easily) trigger the problem. But it can be consistently reproduced if you
also run TCP/IP intensive applications such as P2P clients (BitTorrent,
eDonkey, eMule, etc).". Well, not for me. I had a BitTorrent client with
about 250 peers running alongside Tor. This config didn't even come close to
exhausting the nonpaged pool (around 11MB of the maximum of 38MB used). Tor
performed flawlessly.

So, I wrote a test program that just eats up nonpaged pool by building TCP
connections to itself over the loopback interface and on each connection
writing some stuff without ever reading it from the other end (causing it to
be buffered in nonpaged pool). When running the test program to eat (almost)
all nonpaged pool I start to see failures when running Tor (WSAENOBUFS
errors on write, WSAECONNRESET on read, using "info" loglevel). However, Tor
handled these failures gracefully, closing the affected circuits. I didn't
have to reboot to fix it. I also didn't see the "do_main_loop(): select failed:
No buffer space available [WSAENOBUFS ] [10055]" (or rather its modern
equivalent, "libevent call with ... failed: [WSAENOBUFS] [10055]") message a
single time.
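
A minimal sketch of the kind of test program described above, not the
actual program (which is not included in this message): one loopback
connection is filled with data that is never read. The function name,
chunk size, and safety cap are assumptions, and Python is used only
for brevity. A real stress test would open many such connections.

```python
import socket


def fill_one_connection(chunk=64 * 1024, cap=64 * 1024 * 1024):
    """Queue unread data on a loopback TCP connection.

    Returns (writer, reader, queued_bytes). The reader is never read
    from, so everything sent stays buffered by the OS until the sockets
    are closed. cap is a safety limit on the total bytes sent.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick one
    listener.listen(1)
    writer = socket.create_connection(listener.getsockname())
    reader, _ = listener.accept()
    listener.close()

    writer.setblocking(False)        # stop instead of hanging when full
    payload = b"\0" * chunk
    queued = 0
    while queued < cap:
        try:
            queued += writer.send(payload)
        except BlockingIOError:      # buffers are full
            break
    return writer, reader, queued


writer, reader, queued = fill_one_connection()
print("bytes buffered and never read:", queued)
writer.close()
reader.close()
```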

So by now I'm starting to wonder if this problem still exists? Has anyone
recently seen a WSAENOBUFS problem which needed a reboot to fix? Even
better, anyone who can consistently reproduce it?

GvG