Re: The WindowsBufferProblems
- To: or-dev@xxxxxxxxxxxxx
- Subject: Re: The WindowsBufferProblems
- From: "Mike Chiussi" <chiussi@xxxxxxxxx>
- Date: Thu, 15 Jun 2006 04:21:06 -0400
- In-reply-to: <200606141823.k5EININs026920@mailhost.geldorp.nl>
- References: <4efbec090606141039i75d09d9fse8526737c423f8fe@mail.gmail.com> <200606141823.k5EININs026920@mailhost.geldorp.nl>
- Reply-to: or-dev@xxxxxxxxxxxxx
- Sender: owner-or-dev@xxxxxxxxxxxxx
On second thought we should have this discussion over or-dev, just so
anyone else can learn from it or share their ideas.
Sorry about the delayed response; I'm on a vampire-ish sleep schedule right now.
On 6/14/06, Ge van Geldorp <ge@xxxxxx> wrote:
Hello Mike,
Thanks for your reply!
First of all, I've seen the time skips too. I don't remember if it was
during my experimentation or on one of my "production" Tor nodes though.
I've seen it only two or three times.
Good! It's nice to know I'm not crazy.
Roger has upgraded this message to a warning and lowered its
reportable value (the current release doesn't report anything under
100 seconds), so we'll get a chance to see how widespread this
phenomenon is. Maybe it's only exaggerated on my systems because of
low memory or processing power.
Before I can even start to think about solutions to the sockets problem, I'd
like to be able to reproduce the problem, which as I said I can't at the
moment. So I hope you can give me some extra information allowing me to
recreate the problem.
First of all, which Tor and Windows versions (Home/Prof, SP2?) are you
using? How much physical memory? When the WSAENOBUFS problems occur, how
much NPPool are you using? How are you connected? Roughly how many Tor
connections do you have? Are you running other network related apps at the
same time? Would you be willing to send the output of "netstat -n" and your
torrc file to me?
You probably haven't been online long enough. For reasons that I'm
still not clear on, Tor clients don't "trust" servers that have a
short uptime.
A trick is to open up a DirPort, which will draw a lot of connections.
My torrc is here:
http://www.cdf.toronto.edu/~g4mike/torrc
Here is a netstat -n from my system, taken shortly after the first
WSAENOBUFS incident I noticed:
http://www.cdf.toronto.edu/~g4mike/netstat
The only problem I've been able to create is almost exhausting nppool and
then starting Tor. This will totally exhaust nppool and then some circuits
are closed. When I close my test app and Tor, all nppool is released (after
the socket close timeout), while the Wiki makes it sound like the nppool mem
is gone forever.
I've been playing around with HKLM\SYSTEM\CurrentControlSet\Control\Session
Manager\Memory Management\NonPagedPoolSize, which various sources (including
Microsoft Resource Kit) claim controls the maximum size of the nppool. If
the value is 0 (default), the system will compute a suitable max nppool
size. Otherwise, it is the size of the nppool in bytes. However, I have been
unable to verify that this actually works. I'd change the value, reboot and
find the maximum nppool size unchanged.
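For anyone who wants to check what is currently configured, here is a minimal sketch that reads the NonPagedPoolSize value named above and interprets it the way the paragraph describes (0 means system-computed, anything else is a size in bytes). It is Python rather than C only for brevity. The helper names `describe_pool_setting` and `read_pool_setting` are made up for this example, and it reports only the configured value, not the maximum the kernel actually applies.

```python
import sys

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
VALUE_NAME = "NonPagedPoolSize"


def describe_pool_setting(value):
    """Interpret the registry value: 0 = system-computed, otherwise bytes."""
    if value == 0:
        return "system-computed maximum (default)"
    return "requested maximum: %d bytes (%.1f MB)" % (value, value / (1024.0 * 1024.0))


def read_pool_setting():
    """Return the configured value, or None if not on Windows or the value is absent."""
    if sys.platform != "win32":
        return None
    import winreg  # standard library, Windows only

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    setting = read_pool_setting()
    if setting is None:
        print("NonPagedPoolSize is not available on this system")
    else:
        print(describe_pool_setting(setting))
```

Comparing this output before and after a reboot at least confirms the registry edit itself persisted, which separates "the value was not saved" from "the value was saved but ignored".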
I haven't experimented with the registry yet, but I don't
think that is going to help. My understanding (I might be wrong here)
is that there is a fairly large amount of space available in the NPP,
but Windows puts a limit per process, with the exception of localhost
traffic. For example, when first getting into this I tried writing
a server which accepted connections and a client which did nothing but
connect and write (all traffic was local). Note that the server never
read from the clients; the goal was to fill up the NPP. I didn't start
getting WSAENOBUFS errors until NPP usage was around 40 megabytes.
However, Tor would generate WSAENOBUFS at around 4-5 megabytes of
usage.
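A rough sketch of that kind of local experiment, for anyone who wants to repeat it. This is not the original test program: it is Python for brevity, it uses a single loopback connection, and the function name `fill_unread_connection` and its parameters are invented here. The server side accepts and never reads; the client writes without blocking until the stack refuses more data.

```python
import errno
import socket

WSAENOBUFS = 10055  # Winsock error code for "no buffer space available"


def fill_unread_connection(chunk_size=4096, max_bytes=64 * 1024 * 1024):
    """Write to a loopback peer that never reads; return (bytes queued, stop reason)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _addr = listener.accept()  # accepted, but deliberately never read

    client.setblocking(False)
    chunk = b"x" * chunk_size
    queued = 0
    stop_reason = "reached max_bytes"
    try:
        while queued < max_bytes:
            try:
                queued += client.send(chunk)
            except BlockingIOError:
                stop_reason = "send would block (socket buffers full)"
                break
            except OSError as exc:
                if exc.errno in (errno.ENOBUFS, WSAENOBUFS):
                    stop_reason = "no buffer space (ENOBUFS/WSAENOBUFS)"
                    break
                raise
    finally:
        client.close()
        server_side.close()
        listener.close()
    return queued, stop_reason


if __name__ == "__main__":
    total, reason = fill_unread_connection()
    print("queued %d bytes before stopping: %s" % (total, reason))
```

A single connection normally stops at its own socket buffer limits well before the pool is strained, so reproducing the numbers above would presumably need many such connections open at once while watching nonpaged pool usage.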
-Mike
Best regards, Gé van Geldorp.
> -----Original Message-----
> From: owner-or-dev@xxxxxxxxxxxxx
> [mailto:owner-or-dev@xxxxxxxxxxxxx] On Behalf Of Mike Chiussi
> Sent: Wednesday, June 14, 2006 19:40
> To: or-dev@xxxxxxxxxxxxx
> Subject: Re: The WindowsBufferProblems
>
> Yes, it definitely still exists. You don't need to "reboot"
> because Tor is able to cope with failed read/write/connects.
> But you might as well, because when the NPP is full,
> socket operations are basically useless. Something magical
> happened in 0.1.1.x that causes WSAENOBUFS to not occur on
> select(); I still haven't figured out exactly why.
> Programming for Windows is very strange, sort of like playing
> Jenga blindfolded.
>
> The solution, as mentioned on the wiki, is to implement
> overlapped I/O in libevent. Although changing the socket
> paradigm would have serious repercussions in the other
> libraries, so I see it as a last resort.
>
> My current working solution is to hack around with libevent so
> that it uses the built in socket notification routines (see
> WSAEventSelect() and WSAWaitForMultipleEvents()), although
> this is proving quite difficult since these functions are
> only intended for use with a small number of sockets, and if
> not implemented properly become just as inefficient as select().
>
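One concrete reason these functions only suit a small number of sockets: a single WSAWaitForMultipleEvents() call accepts at most WSA_MAXIMUM_WAIT_EVENTS (64) event handles. A minimal sketch of the bookkeeping that forces on an event loop, in Python for brevity; `group_for_wait` is a made-up helper, and how each group is then waited on (one thread per group, or polling the groups in turn) is left open here.

```python
WSA_MAXIMUM_WAIT_EVENTS = 64  # per-call handle limit of WSAWaitForMultipleEvents


def group_for_wait(handles, limit=WSA_MAXIMUM_WAIT_EVENTS):
    """Split event handles into groups small enough for one wait call each."""
    handles = list(handles)
    return [handles[i:i + limit] for i in range(0, len(handles), limit)]


if __name__ == "__main__":
    # 150 placeholder handles stand in for 150 sockets' event objects.
    groups = group_for_wait(range(150))
    print([len(group) for group in groups])
```

With a few hundred connections this already means several separate waits per pass, which is where the efficiency concern comes from.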
> An issue which seems to be seriously dampening my progress
> is mysterious "time skips". I seem to be the only one
> encountering them, but they occur on two different test
> machines on two different ISPs in two different cities. They
> are of the form "[notice] Your clock just jumped 176 seconds
> forward; assuming established circuits no longer work."
> Have you noticed anything like this in your logs?
>
> I had done some tracing and determined that connect() was
> blocking in the Tor win32 socketpair implementation for an
> unknown reason (even early in execution, when the NPP wasn't
> yet being strained). I implemented my own socketpair over the
> weekend using non-blocking sockets hoping it would resolve
> the issue, but alas it is still occurring; exactly where it is
> blocking I have yet to find.
>
> Thanks for looking into this, feel free to get in touch with
> me personally if you want to compare notes.
>
> -Mike
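
For reference, the general shape of a socketpair emulation over loopback with a non-blocking connect, as mentioned in the quoted message. This is a sketch of the technique only, not Tor's win32 socketpair and not the replacement written over the weekend. It is Python for brevity (Python already ships socket.socketpair()), and the name `loopback_socketpair` and the timeout parameter are assumptions made for this example.

```python
import errno
import select
import socket

WSAEWOULDBLOCK = 10035  # Winsock code for a non-blocking connect still in progress


def loopback_socketpair(timeout=5.0):
    """Return two connected TCP sockets on 127.0.0.1 without a blocking connect()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    connector = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    acceptor = None
    try:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)

        connector.setblocking(False)
        # connect_ex returns an error code instead of raising.
        rc = connector.connect_ex(listener.getsockname())
        if rc not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK, WSAEWOULDBLOCK):
            raise OSError(rc, "connect failed")

        # Wait, with a bound, for the connection to reach the listener.
        readable, _, _ = select.select([listener], [], [], timeout)
        if not readable:
            raise TimeoutError("no incoming connection on the listener")
        acceptor, peer = listener.accept()

        # Wait, with a bound, for the connecting side to finish.
        _, writable, _ = select.select([], [connector], [], timeout)
        if not writable:
            raise TimeoutError("connect did not complete")
        err = connector.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err:
            raise OSError(err, "connect failed")

        # Make sure the accepted connection is really our own connector.
        if peer != connector.getsockname():
            raise OSError("accepted a connection from an unexpected peer")

        connector.setblocking(True)  # callers can switch modes as needed
        return connector, acceptor
    except Exception:
        connector.close()
        if acceptor is not None:
            acceptor.close()
        raise
    finally:
        listener.close()


if __name__ == "__main__":
    a, b = loopback_socketpair()
    a.sendall(b"ping")
    print(b.recv(4))
    a.close()
    b.close()
```

Because every wait is bounded by the timeout, a stall in this path shows up as an error at a known step instead of a silent block, which may help narrow down where the time is going.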