Re: Tor Network response slowing to zero



On Wed, Feb 02, 2005 at 01:40:22PM -0600, Mike Perry wrote:
> Thus spake clifnor@xxxxxxxxxxxx (clifnor@xxxxxxxxxxxx):
> > Can other users confirm that network response is degrading badly during
> > the past 2-3 weeks? I have a decent home cable link (1.8Mb down/356 up)
> > but I am having to wait an average of 15-30 seconds to get any
> > response--sometimes the connection just hangs. Is there any data being
> > collected on network response? 

One of the immediate reasons for this, I think, is that the big
exit nodes are limited at 1024 open file descriptors, and as
Giorgos points out, they're hitting that limit. See my post
http://archives.seul.org/or/talk/Jan-2005/msg00144.html for details.

Another big reason is that we're getting increasingly hammered by
file-sharers. This has caused some of the bigger nodes to decide they
don't want to carry this much traffic. We need to teach clients to back
off better when things are going wrong; I'm not sure of the best way to
do that yet.

We also have some stability issues, as Giorgos points out. These are
getting resolved. (If you folks want to try running the CVS code again,
go for it. It might just work for you this time. :)

Plus, we're trying to adapt to having other Tor clients in the system.
JAP has implemented a Tor client and they're getting ready to release it
(or they have already, I'm not sure). I'm not sure how I feel about this,
because they only wrote a client, not a server, so in some sense they're
just as bad as the filesharing folks. (In another sense, yay privacy,
more privacy good.)

On top of all this is the blacklisting issue, where we're in a standoff
with various projects that would rather see the Tor network as a whole
go away. (And the funny thing is, every single person who wants to see
Tor die starts out his statement with "I think Tor is a really important
idea, but". It's like they're afraid to be seen as not hip, yet they
still want it to die.)

Clearly, the number one answer is that if Tor isn't working for you,
please run (or find somebody to run) a high-bandwidth node. Or failing
that, please run any sort of node (cable or dsl is fine). If everybody
only uses Tor as a client, and expects it to work for them, then this
whole experiment will fail.

> Well, in the past two weeks the number of nodes and available network
> bandwidth has fallen due to a combination of stability and
> blacklisting issues.  http://www.noreply.org/tor-running-routers/

Right.

On the one hand, we theorize that the network will stabilize: people who
are getting crappy performance and don't like it will get frustrated and
leave, freeing up bandwidth for the rest of the people. So Tor will grow
and shrink in cycles, as it adapts to how much it's getting hammered.

This theory falls apart though if the file sharers are more tolerant
of crappy performance than the rest of the users. It could be that the
file sharers will stick around until the whole network is pushed into
the ground.

There's a new default exit policy in CVS, which blocks more default
file-sharing ports:

ExitPolicy reject 0.0.0.0/8,reject 169.254.0.0/16,reject 127.0.0.0/8,reject 192.168.0.0/16,reject 10.0.0.0/8,reject 172.16.0.0/12
ExitPolicy accept *:20-22,accept *:53,accept *:79-81,accept *:110,accept *:143,accept *:389,accept *:443,accept *:636,accept *:706,accept *:873,accept *:993,accept *:995
ExitPolicy reject *:1214,reject *:4661-4666,reject *:6346-6347,reject *:6419,reject *:6881-6889
ExitPolicy accept *:1024-65535,reject *:*

Feel free to apply it to your exit server if you like. Or if you run CVS,
you get it automatically. This approach isn't meant to definitively stop
all instances of file-sharing. Rather, it's meant to make casual users
think Tor doesn't work for them, so they go away. (A few exit servers
will likely still allow many ports; but in this case the protocols that
exit at them are bottlenecked by those servers.)
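
To make the ordering in the policy above concrete: exit policy clauses are
checked in order and the first match wins, which is why the file-sharing
rejects have to come before the broad "accept *:1024-65535". Here is a
rough Python sketch of that evaluation. It is not Tor's code; it looks at
ports only and skips the private-address rejects on the first line, and the
names parse_rules and exit_allowed are made up for illustration.

```python
# Port-only sketch of first-match-wins exit policy evaluation.
# Assumption: address patterns are ignored; real policies also match on address.

PORT_RULES = (
    "accept *:20-22,accept *:53,accept *:79-81,accept *:110,accept *:143,"
    "accept *:389,accept *:443,accept *:636,accept *:706,accept *:873,"
    "accept *:993,accept *:995,"
    "reject *:1214,reject *:4661-4666,reject *:6346-6347,reject *:6419,"
    "reject *:6881-6889,"
    "accept *:1024-65535,reject *:*"
)


def parse_rules(text):
    """Turn 'accept *:20-22,...' into (action, low_port, high_port) tuples."""
    rules = []
    for item in text.split(","):
        action, pattern = item.split()
        port_spec = pattern.rsplit(":", 1)[1]
        if port_spec == "*":
            low, high = 1, 65535
        elif "-" in port_spec:
            low, high = (int(p) for p in port_spec.split("-"))
        else:
            low = high = int(port_spec)
        rules.append((action, low, high))
    return rules


def exit_allowed(port, rules):
    """Return True if the first rule matching `port` is an accept."""
    for action, low, high in rules:
        if low <= port <= high:
            return action == "accept"
    return False  # no rule matched; treated as reject in this sketch


RULES = parse_rules(PORT_RULES)
for port in (80, 443, 6881, 6667, 25):
    print(port, exit_allowed(port, RULES))
```

Port 6881 is caught by its reject clause before the high-port accept is
reached, while 6667 falls through to that accept, and 25 only matches the
final "reject *:*".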

Or we could take more drastic measures. For example, we could recommend
an exit policy which accepts 20-22,53,79-81,110,143,389,443,636,706,873,
993,995,1863,5050,5190,5222,5223,6667,8080,8300,8888, and rejects the
rest.
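
For anyone who wants to try that stricter list, here is a small Python
sketch that renders it as a single ExitPolicy line. The port list is the
one given above; keeping the private-address rejects from the CVS default
in front of it is my assumption about how you would combine the two, and
render_exit_policy is a hypothetical helper, not part of Tor.

```python
# Render the stricter port list as one torrc ExitPolicy line.
# Assumption: the private-address rejects from the CVS default stay first.

PRIVATE_REJECTS = [
    "reject 0.0.0.0/8", "reject 169.254.0.0/16", "reject 127.0.0.0/8",
    "reject 192.168.0.0/16", "reject 10.0.0.0/8", "reject 172.16.0.0/12",
]

ACCEPT_PORTS = [
    "20-22", "53", "79-81", "110", "143", "389", "443", "636", "706", "873",
    "993", "995", "1863", "5050", "5190", "5222", "5223", "6667", "8080",
    "8300", "8888",
]


def render_exit_policy(accept_ports, address_rejects=PRIVATE_REJECTS):
    """Build 'ExitPolicy <rejects>,<accepts>,reject *:*' from the lists."""
    clauses = list(address_rejects)
    clauses += [f"accept *:{ports}" for ports in accept_ports]
    clauses.append("reject *:*")  # everything not listed is refused
    return "ExitPolicy " + ",".join(clauses)


print(render_exit_policy(ACCEPT_PORTS))
```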

Heck, we could accept just 80 and 443 and tell people using the other
protocols to go bugger off.

(A related plan would be to put priorities on certain ports, such as
80 and 443, and when using those we ignore servers with capacity less
than 100kB. I'm not sure how this would play out in the tragedy of
the commons.)
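
A sketch of what that port-priority filter might look like, in Python. This
is only an illustration of the idea, not anything that exists in Tor: the
server records, the field name "capacity", and reading "100kB" as 100*1024
bytes in the same units as declared capacity are all assumptions.

```python
# Sketch: for priority ports, ignore servers below a capacity threshold.

PRIORITY_PORTS = {80, 443}
MIN_CAPACITY = 100 * 1024  # assumption: "100kB" in declared-capacity units


def candidates_for_port(servers, port):
    """Return the servers we would consider when exiting to `port`."""
    if port in PRIORITY_PORTS:
        return [s for s in servers if s["capacity"] >= MIN_CAPACITY]
    return list(servers)  # other ports may use any server


# Hypothetical server list for illustration.
servers = [
    {"name": "cable-node", "capacity": 40 * 1024},
    {"name": "big-node", "capacity": 900 * 1024},
]

print([s["name"] for s in candidates_for_port(servers, 443)])
print([s["name"] for s in candidates_for_port(servers, 6667)])
```

The commons question shows up right in the filter: everything on 80 and 443
lands on the big nodes, and the small nodes are left with the other ports.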

In some sense, these could be temporary measures. Tor is still in
development, and it needs more work on pretty much all fronts. We need
to focus on all this development rather than (or at least 'as well as')
just being reactionary about the immediate issues. As long as we have
some sort of deployed network finding bugs for us, we can move forward
on that.  All the publicity we've gotten from having a working system
has distracted us from making a system that will actually work. :)

What do you folks think?

> I've been wondering for some time if it might be possible to attempt
> more intelligent node selection when building circuits. I've noticed
> that connection setup time has quite a bit of variance to it, and I'm
> guessing that this is due to occasional selection of heavily loaded
> nodes. Would it be possible to include a statistic in the directory
> for node load? Maybe something as simple as using the ratio of a
> throughput stat to expected bandwidth as the node selection criteria
> instead of uptime?

Right now we choose nodes for our circuits proportional to their declared
capacity (the most bytes they've seen themselves able to handle, both
incoming and outgoing, in the past 24 hours, or their BandwidthRate if
it's lower).

So if everybody knows that everybody is choosing paths that way, that
seems pretty close to optimal under our constraints.
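
In rough Python terms, the selection rule described above looks something
like the sketch below. It is a simplification, not the actual Tor code: the
node records and field names are invented for the example, and real path
selection has other constraints (exit policies and so on) that are left out.

```python
import random

# Sketch: pick a node with probability proportional to declared capacity,
# where declared capacity = min(observed 24-hour peak, BandwidthRate).


def declared_capacity(observed_peak, bandwidth_rate):
    """The lower of what the node has handled and what it is configured for."""
    return min(observed_peak, bandwidth_rate)


def pick_node(nodes, rng=random):
    """Choose one node, weighted by its declared capacity."""
    weights = [
        declared_capacity(n["observed_peak"], n["bandwidth_rate"])
        for n in nodes
    ]
    return rng.choices(nodes, weights=weights, k=1)[0]


# Hypothetical nodes; numbers are bytes per second, chosen for illustration.
nodes = [
    {"name": "fast", "observed_peak": 900_000, "bandwidth_rate": 2_000_000},
    {"name": "capped", "observed_peak": 900_000, "bandwidth_rate": 100_000},
    {"name": "slow", "observed_peak": 50_000, "bandwidth_rate": 2_000_000},
]

rng = random.Random(2005)  # seeded so repeated runs give the same counts
counts = {n["name"]: 0 for n in nodes}
for _ in range(10_000):
    counts[pick_node(nodes, rng)["name"]] += 1
print(counts)
```

Here "fast" should get roughly 85% of the picks, since its weight is
900,000 out of a total of 1,050,000.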

I suppose we could weight it further by the proportion of its capacity
it has been using lately, but this leads to your next question:

> How quickly can the directory be updated? Would this introduce cyclic
> load problems, or perhaps other security concerns?

Right now servers upload new descriptors every 20 minutes, servers fetch
full directories every hour, and clients fetch full directories (once
an hour) from any server that has it mirrored. So we're talking an hour
or more before the information propagates. And we'd like to crank those
intervals *up*, not down (that is, update less often), since the updates
still represent significant overhead, and since we're still hoping to
decentralize the network discovery process even more. I guess we could
back down on that hope if you convinced us things would work better this
way. But I think the problem is not that the Tor network is mis-allocating
its bandwidth; I think it's that we simply don't have any left.
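
To put rough numbers on "an hour or more", here is the arithmetic as a tiny
Python sketch. The three intervals are the ones given above; the model
itself is an assumption: it treats the three stages as unsynchronized, so a
load change waits anywhere from zero to a full interval at each stage, and
it ignores any time the directory servers take to publish.

```python
# Back-of-the-envelope delay before a load change reaches clients.
# Assumption: each stage adds a wait uniform between 0 and its interval.

DESCRIPTOR_UPLOAD_MIN = 20  # servers upload new descriptors
MIRROR_FETCH_MIN = 60       # servers fetch full directories
CLIENT_FETCH_MIN = 60       # clients fetch full directories from a mirror

stages = [DESCRIPTOR_UPLOAD_MIN, MIRROR_FETCH_MIN, CLIENT_FETCH_MIN]

worst_case = sum(stages)   # every stage just missed the previous update
average = sum(stages) / 2  # mean of a uniform wait is half the interval

print(f"worst case: {worst_case} minutes")
print(f"average:    {average:.0f} minutes")
```

Under those assumptions the worst case is 140 minutes and the average is
about 70, so any load statistic in the directory would be over an hour
stale by the time a typical client acts on it.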

In short: run a server. :)

--Roger