On Fri, Jun 01, 2007 at 10:30:32AM -0700, Michael_google gmail_Gersten wrote: > Proposal for Tor. Hi, and sorry about the delay. Stuff has been really busy. I hope Roger will have the time to chime in too. First, this proposal needs a different name. Generally, you want to name proposals after the mechanism, not the goal: otherwise everybody who has a proposal to make Tor faster writes something called "Speed up Tor" or "Make Tor faster" or something. Second, you should check out 111-local-traffic-priority.txt; that's another proposal for speeding stuff up, based on creating incentives for people to run more servers. At the base, it looks like you want to do a few things: a) limit CPU usage on heavily loaded servers b) let servers turn down connections when they're already loaded. I agree that a) is important. I'm not sure that b) is the way to use it; it seems to me that the end effect could easily be that mostly people can't get on the network, but occasionally they get a really fast circuit. That's probably not an improvement. Another danger here is that, if you're a greedy client, your logical behavior under this proposal is to open as many circuits as you can and use them forever. This isn't so good for your anonymity though. Hmmmm. > The goal of this proposal is to support the following goals: > > 1. An easy way to toggle between "At least speed X" (for > single-threaded web browsing) and "Any speed, many connections" (for > downloads). Hmmm. I'm not sure I really like that part; it seems to leak pretty trivially what you're doing by your node selection. Also see below. > 2. A way to keep nodes from being CPU starved from the encryption > processing (high bandwidth nodes) > 3. A way to keep nodes from being bandwidth starved (the main limit on > middle-speed nodes). > > Motivation: Speed up Tor. > > Design: > 1. Add in a control message for switching torrc's. Add support in > Vidiallia to toggle these. > Flaw: Ideally the determination (high speed vs. high numbers) would > be made based on who is making the request. For example, while the > downloader is fetching 10 slow parts at once, I still want to > browse. This sounds like an unrelated thing to what you actually want. If you'd like some circuits to be built with a different node selection approach than others, you don't want to switch torrc files: you want to apply a different set of circuit building rules to them. Otherwise, there's no way for circuits of different types to coexist on the same client. > 2. If the protocol for extending a circuit to a new node does not > permit the new node to reject the connection, then add this ability. > Otherwise, start using it. Nodes can prevent being CPU starved by > refusing new connections when they are "full". Or they could could rate-limit connections or something. This "turn away circuits when you're full" approach would seem to mean that once you've got a circuit, you can use it as much CPU and bandwidth as you want, but it can get hard to get a circuit. Roger, what do you think here? This seems to be a step away from fairness. > 3. When a circuit is being built, estimates of bandwidth needed are > transmitted as well. Similar to #2, nodes will reject new connections > if the bandwidth isn't there. > > Security implications: Absolutely no idea. How does having large > numbers of connections affect Tor's tracability? Sorry; who sends bandwidth estimates to whom? > Specification (incomplete): > 1. New control message would either take a filename of the new torrc, > or the contents of the new torrc. I do not know Tor's inner workings, > and cannot tell which is "better". There's already control commands to change config options. > 2. Nodes can measure the CPU cost per circuit, and tell how many they > can afford CPU wise. There may be a configuration parameter to > indicate how much CPU it can use; maybe the output of "uptime" is read > to see what the CPU levels are (and Tor stops accepting when the load > is .8 or higher.) Seems like this would leak your CPU load. Don't know if that's okay. > 3. The simplest way to handle this is to put numbers in the config > file, and pass them along. For example, if I'm in "single threaded > browsing", I'll have numbers specifying a max speed of 150 KB/s, a > burst speed of 100 KB/s, and an average speed of 10 KB/s. If I'm in > "multi-threaded download", I'll specify a max speed of 25 KB, an > average speed of 15 KB, and a burst speed of 18 KB. We already specify numbers on the server side, and include them in router descriptors, and never use more than we're configured to use. Have you read dir-spec.txt (which says how we communicate about routers) or path-spec.txt (which says how we choose circuit to build)? > What to do with these numbers? Well, if the sum of the averages of all > incoming circuits exceed my actual bandwidth, I say "No" when someone > tries to connect. Similarly, if I cannot support the burst speed, I > saw "No", to avoid slowing them down (this becomes the minimum speed > needed). Finally, I know that the worst case for this circuit is the > max speed, and I can do ... ? with it. I don't know; is there a way to > The idea here: On my DSL, I cannot get more than 150 KB/s. While I > want to get that full speed, I'll be happy to get 100 KB/s. On > average, while I'm surfing, I'm not fetching pages all the time -- > hence, an average speed of 10 KB/s representing fetch, read, fetch, > read, fetch, read. > > Now, it's not perfect. I'm thinking that "Busy percentage" might make > more sense -- 10% busy for web surfing, 95% busy for downloaders. This > would also help CPU overhead calculations. It also helps tell when to > say "This circuit has been idle for a while. It isn't active at all, > and while it is inactive, we will regard it as having a speed demand > of 0". This will prevent a node from being filled up with "idle" > connections, and becoming wasted. > > I'm also realizing that my concept of "burst" isn't quite right, and > I'm hoping that someone else has a better idea. For downloading, > "burst" means that while the average demand for a 10 part download is > 15 KB/s per circuit, there will be variance, and a node might see a > higher burst. Yet I will be happy even if a node can only give me 10 > KB/s, because I have 9 other circuits that will each get slightly more > speed. So I think we need "This is my minimum acceptable speed, reject > this circuit if you can't give me this much", "This is my average", > and "This is my worst case / initial burst" (a lot of circuits will be > busy at first, and idle afterwards), as well as "percentage of time I > expect the circuit to be used". > > Compatibility: The only change in how nodes talk to each other is in > circuit building. I am not familiar with the current system to know > how this will change things. You can learn how nodes talk to each other by reading tor-spec.txt. On the whole, I think this is a good start; we need to think about how this impacts performance overall. I think limiting CPU use is a decent idea (though my top CPU priority right now is figuring out how to do a better job when we're on multi-core machines). We already limit bandwidth use. Our current bandwidth approach is (more or less; please correct me, Roger) "share equally among circuits on connections; share equally on connections." That's not so smart; proposal 111 is a little better. This proposal amounts to "Show up early, get a circuit. Show up late, get nothing." How can we investigate whether this would actually improve matters? There's also a DoS problem to consider here: it doesn't cost much to make a bunch of connections (possibly anonymously) to a node and say "I'll need a big chunk of bandwidth." If nodes don't overcommit, it's pretty easy to make a node useless by allocating all its bw without actually using any. There are already better fair queueing algorithms than the one Tor is using; could we improve throughput by switching to one of those instead? Anyway, just a first round of thoughts. If you want to do another revision of your proposal, I'll check that in. If you'd prefer to just rename it and have me slap a number on it, I'll put it into svn as is. Also, I should really go through my to-do folder and see what else is waiting for a number. man. so busy. peace, -- Nick Mathewson
Attachment:
pgpWVUYj4rwjI.pgp
Description: PGP signature