Mike Perry transcribed 9.3K bytes:
> Andrew Lewman:
> > I had a conversation with a vendor yesterday. They are interested
> > in including Tor as their "private browsing mode" and basically
> > shipping a re-branded tor browser which lets people toggle the
> > connectivity to the Tor network on and off.
> >
> > They very much like Tor Browser and would like to ship it to their
> > customer base. Their product is 10-20% of the global market of
> > roughly 2.8 billion global Internet users.
> >
> > As Tor Browser is open source, they are already working on it.
> > However, their concern is scaling up to handle some percentage of
> > global users with "tor mode" enabled. They're willing to entertain
> > offering their resources to help us solve the scalability
> > challenges of handling hundreds of millions of users and relays on
> > Tor.
> >
> > As this question keeps popping up from the business world, which
> > sees privacy as the next "must have" feature in their products, I'm
> > trying to compile a list of tasks to solve to help us scale. The
> > old 2008 three-year roadmap looks at performance:
> > https://www.torproject.org/press/2008-12-19-roadmap-press-release.html.en
> >
> > I've been through the specs,
> > https://gitweb.torproject.org/torspec.git/tree/HEAD:/proposals, to
> > see if there are proposals for scaling the network or directory
> > authorities. I didn't see anything directly related.
> >
> > The last research papers I see directly addressing scalability are
> > Torsk (http://www.freehaven.net/anonbib/bibtex.html#ccs09-torsk)
> > and PIR-Tor
> > (http://www.freehaven.net/anonbib/bibtex.html#usenix11-pirtor)
>
> These research papers basically propose a total network overhaul to
> deal with the problem of Tor relay directory traffic overwhelming
> the Tor network and/or Tor clients.
>
> However, I believe that with only minor modifications, the current
> Tor network architecture could support 100M daily directly
> connecting users, assuming we focus our efforts on higher capacity
> relays and not simply on adding tons of slower relays.
>
> The core problem is that the fraction of network capacity that you
> spend telling users about the current relays in the network can be
> written as:
>
>     f = D*U/B
>
> where D is the current Tor relay directory size in bytes per day, U
> is the number of users, and B is the bandwidth per day in bytes
> provided by the Tor network. Of course, this is a simplification,
> because of multiple directory fetches per day and
> partially-connecting/idle clients, but for purposes of discussion it
> is good enough.
>
> To put some real numbers on this, if you compare
> https://metrics.torproject.org/bandwidth.html#dirbytes with
> https://metrics.torproject.org/bandwidth.html#bandwidth, you can see
> that we're currently devoting about 2% of our network throughput to
> directory activity (~120MiB/sec out of ~5000MiB/sec). So we're not
> exactly hurting in terms of our directory bytes per user yet.
>
> But, because this fraction rises with both D and U, these research
> papers rightly point out that you can't keep adding relays *and*
> users and expect Tor to scale.
>
> However, what the f = D*U/B formula also says is that if you can
> reduce the relay directory size by a factor c, and also grow the
> network capacity by that same factor c, then you can multiply the
> userbase by c and keep the same fraction of directory bytes.
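To make the arithmetic in that last paragraph concrete, here's a
minimal sketch in Python, using normalized placeholder values rather
than real measurements: shrinking D by a factor c while multiplying U
by c leaves the overhead fraction f unchanged, and growing B by c on
top of that (to carry the new users' actual traffic) only improves it:

    def overhead(D, U, B):
        # Fraction of network capacity spent on directory traffic.
        return D * U / B

    f0 = overhead(D=1.0, U=1.0, B=1.0)  # normalized baseline
    c = 4.0

    # Directory shrunk by c, userbase grown by c: D*U is unchanged,
    # so the overhead fraction stays exactly where it was.
    assert overhead(D=1.0 / c, U=c, B=1.0) == f0

    # Growing total capacity B by c as well drops the fraction by c.
    assert overhead(D=1.0 / c, U=c, B=c) == f0 / c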
>
> This means that, rather than trying to undertake a major network
> overhaul like Torsk or PIR-Tor to support hundreds of thousands of
> slow, junky relays, we can scale the network by focusing on
> improving the situation for high capacity relay operators, so we can
> provide more network bandwidth for the same number of directory
> bytes per user.
>
> So, let's look at ways to reduce the size of the Tor relay
> directory; each way we find to do so means a corresponding increase
> in the number of users we can support:
>
> 1. Proper multicore support.
>
>    Right now, any relay with more than ~100Mbit of capacity really
>    needs to run an additional tor relay instance on that link to
>    make use of it. If they have AES-NI, this might go up to 300Mbit.
>
>    Each of these additional instances is basically wasted directory
>    bytes for those relay descriptors.
>
>    But with proper multicore support, such high capacity relays
>    could run only one relay instance on links as fast as 2.5Gbit
>    (assuming an 8 core AES-NI machine).
>
>    Result: 2-8X reduction in consensus and directory size, depending
>    on the number of high capacity relays on multicore systems we
>    have.
>
> 2. Cut off relays below the median capacity, and turn them into
>    bridges.
>
>    Relays in the top 10% of the network are 164 times faster than
>    relays in the 50-60% range, 1400 times faster than relays in the
>    70-80% range, and 35000 times faster than relays in the 90-100%
>    range.
>
>    In fact, many relays are so slow that they provide fewer bytes to
>    the network than it costs to tell all of our users about them.
>    There should be a sweet spot where we can set this cutoff such
>    that the overhead from directory activity balances the loss of
>    capacity from these relays, as a function of userbase size.
>
>    Result: ~2X reduction in consensus and directory size.

It's super frustrating when I publicly tell people that, as much as
we <3 them for running a relay, doing so on a home connection, on
wimpy hardware like Raspberry Pis, is likely only going to harm the
Tor network. And then people point at "If you have at least 100
kilobytes/s each way, please help out Tor by configuring your Tor to
be a relay" on our website [0] and stop listening to whatever other
relay-running advice I have to give.

So... here's the background on the "sweet spot" Mike was talking
about, and why he stated: "[...] many relays are so slow that they
provide fewer bytes to the network than it costs to tell all of our
users about them.":

Using Stem on my latest copy of the consensus to run some
calculations on the relay advertised bandwidth (RAB), I get:

    Average RAB:                               3887.222911227154 KB/s
    Median RAB:                                249.5 KB/s
    Combined RABs of all RABs < 249.5 KB/s:    162354 KB
    Bandwidth used for directory requests [1]: ~125 MB/s
    Current total bandwidth usage [2]:         ~5700 MB/s

Meaning that, if we cut off all relays below the current median of
250 KB/s, we lose 3064 relays and 158 MB/s of network throughput.
Currently, 2.2% of our bandwidth usage goes toward directory requests
(125 MB/s / 5700 MB/s). If we cut off the relays under 250 KB/s, we
cut that 2.2% to 1.1%, saving roughly 75 MB/s in directory requests.
Overall, this means that we can halve the size of the current
consensus and, rather than losing 158 MB/s, we only actually lose
83 MB/s in throughput.
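For anyone who wants to reproduce these numbers, here's a rough
sketch of the calculation, assuming Stem is installed and a directory
mirror is reachable (the `bandwidth` attribute on each consensus
entry is the value from the "w" line, in roughly KB/s, which is what
I'm treating as the RAB here):

    import stem.descriptor.remote

    # Fetch the current consensus and collect per-relay bandwidths.
    rabs = sorted(entry.bandwidth
                  for entry in stem.descriptor.remote.get_consensus())

    average = sum(rabs) / float(len(rabs))
    median = rabs[len(rabs) // 2]
    slow = [bw for bw in rabs if bw < median]

    print("Average RAB: %f KB/s" % average)
    print("Median RAB: %d KB/s" % median)
    print("Relays below the median: %d" % len(slow))
    print("Throughput lost by cutting them: ~%.1f MB/s"
          % (sum(slow) / 1024.0))

    # Directory overhead today, using the figures from the metrics
    # graphs rather than anything in the consensus itself:
    dir_rate, total_rate = 125.0, 5700.0  # MB/s
    print("Directory overhead: %.1f%%" % (100 * dir_rate / total_rate))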
We could easily play with these numbers a bit, and find a "sweet
spot" where the bandwidth cutoff is set at whatever nets us a
positive change in overall bandwidth once directory requests are
taken into account. In other words: "If your relay costs us more to
tell users about than the actual traffic it's providing, we don't
want it!"

Long term, I don't think we want "only 3000 relays are allowed at any
given time", but instead a compromise where:

2.a. We have a sliding definition of what a "real internet
     connection" is, by redoing the statistics above to find the
     current "sweet spot", and we set this as the required minimum
     bandwidth for being a relay.

2.b. The sliding minimum bandwidth for running a relay is *actually*
     enforced. If you're below the minimum, no one's going to stop
     you from running your relay, but it's not going to be in the
     consensus.

Result: Overall network bandwidth stays roughly the same. The size of
the current consensus is roughly chopped in half. Also, BridgeDB
doesn't want your slow relays as bridges. See Footnote [3].

> 3. Switching to ECC keys only.
>
>    We're wasting a lot of directory traffic on uncompressible
>    RSA1024 keys, which are 4X larger than ECC keys, and less secure.
>    Right now, were also listing both. When we finally remove RSA1024
>    entirely, the directory should get quite a bit smaller.
>
>    Result: ~2-4X reduction in consensus and directory size.

I'm going to ignore microdescriptors for now, because I don't use
them because they're a Bad Idea (see #5968), and because I'm too lazy
to go fetch some of them. :)

Mike, you said:

> were [sic] also listing both

Should we assume, then, that you're only talking about the
`onion-key`s, and not the `signing-key`s (which are also currently
1024-bit RSA)?

Also... removing `onion-key`s from the `@type server-descriptor`s
would not result in a "~2-4X reduction in [...] directory size". (It
might for the cached-microdescriptors, but I'm still ignoring those.)

Taking for example a really small server-descriptor (I removed the
contact line and did things like making the bandwidth numbers as
small as possible) and one of the largest server-descriptors I could
find, then making copies of each of these descriptors without the
`onion-key`s, and then compressing each of the four files with
`gzip -n -9 $FILE`, I got:

    Small server-descriptor, with onion key, compressed:    905 B
    Small server-descriptor, without onion key, compressed: 756 B
    Large server-descriptor, with onion key, compressed:    1127 B
    Large server-descriptor, without onion key, compressed: 980 B

Meaning that, without factoring in potential savings from gzipping
multiple descriptors at a time, cutting out `onion-key`s results in
server-descriptors which are only 84-87% of their original size. A
13% savings isn't all that much.

Plus, if you are proposing moving everything (including the
`signing-key`s) to ECC, I'm not yet convinced that that is a good
idea, especially if we're using only one curve. Putting all your eggs
in one basket...
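If anyone wants to repeat the measurement, here's a quick sketch,
roughly equivalent to the `gzip -n -9` comparison above; the file
path is a placeholder for whichever `@type server-descriptor` you
feed it:

    import gzip
    import re

    with open("server-descriptor.txt", "rb") as fh:  # placeholder path
        raw = fh.read()

    # Strip the onion-key line along with its PEM-encoded RSA1024 key.
    stripped = re.sub(
        br"onion-key\n-----BEGIN RSA PUBLIC KEY-----\n.*?"
        br"-----END RSA PUBLIC KEY-----\n",
        b"", raw, flags=re.DOTALL)

    print("with onion-key:    %d B" % len(gzip.compress(raw, 9)))
    print("without onion-key: %d B" % len(gzip.compress(stripped, 9)))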
> 4. Consensus diffs.
>
>    With proposal 140, we can save 60% of the directory activity if
>    we send diffs of the consensus to regularly connecting clients.
>    Calculating the benefit from this is complicated, since if
>    clients leave the network for just 16 hours, there is very little
>    benefit to this optimization. These numbers are highly dependent
>    on churn, though, and it may be that by removing most of the slow
>    junk relays, there is actually less churn in the network, and
>    smaller diffs:
>    https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/140-consensus-diffs.txt
>
>    Let's just ballpark it at 50% for the typical case.
>
>    Result: 2X reduction in directory size.

Not to mention that, by reducing the bytes used in directory fetches,
consensus diffs also help by raising the "sweet spot" in #2, and ergo
the number of relays which the network can sustainably maintain.

> 5. Invest in the Tor network.
>
>    Based purely on extrapolating from the Noisebridge relays, we
>    could add ~300 relays and double the network capacity for $3M/yr,
>    or about $1 per user per year (based on the user counts from
>    https://metrics.torproject.org/users.html).
>
>    Note that this value should be treated as a minimum estimate. We
>    actually want to ensure diversity as we grow the network, which
>    may make this number higher. I am working on better estimates
>    using replies from:
>    https://lists.torproject.org/pipermail/tor-relays/2014-September/005335.html
>
>    Automated donation/funding distribution mechanisms such as
>    https://www.oniontip.com/ are especially interesting ways to do
>    this (and can even automatically enforce our diversity goals),
>    but more traditional partnerships are also possible.
>
>    Result: 100% capacity increase for each O($3M/yr), or ~$1 per new
>    user per year.
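As a quick sanity check on those item 5 numbers, combining Mike's
figures with the ~5700 MB/s total throughput from my consensus
calculations above (the ~3M daily user count is my assumption, read
off the metrics page Mike cites):

    cost_per_year = 3000000.0  # USD/yr, Mike's estimate
    new_relays = 300.0         # relays added
    network_rate = 5700.0      # MB/s, current total throughput
    users = 3000000.0          # assumed daily directly connecting users

    print("USD per relay per year: %d" % (cost_per_year / new_relays))
    # -> 10000: these are serious, high capacity boxes.
    print("MB/s per new relay: %.1f" % (network_rate / new_relays))
    # -> 19.0 MB/s (~150 Mbit/s) each, in order to double capacity.
    print("USD per user per year: %.2f" % (cost_per_year / users))
    # -> 1.00, matching Mike's ~$1 per user per year.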
--
 ♥Ⓐ isis agora lovecruft
_________________________________________________________
GPG: 4096R/A3ADB67A2CDB8B35
Current Keys: https://blog.patternsinthevoid.net/isis.txt