
[tor-talk] Scaling Tor



Hi Tor-Talk,

I am doing an MSc in Telecommunications and Network at City University, London.  For my dissertation I am looking at the limitations of scaling Tor up and how those limits could be overcome.  I should state that I am not a programmer, though I would love to be involved in Tor’s progress.

I’m not sure whether I am confusing Authority Server, Directory Authority and Directory Server, or whether they are all one and the same...

Firstly, the original Tor documentation (Tor-Design, 18/05/2004) stated initial “theoretical” limits on the number of DAs (Directory Authorities): three at the time, but possibly as many as nine.  However, I note from the documentation that you have since gone through various version releases, have introduced directory caches etc. to mitigate the overloading of the DAs, and now have ten DAs operating with overall improved network performance.

Later, section 8 (“Early experiences: Tor in the Wild”) states initial expectations of “the network to support a few hundred nodes and 10,000 users before we’re forced to become more distributed”.  This was said with reference to the “clique topology” and “full-visibility directories”, yet you now operate almost 6,000 relays and have around 2.25M users (directly connected).  Have you fundamentally changed the topology, or have you found gains in how relays are reported to form the consensus (or elsewhere) that allow this scale factor?

Two of the bottlenecks identified in dir-spec (section 0.3, Some Remaining Questions) are that having every client know about every relay, and having every Directory Cache know about every router, won’t scale ad infinitum.

A question raised in Tor-Design (section 9) is, “if clients can no longer have a complete picture of the network, how can they perform discovery while preventing attackers from manipulating or exploiting gaps in their knowledge?”.  If the network were to scale up to a significant proportion of all Internet users, could the Directory Authority (or Authorities) release, to Directory Caches and clients, an even random sample of relays/nodes from the FULL set of nodes, such that the randomness of the path selection is still maintained?  The random selection could be sampled on a per-client basis, with a sample as large as is currently downloaded (6,000 relays).  This would mean that each client (or possibly each grouping of clients) gets a different “view” of the network, but there would need to be a scaling down from the full set to the sample set at some point before the client.
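
To make the sampling idea above concrete, here is a minimal sketch in Python.  It is purely illustrative and is not how Tor works: the names sample_view and pick_path are hypothetical, selection is uniform (I understand real path selection applies weighting and other constraints, which this ignores), and the full set of 60,000 relays is an assumed figure used only for the example.  It also does not model who performs the sampling or how a client could trust it, which is the manipulation concern raised in the quoted question.

```python
import random


def sample_view(all_relays, sample_size, rng):
    """Return a uniform random subset of relays, without replacement.

    Hypothetical helper: stands in for the step that scales the FULL
    relay set down to the per-client sample set.
    """
    if sample_size >= len(all_relays):
        # Network is small enough that the client can see everything.
        return list(all_relays)
    return rng.sample(all_relays, sample_size)


def pick_path(view, hops, rng):
    """Pick `hops` distinct relays uniformly from the client's view.

    Assumption: uniform choice, no bandwidth weighting or family rules.
    """
    return rng.sample(view, hops)


if __name__ == "__main__":
    # Assumed figures: 60,000 relays in total, 6,000 per client view.
    full_set = [f"relay{i}" for i in range(60_000)]

    # Each client gets its own independently drawn view.
    client_a = sample_view(full_set, 6_000, random.Random())
    client_b = sample_view(full_set, 6_000, random.Random())

    print(len(client_a), len(client_b))
    print(pick_path(client_a, 3, random.Random()))
```

Under these simplifying assumptions, a uniform pick from a uniform random sample is distributed the same as a uniform pick from the full set, so each individual path is as random as before.  What changes is that different clients hold different views.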

I have looked over the documentation for the path selection, directory protocol and the consensus, but have not documented the timing of the exchanges of communications.  I imagine that this is an area that could present a limit if scaled up.  Which areas currently present limitations to scaling up substantially?

I have been able to access most of the relevant documentation through https://www.torproject.org/docs/documentation.html.en but would appreciate pointers to any other repositories of information.  As mentioned at the start, I am not a programmer so the code base is meaningless to me :(

A small note: it would be useful for the documentation to be dated (and re-versioned with dates) to indicate the freshness and relevance of the data.  I am aware that this may be a resource issue.

I appreciate your support with the network and hope to be able to contribute more in the future.

Yours sincerely
 
Mike Fikuart IEng MIET
 
Mobile: 07801 070580
Office: 020 33840275
Blog: mikefikuart
Skype: mikefikuart
Twitter: mikefikuart
LinkedIn: mikefikuart


-- 
tor-talk mailing list - tor-talk@xxxxxxxxxxxxxxxxxxxx
To unsubscribe or change other settings go to
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk