Re: [tor-bugs] #4499 [Analysis]: Investigate scaling points to handle more bridges
#4499: Investigate scaling points to handle more bridges
----------------------+-----------------------------------------------------
 Reporter:  runa      |         Owner:  karsten
     Type:  task      |        Status:  assigned
 Priority:  normal    |     Milestone:  Sponsor E: March 15, 2012
Component:  Analysis  |       Version:
 Keywords:            |        Parent:
   Points:            |  Actualpoints:
----------------------+-----------------------------------------------------
Comment(by karsten):
I started this analysis by writing a small tool to generate sample data
for BridgeDB and metrics-db. The tool takes the contents of one of
Tonga's bridge tarballs as input, copies them a given number of times, and
overwrites the first two bytes of relay fingerprints in every copy with
0000, 0001, etc. The tool also fixes references between network statuses,
server descriptors, and extra-info descriptors. This is sufficient to
trick BridgeDB and metrics-db into thinking that relays in the copies are
distinct relays. I used the tool to generate tarballs with 2, 4, 8, 16,
32, and 64 times as many bridge descriptors in them.
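To give an idea, here's roughly what the fingerprint-rewriting step looks
like; this is a minimal Java sketch, and the class and method names are
mine, not the actual tool's:

    public class FingerprintRewriter {

      /* Overwrite the first two bytes (the first four hex characters) of
       * a 40-character hex fingerprint with the copy index: 0000, 0001,
       * etc.  The real tool additionally fixes the references between
       * network statuses, server descriptors, and extra-info descriptors,
       * which is omitted here. */
      static String rewriteFingerprint(String fingerprint, int copyIndex) {
        if (fingerprint.length() != 40) {
          throw new IllegalArgumentException("Not a fingerprint: "
              + fingerprint);
        }
        return String.format("%04X", copyIndex) + fingerprint.substring(4);
      }

      public static void main(String[] args) {
        String original = "A1B2C3D4E5F6A7B8C9D0A1B2C3D4E5F6A7B8C9D0";
        for (int copy = 0; copy < 4; copy++) {
          System.out.println(rewriteFingerprint(original, copy));
        }
      }
    }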
In the next step I fed the tarballs into BridgeDB and metrics-db.
BridgeDB reads the network statuses and server descriptors from the latest
tarball and writes them to a local database. metrics-db sanitizes the
two half-hourly tarballs every hour, establishes an internal mapping
between descriptors, and writes sanitized descriptors with fixed
references to disk.
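For reference, walking such a tarball in Java looks roughly like the
sketch below. It assumes Apache Commons Compress and is not necessarily
the code path metrics-db takes; it also doubles as a crude timer for the
import measurements discussed next:

    import java.io.*;
    import org.apache.commons.compress.archivers.tar.*;
    import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

    public class TarballWalker {
      public static void main(String[] args) throws IOException {
        long started = System.currentTimeMillis();
        int descriptors = 0;
        try (TarArchiveInputStream tar = new TarArchiveInputStream(
            new GzipCompressorInputStream(
            new BufferedInputStream(new FileInputStream(args[0]))))) {
          TarArchiveEntry entry;
          while ((entry = tar.getNextTarEntry()) != null) {
            if (entry.isFile()) {
              descriptors++;
              /* Read and discard the entry so decompression I/O is
               * included in the timing. */
              byte[] buf = new byte[4096];
              while (tar.read(buf) != -1) { }
            }
          }
        }
        System.out.println(descriptors + " descriptors in "
            + (System.currentTimeMillis() - started) + " ms");
      }
    }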
The attached graph shows the results.
The upper graph shows how the tarballs grow in size with more bridge
descriptors in them. This growth is, unsurprisingly, linear. One thing
to keep in mind here is that bandwidth and storage requirements for the
hosts transferring and storing bridge tarballs grow with the tarballs.
We'll want to pay extra attention to those hosts running out of disk
space.
The middle graph shows how long BridgeDB takes to load descriptors from a
tarball. This growth is linear, too, which indicates that BridgeDB can
handle an increase in the number of bridges pretty well. One thing I
couldn't check is whether BridgeDB's ability to serve client requests is
in any way affected during the descriptor import. I assume it'll be fine.
Aaron, are there other things in BridgeDB that I overlooked that may not
scale?
The lower graph shows how metrics-db can or cannot handle more bridges.
The growth is slightly worse than linear. In any case, the absolute time
required to handle 25K bridges is worrisome (I didn't try 50K).
metrics-db runs in an hourly cronjob, and if that cronjob doesn't finish
within the hour, we cannot start the next run and will miss some data. We
might have to sanitize bridge descriptors in a different thread or process
than the one that fetches all the other metrics data. I can also look
into other Java libraries to handle .gz-compressed files that are faster
than the one we're using. So, we can probably handle 25K bridges somehow,
and maybe even 50K. Somehow.
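To illustrate the separate-thread idea, a minimal sketch;
sanitizeBridgeTarballs() and fetchOtherMetricsData() are placeholders,
not actual metrics-db methods:

    import java.util.concurrent.*;

    public class HourlyRun {
      private static final ExecutorService sanitizerPool =
          Executors.newSingleThreadExecutor();

      public static void main(String[] args) throws Exception {
        /* Kick off bridge sanitizing in the background... */
        Future<?> sanitizing =
            sanitizerPool.submit(HourlyRun::sanitizeBridgeTarballs);

        /* ...while the main thread fetches everything else as before. */
        fetchOtherMetricsData();

        /* Wait at most 55 minutes; an overrunning sanitizer run is
         * logged rather than blocking the next hourly run. */
        try {
          sanitizing.get(55, TimeUnit.MINUTES);
        } catch (TimeoutException e) {
          System.err.println("Bridge sanitizing still running; not waiting.");
        }
        sanitizerPool.shutdown();
      }

      static void sanitizeBridgeTarballs() { /* placeholder */ }
      static void fetchOtherMetricsData() { /* placeholder */ }
    }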
Finally, note that I left out the most important part of this analysis:
can Tonga, or more generally, a single bridge authority handle this
increase in bridges? I'm not sure how to test such a setting, at least
not without running 50K bridges in a private network. I could imagine this
requires some more sophisticated sample data generation including getting
the crypto right and then talking to Tonga's DirPort. If there's an easy
way to test this, I'll do it. If not, we can always hope for the best.
What could possibly go wrong?
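For completeness, the upload itself would be simple; the hard part is
generating valid descriptors. A minimal sketch, assuming dir-spec's
"/tor/" publish URL and a placeholder address instead of Tonga's:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class DescriptorUploader {
      public static void main(String[] args) throws Exception {
        /* A properly signed server descriptor would go here. */
        byte[] descriptor = "router ...".getBytes(StandardCharsets.US_ASCII);
        HttpURLConnection conn = (HttpURLConnection)
            new URL("http://127.0.0.1:9030/tor/").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
          out.write(descriptor);
        }
        System.out.println("Authority replied: " + conn.getResponseCode());
        conn.disconnect();
      }
    }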
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/4499#comment:6>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online