[tor-bugs] #4490 [Analysis]: Sensitivity analysis of different ways to sample relay capacity for simulations
#4490: Sensitivity analysis of different ways to sample relay capacity for
simulations
------------------------------------+---------------------------------------
 Reporter:  arma                    |          Owner:
     Type:  task                    |         Status:  new
 Priority:  normal                  |      Milestone:
Component:  Analysis                |        Version:
 Keywords:  performance simulation  |         Parent:
   Points:                          |   Actualpoints:
------------------------------------+---------------------------------------

Since current Tor simulators can't handle the whole set of 2500 relays,
they end up running simulations on a subset of relays that they hope is
representative. Early papers built their sample by drawing n relays
weighted by capacity. Rob and Kevin both found that they could achieve
more consistent results by breaking the relays into deciles, and choosing
n/10 relays from the "0-10%" bucket, n/10 from the "10-20%" bucket, and so
on.
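
To make the two approaches concrete, here is a minimal stdlib-only Python
sketch of both samplers. The function names, and the idea of feeding them
a plain list of relay capacities from a consensus snapshot, are my own
illustration -- this isn't code from any existing simulator:

    import random

    def sample_weighted(capacities, n):
        """Draw n relays with probability proportional to capacity,
        without replacement (the approach from the early papers)."""
        pool = list(capacities)
        chosen = []
        for _ in range(n):
            r = random.uniform(0, sum(pool))
            acc = 0.0
            for i, c in enumerate(pool):
                acc += c
                if r <= acc or i == len(pool) - 1:  # guard float roundoff
                    chosen.append(pool.pop(i))
                    break
        return chosen

    def sample_stratified(capacities, n, num_buckets=10):
        """Sort relays by capacity, cut the sorted list into num_buckets
        equal-size buckets, and draw n/num_buckets relays uniformly from
        each bucket (Rob and Kevin's deciles when num_buckets == 10)."""
        ordered = sorted(capacities)
        size = len(ordered) // num_buckets  # leftover relays are dropped
        chosen = []
        for b in range(num_buckets):
            bucket = ordered[b * size:(b + 1) * size]
            chosen.extend(random.sample(bucket, n // num_buckets))
        return chosen

Making num_buckets a parameter makes it cheap to try finer strata than
deciles, e.g. twentiles, so the choice of ten isn't baked in.
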
But I suspect that it is not optimal to base these parameter choices on
the number of fingers that primates have. In particular, Mike's graphs
show that the "0-5%" bucket is quite different from the "5-10%" bucket. So
I worry that different runs could see quite different outcomes.

Rob pointed out that if we want to match reality, we'll need to know what
load to place on the network -- and, even messier, how to scale down that
load in a way that matches the scaling down of the relay population.
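
For what it's worth, the obvious strawman is purely proportional scaling;
anything smarter needs a model of how client demand actually shrinks with
the network. A hypothetical one-liner, just to pin down what "scale down
the load" means:

    def scale_load(total_load, sample_size, population_size):
        """Strawman rule: shrink offered load by the same factor as the
        relay population. This assumes demand scales linearly with relay
        count -- which is exactly the questionable part Rob points at."""
        return total_load * sample_size / population_size
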
But I think there's still some good insight to be had here, by looking at
how much variation we get across a variety of sampling algorithms for a
given set of loads. If the results stay consistent for a given set of
loads while we vary the sampling algorithm, that would be a surprising and
interesting result. And if the results for a given load do vary with the
sampling algorithm, we should get better intuition about how much they
change, and which parameters influence the changes the most.
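
Even before wiring a simulator in, the cheap version of that experiment
is: fix n, draw many independent samples with each algorithm, and see how
much some summary of the sample jumps around. A sketch reusing the sampler
functions above (total sampled capacity is just a stand-in for whatever
outcome metric a given research question actually cares about):

    import statistics

    def spread_of_samples(capacities, n, sampler, runs=100):
        """Draw `runs` independent samples with the given sampler and
        report the mean and standard deviation of total sampled
        capacity. A big spread suggests simulation results on such
        samples would also swing from run to run."""
        totals = [sum(sampler(capacities, n)) for _ in range(runs)]
        return statistics.mean(totals), statistics.stdev(totals)

    # e.g. compare spread_of_samples(capacities, 50, sample_weighted)
    #      against spread_of_samples(capacities, 50, sample_stratified)
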
Alas, the part of this question that makes the number of simulation runs
blow up is that we ought to repeat the tests for a variety of research
questions, since some research questions may be quite sensitive to changes
in the capacity distribution and others not so much.

I bet we could get quite a bit of mileage here by just looking at the
resulting capacity (and thus path-selection probability) distributions of
various sampling algorithms, and leaving the "Tor simulation" component
out entirely -- since using a full-blown Tor simulator to measure
similarity of distributions is a mighty indirect approach.
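
For example, one could compute a two-sample Kolmogorov-Smirnov distance
between each algorithm's sampled capacities and the full consensus, with
no Tor simulation in the loop. KS is just one plausible similarity metric,
and the helper below is my own stdlib-only sketch:

    from bisect import bisect_right

    def ks_distance(sample, population):
        """Largest gap between the empirical CDFs of the sampled
        capacities and the full relay population. Smaller means the
        sample's capacity distribution looks more like the real
        network's."""
        s, p = sorted(sample), sorted(population)
        return max(
            abs(bisect_right(s, x) / len(s) - bisect_right(p, x) / len(p))
            for x in sorted(set(s) | set(p))
        )
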
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/4490>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs