
[tor-bugs] #4490 [Analysis]: Sensitivity analysis of different ways to sample relay capacity for simulations



#4490: Sensitivity analysis of different ways to sample relay capacity for
simulations
------------------------------------+---------------------------------------
 Reporter:  arma                    |          Owner:     
     Type:  task                    |         Status:  new
 Priority:  normal                  |      Milestone:     
Component:  Analysis                |        Version:     
 Keywords:  performance simulation  |         Parent:     
   Points:                          |   Actualpoints:     
------------------------------------+---------------------------------------
 Since current Tor simulators can't handle the whole 2500 relays, they end
 up running simulations on a subset of relays that they hope is
 representative. Early papers drew their samples by choosing n relays with
 probability proportional to capacity. Rob and Kevin both found that they
 could achieve
 more consistent results by breaking the relays into deciles, and choosing
 n/10 relays from the "0-10%" bucket, n/10 from the "10-20%" bucket, and so
 on.
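
 For concreteness, here's a minimal Python sketch of the two sampling
 schemes above. The relay representation (a list of (fingerprint,
 bandwidth) pairs) and the function names are illustrative assumptions,
 not code from any existing simulator:

    import random

    def sample_weighted(relays, n):
        """Early-paper approach: draw n distinct relays, each pick
        weighted by capacity (sequential draws without replacement).
        `relays` is a list of (fingerprint, bandwidth) pairs."""
        pool = list(relays)
        chosen = []
        for _ in range(n):
            weights = [bw for _, bw in pool]
            idx = random.choices(range(len(pool)), weights=weights, k=1)[0]
            chosen.append(pool.pop(idx))
        return chosen

    def sample_stratified(relays, n, buckets=10):
        """Rob/Kevin-style approach: sort relays by capacity, split
        them into `buckets` equal-size strata ("0-10%", "10-20%", ...),
        and draw n/buckets relays uniformly from each stratum."""
        ordered = sorted(relays, key=lambda r: r[1])
        size = len(ordered) // buckets
        sample = []
        for i in range(buckets):
            # last stratum absorbs any remainder from uneven division
            hi = len(ordered) if i == buckets - 1 else (i + 1) * size
            sample.extend(random.sample(ordered[i * size:hi], n // buckets))
        return sample

 Note that `buckets` is a parameter here precisely because the decile
 choice is what's in question: the same code with buckets=20 splits the
 contested "0-10%" range into "0-5%" and "5-10%".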

 But I suspect that it is not optimal to base these parameter choices on
 the number of fingers that primates have. In particular, Mike's graphs
 show that the "0-5%" bucket is quite different from the "5-10%" bucket. So
 I worry that different runs could see quite different outcomes.

 Rob pointed out that if we want to match reality, we'll need to know what
 load to place on the network -- and even messier, how to scale down that
 load in a way that matches the scaling down of the relay population.

 But I think there's still some good insight to be had here, by looking at
 how much variation we get across a variety of sampling algorithms under a
 given set of loads. If the results are consistent across sampling
 algorithms for a given set of loads, that would be a surprising and
 interesting result. And if the results for a given load do vary with the
 sampling algorithm, we should get some better intuition about how much
 they change, and which parameters influence the changes the most.
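
 A sketch of what that experiment might look like, assuming the samplers
 from the earlier sketch and treating the research question's outcome as
 an opaque per-sample metric (`metric` here is a hypothetical placeholder
 for whatever a full simulation run would measure):

    import statistics

    def sensitivity(relays, n, algorithms, metric, runs=50):
        """Repeat each sampling algorithm `runs` times and report the
        spread of `metric` across runs. A large spread for one
        algorithm means results under that sampler are unstable
        run-to-run. `metric` just maps a sampled relay list to a
        number."""
        spread = {}
        for name, algo in algorithms.items():
            values = [metric(algo(relays, n)) for _ in range(runs)]
            spread[name] = (statistics.mean(values),
                            statistics.stdev(values))
        return spread

 For example, metric=lambda s: sum(bw for _, bw in s) would compare how
 stable the sample's total capacity is under each scheme; any per-run
 simulator output could be dropped in instead.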

 Alas, the part of this question that makes the number of simulation runs
 blow up is that we ought to run the tests for a variety of research
 questions, since some research questions may be quite sensitive to changes
 in the capacity distribution while others are not.

 I bet we could get quite a bit of mileage here by just looking at the
 resulting capacity (and thus probability) distributions of various
 sampling algorithms, and leaving the "Tor simulation" component out
 entirely -- since using a full-blown Tor simulator to measure similarity
 of distribution is a mighty indirect approach.
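
 As one plausible way to do that (an assumption on my part, not a settled
 choice of metric), a Kolmogorov-Smirnov-style gap between cumulative
 weight distributions can be computed directly from the consensus weights:

    import math

    def selection_cdf(relays):
        """Cumulative selection probability by capacity rank: the
        share of total weight held by the bottom k relays, for each k.
        This is the distribution that drives path selection, so it's
        what a scaled-down sample should preserve."""
        bws = sorted(bw for _, bw in relays)
        total = sum(bws)
        acc, cdf = 0, []
        for bw in bws:
            acc += bw
            cdf.append(acc / total)
        return cdf

    def cdf_at(cdf, q):
        # cdf[k] is the CDF evaluated at quantile (k + 1) / len(cdf)
        return cdf[max(0, math.ceil(q * len(cdf)) - 1)]

    def max_cdf_gap(full, sample):
        """Largest gap between the full network's weight distribution
        and a sample's, evaluated at the sample's quantiles (a
        Kolmogorov-Smirnov-style statistic)."""
        f, s = selection_cdf(full), selection_cdf(sample)
        return max(abs(cdf_at(f, (i + 1) / len(s)) - s[i])
                   for i in range(len(s)))

 Running max_cdf_gap over many draws from each sampler then answers
 "which sampler best preserves the selection-probability distribution"
 without a single simulated circuit.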

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/4490>