[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Proposal: Path Selection Improvements



Hi everyone,

Here's a proposal for improving path selection through collecting build time statistics. We're still collecting data, so it's quite short. Comments are appreciated.

Fallon


Filename: path-selection-improvements.txt
Title: Improving Tor Path Selection
Version:
Last-Modified:
Author: Fallon Chen, Mike Perry
Created: 5-Jul-2008
Status: Draft

Overview
  The performance of paths selected can be improved by adjusting the
  CircuitBuildTimeout and the number of guards. This proposal describes
  a method of tracking buildtime statistics, and using those statistics to adjust the
  CircuitBuildTimeout and the number of guards.

Motivation
  Tor's performance can be improved by excluding those circuits that
  have long buildtimes (and by extension, high latency). For those Tor users
  who require better performance and have lower requirements for anonymity,
  this would be a very useful option to have.

Implementation

  Learning the CircuitBuildTimeout
    Based on studies of build times, we found that the distribution of
    circuit buildtimes appears to be a Pareto distribution. The number of
    circuits to observe (ncircuits_to_observe) before changing the
    CircuitBuildTimeout will be tunable. From our preliminary measurements,
    it is likely that ncircuits_to_observe will be somewhere on the order of
    1000. The values can be represented compactly in Tor in milliseconds as a      
    circular array of 16 bit integers. More compact long-term storage
    representations can be implemented by simply storing a histogram with
    50 millisecond buckets when writing out the statistics to disk.
 
  Calculating the preferred CircuitBuildTimeout
    Circuits that have longer buildtimes than some x% of the estimated CDF of
    the Pareto distribution will be excluded. x will be tunable as well.
 
  Circuit timeouts
    In the event of a timeout, backoff values should include the 100-x% of
    expected CDF of timeouts.
    Also, in the event of network failure, the observation mechanism should
    stop collecting timeout data.
 
  Other notes
    Since this follows a Pareto distribution, large reductions on the timeout
    can be achieved without cutting off a great number of the total paths.
    However, hard statistics on which cutoff percentage gives optimal
    performance have not yet been gathered.

Issues
  Impact on anonymity