[or-cvs] [metrics/master] Remove long outdated TODO list.
Author: Karsten Loesing <karsten.loesing@xxxxxxx>
Date: Sat, 27 Mar 2010 10:34:39 +0100
Subject: Remove long outdated TODO list.
Commit: 855b66941479ad49fe6e5c28a2f8d5634a5f8748
---
TODO | 190 ------------------------------------------------------------------
1 files changed, 0 insertions(+), 190 deletions(-)
delete mode 100644 TODO
diff --git a/TODO b/TODO
deleted file mode 100644
index 21c2b6a..0000000
--- a/TODO
+++ /dev/null
@@ -1,190 +0,0 @@
-Legend:
- - Not done
- * Top priority
- . Partially done
- o Done
- d Deferrable
- D Deferred
- X Abandoned
-
-==========================================================================
-
-Tasks for September or later:
-
- . Configure Mike's bandwidth scanner on gabelmoo
- . Include measured bandwidths in votes
-
- . Exiting traffic by port
-* . July 31: Evaluate data together with Steven. Write report on measured
-* exit port data.
-
- - Alternative requirements for flags
- - Display actual MTBF/WFU requirements for weakened requirements.
- Evaluation has finished. Look at results and include them in report.
-* - June 30: Write proposal for weakened requirements for being a Guard;
-* see TODO.022.
-
- - Client requests to directories
- - Compare bytes.txt output to dirreq download times
- - Figure out why estimations are that far off. Where is the flaw in the
- math?
-
- - Clients connecting to entry nodes
-* - July 31: Write report on measured entry stats data.
-
- - Circuit build timeouts
- - Find out why there are bumps at full seconds even though seconds start
- at random times on the relays; possibly measure full distributions of
- cell times in circuit queues, not just the deciles.
-
- - Evaluate Roger's reduced circuit window patch
- - Switch to 1 MiB downloads
- - Work organization
- - Add medium and low priority items from Roger's performance mail to
- this list, too. This list contains only the two high priority items.
-
- - Directory archives
- - Look at entropy of directory over the years. Right now the relay
- choices are not uniform. It is way more likely that clients choose
- fast relays than slow ones. If we re-normalize it, what is the
- equivalent number of uniformly-weighted relays in the network?
- mikeperry has some equations for this in his torflow, but it would be
- interesting to see whether that number is going up over time, and how
- it compares to number-of-relays and amount-of-bandwidth. (See the sketch after this section.)
- - How many of the German relays that have disappeared in 2008 were set
- up at the end of 2007?
- - Is the (major) reason for disappearing nodes in France in mid-2008
- that OVH stopped supporting Tor relay operation?
- - Examine bandwidth-per-relay ratios for various countries. Do changes
- in bandwidth per country result from a few or a lot of relays joining
- or leaving?
- - Investigate very old Tor versions. Do these nodes have their contact
- info set? A possible explanation for these nodes not being updated is
- that they might be running on machines without their owners' knowledge.
- - Compare descriptors collected on gabelmoo with those collected by
- tor26. What fraction of descriptors is missing? Is it worth combining
- both archives?
- - Investigate whether the loss of German relays in 2008 was due to the
- pervasive dynamic IP reachability testing bugs. How?
- - Compare observed/history bandwidth by time of day to see if traffic is
- underutilized at night and saturated during the day.
- - For comparison of relays on dynamic IP addresses, don't count relays
- that were up for only a short time; consider using a dynamic IP
- database.
- - Consider recording bandwidth usage on relays by putting 1 random
- second of every 15-minute interval into extra-info documents, rather
- than the sum of transported bytes. Suggestion by Roger/Steven.
-
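A rough sketch of the entropy idea above (an illustration only, not code from the metrics repository): if p_i is the probability that a client picks relay i, e.g. proportional to advertised bandwidth, then the Shannon entropy H = -sum(p_i * log2(p_i)) gives 2^H as the "equivalent number of uniformly-weighted relays".

    import math

    def equivalent_uniform_relays(bandwidths):
        """Entropy-based effective number of relays, assuming selection
        probability is simply proportional to advertised bandwidth
        (real path selection applies further weights)."""
        total = float(sum(bandwidths))
        probs = [b / total for b in bandwidths if b > 0]
        entropy = -sum(p * math.log(p, 2) for p in probs)
        return 2 ** entropy

    # Example: 3 fast relays and 7 slow ones act like ~5.7 uniform relays.
    print(equivalent_uniform_relays([100, 100, 100] + [10] * 7))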
- - Tor exit list
- - Permit queries whether a certain IP was an exit for a certain target
- at a certain time.
-
- - Client requests to directories
- - Why do authorities (at least moria1 and moria2) see such a high
- request-to-address ratio? Shouldn't clients ask at most once? A
- possible explanation is that people are running Tor in a way where
- their cache doesn't survive, maybe old-school Torpark variants or
- something. Another explanation is people running relays that aren't
- reachable and therefore aren't ignored in the geoip stats. Investigate further.
- - Figure out if there are better GeoIP databases available that focus
- more on small countries and that are still affordable.
- - Try to estimate the number of concurrent Tor users from active
- circuits and the probability of clients picking a relay for their
- circuits. This only requires that we know how many circuits users
- build on average. Hmm. (See the back-of-envelope sketch after this section.)
- - Investigate the algorithm in global_write_bucket_low() that contains
- the prioritization of some directory requests over others. This
- algorithm was written when v1 was popular and v2 was new. Do the
- conditions in that function require an update?
- - Consider using a dynamic IP database to determine how many users are
- on dynamic IP addresses.
- - Investigate assumption that 1 IP address is equivalent to 1 user;
- consider dynamic IP addresses and NAT, too.
- - Add statistics to analyze failure types of directory requests and
- include transmission times of failed requests exceeding a certain
- threshold of 50% of all bytes.
-
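A back-of-envelope version of the user-estimation item above, with all parameters purely illustrative (none of these numbers come from the TODO itself): if each of N concurrent users keeps k circuits open and a circuit crosses a given relay with probability roughly h * q (h hops per circuit, q the relay's selection probability), then the relay sees about N * k * h * q active circuits, which can be inverted:

    def estimate_concurrent_users(circuits_seen, relay_selection_prob,
                                  circuits_per_user, hops_per_circuit=3):
        """Invert circuits_seen ~= N * circuits_per_user *
        hops_per_circuit * relay_selection_prob; assumes independent,
        small selection probabilities."""
        p_circuit_uses_relay = hops_per_circuit * relay_selection_prob
        return circuits_seen / (circuits_per_user * p_circuit_uses_relay)

    # Illustrative numbers only, not measurements:
    print(estimate_concurrent_users(circuits_seen=4000,
                                    relay_selection_prob=0.005,
                                    circuits_per_user=2))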
- - Cells in circuit queues
- - Extend statistics to medians, and 1st/9th deciles; problematic as
- these statistics require keeping more history on Tor relays.
- - Also extend statistics to outbuffer sizes.
- - Investigate classification of circuits on a relay: Do most circuits
- stay inactive, but a few become active, send their cells, become
- inactive, get new cells and become active, and keep oscillating? Or
- are there active circuits that just stay active for seconds at a time
- because they cannot clear their queue?
- - Investigate timing of circuits flushing their queues: For relays that
- rate limit, what fraction of each second do they spend with empty
- write buckets? The theory from earlier analyses is that for most
- relays that rate limit, they have a full second's worth of data queued
- up already, and at the top of each second, they pull off one second's
- worth of bytes, send them, and then go dormant again until the next
- second. Two approaches to fix this behavior are lowering the circuit
- window sizes, so there's less data in flight on the network, and
- reducing the granularity of the token bucket refills, so it sends
- bytes more regularly throughout the second; but first the theory needs
- to be confirmed. (See the token bucket sketch after this section.)
- - Another theory is that some relays refuse to read from a relay for
- a period of multiple seconds. Can this be confirmed by the
- measurements?
- - Instrument edge streams and how they add cells to their circuits, and
- how they flush them on the socks side.
-
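To make the refill-granularity theory above concrete, here is a toy token bucket (an illustration only, not Tor's actual rate-limiting code): with refill_interval=1.0 a saturated relay writes in bursts at the top of each second, while a smaller interval lets bytes trickle out more evenly across the second.

    import time

    class TokenBucket:
        """Toy token bucket; rate is in bytes per second, refill_interval
        in seconds (1.0 = coarse once-per-second refill, 0.1 = finer)."""

        def __init__(self, rate, refill_interval):
            self.rate = rate
            self.refill_interval = refill_interval
            self.tokens = 0.0
            self.last_refill = time.time()

        def _refill(self):
            elapsed = time.time() - self.last_refill
            intervals = int(elapsed / self.refill_interval)
            if intervals > 0:
                # Burst size is capped at one second's worth of tokens.
                self.tokens = min(self.rate, self.tokens +
                                  intervals * self.rate * self.refill_interval)
                self.last_refill += intervals * self.refill_interval

        def consume(self, nbytes):
            """Return True if nbytes may be written now; with a coarse
            refill_interval, callers spend most of each second waiting."""
            self._refill()
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return True
            return False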
- - Directory archives
- - Do guards that have had the guard flag for a long time (weeks or
- months) have more load than guards that just got their guard flag? Try
- to find a possible correlation between advertised bandwidth and the
- time a relay spent in the network with the Guard flag. (see 4.5 in
- performance roadmap)
- - Analyze 2004 and 2005 data, too.
-
- - Bridge archives
- - Investigate bridge churn to determine how many bridges users need.
- - Look through the bridge relay stats and see how much churn there is.
- Roger is guessing that the 400-some bridges we have running by end of
- June do not indicate that only 400-some people set up bridges. Rather
- they indicate that only 400-some of them have their bridge still up
- and reachable right now.
- - Are bridges known to be available when users receive their addresses?
- In one reported case, 9 bridges were unavailable only 2 hours after
- the user received their addresses.
- - Estimate how many bridge users we're skipping because only
- super-stable bridges report any stats.
-
- - Measure throughput and latency between relays
- - Implement opportunistic measuring of cell transfer times and bandwidth
- between relays.
- - Decide if statistics should be measured in the future in aggregate
- form.
-
- - Measure throughput
- . Write report on measured throughput data from torperf.
- - Evaluate speedracer results.
- - Passively measure throughput in Tor clients when configured.
- - Improve usability so that non-developer users in countries like
- Tunisia can measure throughput themselves. This can be speedracer,
- torperf, or some other tool. Consider implementing as Vidalia plugin
- once the plugin infrastructure is in place.
-
- - Measure latencies
- - Evaluate circuit-build times in buildtimes data.
- - Passively measure circuit-build times in clients when configured.
- - Measure latencies as clients would experience them. Run a Tor client
- somewhere that makes "typical" Tor circuits (i.e. just let Tor choose
- its own paths, but set UseEntryGuards to 0, and only make requests to
- port 80 so it builds circuits which exit there), and send pings every
- so often, and track how long they take. One easy way to ping is to
- make a request to an IP address that we know is refused by the last
- hop's exit policy. Say, 127.0.0.1:80. Then measure the time between
- sending the connect cell, and receiving the end cell. We expect this
- latency to go down over time, a) because we lower the circuit window,
- and b) because Tor has on-average-better circuits based on Mike
- Perry's plans.
-
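A minimal sketch of the ping idea in the last item, assuming a local Tor client with its SOCKS port on 127.0.0.1:9050 (the port and the helper below are assumptions, not part of the TODO): time how long a SOCKS CONNECT to a destination the exit refuses takes to come back with a failure reply. Note that with ClientRejectInternalAddresses at its default, a request to 127.0.0.1 may be refused locally rather than at the exit, so a non-internal address known to be rejected by the last hop's exit policy is the safer probe target.

    import socket
    import struct
    import time

    def ping_via_refused_connect(dest_ip="127.0.0.1", dest_port=80,
                                 socks_host="127.0.0.1", socks_port=9050):
        """Time from sending the SOCKS CONNECT (the item's 'connect cell')
        until the failure reply (triggered by the 'end cell') comes back.
        Returns (seconds, SOCKS reply code)."""
        s = socket.create_connection((socks_host, socks_port))
        try:
            s.sendall(b"\x05\x01\x00")                # SOCKS5, no authentication
            if s.recv(2) != b"\x05\x00":
                raise RuntimeError("SOCKS handshake failed")
            request = (b"\x05\x01\x00\x01" +          # CONNECT, IPv4 address
                       socket.inet_aton(dest_ip) + struct.pack(">H", dest_port))
            start = time.time()
            s.sendall(request)
            reply = s.recv(10)                        # reply[1] != 0 means refused
            return time.time() - start, reply[1] if len(reply) > 1 else None
        finally:
            s.close()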
- - Metrics portal
- - Write down architecture for TorStatus extension.
- - Implement extensions.
- - Load directory archives into MySQL database and optimize database
- schema so that evaluations are executed quickly.
- - Set up extended TorStatus.
-
--
1.6.5