[tor-commits] [metrics-tasks/master] Update relay stability tech report sources.
commit 88926c8b0295e5de584b87cc241a71fc45526878
Author: Karsten Loesing <karsten.loesing@xxxxxxx>
Date: Mon Jul 4 09:34:20 2011 +0200
Update relay stability tech report sources.
---
task-2911/.gitignore | 6 +-
task-2911/README | 297 +++----
task-2911/SimulateStableGuard.java | 842 ++++++++++++++++++++
.../mtbf-sim/SimulateMeanTimeBetweenFailure.java | 351 --------
task-2911/mtbf-sim/mtbf-sim.R | 73 --
task-2911/report.bib | 31 +
task-2911/report.tex | 673 +++++++++++-----
task-2911/stability.R | 301 +++++++
.../wfu-sim/SimulateWeightedFractionalUptime.java | 318 --------
task-2911/wfu-sim/wfu-sim.R | 57 --
10 files changed, 1755 insertions(+), 1194 deletions(-)
diff --git a/task-2911/.gitignore b/task-2911/.gitignore
index d2480c1..23b4a67 100644
--- a/task-2911/.gitignore
+++ b/task-2911/.gitignore
@@ -1,9 +1,9 @@
*.class
-mtbf-sim/tunf/
-wfu-sim/fwfu/
-wfu-sim/consensuses/
*.csv
*.aux
*.log
+*.bbl
+*.blg
*.pdf
+future-stability/
diff --git a/task-2911/README b/task-2911/README
index 686de7a..e2ae1d0 100644
--- a/task-2911/README
+++ b/task-2911/README
@@ -1,196 +1,143 @@
Tech report: An Analysis of Tor Relay Stability
===============================================
-Simulation of MTBF requirements
--------------------------------
-
-When simulating MTBF requirements, we parse status entries and server
-descriptor parts. For every data point we care about the valid-after time
-of the consensus, the relay fingerprint, and whether the relay had an
-uptime of 3599 seconds or less when the consensus was published. The last
-part is important to detect cases when a relay was contained in two
-subsequent consensuses, but was restarted in the intervening time. We
-rely on the uptime as reported in the server descriptor and decide whether
-the relay was restarted by calculating whether the following condition
-holds:
-
- restarted == valid-after - published + uptime < 3600
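-This condition can be sketched as a small self-contained check (the class
-and method names here are ours, for illustration only):
-
```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class RestartCheck {
  static final DateTimeFormatter FMT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

  /* A relay counts as restarted if its reported uptime does not reach
   * back a full hour before the consensus valid-after time. */
  static boolean restarted(String validAfter, String published,
      long uptimeSeconds) {
    long gapSeconds = Duration.between(
        LocalDateTime.parse(published, FMT),
        LocalDateTime.parse(validAfter, FMT)).getSeconds();
    return gapSeconds + uptimeSeconds < 3600L;
  }

  public static void main(String[] args) {
    /* Descriptor published 30 minutes before the consensus, with only
     * 20 minutes of uptime: the relay must have been restarted. */
    System.out.println(restarted(
        "2011-06-30 23:00:00", "2011-06-30 22:30:00", 1200L));
  }
}
```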
-
-In the first simulation step we parse the data in reverse order from last
-consensus to first. In this step we only care about time until next
-failure.
-
-For every relay we see in a consensus we look up whether we also saw it in
-the subsequently published consensus (that we parsed before). If we did
-not see the relay before, we add it to our history with a time until
-failure of 0 seconds. If we did see the relay, we add the seconds elapsed
-between the two consensuses to the relay's time until next failure in our
-history. We then write the times until next failure from our history to
-disk for the second simulation step below. Before processing the next
-consensus we remove all relays that have not been running in this
-consensus or that have been restarted before this consensus from our
-history.
-
-In the second simulation step we parse the data again, but in forward
-order from first to last consensus. This time we're interested in the
-mean time between failure for all running relays.
-
-We keep a history of three variables per relay to calculate its MTBF:
-weighted run length, total run weights, and current run length. The first
-two variables are used to track past uptime sessions whereas the third
-variable tracks the current uptime session if a relay is currently
-running.
-
-For every relay seen in a consensus we distinguish four cases:
-
- 1) the relay is still running,
- 2) the relay is still running but has been restarted,
- 3) the relay has been newly started in this consensus, and
- 4) the relay has left or failed in this consensus.
-
-In case 1 we add the seconds elapsed since the last consensus to the
-relay's current run length.
-
-In case 2 we add the current run length to the weighted run length,
-increment the total run weights by 1, and re-initialize the current run
-length with the seconds elapsed since the last consensus.
-
-In case 3 we initialize the current run length with the seconds elapsed
-since the last consensus.
-
-In case 4 we add the current run length to the weighted run length,
-increment the total run weights by 1, and set the current run length to 0.
-
-Once we're done with processing a consensus, we calculate MTBFs for all
-running relays.
-
- weighted run length + current run length
- MTBF = ----------------------------------------
- total run weights + 1
-
-We sort relays by MTBF in descending order, create subsets containing the
-30%, 40%, ..., 70% relays with highest MTBF, and look up mean time until
-failure for these relays. We then write the mean value, 85th, 90th, and
-95th percentile to disk as simulation results.
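-The per-relay bookkeeping for the four cases and the final MTBF formula
-can be sketched as follows (field and method names are illustrative, not
-taken from the simulator):
-
```java
/* Sketch of the per-relay MTBF history described above: two variables
 * track completed uptime sessions, a third tracks the current one. */
public class MtbfHistory {
  double weightedRunLength = 0.0; // sum of completed run lengths
  double totalRunWeights = 0.0;   // number of completed runs
  double currentRunLength = 0.0;  // ongoing uptime session

  void stillRunning(long seconds) {       // case 1
    currentRunLength += seconds;
  }
  void restarted(long seconds) {          // case 2
    weightedRunLength += currentRunLength;
    totalRunWeights += 1.0;
    currentRunLength = seconds;
  }
  void started(long seconds) {            // case 3
    currentRunLength = seconds;
  }
  void leftOrFailed() {                   // case 4
    weightedRunLength += currentRunLength;
    totalRunWeights += 1.0;
    currentRunLength = 0.0;
  }
  double mtbf() {
    return (weightedRunLength + currentRunLength)
        / (totalRunWeights + 1.0);
  }
}
```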
-
-To run the simulation, start by changing to the MTBF simulation directory:
-
- $ cd mtbf-sim/
-
-Export status entries and server descriptor parts from the metrics
-database, once in reverse and once in forward order. Note that each file
-will be 2.2G large for roughly 2.5 years of data. Plan for a buffer of at
-least 4 months before and after the interval to investigate:
-
+We developed a simulator for the tech report "An Analysis of Tor Relay
+Stability" that varies the requirements for assigning Stable and Guard
+flags to relays and evaluates how useful these assignments are. In this
+README we describe how to use the simulator to re-run and possibly modify
+our analysis.
+
+First, download the comma-separated value file containing extracted status
+entries and server descriptor parts from the metrics database. The file
+is called running-relays-reverse.csv and contains lines like the
+following:
+
+ 2011-06-30 23:00:00,0089102aa16208d991dc36efafd5cf13b35aa707,
+ f,f,f,193581,1148763136
+
+The columns are:
+
+ - consensus valid-after time
+ - relay fingerprint
+ - whether the relay was restarted in the last hour (t/f)
+ - whether the relay got the Stable flag assigned (t/f)
+ - whether the relay got the Guard flag assigned (t/f)
+ - advertised bandwidth in bytes per day
+ - written bytes per day
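+As an illustration, one such line can be split into its columns like this
+(variable names are ours; the last column may be missing when there is no
+bandwidth history for that day):
+
```java
public class ParseRelayLine {
  /* Split one CSV line from the exported file into its columns. */
  static String[] columns(String line) {
    return line.split(",");
  }

  public static void main(String[] args) {
    /* The example line from the README, unwrapped onto one line. */
    String line = "2011-06-30 23:00:00,"
        + "0089102aa16208d991dc36efafd5cf13b35aa707,f,f,f,193581,1148763136";
    String[] parts = columns(line);
    String validAfter = parts[0];              // consensus valid-after time
    String fingerprint = parts[1];             // relay fingerprint
    boolean restarted = parts[2].equals("t");  // restarted in last hour
    boolean stable = parts[3].equals("t");     // Stable flag assigned
    boolean guard = parts[4].equals("t");      // Guard flag assigned
    long advBw = Long.parseLong(parts[5]);     // advertised bandwidth
    long written = parts.length < 7 ? 0L
        : Long.parseLong(parts[6]);            // written bytes per day
    System.out.println(fingerprint + " advbw=" + advBw);
  }
}
```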
+
+If you don't have this file or want to generate your own file, you can
+export status entries and server descriptor parts from the metrics
+database. Be sure to plan for a buffer of at least 4 months (better: 6
+months) before and after the interval to investigate. The psql commands
+and SELECT statement for extracting these data are as follows:
+
+ tordir=> \f ','
+ tordir=> \a
+ tordir=> \t
tordir=> \o running-relays-reverse.csv
tordir=> SELECT statusentry.validafter,
statusentry.fingerprint,
CASE WHEN descriptor.uptime IS NULL THEN FALSE ELSE
statusentry.validafter - descriptor.published +
descriptor.uptime * '1 second'::INTERVAL <
- '01:00:00'::INTERVAL END AS restarted
+ '01:00:00'::INTERVAL END AS restarted,
+ statusentry.isstable,
+ statusentry.isguard,
+ LEAST(descriptor.bandwidthavg, descriptor.bandwidthobserved)
+ AS bwadvertised,
+ bwhist.written_sum AS writtensum
FROM statusentry
LEFT JOIN descriptor
ON statusentry.descriptor = descriptor.descriptor
+ LEFT JOIN bwhist
+ ON DATE(statusentry.validafter) = bwhist.date
+ AND statusentry.fingerprint = bwhist.fingerprint
WHERE statusentry.isrunning
- AND statusentry.validafter >= '2009-01-01 00:00:00'
+ AND statusentry.validafter >= '2010-01-01 00:00:00'
+ AND statusentry.validafter < '2011-07-01 00:00:00'
ORDER BY statusentry.validafter DESC, statusentry.fingerprint;
tordir=> \o
- tordir=> \o running-relays-forward.csv
- tordir=> SELECT statusentry.validafter,
- statusentry.fingerprint,
- CASE WHEN descriptor.uptime IS NULL THEN FALSE ELSE
- statusentry.validafter - descriptor.published +
- descriptor.uptime * '1 second'::INTERVAL <
- '01:00:00'::INTERVAL END AS restarted
- FROM statusentry
- LEFT JOIN descriptor
- ON statusentry.descriptor = descriptor.descriptor
- WHERE statusentry.isrunning
- AND statusentry.validafter >= '2009-01-01 00:00:00'
- ORDER BY statusentry.validafter, statusentry.fingerprint;
- tordir=> \o
-
-Run the simulation consisting of a reverse and a forward run. The results
-of the reverse run will be stored to the tunf/ directory and will be
-re-used in subsequent simulations. Delete the tunf/ directory to repeat
-the reverse run, too.
-
- $ javac SimulateMeanTimeBetweenFailure.java
- $ java SimulateMeanTimeBetweenFailure
-
-Plot the results:
-
- $ R --slave -f mtbf-sim.R
-
-Once you're satisfied with the result, copy the graph to the parent
-directory to include it in the report:
-
- $ cp mtbf-sim.pdf ../
-
+ tordir=> \q
-Simulation of WFU requirements
-------------------------------
+The simulator needs to parse the file in both forward and reverse order.
+The easiest way to implement this is to reverse the file using tac:
-In the first simulation step we parse consensuses in reverse order to
-calculate future WFU for every relay and for every published consensus.
-We keep a relay history with two values for each relay: weighted uptime
-and total weighted time.
+ $ tac running-relays-reverse.csv > running-relays-forward.csv
-When parsing a consensus, we add 3600 seconds to the weighted uptime
-variable of every running relay and 3600 seconds to the total weighted
-time of all relays in our history. We then write future WFUs for all
-known relays to disk by dividing weighted uptime by total weighted time.
-
-Every 12 hours, we multiply the weighted uptimes and total weighted times
-of all relays in our history by 0.95. If the quotient of the two
-variables drops below 0.0001, we remove a relay from our history.
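-This discounting step can be sketched as follows (the simulator does the
-same with long integers, multiplying by 19 and dividing by 20; class and
-method names here are ours):
-
```java
import java.util.Iterator;
import java.util.Map;

public class WfuDiscount {
  /* Discount weighted uptime (w[0]) and total weighted time (w[1]) by
   * factor 0.95 and drop relays whose WFU falls below 1/10000. */
  static void discount(Map<String, long[]> wfuHistory) {
    Iterator<Map.Entry<String, long[]>> it =
        wfuHistory.entrySet().iterator();
    while (it.hasNext()) {
      long[] w = it.next().getValue();
      w[0] = w[0] * 19L / 20L; // weighted uptime
      w[1] = w[1] * 19L / 20L; // total weighted time
      if (w[1] == 0L || (10000L * w[0]) / w[1] < 1L) {
        it.remove();
      }
    }
  }
}
```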
-
-In the second simulation step we parse the consensuses again, but in
-forward order. The history and WFU calculation is exactly the same as in
-the first simulation step.
-
-After calculating WFUs for all relays in the history, we look up the
-future WFUs for all relays meeting certain past WFU requirements and
-calculate their mean value, 85th, 90th, and 95th percentile.
-
-To run the simulation, start by changing to the WFU simulation directory:
-
- $ cd wfu-sim/
-
-Create a consensuses/ directory and put the consensus files of the
-interval to investigate plus 4+ months before and 4+ months after in it:
-
- $ mkdir consensuses/
- $ ln -s $extracted/consensuses-20* .
-
-Run the simulation that first parses consensuses from last to first and
-then from first to last. The results from the reverse direction will be
-stored in the fwfu/ directory and re-used in subsequent simulations.
-Delete the fwfu/ directory to re-run both simulation parts.
-
- $ javac SimulateWeightedFractionalUptime.java
- $ java SimulateWeightedFractionalUptime
-
-Plot the results:
-
- $ R --slave -f wfu-sim.R
-
-Copy the graph to the parent directory to include it in the report:
-
- $ cp wfu-sim.pdf ../
-
-
-Compiling the report
---------------------
-
-Copy the generated graphs to the base directory, unless you have done so
-before:
-
- $ cp mtbf-sim/mtbf-sim.pdf .
- $ cp wfu-sim/wfu-sim.pdf .
-
-Compile the report:
+Run the simulation consisting of a reverse and a forward run. The results
+of the reverse run will be stored to the future-stability/ directory and
+will be re-used in subsequent simulations. Delete or move away the
+future-stability/ directory to repeat the reverse run, too. The execution
+can take 20--30 minutes or even more depending on your hardware.
+
+ $ javac SimulateStableGuard.java
+ $ java SimulateStableGuard
+
+The result is a file stability.csv that contains the following columns:
+
+- time: consensus valid-after time
+- excladvbw, exclwfu, exclwt: the number of relays that did not get the
+  Guard flag in the simulation but did get it from the directory
+  authorities; excladvbw did not get it because of the advertised
+  bandwidth requirement, exclwfu because of the WFU requirement, and
+  exclwt because of the weighted time requirement; it's quite possible
+  that a relay did not get the Guard flag for more than one reason
+- familiar: absolute number of `familiar' relays
+- guard: absolute number of relays with the Guard flag as assigned by the
+ directory authorities
+- guardYYwfuZZadvbw: absolute number of relays with the Guard flag as
+ assigned in the simulation when using YY as the WFU percentile and ZZ as
+ the advertised bandwidth percentile
+- guardintersect, guardobserved, guardsimulated: number of relays that
+  got the Guard flag both from the directory authorities and from the
+  simulator, or that got it from only one of them
+- minadvbwaZZadvbw: Minimum advertised bandwidth when using ZZ as the
+ advertised bandwidth percentile, before cutting it down to 250 KiB/s
+- minadvbwbZZadvbw: Minimum advertised bandwidth when using ZZ as the
+ advertised bandwidth percentile, after cutting it down to 250 KiB/s
+- minwfuaYYwfu: Minimum WFU when using YY as the WFU percentile, before
+ cutting it down to 0.98
+- minwfubYYwfu: Minimum WFU when using YY as the WFU percentile, after
+ cutting it down to 0.98
+- minwmtbfaXX: Minimum WMTBF when using XX as the WMTBF percentile, before
+ cutting it down to 5 days
+- minwmtbfbXX: Minimum WMTBF when using XX as the WMTBF percentile, after
+ cutting it down to 5 days
+- minwta: Minimum weighted time before cutting it down to 8 days
+- minwtb: Minimum weighted time after cutting it down to 8 days
+- mtunfXX: Mean time until next failure when using XX as the WMTBF
+ percentile
+- mwfu: Mean WFU of relays that got the Guard flag assigned by the
+ directory authorities
+- mwfuYYwfuZZadvbw: Mean WFU when using YY as the WFU percentile and ZZ as
+ the advertised bandwidth percentile
+- percWWfwb: WW-th percentile of future weighted bandwidth of relays that
+ got the Guard flag assigned by the directory authorities
+- percWWfwbYYwfuZZadvbw: WW-th percentile of future weighted bandwidth
+ when using YY as the WFU percentile and ZZ as the advertised bandwidth
+ percentile
+- percWWtunf: WW-th percentile of time until next failure of relays that
+ got the Stable flag assigned by the directory authorities
+- percWWtunfXX: WW-th percentile of time until next failure when using XX
+ as the WMTBF percentile
+- percWWwfu: WW-th percentile of WFU of relays that got the Guard flag
+ assigned by the directory authorities
+- percWWwfuYYwfuZZadvbw: WW-th percentile of WFU when using YY as the WFU
+ percentile and ZZ as the advertised bandwidth percentile
+- running: absolute number of running relays
+- stable: absolute number of relays with the Stable flag as assigned by the
+ directory authorities
+- stableXX: absolute number of relays with the Stable flag as assigned in
+ the simulation when using XX as WMTBF percentile
+- stableintersect, stableobserved, stablesimulated: number of relays
+  that got the Stable flag both from the directory authorities and from
+  the simulator, or that got it from only one of them
+
+Plot the graphs for the tech report:
+
+ $ R --slave -f stability.R
+
+Compile the tech report:
$ pdflatex report.tex
diff --git a/task-2911/SimulateStableGuard.java b/task-2911/SimulateStableGuard.java
new file mode 100644
index 0000000..1edbcc3
--- /dev/null
+++ b/task-2911/SimulateStableGuard.java
@@ -0,0 +1,842 @@
+/**
+ * Simulate variation of WMTBF, WFU, and advertised bandwidth requirements
+ * on Stable and Guard flag assignment. In a first step, parse network
+ * status consensus entries from last to first, calculate future stability
+ * metrics for all running relays, and write them to disk as one file per
+ * network status in future-stability/$filename. (Skip this step if there
+ * is already a future-stability/ directory.) In a second step, parse the
+ * network status consensus entries again, but this time from first to
+ * last, calculate past stability metrics for all relays, assign relay
+ * flags to relays, look up future stability of these relays, and write
+ * results to stability.csv to disk.
+ *
+ * Please see the README for more information! */
+import java.io.*;
+import java.text.*;
+import java.util.*;
+public class SimulateStableGuard {
+ public static void main(String[] args) throws Exception {
+
+ /* Define analysis interval. */
+ String analyzeFrom = "2010-07-01 00:00:00",
+ analyzeTo = "2010-12-31 23:00:00";
+
+ /* Decide whether we need to do the reverse run, or if we can use
+ * previous results. */
+ if (!new File("future-stability").exists()) {
+
+ /* Track weighted uptime and total weighted time in a long[2]. */
+ SortedMap<String, long[]> wfuHistory =
+ new TreeMap<String, long[]>();
+
+ /* Track time until next failure in seconds in a long. */
+ SortedMap<String, Long> tunfHistory = new TreeMap<String, Long>();
+
+ /* Track weighted written bytes and total weighted time in a
+ * long[2]. */
+ SortedMap<String, long[]> wbHistory = new TreeMap<String, long[]>();
+
+ /* Parse previously exported network status entries in reverse
+ * order. */
+ SimpleDateFormat formatter = new SimpleDateFormat(
+ "yyyy-MM-dd-HH-mm-ss");
+ formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
+ SimpleDateFormat isoFormatter = new SimpleDateFormat(
+ "yyyy-MM-dd HH:mm:ss");
+ isoFormatter.setTimeZone(TimeZone.getTimeZone("UTC"));
+ Map<String, String> runningRelays = new HashMap<String, String>();
+ BufferedReader br = new BufferedReader(new FileReader(
+ "running-relays-reverse.csv"));
+ String line, lastValidAfter = null, lastButOneValidAfter = null;
+ long nextWeightingInterval = -1L;
+ while ((line = br.readLine()) != null) {
+ if (!line.startsWith("20")) {
+ continue;
+ }
+ String[] parts = line.split(",");
+ String validAfter = parts[0];
+ if (lastValidAfter != null &&
+ !lastValidAfter.equals(validAfter)) {
+
+ /* We just parsed the very first consensus. There's nothing to
+ * do here. */
+ if (lastButOneValidAfter == null) {
+ runningRelays.clear();
+ lastButOneValidAfter = lastValidAfter;
+ continue;
+ }
+
+ /* We just parsed the first consensus outside the analysis
+ * interval. Stop parsing here. */
+ if (lastValidAfter.compareTo(analyzeFrom) < 0) {
+ break;
+ }
+
+ /* We just parsed all lines of a consensus. First, see if 12
+ * hours have passed since we last discounted wfu and wb history
+ * values. If so, discount variables for all known relays by
+ * factor 0.95 (or 19/20 since these are long integers) and
+ * remove those relays with a weighted fractional uptime below
+ * 1/10000 from the history. */
+ long lastValidAfterMillis = isoFormatter.parse(lastValidAfter).
+ getTime();
+ long secondsSinceLastValidAfter =
+ (isoFormatter.parse(lastButOneValidAfter).getTime()
+ - lastValidAfterMillis) / 1000L;
+ long weightingInterval = lastValidAfterMillis
+ / (12L * 60L * 60L * 1000L);
+ if (nextWeightingInterval < 0L) {
+ nextWeightingInterval = weightingInterval;
+ }
+ while (weightingInterval < nextWeightingInterval) {
+ Set<String> relaysToRemove = new HashSet<String>();
+ for (Map.Entry<String, long[]> e : wfuHistory.entrySet()) {
+ long[] w = e.getValue();
+ w[0] *= 19L;
+ w[0] /= 20L;
+ w[1] *= 19L;
+ w[1] /= 20L;
+ if (((10000L * w[0]) / w[1]) < 1L) {
+ relaysToRemove.add(e.getKey());
+ }
+ }
+ for (String fingerprint : relaysToRemove) {
+ wfuHistory.remove(fingerprint);
+ }
+ relaysToRemove.clear();
+ for (Map.Entry<String, long[]> e : wbHistory.entrySet()) {
+ long[] w = e.getValue();
+ w[0] *= 19L;
+ w[0] /= 20L;
+ w[1] *= 19L;
+ w[1] /= 20L;
+ if (w[1] < 1L) {
+ relaysToRemove.add(e.getKey());
+ }
+ }
+ for (String fingerprint : relaysToRemove) {
+ wbHistory.remove(fingerprint);
+ }
+ nextWeightingInterval -= 1L;
+ }
+
+ /* Increment weighted written bytes and total weighted time for
+ * all running relays. */
+ for (Map.Entry<String, String> runningRelay :
+ runningRelays.entrySet()) {
+ String fingerprint = runningRelay.getKey();
+ String[] relayParts = runningRelay.getValue().split(",");
+ long writtenSum = relayParts.length < 7 ? 0 :
+ Long.parseLong(relayParts[6]);
+ long[] w = wbHistory.containsKey(fingerprint) ?
+ wbHistory.get(fingerprint) : new long[2];
+ w[0] += writtenSum * secondsSinceLastValidAfter /
+ (24L * 60L * 60L);
+ w[1] += secondsSinceLastValidAfter;
+ wbHistory.put(fingerprint, w);
+ }
+
+ /* Increment weighted uptime for all running relays that have
+ * not been restarted by seconds since last valid-after time. */
+ for (Map.Entry<String, String> runningRelay :
+ runningRelays.entrySet()) {
+ String fingerprint = runningRelay.getKey();
+ boolean restarted = runningRelay.getValue().
+ split(",")[2].equals("t");
+ long seconds = restarted ? 0 : secondsSinceLastValidAfter;
+ if (!wfuHistory.containsKey(fingerprint)) {
+ wfuHistory.put(fingerprint, new long[] { seconds, 0L });
+ } else {
+ wfuHistory.get(fingerprint)[0] += seconds;
+ }
+ }
+
+ /* Increment total weighted time for all relays by seconds since
+ * last valid-after time. */
+ for (long[] w : wfuHistory.values()) {
+ w[1] += secondsSinceLastValidAfter;
+ }
+
+ /* Iterate over our time until next failure history first and
+ * see if these relays have been running in the considered
+ * consensus. Remember changes to our history and modify it
+ * below to avoid concurrent modification errors. */
+ Set<String> removeFromHistory = new HashSet<String>(),
+ foundInHistory = new HashSet<String>();
+ Map<String, Long> addToHistory = new HashMap<String, Long>();
+ for (Map.Entry<String, Long> e : tunfHistory.entrySet()) {
+ String fingerprint = e.getKey();
+ if (runningRelays.containsKey(fingerprint)) {
+
+ /* This relay has been running, so update our history. */
+ boolean restarted = runningRelays.get(fingerprint).
+ split(",")[2].equals("t");
+ if (restarted) {
+ removeFromHistory.add(fingerprint);
+ } else {
+ addToHistory.put(fingerprint, secondsSinceLastValidAfter
+ + e.getValue());
+ }
+ foundInHistory.add(fingerprint);
+ } else {
+
+ /* This relay has not been running, so remove it from our
+ * history. */
+ removeFromHistory.add(fingerprint);
+ }
+ }
+
+ /* Update our history for real now. We couldn't do this above,
+ * or we'd have modified the set we've been iterating over. */
+ for (String f : removeFromHistory) {
+ tunfHistory.remove(f);
+ }
+ for (Map.Entry<String, Long> e : addToHistory.entrySet()) {
+ tunfHistory.put(e.getKey(), e.getValue());
+ }
+
+ /* Iterate over the relays that we found in the consensus, but
+ * that we didn't have in our history. */
+ for (Map.Entry<String, String> e : runningRelays.entrySet()) {
+ String fingerprint = e.getKey();
+ if (!foundInHistory.contains(fingerprint)) {
+ boolean restarted = e.getValue().split(",")[2].equals("t");
+ if (!restarted) {
+ tunfHistory.put(fingerprint, 0L);
+ }
+ }
+ }
+
+ /* If the consensus falls within our analysis interval, write
+ * future WFUs, TUNFs, and WBs for all known relays to disk. */
+ if (lastValidAfter.compareTo(analyzeFrom) >= 0) {
+ File futureStabilityFile = new File("future-stability",
+ formatter.format(lastValidAfterMillis));
+ futureStabilityFile.getParentFile().mkdirs();
+ BufferedWriter bw = new BufferedWriter(new FileWriter(
+ futureStabilityFile));
+ for (String fingerprint : runningRelays.keySet()) {
+ long[] wfu = wfuHistory.get(fingerprint);
+ long tunf = tunfHistory.containsKey(fingerprint) ?
+ tunfHistory.get(fingerprint) : 0;
+ long[] wb = wbHistory.get(fingerprint);
+ bw.write(fingerprint + " " + tunf + " "
+ + ((10000L * wfu[0]) / wfu[1]) + " "
+ + (wb[0] / wb[1]) + "\n");
+ }
+ bw.close();
+ }
+
+ /* Prepare for next consensus. */
+ runningRelays.clear();
+ lastButOneValidAfter = lastValidAfter;
+ }
+
+ /* Add the running relay lines to a map that we parse once we have
+ * all lines of a consensus. */
+ String fingerprint = parts[1];
+ runningRelays.put(fingerprint, line);
+ lastValidAfter = validAfter;
+ }
+ }
+
+ /* Run the Guard simulation for the following WFU and advertised
+ * bandwidth percentiles: */
+ List<Integer> wfuPercentiles = new ArrayList<Integer>();
+ for (int l : new int[] { 30, 40, 50, 60, 70 }) {
+ wfuPercentiles.add(l);
+ }
+ List<Integer> advBwPercentiles = new ArrayList<Integer>();
+ for (int l : new int[] { 30, 40, 50, 60, 70 }) {
+ advBwPercentiles.add(l);
+ }
+
+ /* Run the Stable simulation for the following WMTBF percentiles: */
+ List<Integer> wmtbfPercentiles = new ArrayList<Integer>();
+ for (int l : new int[] { 30, 40, 50, 60, 70 }) {
+ wmtbfPercentiles.add(l);
+ }
+
+ /* Add column headers to output file. */
+ SortedSet<String> columns = new TreeSet<String>();
+ columns.addAll(new ArrayList<String>(Arrays.asList(("running,"
+ + "guard,mwfu,perc15wfu,perc10wfu,perc5wfu,"
+ + "guardintersect,guardobserved,guardsimulated,"
+ + "minwta,minwtb,familiar,"
+ + "perc1fwb,perc2fwb,perc5fwb,perc10fwb,"
+ + "exclwt,exclwfu,excladvbw").split(","))));
+ for (int wfuPercentile : wfuPercentiles) {
+ for (int advBwPercentile : advBwPercentiles) {
+ String simulation = wfuPercentile + "wfu" + advBwPercentile
+ + "advbw";
+ columns.add("minwfua" + wfuPercentile + "wfu");
+ columns.add("minwfub" + wfuPercentile + "wfu");
+ columns.add("minadvbwa" + advBwPercentile + "advbw");
+ columns.add("minadvbwb" + advBwPercentile + "advbw");
+ columns.add("guard" + simulation);
+ columns.add("mwfu" + simulation);
+ columns.add("perc15wfu" + simulation);
+ columns.add("perc10wfu" + simulation);
+ columns.add("perc5wfu" + simulation);
+ columns.add("perc1fwb" + simulation);
+ columns.add("perc2fwb" + simulation);
+ columns.add("perc5fwb" + simulation);
+ columns.add("perc10fwb" + simulation);
+ }
+ }
+ columns.addAll(new ArrayList<String>(Arrays.asList(("stable,"
+ + "perc25tunf,perc20tunf,perc15tunf,perc10tunf,perc5tunf,"
+ + "stableintersect,stableobserved,stablesimulated").split(","))));
+ for (int wmtbfPercentile : wmtbfPercentiles) {
+ columns.add("minwmtbfa" + wmtbfPercentile);
+ columns.add("minwmtbfb" + wmtbfPercentile);
+ columns.add("stable" + wmtbfPercentile);
+ columns.add("mtunf" + wmtbfPercentile);
+ columns.add("perc25tunf" + wmtbfPercentile);
+ columns.add("perc20tunf" + wmtbfPercentile);
+ columns.add("perc15tunf" + wmtbfPercentile);
+ columns.add("perc10tunf" + wmtbfPercentile);
+ columns.add("perc5tunf" + wmtbfPercentile);
+ }
+ BufferedWriter bw = new BufferedWriter(new FileWriter(
+ "stability.csv"));
+ bw.write("time");
+ for (String column : columns) {
+ bw.write("," + column);
+ }
+ bw.write("\n");
+
+ /* Track weighted uptime and total weighted time in a long[2]. */
+ SortedMap<String, long[]> wfuHistory = new TreeMap<String, long[]>();
+
+ /* Track weighted run length, total run weights, and current run
+ * length in a double[3]. */
+ SortedMap<String, double[]> mtbfHistory =
+ new TreeMap<String, double[]>();
+
+ /* Parse previously exported network status entries again, but this
+ * time in forward order. */
+ SimpleDateFormat formatter = new SimpleDateFormat(
+ "yyyy-MM-dd-HH-mm-ss");
+ formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
+ SimpleDateFormat isoFormatter = new SimpleDateFormat(
+ "yyyy-MM-dd HH:mm:ss");
+ isoFormatter.setTimeZone(TimeZone.getTimeZone("UTC"));
+ Map<String, String> runningRelays = new HashMap<String, String>();
+ Set<String> guardRelays = new HashSet<String>(),
+ stableRelays = new HashSet<String>();
+ BufferedReader br = new BufferedReader(new FileReader(
+ "running-relays-forward.csv"));
+ String line, lastValidAfter = null, firstValidAfter = null,
+ lastButOneValidAfter = null;
+ long nextWeightingInterval = -1L;
+ while ((line = br.readLine()) != null) {
+ if (!line.startsWith("20")) {
+ continue;
+ }
+ String[] parts = line.split(",");
+ String validAfter = parts[0];
+ if (firstValidAfter == null) {
+ firstValidAfter = validAfter;
+ }
+ if (lastValidAfter != null &&
+ !lastValidAfter.equals(validAfter)) {
+
+ /* We just parsed the very first consensus. There's nothing to
+ * do here. */
+ if (lastButOneValidAfter == null) {
+ runningRelays.clear();
+ guardRelays.clear();
+ stableRelays.clear();
+ lastButOneValidAfter = lastValidAfter;
+ continue;
+ }
+
+ /* We just parsed the first consensus outside the analysis
+ * interval. Stop analysis here. */
+ if (lastValidAfter.compareTo(analyzeTo) > 0) {
+ break;
+ }
+
+ /* We just parsed all lines of a consensus. First, see if 12
+ * hours have passed since we last discounted uptimes and total
+ * times. If so, discount both variables for all known relays by
+ * factor 0.95 (or 19/20 since these are long integers) and remove
+ * those relays with a weighted fractional uptime below 1/10000.
+ * Also, discount weighted run length and total run weights for
+ * all known relays by factor 0.95. */
+ long lastValidAfterMillis = isoFormatter.parse(lastValidAfter).
+ getTime();
+ long secondsSinceLastValidAfter = (lastValidAfterMillis
+ - isoFormatter.parse(lastButOneValidAfter).getTime()) / 1000L;
+ long weightingInterval = lastValidAfterMillis
+ / (12L * 60L * 60L * 1000L);
+ if (nextWeightingInterval < 0L) {
+ nextWeightingInterval = weightingInterval;
+ }
+ while (weightingInterval > nextWeightingInterval) {
+ Set<String> relaysToRemove = new HashSet<String>();
+ for (Map.Entry<String, long[]> e : wfuHistory.entrySet()) {
+ long[] w = e.getValue();
+ w[0] *= 19L;
+ w[0] /= 20L;
+ w[1] *= 19L;
+ w[1] /= 20L;
+ if (((10000L * w[0]) / w[1]) < 1L) {
+ relaysToRemove.add(e.getKey());
+ }
+ }
+ for (String fingerprint : relaysToRemove) {
+ wfuHistory.remove(fingerprint);
+ }
+ for (Map.Entry<String, double[]> e : mtbfHistory.entrySet()) {
+ double[] w = e.getValue();
+ w[0] *= 0.95;
+ w[1] *= 0.95;
+ }
+ nextWeightingInterval += 1L;
+ }
+
+ /* Increment weighted uptime for all running relays that have not
+ * been restarted by seconds since last valid-after time. */
+ for (Map.Entry<String, String> runningRelay :
+ runningRelays.entrySet()) {
+ String fingerprint = runningRelay.getKey();
+ boolean restarted = runningRelay.getValue().
+ split(",")[2].equals("t");
+ long seconds = restarted ? 0 : secondsSinceLastValidAfter;
+ if (!wfuHistory.containsKey(fingerprint)) {
+ wfuHistory.put(fingerprint, new long[] { seconds, 0L });
+ } else {
+ wfuHistory.get(fingerprint)[0] += seconds;
+ }
+ }
+
+ /* Increment total weighted time for all relays by seconds since
+ * last valid-after time. */
+ for (long[] w : wfuHistory.values()) {
+ w[1] += secondsSinceLastValidAfter;
+ }
+
+ /* Update MTBF history for running relays. Start by iterating
+ * over all relays in the history, see if they're running now and
+ * whether they have been restarted. Distinguish four cases for
+ * relays in the history: 1) still running, 2) still running but
+ * restarted, 3) started in this consensus, 4) stopped in this
+ * consensus. */
+ Set<String> updatedRelays = new HashSet<String>();
+ for (Map.Entry<String, double[]> e : mtbfHistory.entrySet()) {
+ String fingerprint = e.getKey();
+ double[] w = e.getValue();
+ if (runningRelays.containsKey(fingerprint)) {
+ if (w[2] > 0.1) {
+ if (!runningRelays.get(fingerprint).split(",")[2].
+ equals("t")) {
+
+ /* Case 1) still running: */
+ w[2] += (double) secondsSinceLastValidAfter;
+ } else {
+
+ /* Case 2) still running but restarted: */
+ w[0] += w[2];
+ w[1] += 1.0;
+ w[2] = (double) secondsSinceLastValidAfter;
+ }
+ } else {
+
+ /* Case 3) started in this consensus: */
+ w[2] = (double) secondsSinceLastValidAfter;
+ }
+
+ /* Mark relay as already processed, or we'd add it to the
+ * history as a new relay below. */
+ updatedRelays.add(fingerprint);
+ } else if (w[2] > 0.1) {
+
+ /* Case 4) stopped in this consensus: */
+ w[0] += w[2];
+ w[1] += 1.0;
+ w[2] = 0.0;
+ }
+ }
+
+ /* Iterate over the set of currently running relays and add those
+ * that we haven't processed above to our history. */
+ for (String fingerprint : runningRelays.keySet()) {
+ if (!updatedRelays.contains(fingerprint)) {
+ updatedRelays.add(fingerprint);
+ mtbfHistory.put(fingerprint, new double[] { 0.0, 0.0,
+ (double) secondsSinceLastValidAfter });
+ }
+ }
+
+ /* Read previously calculated future WFUs, TUNFs, and WBs from
+ * disk. */
+ Map<String, Long> fwfus = new HashMap<String, Long>(),
+ tunfs = new HashMap<String, Long>(),
+ fwbs = new HashMap<String, Long>();
+ File futureStabilityFile = new File("future-stability",
+ formatter.format(lastValidAfterMillis));
+ if (!futureStabilityFile.exists()) {
+ if (!lastValidAfter.equals(firstValidAfter) &&
+ lastValidAfter.compareTo(analyzeFrom) >= 0) {
+ System.out.println("Could not find file "
+ + futureStabilityFile + ". Skipping simulation!");
+ }
+ } else if (lastValidAfter.compareTo(analyzeFrom) >= 0) {
+ BufferedReader fsBr = new BufferedReader(new FileReader(
+ futureStabilityFile));
+ String fsLine;
+ while ((fsLine = fsBr.readLine()) != null) {
+ String[] fsParts = fsLine.split(" ");
+ tunfs.put(fsParts[0], Long.parseLong(fsParts[1]));
+ fwfus.put(fsParts[0], Long.parseLong(fsParts[2]));
+ fwbs.put(fsParts[0], Long.parseLong(fsParts[3]));
+ }
+ fsBr.close();
+
+ /* Prepare writing results. */
+ Map<String, String> results = new HashMap<String, String>();
+ results.put("running", "" + runningRelays.size());
+ results.put("guard", "" + guardRelays.size());
+ results.put("stable", "" + stableRelays.size());
+
+ /* Look up future WFUs and calculate future weighted bandwidth
+ * percentiles of relays that actually got the Guard flag
+ * assigned. */
+ long totalFwfu = 0L, totalFwb = 0L;
+ List<Long> fwfuList = new ArrayList<Long>();
+ for (String fingerprint : guardRelays) {
+ long fwfu = fwfus.get(fingerprint);
+ totalFwfu += fwfu;
+ fwfuList.add(fwfu);
+ }
+ Collections.sort(fwfuList);
+ results.put("mwfu", "" + (totalFwfu / guardRelays.size()));
+ results.put("perc15wfu", ""
+ + fwfuList.get((15 * fwfuList.size()) / 100));
+ results.put("perc10wfu", ""
+ + fwfuList.get((10 * fwfuList.size()) / 100));
+ results.put("perc5wfu", ""
+ + fwfuList.get((5 * fwfuList.size()) / 100));
+ List<Long> fwbList = new ArrayList<Long>();
+ for (String fingerprint : guardRelays) {
+ long fwb = fwbs.get(fingerprint);
+ totalFwb += fwb;
+ fwbList.add(fwb);
+ }
+ if (fwbList.size() > 20) {
+ Collections.sort(fwbList);
+ results.put("perc1fwb", "" + fwbList.get(
+ fwbList.size() / 100));
+ results.put("perc2fwb", "" + fwbList.get(
+ fwbList.size() / 50));
+ results.put("perc5fwb", "" + fwbList.get(
+ fwbList.size() / 20));
+ results.put("perc10fwb", "" + fwbList.get(
+ fwbList.size() / 10));
+ }
+
+ /* Prepare calculating thresholds for assigning the Guard flag
+ * in simulations. */
+ List<Long> advertisedBandwidths = new ArrayList<Long>(),
+ totalWeightedTimes = new ArrayList<Long>();
+ for (Map.Entry<String, String> e : runningRelays.entrySet()) {
+ String[] relayParts = e.getValue().split(",");
+ long advertisedBandwidth = relayParts[5].length() == 0 ? 0 :
+ Long.parseLong(relayParts[5]);
+ advertisedBandwidths.add(advertisedBandwidth);
+ totalWeightedTimes.add(wfuHistory.get(e.getKey())[1]);
+ }
+ Collections.sort(advertisedBandwidths);
+ Collections.sort(totalWeightedTimes);
+ long minimumTotalWeightedTime = totalWeightedTimes.get((
+ 1 * totalWeightedTimes.size()) / 8);
+ results.put("minwta", "" + minimumTotalWeightedTime);
+ if (minimumTotalWeightedTime > 8L * 24L * 60L * 60L) {
+ minimumTotalWeightedTime = 8L * 24L * 60L * 60L;
+ }
+ results.put("minwtb", "" + minimumTotalWeightedTime);
+ List<Long> weightedFractionalUptimesFamiliar =
+ new ArrayList<Long>();
+ for (Map.Entry<String, String> e : runningRelays.entrySet()) {
+ long[] wfuHistoryEntry = wfuHistory.get(e.getKey());
+ long totalWeightedTime = wfuHistoryEntry[1];
+ if (totalWeightedTime >= minimumTotalWeightedTime) {
+ long weightedFractionalUptime =
+ (10000L * wfuHistoryEntry[0]) / wfuHistoryEntry[1];
+ weightedFractionalUptimesFamiliar.add(
+ weightedFractionalUptime);
+ }
+ }
+ Collections.sort(weightedFractionalUptimesFamiliar);
+ results.put("familiar",
+ "" + weightedFractionalUptimesFamiliar.size());
+
+ /* Run Guard simulation for the relays in the current consensus
+ * for various WFU percentiles. */
+ for (int wfuPercentile : wfuPercentiles) {
+ for (int advBwPercentile : advBwPercentiles) {
+ String simulation = wfuPercentile + "wfu" + advBwPercentile
+ + "advbw";
+ long minimumAdvertisedBandwidth = advertisedBandwidths.get(
+ (advBwPercentile * advertisedBandwidths.size()) / 100);
+ results.put("minadvbwa" + advBwPercentile + "advbw",
+ "" + minimumAdvertisedBandwidth);
+ if (minimumAdvertisedBandwidth > 250L * 1024L) {
+ minimumAdvertisedBandwidth = 250L * 1024L;
+ }
+ results.put("minadvbwb" + advBwPercentile + "advbw",
+ "" + minimumAdvertisedBandwidth);
+ long minimumWeightedFractionalUptime =
+ weightedFractionalUptimesFamiliar.get((wfuPercentile
+ * weightedFractionalUptimesFamiliar.size()) / 100);
+ results.put("minwfua" + wfuPercentile + "wfu",
+ "" + minimumWeightedFractionalUptime);
+ if (minimumWeightedFractionalUptime > 9800) {
+ minimumWeightedFractionalUptime = 9800;
+ }
+ results.put("minwfub" + wfuPercentile + "wfu",
+ "" + minimumWeightedFractionalUptime);
+ totalFwfu = 0L;
+ totalFwb = 0L;
+ fwfuList.clear();
+ fwbList.clear();
+ Set<String> selectedRelays = new HashSet<String>();
+ int excladvbw = 0, exclwt = 0, exclwfu = 0;
+ for (String relay : runningRelays.values()) {
+ boolean notEnoughAdvertisedBandwidth = false,
+ notEnoughTotalWeightedTime = false,
+ notEnoughWeightedFractionalUptime = false,
+ selected = true;
+ String[] relayParts = relay.split(",");
+ long advertisedBandwidth = relayParts[5].length() == 0 ?
+ 0 : Long.parseLong(relayParts[5]);
+ if (advertisedBandwidth < minimumAdvertisedBandwidth) {
+ notEnoughAdvertisedBandwidth = true;
+ selected = false;
+ }
+ String fingerprint = relayParts[1];
+ long[] wfuHistoryEntry = wfuHistory.get(fingerprint);
+ long totalWeightedTime = wfuHistoryEntry[1];
+ if (totalWeightedTime < minimumTotalWeightedTime) {
+ notEnoughTotalWeightedTime = true;
+ selected = false;
+ }
+ long weightedFractionalUptime =
+ (10000L * wfuHistoryEntry[0]) / wfuHistoryEntry[1];
+ if (weightedFractionalUptime <
+ minimumWeightedFractionalUptime) {
+ notEnoughWeightedFractionalUptime = true;
+ selected = false;
+ }
+ if (selected) {
+ long fwfu = fwfus.get(fingerprint);
+ totalFwfu += fwfu;
+ fwfuList.add(fwfu);
+ long fwb = fwbs.get(fingerprint);
+ totalFwb += fwb;
+ fwbList.add(fwb);
+ selectedRelays.add(fingerprint);
+ } else if (guardRelays.contains(fingerprint)) {
+ if (notEnoughWeightedFractionalUptime) {
+ exclwfu++;
+ }
+ if (notEnoughTotalWeightedTime) {
+ exclwt++;
+ }
+ if (notEnoughAdvertisedBandwidth) {
+ excladvbw++;
+ }
+ }
+ }
+
+ /* Calculate percentiles of future WFU and of advertised
+ * bandwidth as the simulation results. */
+ results.put("guard" + simulation,
+ "" + selectedRelays.size());
+ if (fwfuList.size() > 0) {
+ Collections.sort(fwfuList);
+ results.put("mwfu" + simulation,
+ "" + (totalFwfu / fwfuList.size()));
+ results.put("perc15wfu" + simulation,
+ "" + fwfuList.get((15 * fwfuList.size()) / 100));
+ results.put("perc10wfu" + simulation,
+ "" + fwfuList.get((10 * fwfuList.size()) / 100));
+ results.put("perc5wfu" + simulation,
+ "" + fwfuList.get((5 * fwfuList.size()) / 100));
+ }
+ if (fwbList.size() > 20) {
+ Collections.sort(fwbList);
+ results.put("perc1fwb" + simulation,
+ "" + fwbList.get(fwbList.size() / 100));
+ results.put("perc2fwb" + simulation,
+ "" + fwbList.get(fwbList.size() / 50));
+ results.put("perc5fwb" + simulation,
+ "" + fwbList.get(fwbList.size() / 20));
+ results.put("perc10fwb" + simulation,
+ "" + fwbList.get(fwbList.size() / 10));
+ }
+
+ /* If this is the simulation using default values, compare
+ * selected Guard relays with observed Guard relays. */
+ if (wfuPercentile == 50 && advBwPercentile == 50) {
+ Set<String> intersection = new HashSet<String>();
+ intersection.addAll(guardRelays);
+ intersection.retainAll(selectedRelays);
+ Set<String> observedOnly = new HashSet<String>();
+ observedOnly.addAll(guardRelays);
+ observedOnly.removeAll(selectedRelays);
+ Set<String> simulatedOnly = new HashSet<String>();
+ simulatedOnly.addAll(selectedRelays);
+ simulatedOnly.removeAll(guardRelays);
+ results.put("guardintersect", "" + intersection.size());
+ results.put("guardobserved", "" + observedOnly.size());
+ results.put("guardsimulated", "" + simulatedOnly.size());
+ results.put("exclwt", "" + exclwt);
+ results.put("exclwfu", "" + exclwfu);
+ results.put("excladvbw", "" + excladvbw);
+ }
+ }
+ }
+
+ /* Look up TUNFs of relays that actually got the Stable flag
+ * assigned. */
+ long totalTunf = 0L;
+ List<Long> tunfList = new ArrayList<Long>();
+ for (String fingerprint : stableRelays) {
+ long tunf = tunfs.get(fingerprint);
+ totalTunf += tunf;
+ tunfList.add(tunf);
+ }
+ Collections.sort(tunfList);
+ results.put("perc25tunf", ""
+ + tunfList.get((25 * tunfList.size()) / 100));
+ results.put("perc20tunf", ""
+ + tunfList.get((20 * tunfList.size()) / 100));
+ results.put("perc15tunf", ""
+ + tunfList.get((15 * tunfList.size()) / 100));
+ results.put("perc10tunf", ""
+ + tunfList.get((10 * tunfList.size()) / 100));
+ results.put("perc5tunf", ""
+ + tunfList.get((5 * tunfList.size()) / 100));
+
+ /* Prepare calculating thresholds for assigning the Stable flag
+ * in simulations. */
+ List<Long> wmtbfs = new ArrayList<Long>();
+ for (String fingerprint : runningRelays.keySet()) {
+ double[] w = mtbfHistory.get(fingerprint);
+ double totalRunLength = w[0] + w[2];
+ double totalWeights = w[1] + (w[2] > 0.1 ? 1.0 : 0.0);
+ long wmtbf = totalWeights < 0.0001 ? 0
+ : (long) (totalRunLength / totalWeights);
+ wmtbfs.add(wmtbf);
+ }
+ Collections.sort(wmtbfs);
+
+ /* Run Stable simulation for the relays in the current consensus
+ * for various WMTBF percentiles. */
+ for (int wmtbfPercentile : wmtbfPercentiles) {
+ long minimumWeightedMeanTimeBetweenFailure =
+ wmtbfs.get((wmtbfPercentile * wmtbfs.size()) / 100);
+ results.put("minwmtbfa" + wmtbfPercentile,
+ "" + minimumWeightedMeanTimeBetweenFailure);
+ if (minimumWeightedMeanTimeBetweenFailure >
+ 5L * 24L * 60L * 60L) {
+ minimumWeightedMeanTimeBetweenFailure =
+ 5L * 24L * 60L * 60L;
+ }
+ results.put("minwmtbfb" + wmtbfPercentile,
+ "" + minimumWeightedMeanTimeBetweenFailure);
+ totalTunf = 0L;
+ tunfList.clear();
+ Set<String> selectedRelays = new HashSet<String>();
+ for (String fingerprint : runningRelays.keySet()) {
+ double[] w = mtbfHistory.get(fingerprint);
+ double totalRunLength = w[0] + w[2];
+ double totalWeights = w[1] + (w[2] > 0.1 ? 1.0 : 0.0);
+ long wmtbf = totalWeights < 0.0001 ? 0
+ : (long) (totalRunLength / totalWeights);
+ if (wmtbf < minimumWeightedMeanTimeBetweenFailure) {
+ continue;
+ }
+ long tunf = tunfs.get(fingerprint);
+ totalTunf += tunf;
+ tunfList.add(tunf);
+ selectedRelays.add(fingerprint);
+ }
+ results.put("stable" + wmtbfPercentile, "" + tunfList.size());
+ if (tunfList.size() > 0L) {
+ Collections.sort(tunfList);
+ results.put("mtunf" + wmtbfPercentile,
+ "" + (totalTunf / tunfList.size()));
+ results.put("perc25tunf"
+ + wmtbfPercentile,
+ "" + tunfList.get((25 * tunfList.size()) / 100));
+ results.put("perc20tunf"
+ + wmtbfPercentile,
+ "" + tunfList.get((20 * tunfList.size()) / 100));
+ results.put("perc15tunf"
+ + wmtbfPercentile,
+ "" + tunfList.get((15 * tunfList.size()) / 100));
+ results.put("perc10tunf"
+ + wmtbfPercentile,
+ "" + tunfList.get((10 * tunfList.size()) / 100));
+ results.put("perc5tunf"
+ + wmtbfPercentile,
+ "" + tunfList.get((5 * tunfList.size()) / 100));
+ }
+
+ /* If this is the simulation using default values, compare
+ * selected Stable relays with observed Stable relays. */
+ if (wmtbfPercentile == 50) {
+ Set<String> intersection = new HashSet<String>();
+ intersection.addAll(stableRelays);
+ intersection.retainAll(selectedRelays);
+ Set<String> observedOnly = new HashSet<String>();
+ observedOnly.addAll(stableRelays);
+ observedOnly.removeAll(selectedRelays);
+ Set<String> simulatedOnly = new HashSet<String>();
+ simulatedOnly.addAll(selectedRelays);
+ simulatedOnly.removeAll(stableRelays);
+ results.put("stableintersect", "" + intersection.size());
+ results.put("stableobserved", "" + observedOnly.size());
+ results.put("stablesimulated", "" + simulatedOnly.size());
+ }
+ }
+
+ /* Write results. */
+ bw.write(lastValidAfter);
+ for (String column : columns) {
+ if (results.containsKey(column)) {
+ bw.write("," + results.get(column));
+ } else {
+ bw.write(",NA");
+ }
+ }
+ bw.write("\n");
+ }
+
+ /* We're done with this consensus. Prepare for the next. */
+ runningRelays.clear();
+ guardRelays.clear();
+ stableRelays.clear();
+ lastButOneValidAfter = lastValidAfter;
+ }
+
+ /* Add the running relay lines to a map that we parse once we have
+ * all lines of a consensus. */
+ String fingerprint = parts[1];
+ runningRelays.put(fingerprint, line);
+ if (parts[3].equals("t")) {
+ stableRelays.add(fingerprint);
+ }
+ if (parts[4].equals("t")) {
+ guardRelays.add(fingerprint);
+ }
+ lastValidAfter = validAfter;
+ }
+ bw.close();
+ }
+}
+
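[Editor's note: two arithmetic patterns recur throughout the simulation above: the WMTBF computed from a per-relay history entry of { weighted run length, total run weights, current run length }, and the percentile lookup used for thresholds and results. The following is a minimal standalone sketch of both; the class name and sample values are hypothetical, not part of the commit.]

```java
import java.util.*;

public class StabilitySketch {

  /* WMTBF from a history entry { weighted run length, total run
   * weights, current run length }, mirroring the computation in
   * SimulateStableGuard above. An ongoing run counts as one more
   * (unfinished) run. */
  static long wmtbf(double[] w) {
    double totalRunLength = w[0] + w[2];
    double totalWeights = w[1] + (w[2] > 0.1 ? 1.0 : 0.0);
    return totalWeights < 0.0001 ? 0L
        : (long) (totalRunLength / totalWeights);
  }

  /* p-th percentile lookup as used for thresholds and results above:
   * sort ascending and index at (p * size) / 100. */
  static long percentile(List<Long> values, int p) {
    List<Long> sorted = new ArrayList<Long>(values);
    Collections.sort(sorted);
    return sorted.get((p * sorted.size()) / 100);
  }

  public static void main(String[] args) {

    /* Hypothetical relay: two completed runs totaling 30 hours and a
     * current run of 6 hours, all in seconds. */
    double[] w = new double[] { 30.0 * 3600.0, 2.0, 6.0 * 3600.0 };
    System.out.println(wmtbf(w)); // 36 hours / 3 runs = 43200 seconds

    /* 15th percentile of the values 1..100. */
    List<Long> values = new ArrayList<Long>();
    for (long v = 1L; v <= 100L; v++) {
      values.add(v);
    }
    System.out.println(percentile(values, 15)); // index 15 -> 16
  }
}
```

Note that this percentile convention picks the element at index floor(p * n / 100) of the ascending-sorted list, so small lists can make neighboring percentiles coincide.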
diff --git a/task-2911/mtbf-sim/SimulateMeanTimeBetweenFailure.java b/task-2911/mtbf-sim/SimulateMeanTimeBetweenFailure.java
deleted file mode 100644
index cd73f82..0000000
--- a/task-2911/mtbf-sim/SimulateMeanTimeBetweenFailure.java
+++ /dev/null
@@ -1,351 +0,0 @@
-/**
- * Simulate variation of mean time between failure on Stable relays. The
- * simulation is based on the previously generated SQL results containing
- * network status entries and parts of server descriptors. In a first
- * step, parse the SQL results that are in descending order to calculate
- * time until next failure for all relays and write them to disk as one
- * file per network status in tunf/$filename. (Skip this step if there is
- * already a tunf/ directory.) In a second step, parse the network
- * statuses again, but this time from first to last, calculate mean times
- * between failure for all relays, form relay subsets based on minimal
- * MTBF, look up what the time until next failure would be for a subset,
- * and write results to mtbf-sim.csv to disk. */
-import java.io.*;
-import java.text.*;
-import java.util.*;
-public class SimulateMeanTimeBetweenFailure {
- public static void main(String[] args) throws Exception {
-
- /* Measure how long this execution takes. */
- long started = System.currentTimeMillis();
-
- /* Decide whether we need to do the reverse run, or if we can use
- * previous results. */
- if (!new File("tunf").exists()) {
-
- /* For each relay as identified by its hex encoded fingerprint,
- * track time until next failure in seconds in a long. */
- SortedMap<String, Long> knownRelays = new TreeMap<String, Long>();
-
- /* Parse previously exported network status entries in reverse
- * order. */
- SimpleDateFormat formatter = new SimpleDateFormat(
- "yyyy-MM-dd-HH-mm-ss");
- formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
- SimpleDateFormat isoFormatter = new SimpleDateFormat(
- "yyyy-MM-dd HH:mm:ss");
- isoFormatter.setTimeZone(TimeZone.getTimeZone("UTC"));
- Map<String, String> runningRelays = new HashMap<String, String>();
- BufferedReader br = new BufferedReader(new FileReader(
- "running-relays-reverse.csv"));
- String line, lastValidAfter = null, lastButOneValidAfter = null;
- while ((line = br.readLine()) != null) {
- if (!line.startsWith("20")) {
- continue;
- }
- String[] parts = line.split(",");
- String validAfter = parts[0];
- if (lastValidAfter != null &&
- !lastValidAfter.equals(validAfter)) {
-
- /* We just parsed all lines of a consensus. Let's write times
- * until next failure to disk for all running relays and update
- * our internal history. */
- if (lastButOneValidAfter == null) {
- lastButOneValidAfter = lastValidAfter;
- }
- long lastValidAfterMillis = isoFormatter.parse(lastValidAfter).
- getTime();
- File tunfFile = new File("tunf",
- formatter.format(lastValidAfterMillis));
- tunfFile.getParentFile().mkdirs();
- BufferedWriter bw = new BufferedWriter(new FileWriter(
- tunfFile));
- long secondsSinceLastValidAfter =
- (isoFormatter.parse(lastButOneValidAfter).getTime()
- - lastValidAfterMillis) / 1000L;
-
- /* Iterate over our history first and see if these relays have
- * been running in the considered consensus. Remember changes
- * to our history and modify it below to avoid concurrent
- * modification errors. */
- Set<String> removeFromHistory = new HashSet<String>();
- Map<String, Long> addToHistory = new HashMap<String, Long>();
- for (Map.Entry<String, Long> e : knownRelays.entrySet()) {
- String fingerprint = e.getKey();
- if (runningRelays.containsKey(fingerprint)) {
-
- /* This relay has been running, so write it to the output
- * file and update our history. */
- long hoursUntilFailure = e.getValue();
- bw.write(fingerprint + "," + (secondsSinceLastValidAfter
- + hoursUntilFailure) + "\n");
- boolean restarted = runningRelays.get(fingerprint).
- split(",")[2].equals("t");
- if (restarted) {
- removeFromHistory.add(fingerprint);
- } else {
- addToHistory.put(fingerprint, secondsSinceLastValidAfter
- + hoursUntilFailure);
- }
- runningRelays.remove(fingerprint);
- } else {
-
- /* This relay has not been running, so remove it from our
- * history. */
- removeFromHistory.add(fingerprint);
- }
- }
-
- /* Update our history for real now. We couldn't do this above,
- * or we'd have modified the set we've been iterating over. */
- for (String f : removeFromHistory) {
- knownRelays.remove(f);
- }
- for (Map.Entry<String, Long> e : addToHistory.entrySet()) {
- knownRelays.put(e.getKey(), e.getValue());
- }
-
- /* Iterate over the relays that we found in the consensus, but
- * that we didn't have in our history. */
- for (Map.Entry<String, String> e : runningRelays.entrySet()) {
- String fingerprint = e.getKey();
- bw.write(fingerprint + ",0\n");
- boolean restarted = e.getValue().split(",")[2].equals("t");
- if (!restarted) {
- knownRelays.put(fingerprint, 0L);
- }
- }
- bw.close();
-
- /* Prepare for next consensus. */
- runningRelays = new HashMap<String, String>();
- lastButOneValidAfter = lastValidAfter;
- }
-
- /* Add the running relay lines to a map that we parse once we have
- * all lines of a consensus. */
- String fingerprint = parts[1];
- runningRelays.put(fingerprint, line);
- lastValidAfter = validAfter;
- }
- }
-
- /* Run the simulation for the following WMTBF percentiles: */
- List<Long> requiredWMTBFs = new ArrayList<Long>();
- for (long l : new long[] { 20, 30, 40, 50, 60, 70, 80 }) {
- requiredWMTBFs.add(l);
- }
- Collections.sort(requiredWMTBFs);
- BufferedWriter bw = new BufferedWriter(new FileWriter(
- "mtbf-sim.csv"));
- bw.write("time");
- for (long requiredWMTBF : requiredWMTBFs) {
- bw.write(",mtunf" + requiredWMTBF + ",perc75tunf" + requiredWMTBF
- + ",perc80tunf" + requiredWMTBF + ",perc85tunf" + requiredWMTBF
- + ",perc90tunf" + requiredWMTBF + ",perc95tunf" + requiredWMTBF
- + ",wmtbf" + requiredWMTBF);
- }
- bw.write("\n");
-
- /* For each relay as identified by its base-64 encoded fingerprint,
- * track weighted run length, total run weights, and current run
- * length in a double[3]. */
- SortedMap<String, double[]> knownRelays =
- new TreeMap<String, double[]>();
-
- /* Parse previously exported network status entries again, but this
- * time in forward order. */
- SimpleDateFormat formatter = new SimpleDateFormat(
- "yyyy-MM-dd-HH-mm-ss");
- formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
- SimpleDateFormat isoFormatter = new SimpleDateFormat(
- "yyyy-MM-dd HH:mm:ss");
- isoFormatter.setTimeZone(TimeZone.getTimeZone("UTC"));
- Map<String, String> runningRelays = new HashMap<String, String>(),
- lastRunningRelays = new HashMap<String, String>();
- BufferedReader br = new BufferedReader(new FileReader(
- "running-relays-forward.csv"));
- String line, lastValidAfter = null, firstValidAfter = null;
- long nextWeightingInterval = -1L;
- while ((line = br.readLine()) != null) {
- if (!line.startsWith("20")) {
- continue;
- }
- String[] parts = line.split(",");
- String validAfter = parts[0];
- if (firstValidAfter == null) {
- firstValidAfter = validAfter;
- }
- if (lastValidAfter != null &&
- !lastValidAfter.equals(validAfter)) {
-
- /* We just parsed all lines of a consensus. First, see if 12
- * hours have passed since we last discounted weighted run lengths
- * and total run weights. If so, discount both variables for all
- * known relays by factor 0.95 (or 19/20 since these are long
- * integers) and remove those relays with a total run weight below
- * 1/10000. */
- long lastValidAfterMillis = isoFormatter.parse(lastValidAfter).
- getTime();
- long validAfterMillis = isoFormatter.parse(validAfter).getTime();
- long weightingInterval = validAfterMillis
- / (12L * 60L * 60L * 1000L);
- if (nextWeightingInterval < 0L) {
- nextWeightingInterval = weightingInterval;
- }
- while (weightingInterval > nextWeightingInterval) {
- Set<String> relaysToRemove = new HashSet<String>();
- for (Map.Entry<String, double[]> e : knownRelays.entrySet()) {
- double[] w = e.getValue();
- w[0] *= 0.95;
- w[1] *= 0.95;
- }
- for (String fingerprint : relaysToRemove) {
- knownRelays.remove(fingerprint);
- }
- nextWeightingInterval += 1L;
- }
-
- /* Update history for running relays. Start by iterating over all
- * relays in the history, see if they're running now and whether
- * they have been restarted. Distinguish four cases for relays in
- * the history: 1) still running, 2) still running but restarted,
- * 3) started in this consensus, 4) stopped in this consensus. */
- double secondsSinceLastValidAfter =
- (double) ((validAfterMillis - lastValidAfterMillis) / 1000L);
- Set<String> updatedRelays = new HashSet<String>();
- for (Map.Entry<String, double[]> e : knownRelays.entrySet()) {
- String fingerprint = e.getKey();
- double[] w = e.getValue();
- if (runningRelays.containsKey(fingerprint)) {
- if (w[2] > 0.1) {
- if (!runningRelays.get(fingerprint).split(",")[2].
- equals("t")) {
-
- /* Case 1) still running: */
- w[2] += secondsSinceLastValidAfter;
- } else {
-
- /* Case 2) still running but restarted: */
- w[0] += w[2];
- w[1] += 1.0;
- w[2] = secondsSinceLastValidAfter;
- }
- } else {
-
- /* Case 3) started in this consensus: */
- w[2] = secondsSinceLastValidAfter;
- }
-
- /* Mark relay as already processed, or we'd add it to the
- * history as a new relay below. */
- updatedRelays.add(fingerprint);
- } else if (w[2] > 0.1) {
-
- /* Case 4) stopped in this consensus: */
- w[0] += w[2];
- w[1] += 1.0;
- w[2] = 0.0;
- }
- }
-
- /* Iterate over the set of currently running relays and add those
- * that we haven't processed above to our history. */
- for (String fingerprint : runningRelays.keySet()) {
- if (!updatedRelays.contains(fingerprint)) {
- updatedRelays.add(fingerprint);
- knownRelays.put(fingerprint, new double[] { 0.0, 0.0,
- secondsSinceLastValidAfter });
- }
- }
-
- /* Calculate WMTBFs for all running relays and put them in a list
- * that we can sort by WMTBF in descending order. */
- List<String> wmtbfs = new ArrayList<String>();
- for (String fingerprint : runningRelays.keySet()) {
- double[] w = knownRelays.get(fingerprint);
- double totalRunLength = w[0] + w[2];
- double totalWeights = w[1] + (w[2] > 0.1 ? 1.0 : 0.0);
- long wmtbf = totalWeights < 0.0001 ? 0
- : (long) (totalRunLength / totalWeights);
- wmtbfs.add(String.format("%012d %s", wmtbf, fingerprint));
- }
- Collections.sort(wmtbfs, Collections.reverseOrder());
-
- /* Read previously calculated TUNFs from disk. */
- Map<String, Long> tunfs = new HashMap<String, Long>();
- File tunfFile = new File("tunf",
- formatter.format(lastValidAfterMillis));
- if (!tunfFile.exists()) {
- if (!lastValidAfter.equals(firstValidAfter)) {
- System.out.println("Could not find file " + tunfFile
- + ". Skipping simulation!");
- }
- } else {
- BufferedReader tunfBr = new BufferedReader(new FileReader(
- tunfFile));
- String tunfLine;
- while ((tunfLine = tunfBr.readLine()) != null) {
- String[] tunfParts = tunfLine.split(",");
- tunfs.put(tunfParts[0], Long.parseLong(tunfParts[1]));
- }
- tunfBr.close();
-
- /* Run the simulation for the relays in the current consensus
- * for various required WFUs. */
- bw.write(isoFormatter.format(lastValidAfterMillis));
- long totalRelays = (long) wmtbfs.size(), selectedRelays = 0L,
- totalTunf = 0L, minimalWmtbf = 0L;
- int simulationIndex = 0;
- List<Long> tunfList = new ArrayList<Long>();
- for (String relay : wmtbfs) {
- while (simulationIndex < requiredWMTBFs.size() &&
- selectedRelays * 100L > totalRelays
- * requiredWMTBFs.get(simulationIndex)) {
- if (selectedRelays == 0L) {
- bw.write(",NA,NA,NA,NA,NA,NA");
- } else {
- Collections.sort(tunfList, Collections.reverseOrder());
- long perc75 = tunfList.get((75 * tunfList.size()) / 100);
- long perc80 = tunfList.get((80 * tunfList.size()) / 100);
- long perc85 = tunfList.get((85 * tunfList.size()) / 100);
- long perc90 = tunfList.get((90 * tunfList.size()) / 100);
- long perc95 = tunfList.get((95 * tunfList.size()) / 100);
- bw.write("," + (totalTunf / selectedRelays) + "," + perc75
- + "," + perc80 + "," + perc85 + "," + perc90 + ","
- + perc95);
- }
- bw.write("," + minimalWmtbf);
- simulationIndex++;
- }
- String[] wmtbfParts = relay.split(" ");
- minimalWmtbf = Long.parseLong(wmtbfParts[0]);
- String fingerprint = wmtbfParts[1];
- long tunf = tunfs.get(fingerprint);
- totalTunf += tunf;
- tunfList.add(tunf);
- selectedRelays += 1L;
- }
- bw.write("\n");
- }
-
- /* We're done with this consensus. Prepare for the next. */
- lastRunningRelays = runningRelays;
- runningRelays = new HashMap<String, String>();
- }
-
- /* Add the running relay lines to a map that we parse once we have
- * all lines of a consensus. */
- String fingerprint = parts[1];
- runningRelays.put(fingerprint, line);
- lastValidAfter = validAfter;
- }
- bw.close();
-
- /* Print how long this execution took and exit. */
- System.out.println("Execution took " + ((System.currentTimeMillis()
- - started) / (60L * 1000L)) + " minutes.");
- }
-}
-
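[Editor's note: the header comment of the deleted file above describes a reverse pass that computes each relay's time until next failure by parsing consensuses from newest to oldest. A simplified in-memory sketch of that accumulation follows; the class name, map-based input, and sample data are hypothetical, not part of the commit.]

```java
import java.util.*;

public class TunfSketch {

  /* Given per-consensus maps of running relays (fingerprint ->
   * restarted flag), ordered from newest to oldest, accumulate each
   * relay's time until next failure in seconds, assuming a fixed
   * interval between consensuses. Returns one map per consensus in
   * the same newest-to-oldest order. */
  static List<Map<String, Long>> timesUntilNextFailure(
      List<Map<String, Boolean>> consensusesNewestFirst,
      long intervalSeconds) {
    Map<String, Long> history = new HashMap<String, Long>();
    List<Map<String, Long>> result = new ArrayList<Map<String, Long>>();
    for (Map<String, Boolean> running : consensusesNewestFirst) {
      Map<String, Long> tunfs = new HashMap<String, Long>();
      Map<String, Long> nextHistory = new HashMap<String, Long>();
      for (Map.Entry<String, Boolean> e : running.entrySet()) {
        String fingerprint = e.getKey();
        boolean restarted = e.getValue();

        /* A relay seen in the (chronologically later) consensus keeps
         * accumulating; a relay not in the history starts at 0. */
        long tunf = history.containsKey(fingerprint)
            ? history.get(fingerprint) + intervalSeconds : 0L;
        tunfs.put(fingerprint, tunf);

        /* A restarted relay failed between this consensus and the
         * next, so its earlier appearances count from 0 again. */
        if (!restarted) {
          nextHistory.put(fingerprint, tunf);
        }
      }

      /* Relays absent from this consensus drop out of the history. */
      history = nextHistory;
      result.add(tunfs);
    }
    return result;
  }

  public static void main(String[] args) {
    /* Three hourly consensuses, newest first: relay B restarts
     * between the second-newest and newest consensus. */
    List<Map<String, Boolean>> consensuses =
        new ArrayList<Map<String, Boolean>>();
    Map<String, Boolean> c0 = new HashMap<String, Boolean>();
    c0.put("A", false); c0.put("B", false);
    Map<String, Boolean> c1 = new HashMap<String, Boolean>();
    c1.put("A", false); c1.put("B", true);
    Map<String, Boolean> c2 = new HashMap<String, Boolean>();
    c2.put("A", false); c2.put("B", false);
    consensuses.add(c0); consensuses.add(c1); consensuses.add(c2);
    List<Map<String, Long>> tunfs =
        timesUntilNextFailure(consensuses, 3600L);
    System.out.println(tunfs.get(2).get("A")); // 7200
    System.out.println(tunfs.get(2).get("B")); // 0 (restarts later)
  }
}
```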
diff --git a/task-2911/mtbf-sim/mtbf-sim.R b/task-2911/mtbf-sim/mtbf-sim.R
deleted file mode 100644
index a630406..0000000
--- a/task-2911/mtbf-sim/mtbf-sim.R
+++ /dev/null
@@ -1,73 +0,0 @@
-library(ggplot2)
-
-data <- read.csv("mtbf-sim.csv", stringsAsFactors = FALSE)
-d <- data[data$time >= '2010' & data$time < '2011', ]
-d <- aggregate(d[, 2:length(d)], by = list(date = as.Date(d$time)), mean)
-d <- rbind(
- data.frame(x = d$wmtbf30, y = d$perc90tunf30, sim = "30 %"),
- data.frame(x = d$wmtbf40, y = d$perc90tunf40, sim = "40 %"),
- data.frame(x = d$wmtbf50, y = d$perc90tunf50, sim = "50 % (default)"),
- data.frame(x = d$wmtbf60, y = d$perc90tunf60, sim = "60 %"),
- data.frame(x = d$wmtbf70, y = d$perc90tunf70, sim = "70 %"))
-ggplot(d, aes(x = x / (24 * 60 * 60), y = y / (60 * 60))) +
-facet_wrap(~ sim) +
-geom_path() +
-scale_x_continuous("\nRequired WMTBF in days",
- breaks = seq(0, max(d$x, na.rm = TRUE) / (24 * 60 * 60), 7),
- minor = seq(0, max(d$x, na.rm = TRUE) / (24 * 60 * 60), 1)) +
-scale_y_continuous(paste("Time in hours until 10 % of relays\nor ",
- "27.1 % of streams have failed\n", sep = ""),
- breaks = seq(0, max(d$y, na.rm = TRUE) / (60 * 60), 24))
-ggsave(filename = "mtbf-sim.pdf", width = 8, height = 5, dpi = 100)
-
-## Commented out, because this graph is meaningless in b/w. The graph
-## above contains the same data, but can be printed in b/w.
-#data <- read.csv("mtbf-sim.csv", stringsAsFactors = FALSE)
-#d <- data[data$time >= '2010' & data$time < '2011', ]
-#d <- aggregate(d[, 2:length(d)], by = list(date = as.Date(d$time)), mean)
-#d <- rbind(
-# data.frame(x = d$wmtbf70, y = d$perc90tunf70, sim = "70 %"),
-# data.frame(x = d$wmtbf60, y = d$perc90tunf60, sim = "60 %"),
-# data.frame(x = d$wmtbf50, y = d$perc90tunf50, sim = "50 % (default)"),
-# data.frame(x = d$wmtbf40, y = d$perc90tunf40, sim = "40 %"),
-# data.frame(x = d$wmtbf30, y = d$perc90tunf30, sim = "30 %"))
-#ggplot(d, aes(x = x / (24 * 60 * 60), y = y / (60 * 60),
-# colour = sim)) +
-#geom_path() +
-#scale_x_continuous("\nRequired WMTBF in days",
-# breaks = seq(0, max(d$x, na.rm = TRUE) / (24 * 60 * 60), 7),
-# minor = seq(0, max(d$x, na.rm = TRUE) / (24 * 60 * 60), 1)) +
-#scale_y_continuous(paste("Time until \n10 % of relays or \n",
-# "27.1 % of streams \nhave failed \nin hours ", sep = ""),
-# breaks = seq(0, max(d$y, na.rm = TRUE) / (60 * 60), 24)) +
-#scale_colour_hue("Fraction of relays\nby highest WMTBF",
-# breaks = c("30 %", "40 %", "50 % (default)", "60 %", "70 %")) +
-#opts(axis.title.x = theme_text(size = 12 * 0.8, face = "bold",
-# hjust = 0.5),
-# axis.title.y = theme_text(size = 12 * 0.8, face = "bold", vjust = 0.5,
-# hjust = 1))
-#ggsave(filename = "mtbf-sim.pdf", width = 8, height = 5, dpi = 100)
-
-## Commented out, because focusing on the development over time is the
-## wrong thing here.
-#simulations <- paste("mtunf", c(20, 30, 40, 50, 60, 70, 80),
-# sep = "")
-#d <- data[data$time >= '2010' & data$time < '2011',
-# c("time", simulations)]
-#d <- aggregate(d[, 2:length(d)], by = list(date = as.Date(d$time)), mean)
-#d <- melt(d, id.vars = 1)
-#ggplot(d, aes(x = date, y = value / (24 * 60 * 60), colour = variable)) +
-#geom_line() +
-#scale_x_date("", major = "3 months", minor = "1 month",
-# format = "%b %Y") +
-#scale_y_continuous(paste("Mean time \nuntil next \nfailure \n",
-# "in days \n", sep = ""),
-# limits = c(0, max(d$value, na.rm = TRUE) / (24 * 60 * 60))) +
-#scale_colour_hue(paste("Percentile\nhighest\nweighted mean\n",
-# "time between\nfailures", sep = ""), breaks = simulations,
-# labels = paste(substr(simulations, 6, 9),
-# ifelse(simulations == "mtunf50", "(default)", ""))) +
-#opts(axis.title.y = theme_text(size = 12 * 0.8, face = "bold",
-# vjust = 0.5, hjust = 1))
-#ggsave(filename = "mtbf-sim1.pdf", width = 8, height = 5, dpi = 100)
-
diff --git a/task-2911/report.bib b/task-2911/report.bib
new file mode 100644
index 0000000..63ab260
--- /dev/null
+++ b/task-2911/report.bib
@@ -0,0 +1,31 @@
+@misc{proposal146,
+ author = {Nick Mathewson},
+ title = {Add new flag to reflect long-term stability},
+ note = {\url{https://gitweb.torproject.org/torspec.git/blob_plain/HEAD:/proposals/146-long-term-stability.txt}},
+}
+
+@misc{dirspec,
+ author = {Roger Dingledine and Nick Mathewson},
+ title = {Tor directory protocol, version 3},
+ note = {\url{https://gitweb.torproject.org/tor.git/blob_plain/HEAD:/doc/spec/dir-spec.txt}},
+}
+
+@phdthesis{loesing2009thesis,
+ title = {Privacy-enhancing Technologies for Private Services},
+ author = {Karsten Loesing},
+ school = {University of Bamberg},
+ year = {2009},
+ month = {May},
+ note = {\url{http://www.opus-bayern.de/uni-bamberg/volltexte/2009/183/pdf/loesingopusneu.pdf}},
+}
+
+@techreport{loesing2011overview,
+ title = {Overview of Statistical Data in the {Tor} Network},
+ author = {Karsten Loesing},
+ institution = {The Tor Project},
+ year = {2011},
+ month = {March},
+ type = {Technical Report},
+ note = {\url{https://metrics.torproject.org/papers/data-2011-03-14.pdf}},
+}
+
diff --git a/task-2911/report.tex b/task-2911/report.tex
index 4dc6ab9..96fe38f 100644
--- a/task-2911/report.tex
+++ b/task-2911/report.tex
@@ -4,74 +4,291 @@
\usepackage{graphics}
\usepackage{color}
\begin{document}
-\title{An Analysis of Tor Relay Stability\\(DRAFT)}
+\title{An Analysis of Tor Relay Stability}
\author{Karsten Loesing\\{\tt karsten@xxxxxxxxxxxxxx}}
+\date{June 30, 2011}
\maketitle
\section{Introduction}
-The Tor network consists of 2,200 relays and 600 bridges run by
+The Tor network consists of around 2,000 relays and 500 bridges run by
volunteers, some of which are on dedicated servers and some on laptops or
mobile devices.
-% TODO Look up more recent relay and bridge numbers. -KL
Obviously, we can expect the relays run on dedicated servers to be more
``stable'' than those on mobile phones.
But it is difficult to draw a line between stable and unstable relays.
In most cases it depends on the context which relays count as stable:
-\begin{itemize}
+\begin{enumerate}
\item A stable relay that is supposed to be part of a circuit for a
\emph{long-running stream} should not go offline during the next day.
+If even one of the typically three relays in a circuit goes away, the
+stream collapses and must be rebuilt.
\item A stable relay that clients pick as \emph{entry guard} doesn't have
to be running continuously, but should be online most of the time in the
upcoming weeks.
+In addition to being stable, an entry guard should have reasonable
+bandwidth capacity in order not to slow down clients.
\item A stable relay that acts as \emph{hidden-service directory} should
be part of a relay subset that mostly overlaps with the subsets 1, 2, or
even 3 hours in the future.
That means that the relays in this set should be stable, but also that not
-too many new relays should join the set of stable relays at once.
-\item A stable relay that clients use in a \emph{fallback consensus} that
-is already a few days or even weeks old should still be available on the
-same IP address and port.\footnote{See also proposal 146.}
-Such a relay doesn't necessarily have to run without interruption, though.
-% TODO Correctly cite proposal 146 here. -KL
+too many new relays should join the set at once.
+\item A stable relay that clients use in a \emph{fallback consensus}
+\cite{proposal146} that is already a few days or even weeks old should
+still be available on the same IP address and port, but
+doesn't necessarily have to run without interruption.
\item A stable \emph{bridge relay} should be running on the same IP
-address a few days after a client learns about the bridge, but again,
-doesn't have to run continuously.
-\end{itemize}
-
-All these stability notions have in common that some relays or bridges are
-better suited for the described contexts than others.
-In this analysis we will look at various relay stability metrics to find
-the best suited set of relays for each context.
-The idea of this report is to use the results to optimize how the
-directory authorities assign relay flags that clients use to make path
-select decisions.
-
-For every context, we try to simulate what requirements based on past
-observations would have resulted in what relay stabilities in the near
-future.
-Generally, we'd expect that stricter requirements lead to higher
-stability.
-But every prediction contains a certain amount of randomness, so that we
-cannot tighten the requirements arbitrarily.
-Further, we want to ensure that the subset of relays identified as stable
-does not become too small.
-The reason is that there should be some diversity, so that not a few
-operators can aim at running most relays used in a given context.
-In some cases, the stable relays also need to provide sufficient bandwidth
-to the network in order not to become a performance bottleneck.
-We are going into more details about the requirements when looking into
-the separate analyses in the sections below.
-
-The analysis data and tools are available on the Tor metrics website at
-\url{https://metrics.torproject.org/}.\footnote{Or rather, will be made
-available.}
+address a few days after a client learns about the bridge and should be
+available most of the time in the upcoming weeks.
+\end{enumerate}
+
+The Tor directory authorities measure relay stability for the first three
+contexts listed above and assign the Stable, Guard, and HSDir flags that
+clients use to select relays.
+Figure~\ref{fig:relayflags} shows the number of relays with relay flags
+assigned between July and December 2010, which will be our analysis
+interval.
+
+\begin{figure}[t]
+\includegraphics[width=\textwidth]{relayflags.pdf}
+\caption{Number of relays with relay flags Running, Stable, and Guard
+assigned by the directory authorities between July and December 2010}
+\label{fig:relayflags}
+\end{figure}
+
+In this analysis, we use a simulator to reproduce how the directory
+authorities assign the Stable and Guard relay flags.
+This simulator uses archived directory data to decide when a relay was
+running and when it failed or was restarted.
+We further use this simulator to calculate future stability metrics that
+help us evaluate the quality of a given relay flag assignment.
+We modify the input parameters to Tor's stability metrics to see whether
+we can improve how the directory authorities assign relay flags.
+The analysis data and simulator used in this report are available on the
+Tor metrics website at \url{https://metrics.torproject.org/}.
+
+\section{Requirements for Stable and Guard flags}
+
+The requirements for assigning the Stable and the Guard flags are defined
+in the Tor directory protocol specification \cite{dirspec}.
+These definitions are revised here to better reflect what is implemented
+in the code:
+
+\begin{quote}
+``A router is `Stable' if it is active, and either its Weighted MTBF is at
+least the median for known active routers or its Weighted MTBF corresponds
+to at least [5] days.
+[...]
+A router is a possible `Guard' [if it is `familiar',] if its
+Weighted Fractional
+Uptime is at least the median for `familiar' active routers [or its
+Weighted Fractional Uptime is at least 98~\%], and if
+its [advertised] bandwidth is at least median [of all active routers] or
+at least 250KB/s.
+[...]
+A node is `familiar' if 1/8 of all active nodes have appeared more
+recently than it, or it has been around for [a weighted time of 8 days].''
+\end{quote}
+
+These definitions entail four requirements for assigning relay flags, one
+for the Stable flag and three for the Guard flag:
+\begin{description}
+\item[Weighted Mean Time Between Failure (WMTBF):]
+The WMTBF metric is used to track the length of continuous uptime sessions
+over time.
+The (unweighted) MTBF part measures the average uptime between a relay
+showing up in the network and either leaving or failing.
+In the weighted form of this metric, which is used here, past sessions
+are discounted by factor $0.95$ every $12$~hours.
+Current uptime sessions are not discounted, so that a WMTBF value of
+5~days can be reached after 5~days at the earliest.
+\item[Weighted Fractional Uptime (WFU):] The WFU metric measures the
+fraction of uptime of a relay in the past.
+Similar to WMTBF, WFU values are discounted by factor $0.95$ every
+$12$~hours, but in this case including the current uptime session.
+\item[Weighted Time:] The Weighted Time metric is used to calculate a
+relay's WFU and to decide whether a relay is around long enough to be
+considered `familiar.'
+The Weighted Time is discounted similar to WMTBF and WFU, so that a
+Weighted Time of 8 days can be reached after around 16 days at the
+earliest.
+\item[Advertised Bandwidth:] The Advertised Bandwidth of a relay is the
+minimum of its configured average bandwidth rate and the maximum observed
+bandwidth over any ten second period in the past day.
+It should be noted that the advertised bandwidth is self-reported by
+relays and has been misused in the past to attract more traffic than a
+relay should see.
+As a consequence, using the advertised bandwidth value is discouraged
+for anything critical.
+\end{description}
+
+All four requirements have in common that they consist of a dynamic part
+that relies on the current network situation (e.g., median of all WMTBF
+values for the Stable flag) and a static part that is independent
+of other relays (e.g., WMTBF value of 5 days or higher).
+The dynamic part ensures that a certain fraction of relays get the
+Stable and Guard flags assigned even in a rather unstable network.
+The static part ensures that rather stable (or fast) relays are not denied
+the flags even when most of the other relays in the network are highly
+stable (or very fast).
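To make these requirements concrete, the weighting and the combination of dynamic and static parts can be sketched as follows. This is an illustrative sketch under the definitions quoted above, not the simulator's or Tor's actual code; the method names and example session values are hypothetical:

```java
public class StabilityMetricsSketch {

    // Discount a past value by factor 0.95 for every full 12-hour
    // interval of age, as in the WMTBF/WFU definitions above.
    static double discount(double value, long ageHours) {
        return value * Math.pow(0.95, ageHours / 12);
    }

    // WMTBF sketch: past sessions are discounted by age, the current
    // uptime session is not discounted.
    static double weightedMtbf(long[] pastSessionHours,
            long[] pastSessionAgesHours, long currentSessionHours) {
        double weightedSum = currentSessionHours;
        double weightSum = 1.0;
        for (int i = 0; i < pastSessionHours.length; i++) {
            double w = Math.pow(0.95, pastSessionAgesHours[i] / 12);
            weightedSum += pastSessionHours[i] * w;
            weightSum += w;
        }
        return weightedSum / weightSum;
    }

    // Stable requirement: WMTBF at least the network median *or* at
    // least the fixed 5-day cutoff, i.e., the effective requirement is
    // the minimum of the dynamic and the static part.
    static boolean meetsStableRequirement(double wmtbfHours,
            double medianWmtbfHours) {
        double fixedCutoffHours = 5 * 24;
        return wmtbfHours >= Math.min(medianWmtbfHours, fixedCutoffHours);
    }

    public static void main(String[] args) {
        // One current 36-hour session plus two discounted past sessions.
        System.out.println(weightedMtbf(new long[]{48, 24},
                new long[]{24, 72}, 36));
        // 130 hours exceeds the 120-hour cutoff even though the median
        // (150 hours) is higher.
        System.out.println(meetsStableRequirement(130.0, 150.0));
    }
}
```

The `min` in the last method captures why a high dynamic requirement gets "cut" to the fixed value: a relay only has to clear whichever bar is lower.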
+
+\section{Obtaining data for relay stability analysis}
+
+The internal data that the directory authorities use to decide whether a
+relay should get the Stable or Guard flag assigned is not publicly
+available.
+Instead, we rely on the network status consensuses and server descriptors
+that are archived and publicly available since October
+2007.
+The network status consensuses tell us which relays were available with a
+granularity of 1~hour, and the server descriptors help us figure out
+whether a relay was restarted in the 1~hour before being listed in a
+network status.
+We further learn whether a relay got the Stable and/or Guard flag assigned
+and what bandwidth it advertised and actually used for relaying data at a
+given time.
+In detail, we obtain the following data for this analysis:
+
+\begin{description}
+\item[Valid-after time:] The time when a network status consensus was
+published.
+\item[Fingerprint:] A relay's unique public key fingerprint.
+\item[Restarted:] Was a relay restarted within the last hour before the
+valid-after time based on its self-reported uptime?
+\item[Stable:] Did the directory authorities assign the Stable flag to a
+relay?
+\item[Guard:] Did the directory authorities assign the Guard flag to a
+relay?
+\item[Advertised bandwidth:] A relay's self-reported advertised bandwidth.
+\item[Bandwidth history:] How many bytes did the relay write on a given
+day?
+\end{description}
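The Restarted field above is derived from the relay's self-reported uptime. A minimal sketch of this heuristic, with illustrative parameter names rather than the simulator's actual API:

```java
public class RestartHeuristicSketch {

    /* A relay listed in a consensus counts as restarted if its uptime,
     * projected from the descriptor publication time to the consensus
     * valid-after time, is less than one hour -- i.e., the relay must
     * have come up after the previous hourly consensus. */
    static boolean wasRestarted(long selfReportedUptimeSeconds,
            long descriptorPublishedSeconds, long validAfterSeconds) {
        long uptimeAtValidAfter = selfReportedUptimeSeconds
                + (validAfterSeconds - descriptorPublishedSeconds);
        return uptimeAtValidAfter < 3600L;
    }

    public static void main(String[] args) {
        // Descriptor published 30 minutes before valid-after with a
        // reported uptime of 10 minutes: counted as restarted.
        System.out.println(wasRestarted(600L, 0L, 1800L));
        // Reported uptime of two hours: not restarted.
        System.out.println(wasRestarted(7200L, 0L, 1800L));
    }
}
```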
+
+The choice to use this archived data instead of, e.g., trying to
+obtain the internal relay stability data that the directory authorities
+use has a few advantages and disadvantages.
+The main disadvantage is that our data has a granularity of one hour and
+can only detect a single failure or restart within this hour.
+It's unclear what effect this lack of detail has on less stable relays.
+
+But there are a few advantages to using the archived network data:
+We have a few years of data available that we can use for the analysis.
+If we started obtaining original stability data from the directory
+authorities, we'd have to wait a few months before conducting this
+analysis.
+Also, it's still unclear whether the Tor code that assigns relay flags is
+correct, so it makes sense to use a different data source.
+Finally, we need the directory archives anyway to evaluate how stable or
+fast a relay turned out to be after being selected.
+
+\section{Simulating current Stable/Guard assignments}
+
+The first step before varying the requirements for assigning relay flags
+is to run a simulation with the current default requirements.
+Only if we manage to come up with results similar to those of the
+directory authorities can we hope that our suggested parameter changes
+will have similar effects in the real Tor network.
+
+Figure~\ref{fig:default-reqs} shows the fractions of relays that got the
+Stable and Guard flags assigned by the directory
+authorities (``observed'' lines) and in our simulation with default
+requirements (``simulated'' lines).
+Ideally, the observed lines should match the simulated lines.
+The two Guard lines overlap for about half of the observed time interval
+and the difference is always below 5~\% of running relays, which seems to
+be acceptable.
+The fraction of Guard relays in the simulation is rather stable at 28~\%
+whereas the observed fraction goes up to 33~\% for three months.
+The observed Stable line seems to be systematically 5 to 10~\% higher than
+the simulated line.
+
+\begin{figure}[t]
+\includegraphics[width=\textwidth]{default-reqs.pdf}
+\caption{Fraction of relays that got the Stable and Guard flag assigned by
+the directory authorities and in our simulation with default requirements}
+\label{fig:default-reqs}
+\end{figure}
+
+We first investigate the differences by looking at how far the observed
+and simulated sets overlap.
+After all, it could be that our simulated 30~\% of Guard relays are a
+completely different subset of relays than the observed 30~\%.
+Figure~\ref{fig:diff-sim-obs} shows the absolute number of relays in the
+intersection and the sets of only observed or only simulated relays.
+This graph shows that the majority of relay assignments overlap for both
+flags.
+The set of Stable relays that are found only in the simulation is rather
+small at around 50 relays.
+Compared to that, there are between 100 and 200 relays on average that got
+the Stable flag assigned by the directory authorities but which did not
+qualify for that flag in the simulation.
+The situation of assigned Guard flags is somewhat different.
+In July and August 2010, there are about 50 relays in the only-observed
+and only-simulated sets.
+From September 2010 on, there are hardly any relays in the only-simulated
+set, but the number of relays in the only-observed set goes up to nearly
+100.
+
+\begin{figure}[t]
+\includegraphics[width=\textwidth]{diff-sim-obs.pdf}
+\caption{Number of relays that got the Stable and Guard flag assigned by
+both the directory authorities and by our simulator or by just one of
+them}
+\label{fig:diff-sim-obs}
+\end{figure}
+
+Before accepting these deviations as a given, we take a quick look at the
+actual requirements to learn whether the dynamic or the static part of
+the requirements was more relevant.
+Figure~\ref{fig:requirements} shows the dynamic parts as daily means (dark
+gray lines) and daily min-max ranges (light gray ribbons) as well as the
+static parts (dashed lines).
+The dashed lines act as a limit here: if a dynamic requirement based on
+the other relays exceeds the dashed line, the requirement is cut to the
+fixed value and the fraction of relays meeting the requirement increases.
+Whenever the dynamic requirement remains below the dashed line, the
+fraction of relays that get flags assigned should remain more or less the
+same.
+For example, the top left graph indicates that it's quite possible that
+more than 50~\% of relays got the Stable flag assigned from July
+to November, but that this fraction should have dropped to 50~\% in
+December 2010.
+However, this was not the case.
+The graph also indicates that the 250 KiB/s limit was never reached, so
+that relays were always selected based on the comparison to the advertised
+bandwidth of the other relays.
+
+\begin{figure}
+\includegraphics[width=\textwidth]{requirements.pdf}
+\caption{Simulated requirements for assigning the Stable and Guard flags}
+\label{fig:requirements}
+\end{figure}
+
+Possible reasons for deviation of simulation and actual assignments by the
+directory authorities are:
+
+\begin{enumerate}
+\item The simulator code may contain bugs.
+\item Our data with a granularity of 1~hour lack the details for detecting
+restarts and failures and thereby make it hard to observe uptime sessions
+correctly.
+\item The directory authorities have slightly different views on the
+network which then get mixed in the voting process.
+\item The implementation of relay history on the directory authorities may
+contain bugs.
+\end{enumerate}
+
+For the remainder of this analysis we will assume that the simulated
+number of Stable relays is 5 to 10~\% lower than in reality and that the
+simulation of Guard flags is more or less accurate.
\section{Choosing relays for long-lived streams}
\label{sec:mtbf-sim}
+After learning that our simulation of relay stability is at least not
+totally off, we want to take a closer look at the Stable flag.
Whenever clients request Tor to open a long-lived stream, Tor should try
to pick only those relays for the circuit that are not likely to disappear
shortly after.
@@ -80,65 +297,81 @@ new circuit needs to be built.
Depending on how well the application handles connection failures this may
impact usability significantly.
-In order to declare some relays as more useful for long-lived streams, the
-directory authorities track uptime sessions of all relays over time.
-Based on this history, they calculate the \emph{weighted mean time between
-failure (WMTBF)} for each relay.
-The MTBF part simply measures the average uptime between a relay showing
-up in the Tor network and either leaving or failing.
-In the weighted form of this metric, which is used here, older sessions
-are weighted to count less.
-The directory authorities assign the \texttt{Stable} flag to the 50~\% of
-relays with the highest WMTBF.
-
-In this simulation we want to find out how useful the WMTBF metric is for
-predicting future stability and how stability would be affected when
-declaring more or less than 50~\% of the relays as stable.
-The metric we chose for evaluating how stable a relay is is the \emph{time
+The WMTBF metric as the requirement for assigning the Stable flag has
+already been discussed above.
+Now we want to find out how useful WMTBF is to predict which relays are
+not likely to leave or crash in the near future.
+In order to evaluate future relay stability we measure the \emph{time
until next failure}.
-When running a simulation we determine the time until 10~\% of the
-``stable'' relays have failed.
+There is no need to measure more than the current uptime session length,
+and hence there is no need to weight measurements.
+For a given set of relays with the Stable flag we determine the 10th
+percentile of time until next failure.
+This is the time when 10~\% of relays have failed and 90~\% are still
+running.
Under the (grossly simplified) assumption that relays are chosen
uniformly, $1 - 0.9^3 = 27.1~\%$ of streams using relays from this set
-would have failed up to this point.
+would have been interrupted up to this point.
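The stream-interruption estimate follows directly from the uniform-choice assumption: a stream survives only if all three relays in its circuit survive. A sketch of this arithmetic (an illustration of the simplified model, not part of the simulator):

```java
public class StreamFailureSketch {

    // Probability that a circuit is interrupted, given that each relay
    // has independently failed with probability perRelayFailure by the
    // time in question.
    static double circuitFailureProbability(double perRelayFailure,
            int hops) {
        return 1.0 - Math.pow(1.0 - perRelayFailure, hops);
    }

    public static void main(String[] args) {
        // 10 % of relays failed, 3 hops: 1 - 0.9^3 = 0.271, i.e., 27.1 %
        // of streams interrupted.
        System.out.println(circuitFailureProbability(0.1, 3));
    }
}
```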
+
+The default requirement is that a relay's WMTBF is at least the median for
+known relays or at least 5 days.
+We simulate times until next failure for the 30th, 40th, 50th, 60th, and
+70th percentiles of WMTBF values in the network while leaving the fixed
+5-day constant unchanged.
+We also look up times until next failure for the set of Stable relays that
+got their flags from the directory authorities.
+Figure~\ref{fig:wmtbf-tunf-sim} shows the results.
+The artificial steps come from our inability to measure time until next
+failure at a granularity finer than full hours.
+The further right a line is, the higher the time until next failure for
+the considered consensuses from July to December 2010.
\begin{figure}[t]
-\includegraphics[width=\textwidth]{mtbf-sim.pdf}
-\caption{Impact of assigning the \texttt{Stable} flag to a given fraction
-of relays on the actual required WMTBF ($x$ axis) and on the time
-until 10~\% of relays or 27.1~\% of streams have failed ($y$ axis)}
-\label{fig:mtbf-sim}
+\includegraphics[width=\textwidth]{wmtbf-tunf-sim.pdf}
+\caption{Impact of changing the WMTBF percentile requirement for assigning
+the Stable flag on the expected time until next failure}
+\label{fig:wmtbf-tunf-sim}
\end{figure}
-Figure~\ref{fig:mtbf-sim} shows the analysis results for assigning the
-\texttt{Stable} flag to fractions of relays between 30~\% and 70~\% in a
-path plot.
-This path plot shows the effect of choosing a different fraction of
-relays on the actual required WMTBF value on the $x$ axis and on the
-resulting time until 10~\% of relays have failed on the $y$ axis.
-Two data points adjacent in time are connected by a line, forming a path.
-
-The results indicate a somewhat linear relation between required WMTBF and
-time until failure, which is as expected.
-The time until 10~\% of relays have failed in the default case of having
-50~\% stable relays is somewhere between 12 and 48 hours.
-If the directory authorities assigned the \texttt{Stable} flag to 60~\% or
-even 70~\% of all relays, this time would go down to on average 24 or 12
-hours.
-Reducing the set to only 40~\% or 30\% of relays would increase the time
-until failure to 36 or even 48 hours on average.
-
-\subsubsection*{Next steps}
-
-{\it
-\begin{itemize}
-\item What's the desired stability goal here?
-\item What other requirements (bandwidth) should go into the simulation?
-\end{itemize}
-}
+The first finding that comes to mind is that the observed line is much
+farther left than the simulated 50th percentile line.
+In theory, both lines use the 50th percentile as requirement and should
+therefore have similar results.
+However, since the directory authorities assign the Stable flag to almost
+60~\% of all relays, it's likely that their overall stability is lower
+than the stability of the most stable 50~\% of all relays.
+It's unclear why observed results are even worse than those from the
+simulated 40th percentile.
+It might be that the directory authorities don't just assign the
+Stable flag to too many relays, but also to the wrong relays.
+
+Apart from that, the graph shows the large influence of different WMTBF
+percentiles on relay stability.
+If we lowered the requirement to the 30th WMTBF percentile, thus assigning
+the Stable flag to at least 70~\% of all relays, the 10th percentile of
+time until next failure would be reached after 6 to 24 hours in most
+cases.
+The 40th and 50th percentile requirements move this point to 12 to 36 and
+18 to 54 hours, respectively.
+If we changed the requirement to the 60th WMTBF percentile, this point
+would only be reached after 24 to 60 hours.
+The 70th percentile requirement is not even shown in the graph, because
+this requirement is so high that the fixed 5-day constant applies in most
+cases here, making the dynamic percentile requirement meaningless.
+
+As an intermediate conclusion, relay stability could be improved
+considerably by raising the WMTBF percentile requirement slightly.
+A good next step might be to find out why the directory authorities assign
+the Stable flag to 55 to 60~\% of all relays and not 50~\%.
+Of course, a higher requirement would result in fewer relays with the
+Stable flag.
+But having 40~\% of relays with that flag should still provide enough
+diversity.
\section{Picking stable entry guards}
+The second flag that we're going to analyze in more detail is the
+Guard flag.
Clients pick a set of entry guards as fixed entry points into the Tor
network.
Optimally, clients should be able to stick with their choice for a few
@@ -147,149 +380,155 @@ While it is not required for all their entry guards to be running all the
time, at least a subset of them should be running, or the client needs to
pick a new set.
-Tor's metric for deciding which relays are stable enough to be entry
-guards is \emph{weighted fractional uptime (WFU)}.
+As discussed above, Tor's primary metric for deciding which relays are
+stable enough to be entry guards is \emph{weighted fractional uptime
+(WFU)}.
WFU measures the fraction of uptime of a relay in the past with older
observations weighted to count less.
The assumption is that a relay that was available most of the time in the
past will also be available most of the time in the future.
-In a first analysis we simulate the effect of varying the requirements for
-becoming an entry guard on the average relay stability in the future.
+In a first analysis we simulate the effect of varying the percentile
+requirements for becoming an entry guard on the relay stability in the
+future.
We measure future stability by using the same WFU metric, but for uptime
in the future.
We similarly weight observations farther in the future less than
observations in the near future.
-We then simulate different pre-defined required WFUs between $90~\%$ and
-$99.9~\%$ and calculate what the mean future WFUs would be.
+The rationale is that a high fractional uptime in the next few days is
+slightly more important than in a month.
+For a given set of relays we measure the 10th percentile of WFU in the
+future as an indication of relay stability.
+The result is that 10~\% of relays will have a lower uptime than this
+value and 90~\% of relays will have a higher uptime.
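The future-WFU metric just described can be sketched as follows, assuming hourly up/down observations; this is an illustrative sketch of the weighting scheme, not the simulator's actual code:

```java
public class FutureWfuSketch {

    /* Fraction of hourly up-observations, with observations farther in
     * the future discounted by factor 0.95 per full 12 hours, so that a
     * high uptime in the next few days counts slightly more than one a
     * month ahead. */
    static double weightedFractionalUptime(boolean[] upByHour) {
        double weightedUp = 0.0;
        double weightTotal = 0.0;
        for (int hour = 0; hour < upByHour.length; hour++) {
            double w = Math.pow(0.95, hour / 12); // per 12-hour step
            if (upByHour[hour]) {
                weightedUp += w;
            }
            weightTotal += w;
        }
        return weightTotal == 0.0 ? 0.0 : weightedUp / weightTotal;
    }

    public static void main(String[] args) {
        boolean[] up = new boolean[48];
        java.util.Arrays.fill(up, 0, 36, true); // up for the first 36 of 48 hours
        System.out.println(weightedFractionalUptime(up));
    }
}
```

The 10th percentile of these per-relay values over a set of Guard relays is then the stability indication used in the figures.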
+
+Figure~\ref{fig:wfu-wfu-sim} shows the 10th percentile of WFU for simulations
+using the 30th, 40th, 50th, and 60th WFU percentile as requirement for
+assigning the Guard flag.
+This graph also shows the future WFU of relays that got their Guard flag
+assigned by the directory authorities.
+Here, the simulation using the default 50th percentile is much closer to
+the flags assigned by the directory authorities than in the case of the
+Stable flag.
+Unsurprisingly, the 30th percentile requirement has the worst stability,
+because it includes up to 70~\% of all relays, minus the non-familiar ones
+and those that don't meet the bandwidth requirement.
+Relay stability increases when raising the WFU requirement to the 40th,
+50th, and 60th percentile, but in most cases the outcome is an uptime of
+more than 85~\% or even more than 90~\%.
+Under the assumption that a client needs only one or two working guards at
+a time and can pick a new set of guards easily, these stabilities seem to
+be sufficiently high.
+
+\begin{figure}[t]
+\includegraphics[width=\textwidth]{wfu-wfu-sim.pdf}
+\caption{Impact of changing the WFU percentile requirement for assigning
+the Guard flag on WFU in the future}
+\label{fig:wfu-wfu-sim}
+\end{figure}
+
+\section{Selecting high-bandwidth entry guards}
+
+A second question regarding Guard flag assignments is whether we can raise
+the advertised bandwidth requirement to end up with faster entry guards.
+The fixed set of entry guards determines to a certain extent what overall
+performance the client can expect.
+If a client is unlucky and picks only slow guards, the overall Tor
+performance can be bad, in particular because clients don't drop slow
+guards, but only failing ones.
+
+We introduce a new metric to evaluate how much bandwidth capacity a relay
+will provide in the future: the \emph{weighted bandwidth}.
+This metric is not based on a relay's advertised bandwidth, but on the
+actual written bytes as reported by the relay.
+Again, we're more interested in the bandwidth in the near future, so we
+discount observations 12 hours further in the future by factor 0.95.
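The weighted bandwidth metric can be sketched like this, assuming daily bytes-written totals from the relays' bandwidth histories; names and values are illustrative, not the simulator's actual code:

```java
public class WeightedBandwidthSketch {

    /* Weighted bandwidth: future daily bytes-written totals, discounted
     * by factor 0.95 for every 12 hours (i.e., 0.95^2 per day) they lie
     * in the future, so near-future capacity counts more. */
    static double weightedBandwidth(double[] bytesWrittenPerDay) {
        double weightedSum = 0.0;
        double weightSum = 0.0;
        for (int day = 0; day < bytesWrittenPerDay.length; day++) {
            double w = Math.pow(0.95, 2 * day); // two 12-hour steps per day
            weightedSum += bytesWrittenPerDay[day] * w;
            weightSum += w;
        }
        return weightSum == 0.0 ? 0.0 : weightedSum / weightSum;
    }

    public static void main(String[] args) {
        System.out.println(
                weightedBandwidth(new double[]{100.0, 50.0, 200.0}));
    }
}
```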
+
+Figure~\ref{fig:advbw-wb-sim} shows the effect of changing the advertised
+bandwidth requirement from the 50th percentile to the 40th, 60th, or even
+70th percentile on the 10th percentile of weighted bandwidth.
+Similar to the graphs above, 10~\% of relays have a lower weighted
+bandwidth and 90~\% have a higher weighted bandwidth.
+Here, the observed 50th percentile is almost identical to the simulated
+50th percentile.
+The graph shows that raising the requirement to the 60th percentile of
+advertised bandwidth would shift this percentile line by roughly 10 KiB/s
+to the right.
+The 70th percentile would ensure that 90~\% of the selected
+Guard relays have a weighted bandwidth between 60 and
+170~KiB/s, depending on the current network situation.
\begin{figure}[t]
-\includegraphics[width=\textwidth]{wfu-sim.pdf}
-\caption{Impact of different required WFU on the mean empirical future WFU
-and fraction of potential entry guards}
-\label{fig:wfu-sim}
+\includegraphics[width=\textwidth]{advbw-wb-sim.pdf}
+\caption{Impact of changing the advertised bandwidth percentile for
+assigning the Guard flag on bandwidth capacity in the future}
+\label{fig:advbw-wb-sim}
\end{figure}
-Figure~\ref{fig:wfu-sim} shows the analysis results in a path plot similar
-to the one in Section~\ref{sec:mtbf-sim}.
-This path plot shows the effect of varying the WFU requirement, displayed
-as different line colors, on the fraction of relays meeting this
-requirement on the $x$ axis and on the WFU in the future on the $y$ axis.
-Two data points adjacent in time are connected by a line, forming a path.
-
-In this graph we can see that the majority of data points for the default
-required WFU of 98~\% falls in a future WFU range of 94~\% to 96\% with
-the smallest WFU being no less than 89~\%.
-In most cases, the fraction of relays meeting the default WFU requirement
-is between 40~\% and 50~\%.
-
-If the WFU requirement is relaxed to 95~\% or even 90~\%, the WFU in the
-future decreases slightly towards around 94~\% to 95~\% for most cases.
-At first sight it may seem surprising that a past WFU of 90~\% leads to
-a future WFU of 94~\%, but it makes sense, because the past WFU is a
-required minimum whereas the future WFU is a mean value of all relays
-meeting the requirement.
-Another effect of relaxing the required WFU is that the fraction of relays
-meeting the requirement increases from 50~\% to almost 66~\%.
-
-Interestingly, when tightening the requirement to a WFU value of 99~\% or
-even 99.9~\%, the future WFU does not increase significantly, if at all.
-To the contrary, the future WFU of relays meeting the 99.9~\% requirement
-drops to a range of 91~\% to 94~\% for quite a while.
-A likely explanation for this effect is that the fraction of relays
-meeting these high requirements is only 15~\%.
-While these 15~\% of relays may have had a very high uptime in the past,
-failure of only a few of these relays ruin the WFU metric in the future.
-
-A cautious conclusion of this analysis could be that, if the goal is to
-increase the number of \texttt{Guard} relays, reducing the required WFU to
-95~\% or even 90~\% wouldn't impact relay stability by too much.
-Conversely, increasing the required WFU beyond the current value of 98~\%
-doesn't make much sense and might even negatively affect relay stability.
-
-\subsubsection*{Next steps}
-
-{\it
-\begin{itemize}
-\item Tor penalizes relays that change their IP address or port by ending
-the running uptime session and starting a new uptime session. This
-reduces both WFU and MTBF. The simulation doesn't take this into account
-yet. Should it?
-\item Add the bandwidth requirements to the simulation. The current
-simulation doesn't make any assumptions about relay bandwidth when
-assigning \texttt{Guard} flags. Which bandwidth value would we use here?
-\item Add another graph similar to Figure~\ref{fig:wfu-sim}, but replace
-the ``Fraction of relays meeting WFU requirement'' on the \emph{x} axis
-with the ``Fraction of \emph{bandwidth} of relays meeting WFU
-requirement.''
-After all, we're interested in having enough bandwidth capacity for the
-entry guard position, not (only) in having enough distinct relays.
-Which bandwidth value would we use here?
-\item Roger suggests to come up with a better metric than ``WFU since we
-first saw a relay.''
-He says ``it seems wrong to make something that we saw earlier have a
-worse WFU than something we saw later, even if they've had identical
-uptimes in that second period.''
-What would be good candidate metrics?
-\item Ponder finding another metric than WFU for future observations. In
-particular, with the current WFU parameters of $0.95$ and $12$ hours, the
-WFU reaches up to 4 months into the future. It seems useful to weight
-uptime in the near future higher than uptime in the farther future, but
-maybe we should use parameters to limit the interval to $1$ or $2$ months.
-\end{itemize}
-}
-
-\section{Forming stable hidden-service directory sets}
-
-{\it
-In this section we should evaluate the current requirements for getting
-the \texttt{HSDir} flag.
-Also, what happened to the number of relays with the \texttt{HSDir} flag
-in August 2010?
-}
-
-\section{Selecting stable relays for a fallback consensus}
-
-{\it
-Is the concept of a fallback consensus still worth considering?
-If so, we should analyze how to identify those relays that are most likely
-to be around and reachable under the same IP address.
-The result of this analysis could lead to adding a new \texttt{Longterm}
-(or \texttt{Fallback}?) flag as suggested in proposal 146.
-% TODO Correctly cite proposal 146 here. -KL
-Maybe the analysis of bridges on stable IP addresses should come first,
-though.
-}
-
-\section{Distributing bridges with stable IP addresses}
-
-{\it
-A possible outcome of this analysis could be to add a new flag
-\texttt{StableAddress} (similar to the \texttt{Longterm} flag from the
-previous section) to bridge network statuses and to change BridgeDB to
-include at least one bridge with this flag in its results.
-One of the challenges of this analysis will be to connect sanitized bridge
-descriptors from two months with each other.
-The sanitized IP addresses of two bridges in two months do not match,
-because we're using a new secret key as input to the hash function every
-month.
-We might be able to correlate the descriptors of running bridges via their
-descriptor publication times or bridge statistics.
-But if that fails, we'll have to run the analysis with only 1 month of
-data at a time.
-}
+Of course, raising the advertised bandwidth requirement for becoming a
+guard node results in having fewer possible guard nodes.
+Figure~\ref{fig:advbw-frac-relays-sim} shows the effect of raising the
+advertised bandwidth requirement from the 50th to the 60th or 70th
+percentile.
+The 60th percentile requirement would reduce the fraction of relays with
+the Guard flag from 28~\% to around 25~\%, and the 70th percentile would
+reduce it even further, to below 20~\%.
+There's a clear trade-off here between raising the bandwidth capacity of
+entry guards and having fewer guards to distribute one third of the
+network load to.
+Having fewer than 20~\% of all relays as possible Guard nodes is
+probably not enough and would likely create new performance problems.
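A percentile-based cut-off like the one simulated above can be sketched in a few lines; this is a hypothetical illustration with made-up bandwidth values, not the directory authorities' or the simulator's actual code:

```java
import java.util.Arrays;

public class GuardBandwidthCutoff {

  /** Return the advertised bandwidth value at the given percentile
   * (0-100) of the input, as a candidate Guard requirement. */
  static long percentile(long[] advertisedBandwidths, int percentile) {
    long[] sorted = advertisedBandwidths.clone();
    Arrays.sort(sorted);
    int index = Math.min(sorted.length - 1,
        (percentile * sorted.length) / 100);
    return sorted[index];
  }

  /** Fraction of relays meeting or exceeding the cut-off. */
  static double fractionMeeting(long[] advertisedBandwidths,
      long cutoff) {
    long count = 0;
    for (long bw : advertisedBandwidths) {
      if (bw >= cutoff) {
        count++;
      }
    }
    return (double) count / advertisedBandwidths.length;
  }

  public static void main(String[] args) {
    // Hypothetical advertised bandwidths in KiB/s.
    long[] bw = { 20, 50, 80, 120, 250, 400, 800, 1200, 2500, 5000 };
    for (int p : new int[] { 50, 60, 70 }) {
      long cutoff = percentile(bw, p);
      System.out.println(p + "th percentile cut-off: " + cutoff
          + " KiB/s, fraction eligible: "
          + fractionMeeting(bw, cutoff));
    }
  }
}
```

Raising the percentile parameter directly shrinks the eligible set, which is the trade-off the figure quantifies.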
+
+\begin{figure}[t]
+\includegraphics[width=\textwidth]{advbw-frac-relays-sim.pdf}
+\caption{Influence of changing the advertised bandwidth percentile on the
+fraction of relays getting the Guard flag assigned}
+\label{fig:advbw-frac-relays-sim}
+\end{figure}
\section{Discussion and future work}
-The approach taken in this analysis was to select relays that are most
-stable in a given context based on their history.
-A different angle to obtain higher relay stability might be to identify
-what properties of a relay have a positive or negative impact on its
-stability.
-For example, relays running a given operating system or given Tor software
-version might have a higher stability than others.
-Possible consequences could be to facilitate setting up relays on a given
-operating system or to improve the upgrade process of the Tor software.
+In this report we used a simulator to evaluate Tor's relay stability
+metrics for assigning Stable and Guard flags to relays.
+We introduced three metrics to evaluate how stable or fast the set of
+relays with the Stable or Guard flag is.
+Our simulator uses the same algorithms as the directory authorities to
+decide whether a relay is stable, and it can be parameterized to
+analyze different requirements.
+We could further add new algorithms to the simulator and see what
+subsets of Stable and Guard relays they would produce.
+
+Using our simulator we found that the fraction of relays with the Stable
+flag in the current network is higher than it probably should be.
+We also learned that the WFU requirements for assigning the Guard flag are
+quite reasonable and lead to stable guards.
+But we might consider raising the advertised bandwidth requirement a bit
+to have higher-bandwidth guard nodes.
+In the medium term, we should move away from a requirement that is
+based on self-reported advertised bandwidth.
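The WFU metric mentioned above decays past observations: the simulator weights both weighted uptime and total weighted time by 19/20 (0.95) every 12 hours using integer arithmetic. A minimal sketch of that bookkeeping, with hypothetical class and method names:

```java
public class WfuDecay {

  /** Weighted uptime and total weighted time, in seconds. */
  long weightedUptime;
  long totalWeightedTime;

  WfuDecay(long weightedUptime, long totalWeightedTime) {
    this.weightedUptime = weightedUptime;
    this.totalWeightedTime = totalWeightedTime;
  }

  /** Apply one 12-hour decay step: multiply both counters by 19/20
   * using integer arithmetic, as the simulator does. */
  void decay() {
    weightedUptime = weightedUptime * 19L / 20L;
    totalWeightedTime = totalWeightedTime * 19L / 20L;
  }

  /** WFU scaled by 10000, matching the simulator's fixed-point
   * representation. */
  long wfu() {
    return 10000L * weightedUptime / totalWeightedTime;
  }
}
```

Because both counters decay by the same factor, the ratio is unchanged by decay alone; the effect is that fresh hourly observations, added undecayed, carry more weight than old ones.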
+
+Possible next steps are to review the Tor source code for assigning flags
+and compare the internal relay stability data from the directory
+authorities to simulated values.
+It would be interesting to know why the directory authorities assign the
+Stable flag so generously.
+Also, it would be interesting to compare the internal stability data from
+multiple directory authorities to see to what extent they agree.
+Another possible next step might be to turn the four requirement
+percentiles (WMTBF, WFU, Weighted Time, and Advertised Bandwidth) into
+consensus parameters to be able to test parameter changes in the real
+network.
+
+We also left the analysis of relay stability for hidden-service
+directories, fallback consensuses, and bridge relays as future work.
+Possible starting points are an earlier analysis of hidden-service
+directory stability \cite{loesing2009thesis}, the Tor proposal
+describing fallback consensuses \cite{proposal146}, and a tech report
+explaining how bridge descriptors are sanitized to make them publicly
+available \cite{loesing2011overview}.
+
+\bibliography{report}
+\bibliographystyle{plain}
\end{document}
diff --git a/task-2911/stability.R b/task-2911/stability.R
new file mode 100644
index 0000000..5b40d67
--- /dev/null
+++ b/task-2911/stability.R
@@ -0,0 +1,301 @@
+library(ggplot2)
+data <- read.csv("stability.csv", stringsAsFactors = FALSE)
+
+d <- data
+d <- aggregate(d[, 2:length(d)], by = list(date = as.Date(d$time)),
+ mean)
+d <- rbind(
+ data.frame(date = d$date, value = d$running,
+ variable = "Running"),
+ data.frame(date = d$date, value = d$stable,
+ variable = "Running + Stable"),
+ data.frame(date = d$date, value = d$guard,
+ variable = "Running + Guard"))
+ggplot(d, aes(x = as.Date(date), y = value, colour = variable)) +
+geom_line(size = 0.7) +
+scale_x_date("", major = "3 months", minor = "1 month",
+ format = "%b %Y") +
+scale_y_continuous("Number \nof relays ",
+ limits = c(0, max(d$value, na.rm = TRUE))) +
+scale_colour_manual(name = "Assigned relay flags\n",
+ values = c("Running" = "black", "Running + Stable" = "grey45",
+ "Running + HSDir" = "grey", "Running + Guard" = "grey70")) +
+opts(axis.title.y = theme_text(size = 12 * 0.8, face = "bold",
+ vjust = 0.5, hjust = 1))
+ggsave(filename = "relayflags.pdf", width = 8, height = 4, dpi = 100)
+
+d <- data
+d <- aggregate(d[, 2:length(d)], by = list(date = as.Date(d$time)),
+ mean)
+d <- rbind(
+ data.frame(x = d$date, y = d$stable / d$running,
+ variable = "Stable (observed)"),
+ data.frame(x = d$date, y = d$stable50 / d$running,
+ variable = "Stable (simulated)"),
+ data.frame(x = d$date, y = d$guard / d$running,
+ variable = "Guard (observed)"),
+ data.frame(x = d$date, y = d$guard50wfu50advbw / d$running,
+ variable = "Guard (simulated)"))
+d[d$x >= '2010-06-26' & d$x <= '2010-06-28', "y"] <- NA
+d[d$x >= '2010-01-25' & d$x <= '2010-01-26', "y"] <- NA
+ggplot(d, aes(x = x, y = y, colour = variable, linetype = variable)) +
+geom_line() +
+scale_y_continuous(name = "Fraction of \nRunning relays ",
+ formatter = "percent", limits = c(0, max(d$y, na.rm = TRUE))) +
+scale_x_date(name = "", major = "3 months", minor = "1 month",
+ format = "%b %Y") +
+scale_colour_manual(name = "Assigned relay flags\n",
+ values = c("Stable (observed)" = "grey50",
+ "Stable (simulated)" = "grey50",
+ "Guard (observed)" = "black",
+ "Guard (simulated)" = "black")) +
+scale_linetype_manual(name = "Assigned relay flags\n",
+ values = c(3, 1, 3, 1)) +
+opts(axis.title.x = theme_text(size = 12 * 0.8, face = "bold",
+ hjust = 0.5),
+ axis.title.y = theme_text(size = 12 * 0.8, face = "bold", vjust = 0.5,
+ hjust = 1))
+ggsave(filename = "default-reqs.pdf", width = 8, height = 4, dpi = 100)
+
+d <- data
+d <- aggregate(d[, 2:length(d)], by = list(date = as.Date(d$time)),
+ mean)
+d <- rbind(
+ data.frame(x = d$date, y = d$stableintersect,
+ variable = "simulated and observed", flag = "Stable"),
+ data.frame(x = d$date, y = d$stableobserved,
+ variable = "only observed", flag = "Stable"),
+ data.frame(x = d$date, y = d$stablesimulated,
+ variable = "only simulated", flag = "Stable"),
+ data.frame(x = d$date, y = d$guardintersect,
+ variable = "simulated and observed", flag = "Guard"),
+ data.frame(x = d$date, y = d$guardobserved,
+ variable = "only observed", flag = "Guard"),
+ data.frame(x = d$date, y = d$guardsimulated,
+ variable = "only simulated", flag = "Guard"),
+ data.frame(x = NA, y = 0, variable = "only simulated", flag = "Stable"),
+ data.frame(x = NA, y = 0, variable = "only simulated", flag = "Guard"))
+ggplot(d, aes(x = x, y = y, linetype = variable)) +
+geom_line() +
+facet_grid(flag ~ ., scale = "free_y") +
+scale_y_continuous(name = "Number \nof relays ") +
+scale_x_date(name = "", major = "3 months", minor = "1 month",
+ format = "%b %Y") +
+scale_linetype_manual(name = "", values = c(4, 3, 1)) +
+opts(axis.title.x = theme_text(size = 12 * 0.8, face = "bold",
+ hjust = 0.5),
+ axis.title.y = theme_text(size = 12 * 0.8, face = "bold", vjust = 0.5,
+ hjust = 1))
+ggsave(filename = "diff-sim-obs.pdf", width = 8, height = 4, dpi = 100)
+
+d <- data
+d_mean <- aggregate(d[, 2:length(d)], by = list(date = as.Date(d$time)),
+ mean)
+d_max <- aggregate(d[, 2:length(d)], by = list(date = as.Date(d$time)),
+ max)
+d_min <- aggregate(d[, 2:length(d)], by = list(date = as.Date(d$time)),
+ min)
+d <- rbind(
+ data.frame(x = d_mean$date, y = d_mean$minwmtbfa50 / (24 * 60 * 60),
+ ymin = d_min$minwmtbfa50 / (24 * 60 * 60),
+ ymax = d_max$minwmtbfa50 / (24 * 60 * 60),
+ var = "Median Weighted Mean Time Between Failure"),
+ data.frame(x = d_mean$date, y = d_mean$minwta / (24 * 60 * 60),
+ ymin = d_min$minwta / (24 * 60 * 60),
+ ymax = d_max$minwta / (24 * 60 * 60),
+ var = "12.5th percentile Weighted Time Known"),
+ data.frame(x = d_mean$date, y = d_mean$minwfua50wfu / 10000,
+ ymin = d_min$minwfua50wfu / 10000,
+ ymax = d_max$minwfua50wfu / 10000,
+ var = "Median Weighted Fractional Uptime"),
+ data.frame(x = d_mean$date, y = d_mean$minadvbwa50advbw / 1024,
+ ymin = d_min$minadvbwa50advbw / 1024,
+ ymax = d_max$minadvbwa50advbw / 1024,
+ var = "Median Advertised Bandwidth"))
+e <- data.frame(
+ yintercept = c(5, 8, 0.98, 250),
+ var = c("Median Weighted Mean Time Between Failure",
+ "12.5th percentile Weighted Time Known",
+ "Median Weighted Fractional Uptime",
+ "Median Advertised Bandwidth"))
+ggplot(d, aes(x = as.Date(x), y = y, ymin = ymin, ymax = ymax)) +
+geom_line(colour = "grey30") +
+geom_ribbon(alpha = 0.3) +
+geom_hline(data = e, aes(yintercept = yintercept), colour = "gray30",
+ linetype = 2) +
+facet_wrap(~ var, scales = "free_y") +
+scale_x_date(name = "", major = "3 months", minor = "1 month",
+ format = "%b %Y") +
+scale_y_continuous(name = "")
+ggsave(filename = "requirements.pdf", width = 8, height = 5, dpi = 100)
+
+d <- data
+d <- d[d$time < '2010-06-26 00:00:00' | d$time > '2010-06-28 23:00:00', ]
+d <- d[d$time < '2010-01-25 00:00:00' | d$time > '2010-01-26 23:00:00', ]
+d <- rbind(
+ data.frame(x = sort(d$perc10tunf30) / (60 * 60),
+ y = 1:length(d$perc10tunf30) / length(d$perc10tunf30),
+ sim = "30th (simulated)"),
+ data.frame(x = sort(d$perc10tunf) / (60 * 60),
+ y = 1:length(d$perc10tunf) / length(d$perc10tunf),
+ sim = "50th (observed)"),
+ data.frame(x = sort(d$perc10tunf40) / (60 * 60),
+ y = 1:length(d$perc10tunf40) / length(d$perc10tunf40),
+ sim = "40th (simulated)"),
+ data.frame(x = sort(d$perc10tunf50) / (60 * 60),
+ y = 1:length(d$perc10tunf50) / length(d$perc10tunf50),
+ sim = "50th (simulated)"),
+ data.frame(x = sort(d$perc10tunf60) / (60 * 60),
+ y = 1:length(d$perc10tunf60) / length(d$perc10tunf60),
+ sim = "60th (simulated)"))
+ggplot(d, aes(x = x, y = y, colour = sim, linetype = sim)) +
+geom_line() +
+scale_x_continuous(name = paste("\n10th percentile of time until next",
+ "failure in hours"),
+ breaks = seq(0, max(d$x, na.rm = TRUE), 24),
+ minor = seq(0, max(d$x, na.rm = TRUE), 6),
+ limits = c(0, max(d$x, na.rm = TRUE))) +
+scale_y_continuous(name = paste("Cumulative fraction \nof",
+ "consensuses \nfrom July to \nDecember 2010 "),
+ formatter = "percent", limits = c(0, 1)) +
+scale_colour_manual(name = paste("WMTBF percentile\nfor",
+ "assigning\nStable flag\n"),
+ values = c("60th (simulated)" = "black",
+ "50th (simulated)" = "grey45", "50th (observed)" = "black",
+ "40th (simulated)" = "grey60", "30th (simulated)" = "grey80")) +
+scale_linetype_manual(name = paste("WMTBF percentile\nfor",
+ "assigning\nStable flag\n"),
+ values = c(1, 3, 1, 1, 1)) +
+opts(plot.title = theme_text(size = 14 * 0.8, face = "bold"),
+ axis.title.x = theme_text(size = 12 * 0.8, face = "bold",
+ hjust = 0.5),
+ axis.title.y = theme_text(size = 12 * 0.8, face = "bold", vjust = 0.5,
+ hjust = 1))
+ggsave(filename = "wmtbf-tunf-sim.pdf", width = 8, height = 4, dpi = 100)
+
+d <- data
+d <- d[d$time < '2010-06-26 00:00:00' | d$time > '2010-06-28 23:00:00', ]
+d <- d[d$time < '2010-01-25 00:00:00' | d$time > '2010-01-26 23:00:00', ]
+d <- rbind(
+ data.frame(x = sort(d$perc10wfu30wfu50advbw) / 10000,
+ y = 1:length(d$perc10wfu30wfu50advbw) /
+ length(d$perc10wfu30wfu50advbw),
+ sim = "30th (simulated)"),
+ data.frame(x = sort(d$perc10wfu40wfu50advbw) / 10000,
+ y = 1:length(d$perc10wfu40wfu50advbw) /
+ length(d$perc10wfu40wfu50advbw),
+ sim = "40th (simulated)"),
+ data.frame(x = sort(d$perc10wfu) / 10000,
+ y = 1:length(d$perc10wfu) / length(d$perc10wfu),
+ sim = "50th (observed)"),
+ data.frame(x = sort(d$perc10wfu50wfu50advbw) / 10000,
+ y = 1:length(d$perc10wfu50wfu50advbw) /
+ length(d$perc10wfu50wfu50advbw),
+ sim = "50th (simulated)"),
+ data.frame(x = sort(d$perc10wfu60wfu50advbw) / 10000,
+ y = 1:length(d$perc10wfu60wfu50advbw) /
+ length(d$perc10wfu60wfu50advbw),
+ sim = "60th (simulated)"))
+ggplot(d, aes(x = x, y = y, colour = sim, linetype = sim)) +
+geom_line() +
+scale_x_continuous(name = "\n10th percentile of WFU in the future",
+ formatter = "percent") +
+scale_y_continuous(name = paste("Cumulative fraction \nof",
+ "consensuses \nfrom July to \nDecember 2010 "),
+ formatter = "percent", limits = c(0, 1)) +
+scale_colour_manual(name = paste("WFU percentile\nfor",
+ "assigning\nGuard flag\n"),
+ values = c("60th (simulated)" = "black",
+ "50th (simulated)" = "grey45", "50th (observed)" = "black",
+ "40th (simulated)" = "grey60", "30th (simulated)" = "grey80")) +
+scale_linetype_manual(name = paste("WFU percentile\nfor",
+ "assigning\nGuard flag\n"),
+ values = c(1, 1, 3, 1, 1)) +
+opts(plot.title = theme_text(size = 14 * 0.8, face = "bold"),
+ axis.title.x = theme_text(size = 12 * 0.8, face = "bold",
+ hjust = 0.5),
+ axis.title.y = theme_text(size = 12 * 0.8, face = "bold", vjust = 0.5,
+ hjust = 1))
+ggsave(filename = "wfu-wfu-sim.pdf", width = 8, height = 4, dpi = 100)
+
+d <- data
+d <- rbind(
+ data.frame(x = sort(d$perc10fwb50wfu40advbw),
+ y = 1:length(d$perc10fwb50wfu40advbw) /
+ length(d$perc10fwb50wfu40advbw),
+ sim = "40th (simulated)"),
+ data.frame(x = sort(d$perc10fwb50wfu50advbw),
+ y = 1:length(d$perc10fwb50wfu50advbw) /
+ length(d$perc10fwb50wfu50advbw),
+ sim = "50th (simulated)"),
+ data.frame(x = sort(d$perc10fwb),
+ y = 1:length(d$perc10fwb) /
+ length(d$perc10fwb),
+ sim = "50th (observed)"),
+ data.frame(x = sort(d$perc10fwb50wfu60advbw),
+ y = 1:length(d$perc10fwb50wfu60advbw) /
+ length(d$perc10fwb50wfu60advbw),
+ sim = "60th (simulated)"),
+ data.frame(x = sort(d$perc10fwb50wfu70advbw),
+ y = 1:length(d$perc10fwb50wfu70advbw) /
+ length(d$perc10fwb50wfu70advbw),
+ sim = "70th (simulated)"))
+ggplot(d, aes(x = x / 1024, y = y, linetype = sim, colour = sim)) +
+geom_line() +
+scale_x_continuous(name = paste("\n10th percentile of weighted bandwidth",
+ "in KiB/s in the future")) +
+scale_y_continuous(name = paste("Cumulative fraction \nof",
+ "consensuses \nfrom July to \nDecember 2010 "),
+ formatter = "percent", limits = c(0, 1)) +
+scale_colour_manual(name = paste("Advertised\nbandwidth\npercentile\nfor",
+ "assigning\nGuard flag\n"),
+ values = c(
+ "40th (simulated)" = "grey80",
+ "50th (observed)" = "black",
+ "50th (simulated)" = "grey60",
+ "60th (simulated)" = "grey45",
+ "70th (simulated)" = "black")) +
+scale_linetype_manual(name = paste("Advertised\nbandwidth\n",
+ "percentile\nfor assigning\nGuard flag\n", sep = ""),
+ values = c(1, 1, 3, 1, 1)) +
+opts(axis.title.x = theme_text(size = 12 * 0.8, face = "bold",
+ hjust = 0.5),
+ axis.title.y = theme_text(size = 12 * 0.8, face = "bold", vjust = 0.5,
+ hjust = 1))
+ggsave(filename = "advbw-wb-sim.pdf", width = 8, height = 4, dpi = 100)
+
+d <- data
+d <- aggregate(d[, 2:length(d)], by = list(date = as.Date(d$time)),
+ mean)
+d <- rbind(
+ data.frame(x = d$date, y = d$guard / d$running,
+ variable = "50th (observed)"),
+ data.frame(x = d$date, y = d$guard50wfu50advbw / d$running,
+ variable = "50th (simulated)"),
+ data.frame(x = d$date, y = d$guard50wfu60advbw / d$running,
+ variable = "60th (simulated)"),
+ data.frame(x = d$date, y = d$guard50wfu70advbw / d$running,
+ variable = "70th (simulated)"))
+ggplot(d, aes(x = x, y = y, colour = variable, linetype = variable)) +
+geom_line() +
+scale_y_continuous(name = "Fraction of \nRunning relays ",
+ formatter = "percent", limits = c(0, max(d$y, na.rm = TRUE))) +
+scale_x_date(name = "", major = "3 months", minor = "1 month",
+ format = "%b %Y") +
+scale_colour_manual(name = paste("Advertised\nbandwidth\npercentile\nfor",
+ "assigning\nGuard flag\n"),
+ values = c(
+ "50th (observed)" = "black",
+ "50th (simulated)" = "grey60",
+ "60th (simulated)" = "grey45",
+ "70th (simulated)" = "black")) +
+scale_linetype_manual(name = paste("Advertised\nbandwidth\npercentile\nfor",
+ "assigning\nGuard flag\n"),
+ values = c(3, 1, 1, 1)) +
+opts(axis.title.x = theme_text(size = 12 * 0.8, face = "bold",
+ hjust = 0.5),
+ axis.title.y = theme_text(size = 12 * 0.8, face = "bold", vjust = 0.5,
+ hjust = 1))
+ggsave(filename = "advbw-frac-relays-sim.pdf", width = 8, height = 4,
+ dpi = 100)
+
diff --git a/task-2911/wfu-sim/SimulateWeightedFractionalUptime.java b/task-2911/wfu-sim/SimulateWeightedFractionalUptime.java
deleted file mode 100644
index d803057..0000000
--- a/task-2911/wfu-sim/SimulateWeightedFractionalUptime.java
+++ /dev/null
@@ -1,318 +0,0 @@
-/**
- * Simulate variation of weighted fractional uptime on Guard relays. In
- * a first step, parse network status consensus in consensuses/ from last
- * to first, calculate future weighted fractional uptimes for all relays,
- * and write them to disk as one file per network status in
- * fwfu/$filename. (Skip this step if there is already a fwfu/
- * directory.) In a second step, parse the network statuse consensus
- * again, but this time from first to last, calculate past weighted
- * fractional uptimes for all relays, form relay subsets based on minimal
- * WFU, look up what the mean future WFU would be for a subset, and write
- * results to wfu-sim.csv to disk. */
-import java.io.*;
-import java.text.*;
-import java.util.*;
-public class SimulateWeightedFractionalUptime {
- public static void main(String[] args) throws Exception {
-
- /* Measure how long this execution takes. */
- long started = System.currentTimeMillis();
-
- /* Decide whether we need to do the reverse run, or if we can use
- * previous results. */
- if (!new File("fwfu").exists()) {
-
- /* Scan existing consensus files and sort them in reverse order. */
- SortedSet<File> allConsensuses =
- new TreeSet<File>(Collections.reverseOrder());
- Stack<File> files = new Stack<File>();
- files.add(new File("consensuses"));
- while (!files.isEmpty()) {
- File file = files.pop();
- if (file.isDirectory()) {
- files.addAll(Arrays.asList(file.listFiles()));
- } else {
- if (file.getName().endsWith("-consensus")) {
- allConsensuses.add(file);
- }
- }
- }
-
- /* For each relay as identified by its base-64 encoded fingerprint,
- * track weighted uptime and total weighted time in a long[2]. */
- SortedMap<String, long[]> knownRelays =
- new TreeMap<String, long[]>();
-
- /* Parse all consensuses in reverse order. */
- SimpleDateFormat formatter = new SimpleDateFormat(
- "yyyy-MM-dd-HH-mm-ss");
- formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
- long nextWeightingInterval = formatter.parse(allConsensuses.first().
- getName().substring(0, "yyyy-MM-dd-HH-mm-ss".length())).
- getTime() / (12L * 60L * 60L * 1000L);
- for (File consensus : allConsensuses) {
-
- /* Every 12 hours, weight both uptime and total time of all known
- * relays with 0.95 (or 19/20 since these are long integers) and
- * remove all with a weighted fractional uptime below 1/10000. */
- long validAfter = formatter.parse(consensus.getName().substring(0,
- "yyyy-MM-dd-HH-mm-ss".length())).getTime();
- long weightingInterval = validAfter / (12L * 60L * 60L * 1000L);
- while (weightingInterval < nextWeightingInterval) {
- Set<String> relaysToRemove = new HashSet<String>();
- for (Map.Entry<String, long[]> e : knownRelays.entrySet()) {
- long[] w = e.getValue();
- w[0] *= 19L;
- w[0] /= 20L;
- w[1] *= 19L;
- w[1] /= 20L;
- if (((10000L * w[0]) / w[1]) < 1L) {
- relaysToRemove.add(e.getKey());
- }
- }
- for (String fingerprint : relaysToRemove) {
- knownRelays.remove(fingerprint);
- }
- nextWeightingInterval -= 1L;
- }
-
- /* Parse all fingerprints of Running relays from the consensus. */
- Set<String> fingerprints = new HashSet<String>();
- BufferedReader br = new BufferedReader(new FileReader(consensus));
- String line, rLine = null;
- boolean reachedEnd = false;
- while ((line = br.readLine()) != null) {
- if (line.startsWith("r ")) {
- rLine = line;
- } else if (line.startsWith("s ") && line.contains(" Running")) {
- String[] parts = rLine.split(" ");
- if (parts.length < 3) {
- System.out.println("Illegal line '" + rLine + "' in "
- + consensus + ". Skipping consensus.");
- continue;
- } else {
- String fingerprint = parts[2];
- if (fingerprint.length() !=
- "AAAAAAAAAAAAAAAAAAAAAAAAAAA".length()) {
- System.out.println("Illegal line '" + rLine + "' in "
- + consensus + ". Skipping consensus.");
- continue;
- }
- fingerprints.add(fingerprint);
- }
- } else if (line.startsWith("directory-signature ")) {
- reachedEnd = true;
- break;
- }
- }
- br.close();
- if (!reachedEnd) {
- System.out.println("Did not reach the consensus end of "
- + consensus + ". Skipping consensus.");
- continue;
- }
-
- /* Increment weighted uptime for all running relays by 3600
- * seconds. */
- /* TODO 3600 seconds is only correct if we're not missing a
- * consensus. We could be more precise here, but it will probably
- * not affect results significantly, if at all. The same applies
- * to the 3600 seconds constants below. */
- for (String fingerprint : fingerprints) {
- if (!knownRelays.containsKey(fingerprint)) {
- knownRelays.put(fingerprint, new long[] { 3600L, 0L });
- } else {
- knownRelays.get(fingerprint)[0] += 3600L;
- }
- }
-
- /* Increment total weighted time for all relays by 3600 seconds. */
- for (long[] w : knownRelays.values()) {
- w[1] += 3600L;
- }
-
- /* Write future WFUs for all known relays to disk. */
- File fwfuFile = new File("fwfu", consensus.getName());
- fwfuFile.getParentFile().mkdirs();
- BufferedWriter bw = new BufferedWriter(new FileWriter(fwfuFile));
- for (Map.Entry<String, long[]> e : knownRelays.entrySet()) {
- bw.write(e.getKey() + " "
- + ((10000L * e.getValue()[0]) / e.getValue()[1]) + "\n");
- }
- bw.close();
- }
- }
-
- /* Run the simulation for the following WFU/10000 values: */
- long[] requiredWFUs = new long[] { 9000, 9100, 9200, 9300, 9400, 9500,
- 9600, 9700, 9750, 9800, 9850, 9900, 9950, 9975, 9990, 9999 };
- BufferedWriter bw = new BufferedWriter(new FileWriter("wfu-sim.csv"));
- bw.write("time");
- for (long requiredWFU : requiredWFUs) {
- bw.write(",wfu" + requiredWFU + ",perc85wfu" + requiredWFU
- + ",perc90wfu" + requiredWFU + ",perc95wfu" + requiredWFU
- + ",guards" + requiredWFU);
- }
- bw.write("\n");
-
- /* Scan existing consensus files and sort them in forward order. */
- SortedSet<File> allConsensuses = new TreeSet<File>();
- Stack<File> files = new Stack<File>();
- files.add(new File("consensuses"));
- while (!files.isEmpty()) {
- File file = files.pop();
- if (file.isDirectory()) {
- files.addAll(Arrays.asList(file.listFiles()));
- } else {
- if (file.getName().endsWith("-consensus")) {
- allConsensuses.add(file);
- }
- }
- }
-
- /* For each relay as identified by its base-64 encoded fingerprint,
- * track weighted uptime and total weighted time in a long[2]. */
- SortedMap<String, long[]> knownRelays = new TreeMap<String, long[]>();
-
- /* Parse all consensuses in forward order. */
- SimpleDateFormat formatter = new SimpleDateFormat(
- "yyyy-MM-dd-HH-mm-ss");
- formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
- SimpleDateFormat isoFormatter = new SimpleDateFormat(
- "yyyy-MM-dd HH:mm:ss");
- isoFormatter.setTimeZone(TimeZone.getTimeZone("UTC"));
- long nextWeightingInterval = formatter.parse(allConsensuses.first().
- getName().substring(0, "yyyy-MM-dd-HH-mm-ss".length())).getTime()
- / (12L * 60L * 60L * 1000L);
- for (File consensus : allConsensuses) {
-
- /* Every 12 hours, weight both uptime and total time of all known
- * relays with 0.95 (or 19/20 since these are long integers) and
- * remove all with a weighted fractional uptime below 1/10000. */
- long validAfter = formatter.parse(consensus.getName().substring(0,
- "yyyy-MM-dd-HH-mm-ss".length())).getTime();
- long weightingInterval = validAfter / (12L * 60L * 60L * 1000L);
- while (weightingInterval > nextWeightingInterval) {
- Set<String> relaysToRemove = new HashSet<String>();
- for (Map.Entry<String, long[]> e : knownRelays.entrySet()) {
- long[] w = e.getValue();
- w[0] *= 19L;
- w[0] /= 20L;
- w[1] *= 19L;
- w[1] /= 20L;
- if (((10000L * w[0]) / w[1]) < 1L) {
- relaysToRemove.add(e.getKey());
- }
- }
- for (String fingerprint : relaysToRemove) {
- knownRelays.remove(fingerprint);
- }
- nextWeightingInterval += 1L;
- }
-
- /* Parse all fingerprints of Running relays from the consensus. */
- Set<String> fingerprints = new HashSet<String>();
- BufferedReader br = new BufferedReader(new FileReader(consensus));
- String line, rLine = null;
- boolean reachedEnd = false;
- while ((line = br.readLine()) != null) {
- if (line.startsWith("r ")) {
- rLine = line;
- } else if (line.startsWith("s ") && line.contains(" Running")) {
- String[] parts = rLine.split(" ");
- if (parts.length < 3) {
- System.out.println("Illegal line '" + rLine + "' in "
- + consensus + ". Skipping consensus.");
- continue;
- } else {
- String fingerprint = parts[2];
- if (fingerprint.length() !=
- "AAAAAAAAAAAAAAAAAAAAAAAAAAA".length()) {
- System.out.println("Illegal line '" + rLine + "' in "
- + consensus + ". Skipping consensus.");
- continue;
- }
- fingerprints.add(fingerprint);
- }
- } else if (line.startsWith("directory-signature ")) {
- reachedEnd = true;
- break;
- }
- }
- br.close();
- if (!reachedEnd) {
- System.out.println("Did not reach the consensus end of "
- + consensus + ". Skipping consensus.");
- continue;
- }
-
- /* Increment weighted uptime for all running relays by 3600
- * seconds. */
- for (String fingerprint : fingerprints) {
- if (!knownRelays.containsKey(fingerprint)) {
- knownRelays.put(fingerprint, new long[] { 3600L, 0L });
- } else {
- knownRelays.get(fingerprint)[0] += 3600L;
- }
- }
-
- /* Increment total weighted time for all relays by 3600 seconds. */
- for (long[] w : knownRelays.values()) {
- w[1] += 3600L;
- }
-
- /* Read previously calculated future WFUs from disk. */
- Map<String, Long> fwfus = new HashMap<String, Long>();
- File fwfuFile = new File("fwfu", consensus.getName());
- if (!fwfuFile.exists()) {
- System.out.println("Could not find file " + fwfuFile
- + ". Exiting!");
- System.exit(1);
- }
- br = new BufferedReader(new FileReader(fwfuFile));
- while ((line = br.readLine()) != null) {
- String[] parts = line.split(" ");
- fwfus.put(parts[0], Long.parseLong(parts[1]));
- }
-
- /* Run the simulation for the relays in the current consensus for
- * various required WFUs. */
- bw.write(isoFormatter.format(validAfter));
- for (long requiredWFU : requiredWFUs) {
- long selectedRelays = 0L,
- totalRelays = (long) fingerprints.size(), totalFwfu = 0L;
- List<Long> fwfuList = new ArrayList<Long>();
- for (String fingerprint : fingerprints) {
- long[] pwfu = knownRelays.get(fingerprint);
- long wfu = (10000L * pwfu[0]) / pwfu[1];
- if (wfu >= requiredWFU) {
- selectedRelays += 1L;
- if (fwfus.containsKey(fingerprint)) {
- long fwfu = fwfus.get(fingerprint);
- totalFwfu += fwfu;
- fwfuList.add(fwfu);
- }
- }
- }
- if (selectedRelays == 0L) {
- bw.write(",NA,NA,NA,NA");
- } else {
- Collections.sort(fwfuList, Collections.reverseOrder());
- long perc85 = fwfuList.get((85 * fwfuList.size()) / 100);
- long perc90 = fwfuList.get((90 * fwfuList.size()) / 100);
- long perc95 = fwfuList.get((95 * fwfuList.size()) / 100);
- bw.write("," + (totalFwfu / selectedRelays) + "," + perc85
- + "," + perc90 + "," + perc95);
- }
- bw.write("," + (10000L * selectedRelays / totalRelays));
- }
- bw.write("\n");
- }
- bw.close();
-
- /* Print how long this execution took and exit. */
- System.out.println("Execution took " + ((System.currentTimeMillis()
- - started) / (60L * 1000L)) + " minutes.");
- }
-}
-
diff --git a/task-2911/wfu-sim/wfu-sim.R b/task-2911/wfu-sim/wfu-sim.R
deleted file mode 100644
index 149ce6d..0000000
--- a/task-2911/wfu-sim/wfu-sim.R
+++ /dev/null
@@ -1,57 +0,0 @@
-library(ggplot2)
-data <- read.csv("wfu-sim.csv", stringsAsFactors = FALSE)
-
-d <- data[data$time >= '2010' & data$time < '2011', ]
-d <- aggregate(d[, 2:length(d)], by = list(date = as.Date(d$time)), mean)
-d <- rbind(
- data.frame(x = d$guards9000, y = d$wfu9000, sim = "90 %"),
- data.frame(x = d$guards9500, y = d$wfu9500, sim = "95 %"),
- data.frame(x = d$guards9800, y = d$wfu9800, sim = "98 % (default)"),
- data.frame(x = d$guards9900, y = d$wfu9900, sim = "99 %"),
- data.frame(x = d$guards9990, y = d$wfu9990, sim = "99.9 %"))
-ggplot(d, aes(x = x / 10000.0, y = y / 10000.0)) +
-geom_path() +
-facet_wrap(~ sim) +
-scale_x_continuous("\nFraction of relays meeting WFU requirement",
- formatter = "percent") +
-scale_y_continuous("Mean WFU in the future\n", formatter = "percent")
-ggsave(filename = "wfu-sim.pdf", width = 8, height = 5, dpi = 100)
-
-## Commented out, because graph is meaningless in b/w.
-#d <- data[data$time >= '2010' & data$time < '2011', ]
-#d <- aggregate(d[, 2:length(d)], by = list(date = as.Date(d$time)), mean)
-#d <- rbind(
-# data.frame(x = d$guards9000, y = d$wfu9000, sim = "90 %"),
-# data.frame(x = d$guards9500, y = d$wfu9500, sim = "95 %"),
-# data.frame(x = d$guards9800, y = d$wfu9800, sim = "98 % (default)"),
-# data.frame(x = d$guards9900, y = d$wfu9900, sim = "99 %"),
-# data.frame(x = d$guards9990, y = d$wfu9990, sim = "99.9 %"))
-#ggplot(d, aes(x = x / 10000.0, y = y / 10000.0, colour = sim)) +
-#geom_path() +
-#scale_x_continuous("\nFraction of relays meeting WFU requirement",
-# formatter = "percent") +#, trans = "reverse") +
-#scale_y_continuous("Mean WFU \nin the future ",
-# formatter = "percent") +
-#scale_colour_hue("Required WFU") +
-#opts(axis.title.x = theme_text(size = 12 * 0.8, face = "bold",
-# hjust = 0.5),
-# axis.title.y = theme_text(size = 12 * 0.8, face = "bold", vjust = 0.5,
-# hjust = 1))
-#ggsave(filename = "wfu-sim.pdf", width = 8, height = 5, dpi = 100)
-
-## Commented out, because the time plot is not as useful as expected.
-#simulations <- paste("wfu", rev(c(9000, 9200, 9400, 9600, 9800)),
-# sep = "")
-#d <- data[data$time >= '2010' & data$time < '2011',
-# c("time", simulations)]
-#d <- aggregate(d[, 2:length(d)], by = list(date = as.Date(d$time)), mean)
-#d <- melt(d, id.vars = 1)
-#ggplot(d, aes(x = date, y = value / 10000.0, colour = variable)) +
-#geom_line() +
-#scale_x_date("", major = "3 months", minor = "1 month",
-# format = "%b %Y") +
-#scale_y_continuous("Empirical future WFU\n", formatter = "percent") +
-#scale_colour_hue("Required past WFU\n", breaks = simulations,
-# labels = paste(as.numeric(substr(simulations, 4, 9)) / 100.0, "%"))
-#ggsave(filename = "wfu-sim-time.pdf", width = 8, height = 5, dpi = 100)
-
_______________________________________________
tor-commits mailing list
tor-commits@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-commits