
[tor-commits] [bridgedb/develop] Add specification for BridgeDB's metrics format.



commit 39b6285b28e2826af09bce6a3563f0b1138eac7e
Author: Philipp Winter <phw@xxxxxxxxx>
Date:   Wed Sep 18 13:42:47 2019 -0700

    Add specification for BridgeDB's metrics format.
    
    We implemented BridgeDB's metrics in <https://bugs.torproject.org/9316>
    but did not specify their format until this patch.
    
    This patch also makes our implementation consistent with our (slightly
    updated) specification.  In particular:
    
    * For naming consistency, we changed "bridgedb-stats-version" to
      "bridgedb-metrics-version" and "bridgedb-stats-end" to
      "bridgedb-metrics-end".
    
    * For simplicity, we also replaced our major and minor version
      numbers with a single version number.
    
    * Instead of appending to our metrics file, we now overwrite it
      because our specification requires "bridgedb-metrics-end" and
      "bridgedb-metrics-version" to appear exactly once.
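The reasoning behind the last point can be illustrated with a minimal Python sketch. `export_stub` and the temporary file path are hypothetical: `export_stub` only mimics the two header lines that `metrics.export` writes and is not BridgeDB's code. Opening the file with mode 'a' keeps the previous export's headers, whereas mode 'w' truncates the file first.

```python
import os
import tempfile


def export_stub(fh, measurement_interval):
    # Hypothetical stand-in for bridgedb.metrics.export(): it writes only the
    # two header lines that the specification allows exactly once per file.
    fh.write("bridgedb-metrics-end 2019-09-18 00:33:44 (%d s)\n"
             % measurement_interval)
    fh.write("bridgedb-metrics-version 1\n")


def write_metrics(filename, measurement_interval, mode):
    # mode is "a" (append, the old behaviour) or "w" (overwrite, the new one).
    with open(filename, mode) as fh:
        export_stub(fh, measurement_interval)


def count_headers(filename):
    # Count how many "bridgedb-metrics-end" lines the file contains.
    with open(filename) as fh:
        return sum(1 for line in fh
                   if line.startswith("bridgedb-metrics-end"))


path = os.path.join(tempfile.mkdtemp(), "bridgedb-metrics.log")

# Two export cycles in append mode leave two "bridgedb-metrics-end" lines.
write_metrics(path, 86400, "a")
write_metrics(path, 86400, "a")
print(count_headers(path))  # 2

# Overwrite mode truncates the file first, so only one header remains.
write_metrics(path, 86400, "w")
write_metrics(path, 86400, "w")
print(count_headers(path))  # 1
```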
---
 CHANGELOG                     |  8 +++++
 bridgedb/main.py              |  2 +-
 bridgedb/metrics.py           | 13 ++++----
 bridgedb/test/test_metrics.py |  4 +--
 doc/bridgedb-metrics-spec.txt | 74 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 91 insertions(+), 10 deletions(-)

diff --git a/CHANGELOG b/CHANGELOG
index 06968d2..c2fca89 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,11 @@
+Changes in version A.B.C - YYYY-MM-DD
+
+        * FIXES https://bugs.torproject.org/31780
+        We implemented BridgeDB's metrics in #9316 but haven't specified its
+        format until now.  In addition to adding a specification, this patch
+        also makes our implementation consistent with our (slightly updated)
+        specification.
+
 Changes in version 0.8.2 - 2019-09-20
 
         Updated translations for the following languages:
diff --git a/bridgedb/main.py b/bridgedb/main.py
index 94f4921..7c2df6d 100644
--- a/bridgedb/main.py
+++ b/bridgedb/main.py
@@ -85,7 +85,7 @@ def writeMetrics(filename, measurementInterval):
     logging.debug("Dumping metrics to file: '%s'" % filename)
 
     try:
-        with open(filename, 'a') as fh:
+        with open(filename, 'w') as fh:
             metrics.export(fh, measurementInterval)
     except IOError as err:
         logging.error("Failed to write metrics to '%s': %s" % (filename, err))
diff --git a/bridgedb/metrics.py b/bridgedb/metrics.py
index 4e1c880..5e14146 100644
--- a/bridgedb/metrics.py
+++ b/bridgedb/metrics.py
@@ -9,7 +9,7 @@
 # :license: see LICENSE for licensing information
 # _____________________________________________________________________________
 
-"""API for keeping track of BridgeDB statistics, e.g., the demand for bridges
+"""API for keeping track of BridgeDB metrics, e.g., the demand for bridges
 over time.
 """
 
@@ -53,9 +53,9 @@ SUBNET_CTR_PREFIX_LEN = 20
 # All of the pluggable transports BridgeDB currently supports.
 SUPPORTED_TRANSPORTS = None
 
-# Major and minor version number for our statistics format.
-METRICS_MAJOR_VERSION = 1
-METRICS_MINOR_VERSION = 0
+# Version number for our metrics format.  We increment the version if our
+# format changes.
+METRICS_VERSION = 1
 
 
 def setProxies(proxies):
@@ -120,11 +120,10 @@ def export(fh, measurementInterval):
         logging.debug("Metrics module knows about %d proxies." % numProxies)
 
     now = datetime.datetime.utcnow()
-    fh.write("bridgedb-stats-end %s (%d s)\n" % (
+    fh.write("bridgedb-metrics-end %s (%d s)\n" % (
              now.strftime("%Y-%m-%d %H:%M:%S"),
              measurementInterval))
-    fh.write("bridgedb-stats-version %d.%d\n" % (METRICS_MAJOR_VERSION,
-                                                 METRICS_MINOR_VERSION))
+    fh.write("bridgedb-metrics-version %d\n" % METRICS_VERSION)
 
     httpsLines = httpsMetrix.getMetrics()
     for line in httpsLines:
diff --git a/bridgedb/test/test_metrics.py b/bridgedb/test/test_metrics.py
index a870fc2..a27431c 100644
--- a/bridgedb/test/test_metrics.py
+++ b/bridgedb/test/test_metrics.py
@@ -110,8 +110,8 @@ class StateTest(unittest.TestCase):
         self.assertTrue(len(pseudo_fh.getvalue()) > 0)
 
         lines = pseudo_fh.getvalue().split("\n")
-        self.assertTrue(lines[0].startswith("bridgedb-stats-end"))
-        self.assertTrue(lines[1].startswith("bridgedb-stats-version"))
+        self.assertTrue(lines[0].startswith("bridgedb-metrics-end"))
+        self.assertTrue(lines[1].startswith("bridgedb-metrics-version"))
         self.assertTrue(lines[2] ==
                         "bridgedb-metric-count https.obfs4.de.success.None 10")
 
diff --git a/doc/bridgedb-metrics-spec.txt b/doc/bridgedb-metrics-spec.txt
new file mode 100644
index 0000000..14c38f9
--- /dev/null
+++ b/doc/bridgedb-metrics-spec.txt
@@ -0,0 +1,74 @@
+                      BridgeDB metrics (version 1)
+
+BridgeDB exports usage metrics once every 24 hours.  These metrics
+encode approximately how many successful and failed requests BridgeDB
+has seen per distribution mechanism, per pluggable transport, and per
+country code or email provider.  For example, one metrics line can
+tell us that over the last 24 hours, BridgeDB saw between 21 and 30
+successful requests for obfs4 over moat from Zimbabwe.
+
+This section specifies the format of BridgeDB's metrics.  Each metrics
+file is formatted as follows:
+
+  "bridgedb-metrics-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
+      [At start, exactly once.]
+
+      YYYY-MM-DD HH:MM:SS defines the end (in UTC) of the included
+      measurement interval of length NSEC seconds (86400 seconds by
+      default).
+
+      Example:
+        bridgedb-metrics-end 2019-09-18 00:33:44 (86400 s)
+
+  "bridgedb-metrics-version" VERSION NL
+      [Exactly once.]
+
+      VERSION specifies the version of the metrics format.  We increment
+      VERSION whenever the format changes.  The current version is 1,
+      the first iteration of the metrics format.
+
+      Example:
+        bridgedb-metrics-version 1
+
+  "bridgedb-metric-count" METRIC_KEY COUNT NL
+      [Any number.]
+
+      METRIC_KEY is a metrics key that consists of five fields,
+      separated by periods:
+
+      DISTRIBUTION "." TRANSPORT "." CC/EMAIL "." ("success" | "fail") "." RESERVED
+
+      DISTRIBUTION is BridgeDB's distribution mechanism, which is one of
+      "https", "email", or "moat".  These distribution mechanisms may
+      change in the future.
+
+      TRANSPORT refers to a pluggable transport protocol, such as
+      "obfs2", "obfs3", "obfs4", "scramblesuit", or "fte".  The set of
+      supported pluggable transports will change in the future.
+
+      CC/EMAIL refers to the two-letter country code of the user's IP
+      address iff DISTRIBUTION is "moat" or "https", or to an email
+      provider iff DISTRIBUTION is "email".  We use two reserved country
+      codes, "??" and "zz".  "??" denotes that we could not map an IP
+      address to its country, e.g., because our geolocation API failed
+      to do so.  "zz" denotes a proxy IP address, such as a Tor exit
+      relay.  The two allowed email providers are "gmail" and "riseup".
+
+      The next field is either "success" or "fail", depending on whether
+      the BridgeDB request was successful.  A request is successful if
+      BridgeDB attempts to provide the user with bridges, even if
+      BridgeDB currently has no bridges available.  A request fails if
+      BridgeDB will not provide the user with bridges, for example,
+      because the user could not solve the CAPTCHA.
+
+      The field RESERVED is reserved for an anomaly score.  It is
+      currently set to "none" and should be ignored by implementations.
+
+      COUNT is the approximate number of user requests for the given
+      METRIC_KEY.  We round up the number of requests to the next
+      multiple of 10 to preserve some user privacy.
+
+      Examples:
+        bridgedb-metric-count https.scramblesuit.zz.fail.none 100
+        bridgedb-metric-count moat.obfs4.??.success.none 3550
+        bridgedb-metric-count email.fte.gmail.fail.none 10
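A minimal Python 3 sketch of a parser for the format above, which enforces the "exactly once" constraints and splits each METRIC_KEY into its fields. The names `parse_metrics`, `parse_key`, and `round_up` are hypothetical and not part of BridgeDB. `round_up` illustrates the rounding rule for COUNT under the assumption that exact multiples of 10 stay unchanged, so raw counts 21 through 30 all map to 30, matching the Zimbabwe example.

```python
import re
from datetime import datetime

# Line patterns for the three keywords defined above.
END_RE = re.compile(
    r"^bridgedb-metrics-end (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+) s\)$")
VERSION_RE = re.compile(r"^bridgedb-metrics-version (\d+)$")
COUNT_RE = re.compile(r"^bridgedb-metric-count (\S+) (\d+)$")


def round_up(count, bin_size=10):
    """Round a raw request count up to the next multiple of bin_size.
    Assumption: exact multiples (e.g. 30) are left unchanged."""
    return -(-count // bin_size) * bin_size


def parse_key(key):
    """Split a METRIC_KEY into its five period-separated fields."""
    fields = key.split(".")
    if len(fields) != 5:
        raise ValueError("expected 5 fields in %r" % key)
    distribution, transport, cc_or_email, outcome, _reserved = fields
    if outcome not in ("success", "fail"):
        raise ValueError("bad success/fail field in %r" % key)
    # RESERVED is currently "none" and is ignored here, as the spec advises.
    return distribution, transport, cc_or_email, outcome


def parse_metrics(text):
    """Return (end, interval, version, counts) for one metrics file."""
    lines = text.splitlines()
    m = END_RE.match(lines[0]) if lines else None
    if m is None:
        raise ValueError("file must start with bridgedb-metrics-end")
    end = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    interval = int(m.group(2))
    version = None
    counts = {}
    for line in lines[1:]:
        if not line:
            continue
        if END_RE.match(line):
            raise ValueError("bridgedb-metrics-end must appear exactly once")
        vm = VERSION_RE.match(line)
        if vm:
            if version is not None:
                raise ValueError("bridgedb-metrics-version must appear once")
            version = int(vm.group(1))
            continue
        cm = COUNT_RE.match(line)
        if cm:
            counts[parse_key(cm.group(1))] = int(cm.group(2))
            continue
        raise ValueError("unrecognized line: %r" % line)
    if version is None:
        raise ValueError("missing bridgedb-metrics-version")
    return end, interval, version, counts


sample = """bridgedb-metrics-end 2019-09-18 00:33:44 (86400 s)
bridgedb-metrics-version 1
bridgedb-metric-count https.scramblesuit.zz.fail.none 100
bridgedb-metric-count moat.obfs4.??.success.none 3550
bridgedb-metric-count email.fte.gmail.fail.none 10
"""
end, interval, version, counts = parse_metrics(sample)
print(interval, version)                             # 86400 1
print(counts[("moat", "obfs4", "??", "success")])    # 3550
print(round_up(21), round_up(30))                    # 30 30
```

Because the parser ignores RESERVED, it does not reject keys whose last field differs from "none".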



_______________________________________________
tor-commits mailing list
tor-commits@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-commits