[tor-commits] [metrics-web/master] Retire old user number estimates.
commit b9ce7127ccb722bfe2a368450ba4569d68cd11e3
Author: Karsten Loesing <karsten.loesing@xxxxxxx>
Date: Mon Oct 28 15:44:37 2013 +0100
Retire old user number estimates.
web/WEB-INF/users.jsp | 407 +++++++++++++++++--------------------------------
1 file changed, 143 insertions(+), 264 deletions(-)
diff --git a/web/WEB-INF/users.jsp b/web/WEB-INF/users.jsp
index 06d7f4a..788b3a8 100644
--- a/web/WEB-INF/users.jsp
+++ b/web/WEB-INF/users.jsp
@@ -16,238 +16,11 @@
<h2>Tor Metrics Portal: Users</h2>
-<a name="direct-users"></a>
-<h3><a href="#direct-users" class="anchor">Directly connecting Tor
-users</a></h3>
-<p>After being connected to the Tor network, users need to refresh their
-list of running relays on a regular basis. They send their requests to one
-out of a few hundred directory mirrors to save bandwidth of the directory
-authorities. The following graphs show an estimate of recurring Tor users
-based on the requests seen by a few dozen directory mirrors.</p>
-<p><b>Daily directly connecting users:</b></p>
-<img src="direct-users.png${direct_users_url}"
- width="576" height="360" alt="Direct users graph">
-<form action="users.html#direct-users">
- <div class="formrow">
- <input type="hidden" name="graph" value="direct-users">
- <p>
- <label>Start date (yyyy-mm-dd):</label>
- <input type="text" name="start" size="10"
- value="<c:choose><c:when test="${fn:length(direct_users_start) == 0}">${default_start_date}</c:when><c:otherwise>${direct_users_start[0]}</c:otherwise></c:choose>">
- <label>End date (yyyy-mm-dd):</label>
- <input type="text" name="end" size="10"
- value="<c:choose><c:when test="${fn:length(direct_users_end) == 0}">${default_end_date}</c:when><c:otherwise>${direct_users_end[0]}</c:otherwise></c:choose>">
- </p><p>
- Source: <select name="country">
- <option value="all"<c:if test="${direct_users_country[0] eq 'all'}"> selected</c:if>>All users</option>
- <c:forEach var="country" items="${countries}" >
- <option value="${country[0]}"<c:if test="${direct_users_country[0] eq country[0]}"> selected</c:if>>${country[1]}</option>
- </c:forEach>
- </select>
- </p><p>
- Show possible censorship events if available (<a
- href="http://research.torproject.org/techreports/detector-2011-09-09.pdf">BETA</a>)
- <select name="events">
- <option value="off">Off</option>
- <option value="on"<c:if test="${direct_users_events[0] eq 'on'}"> selected</c:if>>On: both points and expected range</option>
- <option value="points"<c:if test="${direct_users_events[0] eq 'points'}"> selected</c:if>>On: points only, no expected range</option>
- </select>
- </p><p>
- <input class="submit" type="submit" value="Update graph">
- </p>
- </div>
-<p>Download graph as
-<a href="direct-users.pdf${direct_users_url}">PDF</a> or
-<a href="direct-users.svg${direct_users_url}">SVG</a>.</p>
-<a name="direct-users-table"></a>
-<p><b>Top-10 countries by directly connecting users:</b></p>
-<form action="users.html#direct-users-table">
- <div class="formrow">
- <input type="hidden" name="table" value="direct-users">
- <p>
- <label>Start date (yyyy-mm-dd):</label>
- <input type="text" name="start" size="10"
- value="<c:choose><c:when test="${fn:length(direct_users_start) == 0}">${default_start_date}</c:when><c:otherwise>${direct_users_start[0]}</c:otherwise></c:choose>">
- <label>End date (yyyy-mm-dd):</label>
- <input type="text" name="end" size="10"
- value="<c:choose><c:when test="${fn:length(direct_users_end) == 0}">${default_end_date}</c:when><c:otherwise>${direct_users_end[0]}</c:otherwise></c:choose>">
- </p><p>
- <input class="submit" type="submit" value="Update table">
- </p>
- </div>
- <tr>
- <th>Country</th>
- <th>Mean daily users</th>
- </tr>
- <c:forEach var="row" items="${direct_users_tabledata}">
- <tr>
- <td><a href="users.html?graph=direct-users&country=${row['cc']}#direct-users">${row['country']}</a> </td>
- <td>${row['abs']} (<fmt:formatNumber type="number" minFractionDigits="2" value="${row['rel']}" /> %)</td>
- </tr>
- </c:forEach>
-<a name="censorship-events"></a>
-<p><b>Top-10 countries by possible censorship events (<a
- href="http://research.torproject.org/techreports/detector-2011-09-09.pdf">BETA</a>):</b></p>
-<form action="users.html#censorship-events">
- <div class="formrow">
- <input type="hidden" name="table" value="censorship-events">
- <p>
- <label>Start date (yyyy-mm-dd):</label>
- <input type="text" name="start" size="10"
- value="<c:choose><c:when test="${fn:length(censorship_events_start) == 0}">${default_start_date}</c:when><c:otherwise>${censorship_events_start[0]}</c:otherwise></c:choose>">
- <label>End date (yyyy-mm-dd):</label>
- <input type="text" name="end" size="10"
- value="<c:choose><c:when test="${fn:length(censorship_events_end) == 0}">${default_end_date}</c:when><c:otherwise>${censorship_events_end[0]}</c:otherwise></c:choose>">
- </p><p>
- <input class="submit" type="submit" value="Update table">
- </p>
- </div>
- <tr>
- <th>Country</th>
- <th>Downturns</th>
- <th>Upturns</th>
- </tr>
- <c:forEach var="row" items="${censorship_events_tabledata}">
- <tr>
- <td><a href="users.html?graph=direct-users&country=${row['cc']}&events=on#direct-users">${row['country']}</a> </td>
- <td>${row['downturns']}</td>
- <td>${row['upturns']}</td>
- </tr>
- </c:forEach>
-<p><a href="csv/direct-users.csv">CSV</a> file containing daily directly
-connecting users by country.</p>
-<p><a href="csv/monthly-users-peak.csv">CSV</a> file containing peak daily
-Tor users (direct and bridge) per month by country.</p>
-<p><a href="csv/monthly-users-average.csv">CSV</a> file containing average
-daily Tor users (direct and bridge) per month by country.</p>
-<a name="bridge-users"></a>
-<h3><a href="#bridge-users" class="anchor">Tor users via bridges</a></h3>
-<p>Users who cannot connect directly to the Tor network instead connect
-via bridges, which are non-public relays. The following graphs display an
-estimate of Tor users via bridges based on the unique IP addresses as seen
-by a few hundred bridges.</p>
-<img src="bridge-users.png${bridge_users_url}"
- width="576" height="360" alt="Bridge users graph">
-<form action="users.html#bridge-users">
- <div class="formrow">
- <input type="hidden" name="graph" value="bridge-users">
- <p>
- <label>Start date (yyyy-mm-dd):</label>
- <input type="text" name="start" size="10"
- value="<c:choose><c:when test="${fn:length(bridge_users_start) == 0}">${default_start_date}</c:when><c:otherwise>${bridge_users_start[0]}</c:otherwise></c:choose>">
- <label>End date (yyyy-mm-dd):</label>
- <input type="text" name="end" size="10"
- value="<c:choose><c:when test="${fn:length(bridge_users_end) == 0}">${default_end_date}</c:when><c:otherwise>${bridge_users_end[0]}</c:otherwise></c:choose>">
- </p><p>
- Source: <select name="country">
- <option value="all"<c:if test="${bridge_users_country[0] eq 'all'}"> selected</c:if>>All users</option>
- <c:forEach var="country" items="${countries}" >
- <option value="${country[0]}"<c:if test="${bridge_users_country[0] eq country[0]}"> selected</c:if>>${country[1]}</option>
- </c:forEach>
- </select>
- </p><p>
- <input class="submit" type="submit" value="Update graph">
- </p>
- </div>
-<p>Download graph as
-<a href="bridge-users.pdf${bridge_users_url}">PDF</a> or
-<a href="bridge-users.svg${bridge_users_url}">SVG</a>.</p>
-<a name="bridge-users-table"></a>
-<p><b>Top-10 countries by bridge users:</b></p>
-<form action="users.html#bridge-users-table">
- <div class="formrow">
- <input type="hidden" name="table" value="bridge-users">
- <p>
- <label>Start date (yyyy-mm-dd):</label>
- <input type="text" name="start" size="10"
- value="<c:choose><c:when test="${fn:length(bridge_users_start) == 0}">${default_start_date}</c:when><c:otherwise>${bridge_users_start[0]}</c:otherwise></c:choose>">
- <label>End date (yyyy-mm-dd):</label>
- <input type="text" name="end" size="10"
- value="<c:choose><c:when test="${fn:length(bridge_users_end) == 0}">${default_end_date}</c:when><c:otherwise>${bridge_users_end[0]}</c:otherwise></c:choose>">
- </p><p>
- <input class="submit" type="submit" value="Update table">
- </p>
- </div>
- <tr>
- <th>Country</th>
- <th>Mean daily users</th>
- </tr>
- <c:forEach var="row" items="${bridge_users_tabledata}">
- <tr>
- <td><a href="users.html?graph=bridge-users&country=${row['cc']}#bridge-users">${row['country']}</a> </td>
- <td>${row['abs']} (<fmt:formatNumber type="number" minFractionDigits="2" value="${row['rel']}" /> %)</td>
- </tr>
- </c:forEach>
-<p><a href="csv/bridge-users.csv">CSV</a> file containing all data.</p>
-<p><a href="csv/monthly-users-peak.csv">CSV</a> file containing peak daily
-Tor users (direct and bridge) per month by country.</p>
-<p><a href="csv/monthly-users-average.csv">CSV</a> file containing average
-daily Tor users (direct and bridge) per month by country.</p>
-<a name="userstats"></a>
-<h3><a href="#userstats" class="anchor">New approach to estimating daily
-Tor users (BETA)</a></h3>
-<p>As of April 2013, we are experimenting with a new approach to estimating
-daily Tor users.
-The new approach works very similar to the existing approach to estimate
-directly connecting users, but can also be applied to bridge users.
-This new approach can break down user numbers by country, pluggable
-transport, and IP version.
-See the tech report on
-<a href="https://research.torproject.org/techreports/counting-daily-bridge-users-2012-10-24.pdf">counting daily bridge users</a>
-and the
-<a href="https://gitweb.torproject.org/metrics-tasks.git/tree/HEAD:/task-8462">source code</a>
-for details.
<a name="userstats-relay-country"></a>
-<p><b>Direct users by country (BETA):</b></p>
-<font color="red">
-<p>This graph is quite similar to the graphs above,
-except for the following differences:</p>
-<li>In contrast to the graphs above, this graph is based on
-requests to directory mirrors <i>and</i> directory authorities.
-The idea is that we want to estimate both new and recurring users.
-That is why the numbers here are higher.</li>
-<li>This graph uses byte histories for written <i>directory bytes</i>
-rather than general byte history to weight what fraction of directory
-requests a relay has answered in the network.</li>
-<li>The implementation behind this graph is much more efficient, which
-reduces time to graph from about 3 days to about 1 day.</li>
+<p><b>Direct users by country:</b></p>
<img src="userstats-relay-country.png${userstats_relay_country_url}"
- width="576" height="360" alt="Direct users by country graph (BETA)">
+ width="576" height="360" alt="Direct users by country graph">
<form action="users.html#userstats-relay-country">
<div class="formrow">
<input type="hidden" name="graph" value="userstats-relay-country">
@@ -283,7 +56,7 @@ reduces time to graph from about 3 days to about 1 day.</li>
<a href="userstats-relay-country.svg${userstats_relay_country_url}">SVG</a>.</p>
<a name="userstats-relay-table"></a>
-<p><b>Top-10 countries by directly connecting users (BETA):</b></p>
+<p><b>Top-10 countries by directly connecting users:</b></p>
<form action="users.html#userstats-relay-table">
<div class="formrow">
<input type="hidden" name="table" value="userstats-relay">
@@ -349,16 +122,10 @@ reduces time to graph from about 3 days to about 1 day.</li>
<a name="userstats-bridge-country"></a>
-<p><b>Bridge users by country (BETA):</b></p>
-<font color="red">In contrast to the bridge-user graph above, this graph
-uses directory requests to estimate user numbers, not unique IP address sets.
-It's yet to be decided which approach is more correct.</font>
+<p><b>Bridge users by country:</b></p>
<img src="userstats-bridge-country.png${userstats_bridge_country_url}"
- width="576" height="360" alt="Bridge users by country graph (BETA)">
+ width="576" height="360" alt="Bridge users by country graph">
<form action="users.html#userstats-bridge-country">
<div class="formrow">
<input type="hidden" name="graph" value="userstats-bridge-country">
@@ -386,7 +153,7 @@ It's yet to be decided which approach is more correct.</font>
<a href="userstats-bridge-country.svg${userstats_bridge_country_url}">SVG</a>.</p>
<a name="userstats-bridge-table"></a>
-<p><b>Top-10 countries by bridge users (BETA):</b></p>
+<p><b>Top-10 countries by bridge users:</b></p>
<form action="users.html#userstats-bridge-table">
<div class="formrow">
<input type="hidden" name="table" value="userstats-bridge">
@@ -418,19 +185,10 @@ It's yet to be decided which approach is more correct.</font>
<a name="userstats-bridge-transport"></a>
-<p><b>Bridge users by transport (BETA):</b></p>
-<font color="red">Almost none of the currently running bridges report the
-transport name of connecting users, which is why non-OR transport usage is
-so low.
-By default, we consider all users of a bridge OR transport users, unless told
-otherwise.
-Non-OR transport numbers will become more accurate over time.</font>
+<p><b>Bridge users by transport:</b></p>
<img src="userstats-bridge-transport.png${userstats_bridge_transport_url}"
- width="576" height="360" alt="Bridge users by transport graph (BETA)">
+ width="576" height="360" alt="Bridge users by transport graph">
<form action="users.html#userstats-bridge-transport">
<div class="formrow">
<input type="hidden" name="graph" value="userstats-bridge-transport">
@@ -460,18 +218,10 @@ Non-OR transport numbers will become more accurate over time.</font>
<a name="userstats-bridge-version"></a>
-<p><b>Bridge users by IP version (BETA):</b></p>
-<font color="red">Not all of the currently running bridges report the
-IP version of connecting users.
-By default, we consider all users of a bridge IPv4 users, unless told
-otherwise.
-IPv6 numbers will become more accurate over time.</font>
+<p><b>Bridge users by IP version:</b></p>
<img src="userstats-bridge-version.png${userstats_bridge_version_url}"
- width="576" height="360" alt="Bridge users by IP version graph (BETA)">
+ width="576" height="360" alt="Bridge users by IP version graph">
<form action="users.html#userstats-bridge-version">
<div class="formrow">
<input type="hidden" name="graph" value="userstats-bridge-version">
@@ -498,14 +248,143 @@ IPv6 numbers will become more accurate over time.</font>
<p><a href="csv/userstats.csv">CSV</a> file containing new user
-estimates (BETA).</p>
+estimates.</p>
<p><a href="csv/monthly-userstats-peak.csv">CSV</a> file containing peak
-daily Tor users (direct and bridge) per month by country (BETA).</p>
+daily Tor users (direct and bridge) per month by country.</p>
<p><a href="csv/monthly-userstats-average.csv">CSV</a> file containing
-average daily Tor users (direct and bridge) per month by country
-(BETA).</p>
+average daily Tor users (direct and bridge) per month by country.</p>
+<a name="questions-and-answers"></a>
+<p><b>Questions and answers</b></p>
+Q: How is it even possible to count users in an anonymity network?<br/>
+A: We actually don't count users, but we count requests to the directories
+that clients make periodically to update their list of relays and estimate
+user numbers indirectly from there.
+Q: Do all directories report these directory request numbers?<br/>
+A: No, but we can see what fraction of directories reported them, and then
+we can extrapolate the total number in the network.
+Q: How do you get from these directory requests to user numbers?<br/>
+A: We assume that the average client makes 10 such requests per day.
+A tor client that is connected 24/7 makes about 15 requests per day, but
+not all clients are connected 24/7, so we picked the number 10 for the
+average client. We simply divide directory requests by 10 and consider
+the result as the number of users.
+Q: So, are these distinct users per day, average number of users connected
+over the day, or what?<br/>
+A: Average number of users connected over the day. We can't say how many
+distinct users there are.
+Q: Are these tor clients or users? What if there's more than one user
+behind a tor client?<br/>
+A: Then we count those users as one. We really count clients, but it's
+more intuitive for most people to think of users, which is why we say
+users and not clients.
+Q: What if a user runs tor on a laptop and changes their IP address a few
+times per day? Don't you overcount that user?<br/>
+A: No, because that user updates their list of relays as often as a user
+that doesn't change IP address over the day.
+Q: How do you know which countries users come from?<br/>
+A: The directories resolve IP addresses to country codes and report these
+numbers in aggregate form. This is one of the reasons why tor ships with
+a GeoIP database.
+Q: Why are there so few bridge users that are not using the default OR
+protocol or that are using IPv6?<br/>
+A: Very few bridges report data on transports or IP versions yet, and by
+default we consider requests to use the default OR protocol and IPv4.
+Once more bridges report these data, the numbers will become more
+accurate.
+Q: Why do the graphs end 2 days in the past and not today?<br/>
+A: Relays and bridges report some of the data in 24-hour intervals which
+may end at any time of the day. And after such an interval is over, relays
+and bridges might take another 18 hours to report the data. We cut off
+the last two days from the graphs because we want to avoid the last data
+point in a graph indicating a recent trend change which is in fact just
+an artifact of the algorithm.
+Q: But I noticed that the last data point went up/down a bit since I last
+looked a few hours ago. Why is that?<br/>
+A: You're an excellent observer! The reason is that we publish user
+numbers once we're confident enough that they won't change significantly
+anymore. But it's always possible that a directory reports data a few
+hours after we were confident enough, but which then slightly changed the
+Q: Why are no numbers available before September 2011?<br/>
+A: We do have descriptor archives from before that time, but those
+descriptors didn't contain all the data we use to estimate user numbers.
+We do have older user numbers from an earlier estimation approach here
+(add link), but we believe the current approach is more accurate.
+Q: Why do you believe the current approach to estimate user numbers is
+more accurate?<br/>
+A: For direct users, we include all directories, which we didn't do in the
+old approach. We also use histories that only contain bytes written to
+answer directory requests, which is more precise than using general byte
+histories.
+Q: And what about the advantage of the current approach over the old one
+when it comes to bridge users?<br/>
+A: Oh, that's a whole different story. We wrote a 13-page
+<a href="https://research.torproject.org/techreports/counting-daily-bridge-users-2012-10-24.pdf">technical
+report</a> explaining the reasons for retiring the old approach. But the
+old data is still <a href="/data/old-user-number-estimates.tar.gz">available</a>.
+tl;dr: in the old approach we measured the wrong thing, and now we measure
+the right thing.
+Q: Are the data and the source code for estimating these user numbers
+publicly available?<br/>
+A: Sure, <a href="/data.html">data</a> and
+<a href="https://gitweb.torproject.org/metrics-tasks.git/tree/HEAD:/task-8462">source
+code</a> are publicly available.
+Q: What are these red and blue dots indicating possible censorship
+events?<br/>
+A: We run an anomaly-based censorship-detection system that looks at
+estimated user numbers over a series of days and predicts the user number
+in the next days. If the actual number is higher or lower, this might
+indicate a possible censorship event or release of censorship. For more
+details, see our
+<a href="https://research.torproject.org/techreports/detector-2011-09-09.pdf">technical
+report</a>.
<div class="bottom" id="bottom">
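The Q&A in this commit describes the estimate in two steps: extrapolate the directory requests seen by the directories that reported them to the whole network, then divide by an assumed 10 requests per client per day. A minimal Python sketch of that arithmetic follows. It is not the metrics-web implementation; the `DirectoryReport` type, the `estimate_daily_users` function, and computing the reported fraction from bytes written to answer directory requests (as the removed BETA notes describe) are illustrative assumptions.

```python
from dataclasses import dataclass

# Assumed average number of directory requests per client per day (from the Q&A).
REQUESTS_PER_CLIENT_PER_DAY = 10


@dataclass
class DirectoryReport:
    """Hypothetical per-directory record for a single day."""
    requests: int           # directory requests the directory reported
    dir_bytes_written: int  # bytes it wrote answering directory requests


def estimate_daily_users(reports, total_dir_bytes_written):
    """Sketch: scale reported requests up to the whole network, then divide by 10.

    The fraction of directory activity covered by the reports is assumed to be
    the reporting directories' share of all bytes written for directory requests.
    """
    if total_dir_bytes_written <= 0:
        raise ValueError("total_dir_bytes_written must be positive")
    reported_bytes = sum(r.dir_bytes_written for r in reports)
    fraction = reported_bytes / total_dir_bytes_written
    if fraction == 0:
        raise ValueError("no directory reported any data")
    reported_requests = sum(r.requests for r in reports)
    total_requests = reported_requests / fraction  # extrapolate to the network
    return total_requests / REQUESTS_PER_CLIENT_PER_DAY
```

For example, if the reporting directories wrote half of all directory bytes and saw 400,000 requests between them, the sketch extrapolates to 800,000 requests and therefore 80,000 average daily users.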
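The censorship-event answer describes predicting user numbers from the preceding days and flagging days whose actual number falls outside the expected range. The sketch below shows only that general idea, using a deliberately simple stand-in model: a rolling mean plus or minus `k` sample standard deviations. It is not the detector described in the linked technical report, and `flag_events`, `window`, and `k` are hypothetical names and parameters.

```python
from statistics import mean, stdev


def flag_events(daily_users, window=7, k=3.0):
    """Return (day_index, "downturn" | "upturn") for days outside the expected range.

    Stand-in model (an assumption, not Tor's detector): the prediction for a day
    is the mean of the previous `window` days, and the expected range is that
    mean plus or minus k sample standard deviations of the same days.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a spread")
    events = []
    for i in range(window, len(daily_users)):
        history = daily_users[i - window:i]
        predicted = mean(history)
        spread = k * stdev(history)
        actual = daily_users[i]
        if actual < predicted - spread:
            events.append((i, "downturn"))  # possible censorship event
        elif actual > predicted + spread:
            events.append((i, "upturn"))    # possible release of censorship
    return events
```

A rolling window keeps the prediction tied to recent days; this sketch ignores effects such as weekly usage patterns, which a production detector would have to handle.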
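One way to read the two-day cutoff explained in the Q&A is as a bound on when the last data for a given day can arrive. Assuming (our reading, not stated explicitly in the commit) that a 24-hour reporting interval overlapping the end of day $D$ can end up to 24 hours after day $D$ ends, and that the report then takes up to 18 more hours to be published:

```latex
t_{\mathrm{last}} \le t_{\mathrm{end}}(D) + 24\,\mathrm{h} + 18\,\mathrm{h}
  = t_{\mathrm{end}}(D) + 42\,\mathrm{h}
  < t_{\mathrm{end}}(D) + 48\,\mathrm{h} = t_{\mathrm{end}}(D+2)
```

Under these assumptions, the data for day $D$ is in before day $D+2$ is over.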
tor-commits mailing list