[tor-commits] [metrics-web/master] Explain sorting more prominently.
commit 76688eec38bc02a2813ae9bf9a72f1f1c2c239c3
Author: iwakeh <iwakeh@xxxxxxxxxxxxxx>
Date: Tue Nov 14 08:39:29 2017 +0000
Explain sorting more prominently.
Also make the point that normal web log analyzers can operate on sanitized logs.
Improvements were suggested by Sebastian, cf. ticket-23243.
---
.../src/main/resources/spec/web-server-logs.xml | 17 +++++++------
.../main/resources/web/WEB-INF/web-server-logs.jsp | 29 +++++++++++++---------
2 files changed, 27 insertions(+), 19 deletions(-)
diff --git a/website/src/main/resources/spec/web-server-logs.xml b/website/src/main/resources/spec/web-server-logs.xml
index d8efe53..13cfad7 100644
--- a/website/src/main/resources/spec/web-server-logs.xml
+++ b/website/src/main/resources/spec/web-server-logs.xml
@@ -20,7 +20,7 @@
</front>
<middle>
<section title="Purpose of this document">
- <t>BETA: As of November 8, 2017, this document is still under
+ <t>BETA: As of November 14, 2017, this document is still under
discussion and subject to change without prior notice. Feel free
to <eref target="/about.html#contact">contact us</eref> for questions or
concerns regarding this document.</t>
@@ -174,6 +174,12 @@ mod_log_config module</eref>.</t>
<section title="Re-assembling log files">
<t>Rewritten log lines are re-assembled into sanitized log files based
on physical host, virtual host, and request start date.</t>
+ <t>All rewritten log lines are sorted alphabetically, so that request
+ order cannot be inferred from sanitized log files.</t>
+ <t>Many of the sanitized log lines will now be identical.
+ But in order to not remove too much useful information we keep the
+ identical log lines and thus enable typical web log analyzers to
+ operate on the sanitized log files. </t>
<t>The naming convention for sanitized log files is:
<list>
<t><virtual-host>_<physical-host>_access.log_YYYYMMDD[.xz]</t>
@@ -190,12 +196,9 @@ mod_log_config module</eref>.</t>
'dist.torproject.org', are more familiar to the public and were therefore
chosen to be the first naming component.
</t>
- <t>As last and certainly not least important sanitizing step, all
- rewritten log lines are sorted alphabetically, so that request order
- cannot be inferred from sanitized log files.</t>
- <t>Sanitized log files are typically compressed before publication. In
- particular the sorting step allows for highly efficient compression
- rates. We typically use XZ for compression, which is indicated by
+ <t>Sanitized log files are typically compressed before publication.
+ The sorting step also allows for highly efficient compression rates.
+ We typically use XZ for compression, which is indicated by
appending ".xz" to log file names, but this is subject to change.</t>
</section>
</section>
diff --git a/website/src/main/resources/web/WEB-INF/web-server-logs.jsp b/website/src/main/resources/web/WEB-INF/web-server-logs.jsp
index b1505df..5e9cc79 100644
--- a/website/src/main/resources/web/WEB-INF/web-server-logs.jsp
+++ b/website/src/main/resources/web/WEB-INF/web-server-logs.jsp
@@ -22,7 +22,7 @@
"#rfc.section.1">1.</a> <a href=
"#n-purpose-of-this-document">Purpose of this document</a></h2>
<div id="rfc.section.1.p.1">
-<p>BETA: As of November 8, 2017, this document is still under
+<p>BETA: As of November 14, 2017, this document is still under
discussion and subject to change without prior notice. Feel free to
<a href="/about.html#contact">contact us</a> for questions or
concerns regarding this document.</p>
@@ -254,6 +254,16 @@ of processing that format.</p>
based on physical host, virtual host, and request start date.</p>
</div>
<div id="rfc.section.4.3.p.2">
+<p>All rewritten log lines are sorted alphabetically, so that
+request order cannot be inferred from sanitized log files.</p>
+</div>
+<div id="rfc.section.4.3.p.3">
+<p>Many of the sanitized log lines will now be identical. But in
+order to not remove too much useful information we keep the
+identical log lines and thus enable typical web log analyzers to
+operate on the sanitized log files.</p>
+</div>
+<div id="rfc.section.4.3.p.4">
<p>The naming convention for sanitized log files is:</p>
<ul class="empty">
<li>
@@ -262,7 +272,7 @@ based on physical host, virtual host, and request start date.</p>
<p>The underscore is a separator symbol between the various parts
of the filename.</p>
</div>
-<div id="rfc.section.4.3.p.3">
+<div id="rfc.section.4.3.p.5">
<p>Sanitized log files may additionally be sorted into directories
by virtual host and date as in:</p>
<ul class="empty">
@@ -273,17 +283,12 @@ by virtual host and date as in:</p>
'dist.torproject.org', are more familiar to the public and were
therefore chosen to be the first naming component.</p>
</div>
-<div id="rfc.section.4.3.p.4">
-<p>As last and certainly not least important sanitizing step, all
-rewritten log lines are sorted alphabetically, so that request
-order cannot be inferred from sanitized log files.</p>
-</div>
-<div id="rfc.section.4.3.p.5">
+<div id="rfc.section.4.3.p.6">
<p>Sanitized log files are typically compressed before publication.
-In particular the sorting step allows for highly efficient
-compression rates. We typically use XZ for compression, which is
-indicated by appending ".xz" to log file names, but this is subject
-to change.</p>
+The sorting step also allows for highly efficient compression
+rates. We typically use XZ for compression, which is indicated by
+appending ".xz" to log file names, but this is subject to
+change.</p>
</div>
</section>
</div> <!-- container -->
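
Note on the re-assembling step described in the diff above: it can be sketched roughly as follows. This is a minimal illustration, not the actual metrics-web code. The function names, the sample host name "host1", and the use of plain code-point ordering for "alphabetically" are all assumptions made for this example.

```python
import lzma
from datetime import date


def reassemble(lines):
    """Sort rewritten log lines so request order cannot be inferred.

    Identical lines are kept (no de-duplication), so per-line counts
    remain usable by ordinary web log analyzers.
    Assumption: "alphabetically" is taken to mean code-point order.
    """
    return sorted(lines)


def sanitized_file_name(virtual_host, physical_host, day, compressed=True):
    """Build <virtual-host>_<physical-host>_access.log_YYYYMMDD[.xz]."""
    name = f"{virtual_host}_{physical_host}_access.log_{day:%Y%m%d}"
    return name + ".xz" if compressed else name


def compress(lines):
    """XZ-compress the sorted lines, one log line per text line."""
    text = "".join(line + "\n" for line in lines)
    return lzma.compress(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical sanitized lines; two of them are identical on purpose.
    rewritten = ["b GET /", "a GET /", "b GET /"]
    ordered = reassemble(rewritten)
    print(sanitized_file_name("dist.torproject.org", "host1", date(2017, 11, 14)))
    print(ordered)
    print(len(compress(ordered)), "bytes after XZ compression")
```

Keeping duplicates rather than collapsing them is the point made in the new spec paragraph: sorting hides request order, while the repeated lines still carry the counts that log analyzers rely on.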