Re: [tor-bugs] #23243 [Metrics/Website]: Write a specification for Tor web server logs
#23243: Write a specification for Tor web server logs
-----------------------------+--------------------------------
Reporter: iwakeh | Owner: metrics-team
Type: enhancement | Status: needs_revision
Priority: Medium | Milestone:
Component: Metrics/Website | Version:
Severity: Normal | Resolution:
Keywords: metrics-2017 | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-----------------------------+--------------------------------
Changes (by iwakeh):
* status: merge_ready => needs_revision
Comment:
The spec might need to be extended:
The implementation of the CollecTor webstats module triggered more
questions about the way original logs are supplied. One piece of
information that is so far only supplied indirectly is the cue for when a
log is finished. In detail:
* Functionality for bulk imports of log files is necessary. Thus, the
implementation can no longer rely on the system date to decide when a log
day is complete. (This distinguishes the reference date as defined in the
spec from the 'log for a day', which means that all log lines for a given
date are available.)
* Implicit assumption: input log files can be empty or contain no valid
lines, as long as their naming pattern matches the rules.
* The current spec allows only for one input log per reference date (per
virtual plus physical host).
* Log lines for a particular log day could be spread over two successive
log files (as defined in the current spec).
* Implicit cue: all log lines for a certain reference date are available
when the log for that reference date and the log for its successor are
both available. This also means that a log for a day without an immediate
successor is not complete, i.e. it won't be processed. The cue could be
given in the form of an empty successor log file. This cue has to be
supplied from outside and cannot be determined by the implementation.
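The implicit cue from the last item can be sketched as follows. This is
only a minimal illustration in Python, not the CollecTor implementation;
the function name `complete_reference_dates` and the representation of the
available input logs as a set of dates (for one virtual plus physical
host) are assumptions made for this example.

```python
from datetime import date, timedelta


def complete_reference_dates(available):
    """Return the reference dates whose immediate successor is available.

    `available` is a set of datetime.date objects, one per input log file
    found for a single virtual plus physical host. Empty files count too,
    because only their presence serves as the cue.
    """
    one_day = timedelta(days=1)
    return sorted(d for d in available if d + one_day in available)


# Hypothetical input: logs exist for Oct 5, 6 and 8, but not for Oct 7.
available = {date(2017, 10, 5), date(2017, 10, 6), date(2017, 10, 8)}
print(complete_reference_dates(available))
```

With this input only 2017-10-05 counts as complete: 2017-10-06 and
2017-10-08 both lack an immediate successor and would have to wait.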
Related is another question from #22428 comment:36
> Here's another, related question: what happens if a web server rotates
> logs more often than once per day? At least that's something that we write
> in the specification. I'm not sure how this would work with file names, so
> maybe we in fact require that logs are rotated exactly once per day, and
> we just didn't write that in the specification yet. However, it seems
> rather restrictive to prescribe exact log rotation intervals in order to
> sanitize logs subsequently. Maybe we should be less restrictive here.
It doesn't really matter if the log lines for a certain day are spread
over two or more input files. Currently, however, only one input file per
reference date is possible (the first one wins).
More input files could be supplied by extending the input log name pattern
with a dash followed by an integer, i.e.,
`scrubbed.torproject.org-access.log-20171006-77.gz`. In such a case it
should be required that
* counting starts with one (arbitrary).
* there are no gaps, i.e., if there is a file with 3, there have to be
files with 2 and 1 for the same combination of virtual host, physical
host, and date.
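The two rules above could be checked roughly like this. The sketch is in
Python and rests on assumptions: the regular expression is derived only
from the example file name given here, the physical host is not part of
that name and is ignored, and `parse_name` and `has_no_gaps` are
hypothetical helper names.

```python
import re

# Assumed shape: <virtual host>-access.log-<YYYYMMDD>[-<n>][.<compression>]
NAME = re.compile(
    r"^(?P<vhost>.+)-access\.log-(?P<date>\d{8})"
    r"(?:-(?P<num>[1-9]\d*))?(?:\.(?P<ext>\w+))?$"
)


def parse_name(name):
    """Split a file name into (virtual host, date string, number).

    The number is None when the name carries no numeric suffix.
    Returns None when the name does not match the assumed pattern.
    """
    m = NAME.match(name)
    if m is None:
        return None
    num = int(m.group("num")) if m.group("num") else None
    return m.group("vhost"), m.group("date"), num


def has_no_gaps(numbers):
    """True if the numbers are exactly 1, 2, ..., n without duplicates."""
    return sorted(numbers) == list(range(1, len(numbers) + 1))


print(parse_name("scrubbed.torproject.org-access.log-20171006-77.gz"))
print(has_no_gaps([1, 2, 3]), has_no_gaps([1, 3]))
```

Grouping the parsed names by virtual host and date and passing each
group's numbers to `has_no_gaps` would then flag a file numbered 3 that
has no companions numbered 1 and 2.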
Again, a cue is needed for when the log day is complete. As above, this
could be the input file with number 1 for the immediately succeeding
reference date. This cue could also be an empty file.
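For numbered input files, the cue might be evaluated as in the following
Python sketch. It assumes a hypothetical mapping from reference date to
the set of file numbers found for one virtual plus physical host, and it
combines the cue with the no-gaps rule, which is a choice made for this
example only.

```python
from datetime import date, timedelta


def is_log_day_complete(files, day):
    """Check the cue for `day` given a mapping of date -> set of numbers.

    A day counts as complete when its own files are numbered 1..n without
    gaps and file number 1 exists for the following reference date.
    """
    numbers = files.get(day, set())
    successor = files.get(day + timedelta(days=1), set())
    gap_free = numbers == set(range(1, len(numbers) + 1))
    return bool(numbers) and gap_free and 1 in successor


# Hypothetical input: three files for Oct 6, one (possibly empty) for Oct 7.
files = {date(2017, 10, 6): {1, 2, 3}, date(2017, 10, 7): {1}}
print(is_log_day_complete(files, date(2017, 10, 6)))
print(is_log_day_complete(files, date(2017, 10, 7)))
```

Here 2017-10-06 is complete, while 2017-10-07 is not, because no file
for 2017-10-08 has been supplied yet.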
Remarks:
The way the cue is given is arbitrary, but the current implementation
suggestion already works with the method described above.
The naming pattern is just an arbitrary suggestion. So improvements are
welcome.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/23243#comment:45>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online