Re: [tor-bugs] #22428 [Metrics/CollecTor]: Add webstats module
#22428: Add webstats module
-------------------------------+---------------------------------
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  enhancement        |         Status:  needs_revision
 Priority:  High               |      Milestone:  CollecTor 1.5.0
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  metrics-2017       |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+---------------------------------
Changes (by karsten):
* status: needs_review => needs_revision
Comment:
Alright, I finished an initial review of commit 086e904 in your
task-22428-4 branch. I have several trivial or minor findings, but I'd
like to postpone them until we have resolved one that I consider major:
I'm unclear whether the sibling approach is robust enough to cover all
cases and edge cases. Maybe even worse, I'm unclear whether we'd notice if
we ran into an uncovered edge case, or whether we'd silently skip
processing and therefore lose data.
For example, what happens if we sanitize logs from a server that receives
''very'' few requests, maybe only a few requests per week? Consider these
original log files (where I scrubbed the virtual host name):
- `scrubbed.torproject.org-access.log-20171001.gz` contains requests from
2017-09-30 and 2017-10-01.
- `scrubbed.torproject.org-access.log-20171002.gz` contains requests from
2017-10-01 only.
- `scrubbed.torproject.org-access.log-20171004.gz` contains requests from
2017-10-03 only.
- `scrubbed.torproject.org-access.log-20171006.gz` contains requests from
2017-10-05 and 2017-10-06.
Would the existing code produce logs for 2017-10-01, -03, -05, and -06
with exactly the sanitized log lines from these original log files? (I
didn't run it; I only read the code and am unclear about this.)
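
To make the expected outcome concrete, here is a minimal sketch of grouping log lines by the date in each request's timestamp rather than by the date in the file name. It is written in Python for brevity and is not the CollecTor code. The function name, the input shape (file name mapped to a list of lines), and the Apache-style timestamp format are all assumptions for illustration.

```python
import re
from collections import defaultdict

# Assumption: each line carries an Apache-style timestamp,
# for example [30/Sep/2017:00:00:00 +0000].
TIMESTAMP = re.compile(r"\[(\d{2})/([A-Za-z]{3})/(\d{4}):")
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def group_by_request_date(files):
    """Group lines by request date (YYYY-MM-DD), ignoring file names.

    `files` maps a file name to its list of log lines (hypothetical
    input shape). Lines without a recognizable timestamp are returned
    separately as (file name, line) pairs instead of being dropped.
    """
    by_date = defaultdict(list)
    unparsable = []
    for name in sorted(files):
        for line in files[name]:
            match = TIMESTAMP.search(line)
            if match is None or match.group(2) not in MONTHS:
                unparsable.append((name, line))
                continue
            day, month, year = match.groups()
            key = "%s-%02d-%s" % (year, MONTHS[month], day)
            by_date[key].append(line)
    return dict(by_date), unparsable
```

Fed the four files above with one line per request date, this sketch yields five dates, including 2017-09-30, and 2017-10-01 collects lines from two different files. Whether a day like 2017-09-30 should be written out is a policy question that the sketch does not answer.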
Here's another, related question: what happens if a web server rotates
logs more often than once per day? The specification at least mentions
that possibility. I'm not sure how this would work with file names, so
maybe we in fact require that logs are rotated exactly once per day, and
we just didn't write that in the specification yet. However, it seems
rather restrictive to prescribe exact log rotation intervals just so that
we can sanitize logs afterwards. Maybe we should be less restrictive here.
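
One way to find out whether the once-per-day assumption holds for a given set of input files is to inspect the file names alone. The following is a rough Python sketch, not CollecTor code. It assumes names of the form `<virtual-host>-access.log-YYYYMMDD.gz` as in the examples above, and the function name is hypothetical. Names that do not fit the pattern, which is where more frequent rotation would probably show up, are reported rather than ignored.

```python
import re
from datetime import date, timedelta

# Assumption: file names look like <virtual-host>-access.log-YYYYMMDD.gz.
NAME = re.compile(
    r"^(?P<host>.+)-access\.log-(?P<y>\d{4})(?P<m>\d{2})(?P<d>\d{2})\.gz$")


def rotation_anomalies(names):
    """Return (unrecognized, gaps) for a list of log file names.

    `unrecognized` lists names that do not match the assumed pattern or
    carry an invalid date. `gaps` lists (host, earlier, later) for
    consecutive file dates of one host that are not exactly one day apart.
    """
    dates_by_host = {}
    unrecognized = []
    for name in names:
        match = NAME.match(name)
        if match is None:
            unrecognized.append(name)
            continue
        try:
            day = date(int(match["y"]), int(match["m"]), int(match["d"]))
        except ValueError:  # for example month 13 or day 32
            unrecognized.append(name)
            continue
        dates_by_host.setdefault(match["host"], set()).add(day)

    gaps = []
    for host, days in sorted(dates_by_host.items()):
        ordered = sorted(days)
        for earlier, later in zip(ordered, ordered[1:]):
            if later - earlier != timedelta(days=1):
                gaps.append((host, earlier.isoformat(), later.isoformat()))
    return unrecognized, gaps
```

For the four example files above, this reports two gaps: 2017-10-02 to 2017-10-04 and 2017-10-04 to 2017-10-06.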
Is there a way to make this approach more robust? And is there a way to
ensure that we'll learn about any broken assumptions as early as possible?
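
As one possible answer to the second question, a simple accounting check could fail loudly whenever an input line is neither written to a sanitized log nor explicitly rejected. This is only a Python sketch of the idea, with hypothetical names and counters, and not a proposal for the exact interface in CollecTor.

```python
class LostLinesError(Exception):
    """Raised when input lines are neither written nor explicitly rejected."""


def check_accounting(lines_read, lines_written, lines_rejected):
    """Verify that every line read is accounted for.

    All three arguments are plain counts collected during one run
    (hypothetical counters). Returns the counts as a dict on success
    and raises LostLinesError if the numbers do not add up.
    """
    unaccounted = lines_read - lines_written - lines_rejected
    if unaccounted != 0:
        raise LostLinesError(
            "%d of %d input lines unaccounted for (written=%d, rejected=%d)"
            % (unaccounted, lines_read, lines_written, lines_rejected))
    return {"read": lines_read, "written": lines_written,
            "rejected": lines_rejected}
```

Run at the end of each sanitizing pass, a check like this turns silent data loss into a visible error.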
Ah, and do you mind doing another round of JavaDoc editing and variable
renaming towards finding a middle ground between
2-characters-is-almost-verbose and
80-characters-can-fit-in-a-line-so-let-us-not-use-more-than-79? As a
fixup/squash commit without rebasing, please. :) Thank you!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22428#comment:36>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online