> On 18 Jul 2017, at 04:05, Karsten Loesing <karsten@xxxxxxxxxxxxxx> wrote:
>
> Hello list,
>
> it's been almost two years since we started collecting sanitized Apache
> web server logs. During this time the number of Tor Browser initial
> downloads rarely went below 70,000 per day.
>
> https://metrics.torproject.org/webstats-tb.html
>
> Either there must be a steady demand for fresh binaries, or there is a
> non-zero number of bots downloading the Tor Browser binary several times
> per day.
>
> I already double-checked our aggregation code that takes sanitized web
> server logs as input and produces daily totals as output. It looks okay
> to me.
>
> I'd also like to double-check whether there's anything unexpected
> happening before the sanitizing step. For example, could it be that
> there are a few IP addresses making hundreds or thousands of requests?
>
> Or are there lots of requests with same referrers or common user agents
> indicating bots?
>
> My plan is to ask our admins to temporarily add a second Apache log file
> on one of the dist.torproject.org hosts with the default Apache log file
> format without the sanitizing that is usually applied.
>
> A snapshot of 15 or 30 minutes would likely be sufficient as sample. I'd
> analyze this log file on the server, delete it, and report my findings here.
>
> This message has two purposes:
>
> 1. Is this approach acceptable? If not, are there more acceptable
> approaches yielding similar results?

Can you get similar results with a default Apache log file, with the
following changes:
* remove timestamps
* sort lines to destroy the original order

Without precise timing information, the data would be a lot less sensitive.

It might also be useful to know the distribution of requests over a
24-hour period, without any other details. This might help you work out
how the activity is being triggered.

> 2. Are there any theories what might keep the numbers from dropping
> below those 70,000 requests per day? What should I be looking for?

There are 86,400 seconds in a day, which means that we're getting a
little under 1 request per second. This could be a single bot caught in
a loop.

Are you only counting GET requests?
Do you count incomplete downloads?
(A continually failing automated download process could cause this.)

T

--
Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
xmpp: teor at torproject dot org
------------------------------------------------------------------------
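The log changes suggested above (remove timestamps, sort lines, and
separately keep only a per-hour request count) could be sketched roughly
as follows. This is a minimal illustration, not the metrics team's code.
It assumes the default Apache common/combined log format with a bracketed
timestamp field. The function names and the sample lines (which use
documentation-range IP addresses) are made up for the example.

```python
import re
from collections import Counter

# Assumption: timestamps look like [18/Jul/2017:04:05:12 +0000], as in
# Apache's default common/combined log formats. Group 1 is the hour.
TIMESTAMP = re.compile(r"\[\d{2}/\w{3}/\d{4}:(\d{2}):\d{2}:\d{2} [+-]\d{4}\]")


def strip_and_sort(lines):
    """Replace each timestamp with a placeholder, then sort the lines
    so that the original request order is lost."""
    return sorted(TIMESTAMP.sub("[-]", line.rstrip("\n")) for line in lines)


def requests_per_hour(lines):
    """Count requests per hour of day (in the logged timezone),
    keeping no other detail from the log lines."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def average_rate(requests_per_day):
    """Average requests per second over an 86,400-second day."""
    return requests_per_day / 86_400


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = [
        '198.51.100.7 - - [18/Jul/2017:04:05:12 +0000] "GET /a HTTP/1.1" 200 10',
        '192.0.2.1 - - [18/Jul/2017:03:59:01 +0000] "GET /b HTTP/1.1" 200 10',
    ]
    for line in strip_and_sort(sample):
        print(line)
    print(dict(requests_per_hour(sample)))
    print(round(average_rate(70_000), 2))
```

Sorting after stripping means lines from the same address end up adjacent,
which makes repeated requesters easy to spot, while the hourly counts are
computed separately so the two outputs cannot be joined back together.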
_______________________________________________
tor-project mailing list
tor-project@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-project