Re: [tor-dev] Sanitizing and publishing our web server logs



What exactly are we hoping to gain from the analysis of the (hopefully correctly) stripped logs?

On 09/02/2011 09:06 AM, Sebastian Hahn wrote:
On Sep 2, 2011, at 2:46 PM, Karsten Loesing wrote:

Hi Andrew,

On 9/2/11 2:18 AM, Andrew Lewman wrote:
On Thursday, August 25, 2011 04:08:00 Karsten Loesing wrote:
we have been discussing sanitizing and publishing our web server logs
for quite a while now.  The idea is to remove all potentially sensitive
parts from the logs, publish them in monthly tarballs on the metrics
website, and analyze them for top visited pages, top downloaded
packages, etc.  See the tickets #1641 and #2489 for details.
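As a rough illustration of what "removing potentially sensitive parts" could look like for one line in Apache's Combined Log Format, here is a minimal Python sketch. Which fields to drop is an assumption made for this example only (client address, user, time of day, query string, referrer, user agent); the actual rules are discussed in the tickets, and the function name `sanitize_line` is hypothetical.

```python
import re
from typing import Optional

# Apache Combined Log Format:
# host ident user [timestamp] "request" status size "referrer" "user-agent"
CLF = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$'
)

def sanitize_line(line: str) -> Optional[str]:
    """Return a sanitized copy of one log line, or None if it does not parse.

    Assumed policy (illustrative, not the project's actual rules):
    zero the client address, drop ident/user, keep only the day of the
    timestamp, strip the query string, and blank referrer and user agent.
    """
    m = CLF.match(line.strip())
    if m is None:
        return None
    _host, _ident, _user, ts, request, status, size, _ref, _ua = m.groups()

    # Keep only the date part, e.g. "25/Aug/2011"; time and offset are dropped.
    day = ts.split(":", 1)[0]

    parts = request.split(" ")
    if len(parts) == 3:
        method, path, proto = parts
        path = path.split("?", 1)[0]  # query strings may carry user input
        request = f"{method} {path} {proto}"
    else:
        request = "-"  # malformed request line: keep nothing

    return f'0.0.0.0 - - [{day}:00:00:00 +0000] "{request}" {status} {size} "-" "-"'
```

The output stays in Combined Log Format, so existing log analyzers should still be able to read it.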
My concern is that we have the data at all.  We shouldn't have any
sensitive information logged on the webservers. Therefore sanitizing the
logs should not be necessary.
My concern is that we remove details from the logs and learn in a few
months that we wanted to analyze them.  I'd like to sanitize the
existing logs first, make them available for people to analyze, and only
change the Apache configuration once we're really sure we found the
level of detail that we want.  There's no rush in changing the Apache
configuration now, right?
So, if we decide in a few months that we need more detail, we can
change the logging then. Sure, we won't have history, but that just
means that the graphs we make start in 2012 instead of 2007.

Finally, we'll have to find a way to encode the country code in the logs
and still keep Apache's Combined Log Format.  And do we still care about
the HTTP vs. HTTPS bit?  Because if we use the IP column for the country
code, we'll have to encode the HTTP/HTTPS thing somewhere else.
IP addresses have plenty of bits for a country code and the HTTP/HTTPS
encoding; we could, for example, use the first bytes for the country code.
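To make the idea concrete, here is a minimal Python sketch of one possible layout. The layout itself is an assumption for illustration, not something agreed on in this thread: the first two octets carry the ASCII values of the two-letter country code, the third octet is 1 for HTTPS and 0 for HTTP, and the fourth is unused. The function names are hypothetical.

```python
from typing import Tuple

def encode_pseudo_ip(country_code: str, is_https: bool) -> str:
    """Pack a two-letter country code and an HTTPS flag into a dotted quad.

    Assumed layout: <ord(c1)>.<ord(c2)>.<https flag>.0
    """
    cc = country_code.upper()
    if len(cc) != 2 or not cc.isascii() or not cc.isalpha():
        raise ValueError(f"expected a two-letter ASCII country code, got {country_code!r}")
    return f"{ord(cc[0])}.{ord(cc[1])}.{1 if is_https else 0}.0"

def decode_pseudo_ip(pseudo_ip: str) -> Tuple[str, bool]:
    """Recover (country code, HTTPS flag) from a value made by encode_pseudo_ip."""
    first, second, flag, _unused = (int(part) for part in pseudo_ip.split("."))
    return chr(first) + chr(second), flag == 1
```

For example, `encode_pseudo_ip("de", True)` gives `68.69.1.0`. The column still looks like an IPv4 address, so tools that expect Combined Log Format keep working, but the values must not be mistaken for real client addresses.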

So, it should be possible to implement GeoIP lookups in the future.  I'd
like to consider that a separate task from sanitizing the existing web
logs, though.
It's separate, but without the on-the-fly geoip lookups we won't have
any, because the sanitizing process doesn't get them magically.

All the best
Sebastian
_______________________________________________
tor-dev mailing list
tor-dev@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
