[or-cvs] r20734: {projects} Add a Python version of the directory archive parsing script (projects/archives/trunk/exonerator)
Author: kloesing
Date: 2009-10-03 11:40:31 -0400 (Sat, 03 Oct 2009)
New Revision: 20734
Added:
projects/archives/trunk/exonerator/exonerator.py
Modified:
projects/archives/trunk/exonerator/ExoneraTor.java
projects/archives/trunk/exonerator/HOWTO
Log:
Add a Python version of the directory archive parsing script.
Modified: projects/archives/trunk/exonerator/ExoneraTor.java
===================================================================
--- projects/archives/trunk/exonerator/ExoneraTor.java 2009-10-02 15:57:37 UTC (rev 20733)
+++ projects/archives/trunk/exonerator/ExoneraTor.java 2009-10-03 15:40:31 UTC (rev 20734)
@@ -377,16 +377,16 @@
if (inTooOldConsensuses && !inTooNewConsensuses)
System.out.println("\nNote that we found a matching relay in "
+ "consensuses that were published between 5:00 and 3:00 "
- + "hours before " + timestampStr + ". ");
+ + "hours before " + timestampStr + ".");
else if (!inTooOldConsensuses && inTooNewConsensuses)
System.out.println("\nNote that we found a matching relay in "
+ "consensuses that were published up to 2:00 hours after "
- + timestampStr + ". ");
+ + timestampStr + ".");
else
System.out.println("\nNote that we found a matching relay in "
+ "consensuses that were published between 5:00 and 3:00 "
+ "hours before and in consensuses that were published up "
- + "to 2:00 hours after " + timestampStr + ". ");
+ + "to 2:00 hours after " + timestampStr + ".");
System.out.println("Make sure that the timestamp you provided is "
+ "in the correct timezone: UTC (or GMT).");
}
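The Java change above only trims a trailing space from the notes ExoneraTor prints when a match falls outside the 3:00-hour window. The decision those notes belong to can be summarized as one small function. This is a simplified Python 3 sketch of the logic in exonerator.py, not code from the commit. The function name `verdict`, its arguments and the returned strings are hypothetical. Consensus file names are assumed to sort chronologically, which is also what the script assumes when it picks the most recent relevant consensus.

```python
def verdict(matches, relevant, too_old, too_new, missing_descriptors=False):
    """Return (result, timing note) following ExoneraTor's decision order.

    All set arguments hold consensus file names (assumed to sort
    chronologically); matches are the consensuses listing a matching relay.
    """
    if relevant and max(relevant) in matches:
        # listed in the most recent consensus before the given time
        return "POSITIVE with high certainty", None
    note = None
    if matches & relevant:
        result = "POSITIVE with moderate certainty"
    else:
        result = "NEGATIVE with high certainty"
        old, new = bool(matches & too_old), bool(matches & too_new)
        if old and new:
            note = ("between 5:00 and 3:00 hours before and up to "
                    "2:00 hours after")
        elif old:
            note = "between 5:00 and 3:00 hours before"
        elif new:
            note = "up to 2:00 hours after"
    if missing_descriptors:
        # some referenced server descriptors could not be checked
        result = "INDECISIVE"
    return result, note
```

As in the script, a match in the last relevant consensus ends the check immediately, while missing descriptors only replace the headline and keep the explanatory note.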
Modified: projects/archives/trunk/exonerator/HOWTO
===================================================================
--- projects/archives/trunk/exonerator/HOWTO 2009-10-02 15:57:37 UTC (rev 20733)
+++ projects/archives/trunk/exonerator/HOWTO 2009-10-03 15:40:31 UTC (rev 20734)
@@ -24,14 +24,51 @@
prints out all intermediate steps in answering this, so that users can
confirm the correctness of the result themselves.
+This script is available in two versions with equivalent functionality,
+one written in Python and one written in Java.
+
---------------------------------------------------------------------------
-Quick Start:
+Python Quick Start:
-In order to run this script, you need to install and download the following
-software and data (please note that all instructions are written for Linux;
-commands for Windows or Mac OS X may vary):
+In order to run the Python version of this script, you need to install and
+download the following software and data (please note that all instructions
+are written for Linux; commands for Windows or Mac OS X may vary):
+- Install Python 2.6.2 or higher. (Previous Python versions might work,
+ too, but have not been tested.)
+
+- Download the consensuses-*.tar.bz2 and server-descriptors-*.tar.bz2
+ archives covering the interval from 5:00 hours before to 2:00 hours
+ after the time in question from
+ http://archive.torproject.org/tor-directory-authority-archive/ and
+ extract them to a directory in your working directory, e.g.
+ /home/you/exonerator/data/ . Don't rename the extracted directories or
+ any of the contained files, or the script won't find the contained
+ descriptors.
+
+- Run the script, providing it with the parameters it needs:
+
+ python exonerator.py <descriptor archive directory>
+ <IP address in question>
+ <timestamp, in UTC, formatted as YYYY-MM-DD hh:mm:ss>
+ [<target address>[:<target port>]]
+
+ Make sure that the timestamp is provided in UTC, which is similar to GMT,
+ and not in your local timezone! Otherwise, results will very likely be
+ wrong.
+
+ A sample invocation might be:
+
+ $ python exonerator.py data/ 209.17.171.104 2009-08-15 16:05:00 \
+ 209.85.129.104:80
+
+---------------------------------------------------------------------------
+
+Java Quick Start:
+
+In order to run the Java version of this script, you need to install and
+download the following software and data (please note that all instructions
+are written for Linux; commands for Windows or Mac OS X may vary):
+
- Install Java 6 or higher.
- Download the BouncyCastle provider that includes Base 64 decoding from
@@ -80,26 +117,34 @@
- Positive result of echelon1+2 being a relay:
+ $ python exonerator.py data/ 209.17.171.104 2009-08-15 16:05:00
$ java -cp .:bcprov-jdk16-143.jar ExoneraTor data/ 209.17.171.104 \
2009-08-15 16:05:00
- Positive result of echelon1+2 exiting to google.com on any port
+ $ python exonerator.py data/ 209.17.171.104 2009-08-15 16:05:00 \
+ 209.85.129.104
$ java -cp .:bcprov-jdk16-143.jar ExoneraTor data/ 209.17.171.104 \
2009-08-15 16:05:00 209.85.129.104
- Positive result of echelon1+2 exiting to google.com on port 80
+ $ python exonerator.py data/ 209.17.171.104 2009-08-15 16:05:00 \
+ 209.85.129.104:80
$ java -cp .:bcprov-jdk16-143.jar ExoneraTor data/ 209.17.171.104 \
2009-08-15 16:05:00 209.85.129.104:80
- Negative result of echelon1+2 exiting to google.com, but not on port 25
+ $ python exonerator.py data/ 209.17.171.104 2009-08-15 16:05:00 \
+ 209.85.129.104:25
$ java -cp .:bcprov-jdk16-143.jar ExoneraTor data/ 209.17.171.104 \
2009-08-15 16:05:00 209.85.129.104:25
- Negative result with IP address of echelon1+2 changed in the last octet
+ $ python exonerator.py data/ 209.17.171.50 2009-08-15 16:05:00
$ java -cp .:bcprov-jdk16-143.jar ExoneraTor data/ 209.17.171.50 \
2009-08-15 16:05:00
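Both versions need the archives for every month touched by the interval from 5:00 hours before to 2:00 hours after the given UTC time, and they sort consensuses into "too old", "relevant" and "too new" buckets within that interval. The following Python 3 sketch mirrors those offsets with `datetime`. The helper names `required_archives` and `classify` are hypothetical and not part of the commit.

```python
from datetime import datetime, timedelta

# Offsets as used by exonerator.py: consensuses from 5:00 hours before to
# 2:00 hours after the given time are inspected, and the "relevant" ones
# are those from the 3:00 hours preceding it.
TOO_OLD = timedelta(hours=5)
FROM = timedelta(hours=3)
TOO_NEW = timedelta(hours=2)


def required_archives(timestamp_str, with_target=False):
    """Return the archive directory names needed for a UTC timestamp."""
    # naive datetimes are never converted, so the local time zone is ignored
    ts = datetime.strptime(timestamp_str, "%Y-%m-%d %H:%M:%S")
    prefixes = ["consensuses"]
    if with_target:
        prefixes.append("server-descriptors")
    names = set()
    for bound in (ts - TOO_OLD, ts + TOO_NEW):
        for prefix in prefixes:
            names.add("%s-%s" % (prefix, bound.strftime("%Y-%m")))
    return sorted(names)


def classify(published, ts):
    """Bucket a consensus publication time relative to the given time ts."""
    if ts - TOO_OLD <= published < ts - FROM:
        return "too old"
    if ts - FROM <= published <= ts:
        return "relevant"
    if ts < published <= ts + TOO_NEW:
        return "too new"
    return None


# the sample invocation from the HOWTO only needs the August 2009 archives
print(required_archives("2009-08-15 16:05:00", with_target=True))
# prints ['consensuses-2009-08', 'server-descriptors-2009-08']
```

A time close to a month boundary, such as 2009-09-01 02:00:00, pulls in both the August and the September archives.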
Added: projects/archives/trunk/exonerator/exonerator.py
===================================================================
--- projects/archives/trunk/exonerator/exonerator.py (rev 0)
+++ projects/archives/trunk/exonerator/exonerator.py 2009-10-03 15:40:31 UTC (rev 20734)
@@ -0,0 +1,369 @@
+#!/usr/bin/env python
+# Copyright 2009 The Tor Project -- see LICENSE for licensing information
+
+import binascii
+import os
+import sys
+import time
+
+# check parameters
+if len(sys.argv) not in (5, 6):
+    print "\nUsage: python exonerator.py <descriptor archive directory> " \
+            "<IP address in question> <timestamp, in UTC, formatted as " \
+            "YYYY-MM-DD hh:mm:ss> [<target address>[:<target port>]]\n"
+    sys.exit()
+archiveDirectory = sys.argv[1]
+if not os.path.isdir(archiveDirectory):
+    print "\nDescriptor archive directory %s does not exist or is not a " \
+            "directory.\n" % os.path.abspath(archiveDirectory)
+    sys.exit()
+# strip a trailing slash; os.path.dirname() would turn "data" into ""
+archiveDirectory = os.path.normpath(archiveDirectory)
+relayIP = sys.argv[2]
+timestampStr = "%s %s" % (sys.argv[3], sys.argv[4])
+os.environ['TZ'] = 'UTC'
+time.tzset()
+timestamp = time.strptime(timestampStr, "%Y-%m-%d %H:%M:%S")
+# if a target is given, parse address and possibly port part of it
+target = None
+targetIP = None
+targetPort = None
+if len(sys.argv) == 6:
+    target = sys.argv[5]
+    targetParts = target.split(":")
+    targetIP = targetParts[0]
+    if len(targetParts) == 2:
+        targetPort = targetParts[1]
+    targetIPParts = targetIP.split(".")
+DELIMITER = "-----------------------------------------------------------" \
+        "----------------"
+targetHelpStr = ""
+if target:
+    targetHelpStr = " permitting exiting to %s" % target
+print "\nTrying to find out whether %s was running a Tor relay at " \
+        "%s%s...\n\n%s\n" % (relayIP, timestampStr, targetHelpStr, DELIMITER)
+
+# check that we have the required archives
+timestampTooOld = time.gmtime(time.mktime(timestamp) - 300 * 60)
+timestampFrom = time.gmtime(time.mktime(timestamp) - 180 * 60)
+timestampTooNew = time.gmtime(time.mktime(timestamp) + 120 * 60)
+timestampTooOldStr = time.strftime("%Y-%m-%d %H:%M:%S", timestampTooOld)
+timestampFromStr = time.strftime("%Y-%m-%d %H:%M:%S", timestampFrom)
+timestampTooNewStr = time.strftime("%Y-%m-%d %H:%M:%S", timestampTooNew)
+print "\nChecking that relevant archives between %s and %s are " \
+        "available..." % (timestampTooOldStr, timestampTooNewStr)
+
+requiredDirs = set()
+requiredDirs.add(time.strftime("consensuses-%Y-%m", timestampTooOld))
+requiredDirs.add(time.strftime("consensuses-%Y-%m", timestampTooNew))
+if target is not None:
+    requiredDirs.add(time.strftime("server-descriptors-%Y-%m",
+            timestampTooOld))
+    requiredDirs.add(time.strftime("server-descriptors-%Y-%m",
+            timestampTooNew))
+
+consensusDirs = list()
+descriptorsDirs = list()
+directoriesLeftToParse = list()
+directoriesLeftToParse.append(archiveDirectory)
+
+while len(directoriesLeftToParse) > 0:
+    directoryOrFile = directoriesLeftToParse.pop()
+    basename = os.path.basename(directoryOrFile)
+    if basename.startswith("consensuses-"):
+        if basename in requiredDirs:
+            requiredDirs.remove(basename)
+        consensusDirs.append(directoryOrFile)
+    elif basename.startswith("server-descriptors-"):
+        if basename in requiredDirs:
+            requiredDirs.remove(basename)
+        descriptorsDirs.append(directoryOrFile)
+    else:
+        for filename in os.listdir(directoryOrFile):
+            entry = "%s/%s" % (directoryOrFile, filename)
+            if os.path.isdir(entry):
+                directoriesLeftToParse.append(entry)
+
+consensusDirs.sort()
+for file in consensusDirs:
+    print " %s" % file
+descriptorsDirs.sort()
+for file in descriptorsDirs:
+    print " %s" % file
+
+if len(requiredDirs) > 0:
+    print "\nWe are missing consensuses and/or server descriptors. " \
+            "Please download these archives and extract them to your data " \
+            "directory. Be sure NOT to rename the extracted directories " \
+            "or the contained files."
+    missingFiles = list()
+    for file in sorted(requiredDirs):
+        print " %s.tar.bz2" % file
+    sys.exit()
+
+# look for consensus files
+print "\nLooking for relevant consensuses between %s and %s..." % \
+        (timestampFromStr, timestampStr)
+tooOldConsensuses = set()
+relevantConsensuses = set()
+tooNewConsensuses = set()
+directoriesLeftToParse = list()
+for file in consensusDirs:
+    directoriesLeftToParse.append(file)
+while len(directoriesLeftToParse) > 0:
+    directoryOrFile = directoriesLeftToParse.pop()
+    if os.path.isdir(directoryOrFile):
+        for filename in os.listdir(directoryOrFile):
+            entry = "%s/%s" % (directoryOrFile, filename)
+            directoriesLeftToParse.append(entry)
+    else:
+        basename = os.path.basename(directoryOrFile)
+        if basename.endswith("consensus"):
+            # compare only year to second: strptime() sets tm_isdst to -1,
+            # gmtime() sets it to 0, which would skew exact-boundary matches
+            consensusTime = time.strptime(basename[0:19],
+                    "%Y-%m-%d-%H:%M:%S")[:6]
+            if timestampTooOld[:6] <= consensusTime < timestampFrom[:6]:
+                tooOldConsensuses.add(directoryOrFile)
+            elif timestampFrom[:6] <= consensusTime <= timestamp[:6]:
+                relevantConsensuses.add(directoryOrFile)
+            elif timestamp[:6] < consensusTime <= timestampTooNew[:6]:
+                tooNewConsensuses.add(directoryOrFile)
+allConsensuses = set()
+for file in tooOldConsensuses:
+    allConsensuses.add(file)
+for file in relevantConsensuses:
+    allConsensuses.add(file)
+for file in tooNewConsensuses:
+    allConsensuses.add(file)
+if len(allConsensuses) == 0:
+    print " None found!\n\n%s\n\nResult is INDECISIVE!\n\nWe cannot " \
+            "make any statement about IP address %s being a relay at %s " \
+            "or not! We did not find any relevant consensuses preceding " \
+            "the given time. This either means that you did not download " \
+            "and extract the consensus archives preceding the hours " \
+            "before the given time, or (in rare cases) that the directory " \
+            "archives are missing the hours before the timestamp. Please " \
+            "check that your directory archives contain consensus files " \
+            "of the interval 5:00 hours before and 2:00 hours after the " \
+            "time you are looking for.\n" % \
+            (DELIMITER, relayIP, timestampStr)
+    sys.exit()
+for file in sorted(relevantConsensuses):
+    print " %s" % file
+
+# parse consensuses to find descriptors belonging to the IP address
+print "\nLooking for descriptor identifiers referenced in \"r \" lines " \
+        "in these consensuses containing IP address %s..." % relayIP
+positiveConsensusesNoTarget = set()
+addressesInSameNetwork = set()
+relevantDescriptors = dict()
+for consensus in allConsensuses:
+    if consensus in relevantConsensuses:
+        print " %s" % consensus
+    file = open(consensus, "r")
+    line = file.readline()
+    while line:
+        if line.startswith("r "):
+            address = line.split(" ")[6]
+            if address == relayIP:
+                hexDesc = binascii.b2a_hex(binascii.a2b_base64(
+                        line.split(" ")[3] + "=="))
+                if hexDesc not in relevantDescriptors:
+                    relevantDescriptors[hexDesc] = set()
+                relevantDescriptors[hexDesc].add(consensus)
+                positiveConsensusesNoTarget.add(consensus)
+                if consensus in relevantConsensuses:
+                    print " \"%s\" references descriptor %s" % \
+                            (line.rstrip(), hexDesc)
+            # keep the trailing dot, so that 209.17.17.x is not taken to
+            # be in the same /24 network as 209.17.171.x
+            elif relayIP.startswith(address[0:address.rfind(".") + 1]):
+                addressesInSameNetwork.add(address)
+        line = file.readline()
+    file.close()
+if len(relevantDescriptors) == 0:
+    print " None found!\n\n%s\n\nResult is NEGATIVE with moderate " \
+            "certainty!\n\nWe did not find IP address %s in any of the " \
+            "consensuses that were published between %s and %s.\n\nA " \
+            "possible reason for false negatives is that the relay is " \
+            "using a different IP address when generating a descriptor " \
+            "than for exiting to the Internet. We hope to provide better " \
+            "checks for this case in the future." % \
+            (DELIMITER, relayIP, timestampTooOldStr, timestampTooNewStr)
+    if len(addressesInSameNetwork) > 0:
+        print "\nThe following other IP addresses of Tor relays were " \
+                "found in the mentioned consensus files that are in the " \
+                "same /24 network and that could be related to IP address " \
+                "%s:" % relayIP
+        for addr in addressesInSameNetwork:
+            print " %s" % addr
+    print ""
+    sys.exit()
+
+# parse router descriptors to check exit policies
+positiveConsensuses = set()
+missingDescriptors = set()
+if target is not None:
+    print "\nChecking if referenced descriptors permit exiting to " \
+            "%s..." % target
+    descriptors = relevantDescriptors.keys()
+    for desc in descriptors:
+        missingDescriptors.add(desc)
+    directoriesLeftToParse = list()
+    for descriptorsDir in descriptorsDirs:
+        directoriesLeftToParse.append(descriptorsDir)
+    while len(directoriesLeftToParse) > 0:
+        directoryOrFile = directoriesLeftToParse.pop()
+        if os.path.isdir(directoryOrFile):
+            for filename in os.listdir(directoryOrFile):
+                entry = "%s/%s" % (directoryOrFile, filename)
+                directoriesLeftToParse.append(entry)
+        else:
+            basename = os.path.basename(directoryOrFile)
+            for descriptor in descriptors:
+                if basename == descriptor:
+                    # discard() tolerates a descriptor found twice
+                    missingDescriptors.discard(descriptor)
+                    file = open(directoryOrFile, "r")
+                    line = file.readline()
+                    while line:
+                        if line.startswith("reject ") or \
+                                line.startswith("accept "):
+                            ruleAccept = line.split()[0] == "accept"
+                            ruleAddress = line.split()[1].split(":")[0]
+                            if ruleAddress != "*":
+                                if '/' not in ruleAddress and \
+                                        ruleAddress != targetIP:
+                                    # IP address does not match
+                                    line = file.readline()
+                                    continue
+                                ruleIPParts = ruleAddress.split("/")[0]. \
+                                        split(".")
+                                # a rule without a mask that got this far
+                                # equals targetIP, so no bits are left
+                                ruleNetwork = 0
+                                if '/' in ruleAddress:
+                                    ruleNetwork = int(ruleAddress. \
+                                            split("/")[1])
+                                for i in range(0, 4):
+                                    if ruleNetwork == 0:
+                                        break
+                                    elif ruleNetwork >= 8:
+                                        if ruleIPParts[i] == \
+                                                targetIPParts[i]:
+                                            ruleNetwork -= 8
+                                        else:
+                                            break
+                                    else:
+                                        # compare the remaining high bits
+                                        mask = 255 ^ 255 >> ruleNetwork
+                                        if int(ruleIPParts[i]) & mask == \
+                                                int(targetIPParts[i]) & mask:
+                                            ruleNetwork = 0
+                                        break
+                                if ruleNetwork > 0:
+                                    # IP address does not match
+                                    line = file.readline()
+                                    continue
+                            rulePort = line.split()[1].split(":")[1]
+                            if targetPort is None and not ruleAccept and \
+                                    rulePort != "*":
+                                # with no port given, we only consider
+                                # reject :* rules as matching
+                                line = file.readline()
+                                continue
+                            if targetPort and rulePort != "*":
+                                # rule ports are single ports or ranges
+                                # such as "6660-6667"
+                                rulePorts = rulePort.split("-")
+                                if not (int(rulePorts[0]) <=
+                                        int(targetPort) <=
+                                        int(rulePorts[-1])):
+                                    # ports do not match
+                                    line = file.readline()
+                                    continue
+                            relevantMatch = False
+                            for f in relevantDescriptors.get(descriptor):
+                                if f in relevantConsensuses:
+                                    relevantMatch = True
+                            if relevantMatch:
+                                if ruleAccept:
+                                    print " %s permits exiting to %s " \
+                                            "according to rule \"%s\"" % \
+                                            (directoryOrFile, target,
+                                            line.rstrip())
+                                else:
+                                    print " %s does not permit exiting " \
+                                            "to %s according to rule " \
+                                            "\"%s\"" % (directoryOrFile,
+                                            target, line.rstrip())
+                            if ruleAccept:
+                                for consensus in \
+                                        relevantDescriptors.get(descriptor):
+                                    positiveConsensuses.add(consensus)
+                            break
+                        line = file.readline()
+                    file.close()
+
+# print out result
+matches = None
+if target:
+    matches = positiveConsensuses
+else:
+    matches = positiveConsensusesNoTarget
+# relevantConsensuses is empty if we only found consensuses published
+# between 5:00 and 3:00 hours before or up to 2:00 hours after the time
+if relevantConsensuses and max(relevantConsensuses) in matches:
+    print "\n%s\n\nResult is POSITIVE with high certainty!\n\nWe found " \
+            "one or more relays on IP address %s%s in the most recent " \
+            "consensus preceding %s that clients were likely to know.\n" % \
+            (DELIMITER, relayIP, targetHelpStr, timestampStr)
+    sys.exit()
+resultIndecisive = target and len(missingDescriptors) > 0
+if resultIndecisive:
+    print "\n%s\n\nResult is INDECISIVE!\n\nAt least one referenced " \
+            "descriptor could not be found. This is a rare case, but one " \
+            "that (apparently) happens. We cannot make any good statement " \
+            "about exit relays without these descriptors. The following " \
+            "descriptors are missing:" % DELIMITER
+    for desc in missingDescriptors:
+        print " %s" % desc
+inOtherRelevantConsensus = False
+inTooOldConsensuses = False
+inTooNewConsensuses = False
+for f in matches:
+    if f in relevantConsensuses:
+        inOtherRelevantConsensus = True
+    elif f in tooOldConsensuses:
+        inTooOldConsensuses = True
+    elif f in tooNewConsensuses:
+        inTooNewConsensuses = True
+if inOtherRelevantConsensus:
+    if not resultIndecisive:
+        print "\n%s\n\nResult is POSITIVE with moderate certainty!" % \
+                DELIMITER
+    print "\nWe found one or more relays on IP address %s%s, but not in " \
+            "the consensus immediately preceding %s. A possible reason " \
+            "for the relay being missing in the last consensus preceding " \
+            "the given time might be that some of the directory " \
+            "authorities had difficulties connecting to the relay. " \
+            "However, clients might still have used the relay." % (relayIP,
+            targetHelpStr, timestampStr)
+else:
+    if not resultIndecisive:
+        print "\n%s\n\nResult is NEGATIVE with high certainty!" % \
+                DELIMITER
+    print "\nWe did not find any relay on IP address %s%s in the " \
+            "consensuses 3:00 hours preceding %s." % (relayIP, targetHelpStr,
+            timestampStr)
+    if inTooOldConsensuses or inTooNewConsensuses:
+        if inTooOldConsensuses and not inTooNewConsensuses:
+            print "\nNote that we found a matching relay in consensuses " \
+                    "that were published between 5:00 and 3:00 hours " \
+                    "before %s." % timestampStr
+        elif not inTooOldConsensuses and inTooNewConsensuses:
+            print "\nNote that we found a matching relay in consensuses " \
+                    "that were published up to 2:00 hours after %s." % \
+                    timestampStr
+        else:
+            print "\nNote that we found a matching relay in consensuses " \
+                    "that were published between 5:00 and 3:00 hours " \
+                    "before and in consensuses that were published up to " \
+                    "2:00 hours after %s." % timestampStr
+        print "Make sure that the timestamp you provided is in the " \
+                "correct timezone: UTC (or GMT)."
+if target:
+    if len(positiveConsensuses) == 0 and \
+            len(positiveConsensusesNoTarget) > 0:
+        print "\nNote that although the found relay(s) did not permit " \
+                "exiting to %s there have been one or more relays running " \
+                "at the given time." % target
+print ""
+
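In the consensus parsing step, exonerator.py takes the fourth field of each "r " line (the unpadded base64 descriptor digest) and turns it into the hex string that it later compares against server descriptor file names. It also collects other relays in the same /24 network. A Python 3 sketch of both steps follows. The helper names `descriptor_digest_hex` and `same_slash24` are hypothetical, and the field layout in the docstring is the one implied by the script's `split(" ")` indices.

```python
import base64
import binascii


def descriptor_digest_hex(r_line):
    """Return (IP address, hex descriptor digest) from a consensus "r " line.

    Assumed field layout:
    r <nickname> <identity> <digest> <date> <time> <IP> <ORPort> <DirPort>
    """
    fields = r_line.split()
    digest_b64 = fields[3]
    # consensuses omit base64 padding; add back exactly what is missing
    padded = digest_b64 + "=" * (-len(digest_b64) % 4)
    digest = base64.b64decode(padded)
    return fields[6], binascii.hexlify(digest).decode("ascii")


def same_slash24(address, other):
    """True if two dotted-quad IPv4 addresses share their first three octets."""
    return address.split(".")[:3] == other.split(".")[:3]
```

Comparing whole octets avoids the pitfall of a plain string prefix test, where 209.17.17.5 would appear to share a prefix with 209.17.171.104.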
Property changes on: projects/archives/trunk/exonerator/exonerator.py
___________________________________________________________________
Added: svn:executable
+ *
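The exit-policy check walks a descriptor's accept and reject lines and stops at the first rule that matches the target, which is how Tor evaluates exit policies. The Python 3 sketch below expresses that with the standard `ipaddress` module instead of the script's octet-by-octet mask arithmetic. The function name `policy_permits` is hypothetical; the sketch handles IPv4 only and also accepts port ranges such as 6660-6667.

```python
import ipaddress


def policy_permits(policy_lines, target_ip, target_port=None):
    """Evaluate accept/reject lines first-match-wins for an IPv4 target.

    Returns (True or False, matching line), or (None, None) if no rule
    matches. The address part is "*", a single address or address/bits;
    the port part is "*", a single port or a low-high range.
    """
    target = ipaddress.IPv4Address(target_ip)
    for line in policy_lines:
        parts = line.split()
        if len(parts) != 2 or parts[0] not in ("accept", "reject"):
            continue  # not a policy line
        accept = parts[0] == "accept"
        address, port = parts[1].rsplit(":", 1)
        if address != "*":
            network = ipaddress.IPv4Network(address, strict=False)
            if target not in network:
                continue
        if target_port is None:
            # as in exonerator.py: without a target port, any matching
            # accept rule counts, but only reject rules for all ports do
            if not accept and port != "*":
                continue
        elif port != "*":
            low, _, high = port.partition("-")
            if not int(low) <= target_port <= int(high or low):
                continue
        return accept, line
    return None, None
```

For example, with the policy `accept *:80`, `reject *:25`, `reject *:*`, a target of 209.85.129.104 is permitted on port 80 but not on port 25.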