[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Re: [tor-bugs] #5047 [Obfsproxy]: Implement basic usage statistics in obfsproxy



#5047: Implement basic usage statistics in obfsproxy
-------------------------+--------------------------------------------------
 Reporter:  karsten      |          Owner:  asn
     Type:  enhancement  |         Status:  new
 Priority:  normal       |      Milestone:     
Component:  Obfsproxy    |        Version:     
 Keywords:               |         Parent:     
   Points:               |   Actualpoints:     
-------------------------+--------------------------------------------------

Comment(by atagar):

 > def writestats(statsstart, ips, geoipfile, fingerprint):

 Personally I found this very difficult to read, and I suspect most of the
 work and temporary variables are unnecessary. Here's a completely untested
 function that, I think, will do the same thing...

 {{{
 def writestats(statsstart, ips, geoip_path, fingerprint):
   if not ips: return # nothing to write

   # matches against lines from the maxmind geoip db, such as...
   # 18939904,19005439,JP

   geoip_line = re.compile('^([\d]*),([\d]*),([A-Z0-9]{2})$')

   # mapping of country codes to their ips
   locale_to_ip = {}

   # Ips in the geoip file are sorted, so sorting our listing so we can
 just
   # compare to the first element.
   ips.sort()

   geoip_processing: # used for targeted breaks, double check syntax (I
 don't use this much...)
   with open(geoip_path) as geoip_file:
     for line in geoip_file:
       m = geoip_line.match(line)

       # skip line if it's not an entry (probably a blank line or comment)
       if not m: continue

       ip_start, ip_end, locale = m.groups()

       # It's possible that we come before the current entry, which means
 the
       # file doesn't contain our ip. Pop off ips until that's no longer
 the
       # case.

       while ip_start > ips[0]:
         if not "??" in locale_to_ip:
           locale_to_ip["??"] = []

         locale_to_ip["??"].append(ips.pop(0))
         if not ips: break geoip_processing

       if ip_end < ips[0]:
         continue # nope, our entry is later
       elif ip_start < ips[0] and ip_end > ips[0]:
         # entry matches
         if not locale in locale_to_ip:
           locale_to_ip[locale] = []

         locale_to_ip[locale].append(ips.pop(0))
         if not ips: break geoip_processing

   # any remaining ips after processing the file are unknown
   while ips:
     if not "??" in locale_to_ip:
       locale_to_ip["??"] = []

     locale_to_ip["??"].append(ips.pop(0))

   # now move on with printing stuff, probably something like...
   for locale in locale_to_ip:
     connection_count = len(locale_to_ip[locale])
     unique_connection_count = len(set(locale_to_ip[locale]))
 }}}

 Besides that looks good, just some nitpicks...

 > starting = re.compile('^([\\d\-:TZ]* )?\[info\] Starting.$')
 > exiting = re.compile('^([\\d\-:TZ]* )?\[info\] Exiting.$')
 > connect = re.compile('^([\\d\-:TZ]* )?\[[a-z]*\] ([\\d\\.]+):[\\d]+ ' +
 >                      '\([a-z0-9]+\): trying to connect to [\\d\\.:]+$')

 Move the starting/ending/connect regexes into header constants? When
 regexes get substantial like this it's also nice to have a comment with
 examples for what they match against (greatly helps readability).

 > for idx, ree in enumerate(re_list):

 Neat, the enumerate builtin is new to me. In writestats you use it more
 than you need to (since you never use the 'idx'), and personally I'd do
 away with re_list here so it just uses a tuple inline. But es no importa.

 > for line in open(obfsproxylog, 'r'):

 You actually probably want...

 {{{
 with open(obfsproxylog) as obsfproxylog_file:
   for line in obsfproxylog_file:
     ...
 }}}

 This will close the file when you're done. Also, 'r' is the default mode
 for the open method.

 > if statsstart > 0:

 If you initialize statsstart to sys.maxint then this isn't needed.

 > del ips[:]

 Why not just do "ips = []"? In either case you're only removing
 references.

 > ipn = int("%02x%02x%02x%02x" %
 >       (int(st[0]), int(st[1]), int(st[2]), int(st[3])), 16)

 Looks like the map function could help you here.

 {{{
 >>> st = ["1", "2", "3", "4"]
 >>> map(int, st)
 [1, 2, 3, 4]
 }}}

 Cheers! -Damian

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/5047#comment:2>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs