Flaw in bridge descriptor sanitizing process

Hi everyone,

Back in Nov 2009, I posted our approach to sanitizing bridge descriptors
before publishing them for everyone to analyze:


Now I found a flaw in this process that affects roughly 2.5 % of bridge
descriptors:  When a bridge is configured to use the default exit policy,
it adds a reject line containing its own IP address which we don't
sanitize.  Here's an example of a sanitized bridge descriptor.  Note that
I manually sanitized part of the bridge's IP address with '---' here!

  router Unnamed 9001 0 8030
  platform Tor (r6e5496a2407ee589) on FreeBSD i386
  opt protocols Link 1 2 Circuit 1
  published 2010-12-21 17:59:41
  opt fingerprint 16D6 49EA 83CF 23A1 BAD3 90DC 921E E83F 310A CC7F
  uptime 64
  bandwidth 51200 102400 0
  opt extra-info-digest F2BD0C7AB741648D9CFC982C790B9221B8C04516
  opt hidden-service-dir
  contact somebody
  reject 80.---.--.---:*       <--- bridge's IP address
  accept *:21
  accept *:23
  accept *:80
  accept *:110
  accept *:143
  accept *:443
  reject *:*

The fix is to sanitize the bridge's IP address in reject lines, too.  See
this commit for details:


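To illustrate the idea (this is not the code from the commit, just a
minimal sketch): the function name sanitize_exit_policy and the placeholder
address 127.0.0.1 are assumptions made up for this example.  It rewrites
the bridge's own address wherever it appears as the host part of an accept
or reject line and leaves everything else alone.

```python
import re

# Assumed placeholder for this sketch; this message does not say which
# replacement value the real sanitizer writes.
PLACEHOLDER = "127.0.0.1"


def sanitize_exit_policy(descriptor: str, bridge_ip: str) -> str:
    """Replace the bridge's own IP address in accept/reject lines."""
    # Match the address only as the host part of an exit policy line,
    # e.g. "reject 192.0.2.10:*" or "reject 192.0.2.10/32:*".  The
    # lookahead keeps 192.0.2.10 from matching inside 192.0.2.100.
    pattern = re.compile(
        r"^(\s*(?:accept|reject)\s+)" + re.escape(bridge_ip) + r"(?=[:/])",
        re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1) + PLACEHOLDER, descriptor)
```

Calling sanitize_exit_policy(text, "192.0.2.10") on a descriptor that
contains "reject 192.0.2.10:*" turns that line into "reject 127.0.0.1:*";
wildcard lines such as "reject *:*" stay untouched.
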
I have started sanitizing all bridge descriptors since May 2008 once
again, but this will take at least another week to finish.  Here are the
newly sanitized bridge descriptors:


Can other people look at the sanitized descriptors and try to find other
sensitive parts (IP addresses, fingerprints, contact information, etc.)
that we should remove?  It took me a year to find this flaw, and I only
found it by chance when refactoring the code.  More eyes needed!
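
For anyone who wants to help, one possible starting point is a rough scan
for leftover IPv4 addresses.  This is only a sketch: the function name
find_leftover_addresses and the allowed placeholder 127.0.0.1 are
assumptions for this example, and it only catches dotted-quad addresses,
not fingerprints or contact details.

```python
import re

# Naive dotted-quad matcher; it does not check that each octet is <= 255.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Addresses that are expected in sanitized output (assumed placeholder).
ALLOWED = {"127.0.0.1"}


def find_leftover_addresses(descriptor):
    """Return (line number, address, line) for each suspicious address."""
    hits = []
    for number, line in enumerate(descriptor.splitlines(), start=1):
        # Platform lines carry version strings such as 0.2.1.26 that look
        # like addresses, so skip them to avoid false positives.
        if line.strip().startswith("platform "):
            continue
        for address in IPV4.findall(line):
            if address not in ALLOWED:
                hits.append((number, address, line.strip()))
    return hits
```

Every tuple it returns still needs a manual look; an empty list only means
that this particular pattern found nothing.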