
[or-cvs] r9884: Cobbled together a TODO file from sleepless ramblings. Made (in torflow/trunk: . TorCtl)



Author: mikeperry
Date: 2007-03-20 00:48:10 -0400 (Tue, 20 Mar 2007)
New Revision: 9884

Added:
   torflow/trunk/TODO
Modified:
   torflow/trunk/TorCtl/PathSupport.py
   torflow/trunk/metatroller.py
Log:
Cobbled together a TODO file from sleepless ramblings. Made it coherent (I
hope).
 
Fixed a bug using OrderedExitGenerator. Cleared up a pending circuit issue
with newnym. Also made stream+circ id members more uniform.



Added: torflow/trunk/TODO
===================================================================
--- torflow/trunk/TODO	2007-03-20 03:26:51 UTC (rev 9883)
+++ torflow/trunk/TODO	2007-03-20 04:48:10 UTC (rev 9884)
@@ -0,0 +1,73 @@
+- Add an ORCONN_BW event to Tor to emit read/write info and also queue sizes
+  - See tordiffs/orconn-bw.diff but it probably should be a separate event,
+    not hacked onto ORCONN
+  - Use nodemon.py to rank nodes based on total bytes, queue sizes, and the 
+    ratio of these two
+    - Does it agree with results from metatroller's bandwidth stats?
+
+- More NodeRestrictions/PathRestrictions in TorCtl/PathSupport.py
+  - BwWeightedGenerator
+  - NodeRestrictions:
+    - Uptime/LongLivedPorts (Does/should hibernation count?)
+    - Published/Updated
+    - GeoIP (http://www.maxmind.com/app/python)
+      - NodeCountry
+  - PathRestrictions:
+    - Family
+    - GeoIP (http://www.maxmind.com/app/python)
+      - OceanPhobicRestrictor (avoids Pacific Ocean or two Atlantic crossings)
+        or ContinentRestrictor (avoids doing more than N continent crossings)
+      - EchelonPhobicRestrictor
+        - Does not cross international boundaries for client->Entry or
+          Exit->destination hops
+  - Perform statistical analysis on paths
+    - How often does Tor choose foolish paths normally? 
+      - (4 Atlantic/Pacific crossings)
+    - What is the distribution for Pr(ClientLocation|MiddleNode,ExitNode)
+      and Pr(EntryNode|MiddleNode,ExitNode) for these various path choices?
+      - Mathematical analysis probably required because this is a large joint
+        distribution (not GSoC)
+      - Empirical observation possible if you limit to the top 10% of the
+        nodes (which carry something like 90% of bandwidth anyways). 
+        - Make a few million paths without actually building real
+          circuits and tally them up in a 3D table
+        - See PathSupport.py unit tests for some examples on this
+  - See also:
+    http://swiki.cc.gatech.edu:8080/ugResearch/uploads/7/ImprovingTor.pdf
+    - You can also perform predecessor observation of this strategy
+      empirically. But it is likely the GeoIP stuff is easier to implement 
+      and just as effective.
+
+- Create a PathWatcher that StatsHandler can extend from so people can gather
+  stats from regular Tor usage
+
+- Use GeoIP to make a map of Tor servers color coded by their reliability
+  - Or augment an existing Tor map project with this data
+
+- Add circuit prebuilding and port history learning for keeping an optimal
+  pool of circuits available for use
+  - Build circuits in parallel to speed up scanning
+
+- Rewrite soat.pl in python/C++ and leverage an html parser to extract
+  object/script tags to make a fingerprint of a dynamic page. 
+   - Scan for changes to this fingerprint and also to any original embedded
+     objects
+   - Make a multilingual keyword list of commonly censored terms to google for
+     using this scanner
+   - Improve checking of changes to documents outside of Tor
+   - Improve SSL handling/verification. openssl client is broken.
+   - Parallelize scanning
+     - Improve interaction between soat+metatroller so soat knows
+       which exit was responsible for a given ip/url
+
+- Design Reputation System (not for GSoC)
+  - Emit some kind of penalty multiplier based on circuit/stream failure rate
+    and the ratio of directory "observed" bandwidth vs avg stream bandwidth
+    - Add keyword to directory for clients to use instead of observed
+      bandwidth for routing decisions
+      - Make sure scanners don't listen to this keyword to avoid 
+        "Creeping Death" 
+    - Queue lengths from the node monitor can also figure into this penalty
+      multiplier
+  - Figure out interface to report this and also BadExit determinations
+    - Probably involves voting among many scanners
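
The empirical approach in the TODO above (generate many paths without building real circuits and tally them in a 3D table) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the router names and weights are made up, `sample_path`, `tally_paths` and `entry_given_middle_exit` are hypothetical helpers, and bandwidth-weighted selection stands in for whatever generators PathSupport.py actually uses. It is written for current Python rather than the Python 2 of the patched files.

```python
import random
from collections import Counter

# Hypothetical routers as (nickname, bandwidth weight). Not real Tor data.
ROUTERS = [("alpha", 50), ("bravo", 30), ("charlie", 15), ("delta", 5)]

def sample_path(rng, routers):
    """Pick (entry, middle, exit) from distinct routers, weighted by bandwidth."""
    pool = list(routers)
    path = []
    for _ in range(3):
        names = [name for name, _ in pool]
        weights = [weight for _, weight in pool]
        pick = rng.choices(names, weights=weights, k=1)[0]
        path.append(pick)
        # Remove the chosen router so no node appears twice in one path.
        pool = [(n, w) for n, w in pool if n != pick]
    return tuple(path)

def tally_paths(n_paths, routers=ROUTERS, seed=0):
    """Count how often each (entry, middle, exit) triple is chosen."""
    rng = random.Random(seed)
    table = Counter()
    for _ in range(n_paths):
        table[sample_path(rng, routers)] += 1
    return table

def entry_given_middle_exit(table, middle, exit_):
    """Estimate Pr(EntryNode | MiddleNode, ExitNode) from the tally."""
    counts = {e: c for (e, m, x), c in table.items()
              if m == middle and x == exit_}
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()} if total else {}
```

Calling `tally_paths(5000)` and then `entry_given_middle_exit(table, "alpha", "bravo")` gives an empirical conditional distribution over the remaining entry candidates. A real run would restrict the router list to the top 10% by bandwidth, as the TODO suggests, to keep the table small.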

Modified: torflow/trunk/TorCtl/PathSupport.py
===================================================================
--- torflow/trunk/TorCtl/PathSupport.py	2007-03-20 03:26:51 UTC (rev 9883)
+++ torflow/trunk/TorCtl/PathSupport.py	2007-03-20 04:48:10 UTC (rev 9884)
@@ -87,6 +87,9 @@
     self.sorted_r = sorted_r
     self.rewind()
 
+  def reset_restriction(self, rstr_list):
+    self.rstr_list = rstr_list
+
   def rewind(self):
     self.routers = copy.copy(self.sorted_r)
 
@@ -104,14 +107,14 @@
     if pathlen == 1:
       circ.exit = path_sel.exit_chooser(circ.path)
       circ.path = [circ.exit]
-      circ.cid = self.extend_circuit(0, circ.id_path())
+      circ.circ_id = self.extend_circuit(0, circ.id_path())
     else:
       circ.path.append(path_sel.entry_chooser(circ.path))
       for i in xrange(1, pathlen-1):
         circ.path.append(path_sel.middle_chooser(circ.path))
       circ.exit = path_sel.exit_chooser(circ.path)
       circ.path.append(circ.exit)
-      circ.cid = self.extend_circuit(0, circ.id_path())
+      circ.circ_id = self.extend_circuit(0, circ.id_path())
     return circ
 
 ######################## Node Restrictions ########################
@@ -436,6 +439,7 @@
     if self.order_exits:
       if self.__ordered_exit_gen:
         exitgen = self.__ordered_exit_gen
+        exitgen.reset_restriction(self.exit_rstr)
       else:
         exitgen = self.__ordered_exit_gen = \
           OrderedExitGenerator(80, sorted_r, self.exit_rstr)
@@ -457,7 +461,7 @@
 
 class Circuit:
   def __init__(self):
-    self.cid = 0
+    self.circ_id = 0
     self.path = [] # routers
     self.exit = None
     self.built = False
@@ -470,7 +474,7 @@
 
 class Stream:
   def __init__(self, sid, host, port, kind):
-    self.sid = sid
+    self.strm_id = sid
     self.detached_from = [] # circ id #'s
     self.pending_circ = None
     self.circ = None
@@ -603,19 +607,21 @@
       self.new_nym = False
       plog("DEBUG", "Obeying new nym")
       for key in self.circuits.keys():
-        if len(self.circuits[key].pending_streams):
+        if (not self.circuits[key].dirty
+            and len(self.circuits[key].pending_streams)):
           plog("WARN", "New nym called, destroying circuit "+str(key)
              +" with "+str(len(self.circuits[key].pending_streams))
              +" pending streams")
           unattached_streams.extend(self.circuits[key].pending_streams)
+          del self.circuits[key].pending_streams[:]
         # FIXME: Consider actually closing circ if no streams.
         self.circuits[key].dirty = True
       
     for circ in self.circuits.itervalues():
-      if circ.built and not circ.dirty and circ.cid not in badcircs:
+      if circ.built and not circ.dirty and circ.circ_id not in badcircs:
         if circ.exit.will_exit_to(stream.host, stream.port):
           try:
-            self.c.attach_stream(stream.sid, circ.cid)
+            self.c.attach_stream(stream.strm_id, circ.circ_id)
             stream.pending_circ = circ # Only one possible here
             circ.pending_streams.append(stream)
           except TorCtl.ErrorReply, e:
@@ -639,10 +645,10 @@
           plog("NOTICE", "Error building circ: "+str(e.args))
       for u in unattached_streams:
         plog("DEBUG",
-           "Attaching "+str(u.sid)+" pending build of "+str(circ.cid))
+           "Attaching "+str(u.strm_id)+" pending build of "+str(circ.circ_id))
         u.pending_circ = circ
       circ.pending_streams.extend(unattached_streams)
-      self.circuits[circ.cid] = circ
+      self.circuits[circ.circ_id] = circ
     self.last_exit = circ.exit
 
   def circ_status_event(self, c):
@@ -658,16 +664,17 @@
     if c.status == "EXTENDED":
       self.circuits[c.circ_id].last_extended_at = c.arrived_at
     elif c.status == "FAILED" or c.status == "CLOSED":
+      # XXX: Can still get a STREAM FAILED for this circ after this
       circ = self.circuits[c.circ_id]
       del self.circuits[c.circ_id]
       for stream in circ.pending_streams:
-        plog("DEBUG", "Finding new circ for " + str(stream.sid))
+        plog("DEBUG", "Finding new circ for " + str(stream.strm_id))
         self.attach_stream_any(stream, stream.detached_from)
     elif c.status == "BUILT":
       self.circuits[c.circ_id].built = True
       try:
         for stream in self.circuits[c.circ_id].pending_streams:
-          self.c.attach_stream(stream.sid, c.circ_id)
+          self.c.attach_stream(stream.strm_id, c.circ_id)
       except TorCtl.ErrorReply, e:
         # No need to retry here. We should get the failed
         # event for either the circ or stream next
@@ -709,8 +716,16 @@
       if s.strm_id not in self.streams:
         plog("NOTICE", "Succeeded stream "+str(s.strm_id)+" not found")
         return
-      self.streams[s.strm_id].circ = self.streams[s.strm_id].pending_circ
-      self.streams[s.strm_id].circ.pending_streams.remove(self.streams[s.strm_id])
+      if s.circ_id and self.streams[s.strm_id].pending_circ.circ_id != s.circ_id:
+        # Hrmm.. this can happen on a new-nym.. Very rare, putting warn
+        # in because I'm still not sure this is correct
+        plog("WARN", "Mismatch of pending: "
+          +str(self.streams[s.strm_id].pending_circ.circ_id)+" vs "
+          +str(s.circ_id))
+        self.streams[s.strm_id].circ = self.circuits[s.circ_id]
+      else:
+        self.streams[s.strm_id].circ = self.streams[s.strm_id].pending_circ
+      self.streams[s.strm_id].pending_circ.pending_streams.remove(self.streams[s.strm_id])
       self.streams[s.strm_id].pending_circ = None
       self.streams[s.strm_id].attached_at = s.arrived_at
     elif s.status == "FAILED" or s.status == "CLOSED":
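
The `reset_restriction` change above is applied when a cached `OrderedExitGenerator` is reused. A plausible reading, offered as an assumption rather than a statement about the real class, is that a generator created once kept filtering with the restriction list it was constructed with. The toy `CachedGenerator` below is hypothetical and only illustrates that pattern; it is written for current Python.

```python
class CachedGenerator:
    """Toy stand-in for a node generator that is created once and reused."""

    def __init__(self, routers, rstr_list):
        self.routers = routers
        self.rstr_list = rstr_list

    def reset_restriction(self, rstr_list):
        # Same shape as the method added in this revision.
        self.rstr_list = rstr_list

    def next_r(self):
        # Yield only routers that pass every restriction currently set.
        for r in self.routers:
            if all(ok(r) for ok in self.rstr_list):
                yield r

routers = ["alpha", "bravo", "charlie"]
gen = CachedGenerator(routers, [lambda r: r != "alpha"])
first = list(gen.next_r())

# Without this call the reused generator would keep excluding "alpha".
gen.reset_restriction([lambda r: r != "bravo"])
second = list(gen.next_r())
```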

Modified: torflow/trunk/metatroller.py
===================================================================
--- torflow/trunk/metatroller.py	2007-03-20 03:26:51 UTC (rev 9883)
+++ torflow/trunk/metatroller.py	2007-03-20 04:48:10 UTC (rev 9884)
@@ -450,6 +450,7 @@
         else: start_f = len(c.path)-1 
 
         # Count failed
+        # XXX: Differentiate between extender and extendee
         for r in self.circuits[c.circ_id].path[start_f:len(c.path)+1]:
           r.circ_failed += 1
           if not reason in r.reason_failed:
@@ -533,12 +534,12 @@
         return
 
       # Verify circ id matches stream.circ
-      if s.status not in ("NEW" or "NEWRESOLVE"):
+      if s.status not in ("NEW", "NEWRESOLVE", "REMAP"):
         circ = self.streams[s.strm_id].circ
         if not circ: circ = self.streams[s.strm_id].pending_circ
-        if circ and circ.cid != s.circ_id:
+        if circ and circ.circ_id != s.circ_id:
           plog("WARN", str(s.strm_id) + " has mismatch of "
-                +str(s.circ_id)+" v "+str(circ.cid))
+                +str(s.circ_id)+" v "+str(circ.circ_id))
       
       if s.status == "DETACHED":
         if self.streams[s.strm_id].attached_at:
@@ -558,7 +559,7 @@
         # Update strm_chosen count
         for r in self.circuits[s.circ_id].path: r.strm_chosen += 1
 
-        # Update bw stats
+        # Update bw stats. XXX: Don't do this for resolve streams
         if self.streams[s.strm_id].attached_at:
           lifespan = self.streams[s.strm_id].lifespan(s.arrived_at)
           for r in self.streams[s.strm_id].circ.path:
@@ -623,6 +624,7 @@
       else:
         s.write("250 LASTEXIT=0 (0) OK\r\n")
     elif command == "NEWEXIT" or command == "NEWNYM":
+      # XXX: Separate this
       clear_dns_cache(c)
       h.new_nym = True # GIL hack
       plog("DEBUG", "Got new nym")
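
The stream status check changed in metatroller.py above is worth a short illustration. In Python, `("NEW" or "NEWRESOLVE")` evaluates to the single string `"NEW"`, so the old test was a substring check against `"NEW"` rather than a tuple membership test. The helper names `old_check` and `new_check` below are hypothetical and exist only to compare the two expressions.

```python
def old_check(status):
    # ("NEW" or "NEWRESOLVE") is just "NEW", so this asks whether
    # status is NOT a substring of "NEW".
    return status not in ("NEW" or "NEWRESOLVE")

def new_check(status):
    # Real tuple membership, as in the fixed line.
    return status not in ("NEW", "NEWRESOLVE", "REMAP")

for status in ("NEW", "NEWRESOLVE", "REMAP", "SUCCEEDED"):
    print(status, old_check(status), new_check(status))
```

With the old expression, `NEWRESOLVE` and `REMAP` streams both passed the check and went on to the circuit id comparison, while the fixed tuple skips them.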