[or-cvs] first pass over HACKING doc



Update of /home/or/cvsroot/doc
In directory moria.mit.edu:/home2/arma/work/onion/cvs/doc

Modified Files:
	HACKING 
Log Message:
first pass over HACKING doc


Index: HACKING
===================================================================
RCS file: /home/or/cvsroot/doc/HACKING,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -d -r1.4 -r1.5
--- HACKING	9 Oct 2003 02:05:13 -0000	1.4
+++ HACKING	9 Oct 2003 08:33:54 -0000	1.5
@@ -56,7 +56,7 @@
   
    [General-purpose modules]
 
-     or.h -- Common header file: includes everything, define everything.
+     or.h -- Common header file: include everything, define everything.
 
      buffers.c -- Implements a generic buffer interface.  Buffers are 
         fairly opaque string holders that can read to or flush from:
@@ -65,7 +65,7 @@
         Also implements parsing functions to read HTTP and SOCKS commands
         from buffers.
 
-     tree.h -- A splay tree implementatio by Niels Provos.  Used only by
+     tree.h -- A splay tree implementation by Niels Provos.  Used only by
         dns.c.
 
      config.c -- Code to parse and validate the configuration file.
@@ -88,7 +88,7 @@
         results; clients use routers.c to parse them.
 
      dirserv.c -- Code to manage directory contents and generate
-        directories. [Directory only] 
+        directories. [Directory server only] 
 
      routers.c -- Code to parse directories and router descriptors; and to
         generate a router descriptor corresponding to this OR's
@@ -109,7 +109,7 @@
 
      connection_edge.c -- Code used only by edge connections.
 
-     command.c -- Code to handle specific cell types. [OR only]
+     command.c -- Code to handle specific cell types.
 
      connection_or.c -- Code to implement cell-speaking connections.
 
@@ -151,29 +151,29 @@
      [Edge connections]
        CONN_TYPE_EXIT -- A TCP connection from an onion router to a
           Stream's destination. [OR only]
-       CONN_TYPE_AP -- A SOCKS proxy connection from the end user to the
-          onion proxy.  [OP only]
+       CONN_TYPE_AP -- A SOCKS proxy connection from the end user
+          application to the onion proxy.  [OP only]
 
      [Listeners]
        CONN_TYPE_OR_LISTENER [OR only]
        CONN_TYPE_AP_LISTENER [OP only]
-       CONN_TYPE_DIR_LISTENER [Directory only]
+       CONN_TYPE_DIR_LISTENER [Directory server only]
           -- Bound network sockets, waiting for incoming connections.
 
      [Internal]
        CONN_TYPE_DNSWORKER -- Connection from the main process to a DNS
-          worker. [OR only]
+          worker process. [OR only]
        
        CONN_TYPE_CPUWORKER -- Connection from the main process to a CPU
-          worker. [OR only]
+          worker process. [OR only]
 
    Connection states are documented in or.h.
 
    Every connection has two associated input and output buffers.
-   Listeners don't use them.  With other connections, incoming data is
-   appended to conn->inbuf, and outgoing data is taken from the front of
-   conn->outbuf.  Connections differ primarily in the functions called
-   to fill and drain these buffers.
+   Listeners don't use them.  For non-listener connections, incoming
+   data is appended to conn->inbuf, and outgoing data is taken from the
+   front of conn->outbuf.  Connections differ primarily in the functions
+   called to fill and drain these buffers.
 
 1.3. All about circuits.
 
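The buffer model described in the hunk above is a plain FIFO: incoming bytes are appended to conn->inbuf, and outgoing bytes are taken from the front of conn->outbuf. The C sketch below assumes a fixed-capacity buffer. The names sketch_buf_t, sketch_buf_append and sketch_buf_drain are hypothetical and are not the interface in buffers.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity buffer. Tor's real buffers.c grows its
 * buffers dynamically and has a different interface. */
#define SKETCH_BUF_CAP 4096

typedef struct sketch_buf_t {
  char mem[SKETCH_BUF_CAP];
  size_t datalen;            /* bytes currently stored, starting at mem[0] */
} sketch_buf_t;

/* Append data to the end of the buffer, the way data arriving on a
 * connection is appended to conn->inbuf. Returns 0 on success, or -1
 * if the data would not fit. */
static int sketch_buf_append(sketch_buf_t *buf, const char *data, size_t len)
{
  if (len > SKETCH_BUF_CAP - buf->datalen)
    return -1;
  memcpy(buf->mem + buf->datalen, data, len);
  buf->datalen += len;
  return 0;
}

/* Remove up to 'len' bytes from the front of the buffer, the way outgoing
 * data is taken from the front of conn->outbuf, and copy them to 'out'.
 * Returns the number of bytes removed. */
static size_t sketch_buf_drain(sketch_buf_t *buf, char *out, size_t len)
{
  if (len > buf->datalen)
    len = buf->datalen;
  memcpy(out, buf->mem, len);
  /* Shift the remaining bytes down so the oldest data stays at mem[0]. */
  memmove(buf->mem, buf->mem + len, buf->datalen - len);
  buf->datalen -= len;
  return len;
}
```

Because a drain shifts the remaining bytes down, the oldest unsent data always sits at the front, which preserves the ordering the text describes. A production buffer would grow on demand and avoid calling memmove on every drain.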
@@ -192,9 +192,10 @@
 
 1.4. Asynchronous IO and the main loop.
 
-   Tor uses the poll(2) system call [or a substitute based on select(2)]
-   to handle nonblocking (asynchonous) IO.  If you're not familiar with
-   nonblocking IO, check out the links at the end of this document.
+   Tor uses the poll(2) system call (or it wraps select(2) to act like
+   poll, if poll is not available) to handle nonblocking (asynchronous)
+   IO.  If you're not familiar with nonblocking IO, check out the links
+   at the end of this document.
         
    All asynchronous logic is handled in main.c.  The functions
    'connection_add', 'connection_set_poll_socket', and 'connection_remove'
@@ -205,18 +206,23 @@
    individual connections.)
 
    To trap read and write events, connections call the functions
-   'connection_{is|stop|start}_{reading|writing}'.
+   'connection_{is|stop|start}_{reading|writing}'. If you want
+   to completely reset the events you're watching for, use
+   'connection_watch_events'.
 
-   When connections get events, main.c calls conn_read and conn_write.
-   These functions dispatch events to connection_handle_read and
-   connection_handle_write as appropriate.
+   Every time poll() finishes, main.c calls conn_read and conn_write on
+   every connection. These functions dispatch events that have something
+   to read to connection_handle_read, and events that have something to
+   write to connection_handle_write, respectively.
 
-   When connection need to be closed, they can respond in two ways.  Most
-   simply, they can make connection_handle_* to return an error (-1),
-   which will make conn_{read|write} close them.  But if the connection
-   needs to stay around [XXXX explain why] until the end of the current
-   iteration of the main loop, it marks itself for closing by setting
-   conn->connection_marked_for_close.
+   When connections need to be closed, they can respond in two ways.  Most
+   simply, they can make connection_handle_* return an error (-1),
+   which will make conn_{read|write} close them.  But if it's not
+   convenient to return -1 (for example, processing one connection causes
+   you to realize that a second one should close), then you can also
+   mark a connection to close by setting conn->marked_for_close. Marked
+   connections will be closed at the end of the current iteration of
+   the main loop.
 
    The main loop handles several other operations: First, it checks
    whether any signals have been received that require a response (HUP,
@@ -227,23 +233,26 @@
    that were blocking for more bandwidth, and maintaining statistics.
 
    A word about TLS: Using TLS on OR connections complicates matters in
-   two ways.  First, a TLS stream has its own read buffer independent of
-   the connection's read buffer.  (TLS needs to read an entire frame from
+   two ways.
+   First, a TLS stream has its own read buffer independent of the
+   connection's read buffer.  (TLS needs to read an entire frame from
    the network before it can decrypt any data.  Thus, trying to read 1
-   byte from TLS can require that several KB be read from the network and
-   decrypted.  The extra data is stored in TLS's decrypt buffer.)  Second,
-   the TLS stream's events do not correspond directly to network events:
-   sometimes, before a TLS stream can read, the network must be ready to
-   write -- or vice versa.
-
-   [XXXX describe the consequences of this for OR connections.]
+   byte from TLS can require that several KB be read from the network
+   and decrypted.  The extra data is stored in TLS's decrypt buffer.)
+   Because the data hasn't been read by tor (it's still inside the TLS),
+   this means that sometimes a connection "has stuff to read" even when
+   poll() didn't return POLLIN. The tor_tls_get_pending_bytes function is
+   used in main.c to detect TLS objects with non-empty internal buffers.
+   Second, the TLS stream's events do not correspond directly to network
+   events: sometimes, before a TLS stream can read, the network must be
+   ready to write -- or vice versa.
 
 1.5. How data flows (An illustration.)
 
-   Suppose an OR receives 50 bytes along an OR connection.  These 50 bytes
-   complete a data relay cell, which gets decrypted and delivered to an
-   edge connection.  Here we give a possible call sequence for the
-   delivery of this data.
+   Suppose an OR receives 256 bytes along an OR connection.  These 256
+   bytes turn out to be a data relay cell, which gets decrypted and
+   delivered to an edge connection.  Here we give a possible call sequence
+   for the delivery of this data.
 
    (This may be outdated quickly.)
 
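The hunk above adds two main-loop rules:

- A TLS connection may need reading even when poll() did not report POLLIN, because its TLS object still holds decrypted bytes.
- Connections marked for close are closed only at the end of the current iteration.

The sketch below shows one dispatch pass over a pollfd array that poll(2) has already filled in. Everything in it is hypothetical and is not main.c's real code. sketch_conn_t is illustrative, its tls_pending field stands in for what tor_tls_get_pending_bytes would report, and the handler functions are placeholders.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection. The fields mimic the ideas in
 * the text, not Tor's actual connection struct. */
typedef struct sketch_conn_t {
  int marked_for_close;  /* set when other code decides this conn should close */
  size_t tls_pending;    /* decrypted bytes still buffered inside the TLS object */
  int closed;            /* set by this sketch when the conn is closed */
  int reads;             /* times the read handler ran */
  int writes;            /* times the write handler ran */
} sketch_conn_t;

/* Placeholder handlers. A real handler could return -1 on error; reading
 * here consumes whatever the TLS object had buffered. */
static int sketch_handle_read(sketch_conn_t *conn)
{
  conn->reads++;
  conn->tls_pending = 0;
  return 0;
}

static int sketch_handle_write(sketch_conn_t *conn)
{
  conn->writes++;
  return 0;
}

/* One pass of event dispatch after poll(2) returns. A connection counts
 * as readable if poll reported POLLIN, or if its TLS object still holds
 * decrypted bytes that poll cannot see. */
static void sketch_process_events(sketch_conn_t *conns, struct pollfd *pfds,
                                  size_t n)
{
  size_t i;
  for (i = 0; i < n; i++) {
    sketch_conn_t *conn = &conns[i];
    if (conn->closed)
      continue;
    if ((pfds[i].revents & POLLIN) || conn->tls_pending > 0) {
      if (sketch_handle_read(conn) < 0) { conn->closed = 1; continue; }
    }
    if (pfds[i].revents & POLLOUT) {
      if (sketch_handle_write(conn) < 0) { conn->closed = 1; continue; }
    }
  }
  /* Marked connections are closed only after every connection has been
   * handled, so a handler can safely mark a different connection. */
  for (i = 0; i < n; i++) {
    if (conns[i].marked_for_close && !conns[i].closed)
      conns[i].closed = 1;
  }
}
```

Deferring the close to a second loop is what makes marking safe. Processing one connection may decide that another should close, but the array stays consistent until the iteration finishes.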
@@ -264,22 +273,29 @@
                  makes sure the circuit is live, then passes the cell to:
            circuit_deliver_relay_cell -- Passes the cell to each of: 
             relay_crypt -- Strips a layer of encryption from the cell and
-                 notice that the cell is for local delivery.
+                 notices that the cell is for local delivery.
             connection_edge_process_relay_cell -- extracts the cell's
                  relay command, and makes sure the edge connection is
                  open.  Since it has a DATA cell and an open connection,
                  calls:
-             circuit_consider_sending_sendme -- [XXX]
+             circuit_consider_sending_sendme -- check if the total number
+                 of cells received by all streams on this circuit is
+                 enough that we should send back an acknowledgement
+                 (requesting that more cells be sent to any stream).
              connection_write_to_buf -- To place the data on the outgoing
                  buffer of the correct edge connection, by calling:
               connection_start_writing -- To tell the main poll loop about
                  the pending data.
               write_to_buf -- To actually place the outgoing data on the
                  edge connection.
-             connection_consider_sending_sendme -- [XXX]
+             connection_consider_sending_sendme -- if the outbuf waiting
+                 to flush to the exit connection is not too full, check
+                 if the total number of cells received on this stream
+                 is enough that we should send back an acknowledgement
+                 (requesting that more cells be sent to this stream).
 
-   [In a subsequent iteration, main notices that the edge connection is
-    ready for writing.]
+   In a subsequent iteration, main notices that the edge connection is
+   ready for writing:
 
    do_main_loop -- Calls poll(2), receives a POLLOUT event on a struct
                  pollfd, then calls:
@@ -294,7 +310,12 @@
                  calls:
         connection_stop_writing -- Tells the main poll loop that this
                  connection has no more data to write.
-        connection_consider_sending_sendme -- [XXX]
+        connection_consider_sending_sendme -- now that the outbuf
+                 is empty, check again if the total number of cells
+                 received on this stream is enough that we should send
+                 back an acknowledgement (requesting that more cells be
+                 sent to this stream).
+
 
 1.6. Routers, descriptors, and directories
 
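The two sendme checks in the call sequence above count delivered cells and send one acknowledgement per fixed batch. The stream-level check is held back while the edge connection's outbuf is too full. The sketch below is minimal. The batch sizes, the outbuf threshold, and all sketch_* names are assumptions for illustration, not values or functions taken from this document.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants only (assumptions, not taken from the text). */
#define SKETCH_CIRC_BATCH     100   /* cells acknowledged per circuit-level sendme */
#define SKETCH_STREAM_BATCH    50   /* cells acknowledged per stream-level sendme */
#define SKETCH_OUTBUF_FULL   4096   /* "outbuf too full" threshold, in bytes */

typedef struct sketch_circ_t {
  int cells_unacked;   /* cells delivered on all streams since the last sendme */
  int sendmes_sent;
} sketch_circ_t;

typedef struct sketch_stream_t {
  int cells_unacked;   /* cells delivered on this stream since its last sendme */
  size_t outbuf_len;   /* bytes still waiting to flush to the edge connection */
  int sendmes_sent;
} sketch_stream_t;

/* Circuit level: once enough cells have arrived across all streams,
 * acknowledge them so the other side may send more. */
static void sketch_circ_consider_sendme(sketch_circ_t *circ)
{
  while (circ->cells_unacked >= SKETCH_CIRC_BATCH) {
    circ->cells_unacked -= SKETCH_CIRC_BATCH;
    circ->sendmes_sent++;   /* a real OR would queue a sendme relay cell here */
  }
}

/* Stream level: same idea, but do nothing while the outbuf is too full.
 * The caller retries once the outbuf has flushed. */
static void sketch_stream_consider_sendme(sketch_stream_t *stream)
{
  if (stream->outbuf_len > SKETCH_OUTBUF_FULL)
    return;
  while (stream->cells_unacked >= SKETCH_STREAM_BATCH) {
    stream->cells_unacked -= SKETCH_STREAM_BATCH;
    stream->sendmes_sent++;
  }
}
```

Calling sketch_stream_consider_sendme again after the outbuf drains mirrors the second connection_consider_sending_sendme call in the write path. The stream's acknowledgement is delayed, not lost.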
@@ -302,7 +323,7 @@
    several reasons:
        - OPs need to establish connections and circuits to ORs.
        - ORs need to establish connections to other ORs.
-       - OPs and ORs need to fetch directories from a directory servers.
+       - OPs and ORs need to fetch directories from a directory server.
        - ORs need to upload their descriptors to directory servers.
        - Directory servers need to know which ORs are allowed onto the
          network, what the descriptors are for those ORs, and which of
@@ -321,8 +342,8 @@
    'desc_routerinfo' and 'descriptor' static variables in routers.c.
 
    Additionally, a directory server keeps track of a list of the
-   router descriptors it knows in a separte list in dirserv.c.  It
-   uses this list, plus the open connections in main.c, to build
+   router descriptors it knows in a separate list in dirserv.c.  It
+   uses this list, checking which OR connections are open, to build
    directories.
 
 1.7. Data model
@@ -372,14 +393,14 @@
   Log convention: use only these four log severities.
 
     ERR is if something fatal just happened.
-    WARNING is something bad happened, but we're still running. The
+    WARN is if something bad happened, but we're still running. The
       bad thing is either a bug in the code, an attack or buggy
       protocol/implementation of the remote peer, etc. The operator should
       examine the bad thing and try to correct it.
     (No error or warning messages should be expected during normal OR or OP
-      operation.. I expect most people to run on -l warning eventually. If a
+      operation. I expect most people to run on -l warn eventually. If a
       library function is currently called such that failure always means
-      ERR, then the library function should log WARNING and let the caller
+      ERR, then the library function should log WARN and let the caller
       log ERR.)
     INFO means something happened (maybe bad, maybe ok), but there's nothing
       you need to (or can) do about it.
@@ -397,7 +418,7 @@
 
      See http://freehaven.net/tor/
          http://freehaven.net/tor/cvs/doc/tor-spec.txt
-         http://freehaven.net/tor/cvs/doc/tor-dessign.tex
+         http://freehaven.net/tor/cvs/doc/tor-design.tex
          http://freehaven.net/tor/cvs/doc/FAQ
 
   About anonymity