
Re: Tor client performance (was Re: URGENT: patch needed ASAP for authority bug)



On Wed, Apr 21, 2010 at 12:01:40PM +0200, Hans Schnehl wrote:
> During all the 'testing', vallenator lost its Guard and Stable flags.
> As an experiment, I left it completely idle for a few days, then restarted it
> *with another IP*, only to find the node with 20,000 states/connections
> within some two hours. Again, this is the upper limit of connections this
> box will accept.

For those who like patching their Tor, I've attached a diff that applies
to current git. (For developers, it should be easy enough to merge into
other recent Tor versions too.) It will tell you, once a second at
log-level notice, how many OR conns you have open and how many of them
were used for one-hop directory fetches. The lines look like:

Apr 21 06:47:30.286 [notice] 7265 (2623,4360) of 9149/9272 used for begindir

Meaning I have 9272 total connections, of which 9149 are TLS connections,
and 7265 of those were used for one-hop directory fetches. Of those 7265,
2623 have no circuits attached currently (meaning the client
did its directory fetch, expired the circuit, and hasn't bothered closing
the TLS conn), and 4360 have one circuit attached currently (meaning they
haven't expired the circuit yet). That leaves 7265-2623-4360=282 TLS conns
used for directory fetches that currently have more than one circuit open.
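
To make the bookkeeping explicit, here's that example line as a tiny
standalone C snippet. The counter names mirror the attached diff; num_more
is just the leftover I computed by hand above:

#include <stdio.h>

int main(void)
{
  /* Numbers copied from the example notice line above. */
  int total_conns  = 9272;  /* smartlist_len(conns): every connection */
  int num_or       = 9149;  /* conn->type == CONN_TYPE_OR: TLS conns */
  int num_begindir = 7265;  /* OR conns that ever served a begindir fetch */
  int num_empty    = 2623;  /* begindir conns with no circuits attached now */
  int num_one      = 4360;  /* begindir conns with exactly one circuit now */
  int num_more     = num_begindir - num_empty - num_one;  /* 282 with >1 */

  printf("%d (%d,%d) of %d/%d used for begindir; %d with >1 circuit\n",
         num_begindir, num_empty, num_one, num_or, total_conns, num_more);
  return 0;
}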

Once I've slept and thought about it more, I'm going to write a
test patch that will preemptively close these TLS connections earlier
than the client would otherwise close them. Done right, it should help
a lot without screwing up too much -- *if* the problem is that we have
way more directory fetches going on than we anticipated. The problem
might also be that guards have too many incoming connections; we'll
tackle that one separately.
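
For the curious, here's a rough sketch of what I mean by "preemptively
close", written against the same structures the attached diff touches.
The function name and the IDLE_BEGINDIR_TIMEOUT value are made up, and the
real test patch may well end up looking different:

#define IDLE_BEGINDIR_TIMEOUT 180 /* seconds; the right value is a guess */

/* Hypothetical helper: mark for close any OR conn that only ever served a
 * begindir fetch, has no circuits left, and has been idle for a while.
 * It would get called from the same once-a-second housekeeping pass. */
static void
close_idle_begindir_conns(time_t now)
{
  smartlist_t *conns = get_connection_array();
  SMARTLIST_FOREACH(conns, connection_t *, conn, {
    if (conn->type == CONN_TYPE_OR &&
        !conn->marked_for_close &&
        conn->have_handled_begindir &&
        TO_OR_CONN(conn)->n_circuits == 0 &&
        now - conn->timestamp_lastwritten >= IDLE_BEGINDIR_TIMEOUT) {
      log_info(LD_OR, "Preemptively closing idle begindir-only conn %d",
               (int)conn->s);
      connection_mark_for_close(conn);
    }
  });
}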

If you want to help out, please run your Tor with the patch for a while
(several hours) and mail me your notice lines. Be sure to tell me which
relay is yours, so I can check out its flags, etc.

> You may see this behaviour for quite a few more nodes by going through
> your historic consensuses or, a little more comfortably, watching the
> graphs on one of the TNS sites.

Yeah. Part of the challenge here is that we have a huge influx of users,
or at least user connections, and that causes some relays to give up,
meaning that the huge influx focuses even more on the ones that remain.

So, all of you fine relay operators, please bear with us rather than
giving up. :)

> BTW, only in very rare cases was the bandwidth in the consensus for this
> node anywhere close to what was actually announced or what it was willing
> to provide.  Looks to me as if the authorities' method of determining a
> node's bandwidth is somewhat insufficient and therefore may not necessarily
> be a base for anything.

Right -- as described earlier in the thread, the numbers in the consensus
are weights, not bandwidths.

Thanks!
--Roger

diff --git a/src/or/connection.c b/src/or/connection.c
index 7b1493b..f6bf616 100644
--- a/src/or/connection.c
+++ b/src/or/connection.c
@@ -694,6 +694,7 @@ void
 connection_expire_held_open(void)
 {
   time_t now;
+  int num_or=0, num_begindir=0, num_empty=0, num_one=0;
   smartlist_t *conns = get_connection_array();
 
   now = time(NULL);
@@ -703,6 +704,16 @@ connection_expire_held_open(void)
     /* If we've been holding the connection open, but we haven't written
      * for 15 seconds...
      */
+    if (conn->type == CONN_TYPE_OR) {
+      num_or++;
+      if (conn->have_handled_begindir) {
+        num_begindir++;
+        if (TO_OR_CONN(conn)->n_circuits == 0)
+          num_empty++;
+        if (TO_OR_CONN(conn)->n_circuits == 1)
+          num_one++;
+      }
+    }
     if (conn->hold_open_until_flushed) {
       tor_assert(conn->marked_for_close);
       if (now - conn->timestamp_lastwritten >= 15) {
@@ -722,6 +733,8 @@ connection_expire_held_open(void)
       }
     }
   });
+  log_notice(LD_DIR, "%d (%d,%d) of %d/%d used for begindir",
+             num_begindir, num_empty, num_one, num_or, smartlist_len(conns));
 }
 
 /** Create an AF_INET listenaddr struct.
diff --git a/src/or/connection_edge.c b/src/or/connection_edge.c
index a173dc1..4867627 100644
--- a/src/or/connection_edge.c
+++ b/src/or/connection_edge.c
@@ -2784,9 +2784,15 @@ connection_exit_connect_dir(edge_connection_t *exitconn)
 {
   dir_connection_t *dirconn = NULL;
   or_circuit_t *circ = TO_OR_CIRCUIT(exitconn->on_circuit);
+  connection_t *or_conn = TO_CONN(circ->p_conn);
 
   log_info(LD_EXIT, "Opening local connection for anonymized directory exit");
 
+  if (or_conn->have_handled_begindir == 0 && circ->is_first_hop) {
+    log_info(LD_DIR, "conn %d first used for begindir", or_conn->s);
+    or_conn->have_handled_begindir = 1;
+  }
+
   exitconn->_base.state = EXIT_CONN_STATE_OPEN;
 
   dirconn = dir_connection_new(AF_INET);
diff --git a/src/or/or.h b/src/or/or.h
index ad863dc..ce9ad77 100644
--- a/src/or/or.h
+++ b/src/or/or.h
@@ -956,6 +956,9 @@ typedef struct connection_t {
   /** CONNECT/SOCKS proxy client handshake state (for outgoing connections). */
   unsigned int proxy_state:4;
 
+  /** True iff this is an OR conn that has received a begindir request */
+  unsigned int have_handled_begindir:1;
+
   /** Our socket; -1 if this connection is closed, or has no socket. */
   evutil_socket_t s;
   int conn_array_index; /**< Index into the global connection array. */