[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
Re: [tor-bugs] #5635 [Tor Directory Authority]: Hundreds of thousands of "While fetching directory info, no running dirservers known"
#5635: Hundreds of thousands of "While fetching directory info, no running
dirservers known"
-------------------------------------+--------------------------------------
Reporter: ln5 | Owner:
Type: defect | Status: new
Priority: major | Milestone: Tor: 0.2.3.x-final
Component: Tor Directory Authority | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------------------+--------------------------------------
Comment(by nickm):
Actually, that's type and state, not purpose. Fooey.
Sebastian reports some more interesting log messages at info:
{{{
10:58 < nickm> So, something is calling directory_get_from_dirserver over
and over and over.
10:58 < nickm> And passing purpose == DIR_PURPOSE_FETCH_V2_NETWORKSTATUS
to it
10:58 < nickm> there are only a couple of places that could happen, I
think.
10:58 < nickm> (Please double-check me on all of this if you can, btw)
11:02 < nickm> The first place is indeed
update_v2_networkstatus_cache_downloads, but unless I am missing some
obvious sign error, that one should get rate-limited to happen not too
often.
11:02 < nickm> the second possible place seems to be
connection_dir_download_v2_networkstatus_failed.
11:03 < nickm> Assuming that I'm right about everything above, it has to
be connection_dir_download_v2_networkstatus_failed that's going crazy.
11:05 < nickm> That one can get called when a v2 networkstatus fetch gives
us a non-200 status, or when a directory connection for a v2 networkstatus
"fails".
11:07 < Sebastian> I have info logs
11:07 < Sebastian> in case they might help
11:07 < nickm> (IOW, when it's about to close without having got an answer
(state < DIR_CONN_STATE_CLIENT_FINISHED), or when connection_connect()
says -1.
11:07 < Sebastian> (I am in a lecture so I cannot really follow your
explanations or the code in too close detail. I will do that when I'm out
of here)
11:08 < nickm> Sebastian: ok
11:09 < nickm> Sebastian: it would help to know if there are 2-5 things in
the log immediately preceding each "while fetching directory info..."
message
11:19 < Sebastian> Apr 17 11:53:31.000 [warn] Weighted bandwidth is
0.000000 in node selection for rule weight as directory
11:20 < Sebastian> repeated 5 times
11:20 < Sebastian> nickm: ^
11:20 < nickm> Sebastian: seeing any "No roter found for "...
11:20 < nickm> *router
11:20 < nickm> or any "Deferring serverdesc fetch: all authorities are in
use"
11:20 < Sebastian> karsten@tor-dirauth:~$ tail -n 1000000
gabelmoo/notice.log| grep "No router found for" | head
11:21 < Sebastian> karsten@tor-dirauth:~$ tail -n 1000000
gabelmoo/notice.log| grep "Deferring" | head
11:21 < Sebastian> both no output
11:21 < Sebastian> oh let me use info log
11:21 < Sebastian> sorry
11:21 < Sebastian> Apr 17 13:12:36.000 [info]
directory_get_from_dirserver(): No router found for network-status fetch;
falling back to dirserver list.
11:21 < nickm> hmmmmm
11:21 < Sebastian> here"
11:22 < Sebastian> :
11:22 < Sebastian> Apr 17 13:12:36.000 [info]
router_pick_published_address(): Success: chose address '212.112.245.170'.
11:22 < Sebastian> Apr 17 13:12:36.000 [info]
connection_dir_request_failed(): Giving up on directory server at
'193.23.244.244'; retrying
11:22 < Sebastian> Apr 17 13:12:36.000 [info]
connection_dir_client_reached_eof(): Received networkstatus objects (size
0) from server '85.24.215.13:9030'
11:22 < nickm> Is that happening all the time (or a lot of the time)
before the repeated-message-in-question ?
11:22 < Sebastian> Apr 17 13:12:36.000 [info]
router_pick_published_address(): Success: chose address '212.112.245.170'.
11:22 < Sebastian> Apr 17 13:12:36.000 [info]
directory_get_from_dirserver(): No router found for network-status fetch;
falling back to dirserver list.
11:22 < Sebastian> Apr 17 13:12:36.000 [notice] While fetching directory
info, no running dirservers known. Will try again later. (purpose 5)
11:23 < Sebastian> yes
11:23 < Sebastian> $ tail -n 1000000 gabelmoo/info.log| grep -B 5 While |
grep "No router found for" | wc -l
11:23 < Sebastian> 50943
11:23 < Sebastian> $ tail -n 1000000 gabelmoo/info.log| grep While| wc
-l50915
11:23 < nickm> oh heck. There's the bug then.
11:24 < nickm> I think.
11:24 < nickm> Maybe
11:24 < nickm> or wait.
11:25 < nickm> yeah. It launches a request, the requests don't get
fulfilled for a few times; it marks all the v2_ns_dl_status thingies as
failed so it should retry later ...
11:26 < nickm> no, that still doesn't really make sense.
11:27 < nickm> connection_dir_download_v2_networkstatus_failed just
shouldn't be getting called unless there is some connection getting
launched.
11:32 < nickm> Sebastian: Is the "received networkstatus objects (size 0)"
also repeated a lot? (The "size 0" specifically feels important)
11:34 < nickm> I note that 85.24.215.13 doesn't appear to be an authority;
we shouldn't be asking them for v2 networkstatuses!
11:34 < Sebastian> $ tail -n 1000000 gabelmoo/info.log| grep 'received
networkstatus' | wc -l
11:34 < Sebastian> 0
11:34 < nickm> capital R ?
11:35 < Sebastian> yup
11:35 < Sebastian> $ tail -n 1000000 gabelmoo/info.log| grep 'Received
networkstatus' | grep "size 0" | wc -l
11:35 < Sebastian> 38867
11:35 < Sebastian> so, not as often
11:35 < Sebastian> but often
11:36 < nickm> Mind if I copy and paste our conversation onto the bug?
11:37 < Sebastian> go right ahead!
}}}
So my current guess is that we seem to be in a loop where after we have
failed to get a v2 networkstatus from any of the authorities, we ask a
non-authority for the v2 networkstatus. That fails, and we ask again.
Over and over.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/5635#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs