[tor-bugs] #25685 [Core Tor/Tor]: Tor relays publish a new descriptor but authorities drop it because they think it's only cosmetically different, and then the relay waits 18 more hours to publish, thus falling out of the consensus
------------------------------+----------------------------------
Reporter: arma | Owner: (none)
Type: defect | Status: new
Priority: Medium | Milestone:
Component: Core Tor/Tor | Version:
Severity: Normal | Keywords: 034-roadmap-proposed
Actual Points: | Parent ID:
Points: | Reviewer:
Sponsor: |
------------------------------+----------------------------------
We have a design flaw, or at least an impedance mismatch, in our
descriptor publishing algorithm.
Relays publish a new descriptor when they think something has sufficiently
changed (e.g. bandwidth, IP address, exit policy, etc.) or when 18 hours
have passed since their last publish.
Directory authorities accept the new descriptor when *they* think it has
sufficiently changed. If they think it hasn't, they quietly drop it:
{{{
log_info(LD_DIRSERV,
         "Not replacing descriptor from %s (source: %s); "
         "differences are cosmetic.",
         router_describe(ri), source);
}}}
The trouble comes when things get out of sync: the relay thinks it
published recently, so it is still early in its 18 hour timer, but the
authorities discarded that descriptor and are still holding the older one.
Then when that older "current" descriptor becomes 24 hours old, the
authorities discard it too, and the relay falls out of the consensus
until it publishes again.
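To make the arithmetic concrete, here is a minimal sketch of the gap, using only the two numbers above (18 hours and 24 hours). The function name and the simplified model are assumptions for illustration: it ignores the bandaid described below and the details of consensus voting, and it assumes the relay publishes nothing else until its 18 hour timer fires.

```c
#include <stdio.h>

/* Both constants come from the ticket text. */
#define RELAY_REPUBLISH_HOURS 18 /* relay republishes after this long */
#define AUTH_MAX_AGE_HOURS 24    /* authorities discard descriptors this old */

/* Hypothetical helper, not a Tor function.
 *
 * held_age_at_drop: age in hours of the descriptor the authorities are
 * holding at the moment they drop the relay's new upload as "cosmetic".
 *
 * Returns how many hours the authorities have no usable descriptor for
 * the relay before its 18 hour timer makes it publish again. */
int
hours_without_descriptor(int held_age_at_drop)
{
  /* Time left before the held descriptor reaches 24 hours old. */
  int hours_until_expiry = AUTH_MAX_AGE_HOURS - held_age_at_drop;
  if (hours_until_expiry < 0)
    hours_until_expiry = 0;

  /* The relay's timer restarted at the dropped upload, so the next
   * upload arrives RELAY_REPUBLISH_HOURS after the drop. */
  int gap = RELAY_REPUBLISH_HOURS - hours_until_expiry;
  return gap > 0 ? gap : 0;
}
```

Under this model, any dropped upload that arrives when the held descriptor is more than 6 hours old leaves a gap: for example, a drop at age 10 leaves the relay without a usable descriptor for 4 hours.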
I don't have stats on how frequently this out-of-sync actually happens,
but it's enough to have tickets filed about it (#23638) and it's enough to
have confused/sad posts from relay operators about it every month:
https://lists.torproject.org/pipermail/tor-dev/2018-March/013030.html
https://lists.torproject.org/pipermail/tor-relays/2018-March/014764.html
We deployed a bandaid in 0.2.3.4-alpha (commit 1f4b694, #3327) that makes
relays look in the consensus and publish a new descriptor more
aggressively if they find they're not listed. That hack is apparently
needed quite often: in #21642 I said "So 426 of our ~7300 relays stayed in
the consensus in the last 12.5 hours because of this hack."
But I think we haven't actually explored whether the bandaid helps all of
the relays stay in the consensus all of the time, or if there are still
"holes" in it that mean some relays fall out sometimes. The reports above
make me think that yes there are still holes.
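For readers who haven't looked at the bandaid, its decision can be sketched roughly as follows. This is an illustration of the idea, not the code from commit 1f4b694: the struct, the enum, the function name, and both threshold parameters are hypothetical, and the real thresholds are deliberately left as inputs rather than guessed.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical view of what a relay learns about itself from the
 * consensus. Not a Tor data structure. */
typedef struct {
  bool listed;         /* does the consensus contain an entry for us? */
  time_t published_on; /* publication time of the descriptor it lists */
} consensus_entry_view_t;

typedef enum {
  REPUBLISH_NONE = 0,       /* nothing to do; wait for the normal timer */
  REPUBLISH_NOT_LISTED,     /* we are missing from the consensus */
  REPUBLISH_LISTED_TOO_OLD  /* "version listed in consensus is quite old" */
} republish_reason_t;

/* Decide whether to publish early.
 *
 * stale_after:        how old the listed descriptor may get (assumed knob)
 * min_retry_interval: minimum spacing between early publishes, so a relay
 *                     does not hammer the authorities (assumed knob) */
republish_reason_t
early_republish_reason(const consensus_entry_view_t *entry, time_t now,
                       time_t last_publish, time_t stale_after,
                       time_t min_retry_interval)
{
  /* Published too recently: give the authorities time to catch up. */
  if (now - last_publish < min_retry_interval)
    return REPUBLISH_NONE;

  if (!entry->listed)
    return REPUBLISH_NOT_LISTED;

  if (now - entry->published_on > stale_after)
    return REPUBLISH_LISTED_TOO_OLD;

  return REPUBLISH_NONE;
}
```

Note that even an early republish only helps if the authorities then accept it. If the new descriptor is again judged cosmetically different, the relay is back where it started, which is one candidate for the "holes" mentioned above.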
Potential ways forward:
* Match up the descriptor upload timings, as seen by a dir auth, with the
appearance of relays in the consensus. See how many of the relays
publishing for reason "version listed in consensus is quite old" are
missing any hours in the consensus.
* If there are some that fall out of the consensus entirely, think about
ways to make the republish more aggressive and earlier, or if it is
already more aggressive and earlier, figure out why it isn't sticking.
* Think about ways to make our relay-side decisions about "is it different
enough" synchronize better with our dirauth-side decisions. Now that we're
doing hourly consensus documents, can the dir auths be more lenient of
similar-ish descriptors, because there's only one "winner" of a descriptor
each hour? This poor synchronization is part of why we couldn't implement
proposal 275 when we wanted to.
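As a starting point for the first item, the per-relay measurement could be as simple as the sketch below. It assumes somebody has already turned archived consensuses into one presence flag per hour for a given relay; that input format and both function names are made up for this example.

```c
#include <stddef.h>

/* present[i] is nonzero if the relay appeared in the consensus for
 * hour i of the observation window. Hypothetical input format. */

/* Total number of hours the relay was missing from the consensus. */
size_t
missing_consensus_hours(const unsigned char *present, size_t n_hours)
{
  size_t missing = 0;
  for (size_t i = 0; i < n_hours; i++) {
    if (!present[i])
      missing++;
  }
  return missing;
}

/* Length in hours of the longest continuous absence. */
size_t
longest_consensus_gap(const unsigned char *present, size_t n_hours)
{
  size_t longest = 0, current = 0;
  for (size_t i = 0; i < n_hours; i++) {
    if (present[i]) {
      current = 0;
    } else {
      current++;
      if (current > longest)
        longest = current;
    }
  }
  return longest;
}
```

Running this over the relays that uploaded for reason "version listed in consensus is quite old" would show whether they are missing any hours at all, and the longest gap would hint at whether the early republish is arriving too late or simply not sticking.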
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25685>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online