Re: [tor-bugs] #23817 [Core Tor/Tor]: Tor re-tries directory mirrors that it knows are missing microdescriptors
#23817: Tor re-tries directory mirrors that it knows are missing microdescriptors
-------------------------------------------------+-------------------------
Reporter: teor | Owner: (none)
Type: defect | Status: new
Priority: Medium | Milestone: Tor:
| 0.3.3.x-final
Component: Core Tor/Tor | Version:
Severity: Normal | Resolution:
Keywords: tor-guard, tor-hs, prop224, | Actual Points:
032-backport? 031-backport? |
Parent ID: #21969 | Points:
Reviewer: | Sponsor:
-------------------------------------------------+-------------------------
Comment (by asn):
Replying to [comment:10 teor]:
> Replying to [comment:9 asn]:
> > Replying to [comment:7 teor]:
> > > Replying to [comment:6 asn]:
> > > > Here is an implementation plan of the failure cache idea from
> > > > comment:4.
> > > >
> > > > First of all, the interface of the failure cache:
> > > >
> > > > We introduce a `digest256map_t *md_fetch_fail_cache` which maps
> > > > the 256-bit md hash to a smartlist of dirguards through which we
> > > > failed to fetch the md.
> > > >
> > > > Now the code logic:
> > > >
> > > > 1) We populate `md_fetch_fail_cache` with dirguards in
> > > > `dir_microdesc_download_failed()`. We remove them from the failure
> > > > cache in `microdescs_add_to_cache()` when we successfully add an md
> > > > to the cache.
> > >
> > > Successfully add *that* md to the cache?
> > > Or any md from that dirguard?
> > >
> >
> > I meant, we remove *that* md from the `md_fetch_fail_cache` if we
> > manage to fetch *that* md from any dir.
> >
> > > I think this is ok, as long as we ask for mds in large enough
> > > batches.
> > >
> > > > 2) We add another `entry_guard_restriction_t` restriction type in
> > > > `guards_choose_dirguard()`. We currently have one restriction type,
> > > > which restricts guard choice based on the chosen exit node and its
> > > > family. We want another type which uses a smartlist and restricts
> > > > dirguards based on whether we have failed to fetch an md from that
> > > > dirguard. We will use this in step 3.
> > > >
> > > > 3) In `directory_get_from_dirserver()` we query the md failure
> > > > cache and pass any results to `directory_pick_generic_dirserver()`
> > > > and then to `guards_choose_dirguard()`, which uses the new
> > > > restriction type to block previously failed dirguards from being
> > > > selected.
> > >
> > > Do we block dirguards that have failed to deliver an md from
> > > downloads of that md?
> > > Or do we block dirguards that have failed to deliver any mds from
> > > downloads of any md?
> > >
> >
> > Yes, that's a good question that I forgot to address in this proposal.
> >
> > I guess my design above was suggesting that we block dirguards "that
> > have failed to deliver any mds from downloads of any md", until those
> > mds get fetched from another dirserver and get removed from the
> > failure cache.
>
> I think this is the behaviour we want: trying each dir server for each
> specific md will mean that we time out, because there are more mds
> than there are dir servers.
>
> There are two cases when this will cause us to fall back to the
> authorities:
> * we download a new consensus from the authorities, and they are the
>   only ones with some of the mds in it
> * for some reason, an md referenced in the consensus is not mirrored
>   correctly by relays or authorities (this would be a serious bug in
>   tor)
>
> To avoid this happening when it isn't necessary, we should expire
> failure cache entries after a random time. Maybe it should be the time
> when we expect dir servers to fetch a new consensus and new mds. I
> think this is 1-2 hours, but check dir-spec for the exact details. Or
> we could expire md failure caches each time we get a new consensus.
> That would be easier.
ACK. Based on our discussion above, I think the implementation can be
simplified, since we decided to go with the "block dirguards that have
failed to deliver any mds from downloads of any md" approach.

Hence, instead of keeping track of which mds we failed to download from
which relays, we can just keep track of the relays that served us
outdated md information, without caring about which mds they failed to
serve. This simplifies things a lot from an engineering perspective,
since we don't need to map mds to relays: we can just keep an
`outdated_dirserver_list` smartlist of outdated dirservers and skip
those when we try to fetch mds.
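
Roughly, I imagine the shape of it like this (just a sketch to show the
idea; helper names and placement are placeholders, not final code):

{{{
/* Sketch only -- names and placement are not final. */
static smartlist_t *outdated_dirserver_list = NULL;

/* Remember that the relay with <b>identity</b> (an identity digest)
 * served us outdated md information. Would be called from
 * dir_microdesc_download_failed(). */
static void
note_outdated_dirserver(const char *identity)
{
  if (!outdated_dirserver_list)
    outdated_dirserver_list = smartlist_new();
  /* Don't add duplicate entries. */
  SMARTLIST_FOREACH(outdated_dirserver_list, const char *, id, {
    if (tor_memeq(id, identity, DIGEST_LEN))
      return;
  });
  smartlist_add(outdated_dirserver_list,
                tor_memdup(identity, DIGEST_LEN));
}

/* Return 1 iff we should skip the relay with <b>identity</b> when
 * picking a dirserver for an md fetch. The new restriction type in
 * guards_choose_dirguard() would consult this. */
static int
dirserver_is_outdated(const char *identity)
{
  if (!outdated_dirserver_list)
    return 0;
  SMARTLIST_FOREACH(outdated_dirserver_list, const char *, id, {
    if (tor_memeq(id, identity, DIGEST_LEN))
      return 1;
  });
  return 0;
}
}}}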

This is also consistent with my empirical experience, where most
dirservers will usually provide us with all the mds we asked them for,
except for a few dirservers which would refuse to give us 10 mds or so
for a while. So we can just ban those few dirservers for a while using
the above logic.

And I agree with you: we should wipe the `outdated_dirserver_list`
every time we fetch a new consensus. Should we also clear it at any
other point? Maybe we should not let it grow to a huge size (cap it?).
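
Something like the following, perhaps (again just a sketch; the cap
value is arbitrary):

{{{
/* Sketch: wipe the list whenever we get a new consensus, and cap its
 * size so it cannot grow without bound. */
#define MAX_OUTDATED_DIRSERVERS 30

/* Called when a new consensus arrives: every dirserver gets a clean
 * slate. */
static void
clear_outdated_dirservers(void)
{
  if (!outdated_dirserver_list)
    return;
  SMARTLIST_FOREACH(outdated_dirserver_list, char *, id, tor_free(id));
  smartlist_clear(outdated_dirserver_list);
}

/* And note_outdated_dirserver() would refuse to grow past the cap: */
static int
outdated_dirserver_list_is_full(void)
{
  return outdated_dirserver_list &&
    smartlist_len(outdated_dirserver_list) >= MAX_OUTDATED_DIRSERVERS;
}
}}}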

I started implementing the original plan today, before I realized the
above simplification, so I will have to adapt some code to the new
design. I estimate that I will have an initial branch here early next
week, including unittests.
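
For the unittests I'm thinking of something in this spirit
(tinytest-style, helper names as in the sketches above):

{{{
static void
test_outdated_dirserver_list(void *arg)
{
  char id[DIGEST_LEN];
  (void) arg;
  memset(id, 7, sizeof(id));

  tt_assert(!dirserver_is_outdated(id));
  note_outdated_dirserver(id);
  tt_assert(dirserver_is_outdated(id));

  /* A new consensus should give every dirserver a clean slate. */
  clear_outdated_dirservers();
  tt_assert(!dirserver_is_outdated(id));

 done:
  ;
}
}}}
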
Onwards!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/23817#comment:11>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online