Re: [tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors across CollecTor instances
#18910: distributing descriptors across CollecTor instances
-------------------------------+-----------------------------------
Reporter: iwakeh | Owner: iwakeh
Type: enhancement | Status: needs_information
Priority: Medium | Milestone:
Component: Metrics/CollecTor | Version:
Severity: Normal | Resolution:
Keywords: ctip | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-------------------------------+-----------------------------------
Comment (by karsten):
Replying to [comment:7 iwakeh]:
> Some thoughts:
>
> === The CollecTor side
> Maybe CollecTor (or the Metrics Team) needs a data collection and
> handling policy?
> (Or, is there anything like that I didn't find yet other than the
> license and of course the Tor-wide privacy goals?)
There is no explicit policy like that, but it would be useful to document
that in the medium term.
I guess a CollecTor policy would make more sense than one that applies to
all metrics-related products, because then we'd have to either enforce
that policy for all metrics-related tools or manually confirm that a tool
conforms to the policy. Other tools could have their own policies.
> In general, CollecTor shouldn't attempt to make received data better
> than it is by dropping unwanted things.
Agreed, and a nice way to phrase this. :)
> At least not without some defined process.
> And collected data should only be changed when there is a reason for
> obfuscation or when it is enhanced (e.g. by adding the @source tag).
Look, that's the beginning of a policy! I like that.
> === Handling of //unwanted// data
> Incomplete unreferenced server descs could be stored differently:
> * referenced server descs can be stored in the way it is done now and
> * unreferenced can be kept, but stored separately.
>
> The synch-process could first concentrate on the referenced descriptors.
I'm not sold on this part with respect to the process. I can see how
we're switching from a model where we're trusting everyone (all relays and
bridges, all directory authorities, all other CollecTor instances) to just
a small set of nodes (for example, the set of directory authorities listed
in tor.git at a certain point in time). But doing so is a major
engineering effort, whereas continuing to trust everyone and accepting the
risk of getting spammed is easy. Also, once we limit trust we can always go through
the tarballs and rip out everything we shouldn't have accepted. Hence,
I'd say let's handle all data, wanted or unwanted, the same for now.
But in the future, yes, let's consider doing this. Once we do we should
talk to ln5 about his plans to apply certificate transparency concepts to
create a Tor network data archive, where spam descriptors turned out to be
a major issue, too.
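
For illustration only, the referenced/unreferenced split proposed in the
quoted text could be sketched as below. This is not CollecTor code: the
function name, the digest-keyed dict, and the idea that the set of
referenced digests is already known are all assumptions made for the
example.

```python
from typing import Dict, Iterable, Tuple


def split_by_reference(
    descriptors: Dict[str, bytes],
    referenced_digests: Iterable[str],
) -> Tuple[Dict[str, bytes], Dict[str, bytes]]:
    """Partition descriptors into (referenced, unreferenced).

    `descriptors` maps a hex digest to the raw descriptor bytes
    (hypothetical in-memory layout). `referenced_digests` is assumed to
    have been extracted beforehand from documents we already trust.
    """
    # Compare digests case-insensitively, since hex may be upper or lower.
    wanted = {d.upper() for d in referenced_digests}
    referenced: Dict[str, bytes] = {}
    unreferenced: Dict[str, bytes] = {}
    for digest, raw in descriptors.items():
        target = referenced if digest.upper() in wanted else unreferenced
        target[digest] = raw
    return referenced, unreferenced
```

Nothing is dropped here: unreferenced descriptors are only routed to a
second container, which a sync process could handle after the first.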
> === Regarding the repeated uploads:
> What is the reason for all these server descriptors gabelmoo received?
> Is there some benign explanation for the uploads?
Probably not. But even if we find the reason and fix this, we cannot undo
that it happened in the past, we cannot guarantee that there will be no
future bugs like this one, and we cannot prevent malicious relays from
flooding the directory authorities with random descriptors without there
being a bug. Or did you mean that directory authorities shouldn't accept
as many descriptors from a single source? I'm not sure how that would
work, and for the directory authorities it's not that much of a problem to
get spammed temporarily. So, I think we might not be able to fix our
issue with spam descriptors in the tor daemon.
> Maybe we should actually search the old data for more upload frenzies
> like the one triggering this discussion?
We could, but what would we do once we find similar events? When does a
malicious descriptor flood begin and what's still expected behavior? I
think if we want to solve the descriptor spam problem we'll have to limit
ourselves to descriptors published by trusted entities and descriptors
referenced from such descriptors directly or indirectly.
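
As a rough sketch of what "referenced directly or indirectly" could mean
in practice, the following walks a reference graph starting from trusted
documents. The graph representation and all names are hypothetical; how
references are extracted from real descriptors is left out.

```python
from collections import deque
from typing import Dict, Iterable, Set


def reachable_from_trusted(
    trusted_roots: Iterable[str],
    references: Dict[str, Iterable[str]],
) -> Set[str]:
    """Return every digest reachable from the trusted roots.

    `references` maps a document digest to the digests it references
    (assumed to be precomputed). Anything not in the returned set was
    never referenced, directly or indirectly, by a trusted document.
    """
    seen: Set[str] = set()
    queue = deque(trusted_roots)
    while queue:
        digest = queue.popleft()
        if digest in seen:
            continue  # already visited; also guards against cycles
        seen.add(digest)
        queue.extend(references.get(digest, ()))
    return seen
```

For example, with `{"consensus": ["sd1", "sd2"], "sd1": ["ei1"]}` and
`"consensus"` as the only root, a descriptor that no trusted document
points to stays outside the result.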
Sorry for the long response. It's a difficult problem, it seems.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18910#comment:8>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online