Re: [tor-bugs] #20548 [Metrics]: Handle bad input more consistently in metrics code bases
#20548: Handle bad input more consistently in metrics code bases
-------------------------+---------------------
Reporter:  karsten       | Owner:
Type:      enhancement   | Status:  new
Priority:  Medium        | Milestone:
Component: Metrics       | Version:
Severity:  Normal        | Resolution:
Keywords:                | Actual Points:
Parent ID:               | Points:
Reviewer:                | Sponsor:
-------------------------+---------------------
Comment (by iwakeh):
Some thoughts:
One step is unifying the parsing process by replacing all parsing code
with metrics-lib provided parsing (which is already under way for
CollecTor). This addresses goal number one in the description above.
Goal number two (of the bullet point list in the description above) is
fine, too: descriptors are separate data units, and a failure to parse
one should not affect parsing and storing of subsequent descriptors just
because they happened to be stored temporarily in the same file.
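To illustrate the isolation described above, here is a minimal sketch in
Python (not metrics-lib code). The names parse_all and toy_parse are
hypothetical, and the "router <nickname>" check is only a stand-in for
real descriptor parsing. The point is that one failing descriptor is
recorded and skipped rather than aborting the rest of the file.

```python
from typing import Callable, Iterable, List, Tuple


def parse_all(
    raw_descriptors: Iterable[str],
    parse_one: Callable[[str], dict],
) -> Tuple[List[dict], List[Tuple[str, str]]]:
    """Parse each descriptor on its own; collect failures separately."""
    parsed: List[dict] = []
    failed: List[Tuple[str, str]] = []
    for raw in raw_descriptors:
        try:
            parsed.append(parse_one(raw))
        except ValueError as exc:
            # Keep the raw input and the reason; later descriptors still run.
            failed.append((raw, str(exc)))
    return parsed, failed


def toy_parse(raw: str) -> dict:
    """Hypothetical parser: accepts only 'router <nickname> ...' lines."""
    parts = raw.split()
    if len(parts) < 2 or parts[0] != "router":
        raise ValueError("expected 'router <nickname>'")
    return {"nickname": parts[1]}
```

Calling parse_all(["router a", "garbage", "router b"], toy_parse) would
return the two parsed descriptors together with one recorded failure.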
Regarding the second list: privacy and client expectation, i.e. topics 3.
and 4., are the most important.
One way to combine storing-of-all-that-is-seen with privacy and client
expectation would be to store invalid descriptors separately. The
separate location can also be public for relay descriptors and sanitized
bridge descriptors, i.e., public folders for download would be 'archive',
'relay', and 'substandard' (or some better name). All bridge descriptors
that cannot be sanitized should be stored, too, but not yet be offered to
the public.
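The routing rule in this proposal could be sketched roughly as follows.
This is Python for illustration only: the function choose_store and the
'private' store name are assumptions made here, and only 'archive' and
'substandard' come from the text above.

```python
from enum import Enum


class Store(Enum):
    ARCHIVE = "archive"          # public, descriptors that parsed cleanly
    SUBSTANDARD = "substandard"  # public, descriptors that failed parsing
    PRIVATE = "private"          # hypothetical name: kept, not offered publicly


def choose_store(is_bridge: bool, parsed_ok: bool,
                 sanitized_ok: bool = True) -> Store:
    """Pick a storage location for one descriptor."""
    # Bridge descriptors that could not be sanitized never become public.
    if is_bridge and not sanitized_ok:
        return Store.PRIVATE
    # Everything else is public; parse quality decides the folder.
    return Store.ARCHIVE if parsed_ok else Store.SUBSTANDARD
```

Checking sanitization before parse quality keeps the privacy guarantee
independent of how robust the parser is.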
Advantages:
* privacy is ensured
* clients can choose the quality of descriptors they're interested in
* we'd get an overview of how many 'bad' descriptors show up every month
and can analyze them
* others can also analyze the 'substandard' descriptors, or use them,
if they choose to.
* Given that descriptors are not supposed to be altered other than for
privacy reasons, some could still be integrated into the 'normal'
archives later, for example when more robust parsing is available.
Disadvantages:
* implementation of the third storage (all over, i.e. for 'recent', 'out',
and 'substandard'), but the implementation should be easy.
* maintenance of third storage location.
Concerning already archived data there are two options:
* leave them as they are
* or re-parse and sort substandard historic descriptors into tarballs in
the 'substandard' directory.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/20548#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online