[tor-bugs] #12676 [Metrics Data Processor]: Bridge descriptors CollecTor's recent/ directory contain many duplicates
#12676: Bridge descriptors CollecTor's recent/ directory contain many duplicates
------------------------------------+---------------------
Reporter: karsten | Owner:
Type: defect | Status: new
Priority: minor | Milestone:
Component: Metrics Data Processor | Version:
Keywords: | Actual Points:
Parent ID: | Points:
------------------------------------+---------------------
The `recent/` directory should only contain new descriptors, and ideally
no duplicates. I just found that the latter is not the case:
{{{
$ grep -c "@type" recent/bridge-descriptors/server-descriptors/2014-07-22-07-04-02-server-descriptors
18175
$ grep -c "@type" recent/bridge-descriptors/extra-infos/2014-07-22-07-04-02-extra-infos
9723
}}}
Compare this to relay descriptors:
{{{
$ grep -c "@type" recent/relay-descriptors/server-descriptors/2014-07-22-07-05-52-server-descriptors
931
$ grep -c "@type" recent/relay-descriptors/extra-infos/2014-07-22-07-05-52-extra-infos
930
$ grep -c "@type" recent/relay-descriptors/microdescs/micro/2014-07-22-07-05-52-micro
30
}}}
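The `grep -c` counts show how many descriptors each file holds, but not how many of them are repeats. A minimal sketch for measuring that directly, assuming each descriptor in a `recent/` file begins with its own `@type` line and that duplicates are byte-identical after sanitization; `count_descriptors` is a hypothetical helper name, not part of CollecTor:

```shell
# Hypothetical helper (not part of CollecTor): report total, unique, and
# duplicate descriptors in a file where each descriptor starts with "@type".
# Assumes duplicate descriptors are byte-identical.
count_descriptors() {
  awk '
    # A new "@type" line closes the previous descriptor; record it.
    /^@type / { if (desc != "") seen[desc]++; desc = "" }
    # Accumulate the full text of the current descriptor.
    { desc = desc $0 "\n" }
    END {
      if (desc != "") seen[desc]++
      for (d in seen) { total += seen[d]; unique++ }
      printf "total=%d unique=%d duplicates=%d\n", total, unique, total - unique
    }' "$1"
}
```

Usage: `count_descriptors recent/bridge-descriptors/extra-infos/2014-07-22-07-04-02-extra-infos`. Keying an awk array on the whole descriptor text keeps the sketch dependency-free; for very large files, keying on a digest would use less memory.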
The reason is that only novel relay descriptors are downloaded and stored
to disk, whereas the parsed bridge descriptor tarballs are full snapshots
of Tonga's cached descriptor files. We need to add a check whether we
already have a sanitized copy of a bridge descriptor and store it only if
we don't.
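One possible shape for that check, sketched in shell rather than taken from CollecTor's code: keep an index of digests of sanitized descriptors that were already written, and append a descriptor to the output only if its digest is not yet in the index. The helper name `store_if_new`, the index file, and the use of GNU coreutils `sha1sum` are all assumptions for illustration:

```shell
# Hypothetical helper (not CollecTor's actual code): append a sanitized
# descriptor to OUT_FILE only if its SHA-1 digest is not yet in INDEX_FILE.
# Assumes GNU coreutils sha1sum is available.
store_if_new() {
  desc_file=$1 index_file=$2 out_file=$3
  digest=$(sha1sum "$desc_file" | cut -d ' ' -f 1)
  touch "$index_file"
  # -x: match whole lines, -F: fixed string, -q: no output.
  if grep -qxF "$digest" "$index_file"; then
    echo "duplicate"   # already stored in an earlier run; skip it
    return 0
  fi
  cat "$desc_file" >> "$out_file"
  printf '%s\n' "$digest" >> "$index_file"
  echo "stored"
}
```

The index would need to persist across runs (and be pruned once descriptors age out of `recent/`) for the check to catch descriptors repeated between consecutive Tonga snapshots.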
Priority is minor, because the only cost is some additional load on
clients that end up parsing the same descriptors more than once. Other
than that, it's mostly harmless.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/12676>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online