Re: [tor-bugs] #20228 [Metrics/CollecTor]: Append all votes with same valid-after time to a single file in `recent/`
#20228: Append all votes with same valid-after time to a single file in `recent/`
-------------------------------+---------------------
Reporter: karsten | Owner:
Type: enhancement | Status: new
Priority: Medium | Milestone:
Component: Metrics/CollecTor | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-------------------------------+---------------------
Comment (by karsten):
Replying to [comment:3 iwakeh]:
> * Regarding grouping by download vs. published time, which came up in
> #20234, too.
> Let's have the discussion for all descriptors here, if this is ok?
> 1. Grouping by published time brings more data consistency between
> CollecTor instances, as their download times for the same descriptors
> surely often differ.
Agreed. I guess we can assume that files in the `recent/` directories
might differ between CollecTor instances. But does that matter, as long
as the set of contained descriptors published in the past, say, 60 hours
is 99.9% the same? After all, even files grouped by publication hour would
very likely contain descriptors in different orders on different
instances. Do we care?
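To make "99.9% the same" measurable, one could compare two instances' `recent/` contents as sets of descriptors, ignoring file boundaries and order entirely. A minimal sketch, assuming each instance's descriptors have already been parsed into hypothetical (digest, published) pairs; the measure used here, Jaccard similarity over the last 60 hours, is an assumption and not something CollecTor defines.

```python
from datetime import datetime, timedelta


def overlap_ratio(instance_a, instance_b, now, window_hours=60):
    """Jaccard similarity of two instances' descriptor sets.

    instance_a/instance_b are iterables of hypothetical (digest, published)
    pairs; file names and descriptor order are deliberately ignored.
    Only descriptors published within the last window_hours count.
    """
    cutoff = now - timedelta(hours=window_hours)
    a = {digest for digest, published in instance_a if published >= cutoff}
    b = {digest for digest, published in instance_b if published >= cutoff}
    union = a | b
    if not union:
        return 1.0  # two empty windows are trivially identical
    return len(a & b) / len(union)


now = datetime(2016, 9, 26, 12, 0)
a = [("d1", now - timedelta(hours=1)),
     ("d2", now - timedelta(hours=2)),
     ("d3", now - timedelta(hours=70))]  # outside the 60-hour window
b = [("d2", now - timedelta(hours=2)),   # same set, different order
     ("d1", now - timedelta(hours=1))]
print(overlap_ratio(a, b, now))  # 1.0
```

Because only set membership counts, two instances that wrote the same descriptors into differently named or differently ordered files still score 1.0.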
> 2. Grouping by download time means keeping track of a data item, i.e.
> download time, that so far is not part of the Tor protocol. Why introduce
> it for descriptors that provide a published time? Which is the download
> time after syncing descriptors: the initial download by the supplying
> CollecTor or the sync-download-time by the receiving one?
Right now, a CollecTor instance records the time when it starts
downloading and uses that as the file name for the descriptors file to
which it appends all descriptors it learns about in that run. That
includes descriptors found via initial download or via synchronization
from other instances. And 72 hours later, when the file gets deleted, the
download time is no longer relevant.
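The current scheme described above can be modeled in a few lines: one file per run, named after the run's start time, receiving both downloaded and synced descriptors, and deleted after 72 hours. This is a toy sketch only; the `RecentDir` class, the in-memory dictionary standing in for the directory, and the file-name format are all hypothetical and not CollecTor's actual code or naming.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=72)
NAME_FORMAT = "%Y-%m-%d-%H-%M-%S"  # hypothetical file-name format


class RecentDir:
    """Toy in-memory model of one descriptor type's recent/ directory."""

    def __init__(self):
        self.files = {}  # file name -> list of descriptors, in append order

    def run(self, start, downloaded, synced):
        """Simulate one run that started downloading at `start`."""
        name = start.strftime(NAME_FORMAT)
        # Descriptors from the initial download and from syncing with other
        # instances end up in the same file; their origin is not recorded.
        self.files.setdefault(name, []).extend(downloaded + synced)
        self.expire(start)
        return name

    def expire(self, now):
        """Delete files whose run started more than 72 hours before `now`."""
        cutoff = now - RETENTION
        for name in list(self.files):
            if datetime.strptime(name, NAME_FORMAT) < cutoff:
                del self.files[name]


recent = RecentDir()
t0 = datetime(2016, 9, 26, 12, 5)
recent.run(t0, ["desc-a"], ["desc-b"])
recent.run(t0 + timedelta(hours=73), ["desc-c"], [])
print(recent.files)  # {'2016-09-29-13-05-00': ['desc-c']}
```

Note that the download time only ever appears as a file name; nothing per descriptor records whether it came from the initial download or from a sync, which is why the question of "which download time" does not arise.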
> 3. Regarding #20234:comment:5: Clients might not be interested in past
> or future (according to published time) descriptors and just download the
> file they consider current, if it changed since their last visit.
Right, this is an important argument for storing descriptors by published
hour, so that clients can retrieve them easily. However, this presumes
that the client knows the publication time of a descriptor before
downloading anything, which is not always the case. The client might have
to download several files and search them for the descriptor it's looking
for.
And the most important argument against storing descriptors by published
hour is that clients that just want the new descriptors will have to
download about 8 files per hour (due to #20234) rather than 1, and 6 or 7
of these files contain mostly the same descriptors as before.
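The 8-versus-1 arithmetic can be sketched by counting how many publication-hour files a single download run appends to. This assumes, purely as a stand-in for whatever #20234 actually implies, that one run picks up descriptors published across the preceding 8 hours; the file naming and the (digest, published) pairs are hypothetical.

```python
from datetime import datetime, timedelta


def hour_file(published):
    """Hypothetical file name when grouping by publication hour."""
    return published.strftime("%Y-%m-%d-%H")


def files_touched(run_descriptors):
    """Published-hour files that a single run appends to.

    run_descriptors holds hypothetical (digest, published) pairs learned in
    one run; under download-time grouping all of them would go to one file.
    """
    return {hour_file(published) for _, published in run_descriptors}


now = datetime(2016, 9, 26, 12, 5)
# Assumption: the run learns descriptors published over the last 8 hours.
run = [(f"d{i}", now - timedelta(hours=i)) for i in range(8)]
print(len(files_touched(run)))  # 8
```

A client polling once per hour would then have to re-fetch every file in that set, most of which it has already largely seen, whereas grouping by download time leaves it exactly one new file per run.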
> * Regarding the notice: I think the two week time frame is fine.
Sounds good. Let's first reach a conclusion here and then tell the
world.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/20228#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs