[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
Re: [tor-bugs] #19791 [Metrics/metrics-lib]: Use CollecTor's index.json for download; adapt current download to use new date format
#19791: Use CollecTor's index.json for download; adapt current download to use new
date format
---------------------------------+-----------------------------------
Reporter: karsten | Owner: iwakeh
Type: defect | Status: merge_ready
Priority: High | Milestone: metrics-lib 1.4.0
Component: Metrics/metrics-lib | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
---------------------------------+-----------------------------------
Changes (by karsten):
* status: needs_review => merge_ready
Comment:
Replying to [comment:11 iwakeh]:
> Replying to [comment:10 karsten]:
> [...]
>
> > Regarding the new subpackage `org.torproject.descriptor.index`, I'm
yet unclear whether that's supposed to be "visible" to API users or not,
that is, if they're supposed to import and/or use those classes directly.
If it is, that's going to grow the overall interface quite a bit,
requiring us to keep the public parts stable in subsequent releases. For
example, I'm unclear whether CollecTor is supposed to instantiate its own
`*Node` classes when writing its `index.json*` files. Ideally, metrics-
lib would hide away those classes by providing some kind of
`DescriptorIndexGenerator` interface that takes a local directory as
parameter and writes one or more `index.json[*]` files, but I'm not sure
what your plan is there. If the plan is to hide that package from users,
how about we rename it to `org.torproject.descriptor.impl.index` to make
that clearer? We could even create similar implementation subpackages in
the future to clean up the `impl` package a bit, but that's not something
that users would have to care about.
>
> I'd like to add the package as '''alpha'''. Warning that it can change
still in unexpected ways. Thus, it can be used in CollecTor and from the
findings there we can decide to remove it from the public interface, or
change it w/o being backward compatible.
> I provide a commit with the package-info and more javadoc below.
>
> The *Node classes are of use on their own for clients that want to know
about a CollecTor's instance files. They are there anyway and maintained,
so others can just use them as well.
Okay, I'm fine postponing this decision, and adding that alpha notice
should be clear enough.
However, I'm yet unconvinced by the reasons you're giving. Ideally,
clients shouldn't have to deal with the details of an index.json file. We
should give them interfaces that are powerful enough to do all the things
they'd want to do. Also, my past experience is that maintaining code can
be very different depending on whether it's exposed in an API or not. I'd
rather want to provide fewer details in interfaces than what's already
there. But, no need to find a conclusion for this today, we can still
think about this for 1.5.0.
> > Can we avoid changing the parameter usage of
`DescriptorIndexCollector#collectDescriptors()` by overloading that method
in the interface? How about we add a new method `collectDescriptor(String
collecTorIndexUrl, String collecTorIndexPath, String[] remoteDirectories,
...)` and use an implementation-specific default for `collecTorIndexPath`
in the current method, such as `"/index.html"` for
`DescriptorCollectorImpl` and `"/index.json.xz"` for
`DescriptorIndexCollector`? If we retain backward compatibility here, I'd
say we could switch to `DescriptorIndexCollector` in 1.5.0 when this class
has seen some more testing. Otherwise I'd suggest waiting for 2.0.0.
>
> I wouldn't want to change the interface for that. Both are URL strings
and whoever uses the non-default implementation must know what their
doing, i.e. at least read javadoc of the class they explicitly request to
be used. At the point DescriptorIndexCollector becomes default we might
not want to keep maintaining the old implementation and simply remove it.
Then the additional method wouldn't make sense any more.
My point is that we could do the switch to the new implementation in a
version 1.5.0 if it remains backward-compatible, whereas we'd have to call
that version 2.0.0 otherwise. So, we can merge this code as is for 1.4.0,
but we should aim for a backward-compatible version if we want to make it
the new default in 1.5.0.
Here's an even better suggestion: we accept both, base URLs (like
`"https://collector.torproject.org"`) and full URLs to index files (like
`"https://collector.torproject.org/index.html"` or
`"https://collector.torproject.org/index.json.xz"`), in the current
`collectDescriptors()` method, and if no file is specified, we'll use an
implementation-specific default (`"/index.html"` in
`DescriptorCollectorImpl` and `"/index.json.xz"` in
`DescriptorIndexCollector`). That doesn't break current applications that
just provide a base URL, and it enables future applications to select
their favorite compression type. Likewise, no need to make a decision
today, but let's consider this for 1.5.0.
> [...]
>
> > Is `FileNode#lastModifiedMillis()` called often? If so, it may be
better to use `ParseHelper#getDateFormat()` to obtain a thread-safe
`DateFormat` instance for the given format and store it in a map. That
avoids creating single-use instances of that class over and over. We'd
have to make that method public for this, but that shouldn't be an issue.
However, if it's not called very often, we can skip this.
> >
>
> I'm aware of `ParseHelper#getDateFormat()`, but I didn't want to
introduce dependencies to the `impl` package.
> Actually, `ParseHelper` should be split into a utility class that is
part of the interface and the implementation specific part (should be part
of the redesign ticket #19640). Once, this class exists the
`getDateFormat` method should be used; until then it wont break I think.
Ah, fair enough. Happy to discuss this more on that other ticket.
> [...]
>
> > Can we pick a smaller `test.json` and compressed equivalents in
`src/test/resources/`? For example, we could take the `index.json` from
the main CollecTor instance and throw out all files that haven't been
modified in the past, say, four weeks. Or did you include a large file on
purpose to check something that cannot be tested with a smaller file?
> >
>
> I wanted to use the 'real' file as it is provided by the current main
CollecTor without any edits, because I only have these small toy test
files otherwise. There can be a timeout to the test method and we'd notice
if that takes too long?
Okay to leave these files in. No need to add a timeout to the test, as
we'll notice slow tests pretty easily, and then we can always take them
out. I was also more concerned about the file size than the test
execution time, but I admit that file size doesn't really matter here.
Never mind.
> > I'd say once we have resolved these questions, it's good to go into
1.4.0, even today. Thanks for working on this!
>
> Fine, thanks for all the things you spotted!
>
> Please review the [https://gitweb.torproject.org/user/iwakeh/metrics-
lib.git/?h=task-19791 new commits] based on your branch above.
All changes look good. I'll squash your latest commits and mine into
32f628f, run some tests, and release 1.4.0. Let's just not forget the
issues I'm raising above for 1.5.0.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/19791#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs