[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
Re: [tor-bugs] #19791 [Metrics/metrics-lib]: Use CollecTor's index.json for download; adapt current download to use new date format
#19791: Use CollecTor's index.json for download; adapt current download to use new
date format
---------------------------------+-----------------------------------
Reporter: karsten | Owner: iwakeh
Type: defect | Status: needs_revision
Priority: High | Milestone: metrics-lib 1.4.0
Component: Metrics/metrics-lib | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
---------------------------------+-----------------------------------
Changes (by karsten):
* status: needs_review => needs_revision
Comment:
Alright, let's go through the last four commits in that branch:
91fdb9b is already contained in master as 1dab40d.
a74563a looks good and is merged to my branch (see below) as 7a0e109.
1276c14 looks good, too, and is merged to my branch as e9c0731. Thanks
for caring about tests!
32f628f is preliminary merged to my branch as 82eb1cf, but I have a couple
of questions and change requests, as follows:
Commit bf5ba78 in [https://gitweb.torproject.org/user/karsten/metrics-
lib.git/log/?h=task-19791 my task-19791 branch] contains a few trivial
fixes. Please take a look and let me know if you're okay with those.
Regarding the new subpackage `org.torproject.descriptor.index`, I'm yet
unclear whether that's supposed to be "visible" to API users or not, that
is, if they're supposed to import and/or use those classes directly. If
it is, that's going to grow the overall interface quite a bit, requiring
us to keep the public parts stable in subsequent releases. For example,
I'm unclear whether CollecTor is supposed to instantiate its own `*Node`
classes when writing its `index.json*` files. Ideally, metrics-lib would
hide away those classes by providing some kind of
`DescriptorIndexGenerator` interface that takes a local directory as
parameter and writes one or more `index.json[*]` files, but I'm not sure
what your plan is there. If the plan is to hide that package from users,
how about we rename it to `org.torproject.descriptor.impl.index` to make
that clearer? We could even create similar implementation subpackages in
the future to clean up the `impl` package a bit, but that's not something
that users would have to care about.
Can we avoid changing the parameter usage of
`DescriptorIndexCollector#collectDescriptors()` by overloading that method
in the interface? How about we add a new method `collectDescriptor(String
collecTorIndexUrl, String collecTorIndexPath, String[] remoteDirectories,
...)` and use an implementation-specific default for `collecTorIndexPath`
in the current method, such as `"/index.html"` for
`DescriptorCollectorImpl` and `"/index.json.xz"` for
`DescriptorIndexCollector`? If we retain backward compatibility here, I'd
say we could switch to `DescriptorIndexCollector` in 1.5.0 when this class
has seen some more testing. Otherwise I'd suggest waiting for 2.0.0.
The condition `if (!filepath.exists() && filepath.mkdirs())` in
`DescriptorIndexCollector#fetchRemoteFiles()` looks wrong. Shouldn't it
be `!filepath.mkdirs()` there? I didn't test this, but the API says that
`mkdirs()` returns "true if and only if the directory was created, along
with all necessary parent directories; false otherwise".
Is `FileNode#lastModifiedMillis()` called often? If so, it may be better
to use `ParseHelper#getDateFormat()` to obtain a thread-safe `DateFormat`
instance for the given format and store it in a map. That avoids creating
single-use instances of that class over and over. We'd have to make that
method public for this, but that shouldn't be an issue. However, if it's
not called very often, we can skip this.
In `IndexNode#retrieveFiles()` I'm not yet clear what situations lead to
returning the half-done map in the middle of the loop. If one invalid
remote directory leads to that case, I think we should either warn and
skip that remote directory or return an empty map. Otherwise, the order
of remote directories would affect the result, and that might lead to bugs
in user applications that are quite hard to find.
Can we pick a smaller `test.json` and compressed equivalents in
`src/test/resources/`? For example, we could take the `index.json` from
the main CollecTor instance and throw out all files that haven't been
modified in the past, say, four weeks. Or did you include a large file on
purpose to check something that cannot be tested with a smaller file?
I'd say once we have resolved these questions, it's good to go into 1.4.0,
even today. Thanks for working on this!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/19791#comment:10>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs