Re: [tor-bugs] #4439 [Metrics Utilities]: Develop a Java/Python API that wraps relay descriptor sources and provides unified access to them
#4439: Develop a Java/Python API that wraps relay descriptor sources and provides
unified access to them
-------------------------------+--------------------------------------------
Reporter: karsten | Owner: karsten
Type: task | Status: new
Priority: normal | Milestone:
Component: Metrics Utilities | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------------+--------------------------------------------
Comment(by karsten):
Replying to [comment:11 atagar]:
> I favor the iterator for that reason - callers that want everything
buffered can read everything into a list (simple to do with both python
and java).
Right. Makes sense.
> The callback is bad because you're having the handler block reads,
Oh, right, I hadn't thought of that.
> and stores are bad for the reasons mentioned earlier. If we went with an
iterator then it would be the best of both worlds: unblocked reads,
limited memory usage if the handler is faster than reads, and can be
converted into a store too. The only advantage to a callback is that it
would guarantee constant memory usage (if your handler's slow then you
could consume as much memory as your buffer size which would probably be
unbounded). On second thought that would be likely to come up when reading
local cached descriptors... let's do both.
We could even suspend adding new descriptors to the queue if the handler
is slow. That would work both for downloads and for reading from disk.
And we could implement descriptor parsing on demand, that is, when a
handler runs the first getter of a descriptor they received from the
queue. That would save quite some memory, too.
But! These are ideas to optimize something that's not even there.
I'd like to start with a single pattern. We can always make it more
complex later on.
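For reference, here is a rough Python sketch of the two optimizations
mentioned above: a bounded queue that makes the reader wait while the
handler is slow, and a descriptor that is only parsed when its first getter
runs. The names LazyDescriptor, get_nickname and try_enqueue are made up for
illustration and are not part of any existing API. The only format detail
assumed is that a server descriptor starts with a "router <nickname> ..."
line.

```python
import queue


class LazyDescriptor:
    """Keeps the raw descriptor text and parses it on first getter call."""

    def __init__(self, raw):
        self._raw = raw
        self._fields = None
        self.parse_count = 0  # only here to show that parsing happens once

    def _parse(self):
        if self._fields is None:
            self.parse_count += 1
            fields = {}
            for line in self._raw.splitlines():
                keyword, _, rest = line.partition(" ")
                # Keep the first occurrence of each keyword; enough for a sketch.
                fields.setdefault(keyword, rest)
            self._fields = fields
        return self._fields

    def get_nickname(self):
        # Assumes the descriptor begins with "router <nickname> <address> ...".
        return self._parse()["router"].split(" ")[0]


def try_enqueue(q, descriptor, timeout=0.1):
    """Producer side: wait up to `timeout` seconds for room in the queue.

    A real reader would more likely call q.put(descriptor) and simply block,
    which suspends reading until the handler catches up.
    """
    try:
        q.put(descriptor, timeout=timeout)
        return True
    except queue.Full:
        return False
```

With queue.Queue(maxsize=N) memory use is capped at roughly N unparsed
descriptors, whether they come from a download or from disk.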
> Iterator would just be a simple producer/consumer. The producer thread
adds descriptors to a buffer as they're read and the consumer pops
elements off and provides them to the caller (blocking if there's no
input). Iirc this would be handled in both python and java by a
synchronized queue (I forget the class...
java.util.concurrent.BlockingQueue?).
Cool. I think I like that pattern most. (Let me update the API and
example applications, and hopefully I'll still like it afterwards.)
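To make the pattern concrete, here is a minimal Python sketch of the
producer/consumer iterator described in the quoted text, built on the
standard library's synchronized queue.Queue. DescriptorIterator and its
"source" argument are hypothetical names; "source" stands for anything that
yields descriptors, such as a directory reader or a downloader.

```python
import queue
import threading

_DONE = object()  # sentinel the producer enqueues when the source is exhausted


class DescriptorIterator:
    """Iterator fed by a background thread through a synchronized queue."""

    def __init__(self, source, maxsize=0):
        # maxsize=0 means unbounded; a positive value caps buffered items.
        self._queue = queue.Queue(maxsize)
        self._thread = threading.Thread(
            target=self._produce, args=(source,), daemon=True)
        self._thread.start()

    def _produce(self, source):
        try:
            for descriptor in source:
                self._queue.put(descriptor)  # blocks while a bounded queue is full
        finally:
            # Always signal the end, even if the source raised an exception.
            self._queue.put(_DONE)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()  # blocks while there is no input yet
        if item is _DONE:
            # Put the sentinel back so later next() calls also stop.
            self._queue.put(_DONE)
            raise StopIteration
        return item
```

A caller that wants everything buffered can write
list(DescriptorIterator(source)), and one that wants streaming can use a
plain for loop. This sketch does not report producer errors to the consumer;
a real implementation would probably need to.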
> Requesting descriptors via the control socket can be for individual
relays. I was thinking there may be some counterpart for 'give me
descriptor for fingerprint X' via directory mirrors and authorities but on
second thought tor wouldn't use that so it would be odd if that capability
existed. Oh well...
Well, you can ask for the descriptor for fingerprint X. But the better
approach is to ask by descriptor ID, not by fingerprint. And it's better
to ask for more than one descriptor at a time, because it causes less
overhead for the directory. When you're bored, look at dir-spec.txt and
search for "http" to see what fancy things the directory protocol allows
you to do.
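As an illustration of asking for several descriptors in one request, here
is a small Python sketch that builds batched request URLs. It assumes the
"/tor/server/d/<D1>+<D2>" form for requests by descriptor digest, which is
how I remember it from dir-spec.txt, so please check the spec for the exact
form (including the optional ".z" suffix for compressed responses). The
function name descriptor_urls and the default batch size are made up for
this example.

```python
def descriptor_urls(host, digests, batch_size=96):
    """Yield one URL per batch of descriptor digests.

    `host` is "address:dirport" of a directory mirror or authority.
    `batch_size` is an arbitrary assumption, not a value from the spec.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    digests = list(digests)
    for start in range(0, len(digests), batch_size):
        batch = digests[start:start + batch_size]
        # Multiple digests in a single request are joined with "+".
        yield "http://%s/tor/server/d/%s" % (host, "+".join(batch))
```

Asking for many digests per request means fewer connections and less
per-request overhead for the directory than fetching descriptors one at a
time.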
Anyway, let's focus on the iterator idea first.
I'll ask Sebastian to create two personal "DescripTor" repositories for
us. That way you can make changes to the code or documentation and tell
me to pull them, rather than having to describe your suggested changes
here. And once we agree on a project name, we can create an official
repository.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/4439#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online