Re: [tor-dev] Searchable metrics archive - Onionoo-like API available online for probing



On 8/23/13 3:12 PM, Kostas Jakeliunas wrote:
> Hello,
> 
> this accompanies my status report [1], and includes info on how to query the
> searchable metrics archive for anyone curious. I also refer to the original
> (now semi-outdated) project proposal/document. [0] Only sending to
> tor-dev@ for now.
> 
> The Onionoo-like backend is listening on
> 
> http://ts.mkj.lt:5555/
> 
> (backup URI where it's actually running (domain unrelated):
> ravinesmp.com:5555)
> 
> This document details how it can be queried:
> 
> https://github.com/wfn/torsearch/blob/master/docs/onionoo_api.md
> 
> (It is, by design, an almost-subset (it does change some things as of now,
> though) of the Onionoo API [2].)

Hi Kostas,

I finally managed to test your service and take a look at the
specification document.

The few tests I tried ran pretty fast!  I didn't hammer the service, so
maybe there are still bottlenecks that I didn't find.  But AFAICS, you
did a great job there!

Thanks for writing down the specification.

So, would it be accurate to say that you're mostly not touching summary,
details, bandwidth, and weights resources, but that you're adding a new,
fifth resource, statuses?

In other words, does the attached diagram visualize what you're going to
add to Onionoo?  Some explanations:

- summary and details documents contain only the last known information
about a relay or bridge, but those are on a pretty high detail level (at
least for details documents).  In contrast to the current Onionoo, your
service returns summary and details documents for relays that didn't run
in the last week, so basically since 2007.  However, you're not going to
provide summary or details for arbitrary points in time, right?  (Which
is okay, I'm just asking if I understood this correctly.)

- bandwidth and weights documents always contain information covering
the whole lifetime of a relay or bridge, where recent events have higher
detail level.  Again, you're not going to change anything here besides
providing these documents for relays and bridges that have been offline
for more than a week.

- statuses have the same level of detail for any time in the past.
These documents are new.  They're designed for the relay search service
and for a simplified version of ExoneraTor (which doesn't care about
exit policies and doesn't provide original descriptor contents).  There
are no statuses documents for bridges, right?

If this is correct (and please tell me if it's not), this seems like a
plausible extension of Onionoo.

A few ideas on statuses documents: how about you change the format of
statuses, so that there's no more one document per relay and valid-after
time, but exactly one document per relay?  That document could then
contain an array of status objects saying when the relay was contained
in the network status, together with information about its addresses.

It might be useful to group consecutive valid-after times when all
addresses and other relevant information about a relay stayed the same.
So, rather than adding "valid_after", put in "valid_after_from" and
"valid_after_to".  And maybe you can compress information even more by
putting all relevant IP addresses in a list and refer to them by list
index.  Compare this to bandwidth and weights documents which are
optimized for size, too.
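
A rough Python sketch of that grouping might look like the following.
Only "valid_after_from" and "valid_after_to" come from the suggestion
above; the function name, the "fingerprint", "addresses", "statuses" and
"address" keys, the timestamp format, and the one-consensus-per-hour
assumption are all hypothetical and only serve the illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"                # assumed timestamp format
CONSENSUS_INTERVAL = timedelta(hours=1)  # assumption: one consensus per hour


def build_statuses_document(fingerprint, entries):
    """Collapse (valid_after, address) pairs into one document per relay.

    Consecutive valid-after times with an unchanged address are merged
    into a single run; addresses are stored once and referenced by index.
    """
    addresses = []      # distinct addresses, in order of first appearance
    address_index = {}  # address -> position in `addresses`
    runs = []
    prev_time = None

    for valid_after, address in sorted(entries):
        when = datetime.strptime(valid_after, FMT)
        idx = address_index.setdefault(address, len(addresses))
        if idx == len(addresses):        # first time we see this address
            addresses.append(address)

        last = runs[-1] if runs else None
        if (last is not None and last["address"] == idx
                and when - prev_time == CONSENSUS_INTERVAL):
            # Same address, no missed consensus: extend the current run.
            last["valid_after_to"] = valid_after
        else:
            # Address changed or the relay was absent: start a new run.
            runs.append({"valid_after_from": valid_after,
                         "valid_after_to": valid_after,
                         "address": idx})
        prev_time = when

    return {"fingerprint": fingerprint,
            "addresses": addresses,
            "statuses": runs}
```

Breaking a run whenever a consensus is missed keeps each
from/to pair an exact statement of when the relay was listed, which an
ExoneraTor-style lookup would need.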

Maybe you could even generate these statuses documents in advance once
per hour and store them as JSON documents in the database, similar to
what's planned for the other document types?  That might reduce
database load a lot, though you'll still need most of your database foo
for the search part.
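
As a minimal sketch of that pre-generation idea: an hourly job could
serialize each relay's document once and store it keyed by fingerprint,
so that serving a request becomes a single primary-key lookup.  The
table name, column names, and helper functions below are hypothetical,
and sqlite3 is used only to keep the example self-contained; nothing
here assumes which database the service actually runs on.

```python
import json
import sqlite3


def store_documents(conn, documents):
    """Write pre-generated documents, replacing last hour's versions."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS statuses_document ("
        "fingerprint TEXT PRIMARY KEY, body TEXT NOT NULL)")
    conn.executemany(
        "INSERT OR REPLACE INTO statuses_document (fingerprint, body) "
        "VALUES (?, ?)",
        [(doc["fingerprint"], json.dumps(doc, sort_keys=True))
         for doc in documents])
    conn.commit()


def fetch_document(conn, fingerprint):
    """Return the stored JSON text for one relay, or None if unknown."""
    row = conn.execute(
        "SELECT body FROM statuses_document WHERE fingerprint = ?",
        (fingerprint,)).fetchone()
    return row[0] if row else None
```

The stored body can be returned to the client as-is, without building
it from the underlying status entries on every request.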

Happy to chat more about these ideas on IRC.

> Please report any inconsistencies / errors / time-outs / anything that
> takes a few seconds or more to execute. I'm logging the queries (together
> with IP addresses for now - for shame!), so will be able to later correlate
> activity with database load, which will hopefully provide some realistic
> semi-benchmark-like data.

I could imagine that you'll get more testers if you provide instructions
for using your service as relay search or ExoneraTor replacement.  Maybe
you could write down the five most common searches that people could
perform to search for a relay or find out whether an IP address was a
Tor relay at a given time?  If you want, I can link to such a page from
the relay search and the ExoneraTor page.

All in all, great work!  Nice!

Thanks,
Karsten

Attachment: onionoo-metrics-search.png
Description: PNG image
