Re: [tor-dev] Incorporating your torsearch changes into Onionoo

On Wed, Oct 23, 2013 at 2:32 PM, Karsten Loesing <karsten@xxxxxxxxxxxxxx> wrote:

On 10/11/13 4:05 PM, Kostas Jakeliunas wrote:

Oops! Sorry for the delay in responding! Responding now.

> On Fri, Oct 11, 2013 at 12:00 PM, Karsten Loesing <karsten@xxxxxxxxxxxxxx> wrote:
>
>> Hi Kostas,
>>
>> should we move this thread to tor-dev@?
>>
>
> Hi Karsten!
>
> sure.
>
>> From our earlier conversation about your GSoC project:
>>> In particular, we should discuss how to integrate your project into
>>> Onionoo. I could imagine that we:
>>>
>>> - create a database on the Onionoo machine;
>>> - run your database importer cronjob right after the current Onionoo
>>> cronjob;
>>> - make your code produce statuses documents and store them on disk,
>>> similar to details/weights/bandwidth documents;
>>> - let the ResourceServlet use your database to return the
>>> fingerprints to return documents for; and
>>> - extend the ResourceServlet to support the new statuses documents.
>>>
>>> Maybe I'm overlooking something and you have a better plan? In any
>>> case, we should take the path that implies writing as little code as
>>> possible to integrate your code in Onionoo.
>>
>> Let me know what you think!
>>
>
> Sounds good. Responding to particular points:
>
>> - create a database on the Onionoo machine;
>> - run your database importer cronjob right after the current Onionoo
>> cronjob;
>
> These should be no problem and make perfect sense. It's always best to use
> raw SQL table creation routines to make sure the database looks exactly
> like the one on the dev machine I guess (cf. using SQLAlchemy abstractions
> to do that (I did that before)).
>
> Current SQL script to do that is at [1]. I'll look over it. For example,
> I'd (still) like to generate some plots showing the chances of two
> fingerprints having the same substring (this is for the intermediate
> fingerprint table.) (One axis would be substring length, another would be
> the possibility in (portions of) %.) As of now, we still use
> substr(fingerprint, 0, 12), and it is reflected in the schema.
>
> Overall though, no particular snags here.
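
The substring-collision question above can be estimated before plotting anything. Below is a minimal sketch using the birthday approximation, assuming fingerprints are uniformly distributed hexadecimal strings; the function name and the relay count are hypothetical and not taken from torsearch. If the database is PostgreSQL, note that substr(fingerprint, 0, 12) returns 11 characters rather than 12, because positions are 1-based, so it may be worth checking both prefix lengths.

```python
import math


def collision_probability(n_fingerprints: int, prefix_len: int) -> float:
    """Approximate chance that at least two of n uniformly random hex
    fingerprints share the same prefix of prefix_len characters."""
    buckets = 16 ** prefix_len                      # possible hex prefixes
    pairs = n_fingerprints * (n_fingerprints - 1) / 2
    # Birthday approximation: 1 - exp(-pairs / buckets).
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-pairs / buckets)


if __name__ == "__main__":
    n = 100_000  # hypothetical number of distinct fingerprints
    for prefix_len in range(4, 13):
        p = collision_probability(n, prefix_len)
        print(f"prefix length {prefix_len:2d}: {100 * p:.6f} %")
```

Plotting the printed values against prefix length would give roughly the plot described above: one axis for substring length, the other for the collision probability in percent.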

I don't follow. But before we get into details here, I must admit that
I was too optimistic about running your code on the current Onionoo
machine. I ran a few benchmark tests on it last week to compare it to
new hardware, and those tests almost made it fall over. We should not
even think about adding new load to the current machine.

New plan: can you run an Onionoo instance with your changes on a
different machine? (If you need anything from me, like a tarball of the
status/ and out/ directories, I'm happy to provide them to you.) I
think we should run this instance for a while to see how reliable it is.
And once we're confident enough, we'll likely have new hardware for the
new Onionoo, so that we can move it there.

This sounds like a very good idea. OK, I can try to do this. Sorry for delaying my response as well; I'll follow up with what I need (if anything).

>> - make your code produce statuses documents and store them on disk,
>> similar to details/weights/bandwidth documents;
>
> Right, so if we are planning to support all V3 network statuses for all
> fingerprints, how are we to store all the status documents? The idea is to
> preprocess and serve static JSON documents, correct (as in the current
> Onionoo)? (cf. the idea of simply caching documents: if we serve a
> particular status document, it gets cached, and depending on the query
> parameters (date range restriction, e.g.) it may be set not to expire at
> all.)
>
> Or should we try and actually store all the statuses (the condensed status
> document version [2], of course)?

Let's do it as the current Onionoo does it. This code does not exist,
right?

I've done some small-scale testing on a local system, and the Onionoo way seems plausible: the older status documents only need to be generated once, so the number of resulting status documents and their size is not such a big deal after all. I don't have good code for it yet.
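
To make the "generate once, store on disk" idea concrete, here is a rough sketch of what such a one-off generation step might look like. It is only an illustration under assumptions: the function name, the field names ("fingerprint", "valid_after", "running") and the one-file-per-fingerprint layout are all hypothetical and are not existing torsearch or Onionoo code.

```python
import json
from collections import defaultdict
from pathlib import Path


def write_status_documents(entries, out_dir):
    """Group status entries by fingerprint and write one JSON document
    per fingerprint into out_dir. Returns the list of paths written."""
    by_fingerprint = defaultdict(list)
    for entry in entries:
        # Keep everything except the fingerprint, which becomes the key.
        status = {k: v for k, v in entry.items() if k != "fingerprint"}
        by_fingerprint[entry["fingerprint"]].append(status)

    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for fingerprint, statuses in by_fingerprint.items():
        # Hypothetical layout: entries ordered by consensus valid-after time.
        statuses.sort(key=lambda s: s["valid_after"])
        document = {"fingerprint": fingerprint, "entries": statuses}
        target = out_path / fingerprint
        target.write_text(json.dumps(document, sort_keys=True))
        written.append(target)
    return written
```

Older documents would only need to be written once; a later importer run would then rewrite just the documents of fingerprints that appear in newly imported statuses.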

>> - let the ResourceServlet use your database to return the
>> fingerprints to return documents for; and
>> - extend the ResourceServlet to support the new statuses documents.
>
> Sounds good. I assume you are very busy with other things as well, so
> ideally maybe you had in mind that I could try and do the Java part? :)
> Though, since you are much more familiar with (your own) code, you could
> probably do it faster than me. Not sure.
> Any particular technical issues/nuances here (re: ResourceServlet)?

Can you give it a try? Happy to help with specific questions about
ResourceServlet, and I'll try hard to reply faster this time. Again,
sorry for the delay!

Okay! I've been tinkering a bit, actually. Will see if I can produce something decent and reliable.

Best wishes

Kostas.

>
> [1]: https://github.com/wfn/torsearch/blob/master/db/db_create.sql
> [2]:
> https://github.com/wfn/torsearch/blob/master/docs/onionoo_api.md#network-status-entry-documents
> (e.g.
> http://ts.mkj.lt:5555/statuses?lookup=9695DFC35FFEB861329B9F1AB04C46397020CE31&condensed=true
> )
>