
Re: [tor-dev] Python ExoneraTor

To: Damian Johnson <atagar@xxxxxxxxxxxxxx>
Subject: Re: [tor-dev] Python ExoneraTor
From: Karsten Loesing <karsten@xxxxxxxxxxxxxx>
Date: Mon, 09 Jun 2014 09:22:01 +0200
Cc: "tor-dev@xxxxxxxxxxxxxxxxxxxx" <tor-dev@xxxxxxxxxxxxxxxxxxxx>

On 09/06/14 01:26, Damian Johnson wrote:
> Oh, and another quick thought - you once mentioned that a descriptor
> search service would make ExoneraTor obsolete, and in looking it over
> I agree. The search functionality ExoneraTor provides is trivial. The
> only reason it requires such a huge database is because it's storing a
> copy of every descriptor ever made.
> 
> I suspect the actual right solution isn't to rewrite ExoneraTor at
> all, but rather develop a new service that can be queried for this
> descriptor data. That would make for a *much* more worthwhile project.
> 
> ExoneraTor? Nice to have. Descriptor archive service? Damn useful. :)

I agree; that was the idea behind Kostas' GSoC project last year.  And
I still think it's a good idea.  It's just not trivial to get right.

Regarding your comment about storing a copy of every descriptor ever
made, I believe that users trust ExoneraTor's results more if they see
the actual descriptors that led to those results.  Of course, I'm saying
that without knowing what ExoneraTor users actually want.  But let's not
drop descriptor copies from the database too easily.

And, heh, when you say that the search functionality ExoneraTor provides
is trivial, a little part of me is dying.  It's the part that spent a
few weeks on getting the search functionality fast enough for
production.  That was not at all trivial.  The oraddress24, oraddress48,
and exitaddress24 fields as well as the indexes are the result of me
running lots and lots of sample queries and wondering about Postgres'
EXPLAIN ANALYZE results.  I'm just saying that it won't be trivial to
generalize the search functionality to fields other than IP addresses
and dates.

If others want to follow, here's the SQL code I'm talking about:

https://gitweb.torproject.org/exonerator.git/blob/HEAD:/db/exonerator.sql
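For readers who haven't opened the SQL yet, here's a minimal Python sketch of the general idea behind prefix columns like oraddress24 and oraddress48: derive a short, fixed-width key from the first 24 bits of an IPv4 address (or the first 48 bits of an IPv6 address) and store it next to each entry, so that an index on that key can answer prefix lookups with a plain equality match.  The function name and the lowercase hex encoding are assumptions for illustration; exonerator.sql defines the real column formats and how they're used.

```python
import ipaddress


def address_prefix_key(address: str) -> str:
    """Return a hypothetical prefix key for an IP address.

    IPv4: first 24 bits (3 bytes) as 6 lowercase hex characters.
    IPv6: first 48 bits (6 bytes) as 12 lowercase hex characters.
    The hex encoding is an assumption, not necessarily ExoneraTor's format.
    """
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        # 24 bits == 3 bytes of the packed (network byte order) address
        return ip.packed[:3].hex()
    # 48 bits == 6 bytes of the packed IPv6 address
    return ip.packed[:6].hex()
```

Addresses in the same /24, such as 198.51.100.7 and 198.51.100.200, map to the same key (c63364), so a B-tree index on that column can find all entries in that /24 without scanning every row.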

So, I'm happy to talk about writing a searchable descriptor archive.  It
could _start_ with ExoneraTor's functionality (minus the target address
and port thing discussed in that other email), and then we could
consider adding more searches.

Pretty sure that Kostas is reading this (in fact, I just cc'ed him), so
let me make one remark about optimizing Postgres defaults: I've written
quite a few database queries in the past, and some of them perform
horribly (relay search) whereas others perform really well (ExoneraTor).
I believe that most of the performance gains come from designing good
tables, indexes, and queries.  Only as a last resort should we consider
changing the Postgres defaults.
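To make the point about indexes concrete, here's a small sketch that uses SQLite's EXPLAIN QUERY PLAN as a stand-in for Postgres' EXPLAIN ANALYZE, since Python's standard library ships sqlite3 but no Postgres driver.  The statusentry table below is a hypothetical, simplified schema, not the one in exonerator.sql; the sketch only shows the planner switching from a full table scan to an index search once a composite index matching the query's WHERE clause exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' strings from SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column in both old and new SQLite versions.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; column names are illustrative only.
conn.execute("""CREATE TABLE statusentry (
    validafter TEXT NOT NULL,
    fingerprint TEXT NOT NULL,
    oraddress24 TEXT NOT NULL,
    rawstatusentry BLOB)""")

LOOKUP = ("SELECT fingerprint, rawstatusentry FROM statusentry "
          "WHERE oraddress24 = ? AND validafter BETWEEN ? AND ?")
params = ("c63364", "2014-06-01 00:00:00", "2014-06-02 00:00:00")

# Without a suitable index the planner has to scan the whole table.
before = query_plan(conn, LOOKUP, params)

# Equality column first, range column second, matching the WHERE clause.
conn.execute("CREATE INDEX statusentry_oraddress24_validafter "
             "ON statusentry (oraddress24, validafter)")
after = query_plan(conn, LOOKUP, params)

print(before)
print(after)
```

Note that EXPLAIN QUERY PLAN only reports the chosen plan, whereas Postgres' EXPLAIN ANALYZE also executes the query and reports actual row counts and timings, which is what you want when tuning against production-sized data.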

You realize that a searchable descriptor archive focuses much more on
database optimization than an ExoneraTor rewrite from Java to Python
(which would leave the database untouched)?

All the best,
Karsten

_______________________________________________
tor-dev mailing list
tor-dev@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
