
Re: [tor-dev] [GSoC 2013] Status report - Searchable metrics archive

To: Kostas Jakeliunas <kostas@xxxxxxxxxxxxxx>
Subject: Re: [tor-dev] [GSoC 2013] Status report - Searchable metrics archive
From: Karsten Loesing <karsten@xxxxxxxxxxxxxx>
Date: Mon, 19 Aug 2013 14:49:50 +0200
Cc: tor-dev@xxxxxxxxxxxxxxxxxxxx

Hi Kostas,

On 8/15/13 9:50 PM, Kostas Jakeliunas wrote:
> On Wed, Aug 14, 2013 at 1:33 PM, Karsten Loesing <karsten@xxxxxxxxxxxxxx> wrote:
> 
>>
>> Looks like pg_trgm is contained in postgresql-contrib-9.1, so it's more
>> likely that we can run something requiring this extension on a
>> torproject.org machine.  Still, requiring extensions should be the last
>> resort if no other solution can be found.  Leaving out searches for
>> nickname substrings is a valid solution for now.
> 
> 
> Got it.
> 
>>>> Do you have a list of searches you're planning to support?
>>>
>>>
>>> These are the ones that should *really* be supported:
>>>
>>>    - ?search=nickname
>>>    - ?search=fingerprint
>>>    - ?lookup=fingerprint
>>>    - ?search=address [done some limited testing, currently not focusing
>>>    on this]
>>
>> The lookup parameter is basically the same as search=fingerprint with
>> the additional requirement that fingerprint must be 40 characters long.
>>  So, this is the current search parameter.
>>
>> I agree, these would be good to support.
>>
>> You might also add another parameter ?address=address for ExoneraTor.
>> That should, in theory, be just a subset of the search parameter.
>>
> 
> Oh yes, makes a lot of sense, OK.
> 
> By the way: I considered having the last consensus (all the data for at
> least the /summary document, or /details as well) be stored in memory (this
> is possible) (probably as a hashtable where key = fingerprint, value = all
> the fields we'd need to return) so that when the backend is queried without
> any search criteria, it would be possible to avoid hitting the database
> (which is always nice), and just dump the last consensus. (There's also
> caching of course, which we could discuss at a (probably quite a bit) later
> point.)

Okay.
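
A minimal sketch of the in-memory idea quoted above, assuming the last
consensus fits in a plain dictionary keyed by fingerprint. The class,
function, and field names are hypothetical and only illustrate the
control flow: a request without search criteria is answered from memory,
anything else goes to the database.

```python
from typing import Callable, Dict, Iterable, List, Optional

Entry = Dict[str, object]  # hypothetical: the fields a summary document needs


class LastConsensusCache:
    """Holds the entries of the most recent consensus, keyed by fingerprint."""

    def __init__(self) -> None:
        self._by_fingerprint: Dict[str, Entry] = {}

    def replace(self, entries: Iterable[Entry]) -> None:
        # Swap in a whole new table whenever a newer consensus is imported.
        self._by_fingerprint = {str(e["fingerprint"]): e for e in entries}

    def contains(self, fingerprint: str) -> bool:
        return fingerprint in self._by_fingerprint

    def all_entries(self) -> List[Entry]:
        return list(self._by_fingerprint.values())


def handle_summary(
    cache: LastConsensusCache,
    search: Optional[str],
    query_db: Callable[[str], List[Entry]],
) -> List[Entry]:
    """Serve the cached consensus when no search term is given."""
    if search is None:
        return cache.all_entries()  # no database access needed
    return query_db(search)  # assumed database-backed search function
```

Passing the database query in as a callable keeps the sketch independent
of any particular schema or driver.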

>>>    - ?running=<boolean>
>>
>> This one is tricky.  So far, Onionoo looks only at the very latest
>> consensus or bridge status to decide if a relay or bridge is running or
>> not.
>>
>> But now you're adding archives to Onionoo, so that people can search for
>> a certain consensus or certain bridge status in the past, or they can
>> search for a time interval of consensuses or bridge statuses.  How do
>> you define that a relay or bridge is running, or more importantly
>> included as not running?
>>
> 
> Agree, this is not clear. (And whatever ends up being done, this should be
> well documented and clearly articulated (of course.))
> 
> For me at least, 'running' implies the clause whether a given relay/bridge
> is running *right now*, i.e. whether it is present in the very last
> consensus. (Here's where that hashtable (with fingerprints as keys) in
> memory might be able to help: no need to run a separate query / do an inner
> join / whatnot; it would depend on whether there's a LIMIT involved though,
> etc.)
> 
> I'm not sure which one is more useful (intuitively for me, the "whether it
> is running *right now*" is more useful.) Do you mean that it might make
> sense to have a field (or have "running" be it) indicating whether a given
> relay/bridge was present in the last consensus in the specified date range?
> If this is what you meant, then the "return all that are/were not running"
> clause would indeed be kind of..peculiar (semantically - it wouldn't be
> very obvious what it's doing.)
> 
> Maybe it'd be simpler to first answer, what would be the most useful case?
> 
>> How do you define that a relay or bridge [should be] included as not
>> running?
> 
> Could you rephrase maybe? Do you mean that it might be difficult to
> construct sane queries to check for this condition? Or that the situation
> where
> 
>    - a "from..to" date range is specified
>    - ?running=false is specified
> 
> would be rather confusing ('exclude those nodes which are running *right
> now*', with 'now' possibly having nothing to do with the date range)?

I was referring to the situation you describe.  But yes, I agree that
your definition of whether a relay or bridge is running *right now* can
work here.  So, never mind my question/concern, this looks fine!

>>>    - ?flag=flag [every kind of clause which further narrows down the
>>>    query is not bad; the current db model supports all the flags that
>>>    Stem does, and each flag has its own column]
>>
>> I'd say leave this one out until there's an actual use case.
>>
> 
> Ok, I won't focus on these now; just wanted to say that these should be
> possible without much ado/problems.

Okay.

>>>    - ?first_seen_days=range
>>>    - ?last_seen_days=range
>>>
>>> As per the plan, the db should be able to return a list of status
>>> entries / validafter ranges (which can be used in
>>> {first,last}_seen_days) given some fingerprint.
>>
>> Oh, I think there's a misunderstanding of these two fields.  These
>> fields are only there to search for relays or bridges that have first
>> appeared or were last seen on a given day.
>>
>> You'll need two new parameters, say, from=datetime and to=datetime (or
>> start=datetime and end=datetime) to define a valid-after range for your
>> search.
>>
> 
> Ah! I wasn't paying attention here. :) Ok, all good.

Okay.
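
As an illustration of the proposed from/to parameters, the following
sketch parses an optional valid-after range and filters status entries
by it. The datetime format, the "valid_after" field, and the function
names are assumptions made for the example, not part of any agreed API.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed parameter format; the thread does not specify one.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_valid_after_range(
    params: Dict[str, str],
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Read the optional 'from' and 'to' parameters as datetimes."""
    start = (datetime.strptime(params["from"], DATETIME_FORMAT)
             if "from" in params else None)
    end = (datetime.strptime(params["to"], DATETIME_FORMAT)
           if "to" in params else None)
    if start is not None and end is not None and start > end:
        raise ValueError("'from' must not be later than 'to'")
    return start, end


def entries_in_range(
    entries: Iterable[dict],
    start: Optional[datetime],
    end: Optional[datetime],
) -> List[dict]:
    """Keep entries whose valid-after time lies within [start, end]."""
    return [
        e for e in entries
        if (start is None or e["valid_after"] >= start)
        and (end is None or e["valid_after"] <= end)
    ]
```

This keeps the range separate from first_seen_days and last_seen_days,
which only select relays or bridges by the day they first appeared or
were last seen.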

I wonder, is there a document describing the new API somewhere?  If not,
do you mind creating one?

All the best,
Karsten
