
Re: [tor-dev] Metrics Plans

To: Kostas Jakeliunas <kostas@xxxxxxxxxxxxxx>
Subject: Re: [tor-dev] Metrics Plans
From: Damian Johnson <atagar@xxxxxxxxxxxxxx>
Date: Wed, 29 May 2013 07:49:24 -0700
Cc: tor-dev@xxxxxxxxxxxxxxxxxxxx

> Here, I think it is realistic to try and use and import all the fields available from metrics-db-*.
> My PoC is overly simplistic in this regard: only relay descriptors, and only a limited subset of data fields is used in the schema, for the import.

I'm not entirely sure what fields that would include. Two options come
to mind...

* Include just the fields that we need. This would require us to
update the schema and perform another backfill whenever we need
something new. I don't consider this 'frequent backfill' requirement
to be a bad thing though - this would force us to make it extremely
easy to spin up a new instance which is a very nice attribute to have.

* Make the backend a more-or-less complete data store of descriptor
data. This would mean schema updates whenever there's a dir-spec
addition [1]. An advantage of this is that the ORM could provide us
with stem Descriptor instances [2]. For high-traffic applications,
though, we'd probably still want to query the backend directly, since
we usually won't care about most descriptor attributes.
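
To illustrate the last point, here is a rough sketch of the difference
between pulling a full record and querying the backend directly for one
attribute. The table layout, column names, and sample rows are invented
for illustration (they are not the PoC's schema), and sqlite3 is only a
stand-in for whatever database the backend ends up using.

```python
import sqlite3

# Hypothetical, heavily trimmed schema: a real "complete" store would have
# one column per descriptor attribute in dir-spec.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE descriptor ("
    "fingerprint TEXT PRIMARY KEY, nickname TEXT, published TEXT, "
    "platform TEXT, contact TEXT)"
)
conn.executemany(
    "INSERT INTO descriptor VALUES (?, ?, ?, ?, ?)",
    [
        ("AAAA", "alpha", "2013-05-29 12:00:00", "Tor 0.2.3 on Linux", None),
        ("BBBB", "beta", "2013-05-29 13:00:00", "Tor 0.2.4 on FreeBSD", None),
    ],
)


def fetch_all_attributes(conn, fingerprint):
    """Fetch every column for one descriptor, as an ORM-style load would."""
    cursor = conn.execute(
        "SELECT * FROM descriptor WHERE fingerprint = ?", (fingerprint,)
    )
    row = cursor.fetchone()
    if row is None:
        return None
    names = [column[0] for column in cursor.description]
    return dict(zip(names, row))


def fetch_nicknames(conn):
    """Direct query: read only the single column the caller cares about."""
    query = "SELECT nickname FROM descriptor ORDER BY nickname"
    return [row[0] for row in conn.execute(query)]


print(fetch_all_attributes(conn, "AAAA"))
print(fetch_nicknames(conn))
```

The first function is roughly what building a full object per row costs;
the second is the kind of narrow query a busy frontend would more likely
issue.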

> The idea would be to import all data as DB fields (so, indexable), but it makes sense to also import raw text lines to be able to e.g. supply the frontend application with raw data if needed, as the current tools do. But I think this could be made a separate table, with descriptor id as primary key, which means this can be done later on if need be and would not cause a problem. I guess there's no need to do this right now.

I like this idea. A couple of advantages that this could provide us are...

* The importer can provide warnings when our present schema is out of
sync with stem's Descriptor attributes (i.e. there has been a new
dir-spec addition).

* After the schema update is made, the importer could then run over
this raw data table, constructing Descriptor instances from it and
filling in any missing attributes.
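
As a rough sketch of how those two steps could fit together, assuming a
separate raw table keyed by descriptor id: the table names, the
PARSED_ATTRIBUTES list, and parse_raw() are all made up for
illustration. In particular parse_raw() is a toy 'keyword value' parser
standing in for stem, and the sample text is not a real descriptor.

```python
import sqlite3

# Hypothetical list of attributes the parser exposes. In practice this
# would be derived from the Descriptor class in use.
PARSED_ATTRIBUTES = ["fingerprint", "nickname", "published", "platform"]


def missing_columns(conn, table, attributes):
    """Return the parsed attributes that have no matching column in `table`."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, default, pk).
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return [attr for attr in attributes if attr not in columns]


def parse_raw(raw):
    """Toy stand-in for the real parser: 'keyword value' lines to a dict."""
    parsed = {}
    for line in raw.splitlines():
        keyword, _, value = line.partition(" ")
        parsed[keyword] = value
    return parsed


def backfill(conn, column):
    """Add `column` to the schema, then populate it from the raw data table."""
    conn.execute(f"ALTER TABLE descriptor ADD COLUMN {column} TEXT")
    rows = conn.execute("SELECT descriptor_id, raw FROM descriptor_raw").fetchall()
    for descriptor_id, raw in rows:
        value = parse_raw(raw).get(column)
        conn.execute(
            f"UPDATE descriptor SET {column} = ? WHERE id = ?",
            (value, descriptor_id),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE descriptor "
    "(id TEXT PRIMARY KEY, fingerprint TEXT, nickname TEXT, published TEXT)"
)
conn.execute("CREATE TABLE descriptor_raw (descriptor_id TEXT PRIMARY KEY, raw TEXT)")

raw_text = (
    "fingerprint ABCD\n"
    "nickname demo\n"
    "published 2013-05-29 12:00:00\n"
    "platform Tor 0.2.4 on Linux"
)
conn.execute("INSERT INTO descriptor VALUES ('d1', 'ABCD', 'demo', '2013-05-29 12:00:00')")
conn.execute("INSERT INTO descriptor_raw VALUES ('d1', ?)", (raw_text,))

# Step 1: warn about schema drift. Step 2: backfill from the raw table.
drift = missing_columns(conn, "descriptor", PARSED_ATTRIBUTES)
for column in drift:
    print("warning: schema has no column for attribute %r" % column)
    backfill(conn, column)
```

Since the raw table is the source of truth here, the backfill never
needs to re-read the original archives, which is what would make a
schema update cheap.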

Cheers! -Damian

[1] https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt
[2] This might be a no-go. Stem Descriptor instances are constructed
from the raw descriptor content, and need it for str(), get_bytes(),
and signature validation. If we don't care about those we can subclass
Descriptor and override those methods.
