Re: [tor-dev] Onionoo protocol/implementation nuances / Onionoo-connected metrics project stuff

A couple of thoughts to add from my experience working with DB's:

-Consider a partitioning strategy for the data if it makes sense. Queries which will hit only a single partition will be much faster in general.

-Check out advanced indexing strategies like Solr. this can sometimes speed up certain queries by orders of magnitude, and works well with batch systems. Often it is easier to setup & maintain than advanced DB tuning.

-I second Karsten's comment on multi-column indexes. Designing the right indexes & columns to include isn't simple though - you have to know what is going to be searched.

-If you know in advance what kind's of where clauses you will see, consider implementing partial indexes.

If you have specific problem queries w/ explain plans, feel free to post them and I would be glad to spend a few minutes looking them over.

Charlie

On Tue, Jul 30, 2013 at 3:35 AM, Karsten Loesing <karsten@xxxxxxxxxxxxxx> wrote:

On 7/29/13 10:15 PM, Kostas Jakeliunas wrote:
> It should also be possible to do efficient *estimated* COUNTs (using
> reltuples [1, 2], provided the DB can be regularly VACUUMed + ANALYZEd
> (postgres-specific awesomeness)) - i.e. if everything is set up right,
> doing COUNTs would be efficient. This would be nice not only because one
> could run very quick queries asking e.g. "how many consensuses include
> nickname LIKE %moo% between [daterange1, daterange2]?" (if e.g. full text
> search is set up) but also, if we have to resort to sometimes returning an
> arbitrary subset of results (or sorted however we wish, but the sorting
> being done already on a small subset of results, if that makes sense), we'd
> be able to also supply info how many other results matching these
> particular criteria there are, and so on. The usefulness of all this really
> depends on intended use cases, and I suppose here some discussion could be
> had who / how would an Onionoo system covering all / most of all the
> descriptor+consensus archives and hopefully having an extended set of
> filter / result options be used?

I can see how estimated counts could be valuable information. Or not.
Do you want to first specify what type of queries you're planning to
support?

(I didn't spot anything in the other two mails that requires a reply.
If there are still open questions, please let me know.)

Best,
Karsten

>
> [1]: http://www.varlena.com/GeneralBits/120.php
> [2]: http://wiki.postgresql.org/wiki/Slow_Counting
>

_______________________________________________
tor-dev mailing list
tor-dev@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev