[tor-bugs] #17939 [Onionoo]: Optimise the construction of details documents with field constraints
#17939: Optimise the construction of details documents with field constraints
-----------------------------+-----------------
Reporter: fmap | Owner:
Type: enhancement | Status: new
Priority: Low | Milestone:
Component: Onionoo | Version:
Severity: Minor | Keywords:
Actual Points: | Parent ID:
Points: | Sponsor:
-----------------------------+-----------------
In a [https://lists.torproject.org/pipermail/metrics-team/2015-December/000026.html recent post to metrics-team@],
Karsten pointed toward an expensive operation within the response builder:
> Once per hour, the updater fetches new data and in the end produces
JSON-formatted strings that it writes to disk. The servlet reads a
(comparatively) small index to memory that it uses to handle requests, and
when it builds responses, it tries hard to avoid (de-)serializing JSON.
>
> The only situation where this fails is when [a] request [to the /details
endpoint] contains the fields parameter. Only in that case we'll have to
deserialize, pick the fields we want, and serialize again. I could imagine
that this shows up in profiles pretty badly, and I'd love to fix this, I
just don't know how.
I think we can exploit a few properties of the updater to handle this case
in a more efficient manner.
It seems safe to assume that: (1) the produced response is always the
concatenation of a sequence of substrings of the written document
^[#fn1 1]^; (2) the documents on disk are legal JSON and correctly
typed (having been written by the updater, which we trust and control);
and (3) the contents of the file are trivially parsed (belonging to a
restriction of JSON with known and non-redundant keys, the grammar is at
most context-free).
I believe these conditions admit a relatively efficient parser/generator
pair, one that avoids request-time de-serialisation. Given a request, the
result of the parser would be a sequence of pairs of indices marking the
boundaries of each field. The generator would then reproduce the input,
omitting the text regions that correspond to fields excluded by the
request.
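To make the parser/generator split concrete, here is a minimal Python sketch (not the Haskell proof of concept, and not a patch against Onionoo). The function names `index_fields` and `select_fields` are made up for illustration, and the scanner assumes what the assumptions above grant: the input is a single valid, compact JSON object with no whitespace between tokens and no escapes in its top-level keys.

```python
def _skip_string(doc, i):
    """Given doc[i] == '"', return the offset just past the closing quote."""
    i += 1
    while doc[i] != '"':
        i += 2 if doc[i] == '\\' else 1  # step over escaped characters
    return i + 1


def _skip_value(doc, i):
    """Return the offset just past the JSON value that starts at doc[i]."""
    depth = 0
    while True:
        c = doc[i]
        if c == '"':
            i = _skip_string(doc, i)
            if depth == 0:
                return i  # the value itself was a string
            continue
        if c in '[{':
            depth += 1
        elif c in ']}':
            if depth == 0:
                return i  # a scalar ended at the enclosing '}'
            depth -= 1
            if depth == 0:
                return i + 1  # the array or object value just closed
        elif c == ',' and depth == 0:
            return i  # a scalar ended at the field separator
        i += 1


def index_fields(doc):
    """Parser half: list (key, start, end) for each top-level field.

    Offsets are 0-based and end-exclusive, covering the '"key":value' text.
    """
    fields = []
    i = 1  # skip the opening '{'
    while doc[i] != '}':
        start = i
        i = _skip_string(doc, i)
        key = doc[start + 1:i - 1]
        i = _skip_value(doc, i + 1)  # i + 1 skips the ':'
        fields.append((key, start, i))
        if doc[i] == ',':
            i += 1
    return fields


def select_fields(doc, index, wanted):
    """Generator half: rebuild the object from slices of doc, no JSON parsing."""
    parts = [doc[start:end] for key, start, end in index if key in wanted]
    return '{' + ','.join(parts) + '}'


doc = ('{"nickname":"Unnamed","running":false,'
       '"flags":["Valid"],"advertised_bandwidth":49168}')
index = index_fields(doc)
print(select_fields(doc, index, {"running", "flags"}))
```

Since the documents only change when the updater writes them, the index could presumably be computed once per document rather than per request; that placement is a guess on my part, not something the sketch decides.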
No patch yet, but I've hacked together a small (inefficient mess of a..)
proof of concept that hopefully illustrates the basic idea:
http://hack.rs/~vi/onionoo/IndexJSON.hs
sha256: 14a09f26fadab8d989263dc76d368e41e63ba6c5279d37443878d6c1d0c87834
http://www.webcitation.org/6e3NEOLJg
{{{
% jq . 96B16C78BB54BA0F56EEA8721781C9BD01B7E9AE
{
"nickname": "Unnamed",
"hashed_fingerprint": "96B16C78BB54BA0F56EEA8721781C9BD01B7E9AE",
"or_addresses": [
"10.103.224.131:443"
],
"last_seen": "2015-11-23 03:40:44",
"first_seen": "2015-11-20 04:38:22",
"running": false,
"flags": [
"Valid"
],
"last_restarted": "2015-11-22 01:23:06",
"advertised_bandwidth": 49168,
"platform": "Tor 0.2.4.22 on Windows 8"
}
% index-json 96B16C78BB54BA0F56EEA8721781C9BD01B7E9AE
("nickname",(2,21,22))
("hashed_fingerprint",(23,85,86))
("or_addresses",(87,123,124))
("last_seen",(125,157,158))
("first_seen",(159,192,193))
("running",(194,208,209))
("flags",(210,226,227))
("last_restarted",(228,265,266))
("advertised_bandwidth",(267,294,295))
("platform",(296,333,333))
% cut -c1,23-158,194- 96B16C78BB54BA0F56EEA8721781C9BD01B7E9AE | jq .
{
"hashed_fingerprint": "96B16C78BB54BA0F56EEA8721781C9BD01B7E9AE",
"or_addresses": [
"10.103.224.131:443"
],
"last_seen": "2015-11-23 03:40:44",
"running": false,
"flags": [
"Valid"
],
"last_restarted": "2015-11-22 01:23:06",
"advertised_bandwidth": 49168,
"platform": "Tor 0.2.4.22 on Windows 8"
}
}}}
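For anyone wanting to check the arithmetic, the following Python sketch derives the character list handed to `cut` from the index printed by `index-json`. It reads each triple as (first character, last character, last character including a trailing comma), 1-based; that reading is inferred from the output above rather than taken from the proof of concept's source, and `cut_list` is a hypothetical helper name.

```python
# Index triples copied from the index-json output above.
INDEX = [
    ("nickname", (2, 21, 22)),
    ("hashed_fingerprint", (23, 85, 86)),
    ("or_addresses", (87, 123, 124)),
    ("last_seen", (125, 157, 158)),
    ("first_seen", (159, 192, 193)),
    ("running", (194, 208, 209)),
    ("flags", (210, 226, 227)),
    ("last_restarted", (228, 265, 266)),
    ("advertised_bandwidth", (267, 294, 295)),
    ("platform", (296, 333, 333)),
]


def cut_list(index, excluded):
    """Return the `cut -c` list that keeps everything except the excluded fields."""
    ranges = []
    pos = 1  # next character still to be kept; character 1 is the opening '{'
    for key, (start, end, end_with_comma) in index:
        if key not in excluded:
            continue
        if end == end_with_comma:
            # Assumed to mark the last field, which has no trailing comma.
            # Dropping it would also require dropping the preceding comma,
            # which this sketch does not attempt.
            raise NotImplementedError("cannot exclude the last field")
        if pos <= start - 1:
            ranges.append((pos, start - 1))
        pos = end_with_comma + 1
    parts = [str(a) if a == b else f"{a}-{b}" for a, b in ranges]
    parts.append(f"{pos}-")  # open-ended: keep through the closing '}'
    return ",".join(parts)


print(cut_list(INDEX, {"nickname", "first_seen"}))
```

Excluding `nickname` and `first_seen` reproduces the ranges given to `cut` above. The last-field case is the one place where removing a region is not enough on its own, so a real generator would need to handle the dangling comma.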
What do you think?
,,
[=#fn1 ^1^] There's an element of surprise in the treatment of nullable
properties, but it turns out that the existing behaviour works in our
favour. GSON removes 'null'ed fields when writing documents to disk; e.g.
note the absence of an AS number here:
{{{
% pwd
/srv/onionoo.torproject.org/onionoo/out/details
% jq . $(ls | shuf -n1)
{
"nickname": "Unnamed",
"hashed_fingerprint": "CE0A4E1B6C545FF9F25A9CAF5926732559A2C0FE",
"or_addresses": [
"10.190.9.13:443"
],
"last_seen": "2015-12-16 22:41:56",
"first_seen": "2015-11-11 21:01:43",
"running": true,
"flags": [
"Fast",
"Valid"
],
"last_restarted": "2015-12-16 02:13:40",
"advertised_bandwidth": 59392,
"platform": "Tor 0.2.4.23 on Windows 8"
}
}}}
,,
But it *also* excludes them from /details responses, even when specified
by name using the 'fields' parameter:
{{{
% curl -s \
'http://onionoo.local/details?lookup=CE0A4E1B6C545FF9F25A9CAF5926732559A2C0FE&fields=hashed_fingerprint,as_number' \
| jq .bridges[]
{
"hashed_fingerprint": "CE0A4E1B6C545FF9F25A9CAF5926732559A2C0FE"
}
}}}
,,So it doesn't seem necessary to add any text atop the persisted
serialisation, even in this case.
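A small Python sketch of why this works out: if the generator only ever emits slices of the persisted text, a requested field that was never written (such as `as_number` here) simply has no slice and drops out. The `spans` map and the helpers `span_of` and `filter_persisted` are hypothetical stand-ins for whatever index the parser would produce, and the document is abbreviated.

```python
def span_of(doc, member):
    """Locate the exact '"key":value' text in doc (0-based, end-exclusive)."""
    start = doc.index(member)
    return (start, start + len(member))


def filter_persisted(doc, spans, requested):
    """Keep only requested fields that exist on disk, in document order."""
    kept = sorted(spans[key] for key in requested if key in spans)
    return "{" + ",".join(doc[s:e] for s, e in kept) + "}"


# Abbreviated persisted document; note that no "as_number" was ever written.
doc = ('{"nickname":"Unnamed",'
       '"hashed_fingerprint":"CE0A4E1B6C545FF9F25A9CAF5926732559A2C0FE",'
       '"running":true}')
spans = {
    "nickname": span_of(doc, '"nickname":"Unnamed"'),
    "hashed_fingerprint": span_of(
        doc, '"hashed_fingerprint":"CE0A4E1B6C545FF9F25A9CAF5926732559A2C0FE"'),
    "running": span_of(doc, '"running":true'),
}

print(filter_persisted(doc, spans, ["hashed_fingerprint", "as_number"]))
```

The output contains only `hashed_fingerprint`, which matches the shape of the /details response shown above.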
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/17939>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online