Re: [tor-bugs] #17939 [Onionoo]: Optimise the construction of details documents with field constraints
#17939: Optimise the construction of details documents with field constraints
-------------------------+---------------------
Reporter: fmap | Owner:
Type: enhancement | Status: new
Priority: Low | Milestone:
Component: Onionoo | Version:
Severity: Minor | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Sponsor: |
-------------------------+---------------------
Comment (by fmap):
> I'm in favor of taking Gson out of the loop for two reasons: it's a
> potential performance bottleneck (though I never measured that), and
> it's a maintenance nightmare because it's just too easy to miss a new
> details document field in that hacked part of the code.
Regarding a performance bottleneck: an eyeball of the
[http://hack.rs/~vi/onionoo/flame-56e0e01.svg flame graph we've been
discussing on the list] suggests 'formatNodeStatus' spends on average ten
times more time producing 'details' documents than any other document
type. That looks like about five percent of total CPU time over the
sample, but there are a few too many divorced frames to be sure (and I've
lost the raw data somewhere). I'll make another recording later and
report back with more precise figures.
> Regarding the approach, I'd favor one that doesn't require keeping
> anything new in memory but instead processes details document contents
> on the fly. We'll have to read a details document from disk if we want
> to include part of it in a response anyway, and once it's in memory
> it's cheap to create an index of where fields start and end and only
> pick the ones we want.
That sounds reasonable.
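To make sure we're talking about the same thing, here's roughly the
shape I have in mind. The names ('FieldIndex', 'index', 'sliceFields')
are mine, and it only handles the flat top-level objects details
documents are today, so treat it as a sketch rather than a patch:

  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;

  public class FieldIndex {

    /* Scan once, recording for every top-level field the offsets of
     * its "name":value span. Tracks string state and nesting depth so
     * that structural characters inside strings or nested values don't
     * confuse the boundaries. */
    public static Map<String, int[]> index(String json) {
      Map<String, int[]> spans = new LinkedHashMap<>();
      int depth = 0, nameStart = -1;
      boolean inString = false;
      String name = null;
      for (int i = 0; i < json.length(); i++) {
        char c = json.charAt(i);
        if (inString) {
          if (c == '\\') { i++; continue; }   // skip escaped char
          if (c == '"') {
            inString = false;
            if (depth == 1 && name == null) {
              name = json.substring(nameStart + 1, i);
            }
          }
          continue;
        }
        switch (c) {
          case '"':
            inString = true;
            if (depth == 1 && name == null) nameStart = i;
            break;
          case '{': case '[':
            depth++;
            break;
          case '}': case ']':
            depth--;
            if (depth == 0 && name != null) {
              spans.put(name, new int[] { nameStart, i });
              name = null;
            }
            break;
          case ',':
            if (depth == 1 && name != null) {
              spans.put(name, new int[] { nameStart, i });
              name = null;
            }
            break;
        }
      }
      return spans;
    }

    /* Splice the requested fields back into a response document. */
    public static String sliceFields(String json, List<String> fields) {
      Map<String, int[]> spans = index(json);
      StringBuilder sb = new StringBuilder("{");
      for (String f : fields) {
        int[] s = spans.get(f);
        if (s == null) continue;              // unknown field: drop
        if (sb.length() > 1) sb.append(',');
        sb.append(json, s[0], s[1]);
      }
      return sb.append('}').toString();
    }
  }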
> It could just be that Gson adds some overhead that we could avoid
> here. And of course the current approach has the downside of being
> hard to maintain, which we could fix. Maybe we can try out different
> approaches and compare them with respect to performance and
> robustness?
Do you mean avoiding Gson in producing a boundary index? I think there's
more to it than the performance overhead of a redundant parse. In
populating its result, the parser I referenced is sensitive to structure
that JSON parsers typically aren't: the length of what the JSON spec
calls 'structural characters' (`/[:,\[\]{}]/`), as well as that of the
(variable length) whitespace allowed to surround them. I don't see
anything in the Gson user guide that would admit intelligent
interpretation of those tokens, and they're critical (in the general case
at least) to the precise determination of boundaries. That said, given
that written documents don't presently include whitespace around
structural tokens, it should be possible (assuming Gson retains the
initial field ordering) to derive the right coordinates from a
serialisation into a JSON ADT. But that approach strikes me as frail and
indirect.
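For concreteness, the frail version would look something like the below.
It assumes compact documents, unescaped field names, and that Gson's
JsonObject (backed by an insertion-ordered map) reports fields in their
original order; 'DerivedOffsets' is a made-up name:

  import com.google.gson.Gson;
  import com.google.gson.JsonElement;
  import com.google.gson.JsonObject;
  import com.google.gson.JsonParser;
  import java.util.LinkedHashMap;
  import java.util.Map;

  public class DerivedOffsets {

    /* Derive each top-level field's offsets from the lengths of the
     * re-serialised pieces. This only lines up with the document on
     * disk if it was written with no whitespace around structural
     * characters, with no escaping in field names, and if Gson
     * reproduces every value byte-for-byte -- hence "frail". */
    public static Map<String, int[]> offsets(String json) {
      Gson gson = new Gson();
      JsonObject obj = new JsonParser().parse(json).getAsJsonObject();
      Map<String, int[]> spans = new LinkedHashMap<>();
      int pos = 1;                            // skip the opening '{'
      boolean first = true;
      for (Map.Entry<String, JsonElement> e : obj.entrySet()) {
        if (!first) pos += 1;                 // the ',' separator
        first = false;
        // "name":value is name + two quotes + colon + value.
        int len = e.getKey().length() + 3
            + gson.toJson(e.getValue()).length();
        spans.put(e.getKey(), new int[] { pos, pos + len });
        pos += len;
      }
      return spans;
    }
  }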
Though I worry I might've misread your message. Do you have other
approaches in mind to produce a boundary index? Or perhaps you meant only
to benchmark the proposed implementation against the existing one?
> Bonus points: we could use this new approach to allow the `fields`
> parameter for documents other than details documents.
Sounds good.
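For reference, the parameter as it exists today on details documents:

  https://onionoo.torproject.org/details?fields=nickname,fingerprint

and, hypothetically, what the generalisation might look like on another
document type (bandwidth is just an example here):

  https://onionoo.torproject.org/bandwidth?fields=fingerprint,write_history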
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/17939#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online