[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
Re: [tor-bugs] #5232 [BridgeDB]: Import bridges into BridgeDB in a separate thread and database transaction
#5232: Import bridges into BridgeDB in a separate thread and database transaction
-------------------------+-------------------------------------------------
Reporter: karsten | Owner: sysrqb
Type: defect | Status: closed
Priority: major | Milestone:
Component: | Version:
BridgeDB | Keywords: bridgedb-email, bridgedb-db,
Resolution: fixed | bridgedb-https, bridgedb-0.1.7
Actual Points: | Parent ID:
Points: |
-------------------------+-------------------------------------------------
Changes (by isis):
* status: needs_review => closed
* keywords: bridgedb-email, bridgedb-db, bridgedb-https, bridgedb-0.1.x =>
bridgedb-email, bridgedb-db, bridgedb-https, bridgedb-0.1.7
* resolution: => fixed
Comment:
Replying to [comment:17 sysrqb]:
> Timings from another run:
>
> 1,000 bridges:
> {{{
> * Starting the servers took: 6s
> * Restarting (SIGHUP) took: 7s
> }}}
>
> 10,000 bridges:
> {{{
> * Starting the servers took: 1m 17s
> * Restarting (SIGHUP) took: 54s
> }}}
>
> Master, 10,000
> {{{
> * Starting the server took: 56m 44s
> }}}
>
> These numbers seem...inconsistent(?).
>
They do... that `56m 44s` is from a second run? I didn't timeit, but I got
something like ~15s startup on your
`bug5232_adding_concurrent_processing_squashed` branch just now, for 250
descriptors.
> Pushed
[https://gitweb.torproject.org/user/sysrqb/bridgedb.git/shortlog/refs/heads/bug5232_adding_concurrent_processing_squashed
bug5232_adding_concurrent_processing_squashed]. Please review
>
> https://travis-ci.org/sysrqbci/bridgedb/builds/21812112
Here goes!
1) For `43c6f8778b244ca4b19628677a293a83ae1d1e09`
`Reparse descriptors in background thread on signal`:
In `Main.updateBridgeHistory()`:
{{{
+ for ID in bridges:
+ bridge = bridges[ID]
}}}
I'm pretty sure we had this code like this previously, but I don't
know why we're not using:
{{{
+ for bridge in bridges.values():
}}}
instead. But this is shouldn't make any functional difference, it was
probably done like this way back when, probably to solve some weird
Python2.5 bug. Not really a problem, just nitpickings and archaeological
ponderings.
In the same function:
{{{
+ with bridgedb.Storage.getDB() as db:
[...]
+ for timestamp in ts:
+ logging.debug(
+ "Updating BridgeHistory timestamps
for %s: %s"
+ % (bridge.fingerprint, timestamp))
+ reactor.callFromThread(
+
bridgedb.Stability.addOrUpdateBridgeHistory,
+ bridge, timestamp)
+ logging.debug("Stability calculations complete")
+
+ reactor.callInThread(updateBridgeHistory, bridges, timestamps)
}}}
Calling `reactor.callFromThread` will run the call inside the main
reactor loop... unless the `with bridgedb.Storage.getDB() as db` is doing
some context magic to put it back into a separate thread again. Unless I'm
misunderstanding something, which I probably am. Or, does it call it in
the parent thread, i.e. the `reactor.callInThread(_reloadFn)` thread?
Because the `t.i.interfaces.IReactorThreads` docs make it seem like
`reactor.callFromThread()` will ''always'' call the main reactor thread.
Wow. This distributor swapping. Is just wow. OMGWTFBBQ, Python.
2) For `e2275120ea8d95e0b2c9f25f1f42a293957363dc`
`Add unit tests for bridgedb.Storage`:
I like this function:
{{{
+ def _runAndDie(self, timeout, func):
+ with func():
+ sleep(timeout)
}}}
Just thought I'd say that somewhere. :)
Thanks for adding unittests!
3) The context wrapping for the `with bridgedb.Storage.getDB() as db`
idiom is some serious hacking. I'm impressed. It works. :D
4) In `b67862d317a282d81e2f53726e4dbff8bb8cb1f1`
`More unit tests for Email Server`:
Why `from StringIO import StringIO`? Isn't that the old, deprecated
one?
5) In `91dc3a6e079b318a3af1bbd4bb63cf249e2bbdb6`
`Rewrite database lock acquisition function`:
A typo got accidentally added:
{{{
- Always return a :class:`bridgedb.Storage.Database` that is
+ lways return a :class:`bridgedb.Storage.Database` that is
}}}
6) In `3c9e04ac9006ff21f30aeb033f6e2560f09541f6`
`Make sure leekspin created our descriptors`:
Sorry for the `/usr/share/dict` problem in leekspin, it's a good idea
to add these checks.
7) Super! CHANGELOG entries! Awesome! :D
Overall, the context manager wrapping for the semaphores on db accesses is
a bit of madness, but doing threading in Twisted Python is a bit of
madness, and I'm amazed this is working. Threading in Twisted is seriously
no small feat. You did a really brilliant job on this!
Unless someone has something more to add, I'm merging this now for 0.1.7.
:D
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/5232#comment:18>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs