[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
Re: [tor-bugs] #5232 [BridgeDB]: Import bridges into BridgeDB in a separate thread and database transaction
#5232: Import bridges into BridgeDB in a separate thread and database transaction
-------------------------+-------------------------------------------------
Reporter: karsten | Owner: sysrqb
Type: defect | Status: closed
Priority: major | Milestone:
Component: | Version:
BridgeDB | Keywords: bridgedb-email, bridgedb-db,
Resolution: fixed | bridgedb-https, bridgedb-0.1.7
Actual Points: | Parent ID:
Points: |
-------------------------+-------------------------------------------------
Comment (by sysrqb):
Replying to [comment:18 isis]:
> Replying to [comment:17 sysrqb]:
> > Pushed
[https://gitweb.torproject.org/user/sysrqb/bridgedb.git/shortlog/refs/heads/bug5232_adding_concurrent_processing_squashed
bug5232_adding_concurrent_processing_squashed]. Please review
> >
> > https://travis-ci.org/sysrqbci/bridgedb/builds/21812112
>
> Here goes!
I decided to rewrite history, again because they were relatively minor
changes.
>
> 1) For `43c6f8778b244ca4b19628677a293a83ae1d1e09`
> `Reparse descriptors in background thread on signal`:
> In `Main.updateBridgeHistory()`:
> {{{
> + for ID in bridges:
> + bridge = bridges[ID]
> }}}
> I'm pretty sure we had this code like this previously, but I don't
know why we're not using:
> {{{
> + for bridge in bridges.values():
> }}}
> instead. But this is shouldn't make any functional difference, it
was probably done like this way back when, probably to solve some weird
Python2.5 bug. Not really a problem, just nitpickings and archaeological
ponderings.
Fixed.
>
> In the same function:
> {{{
> + with bridgedb.Storage.getDB() as db:
> [...]
> + for timestamp in ts:
> + logging.debug(
> + "Updating BridgeHistory timestamps
for %s: %s"
> + % (bridge.fingerprint, timestamp))
> + reactor.callFromThread(
> +
bridgedb.Stability.addOrUpdateBridgeHistory,
> + bridge, timestamp)
> + logging.debug("Stability calculations complete")
> +
> + reactor.callInThread(updateBridgeHistory, bridges, timestamps)
> }}}
> Calling `reactor.callFromThread` will run the call inside the main
reactor loop... unless the `with bridgedb.Storage.getDB() as db` is doing
some context magic to put it back into a separate thread again. Unless I'm
misunderstanding something, which I probably am. Or, does it call it in
the parent thread, i.e. the `reactor.callInThread(_reloadFn)` thread?
Because the `t.i.interfaces.IReactorThreads` docs make it seem like
`reactor.callFromThread()` will ''always'' call the main reactor thread.
Nope, you're completey correct. I *think* that was a typo and I wanted to
use callInThread(), but I don't actually know why I wanted to do that
either, it's already in a different thread than reactor loop. I can't
justify spawning a couple thousand threads for this so I deleted that
invocation and now it directly calls addOrUpdateBridgeHistory().
>
> Wow. This distributor swapping. Is just wow. OMGWTFBBQ, Python.
>
> 3) The context wrapping for the `with bridgedb.Storage.getDB() as db`
idiom is some serious hacking. I'm impressed. It works. :D
>
Yeah. I know, me too!
> 4) In `b67862d317a282d81e2f53726e4dbff8bb8cb1f1`
> `More unit tests for Email Server`:
> Why `from StringIO import StringIO`? Isn't that the old,
deprecated one?
>
100% correct. That was my mistake. But then fixing that caused some
breakage because we used StringIO.StringIO in EmailServer.py. When I
switched that to io.StringIO it broke because io.StringIO only accepts
unicode and we used MimeWriter.MimeWriter which used string.
{{{
Traceback (most recent call last):
File "/home/sysrqb/.virtualenvs/virt-bridgedb/lib/python2.7/site-
packages/bridgedb-0.1.5_78_gca67179_dirty-
py2.7.egg/bridgedb/test/test_EmailServer.py", line 215, in
test_getMailResponseMailContent
ret = EmailServer.getMailResponse(lines, self.ctx)
File "/home/sysrqb/.virtualenvs/virt-bridgedb/lib/python2.7/site-
packages/bridgedb-0.1.5_78_gca67179_dirty-
py2.7.egg/bridgedb/EmailServer.py", line 213, in getMailResponse
gpgContext=ctx.gpgContext)
File "/home/sysrqb/.virtualenvs/virt-bridgedb/lib/python2.7/site-
packages/bridgedb-0.1.5_78_gca67179_dirty-
py2.7.egg/bridgedb/EmailServer.py", line 443, in composeEmail
mailbody = w.startbody("text/plain")
File "/usr/lib64/python2.7/MimeWriter.py", line 141, in startbody
self.flushheaders()
File "/usr/lib64/python2.7/MimeWriter.py", line 125, in flushheaders
self._fp.writelines(self._headers)
exceptions.TypeError: unicode argument expected, got 'str'
}}}
As a result, #11370. Which should be resolved now.
> 5) In `91dc3a6e079b318a3af1bbd4bb63cf249e2bbdb6`
> `Rewrite database lock acquisition function`:
> A typo got accidentally added:
> {{{
> - Always return a :class:`bridgedb.Storage.Database` that is
> + lways return a :class:`bridgedb.Storage.Database` that is
> }}}
>
Nice catch
> 6) In `3c9e04ac9006ff21f30aeb033f6e2560f09541f6`
> `Make sure leekspin created our descriptors`:
> Sorry for the `/usr/share/dict` problem in leekspin, it's a good
idea to add these checks.
>
I debated deleting this commit but I decided to leave it because it caused
no harm to check for the files.
> 7) Super! CHANGELOG entries! Awesome! :D
>
And now there's another one!
> Overall, the context manager wrapping for the semaphores on db accesses
is a bit of madness, but doing threading in Twisted Python is a bit of
madness, and I'm amazed this is working. Threading in Twisted is seriously
no small feat. You did a really brilliant job on this!
>
This comment made it all worth it. Thanks! :)
> Unless someone has something more to add, I'm merging this now for
0.1.7. :D
yay! Ok, so after the changes,
5092364f7e3ce42b61b0336a95519639d0e59308 needs review for #11370
bd05227c5eefc528b72d57d53ac3f139fc2f0d5e (was 91dc3a6e) contains the re-
added letter (5)
0556e7daa8504756b34384a0f9b5dfa753556c09 (was b67862d3) now uses
io.StringIO (4)
bd652aa560cfaf15b9b37979aba73e9f8f548b4e (was 43c6f877) fixes
callFromThread() and iterator (1)
And I added 8dfcd5d7f77950828d3c95f93417baf7ea70c6b7 for a doc and to fix
long lines.
Pushed
[https://gitweb.torproject.org/user/sysrqb/bridgedb.git/shortlog/refs/heads/bug5232_adding_concurrent_processing_squashed_r1
bug5232_adding_concurrent_processing_squashed_r1] for rereview!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/5232#comment:19>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs