[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Re: [tor-bugs] #3834 [Torflow]: Postgre bandwidth scanner has halted on moria



#3834: Postgre bandwidth scanner has halted on moria
-----------------------+----------------------------------------------------
 Reporter:  mikeperry  |          Owner:  aagbsn
     Type:  defect     |         Status:  new   
 Priority:  major      |      Milestone:        
Component:  Torflow    |        Version:        
 Keywords:             |         Parent:        
   Points:             |   Actualpoints:        
-----------------------+----------------------------------------------------

Comment(by aagbsn):

 Some good news:
 I ran into the freezing problem using the SQLite backend, and now we have
 a little bit more information:

 DEBUG[Fri Sep 02 11:45:30 2011]:0.00586295127869 STREAM 75 SENTCONNECT 28
 38.229.70.2 443
 ERROR[Fri Sep 02 11:45:31 2011]:An unknown HTTP error occured
 Traceback (most recent call last):
   File "bwauthority_child.py", line 125, in http_request
     reply = urllib2.urlopen(request)
   File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
     return _opener.open(url, data, timeout)
   File "/usr/lib/python2.7/urllib2.py", line 391, in open
     response = self._open(req, data)
   File "/usr/lib/python2.7/urllib2.py", line 409, in _open
     '_open', req)
   File "/usr/lib/python2.7/urllib2.py", line 369, in _call_chain
     result = func(*args)
   File "/usr/lib/python2.7/urllib2.py", line 1193, in https_open
     return self.do_open(httplib.HTTPSConnection, req)
   File "/usr/lib/python2.7/urllib2.py", line 1154, in do_open
     h.request(req.get_method(), req.get_selector(), req.data, headers)
   File "/usr/lib/python2.7/httplib.py", line 955, in request
     self._send_request(method, url, body, headers)
   File "/usr/lib/python2.7/httplib.py", line 989, in _send_request
     self.endheaders(body)
   File "/usr/lib/python2.7/httplib.py", line 951, in endheaders
     self._send_output(message_body)
   File "/usr/lib/python2.7/httplib.py", line 811, in _send_output
     self.send(msg)
   File "/usr/lib/python2.7/httplib.py", line 773, in send
     self.connect()
   File "/usr/lib/python2.7/httplib.py", line 1154, in connect
     self.timeout, self.source_address)
   File "/usr/lib/python2.7/socket.py", line 562, in create_connection
     sock.connect(sa)
   File "../libs/SocksiPy/socks.py", line 369, in connect
     self.__negotiatesocks5(destpair[0],destpair[1])
   File "../libs/SocksiPy/socks.py", line 236, in __negotiatesocks5
     raise Socks5Error((ord(resp[1]),_socks5errors[ord(resp[1])]))
 Socks5Error: (1, 'general SOCKS server failure')
 DEBUG[Fri Sep 02 11:45:31 2011]:Removed 127.0.0.1:51292 from our local
 port list
 DEBUG[Fri Sep 02 11:45:31 2011]:Got last exit of
 778CABE1D95CD5E4094A82C696972DAECA8634A5
 DEBUG[Fri Sep 02 11:45:31 2011]:0.0769230769231-0.115384615385% circuit
 build+fetch failed for ('778CABE1D95CD5E4094A82C696972DAECA8634A5',
 'vmi3007')
 DEBUG[Fri Sep 02 11:45:31 2011]:Checking if scan count is met...
 WARN[Fri Sep 02 11:45:36 2011]:No event for: 650 STREAM 75 FAILED 28
 38.229.70.2:443 REASON=END REMOTE_REASON=MISC
 Traceback (most recent call last):
   File "../../TorCtl/TorCtl.py", line 717, in _eventLoop
     self._handleFn(timestamp, reply)
   File "../../TorCtl/TorCtl.py", line 1338, in _handle1
     l.listen(event)
   File "../../TorCtl/TorCtl.py", line 1290, in listen
     self._map1.get(event.event_name, self.unknown_event)(event)
   File "../../TorCtl/SQLSupport.py", line 1086, in stream_status_event
     strm.id).execute(row_type='failedstream')
   File "/home/aagbsn/.virtualenvs/torflow/lib/python2.7/site-
 packages/sqlalchemy/sql/expression.py", line 2660, in execute
     return e._execute_clauseelement(self, multiparams, params)
   File "/home/aagbsn/.virtualenvs/torflow/lib/python2.7/site-
 packages/sqlalchemy/engine/base.py", line 2292, in _execute_clauseelement
     return connection._execute_clauseelement(elem, multiparams, params)
   File "/home/aagbsn/.virtualenvs/torflow/lib/python2.7/site-
 packages/sqlalchemy/engine/base.py", line 1532, in _execute_clauseelement
     compiled_sql, distilled_params
   File "/home/aagbsn/.virtualenvs/torflow/lib/python2.7/site-
 packages/sqlalchemy/engine/base.py", line 1640, in _execute_context
     context)
   File "/home/aagbsn/.virtualenvs/torflow/lib/python2.7/site-
 packages/sqlalchemy/engine/base.py", line 1633, in _execute_context
     context)
   File "/home/aagbsn/.virtualenvs/torflow/lib/python2.7/site-
 packages/sqlalchemy/engine/default.py", line 325, in do_execute
     cursor.execute(statement, parameters)
 OperationalError: (OperationalError) database is locked u'UPDATE stream
 SET row_type=? WHERE stream.id = ?' ('failedstream', 1)

 My hypothesis is that some sqlalchemy sessions are not closed (by
 session.commit(), or session.remove()), though most of the time this
 doesn't fail because a commit does happen elsewhere. When http_request
 fails (as in the case of a Socks5Error), the commit doesn't happen, the
 session is left open, and the next unlucky query fails. SQLite doesn't
 support concurrent connections, but postgres does, so the query hangs
 forever...

 I noticed the session oddities while working on #2947, and refactored
 SQLSupport.py to use local sessions that are closed after each request.
 See:
 https://github.com/aagbsn/torctl/commit/4cc3605c969cebb0b1880c1257f68e63ecb6064f

 mikeperry and I have been testing these changes with the postgres backend
 and have not encountered arma's freezing issue.

 I was able to replicate the issue reliably by setting nodes_per_slice and
 circs_per_node to 1 in bwauthority.cfg . Using the sqlite backend,
 bwauthority exits and is restarted. Over a 30 minute period I logged 3
 crashes. A second run logged 5 crashes. After switching TorCtl branches
 and running the same test, I have been unable to reproduce the error after
 45 minutes.

 --Aaron

 Replying to [comment:18 arma]:
 > Scanner 4 got another traceback today. Looks like the output is
 interleaved with some other thread:
 > {{{
 > [...]
 > DEBUG[Thu Sep 01 10:49:07 2011]:Got stream bw: 3167
 > DEBUG[Thu Sep 01 10:49:07 2011]:0.102648973465 STREAM_BW 3167 319 2362
 > ERROR[Thu Sep 01 10:49:07 2011]:An unknown HTTP error occured
 > Traceback (most recent call last):
 >   File "bwauthority_child.py", line 125, in http_request
 >     reply = urllib2.urlopen(request)
 >   File "/usr/lib/python2.5/urllib2.py", line 124, in urlopen
 > DEBUG[Thu Sep 01 10:49:07 2011]:1314888547.54 0.00143194198608 CIRC 2066
 CLOSED
 $BC789B6057C5C71F2F85B3DA202B52EE94C4FF0C,$640AE17D46C7374F57689CD60E508685FB5BB4B5
 REASON=DESTROYED REMOTE_REASON=OR_CONN_CLOSED
 >     return _opener.open(url, data)
 >   File "/usr/lib/python2.5/urllib2.py", line 381, in open
 >     response = self._open(req, data)
 >   File "/usr/lib/python2.5/urllib2.py", line 399, in _open
 >     '_open', req)
 >   File "/usr/lib/python2.5/urllib2.py", line 360, in _call_chain
 >     result = func(*args)
 >   File "/usr/lib/python2.5/urllib2.py", line 1115, in https_open
 >     return self.do_open(httplib.HTTPSConnection, req)
 >   File "/usr/lib/python2.5/urllib2.py", line 1080, in do_open
 >     r = h.getresponse()
 >   File "/usr/lib/python2.5/httplib.py", line 928, in getresponse
 > DEBUG[Thu Sep 01 10:49:07 2011]:0.143916845322 CIRC 2066 CLOSED
 $BC789B6057C5C71F2F85B3DA202B52EE94C4FF0C,$640AE17D46C7374F57689CD60E508685FB5BB4B5
 REASON=DESTROYED REMOTE_REASON=OR_CONN_CLOSED
 > DEBUG[Thu Sep 01 10:49:07 2011]:Close refcount 0 for
 BC789B6057C5C71F2F85B3DA202B52EE94C4FF0C
 > DEBUG[Thu Sep 01 10:49:07 2011]:Close refcount 1 for
 640AE17D46C7374F57689CD60E508685FB5BB4B5
 >     response.begin()
 >   File "/usr/lib/python2.5/httplib.py", line 385, in begin
 >     version, status, reason = self._read_status()
 >   File "/usr/lib/python2.5/httplib.py", line 349, in _read_status
 >     raise BadStatusLine(line)
 > BadStatusLine
 > DEBUG[Thu Sep 01 10:49:07 2011]:Removed 127.0.0.1:60353 from our local
 port list
 > DEBUG[Thu Sep 01 10:49:07 2011]:Got last exit of
 640AE17D46C7374F57689CD60E508685FB5BB4B5
 > DEBUG[Thu Sep 01 10:49:07 2011]:71.9952019192-73.9944022391% circuit
 build+fetch failed for ('640AE17D46C7374F57689CD60E508685FB5BB4B5',
 'Unnamed')
 > DEBUG[Thu Sep 01 10:49:07 2011]:Checking if scan count is met...
 > DEBUG[Thu Sep 01 10:49:07 2011]:Got stream bw: 3167
 > [...]
 > }}}

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/3834#comment:22>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs