
Re: [tor-bugs] #29672 [Internal Services/Service - trac]: trac gets overwhelmed

To: undisclosed-recipients: ;
Subject: Re: [tor-bugs] #29672 [Internal Services/Service - trac]: trac gets overwhelmed
From: "Tor Bug Tracker & Wiki" <blackhole@xxxxxxxxxxxxxx>
Date: Thu, 11 Apr 2019 17:10:30 -0000
Auto-submitted: auto-generated
Delivered-to: archiver@xxxxxxxx
Delivery-date: Thu, 11 Apr 2019 13:10:40 -0400
In-reply-to: <047.d5bfe71e018e0b688ee81a54efbfe7d4@torproject.org>
List-archive: <http://lists.torproject.org/pipermail/tor-bugs/>
List-help: <mailto:tor-bugs-request@lists.torproject.org?subject=help>
List-id: "auto: Tor bug tracker status mails" <tor-bugs.lists.torproject.org>
List-post: <mailto:tor-bugs@lists.torproject.org>
List-subscribe: <https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs>, <mailto:tor-bugs-request@lists.torproject.org?subject=subscribe>
List-unsubscribe: <https://lists.torproject.org/cgi-bin/mailman/options/tor-bugs>, <mailto:tor-bugs-request@lists.torproject.org?subject=unsubscribe>
References: <047.d5bfe71e018e0b688ee81a54efbfe7d4@torproject.org>
Reply-to: no-reply@xxxxxxxxxxxxxx, tor-assistants@xxxxxxxxxxxxxx
Sender: "tor-bugs" <tor-bugs-bounces@xxxxxxxxxxxxxxxxxxxx>

#29672: trac gets overwhelmed
----------------------------------------------+--------------------------
 Reporter:  anarcat                           |          Owner:  qbi
     Type:  defect                            |         Status:  assigned
 Priority:  High                              |      Milestone:
Component:  Internal Services/Service - trac  |        Version:
 Severity:  Critical                          |     Resolution:
 Keywords:                                    |  Actual Points:
Parent ID:                                    |         Points:
 Reviewer:                                    |        Sponsor:
----------------------------------------------+--------------------------

Comment (by anarcat):

 Today trac hung badly: all requests returned 503 errors to clients and
 the machine was maxing out its CPU and memory. I found this in the Apache
 error log:

 {{{
 [Thu Apr 11 16:30:23.749569 2019] [wsgi:error] [pid 22934:tid
 140416296871680] (11)Resource temporarily unavailable: [client
 [REDACTED]:40900] mod_wsgi (pid=22934): Unable to connect to WSGI daemon
 process 'trac.torproject.org' on '/var/run/apache2/wsgi.2106.9.1.sock'
 after multiple attempts as listener backlog limit was exceeded.
 }}}
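
 To see when these errors started and how fast they piled up, the error
 log can be bucketed by minute. The following is only a sketch: it
 assumes the default Apache 2.4 error log timestamp format shown above,
 one log entry per line (the excerpt above is wrapped by the mailer), and
 the function name is made up for illustration.

```python
import re
from collections import Counter

# Timestamp prefix of an Apache 2.4 error log entry, for example
# "[Thu Apr 11 16:30:23.749569 2019] [wsgi:error] ...".
# Group 1 is the date down to the minute, group 2 is the year.
TIMESTAMP = re.compile(r"^\[(\w{3} \w{3} +\d+ \d{2}:\d{2}):\d{2}\.\d+ (\d{4})\]")

# Text that identifies the mod_wsgi error quoted in this comment.
BACKLOG_MARKER = "listener backlog limit was exceeded"


def backlog_errors_per_minute(lines):
    """Count backlog-exceeded errors per minute (hypothetical helper).

    `lines` is any iterable of log lines, one entry per line.
    Lines without the marker or without a parsable timestamp are skipped.
    """
    counts = Counter()
    for line in lines:
        if BACKLOG_MARKER not in line:
            continue
        match = TIMESTAMP.match(line)
        if match is None:
            continue
        counts[f"{match.group(1)} {match.group(2)}"] += 1
    return counts
```

 It could be fed an open file, e.g. `backlog_errors_per_minute(open(path))`
 where `path` is wherever the vhost's error log lives on the server.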

 The `trac.log` was full of:

 {{{
 IOError: Apache/mod_wsgi failed to write response data: Broken pipe
 }}}

 CPU and memory had been maxed out for more than two hours already when the
 outage started:

 [[Image(https://paste.anarc.at/snaps/snap-2019.04.11-12.53.48.png,700)]]

 Apache was also seeing more hits than usual:

 [[Image(https://paste.anarc.at/snaps/snap-2019.04.11-12.57.04.png,700)]]

 But I don't believe Apache itself was starved of resources:

 [[Image(https://paste.anarc.at/snaps/snap-2019.04.11-12.58.56.png,700)]]

 It's possible the pgsql database got overwhelmed. We don't have metrics
 for that in prometheus because, ironically enough, I decided just
 yesterday that they would be overkill. Maybe we should revisit that
 decision now.

 I wonder if our WSGI config could be tweaked. This is what we have right
 now:

 {{{
 WSGIDaemonProcess trac.torproject.org user=tracweb group=tracweb home=/
 processes=6 threads=10 maximum-requests=5000 inactivity-timeout=1800
 umask=0007 display-name=wsgi-trac.torproject.org
 }}}
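
 For context, `processes=6 threads=10` gives 6 x 10 = 60 request slots;
 once all of them are busy, further requests wait in the daemon socket's
 listener backlog, which is what the error above reports as exceeded. A
 small sketch of that arithmetic follows. The helper names are made up,
 and the fallback values (1 process, 15 threads) are what I understand
 the mod_wsgi defaults to be, so treat them as an assumption.

```python
def parse_daemon_options(directive):
    """Split a WSGIDaemonProcess directive into (name, options dict).

    Hypothetical helper: expects the whole directive as one string, with
    any wrapped lines already joined by whitespace.
    """
    tokens = directive.split()
    if len(tokens) < 2 or tokens[0] != "WSGIDaemonProcess":
        raise ValueError("not a WSGIDaemonProcess directive")
    options = {}
    for token in tokens[2:]:
        # Options look like key=value; a bare key gets an empty value.
        key, _, value = token.partition("=")
        options[key] = value
    return tokens[1], options


def request_slots(options):
    """Concurrent requests the daemon group can serve: processes * threads."""
    # Assumed mod_wsgi defaults when an option is absent.
    processes = int(options.get("processes", 1))
    threads = int(options.get("threads", 15))
    return processes * threads


CURRENT = (
    "WSGIDaemonProcess trac.torproject.org user=tracweb group=tracweb home=/ "
    "processes=6 threads=10 maximum-requests=5000 inactivity-timeout=1800 "
    "umask=0007 display-name=wsgi-trac.torproject.org"
)

name, options = parse_daemon_options(CURRENT)
slots = request_slots(options)  # 6 * 10 = 60
```

 If slow requests hold all 60 slots, new connections queue up no matter
 how idle Apache itself looks.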

 I've decided to make more of those settings explicit to see if some tweaks
 might be useful:

 {{{
 WSGIDaemonProcess trac.torproject.org user=tracweb group=tracweb home=/
 processes=6 threads=10 maximum-requests=5000 inactivity-timeout=1800
 umask=0007 graceful-timeout=30 restart-interval=30
 response-socket-timeout=10 display-name=wsgi-trac.torproject.org
 }}}

 The server was rebooted, which cleared the problem. We'll see whether
 the above tweaks keep it from coming back.

 Failing that, a good next step is to check whether the database is
 overloaded: that would explain why the frontend falls over with no
 obvious cause, although it must be said that most of the CPU was taken
 by WSGI processes, not pgsql.
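
 One rough way to check the database during the next incident is to count
 backends per state in the standard `pg_stat_activity` view. This is a
 sketch only: the helper names, the 0.8 threshold and the idea of
 comparing against `max_connections` are assumptions for illustration,
 not something measured on this server. The rows would come from whatever
 client is at hand (psql, or a Python driver) running the query below.

```python
from collections import Counter

# pg_stat_activity is a built-in PostgreSQL statistics view; its "state"
# column exists in PostgreSQL 9.2 and later. The %s placeholder stands
# for the database name and follows the usual Python DB-API style.
ACTIVITY_QUERY = "SELECT state FROM pg_stat_activity WHERE datname = %s"


def summarize_states(rows):
    """Count backends per state from rows shaped like (state,)."""
    return Counter(state if state is not None else "unknown" for (state,) in rows)


def looks_saturated(summary, max_connections, threshold=0.8):
    """Hypothetical heuristic: are busy backends close to max_connections?

    "Busy" here means running a query or holding a transaction open.
    """
    busy = summary.get("active", 0) + summary.get("idle in transaction", 0)
    return busy >= threshold * max_connections
```

 Many `idle in transaction` backends would point at the application
 holding connections open rather than at pgsql itself being slow.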

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29672#comment:2>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs
