Re: [tor-bugs] #29672 [Internal Services/Service - trac]: trac gets overwhelmed
#29672: trac gets overwhelmed
----------------------------------------------+--------------------------
Reporter: anarcat | Owner: qbi
Type: defect | Status: assigned
Priority: High | Milestone:
Component: Internal Services/Service - trac | Version:
Severity: Critical | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
----------------------------------------------+--------------------------
Comment (by anarcat):
Today trac hung badly: all requests were returning 503 errors to clients
and the machine was maxing out its CPU and memory. I found this in the
error log:
{{{
[Thu Apr 11 16:30:23.749569 2019] [wsgi:error] [pid 22934:tid
140416296871680] (11)Resource temporarily unavailable: [client
[REDACTED]:40900] mod_wsgi (pid=22934): Unable to connect to WSGI daemon
process 'trac.torproject.org' on '/var/run/apache2/wsgi.2106.9.1.sock'
after multiple attempts as listener backlog limit was exceeded.
}}}
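To see when the backlog errors began and how dense they were, the error log can be bucketed by minute. The following is only a rough sketch: it assumes each entry sits on a single line in the real log file (the wrapping above comes from the mail), and the name `backlog_errors_per_minute` is made up for illustration.

```python
import re
from collections import Counter

# Timestamp at the start of an Apache 2.4 error log entry,
# e.g. "[Thu Apr 11 16:30:23.749569 2019]".
TIMESTAMP = re.compile(
    r"^\[\w{3} (\w{3}) +(\d{1,2}) (\d{2}:\d{2}):\d{2}\.\d+ (\d{4})\]"
)
# Message fragment taken from the log excerpt above.
BACKLOG_MARKER = "listener backlog limit was exceeded"


def backlog_errors_per_minute(lines):
    """Count backlog-exceeded entries, keyed by 'Mon DD HH:MM YYYY'."""
    counts = Counter()
    for line in lines:
        if BACKLOG_MARKER not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            month, day, minute, year = match.groups()
            counts[f"{month} {day} {minute} {year}"] += 1
    return counts
```

Passing an open file object for the Apache error log works, since iterating over it yields one line at a time.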
The `trac.log` was full of:
{{{
IOError: Apache/mod_wsgi failed to write response data: Broken pipe
}}}
CPU and memory had been maxed out for more than two hours already when the
outage started:
[[Image(https://paste.anarc.at/snaps/snap-2019.04.11-12.53.48.png,700)]]
Apache was also seeing more hits than usual:
[[Image(https://paste.anarc.at/snaps/snap-2019.04.11-12.57.04.png,700)]]
But I don't believe Apache itself was starved for resources:
[[Image(https://paste.anarc.at/snaps/snap-2019.04.11-12.58.56.png,700)]]
It's possible the pgsql database got overwhelmed. We don't have metrics
for that in Prometheus because, ironically enough, I decided just
yesterday that it might be overkill. Maybe we should revisit that
decision now.
I wonder if our WSGI config could be tweaked. This is what we have right
now:
{{{
WSGIDaemonProcess trac.torproject.org user=tracweb group=tracweb home=/
processes=6 threads=10 maximum-requests=5000 inactivity-timeout=1800
umask=0007 display-name=wsgi-trac.torproject.org
}}}
I've decided to make more of those settings explicit to see if some tweaks
might be useful:
{{{
WSGIDaemonProcess trac.torproject.org user=tracweb group=tracweb home=/
processes=6 threads=10 maximum-requests=5000 inactivity-timeout=1800
umask=0007 graceful-timeout=30 restart-interval=30
response-socket-timeout=10 display-name=wsgi-trac.torproject.org
}}}
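For a sense of scale, the directive's `processes` and `threads` values bound how many requests the daemon group can work on at once. A rough sketch of that arithmetic follows; the helper names are hypothetical, and the fallback values (1 process, 15 threads, a `listen-backlog` of 100) are assumed from the mod_wsgi documentation rather than taken from this server.

```python
# The directive as configured above, joined onto one logical line.
DIRECTIVE = (
    "WSGIDaemonProcess trac.torproject.org user=tracweb group=tracweb home=/ "
    "processes=6 threads=10 maximum-requests=5000 inactivity-timeout=1800 "
    "umask=0007 graceful-timeout=30 restart-interval=30 "
    "response-socket-timeout=10 display-name=wsgi-trac.torproject.org"
)


def parse_daemon_options(directive):
    """Return the key=value options of a WSGIDaemonProcess directive."""
    options = {}
    for token in directive.split():
        if "=" in token:
            key, value = token.split("=", 1)
            options[key] = value
    return options


def request_capacity(options):
    """Return (concurrent request slots, queued connection slots).

    The fallbacks are assumptions based on the mod_wsgi documentation,
    not values read from this server.
    """
    processes = int(options.get("processes", 1))
    threads = int(options.get("threads", 15))
    backlog = int(options.get("listen-backlog", 100))
    return processes * threads, backlog


slots, backlog = request_capacity(parse_daemon_options(DIRECTIVE))
print(f"{slots} concurrent requests, {backlog} queued connections")
```

Under those assumptions, 6 processes with 10 threads each give 60 request slots, with further connections queueing until the backlog fills up.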
The server was rebooted, which cleared the immediate problem, but we'll
see whether the above tweaks prevent it from happening again.
Failing that, a good next step is to check whether the database is
overloaded. That would explain why the frontend falls over without an
obvious cause, although it must be said that most of the CPU was taken
by WSGI processes, not pgsql.
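To put numbers on the WSGI versus pgsql split next time, CPU usage can be summed per group of processes from `ps -eo pcpu,args` output. This is a minimal sketch: the function name and the substrings `wsgi` and `postgres` are assumptions about how the processes show up on this host.

```python
def cpu_by_group(ps_output, groups):
    """Sum %CPU per group from the text output of `ps -eo pcpu,args`.

    `groups` maps a label to a substring looked up in the command line.
    The substrings are guesses about how processes are named on the host.
    """
    totals = {label: 0.0 for label in groups}
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue
        try:
            pcpu = float(fields[0])
        except ValueError:
            continue  # header line or anything else that is not a number
        for label, needle in groups.items():
            if needle in fields[1]:
                totals[label] += pcpu
    return totals
```

For example, `cpu_by_group(text, {"wsgi": "wsgi", "pgsql": "postgres"})` returns one total per label, where `text` is the captured output of the `ps` command.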
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29672#comment:2>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online