Re: [tor-bugs] #31159 [Internal Services/Tor Sysadmin Team]: Monitor anti-censorship www services with prometheus
#31159: Monitor anti-censorship www services with prometheus
-------------------------------------------------+-------------------------
Reporter: phw | Owner: hiro
Type: task | Status: assigned
Priority: Medium | Milestone:
Component: Internal Services/Tor Sysadmin Team | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: #30152 | Points: 1
Reviewer: | Sponsor:
-------------------------------------------------+-------------------------
Changes (by anarcat):
* owner: tpa => hiro
* status: new => assigned
Old description:
> In the anti-censorship team we currently monitor
> [https://trac.torproject.org/projects/tor/wiki/org/teams/AntiCensorshipTeam/InfrastructureMonitoring
> several services] with sysmon. We recently discovered that sysmon
> doesn't seem to follow HTTP 301 redirects. This means that if a web
> service dies but the 301 redirect still works (e.g., BridgeDB is dead but
> its apache reverse proxy still works), sysmon won't notice.
>
> Now that prometheus is running, we should fill this monitoring gap by
> testing the following web sites:
>
> * https://bridges.torproject.org
> * https://snowflake.torproject.org
> * https://gettor.torproject.org
>
> Our test should ensure that these sites serve the content we expect,
> e.g., make sure that bridges.tp.o contains the string "BridgeDB" in its
> HTML. Testing the HTTP status code does not suffice: if BridgeDB is down,
> the reverse proxy may still respond.
>
> I wonder if prometheus could also help us with #12802 by sending an email
> to bridges@tp.o and making sure that it responds with at least one
> bridge?
New description:
In the anti-censorship team we currently monitor
[https://trac.torproject.org/projects/tor/wiki/org/teams/AntiCensorshipTeam/InfrastructureMonitoring
several services] with sysmon. We recently discovered that sysmon doesn't
seem to follow HTTP 301 redirects. This means that if a web service dies
but the 301 redirect still works (e.g., BridgeDB is dead but its apache
reverse proxy still works), sysmon won't notice.

Now that prometheus is running, we should fill this monitoring gap by
testing the following web sites:

* https://bridges.torproject.org
* https://snowflake.torproject.org
* https://gettor.torproject.org

Our test should ensure that these sites serve the content we expect, e.g.,
make sure that bridges.tp.o contains the string "BridgeDB" in its HTML.
Testing the HTTP status code does not suffice: if BridgeDB is down, the
reverse proxy may still respond.

I wonder if prometheus could also help us with #12802 by sending an email
to bridges@tp.o and making sure that it responds with at least one bridge?
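
The content check described above (follow redirects, then confirm that an expected string appears in the HTML) could be sketched roughly as follows. This is only an illustration in Python, not the configuration the sysadmin team deployed: the function names `body_has_marker` and `check_site` and the 10-second timeout are assumptions made for the example, and only the "BridgeDB" marker comes from the ticket itself.

```python
import http.client
import urllib.error
import urllib.request


def body_has_marker(body: bytes, marker: str) -> bool:
    """Return True if the decoded page body contains the marker string."""
    return marker in body.decode("utf-8", errors="replace")


def check_site(url: str, marker: str, timeout: float = 10.0) -> bool:
    """Fetch url and check that the final response contains marker.

    urllib follows 301/302 redirects by default, so the body inspected
    here is the one served after any redirect, not the redirect itself.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # A 200 alone is not enough: a reverse proxy may still answer
            # while the service behind it is down, so check the body too.
            return resp.status == 200 and body_has_marker(resp.read(), marker)
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection failures, timeouts, and HTTP error statuses are all
        # treated as "service down" in this sketch.
        return False
```

Under these assumptions, `check_site("https://bridges.torproject.org", "BridgeDB")` would return `False` both when the site is unreachable and when a proxy answers with a page that lacks the expected string. Prometheus's blackbox exporter offers a comparable body check for HTTP probes through its `fail_if_body_not_matches_regexp` option.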
Checklist:
1. [ ] monitor services in Nagios: BridgeDB, Snowflake, and GetTor
2. [ ] deploy Prometheus's "blackbox exporter" for default bridges,
which are external services
3. [ ] delegate the blackbox exporter configuration to the
anti-censorship team (and train them on it)
4. [ ] experiment with Prometheus's "alertmanager", which can send
notifications if a monitoring target goes offline
5. [ ] grant the anti-censorship team access to Prometheus's Grafana
dashboard
--
Comment:
Awesome summary, thanks. I turned that into a checklist and assigned the
ticket to hiro, who I think will handle the follow-up on this. hiro, let me
know if you need help or if any of this is incorrect...
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/31159#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online