Re: [tor-bugs] #30006 [Applications/Quality Assurance and Testing]: Monitor "aliveness" of default bridges in Tor Browser
#30006: Monitor "aliveness" of default bridges in Tor Browser
-------------------------------------------------+-------------------------
Reporter: phw | Owner: phw
Type: defect | Status:
| assigned
Priority: Medium | Milestone:
Component: Applications/Quality Assurance and | Version:
Testing |
Severity: Normal | Resolution:
Keywords: default bridge | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-------------------------------------------------+-------------------------
Comment (by anarcat):
Some more details on how this works. Prometheus is just a
scraping/alerting system and relies on "exporters" to do the actual work.
For example, we have "node exporters" installed on every TPA machine,
which provide stats like disk, CPU, and memory usage, and "apache
exporters", which provide internal stats on our webservers. Details of
that deployment are in #29681.
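To give an idea of what an exporter looks like in practice: it is just an
HTTP endpoint serving plain-text metrics. For instance, on a machine
running the node exporter on its default port 9100, something like this
works (illustrative output, not from a TPA host):
{{{
$ curl -s http://localhost:9100/metrics | grep ^node_load
node_load1 0.25
node_load5 0.31
node_load15 0.29
}}}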
The exporter that seems to fit the bill of "probe a TCP port for liveness"
is the [https://github.com/prometheus/blackbox_exporter blackbox
exporter]. It could be deployed on the Prometheus server and check each
public Tor bridge for reachability. The blackbox exporter is not very well
documented (not surprising, considering its name), so I found more
documentation on how it works
[https://utcc.utoronto.ca/~cks/space/blog/sysadmin/PrometheusBlackboxNotes
here] and [https://michael.stapelberg.ch/posts/2016-01-01-prometheus-
blackbox-exporter/ here].
The example you pasted was run on my home workstation; setting that up was
simply a matter of running:
{{{
apt install prometheus-blackbox-exporter
}}}
The exporter supports probing arbitrary hosts on the fly (a sketch of such
a probe is shown below). The final targets would need to be added to the
[https://github.com/prometheus/blackbox_exporter/blob/master/CONFIGURATION.md
configuration file] (see also
[https://github.com/prometheus/blackbox_exporter/blob/master/example.yml
this example]). This could all be done somewhat automatically, with a cron
job polling the list of bridges from some canonical location.
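To make the "on the fly" part concrete, here is a sketch of such an ad hoc
probe against a hypothetical bridge at 192.0.2.1:443, assuming the
exporter runs on its default port 9115 and has the stock tcp_connect
module defined in its configuration:
{{{
$ curl -s 'http://localhost:9115/probe?module=tcp_connect&target=192.0.2.1:443' \
    | grep -E '^probe_(success|duration)'
probe_duration_seconds 0.045
probe_success 1
}}}
probe_success is the interesting metric here: 1 if the TCP connection
succeeded, 0 otherwise.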
The blackbox exporter is pretty powerful: in theory, we could make it do a
simple send/expect dialog to verify the other end is really a Tor server,
if that would be useful.
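As a sketch, and assuming the Debian package's /etc/prometheus/blackbox.yml,
the module definitions could start with plain TCP reachability and leave
room for such a dialog later (the tor_handshake module and its send/expect
strings below are placeholders, not an actual Tor handshake):
{{{
modules:
  tcp_connect:          # plain "does the port accept connections" check
    prober: tcp
    timeout: 10s
  tor_handshake:        # hypothetical: a real send/expect dialog would go here
    prober: tcp
    timeout: 10s
    tcp:
      query_response:
        - send: "..."   # what to send depends on the protocol the bridge speaks
          expect: "..." # regexp the response must match
}}}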
Once the exporter is set up, the Prometheus server would be configured to
scrape those metrics, which would be collected every "scrape interval"
(currently 15 seconds).
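As a sketch of what that scrape configuration could look like (following
the usual blackbox exporter pattern of listing the real targets and
relabeling the scrape address to point at the exporter itself; all
addresses below are placeholders):
{{{
scrape_configs:
  - job_name: 'bridges'
    metrics_path: /probe
    params:
      module: [tcp_connect]       # module defined in blackbox.yml
    static_configs:
      - targets:
          - 192.0.2.1:443         # placeholder bridge addresses
          - 192.0.2.2:9001
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115   # the blackbox exporter itself
}}}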
Note that we do not have alerting capabilities in Prometheus yet: alerting
is still handled by Icinga (a fork of Nagios; see #29864 and #29863 for
that discussion). Instead, we could make a Grafana dashboard that displays
those metrics. A few existing dashboards already process those metrics out
of the box, but they would probably require at least some tweaking:
* https://grafana.com/dashboards/5990
* https://grafana.com/dashboards/5345
* https://grafana.com/dashboards/7587
* full list:
https://grafana.com/dashboards?dataSource=prometheus&search=blackbox
I'm not sure alerting is really a necessity. It might be sufficient to
check that dashboard as part of the release process, for example.
The open questions for me are:
1. is this the metrics team's responsibility, or TPA's?
2. what is the canonical reference for the list of public bridges?
[https://gitweb.torproject.org/builders/tor-browser-
build.git/plain/projects/tor-browser/Bundle-Data/PTConfigs/bridge_prefs.js
this JavaScript file]? how stable is that file format? do I need to parse
it as JavaScript, or can I get away with a regex? (see the sketch after
this list)
3. what is the threshold for failure? say we probe each bridge every 15
seconds: how many failed probes, over which time period, should count as
the bridge being down? one possible rule would be less than 50% of probes
succeeding over the last day (see the sketch after this list). we could
also look at latency.
4. are latency metrics sensitive? currently, the Prometheus metrics are
more or less publicly accessible, so if this is implemented, it would
expose the latency of those hosts, which could be leveraged for
correlation attacks (although arguably *anyone* could run a similar setup
and mount a similar attack). if we are worried about this, a separate
Prometheus server could be deployed with stronger security (see also the
discussion in #29863).
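On question 2, if bridge_prefs.js keeps its current shape (pref() lines
whose second argument contains an "address:port fingerprint ..." string),
a regex is probably enough; a rough sketch for the IPv4 case (IPv6
bridges, if any get added, would need more care):
{{{
grep -oE '[0-9]+(\.[0-9]+){3}:[0-9]+' bridge_prefs.js | sort -u
}}}
On question 3, the "less than 50% of probes in the last day" idea maps
directly onto a query over the blackbox exporter's probe_success metric,
which could be graphed in Grafana (or turned into an alert later);
assuming the scrape job is called "bridges":
{{{
avg_over_time(probe_success{job="bridges"}[1d]) < 0.5
}}}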
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/30006#comment:2>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs