Re: [tor-bugs] #31244 [Internal Services/Tor Sysadmin Team]: long term prometheus metrics
#31244: long term prometheus metrics
-------------------------------------------------+-------------------------
Reporter: anarcat | Owner: anarcat
Type: enhancement | Status:
| assigned
Priority: Medium | Milestone:
Component: Internal Services/Tor Sysadmin Team | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-------------------------------------------------+-------------------------
Changes (by anarcat):
* owner: tpa => anarcat
* status: new => assigned
Comment:
i've decided to postpone the creation of a secondary server and instead
change the retention period on the current server to see if that fixes
the reliability issues detailed in #31916. if, in 30 days, we still have
this problem, then we can set up a secondary to see if we can reproduce
the problem there. after all, we don't need a redundant setup as long as
we don't do alerting, for which we still use nagios (#29864). see also
the commit log for more details:
{{{
origin/master 7cda3928fe9c6bf83ee3e8977b74d58acbb7519a
Author: Antoine Beaupré <anarcat@xxxxxxxxxx>
AuthorDate: Tue Oct 22 13:46:05 2019 -0400
Commit: Antoine Beaupré <anarcat@xxxxxxxxxx>
CommitDate: Tue Oct 22 13:46:05 2019 -0400
Parent: 91e379a5 make all mpm_worker paramaters configurable
Merged: master sudo-ldap
Contained: master
downgrade scrape interval on internal prometheus server (#31916)
This is an attempt at fixing the reliability issues on the prometheus
server detailed in #31916. The current theory is that ipsec might be
the culprit, but it's also possible that prometheus is overloaded
and that's creating all sorts of other, unrelated problems.
This is sidetracking the setup of a *separate* long term monitoring
server (#31244), of course, but I'm not sure that's really necessary
for now. Since we don't use prometheus for alerting (#29864), we don't
absolutely /need/ redundancy here so we can afford a SPOF for
Prometheus while we figure out this bug.
If, in thirty days, we still have reliability problems, we will know
this is not due to the retention period and can cycle back to the
other solutions, including creating a secondary server to see if it
reproduces the problem.
1 file changed, 2 insertions(+), 1 deletion(-)
modules/profile/manifests/prometheus/server/internal.pp | 3 ++-
modified modules/profile/manifests/prometheus/server/internal.pp
@@ -42,7 +42,8 @@ class profile::prometheus::server::internal (
vhost_name => $vhost_name,
collect_scrape_jobs => $collect_scrape_jobs,
scrape_configs => $scrape_configs,
- storage_retention => '30d',
+ storage_retention => '365d',
+ scrape_interval => '5m',
}
# expose our IP address to exporters so they can allow us in
#
}}}
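as a sanity check on the storage impact of that diff, here is a rough
back-of-the-envelope estimate (not from the ticket; it assumes the server
previously scraped at 15s, a common prometheus setting, which is not stated
above). it shows that 365d retention at a 5m scrape interval actually keeps
*fewer* samples per series than 30d at 15s would:

```python
# Rough estimate of samples stored per time series, given a scrape
# interval and a retention period. The 15s baseline is an assumption,
# not a value taken from the ticket.

SECONDS_PER_DAY = 86400

def samples_per_series(retention_days: int, scrape_interval_s: int) -> int:
    """Number of samples Prometheus keeps for one series, ignoring
    compression and staleness markers."""
    return retention_days * SECONDS_PER_DAY // scrape_interval_s

before = samples_per_series(30, 15)    # old: 30d retention, 15s scrapes
after = samples_per_series(365, 300)   # new: 365d retention, 5m scrapes

print(before)  # 172800 samples per series
print(after)   # 105120 samples per series
```

so, under that assumed baseline, the change trades resolution for history
without growing the on-disk sample count, which is consistent with the goal
of relieving a possibly overloaded server.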
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/31244#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs