Commit message  [Author, Age]
* fix typo in redis check (HEAD, master)  [Antoine Beaupré, 16 hours]
|
* whitespace change to trigger a rebuild  [Antoine Beaupré, 34 hours]
|
* cleanup postgres-gitlab-hosts fully  [Jérôme Charaoui, 36 hours]
|
* gitlab-02 now uses puppetized postgresql (tpo/tpa/team#41426)  [Jérôme Charaoui, 37 hours]
|
* meronense upgraded to postgres15  [Jérôme Charaoui, 37 hours]
|   as this was the last host in postgres13, cleanup that hostgroup
* upgraded pgsql materculae and meronense (tpo/tpa/team#41252)  [Jérôme Charaoui, 44 hours]
|
* stop hardcoding dnssec-verify path  [Antoine Beaupré, 4 days]
|   It moved to /usr/bin in bookworm.
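One way to avoid hardcoding a tool's location, as the commit above describes, is to resolve it through PATH at run time. The sketch below is only an illustration and is not taken from the repository: `resolve_tool` is a hypothetical helper name, and returning 3 follows the Nagios plugin convention for an UNKNOWN state.

```shell
#!/bin/sh
# Hypothetical helper (not from the repository): look a tool up in PATH
# instead of assuming a fixed directory.
resolve_tool() {
    # command -v prints the resolved path on success and fails otherwise.
    command -v "$1" 2>/dev/null || {
        echo "UNKNOWN: $1 not found in PATH" >&2
        # 3 is the conventional Nagios plugin exit code for UNKNOWN.
        return 3
    }
}
```

A check could then run `verify=$(resolve_tool dnssec-verify) || exit $?` and call `"$verify"` regardless of which directory the distribution ships it in.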
* upgrade bacula-director-01 to psql-15 (tpo/tpa/team#41252)  [Jérôme Charaoui, 4 days]
|
* monitor btcpay.tpo via new dsa_check_cert_sni command (tpo/tpa/team#41386)  [Jérôme Charaoui, 2023-12-18]
|
* try again, again to monitor certs on btcpayserver (tpo/tpa/team#41386)  [Antoine Beaupré, 2023-12-18]
|
* try again to monitor certs on btcpayserver (tpo/tpa/team#41386)  [Antoine Beaupré, 2023-12-18]
|
* try to monitor certs on btcpayserver (tpo/tpa/team#41386)  [Antoine Beaupré, 2023-12-18]
|
* remove heavy-postfix hostgroup  [Jérôme Charaoui, 2023-12-11]
|   this was only used to check for the existence of the postgrey process,
|   but that service's arguments changed in bookworm
|
|   due to the low value of this check and the impending migration to
|   something else, simply drop it altogether
* rude: now has postgresql-15  [Jérôme Charaoui, 2023-12-11]
|
* stop monitoring the presence or absence of amavis processes  [Antoine Beaupré, 2023-11-20]
|   This is useless. Right now, this is warning us that amavis is failing
|   on rude, but that's only because it was upgraded and the process
|   signature changed.
|
|   Meanwhile: who *cares* if amavis is running unexpectedly on another
|   host? What are we worried about here, a hostile... virus checker?
|
|   I'm tempted to remove *all* such checks: we have a systemd check that
|   is better suited for this kind of checks, but one hurt at a time.
* fix disk check warning on new tb-build-0[23] hosts  [Antoine Beaupré, 2023-11-15]
|   this was trying to ignore /srv which wasn't mounted, so it was failing
|   with:
|
|       root@tb-build-02:~# /usr/lib/nagios/plugins/check_disk -X devpts -X proc -X linprocfs -X devfs -X fdescfs -X tracefs -X sysfs -X nfs -X overlay -w 10% -c 5% -A -i '^.*/docker/.*$' -i $(findmnt -o SOURCE -n /srv)
|       /usr/lib/nagios/plugins/check_disk: option requires an argument -- 'i'
|       Unknown argument
|       Usage:
|       check_disk {-w absolute_limit |-w percentage_limit% | -W inode_percentage_limit }
|       {-c absolute_limit|-c percentage_limit% | -K inode_percentage_limit }
|       {-p path | -x device}
|       [-C] [-E] [-e] [-f] [-g group ] [-k] [-l] [-M] [-m] [-R path ] [-r path ]
|       [-t timeout] [-u unit] [-v] [-X type] [-N type]
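The failure above happens because `$(findmnt -o SOURCE -n /srv)` expands to nothing when /srv is not mounted, leaving the trailing `-i` without an argument. The sketch below shows one possible guard and is not necessarily the fix applied in this commit: `ignore_args_for` is a hypothetical helper that emits the `-i` option only when `findmnt` reports a source for the mount point.

```shell
#!/bin/sh
# Hypothetical helper (not from the repository): print "-i <source>" only
# when the given mount point exists, and print nothing otherwise.
ignore_args_for() {
    # findmnt exits non-zero and prints nothing for an unmounted path.
    src=$(findmnt -o SOURCE -n "$1" 2>/dev/null) || src=""
    if [ -n "$src" ]; then
        printf '%s %s\n' -i "$src"
    fi
}
```

The helper's output is meant to be expanded unquoted so that it splits into two words or disappears entirely, for example `/usr/lib/nagios/plugins/check_disk -w 10% -c 5% -A $(ignore_args_for /srv)`.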
* remove gettor static component (tpo/web/team#44)  [Jérôme Charaoui, 2023-11-08]
|
* survey-01 upgraded to bookworm  [Jérôme Charaoui, 2023-11-01]
|
* upgrade weather-01 to bookworm (tpo/tpa/team#41252)  [Antoine Beaupré, 2023-10-31]
|
* retire old tb-build-04 and -05 servers (tpo/tpa/team#41367)  [Antoine Beaupré, 2023-10-31]
|
* install ssh-dal-01 jump host (tpo/tpa/team#41351)  [Antoine Beaupré, 2023-10-05]
|
* fully remove postgres11-hosts hostgroup  [Jérôme Charaoui, 2023-10-05]
|
* run catalog checks from puppetdb-01  [Jérôme Charaoui, 2023-10-05]
|
* fix build error  [Jérôme Charaoui, 2023-10-05]
|   the build script doesn't like empty host groups:
|
|       remote: ./build-nagios
|       remote: ./build-nagios:365:in `throw': uncaught throw "no hosts for service process - postgresql11 - master" (UncaughtThrowError)
|       remote: from ./build-nagios:365:in `block in <main>'
|       remote: from ./build-nagios:338:in `each'
|       remote: from ./build-nagios:338:in `<main>'
|       remote: make: *** [Makefile:5: generated/nrpe_tor.cfg] Error 1
|       remote: run-parts: /srv/git.torproject.org/git-helpers/post-receive-per-repo.d/admin%tor-nagios/trigger-nagios-build exited with return code 2
* missed a spot  [Jérôme Charaoui, 2023-10-05]
|
* moved pauli to gnt-dal (tpo/tpa/team#41346)  [Jérôme Charaoui, 2023-10-05]
|   also moved puppetdb to its own machine, so no more pgsql
* add puppetdb (tpo/tpa/team#41341)  [Jérôme Charaoui, 2023-10-02]
|
* rename chi-node-14 to ci-runner-x86-03 (tpo/tpa/team#41325)  [Jérôme Charaoui, 2023-09-20]
|
* replace ci-runner-x86-01 with -02 (tpo/tpa/team#41295)  [Jérôme Charaoui, 2023-09-19]
|
* fix tb-build-* monitoring (tpo/tpa/team#41304)  [Antoine Beaupré, 2023-09-07]
|   The IP of -03 was incorrect, and -02 was missing.
* install tb-build-03 (tpo/tpa/team#41304)  [Antoine Beaupré, 2023-09-06]
|
* reinstall rdsys server (tpo/tpa/team#41297)  [Antoine Beaupré, 2023-08-30]
|
* install rdsys-frontend-test-02 (tpo/tpa/team#41297)  [Antoine Beaupré, 2023-08-28]
|
* install podman CI runner (tpo/tpa/team#41296)  [Antoine Beaupré, 2023-08-14]
|
* remove nginx process checks  [Antoine Beaupré, 2023-08-08]
|   This was cargo-culted from the Apache config originally, and I do not
|   think the checks are warranted. They have never, as far as I can
|   remember, yielded correct alerts and are currently yielding a false
|   positive as forum-01 has two nginx servers running simultaneously
|   because of the container configuration.
* do not check btcpayserver-02 for postgres  [Antoine Beaupré, 2023-08-08]
|   it's running inside a container (tpo/tpa/team#41290)
* remove dependency on removed service  [Antoine Beaupré, 2023-08-08]
|   i wonder how that compiled in the first place
* retire postgresql check on btcpayserver-02 (tpo/tpa/team#41290)  [Antoine Beaupré, 2023-08-08]
|   This check is almost useless: it checks for the master process
|   presence. We *did* get psql crashes in the past, so it's worthwhile to
|   monitor it, but for this specific case it fails miserably because the
|   database server is hidden inside a container.
|
|   Let's assume the container gizmo does its thing correctly and not
|   monitor it at all. Again, a case of monitoring for a possible cause
|   instead of symptoms.
* stop monitoring postfix processes (tpo/tpa/team#41291)  [Antoine Beaupré, 2023-08-08]
|   Those checks have *always* returned false positives. In my decades of
|   experience with it, I have actually never seen postfix crash
|   completely, so those checks are basically useless in the first place.
|
|   Even if postfix *would* crash, systemd would likely notice and flag
|   the service as failed. And even if *that* would fail, we should have
|   *other* systems for checking mail services health like end-to-end
|   deliverability checks and so on.
|
|   This fixes one false positive which is a postfix process running
|   inside a container for the Discourse service. We've also had false
|   positives with various causes, for example:
|
|       CHECK_NRPE STATE CRITICAL: Socket timeout after 50 seconds.
|
|   was: the host is rebooting.
|
|       connect to address 64.18.183.94 port 5666: Connection refused
|
|   ... was: the host is *booting*.
|
|       PROCS CRITICAL: 0 processes with UID = 0 (root), command name master, args /usr/lib/postfix/sbin/master
|
|   This was an actual case of the master process missing, but was
|   probably due to a race condition on reboot, unclear.
|
|   In any case, this is noisy and hard to disable for individual host, so
|   just make it go away.
* retire tb-build-01 (tpo/tpa/team#41209)  [Jérôme Charaoui, 2023-07-26]
|
* Revert "Remove apache2 https check from donate-review-01"  [kez, 2023-07-11]
|   This reverts commit 0a859296f3bc0d54895666401cc75d0b0a6c4838.
|
|   Nagios complains of an unwanted https service without this check. I
|   think the best fix is to leave this check enabled, and just give
|   apache a default https vhost to serve.
* provision minio-01 (tpo/tpa/team#41257)  [Antoine Beaupré, 2023-07-10]
|
* Remove apache2 https check from donate-review-01  [kez, 2023-06-30]
|   donate-review only serves review apps over https, so there isn't an
|   https vhost running on the main donate-review-01.tpo domain
* ignore running, unmanaged webserver on btcpayserver  [Antoine Beaupré, 2023-06-28]
|
* remove irrelevant comment  [Antoine Beaupré, 2023-06-28]
|
* remove ignored bridgedb check (tpo/tpa/team#40828)  [Antoine Beaupré, 2023-06-28]
|
* try again to silence warnings on btcpayserver  [Antoine Beaupré, 2023-06-28]
|   It doesn't seem to be properly detecting the nginx service.
* btcpayserver actually runs nginx  [Antoine Beaupré, 2023-06-26]
|
* add btcpayserver to monitoring (tpo/tpa/team#41240)  [Antoine Beaupré, 2023-06-26]
|   This was forgotten in tpo/tpa/team#33750.
* drop domain part  [Jérôme Charaoui, 2023-06-21]
|