sysalert
Generic OnFailure= and OnSuccess= handler for systemd
Purpose
This tool sends notifications when a systemd service fails. It is installed by setting sysalert-failure@%n.service and sysalert-success@%n.service as the OnFailure= and OnSuccess= handlers in the systemd service files.
The primary purpose is to keep track of services triggered by timers, paths and similar, but it can be used to monitor any systemd service.
Features and inner workings
- ignore X failures before sending notification
- do not send repeated notifications of the same problem
- send recovery notifications
- flexible alert mechanism
On a high level sysalert works like this:
When sysalert-failure is triggered, the triggering service's exit status, invocation ID and a timestamp are saved to a SQLite database. Based on previous results and the configuration in /etc/sysalert.ini, a notification is sent using the configured alert method.
When sysalert-success is triggered, sysalert sends a notification about service recovery (if enabled) and clears any failures recorded for the triggering service from the SQLite database.
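The flow above can be sketched roughly as follows. This is a minimal illustration, not sysalert's actual code: the table layout, the function names (open_db, record_failure, record_success) and the ignore_failures parameter are all assumptions made for the example.

```python
import sqlite3
import time


def open_db(path=":memory:"):
    """Open the database and create a (hypothetical) failures table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS failures ("
        "service TEXT, exit_status TEXT, invocation_id TEXT, timestamp REAL)"
    )
    return db


def _count(db, service):
    row = db.execute(
        "SELECT COUNT(*) FROM failures WHERE service = ?", (service,)
    ).fetchone()
    return row[0]


def record_failure(db, service, exit_status, invocation_id, ignore_failures=0):
    """Store a failure; return True if a notification should be sent now."""
    db.execute(
        "INSERT INTO failures VALUES (?, ?, ?, ?)",
        (service, exit_status, invocation_id, time.time()),
    )
    db.commit()
    # Notify exactly once: on the first failure after the ignored ones.
    # Later failures of the same problem do not trigger a repeat.
    return _count(db, service) == ignore_failures + 1


def record_success(db, service, ignore_failures=0):
    """Clear stored failures; return True if a recovery notice is due."""
    had_alert = _count(db, service) > ignore_failures
    db.execute("DELETE FROM failures WHERE service = ?", (service,))
    db.commit()
    return had_alert
```

With ignore_failures=1, the first failure is stored silently, the second triggers a notification, and a later success triggers a recovery notice and empties the table for that service.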
Installation
Build and install the Python package, then install the configuration file and systemd services:
cp config/sysalert.ini /etc/
cp systemd/sysalert-failure@.service systemd/sysalert-success@.service /etc/systemd/system/
mkdir /etc/systemd/system/sysalert-.service.d
cp systemd/overrides/sysalert-.service.d.conf /etc/systemd/system/sysalert-.service.d/10-sysalert.conf
systemctl daemon-reload
Once everything is installed you can set sysalert-failure@%n.service and sysalert-success@%n.service as the OnFailure= and OnSuccess= handlers in any service unit to get an email notification on failure.
It is also possible to set this system-wide by creating /etc/systemd/system/service.d/10-sysalert.conf like so:
[Unit]
OnFailure=sysalert-failure@%n.service
OnSuccess=sysalert-success@%n.service
WARNING: setting a system-wide handler like this will override any OnFailure= or OnSuccess= set in service files, and modifying dependencies for sysalert may cause the system to fail at boot. Only do this if you're sure it works on your system or are ready to troubleshoot boot failures.
There is also a Gentoo ebuild I made for my own convenience, but beware: the ebuild installs sysalert as a system-wide handler as described above.
Configuration
sysalert reads its configuration from /etc/sysalert.ini; see the example configuration in the repo.
Note that by default the sysalert services depend on network.target; depending on your alert methods, you may need to override this.
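As a rough illustration of how an INI section such as 'sysalert.email' can be read into a plain dict of key-values, here is a sketch using Python's configparser. The keys smtp_host and recipient and the helper load_method_config are made up for the example; the real keys are in the example configuration in the repo.

```python
import configparser

# Hypothetical configuration text; the key names are assumptions.
EXAMPLE = """
[sysalert.email]
smtp_host = localhost
recipient = root@localhost
"""


def load_method_config(text, method):
    """Return the key-values of the section named after the alert method."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(method):
        return {}
    return dict(parser[method])


config = load_method_config(EXAMPLE, "sysalert.email")
```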
Alert methods
At the moment the only implemented alert method is 'sysalert.email', which uses SMTP to send an email about service problems. Currently the email content is not templated, but it does include the journal log for the failed service as well as other nice-to-know information.
sysalert uses dynamic imports to import the alert methods. sysalert.email is a Python module implemented in this package, but it can be any Python module on your system that implements the success() and failure() methods.
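Dynamic loading of this kind is typically done with importlib. The sketch below shows one way to load a module by name and check that it provides both functions; load_alert_method is a hypothetical helper, not necessarily how sysalert does it.

```python
import importlib


def load_alert_method(name):
    """Import the module called `name` and verify it looks like an alert method."""
    module = importlib.import_module(name)
    for required in ("success", "failure"):
        # Both functions must exist and be callable.
        if not callable(getattr(module, required, None)):
            raise TypeError(f"{name} does not implement {required}()")
    return module
```

Any importable module name works here, so an alert method can live outside the sysalert package as long as it is on the Python path.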
success() and failure()
Any module that implements these methods can be used as an alert method. These methods take three arguments:
- service_name - name of the service
- failures - list of dicts containing data about previous (and current) failures. The list is sorted by time, with the first failure first and the latest failure at the end. Currently the dicts include:
  - service_result
  - exit_code
  - exit_status
  - invocation_id
  - timestamp
  - alert_method
- config - a dict containing all key-values defined in the configuration section for the alert method. For example, the 'sysalert.email' section for the 'sysalert.email' alert method.
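A custom alert method could look roughly like the module below, which logs instead of sending email. It is a sketch only: the logger name "my_alerts", the "prefix" config key and the returned message are assumptions for illustration, and the source does not say whether return values are used.

```python
"""Hypothetical alert method module (not part of sysalert)."""
import logging

log = logging.getLogger("my_alerts")


def _summary(service_name, failures):
    # failures is sorted oldest first, so the latest failure is the last item.
    latest = failures[-1] if failures else {}
    return (
        f"{service_name}: {len(failures)} failure(s), "
        f"latest result={latest.get('service_result')} "
        f"exit_status={latest.get('exit_status')}"
    )


def failure(service_name, failures, config):
    """Called when the service has failed and a notification is due."""
    message = config.get("prefix", "") + "FAILED " + _summary(service_name, failures)
    log.error(message)
    return message  # returned only to make the sketch easy to test


def success(service_name, failures, config):
    """Called when the service has recovered."""
    message = config.get("prefix", "") + "RECOVERED " + _summary(service_name, failures)
    log.info(message)
    return message
```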
Stuff to fix
This was a weekend project and is not very polished. Here are a few things that could probably be improved:
- Fix hardcoded paths (config file and database location)
- Implement a command line tool (running sysalert manually should make it possible to update/clear database entries, maybe reconfigure and see alert status)
- Proper packaging and maybe publish on PyPI
- Implement more handlers (maybe sysalert.syslog)
- Find a method to detect whether a failed service was triggered manually or by a timer, path, other service etc. It would be nice to be able to set this as default only on services triggered by timers.