# sysalert
Generic OnFailure= and OnSuccess= handler for systemd

## Purpose
This tool sends notifications when a systemd service fails. To install it, set
`sysalert-failure@%n.service` and `sysalert-success@%n.service` as the `OnFailure=` and
`OnSuccess=` handlers in the systemd service files.

The primary purpose is to keep track of services triggered by timers, paths and similar, but it
can be used to monitor any systemd service.

## Features and inner workings
- ignore X failures before sending a notification
- do not send repeated notifications about the same problem
- send recovery notifications
- flexible alert mechanism

At a high level, sysalert works like this:

When sysalert-failure is triggered, the triggering service's exit status, invocation ID and a
timestamp are saved to a SQLite database. Based on previous results and the configuration in
`/etc/sysalert.ini`, a notification is sent using the configured alert method.

When sysalert-success is triggered, sysalert sends a notification about service recovery (if
enabled) and clears any failures recorded for the triggering service from the SQLite database.

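The flow described above can be sketched in Python roughly as follows. This is a minimal illustration and not sysalert's actual code: the table layout, the function names (`open_db`, `on_failure`, `on_success`), the `ignore` parameter and the rule that a recovery notice is only due after a failure alert was sent are all assumptions made for the example.

```python
import sqlite3
import time


def open_db(path=":memory:"):
    """Open the database and create a hypothetical failures table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS failures ("
        " service TEXT, exit_status TEXT, invocation_id TEXT,"
        " timestamp REAL, notified INTEGER DEFAULT 0)"
    )
    return db


def on_failure(db, service, exit_status, invocation_id, ignore=0):
    """Record a failure; return True if a notification should be sent now."""
    db.execute(
        "INSERT INTO failures (service, exit_status, invocation_id, timestamp)"
        " VALUES (?, ?, ?, ?)",
        (service, exit_status, invocation_id, time.time()),
    )
    count, notified = db.execute(
        "SELECT COUNT(*), COALESCE(SUM(notified), 0)"
        " FROM failures WHERE service = ?",
        (service,),
    ).fetchone()
    # Stay quiet while within the ignored failures, or if already notified.
    if count <= ignore or notified:
        db.commit()
        return False
    db.execute("UPDATE failures SET notified = 1 WHERE service = ?", (service,))
    db.commit()
    return True


def on_success(db, service):
    """Clear stored failures; return True if a recovery notice is due."""
    (notified,) = db.execute(
        "SELECT COALESCE(SUM(notified), 0) FROM failures WHERE service = ?",
        (service,),
    ).fetchone()
    db.execute("DELETE FROM failures WHERE service = ?", (service,))
    db.commit()
    # Assumption: only announce recovery if a failure alert went out earlier.
    return bool(notified)
```

With `ignore=2`, the first two failures of a service are recorded silently, the third triggers a notification, and later failures stay quiet until a success clears the stored rows.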
## Installation
[Build and install](https://packaging.python.org/en/latest/tutorials/packaging-projects/) the Python
package, then install the configuration file and systemd services:
```
cp config/sysalert.ini /etc/
cp systemd/sysalert-failure@.service systemd/sysalert-success@.service /etc/systemd/system/
mkdir /etc/systemd/system/sysalert-.service.d
cp systemd/overrides/sysalert-.service.d.conf /etc/systemd/system/sysalert-.service.d/10-sysalert.conf
systemctl daemon-reload
```

Once everything is installed, you can set `sysalert-failure@%n.service` and `sysalert-success@%n.service` as the `OnFailure=` and `OnSuccess=` handlers in any service unit to get an email notification on failure.
It is also possible to set this system-wide by creating
`/etc/systemd/system/service.d/10-sysalert.conf` like so:
```
[Unit]
OnFailure=sysalert-failure@%n.service
OnSuccess=sysalert-success@%n.service
```
**WARNING:** Setting a system-wide handler like this will override any `OnFailure=` or `OnSuccess=` set
in service files, and modifying dependencies for sysalert may cause the system to fail at boot. Only
do this if you're sure it works on your system or are ready to troubleshoot boot failures.

There is also a [Gentoo ebuild](https://gitea.fulh.ax/feffe/feffe-portage-overlay/src/branch/master/sys-apps/sysalert)
I made for my own convenience, but beware: the ebuild installs sysalert as a system-wide handler
as described above.

## Configuration
sysalert reads its configuration from `/etc/sysalert.ini`; see the example configuration in the repo.

Note that by default the sysalert services depend on `network.target`. Depending on your alert
methods, you may need to override this.

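Each alert method gets its settings from a section named after it, for example a `sysalert.email` section. As a rough sketch of what that lookup could look like, assuming the file is in the format understood by Python's `configparser`: the helper name `alert_method_config` and the keys `smtp_host` and `recipient` are made up for illustration, so check the example configuration in the repo for the real ones.

```python
import configparser

# Hypothetical file content; the real keys are in the repo's example config.
EXAMPLE_INI = """
[sysalert.email]
# hypothetical keys, for illustration only
smtp_host = localhost
recipient = root@localhost
"""


def alert_method_config(ini_text, alert_method):
    """Return the section named after the alert method as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section(alert_method):
        return {}
    return dict(parser[alert_method])


print(alert_method_config(EXAMPLE_INI, "sysalert.email"))
```

A missing section yields an empty dict here; how sysalert itself treats a missing section is not covered by this sketch.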
## Alert methods
At the moment the only implemented alert method is `sysalert.email`, which uses SMTP to send an email
about service problems. The email content is currently not templated, but it does include the
journal log for the failed service as well as other nice-to-know information.

sysalert uses dynamic imports to load the alert methods. `sysalert.email` is a Python module
implemented in this package, but an alert method can be any Python module on your system that
implements the `success()` and `failure()` methods.

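Loading an alert method by name can be done with the standard `importlib` module. The sketch below shows the general technique only; the helper name `load_alert_method` and the validation step are assumptions, not necessarily how sysalert does it.

```python
import importlib


def load_alert_method(module_name):
    """Import an alert-method module by dotted name and check its hooks."""
    module = importlib.import_module(module_name)
    for hook in ("success", "failure"):
        # Both hooks must exist and be callable for the module to qualify.
        if not callable(getattr(module, hook, None)):
            raise TypeError(f"{module_name} does not implement {hook}()")
    return module
```

Calling `load_alert_method("sysalert.email")` would return the module shipped with this package, provided it is installed; a module without both hooks raises `TypeError` in this sketch.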
### `success()` and `failure()`
Any module that implements these methods can be used as an alert method. Both methods take three arguments:

- **service_name** - name of the service
- **failures** - list of dicts containing data about previous (and current) failures. The list is
  sorted by time, with the first failure first and the latest failure at the end. Currently the dicts include:
  - `service_result`
  - `exit_code`
  - `exit_status`
  - `invocation_id`
  - `timestamp`
  - `alert_method`

- **config** - a dict containing all key-value pairs defined in the configuration section for the
  alert method. For example, the `sysalert.email` section for the `sysalert.email` alert method.

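A custom alert method could look something like the module below, which just writes a one-line summary to stderr. Treat it as a sketch under assumptions: the module name `my_alerts`, the `prefix` config key and the returned strings are invented for the example, and the value types inside the failure dicts are not specified here, so the code only formats them as text.

```python
"""my_alerts.py: hypothetical alert-method module that writes to stderr."""
import sys


def _describe(entry):
    """Format one failure dict using keys listed in the README."""
    return "{timestamp} result={service_result} exit={exit_code}/{exit_status}".format(
        **entry
    )


def failure(service_name, failures, config):
    """Report a failing service; `failures` is ordered oldest first."""
    prefix = config.get("prefix", "sysalert")  # hypothetical config key
    message = (
        f"{prefix}: {service_name} failed {len(failures)} time(s); "
        f"latest: {_describe(failures[-1])}"
    )
    print(message, file=sys.stderr)
    return message  # returned only to make the example easy to check


def success(service_name, failures, config):
    """Report that a previously failing service has recovered."""
    prefix = config.get("prefix", "sysalert")  # hypothetical config key
    message = f"{prefix}: {service_name} recovered after {len(failures)} failure(s)"
    print(message, file=sys.stderr)
    return message
```

With the module importable, the alert method would be referred to by its module name (`my_alerts` in this example) and configured in a section of the same name.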
## Stuff to fix
This was a weekend project and is not very polished. Here are a few things that could probably be
improved:
- Fix hardcoded paths (config file and database location)
- Implement a command line tool (running `sysalert` manually should make it possible to update/clear
  database entries, maybe reconfigure and see alert status)
- Proper packaging and maybe publish on PyPI
- Implement more handlers (maybe `sysalert.syslog`)
- Find a method to detect if a failed service was triggered manually or by a timer/path/other
  service etc. It would be nice to be able to set this as the default only on services triggered by
  timers...