Notify about health state
=========================
[⬅️ Go back to main README](../README.md)

> **Info**: This script cannot be used on its own but requires the base
> installation. See [main README](../README.md) for details.

Description
-----------
This script is run periodically from the scheduler and sends notifications
on health-related events:

* high CPU utilization
* high RAM utilization (low available RAM)
* voltage jumps up or down by more than the configured threshold
* voltage drops below the hard lower limit
* power supply failed or recovered
* temperature is above or below the threshold

Note that a bad initial state will not trigger an event.
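
The voltage jump check from the list above can be pictured as a percentage comparison against the previous reading. The following is only a sketch of that idea, not the script's actual code: the variable names and readings are hypothetical, and the values are assumed to be integers in volt*10. With these numbers the voltage drops from 24.0 V to 20.5 V, which is more than 10% of the previous reading, so the first branch is taken.

```routeros
{
  # Hypothetical readings and threshold, for illustration only
  :local LastVoltage 240;
  :local CurrentVoltage 205;
  :local Percent 10;

  # Integer-only comparison: changed by more than Percent of the last reading?
  :if ((($CurrentVoltage * 100) > ($LastVoltage * (100 + $Percent))) || (($CurrentVoltage * 100) < ($LastVoltage * (100 - $Percent)))) do={
    :put "voltage jump detected";
  } else={
    :put "voltage within threshold";
  }
}
```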

Monitoring CPU and RAM utilization (available processing and memory
resources) works on all devices. Other than that, only sensors available
in hardware can be checked. See what your hardware supports:

    /system/health/print;

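
If you prefer one line per sensor, for example to see the exact sensor names, something like the following sketch may help. It assumes a RouterOS 7 device where `/system/health` is a list of entries with `name` and `value` properties. On hardware without sensors it prints nothing.

```routeros
# List every sensor the device reports, one per line
:foreach Sensor in=[ /system/health/find ] do={
  :put ([ /system/health/get $Sensor name ] . ": " . [ /system/health/get $Sensor value ]);
}
```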
### Sample notifications
#### CPU utilization
![check-health notification cpu utilization high](check-health.d/notification-01-cpu-utilization-high.avif)
![check-health notification cpu utilization ok](check-health.d/notification-02-cpu-utilization-ok.avif)
#### RAM utilization (low available RAM)
![check-health notification ram utilization high](check-health.d/notification-03-ram-utilization-high.avif)
![check-health notification ram utilization ok](check-health.d/notification-04-ram-utilization-ok.avif)
#### Voltage
![check-health notification voltage](check-health.d/notification-05-voltage.avif)
#### Temperature
![check-health notification temperature high](check-health.d/notification-06-temperature-high.avif)
![check-health notification temperature ok](check-health.d/notification-07-temperature-ok.avif)
#### PSU state
![check-health notification psu fail](check-health.d/notification-08-psu-fail.avif)
![check-health notification psu ok](check-health.d/notification-09-psu-ok.avif)

Requirements and installation
-----------------------------
Just install the script and create a scheduler:

    $ScriptInstallUpdate check-health;
    /system/scheduler/add interval=53s name=check-health on-event="/system/script/run check-health;" start-time=startup;
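
To check the result, you can look up the scheduler entry and run the script once by hand. This is just a quick sanity check using the names from the commands above. A manual run will usually stay silent, as notifications are only sent on health events.

```routeros
# Confirm the scheduler entry exists and see when it runs next
/system/scheduler/print where name="check-health";

# Run the script once manually
/system/script/run check-health;
```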
> **Info**: Running lots of scripts simultaneously can impair the
> precision of CPU utilization measurements, especially on devices with
> limited resources. Thus an unusual interval is used here.

Configuration
-------------
The configuration goes to `global-config-overlay`. These are the parameters:

* `CheckHealthTemperature`: an array specifying temperature thresholds for sensors
* `CheckHealthVoltageLow`: value (in volt*10, so `115` means 11.5 V) giving a hard lower limit
* `CheckHealthVoltagePercent`: percentage change that counts as a voltage jump
> **Info**: Copy relevant configuration from
> [`global-config`](../global-config.rsc) (the one without `-overlay`) to
> your local `global-config-overlay` and modify it to your specific needs.
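
As a rough sketch, an overlay could look like the following. All values are illustrative, and the sensor names and the key/value layout of the temperature array are assumptions. Take the real names from the health output of your device and the exact format from `global-config`.

```routeros
# Illustrative values only; sensor names depend on the hardware
:global CheckHealthTemperature {
  "temperature"=50;
  "cpu-temperature"=70;
};
# 115 means 11.5 V (volt*10)
:global CheckHealthVoltageLow 115;
# Treat a change of more than 10 percent as a voltage jump
:global CheckHealthVoltagePercent 10;
```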

Notification settings are also required for
[e-mail](mod/notification-email.md),
[matrix](mod/notification-matrix.md),
[ntfy](mod/notification-ntfy.md) and/or
[telegram](mod/notification-telegram.md).

---
[⬅️ Go back to main README](../README.md)
[⬆️ Go back to top](#top)