check-health: monitor CPU load

---- ✂️ ----
🧮📈️ Health warning: CPU load

The average CPU load on MikroTik is at 76%!
---- ✂️ ----
🧮📉️ Health recovery: CPU load

The average CPU load on MikroTik decreased to 64%.
---- ✂️ ----
Christian Hesse 2023-01-20 14:24:20 +01:00
parent 2694f8d2b1
commit 75bd14267e
10 changed files with 30 additions and 7 deletions


@@ -10,6 +10,8 @@
 :global GlobalFunctionsReady;
 :while ($GlobalFunctionsReady != true) do={ :delay 500ms; }
+:global CheckHealthCPULoad;
+:global CheckHealthCPULoadNotified;
 :global CheckHealthLast;
 :global CheckHealthTemperature;
 :global CheckHealthTemperatureDeviation;
@@ -43,6 +45,20 @@
 $ScriptLock $0;
+:set CheckHealthCPULoad (($CheckHealthCPULoad * 4 + [ /system/resource/get cpu-load ] * 10) / 5);
+:if ($CheckHealthCPULoad > 750 && $CheckHealthCPULoadNotified != true) do={
+$SendNotification2 ({ origin=$0; \
+subject=([ $SymbolForNotification "abacus,chart-increasing" ] . "Health warning: CPU load"); \
+message=("The average CPU load on " . $Identity . " is at " . ($CheckHealthCPULoad / 10) . "%!") });
+:set CheckHealthCPULoadNotified true;
+}
+:if ($CheckHealthCPULoad < 650 && $CheckHealthCPULoadNotified = true) do={
+$SendNotification2 ({ origin=$0; \
+subject=([ $SymbolForNotification "abacus,chart-decreasing" ] . "Health recovery: CPU load"); \
+message=("The average CPU load on " . $Identity . " decreased to " . ($CheckHealthCPULoad / 10) . "%.") });
+:set CheckHealthCPULoadNotified false;
+}
 :foreach Voltage in=[ /system/health/find where type="V" ] do={
 :local Name [ /system/health/get $Voltage name ];
 :local Value [ /system/health/get $Voltage value ];
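
The CPU load check in the hunk above keeps a smoothed average in tenths of a percent and uses two thresholds (above 750 to warn, below 650 to recover), so a load hovering near the limit does not alternate between warning and recovery. A minimal Python sketch of that arithmetic follows. It assumes the script's integer division truncates, and the names `smooth`, `step`, `WARN_ABOVE` and `RECOVER_BELOW` are hypothetical, not part of the commit:

```python
# Sketch of the smoothing and hysteresis arithmetic from the hunk above.
# Only the constants 4, 10, 5, 750 and 650 come from the diff; all names
# are hypothetical and chosen for illustration.

WARN_ABOVE = 750     # 75.0 %, stored in tenths of a percent
RECOVER_BELOW = 650  # 65.0 %, stored in tenths of a percent


def smooth(average, cpu_load):
    """Blend the old average (tenths of a percent) with a new reading (percent).

    The old average gets weight 4/5, the new reading (scaled by 10) gets 1/5.
    Integer division is assumed to truncate, as for non-negative integers here.
    """
    return (average * 4 + cpu_load * 10) // 5


def step(average, notified, cpu_load):
    """Return (new_average, new_notified, event) for one scheduler run."""
    average = smooth(average, cpu_load)
    event = None
    # Warn once when the average rises above the upper threshold.
    if average > WARN_ABOVE and not notified:
        event = "warning"
        notified = True
    # Report recovery once when the average falls below the lower threshold.
    if average < RECOVER_BELOW and notified:
        event = "recovery"
        notified = False
    return average, notified, event
```

Starting from an average of 0 with a constant reading of 100%, the average climbs through 200, 360, 488, 590, 672, 737 and 789, so in this sketch the warning fires on the seventh run.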

Binary file not shown.

Binary file not shown.


@@ -12,32 +12,38 @@ Description
 This script is run from scheduler periodically, sending notification on
 health related events:
+* high CPU load
 * voltage jumps up or down more than configured threshold or drops below limit
 * power supply failed or recovered
 * temperature is above or below threshold
 Note that bad initial state will not trigger an event.
-Only sensors available in hardware can be checked. See what your
-hardware supports:
+Monitoring CPU load works on all devices. Other than that only sensors
+available in hardware can be checked. See what your hardware supports:
 /system/health/print;
 ### Sample notifications
+#### CPU load
+![check-health notification cpu load high](check-health.d/notification-01-cpu-load-high.avif)
+![check-health notification cpu load ok](check-health.d/notification-02-cpu-load-ok.avif)
 #### Voltage
-![check-health notification voltage](check-health.d/notification-01-voltage.avif)
+![check-health notification voltage](check-health.d/notification-03-voltage.avif)
 #### Temperature
-![check-health notification](check-health.d/notification-02-temperature-high.avif)
-![check-health notification](check-health.d/notification-03-temperature-ok.avif)
+![check-health notification temperature high](check-health.d/notification-04-temperature-high.avif)
+![check-health notification temperature ok](check-health.d/notification-05-temperature-ok.avif)
 #### PSU state
-![check-health notification](check-health.d/notification-04-psu-fail.avif)
-![check-health notification](check-health.d/notification-05-psu-ok.avif)
+![check-health notification psu fail](check-health.d/notification-06-psu-fail.avif)
+![check-health notification psu ok](check-health.d/notification-07-psu-ok.avif)
 Requirements and installation
 -----------------------------


@@ -1075,6 +1075,7 @@
 # return UTF-8 symbol for unicode name
 :set SymbolByUnicodeName do={
 :local Symbols {
+"abacus"="\F0\9F\A7\AE";
 "alarm-clock"="\E2\8F\B0";
 "calendar"="\F0\9F\93\85";
 "chart-decreasing"="\F0\9F\93\89";