pushover_host_alert_rules
Monitor node status with Prometheus host alert rules. Get alerts for unreachable nodes, configure critical alerts, and ensure smooth operation.
Host Alert Rules
These Prometheus alert rules monitor node availability and fire an alert when a node becomes unreachable, so you are notified promptly when a host stops responding.
Alert Rules Configuration
Below is the configuration for Prometheus alert rules to monitor host status. These rules define when and how alerts are triggered based on node availability.
groups:
  - name: host_alert_rules.yml
    rules:
      # Alert for any node that is unreachable for > 1 minute.
      - alert: node_down
        expr: up{job="node-exporter"} == 0
        for: 1m
        labels:
          severity: critical
          environment: env-production
        annotations:
          summary: "Job {{ $labels.job }} is down on {{ $labels.instance }}"
          description: "Failed to scrape {{ $labels.job }} on {{ $labels.instance }} for more than 1 minute. Node might be down."
          impact: "Any metrics from {{ $labels.job }} on {{ $labels.instance }} will be missing"
          action: "Check on {{ $labels.instance }} if {{ $labels.job }} is running"
          dashboard: https://grafana.localdns.xyz
          runbook: https://runbooks.localdns.xyz
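The rule only works if Prometheus loads the rule file and scrapes node-exporter under a job named exactly node-exporter, because the expression filters on job="node-exporter". The prometheus.yml below is a minimal sketch under assumptions not stated in the original: the rules are saved as host_alert_rules.yml next to it, Alertmanager listens at alertmanager:9093, and host1/host2 are placeholder hostnames.

```yaml
# prometheus.yml (sketch; file name, Alertmanager address and hosts are assumptions)
rule_files:
  - host_alert_rules.yml        # the file containing the node_down rule

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  # job_name must match the job="node-exporter" selector in the rule's expr.
  - job_name: node-exporter
    static_configs:
      - targets: ["host1:9100", "host2:9100"]   # 9100 is node-exporter's default port
```

Before reloading Prometheus, `promtool check rules host_alert_rules.yml` and `promtool check config prometheus.yml` catch syntax errors such as the indentation mistakes YAML is sensitive to.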
Understanding the Alert
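The node_down alert fires when the up metric for the node-exporter job has been 0 continuously for 1 minute. Prometheus sets up to 0 whenever a scrape of a target fails, so the alert means either that the host is unreachable or that node-exporter on it is not responding. In both cases it requires immediate attention.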
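One way to check this timing is a promtool unit test. The sketch below assumes the rules are saved as host_alert_rules.yml and uses a hypothetical target host1:9100; it also assumes dashboard and runbook sit under annotations. Because promtool compares labels and annotations exactly, the expected annotations are the rule's templates rendered for that target.

```yaml
# host_alert_rules_test.yml (sketch; file names and target are hypothetical)
rule_files:
  - host_alert_rules.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # The target fails every scrape from t=0 to t=4m.
      - series: 'up{job="node-exporter", instance="host1:9100"}'
        values: '0 0 0 0 0'
    alert_rule_test:
      # At t=0 the condition has only just become true: the alert is pending, not firing.
      - eval_time: 0m
        alertname: node_down
        exp_alerts: []
      # By t=2m the condition has held for at least 1m, so the alert is firing.
      - eval_time: 2m
        alertname: node_down
        exp_alerts:
          - exp_labels:
              severity: critical
              environment: env-production
              job: node-exporter
              instance: "host1:9100"
            exp_annotations:
              summary: "Job node-exporter is down on host1:9100"
              description: "Failed to scrape node-exporter on host1:9100 for more than 1 minute. Node might be down."
              impact: "Any metrics from node-exporter on host1:9100 will be missing"
              action: "Check on host1:9100 if node-exporter is running"
              dashboard: https://grafana.localdns.xyz
              runbook: https://runbooks.localdns.xyz
```

Run it with `promtool test rules host_alert_rules_test.yml`. The alertname label is added by promtool, so it is not listed under exp_labels.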
Explanation of Key Parameters
- expr: up{job="node-exporter"} == 0
  Selects every node-exporter target whose most recent scrape failed (up is 0), which indicates the node is down or unreachable.
- for: 1m
  The alert stays pending until the condition has held continuously for 1 minute, and only then fires. A brief, transient scrape failure therefore does not trigger a notification.
- labels: severity: critical
  Sets the alert's severity label to critical, which Alertmanager can use to route the notification.
- annotations
  Adds human-readable context to the alert, such as a summary, description, impact, and recommended action.
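The severity label has an effect only once Alertmanager routes on it. Since this page is titled after Pushover, here is a hedged sketch of an alertmanager.yml that sends severity="critical" alerts such as node_down to a Pushover receiver. The receiver names, the placeholder user key and token, and the grouping choice are assumptions, not part of the original setup.

```yaml
# alertmanager.yml (sketch; receiver names and credentials are placeholders)
route:
  receiver: default
  # One notification per alert and instance, so each group's annotations are
  # common to all its alerts and can be used in the message below.
  group_by: ['alertname', 'instance']
  routes:
    # Alerts labelled severity="critical" go to Pushover.
    - receiver: pushover-critical
      matchers:
        - severity="critical"

receivers:
  - name: default               # no integrations: notifications here are not delivered
  - name: pushover-critical
    pushover_configs:
      - user_key: "<PUSHOVER_USER_KEY>"
        token: "<PUSHOVER_APP_TOKEN>"
        # Reuse the rule's annotations as the notification text.
        title: '{{ .CommonAnnotations.summary }}'
        message: '{{ .CommonAnnotations.description }}'
```

`amtool check-config alertmanager.yml` validates the file before Alertmanager is reloaded.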
Further Reading
For more information on Prometheus alerting and node monitoring, refer to the following resources: