host_alert_rules
Configure and manage Prometheus host alert rules for effective monitoring. Learn to define alerts for node down, low disk space, and more with clear annotations and labels.
Prometheus Host Alert Rules
This document gives example Prometheus alert rules for monitoring hosts, based on metrics scraped from node_exporter. The rules cover two conditions: a node exporter target that cannot be scraped, and a root filesystem that is running low on free space. Each rule carries labels and annotations that give responders context, such as severity, environment, a dashboard link, and a runbook link, to speed up diagnosis and resolution.
Node Down Alert Rule
This rule fires when Prometheus has failed to scrape the node-exporter job on an instance (up == 0) for 1 minute, which usually means the host or the exporter process is down.
- alert: node_down
  expr: up{job="node-exporter"} == 0
  for: 1m
  labels:
    severity: warning
    environment: prod
    alert_target: "{{ $labels.host }}"
  annotations:
    summary: "Job {{ $labels.job }} is down on {{ $labels.instance }}"
    description: "Failed to scrape {{ $labels.job }} on {{ $labels.instance }} for more than 1 minute. Node might be down."
    impact: "Any metrics from {{ $labels.job }} on {{ $labels.instance }} will be missing"
    action: "Check on {{ $labels.instance }} if {{ $labels.job }} is running"
    dashboard: http://grafana.localdns.xyz/d/pjhLJOzmk/infrastructure-hosts-stats
    runbook: http://wiki.localdns.xyz
    priority: P2
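The rule above is a list item, so Prometheus only loads it when it sits inside a rule group. A minimal sketch of the surrounding file is shown here; the file name host_alert_rules.yml and the group name host_alerts are assumptions for illustration, and the annotations are trimmed to keep the example short.

```yaml
# host_alert_rules.yml (file name is an assumption)
groups:
  - name: host_alerts            # hypothetical group name
    rules:
      - alert: node_down
        expr: up{job="node-exporter"} == 0
        for: 1m                  # condition must hold for 1 minute before firing
        labels:
          severity: warning
          environment: prod
        annotations:
          summary: "Job {{ $labels.job }} is down on {{ $labels.instance }}"
```

A file like this can be syntax-checked with `promtool check rules host_alert_rules.yml` before it is listed under `rule_files` in the Prometheus configuration.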
Low Disk Space Alert Rule
This rule fires when the space available on a host's root mount point (/) stays below 20% of the filesystem size for 1 minute, signaling an impending storage issue.
- alert: debug_instance_hard_disk_low
  expr: (node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"} < 20
  for: 1m
  labels:
    severity: warning
    alert_channel: notifications
    environment: prod
    team: devops
    aws_region: eu-west-1
  annotations:
    title: "[TEST] Disk Space is Low on {{ $labels.instance }}"
    description: "Instance {{ $labels.instance }} has only {{ humanize $value }}% available on mount {{ $labels.mountpoint }}"
    summary: "Low Disk Space Available"
    dashboard: http://grafana.localdns.xyz/d/pjhLJOzmk/infrastructure-hosts-stats
    runbook: http://wiki.localdns.xyz
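The threshold arithmetic in this expression can be exercised with promtool's unit-test format. The following is a sketch under stated assumptions: the rule file name host_alert_rules.yml, the test file name, and the instance host1:9100 are made up for illustration. With 10 bytes available out of 100, the expression evaluates to 10, which is below 20, so the sample is returned.

```yaml
# host_alert_rules_test.yml (file name is an assumption)
rule_files:
  - host_alert_rules.yml         # assumed to contain the rules from this document

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Synthetic data: 10 bytes free on a 100-byte root filesystem.
      - series: 'node_filesystem_avail_bytes{instance="host1:9100", mountpoint="/"}'
        values: '10+0x5'
      - series: 'node_filesystem_size_bytes{instance="host1:9100", mountpoint="/"}'
        values: '100+0x5'
    promql_expr_test:
      - expr: (node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"} < 20
        eval_time: 1m
        exp_samples:
          # Arithmetic drops the metric name; the remaining labels are kept.
          - labels: '{instance="host1:9100", mountpoint="/"}'
            value: 10
```

Such a file would be run with `promtool test rules host_alert_rules_test.yml`. Changing the available value to 20 or more should make the expression return no samples, since the comparison is a strict less-than.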