Prometheus Node Metrics Examples

This page collects Prometheus query examples for Node Exporter, covering system-level metrics for load, CPU, memory, disk, network, and uptime. Replace the instance and job label values in each query with those of your own targets.

System Load Monitoring

System load indicates how many tasks are competing for CPU time, which helps identify performance bottlenecks. The following query expresses the 1-minute load average as a percentage of the number of CPU cores available.

avg(node_load1{instance="my-instance-name",job="node-exporter"}) / count(count(node_cpu_seconds_total{instance="my-instance-name",job="node-exporter"}) by (cpu)) * 100
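The arithmetic behind this query can be sketched in Python. The function name and the sample values are illustrative assumptions, not part of Prometheus or Node Exporter.

```python
def load_percent(load1: float, cpu_cores: int) -> float:
    """Return the 1-minute load average as a percentage of available cores."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return load1 / cpu_cores * 100


# Hypothetical host: load average of 2.0 on 4 cores.
print(load_percent(2.0, 4))  # 50.0
```

A result above 100 suggests that, on average, more tasks were waiting to run than there are cores.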

CPU Utilization Analysis

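Monitor CPU usage to ensure your systems are not overloaded. This query reports non-idle CPU time as a percentage. Because irate uses only the last two samples inside the 5-minute window, the result reflects the most recent scrape interval; substitute rate for an average over the full 5 minutes.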

100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle", instance="my-instance-name"}[5m])) * 100)
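The difference between an instantaneous rate and a windowed average can be illustrated with a simplified Python sketch. It assumes a counter with no resets and ignores the extrapolation Prometheus applies at window boundaries; the helper names and sample data are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase between the first and last sample in the window."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def simple_irate(samples: List[Sample]) -> float:
    """Per-second increase between the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical idle-seconds counter for one core, scraped every 60 s.
# The core is 90% idle at first, then only 10% idle in the last interval.
idle = [(0, 0.0), (60, 54.0), (120, 108.0), (180, 162.0), (240, 216.0), (300, 222.0)]

print(100 - simple_rate(idle) * 100)   # about 26: utilization averaged over 5 minutes
print(100 - simple_irate(idle) * 100)  # about 90: utilization in the last interval
```

With this data the windowed average hides the recent spike that the two-sample calculation exposes, which is why the choice of function changes what the graph shows.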

Memory Usage Insights

Effective memory management is key to system stability. Here are queries to track available memory and identify memory pressure.

Memory Available in Percentage:

node_memory_MemAvailable_bytes{instance="my-instance-name"} / node_memory_MemTotal_bytes{instance="my-instance-name"} * 100
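As a worked example of the same ratio, the following Python sketch uses made-up byte counts; the function name is an assumption for illustration only.

```python
def memory_available_percent(mem_available_bytes: float, mem_total_bytes: float) -> float:
    """Share of total memory that is still available, in percent."""
    if mem_total_bytes <= 0:
        raise ValueError("mem_total_bytes must be positive")
    return mem_available_bytes / mem_total_bytes * 100


GIB = 1024 ** 3

# Hypothetical host: 4 GiB available out of 16 GiB total.
available = memory_available_percent(4 * GIB, 16 * GIB)
print(available)        # 25.0
print(100 - available)  # 75.0, the share in use
```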

Memory Pressure (Major Page Faults):

rate(node_vmstat_pgmajfault{instance="my-instance-name"}[1m])

Disk Space and Performance

Keep an eye on disk capacity and performance to prevent I/O issues and storage exhaustion.

Disk Space Available in Bytes:

node_filesystem_avail_bytes{instance=~"my-ec2-instance",job=~"node-exporter",mountpoint="/"}

Disk Space Available in Percentage:

(node_filesystem_avail_bytes{mountpoint="/", instance=~"my-ec2-instance"} * 100) / node_filesystem_size_bytes{mountpoint="/", instance=~"my-ec2-instance"}

Disk Latencies (Read/Write) in Seconds per Operation:

rate(node_disk_read_time_seconds_total{instance="my-instance-name"}[1m]) / rate(node_disk_reads_completed_total{instance="my-instance-name"}[1m])
rate(node_disk_write_time_seconds_total{instance="my-instance-name"}[1m]) / rate(node_disk_writes_completed_total{instance="my-instance-name"}[1m])
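Dividing one rate by another over the same window reduces to dividing the two counter increases, which yields the average time per operation. A minimal Python sketch of that arithmetic follows; the function name and numbers are hypothetical.

```python
import math


def average_latency_seconds(time_delta_seconds: float, ops_delta: float) -> float:
    """Average seconds per completed operation over an interval.

    Returns NaN when no operations completed, mirroring the 0/0
    result PromQL produces for an idle disk.
    """
    if ops_delta == 0:
        return math.nan
    return time_delta_seconds / ops_delta


# Hypothetical interval: 1.5 s spent on reads, 300 reads completed.
latency = average_latency_seconds(1.5, 300)
print(latency * 1000)  # about 5 ms per read

# An idle disk has no completed operations, so the latency is undefined.
print(average_latency_seconds(0.0, 0))  # nan
```

Because idle disks produce NaN, gaps in a latency graph do not necessarily indicate missing data.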

Network Throughput Monitoring

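Track network traffic to understand bandwidth usage and identify potential network congestion. The queries below report receive and transmit throughput in bits per second by multiplying the per-second byte rate by 8.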

irate(node_network_receive_bytes_total{instance="my-instance-name"}[5m]) * 8
irate(node_network_transmit_bytes_total{instance="my-instance-name"}[5m]) * 8
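The unit conversion can be checked with a short Python sketch. The helper names and the sample rate are assumptions for illustration, and megabits are taken as decimal (1 Mbit = 1,000,000 bits).

```python
def bits_per_second(bytes_per_second: float) -> float:
    """Convert a byte rate to a bit rate."""
    return bytes_per_second * 8


def megabits_per_second(bytes_per_second: float) -> float:
    """Convert a byte rate to decimal megabits per second."""
    return bits_per_second(bytes_per_second) / 1_000_000


# Hypothetical interface receiving 12.5 MB/s.
print(megabits_per_second(12_500_000))  # 100.0
```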

Node Uptime Calculation

Verify system availability by calculating the node's uptime in seconds, as the difference between the current system time and the boot time.

node_time_seconds{instance="my-ec2-instance",job="node-exporter"} - node_boot_time_seconds{instance="my-ec2-instance",job="node-exporter"}
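Since the result is a raw number of seconds, a small formatting helper can make it easier to read. The Python sketch below is one possible approach; the function name and output format are illustrative choices.

```python
from datetime import timedelta


def format_uptime(uptime_seconds: float) -> str:
    """Render an uptime in seconds as days, hours, minutes, and seconds."""
    delta = timedelta(seconds=int(uptime_seconds))
    hours, remainder = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{delta.days}d {hours}h {minutes}m {seconds}s"


# Hypothetical query result: 200,000 seconds since boot.
print(format_uptime(200_000))  # 2d 7h 33m 20s
```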

These Prometheus queries, when used with Node Exporter, provide a robust foundation for monitoring your server infrastructure. For more advanced visualizations and alerting, consider integrating these metrics into Grafana dashboards.
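Any of the queries on this page can also be run programmatically through the Prometheus HTTP API instant-query endpoint (/api/v1/query). The Python sketch below assumes a server at http://localhost:9090 and a query that returns an instant vector; the helper names and the sample response are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build the instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def extract_values(response_body: str) -> dict:
    """Map each instance label to its value in an instant-vector response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return {
        item["metric"].get("instance", ""): float(item["value"][1])
        for item in payload["data"]["result"]
    }


def run_query(base_url: str, promql: str) -> dict:
    """Send the query to a Prometheus server (assumed to be reachable)."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return extract_values(resp.read().decode("utf-8"))


# Parsing a made-up response, so no server is needed to try the helpers.
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "my-instance-name"}, "value": [1700000000, "86400"]}
        ],
    },
})
print(extract_values(sample))  # {'my-instance-name': 86400.0}
```

Against a live server, run_query("http://localhost:9090", "node_load1") would return one value per instance, assuming that address matches your setup.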
