KUBERNETES
Monitor Kubernetes cluster health with Prometheus metrics. Explore container and node metrics for resource usage, performance, and potential issues.
Kubernetes Prometheus Metrics
Container Metrics
Pod Resource Constraints
This section details metrics related to pod memory usage and limits, specifically when resource requests and limits are configured. Understanding these metrics is crucial for efficient resource allocation and preventing OOMKilled events.
process_resident_memory_bytes{pod="prometheus-prometheus-kube-prometheus-prometheus-0"}
kube_pod_container_resource_requests{resource="memory", pod="prometheus-prometheus-kube-prometheus-prometheus-0"}
kube_pod_container_resource_limits{resource="memory", pod="prometheus-prometheus-kube-prometheus-prometheus-0"}
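To judge how close a container is to an OOMKill in one query, the kubelet/cAdvisor working-set metric can be divided by the configured limit. This is a rough sketch assuming both cAdvisor metrics and kube-state-metrics are scraped and that the namespace, pod, and container labels line up; the 0.9 threshold is arbitrary, and only containers that actually define a memory limit will produce a result.
# Fraction of the memory limit currently in use; values approaching 1 risk an OOMKill
sum by(namespace, pod, container) (container_memory_working_set_bytes{container!=""}) / sum by(namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"}) > 0.9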
High CPU Throttling
Alerts when a container experiences significant CPU throttling. This indicates that the container is hitting its CPU limit (CFS quota) and is not receiving enough CPU time to meet its demands, potentially impacting application performance.
sum by(container, pod, namespace) (increase(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) / sum by(container, pod, namespace) (increase(container_cpu_cfs_periods_total[5m])) > (25 / 100)
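Rather than alerting on every throttled container, the same counters can be ranked to surface the worst offenders first; this is just an exploratory sketch:
# Top 10 containers by share of throttled CFS periods over the last 5 minutes
topk(10, sum by(container, pod, namespace) (increase(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) / sum by(container, pod, namespace) (increase(container_cpu_cfs_periods_total[5m])))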
Kube API Server Down
Detects when the Kubernetes API server is no longer discoverable by Prometheus, indicating a critical failure in the control plane.
absent(up{job="apiserver"} == 1)
Kubelet Down
Alerts if the Kubelet, responsible for managing pods on a node, disappears from Prometheus target discovery, signaling a potential node issue.
absent(up{job="kubelet"} == 1)
High Kube API Errors
Monitors the rate of HTTP 5xx errors returned by the Kubernetes API server, indicating potential instability or overload.
sum by(resource, subresource, verb) (rate(apiserver_request_total{code=~"5..",job="apiserver"}[5m])) / sum by(resource, subresource, verb) (rate(apiserver_request_total{job="apiserver"}[5m])) > 0.1
High Kube API Latency
Identifies abnormal latency in API server requests, helping to pinpoint performance bottlenecks within the Kubernetes control plane.
(cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} > on(verb) group_left() (avg by(verb) (cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} >= 0) + 2 * stddev by(verb) (cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} >= 0))) > on(verb) group_left() 1.2 * avg by(verb) (cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} >= 0) and on(verb, resource) cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job="apiserver",quantile="0.99"} > 1
Container in Waiting State
Alerts when a pod's container has been in a waiting state for an extended period (e.g., over an hour, typically enforced through the alerting rule's for: clause as sketched below), which could indicate issues such as image pull failures (ImagePullBackOff), crash loops (CrashLoopBackOff), or container configuration errors.
sum by(namespace, pod, container) (kube_pod_container_status_waiting_reason{job="kube-state-metrics",namespace=~".*"}) > 0
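The expression fires as soon as any waiting reason is reported; the "over an hour" part is normally expressed through the alerting rule's for: clause. A minimal rule sketch (group name, alert name, and severity label are illustrative):
groups:
  - name: kubernetes-apps
    rules:
      - alert: KubeContainerWaiting
        expr: sum by(namespace, pod, container) (kube_pod_container_status_waiting_reason{job="kube-state-metrics"}) > 0
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: 'Container {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) has been in a waiting state for longer than 1 hour.'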
Deployment Replicas Mismatch
Notifies when a Kubernetes Deployment has a mismatch between the desired and available replicas for more than 15 minutes, suggesting a problem with scaling or rollout.
(kube_deployment_spec_replicas{job="kube-state-metrics",namespace=~".*"} != kube_deployment_status_replicas_available{job="kube-state-metrics",namespace=~".*"}) and (changes(kube_deployment_status_replicas_updated{job="kube-state-metrics",namespace=~".*"}[5m]) == 0)
StatefulSet Replicas Mismatch
Similar to Deployments, this alerts on discrepancies between desired and ready replicas for StatefulSets, crucial for stateful applications.
(kube_statefulset_status_replicas_ready{job="kube-state-metrics",namespace=~".*"} != kube_statefulset_status_replicas{job="kube-state-metrics",namespace=~".*"}) and (changes(kube_statefulset_status_replicas_updated{job="kube-state-metrics",namespace=~".*"}[5m]) == 0)
Persistent Volume Filling Up
Warns when a Persistent Volume (PV) has less than 3% of its capacity left as free space. This helps prevent data loss due to full storage.
kubelet_volume_stats_available_bytes{job="kubelet",metrics_path="/metrics",namespace=~".*"} / kubelet_volume_stats_capacity_bytes{job="kubelet",metrics_path="/metrics",namespace=~".*"} < 0.03
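A gentler companion query can warn before the hard 3% threshold by extrapolating the current growth, in the same spirit as the node filesystem prediction further below; this sketch assumes the volume metrics have several hours of history for predict_linear to work with:
# Volume below 15% free and, at the current rate, predicted to be full within 4 days
kubelet_volume_stats_available_bytes{job="kubelet"} / kubelet_volume_stats_capacity_bytes{job="kubelet"} < 0.15 and predict_linear(kubelet_volume_stats_available_bytes{job="kubelet"}[6h], 4 * 24 * 3600) < 0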
Low Persistent Volume Free Space
A more general alert for Persistent Volumes that are over 80% utilized, prompting proactive storage management.
sum by (persistentvolumeclaim) (kubelet_volume_stats_used_bytes{job="kubelet"} / kubelet_volume_stats_capacity_bytes) * 100.0 > 80
Persistent Volume Errors
Detects Persistent Volumes in an erroneous state, such as 'Failed' or 'Pending', which requires immediate attention.
kube_persistentvolume_status_phase{job="kube-state-metrics",phase=~"Failed|Pending"} > 0
Pod Crashing Loop
Identifies pods whose containers are restarting repeatedly, indicating an application crash or misconfiguration. The expression takes the per-second restart rate over the last 15 minutes and scales it to restarts per 5 minutes, so any value above zero signals an active crash loop.
rate(kube_pod_container_status_restarts_total{job="kube-state-metrics",namespace=~".*"}[15m]) * 60 * 5 > 0
Customizable message for pod restart alerts:
Pod {{ $labels.namespace }} / {{ $labels.pod }} ({{ $labels.container }}) is restarting {{ printf "%.2f" $value }} times / 5 minutes.
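Putting the expression and the message together, a complete alerting rule could look like the following sketch (rule names, the 15-minute for: duration, and the severity label are illustrative choices):
groups:
  - name: kubernetes-pods
    rules:
      - alert: KubePodCrashLooping
        expr: rate(kube_pod_container_status_restarts_total{job="kube-state-metrics"}[15m]) * 60 * 5 > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          description: 'Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is restarting {{ printf "%.2f" $value }} times / 5 minutes.'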
Pod Not Running
Alerts when a pod is not in a ready state, which could mean it's failing to start, crashing, or experiencing other issues.
sum by (pod)(kube_pod_status_ready{condition="true"} == 0)
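Note that the readiness-based query also matches pods that are legitimately not ready, such as completed Job pods. A phase-based variant, again a hedged sketch relying on kube-state-metrics, focuses on pods stuck outside the Running and Succeeded phases:
# Pods stuck in Pending, Unknown, or Failed phase
sum by(namespace, pod) (kube_pod_status_phase{job="kube-state-metrics",phase=~"Pending|Unknown|Failed"}) > 0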
Total Restarts for Container
Tracks the total number of container restarts over a specified period (e.g., 1 hour), useful for diagnosing intermittent issues.
increase(kube_pod_container_status_restarts_total[1h])
# Example with namespace and pod filtering:
# increase(kube_pod_container_status_restarts_total{namespace="my-namespace", pod=~".*prefix.*"}[1h])
OOMKilled Reason for Termination
Specifically detects containers that have been terminated due to Out Of Memory (OOM) errors, a common cause of application failures.
kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}
# Alternative metric:
# container_oom_events_total{name="container-name"}
# Example with namespace filtering:
# kube_pod_container_status_last_terminated_reason{reason="OOMKilled",namespace="my-namespace"}
Fewer Replicas Than Desired
Monitors Kubernetes Deployments by computing the fraction of desired replicas that are currently available (a value below 1 means the Deployment is degraded), which is crucial for high availability.
kube_deployment_status_replicas_available{namespace="my-namespace"} / kube_deployment_spec_replicas{namespace="my-namespace"}
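For alerting, a direct comparison is often easier to reason about than the ratio; a sketch (the namespace value is a placeholder, as above):
# Fires while a Deployment has fewer available replicas than it requested
kube_deployment_status_replicas_available{namespace="my-namespace"} < kube_deployment_spec_replicas{namespace="my-namespace"}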
Prometheus Down
Alerts if the Prometheus server's own scrape target disappears from target discovery, indicating a critical failure of the monitoring system itself.
absent(up{job="prometheus-operator-prometheus",namespace="monitoring"} == 1)
Node Metrics
Node Filesystem Space Filling Up
Warns when a node's filesystem is filling up rapidly, predicting potential space exhaustion within the next 24 hours. This is critical for preventing node instability.
(node_filesystem_avail_bytes{fstype!="",job="node-exporter"} / node_filesystem_size_bytes{fstype!="",job="node-exporter"} * 100 < 40 and predict_linear(node_filesystem_avail_bytes{fstype!="",job="node-exporter"}[6h], 24 * 60 * 60) < 0 and node_filesystem_readonly{fstype!="",job="node-exporter"} == 0)
Node Filesystem Almost Out of Space
Alerts when a node's filesystem has less than 5% of available space remaining, requiring immediate attention to free up disk space.
(node_filesystem_avail_bytes{fstype!="",job="node-exporter"} / node_filesystem_size_bytes{fstype!="",job="node-exporter"} * 100 < 5 and node_filesystem_readonly{fstype!="",job="node-exporter"} == 0)
High Node CPU Usage
Monitors the overall CPU utilization on a node, alerting when it exceeds 80%. High CPU usage can lead to performance degradation across all pods on that node.
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle", instance=~"(.*)"}[5m])) * 100) * on(instance) group_left(nodename) node_uname_info{} > 80
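The multiplication by node_uname_info only joins in a human-readable nodename label, and the catch-all instance matcher looks like a dashboard-variable placeholder; a stripped-down form of the same calculation, assuming the standard node-exporter job label, is:
# Percentage of non-idle CPU per instance over the last 5 minutes
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle",job="node-exporter"}[5m])) * 100) > 80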
High Node Memory Usage
Tracks the memory utilization on a node, alerting when it exceeds 80%. High memory usage can lead to swapping and reduced performance.
100 * (1 - ((avg_over_time(node_memory_MemFree_bytes[10m]) + avg_over_time(node_memory_Cached_bytes[10m]) + avg_over_time(node_memory_Buffers_bytes[10m])) / avg_over_time(node_memory_MemTotal_bytes[10m]))) * on(instance) group_left(nodename) node_uname_info{} > 80
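On any reasonably recent Linux kernel, node-exporter also reports node_memory_MemAvailable_bytes, the kernel's own estimate of reclaimable memory, which is usually simpler and more accurate than summing free, cached, and buffer memory; a sketch:
# Percentage of memory in use, based on the kernel's MemAvailable estimate
100 * (1 - (node_memory_MemAvailable_bytes{job="node-exporter"} / node_memory_MemTotal_bytes{job="node-exporter"})) > 80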