KUBERNETES

Monitor Kubernetes cluster health with Prometheus metrics. Explore container and node metrics for resource usage, performance, and potential issues.

Kubernetes Prometheus Metrics

Container Metrics

Pod Resource Constraints

This section covers metrics for pod memory usage alongside the resource requests and limits configured for the container. Understanding these metrics is crucial for efficient resource allocation and for preventing OOMKilled events.

process_resident_memory_bytes{pod="prometheus-prometheus-kube-prometheus-prometheus-0"}
kube_pod_container_resource_requests{resource="memory", pod="prometheus-prometheus-kube-prometheus-prometheus-0"}
kube_pod_container_resource_limits{resource="memory", pod="prometheus-prometheus-kube-prometheus-prometheus-0"}
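
As a rough illustration (assuming both cAdvisor and kube-state-metrics are scraped), the working-set memory of each container can be compared against its configured limit to see how close it is to being OOMKilled:

# Illustrative: working-set memory as a percentage of the configured memory limit
sum by(namespace, pod, container) (container_memory_working_set_bytes{container!=""}) / sum by(namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"}) * 100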

High CPU Throttling

Alerts when a container experiences significant CPU throttling. This indicates that the container is not receiving enough CPU resources to meet its demands, potentially impacting application performance.

sum by(container, pod, namespace) (increase(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) / sum by(container, pod, namespace) (increase(container_cpu_cfs_periods_total[5m])) > (25 / 100)
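
For investigation, a topk variant of the same expression (an illustrative sketch, not part of the alert itself) surfaces the most heavily throttled containers:

# Illustrative: top 5 containers by CPU throttling ratio over the last 5 minutes
topk(5, sum by(container, pod, namespace) (increase(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) / sum by(container, pod, namespace) (increase(container_cpu_cfs_periods_total[5m])))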

Kube API Server Down

Detects when the Kubernetes API server is no longer discoverable by Prometheus, indicating a critical failure in the control plane.

absent(up{job="apiserver"} == 1)

Kubelet Down

Alerts if the Kubelet, responsible for managing pods on a node, disappears from Prometheus target discovery, signaling a potential node issue.

absent(up{job="kubelet"} == 1)

High Kube API Errors

Monitors the rate of HTTP 5xx errors returned by the Kubernetes API server, indicating potential instability or overload.

sum by(resource, subresource, verb) (rate(apiserver_request_total{code=~"5..",job="apiserver"}[5m])) / sum by(resource, subresource, verb) (rate(apiserver_request_total{job="apiserver"}[5m])) > 0.1
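
A simpler cluster-wide ratio (illustrative, without the per-resource breakdown) shows the overall share of 5xx responses:

# Illustrative: overall API server 5xx error ratio
sum(rate(apiserver_request_total{code=~"5..",job="apiserver"}[5m])) / sum(rate(apiserver_request_total{job="apiserver"}[5m]))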

High Kube API Latency

Identifies abnormal latency in API server requests, helping to pinpoint performance bottlenecks within the Kubernetes control plane.

(cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} > on(verb) group_left() (avg by(verb) (cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} >= 0) + 2 * stddev by(verb) (cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} >= 0))) > on(verb) group_left() 1.2 * avg by(verb) (cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} >= 0) and on(verb, resource) cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job="apiserver",quantile="0.99"} > 1
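
The expression above depends on kube-prometheus recording rules; where only the raw histogram is scraped, an illustrative alternative checks the 99th-percentile latency per verb directly:

# Illustrative: p99 request latency by verb from the raw histogram (long-running WATCH/CONNECT excluded)
histogram_quantile(0.99, sum by(le, verb) (rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb!~"WATCH|CONNECT"}[5m]))) > 1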

Container in Waiting State

Alerts when a pod's container has been stuck in a waiting state for an extended period (e.g., over an hour, enforced via the alerting rule's for duration), which could indicate issues such as image pull failures, crash loops, or container configuration errors.

sum by(namespace, pod, container) (kube_pod_container_status_waiting_reason{job="kube-state-metrics",namespace=~".*"}) > 0
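
To focus on a specific cause, the reason label can be filtered; ImagePullBackOff below is just one example value:

# Illustrative: only containers stuck pulling their image
sum by(namespace, pod, container) (kube_pod_container_status_waiting_reason{job="kube-state-metrics",reason="ImagePullBackOff"}) > 0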

Deployment Replicas Mismatch

Notifies when a Kubernetes Deployment has had a mismatch between the desired and available replicas for more than 15 minutes (the duration is enforced by the alerting rule's for clause), suggesting a problem with scaling or a stuck rollout.

(kube_deployment_spec_replicas{job="kube-state-metrics",namespace=~".*"} != kube_deployment_status_replicas_available{job="kube-state-metrics",namespace=~".*"}) and (changes(kube_deployment_status_replicas_updated{job="kube-state-metrics",namespace=~".*"} [5m]) == 0)

StatefulSet Replicas Mismatch

Similar to Deployments, this alerts on discrepancies between desired and ready replicas for StatefulSets, crucial for stateful applications.

(kube_statefulset_status_replicas_ready{job="kube-state-metrics",namespace=~".*"} != kube_statefulset_status_replicas{job="kube-state-metrics",namespace=~".*"}) and (changes(kube_statefulset_status_replicas_updated{job="kube-state-metrics",namespace=~".*"} [5m]) == 0)

Persistent Volume Filling Up

Warns when a volume backing a PersistentVolumeClaim has less than 3% of its capacity available, which helps prevent data loss and outages due to full storage.

kubelet_volume_stats_available_bytes{job="kubelet",metrics_path="/metrics",namespace=~".*"} / kubelet_volume_stats_capacity_bytes{job="kubelet",metrics_path="/metrics",namespace=~".*"} < 0.03
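
A gentler, trend-based variant (illustrative, mirroring the common warning-level rule) fires earlier by projecting the usage trend of the last 6 hours 4 days ahead:

# Illustrative: less than 15% available and predicted to fill up within 4 days
kubelet_volume_stats_available_bytes{job="kubelet",metrics_path="/metrics"} / kubelet_volume_stats_capacity_bytes{job="kubelet",metrics_path="/metrics"} < 0.15 and predict_linear(kubelet_volume_stats_available_bytes{job="kubelet",metrics_path="/metrics"}[6h], 4 * 24 * 3600) < 0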

Low Persistent Volume Percentage

A more general alert for Persistent Volumes that are over 80% utilized, prompting proactive storage management.

sum by (persistentvolumeclaim) (kubelet_volume_stats_used_bytes{job="kubelet"} / kubelet_volume_stats_capacity_bytes{job="kubelet"}) * 100 > 80

Persistent Volume Errors

Detects Persistent Volumes in an erroneous state, such as 'Failed' or 'Pending', which requires immediate attention.

kube_persistentvolume_status_phase{job="kube-state-metrics",phase=~"Failed|Pending"} > 0
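
A related check (illustrative) catches PersistentVolumeClaims stuck in the Pending phase:

# Illustrative: PVCs that have not been bound to a volume
kube_persistentvolumeclaim_status_phase{job="kube-state-metrics",phase="Pending"} > 0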

Pod Crashing Loop

Identifies pods whose containers are restarting repeatedly; the expression takes the restart rate over the last 15 minutes and rescales it to restarts per 5 minutes, indicating a crash loop or misconfiguration.

rate(kube_pod_container_status_restarts_total{job="kube-state-metrics",namespace=~".*"}[15m]) * 60 * 5 > 0

Customizable message for pod restart alerts:

Pod {{ $labels.namespace }} / {{ $labels.pod }} ({{ $labels.container }}) is restarting {{ printf "%.2f" $value }} times / 5 minutes.

Pod Not Running

Alerts when a pod is not in a ready state, which could mean it's failing to start, crashing, or experiencing other issues.

sum by (pod)(kube_pod_status_ready{condition="true"} == 0)
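
The same check can be scoped to a namespace, following the placeholder convention used elsewhere on this page:

# Example with namespace filtering (my-namespace is a placeholder):
# sum by (namespace, pod) (kube_pod_status_ready{condition="true", namespace="my-namespace"} == 0)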

Total Restarts for Container

Tracks the total number of container restarts over a specified period (e.g., 1 hour), useful for diagnosing intermittent issues.

increase(kube_pod_container_status_restarts_total[1h])
# Example with namespace and pod filtering:
# increase(kube_pod_container_status_restarts_total{namespace="my-namespace", pod=~".*prefix.*"}[1h])

OOMKilled Reason for Termination

Specifically detects containers that have been terminated due to Out Of Memory (OOM) errors, a common cause of application failures.

kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}
# Alternative metric:
# container_oom_events_total{name="container-name"}
# Example with namespace filtering:
# kube_pod_container_status_last_terminated_reason{reason="OOMKilled",namespace="my-namespace"}

Fewer Replicas Than Desired

Monitors Kubernetes Deployments by expressing the number of available replicas as a fraction of the desired count, a key signal for high availability.

kube_deployment_status_replicas_available{namespace="my-namespace"} / kube_deployment_spec_replicas{namespace="my-namespace"}
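
To turn this ratio into an alert condition (an illustrative sketch), compare it against 1; values below 1 mean fewer replicas are available than desired:

# Illustrative: fires while the Deployment is under-replicated
kube_deployment_status_replicas_available{namespace="my-namespace"} / kube_deployment_spec_replicas{namespace="my-namespace"} < 1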

Prometheus Down

Alerts if the Prometheus server's own scrape target disappears from discovery, indicating a critical failure in the monitoring stack itself.

absent(up{job="prometheus-operator-prometheus",namespace="monitoring"} == 1)

Node Metrics

Node Filesystem Space Filling Up

Warns when a writable node filesystem has less than 40% of its space available and, based on the trend over the last 6 hours, is predicted to run out of space within the next 24 hours. This is critical for preventing node instability.

(node_filesystem_avail_bytes{fstype!="",job="node-exporter"} / node_filesystem_size_bytes{fstype!="",job="node-exporter"} * 100 < 40 and predict_linear(node_filesystem_avail_bytes{fstype!="",job="node-exporter"}[6h], 24 * 60 * 60) < 0 and node_filesystem_readonly{fstype!="",job="node-exporter"} == 0)

Node Filesystem Almost Out of Space

Alerts when a node's filesystem has less than 5% of available space remaining, requiring immediate attention to free up disk space.

(node_filesystem_avail_bytes{fstype!="",job="node-exporter"} / node_filesystem_size_bytes{fstype!="",job="node-exporter"} * 100 < 5 and node_filesystem_readonly{fstype!="",job="node-exporter"} == 0)

High Node CPU Usage

Monitors the overall CPU utilization on a node, alerting when it exceeds 80%. High CPU usage can lead to performance degradation across all pods on that node.

100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle", instance=~"(.*)"}[5m])) * 100) * on(instance) group_left(nodename) node_uname_info{} > 80

High Node Memory Usage

Tracks the memory utilization on a node, alerting when it exceeds 80%. High memory usage can lead to swapping and reduced performance.

100 * (1 - ((avg_over_time(node_memory_MemFree_bytes[10m]) + avg_over_time(node_memory_Cached_bytes[10m]) + avg_over_time(node_memory_Buffers_bytes[10m])) / avg_over_time(node_memory_MemTotal_bytes[10m]))) * on(instance) group_left(nodename) node_uname_info{} > 80
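
On nodes whose kernel exposes MemAvailable, an illustrative simpler variant is:

# Illustrative: memory usage based on MemAvailable
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * on(instance) group_left(nodename) node_uname_info{} > 80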