Explore Prometheus alert examples for monitoring HTTP requests, including 5xx error rates, total errors, and request latencies. Learn to track service performance and identify issues.
Prometheus Alert Examples
HTTP Request Monitoring Alerts
This section provides example PromQL queries that can serve as the basis for Prometheus alerts on HTTP requests. Effective monitoring of HTTP traffic is crucial for understanding application performance, identifying errors, and ensuring a smooth user experience. These examples focus on two key signals: error rates and request latencies.
5xx Error Rate Alerts
Alerts for 5xx errors are vital for detecting server-side issues that prevent requests from being fulfilled. The following PromQL query calculates the per-second rate of 5xx responses, averaged over the last minute and aggregated by service, pod, and URI. This helps pinpoint the specific components experiencing failures.
sum(rate(http_server_requests_seconds_count{status=~"5[0-9][0-9]"}[1m])) by (service, pod, uri)
To estimate how many 5xx responses occurred in the last minute, multiply the per-second rate by 60 to get an approximate count per minute for each group:
sum(rate(http_server_requests_seconds_count{status=~"5[0-9][0-9]"}[1m])) by (service, pod, uri) * 60
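The queries above return raw values; an alert also needs a threshold and a duration. As a minimal sketch, the rule below fires when the share of 5xx responses for a service stays above a limit. The group name, alert name, 5% threshold, 5-minute window, `for` duration, and severity label are all illustrative assumptions, not values from this document, and should be tuned to your traffic.

```yaml
groups:
  - name: http-5xx-example            # hypothetical group name
    rules:
      - alert: HighHttp5xxErrorRatio  # hypothetical alert name
        # Ratio of 5xx responses to all responses, per service.
        # Both sides are aggregated to the same label set (service) so they match one-to-one.
        expr: |
          sum by (service) (rate(http_server_requests_seconds_count{status=~"5[0-9][0-9]"}[5m]))
            /
          sum by (service) (rate(http_server_requests_seconds_count[5m]))
            > 0.05
        for: 5m                        # assumed: condition must hold for 5 minutes before firing
        labels:
          severity: warning            # assumed label; align with your Alertmanager routing
        annotations:
          summary: "High 5xx error ratio on {{ $labels.service }}"
          description: "{{ $value | humanizePercentage }} of requests returned 5xx over the last 5 minutes."
```

Alerting on a ratio rather than an absolute rate keeps the threshold meaningful for both low-traffic and high-traffic services. A longer window than the 1-minute range used in the queries tends to reduce flapping.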
Request Latency Monitoring
Monitoring request latency is essential for assessing the responsiveness of your services. The following query calculates the average response time over the last minute for HTTP requests with any status from 2xx to 5xx, grouped by service and URI. Each rate is summed within its group before dividing: dividing first and summing afterwards would add together the separate averages of every pod and status code instead of producing one average per group. This helps identify performance bottlenecks.
sum by (service, uri) (
  rate(http_server_requests_seconds_sum{status=~"[2-5][0-9][0-9]", service=~".*"}[1m])
)
/
sum by (service, uri) (
  rate(http_server_requests_seconds_count{status=~"[2-5][0-9][0-9]", service=~".*"}[1m])
)
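An average-latency expression of this kind can be turned into an alerting rule by comparing it against a threshold. The sketch below divides the summed rate of the `_sum` counter by the summed rate of the `_count` counter for each service and URI. The group name, alert name, 0.5-second threshold, 5-minute window, `for` duration, and severity label are illustrative assumptions rather than recommendations from this document.

```yaml
groups:
  - name: http-latency-example         # hypothetical group name
    rules:
      - alert: HighHttpAverageLatency  # hypothetical alert name
        # Average response time in seconds per (service, uri):
        # total time spent per second divided by requests per second.
        expr: |
          sum by (service, uri) (rate(http_server_requests_seconds_sum[5m]))
            /
          sum by (service, uri) (rate(http_server_requests_seconds_count[5m]))
            > 0.5
        for: 10m                        # assumed: sustained slowness, not a single spike
        labels:
          severity: warning             # assumed label; align with your Alertmanager routing
        annotations:
          summary: "Slow responses on {{ $labels.service }} {{ $labels.uri }}"
          description: "Average response time is {{ $value | humanizeDuration }} over the last 5 minutes."
```

Groups that received no requests in the window produce NaN from the division, which never satisfies the comparison, so idle endpoints do not fire this alert. Averages can hide slow outliers; if your application exports histogram buckets, a percentile-based rule may be a better fit.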
To get the total seconds spent on requests within the last minute, multiply the per-second rate of the `_sum` counter by 60. This is equivalent to multiplying the average response time by the number of requests in that minute, because the request count cancels out:
sum by (service, uri) (
  rate(http_server_requests_seconds_sum{status=~"[2-5][0-9][0-9]", service=~".*"}[1m])
) * 60