
Loki Rules

Prometheus Alerting Rules for Loki

This section details an example alerting rule for Loki, written in the standard Prometheus rule format but using a LogQL expression. Such rules let you monitor and alert on specific log patterns or conditions within your logging infrastructure, and effective alerting is crucial for maintaining the health and performance of your systems.

HighThroughputLogStreams Alert

The following rule, HighThroughputLogStreams, is designed to detect a high volume of logging across streams within a specified time window. Such a spike can indicate excessive logging, a potential denial-of-service attack, or a misbehaving application.

groups:
  - name: example
    rules:
    - alert: HighThroughputLogStreams
      expr: sum by (container_name) (count_over_time({container_name=~".+"} | regexp `(?P<msg>.*)` [1h]) > 0)
      for: 20s
      labels:
        severity: "2"
      annotations:
        description: '{{ $labels.container_name }} has produced a high volume of log entries in the last hour.'

Understanding the Alert Expression

The expr field defines the condition that triggers the alert. Here, count_over_time counts the log entries of each matching stream over the last hour, the > 0 comparison keeps only streams that actually logged something, and sum by (container_name) aggregates those counts per container. Because there is no outer threshold, any container with at least one log entry satisfies the condition; once it has persisted for 20 seconds (for: 20s), the alert fires. The regexp stage is only a placeholder parser that captures the whole line into a msg label and should be replaced with a pattern relevant to your logs.
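
As a rough sketch of how this expression can be tightened, the variant below counts only lines containing "error", per container, over a shorter window, and adds an explicit threshold. The "error" line filter, the 5-minute window, and the threshold of 100 are illustrative assumptions, not values taken from the rule above.

# Hypothetical, more specific variant of the expression (placeholder values):
# count lines containing "error" per container over 5 minutes and require
# more than 100 of them before the condition becomes true.
expr: sum by (container_name) (count_over_time({container_name=~".+"} |= "error" [5m])) > 100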

Alert Labels and Annotations

labels attach metadata to the alert, such as its severity, and are commonly used for routing and grouping. annotations provide more detailed, human-oriented information, such as a descriptive message that can interpolate labels from the alert's result (here, container_name) and its evaluated value, helping operators quickly understand the context of the alert.
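
For illustration, the fragment below shows how labels and annotations are commonly templated from the alert's result. The severity and team values and the message wording are placeholders; {{ $labels.container_name }} and {{ $value }} interpolate the result's label and evaluated value.

# Illustrative labels/annotations fragment (values are placeholders):
labels:
  severity: critical   # a named severity is often easier to route than a numeric one
  team: platform       # hypothetical routing label
annotations:
  summary: 'High log throughput from {{ $labels.container_name }}'
  description: '{{ $labels.container_name }} produced {{ $value }} log lines in the last hour.'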

Best Practices for Loki Alerting

When defining Loki alerting rules, consider the following; a combined example that ties these points together follows the list:

  • Specificity: Make your log pattern matching (regexp) as specific as possible to avoid false positives.
  • Thresholds: Carefully tune the thresholds (e.g., the time window and count) to match your system's normal behavior.
  • Context: Ensure your annotations provide enough context for quick diagnosis.
  • Severity: Assign appropriate severity levels to prioritize responses.
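
Putting these points together, a rule might look like the sketch below. The job selector, the logfmt parsing, the status >= 500 filter, the 1-per-second threshold, and the severity value are all assumptions to adapt to your own labels and traffic; they are not part of the example rule above.

groups:
  - name: loki-alerts
    rules:
      - alert: HighServerErrorRate
        # Specificity: select a single job and filter on a parsed status label
        # instead of matching every stream.
        # Thresholds: rate over a 5-minute window with an explicit limit,
        # held for 5 minutes before firing.
        expr: |
          sum by (job) (rate({job="myapp/api"} | logfmt | status >= 500 [5m])) > 1
        for: 5m
        labels:
          # Severity: prioritize the response.
          severity: critical
        annotations:
          # Context: say what happened, where, and how bad it is.
          summary: 'High 5xx log rate for {{ $labels.job }}'
          description: '{{ $labels.job }} is logging more than 1 server error per second (current value {{ $value }}).'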

For more advanced configurations and best practices regarding Prometheus alerting rules, refer to the Prometheus Alerting Rules documentation and Loki LogQL documentation.