Container Metrics

Prometheus Container Metrics with cAdvisor

This guide provides Prometheus examples for container-level metrics scraped from cAdvisor. It covers the setup requirements, Grafana variables, and example Prometheus queries for monitoring container performance within an ECS cluster.

Setup Requirements

To effectively monitor container metrics using Prometheus and cAdvisor, ensure your ECS Cluster is configured with the following:

cAdvisor Deployment: cAdvisor must be running on the cluster. See cadvisor_taskdef.json for an example task definition.
Prometheus Scrape Configuration: Configure your Prometheus scrape job to collect metrics from cAdvisor. An example configuration is provided below:


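  # cadvisor
  - job_name: container-metrics
    scrape_interval: 15s
    ec2_sd_configs:
    - region: eu-west-1
      role_arn: 'arn:aws:iam::xxxxxxxxxxxx:role/prometheus-ec2-role'
      # Discovery port; the __address__ relabel below replaces it with 8080.
      port: 9100
      # Only discover instances tagged PrometheusContainerScrape=Enabled.
      filters:
        - name: tag:PrometheusContainerScrape
          values:
            - Enabled
    relabel_configs:
    # Scrape cAdvisor on port 8080 of each instance's private IP.
    - source_labels: [__meta_ec2_private_ip]
      replacement: '${1}:8080'
      target_label: __address__
    # Use the EC2 Name tag as the instance label.
    - source_labels: [__meta_ec2_tag_Name]
      target_label: instance
    # Expose the ECSClusterName tag as cluster_name (used by the Grafana variables).
    - source_labels: [__meta_ec2_tag_ECSClusterName]
      target_label: cluster_name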
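
Once the job is loaded, a quick way to confirm that the cAdvisor targets are actually being scraped is to query the built-in up metric. The two queries below are a minimal sketch to run one at a time in the Prometheus expression browser; they assume the job name container-metrics and the cluster_name label from the scrape configuration above.

```promql
# 1 for every cAdvisor target Prometheus can reach, 0 for targets that fail to scrape.
up{job="container-metrics"}

# Number of healthy cAdvisor targets per ECS cluster.
count by (cluster_name) (up{job="container-metrics"} == 1)
```

If the second query returns nothing for a cluster, checking the PrometheusContainerScrape and ECSClusterName tags on its instances is a reasonable first step.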

Grafana Variables for Dynamic Dashboards

Enhance your Grafana dashboards with dynamic variables for easier data exploration and filtering.

Interval Variable

Lets users select the range window that the rate() queries below use as [$interval].


Name: interval
Label: Interval
Type: interval
Values: 1m,10m,30m,1h,6h,12h,1d,7d,14d,30d

ECS Cluster Name Variable

Filter metrics by a specific ECS cluster.


Name: cluster_name
Label: ECS Cluster
Type: Query
Values: label_values(cadvisor_version_info, cluster_name)

Service Name Variable

Filter metrics by a specific service name within the selected ECS cluster. The values are taken from the ECS container name label.


Name: service_name
Label: Service Name
Type: Query
Values: label_values(container_cpu_load_average_10s{cluster_name=~"$cluster_name"}, container_label_com_amazonaws_ecs_container_name)

Example Prometheus Queries for Container Monitoring

These queries are designed for use in Grafana panels to visualize and analyze container performance.

CPU Metrics

Container CPU Utilization (Graph)

Displays the CPU time consumed by each container as a percentage of one CPU core (100 means one core fully used).


sum(rate(container_cpu_usage_seconds_total{name=~".+", cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}[$interval])) by (name, container_label_com_amazonaws_ecs_container_name, container_label_com_amazonaws_ecs_cluster) * 100
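
On busy clusters the per-container graph can become crowded. One possible variant, sketched below, keeps only the heaviest CPU consumers by wrapping the same expression in topk. The limit of 5 and the grouping by name alone are arbitrary choices for illustration, not part of the original dashboard.

```promql
# Five containers with the highest CPU usage, in percent of one core.
topk(5,
  sum by (name) (
    rate(container_cpu_usage_seconds_total{name=~".+", cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}[$interval])
  ) * 100
)
```

Note that topk is evaluated at every step of a graph, so the set of displayed containers can change over the selected time range.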

Aggregated CPU Utilization by Service (Gauge)

Shows the total CPU utilization across all selected services as a single value, in percent of one core.


sum(sum(rate(container_cpu_usage_seconds_total{name=~".+", cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}[$interval])) by (container_label_com_amazonaws_ecs_container_name) * 100)

Memory Metrics

Container Memory Usage (RSS) (Graph)

Visualizes the Resident Set Size (RSS) memory used by each container.


sum(container_memory_rss{cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}) by (name, container_label_com_amazonaws_ecs_container_name, container_label_com_amazonaws_ecs_cluster)

Aggregated Memory Usage by Service (Gauge)

Displays the total RSS memory, in bytes, across all selected services as a single value.


sum(sum(container_memory_rss{name=~".+", cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}) by (name, container_label_com_amazonaws_ecs_container_name, container_label_com_amazonaws_ecs_cluster))

Memory Utilization Percentage

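Calculates working-set memory as a percentage of each container's memory limit, then averages the result across all containers that have a limit set. This query is not filtered by $cluster_name or $service_name.


avg((avg(container_memory_working_set_bytes{name=~".+"}) by (name, instance)) / on (name, instance) (avg(container_spec_memory_limit_bytes > 0) by (name, instance)) * 100)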
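
The query above produces one average over every container that has a limit. A per-container variant that respects the dashboard variables might look like the sketch below. It assumes that both metrics carry the cluster_name and ECS container name labels, as the other queries in this guide do.

```promql
# Working-set memory as a percentage of the configured limit, one series per container.
100 * (
  sum by (name, instance) (
    container_memory_working_set_bytes{name=~".+", cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}
  )
  / on (name, instance)
  sum by (name, instance) (
    container_spec_memory_limit_bytes{name=~".+", cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"} > 0
  )
)
```

The > 0 comparison drops containers without a memory limit, which would otherwise cause a division by zero.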

Memory Limits

Shows the configured memory limit, in bytes, for each container. A value of 0 means no limit is set.


container_spec_memory_limit_bytes{cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}

Network Metrics

Incoming Network Traffic per Container (Graph)

Monitors the rate of incoming network data, in bytes per second, for each container.


sum(rate(container_network_receive_bytes_total{cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}[$interval])) by (name, container_label_com_amazonaws_ecs_container_name, container_label_com_amazonaws_ecs_cluster)

Outgoing Network Traffic per Container (Graph)

Monitors the rate of outgoing network data, in bytes per second, for each container.


sum(rate(container_network_transmit_bytes_total{cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}[$interval])) by (name, container_label_com_amazonaws_ecs_container_name, container_label_com_amazonaws_ecs_cluster)

Combined Network Traffic with Direction Inversion

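A common Grafana panel configuration that shows both directions on one graph by plotting outgoing traffic below the zero line. Add the incoming and outgoing queries above to the same panel, then apply the following legend formats and series override.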

Incoming:


Legend => down: {{name}}

Outgoing:


Legend => up: {{name}}

Series Overrides:


Alias or regex => /.*up.*/
Transform => negative-y
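
If the panel offers no such override, the sign can instead be flipped in the query itself. The following is a minimal sketch for the outgoing query, grouped by name only for brevity; it is an alternative to the override, not part of the original panel configuration.

```promql
# Outgoing traffic in bytes per second, negated so it plots below the zero line.
sum by (name) (
  rate(container_network_transmit_bytes_total{cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}[$interval])
) * -1
```

When using this form, remove the negative-y series override so the series is not inverted twice.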

Aggregated Incoming Network Traffic by Service (Gauge)

Shows the total incoming network traffic, in bytes per second, across all selected services as a single value.


sum(sum(rate(container_network_receive_bytes_total{cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}[$interval])) by (name, container_label_com_amazonaws_ecs_container_name, container_label_com_amazonaws_ecs_cluster))

Aggregated Outgoing Network Traffic by Service (Gauge)

Shows the total outgoing network traffic, in bytes per second, across all selected services as a single value.


sum(sum(rate(container_network_transmit_bytes_total{cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}[$interval])) by (name, container_label_com_amazonaws_ecs_container_name, container_label_com_amazonaws_ecs_cluster))
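
Because of the outer sum(), the aggregated gauge queries collapse everything that matches the selected variables into one number. To get one value per service instead, the outer aggregation can be dropped and the grouping reduced to the service label. A sketch for incoming traffic, which assumes the ECS container name label identifies the service as it does in the Service Name variable:

```promql
# Incoming traffic in bytes per second, one series per service.
sum by (container_label_com_amazonaws_ecs_container_name) (
  rate(container_network_receive_bytes_total{cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}[$interval])
)
```

The same pattern applies to the transmit, CPU, and memory gauges by swapping in the corresponding metric.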
