Explore Prometheus queries and Grafana variables for container metrics, leveraging cAdvisor data for ECS clusters. Monitor CPU, memory, and network usage effectively.
Prometheus Container Metrics with cAdvisor
This guide provides Prometheus examples for container-level metrics scraped from cAdvisor. It covers the setup requirements, Grafana variables, and example Prometheus queries for monitoring container performance in an ECS cluster.
Setup Requirements
To effectively monitor container metrics using Prometheus and cAdvisor, ensure your ECS Cluster is configured with the following:
- cAdvisor Deployment: cAdvisor must be running on the cluster. See cadvisor_taskdef.json for an example task definition.
- Prometheus Scrape Configuration: Configure your Prometheus scrape job to collect metrics from cAdvisor. An example configuration is provided below:
# cadvisor
- job_name: container-metrics
  scrape_interval: 15s
  ec2_sd_configs:
    - region: eu-west-1
      role_arn: 'arn:aws:iam::xxxxxxxxxxxx:role/prometheus-ec2-role'
      port: 9100
      filters:
        - name: tag:PrometheusContainerScrape
          values:
            - Enabled
  relabel_configs:
    - source_labels: [__meta_ec2_private_ip]
      replacement: '${1}:8080'
      target_label: __address__
    - source_labels: [__meta_ec2_tag_Name]
      target_label: instance
    - source_labels: [__meta_ec2_tag_ECSClusterName]
      target_label: cluster_name
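To see what the relabel_configs rules do to a discovered instance, the following Python sketch mimics the default replace action on a hand-written label set. It is not Prometheus code: the function name relabel_replace, the sample IP address, and the tag values are illustrative assumptions, and details such as dropping labels whose result is empty are left out.

```python
import re

def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Simplified model of a Prometheus relabel rule with action: replace."""
    # Prometheus joins the source label values and matches the regex against
    # the whole string (the pattern is fully anchored).
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Translate $1 / ${1} references into Python's \g<1> template syntax.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result

# Hypothetical labels for one EC2 instance, as EC2 service discovery might expose them.
discovered = {
    "__address__": "10.0.1.23:9100",  # private IP plus the port from ec2_sd_configs
    "__meta_ec2_private_ip": "10.0.1.23",
    "__meta_ec2_tag_Name": "ecs-node-1",
    "__meta_ec2_tag_ECSClusterName": "prod-cluster",
}

# The three rules from the scrape job: (source_labels, target_label, replacement).
rules = [
    (["__meta_ec2_private_ip"], "__address__", "${1}:8080"),
    (["__meta_ec2_tag_Name"], "instance", "$1"),
    (["__meta_ec2_tag_ECSClusterName"], "cluster_name", "$1"),
]

target = discovered
for sources, target_label, replacement in rules:
    target = relabel_replace(target, sources, target_label, replacement=replacement)

print(target["__address__"], target["instance"], target["cluster_name"])
```

Under these assumptions the scrape address ends up as 10.0.1.23:8080, so the port: 9100 value in ec2_sd_configs is overridden by the first relabel rule and cAdvisor is scraped on port 8080.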
Grafana Variables for Dynamic Dashboards
Enhance your Grafana dashboards with dynamic variables for easier data exploration and filtering.
Interval Variable
Allows users to select the time interval for graphs and queries.
Name: interval
Label: Interval
Type: interval
Values: 1m,10m,30m,1h,6h,12h,1d,7d,14d,30d
ECS Cluster Name Variable
Filter metrics by a specific ECS cluster.
Name: cluster_name
Label: ECS Cluster
Type: Query
Values: label_values(cadvisor_version_info, cluster_name)
Service Name Variable
Filter metrics by a specific service name within an ECS cluster.
Name: service_name
Label: Service Name
Type: Query
Values: label_values(container_cpu_load_average_10s{cluster_name=~"$cluster_name"}, container_label_com_amazonaws_ecs_container_name)
Example Prometheus Queries for Container Monitoring
These queries are designed for use in Grafana panels to visualize and analyze container performance.
CPU Metrics
Container CPU Utilization (Graph)
Displays the rate of CPU time consumed by each container, expressed as a percentage of one CPU core (a container using two full cores reads 200).
sum(rate(container_cpu_usage_seconds_total{name=~".+", cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}[$interval])) by (name, container_label_com_amazonaws_ecs_container_name, container_label_com_amazonaws_ecs_cluster) * 100
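The arithmetic behind rate(...) * 100 can be sketched in a few lines of Python. This is a simplified model, not how Prometheus implements rate(): it uses only the first and last sample in the window, and the helper name cpu_percent and the sample values are made up for illustration.

```python
def cpu_percent(samples):
    """Approximate rate(counter[window]) * 100 from (timestamp, value) samples.

    Simplification: uses only the first and last sample and ignores the
    counter-reset handling and window extrapolation that Prometheus applies.
    """
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    cpu_seconds_per_second = (v_last - v_first) / (t_last - t_first)
    return cpu_seconds_per_second * 100

# Hypothetical container_cpu_usage_seconds_total samples taken 15 s apart.
samples = [(0, 100.0), (15, 107.5), (30, 115.0), (45, 122.5), (60, 130.0)]
print(cpu_percent(samples))  # 30 CPU-seconds over 60 s -> 50.0
```

Because the counter measures CPU-seconds, a result of 50 means the container kept half of one core busy over the window.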
Aggregated CPU Utilization by Service (Gauge)
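Shows the total CPU utilization across all containers of the selected service(s) as a single value. The outer sum collapses the per-service results, so select one service in $service_name to see that service alone.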
sum(sum(rate(container_cpu_usage_seconds_total{name=~".+", cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}[$interval])) by ( container_label_com_amazonaws_ecs_container_name) * 100)
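The two nested sum operators can be pictured as a group-by followed by a grand total. The Python sketch below walks through that with invented per-container rates; the container names and values are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical per-container CPU rates (CPU-seconds per second), keyed by
# (container name, ECS container name label).
rates = {
    ("web-1", "web"): 0.25,
    ("web-2", "web"): 0.5,
    ("api-1", "api"): 0.75,
}

# Inner aggregation: sum(...) by (container_label_com_amazonaws_ecs_container_name) * 100
summed = defaultdict(float)
for (_name, service), rate in rates.items():
    summed[service] += rate
per_service = {service: value * 100 for service, value in summed.items()}

# Outer aggregation: sum(...) with no grouping collapses everything to one value.
total = sum(per_service.values())

print(per_service)  # {'web': 75.0, 'api': 75.0}
print(total)        # 150.0
```

With both services selected the gauge would read 150; restricting the selection to web would give 75.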
Memory Metrics
Container Memory Usage (RSS) (Graph)
Visualizes the Resident Set Size (RSS) memory used by each container.
sum(container_memory_rss{cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}) by (name, container_label_com_amazonaws_ecs_container_name, container_label_com_amazonaws_ecs_cluster)
Aggregated Memory Usage by Service (Gauge)
Displays the total RSS memory across all containers of the selected service(s) as a single value.
sum(sum(container_memory_rss{name=~".+", cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}) by (name, container_label_com_amazonaws_ecs_container_name, container_label_com_amazonaws_ecs_cluster))
Memory Utilization Percentage
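Calculates working-set memory as a percentage of each container's configured limit, then averages the result across all containers. The > 0 filter excludes containers that have no memory limit set. Note that this query is not filtered by $cluster_name or $service_name.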
avg((avg (container_memory_working_set_bytes{name=~".+"}) by (name, instance ))/ on (name, instance)(avg (container_spec_memory_limit_bytes>0 ) by (name, instance))*100)
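The steps in this query (filter out zero limits, match both sides on name and instance, divide, then average) can be traced with a small Python sketch. All container names, instance names, and byte values below are invented for illustration, and the sketch assumes that a container without a limit reports 0, as the > 0 filter suggests.

```python
from statistics import mean

MIB = 1024 ** 2

# Hypothetical samples keyed by (name, instance).
working_set = {
    ("web", "ecs-node-1"): 256 * MIB,
    ("api", "ecs-node-1"): 300 * MIB,
    ("sidecar", "ecs-node-1"): 100 * MIB,
}
limits = {
    ("web", "ecs-node-1"): 512 * MIB,
    ("api", "ecs-node-1"): 400 * MIB,
    ("sidecar", "ecs-node-1"): 0,  # assumed to mean "no limit configured"
}

# container_spec_memory_limit_bytes > 0 drops series without a limit.
limited = {key: value for key, value in limits.items() if value > 0}

# "/ on (name, instance)" keeps only keys present on both sides.
percent = {
    key: working_set[key] / limited[key] * 100
    for key in working_set.keys() & limited.keys()
}

# The outer avg(...) reduces the per-container percentages to one number.
overall = mean(percent.values())
print(overall)  # (50.0 + 75.0) / 2 -> 62.5
```

The sidecar container never reaches the division, which is why unlimited containers do not distort the average.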
Memory Limits
Shows the configured memory limits for containers.
container_spec_memory_limit_bytes{cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}
Network Metrics
Incoming Network Traffic per Container (Graph)
Monitors the rate of incoming network data for each container, in bytes per second.
sum(rate(container_network_receive_bytes_total{cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}[$interval])) by (name, container_label_com_amazonaws_ecs_container_name, container_label_com_amazonaws_ecs_cluster)
Outgoing Network Traffic per Container (Graph)
Monitors the rate of outgoing network data for each container, in bytes per second.
sum(rate(container_network_transmit_bytes_total{cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}[$interval])) by (name, container_label_com_amazonaws_ecs_container_name, container_label_com_amazonaws_ecs_cluster)
Combined Network Traffic with Direction Inversion
A common Grafana panel configuration that displays incoming and outgoing traffic on one graph, with outgoing traffic plotted below the x-axis so the two directions do not overlap.
Incoming:
Legend => down: {{name}}
Outgoing:
Legend => up: {{name}}
Series Overrides:
Alias or regex => /.*up.*/
Transform => negative-y
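The effect of this override can be sketched in Python: every series whose legend alias matches the regex has its values negated before plotting. This is only an illustration of the idea, not Grafana's implementation; the function name apply_negative_y and the sample values are assumptions.

```python
import re

def apply_negative_y(series_by_alias, pattern=r".*up.*"):
    """Negate every series whose alias matches the override regex."""
    return {
        alias: [-v for v in values] if re.search(pattern, alias) else list(values)
        for alias, values in series_by_alias.items()
    }

# Hypothetical bytes-per-second values for one container.
series = {
    "down: web-1": [1200.0, 1500.0],
    "up: web-1": [800.0, 950.0],
}
mirrored = apply_negative_y(series)
print(mirrored)
```

One thing this makes visible: the regex matches "up" anywhere in the alias, so an incoming series for a container whose name contains "up" (for example "down: backup-1") would be inverted as well. A pattern anchored to the legend prefix, such as /^up:/, may be a safer choice.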
Aggregated Incoming Network Traffic by Service (Gauge)
Shows the total incoming network traffic, in bytes per second, across all containers of the selected service(s) as a single value.
sum(sum(rate(container_network_receive_bytes_total{cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}[$interval])) by (name, container_label_com_amazonaws_ecs_container_name, container_label_com_amazonaws_ecs_cluster))
Aggregated Outgoing Network Traffic by Service (Gauge)
Shows the total outgoing network traffic, in bytes per second, across all containers of the selected service(s) as a single value.
sum(sum(rate(container_network_transmit_bytes_total{cluster_name=~"$cluster_name", container_label_com_amazonaws_ecs_container_name=~"$service_name"}[$interval])) by (name, container_label_com_amazonaws_ecs_container_name, container_label_com_amazonaws_ecs_cluster))