alertmanager_fargate_loki

Configure Alertmanager to run on AWS Fargate with Loki for centralized logging. This guide provides a sample ECS task definition that routes the Alertmanager container's logs to Loki through FireLens.

Alertmanager Fargate Loki Deployment Configuration

This document outlines a sample Amazon Elastic Container Service (ECS) task definition for deploying Alertmanager on AWS Fargate, integrated with Grafana Loki for centralized log management. Fargate runs the containers without any EC2 instances to manage, and Loki aggregates the logs they emit.

Alertmanager Container Configuration

The Alertmanager container definition specifies the image, a soft memory reservation, the port mapping for 9093, and environment variables. Sensitive values (the Slack webhook URL, the two Opsgenie API keys, and the Alertmanager URL) are not written into the task definition. They are listed under secrets and injected as environment variables from AWS Systems Manager Parameter Store when the task starts.
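The secrets entries all follow the same ARN pattern, so they can be generated rather than typed by hand. The following is a minimal sketch, assuming every parameter lives under one Parameter Store prefix; the helper names and the account ID 123456789012 are illustrative and not part of the original configuration.

```python
def ssm_parameter_arn(region, account_id, name):
    """Build the ARN of an SSM parameter from its name (leading slash optional)."""
    return f"arn:aws:ssm:{region}:{account_id}:parameter/{name.lstrip('/')}"


def build_secrets(region, account_id, prefix, names):
    """Return an ECS `secrets` list, one entry per environment variable name."""
    return [
        {"name": n, "valueFrom": ssm_parameter_arn(region, account_id, f"{prefix}/{n}")}
        for n in names
    ]


# Hypothetical account ID; the variable names are the ones used in this guide.
secrets = build_secrets(
    "eu-west-1",
    "123456789012",
    "/prometheus/prod",
    [
        "SLACK_WEBHOOK_URL",
        "GENERAL_OPSGENIE_API_KEY",
        "DEVOPS_OPSGENIE_API_KEY",
        "ALERTMANAGER_URL",
    ],
)
```

The resulting list can be placed under the container's secrets key as-is.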

Log Configuration with AWS FireLens and Fluent Bit

A key aspect of this setup is the integration with AWS FireLens and Fluent Bit. The logConfiguration of the alertmanager container uses the awsfirelens log driver, which hands the container's stdout and stderr to the FireLens log router. The log_router container, which runs the grafana/fluent-bit-plugin-loki image, forwards those logs to the Loki endpoint stored in the LOKI_LOG_URL parameter. The log router's own logs do not go to Loki; they are written to CloudWatch Logs through the awslogs driver.

Key Log Configuration Options

  • logDriver: awsfirelens: Directs logs to the FireLens log router.
  • secretOptions: Securely retrieves the Loki log URL from AWS Systems Manager Parameter Store.
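  • options: Passes settings to the Fluent Bit output: the output plugin name (grafana-loki), a static label set, the record keys promoted to Loki labels (LabelKeys), the keys dropped from each record (RemoveKeys), and the line format.
  • enable-ecs-log-metadata: "true": Set in the log_router container's firelensConfiguration, not in the Alertmanager logConfiguration. It adds ECS metadata such as the cluster, task ARN, and task definition to each log record.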
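The awsfirelens driver only works when the same task also contains a container with a firelensConfiguration block. The sketch below checks that wiring on a parsed task definition. The function name is hypothetical, and the second check (a log router should not use awsfirelens for its own logs) is a convention assumed by this sketch rather than a rule quoted from the ECS documentation.

```python
def firelens_wiring_problems(task_def):
    """Return a list of problems; an empty list means the wiring looks sane."""
    containers = task_def.get("containerDefinitions", [])
    # Containers that act as the FireLens log router.
    routers = [c["name"] for c in containers if "firelensConfiguration" in c]
    problems = []
    for c in containers:
        driver = c.get("logConfiguration", {}).get("logDriver")
        if driver != "awsfirelens":
            continue
        if not routers:
            problems.append(
                f"{c['name']} uses awsfirelens but no container has firelensConfiguration"
            )
        if c["name"] in routers:
            problems.append(f"{c['name']} is the log router and should not log to itself")
    return problems


# Trimmed-down version of the task definition from this guide.
task_def = {
    "containerDefinitions": [
        {"name": "alertmanager", "logConfiguration": {"logDriver": "awsfirelens"}},
        {
            "name": "log_router",
            "firelensConfiguration": {"type": "fluentbit"},
            "logConfiguration": {"logDriver": "awslogs"},
        },
    ]
}
problems = firelens_wiring_problems(task_def)
```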

AWS Fargate and Network Mode

The task definition specifies FARGATE as a required compatibility and uses the awsvpc network mode, which is the only network mode Fargate supports. Each task gets its own elastic network interface and private IP address, which improves network isolation. With awsvpc, hostPort must either be omitted or match containerPort, as it does here with 9093.
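These constraints are easy to check before registering the task definition. The following is a minimal sketch, assuming cpu and memory are given as numeric strings as in this guide. The function name is hypothetical, and the memory table covers only the three smallest Fargate CPU sizes, so consult the AWS documentation for the full list.

```python
# Memory values (MiB) accepted by Fargate for the smaller CPU sizes.
FARGATE_MEMORY = {
    256: {512, 1024, 2048},
    512: set(range(1024, 4096 + 1, 1024)),
    1024: set(range(2048, 8192 + 1, 1024)),
}


def fargate_problems(task_def):
    """Return a list of Fargate compatibility problems found in task_def."""
    problems = []
    if "FARGATE" not in task_def.get("requiresCompatibilities", []):
        problems.append("requiresCompatibilities must include FARGATE")
    if task_def.get("networkMode") != "awsvpc":
        problems.append("networkMode must be awsvpc")
    try:
        cpu, memory = int(task_def["cpu"]), int(task_def["memory"])
    except (KeyError, TypeError, ValueError):
        problems.append("this sketch needs numeric task-level cpu and memory")
    else:
        allowed = FARGATE_MEMORY.get(cpu)
        # CPU sizes outside the table are not checked.
        if allowed is not None and memory not in allowed:
            problems.append(f"memory {memory} is not valid for cpu {cpu}")
    for c in task_def.get("containerDefinitions", []):
        for pm in c.get("portMappings", []):
            if pm.get("hostPort", pm["containerPort"]) != pm["containerPort"]:
                problems.append(f"{c['name']}: hostPort must match containerPort")
    return problems


sample = {
    "cpu": "256",
    "memory": "512",
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "containerDefinitions": [
        {"name": "alertmanager", "portMappings": [{"containerPort": 9093, "hostPort": 9093}]}
    ],
}
issues = fargate_problems(sample)
```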

IAM Roles and Permissions

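The task definition references two IAM roles. The execution role (ecsTaskExecutionRole) is used by ECS itself to pull the image from ECR, read the SSM parameters referenced under secrets and secretOptions, and write the log router's logs to CloudWatch Logs. Because awslogs-create-group is set to "true", this role also needs the logs:CreateLogGroup permission. The task role (ecs-taskrole-tooling) is assumed by the containers for any AWS API calls they make at runtime. Ensure both roles have the required permissions.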
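As an illustration, the permissions that go beyond a typical execution role (reading the SSM parameters and creating the sidecar log group) could be expressed as an inline policy. This is a sketch under assumptions: the function name, the statement IDs, and the account ID 123456789012 are hypothetical, and the resource patterns are deliberately written as prefixes that you may want to tighten.

```python
import json


def execution_role_extras(region, account_id, parameter_prefix, log_group):
    """Build an IAM policy document for SSM reads and log group creation."""
    prefix = parameter_prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSsmParameters",
                "Effect": "Allow",
                "Action": ["ssm:GetParameters"],
                "Resource": f"arn:aws:ssm:{region}:{account_id}:parameter/{prefix}/*",
            },
            {
                "Sid": "CreateSidecarLogGroup",
                "Effect": "Allow",
                "Action": ["logs:CreateLogGroup"],
                # Trailing wildcard is an assumption of this sketch; narrow it if needed.
                "Resource": f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}*",
            },
        ],
    }


policy = execution_role_extras(
    "eu-west-1", "123456789012", "/prometheus/prod", "/ecs/devops-tools-cluster/sidecars"
)
policy_json = json.dumps(policy, indent=2)
```

If the parameters are SecureString values encrypted with a customer managed KMS key, the role would also need kms:Decrypt on that key.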

The complete task definition follows. Replace each xxxxxxxxxxxx placeholder with your AWS account ID, and adjust the region, image tags, and parameter paths to match your environment.

{
  "family": "alertmanager",
  "executionRoleArn": "arn:aws:iam::xxxxxxxxxxxx:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::xxxxxxxxxxxx:role/ecs-taskrole-tooling",
  "cpu": "256",
  "memory": "512",
  "networkMode": "awsvpc",
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "containerDefinitions": [
    {
      "name": "alertmanager",
      "image": "xxxxxxxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/alertmanager:0.24.0",
      "memoryReservation": 64,
      "portMappings": [
        {
          "protocol": "tcp",
          "containerPort": 9093,
          "hostPort": 9093
        }
      ],
      "environment": [
        {
          "name": "SLACK_ICON",
          "value": ":fire:"
        }
      ],
      "secrets": [
        {
          "name": "SLACK_WEBHOOK_URL",
          "valueFrom": "arn:aws:ssm:eu-west-1:xxxxxxxxxxxx:parameter/prometheus/prod/SLACK_WEBHOOK_URL"
        },
        {
          "name": "GENERAL_OPSGENIE_API_KEY",
          "valueFrom": "arn:aws:ssm:eu-west-1:xxxxxxxxxxxx:parameter/prometheus/prod/GENERAL_OPSGENIE_API_KEY"
        },
        {
          "name": "DEVOPS_OPSGENIE_API_KEY",
          "valueFrom": "arn:aws:ssm:eu-west-1:xxxxxxxxxxxx:parameter/prometheus/prod/DEVOPS_OPSGENIE_API_KEY"
        },
        {
          "name": "ALERTMANAGER_URL",
          "valueFrom": "arn:aws:ssm:eu-west-1:xxxxxxxxxxxx:parameter/prometheus/prod/ALERTMANAGER_URL"
        }
      ],
      "essential": true,
      "logConfiguration": {
        "logDriver": "awsfirelens",
        "secretOptions": [
          {
            "name": "Url",
            "valueFrom": "arn:aws:ssm:eu-west-1:xxxxxxxxxxxx:parameter/prometheus/prod/LOKI_LOG_URL"
          }
        ],
        "options": {
          "Name": "grafana-loki",
          "Labels": "{job=\"container-logs\"}",
          "RemoveKeys": "container_id,ecs_task_arn",
          "LabelKeys": "container_name,ecs_task_definition,source,ecs_cluster",
          "LineFormat": "key_value"
        }
      }
    },
    {
      "essential": true,
      "image": "grafana/fluent-bit-plugin-loki:2.5.0-amd64",
      "name": "log_router",
      "firelensConfiguration": {
        "type": "fluentbit",
        "options": {
          "enable-ecs-log-metadata": "true"
        }
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/devops-tools-cluster/sidecars",
          "awslogs-region": "eu-west-1",
          "awslogs-create-group": "true",
          "awslogs-stream-prefix": "alertmanager-firelens"
        }
      },
      "memoryReservation": 50
    }
  ]
}
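The xxxxxxxxxxxx placeholders have to be replaced before the definition can be registered. One possible way to do that is sketched below, assuming the JSON above is available as a string; the function name and the account ID 123456789012 are hypothetical.

```python
import json

PLACEHOLDER = "xxxxxxxxxxxx"


def render_task_definition(template_text, account_id):
    """Substitute the account ID placeholder and parse the result as JSON."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    rendered = template_text.replace(PLACEHOLDER, account_id)
    # json.loads raises an error if the rendered text is not valid JSON.
    return json.loads(rendered)


# Shortened template for illustration; in practice, read the full JSON from a file.
template = '{"family": "alertmanager", "executionRoleArn": "arn:aws:iam::xxxxxxxxxxxx:role/ecsTaskExecutionRole"}'
task_definition = render_task_definition(template, "123456789012")
```

The rendered dictionary can be written to a file with json.dump and registered with aws ecs register-task-definition --cli-input-json file://alertmanager.json, where the file name is your choice.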