GitLab CI Auto-Retry Jobs - Configure Job Retries - gitlab-ci Cheatsheets

This document demonstrates how to configure automatic retries for your GitLab CI jobs. Retries let a pipeline recover from temporary failures, such as a runner outage or a timeout, without manual intervention.

Configuring Job Retries

The following example shows how to configure a job with retry capabilities. We'll specify the maximum number of retries and the types of failures that trigger a retry.

---
stages:
  - test

test-job:
  stage: test
  interruptible: true
  script:
    - echo "run this"
  retry:
    max: 2 # up to 3 runs in total (1 initial + 2 retries) -> https://gitlab.com/gitlab-org/gitlab/-/issues/28088
    when:
      - runner_system_failure
      - api_failure
      - stuck_or_timeout_failure
      - scheduler_failure
      - unknown_failure

Understanding Retry Parameters

`max`

This parameter sets the maximum number of retry attempts. Accepted values are 0, 1, and 2. In this example, `max: 2` means the job will run at most 3 times in total (1 initial attempt + 2 retries).
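
When no failure filter is needed, `retry` also accepts a bare integer instead of a `max`/`when` mapping. The snippet below is a minimal sketch of that shorthand; the job name `shorthand-job` is only illustrative. Without a `when` list, the job is retried on any type of failure.

```yaml
shorthand-job:          # hypothetical job name, for illustration only
  script:
    - echo "run this"
  retry: 2              # same limit as max: 2, but retries on any failure type
```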

`when`

This parameter specifies the types of failures that should trigger a retry. The example includes several common failure scenarios:

`runner_system_failure`: Failures of the CI runner itself, for example when job setup fails.
`api_failure`: Failures during communication with the GitLab API.
`stuck_or_timeout_failure`: Jobs that get stuck or time out.
`scheduler_failure`: The scheduler failed to assign the job to a runner.
`unknown_failure`: The failure reason is unknown.

By carefully selecting the `when` conditions, you can ensure that only truly transient errors trigger retries, so that jobs which fail for a real reason are not re-run needlessly.
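
As one possible way to apply this idea across a whole pipeline, the sketch below puts a transient-only retry policy under the `default` keyword and lets a single job opt out. The job names and scripts are hypothetical. `script_failure` is deliberately left out of the list, on the assumption that a failing script points to a genuine problem rather than a transient one.

```yaml
default:
  retry:
    max: 2
    when:                          # infrastructure-level failures only
      - runner_system_failure
      - stuck_or_timeout_failure
      - scheduler_failure

unit-tests:                        # hypothetical job; inherits the default retry policy
  script:
    - echo "run tests"

deploy:                            # hypothetical job; opts out of retries entirely
  script:
    - echo "deploy"
  retry: 0                         # job-level value replaces the default
```

Opting `deploy` out is a design choice made here for illustration: re-running a deployment automatically may not be safe if the first attempt partially succeeded.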

Further Considerations

For more advanced retry strategies, consider GitLab's other built-in features or external tools that offer more sophisticated error handling and retry logic. Monitor your retry rates as well: a job that frequently needs retries usually points to an underlying issue in your CI/CD pipeline that retries only mask.
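
One built-in option of this kind is `retry:exit_codes`, which is only available in newer GitLab releases, so check that your instance supports it before relying on it. The sketch below assumes a hypothetical script, `./run-integration-tests.sh`, that exits with code 137 when it hits a transient condition; both the script name and the exit code are assumptions made for illustration.

```yaml
integration-tests:                 # hypothetical job name
  script:
    - ./run-integration-tests.sh   # hypothetical script, assumed to exit 137 on a transient error
  retry:
    max: 2
    exit_codes: 137                # retry only when the script exits with this code
```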