GitLab Runner Pod Cleanup
Introduction
Pod Cleanup is an application that runs inside your Kubernetes cluster and periodically checks for orphaned pods. We initially developed it for the cases where
GitLab Runner Manager did not have the chance to clean up the pods it created, like when a manager pod got evicted. However, you can also use it for general pod cleanup. For details, see this issue.
To find the pods that need to be cleaned up, Pod Cleanup looks for pods with the pod-cleanup.gitlab.com/ttl pod annotation (configurable).
The value is any valid duration string that can be parsed by Go's time.ParseDuration function.
For example, pod-cleanup.gitlab.com/ttl: 1h means that the pod can be deleted after it has existed for 1 hour. See Application configuration for details on how to configure Pod Cleanup.
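The TTL check described above can be sketched in Go as follows. This is a minimal illustration, not code from Pod Cleanup itself: the function name ttlExpired is hypothetical, and treating a pod whose age exactly equals the TTL as expired is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// ttlExpired reports whether a pod of the given age has outlived the TTL
// stored in its annotation value. The name and signature are illustrative
// and are not taken from the Pod Cleanup source.
func ttlExpired(ttlValue string, age time.Duration) (bool, error) {
	ttl, err := time.ParseDuration(ttlValue)
	if err != nil {
		return false, fmt.Errorf("invalid TTL %q: %w", ttlValue, err)
	}
	// Assumption: a pod whose age equals the TTL counts as expired.
	return age >= ttl, nil
}

func main() {
	// Check a 75-minute-old pod against several annotation values.
	for _, v := range []string{"1h", "90m", "1h30m", "soon"} {
		expired, err := ttlExpired(v, 75*time.Minute)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("ttl=%s age=75m expired=%t\n", v, expired)
	}
}
```

A value such as soon is not a valid duration string, so the sketch reports an error for it instead of guessing a TTL.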
How GitLab Runner Pod Cleanup works
When Pod Cleanup finds an expired pod, it tries to delete it. When it finds multiple expired pods, it sends the delete requests sequentially, one pod at a time. This is the behavior of the initial iteration.
sequenceDiagram
loop pod-cleanup
pod-cleanup->>+kubernetes: GET /api/v1/pods
kubernetes->>-pod-cleanup: Paged list of Pods
loop Cleanup
Note right of pod-cleanup: Check if the TTL has expired
pod-cleanup->>+kubernetes: DELETE /api/v1/namespaces/{namespace}/pods/{pod-name}
end
end
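The loop in the diagram can be approximated with the following Go sketch. It runs without a cluster: the pod struct, the deleter interface, and the recorder fake are all hypothetical stand-ins for the Kubernetes API objects and client, and skipping pods with an unparsable TTL is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"time"
)

// pod holds only the fields this sketch needs. It is a stand-in for the
// Kubernetes Pod object, not the real API type.
type pod struct {
	Namespace   string
	Name        string
	Age         time.Duration
	Annotations map[string]string
}

// deleter abstracts the DELETE request so the loop can run without a cluster.
type deleter interface {
	Delete(namespace, name string) error
}

// cleanup deletes expired pods one at a time and returns how many it removed.
func cleanup(pods []pod, annotation string, d deleter) int {
	deleted := 0
	for _, p := range pods {
		value, ok := p.Annotations[annotation]
		if !ok {
			continue // pod does not carry the TTL annotation
		}
		ttl, err := time.ParseDuration(value)
		if err != nil || p.Age < ttl {
			continue // assumption: skip unparsable TTLs; also skip unexpired pods
		}
		if err := d.Delete(p.Namespace, p.Name); err != nil {
			fmt.Printf("delete %s/%s failed: %v\n", p.Namespace, p.Name, err)
			continue
		}
		deleted++
	}
	return deleted
}

// recorder is a fake deleter that only records what would be deleted.
type recorder struct{ deleted []string }

func (r *recorder) Delete(namespace, name string) error {
	r.deleted = append(r.deleted, namespace+"/"+name)
	return nil
}

func main() {
	const annotation = "pod-cleanup.gitlab.com/ttl"
	pods := []pod{
		{Namespace: "default", Name: "job-1", Age: 2 * time.Hour, Annotations: map[string]string{annotation: "1h"}},
		{Namespace: "default", Name: "job-2", Age: 10 * time.Minute, Annotations: map[string]string{annotation: "1h"}},
		{Namespace: "default", Name: "web", Age: 48 * time.Hour},
	}
	r := &recorder{}
	n := cleanup(pods, annotation, r)
	fmt.Println(n, r.deleted) // prints: 1 [default/job-1]
}
```

Only job-1 is removed: job-2 has not reached its TTL, and web has no TTL annotation, so it is never considered for deletion regardless of its age.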
Usage
Basic
With the default configuration, Pod Cleanup should handle most cases out of the box.
To install the latest Pod Cleanup from main in your cluster, you can use the default pod-cleanup.yml in the root of this repo:
kubectl apply -f https://gitlab.com/gitlab-org/ci-cd/gitlab-runner-pod-cleanup/-/raw/main/pod-cleanup.yml
If you want Pod Cleanup to clean up GitLab Runner pods older than 1 hour, set the following in your config.toml:
[[runners]]
name = "kubernetes-pod-cleanup"
url = "https://gitlab.example.com/"
token = "..."
executor = "kubernetes"
[runners.kubernetes.pod_annotations]
"pod-cleanup.gitlab.com/ttl" = "1h"
Advanced
See docs/README.md for more advanced use.
Application configuration
The application is configured through environment variables. The available variables are:
| Setting | Description |
|---|---|
| POD_CLEANUP_LOG_LEVEL | Log level for the GitLab Runner Pod Cleanup application. Default: info |
| POD_CLEANUP_LOG_FORMAT | Format used for logging. Accepted values: text and json. Default: json |
| POD_CLEANUP_LIMIT | Maximum number of pods to delete per POD_CLEANUP_INTERVAL tick. Default: 15 |
| POD_CLEANUP_MAX_ERR_ALLOWED | The maximum number of errors allowed when deleting a pod. When this limit is reached, Pod Cleanup adds the pod to a denylist and skips it if encountered again. Default: 5 |
| POD_CLEANUP_INTERVAL | Deletion interval. It is an unsigned sequence of decimal numbers, each with optional fraction and a unit suffix, such as 300ms, 1.5h or 2h45m. Valid time units are ns, us (or µs), ms, s, m, h. Minimum: 1s. Default: 60s |
| POD_CLEANUP_CACHE_CLEANUP_INTERVAL | The maximum amount of time before the cache that stores the faulty pods is cleaned up. It is an unsigned sequence of decimal numbers, each with optional fraction and a unit suffix, such as 300ms, 1.5h or 2h45m. Valid time units are ns, us (or µs), ms, s, m, h. Minimum: 15m. Default: 30m |
| POD_CLEANUP_CACHE_EXPIRATION | The maximum amount of time before a faulty pod expires from the cache. It is an unsigned sequence of decimal numbers, each with optional fraction and a unit suffix, such as 300ms, 1.5h or 2h45m. Valid time units are ns, us (or µs), ms, s, m, h. Minimum: 30m. Default: 1.5h |
| POD_CLEANUP_KUBERNETES_REQUEST_TIMEOUT | The maximum amount of time a Kubernetes API request can take. It is an unsigned sequence of decimal numbers, each with optional fraction and a unit suffix, such as 300ms, 1.5h or 2h45m. Valid time units are ns, us (or µs), ms, s, m, h. Minimum: 5s. Default: 30s |
| POD_CLEANUP_KUBERNETES_NAMESPACES | List of the namespaces to search for Kubernetes pods. Separate multiple namespaces with commas (,). Default: default |
| POD_CLEANUP_KUBERNETES_ANNOTATION | Annotation to consider when looking for the ttl setting. Default: pod-cleanup.gitlab.com/ttl |
| POD_CLEANUP_KUBERNETES_REQUEST_LIMIT | The maximum number of pods to retrieve per API request when listing existing pods. Minimum: 100. Default: 500 |
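To illustrate how a duration setting with a default and a minimum might be resolved from the environment, here is a minimal Go sketch. The durationSetting helper is hypothetical, and rejecting values below the minimum with an error is an assumption: the table does not say how Pod Cleanup itself reacts to such values.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// durationSetting resolves one duration variable. The lookup function is
// injected so the helper can be exercised without touching the process
// environment. This helper is illustrative and not part of Pod Cleanup.
func durationSetting(lookup func(string) (string, bool), name string, def, minimum time.Duration) (time.Duration, error) {
	raw, ok := lookup(name)
	if !ok || raw == "" {
		return def, nil // variable unset: use the documented default
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", name, err)
	}
	if d < minimum {
		// Assumption: values below the documented minimum are rejected.
		return 0, fmt.Errorf("%s: %s is below the minimum of %s", name, d, minimum)
	}
	return d, nil
}

func main() {
	// Defaults and minimums are taken from the settings table.
	interval, err := durationSetting(os.LookupEnv, "POD_CLEANUP_INTERVAL", 60*time.Second, time.Second)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("cleanup interval:", interval)
}
```

Running the sketch with POD_CLEANUP_INTERVAL=2m in the environment resolves the interval to two minutes, while leaving the variable unset falls back to the 60s default.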
LICENSE
MIT