Agent Runtime Operator
The Agent Runtime Operator is a Kubernetes operator that provides a unified, framework-agnostic way to deploy and manage agentic workloads on Kubernetes. It uses Custom Resource Definitions (CRDs) to abstract away framework-specific configurations, simplifying the management of components like wiring of agents with a observability stack or model routers.
The operator is built with the Operator SDK framework. Make sure to be familiar with the Operator SDK concepts when working with this project.
Getting Started
Get started with the Agent Runtime Operator by following our comprehensive guides:
Development
Prerequisites
Before contributing to this project, ensure you have the following tools installed:
- Go: version 1.24.0 or higher
- Docker: version 20.10+ (or a compatible alternative like Podman)
- kubectl: The Kubernetes command-line tool
- kind: For running Kubernetes locally in Docker
- make: The build automation tool
- Git: For version control
Local Environment
Set up your local Kubernetes environment.
You can use any Kubernetes cluster. Kind is used for E2E tests and is used exemplarily here for local development.
# Create a local Kubernetes cluster (or use an existing one)
kind create cluster
Install cert-manager in the cluster, as it is required for webhook certificate management:
# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.19.1/cert-manager.yaml
# Wait for cert-manager to be ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=cert-manager -n cert-manager --timeout=60s
Build and Deploy
Build and deploy the operator locally:
# Install CRDs into the cluster
make install
# Build docker image
make docker-build
# Load image into kind cluster (not needed if using local registry)
make kind-load
# Deploy the operator to the cluster
make deploy
Run unit tests, code formatting checks, and static analysis whenever you make changes:
# Run unit tests, code formatter and static analysis
make test
Linting
Several linters are configured via golangci-lint. Run the linter to ensure code quality:
# Run golangci-lint
make lint
You can also auto-fix some issues:
# Auto-fix linting issues
make lint-fix
End-to-End Tests
The E2E tests automatically create an isolated Kind cluster, deploy the operator, run comprehensive tests, and clean up afterward.
# Run complete E2E test suite
make test-e2e
E2E Test Features:
- Operator deployment verification
- CRD installation testing
- Webhook functionality testing
- Metrics endpoint validation
- Certificate management verification
Manual E2E Test Management:
For faster iteration, you can manually set up and tear down the test cluster and run tests against it.
# Set up test cluster manually
make setup-test-e2e
# Run tests against existing cluster
KIND_CLUSTER=agent-runtime-operator-test-e2e go test ./test/e2e/ -v -ginkgo.v
# Clean up test cluster
make cleanup-test-e2e
Clean up failed test runs:
When E2E tests fail, the cleanup step is not run to allow for manual analysis of the failure. As the cluster is in an undefined state, it is recommended to delete the cluster and start fresh:
make cleanup-test-e2e
Create or Update API and Webhooks
The operator-sdk CLI can be used to create or update APIs and webhooks.
This is the preferred way to add new APIs and webhooks to the operator.
If the operator-sdk CLI is updated, you may need to re-run these commands to update the generated code.
# Create API for Agent CRD
operator-sdk create api --group runtime --version v1alpha1 --kind Agent
# Create webhook for Agent CRD
operator-sdk create webhook --group runtime --version v1alpha1 --kind Agent --defaulting --programmatic-validation
Contribution
See Contribution Guide for details on contribution, and the process for submitting pull requests.