contenox/runtime: GenAI Orchestration Runtime

contenox/runtime is an open-source runtime for orchestrating generative AI workflows. It treats AI workflows as state machines, enabling:
- Declarative workflow definition
- Built-in state management
- Vendor-agnostic execution
- Multi-backend orchestration
- First-class observability
- Written in Go for demanding workloads
- Agentic capabilities via hooks
This is the core for the contenox ecosystem.
Get Started in 5 Minutes
Startup flow:
- Launch runtime services via Docker
- Register one or more model backends
- Assign each backend to the relevant execution pools
- Pull required models to the backends
- Execute a workflow and observe the logs
You should see your first workflow run in under 5 minutes.
Prerequisites
- Docker and Docker Compose
- curl and jq (for the CLI examples)
1. Launch the Runtime
git clone https://github.com/contenox/runtime.git
cd runtime
docker compose build
docker compose up -d
This starts the complete environment:
- Runtime API (port 8081)
- Ollama (port 11435)
- Postgres, NATS, Valkey, and tokenizer services
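To confirm the services are up before continuing, list the running containers (a standard Docker Compose command; the exact service names depend on the compose file):
docker compose ps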
2. Register an Ollama Backend
BACKEND_ID=$(curl -s -X POST http://localhost:8081/backends \
  -H "Content-Type: application/json" \
  -d '{
    "name": "local-ollama",
    "baseURL": "http://ollama:11435",
    "type": "ollama"
  }' | jq -r '.id')
echo "Backend ID: $BACKEND_ID"
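To sanity-check the registration, fetch the backend back by its ID (this is the same GET endpoint the download loop below polls):
curl -s http://localhost:8081/backends/$BACKEND_ID | jq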
3. Assign Backend to Default Pools
# For task execution
curl -X POST http://localhost:8081/backend-associations/internal_tasks_pool/backends/$BACKEND_ID
# For embeddings
curl -X POST http://localhost:8081/backend-associations/internal_embed_pool/backends/$BACKEND_ID
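If you script this step, the two calls above can be written as a loop over the pool names, which also scales to additional backends:
for POOL in internal_tasks_pool internal_embed_pool; do
  curl -X POST http://localhost:8081/backend-associations/$POOL/backends/$BACKEND_ID
done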
4. Wait for Models to Download
EMBED_MODEL="nomic-embed-text:latest"
TASK_MODEL="qwen3:4b"
echo "Downloading models (2-5 minutes)..."
while true; do
  STATUS=$(curl -s http://localhost:8081/backends/$BACKEND_ID)
  if jq -e ".pulledModels[] | select(.model == \"$EMBED_MODEL\")" <<< "$STATUS" >/dev/null && \
     jq -e ".pulledModels[] | select(.model == \"$TASK_MODEL\")" <<< "$STATUS" >/dev/null; then
    echo "Models ready!"
    break
  fi
  sleep 10
  echo "Still downloading..."
done
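To check the pulled models manually at any point, query the same endpoint and field the loop uses:
curl -s http://localhost:8081/backends/$BACKEND_ID | jq '.pulledModels[].model'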
5. Execute a Prompt
curl -X POST http://localhost:8081/execute \
-H "Content-Type: application/json" \
-d '{"prompt": "Explain quantum computing"}'
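To capture the result for scripting (assuming the endpoint returns a JSON body; the exact response fields are described in the API documentation):
RESPONSE=$(curl -s -X POST http://localhost:8081/execute \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain quantum computing"}')
echo "$RESPONSE" | jq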
6. Create a Tasks-Chain Workflow
Save as qa.json
{
  "input": "What's the best way to optimize database queries?",
  "inputType": "string",
  "chain": {
    "id": "smart-query-assistant",
    "description": "Handles technical questions",
    "tasks": [
      {
        "id": "generate_response",
        "description": "Generate final answer",
        "handler": "raw_string",
        "systemInstruction": "You're a senior engineer. Provide concise, professional answers to technical questions.",
        "transition": {
          "branches": [
            {
              "operator": "default",
              "goto": "end"
            }
          ]
        }
      }
    ]
  }
}
Execute the workflow:
curl -X POST http://localhost:8081/tasks \
-H "Content-Type: application/json" \
-d @qa.json
All runtime activity is captured in structured logs, providing deep observability into workflows and system behavior.
Logs can be viewed using Docker:
docker logs contenox-runtime-kernel
Logs, metrics, and request traces are also pushed to Valkey.
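For example, to follow the logs live starting from the most recent entries:
docker logs -f --tail 100 contenox-runtime-kernel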
Key Features
State Machine Engine
- Conditional Branching: Logical operators route execution based on LLM outputs
- Built-in Handlers:
  - condition_key: Validate and route responses
  - parse_number: Extract numerical values
  - parse_range: Handle score ranges
  - raw_string: Standard text generation
  - embedding: Embedding generation
  - model_execution: Model execution on a chat history
  - hook: Calls a user-defined hook pointing to an external service
- Context Preservation: Automatic input/output passing between steps (see the sketch below)
- Multi-Model Support: Define preferred models for each chain task
- Retry and Timeout: Configure task-level retries and timeouts for robust workflows
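As a sketch of how steps compose, the quickstart chain can be extended to two tasks where the output of the first feeds the second. This uses only the fields already shown in qa.json and assumes that goto may target another task's id; branch operators other than default are covered in the task-chain documentation. Save it as, for example, two-step.json and POST it to /tasks as before:
{
  "input": "Our checkout API times out under load. What should we look at first?",
  "inputType": "string",
  "chain": {
    "id": "summarize-then-answer",
    "description": "Summarizes the question, then answers the summary",
    "tasks": [
      {
        "id": "summarize_question",
        "description": "Condense the question into one sentence",
        "handler": "raw_string",
        "systemInstruction": "Rephrase the user's question as a single, precise sentence.",
        "transition": {
          "branches": [
            { "operator": "default", "goto": "generate_response" }
          ]
        }
      },
      {
        "id": "generate_response",
        "description": "Generate the final answer",
        "handler": "raw_string",
        "systemInstruction": "You're a senior engineer. Answer the question concisely.",
        "transition": {
          "branches": [
            { "operator": "default", "goto": "end" }
          ]
        }
      }
    ]
  }
}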
Multi-Provider Support
Define the preferred model provider and backend resolution policy directly within task chains. This allows dynamic orchestration across multiple LLM providers; an example of registering a second provider backend follows the list below.
graph LR
A[Workflow] --> B(Ollama)
A --> C(OpenAI)
A --> D(Gemini)
A --> E(vLLM)
- Unified Interface: Consistent API across providers
- Automatic Sync: Models stay consistent across backends
- Pool Management: Assign backends to specific task types
- Backend Resolver: Distribute requests to backends based on resolution policies
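Additional backends are registered the same way as the Ollama backend in the quickstart. For instance, a vLLM server might be added and pooled like this (the type value "vllm" and the example URL are assumptions; check the API documentation for the exact provider type strings):
VLLM_ID=$(curl -s -X POST http://localhost:8081/backends \
  -H "Content-Type: application/json" \
  -d '{
    "name": "remote-vllm",
    "baseURL": "http://my-vllm-host:8000",
    "type": "vllm"
  }' | jq -r '.id')
curl -X POST http://localhost:8081/backend-associations/internal_tasks_pool/backends/$VLLM_ID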
Extensibility
Custom Hooks
Hooks are external servers that, once registered, can be called from within task chains.
They let workflows interact with systems and data outside the runtime and the task chains themselves.
See the Hook Documentation
API Documentation
The full API surface is thoroughly documented and defined in the OpenAPI format, making it easy to integrate with other tools. You can find more details here:
The tests also provide useful context alongside the API documentation.