Documentation ¶
Overview ¶
Package agenticmodel provides the model integration processor for the SemStreams agentic system: an OpenAI-compatible processor component that routes agent requests to configured LLM endpoints with retry logic and tool-calling support.
The agentic-model processor routes agent requests to OpenAI-compatible LLM endpoints. It receives AgentRequest messages from the loop orchestrator, calls the appropriate model endpoint, and publishes AgentResponse messages back. The processor supports tool calling, retry with backoff, and token tracking. Model endpoints are resolved from the unified model registry (component.Dependencies.ModelRegistry).
This processor acts as the bridge between the agentic orchestration layer and external LLM services (OpenAI, Ollama, LiteLLM, vLLM, or any OpenAI-compatible API).
Architecture ¶
The model processor sits between the loop orchestrator and external LLM services:
┌───────────────┐     ┌────────────────┐     ┌──────────────────┐
│ agentic-loop  │────▶│ agentic-model  │────▶│ LLM Endpoint     │
│               │     │   (this pkg)   │     │ (OpenAI, Ollama) │
│               │◀────│                │◀────│                  │
└───────────────┘     └────────────────┘     └──────────────────┘
  agent.request.*        HTTP/HTTPS           OpenAI-compatible
  agent.response.*                            /v1/chat/completions
Quick Start ¶
Configure the model registry in the top-level config and start the processor:
config := agenticmodel.Config{
    StreamName: "AGENT",
    Timeout:    "120s",
}
// Model endpoints are resolved from deps.ModelRegistry (set in config.model_registry)
rawConfig, _ := json.Marshal(config)
comp, err := agenticmodel.NewComponent(rawConfig, deps)
if err != nil {
    log.Fatal(err)
}
lc := comp.(component.LifecycleComponent)
if err := lc.Initialize(); err != nil {
    log.Fatal(err)
}
if err := lc.Start(ctx); err != nil {
    log.Fatal(err)
}
Endpoint Resolution ¶
When processing an AgentRequest, the processor resolves the endpoint from the unified model registry by looking up the request's Model field. Clients are created dynamically and cached for reuse.
If the resolved endpoint has SupportsTools=false, any tools in the request are stripped and a warning is logged.
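The lookup and tool-stripping behavior can be sketched as follows. The types and the resolve function here are illustrative stand-ins, not the package's actual unexported implementation:

```go
package main

import "fmt"

// Endpoint and Registry are hypothetical mirrors of the registry shapes
// described in this document; field names are assumptions.
type Endpoint struct {
	URL           string
	Model         string
	SupportsTools bool
}

type Registry struct {
	Endpoints map[string]*Endpoint
}

// resolve looks up the request's model and strips tools when the
// endpoint does not support them, logging a warning as described above.
func resolve(r *Registry, model string, tools []string) (*Endpoint, []string, error) {
	ep, ok := r.Endpoints[model]
	if !ok {
		return nil, nil, fmt.Errorf("endpoint not found: %s", model)
	}
	if !ep.SupportsTools && len(tools) > 0 {
		fmt.Printf("warning: endpoint %s does not support tools; stripping %d tool(s)\n", model, len(tools))
		tools = nil
	}
	return ep, tools, nil
}

func main() {
	reg := &Registry{Endpoints: map[string]*Endpoint{
		"gpt-4": {URL: "https://api.openai.com/v1", Model: "gpt-4", SupportsTools: true},
		"tiny":  {URL: "http://localhost:11434/v1", Model: "tiny", SupportsTools: false},
	}}
	_, tools, _ := resolve(reg, "tiny", []string{"read_file"})
	fmt.Println("tools after resolve:", len(tools))
}
```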
OpenAI Compatibility ¶
The processor uses the sashabaranov/go-openai SDK and is compatible with any API that implements the OpenAI chat completions interface:
- OpenAI API (api.openai.com)
- Azure OpenAI Service
- Ollama (with OpenAI compatibility layer)
- LiteLLM proxy
- vLLM with OpenAI server
- LocalAI
- Any OpenAI-compatible proxy
Tool Support ¶
The processor fully supports tool calling (function calling):
Incoming request with tools:
request := agentic.AgentRequest{
Model: "gpt-4",
Messages: []agentic.ChatMessage{
{Role: "user", Content: "Read the config file"},
},
Tools: []agentic.ToolDefinition{
{
Name: "read_file",
Description: "Read file contents",
Parameters: map[string]any{
"type": "object",
"properties": map[string]any{
"path": map[string]any{"type": "string"},
},
},
},
},
}
Response with tool calls:
response := agentic.AgentResponse{
Status: "tool_call",
Message: agentic.ChatMessage{
Role: "assistant",
ToolCalls: []agentic.ToolCall{
{ID: "call_001", Name: "read_file", Arguments: map[string]any{"path": "config.yaml"}},
},
},
}
The processor converts between agentic.ToolDefinition and OpenAI's function schema format automatically.
Response Status Mapping ¶
The processor maps OpenAI finish reasons to agentic status:
- "stop" → "complete" (normal completion)
- "length" → "complete" (max tokens reached)
- "tool_calls" → "tool_call" (model wants to use tools)
- Any error → "error" with error message
Retry Logic ¶
The processor implements retry using pkg/retry with exponential backoff and jitter.
- Default: 3 attempts, 1s initial delay, 60s max delay
- Tests use 100ms initial delay for fast feedback
- HTTP 429: Detected via openai.APIError.HTTPStatusCode and openai.RequestError.HTTPStatusCode. An extra rate_limit_delay (default 5s) is prepended before normal backoff begins.
- Retryable: 429, 500, 502, 503, 504, and network errors
- Non-retryable: 400, 401, 403, 404, context cancellation
Configuration:
"retry": {
"max_attempts": 3,
"initial_delay": "1s",
"max_delay": "60s",
"rate_limit_delay": "5s"
}
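The retryable/non-retryable split can be sketched as a status-code classifier. isRetryable is illustrative; the real code also inspects openai.APIError, openai.RequestError, and network errors:

```go
package main

import "fmt"

// isRetryable classifies HTTP status codes per the documented policy:
// 429 and the listed 5xx codes retry, 4xx client errors do not.
func isRetryable(status int) bool {
	switch status {
	case 429, 500, 502, 503, 504:
		return true
	default:
		return false
	}
}

func main() {
	for _, s := range []int{400, 429, 503} {
		fmt.Printf("%d retryable=%v\n", s, isRetryable(s))
	}
}
```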
Token Tracking ¶
Every response includes token usage for cost monitoring and rate limiting:
response.TokenUsage.PromptTokens     // Input tokens
response.TokenUsage.CompletionTokens // Output tokens
response.TokenUsage.Total()          // Sum of both
Token counts come directly from the LLM provider's response.
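A minimal mirror of the usage shape, assuming Total() is simply the sum of both fields (the real type lives in the agentic package):

```go
package main

import "fmt"

// TokenUsage is a hypothetical stand-in for the agentic token usage type.
type TokenUsage struct {
	PromptTokens     int
	CompletionTokens int
}

// Total sums input and output tokens.
func (u TokenUsage) Total() int { return u.PromptTokens + u.CompletionTokens }

func main() {
	u := TokenUsage{PromptTokens: 120, CompletionTokens: 35}
	fmt.Println(u.Total()) // 155
}
```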
Configuration Reference ¶
Full configuration schema (endpoints are in the top-level model_registry):
{
"timeout": "string (default: 120s)",
"stream_name": "string (default: AGENT)",
"consumer_name_suffix": "string (optional)",
"retry": {
"max_attempts": "int (default: 3)",
"initial_delay": "string (default: 1s)",
"max_delay": "string (default: 60s)",
"rate_limit_delay": "string (default: 5s)"
},
"ports": {
"inputs": [...],
"outputs": [...]
}
}
Endpoint-level fields in model_registry:
{
"url": "string",
"model": "string",
"api_key_env": "string (optional)",
"requests_per_minute": "int (0 = unlimited)",
"max_concurrent": "int (0 = unlimited)"
}
Ports ¶
Input ports (JetStream consumers):
- agent.request: Agent requests from agentic-loop (subject: agent.request.>)
Output ports (JetStream publishers):
- agent.response: Model responses to agentic-loop (subject: agent.response.*)
Message Flow ¶
The processor handles each request through:
- Receive AgentRequest from agent.request.>
- Resolve endpoint by model name
- Acquire throttle slot (rate limiter token + concurrency semaphore)
- Convert AgentRequest to OpenAI format
- Call LLM endpoint with retry logic
- Release throttle slot
- Convert OpenAI response to AgentResponse
- Publish to agent.response.{request_id}
- Acknowledge JetStream message
Client Architecture ¶
The processor dynamically creates and caches Client instances per endpoint:
client, err := NewClient(endpointConfig)
response, err := client.ChatCompletion(ctx, request)
Clients are cached by URL|Model key with mutex protection for concurrent access. Clients wrap the go-openai SDK and handle:
- API key injection from environment variables
- Request/response type conversion
- Retry with exponential backoff
- Context cancellation propagation
Error Handling ¶
Errors are returned as AgentResponse with status="error":
response := agentic.AgentResponse{
RequestID: "req_123",
Status: "error",
Error: "endpoint not found: unknown-model",
}
Error categories:
- Endpoint resolution errors: Model not found in registry
- Request validation errors: Invalid request format
- Network errors: Connection failures (may retry)
- API errors: 4xx/5xx from LLM provider
- Timeout errors: Request exceeded timeout
Environment Variables ¶
API keys are read from environment variables specified in endpoint config:
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="..."
Endpoint config:
{
"url": "https://api.openai.com/v1/chat/completions",
"model": "gpt-4",
"api_key_env": "OPENAI_API_KEY"
}
If api_key_env is not specified, requests are made without authentication (suitable for local models like Ollama).
Thread Safety ¶
The Component is safe for concurrent use after Start() is called. Multiple goroutines can process requests concurrently. Clients are cached per endpoint (keyed by URL and model) behind a mutex, so concurrent requests to the same endpoint safely share one Client instance.
Testing ¶
For testing, use the ConsumerNameSuffix config option and provide a model registry:
config := agenticmodel.Config{
StreamName: "AGENT",
ConsumerNameSuffix: "test-" + t.Name(),
}
// Provide endpoints via model registry in deps.ModelRegistry
deps.ModelRegistry = &model.Registry{
Endpoints: map[string]*model.EndpointConfig{
"test-model": {URL: mockServer.URL, Model: "test-model", MaxTokens: 128000},
},
Defaults: model.DefaultsConfig{Model: "test-model"},
}
Use httptest.Server to mock the LLM endpoint in tests.
Limitations ¶
Current limitations:
- Final responses are published as complete documents; streaming deltas (StreamChunk), when enabled, are fire-and-forget over core NATS and are not persisted
- Retry configuration (max_attempts, delays) is global, not per-endpoint
See Also ¶
Related packages:
- agentic: Shared types (AgentRequest, AgentResponse, etc.)
- processor/agentic-loop: Loop orchestration
- processor/agentic-tools: Tool execution
- github.com/sashabaranov/go-openai: OpenAI SDK
Package agenticmodel provides Prometheus metrics for the agentic-model component.
Index ¶
- func NewComponent(rawConfig json.RawMessage, deps component.Dependencies) (component.Discoverable, error)
- func Register(registry RegistryInterface) error
- type ChunkHandler
- type Client
- func (c *Client) ChatCompletion(ctx context.Context, req agentic.AgentRequest) (agentic.AgentResponse, error)
- func (c *Client) Close() error
- func (c *Client) SetAdapter(a ProviderAdapter)
- func (c *Client) SetChunkHandler(handler ChunkHandler)
- func (c *Client) SetLogger(l *slog.Logger)
- func (c *Client) SetMetrics(m *modelMetrics)
- func (c *Client) SetRetryConfig(cfg RetryConfig)
- func (c *Client) SetThrottle(t *EndpointThrottle)
- type Component
- func (c *Component) ConfigSchema() component.ConfigSchema
- func (c *Component) DataFlow() component.FlowMetrics
- func (c *Component) Health() component.HealthStatus
- func (c *Component) Initialize() error
- func (c *Component) InputPorts() []component.Port
- func (c *Component) Meta() component.Metadata
- func (c *Component) OutputPorts() []component.Port
- func (c *Component) Start(ctx context.Context) error
- func (c *Component) Stop(timeout time.Duration) error
- type Config
- type EndpointThrottle
- type GeminiAdapter
- func (a *GeminiAdapter) Name() string
- func (a *GeminiAdapter) NormalizeMessages(messages []openai.ChatCompletionMessage) []openai.ChatCompletionMessage
- func (a *GeminiAdapter) NormalizeRequest(_ *openai.ChatCompletionRequest)
- func (a *GeminiAdapter) NormalizeResponse(_ *openai.ChatCompletionResponse)
- func (a *GeminiAdapter) NormalizeStreamDelta(delta openai.ToolCall, lastIndex int) int
- type GenericAdapter
- func (a *GenericAdapter) Name() string
- func (a *GenericAdapter) NormalizeMessages(messages []openai.ChatCompletionMessage) []openai.ChatCompletionMessage
- func (a *GenericAdapter) NormalizeRequest(_ *openai.ChatCompletionRequest)
- func (a *GenericAdapter) NormalizeResponse(_ *openai.ChatCompletionResponse)
- func (a *GenericAdapter) NormalizeStreamDelta(delta openai.ToolCall, lastIndex int) int
- type OpenAIAdapter
- func (a *OpenAIAdapter) Name() string
- func (a *OpenAIAdapter) NormalizeMessages(messages []openai.ChatCompletionMessage) []openai.ChatCompletionMessage
- func (a *OpenAIAdapter) NormalizeRequest(_ *openai.ChatCompletionRequest)
- func (a *OpenAIAdapter) NormalizeResponse(_ *openai.ChatCompletionResponse)
- func (a *OpenAIAdapter) NormalizeStreamDelta(delta openai.ToolCall, _ int) int
- type ProviderAdapter
- type RegistryInterface
- type RetryConfig
- type StreamChunk
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func NewComponent ¶
func NewComponent(rawConfig json.RawMessage, deps component.Dependencies) (component.Discoverable, error)
NewComponent creates a new agentic-model processor component
func Register ¶
func Register(registry RegistryInterface) error
Register registers the agentic-model processor component with the given registry
Types ¶
type ChunkHandler ¶
type ChunkHandler func(chunk StreamChunk)
ChunkHandler is a callback for receiving streaming deltas. Implementations must be safe for concurrent use if the handler is shared.
type Client ¶
type Client struct {
// contains filtered or unexported fields
}
Client wraps OpenAI SDK for agentic model requests
func NewClient ¶
func NewClient(endpoint *model.EndpointConfig) (*Client, error)
NewClient creates a new client for the given endpoint configuration. The default retry config is suitable for unit tests (3 attempts, 100ms initial delay). Call SetRetryConfig before ChatCompletion to apply production settings.
func (*Client) ChatCompletion ¶
func (c *Client) ChatCompletion(ctx context.Context, req agentic.AgentRequest) (agentic.AgentResponse, error)
ChatCompletion sends a chat completion request with retry and throttling.
Retry strategy uses two independent backoff curves:
- Transient errors (5xx, network): exponential from InitialDelay, up to MaxAttempts
- Rate limits (429): exponential from RateLimitDelay, up to MaxRateLimitRetries
Both curves cap at MaxDelay and respect ctx cancellation at every wait point.
func (*Client) SetAdapter ¶
func (c *Client) SetAdapter(a ProviderAdapter)
SetAdapter sets the provider-specific adapter for normalizing requests and responses. When not set, buildChatRequest falls back to GenericAdapter.
func (*Client) SetChunkHandler ¶
func (c *Client) SetChunkHandler(handler ChunkHandler)
SetChunkHandler sets the callback for receiving streaming deltas.
func (*Client) SetMetrics ¶
func (c *Client) SetMetrics(m *modelMetrics)
SetMetrics sets the metrics instance for recording streaming metrics.
func (*Client) SetRetryConfig ¶
func (c *Client) SetRetryConfig(cfg RetryConfig)
SetRetryConfig replaces the default retry configuration. Call this after NewClient to apply production settings.
func (*Client) SetThrottle ¶
func (c *Client) SetThrottle(t *EndpointThrottle)
SetThrottle attaches a rate/concurrency limiter to this client.
type Component ¶
type Component struct {
// contains filtered or unexported fields
}
Component implements the agentic-model processor
func (*Component) ConfigSchema ¶
func (c *Component) ConfigSchema() component.ConfigSchema
ConfigSchema returns the configuration schema
func (*Component) DataFlow ¶
func (c *Component) DataFlow() component.FlowMetrics
DataFlow returns current data flow metrics
func (*Component) Health ¶
func (c *Component) Health() component.HealthStatus
Health returns the current health status
func (*Component) Initialize ¶
func (c *Component) Initialize() error
Initialize prepares the component (no-op for this component)
func (*Component) InputPorts ¶
func (c *Component) InputPorts() []component.Port
InputPorts returns configured input port definitions
func (*Component) OutputPorts ¶
func (c *Component) OutputPorts() []component.Port
OutputPorts returns configured output port definitions
type Config ¶
type Config struct {
Ports *component.PortConfig `json:"ports" schema:"type:ports,description:Port configuration,category:basic"`
StreamName string `` /* 132-byte string literal not displayed */
ConsumerNameSuffix string `` /* 127-byte string literal not displayed */
DeleteConsumerOnStop bool `` /* 157-byte string literal not displayed */
Timeout string `json:"timeout" schema:"type:string,description:Request timeout,category:advanced,default:120s"`
Retry RetryConfig `json:"retry" schema:"type:object,description:Retry configuration,category:advanced"`
}
Config holds configuration for agentic-model processor component. Model endpoints are resolved from the unified model registry (component.Dependencies.ModelRegistry).
func DefaultConfig ¶
func DefaultConfig() Config
DefaultConfig returns default configuration for agentic-model processor
type EndpointThrottle ¶
type EndpointThrottle struct {
// contains filtered or unexported fields
}
EndpointThrottle controls request rate to a single API endpoint.
It combines two mechanisms:
- A token bucket that limits requests per minute (if RequestsPerMinute > 0).
- A semaphore that caps concurrent in-flight requests (if MaxConcurrent > 0).
Both limits are applied on every Acquire call. Either limit may be disabled independently by setting its value to 0.
Throttle instances are shared across all clients that target the same endpoint URL + model pair, so the limits are enforced across the whole agent team.
func NewEndpointThrottle ¶
func NewEndpointThrottle(requestsPerMinute, maxConcurrent int) *EndpointThrottle
NewEndpointThrottle creates a throttle for the given limits. requestsPerMinute == 0 disables rate limiting. maxConcurrent == 0 disables concurrency limiting.
func (*EndpointThrottle) Acquire ¶
func (t *EndpointThrottle) Acquire(ctx context.Context) error
Acquire blocks until a request slot is available or the context is cancelled. The caller must call Release when the request completes.
func (*EndpointThrottle) Release ¶
func (t *EndpointThrottle) Release()
Release returns a concurrency slot to the semaphore. It is a no-op when MaxConcurrent == 0.
type GeminiAdapter ¶
type GeminiAdapter struct{}
GeminiAdapter normalizes payloads for Google's Gemini OpenAI-compatible endpoint. Gemini's endpoint is broadly compatible but has several quirks that cause 400 errors.
func (*GeminiAdapter) NormalizeMessages ¶
func (a *GeminiAdapter) NormalizeMessages(messages []openai.ChatCompletionMessage) []openai.ChatCompletionMessage
NormalizeMessages fixes two Gemini-specific message constraints:
Tool result messages require a non-empty name field. Without it: 400 INVALID_ARGUMENT: function_response.name cannot be empty.
Assistant messages with tool_calls require a non-empty content field. Gemini rejects a completely absent content — single space is the conventional workaround (used by LiteLLM, OpenAI proxy, etc.).
func (*GeminiAdapter) NormalizeRequest ¶
func (a *GeminiAdapter) NormalizeRequest(_ *openai.ChatCompletionRequest)
NormalizeRequest is a no-op for Gemini; all quirks are message-level.
func (*GeminiAdapter) NormalizeResponse ¶
func (a *GeminiAdapter) NormalizeResponse(_ *openai.ChatCompletionResponse)
NormalizeResponse is a no-op for Gemini; all quirks are on the request side.
func (*GeminiAdapter) NormalizeStreamDelta ¶
func (a *GeminiAdapter) NormalizeStreamDelta(delta openai.ToolCall, lastIndex int) int
NormalizeStreamDelta infers the tool call index when Gemini omits it. Gemini streaming deltas never include an index field. Instead:
- A non-empty ID signals the start of a new tool call → return -1 (sentinel: caller must allocate the next available index via nextToolIndex).
- An empty ID is an argument continuation → reuse lastIndex.
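The inference rule above reduces to a few lines (inferIndex is an illustrative distillation, not the adapter's actual method):

```go
package main

import "fmt"

// inferIndex applies the documented rule: a non-empty ID starts a new
// tool call (-1 sentinel, caller allocates the next index), and an
// empty ID continues the previous call at lastIndex.
func inferIndex(id string, lastIndex int) int {
	if id != "" {
		return -1
	}
	return lastIndex
}

func main() {
	fmt.Println(inferIndex("call_001", 0)) // -1 (new tool call)
	fmt.Println(inferIndex("", 2))         // 2 (continuation)
}
```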
type GenericAdapter ¶
type GenericAdapter struct{}
GenericAdapter applies cross-provider safe normalizations that are either required by multiple providers or harmless for all known providers. It is the fallback when no provider-specific adapter is registered.
func (*GenericAdapter) NormalizeMessages ¶
func (a *GenericAdapter) NormalizeMessages(messages []openai.ChatCompletionMessage) []openai.ChatCompletionMessage
NormalizeMessages applies normalizations that are safe across all providers:
Tool result messages get a non-empty name field. The name field is optional in the OpenAI spec but required by Gemini. Setting it universally is harmless.
Assistant messages with tool_calls get a non-empty content field. Gemini rejects absent content; setting it to a single space is a widely-used convention (LiteLLM, OpenAI proxy, etc.) and accepted by all known providers.
reasoning_content omission is handled structurally during message conversion (the field is never copied into the outgoing openai.ChatCompletionMessage).
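The two message fixes can be sketched over a minimal message shape (Message is a stand-in for openai.ChatCompletionMessage with only the touched fields; the placeholder tool name is an assumption):

```go
package main

import "fmt"

// Message carries only the fields the normalizations touch.
type Message struct {
	Role      string
	Name      string
	Content   string
	ToolCalls []string
}

// normalize applies the documented fixes: tool messages get a non-empty
// name, and assistant messages with tool_calls get a single space when
// content is empty.
func normalize(msgs []Message) []Message {
	for i := range msgs {
		if msgs[i].Role == "tool" && msgs[i].Name == "" {
			msgs[i].Name = "tool" // placeholder; real code may use the call's function name
		}
		if msgs[i].Role == "assistant" && len(msgs[i].ToolCalls) > 0 && msgs[i].Content == "" {
			msgs[i].Content = " "
		}
	}
	return msgs
}

func main() {
	out := normalize([]Message{
		{Role: "assistant", ToolCalls: []string{"call_1"}},
		{Role: "tool"},
	})
	fmt.Printf("%q %q\n", out[0].Content, out[1].Name)
}
```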
func (*GenericAdapter) NormalizeRequest ¶
func (a *GenericAdapter) NormalizeRequest(_ *openai.ChatCompletionRequest)
NormalizeRequest is a no-op for the generic adapter.
func (*GenericAdapter) NormalizeResponse ¶
func (a *GenericAdapter) NormalizeResponse(_ *openai.ChatCompletionResponse)
NormalizeResponse is a no-op for the generic adapter.
func (*GenericAdapter) NormalizeStreamDelta ¶
func (a *GenericAdapter) NormalizeStreamDelta(delta openai.ToolCall, lastIndex int) int
NormalizeStreamDelta infers the tool call index when the provider omits it. When an explicit index is provided, it is used directly. When absent, a non-empty ID signals a new tool call (return -1 sentinel so the accumulator allocates the next index), and an empty ID is an argument continuation (reuse lastIndex). This matches the behavior required by Gemini and is harmless for providers that always supply an explicit index.
type OpenAIAdapter ¶
type OpenAIAdapter struct{}
OpenAIAdapter handles OpenAI-specific features. OpenAI's endpoint is the reference implementation — most fields behave as documented, so this adapter is mostly a no-op extension point.
func (*OpenAIAdapter) NormalizeMessages ¶
func (a *OpenAIAdapter) NormalizeMessages(messages []openai.ChatCompletionMessage) []openai.ChatCompletionMessage
NormalizeMessages returns the messages unchanged; OpenAI requires no quirk fixes.
func (*OpenAIAdapter) NormalizeRequest ¶
func (a *OpenAIAdapter) NormalizeRequest(_ *openai.ChatCompletionRequest)
NormalizeRequest is a no-op for OpenAI; reasoning_effort is set directly from endpoint config in buildChatRequest and handled natively by OpenAI.
func (*OpenAIAdapter) NormalizeResponse ¶
func (a *OpenAIAdapter) NormalizeResponse(_ *openai.ChatCompletionResponse)
NormalizeResponse is a no-op for OpenAI.
func (*OpenAIAdapter) NormalizeStreamDelta ¶
func (a *OpenAIAdapter) NormalizeStreamDelta(delta openai.ToolCall, _ int) int
NormalizeStreamDelta uses the explicit index OpenAI always provides.
type ProviderAdapter ¶
type ProviderAdapter interface {
// Name returns the provider identifier (e.g., "gemini", "openai").
Name() string
// NormalizeRequest adjusts the ChatCompletionRequest before sending.
// Called after the generic request is built, before the HTTP call.
NormalizeRequest(req *openai.ChatCompletionRequest)
// NormalizeMessages adjusts the message array before sending.
// Called during request building for message-level fixes.
NormalizeMessages(messages []openai.ChatCompletionMessage) []openai.ChatCompletionMessage
// NormalizeStreamDelta adjusts a streaming tool call delta.
// Returns the corrected tool call index, or -1 as a sentinel meaning
// "allocate the next available index" (used when the provider omits it).
NormalizeStreamDelta(delta openai.ToolCall, lastIndex int) int
// NormalizeResponse adjusts the ChatCompletionResponse after receiving.
// Called before the response is converted to AgentResponse.
NormalizeResponse(resp *openai.ChatCompletionResponse)
}
ProviderAdapter normalizes request/response payloads for a specific LLM provider's OpenAI-compatible endpoint. Adapters handle quirks that would otherwise cause 400 errors or silent data corruption.
func AdapterFor ¶
func AdapterFor(provider string) ProviderAdapter
AdapterFor returns the appropriate adapter for the given provider name. Falls back to GenericAdapter for unknown providers.
type RegistryInterface ¶
type RegistryInterface interface {
RegisterWithConfig(component.RegistrationConfig) error
}
RegistryInterface defines the minimal interface needed for registration
type RetryConfig ¶
type RetryConfig struct {
MaxAttempts int `` /* 145-byte string literal not displayed */
MaxRateLimitRetries int `` /* 171-byte string literal not displayed */
Backoff string `` /* 139-byte string literal not displayed */
InitialDelay string `` /* 132-byte string literal not displayed */
MaxDelay string `` /* 155-byte string literal not displayed */
RateLimitDelay string `` /* 155-byte string literal not displayed */
}
RetryConfig holds retry configuration
func (*RetryConfig) Validate ¶
func (r *RetryConfig) Validate() error
Validate checks the retry configuration for errors
type StreamChunk ¶
type StreamChunk struct {
RequestID string `json:"request_id"`
ContentDelta string `json:"content_delta,omitempty"`
ReasoningDelta string `json:"reasoning_delta,omitempty"`
Done bool `json:"done,omitempty"`
}
StreamChunk represents a single streaming delta for real-time monitoring. Chunks are ephemeral — published via core NATS (fire-and-forget), not JetStream.