inference

package
v0.8.1
Warning

This package is not in the latest version of its module.

Published: Apr 24, 2026 License: Apache-2.0 Imports: 25 Imported by: 0

Documentation


Constants

This section is empty.

Variables

var ErrDeploymentExists = errors.New("inference: deployment already exists")

ErrDeploymentExists is returned by Create when a deployment with the same name already exists and --force was not specified.

var ErrDeploymentNotFound = errors.New("inference: deployment not found")

ErrDeploymentNotFound is returned when a named inference deployment does not exist in the store.

Functions

func DetectServerType added in v0.8.1

func DetectServerType(ctx context.Context, baseURL string) string

DetectServerType probes baseURL to determine the server software. Returns "ollama", "llama-server", "openai-compat", or "".
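The detection logic can be sketched as probing server-specific routes and falling back to the generic OpenAI-compatible one. The probe paths and their order below are assumptions for illustration, not necessarily the package's actual logic.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// classify maps the set of probe paths that answered 200 OK to a server
// type string. Assumed signals: Ollama exposes /api/tags, llama.cpp's
// llama-server exposes /health, and any OpenAI-compatible server
// exposes /v1/models.
func classify(ok map[string]bool) string {
	switch {
	case ok["/api/tags"]:
		return "ollama"
	case ok["/health"]:
		return "llama-server"
	case ok["/v1/models"]:
		return "openai-compat"
	}
	return ""
}

// probe checks each candidate path on baseURL with a short timeout.
func probe(baseURL string) map[string]bool {
	client := &http.Client{Timeout: 2 * time.Second}
	ok := make(map[string]bool)
	for _, p := range []string{"/api/tags", "/health", "/v1/models"} {
		resp, err := client.Get(baseURL + p)
		if err != nil {
			continue
		}
		resp.Body.Close()
		ok[p] = resp.StatusCode == http.StatusOK
	}
	return ok
}

func main() {
	// Fake Ollama: only /api/tags responds with 200.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/api/tags" {
			http.NotFound(w, r)
		}
	}))
	defer srv.Close()
	fmt.Println(classify(probe(srv.URL))) // prints "ollama"
}
```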

func FormatEndpointDisplay added in v0.8.1

func FormatEndpointDisplay(endpoints []EndpointInfo) string

FormatEndpointDisplay pretty-prints a list of discovered endpoints.

func ValidateName added in v0.8.1

func ValidateName(name string) error

ValidateName checks that name is a safe deployment identifier. It rejects empty strings, path traversal attempts, and shell metacharacters.
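A minimal sketch of the kind of checks the doc comment describes; the exact rule set of the package's ValidateName may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateName rejects empty names, path traversal, and shell
// metacharacters, as described above. The character lists are
// illustrative assumptions.
func validateName(name string) error {
	if name == "" {
		return errors.New("name must not be empty")
	}
	if strings.Contains(name, "..") || strings.ContainsAny(name, "/\\") {
		return errors.New("name must not contain path separators or traversal")
	}
	if strings.ContainsAny(name, "$`;|&<>*?!\"' \t\n") {
		return errors.New("name must not contain shell metacharacters or whitespace")
	}
	return nil
}

func main() {
	for _, n := range []string{"my-model", "../escape", "a;rm -rf"} {
		fmt.Printf("%-12q %v\n", n, validateName(n))
	}
}
```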

Types

type Client added in v0.8.1

type Client struct {
	// GatewayURL is the base URL of the inference gateway (e.g. "http://localhost:8402").
	GatewayURL string

	// HTTP is the underlying transport.  Defaults to http.DefaultTransport.
	HTTP http.RoundTripper
	// contains filtered or unexported fields
}

Client is an http.RoundTripper that transparently encrypts request bodies to an Obol SE inference gateway and optionally decrypts encrypted responses.

The SE public key is fetched lazily on first use and cached for the lifetime of the Client.

func NewClient added in v0.8.1

func NewClient(ctx context.Context, gatewayURL string) (*Client, error)

NewClient creates a Client targeting the given gateway URL and eagerly fetches the SE public key so the first request does not block on the fetch.

func (*Client) Do added in v0.8.1

func (c *Client) Do(req *http.Request) (*http.Response, error)

Do sends req using the client's transport (with SE encryption applied). It is a convenience wrapper around RoundTrip that matches http.Client.Do's signature.

func (*Client) EnableEncryptedReplies added in v0.8.1

func (c *Client) EnableEncryptedReplies(tag string) error

EnableEncryptedReplies generates an ephemeral local key. Once enabled, the client attaches an X-Obol-Reply-Pubkey header to every encrypted request so the gateway encrypts the response back to this key, and Do decrypts it transparently before returning.

On non-darwin builds this returns enclave.ErrNotSupported because the decryption half requires the SE; encryption (for the request) is always available.

func (*Client) Pubkey added in v0.8.1

func (c *Client) Pubkey() []byte

Pubkey returns the cached SE public key bytes (65-byte uncompressed P-256). Returns nil if the key has not been fetched yet.

func (*Client) RoundTrip added in v0.8.1

func (c *Client) RoundTrip(req *http.Request) (*http.Response, error)

RoundTrip implements http.RoundTripper. If the request has a non-nil body, it is encrypted to the gateway's SE public key and the Content-Type is set to application/x-obol-encrypted. Non-body requests (GET, HEAD, etc.) are forwarded as-is.
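The body-rewriting contract can be sketched with a stand-in transform in place of real SE encryption. Everything below (the seal function, the type name) is illustrative, not the package's implementation; only the Content-Type value and the "bodiless requests pass through" behaviour come from the docs above.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// seal stands in for encryption to the gateway's SE public key.
func seal(plain []byte) []byte {
	return append([]byte("sealed:"), plain...)
}

// sealingTransport rewrites bodied requests and forwards the rest as-is.
type sealingTransport struct{ next http.RoundTripper }

func (t *sealingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if req.Body == nil {
		return t.next.RoundTrip(req) // GET, HEAD, etc. pass through untouched
	}
	plain, err := io.ReadAll(req.Body)
	req.Body.Close()
	if err != nil {
		return nil, err
	}
	boxed := seal(plain)
	// Clone before mutating: RoundTrippers must not modify the caller's request.
	out := req.Clone(req.Context())
	out.Body = io.NopCloser(bytes.NewReader(boxed))
	out.ContentLength = int64(len(boxed))
	out.Header.Set("Content-Type", "application/x-obol-encrypted")
	return t.next.RoundTrip(out)
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		fmt.Println(r.Header.Get("Content-Type"), string(body))
	}))
	defer srv.Close()

	c := &http.Client{Transport: &sealingTransport{next: http.DefaultTransport}}
	resp, err := c.Post(srv.URL, "application/json", strings.NewReader(`{"prompt":"hi"}`))
	if err == nil {
		resp.Body.Close()
	}
}
```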

type ContainerManager added in v0.8.1

type ContainerManager struct {
	// contains filtered or unexported fields
}

ContainerManager manages an Ollama Linux container using the apple/container CLI (github.com/apple/container v0.9.0+).

The container runs Ollama on its internal port 11434, mapped to a host-local port (default 11435) so only the gateway process can reach it. No bridge to the external network is provided — the container can receive inference requests from the gateway but cannot initiate outbound connections.

Install the CLI before use:

curl -L -o /tmp/container-installer-signed.pkg \
  https://github.com/apple/container/releases/download/0.9.0/container-installer-signed.pkg
sudo installer -pkg /tmp/container-installer-signed.pkg -target /

func (*ContainerManager) EnsureSystemRunning added in v0.8.1

func (m *ContainerManager) EnsureSystemRunning(ctx context.Context) error

EnsureSystemRunning starts the container system daemon if it is not already active. Safe to call when the daemon is already running.

func (*ContainerManager) Start added in v0.8.1

func (m *ContainerManager) Start(ctx context.Context, image string, cpus, memoryMB int) error

Start pulls the OCI image (if not cached) and runs the Ollama container, then blocks until the Ollama API responds or ctx is cancelled.

Any stale container with the same name is removed before starting.

func (*ContainerManager) Stop added in v0.8.1

func (m *ContainerManager) Stop(ctx context.Context) error

Stop gracefully stops and removes the named container. Returns nil if the container does not exist.

func (*ContainerManager) UpstreamURL added in v0.8.1

func (m *ContainerManager) UpstreamURL() string

UpstreamURL returns the URL where the running Ollama can be reached from the host.

type Deployment added in v0.8.1

type Deployment struct {
	// Name is the human-readable identifier for this deployment.
	// Used as the keychain tag suffix and directory name.
	Name string `json:"name"`

	// EnclaveTag is the macOS keychain application tag for the SE key.
	// Derived from Name if not explicitly set:
	//   "com.obol.inference.<name>"
	EnclaveTag string `json:"enclave_tag"`

	// ListenAddr is the gateway listen address (default ":8402").
	ListenAddr string `json:"listen_addr"`

	// UpstreamURL is the inference backend URL (default "http://localhost:11434").
	UpstreamURL string `json:"upstream_url"`

	// WalletAddress is the USDC payment recipient.
	WalletAddress string `json:"wallet_address"`

	// PricePerRequest is the USDC price per inference call (default "0.001").
	PricePerRequest string `json:"price_per_request"`

	// PricePerMTok is the original per-million-token price when request pricing
	// was derived from the temporary phase-1 approximation.
	PricePerMTok string `json:"price_per_mtok,omitempty"`

	// ApproxTokensPerRequest records the fixed approximation used to derive the
	// charged request price from PricePerMTok.
	ApproxTokensPerRequest int `json:"approx_tokens_per_request,omitempty"`

	// Chain is the x402 payment chain name (e.g. "base", "base-sepolia").
	Chain string `json:"chain"`

	// FacilitatorURL is the x402 facilitator URL.
	FacilitatorURL string `json:"facilitator_url"`

	// VMMode enables running the upstream inference engine inside an Apple
	// Containerization Linux micro-VM instead of pointing at an existing
	// Ollama process.  Requires the apple/container CLI to be installed.
	// See: https://github.com/apple/container
	VMMode bool `json:"vm_mode,omitempty"`

	// VMImage is the OCI image to run (default "ollama/ollama:latest").
	VMImage string `json:"vm_image,omitempty"`

	// VMCPUs is the number of vCPUs to allocate to the VM (default 4).
	VMCPUs int `json:"vm_cpus,omitempty"`

	// VMMemoryMB is the RAM to allocate to the VM in MiB (default 8192).
	VMMemoryMB int `json:"vm_memory_mb,omitempty"`

	// VMHostPort is the host-local port mapped to Ollama's 11434 inside the
	// container (default 11435).  Must not conflict with other deployments.
	VMHostPort int `json:"vm_host_port,omitempty"`

	// TEEType is the Linux TEE backend ("tdx", "snp", "nitro", "stub").
	// Empty means macOS Secure Enclave mode.
	// Mutually exclusive with EnclaveTag-based SE mode on macOS.
	TEEType string `json:"tee_type,omitempty"`

	// ModelHash is the hex-encoded SHA-256 of the model being served.
	// Required when TEEType is set. Bound into the TEE attestation user_data.
	ModelHash string `json:"model_hash,omitempty"`

	// NoPaymentGate disables the built-in x402 payment middleware when the
	// gateway runs behind the cluster's x402 verifier to avoid double-gating.
	NoPaymentGate bool `json:"no_payment_gate,omitempty"`

	// Provenance holds optional metadata about how the model was produced
	// (e.g. autoresearch experiment results). Stored alongside the deployment
	// config and passed to the registration document when selling.
	Provenance *Provenance `json:"provenance,omitempty"`

	// CreatedAt is the RFC3339 timestamp of when this deployment was created.
	CreatedAt string `json:"created_at"`

	// UpdatedAt is the RFC3339 timestamp of the most recent update.
	UpdatedAt string `json:"updated_at,omitempty"`
}

Deployment is a named, persisted inference gateway configuration: a long-lived entity with a stable identity (its SE public key) and configurable parameters.

type EndpointInfo added in v0.8.1

type EndpointInfo struct {
	Host       string
	Port       int
	ServerType string
	Models     []ModelInfo
	Healthy    bool
}

EndpointInfo describes a discovered local inference endpoint.

func ProbeEndpoint added in v0.8.1

func ProbeEndpoint(host string, port int) (*EndpointInfo, error)

ProbeEndpoint queries host:port/v1/models and returns the discovered endpoint info.

func ProbeEndpointContext added in v0.8.1

func ProbeEndpointContext(ctx context.Context, host string, port int) (*EndpointInfo, error)

ProbeEndpointContext is the context-aware version of ProbeEndpoint. It creates a shared HTTP client used for both server type detection and model fetching to avoid redundant connections.

func ScanLocalEndpoints added in v0.8.1

func ScanLocalEndpoints() ([]EndpointInfo, error)

ScanLocalEndpoints probes all common local ports and returns any that respond.

func ScanLocalEndpointsContext added in v0.8.1

func ScanLocalEndpointsContext(ctx context.Context) ([]EndpointInfo, error)

ScanLocalEndpointsContext probes common ports concurrently with context support. All ports are probed in parallel using goroutines; results are collected and returned in the same order as commonPorts.

func (EndpointInfo) BaseURL added in v0.8.1

func (e EndpointInfo) BaseURL() string

BaseURL returns the HTTP base URL for this endpoint.

type Gateway

type Gateway struct {
	// contains filtered or unexported fields
}

Gateway is an x402-enabled reverse proxy for LLM inference with optional Secure Enclave or TEE request encryption and optional container-isolated upstream.

func NewGateway

func NewGateway(cfg GatewayConfig) (*Gateway, error)

NewGateway creates a new inference gateway with the given configuration.

func (*Gateway) Start

func (g *Gateway) Start() error

Start begins serving the gateway. Blocks until the server is shut down.

func (*Gateway) Stop

func (g *Gateway) Stop() error

Stop gracefully shuts down the gateway and any managed container.

type GatewayConfig

type GatewayConfig struct {
	// ListenAddr is the address to listen on (e.g., ":8402").
	ListenAddr string

	// UpstreamURL is the upstream inference service URL (e.g., "http://localhost:11434").
	UpstreamURL string

	// WalletAddress is the USDC recipient address for payments.
	WalletAddress string

	// PricePerRequest is the USDC amount charged per inference request (e.g., "0.001").
	PricePerRequest string

	// Chain is the x402 chain configuration (e.g., x402pkg.ChainBaseMainnet).
	Chain x402pkg.ChainInfo

	// FacilitatorURL is the x402 facilitator service URL.
	FacilitatorURL string

	// VerifyOnly skips blockchain settlement after successful verification.
	// Useful for testing and staging environments where no real funds are involved.
	VerifyOnly bool

	// EnclaveTag is the macOS Secure Enclave keychain application tag used for
	// request decryption.  When non-empty the gateway enables two additional
	// behaviours:
	//
	//   1. GET /v1/enclave/pubkey — returns the SE public key as JSON so that
	//      clients can encrypt their request bodies.
	//
	//   2. Inference endpoints accept Content-Type: application/x-obol-encrypted
	//      bodies.  The gateway decrypts them via the SE private key before
	//      forwarding to the upstream service.  If the request also contains a
	//      X-Obol-Reply-Pubkey header, the response is re-encrypted to the
	//      client's ephemeral key (end-to-end confidentiality).
	//
	// When empty, all enclave functionality is disabled and the gateway
	// operates in plain x402-only mode.
	EnclaveTag string

	// VMMode enables running the upstream inference engine inside an Apple
	// Containerization Linux micro-VM via the apple/container CLI.
	// When true, the gateway starts the container on Start() and stops it on
	// Stop(), overriding UpstreamURL with the container's mapped local port.
	VMMode bool

	// VMImage is the OCI image to run (default "ollama/ollama:latest").
	VMImage string

	// VMCPUs is the number of vCPUs to allocate (default 4).
	VMCPUs int

	// VMMemoryMB is the RAM to allocate in MiB (default 8192).
	VMMemoryMB int

	// VMHostPort is the host-local port mapped from the container's Ollama
	// port 11434 (default 11435).
	VMHostPort int

	// VMBinary is the path to the container CLI binary.
	// Defaults to "container" (PATH lookup).
	VMBinary string

	// TEEType specifies the Linux TEE backend. When non-empty, the gateway
	// uses internal/tee instead of internal/enclave for key management.
	// Valid values: "tdx", "snp", "nitro", "stub".
	// Mutually exclusive with EnclaveTag.
	TEEType string

	// ModelHash is the hex-encoded SHA-256 of the model being served.
	// Required when TEEType is set. Bound into the TEE attestation user_data
	// so verifiers can confirm the model identity.
	ModelHash string

	// NoPaymentGate disables the built-in x402 payment middleware. Use this
	// when the gateway runs behind the cluster's x402 verifier (via Traefik
	// ForwardAuth) to avoid double-gating requests. Enclave/TEE encryption
	// middleware remains active when enabled.
	NoPaymentGate bool
}

GatewayConfig holds configuration for the x402 inference gateway.

type ModelInfo added in v0.8.1

type ModelInfo struct {
	ID      string `json:"id"`
	OwnedBy string `json:"owned_by"`
	Created int64  `json:"created"`
}

ModelInfo describes a single model exposed by an inference server.

func ParseModelsResponse added in v0.8.1

func ParseModelsResponse(data []byte) ([]ModelInfo, error)

ParseModelsResponse parses raw JSON bytes into a slice of ModelInfo. Exported for testing.
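A sketch of decoding an OpenAI-style GET /v1/models body, which wraps the model list in a "data" array. The local types and function below mirror the names above but are illustrative, not the package's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelInfo mirrors the ModelInfo type above.
type modelInfo struct {
	ID      string `json:"id"`
	OwnedBy string `json:"owned_by"`
	Created int64  `json:"created"`
}

// parseModelsResponse unwraps the "data" array of an OpenAI-style
// models listing.
func parseModelsResponse(data []byte) ([]modelInfo, error) {
	var wrapper struct {
		Data []modelInfo `json:"data"`
	}
	if err := json.Unmarshal(data, &wrapper); err != nil {
		return nil, err
	}
	return wrapper.Data, nil
}

func main() {
	raw := []byte(`{"object":"list","data":[{"id":"llama3.2","owned_by":"library","created":1719000000}]}`)
	models, err := parseModelsResponse(raw)
	fmt.Println(models[0].ID, err) // llama3.2 <nil>
}
```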

type Provenance added in v0.8.1

type Provenance struct {
	Framework    string `json:"framework,omitempty"`    // e.g. "autoresearch"
	MetricName   string `json:"metricName,omitempty"`   // e.g. "val_bpb"
	MetricValue  string `json:"metricValue,omitempty"`  // e.g. "0.9973"
	ExperimentID string `json:"experimentId,omitempty"` // commit hash or UUID
	TrainHash    string `json:"trainHash,omitempty"`    // e.g. "sha256:..."
	ParamCount   string `json:"paramCount,omitempty"`   // e.g. "50000000"
}

Provenance tracks how a model or service was produced. JSON field names use camelCase so the same document can flow through publish.py -> --provenance-file -> ServiceOffer -> agent-registration.json.

type Store added in v0.8.1

type Store struct {
	// contains filtered or unexported fields
}

Store manages named inference deployment configurations on disk. Layout: <configDir>/inference/<name>/config.json
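The documented layout reduces to a simple path join; a sketch, with an example config directory:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// configPath reproduces the documented on-disk layout:
// <configDir>/inference/<name>/config.json.
func configPath(configDir, name string) string {
	return filepath.Join(configDir, "inference", name, "config.json")
}

func main() {
	fmt.Println(configPath("/home/user/.obol", "demo"))
	// /home/user/.obol/inference/demo/config.json
}
```

This is also why ValidateName rejects path separators and traversal: the deployment name becomes a directory component.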

func NewStore added in v0.8.1

func NewStore(configDir string) *Store

NewStore returns a Store rooted at configDir.

func (*Store) Create added in v0.8.1

func (s *Store) Create(d *Deployment, force bool) error

Create persists a new Deployment. Returns ErrDeploymentExists if a deployment with that name is already stored and force is false.

func (*Store) Delete added in v0.8.1

func (s *Store) Delete(name string) error

Delete removes a deployment's config directory from disk. The SE key in the keychain is NOT deleted by this method — call enclave.DeleteKey(d.EnclaveTag) separately if desired.

func (*Store) Get added in v0.8.1

func (s *Store) Get(name string) (*Deployment, error)

Get loads a Deployment by name. Returns ErrDeploymentNotFound if missing.

func (*Store) List added in v0.8.1

func (s *Store) List() ([]*Deployment, error)

List returns all stored deployments, sorted by name.

func (*Store) Update added in v0.8.1

func (s *Store) Update(d *Deployment) error

Update persists changes to an existing Deployment.
