Documentation
Overview ¶
Package llm defines the AgentRunner abstraction used by Engram to delegate semantic comparison of observation pairs to an external LLM CLI tool (e.g., Claude Code or OpenCode). It ships concrete runner implementations and a factory that selects the runner via the ENGRAM_AGENT_CLI env var.
The package is a strict boundary: only cmd/engram/conflicts.go and internal/store/relations.go are permitted to import it. No other package in the Engram codebase imports internal/llm.
Index ¶
Constants ¶
const EstimatedInputTokensPerPair = 300
EstimatedInputTokensPerPair is the estimated number of input tokens consumed per observation pair comparison call. This value was calibrated from live runs against claude haiku with the locked canonical prompt template.
LOCKED: Do not change without updating cost warning output and design docs.
const EstimatedOutputTokensPerPair = 50
EstimatedOutputTokensPerPair is the estimated number of output tokens produced per observation pair comparison call. The single-line JSON verdict is compact by design (Relation + Confidence + short Reasoning ≤ 200 chars).
LOCKED: Do not change without updating cost warning output and design docs.
Variables ¶
var (
	// ErrCLINotInstalled is returned when the agent CLI binary is not found in PATH.
	ErrCLINotInstalled = errors.New("agent CLI binary not found in PATH")

	// ErrCLIAuthMissing is returned when the agent CLI is installed but not authenticated.
	ErrCLIAuthMissing = errors.New("agent CLI is not authenticated")

	// ErrTimeout is returned when the agent CLI call exceeds the configured per-pair timeout.
	ErrTimeout = errors.New("agent CLI call exceeded timeout")

	// ErrInvalidJSON is returned when the agent CLI returns output that cannot be parsed
	// as the expected JSON envelope or Verdict JSON.
	ErrInvalidJSON = errors.New("agent CLI returned malformed JSON")

	// ErrUnknownRelation is returned when the LLM verdict contains a relation verb
	// that is not in the locked vocabulary
	// (conflicts_with | supersedes | scoped | related | compatible | not_conflict).
	ErrUnknownRelation = errors.New("agent returned a relation outside the locked vocabulary")
)
var ErrInvalidRunnerName = errors.New("invalid runner name")
ErrInvalidRunnerName is returned by NewRunner when the name argument does not match a known runner identifier ("claude" | "opencode").
Functions ¶
func BuildPrompt ¶
func BuildPrompt(a, b ObservationSnippet) string
BuildPrompt renders the locked canonical prompt for a pair of observations. The returned string is a single prompt ready to be passed to AgentRunner.Compare.
func EstimateScanCost ¶
func EstimateScanCost(pairCount int) (inputTokens, outputTokens int)
EstimateScanCost returns the estimated total input and output token counts for a semantic scan over pairCount observation pairs.
These are estimates for user-facing cost warnings. Actual token usage depends on observation content length and model variability.
Returns (inputTokens, outputTokens).
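The arithmetic behind these estimates can be sketched with local stand-ins for the locked constants. This is a minimal sketch of the documented behavior (multiply the per-pair estimates by the pair count), not the package's actual implementation:

```go
package main

import "fmt"

// Local stand-ins for the package's locked constants.
const (
	estimatedInputTokensPerPair  = 300
	estimatedOutputTokensPerPair = 50
)

// estimateScanCost multiplies the per-pair estimates by the number of pairs.
func estimateScanCost(pairCount int) (inputTokens, outputTokens int) {
	return pairCount * estimatedInputTokensPerPair, pairCount * estimatedOutputTokensPerPair
}

func main() {
	in, out := estimateScanCost(1000)
	fmt.Println(in, out) // 300000 50000
}
```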
Types ¶
type AgentRunner ¶
type AgentRunner interface {
// Compare sends prompt to the underlying LLM CLI and returns a structured
// Verdict with the semantic relation between two observations.
// On error, the returned Verdict is the zero value.
Compare(ctx context.Context, prompt string) (Verdict, error)
}
AgentRunner is the abstraction over an external LLM CLI that performs semantic comparison of two observations. Each runner implementation shells out to a specific CLI tool, parses its output, and returns a structured Verdict.
func NewRunner ¶
func NewRunner(name string) (AgentRunner, error)
NewRunner returns an AgentRunner for the given runner name. Supported values:
- "claude" → *ClaudeRunner (shells out to the claude CLI)
- "opencode" → *OpenCodeRunner (shells out to the opencode CLI)
For any other value, including the empty string, a descriptive error is returned that names the ENGRAM_AGENT_CLI environment variable and the supported values.
Typical usage (reading from the environment):
runner, err := llm.NewRunner(os.Getenv("ENGRAM_AGENT_CLI"))
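The factory's selection logic can be sketched as follows. The stand-in returns runner names as strings rather than concrete runner values, and the exact error wording is an assumption; only the supported names and the mention of ENGRAM_AGENT_CLI come from the documentation above:

```go
package main

import (
	"fmt"
	"os"
)

// newRunner sketches the documented selection: "claude" and "opencode" are
// valid; anything else (including "") yields a descriptive error that names
// the ENGRAM_AGENT_CLI environment variable.
func newRunner(name string) (string, error) {
	switch name {
	case "claude":
		return "ClaudeRunner", nil
	case "opencode":
		return "OpenCodeRunner", nil
	default:
		return "", fmt.Errorf(
			"invalid runner name %q: set ENGRAM_AGENT_CLI to \"claude\" or \"opencode\"", name)
	}
}

func main() {
	runner, err := newRunner(os.Getenv("ENGRAM_AGENT_CLI"))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("selected:", runner)
}
```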
type ClaudeRunner ¶
type ClaudeRunner struct {
// contains filtered or unexported fields
}
ClaudeRunner implements AgentRunner by shelling out to the `claude` CLI. It invokes:
claude -p <prompt> --output-format json --model haiku --max-turns 1
and parses the JSON envelope that the CLI emits with --output-format json.
func NewClaudeRunner ¶
func NewClaudeRunner() *ClaudeRunner
NewClaudeRunner constructs a ClaudeRunner with the real exec.CommandContext implementation. Tests should inject a fake via the struct field directly.
func (*ClaudeRunner) Compare ¶
func (r *ClaudeRunner) Compare(ctx context.Context, prompt string) (Verdict, error)
Compare sends prompt to the Claude CLI and returns a structured Verdict. Invokes:
claude -p --output-format json --model haiku --max-turns 1
Claude's JSON envelope format:
{
"type": "result",
"result": "<inner JSON or fence-wrapped JSON>",
"total_cost_usd": ...,
"modelUsage": { "<model-id>": { "input_tokens": N, "output_tokens": N } },
"duration_ms": N
}
The inner result is parsed as a Verdict JSON object (possibly wrapped in markdown code fences, which are stripped before parsing).
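The envelope handling described above can be sketched as below. The envelope struct mirrors the documented wrapper fields; the inner verdict's JSON field names (relation, confidence) are assumptions, since this documentation does not show the Verdict wire format:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// envelope mirrors a subset of the documented Claude JSON wrapper.
type envelope struct {
	Type   string `json:"type"`
	Result string `json:"result"`
}

// verdictJSON uses hypothetical field tags for the inner Verdict object.
type verdictJSON struct {
	Relation   string  `json:"relation"`
	Confidence float64 `json:"confidence"`
}

// stripFences removes a surrounding markdown code fence, if present.
func stripFences(s string) string {
	s = strings.TrimSpace(s)
	if strings.HasPrefix(s, "```") {
		s = strings.TrimPrefix(s, "```json")
		s = strings.TrimPrefix(s, "```")
		s = strings.TrimSuffix(strings.TrimSpace(s), "```")
	}
	return strings.TrimSpace(s)
}

func main() {
	// Simulate a CLI response whose inner result is fence-wrapped.
	inner := "```json\n{\"relation\":\"related\",\"confidence\":0.8}\n```"
	raw, _ := json.Marshal(envelope{Type: "result", Result: inner})

	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		panic(err)
	}
	var v verdictJSON
	if err := json.Unmarshal([]byte(stripFences(env.Result)), &v); err != nil {
		panic(err)
	}
	fmt.Println(v.Relation, v.Confidence) // related 0.8
}
```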
type ObservationSnippet ¶
type ObservationSnippet struct {
// ID is the sync_id of the observation (e.g. "obs-a1b2c3d4...").
ID string
// Title is the observation title.
Title string
// Content is the observation body text.
Content string
// Type is the observation type label (e.g. "decision", "architecture").
Type string
// Project is the project the observation belongs to.
Project string
}
ObservationSnippet carries the fields from an observation that are embedded into the comparison prompt. Callers should populate all fields for best LLM accuracy; empty fields are tolerated without panics.
type OpenCodeRunner ¶
type OpenCodeRunner struct {
// contains filtered or unexported fields
}
OpenCodeRunner implements AgentRunner by shelling out to the `opencode` CLI. It invokes:
opencode run --format json --pure
(with the prompt supplied on stdin) and parses the NDJSON event stream returned on stdout.
func NewOpenCodeRunner ¶
func NewOpenCodeRunner() *OpenCodeRunner
NewOpenCodeRunner constructs an OpenCodeRunner with the real exec.CommandContext implementation. Tests should inject a fake via the struct field directly.
func (*OpenCodeRunner) Compare ¶
func (r *OpenCodeRunner) Compare(ctx context.Context, prompt string) (Verdict, error)
Compare sends prompt to the OpenCode CLI and returns a structured Verdict. Invokes: opencode run --format json --pure (with prompt on stdin)
OpenCode's output is NDJSON (newline-delimited JSON). Each line is a JSON object with a "type" field. The runner scans for events of type "text", extracts ".part.text", and parses that as a Verdict JSON object. If multiple text events exist, the last one wins.
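The scanning rule (filter for events of type "text", extract ".part.text", last one wins) can be sketched with local types; the event shape beyond those two fields is assumed:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// event models only the fields this sketch needs from the NDJSON stream.
type event struct {
	Type string `json:"type"`
	Part struct {
		Text string `json:"text"`
	} `json:"part"`
}

// lastText scans NDJSON line by line and keeps the text of the final
// "text" event; malformed lines are skipped.
func lastText(ndjson string) (string, bool) {
	var out string
	found := false
	sc := bufio.NewScanner(strings.NewReader(ndjson))
	for sc.Scan() {
		var ev event
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue
		}
		if ev.Type == "text" {
			out = ev.Part.Text
			found = true
		}
	}
	return out, found
}

func main() {
	stream := `{"type":"step"}` + "\n" +
		`{"type":"text","part":{"text":"first"}}` + "\n" +
		`{"type":"text","part":{"text":"{\"relation\":\"related\"}"}}`
	txt, ok := lastText(stream)
	fmt.Println(ok, txt)
}
```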
type Verdict ¶
type Verdict struct {
// Relation is the semantic relation verb returned by the LLM.
// Must be one of: conflicts_with | supersedes | scoped | related | compatible | not_conflict
Relation string
// Confidence is the LLM's self-reported confidence score in [0.0, 1.0].
Confidence float64
// Reasoning is the LLM's short explanation (≤200 chars).
Reasoning string
// Model is the model identifier captured from the CLI output
// (e.g., "claude-haiku-4-5"). May be empty if the CLI does not report it.
Model string
// DurationMS is the wall-clock duration of the CLI call in milliseconds.
DurationMS int64
}
Verdict is the parsed output of a single AgentRunner.Compare call. It holds the semantic relation verb, confidence score, model attribution, and timing information captured from the CLI output.
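A caller that enforces the locked vocabulary before accepting a Verdict might use a set-membership check like this sketch (the map and helper are illustrative, not exported by the package; a failed check would correspond to ErrUnknownRelation):

```go
package main

import "fmt"

// lockedRelations is the locked vocabulary listed in the documentation.
var lockedRelations = map[string]bool{
	"conflicts_with": true,
	"supersedes":     true,
	"scoped":         true,
	"related":        true,
	"compatible":     true,
	"not_conflict":   true,
}

// validRelation reports whether r is in the locked vocabulary.
func validRelation(r string) bool { return lockedRelations[r] }

func main() {
	fmt.Println(validRelation("supersedes"), validRelation("duplicates")) // true false
}
```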