context

package
v1.0.0-alpha.23 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 10, 2026 License: MIT Imports: 9 Imported by: 0

README

context

Building blocks for context construction in agentic systems.

Overview

The context package implements the "embed context, don't make agents discover it" pattern. It provides utilities for:

  • Token estimation - Manage context budgets precisely
  • Batch graph queries - Efficiently fetch entities and relationships
  • Context formatting - Prepare content for LLM consumption
  • Source tracking - Track provenance of context

SemStreams provides the HOW (building blocks); consumers decide the WHAT (relevance).

Quick Start

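import (
    "fmt"

    // This package shares its name with the standard library "context"
    // package; alias one of the two imports if a file needs both.
    "github.com/c360studio/semstreams/pkg/context"
)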

// Query entities from the graph
result, err := context.BatchQueryEntitiesWithOptions(ctx, graphClient, entityIDs,
    context.BatchQueryOptions{
        IncludeRelationships: true,
        Depth:                1,
    })
if err != nil {
    return err
}

// Build constructed context with token tracking
opts := context.FormatOptions{
    MaxTokens:      8000,
    PrettyPrint:    true,
    SectionHeaders: true,
}
constructed, err := context.BuildContextFromBatch(result, opts)
if err != nil {
    return err
}

// Embed in TaskMessage - token count is exact
task.Context = constructed
fmt.Printf("Context uses %d tokens\n", constructed.TokenCount)

API Reference

Core Types

Type                Description
ConstructedContext  Formatted context with token count and source tracking
Source              Tracks where context originated (entity, relationship, document)
BatchQueryResult    Results from batch entity queries
BudgetAllocation    Tracks token budget allocation across sections
FormatOptions       Configures context formatting

Token Estimation

Function                                             Description
EstimateTokens(s string) int                         Estimate tokens (~4 chars/token)
EstimateTokensForModel(s, model string) int          Model-specific estimation
FitsInBudget(content string, budget int) bool        Check if content fits budget
TruncateToBudget(content string, budget int) string  Truncate at word boundaries
CountWords(s string) int                             Count words in string
TokensFromWords(wordCount int) int                   Estimate tokens from word count

Batch Graph Queries

Function                                                     Description
BatchQueryEntities(ctx, client, entityIDs)                   Batch lookup with defaults
BatchQueryEntitiesWithOptions(ctx, client, entityIDs, opts)  Configurable batch lookup
ExpandWithNeighbors(ctx, client, entityIDs, depth)           Expand to include N-hop neighbors
CollectEntityIDs(relationships)                              Extract unique entity IDs

Context Formatting

Function                                            Description
FormatEntitiesForContext(entities, opts)            Format entities for LLM
FormatRelationshipsForContext(relationships, opts)  Format relationships for LLM
FormatBatchResultForContext(result, opts)           Format complete batch result
BuildContextFromBatch(result, opts)                 Create ConstructedContext

Helper Functions

Function                                           Description
NewConstructedContext(content, entities, sources)  Create ConstructedContext
EntitySource(entityID)                             Create entity source
RelationshipSource(relationshipID)                 Create relationship source
DocumentSource(docID)                              Create document source
NewBudgetAllocation(totalBudget)                   Create budget tracker

Token Budget Management

The package provides tools for managing token budgets across context sections:

// Allocate budget across sections
budget := context.NewBudgetAllocation(10000)
budget.Allocate("system_prompt", 500)
budget.Allocate("entities", 4000)
budget.Allocate("relationships", 2000)
remaining := budget.Remaining() // 3500 for conversation

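// Or allocate proportionally (a separate variable, so both examples compile in one scope)
proportional := context.NewBudgetAllocation(8000)
proportional.Allocate("system_prompt", 500)
// Splits the remaining 7500 tokens by weight
allocations := proportional.AllocateProportionally(
    []string{"entities", "relationships", "history"},
    []float64{0.5, 0.2, 0.3},
)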

BatchQueryOptions

Configure batch queries with these options:

Option                Type  Default  Description
IncludeRelationships  bool  false    Fetch relationships for each entity
Depth                 int   0        Relationship traversal depth
MaxConcurrent         int   10       Max concurrent relationship queries
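
The MaxConcurrent option caps how many relationship queries run at once. How the package enforces that cap is not shown on this page; the following is a minimal sketch of one common approach, a buffered channel used as a semaphore. The names fetchAll and peakConcurrency, and the choice to treat a non-positive limit as the documented default of 10, are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fetchAll calls fetch once per ID with at most maxConcurrent calls in flight.
// Hypothetical helper: it is not part of the package API.
func fetchAll(ids []string, maxConcurrent int, fetch func(id string) ([]string, error)) (map[string][]string, map[string]error) {
	if maxConcurrent <= 0 {
		maxConcurrent = 10 // assumption: fall back to the documented default
	}
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		sem     = make(chan struct{}, maxConcurrent)
		results = make(map[string][]string)
		errs    = make(map[string]error)
	)
	for _, id := range ids {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent fetches are running
		go func(id string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot once this fetch is recorded
			rels, err := fetch(id)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				errs[id] = err
				return
			}
			results[id] = rels
		}(id)
	}
	wg.Wait()
	return results, errs
}

// peakConcurrency runs fetchAll over n fake IDs and reports the highest
// number of simultaneous fetches observed, plus how many IDs were fetched.
func peakConcurrency(n, limit int) (int64, int) {
	var inFlight, peak int64
	ids := make([]string, n)
	for i := range ids {
		ids[i] = fmt.Sprintf("entity-%d", i)
	}
	results, _ := fetchAll(ids, limit, func(id string) ([]string, error) {
		cur := atomic.AddInt64(&inFlight, 1)
		defer atomic.AddInt64(&inFlight, -1)
		for {
			old := atomic.LoadInt64(&peak)
			if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
				break
			}
		}
		return []string{id + "-neighbor"}, nil
	})
	return atomic.LoadInt64(&peak), len(results)
}

func main() {
	peak, fetched := peakConcurrency(50, 3)
	fmt.Println("fetched:", fetched, "peak within limit:", peak <= 3)
}
```

Acquiring the slot before starting the goroutine keeps the number of live goroutines bounded as well, not just the number of active queries.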

FormatOptions

Configure formatting with these options:

Option           Type      Default  Description
MaxTokens        int       4000     Maximum tokens for output
PrettyPrint      bool      true     Pretty print JSON
IncludeMetadata  bool      false    Include entity metadata
EntityOrder      []string  nil      Explicit entity ordering
SectionHeaders   bool      true     Add section headers

Integration with Workflows

To use ConstructedContext with the workflow processor's publish_agent action, reference the output of the context-building step in the action's context field:

{
  "name": "review",
  "action": {
    "type": "publish_agent",
    "role": "reviewer",
    "prompt": "Review the following code",
    "context": "${steps.build_context.output}"
  }
}

The context construction step produces a ConstructedContext that is embedded directly in the agent task. This enables:

  1. Exact token budgets - Know context size before dispatch
  2. Fresh context per task - No pollution from prior agent work
  3. Source tracking - Trace which entities contributed to decisions

Design Philosophy

This package follows the principle that "what's relevant" is domain knowledge:

  • A code review system has different relevance criteria than a logistics system
  • Rather than embedding domain-specific heuristics, SemStreams provides utilities
  • The consumer (e.g., SemSpec) implements the relevance logic

Pattern:

Consumer:
1. Analyze task to determine relevant entities (domain logic)
2. Use pkg/context to query and format entities (building blocks)
3. Embed ConstructedContext in TaskMessage (integration)

SemStreams:
4. Agent loop receives pre-built context
5. No runtime discovery needed
6. Token budget is known precisely

Documentation

Overview

Package context provides building blocks for context construction in agentic systems.

This package implements the "embed context, don't make agents discover it" pattern. Consumers use these utilities to build ConstructedContext before dispatching agents, enabling precise token budget management and eliminating runtime context discovery.

The key insight is that SemStreams provides the HOW (building blocks), while consumers decide the WHAT (what's relevant for their domain). This separation allows domain-specific context construction while providing reusable utilities for token management, batch queries, and LLM-friendly formatting.

Core Types

ConstructedContext wraps formatted context with token count and source tracking. It contains everything needed to embed context in an agent task:

  • Content: The formatted string ready for LLM consumption
  • TokenCount: Exact token count for budget management
  • Entities: Entity IDs included in the context
  • Sources: Provenance tracking (where context came from)
  • ConstructedAt: Timestamp for cache management

Source tracks where context originated. Source types include:

  • graph_entity: Context from a knowledge graph entity
  • graph_relationship: Context from graph relationships
  • document: Context from a document or chunk
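
As an illustration of how the three source types can be used for provenance, here is a small sketch. The source struct and its Type and ID fields are hypothetical stand-ins for types.ContextSource, whose real fields are not shown on this page; only the three type strings come from the list above.

```go
package main

import "fmt"

// source is a hypothetical stand-in for types.ContextSource.
// The Type and ID field names are assumptions.
type source struct {
	Type string
	ID   string
}

// The three documented source type strings.
const (
	sourceGraphEntity       = "graph_entity"
	sourceGraphRelationship = "graph_relationship"
	sourceDocument          = "document"
)

// Sketches of the EntitySource, RelationshipSource, and DocumentSource helpers.
func entitySource(id string) source       { return source{Type: sourceGraphEntity, ID: id} }
func relationshipSource(id string) source { return source{Type: sourceGraphRelationship, ID: id} }
func documentSource(id string) source     { return source{Type: sourceDocument, ID: id} }

// countByType summarizes where a context's content came from.
func countByType(sources []source) map[string]int {
	counts := make(map[string]int)
	for _, s := range sources {
		counts[s.Type]++
	}
	return counts
}

func main() {
	sources := []source{
		entitySource("svc.auth"),
		entitySource("svc.billing"),
		relationshipSource("svc.auth->svc.billing"),
		documentSource("docs/adr-007.md"),
	}
	counts := countByType(sources)
	fmt.Println(counts[sourceGraphEntity], counts[sourceGraphRelationship], counts[sourceDocument]) // 2 1 1
}
```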

Building Block Functions

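Token estimation functions help manage context budgets:

  • EstimateTokens: Estimate tokens (~4 chars/token)
  • EstimateTokensForModel: Model-specific estimation
  • FitsInBudget: Check if content fits budget
  • TruncateToBudget: Truncate at word boundaries
  • CountWords: Count words in string
  • TokensFromWords: Estimate tokens from word count

Batch graph query functions fetch entities efficiently:

  • BatchQueryEntities: Batch lookup with defaults
  • BatchQueryEntitiesWithOptions: Configurable batch lookup
  • ExpandWithNeighbors: Expand to include N-hop neighbors
  • CollectEntityIDs: Extract unique entity IDs

Context formatting functions prepare content for LLMs:

  • FormatEntitiesForContext: Format entities for LLM
  • FormatRelationshipsForContext: Format relationships for LLM
  • FormatBatchResultForContext: Format complete batch result
  • BuildContextFromBatch: Create ConstructedContext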

Example Usage

Building context for an agent task:

// Query entities from the graph
result, err := context.BatchQueryEntitiesWithOptions(ctx, client, entityIDs,
    context.BatchQueryOptions{
        IncludeRelationships: true,
        Depth:                1,
    })
if err != nil {
    return err
}

// Build constructed context with token tracking
opts := context.FormatOptions{
    MaxTokens:      8000,
    PrettyPrint:    true,
    SectionHeaders: true,
}
constructed, err := context.BuildContextFromBatch(result, opts)
if err != nil {
    return err
}

// Embed in TaskMessage - token count is exact
task.Context = constructed

Token budget management:

budget := context.NewBudgetAllocation(10000)
budget.Allocate("system_prompt", 500)
budget.Allocate("entities", 4000)
remaining := budget.Remaining() // 5500 for other content

Integration with Workflows

When using ConstructedContext with the workflow processor's publish_agent action, the context is embedded directly in the TaskMessage. This enables the pattern:

  1. Consumer builds context using domain-specific logic
  2. Exact token count known before agent dispatch
  3. Agent loop receives pre-built context (no discovery needed)
  4. Fresh context per task (no pollution from prior work)

Design Rationale

This package follows the principle that "what's relevant" is domain knowledge. A code review system has different relevance criteria than a logistics system. Rather than embedding domain-specific heuristics, SemStreams provides utilities that any domain can use:

  • Token counting and budget management
  • Efficient batch graph queries
  • LLM-friendly formatting
  • Source tracking for provenance

The consumer (e.g., SemSpec) implements the relevance logic and uses these building blocks to construct the final context.

Package context provides building blocks for context construction in agentic systems. Consumers use these utilities to build ConstructedContext before dispatching agents, enabling the "embed context, don't make agents discover it" pattern.

Constants

const DefaultCharsPerToken = 4

DefaultCharsPerToken is the average number of characters per token for most LLMs. Claude uses roughly 4 characters per token for English text.

Variables

This section is empty.

Functions

func CollectEntityIDs

func CollectEntityIDs(relationships []Relationship) []string

CollectEntityIDs extracts unique entity IDs from relationships
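
A minimal sketch of this kind of deduplication follows. The relationship struct and its From, To, and Type fields are hypothetical stand-ins for types.Relationship, whose fields are not shown on this page, and keeping IDs in first-seen order is also an assumption.

```go
package main

import "fmt"

// relationship is a hypothetical stand-in for types.Relationship.
// The field names are assumptions.
type relationship struct {
	From string
	To   string
	Type string
}

// collectEntityIDs returns each endpoint ID once, in first-seen order.
func collectEntityIDs(rels []relationship) []string {
	seen := make(map[string]struct{})
	var ids []string
	add := func(id string) {
		if id == "" {
			return
		}
		if _, ok := seen[id]; ok {
			return
		}
		seen[id] = struct{}{}
		ids = append(ids, id)
	}
	for _, r := range rels {
		add(r.From)
		add(r.To)
	}
	return ids
}

func main() {
	rels := []relationship{
		{From: "a", To: "b", Type: "calls"},
		{From: "b", To: "c", Type: "calls"},
		{From: "a", To: "c", Type: "owns"},
	}
	fmt.Println(collectEntityIDs(rels)) // [a b c]
}
```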

func CountWords

func CountWords(s string) int

CountWords counts words in a string (useful for rough estimates)

func EstimateTokens

func EstimateTokens(s string) int

EstimateTokens estimates token count for a string. Uses a heuristic of ~4 characters per token, which is a reasonable approximation for English text with Claude models.
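
The heuristic can be sketched in a few lines. This is not the package's implementation: counting runes rather than bytes and rounding partial tokens up are both assumptions, and estimateTokens is a hypothetical name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// charsPerToken mirrors the documented DefaultCharsPerToken constant.
const charsPerToken = 4

// estimateTokens is a hypothetical stand-in for EstimateTokens.
// Assumptions: characters are counted as runes and partial tokens round up.
func estimateTokens(s string) int {
	n := utf8.RuneCountInString(s)
	return (n + charsPerToken - 1) / charsPerToken
}

func main() {
	fmt.Println(estimateTokens(""))             // 0
	fmt.Println(estimateTokens("abcd"))         // 1
	fmt.Println(estimateTokens("hello, world")) // 12 characters -> 3
}
```

Because the result is an estimate, leaving some headroom below a model's hard context limit is a sensible precaution.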

func EstimateTokensForModel

func EstimateTokensForModel(s string, model string) int

EstimateTokensForModel estimates tokens for a specific model. Currently all models use the same estimate, but this allows for model-specific adjustments in the future.

func ExpandWithNeighbors

func ExpandWithNeighbors(ctx context.Context, client GraphClient, entityIDs []string, depth int) ([]string, error)

ExpandWithNeighbors expands entity IDs to include their neighbors, up to depth hops away.
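
N-hop expansion is typically a breadth-first traversal. The sketch below shows the idea over an in-memory adjacency map that stands in for calls to a GraphClient; the function name, the adjacency map, and the seeds-first output order are all assumptions rather than the package's behavior.

```go
package main

import "fmt"

// expandWithNeighbors returns the seed IDs plus every ID reachable within
// depth hops, each listed once. adj is a hypothetical adjacency map used
// in place of real graph queries.
func expandWithNeighbors(adj map[string][]string, seeds []string, depth int) []string {
	seen := make(map[string]bool, len(seeds))
	var out []string
	frontier := make([]string, 0, len(seeds))
	for _, id := range seeds {
		if !seen[id] {
			seen[id] = true
			out = append(out, id)
			frontier = append(frontier, id)
		}
	}
	// Each pass of the loop moves one hop further out from the seeds.
	for hop := 0; hop < depth && len(frontier) > 0; hop++ {
		var next []string
		for _, id := range frontier {
			for _, n := range adj[id] {
				if !seen[n] {
					seen[n] = true
					out = append(out, n)
					next = append(next, n)
				}
			}
		}
		frontier = next
	}
	return out
}

func main() {
	adj := map[string][]string{
		"a": {"b"},
		"b": {"c", "a"},
		"c": {"d"},
	}
	fmt.Println(expandWithNeighbors(adj, []string{"a"}, 0)) // [a]
	fmt.Println(expandWithNeighbors(adj, []string{"a"}, 1)) // [a b]
	fmt.Println(expandWithNeighbors(adj, []string{"a"}, 2)) // [a b c]
}
```

The seen set keeps cycles (here b points back to a) from producing duplicates or endless traversal.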

func FitsInBudget

func FitsInBudget(content string, budget int) bool

FitsInBudget checks if content fits within a token budget.

func FormatBatchResultForContext

func FormatBatchResultForContext(result *BatchQueryResult, opts FormatOptions) (string, int, error)

FormatBatchResultForContext formats a BatchQueryResult for LLM context.

func FormatEntitiesForContext

func FormatEntitiesForContext(entities map[string]json.RawMessage, opts FormatOptions) (string, int, error)

FormatEntitiesForContext formats entity data for LLM context. Returns the formatted string, token count, and any error.
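
To make the three return values concrete, here is a sketch of a formatter with the same shape. It is an illustration only: the "### id" header format, the sorting of entities not named in the explicit order (the FormatOptions comment says the package uses map order), and the 4-characters-per-token count are assumptions, and formatEntities and demoEntities are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// formatEntities renders entities in the explicit order first, then the rest
// sorted by ID, and returns the text, an estimated token count, and any error.
func formatEntities(entities map[string]json.RawMessage, order []string, pretty bool) (string, int, error) {
	ids := make([]string, 0, len(entities))
	listed := make(map[string]bool)
	for _, id := range order {
		if _, ok := entities[id]; ok && !listed[id] {
			ids = append(ids, id)
			listed[id] = true
		}
	}
	var rest []string
	for id := range entities {
		if !listed[id] {
			rest = append(rest, id)
		}
	}
	sort.Strings(rest) // deterministic output for IDs without an explicit position
	ids = append(ids, rest...)

	var sb strings.Builder
	for _, id := range ids {
		var buf bytes.Buffer
		var err error
		if pretty {
			err = json.Indent(&buf, entities[id], "", "  ")
		} else {
			err = json.Compact(&buf, entities[id])
		}
		if err != nil {
			return "", 0, fmt.Errorf("entity %s: %w", id, err)
		}
		fmt.Fprintf(&sb, "### %s\n%s\n", id, buf.String())
	}
	content := sb.String()
	tokens := (len(content) + 3) / 4 // ~4 chars per token, rounded up
	return content, tokens, nil
}

// demoEntities returns a small fixture for the example.
func demoEntities() map[string]json.RawMessage {
	return map[string]json.RawMessage{
		"b": json.RawMessage(`{"n": 2}`),
		"a": json.RawMessage(`{"n": 1}`),
		"c": json.RawMessage(`{"n": 3}`),
	}
}

func main() {
	content, tokens, err := formatEntities(demoEntities(), []string{"c"}, false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(content)
	fmt.Println("tokens:", tokens)
}
```

Go map iteration order is randomized, so a formatter that relies on it can emit a different layout on each run; setting EntityOrder is the way to get stable output.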

func FormatRelationshipsForContext

func FormatRelationshipsForContext(relationships []Relationship, opts FormatOptions) (string, int, error)

FormatRelationshipsForContext formats relationships for LLM context.

func TokensFromWords

func TokensFromWords(wordCount int) int

TokensFromWords estimates tokens from word count. Roughly 1.3 tokens per word for English.
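
A sketch of the word-based estimate, using integer arithmetic to avoid floating-point rounding surprises. Splitting on whitespace and rounding up are assumptions about behavior this page does not specify; countWords and tokensFromWords are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// countWords counts whitespace-separated words.
// Assumption: the package's CountWords splits in a similar way.
func countWords(s string) int {
	return len(strings.Fields(s))
}

// tokensFromWords applies the documented ratio of roughly 1.3 tokens per word.
// It computes ceil(wordCount * 13 / 10) with integers only.
func tokensFromWords(wordCount int) int {
	if wordCount <= 0 {
		return 0
	}
	return (wordCount*13 + 9) / 10
}

func main() {
	words := countWords("the quick brown fox jumps")
	fmt.Println(words)                  // 5
	fmt.Println(tokensFromWords(words)) // 6.5 rounds up to 7
}
```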

func TruncateToBudget

func TruncateToBudget(content string, budget int) string

TruncateToBudget truncates content to fit within a token budget. Attempts to truncate at word boundaries.
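
One way to truncate at a word boundary is to cut at the character limit and then back up to the last whitespace. The sketch below assumes a limit of budget × 4 characters and a hard cut when the text contains no whitespace; neither detail is confirmed by this page, and truncateToBudget is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

const charsPerToken = 4 // mirrors DefaultCharsPerToken

// truncateToBudget shortens content to at most budget*charsPerToken
// characters, preferring to end on a complete word.
func truncateToBudget(content string, budget int) string {
	if budget <= 0 {
		return ""
	}
	maxChars := budget * charsPerToken
	runes := []rune(content)
	if len(runes) <= maxChars {
		return content
	}
	cut := string(runes[:maxChars])
	// If the cut lands mid-word, back up to the last whitespace before it.
	if !unicode.IsSpace(runes[maxChars]) {
		if i := strings.LastIndexAny(cut, " \t\n"); i > 0 {
			cut = cut[:i]
		}
	}
	return strings.TrimRight(cut, " \t\n")
}

func main() {
	fmt.Printf("%q\n", truncateToBudget("alpha beta gamma delta", 3)) // "alpha beta"
	fmt.Printf("%q\n", truncateToBudget("short", 10))                 // "short"
}
```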

Types

type BatchQueryOptions

type BatchQueryOptions struct {
	IncludeRelationships bool
	Depth                int
	MaxConcurrent        int // Max concurrent queries (default: 10)
}

BatchQueryOptions configures batch query behavior

type BatchQueryResult

type BatchQueryResult struct {
	Entities      map[string]json.RawMessage
	Relationships []Relationship
	NotFound      []string
	Errors        map[string]error
}

BatchQueryResult contains results from a batch query

func BatchQueryEntities

func BatchQueryEntities(ctx context.Context, client GraphClient, entityIDs []string) (*BatchQueryResult, error)

BatchQueryEntities performs batch entity lookups efficiently. Returns all found entities and tracks which were not found.

func BatchQueryEntitiesWithOptions

func BatchQueryEntitiesWithOptions(ctx context.Context, client GraphClient, entityIDs []string, opts BatchQueryOptions) (*BatchQueryResult, error)

BatchQueryEntitiesWithOptions performs batch entity lookups with options.

type BudgetAllocation

type BudgetAllocation struct {
	TotalBudget int
	Allocated   int
	Sections    map[string]int
}

BudgetAllocation helps allocate token budget across multiple content sections.

func NewBudgetAllocation

func NewBudgetAllocation(totalBudget int) *BudgetAllocation

NewBudgetAllocation creates a new budget allocation tracker.

func (*BudgetAllocation) Allocate

func (b *BudgetAllocation) Allocate(section string, requested int) int

Allocate allocates budget for a section. Returns the actual allocation (may be less than requested if budget is exhausted).
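
The clamping behavior described here can be sketched as follows. The struct mirrors the documented BudgetAllocation fields, but the method bodies are assumptions, including the choices to treat negative requests as zero and to accumulate repeated allocations for the same section.

```go
package main

import "fmt"

// budgetAllocation mirrors the documented BudgetAllocation fields.
type budgetAllocation struct {
	TotalBudget int
	Allocated   int
	Sections    map[string]int
}

func newBudgetAllocation(total int) *budgetAllocation {
	return &budgetAllocation{TotalBudget: total, Sections: make(map[string]int)}
}

func (b *budgetAllocation) remaining() int {
	return b.TotalBudget - b.Allocated
}

// allocate grants min(requested, remaining) and records it for the section.
func (b *budgetAllocation) allocate(section string, requested int) int {
	granted := requested
	if granted > b.remaining() {
		granted = b.remaining()
	}
	if granted < 0 {
		granted = 0
	}
	b.Sections[section] += granted
	b.Allocated += granted
	return granted
}

func main() {
	b := newBudgetAllocation(1000)
	fmt.Println(b.allocate("system_prompt", 400)) // 400
	fmt.Println(b.allocate("entities", 900))      // only 600 left, so 600
	fmt.Println(b.remaining())                    // 0
}
```

Checking the returned value against the requested amount is how a caller finds out that a section was given less than it asked for.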

func (*BudgetAllocation) AllocateProportionally

func (b *BudgetAllocation) AllocateProportionally(sections []string, weights []float64) map[string]int

AllocateProportionally allocates remaining budget proportionally across sections.
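
A sketch of a proportional split is shown below. Normalizing the weights by their sum, rounding each share to the nearest token, and returning an empty map when the slices differ in length are all assumptions; the page does not describe how the package handles these cases.

```go
package main

import (
	"fmt"
	"math"
)

// allocateProportionally splits remaining tokens across sections by weight.
// Hypothetical stand-in for the BudgetAllocation method of the same name.
func allocateProportionally(remaining int, sections []string, weights []float64) map[string]int {
	out := make(map[string]int, len(sections))
	if len(sections) != len(weights) {
		return out
	}
	var sum float64
	for _, w := range weights {
		sum += w
	}
	if sum <= 0 || remaining <= 0 {
		return out
	}
	left := remaining
	for i, name := range sections {
		share := int(math.Round(float64(remaining) * weights[i] / sum))
		if share < 0 {
			share = 0
		}
		if share > left {
			share = left // never hand out more than is still available
		}
		out[name] = share
		left -= share
	}
	return out
}

func main() {
	// The README example: 8000 total, 500 already allocated, 7500 remaining.
	got := allocateProportionally(7500,
		[]string{"entities", "relationships", "history"},
		[]float64{0.5, 0.2, 0.3},
	)
	fmt.Println(got["entities"], got["relationships"], got["history"]) // 3750 1500 2250
}
```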

func (*BudgetAllocation) Remaining

func (b *BudgetAllocation) Remaining() int

Remaining returns the remaining budget.

type ConstructedContext

type ConstructedContext = types.ConstructedContext

ConstructedContext is an alias for types.ConstructedContext. The canonical type is defined in pkg/types/context.go.

func BuildContextFromBatch

func BuildContextFromBatch(result *BatchQueryResult, opts FormatOptions) (*ConstructedContext, error)

BuildContextFromBatch creates a ConstructedContext from a BatchQueryResult.

func NewConstructedContext

func NewConstructedContext(content string, entities []string, sources []Source) *ConstructedContext

NewConstructedContext creates a new ConstructedContext from parts.

type FormatOptions

type FormatOptions struct {
	MaxTokens       int      // Max tokens for output
	PrettyPrint     bool     // Pretty print JSON
	IncludeMetadata bool     // Include entity metadata
	EntityOrder     []string // Explicit order for entities (if empty, uses map order)
	SectionHeaders  bool     // Add section headers
}

FormatOptions configures context formatting

func DefaultFormatOptions

func DefaultFormatOptions() FormatOptions

DefaultFormatOptions returns sensible defaults for formatting
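
A Go detail worth keeping in mind: a struct literal leaves unset fields at their zero values, so an options literal that sets only MaxTokens has PrettyPrint and SectionHeaders set to false. The sketch below uses a local mirror of FormatOptions with the defaults listed in the README table. Whether the real package substitutes defaults for zero-valued fields is not stated on this page, so starting from the defaults and overriding individual fields is the safer pattern.

```go
package main

import "fmt"

// formatOptions mirrors the documented FormatOptions fields.
type formatOptions struct {
	MaxTokens       int
	PrettyPrint     bool
	IncludeMetadata bool
	EntityOrder     []string
	SectionHeaders  bool
}

// defaultFormatOptions returns the defaults listed in the README table.
// It is a local stand-in for DefaultFormatOptions.
func defaultFormatOptions() formatOptions {
	return formatOptions{MaxTokens: 4000, PrettyPrint: true, SectionHeaders: true}
}

func main() {
	// A bare literal: booleans that are not set stay false.
	literal := formatOptions{MaxTokens: 8000}

	// Start from the defaults, then override only what differs.
	opts := defaultFormatOptions()
	opts.MaxTokens = 8000

	fmt.Println(literal.PrettyPrint, literal.SectionHeaders) // false false
	fmt.Println(opts.PrettyPrint, opts.SectionHeaders)       // true true
}
```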

type GraphClient

type GraphClient interface {
	// QueryEntities fetches multiple entities by their IDs
	QueryEntities(ctx context.Context, entityIDs []string) (map[string]json.RawMessage, error)

	// QueryRelationships fetches relationships for an entity with depth
	QueryRelationships(ctx context.Context, entityID string, depth int) ([]Relationship, error)
}

GraphClient defines the interface for batch graph query operations. This mirrors the interface in workflow actions but is defined here for use by context construction utilities.
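
Because GraphClient is a small interface, an in-memory implementation is enough for unit tests. The sketch below copies the interface from above; the Relationship struct fields, the memoryGraph type, and the runDemo helper are hypothetical, and this fake ignores the depth argument.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
)

// Relationship is a hypothetical stand-in for types.Relationship.
type Relationship struct{ From, To, Type string }

// GraphClient is copied from the documented interface.
type GraphClient interface {
	QueryEntities(ctx context.Context, entityIDs []string) (map[string]json.RawMessage, error)
	QueryRelationships(ctx context.Context, entityID string, depth int) ([]Relationship, error)
}

// memoryGraph is a test double backed by plain maps.
type memoryGraph struct {
	entities map[string]json.RawMessage
	edges    map[string][]Relationship
}

// Compile-time check that memoryGraph satisfies GraphClient.
var _ GraphClient = (*memoryGraph)(nil)

func (g *memoryGraph) QueryEntities(ctx context.Context, ids []string) (map[string]json.RawMessage, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	found := make(map[string]json.RawMessage)
	for _, id := range ids {
		if raw, ok := g.entities[id]; ok {
			found[id] = raw
		}
	}
	return found, nil
}

// QueryRelationships returns only direct edges; depth is ignored in this fake.
func (g *memoryGraph) QueryRelationships(ctx context.Context, entityID string, depth int) ([]Relationship, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return g.edges[entityID], nil
}

// runDemo queries the fake and reports found entities, relationships for
// svc.auth, and the requested IDs that were missing.
func runDemo() (found int, relCount int, missing []string, err error) {
	g := &memoryGraph{
		entities: map[string]json.RawMessage{
			"svc.auth":    json.RawMessage(`{"kind":"service"}`),
			"svc.billing": json.RawMessage(`{"kind":"service"}`),
		},
		edges: map[string][]Relationship{
			"svc.auth": {{From: "svc.auth", To: "svc.billing", Type: "calls"}},
		},
	}
	ctx := context.Background()
	ids := []string{"svc.auth", "svc.ghost"}
	entities, err := g.QueryEntities(ctx, ids)
	if err != nil {
		return 0, 0, nil, err
	}
	for _, id := range ids {
		if _, ok := entities[id]; !ok {
			missing = append(missing, id)
		}
	}
	rels, err := g.QueryRelationships(ctx, "svc.auth", 1)
	if err != nil {
		return 0, 0, nil, err
	}
	return len(entities), len(rels), missing, nil
}

func main() {
	found, relCount, missing, err := runDemo()
	fmt.Println(found, relCount, missing, err) // 1 1 [svc.ghost] <nil>
}
```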

type Relationship

type Relationship = types.Relationship

Relationship is an alias for the shared types.Relationship type.

type Source

type Source = types.ContextSource

Source is an alias for types.ContextSource, tracking where context came from. The canonical type is defined in pkg/types/context.go.

func DocumentSource

func DocumentSource(docID string) Source

DocumentSource creates a Source for a document

func EntitySource

func EntitySource(entityID string) Source

EntitySource creates a Source for a graph entity

func RelationshipSource

func RelationshipSource(relationshipID string) Source

RelationshipSource creates a Source for a graph relationship
