context-management

command

Version: v1.1.1 | Published: Nov 23, 2025 | License: Apache-2.0 | Imports: 9 | Imported by: 0

Repository

github.com/AltairaLabs/PromptKit

README

Context Management Example

This example demonstrates how to use the ContextBuilderMiddleware to manage token budgets and context truncation in conversations.

What is Context Management?

Context management helps you control:

- Token costs: Keep conversations within budget by limiting context size.
- Model limits: Stay within the model's context window (e.g., GPT-4's 8K/32K/128K token limits).
- Performance: Smaller contexts generally mean faster responses.

Truncation Strategies

The SDK supports four truncation strategies:

1. `TruncateOldest` (Default)

Removes the oldest messages first when the token budget is exceeded.

Best for:

- Customer support chats where recent context is most important
- Task-oriented conversations where early messages become less relevant
- General chatbots where conversation flow matters more than history

Example: In a 10-turn conversation that exceeds the budget, the SDK drops turn 1, then turn 2, then turn 3, and so on until the remaining history fits.
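
To make the oldest-first behavior concrete, here is a minimal sketch of the idea. It is not the SDK's implementation: the `Message` type, its precomputed `Tokens` field, and `truncateOldest` are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// Message is a hypothetical stand-in for a conversation turn, not the SDK's type.
type Message struct {
	Role   string
	Tokens int // assumed to be precomputed by a tokenizer
}

// truncateOldest drops messages from the front of the history until the
// remaining messages fit within the budget.
func truncateOldest(history []Message, budget int) []Message {
	total := 0
	for _, m := range history {
		total += m.Tokens
	}
	start := 0
	for start < len(history) && total > budget {
		total -= history[start].Tokens
		start++
	}
	return history[start:]
}

func main() {
	history := []Message{
		{"user", 400}, {"assistant", 500}, {"user", 300}, {"assistant", 450},
	}
	kept := truncateOldest(history, 1000)
	fmt.Println(len(kept)) // prints 2: only the 300- and 450-token turns remain
}
```

The sketch removes whole messages only, so the kept history can end up well under the budget when a large old message is dropped.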

2. `TruncateFail`

Returns an error when the token budget is exceeded instead of truncating.

Best for:

- Critical conversations where losing context is unacceptable
- Applications with strict compliance requirements
- Cases where you want explicit control over context management

Example: If a conversation exceeds 1000 tokens, the SDK returns an error instead of proceeding.
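
The fail-on-overflow idea can be sketched as a simple pre-flight check. The names `ErrBudgetExceeded` and `checkBudget` are hypothetical and do not come from the SDK; the error value the SDK actually returns may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBudgetExceeded is a hypothetical sentinel error for this sketch.
var ErrBudgetExceeded = errors.New("context exceeds token budget")

// checkBudget returns an error instead of truncating when the summed
// token counts exceed the budget.
func checkBudget(messageTokens []int, budget int) error {
	total := 0
	for _, t := range messageTokens {
		total += t
	}
	if total > budget {
		return fmt.Errorf("%w: %d tokens used, %d allowed", ErrBudgetExceeded, total, budget)
	}
	return nil
}

func main() {
	err := checkBudget([]int{400, 500, 300}, 1000)
	if errors.Is(err, ErrBudgetExceeded) {
		fmt.Println("overflow:", err)
	}
}
```

Wrapping the sentinel with `%w` lets callers detect the overflow case with `errors.Is` while still seeing the actual counts in the message.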

3. `TruncateSummarize` (Not shown in example)

Compresses old messages into summaries before removing them.

Best for:

- Long conversations where you need to preserve key information
- Research or analysis tasks where history matters
- Use cases where semantic compression is valuable

Example: Instead of dropping turn 1 completely, the SDK summarizes it as "User asked about pricing, got basic plan info".
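
As a rough sketch of summarize-then-drop, the code below folds the oldest turns into a single summary message until the history fits. Everything here is an assumption for illustration: the `Message` type, the whitespace-based `countTokens` estimate, and `stubSummary`, which stands in for a real summarizer (typically an LLM call). The SDK's actual strategy may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical stand-in for a conversation turn, not the SDK's type.
type Message struct {
	Role    string
	Content string
}

// countTokens is a crude whitespace-based estimate used only for illustration.
func countTokens(s string) int { return len(strings.Fields(s)) }

func totalTokens(msgs []Message) int {
	total := 0
	for _, m := range msgs {
		total += countTokens(m.Content)
	}
	return total
}

// stubSummary stands in for a real summarizer.
func stubSummary(old []Message) string {
	return fmt.Sprintf("Summary of %d earlier messages", len(old))
}

// summarizeOldest replaces the n oldest messages with one summary message,
// growing n until the result fits the budget or nothing is left to fold in.
func summarizeOldest(history []Message, budget int, summarize func([]Message) string) []Message {
	if totalTokens(history) <= budget {
		return history
	}
	var candidate []Message
	for n := 1; n <= len(history); n++ {
		summary := Message{Role: "system", Content: summarize(history[:n])}
		candidate = append([]Message{summary}, history[n:]...)
		if totalTokens(candidate) <= budget {
			break
		}
	}
	return candidate
}

// sampleHistory returns four turns of ten estimated tokens each.
func sampleHistory() []Message {
	turn := strings.Repeat("word ", 10)
	return []Message{{"user", turn}, {"assistant", turn}, {"user", turn}, {"assistant", turn}}
}

func main() {
	out := summarizeOldest(sampleHistory(), 25, stubSummary)
	fmt.Println(len(out), out[0].Content) // prints: 3 Summary of 2 earlier messages
}
```

Re-summarizing from scratch on each iteration keeps the sketch short; a real implementation would likely summarize once per batch to avoid repeated model calls.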

4. `TruncateLeastRelevant` (Not shown in example)

Uses semantic similarity to keep the most relevant messages for the current conversation.

Best for:

- Non-linear conversations where topics jump around
- Knowledge retrieval where relevance matters more than recency
- Complex multi-topic discussions

Example: If the conversation is currently about pricing, keep all pricing-related turns, even old ones, and drop irrelevant small talk.
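
The relevance-based selection can be sketched as "score, rank, keep what fits". Note the substitution: this sketch uses plain word overlap as a stand-in for semantic similarity, which in practice would usually come from embeddings. The `Message` type, `overlap`, and `keepMostRelevant` are hypothetical names, not SDK APIs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Message is a hypothetical stand-in for a conversation turn, not the SDK's type.
type Message struct {
	Content string
	Tokens  int // assumed to be precomputed by a tokenizer
}

// overlap counts query words that also appear in the content (case-insensitive).
func overlap(query, content string) int {
	words := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(content)) {
		words[w] = true
	}
	score := 0
	for _, w := range strings.Fields(strings.ToLower(query)) {
		if words[w] {
			score++
		}
	}
	return score
}

// keepMostRelevant greedily keeps the highest-scoring messages that fit the
// budget, then returns them in their original conversation order.
func keepMostRelevant(history []Message, query string, budget int) []Message {
	order := make([]int, len(history))
	for i := range order {
		order[i] = i
	}
	// Highest score first; ties favor the more recent message.
	sort.SliceStable(order, func(a, b int) bool {
		sa := overlap(query, history[order[a]].Content)
		sb := overlap(query, history[order[b]].Content)
		if sa != sb {
			return sa > sb
		}
		return order[a] > order[b]
	})
	keep := make([]bool, len(history))
	used := 0
	for _, i := range order {
		if used+history[i].Tokens <= budget {
			keep[i] = true
			used += history[i].Tokens
		}
	}
	var out []Message
	for i, m := range history {
		if keep[i] {
			out = append(out, m)
		}
	}
	return out
}

// sampleHistory mixes two pricing turns with two turns of small talk.
func sampleHistory() []Message {
	return []Message{
		{"what does the pro plan pricing include", 40},
		{"nice weather today", 30},
		{"yes it is sunny", 30},
		{"is there annual pricing for the pro plan", 40},
	}
}

func main() {
	for _, m := range keepMostRelevant(sampleHistory(), "pricing for the pro plan", 90) {
		fmt.Println(m.Content)
	}
}
```

With a 90-token budget the two pricing turns (40 tokens each) are kept and the small talk is dropped, even though the first pricing turn is the oldest message.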

Configuration

contextPolicy := &middleware.ContextBuilderPolicy{
    TokenBudget:      2000,  // Maximum tokens for the entire context
    ReserveForOutput: 500,   // Tokens reserved for the model's response
    Strategy:         middleware.TruncateOldest,
    CacheBreakpoints: false, // Set to true to enable Anthropic-style cache markers
}

config := sdk.ConversationConfig{
    UserID:        "user-123",
    PromptName:    "assistant",
    ContextPolicy: contextPolicy,  // Pass the policy here
    Variables: map[string]interface{}{
        "name": "Assistant",
    },
}

Token Budget Calculation

The effective budget for conversation history is:

Available for history = TokenBudget - ReserveForOutput - SystemPrompt tokens - CurrentMessage tokens

For example, with a 2000-token budget:

- TokenBudget: 2000
- ReserveForOutput: 500
- SystemPrompt: ~100 tokens
- CurrentMessage: ~50 tokens
- Available for history: 2000 - 500 - 100 - 50 = 1350 tokens
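
The same arithmetic as a small helper, useful for sanity-checking a policy before running a conversation. `historyBudget` is a hypothetical function, and clamping the result at zero is an assumption of this sketch rather than documented SDK behavior.

```go
package main

import "fmt"

// historyBudget returns the tokens left for conversation history after the
// output reserve, system prompt, and current message are subtracted.
// A negative result is clamped to zero (an assumption of this sketch).
func historyBudget(tokenBudget, reserveForOutput, systemPrompt, currentMessage int) int {
	available := tokenBudget - reserveForOutput - systemPrompt - currentMessage
	if available < 0 {
		return 0
	}
	return available
}

func main() {
	fmt.Println(historyBudget(2000, 500, 100, 50)) // prints 1350
}
```

A result of zero means the fixed parts of the request already consume the whole budget, so no history can be included at all.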

Running the Example

# Set your OpenAI API key
export OPENAI_API_KEY="your-api-key-here"

# Run the example
go run main.go

Expected Output

The example demonstrates two scenarios:

1. Oldest-first truncation: A 5-turn conversation with a 2000-token budget
   - Shows how the SDK automatically removes old messages
   - The conversation continues smoothly despite truncation
2. Fail-on-overflow: A conversation with a strict 1000-token budget
   - Shows how the SDK returns an error when the budget is exceeded
   - Gives you explicit control over handling overflow

Production Considerations

- Choose the right budget: Consider your model's limits and typical conversation lengths
- Reserve enough for output: Set `ReserveForOutput` based on expected response length
- Pick the right strategy: Match the strategy to your use case requirements
- Monitor token usage: Track `result.TokensUsed` to understand actual consumption
- Handle errors gracefully: When using `TruncateFail`, have a fallback strategy
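
One possible shape for the fallback mentioned in the last point: attempt the strict path first and, on overflow, retry with the oldest turns removed. This is a sketch under stated assumptions. `send` is a hypothetical stand-in for a call made under a fail-on-overflow policy, and `errOverBudget` is an invented sentinel; neither is an SDK API.

```go
package main

import (
	"errors"
	"fmt"
)

// errOverBudget is a hypothetical sentinel error for this sketch.
var errOverBudget = errors.New("token budget exceeded")

// send simulates a strict call that fails when the turns exceed the budget.
func send(turnTokens []int, budget int) error {
	total := 0
	for _, t := range turnTokens {
		total += t
	}
	if total > budget {
		return errOverBudget
	}
	return nil
}

// sendWithFallback drops the oldest turn and retries while the strict call
// keeps failing with an overflow and more than one turn remains.
func sendWithFallback(turnTokens []int, budget int) ([]int, error) {
	err := send(turnTokens, budget)
	for errors.Is(err, errOverBudget) && len(turnTokens) > 1 {
		turnTokens = turnTokens[1:]
		err = send(turnTokens, budget)
	}
	return turnTokens, err
}

func main() {
	kept, err := sendWithFallback([]int{400, 500, 300, 450}, 1000)
	fmt.Println(kept, err) // prints: [300 450] <nil>
}
```

The loop never drops the final turn, so a single message that is itself over budget still surfaces the error to the caller, which can then shorten the message or ask the user to rephrase.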

Cost Optimization Tips

- Use aggressive budgets (e.g., 1000-2000 tokens) for simple Q&A
- Reserve more tokens (e.g., 4000-8000) for complex reasoning tasks
- Consider `TruncateSummarize` for long conversations to preserve context while reducing costs
- Use `TruncateLeastRelevant` when conversation topics are non-linear

Next Steps

- Experiment with different token budgets
- Try the `TruncateSummarize` strategy for long conversations
- Implement custom truncation logic by extending the middleware
- Monitor token usage in production to optimize budgets
