extraction

package

v0.3.4 (latest) | Published: Jan 14, 2026 | License: GPL-3.0 | Imports: 12 | Imported by: 0

Documentation ¶

Overview ¶

Package extraction detects decisions in conversation messages using heuristic pattern matching.

The package supports:

Heuristic-based decision detection using configurable regex patterns
Confidence scoring for extracted decisions
Context window building from surrounding messages
Extensible pattern configuration

Architecture ¶

The main components are:

HeuristicExtractor: Pattern-based decision extraction
Pattern: Configurable regex patterns with names and weights
DecisionCandidate: Represents a potential decision with confidence score

Usage ¶

Create a heuristic extractor with default patterns:

extractor, err := extraction.NewHeuristicExtractor(extraction.ExtractionConfig{
    ConfidenceThreshold: 0.5,
    LLMRefineThreshold:  0.8,
})
if err != nil {
    log.Fatal(err)
}

Extract decisions from messages:

candidates, err := extractor.Extract(messages)
if err != nil {
    log.Fatal(err)
}
for _, c := range candidates {
    fmt.Printf("Decision: %s (pattern: %s, confidence: %.2f)\n", c.Content, c.PatternMatched, c.Confidence)
}

Pattern Configuration ¶

Patterns can be customized via ExtractionConfig.Patterns. Each pattern has:

Regex: The pattern to match
Weight: Confidence score (0.0-1.0) when matched
Name: Human-readable pattern name
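Tags: Categories for grouping (e.g., "architecture", "refactoring"). Note that the Pattern struct documented in this version declares only Name, Regex, and Weight; it has no Tags field.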

Default patterns detect common decision indicators such as "I decided to", "The approach I'll take", and "After considering".
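As a rough illustration of how a weighted regex pattern could turn into a confidence score, the sketch below compiles each pattern and keeps the highest-weighted match. The helper bestMatch, the "highest weight wins" rule, and the two sample patterns are assumptions made for this example; the package does not document how HeuristicExtractor combines weights, and these are not the real default patterns.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern mirrors the documented extraction.Pattern struct.
type Pattern struct {
	Name   string
	Regex  string
	Weight float64
}

// bestMatch is a hypothetical helper: it compiles each pattern and returns the
// name and weight of the highest-weighted pattern that matches text.
// ok is false when no pattern matches.
func bestMatch(patterns []Pattern, text string) (name string, weight float64, ok bool, err error) {
	for _, p := range patterns {
		re, cerr := regexp.Compile(p.Regex)
		if cerr != nil {
			return "", 0, false, fmt.Errorf("pattern %q: %w", p.Name, cerr)
		}
		if re.MatchString(text) && (!ok || p.Weight > weight) {
			name, weight, ok = p.Name, p.Weight, true
		}
	}
	return name, weight, ok, nil
}

func main() {
	// Illustrative patterns only; these are not the package defaults.
	patterns := []Pattern{
		{Name: "explicit-decision", Regex: `(?i)\bI decided to\b`, Weight: 0.9},
		{Name: "after-considering", Regex: `(?i)\bafter considering\b`, Weight: 0.6},
	}
	name, weight, ok, err := bestMatch(patterns, "After considering both, I decided to use gRPC.")
	if err != nil {
		panic(err)
	}
	fmt.Println(name, weight, ok)
}
```

Compiling the regex on every call keeps the sketch short; a real extractor would more likely compile once at construction time, which would also explain why NewHeuristicExtractor returns an error.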

LLM Refinement ¶

The NeedsLLMRefine field on DecisionCandidate indicates whether the decision should be refined by an LLM for higher accuracy. It is set to true when the confidence score is above ConfidenceThreshold but below LLMRefineThreshold.

Note: LLM-based refinement is not yet implemented. The field is reserved for future enhancement.
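The threshold rule described above can be restated as a small predicate. The function name needsLLMRefine is hypothetical, and treating both boundaries as exclusive is an assumption based on the words "above" and "below"; the package does not document how exact boundary values are handled.

```go
package main

import "fmt"

// needsLLMRefine is a hypothetical helper restating the documented rule:
// true when confidence lies strictly between the two thresholds.
// Whether the real package treats the boundaries as inclusive is not documented.
func needsLLMRefine(confidence, confidenceThreshold, llmRefineThreshold float64) bool {
	return confidence > confidenceThreshold && confidence < llmRefineThreshold
}

func main() {
	// Thresholds taken from the Usage example: 0.5 and 0.8.
	for _, c := range []float64{0.4, 0.65, 0.9} {
		fmt.Printf("confidence %.2f -> needs refine: %v\n", c, needsLLMRefine(c, 0.5, 0.8))
	}
}
```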

Package extraction provides decision detection and tag extraction from Claude Code conversation messages. It supports both heuristic (pattern-based) and LLM-based extraction methods.

Index ¶

Variables
func ExtractDomain(tags []string) string
type Config
type Decision
type DecisionCandidate
type DecisionExtractor
- func NewDecisionExtractor(cfg ExtractionConfig) (DecisionExtractor, error)
type DefaultTagExtractor
- func NewTagExtractor(rules map[string][]string) *DefaultTagExtractor
- func (t *DefaultTagExtractor) ExtractTags(content string) []string
- func (t *DefaultTagExtractor) ExtractTagsFromFiles(paths []string) []string
type ExtractionConfig
- func DefaultConfig() ExtractionConfig
type HeuristicExtractor
- func NewHeuristicExtractor(cfg ExtractionConfig) (*HeuristicExtractor, error)
- func (h *HeuristicExtractor) Extract(messages []RawMessage) ([]DecisionCandidate, error)
type NoOpExtractor
- func (n *NoOpExtractor) Extract(messages []RawMessage) ([]DecisionCandidate, error)
type NoOpSummarizer
- func (n *NoOpSummarizer) Available() bool
- func (n *NoOpSummarizer) Summarize(ctx context.Context, candidate DecisionCandidate) (Decision, error)
type Pattern
- func DefaultPatterns() []Pattern
type RawMessage
type Summarizer
- func NewSummarizer(cfg ExtractionConfig) (Summarizer, error)
type TagExtractor
type TagRules

Constants ¶

This section is empty.

Variables ¶

var DefaultTagRules = map[string][]string{

	"golang":     {".go", "go mod", "go build", "go test", "golang"},
	"python":     {".py", "pip", "pytest", "python", "django", "flask"},
	"typescript": {".ts", ".tsx", "npm", "yarn", "node", "typescript"},
	"javascript": {".js", ".jsx", "npm", "node", "javascript"},
	"rust":       {".rs", "cargo", "rustc", "rust"},
	"java":       {".java", "maven", "gradle", "java"},

	"kubernetes": {"kubectl", "k8s", "helm", "deployment.yaml", "service.yaml", "kubernetes"},
	"terraform":  {".tf", "terraform", "tfstate", "tfvars"},
	"docker":     {"Dockerfile", "docker-compose", "container", "image", "docker"},
	"aws":        {"aws", "s3", "ec2", "lambda", "cloudformation", "iam"},
	"gcp":        {"gcloud", "gcp", "pubsub", "bigquery", "gke"},

	"debugging":     {"fix", "bug", "error", "issue", "broken", "failing", "debug"},
	"documentation": {"docs", "readme", "comment", "explain", "document", "documentation"},
	"testing":       {"test", "spec", "coverage", "mock", "assert", "unittest"},
	"refactoring":   {"refactor", "cleanup", "rename", "extract", "simplify", "restructure"},
	"security":      {"auth", "secret", "credential", "permission", "encrypt", "security"},
	"performance":   {"optimize", "slow", "fast", "cache", "latency", "performance"},

	"api":           {"api", "endpoint", "rest", "grpc", "graphql"},
	"database":      {"database", "sql", "postgres", "mysql", "mongodb", "redis"},
	"frontend":      {"frontend", "ui", "react", "vue", "angular", "css"},
	"backend":       {"backend", "server", "service", "handler"},
	"microservices": {"microservice", "service mesh", "istio"},
}

DefaultTagRules maps tags to keywords/patterns that indicate them.
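To make the rule table concrete, here is a minimal sketch of keyword-based tagging over a subset of these rules. The function extractTags, the case-insensitive substring matching, and the sorted output are assumptions for illustration; the actual matching strategy of DefaultTagExtractor is not documented here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// extractTags is a hypothetical stand-in for keyword-based tagging: a tag is
// reported when any of its keywords occurs in content (case-insensitive
// substring match). The result is sorted so the output is deterministic.
func extractTags(rules map[string][]string, content string) []string {
	lower := strings.ToLower(content)
	var tags []string
	for tag, keywords := range rules {
		for _, kw := range keywords {
			if strings.Contains(lower, strings.ToLower(kw)) {
				tags = append(tags, tag)
				break // one matching keyword is enough for this tag
			}
		}
	}
	sort.Strings(tags)
	return tags
}

func main() {
	// A small subset of DefaultTagRules, copied from the listing above.
	rules := map[string][]string{
		"golang":   {".go", "go mod", "go build", "go test", "golang"},
		"database": {"database", "sql", "postgres", "mysql", "mongodb", "redis"},
		"api":      {"api", "endpoint", "rest", "grpc", "graphql"},
	}
	fmt.Println(extractTags(rules, "Run go test after the Postgres migration"))
	// Substring matching can over-match: "restart" contains the keyword "rest".
	fmt.Println(extractTags(rules, "restart the worker"))
}
```

If plain substring matching is indeed what the package does, short keywords such as "rest", "fix", or "image" would be worth reviewing when supplying custom rules to NewTagExtractor.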

Functions ¶

func ExtractDomain ¶

func ExtractDomain(tags []string) string

ExtractDomain tries to determine the domain/area from tags.

Types ¶

type Config ¶

type Config struct {
	Model     string `json:"model,omitempty"`
	APIKey    string `json:"api_key,omitempty"`
	BaseURL   string `json:"base_url,omitempty"`
	MaxTokens int    `json:"max_tokens,omitempty"`
	Timeout   int    `json:"timeout,omitempty"`
}

Config holds provider-specific configuration.

type Decision ¶

type Decision struct {
	Summary      string   `json:"summary"`
	Alternatives []string `json:"alternatives,omitempty"`
	Reasoning    string   `json:"reasoning,omitempty"`
	Tags         []string `json:"tags,omitempty"`
	Confidence   float64  `json:"confidence"`
}

Decision represents a refined, structured decision extracted from conversation.

type DecisionCandidate ¶

type DecisionCandidate struct {
	SessionID      string   `json:"session_id"`
	MessageUUID    string   `json:"message_uuid"`
	Content        string   `json:"content"`
	Context        []string `json:"context,omitempty"` // Surrounding messages
	PatternMatched string   `json:"pattern_matched"`
	Confidence     float64  `json:"confidence"`
	NeedsLLMRefine bool     `json:"needs_llm_refine"`
}

DecisionCandidate represents a potential decision found in messages.
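The Context field holds surrounding messages. One plausible way to build it is sketched below; the helper contextWindow and the symmetric "n messages before and after" window are assumptions, since the documentation states only that a context window is built and that ExtractionConfig carries a ContextWindowMessages setting.

```go
package main

import "fmt"

// RawMessage mirrors the documented fields of extraction.RawMessage.
type RawMessage struct {
	SessionID string
	UUID      string
	Role      string
	Content   string
}

// contextWindow is a hypothetical helper: it returns the Content of up to n
// messages on each side of messages[i], excluding messages[i] itself.
func contextWindow(messages []RawMessage, i, n int) []string {
	if i < 0 || i >= len(messages) || n <= 0 {
		return nil
	}
	start := i - n
	if start < 0 {
		start = 0
	}
	end := i + n
	if end > len(messages)-1 {
		end = len(messages) - 1
	}
	var out []string
	for j := start; j <= end; j++ {
		if j != i {
			out = append(out, messages[j].Content)
		}
	}
	return out
}

func main() {
	msgs := []RawMessage{
		{Content: "Should we use Postgres or SQLite?"},
		{Content: "I decided to use Postgres."},
		{Content: "Sounds good."},
		{Content: "Next, the schema."},
	}
	// Context for the decision at index 1 with a window of one message per side.
	fmt.Println(contextWindow(msgs, 1, 1))
}
```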

type DecisionExtractor ¶

type DecisionExtractor interface {
	// Extract finds decision candidates in messages.
	Extract(messages []RawMessage) ([]DecisionCandidate, error)
}

DecisionExtractor extracts decision candidates from messages.
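Because DecisionExtractor is a single-method interface, callers can supply their own implementation. The sketch below uses local copies of the documented types so that it runs on its own; keywordExtractor and the "assistant" role value are hypothetical and not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// Local mirrors of the documented types, trimmed to the fields used here.
type RawMessage struct {
	SessionID, UUID, Role, Content string
}

type DecisionCandidate struct {
	SessionID      string
	MessageUUID    string
	Content        string
	PatternMatched string
	Confidence     float64
}

type DecisionExtractor interface {
	Extract(messages []RawMessage) ([]DecisionCandidate, error)
}

// keywordExtractor is a hypothetical implementation that flags any message
// with Role "assistant" (an assumed role value) containing a fixed phrase.
type keywordExtractor struct {
	phrase string
	weight float64
}

// Compile-time check that keywordExtractor satisfies the interface.
var _ DecisionExtractor = keywordExtractor{}

func (k keywordExtractor) Extract(messages []RawMessage) ([]DecisionCandidate, error) {
	var out []DecisionCandidate
	for _, m := range messages {
		if m.Role != "assistant" {
			continue
		}
		if strings.Contains(strings.ToLower(m.Content), strings.ToLower(k.phrase)) {
			out = append(out, DecisionCandidate{
				SessionID:      m.SessionID,
				MessageUUID:    m.UUID,
				Content:        m.Content,
				PatternMatched: k.phrase,
				Confidence:     k.weight,
			})
		}
	}
	return out, nil
}

func main() {
	var ex DecisionExtractor = keywordExtractor{phrase: "I decided to", weight: 0.9}
	candidates, err := ex.Extract([]RawMessage{
		{SessionID: "s1", UUID: "u1", Role: "user", Content: "Which RPC layer?"},
		{SessionID: "s1", UUID: "u2", Role: "assistant", Content: "I decided to use gRPC."},
	})
	if err != nil {
		panic(err)
	}
	for _, c := range candidates {
		fmt.Printf("%s: %s (%.2f)\n", c.MessageUUID, c.Content, c.Confidence)
	}
}
```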

func NewDecisionExtractor ¶

func NewDecisionExtractor(cfg ExtractionConfig) (DecisionExtractor, error)

NewDecisionExtractor creates a decision extractor based on configuration.

type DefaultTagExtractor ¶

type DefaultTagExtractor struct {
	// contains filtered or unexported fields
}

DefaultTagExtractor implements TagExtractor using keyword matching.

func NewTagExtractor ¶

func NewTagExtractor(rules map[string][]string) *DefaultTagExtractor

NewTagExtractor creates a new tag extractor with the given rules.

func (*DefaultTagExtractor) ExtractTags ¶

func (t *DefaultTagExtractor) ExtractTags(content string) []string

ExtractTags extracts tags from content based on keyword matching.

func (*DefaultTagExtractor) ExtractTagsFromFiles ¶

func (t *DefaultTagExtractor) ExtractTagsFromFiles(paths []string) []string

ExtractTagsFromFiles extracts tags based on file extensions and paths.
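A path-based variant might look like the following sketch. The helper tagsFromFiles and its matching rules (keywords starting with "." are compared with the file extension, all others with the base file name) are assumptions for illustration only; the documentation says no more than that tags are derived from file extensions and paths.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// tagsFromFiles is a hypothetical stand-in: keywords that start with "." are
// compared against the file extension, and all other keywords are compared
// against the base file name (both case-insensitive).
func tagsFromFiles(rules map[string][]string, paths []string) []string {
	seen := map[string]bool{}
	for _, p := range paths {
		ext := strings.ToLower(filepath.Ext(p))
		base := strings.ToLower(filepath.Base(p))
		for tag, keywords := range rules {
			for _, kw := range keywords {
				kw = strings.ToLower(kw)
				isExt := strings.HasPrefix(kw, ".")
				if (isExt && kw == ext) || (!isExt && kw == base) {
					seen[tag] = true
					break
				}
			}
		}
	}
	tags := make([]string, 0, len(seen))
	for tag := range seen {
		tags = append(tags, tag)
	}
	sort.Strings(tags) // deterministic order despite map iteration
	return tags
}

func main() {
	rules := map[string][]string{
		"golang":    {".go"},
		"terraform": {".tf"},
		"docker":    {"Dockerfile", "docker-compose"},
	}
	fmt.Println(tagsFromFiles(rules, []string{"cmd/app/main.go", "infra/vpc.tf", "Dockerfile"}))
}
```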

type ExtractionConfig ¶

type ExtractionConfig struct {
	Enabled   bool              `json:"enabled"`
	Provider  string            `json:"provider"` // "disabled", "heuristic", "anthropic", "openai"
	Providers map[string]Config `json:"providers,omitempty"`

	// Heuristic configuration
	Patterns              []Pattern `json:"patterns,omitempty"`
	ConfidenceThreshold   float64   `json:"confidence_threshold"`
	LLMRefineThreshold    float64   `json:"llm_refine_threshold"`
	ContextWindowMessages int       `json:"context_window_messages"`
}

ExtractionConfig holds configuration for extraction operations.
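Since every field carries a JSON tag, a configuration can be loaded from a JSON document. The sketch below decodes into local copies of the documented structs (Providers is left out for brevity); parseConfig and the sample values are hypothetical, and how the real package loads its configuration is not documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local mirrors of the documented structs, with the same JSON tags.
// The Providers field is omitted to keep the sketch short.
type Pattern struct {
	Name   string  `json:"name"`
	Regex  string  `json:"regex"`
	Weight float64 `json:"weight"`
}

type ExtractionConfig struct {
	Enabled               bool      `json:"enabled"`
	Provider              string    `json:"provider"`
	Patterns              []Pattern `json:"patterns,omitempty"`
	ConfidenceThreshold   float64   `json:"confidence_threshold"`
	LLMRefineThreshold    float64   `json:"llm_refine_threshold"`
	ContextWindowMessages int       `json:"context_window_messages"`
}

// parseConfig is a hypothetical helper that decodes a JSON document into the
// mirrored config struct.
func parseConfig(data []byte) (ExtractionConfig, error) {
	var cfg ExtractionConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	// Sample values chosen for illustration; they are not the package defaults.
	raw := []byte(`{
		"enabled": true,
		"provider": "heuristic",
		"confidence_threshold": 0.5,
		"llm_refine_threshold": 0.8,
		"context_window_messages": 3,
		"patterns": [{"name": "explicit-decision", "regex": "(?i)I decided to", "weight": 0.9}]
	}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```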

func DefaultConfig ¶

func DefaultConfig() ExtractionConfig

DefaultConfig returns a default extraction configuration.

type HeuristicExtractor ¶

type HeuristicExtractor struct {
	// contains filtered or unexported fields
}

HeuristicExtractor implements DecisionExtractor using pattern matching.

func NewHeuristicExtractor ¶

func NewHeuristicExtractor(cfg ExtractionConfig) (*HeuristicExtractor, error)

NewHeuristicExtractor creates a new heuristic decision extractor.

func (*HeuristicExtractor) Extract ¶

func (h *HeuristicExtractor) Extract(messages []RawMessage) ([]DecisionCandidate, error)

Extract finds decision candidates in messages using pattern matching.

type NoOpExtractor ¶

type NoOpExtractor struct{}

NoOpExtractor is a no-op implementation of DecisionExtractor.

func (*NoOpExtractor) Extract ¶

func (n *NoOpExtractor) Extract(messages []RawMessage) ([]DecisionCandidate, error)

Extract returns an empty slice.

type NoOpSummarizer ¶

type NoOpSummarizer struct{}

NoOpSummarizer is a no-op implementation of Summarizer.

func (*NoOpSummarizer) Available ¶

func (n *NoOpSummarizer) Available() bool

Available returns false for NoOpSummarizer.

func (*NoOpSummarizer) Summarize ¶

func (n *NoOpSummarizer) Summarize(ctx context.Context, candidate DecisionCandidate) (Decision, error)

Summarize returns the candidate as-is without LLM refinement.

type Pattern ¶

type Pattern struct {
	Name   string  `json:"name"`
	Regex  string  `json:"regex"`
	Weight float64 `json:"weight"`
}

Pattern represents a decision detection pattern.

func DefaultPatterns ¶

func DefaultPatterns() []Pattern

DefaultPatterns returns the default decision detection patterns.

type RawMessage ¶

type RawMessage struct {
	SessionID string `json:"session_id"`
	UUID      string `json:"uuid"`
	Role      string `json:"role"`
	Content   string `json:"content"`
}

RawMessage mirrors the shape of conversation.RawMessage. It is defined here as a separate struct to avoid circular imports.

type Summarizer ¶

type Summarizer interface {
	// Summarize refines a decision candidate into a structured decision.
	Summarize(ctx context.Context, candidate DecisionCandidate) (Decision, error)

	// Available returns true if the summarizer is configured and ready.
	Available() bool
}

Summarizer refines decision candidates using LLM or other methods.

func NewSummarizer ¶

func NewSummarizer(cfg ExtractionConfig) (Summarizer, error)

NewSummarizer creates a summarizer based on configuration.

type TagExtractor ¶

type TagExtractor interface {
	// ExtractTags returns tags found in the content.
	ExtractTags(content string) []string

	// ExtractTagsFromFiles returns tags based on file paths.
	ExtractTagsFromFiles(paths []string) []string
}

TagExtractor extracts tags from content based on rules.

type TagRules ¶

type TagRules struct {
	Rules map[string][]string
}

TagRules represents configurable tag extraction rules.
