package extraction

Version: v0.3.4 (latest)
Warning: This package is not in the latest version of its module.

Published: Jan 14, 2026 License: GPL-3.0 Imports: 12 Imported by: 0

Documentation

Overview

Package extraction provides decision extraction capabilities from conversation messages using heuristic pattern matching.

The package supports:

  • Heuristic-based decision detection using configurable regex patterns
  • Confidence scoring for extracted decisions
  • Context window building from surrounding messages
  • Extensible pattern configuration

Architecture

The main components are:

  • HeuristicExtractor: Pattern-based decision extraction
  • Pattern: Configurable regex pattern with a name and a weight
  • DecisionCandidate: Represents a potential decision with confidence score

Usage

Create a heuristic extractor with default patterns:

extractor, err := extraction.NewHeuristicExtractor(extraction.ExtractionConfig{
    ConfidenceThreshold: 0.5,
    LLMRefineThreshold:  0.8,
})
if err != nil {
    log.Fatal(err)
}

Extract decisions from messages:

candidates, err := extractor.Extract(messages)
if err != nil {
    log.Fatal(err)
}
for _, c := range candidates {
    fmt.Printf("Decision: %s (pattern: %s, confidence: %.2f)\n", c.Content, c.PatternMatched, c.Confidence)
}

Pattern Configuration

Patterns can be customized via ExtractionConfig.Patterns. Each pattern has:

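  • Name: Human-readable pattern name
  • Regex: The pattern to match
  • Weight: Confidence score (0.0-1.0) assigned when the pattern matches
  • Tags: Categories for grouping (e.g., "architecture", "refactoring"); note that the Pattern type listed below declares only Name, Regex, and Weight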

Default patterns detect common decision indicators like "I decided to", "The approach I'll take", "After considering", etc.
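As a rough illustration of how weighted patterns can turn into a confidence score, the sketch below compiles a list of patterns and scores a message by the highest Weight among the patterns that match. That scoring rule is an assumption, not documented behavior: HeuristicExtractor may combine weights differently. Pattern is a local copy of the documented type, compile and bestMatch are hypothetical helpers, and the two patterns are illustrative rather than the real DefaultPatterns.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern mirrors the documented extraction.Pattern fields.
type Pattern struct {
	Name   string
	Regex  string
	Weight float64
}

// compiledPattern pairs a Pattern with its compiled regular expression.
type compiledPattern struct {
	Pattern
	re *regexp.Regexp
}

// compile compiles every pattern once, up front, so a malformed regex
// is reported immediately instead of on every match attempt.
func compile(patterns []Pattern) ([]compiledPattern, error) {
	out := make([]compiledPattern, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p.Regex)
		if err != nil {
			return nil, fmt.Errorf("pattern %q: %w", p.Name, err)
		}
		out = append(out, compiledPattern{Pattern: p, re: re})
	}
	return out, nil
}

// bestMatch returns the name and weight of the highest-weighted pattern
// that matches content (assumed scoring rule), or "" and 0 if none match.
func bestMatch(content string, patterns []compiledPattern) (string, float64) {
	name, score := "", 0.0
	for _, p := range patterns {
		if p.re.MatchString(content) && p.Weight > score {
			name, score = p.Name, p.Weight
		}
	}
	return name, score
}

func main() {
	// Illustrative patterns only; not the package's DefaultPatterns.
	patterns, err := compile([]Pattern{
		{Name: "decided", Regex: `(?i)\bI decided to\b`, Weight: 0.9},
		{Name: "approach", Regex: `(?i)\bthe approach I'll take\b`, Weight: 0.8},
	})
	if err != nil {
		panic(err)
	}
	name, score := bestMatch("After review, I decided to use SQLite.", patterns)
	fmt.Printf("%s %.2f\n", name, score)
}
```

Compiling every Regex at construction time is one plausible reason NewHeuristicExtractor returns an error: an invalid pattern can surface there rather than during Extract.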

LLM Refinement

The NeedsLLMRefine field on DecisionCandidate indicates whether the decision should be refined by an LLM for higher accuracy. This is set when the confidence score is above ConfidenceThreshold but below LLMRefineThreshold.

Note: LLM-based refinement is not yet implemented. The field is reserved for future enhancement.
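The threshold band can be expressed as a small classification function. This is a minimal sketch, assuming candidates scoring below ConfidenceThreshold are discarded and that the lower bound is inclusive while the upper bound is exclusive (the text above says only "above" and "below"). classify is a hypothetical helper; the thresholds 0.5 and 0.8 come from the Usage example.

```go
package main

import "fmt"

// classify reports whether a score passes ConfidenceThreshold and, if so,
// whether it falls in the band that would set NeedsLLMRefine.
// Assumption: lower bound inclusive, upper bound exclusive.
func classify(score, confidenceThreshold, llmRefineThreshold float64) (keep, needsLLMRefine bool) {
	if score < confidenceThreshold {
		return false, false // too weak to become a candidate at all
	}
	return true, score < llmRefineThreshold
}

func main() {
	for _, s := range []float64{0.4, 0.6, 0.9} {
		keep, refine := classify(s, 0.5, 0.8)
		fmt.Printf("score=%.1f keep=%t needsLLMRefine=%t\n", s, keep, refine)
	}
}
```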

Package extraction provides decision detection and tag extraction from Claude Code conversation messages. It is designed to support both heuristic (pattern-based) and LLM-based extraction methods; as noted above, LLM-based refinement is not yet implemented.


Constants

This section is empty.

Variables

var DefaultTagRules = map[string][]string{

	"golang":     {".go", "go mod", "go build", "go test", "golang"},
	"python":     {".py", "pip", "pytest", "python", "django", "flask"},
	"typescript": {".ts", ".tsx", "npm", "yarn", "node", "typescript"},
	"javascript": {".js", ".jsx", "npm", "node", "javascript"},
	"rust":       {".rs", "cargo", "rustc", "rust"},
	"java":       {".java", "maven", "gradle", "java"},

	"kubernetes": {"kubectl", "k8s", "helm", "deployment.yaml", "service.yaml", "kubernetes"},
	"terraform":  {".tf", "terraform", "tfstate", "tfvars"},
	"docker":     {"Dockerfile", "docker-compose", "container", "image", "docker"},
	"aws":        {"aws", "s3", "ec2", "lambda", "cloudformation", "iam"},
	"gcp":        {"gcloud", "gcp", "pubsub", "bigquery", "gke"},

	"debugging":     {"fix", "bug", "error", "issue", "broken", "failing", "debug"},
	"documentation": {"docs", "readme", "comment", "explain", "document", "documentation"},
	"testing":       {"test", "spec", "coverage", "mock", "assert", "unittest"},
	"refactoring":   {"refactor", "cleanup", "rename", "extract", "simplify", "restructure"},
	"security":      {"auth", "secret", "credential", "permission", "encrypt", "security"},
	"performance":   {"optimize", "slow", "fast", "cache", "latency", "performance"},

	"api":           {"api", "endpoint", "rest", "grpc", "graphql"},
	"database":      {"database", "sql", "postgres", "mysql", "mongodb", "redis"},
	"frontend":      {"frontend", "ui", "react", "vue", "angular", "css"},
	"backend":       {"backend", "server", "service", "handler"},
	"microservices": {"microservice", "service mesh", "istio"},
}

DefaultTagRules maps tags to keywords/patterns that indicate them.

Functions

func ExtractDomain

func ExtractDomain(tags []string) string

ExtractDomain tries to determine the domain/area from tags.

Types

type Config

type Config struct {
	Model     string `json:"model,omitempty"`
	APIKey    string `json:"api_key,omitempty"`
	BaseURL   string `json:"base_url,omitempty"`
	MaxTokens int    `json:"max_tokens,omitempty"`
	Timeout   int    `json:"timeout,omitempty"`
}

Config holds provider-specific configuration.

type Decision

type Decision struct {
	Summary      string   `json:"summary"`
	Alternatives []string `json:"alternatives,omitempty"`
	Reasoning    string   `json:"reasoning,omitempty"`
	Tags         []string `json:"tags,omitempty"`
	Confidence   float64  `json:"confidence"`
}

Decision represents a refined, structured decision extracted from conversation.

type DecisionCandidate

type DecisionCandidate struct {
	SessionID      string   `json:"session_id"`
	MessageUUID    string   `json:"message_uuid"`
	Content        string   `json:"content"`
	Context        []string `json:"context,omitempty"` // Surrounding messages
	PatternMatched string   `json:"pattern_matched"`
	Confidence     float64  `json:"confidence"`
	NeedsLLMRefine bool     `json:"needs_llm_refine"`
}

DecisionCandidate represents a potential decision found in messages.

type DecisionExtractor

type DecisionExtractor interface {
	// Extract finds decision candidates in messages.
	Extract(messages []RawMessage) ([]DecisionCandidate, error)
}

DecisionExtractor extracts decision candidates from messages.

func NewDecisionExtractor

func NewDecisionExtractor(cfg ExtractionConfig) (DecisionExtractor, error)

NewDecisionExtractor creates a decision extractor based on configuration.

type DefaultTagExtractor

type DefaultTagExtractor struct {
	// contains filtered or unexported fields
}

DefaultTagExtractor implements TagExtractor using keyword matching.

func NewTagExtractor

func NewTagExtractor(rules map[string][]string) *DefaultTagExtractor

NewTagExtractor creates a new tag extractor with the given rules.

func (*DefaultTagExtractor) ExtractTags

func (t *DefaultTagExtractor) ExtractTags(content string) []string

ExtractTags extracts tags from content based on keyword matching.

func (*DefaultTagExtractor) ExtractTagsFromFiles

func (t *DefaultTagExtractor) ExtractTagsFromFiles(paths []string) []string

ExtractTagsFromFiles extracts tags based on file extensions and paths.

type ExtractionConfig

type ExtractionConfig struct {
	Enabled   bool              `json:"enabled"`
	Provider  string            `json:"provider"` // "disabled", "heuristic", "anthropic", "openai"
	Providers map[string]Config `json:"providers,omitempty"`

	// Heuristic configuration
	Patterns              []Pattern `json:"patterns,omitempty"`
	ConfidenceThreshold   float64   `json:"confidence_threshold"`
	LLMRefineThreshold    float64   `json:"llm_refine_threshold"`
	ContextWindowMessages int       `json:"context_window_messages"`
}

ExtractionConfig holds configuration for extraction operations.

func DefaultConfig

func DefaultConfig() ExtractionConfig

DefaultConfig returns a default extraction configuration.

type HeuristicExtractor

type HeuristicExtractor struct {
	// contains filtered or unexported fields
}

HeuristicExtractor implements DecisionExtractor using pattern matching.

func NewHeuristicExtractor

func NewHeuristicExtractor(cfg ExtractionConfig) (*HeuristicExtractor, error)

NewHeuristicExtractor creates a new heuristic decision extractor.

func (*HeuristicExtractor) Extract

func (h *HeuristicExtractor) Extract(messages []RawMessage) ([]DecisionCandidate, error)

Extract finds decision candidates in messages using pattern matching.

type NoOpExtractor

type NoOpExtractor struct{}

NoOpExtractor is a no-op implementation of DecisionExtractor.

func (*NoOpExtractor) Extract

func (n *NoOpExtractor) Extract(messages []RawMessage) ([]DecisionCandidate, error)

Extract returns an empty slice.

type NoOpSummarizer

type NoOpSummarizer struct{}

NoOpSummarizer is a no-op implementation of Summarizer.

func (*NoOpSummarizer) Available

func (n *NoOpSummarizer) Available() bool

Available returns false for NoOpSummarizer.

func (*NoOpSummarizer) Summarize

func (n *NoOpSummarizer) Summarize(ctx context.Context, candidate DecisionCandidate) (Decision, error)

Summarize returns a Decision built directly from the candidate, without LLM refinement.

type Pattern

type Pattern struct {
	Name   string  `json:"name"`
	Regex  string  `json:"regex"`
	Weight float64 `json:"weight"`
}

Pattern represents a decision detection pattern.

func DefaultPatterns

func DefaultPatterns() []Pattern

DefaultPatterns returns the default decision detection patterns.

type RawMessage

type RawMessage struct {
	SessionID string `json:"session_id"`
	UUID      string `json:"uuid"`
	Role      string `json:"role"`
	Content   string `json:"content"`
}

RawMessage mirrors the shape of conversation.RawMessage. It is defined here to avoid circular imports.

type Summarizer

type Summarizer interface {
	// Summarize refines a decision candidate into a structured decision.
	Summarize(ctx context.Context, candidate DecisionCandidate) (Decision, error)

	// Available returns true if the summarizer is configured and ready.
	Available() bool
}

Summarizer refines decision candidates using LLM or other methods.

func NewSummarizer

func NewSummarizer(cfg ExtractionConfig) (Summarizer, error)

NewSummarizer creates a summarizer based on configuration.

type TagExtractor

type TagExtractor interface {
	// ExtractTags returns tags found in the content.
	ExtractTags(content string) []string

	// ExtractTagsFromFiles returns tags based on file paths.
	ExtractTagsFromFiles(paths []string) []string
}

TagExtractor extracts tags from content based on rules.

type TagRules

type TagRules struct {
	Rules map[string][]string
}

TagRules represents configurable tag extraction rules.
