embedding

package
v0.0.0-...-6a3e998 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 23, 2025 License: MIT Imports: 33 Imported by: 0

README

Embedding Package

Overview

The embedding package provides a sophisticated, production-ready system for generating, storing, and searching vector embeddings in the DevOps MCP platform. It supports multiple AI providers, intelligent routing, cost optimization, and fault tolerance with enterprise-grade features for multi-agent orchestration.

Architecture

embedding/
├── providers/          # Embedding provider implementations
├── bedrock.go         # AWS Bedrock integration
├── openai.go          # OpenAI integration
├── service_v2.go      # Enhanced embedding service
├── pipeline.go        # Content processing pipeline
├── router.go          # Intelligent provider routing
├── repository.go      # pgvector storage layer
├── search.go          # Vector search capabilities
├── circuit_breaker.go # Fault tolerance
└── dimension_adapter.go # Cross-model compatibility

Key Features

  • Multi-Provider Support: OpenAI, AWS Bedrock, Anthropic, Google, Voyage
  • Smart Routing: Automatic provider selection based on cost/quality/speed
  • Dimension Normalization: Handle different embedding dimensions seamlessly
  • Fault Tolerance: Circuit breakers prevent cascade failures
  • Cost Optimization: Track and optimize embedding generation costs
  • Batch Processing: Efficient batch operations with progress tracking
  • Vector Search: Advanced similarity search with metadata filtering
  • Multi-Agent Support: Different strategies per agent
  • Caching: Reduce costs with intelligent caching

Provider Support

Supported Models
Provider Model Dimensions Cost/1K Tokens Use Case
OpenAI text-embedding-3-small 1536 $0.02 General purpose
text-embedding-3-large 3072 $0.13 High quality
text-embedding-ada-002 1536 $0.10 Legacy
AWS Bedrock amazon.titan-embed-text-v1 1536 $0.10 AWS native
amazon.titan-embed-text-v2 1024 $0.02 Cost optimized
cohere.embed-english-v3 1024 $0.10 English text
cohere.embed-multilingual-v3 1024 $0.10 Multi-language
Anthropic claude-synthetic 1536 Variable Via Claude
Google textembedding-gecko 768 $0.05 Small & fast
Voyage voyage-02 1024 $0.10 Specialized
Provider Interface

All providers implement this interface:

type Provider interface {
    // Generate single embedding
    GenerateEmbedding(ctx context.Context, text string) ([]float32, error)
    
    // Batch generate embeddings
    BatchGenerateEmbeddings(ctx context.Context, texts []string) ([][]float32, error)
    
    // Get supported models
    GetSupportedModels() []ModelInfo
    
    // Health check
    HealthCheck(ctx context.Context) error
    
    // Get metadata
    GetProviderInfo() ProviderInfo
}

Usage Examples

Basic Embedding Generation
// Initialize service
service, err := embedding.NewServiceV2(
    repo,
    cache,
    logger,
    tracer,
)

// Generate embedding
result, err := service.GenerateEmbedding(ctx, &GenerateRequest{
    Text:    "Implement user authentication",
    Model:   "text-embedding-3-small",
    AgentID: agentID,
})

// Access embedding
vector := result.Embedding // []float32
Batch Processing
// Batch generate with progress tracking
results, err := service.BatchGenerateEmbeddings(ctx, &BatchRequest{
    Texts: []string{
        "Fix memory leak in worker",
        "Optimize database queries",
        "Add unit tests for API",
    },
    Model: "amazon.titan-embed-text-v1",
    Options: BatchOptions{
        Concurrency: 10,
        OnProgress: func(processed, total int) {
            log.Printf("Progress: %d/%d", processed, total)
        },
    },
})
Smart Routing
// Configure routing strategy
router := embedding.NewSmartRouter(
    providers,
    embedding.RouterConfig{
        Strategy: embedding.StrategyBalanced,
        CostLimit: 0.50, // Max $0.50 per request
        QualityThreshold: 0.8,
    },
)

// Router automatically selects best provider
provider := router.SelectProvider(ctx, &SelectionCriteria{
    TaskType: "code_analysis",
    AgentCapabilities: []string{"premium"},
    TextLength: len(text),
})
// Search similar embeddings
results, err := service.SearchEmbeddings(ctx, &SearchRequest{
    Query: "authentication bug",
    Model: "text-embedding-3-small",
    TopK: 10,
    Filters: map[string]interface{}{
        "type": "issue",
        "status": "open",
    },
    MinSimilarity: 0.7,
})

// Process results
for _, result := range results.Results {
    fmt.Printf("Score: %.3f, ID: %s\n", 
        result.Score, 
        result.ContextID,
    )
}
// Search across embeddings from different models
results, err := service.CrossModelSearch(ctx, &CrossModelRequest{
    Query: "performance optimization",
    Models: []string{
        "text-embedding-3-small",
        "amazon.titan-embed-text-v1",
    },
    TopK: 20,
    Strategy: embedding.MergeStrategyWeighted,
})

Pipeline Processing

The embedding pipeline processes different content types:

// Initialize pipeline
pipeline := embedding.NewPipeline(
    embeddingService,
    chunkingService,
    githubAdapter,
    logger,
)

// Process GitHub issue
err := pipeline.ProcessIssue(ctx, &IssueRequest{
    Owner: "S-Corkum",
    Repo: "devops-mcp",
    Number: 123,
})

// Process code file
err := pipeline.ProcessCode(ctx, &CodeRequest{
    FilePath: "pkg/services/agent.go",
    Language: "go",
    ChunkSize: 500,
})

Dimension Adaptation

Handle different embedding dimensions:

// Initialize adapter
adapter := embedding.NewDimensionAdapter()

// Adapt embeddings to target dimension
adapted, err := adapter.AdaptEmbedding(
    sourceEmbedding, // 1536 dimensions
    1024,           // target dimensions
    "text-embedding-3-small", // source model
    "amazon.titan-embed-text-v1", // target model
)

// Check adaptation quality
quality := adapter.GetAdaptationQuality(
    sourceModel,
    targetModel,
)

Circuit Breaker

Fault tolerance for provider failures:

// Wrap provider with circuit breaker
protected := embedding.NewCircuitBreaker(
    provider,
    embedding.CircuitConfig{
        FailureThreshold: 5,
        SuccessThreshold: 2,
        Timeout: 30 * time.Second,
        ResetTimeout: 60 * time.Second,
    },
)

// Circuit breaker states
// Closed -> Open (after failures)
// Open -> Half-Open (after reset timeout)
// Half-Open -> Closed (after successes)

Cost Management

Track and optimize embedding costs:

// Get cost estimate
estimate := service.EstimateCost(&CostRequest{
    Texts: texts,
    Model: "text-embedding-3-large",
})

// Set cost limits
service.SetCostLimit(10.0) // $10 limit

// Get usage report
report := service.GetUsageReport(ctx, &UsageRequest{
    StartTime: time.Now().Add(-24 * time.Hour),
    EndTime: time.Now(),
    GroupBy: "model",
})

Repository Operations

Store and retrieve embeddings:

// Insert embedding
err := repo.InsertEmbedding(ctx, &Embedding{
    ID: uuid.New(),
    ContextID: contextID,
    Model: "text-embedding-3-small",
    Vector: vector,
    Metadata: map[string]interface{}{
        "type": "code",
        "language": "go",
    },
})

// Get embeddings by context
embeddings, err := repo.GetEmbeddingsByContext(ctx, contextID)

// Bulk operations
err := repo.BulkInsertEmbeddings(ctx, embeddings)

Configuration

Environment Variables
# OpenAI
OPENAI_API_KEY=sk-...
OPENAI_ORG_ID=org-...

# AWS Bedrock
AWS_REGION=us-east-1
BEDROCK_ENABLED=true

# Cost controls
EMBEDDING_COST_LIMIT=10.0
EMBEDDING_CACHE_TTL=3600

# Performance
EMBEDDING_BATCH_SIZE=100
EMBEDDING_CONCURRENCY=10
Service Configuration
config := embedding.ServiceConfig{
    // Provider settings
    DefaultModel: "text-embedding-3-small",
    EnabledProviders: []string{"openai", "bedrock"},
    
    // Cost controls
    MaxCostPerRequest: 0.50,
    DailyCostLimit: 100.0,
    
    // Performance
    BatchSize: 100,
    Concurrency: 10,
    CacheTTL: time.Hour,
    
    // Retry policy
    MaxRetries: 3,
    RetryDelay: time.Second,
}

service := embedding.NewServiceV2WithConfig(config)

Routing Strategies

Different strategies for provider selection:

// Quality First - Highest quality embeddings
router.SetStrategy(embedding.StrategyQuality)

// Cost Optimized - Lowest cost provider
router.SetStrategy(embedding.StrategyCost)

// Speed Optimized - Fastest response time
router.SetStrategy(embedding.StrategySpeed)

// Balanced - Balance all factors
router.SetStrategy(embedding.StrategyBalanced)

// Custom strategy
router.SetCustomStrategy(func(providers []Provider, criteria Criteria) Provider {
    // Custom selection logic
    return selectedProvider
})

Metrics and Monitoring

Built-in metrics collection:

// Metrics automatically collected:
- embedding_generation_total (counter)
- embedding_generation_duration_seconds (histogram)
- embedding_cost_dollars (counter)
- embedding_cache_hit_rate (gauge)
- embedding_provider_errors_total (counter)
- embedding_dimension_adaptations_total (counter)
- circuit_breaker_state (gauge)

Error Handling

Comprehensive error types:

// Check error types
switch err := err.(type) {
case *embedding.ProviderError:
    // Provider-specific error
    if err.Retryable {
        // Can retry
    }
case *embedding.DimensionError:
    // Dimension mismatch
case *embedding.CostLimitError:
    // Cost limit exceeded
case *embedding.RateLimitError:
    // Rate limited
    time.Sleep(err.RetryAfter)
}

Best Practices

1. Use Batch Processing
// Good: Batch multiple texts
results, err := service.BatchGenerateEmbeddings(ctx, texts)

// Avoid: Individual calls in loop
for _, text := range texts {
    result, err := service.GenerateEmbedding(ctx, text)
}
2. Enable Caching
// Cache frequently used embeddings
service.EnableCache(cache.NewRedisCache(redis))
3. Monitor Costs
// Set cost alerts
service.OnCostThreshold(5.0, func(usage float64) {
    alert.Send("Embedding cost at $%.2f", usage)
})
4. Handle Dimension Differences
// Always check dimensions when switching models
if sourceModel != targetModel {
    adapted, err := adapter.AdaptEmbedding(...)
}

Testing

Unit Tests
// Use mock providers
mockProvider := embedding.NewMockProvider()
mockProvider.On("GenerateEmbedding", text).Return(vector, nil)

// Test with mock
service := embedding.NewServiceV2(repo, cache, logger, tracer)
service.AddProvider("mock", mockProvider)
Integration Tests
// Test with real providers
func TestRealProviders(t *testing.T) {
    if testing.Short() {
        t.Skip("Skipping integration test")
    }
    
    // Test each provider
    for _, provider := range providers {
        t.Run(provider.Name(), func(t *testing.T) {
            // Test embedding generation
        })
    }
}

Performance Considerations

  • Batch Size: Optimal batch size is 50-100 texts
  • Caching: Can reduce costs by 60-80% for repeated content
  • Dimension Adaptation: ~5% quality loss when reducing dimensions
  • Circuit Breaker: Prevents cascade failures, adds <1ms overhead
  • pgvector Indexing: Use IVFFLAT index for large datasets

Future Enhancements

  • Support for multimodal embeddings (text + image)
  • Custom fine-tuned models
  • Embedding compression techniques
  • Real-time embedding updates
  • Federated search across regions
  • GPU acceleration for local models

References

Documentation

Overview

Package embedding provides embedding vector functionality for different model providers.

Index

Constants

View Source
const (

	// Content types
	ContentTypeCodeChunk  = "code_chunk"
	ContentTypeIssue      = "issue"
	ContentTypeComment    = "comment"
	ContentTypeDiscussion = "discussion"

	// Metadata keys
	MetadataKeyRepositoryOwner = "repository_owner"
	MetadataKeyRepositoryName  = "repository_name"
	MetadataKeyLanguage        = "language"
	MetadataKeyChunkType       = "chunk_type"
	MetadataKeySourceFile      = "source_file"
	MetadataKeyCreatedAt       = "created_at"
	MetadataKeyContentType     = "content_type"
)
View Source
const (
	ProviderOpenAI = "openai"
	ProviderVoyage = "voyage" // Anthropic's partner
	ProviderAmazon = "amazon"
	ProviderCohere = "cohere" // Available on Bedrock
	ProviderGoogle = "google"
)

Provider constants

View Source
const (
	ModelTypeText       = "text"
	ModelTypeCode       = "code"
	ModelTypeMultimodal = "multimodal"
)

Model type constants

View Source
const StandardDimension = 1536 // OpenAI standard for cross-model compatibility

Variables

View Source
var ProviderCapabilities = map[string]ProviderCapability{
	"openai": {
		SupportsEmbeddings: true,
		DefaultModel:       "text-embedding-3-small",
		EmbeddingModels: []ModelInfo{
			{ModelID: "text-embedding-3-small", Dimensions: 1536, MaxTokens: 8191, CostPer1M: 0.02},
			{ModelID: "text-embedding-3-large", Dimensions: 3072, MaxTokens: 8191, CostPer1M: 0.13},
			{ModelID: "text-embedding-ada-002", Dimensions: 1536, MaxTokens: 8191, CostPer1M: 0.10},
		},
	},
	"bedrock": {
		SupportsEmbeddings: true,
		DefaultModel:       "amazon.titan-embed-text-v2:0",
		EmbeddingModels: []ModelInfo{
			{ModelID: "amazon.titan-embed-text-v1", Dimensions: 1536, MaxTokens: 8192, CostPer1M: 0.02},
			{ModelID: "amazon.titan-embed-text-v2:0", Dimensions: 1024, MaxTokens: 8192, CostPer1M: 0.02},
			{ModelID: "cohere.embed-english-v3", Dimensions: 1024, MaxTokens: 0, CostPer1M: 0.10},
			{ModelID: "cohere.embed-multilingual-v3", Dimensions: 1024, MaxTokens: 0, CostPer1M: 0.10},
		},
	},
	"anthropic": {
		SupportsEmbeddings: false,
		EmbeddingModels:    []ModelInfo{},
	},
	"voyage": {
		SupportsEmbeddings: true,
		DefaultModel:       "voyage-2",
		EmbeddingModels: []ModelInfo{
			{ModelID: "voyage-2", Dimensions: 1024, MaxTokens: 0, CostPer1M: 0.10},
			{ModelID: "voyage-large-2", Dimensions: 1024, MaxTokens: 0, CostPer1M: 0.12},
			{ModelID: "voyage-code-2", Dimensions: 1024, MaxTokens: 0, CostPer1M: 0.10},
		},
	},
}

ProviderCapabilities defines what each provider supports

Functions

func CreateProviders

func CreateProviders(config *ProviderConfig) (map[string]Provider, error)

CreateProviders creates all configured providers

func GetEmbeddingModelDimensions

func GetEmbeddingModelDimensions(modelType ModelType, modelName string) (int, error)

GetEmbeddingModelDimensions returns the dimensions for a given model

func ValidateEmbeddingModel

func ValidateEmbeddingModel(modelType ModelType, modelName string) error

ValidateEmbeddingModel validates an embedding model name

Types

type AdvancedSearchService

type AdvancedSearchService interface {
	SearchService

	// CrossModelSearch performs search across embeddings from different models
	CrossModelSearch(ctx context.Context, req CrossModelSearchRequest) ([]CrossModelSearchResult, error)

	// HybridSearch performs hybrid search combining semantic and keyword search
	HybridSearch(ctx context.Context, req HybridSearchRequest) ([]HybridSearchResult, error)
}

AdvancedSearchService extends SearchService with cross-model and hybrid search capabilities

type AgentService

type AgentService interface {
	GetConfig(ctx context.Context, agentID string) (*agents.AgentConfig, error)
	GetModelsForAgent(ctx context.Context, agentID string, taskType agents.TaskType) (primary []string, fallback []string, err error)
	CreateConfig(ctx context.Context, config *agents.AgentConfig) error
	UpdateConfig(ctx context.Context, agentID string, update *agents.ConfigUpdateRequest) (*agents.AgentConfig, error)
}

AgentService defines the interface for agent configuration management

type AnthropicBatchEmbeddingRequest

type AnthropicBatchEmbeddingRequest struct {
	Model string   `json:"model"`
	Texts []string `json:"texts"`
}

AnthropicBatchEmbeddingRequest represents a request to the Anthropic embeddings API for batch processing

type AnthropicBatchEmbeddingResponse

type AnthropicBatchEmbeddingResponse struct {
	Object     string      `json:"object"`
	Embeddings [][]float32 `json:"embeddings"`
	Model      string      `json:"model"`
	Error      interface{} `json:"error,omitempty"`
}

AnthropicBatchEmbeddingResponse represents a response from the Anthropic embeddings API for batch processing

type AnthropicConfig

type AnthropicConfig struct {
	// Anthropic API key
	APIKey string
	// Anthropic API endpoint (optional)
	Endpoint string
	// Anthropic model name
	Model string
	// For testing environments
	UseMockEmbeddings bool
}

AnthropicConfig contains configuration for the Anthropic API

type AnthropicEmbeddingRequest

type AnthropicEmbeddingRequest struct {
	Model string `json:"model"`
	Text  string `json:"text"`
}

AnthropicEmbeddingRequest represents a request to the Anthropic embeddings API

type AnthropicEmbeddingResponse

type AnthropicEmbeddingResponse struct {
	Object    string      `json:"object"`
	Embedding []float32   `json:"embedding"`
	Model     string      `json:"model"`
	Error     interface{} `json:"error,omitempty"`
}

AnthropicEmbeddingResponse represents a response from the Anthropic embeddings API

type AnthropicEmbeddingService

type AnthropicEmbeddingService struct {
	// contains filtered or unexported fields
}

AnthropicEmbeddingService implements EmbeddingService using Anthropic

func NewAnthropicEmbeddingService

func NewAnthropicEmbeddingService(config *AnthropicConfig) (*AnthropicEmbeddingService, error)

NewAnthropicEmbeddingService creates a new Anthropic embedding service

func NewMockAnthropicEmbeddingService

func NewMockAnthropicEmbeddingService(modelName string) (*AnthropicEmbeddingService, error)

NewMockAnthropicEmbeddingService creates a mock Anthropic embedding service for testing

func (*AnthropicEmbeddingService) BatchGenerateEmbeddings

func (s *AnthropicEmbeddingService) BatchGenerateEmbeddings(ctx context.Context, texts []string, contentType string, contentIDs []string) ([]*EmbeddingVector, error)

BatchGenerateEmbeddings creates embeddings for multiple texts

func (*AnthropicEmbeddingService) GenerateEmbedding

func (s *AnthropicEmbeddingService) GenerateEmbedding(ctx context.Context, text string, contentType string, contentID string) (*EmbeddingVector, error)

GenerateEmbedding creates an embedding for a single text

func (*AnthropicEmbeddingService) GetModelConfig

func (s *AnthropicEmbeddingService) GetModelConfig() ModelConfig

GetModelConfig returns the model configuration

func (*AnthropicEmbeddingService) GetModelDimensions

func (s *AnthropicEmbeddingService) GetModelDimensions() int

GetModelDimensions returns the dimensions of the embeddings generated by this model

type BedrockConfig

type BedrockConfig struct {
	// AWS Region
	Region string
	// AWS credentials
	AccessKeyID     string
	SecretAccessKey string
	SessionToken    string
	// Model ID
	ModelID string
	// For testing environments when AWS credentials aren't available
	UseMockEmbeddings bool
}

BedrockConfig contains configuration for AWS Bedrock

type BedrockEmbeddingService

type BedrockEmbeddingService struct {
	// contains filtered or unexported fields
}

BedrockEmbeddingService implements EmbeddingService using AWS Bedrock

func NewBedrockEmbeddingService

func NewBedrockEmbeddingService(config *BedrockConfig) (*BedrockEmbeddingService, error)

NewBedrockEmbeddingService creates a new AWS Bedrock embedding service

func NewMockBedrockEmbeddingService

func NewMockBedrockEmbeddingService(modelID string) (*BedrockEmbeddingService, error)

NewMockBedrockEmbeddingService creates a mock Bedrock embedding service for testing This allows testing without requiring actual AWS credentials

func (*BedrockEmbeddingService) BatchGenerateEmbeddings

func (s *BedrockEmbeddingService) BatchGenerateEmbeddings(ctx context.Context, texts []string, contentType string, contentIDs []string) ([]*EmbeddingVector, error)

BatchGenerateEmbeddings creates embeddings for multiple texts

func (*BedrockEmbeddingService) GenerateEmbedding

func (s *BedrockEmbeddingService) GenerateEmbedding(ctx context.Context, text string, contentType string, contentID string) (*EmbeddingVector, error)

GenerateEmbedding creates an embedding for a single text

func (*BedrockEmbeddingService) GetModelConfig

func (s *BedrockEmbeddingService) GetModelConfig() ModelConfig

GetModelConfig returns the model configuration

func (*BedrockEmbeddingService) GetModelDimensions

func (s *BedrockEmbeddingService) GetModelDimensions() int

GetModelDimensions returns the dimensions of the embeddings generated by this model

type BedrockProvider

type BedrockProvider struct {
	// contains filtered or unexported fields
}

BedrockProvider implements the Provider interface for Amazon Bedrock embeddings

func NewBedrockProvider

func NewBedrockProvider(region string) (*BedrockProvider, error)

NewBedrockProvider creates a new Bedrock embedding provider

func (*BedrockProvider) GenerateEmbedding

func (p *BedrockProvider) GenerateEmbedding(ctx context.Context, content string, model string) ([]float32, error)

GenerateEmbedding generates an embedding using Amazon Bedrock

func (*BedrockProvider) GetSupportedModels

func (p *BedrockProvider) GetSupportedModels() []string

GetSupportedModels returns the list of supported Bedrock models

func (*BedrockProvider) ValidateAPIKey

func (p *BedrockProvider) ValidateAPIKey() error

ValidateAPIKey validates AWS credentials

type BedrockRuntimeClient

type BedrockRuntimeClient interface {
	InvokeModel(ctx context.Context, params *bedrockruntime.InvokeModelInput, optFns ...func(*bedrockruntime.Options)) (*bedrockruntime.InvokeModelOutput, error)
}

BedrockRuntimeClient defines an interface to allow for mocking in tests

type CachedEmbedding

type CachedEmbedding struct {
	Embedding  []float32              `json:"embedding"`
	Model      string                 `json:"model"`
	Provider   string                 `json:"provider"`
	Dimensions int                    `json:"dimensions"`
	Metadata   map[string]interface{} `json:"metadata"`
	CachedAt   time.Time              `json:"cached_at"`
}

CachedEmbedding represents a cached embedding

type ChunkingInterface

type ChunkingInterface interface {
	// Here we define the minimum methods needed from the chunking service
	// These methods should match what we actually use in the pipeline
	ChunkCode(ctx context.Context, content string, path string) ([]*chunking.CodeChunk, error)
}

ChunkingInterface defines the interface for chunking services

type CircuitBreaker

type CircuitBreaker struct {
	// contains filtered or unexported fields
}

CircuitBreaker implements the circuit breaker pattern

func NewCircuitBreaker

func NewCircuitBreaker(config CircuitBreakerConfig) *CircuitBreaker

NewCircuitBreaker creates a new circuit breaker

func (*CircuitBreaker) CanRequest

func (cb *CircuitBreaker) CanRequest() bool

CanRequest checks if a request can be made

func (*CircuitBreaker) HealthScore

func (cb *CircuitBreaker) HealthScore() float64

HealthScore returns a health score between 0 and 1

func (*CircuitBreaker) RecordFailure

func (cb *CircuitBreaker) RecordFailure()

RecordFailure records a failed request

func (*CircuitBreaker) RecordSuccess

func (cb *CircuitBreaker) RecordSuccess()

RecordSuccess records a successful request

func (*CircuitBreaker) Status

func (cb *CircuitBreaker) Status() *CircuitBreakerStatus

Status returns the current status

type CircuitBreakerConfig

type CircuitBreakerConfig struct {
	FailureThreshold    int
	SuccessThreshold    int
	Timeout             time.Duration
	HalfOpenMaxRequests int
}

CircuitBreakerConfig configures a circuit breaker

type CircuitBreakerState

type CircuitBreakerState string

CircuitBreakerState represents the state of a circuit breaker

const (
	StateClosed   CircuitBreakerState = "closed"
	StateOpen     CircuitBreakerState = "open"
	StateHalfOpen CircuitBreakerState = "half_open"
)

type CircuitBreakerStatus

type CircuitBreakerStatus struct {
	State               string    `json:"state"`
	FailureCount        int       `json:"failure_count"`
	SuccessCount        int       `json:"success_count"`
	LastFailureTime     time.Time `json:"last_failure_time,omitempty"`
	LastStateChangeTime time.Time `json:"last_state_change_time"`
}

CircuitBreakerStatus represents the current status

type CostOptimizer

type CostOptimizer struct {
	// contains filtered or unexported fields
}

CostOptimizer tracks and optimizes costs

func NewCostOptimizer

func NewCostOptimizer(config CostOptimizerConfig) *CostOptimizer

type CostOptimizerConfig

type CostOptimizerConfig struct {
	MaxCostPerRequest float64
}

type CostSummary

type CostSummary struct {
	AgentID      string             `json:"agent_id"`
	Period       string             `json:"period"`
	TotalCostUSD float64            `json:"total_cost_usd"`
	ByProvider   map[string]float64 `json:"by_provider"`
	ByModel      map[string]float64 `json:"by_model"`
	RequestCount int                `json:"request_count"`
	TokensUsed   int                `json:"tokens_used"`
}

type CrossModelSearchRequest

type CrossModelSearchRequest struct {
	// Query is the search query text
	Query string `json:"query"`
	// QueryEmbedding is the pre-computed query embedding (optional)
	QueryEmbedding []float32 `json:"query_embedding,omitempty"`
	// SearchModel is the model to use for generating query embeddings
	SearchModel string `json:"search_model"`
	// IncludeModels limits results to specific models (empty means all)
	IncludeModels []string `json:"include_models,omitempty"`
	// ExcludeModels excludes results from specific models
	ExcludeModels []string `json:"exclude_models,omitempty"`
	// TenantID is the tenant to search within
	TenantID uuid.UUID `json:"tenant_id"`
	// ContextID optionally limits search to a specific context
	ContextID *uuid.UUID `json:"context_id,omitempty"`
	// Limit is the maximum number of results to return
	Limit int `json:"limit"`
	// MinSimilarity is the minimum similarity threshold
	MinSimilarity float32 `json:"min_similarity"`
	// MetadataFilter is a JSONB filter for metadata
	MetadataFilter map[string]interface{} `json:"metadata_filter,omitempty"`
	// TaskType optionally specifies the type of task for scoring
	TaskType string `json:"task_type,omitempty"`
	// Options for additional search parameters
	Options *SearchOptions `json:"options,omitempty"`
}

CrossModelSearchRequest defines parameters for cross-model search

type CrossModelSearchResult

type CrossModelSearchResult struct {
	// ID is the embedding ID
	ID uuid.UUID `json:"id"`
	// ContextID is the context this embedding belongs to
	ContextID *uuid.UUID `json:"context_id,omitempty"`
	// Content is the text content
	Content string `json:"content"`
	// OriginalModel is the model that created this embedding
	OriginalModel string `json:"original_model"`
	// OriginalDimension is the original embedding dimension
	OriginalDimension int `json:"original_dimension"`
	// Similarity is the normalized similarity score
	Similarity float32 `json:"similarity"`
	// RawSimilarity is the raw similarity score before normalization
	RawSimilarity float32 `json:"raw_similarity"`
	// AgentID is the agent that created this content
	AgentID string `json:"agent_id,omitempty"`
	// Metadata contains additional information
	Metadata map[string]interface{} `json:"metadata,omitempty"`
	// CreatedAt is when the embedding was created
	CreatedAt time.Time `json:"created_at"`
	// ModelQualityScore is the quality score for this model
	ModelQualityScore float32 `json:"model_quality_score"`
	// FinalScore is the final weighted score
	FinalScore float32 `json:"final_score"`
}

CrossModelSearchResult represents a result from cross-model search

type DefaultEmbeddingPipeline

type DefaultEmbeddingPipeline struct {
	// contains filtered or unexported fields
}

DefaultEmbeddingPipeline implements EmbeddingPipeline for processing different content types

func NewEmbeddingPipeline

func NewEmbeddingPipeline(
	embeddingService EmbeddingService,
	storage EmbeddingStorage,
	chunkingService *chunking.ChunkingService,
	contentProvider GitHubContentProvider,
	config *EmbeddingPipelineConfig,
) (*DefaultEmbeddingPipeline, error)

NewEmbeddingPipeline creates a new embedding pipeline

func (*DefaultEmbeddingPipeline) BatchProcessContent

func (p *DefaultEmbeddingPipeline) BatchProcessContent(ctx context.Context, contents []string, contentType string, contentIDs []string) error

BatchProcessContent processes multiple content items in a batch

func (*DefaultEmbeddingPipeline) ProcessCodeChunks

func (p *DefaultEmbeddingPipeline) ProcessCodeChunks(ctx context.Context, contentType string, contentID string, chunkIDs []string) error

ProcessCodeChunks processes code chunks to generate and store embeddings

func (*DefaultEmbeddingPipeline) ProcessContent

func (p *DefaultEmbeddingPipeline) ProcessContent(ctx context.Context, content string, contentType string, contentID string) error

ProcessContent processes a single content item to generate and store embeddings

func (*DefaultEmbeddingPipeline) ProcessDiscussions

func (p *DefaultEmbeddingPipeline) ProcessDiscussions(ctx context.Context, ownerRepo string, discussionIDs []string) error

ProcessDiscussions processes GitHub discussions to generate and store embeddings

func (*DefaultEmbeddingPipeline) ProcessIssues

func (p *DefaultEmbeddingPipeline) ProcessIssues(ctx context.Context, ownerRepo string, issueNumbers []int) error

ProcessIssues processes GitHub issues to generate and store embeddings

type DimensionAdapter

type DimensionAdapter struct {
	// contains filtered or unexported fields
}

DimensionAdapter handles dimension normalization and projection

func NewDimensionAdapter

func NewDimensionAdapter() *DimensionAdapter

NewDimensionAdapter creates a new dimension adapter

func NewDimensionAdapterWithDB

func NewDimensionAdapterWithDB(db *sql.DB) *DimensionAdapter

NewDimensionAdapterWithDB creates a new dimension adapter with database support

func (*DimensionAdapter) GetProjectionQuality

func (da *DimensionAdapter) GetProjectionQuality(fromDim, toDim int, provider, model string) float64

GetProjectionQuality returns the quality score for a projection

func (*DimensionAdapter) Normalize

func (da *DimensionAdapter) Normalize(embedding []float32, fromDim, toDim int) []float32

Normalize normalizes an embedding to the target dimension

func (*DimensionAdapter) NormalizeWithProvider

func (da *DimensionAdapter) NormalizeWithProvider(embedding []float32, fromDim, toDim int, provider, model string) []float32

NormalizeWithProvider normalizes using provider-specific projection if available

func (*DimensionAdapter) TrainProjectionMatrix

func (da *DimensionAdapter) TrainProjectionMatrix(fromDim, toDim int, provider, model string, trainingData [][]float32) error

TrainProjectionMatrix trains a new projection matrix (would be async in production)

type Embedding

type Embedding struct {
	ID                   uuid.UUID       `json:"id" db:"id"`
	ContextID            uuid.UUID       `json:"context_id" db:"context_id"`
	ContentIndex         int             `json:"content_index" db:"content_index"`
	ChunkIndex           int             `json:"chunk_index" db:"chunk_index"`
	Content              string          `json:"content" db:"content"`
	ContentHash          string          `json:"content_hash" db:"content_hash"`
	ContentTokens        *int            `json:"content_tokens,omitempty" db:"content_tokens"`
	ModelID              uuid.UUID       `json:"model_id" db:"model_id"`
	ModelProvider        string          `json:"model_provider" db:"model_provider"`
	ModelName            string          `json:"model_name" db:"model_name"`
	ModelDimensions      int             `json:"model_dimensions" db:"model_dimensions"`
	ConfiguredDimensions *int            `json:"configured_dimensions,omitempty" db:"configured_dimensions"`
	ProcessingTimeMS     *int            `json:"processing_time_ms,omitempty" db:"processing_time_ms"`
	EmbeddingCreatedAt   time.Time       `json:"embedding_created_at" db:"embedding_created_at"`
	Magnitude            float64         `json:"magnitude" db:"magnitude"`
	TenantID             uuid.UUID       `json:"tenant_id" db:"tenant_id"`
	Metadata             json.RawMessage `json:"metadata" db:"metadata"`
	CreatedAt            time.Time       `json:"created_at" db:"created_at"`
	UpdatedAt            time.Time       `json:"updated_at" db:"updated_at"`
}

Embedding represents a stored embedding

type EmbeddingCache

type EmbeddingCache interface {
	Get(ctx context.Context, key string) (*CachedEmbedding, error)
	Set(ctx context.Context, key string, embedding *CachedEmbedding, ttl time.Duration) error
	Delete(ctx context.Context, key string) error
}

EmbeddingCache defines the interface for caching embeddings

type EmbeddingFactory

type EmbeddingFactory struct {
	// contains filtered or unexported fields
}

EmbeddingFactory creates and configures embedding components

func NewEmbeddingFactory

func NewEmbeddingFactory(config *EmbeddingFactoryConfig) (*EmbeddingFactory, error)

NewEmbeddingFactory creates a new embedding factory with the specified configuration

func (*EmbeddingFactory) CreateEmbeddingPipeline

func (f *EmbeddingFactory) CreateEmbeddingPipeline(
	chunkingService *chunking.ChunkingService,
	contentProvider GitHubContentProvider,
) (*DefaultEmbeddingPipeline, error)

CreateEmbeddingPipeline creates a complete embedding pipeline

func (*EmbeddingFactory) CreateEmbeddingService

func (f *EmbeddingFactory) CreateEmbeddingService() (EmbeddingService, error)

CreateEmbeddingService creates an embedding service based on the factory configuration

func (*EmbeddingFactory) CreateEmbeddingStorage

func (f *EmbeddingFactory) CreateEmbeddingStorage() (EmbeddingStorage, error)

CreateEmbeddingStorage creates an embedding storage based on the factory configuration

func (*EmbeddingFactory) Initialize

func (f *EmbeddingFactory) Initialize(ctx context.Context, chunkingService *chunking.ChunkingService, contentProvider GitHubContentProvider) (*DefaultEmbeddingPipeline, error)

Initialize tests all components and returns a fully configured pipeline ready for use

type EmbeddingFactoryConfig

type EmbeddingFactoryConfig struct {
	// Model configuration
	ModelType       ModelType `json:"model_type"`
	ModelName       string    `json:"model_name"`
	ModelAPIKey     string    `json:"model_api_key,omitempty"`
	ModelEndpoint   string    `json:"model_endpoint,omitempty"`
	ModelDimensions int       `json:"model_dimensions"`

	// Additional model parameters (used for provider-specific configurations)
	Parameters map[string]interface{} `json:"parameters,omitempty"`

	// Storage configuration
	DatabaseConnection *sql.DB `json:"-"`
	DatabaseSchema     string  `json:"database_schema"`

	// Pipeline configuration
	Concurrency     int  `json:"concurrency"`
	BatchSize       int  `json:"batch_size"`
	IncludeComments bool `json:"include_comments"`
	EnrichMetadata  bool `json:"enrich_metadata"`
}

EmbeddingFactoryConfig contains configuration for the embedding factory

type EmbeddingMetric

type EmbeddingMetric struct {
	ID                     uuid.UUID `json:"id" db:"id"`
	AgentID                string    `json:"agent_id" db:"agent_id"`
	ModelProvider          string    `json:"model_provider" db:"model_provider"`
	ModelName              string    `json:"model_name" db:"model_name"`
	ModelDimensions        int       `json:"model_dimensions" db:"model_dimensions"`
	RequestID              uuid.UUID `json:"request_id" db:"request_id"`
	TokenCount             int       `json:"token_count" db:"token_count"`
	TotalLatencyMs         int       `json:"total_latency_ms" db:"total_latency_ms"`
	ProviderLatencyMs      int       `json:"provider_latency_ms" db:"provider_latency_ms"`
	NormalizationLatencyMs int       `json:"normalization_latency_ms" db:"normalization_latency_ms"`
	CostUSD                float64   `json:"cost_usd" db:"cost_usd"`
	Status                 string    `json:"status" db:"status"`
	ErrorMessage           string    `json:"error_message" db:"error_message"`
	RetryCount             int       `json:"retry_count" db:"retry_count"`
	FinalProvider          string    `json:"final_provider" db:"final_provider"`
	TenantID               uuid.UUID `json:"tenant_id" db:"tenant_id"`
	Timestamp              time.Time `json:"timestamp" db:"timestamp"`
}

EmbeddingMetric represents a single metric entry

type EmbeddingPipelineConfig

type EmbeddingPipelineConfig struct {
	// Number of goroutines to use for parallel processing
	Concurrency int

	// Batch size for processing
	BatchSize int

	// Whether to include code comments in embeddings
	IncludeComments bool

	// Whether to enrich embeddings with metadata
	EnrichMetadata bool
}

EmbeddingPipelineConfig holds configuration for the embedding pipeline

func DefaultEmbeddingPipelineConfig

func DefaultEmbeddingPipelineConfig() *EmbeddingPipelineConfig

DefaultEmbeddingPipelineConfig returns the default embedding pipeline configuration

type EmbeddingProviderSelector

type EmbeddingProviderSelector struct {
	// Explicit configuration overrides
	PreferredProvider string
	PreferredModel    string

	// Auto-detection settings
	EnableAutoDetection bool
	ValidationMode      string // "strict" or "permissive"
	// contains filtered or unexported fields
}

EmbeddingProviderSelector intelligently selects and validates embedding providers

func NewEmbeddingProviderSelector

func NewEmbeddingProviderSelector() *EmbeddingProviderSelector

NewEmbeddingProviderSelector creates a new selector with auto-detection

func (*EmbeddingProviderSelector) GetProviderSummary

func (s *EmbeddingProviderSelector) GetProviderSummary() string

GetProviderSummary returns a summary of available providers

func (*EmbeddingProviderSelector) SelectProvider

func (s *EmbeddingProviderSelector) SelectProvider() (provider string, model string, dimensions int, err error)

SelectProvider returns the best available provider and model

type EmbeddingSearchResult

type EmbeddingSearchResult struct {
	ID            uuid.UUID       `json:"id" db:"id"`
	ContextID     uuid.UUID       `json:"context_id" db:"context_id"`
	Content       string          `json:"content" db:"content"`
	Similarity    float64         `json:"similarity" db:"similarity"`
	Metadata      json.RawMessage `json:"metadata" db:"metadata"`
	ModelProvider string          `json:"model_provider" db:"model_provider"`
}

EmbeddingSearchResult represents a search result

type EmbeddingService

type EmbeddingService interface {
	GenerateEmbedding(ctx context.Context, text string, contentType string, contentID string) (*EmbeddingVector, error)
	BatchGenerateEmbeddings(ctx context.Context, texts []string, contentType string, contentIDs []string) ([]*EmbeddingVector, error)
	GetModelConfig() ModelConfig
	GetModelDimensions() int
}

EmbeddingService defines the interface for generating embeddings - TEMPORARY for legacy code cleanup

type EmbeddingStorage

type EmbeddingStorage interface {
	StoreEmbedding(ctx context.Context, embedding *EmbeddingVector) error
	BatchStoreEmbeddings(ctx context.Context, embeddings []*EmbeddingVector) error
	FindSimilarEmbeddings(ctx context.Context, embedding *EmbeddingVector, limit int, threshold float32) ([]*EmbeddingVector, error)
	GetEmbeddingsByContentIDs(ctx context.Context, contentIDs []string) ([]*EmbeddingVector, error)
	DeleteEmbeddingsByContentIDs(ctx context.Context, contentIDs []string) error
}

EmbeddingStorage defines the interface for storing and retrieving embeddings - TEMPORARY for legacy code cleanup

type EmbeddingVector

type EmbeddingVector struct {
	Vector      []float32              `json:"vector"`
	Dimensions  int                    `json:"dimensions"`
	ModelID     string                 `json:"model_id"`
	ContentType string                 `json:"content_type"`
	ContentID   string                 `json:"content_id"`
	Metadata    map[string]interface{} `json:"metadata,omitempty"`
}

EmbeddingVector represents a vector embedding with metadata - TEMPORARY for legacy code cleanup

type GenerateEmbeddingRequest

type GenerateEmbeddingRequest struct {
	AgentID   string                 `json:"agent_id" validate:"required"`
	Text      string                 `json:"text" validate:"required,max=50000"`
	TaskType  agents.TaskType        `json:"task_type"`
	Metadata  map[string]interface{} `json:"metadata"`
	RequestID string                 `json:"request_id"`
	TenantID  uuid.UUID              `json:"tenant_id"`
	ContextID uuid.UUID              `json:"context_id"`
}

GenerateEmbeddingRequest represents a request to generate an embedding

type GenerateEmbeddingResponse

type GenerateEmbeddingResponse struct {
	EmbeddingID          uuid.UUID              `json:"embedding_id"`
	RequestID            string                 `json:"request_id"`
	ModelUsed            string                 `json:"model_used"`
	Provider             string                 `json:"provider"`
	Dimensions           int                    `json:"dimensions"`
	NormalizedDimensions int                    `json:"normalized_dimensions"`
	CostUSD              float64                `json:"cost_usd"`
	TokensUsed           int                    `json:"tokens_used"`
	GenerationTimeMs     int64                  `json:"generation_time_ms"`
	Cached               bool                   `json:"cached"`
	Metadata             map[string]interface{} `json:"metadata"`
}

GenerateEmbeddingResponse represents the response from generating an embedding

type GitHubComment

type GitHubComment struct {
	ID        int       `json:"id"`
	Body      string    `json:"body"`
	CreatedAt time.Time `json:"created_at"`
	UpdatedAt time.Time `json:"updated_at"`
	User      struct {
		Login string `json:"login"`
	} `json:"user"`
}

GitHubComment represents a GitHub comment

type GitHubCommentData

type GitHubCommentData struct {
	ID        int       `json:"id"`
	Body      string    `json:"body"`
	CreatedAt time.Time `json:"created_at"`
	UpdatedAt time.Time `json:"updated_at"`
	User      struct {
		Login string `json:"login"`
	} `json:"user"`
}

GitHubCommentData represents a GitHub comment for the adapter

type GitHubContentAdapter

type GitHubContentAdapter struct {
	// contains filtered or unexported fields
}

GitHubContentAdapter adapts the GitHubContentManager to the GitHubContentProvider interface

func NewGitHubContentAdapter

func NewGitHubContentAdapter(contentManager *core.GitHubContentManager) *GitHubContentAdapter

NewGitHubContentAdapter creates a new GitHub content adapter

func (*GitHubContentAdapter) GetContent

func (a *GitHubContentAdapter) GetContent(ctx context.Context, owner string, repo string, path string) ([]byte, error)

GetContent retrieves file content from GitHub

func (*GitHubContentAdapter) GetIssue

func (a *GitHubContentAdapter) GetIssue(ctx context.Context, owner string, repo string, issueNumber int) (*GitHubIssueData, error)

GetIssue retrieves issue details from GitHub

func (*GitHubContentAdapter) GetIssueComments

func (a *GitHubContentAdapter) GetIssueComments(ctx context.Context, owner string, repo string, issueNumber int) ([]*GitHubCommentData, error)

GetIssueComments retrieves issue comments from GitHub

type GitHubContentProvider

type GitHubContentProvider interface {
	// GetContent retrieves file content from GitHub
	GetContent(ctx context.Context, owner, repo, path string) ([]byte, error)

	// GetIssue retrieves issue details from GitHub
	GetIssue(ctx context.Context, owner, repo string, issueNumber int) (*GitHubIssueData, error)

	// GetIssueComments retrieves issue comments from GitHub
	GetIssueComments(ctx context.Context, owner, repo string, issueNumber int) ([]*GitHubCommentData, error)
}

GitHubContentProvider defines the interface for accessing GitHub content

type GitHubIssue

type GitHubIssue struct {
	Title     string    `json:"title"`
	Body      string    `json:"body"`
	State     string    `json:"state"`
	CreatedAt time.Time `json:"created_at"`
	UpdatedAt time.Time `json:"updated_at"`
}

GitHubIssue represents a GitHub issue

type GitHubIssueData

type GitHubIssueData struct {
	Title     string    `json:"title"`
	Body      string    `json:"body"`
	State     string    `json:"state"`
	CreatedAt time.Time `json:"created_at"`
	UpdatedAt time.Time `json:"updated_at"`
}

GitHubIssueData represents a GitHub issue for the adapter

type GoogleProvider

type GoogleProvider struct {
	// contains filtered or unexported fields
}

GoogleProvider implements the Provider interface for Google Vertex AI embeddings

func NewGoogleProvider

func NewGoogleProvider(projectID, location, apiKey string) *GoogleProvider

NewGoogleProvider creates a new Google Vertex AI embedding provider

func (*GoogleProvider) GenerateEmbedding

func (p *GoogleProvider) GenerateEmbedding(ctx context.Context, content string, model string) ([]float32, error)

GenerateEmbedding generates an embedding using Google Vertex AI

func (*GoogleProvider) GetSupportedModels

func (p *GoogleProvider) GetSupportedModels() []string

GetSupportedModels returns the list of supported Google models

func (*GoogleProvider) ValidateAPIKey

func (p *GoogleProvider) ValidateAPIKey() error

ValidateAPIKey validates the Google API key

type HybridSearchRequest

type HybridSearchRequest struct {
	// Query is the main search query for semantic search
	Query string `json:"query"`
	// Keywords are additional keywords for keyword-based search
	Keywords []string `json:"keywords,omitempty"`
	// HybridWeight determines the balance between semantic and keyword results (0.0 to 1.0)
	HybridWeight float32 `json:"hybrid_weight"`
	// TenantID is the tenant to search within
	TenantID uuid.UUID `json:"tenant_id"`
	// ModelName is the embedding model to use
	ModelName string `json:"model_name"`
	// Limit is the maximum number of results
	Limit int `json:"limit"`
	// MinSimilarity is the minimum similarity threshold
	MinSimilarity float32 `json:"min_similarity"`
	// MetadataFilter is a JSONB filter for metadata
	MetadataFilter map[string]interface{} `json:"metadata_filter,omitempty"`
	// Options for additional search parameters
	Options *SearchOptions `json:"options,omitempty"`
	// QueryEmbedding allows pre-computed embedding to be passed
	QueryEmbedding []float32 `json:"query_embedding,omitempty"`
}

HybridSearchRequest defines parameters for hybrid search

type HybridSearchResult

type HybridSearchResult struct {
	// Embed the cross-model search result
	CrossModelSearchResult
	// Result is the combined search result
	Result *SearchResult `json:"result"`
	// SemanticScore is the semantic similarity score
	SemanticScore float32 `json:"semantic_score"`
	// KeywordScore is the keyword relevance score
	KeywordScore float32 `json:"keyword_score"`
	// HybridScore is the combined score
	HybridScore float32 `json:"hybrid_score"`
}

HybridSearchResult represents a result from hybrid search

type InsertRequest

type InsertRequest struct {
	ContextID            uuid.UUID       `json:"context_id"`
	Content              string          `json:"content"`
	Embedding            []float32       `json:"embedding"`
	ModelName            string          `json:"model_name"`
	TenantID             uuid.UUID       `json:"tenant_id"`
	Metadata             json.RawMessage `json:"metadata,omitempty"`
	ContentIndex         int             `json:"content_index"`
	ChunkIndex           int             `json:"chunk_index"`
	ConfiguredDimensions *int            `json:"configured_dimensions,omitempty"` // For models that support reduction
}

InsertRequest represents a request to insert an embedding

type LoadBalancer

type LoadBalancer struct {
	// contains filtered or unexported fields
}

LoadBalancer tracks provider load

func NewLoadBalancer

func NewLoadBalancer(config LoadBalancerConfig) *LoadBalancer

func (*LoadBalancer) GetLoad

func (lb *LoadBalancer) GetLoad(provider string) float64

func (*LoadBalancer) RecordLatency

func (lb *LoadBalancer) RecordLatency(provider string, latency time.Duration)

type LoadBalancerConfig

type LoadBalancerConfig struct {
	Strategy string
}

type MetricsFilter

type MetricsFilter struct {
	AgentID   string
	Provider  string
	StartTime time.Time
	EndTime   time.Time
	Status    string
	Limit     int
}

type MetricsRepository

type MetricsRepository interface {
	RecordMetric(ctx context.Context, metric *EmbeddingMetric) error
	GetMetrics(ctx context.Context, filter MetricsFilter) ([]*EmbeddingMetric, error)
	GetAgentCosts(ctx context.Context, agentID string, period time.Duration) (*CostSummary, error)
}

MetricsRepository stores embedding metrics

type MockBedrockClient

type MockBedrockClient struct{}

MockBedrockClient provides a mock implementation of the BedrockRuntimeClient interface for testing

func (*MockBedrockClient) InvokeModel

InvokeModel provides a mock implementation that always returns an error Since we're using the useMockEmbeddings flag, this function should never actually be called

type MockGitHubContentProvider

type MockGitHubContentProvider struct{}

MockGitHubContentProvider implements GitHubContentProvider for testing

func NewMockGitHubContentProvider

func NewMockGitHubContentProvider() *MockGitHubContentProvider

NewMockGitHubContentProvider creates a new mock GitHub content provider

func (*MockGitHubContentProvider) GetContent

func (m *MockGitHubContentProvider) GetContent(ctx context.Context, owner string, repo string, path string) ([]byte, error)

GetContent mocks retrieving file content from GitHub

func (*MockGitHubContentProvider) GetIssue

func (m *MockGitHubContentProvider) GetIssue(ctx context.Context, owner string, repo string, issueNumber int) (*GitHubIssueData, error)

GetIssue mocks retrieving issue details from GitHub

func (*MockGitHubContentProvider) GetIssueComments

func (m *MockGitHubContentProvider) GetIssueComments(ctx context.Context, owner string, repo string, issueNumber int) ([]*GitHubCommentData, error)

GetIssueComments mocks retrieving issue comments from GitHub

type Model

type Model struct {
	ID                              uuid.UUID       `json:"id" db:"id"`
	Provider                        string          `json:"provider" db:"provider"`
	ModelName                       string          `json:"model_name" db:"model_name"`
	ModelVersion                    *string         `json:"model_version,omitempty" db:"model_version"`
	Dimensions                      int             `json:"dimensions" db:"dimensions"`
	MaxTokens                       *int            `json:"max_tokens,omitempty" db:"max_tokens"`
	SupportsBinary                  bool            `json:"supports_binary" db:"supports_binary"`
	SupportsDimensionalityReduction bool            `json:"supports_dimensionality_reduction" db:"supports_dimensionality_reduction"`
	MinDimensions                   *int            `json:"min_dimensions,omitempty" db:"min_dimensions"`
	CostPerMillionTokens            *float64        `json:"cost_per_million_tokens,omitempty" db:"cost_per_million_tokens"`
	ModelID                         *string         `json:"model_id,omitempty" db:"model_id"` // For Bedrock models
	ModelType                       *string         `json:"model_type,omitempty" db:"model_type"`
	IsActive                        bool            `json:"is_active" db:"is_active"`
	Capabilities                    json.RawMessage `json:"capabilities" db:"capabilities"`
	CreatedAt                       time.Time       `json:"created_at" db:"created_at"`
}

Model represents an embedding model

type ModelConfig

type ModelConfig struct {
	Type       ModelType              `json:"type"`
	Name       string                 `json:"name"`
	APIKey     string                 `json:"api_key,omitempty"`
	Endpoint   string                 `json:"endpoint,omitempty"`
	Dimensions int                    `json:"dimensions"`
	Parameters map[string]interface{} `json:"parameters,omitempty"`
}

ModelConfig contains configuration for embedding models - TEMPORARY for legacy code cleanup

type ModelFilter

type ModelFilter struct {
	Provider  *string `json:"provider,omitempty"`
	ModelType *string `json:"model_type,omitempty"`
	IsActive  *bool   `json:"is_active,omitempty"`
}

ModelFilter for querying available models

type ModelInfo

type ModelInfo struct {
	ModelID    string
	Dimensions int
	MaxTokens  int
	CostPer1M  float64
	Notes      string
}

ModelInfo contains model metadata

type ModelType

type ModelType string

ModelType represents the type of embedding model - TEMPORARY for legacy code cleanup

const (
	ModelTypeOpenAI      ModelType = "openai"
	ModelTypeHuggingFace ModelType = "huggingface"
	ModelTypeBedrock     ModelType = "bedrock"
	ModelTypeAnthropic   ModelType = "anthropic"
	ModelTypeCustom      ModelType = "custom"
)

type OpenAIEmbeddingData

type OpenAIEmbeddingData struct {
	Embedding []float32 `json:"embedding"`
	Index     int       `json:"index"`
}

OpenAIEmbeddingData represents embedding data in an OpenAI API response

type OpenAIEmbeddingRequest

type OpenAIEmbeddingRequest struct {
	Model string   `json:"model"`
	Input []string `json:"input"`
}

OpenAIEmbeddingRequest represents a request to the OpenAI embeddings API

type OpenAIEmbeddingResponse

type OpenAIEmbeddingResponse struct {
	Data  []OpenAIEmbeddingData `json:"data"`
	Model string                `json:"model"`
	Usage OpenAIUsage           `json:"usage"`
}

OpenAIEmbeddingResponse represents a response from the OpenAI embeddings API

type OpenAIEmbeddingService

type OpenAIEmbeddingService struct {
	// contains filtered or unexported fields
}

OpenAIEmbeddingService implements EmbeddingService using OpenAI's API

func NewOpenAIEmbeddingService

func NewOpenAIEmbeddingService(apiKey string, modelName string, dimensions int) (*OpenAIEmbeddingService, error)

NewOpenAIEmbeddingService creates a new OpenAI embedding service

func (*OpenAIEmbeddingService) BatchGenerateEmbeddings

func (s *OpenAIEmbeddingService) BatchGenerateEmbeddings(ctx context.Context, texts []string, contentType string, contentIDs []string) ([]*EmbeddingVector, error)

BatchGenerateEmbeddings creates embeddings for multiple texts

func (*OpenAIEmbeddingService) GenerateEmbedding

func (s *OpenAIEmbeddingService) GenerateEmbedding(ctx context.Context, text string, contentType string, contentID string) (*EmbeddingVector, error)

GenerateEmbedding creates an embedding for a single text

func (*OpenAIEmbeddingService) GetModelConfig

func (s *OpenAIEmbeddingService) GetModelConfig() ModelConfig

GetModelConfig returns the model configuration

func (*OpenAIEmbeddingService) GetModelDimensions

func (s *OpenAIEmbeddingService) GetModelDimensions() int

GetModelDimensions returns the dimensions of the embeddings generated by this model

type OpenAIProvider

type OpenAIProvider struct {
	// contains filtered or unexported fields
}

OpenAIProvider implements the Provider interface for OpenAI embeddings

func NewOpenAIProvider

func NewOpenAIProvider(apiKey string) *OpenAIProvider

NewOpenAIProvider creates a new OpenAI embedding provider

func (*OpenAIProvider) GenerateEmbedding

func (p *OpenAIProvider) GenerateEmbedding(ctx context.Context, content string, model string) ([]float32, error)

GenerateEmbedding generates an embedding using OpenAI API

func (*OpenAIProvider) GetSupportedModels

func (p *OpenAIProvider) GetSupportedModels() []string

GetSupportedModels returns the list of supported OpenAI models

func (*OpenAIProvider) ValidateAPIKey

func (p *OpenAIProvider) ValidateAPIKey() error

ValidateAPIKey validates the OpenAI API key

type OpenAIUsage

type OpenAIUsage struct {
	PromptTokens int `json:"prompt_tokens"`
	TotalTokens  int `json:"total_tokens"`
}

OpenAIUsage represents usage information in an OpenAI API response

type PgVectorStorage

type PgVectorStorage struct {
	// contains filtered or unexported fields
}

PgVectorStorage implements EmbeddingStorage for PostgreSQL with pgvector

func NewPgVectorStorage

func NewPgVectorStorage(db *sql.DB, schema string) (*PgVectorStorage, error)

NewPgVectorStorage creates a new PostgreSQL vector storage

func (*PgVectorStorage) BatchStoreEmbeddings

func (s *PgVectorStorage) BatchStoreEmbeddings(ctx context.Context, embeddings []*EmbeddingVector) error

BatchStoreEmbeddings stores multiple embeddings in a batch

func (*PgVectorStorage) DeleteEmbeddingsByContentIDs

func (s *PgVectorStorage) DeleteEmbeddingsByContentIDs(ctx context.Context, contentIDs []string) error

DeleteEmbeddingsByContentIDs deletes embeddings by content IDs

func (*PgVectorStorage) FindSimilarEmbeddings

func (s *PgVectorStorage) FindSimilarEmbeddings(ctx context.Context, embedding *EmbeddingVector, limit int, threshold float32) ([]*EmbeddingVector, error)

FindSimilarEmbeddings finds embeddings similar to the provided one

func (*PgVectorStorage) GetEmbeddingsByContentIDs

func (s *PgVectorStorage) GetEmbeddingsByContentIDs(ctx context.Context, contentIDs []string) ([]*EmbeddingVector, error)

GetEmbeddingsByContentIDs retrieves embeddings by content IDs

func (*PgVectorStorage) StoreEmbedding

func (s *PgVectorStorage) StoreEmbedding(ctx context.Context, embedding *EmbeddingVector) error

StoreEmbedding stores a single embedding

type ProjectionMatrix

type ProjectionMatrix struct {
	ID             int       `json:"id" db:"id"`
	FromDimensions int       `json:"from_dimensions" db:"from_dimensions"`
	ToDimensions   int       `json:"to_dimensions" db:"to_dimensions"`
	FromProvider   string    `json:"from_provider" db:"from_provider"`
	FromModel      string    `json:"from_model" db:"from_model"`
	Matrix         []float32 `json:"matrix" db:"matrix"`
	QualityScore   float64   `json:"quality_score" db:"quality_score"`
	IsActive       bool      `json:"is_active" db:"is_active"`
}

ProjectionMatrix represents a dimension projection matrix

type Provider

type Provider interface {
	GenerateEmbedding(ctx context.Context, content string, model string) ([]float32, error)
	GetSupportedModels() []string
	ValidateAPIKey() error
}

Provider interface for embedding providers - TEMPORARY for legacy code cleanup

type ProviderCandidate

type ProviderCandidate struct {
	Provider string
	Model    string
	Score    float64
	Reasons  []string
}

ProviderCandidate represents a provider/model candidate

type ProviderCapability

type ProviderCapability struct {
	SupportsEmbeddings bool
	EmbeddingModels    []ModelInfo
	DefaultModel       string
}

ProviderCapability describes what a provider can do

type ProviderConfig

type ProviderConfig struct {
	// OpenAI configuration
	OpenAIAPIKey string

	// AWS/Bedrock configuration
	AWSRegion string

	// Google configuration
	GoogleProjectID string
	GoogleLocation  string
	GoogleAPIKey    string

	// Voyage AI configuration
	VoyageAPIKey string
}

ProviderConfig contains configuration for creating providers

func NewProviderConfigFromEnv

func NewProviderConfigFromEnv() *ProviderConfig

NewProviderConfigFromEnv creates provider config from environment variables

type ProviderHealth

type ProviderHealth struct {
	Name                string `json:"name"`
	Status              string `json:"status"`
	Error               string `json:"error,omitempty"`
	CircuitBreakerState string `json:"circuit_breaker_state,omitempty"`
	FailureCount        int    `json:"failure_count,omitempty"`
}

type ProviderLoad

type ProviderLoad struct {
	CurrentRequests int
	AvgLatency      time.Duration
	LastUpdated     time.Time
}

type QualityConfig

type QualityConfig struct {
	MinQualityScore float64
}

type QualityScore

type QualityScore struct {
	SuccessCount int
	FailureCount int
	LastUpdated  time.Time
}

type QualityTracker

type QualityTracker struct {
	// contains filtered or unexported fields
}

QualityTracker tracks provider quality

func NewQualityTracker

func NewQualityTracker(config QualityConfig) *QualityTracker

func (*QualityTracker) GetScore

func (qt *QualityTracker) GetScore(provider, model string) float64

func (*QualityTracker) RecordFailure

func (qt *QualityTracker) RecordFailure(provider string)

func (*QualityTracker) RecordSuccess

func (qt *QualityTracker) RecordSuccess(provider string)

type RelationshipContextEnricher

type RelationshipContextEnricher struct {
	// contains filtered or unexported fields
}

RelationshipContextEnricher enhances embedding vectors with relationship context

func NewRelationshipContextEnricher

func NewRelationshipContextEnricher(service relationship.Service) *RelationshipContextEnricher

NewRelationshipContextEnricher creates a new enricher for enhancing embeddings with relationship context

func (*RelationshipContextEnricher) EnrichEmbeddingMetadata

func (e *RelationshipContextEnricher) EnrichEmbeddingMetadata(
	ctx context.Context,
	contentType string,
	contentID string,
	owner string,
	repo string,
	metadata map[string]interface{},
) (map[string]interface{}, error)

EnrichEmbeddingMetadata adds relationship context to embedding metadata

func (*RelationshipContextEnricher) EnrichEmbeddingText

func (e *RelationshipContextEnricher) EnrichEmbeddingText(
	ctx context.Context,
	contentType string,
	contentID string,
	owner string,
	repo string,
	originalText string,
) (string, error)

EnrichEmbeddingText adds relationship context to the text for embedding

func (*RelationshipContextEnricher) WithContextDepth

func (e *RelationshipContextEnricher) WithContextDepth(depth int) *RelationshipContextEnricher

WithContextDepth sets the depth of relationships to include (1=direct, 2+=indirect)

func (*RelationshipContextEnricher) WithDirection

func (e *RelationshipContextEnricher) WithDirection(direction string) *RelationshipContextEnricher

WithDirection sets the relationship direction to include

func (*RelationshipContextEnricher) WithMaxRelationships

func (e *RelationshipContextEnricher) WithMaxRelationships(max int) *RelationshipContextEnricher

WithMaxRelationships sets the maximum number of relationships to include in context

type Repository

type Repository struct {
	// contains filtered or unexported fields
}

func NewRepository

func NewRepository(db *sql.DB) *Repository

func NewRepositoryWithObservability

func NewRepositoryWithObservability(db *sql.DB, logger observability.Logger, metrics observability.MetricsClient) *Repository

NewRepositoryWithObservability creates a repository with custom observability components

func (*Repository) GetAvailableModels

func (r *Repository) GetAvailableModels(ctx context.Context, filter ModelFilter) ([]Model, error)

GetAvailableModels retrieves available embedding models

func (*Repository) GetEmbeddingsByContext

func (r *Repository) GetEmbeddingsByContext(ctx context.Context, contextID, tenantID uuid.UUID) ([]Embedding, error)

GetEmbeddingsByContext retrieves all embeddings for a context

func (*Repository) GetModelByName

func (r *Repository) GetModelByName(ctx context.Context, modelName string) (*Model, error)

GetModelByName retrieves a model by name

func (*Repository) InsertEmbedding

func (r *Repository) InsertEmbedding(ctx context.Context, req InsertRequest) (uuid.UUID, error)

InsertEmbedding inserts a new embedding with automatic padding

func (*Repository) SearchEmbeddings

func (r *Repository) SearchEmbeddings(ctx context.Context, req SearchRequest) ([]EmbeddingSearchResult, error)

SearchEmbeddings performs similarity search with optional metadata filtering

type RouterConfig

type RouterConfig struct {
	CircuitBreakerConfig CircuitBreakerConfig
	LoadBalancerConfig   LoadBalancerConfig
	CostOptimizerConfig  CostOptimizerConfig
	QualityConfig        QualityConfig
}

RouterConfig configures the smart router

func DefaultRouterConfig

func DefaultRouterConfig() *RouterConfig

DefaultRouterConfig returns default router configuration

type RoutingDecision

type RoutingDecision struct {
	Candidates []ProviderCandidate
	Strategy   string
}

RoutingDecision represents the routing decision

type RoutingRequest

type RoutingRequest struct {
	AgentConfig *agents.AgentConfig
	TaskType    agents.TaskType
	RequestID   string
}

RoutingRequest represents a request for routing decision

type SearchFilter

type SearchFilter struct {
	// Field is the metadata field to filter on
	Field string `json:"field"`
	// Value is the value to match
	Value interface{} `json:"value"`
	// Operator is the comparison operator (eq, ne, gt, lt, gte, lte, in, contains)
	Operator string `json:"operator"`
}

SearchFilter defines a filter for metadata fields

type SearchOptions

type SearchOptions struct {
	// ContentTypes filters results to specific content types
	ContentTypes []string `json:"content_types,omitempty"`
	// Filters are metadata filters to apply to the search
	Filters []SearchFilter `json:"filters,omitempty"`
	// Sorts defines the sort order for results
	Sorts []SearchSort `json:"sorts,omitempty"`
	// Limit is the maximum number of results to return
	Limit int `json:"limit"`
	// Offset is the number of results to skip (for pagination)
	Offset int `json:"offset"`
	// MinSimilarity is the minimum similarity score required (0.0 to 1.0)
	MinSimilarity float32 `json:"min_similarity"`
	// WeightFactors defines how to weight different scoring factors
	WeightFactors map[string]float32 `json:"weight_factors,omitempty"`
}

SearchOptions contains options for search queries

type SearchRequest

type SearchRequest struct {
	QueryEmbedding []float32       `json:"query_embedding"`
	ModelName      string          `json:"model_name"`
	TenantID       uuid.UUID       `json:"tenant_id"`
	ContextID      *uuid.UUID      `json:"context_id,omitempty"`
	Limit          int             `json:"limit"`
	Threshold      float64         `json:"threshold"`
	MetadataFilter json.RawMessage `json:"metadata_filter,omitempty"` // JSONB filter
}

SearchRequest represents a similarity search request

type SearchResult

type SearchResult struct {
	// Content is the embedding that matched
	Content *EmbeddingVector `json:"content"`
	// Score is the calculated relevance score (0.0 to 1.0)
	Score float32 `json:"score"`
	// Matches contains information about why this result matched
	Matches map[string]interface{} `json:"matches,omitempty"`
}

SearchResult represents a single search result

type SearchResults

type SearchResults struct {
	// Results is the list of search results
	Results []*SearchResult `json:"results"`
	// Total is the total number of results found (for pagination)
	Total int `json:"total"`
	// HasMore indicates if there are more results available
	HasMore bool `json:"has_more"`
}

SearchResults represents a collection of search results

type SearchService

type SearchService interface {
	// Search performs a vector search with the given text
	Search(ctx context.Context, text string, options *SearchOptions) (*SearchResults, error)

	// SearchByVector performs a vector search with a pre-computed vector
	SearchByVector(ctx context.Context, vector []float32, options *SearchOptions) (*SearchResults, error)

	// SearchByContentID performs a "more like this" search based on an existing content ID
	SearchByContentID(ctx context.Context, contentID string, options *SearchOptions) (*SearchResults, error)
}

SearchService defines the interface for vector search operations

type SearchSort

type SearchSort struct {
	// Field is the field to sort on (can be "similarity" or any metadata field)
	Field string `json:"field"`
	// Direction is the sort direction ("asc" or "desc")
	Direction string `json:"direction"`
}

SearchSort defines a sort order for results

type ServiceV2

type ServiceV2 struct {
	// contains filtered or unexported fields
}

ServiceV2 is the enhanced embedding service with multi-agent support

func NewServiceV2

func NewServiceV2(config ServiceV2Config) (*ServiceV2, error)

NewServiceV2 creates a new enhanced embedding service

func (*ServiceV2) BatchGenerateEmbeddings

func (s *ServiceV2) BatchGenerateEmbeddings(ctx context.Context, reqs []GenerateEmbeddingRequest) ([]*GenerateEmbeddingResponse, error)

BatchGenerateEmbeddings generates embeddings for multiple texts

func (*ServiceV2) GenerateBatch

func (s *ServiceV2) GenerateBatch(ctx context.Context, texts []string, model string) ([][]float32, error)

GenerateBatch generates embeddings for multiple texts with progress tracking

func (*ServiceV2) GenerateEmbedding

GenerateEmbedding generates an embedding for the given request

func (*ServiceV2) GetProviderHealth

func (s *ServiceV2) GetProviderHealth(ctx context.Context) map[string]ProviderHealth

GetProviderHealth returns health status of all providers

func (*ServiceV2) SetProgressCallback

func (s *ServiceV2) SetProgressCallback(fn func(float64))

SetProgressCallback sets the progress callback function

type ServiceV2Config

type ServiceV2Config struct {
	Providers    map[string]providers.Provider
	AgentService AgentService
	Repository   *Repository
	MetricsRepo  MetricsRepository
	Cache        EmbeddingCache
	RouterConfig *RouterConfig
}

ServiceV2Config contains configuration for the service

type SmartRouter

type SmartRouter struct {
	// contains filtered or unexported fields
}

SmartRouter handles intelligent routing between providers

func NewSmartRouter

func NewSmartRouter(config *RouterConfig, providers map[string]providers.Provider) *SmartRouter

NewSmartRouter creates a new smart router

func (*SmartRouter) GetCircuitBreakerStatus

func (r *SmartRouter) GetCircuitBreakerStatus(provider string) *CircuitBreakerStatus

GetCircuitBreakerStatus returns the status of a provider's circuit breaker

func (*SmartRouter) RecordResult

func (r *SmartRouter) RecordResult(provider string, success bool, latency time.Duration)

RecordResult records the result of using a provider

func (*SmartRouter) SelectProvider

func (r *SmartRouter) SelectProvider(ctx context.Context, req *RoutingRequest) (*RoutingDecision, error)

SelectProvider selects the best provider for the request

type UnifiedSearchConfig

type UnifiedSearchConfig struct {
	DB               *sql.DB
	Repository       *Repository
	SearchRepository repositorySearch.Repository
	EmbeddingService EmbeddingService
	DimensionAdapter *DimensionAdapter
	Logger           observability.Logger
	Metrics          observability.MetricsClient
}

UnifiedSearchConfig contains configuration for the unified search service

type UnifiedSearchService

type UnifiedSearchService struct {
	// contains filtered or unexported fields
}

UnifiedSearchService implements the SearchService interface with advanced features

func NewUnifiedSearchService

func NewUnifiedSearchService(config *UnifiedSearchConfig) (*UnifiedSearchService, error)

NewUnifiedSearchService creates a new unified search service

func (*UnifiedSearchService) CrossModelSearch

CrossModelSearch performs search across embeddings from different models

func (*UnifiedSearchService) HybridSearch

HybridSearch performs hybrid search combining semantic and keyword search

func (*UnifiedSearchService) Search

func (s *UnifiedSearchService) Search(ctx context.Context, text string, options *SearchOptions) (*SearchResults, error)

Search performs a vector search with the given text

func (*UnifiedSearchService) SearchByContentID

func (s *UnifiedSearchService) SearchByContentID(ctx context.Context, contentID string, options *SearchOptions) (*SearchResults, error)

SearchByContentID performs a "more like this" search based on an existing content ID

func (*UnifiedSearchService) SearchByVector

func (s *UnifiedSearchService) SearchByVector(ctx context.Context, vector []float32, options *SearchOptions) (*SearchResults, error)

SearchByVector performs a vector search with a pre-computed vector

type VoyageProvider

type VoyageProvider struct {
	// contains filtered or unexported fields
}

VoyageProvider implements the Provider interface for Voyage AI embeddings

func NewVoyageProvider

func NewVoyageProvider(apiKey string) *VoyageProvider

NewVoyageProvider creates a new Voyage AI embedding provider

func (*VoyageProvider) GenerateEmbedding

func (p *VoyageProvider) GenerateEmbedding(ctx context.Context, content string, model string) ([]float32, error)

GenerateEmbedding generates an embedding using Voyage AI API

func (*VoyageProvider) GetSupportedModels

func (p *VoyageProvider) GetSupportedModels() []string

GetSupportedModels returns the list of supported Voyage AI models

func (*VoyageProvider) ValidateAPIKey

func (p *VoyageProvider) ValidateAPIKey() error

ValidateAPIKey validates the Voyage AI API key

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL