embedding

package
v0.0.12
Warning

This package is not in the latest version of its module.

Published: May 1, 2025 | License: MIT | Imports: 8 | Imported by: 0

README

Enhanced Embedding Package

This package provides advanced embedding generation and manipulation capabilities for the Agent SDK. It includes features for configuring embedding models, batch processing, similarity calculations, and metadata filtering.

Features

  • Configurable Embedding Generation: Fine-tune embedding parameters such as dimensions, encoding format, and truncation behavior.
  • Batch Processing: Generate embeddings for multiple texts in a single API call.
  • Similarity Calculations: Calculate similarity between embeddings using different metrics (cosine, euclidean, dot product).
  • Advanced Metadata Filtering: Create complex filter conditions for precise document retrieval.

Usage

Basic Embedding Generation
// Create an embedder with default configuration
embedder := embedding.NewOpenAIEmbedder(apiKey, "text-embedding-3-small")

// Generate an embedding
vector, err := embedder.Embed(ctx, "Your text here")
if err != nil {
    // Handle error
}

Custom Configuration
// Start from the default configuration for a model
config := embedding.DefaultEmbeddingConfig("text-embedding-3-large")
config.Dimensions = 1536 // shorten from the model's default of 3072
config.SimilarityMetric = "cosine"

// Create an embedder with custom configuration
embedder := embedding.NewOpenAIEmbedderWithConfig(apiKey, config)

// Generate an embedding with custom configuration
vector, err := embedder.EmbedWithConfig(ctx, "Your text here", config)
if err != nil {
    // Handle error
}

Batch Processing
// Generate embeddings for multiple texts
texts := []string{
    "First text",
    "Second text",
    "Third text",
}

vectors, err := embedder.EmbedBatch(ctx, texts)
if err != nil {
    // Handle error
}

Similarity Calculation
// Calculate similarity between two vectors
similarity, err := embedder.CalculateSimilarity(vector1, vector2, "cosine")
if err != nil {
    // Handle error
}

Metadata Filtering

The package can filter documents by metadata, either by passing filter conditions to a vector store search or by applying them to documents in memory.

Simple Filters
// Create a simple filter
filter := embedding.NewMetadataFilter("category", "=", "science")

// Create a filter group
filterGroup := embedding.NewMetadataFilterGroup("and", filter)

Complex Filters
// Create a complex filter group
filterGroup := embedding.NewMetadataFilterGroup("and",
    embedding.NewMetadataFilter("category", "=", "science"),
    embedding.NewMetadataFilter("published_date", ">", "2023-01-01"),
)

// Add another filter
filterGroup.AddFilter(embedding.NewMetadataFilter("author", "=", "John Doe"))

// Create a sub-group with OR logic
subGroup := embedding.NewMetadataFilterGroup("or",
    embedding.NewMetadataFilter("tags", "contains", "physics"),
    embedding.NewMetadataFilter("tags", "contains", "chemistry"),
)

// Add the sub-group to the main group
filterGroup.AddSubGroup(subGroup)
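To make the nesting explicit, the sketch below renders a filter group as a parenthesized boolean expression. It uses hypothetical local types `Filter` and `Group` that mirror the documented `MetadataFilter` and `MetadataFilterGroup` fields, and it assumes that sub-groups are joined with the parent group's operator in the same way as individual filters. It illustrates the semantics only and is not the package's own code.

```go
package main

import (
	"fmt"
	"strings"
)

// Filter and Group are hypothetical stand-ins that mirror the documented
// MetadataFilter and MetadataFilterGroup fields.
type Filter struct {
	Field    string
	Operator string
	Value    interface{}
}

type Group struct {
	Filters   []Filter
	SubGroups []Group
	Operator  string // "and" or "or"
}

// Render prints a group as a parenthesized expression: each filter and each
// sub-group becomes one operand, joined by the group's logical operator.
func Render(g Group) string {
	var parts []string
	for _, f := range g.Filters {
		parts = append(parts, fmt.Sprintf("%s %s %v", f.Field, f.Operator, f.Value))
	}
	for _, sg := range g.SubGroups {
		parts = append(parts, Render(sg))
	}
	return "(" + strings.Join(parts, " "+strings.ToUpper(g.Operator)+" ") + ")"
}

func main() {
	g := Group{Operator: "and", Filters: []Filter{
		{"category", "=", "science"},
		{"published_date", ">", "2023-01-01"},
		{"author", "=", "John Doe"},
	}}
	g.SubGroups = append(g.SubGroups, Group{Operator: "or", Filters: []Filter{
		{"tags", "contains", "physics"},
		{"tags", "contains", "chemistry"},
	}})
	fmt.Println(Render(g))
}
```

For the group built in the Complex Filters example, this prints `(category = science AND published_date > 2023-01-01 AND author = John Doe AND (tags contains physics OR tags contains chemistry))`.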

Using Filters with Vector Store
// Convert the filter group to a map for the vector store.
// Note: FilterToMap is deprecated; for Weaviate, use FilterToWeaviateFormat instead.
filterMap := embedding.FilterToMap(filterGroup)

// Use the map with a vector store search
results, err := store.Search(ctx, "query", 10,
    interfaces.WithEmbedding(true),
    interfaces.WithFilters(filterMap),
)
if err != nil {
    // Handle error
}

Filtering Documents in Memory
// Filter documents in memory
filteredDocs := embedding.ApplyFilters(documents, filterGroup)
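In-memory filtering can be pictured as a recursive evaluation over each document's metadata. The following is a minimal sketch, not the package's `ApplyFilters`: it models document metadata as a plain `map[string]interface{}`, uses hypothetical local `Filter`/`Group` types, handles only `=`, `>`, `<` and `contains`, compares `=` values by their string form, and assumes that a missing field never matches and that an empty group matches everything.

```go
package main

import (
	"fmt"
	"strings"
)

// Filter and Group are hypothetical stand-ins for MetadataFilter and MetadataFilterGroup.
type Filter struct {
	Field, Operator string
	Value           interface{}
}

type Group struct {
	Filters   []Filter
	SubGroups []Group
	Operator  string
}

// toFloat converts common numeric types to float64 for ordered comparisons.
func toFloat(v interface{}) (float64, bool) {
	switch n := v.(type) {
	case int:
		return float64(n), true
	case float64:
		return n, true
	}
	return 0, false
}

// matchFilter evaluates a single condition against one document's metadata.
func matchFilter(meta map[string]interface{}, f Filter) bool {
	v, ok := meta[f.Field]
	if !ok {
		return false // assumption: a missing field never matches
	}
	switch f.Operator {
	case "=":
		return fmt.Sprint(v) == fmt.Sprint(f.Value)
	case ">", "<":
		a, okA := toFloat(v)
		b, okB := toFloat(f.Value)
		if !okA || !okB {
			return false
		}
		if f.Operator == ">" {
			return a > b
		}
		return a < b
	case "contains":
		s, okS := v.(string)
		sub, okSub := f.Value.(string)
		return okS && okSub && strings.Contains(s, sub)
	}
	return false
}

// matchGroup combines filter and sub-group results with the group's operator.
func matchGroup(meta map[string]interface{}, g Group) bool {
	var results []bool
	for _, f := range g.Filters {
		results = append(results, matchFilter(meta, f))
	}
	for _, sg := range g.SubGroups {
		results = append(results, matchGroup(meta, sg))
	}
	if len(results) == 0 {
		return true // assumption: an empty group matches every document
	}
	isOr := g.Operator == "or"
	for _, r := range results {
		if isOr && r {
			return true
		}
		if !isOr && !r {
			return false
		}
	}
	return !isOr
}

// applyFilters keeps the documents whose metadata satisfies the group.
func applyFilters(docs []map[string]interface{}, g Group) []map[string]interface{} {
	var kept []map[string]interface{}
	for _, d := range docs {
		if matchGroup(d, g) {
			kept = append(kept, d)
		}
	}
	return kept
}

func main() {
	docs := []map[string]interface{}{
		{"category": "science", "word_count": 120},
		{"category": "science", "word_count": 5},
	}
	g := Group{Operator: "and", Filters: []Filter{{"category", "=", "science"}, {"word_count", ">", 10}}}
	fmt.Println(len(applyFilters(docs, g))) // prints 1
}
```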

Supported Operators

Comparison Operators
  • =, ==, eq: Equal
  • !=, <>, ne: Not equal
  • >, gt: Greater than
  • >=, gte: Greater than or equal
  • <, lt: Less than
  • <=, lte: Less than or equal
  • contains: String contains
  • in: Value in collection
  • not_in: Value not in collection

Logical Operators
  • and: All conditions must be true
  • or: At least one condition must be true
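Because several spellings map to the same comparison, an evaluator would typically normalize operators before using them. This sketch reduces each alias listed above to one canonical form and rejects anything else; the `canonicalOps` table and `normalizeOperator` helper are hypothetical and not part of the package API.

```go
package main

import "fmt"

// canonicalOps maps every alias listed under "Comparison Operators" to one
// canonical spelling (a hypothetical table, not the package's own).
var canonicalOps = map[string]string{
	"=": "=", "==": "=", "eq": "=",
	"!=": "!=", "<>": "!=", "ne": "!=",
	">": ">", "gt": ">",
	">=": ">=", "gte": ">=",
	"<": "<", "lt": "<",
	"<=": "<=", "lte": "<=",
	"contains": "contains",
	"in":       "in",
	"not_in":   "not_in",
}

// normalizeOperator returns the canonical form of op, or an error if op is unknown.
func normalizeOperator(op string) (string, error) {
	if c, ok := canonicalOps[op]; ok {
		return c, nil
	}
	return "", fmt.Errorf("unsupported operator %q", op)
}

func main() {
	for _, op := range []string{"eq", "<>", "gte", "like"} {
		c, err := normalizeOperator(op)
		fmt.Printf("%-4s -> %q %v\n", op, c, err)
	}
}
```

Normalizing once up front keeps the comparison logic to one case per operator instead of one case per alias.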

Configuration Options

Embedding Models
  • text-embedding-3-small: Smaller, faster model (1536 dimensions by default)
  • text-embedding-3-large: Larger, more accurate model (3072 dimensions by default)
  • text-embedding-ada-002: Legacy model (1536 dimensions)

Dimensions

Specify the dimensionality of the embedding vectors. Only the text-embedding-3-* models support this option; text-embedding-ada-002 always returns 1536-dimensional vectors.

Encoding Format
  • float: Standard floating-point format
  • base64: Base64-encoded vectors, which make the API response smaller than a list of decimal numbers
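A base64 payload has to be decoded back into floats before it can be compared. The sketch below assumes the payload is a packed array of little-endian IEEE-754 float32 values; check that layout against the provider's documentation before relying on it. `decodeBase64Embedding` and `encodeBase64Embedding` are illustrative helpers, not functions exported by this package.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
	"math"
)

// decodeBase64Embedding turns a base64 string into []float32, assuming the
// payload is a packed sequence of little-endian float32 values.
func decodeBase64Embedding(s string) ([]float32, error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return nil, err
	}
	if len(raw)%4 != 0 {
		return nil, fmt.Errorf("payload length %d is not a multiple of 4", len(raw))
	}
	out := make([]float32, len(raw)/4)
	for i := range out {
		bits := binary.LittleEndian.Uint32(raw[i*4:])
		out[i] = math.Float32frombits(bits)
	}
	return out, nil
}

// encodeBase64Embedding is the inverse, used here only to build a sample payload.
func encodeBase64Embedding(v []float32) string {
	raw := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(raw[i*4:], math.Float32bits(f))
	}
	return base64.StdEncoding.EncodeToString(raw)
}

func main() {
	payload := encodeBase64Embedding([]float32{0.25, -1.5, 3})
	vec, err := decodeBase64Embedding(payload)
	fmt.Println(vec, err)
}
```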

Truncation
  • none: Error on token limit overflow
  • truncate: Truncate text to fit within token limit

Similarity Metrics
  • cosine: Cosine similarity (default)
  • euclidean: Euclidean distance (converted to similarity score)
  • dot_product: Dot product
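Each of the three metrics fits in a few lines. This is a sketch under stated assumptions, not the package's `CalculateSimilarity`: the package does not document how it converts Euclidean distance to a similarity score, so the `1/(1+d)` mapping used here is a hypothetical choice that gives 1 for identical vectors and approaches 0 as they move apart. Cosine similarity is rejected for zero vectors because it is undefined there.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// similarity scores two equal-length vectors with the named metric.
// The euclidean branch maps distance d to 1/(1+d), which is an assumption.
func similarity(a, b []float32, metric string) (float32, error) {
	if len(a) != len(b) || len(a) == 0 {
		return 0, errors.New("vectors must be non-empty and of equal length")
	}
	var dot, normA, normB, sqDist float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
		sqDist += (x - y) * (x - y)
	}
	switch metric {
	case "cosine":
		if normA == 0 || normB == 0 {
			return 0, errors.New("cosine similarity is undefined for zero vectors")
		}
		return float32(dot / (math.Sqrt(normA) * math.Sqrt(normB))), nil
	case "euclidean":
		return float32(1 / (1 + math.Sqrt(sqDist))), nil
	case "dot_product":
		return float32(dot), nil
	}
	return 0, fmt.Errorf("unknown metric %q", metric)
}

func main() {
	a := []float32{1, 0}
	b := []float32{0, 1}
	for _, m := range []string{"cosine", "euclidean", "dot_product"} {
		s, _ := similarity(a, b, m)
		fmt.Printf("%s: %.4f\n", m, s)
	}
}
```

Note that dot product is unbounded and depends on vector length, while cosine ignores length; for unit-normalized vectors the two give the same score.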

Documentation


Constants

This section is empty.

Variables

var DefaultTimeFormats = []string{
	time.RFC3339,
	"2006-01-02T15:04:05",
	"2006-01-02 15:04:05",
	"2006-01-02",
}

DefaultTimeFormats provides a list of common time formats for parsing
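One plausible use of such a list is to try each layout in turn until one parses, so that date-valued metadata such as `published_date` can be compared as times rather than as strings. The `parseTime` helper below is a hypothetical sketch of that pattern using only the standard `time` package; trying the layouts in list order is an assumption, and the list is copied from `DefaultTimeFormats` so the program is self-contained.

```go
package main

import (
	"fmt"
	"time"
)

// timeFormats copies the documented DefaultTimeFormats.
var timeFormats = []string{
	time.RFC3339,
	"2006-01-02T15:04:05",
	"2006-01-02 15:04:05",
	"2006-01-02",
}

// parseTime tries each layout in order and returns the first successful parse.
func parseTime(s string) (time.Time, error) {
	for _, layout := range timeFormats {
		if t, err := time.Parse(layout, s); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("unrecognized time %q", s)
}

func main() {
	threshold, _ := parseTime("2023-01-01")
	published, _ := parseTime("2023-06-15T08:30:00Z")
	// A comparison such as published_date > "2023-01-01", done on times
	fmt.Println(published.After(threshold)) // prints true
}
```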

Functions

func ApplyFilters

func ApplyFilters(docs []interfaces.Document, filterGroup MetadataFilterGroup) []interfaces.Document

ApplyFilters filters a list of documents based on metadata filters

func CreateWeaviateAndFilter

func CreateWeaviateAndFilter(conditions ...map[string]interface{}) map[string]interface{}

CreateWeaviateAndFilter creates a Weaviate filter with AND logic for multiple conditions. This is a convenience function for common filter cases.

func CreateWeaviateFilter

func CreateWeaviateFilter(field string, operator string, value interface{}) map[string]interface{}

CreateWeaviateFilter creates a simple Weaviate filter for a single field. This is a convenience function for simple filter cases. For complex filters, use MetadataFilterGroup and FilterToWeaviateFormat.

func CreateWeaviateOrFilter

func CreateWeaviateOrFilter(conditions ...map[string]interface{}) map[string]interface{}

CreateWeaviateOrFilter creates a Weaviate filter with OR logic for multiple conditions. This is a convenience function for common filter cases.

func FilterToMap

func FilterToMap(group MetadataFilterGroup) map[string]interface{}

FilterToMap converts a MetadataFilterGroup to a map for use with vector store filters.

Deprecated: This function produces a format that may not be compatible with all vector stores. For Weaviate, use FilterToWeaviateFormat instead.

func FilterToWeaviateFormat

func FilterToWeaviateFormat(group MetadataFilterGroup) map[string]interface{}

FilterToWeaviateFormat converts a MetadataFilterGroup to a Weaviate-compatible filter format. This is the recommended function to use for Weaviate vector store filters.
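For orientation, Weaviate expresses `where` filters as nested objects: a leaf condition carries a `path`, an `operator` such as `Equal` or `GreaterThan`, and a typed value key such as `valueText` or `valueInt`, while `And`/`Or` nodes hold their children in `operands`. The sketch below converts a group into that shape. It is an illustration under assumptions, not the documented output of `FilterToWeaviateFormat`: the local `Filter`/`Group` types are hypothetical, only the six comparison operators are mapped, and only string, int, float64 and bool values are handled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Filter and Group are hypothetical stand-ins for MetadataFilter and MetadataFilterGroup.
type Filter struct {
	Field, Operator string
	Value           interface{}
}

type Group struct {
	Filters   []Filter
	SubGroups []Group
	Operator  string
}

// weaviateOps maps the comparison operators to Weaviate operator names.
var weaviateOps = map[string]string{
	"=": "Equal", "!=": "NotEqual",
	">": "GreaterThan", ">=": "GreaterThanEqual",
	"<": "LessThan", "<=": "LessThanEqual",
}

// condition builds one leaf condition, choosing the value key from the Go type.
func condition(f Filter) (map[string]interface{}, error) {
	op, ok := weaviateOps[f.Operator]
	if !ok {
		return nil, fmt.Errorf("operator %q is not mapped in this sketch", f.Operator)
	}
	c := map[string]interface{}{"path": []string{f.Field}, "operator": op}
	switch v := f.Value.(type) {
	case string:
		c["valueText"] = v
	case int:
		c["valueInt"] = v
	case float64:
		c["valueNumber"] = v
	case bool:
		c["valueBoolean"] = v
	default:
		return nil, fmt.Errorf("value type %T is not handled in this sketch", v)
	}
	return c, nil
}

// toWeaviate turns a group into an "And"/"Or" node whose operands are its
// leaf conditions followed by its converted sub-groups.
func toWeaviate(g Group) (map[string]interface{}, error) {
	operands := []interface{}{}
	for _, f := range g.Filters {
		c, err := condition(f)
		if err != nil {
			return nil, err
		}
		operands = append(operands, c)
	}
	for _, sg := range g.SubGroups {
		c, err := toWeaviate(sg)
		if err != nil {
			return nil, err
		}
		operands = append(operands, c)
	}
	op := "And"
	if g.Operator == "or" {
		op = "Or"
	}
	return map[string]interface{}{"operator": op, "operands": operands}, nil
}

func main() {
	g := Group{Operator: "and", Filters: []Filter{
		{"category", "=", "science"},
		{"word_count", ">", 10},
	}}
	w, err := toWeaviate(g)
	if err != nil {
		panic(err)
	}
	out, _ := json.Marshal(w)
	fmt.Println(string(out))
}
```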

Types

type Client

type Client interface {
	// Embed generates an embedding for the given text
	Embed(ctx context.Context, text string) ([]float32, error)

	// EmbedWithConfig generates an embedding with custom configuration
	EmbedWithConfig(ctx context.Context, text string, config EmbeddingConfig) ([]float32, error)

	// EmbedBatch generates embeddings for multiple texts
	EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)

	// EmbedBatchWithConfig generates embeddings for multiple texts with custom configuration
	EmbedBatchWithConfig(ctx context.Context, texts []string, config EmbeddingConfig) ([][]float32, error)

	// CalculateSimilarity calculates the similarity between two embeddings
	CalculateSimilarity(vec1, vec2 []float32, metric string) (float32, error)
}

Client defines the interface for an embedding client

type EmbeddingConfig

type EmbeddingConfig struct {
	// Model is the embedding model to use
	Model string

	// Dimensions specifies the dimensionality of the embedding vectors
	// Only supported by some models (e.g., text-embedding-3-*)
	Dimensions int

	// EncodingFormat specifies the format of the embedding vectors
	// Options: "float", "base64"
	EncodingFormat string

	// Truncation controls how the input text is handled if it exceeds the model's token limit
	// Options: "none" (error on overflow), "truncate" (truncate to limit)
	Truncation string

	// SimilarityMetric specifies the similarity metric to use when comparing embeddings
	// Options: "cosine" (default), "euclidean", "dot_product"
	SimilarityMetric string

	// SimilarityThreshold specifies the minimum similarity score for search results
	SimilarityThreshold float32

	// UserID is an optional identifier for tracking embedding usage
	UserID string
}

EmbeddingConfig contains configuration options for embedding generation
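The options listed in the field comments lend themselves to an up-front check before any API call. The `validate` function below is a hypothetical sketch, not part of the package: it mirrors the struct locally (omitting `UserID`), treats empty strings as "use the default" (an assumption), and uses `errors.Join`, which requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// EmbeddingConfig mirrors the documented struct (UserID omitted).
type EmbeddingConfig struct {
	Model               string
	Dimensions          int
	EncodingFormat      string
	Truncation          string
	SimilarityMetric    string
	SimilarityThreshold float32
}

// validate reports every problem it finds, joined into one error.
func validate(c EmbeddingConfig) error {
	var errs []error
	// oneOf accepts an empty value (assumed to mean "use the default") or a listed option.
	oneOf := func(field, v string, allowed ...string) {
		if v == "" {
			return
		}
		for _, a := range allowed {
			if v == a {
				return
			}
		}
		errs = append(errs, fmt.Errorf("%s: unsupported value %q", field, v))
	}
	if c.Model == "" {
		errs = append(errs, errors.New("model is required"))
	}
	if c.Dimensions < 0 {
		errs = append(errs, errors.New("dimensions must not be negative"))
	}
	// Only text-embedding-3-* models accept a custom dimension count.
	if c.Dimensions > 0 && c.Model == "text-embedding-ada-002" {
		errs = append(errs, errors.New("text-embedding-ada-002 does not support custom dimensions"))
	}
	oneOf("EncodingFormat", c.EncodingFormat, "float", "base64")
	oneOf("Truncation", c.Truncation, "none", "truncate")
	oneOf("SimilarityMetric", c.SimilarityMetric, "cosine", "euclidean", "dot_product")
	return errors.Join(errs...)
}

func main() {
	cfg := EmbeddingConfig{Model: "text-embedding-3-large", Dimensions: 1536, SimilarityMetric: "manhattan"}
	fmt.Println(validate(cfg))
}
```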

func DefaultEmbeddingConfig

func DefaultEmbeddingConfig(model string) EmbeddingConfig

DefaultEmbeddingConfig returns a default configuration for embedding generation

type MetadataFilter

type MetadataFilter struct {
	// Field is the metadata field to filter on
	Field string

	// Operator is the comparison operator
	// Supported operators: "=", "!=", ">", ">=", "<", "<=", "contains", "in", "not_in"
	Operator string

	// Value is the value to compare against
	Value interface{}
}

MetadataFilter represents a filter condition for document metadata. It defines a single condition to be applied on a specific field. Example: field="word_count", operator=">", value=10

func NewMetadataFilter

func NewMetadataFilter(field, operator string, value interface{}) MetadataFilter

NewMetadataFilter creates a new metadata filter

type MetadataFilterGroup

type MetadataFilterGroup struct {
	// Filters is the list of filters in this group
	Filters []MetadataFilter

	// SubGroups is the list of sub-groups in this group
	SubGroups []MetadataFilterGroup

	// Operator is the logical operator to apply between filters
	// Supported operators: "and", "or"
	Operator string
}

MetadataFilterGroup represents a group of filters with a logical operator. It allows for complex nested conditions with AND/OR logic. Example: (word_count > 10 AND type = "article") OR (category IN ["news", "blog"])

func NewMetadataFilterGroup

func NewMetadataFilterGroup(operator string, filters ...MetadataFilter) MetadataFilterGroup

NewMetadataFilterGroup creates a new metadata filter group

func (*MetadataFilterGroup) AddFilter

func (g *MetadataFilterGroup) AddFilter(filter MetadataFilter)

AddFilter adds a filter to the group

func (*MetadataFilterGroup) AddSubGroup

func (g *MetadataFilterGroup) AddSubGroup(subGroup MetadataFilterGroup)

AddSubGroup adds a sub-group to the group

type OpenAIEmbedder

type OpenAIEmbedder struct {
	// contains filtered or unexported fields
}

OpenAIEmbedder implements embedding generation using the OpenAI API

func NewOpenAIEmbedder

func NewOpenAIEmbedder(apiKey, model string) *OpenAIEmbedder

NewOpenAIEmbedder creates a new OpenAIEmbedder instance with default configuration

func NewOpenAIEmbedderWithConfig

func NewOpenAIEmbedderWithConfig(apiKey string, config EmbeddingConfig) *OpenAIEmbedder

NewOpenAIEmbedderWithConfig creates a new OpenAIEmbedder with custom configuration

func (*OpenAIEmbedder) CalculateSimilarity

func (e *OpenAIEmbedder) CalculateSimilarity(vec1, vec2 []float32, metric string) (float32, error)

CalculateSimilarity calculates the similarity between two embeddings

func (*OpenAIEmbedder) Embed

func (e *OpenAIEmbedder) Embed(ctx context.Context, text string) ([]float32, error)

Embed generates an embedding using the OpenAI API with the default configuration

func (*OpenAIEmbedder) EmbedBatch

func (e *OpenAIEmbedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)

EmbedBatch generates embeddings for multiple texts using default configuration

func (*OpenAIEmbedder) EmbedBatchWithConfig

func (e *OpenAIEmbedder) EmbedBatchWithConfig(ctx context.Context, texts []string, config EmbeddingConfig) ([][]float32, error)

EmbedBatchWithConfig generates embeddings for multiple texts with custom configuration

func (*OpenAIEmbedder) EmbedWithConfig

func (e *OpenAIEmbedder) EmbedWithConfig(ctx context.Context, text string, config EmbeddingConfig) ([]float32, error)

EmbedWithConfig generates an embedding using the OpenAI API with a custom configuration

func (*OpenAIEmbedder) GetConfig

func (e *OpenAIEmbedder) GetConfig() EmbeddingConfig

GetConfig returns the current embedding configuration
