package embedding
v0.2.33
Published: Jan 12, 2026 License: MIT Imports: 12 Imported by: 0

README

Enhanced Embedding Package

This package provides advanced embedding generation and manipulation capabilities for the Agent SDK. It includes features for configuring embedding models, batch processing, similarity calculations, and metadata filtering.

Supported Providers

  • OpenAI: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002
  • Google Gemini/Vertex AI: text-embedding-004, text-embedding-005, text-multilingual-embedding-002

Features

  • Configurable Embedding Generation: Fine-tune embedding parameters such as dimensions, encoding format, and truncation behavior.
  • Batch Processing: Generate embeddings for multiple texts in a single API call.
  • Similarity Calculations: Calculate similarity between embeddings using different metrics (cosine, euclidean, dot product).
  • Advanced Metadata Filtering: Create complex filter conditions for precise document retrieval.
  • Multi-Provider Support: Use OpenAI or Google Gemini/Vertex AI for embedding generation.

Usage

OpenAI Embedding Generation
// Create an embedder with default configuration
embedder := embedding.NewOpenAIEmbedder(apiKey, "text-embedding-3-small")

// Generate an embedding
vector, err := embedder.Embed(ctx, "Your text here")
if err != nil {
    // Handle error
}
Gemini/Vertex AI Embedding Generation
// Create a Gemini embedder with API key (Gemini API)
embedder, err := embedding.NewGeminiEmbedder(ctx,
    embedding.WithGeminiAPIKey(apiKey),
    embedding.WithGeminiModel(embedding.ModelTextEmbedding004),
)
if err != nil {
    // Handle error
}

// Generate an embedding
vector, err := embedder.Embed(ctx, "Your text here")
if err != nil {
    // Handle error
}

// Create a Vertex AI embedder with project credentials
vertexEmbedder, err := embedding.NewGeminiEmbedder(ctx,
    embedding.WithGeminiBackend(genai.BackendVertexAI),
    embedding.WithGeminiProjectID("your-project-id"),
    embedding.WithGeminiLocation("us-central1"),
    embedding.WithGeminiCredentialsFile("/path/to/service-account.json"),
)
if err != nil {
    // Handle error
}

// Use task type for optimized embeddings
queryEmbedder, err := embedding.NewGeminiEmbedder(ctx,
    embedding.WithGeminiAPIKey(apiKey),
    embedding.WithGeminiTaskType("RETRIEVAL_QUERY"),
)
if err != nil {
    // Handle error
}
Custom Configuration
// Start from the defaults for the chosen model, then override selected fields
config := embedding.DefaultEmbeddingConfig("text-embedding-3-large")
config.Dimensions = 1536
config.SimilarityMetric = "cosine"

// Create an embedder with custom configuration
embedder := embedding.NewOpenAIEmbedderWithConfig(apiKey, config)

// Generate an embedding with custom configuration
vector, err := embedder.EmbedWithConfig(ctx, "Your text here", config)
if err != nil {
    // Handle error
}
Batch Processing
// Generate embeddings for multiple texts
texts := []string{
    "First text",
    "Second text",
    "Third text",
}

vectors, err := embedder.EmbedBatch(ctx, texts)
if err != nil {
    // Handle error
}
Similarity Calculation
// Calculate similarity between two vectors
similarity, err := embedder.CalculateSimilarity(vector1, vector2, "cosine")
if err != nil {
    // Handle error
}

Metadata Filtering

The package supports metadata filters, combinable with AND/OR logic and nested sub-groups, for narrowing document retrieval.

Simple Filters
// Create a simple filter
filter := embedding.NewMetadataFilter("category", "=", "science")

// Create a filter group
filterGroup := embedding.NewMetadataFilterGroup("and", filter)
Complex Filters
// Create a complex filter group
filterGroup := embedding.NewMetadataFilterGroup("and",
    embedding.NewMetadataFilter("category", "=", "science"),
    embedding.NewMetadataFilter("published_date", ">", "2023-01-01"),
)

// Add another filter
filterGroup.AddFilter(embedding.NewMetadataFilter("author", "=", "John Doe"))

// Create a sub-group with OR logic
subGroup := embedding.NewMetadataFilterGroup("or",
    embedding.NewMetadataFilter("tags", "contains", "physics"),
    embedding.NewMetadataFilter("tags", "contains", "chemistry"),
)

// Add the sub-group to the main group
filterGroup.AddSubGroup(subGroup)
Using Filters with Vector Store
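// Convert filter group to map for vector store.
// FilterToMap is deprecated; for Weaviate, use embedding.FilterToWeaviateFormat(filterGroup) instead.
filterMap := embedding.FilterToMap(filterGroup)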

// Use with vector store search
results, err := store.Search(ctx, "query", 10,
    interfaces.WithEmbedding(true),
    interfaces.WithFilters(filterMap),
)
Filtering Documents in Memory
// Filter documents in memory
filteredDocs := embedding.ApplyFilters(documents, filterGroup)

Supported Operators

Comparison Operators
  • =, ==, eq: Equal
  • !=, <>, ne: Not equal
  • >, gt: Greater than
  • >=, gte: Greater than or equal
  • <, lt: Less than
  • <=, lte: Less than or equal
  • contains: String contains
  • in: Value in collection
  • not_in: Value not in collection
Logical Operators
  • and: All conditions must be true
  • or: At least one condition must be true
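
The way these operators combine can be sketched with a small in-memory evaluator. This is a minimal illustration, not the package's ApplyFilters implementation: the Filter and Group types, matchFilter, and matchGroup are hypothetical local stand-ins, only three comparison operators are covered, and the handling of missing fields and empty groups is an assumption.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Filter and Group are local stand-ins that mirror the shape of
// MetadataFilter and MetadataFilterGroup; they are not the package's types.
type Filter struct {
	Field    string
	Operator string
	Value    interface{}
}

type Group struct {
	Operator  string // "and" or "or"
	Filters   []Filter
	SubGroups []Group
}

// matchFilter evaluates one condition against a document's metadata.
// Assumption: a missing field never matches, whatever the operator.
func matchFilter(meta map[string]interface{}, f Filter) bool {
	got, ok := meta[f.Field]
	if !ok {
		return false
	}
	switch f.Operator {
	case "=", "==", "eq":
		return reflect.DeepEqual(got, f.Value)
	case "!=", "<>", "ne":
		return !reflect.DeepEqual(got, f.Value)
	case "contains":
		s, isStr := got.(string)
		sub, subIsStr := f.Value.(string)
		return isStr && subIsStr && strings.Contains(s, sub)
	}
	return false // operators not covered by this sketch
}

// matchGroup combines filter and sub-group results with the group's operator.
func matchGroup(meta map[string]interface{}, g Group) bool {
	var results []bool
	for _, f := range g.Filters {
		results = append(results, matchFilter(meta, f))
	}
	for _, sub := range g.SubGroups {
		results = append(results, matchGroup(meta, sub))
	}
	if g.Operator == "or" {
		for _, r := range results {
			if r {
				return true
			}
		}
		return false
	}
	// "and": every condition must hold; an empty group matches everything.
	for _, r := range results {
		if !r {
			return false
		}
	}
	return true
}

func main() {
	group := Group{
		Operator: "and",
		Filters:  []Filter{{Field: "category", Operator: "=", Value: "science"}},
		SubGroups: []Group{{
			Operator: "or",
			Filters: []Filter{
				{Field: "tags", Operator: "contains", Value: "physics"},
				{Field: "tags", Operator: "contains", Value: "chemistry"},
			},
		}},
	}
	doc := map[string]interface{}{"category": "science", "tags": "chemistry,lab"}
	fmt.Println(matchGroup(doc, group)) // true
}
```

Evaluating sub-groups recursively keeps nesting depth unbounded while the combining logic stays in one place.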

Configuration Options

OpenAI Embedding Models
  • text-embedding-3-small: Smaller, faster model (1536 dimensions by default)
  • text-embedding-3-large: Larger, more accurate model (3072 dimensions by default)
  • text-embedding-ada-002: Legacy model (1536 dimensions)
Gemini/Vertex AI Embedding Models
  • text-embedding-004: Text embedding model, the package default (768 dimensions)
  • text-embedding-005: Newer text embedding model (768 dimensions)
  • text-multilingual-embedding-002: Multilingual text embedding (768 dimensions)
Gemini Task Types

Use task types to optimize embeddings for specific use cases:

  • RETRIEVAL_QUERY: Optimized for search queries
  • RETRIEVAL_DOCUMENT: Optimized for documents to be searched
  • SEMANTIC_SIMILARITY: For comparing text similarity
  • CLASSIFICATION: For classification tasks
  • CLUSTERING: For clustering documents
  • QUESTION_ANSWERING: For Q&A applications
  • FACT_VERIFICATION: For fact-checking applications
Dimensions

Specify the dimensionality of the embedding vectors. Only some models support this option (e.g., text-embedding-3-*).

Encoding Format
  • float: Standard floating-point format
  • base64: Base64-encoded format for more compact storage
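
To make the base64 option concrete, the sketch below round-trips a vector through a base64 payload. It assumes the payload is a sequence of little-endian 32-bit floats, which is a common convention but is not confirmed by this package's documentation; encodeVector and decodeVector are hypothetical helpers, not package functions.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// encodeVector packs float32 values as little-endian bytes (assumed layout)
// and base64-encodes the result.
func encodeVector(v []float32) string {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return base64.StdEncoding.EncodeToString(buf)
}

// decodeVector reverses encodeVector.
func decodeVector(s string) ([]float32, error) {
	buf, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return nil, err
	}
	if len(buf)%4 != 0 {
		return nil, errors.New("payload length is not a multiple of 4 bytes")
	}
	out := make([]float32, len(buf)/4)
	for i := range out {
		out[i] = math.Float32frombits(binary.LittleEndian.Uint32(buf[4*i:]))
	}
	return out, nil
}

func main() {
	enc := encodeVector([]float32{0.5, -1.25, 3})
	dec, err := decodeVector(enc)
	fmt.Println(len(enc), dec, err)
}
```

Each value costs 4 bytes before encoding, so the payload is usually shorter than the same values written as decimal text.
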
Truncation
  • none: Error on token limit overflow
  • truncate: Truncate text to fit within token limit
Similarity Metrics
  • cosine: Cosine similarity (default)
  • euclidean: Euclidean distance (converted to similarity score)
  • dot_product: Dot product
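
The three metrics can be sketched as follows. This is an illustration rather than the package's CalculateSimilarity: the similarity and dot functions are hypothetical, and the 1/(1+d) conversion from Euclidean distance to a similarity score is an assumption, since the documentation does not state which conversion the package uses.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// dot returns the dot product of two equal-length vectors.
func dot(a, b []float32) float64 {
	var sum float64
	for i := range a {
		sum += float64(a[i]) * float64(b[i])
	}
	return sum
}

// similarity is a hypothetical stand-in for the package's CalculateSimilarity.
func similarity(a, b []float32, metric string) (float32, error) {
	if len(a) != len(b) || len(a) == 0 {
		return 0, errors.New("vectors must be non-empty and of equal length")
	}
	switch metric {
	case "cosine":
		norm := math.Sqrt(dot(a, a)) * math.Sqrt(dot(b, b))
		if norm == 0 {
			return 0, errors.New("cosine similarity is undefined for a zero vector")
		}
		return float32(dot(a, b) / norm), nil
	case "euclidean":
		var sq float64
		for i := range a {
			d := float64(a[i]) - float64(b[i])
			sq += d * d
		}
		// Assumed conversion: distance 0 maps to 1, larger distances approach 0.
		return float32(1 / (1 + math.Sqrt(sq))), nil
	case "dot_product":
		return float32(dot(a, b)), nil
	default:
		return 0, fmt.Errorf("unsupported metric %q", metric)
	}
}

func main() {
	a := []float32{1, 0}
	b := []float32{0, 1}
	for _, m := range []string{"cosine", "euclidean", "dot_product"} {
		s, err := similarity(a, b, m)
		fmt.Println(m, s, err)
	}
}
```

Cosine similarity ignores vector length, while the dot product does not; the two agree only when both vectors are normalized to unit length.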

Documentation

Constants

const (
	// ModelTextEmbedding004 is the latest text embedding model (768 dimensions)
	ModelTextEmbedding004 = "text-embedding-004"

	// ModelTextEmbedding005 is the newest text embedding model (768 dimensions)
	ModelTextEmbedding005 = "text-embedding-005"

	// ModelTextMultilingualEmbedding002 is for multilingual text (768 dimensions)
	ModelTextMultilingualEmbedding002 = "text-multilingual-embedding-002"

	// DefaultGeminiEmbeddingModel is the default embedding model
	DefaultGeminiEmbeddingModel = ModelTextEmbedding004
)

Gemini embedding model constants

const (
	// ModelTextEmbedding3Small is the smaller, faster OpenAI embedding model (1536 dimensions by default)
	ModelTextEmbedding3Small = "text-embedding-3-small"

	// ModelTextEmbedding3Large is the larger, more accurate OpenAI embedding model (3072 dimensions by default)
	ModelTextEmbedding3Large = "text-embedding-3-large"

	// ModelTextEmbeddingAda002 is the legacy OpenAI embedding model (1536 dimensions)
	ModelTextEmbeddingAda002 = "text-embedding-ada-002"

	// DefaultOpenAIEmbeddingModel is the default OpenAI embedding model
	DefaultOpenAIEmbeddingModel = ModelTextEmbedding3Small
)

OpenAI embedding model constants

Variables

var DefaultTimeFormats = []string{
	time.RFC3339,
	"2006-01-02T15:04:05",
	"2006-01-02 15:04:05",
	"2006-01-02",
}

DefaultTimeFormats provides a list of common time formats for parsing
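
A list of layouts like this is typically consumed by trying each one until a parse succeeds. The sketch below shows that pattern with the standard time package; parseAny is a hypothetical helper, and the idea that the package uses the list this way (for example, when comparing date values in filters) is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// layouts copies the values listed in DefaultTimeFormats.
var layouts = []string{
	time.RFC3339,
	"2006-01-02T15:04:05",
	"2006-01-02 15:04:05",
	"2006-01-02",
}

// parseAny is a hypothetical helper: it tries each layout in order and
// returns the first successful parse.
func parseAny(s string) (time.Time, bool) {
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t, true
		}
	}
	return time.Time{}, false
}

func main() {
	published, _ := parseAny("2023-06-15")
	cutoff, _ := parseAny("2023-01-01T00:00:00Z")
	fmt.Println(published.After(cutoff)) // true
}
```

Parsing both sides into time.Time before comparing avoids relying on string ordering, which breaks as soon as two values use different layouts.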

Functions

func ApplyFilters

func ApplyFilters(docs []interfaces.Document, filterGroup MetadataFilterGroup) []interfaces.Document

ApplyFilters filters a list of documents based on metadata filters

func CalculateSimilarity added in v0.2.18

func CalculateSimilarity(vec1, vec2 []float32, metric string) (float32, error)

CalculateSimilarity is a standalone function to calculate similarity between two embeddings

func CreateWeaviateAndFilter

func CreateWeaviateAndFilter(conditions ...map[string]interface{}) map[string]interface{}

CreateWeaviateAndFilter creates a Weaviate filter with AND logic for multiple conditions. This is a convenience function for common filter cases.

func CreateWeaviateFilter

func CreateWeaviateFilter(field string, operator string, value interface{}) map[string]interface{}

CreateWeaviateFilter creates a simple Weaviate filter for a single field. This is a convenience function for simple filter cases. For complex filters, use MetadataFilterGroup and FilterToWeaviateFormat.

func CreateWeaviateOrFilter

func CreateWeaviateOrFilter(conditions ...map[string]interface{}) map[string]interface{}

CreateWeaviateOrFilter creates a Weaviate filter with OR logic for multiple conditions. This is a convenience function for common filter cases.

func FilterToMap

func FilterToMap(group MetadataFilterGroup) map[string]interface{}

FilterToMap converts a MetadataFilterGroup to a map for use with vector store filters.

Deprecated: This function produces a format that may not be compatible with all vector stores. For Weaviate, use FilterToWeaviateFormat instead.

func FilterToWeaviateFormat

func FilterToWeaviateFormat(group MetadataFilterGroup) map[string]interface{}

FilterToWeaviateFormat converts a MetadataFilterGroup to a Weaviate-compatible filter format. This is the recommended function to use for Weaviate vector store filters.

Types

type Client

type Client interface {
	// Embed generates an embedding for the given text
	Embed(ctx context.Context, text string) ([]float32, error)

	// EmbedWithConfig generates an embedding with custom configuration
	EmbedWithConfig(ctx context.Context, text string, config EmbeddingConfig) ([]float32, error)

	// EmbedBatch generates embeddings for multiple texts
	EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)

	// EmbedBatchWithConfig generates embeddings for multiple texts with custom configuration
	EmbedBatchWithConfig(ctx context.Context, texts []string, config EmbeddingConfig) ([][]float32, error)

	// CalculateSimilarity calculates the similarity between two embeddings
	CalculateSimilarity(vec1, vec2 []float32, metric string) (float32, error)
}

Client defines the interface for an embedding client
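
Because Client is an interface, code that depends on it can be exercised with a test double instead of a live provider. The sketch below assumes a local textEmbedder interface that copies two of the methods above; fakeEmbedder, indexAll, and runDemo are hypothetical names used only for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// textEmbedder is a local subset of the Client interface shown above.
type textEmbedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
	EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
}

// fakeEmbedder is a hypothetical test double: it returns a deterministic
// 2-dimensional vector derived from the text length and makes no network calls.
type fakeEmbedder struct{}

func (fakeEmbedder) Embed(ctx context.Context, text string) ([]float32, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return []float32{float32(len(text)), 1}, nil
}

func (f fakeEmbedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error) {
	out := make([][]float32, 0, len(texts))
	for _, t := range texts {
		v, err := f.Embed(ctx, t)
		if err != nil {
			return nil, err
		}
		out = append(out, v)
	}
	return out, nil
}

// indexAll depends only on the interface, so it accepts real or fake embedders.
func indexAll(ctx context.Context, e textEmbedder, texts []string) (int, error) {
	vecs, err := e.EmbedBatch(ctx, texts)
	if err != nil {
		return 0, err
	}
	return len(vecs), nil
}

// runDemo wires the fake into indexAll.
func runDemo() (int, error) {
	return indexAll(context.Background(), fakeEmbedder{}, []string{"a", "bb"})
}

func main() {
	n, err := runDemo()
	fmt.Println(n, err) // 2 <nil>
}
```

Declaring the narrow interface at the point of use keeps callers decoupled from methods they never call.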

type EmbeddingConfig

type EmbeddingConfig struct {
	// Model is the embedding model to use
	Model string

	// Dimensions specifies the dimensionality of the embedding vectors
	// Only supported by some models (e.g., text-embedding-3-*)
	Dimensions int

	// EncodingFormat specifies the format of the embedding vectors
	// Options: "float", "base64"
	EncodingFormat string

	// Truncation controls how the input text is handled if it exceeds the model's token limit
	// Options: "none" (error on overflow), "truncate" (truncate to limit)
	Truncation string

	// SimilarityMetric specifies the similarity metric to use when comparing embeddings
	// Options: "cosine" (default), "euclidean", "dot_product"
	SimilarityMetric string

	// SimilarityThreshold specifies the minimum similarity score for search results
	SimilarityThreshold float32

	// UserID is an optional identifier for tracking embedding usage
	UserID string
}

EmbeddingConfig contains configuration options for embedding generation
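
One plausible use of SimilarityThreshold is to drop low-scoring candidates before ranking. The sketch below is an assumption about usage, not the package's behavior: the scored type and the cosine and rank functions are hypothetical, and the documentation does not say where the package applies the threshold.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// scored pairs a document ID with its similarity to the query.
type scored struct {
	ID    string
	Score float32
}

// cosine returns the cosine similarity, or 0 for mismatched or zero vectors.
func cosine(a, b []float32) float32 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

// rank keeps documents scoring at or above threshold, best first.
// Ties are broken by ID so the order is deterministic.
func rank(query []float32, docs map[string][]float32, threshold float32) []scored {
	var out []scored
	for id, vec := range docs {
		if s := cosine(query, vec); s >= threshold {
			out = append(out, scored{ID: id, Score: s})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID
	})
	return out
}

func main() {
	docs := map[string][]float32{"a": {1, 0}, "b": {0, 1}, "c": {1, 1}}
	fmt.Println(rank([]float32{1, 0}, docs, 0.5))
}
```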

func DefaultEmbeddingConfig

func DefaultEmbeddingConfig(model string) EmbeddingConfig

DefaultEmbeddingConfig returns a default configuration for embedding generation

func DefaultGeminiEmbeddingConfig added in v0.2.18

func DefaultGeminiEmbeddingConfig(model string) EmbeddingConfig

DefaultGeminiEmbeddingConfig returns a default configuration for Gemini embedding generation

type GeminiEmbedder added in v0.2.18

type GeminiEmbedder struct {
	// contains filtered or unexported fields
}

GeminiEmbedder implements embedding generation using Google Gemini/Vertex AI API

func NewGeminiEmbedder added in v0.2.18

func NewGeminiEmbedder(ctx context.Context, options ...GeminiEmbedderOption) (*GeminiEmbedder, error)

NewGeminiEmbedder creates a new Gemini embedder with the provided options

func (*GeminiEmbedder) CalculateSimilarity added in v0.2.18

func (e *GeminiEmbedder) CalculateSimilarity(vec1, vec2 []float32, metric string) (float32, error)

CalculateSimilarity calculates the similarity between two embeddings

func (*GeminiEmbedder) Embed added in v0.2.18

func (e *GeminiEmbedder) Embed(ctx context.Context, text string) ([]float32, error)

Embed generates an embedding using Gemini API with default configuration

func (*GeminiEmbedder) EmbedBatch added in v0.2.18

func (e *GeminiEmbedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)

EmbedBatch generates embeddings for multiple texts using default configuration

func (*GeminiEmbedder) EmbedBatchWithConfig added in v0.2.18

func (e *GeminiEmbedder) EmbedBatchWithConfig(ctx context.Context, texts []string, config EmbeddingConfig) ([][]float32, error)

EmbedBatchWithConfig generates embeddings for multiple texts with custom configuration

func (*GeminiEmbedder) EmbedWithConfig added in v0.2.18

func (e *GeminiEmbedder) EmbedWithConfig(ctx context.Context, text string, config EmbeddingConfig) ([]float32, error)

EmbedWithConfig generates an embedding using Gemini API with custom configuration

func (*GeminiEmbedder) GetConfig added in v0.2.18

func (e *GeminiEmbedder) GetConfig() EmbeddingConfig

GetConfig returns the current configuration

func (*GeminiEmbedder) GetModel added in v0.2.18

func (e *GeminiEmbedder) GetModel() string

GetModel returns the model name being used

type GeminiEmbedderOption added in v0.2.18

type GeminiEmbedderOption func(*GeminiEmbedder)

GeminiEmbedderOption represents an option for configuring the Gemini embedder
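
The option type follows the functional options pattern: each option is a function that mutates the embedder, and the constructor applies them over defaults. A minimal sketch of the pattern, using hypothetical local names (embedder, option, withModel, withTaskType, newEmbedder) and hypothetical fields, since the real struct's fields are unexported:

```go
package main

import "fmt"

// embedder is a local stand-in for GeminiEmbedder with hypothetical fields.
type embedder struct {
	model    string
	taskType string
}

// option mirrors the shape of GeminiEmbedderOption.
type option func(*embedder)

// withModel overrides the default model.
func withModel(m string) option { return func(e *embedder) { e.model = m } }

// withTaskType sets the task type.
func withTaskType(t string) option { return func(e *embedder) { e.taskType = t } }

// newEmbedder applies options in order on top of defaults,
// so later options win over earlier ones.
func newEmbedder(opts ...option) *embedder {
	e := &embedder{model: "text-embedding-004"} // value of DefaultGeminiEmbeddingModel
	for _, opt := range opts {
		opt(e)
	}
	return e
}

func main() {
	e := newEmbedder(withTaskType("RETRIEVAL_QUERY"))
	fmt.Println(e.model, e.taskType) // text-embedding-004 RETRIEVAL_QUERY
}
```

The pattern lets a constructor gain new settings without changing its signature, which fits the variadic options parameter of NewGeminiEmbedder.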

func WithGeminiAPIKey added in v0.2.18

func WithGeminiAPIKey(apiKey string) GeminiEmbedderOption

WithGeminiAPIKey sets the API key for Gemini API backend

func WithGeminiBackend added in v0.2.18

func WithGeminiBackend(backend genai.Backend) GeminiEmbedderOption

WithGeminiBackend sets the backend for the Gemini embedder

func WithGeminiClient added in v0.2.18

func WithGeminiClient(existing *genai.Client) GeminiEmbedderOption

WithGeminiClient injects an already initialized genai.Client

func WithGeminiConfig added in v0.2.18

func WithGeminiConfig(config EmbeddingConfig) GeminiEmbedderOption

WithGeminiConfig sets the embedding configuration for the Gemini embedder

func WithGeminiCredentialsFile added in v0.2.18

func WithGeminiCredentialsFile(credentialsFile string) GeminiEmbedderOption

WithGeminiCredentialsFile sets the path to a service account key file for Vertex AI authentication

func WithGeminiCredentialsJSON added in v0.2.18

func WithGeminiCredentialsJSON(credentialsJSON []byte) GeminiEmbedderOption

WithGeminiCredentialsJSON sets the service account key JSON bytes for Vertex AI authentication

func WithGeminiLocation added in v0.2.18

func WithGeminiLocation(location string) GeminiEmbedderOption

WithGeminiLocation sets the GCP location for Vertex AI backend

func WithGeminiLogger added in v0.2.18

func WithGeminiLogger(logger logging.Logger) GeminiEmbedderOption

WithGeminiLogger sets the logger for the Gemini embedder

func WithGeminiModel added in v0.2.18

func WithGeminiModel(model string) GeminiEmbedderOption

WithGeminiModel sets the embedding model for the Gemini embedder

func WithGeminiProjectID added in v0.2.18

func WithGeminiProjectID(projectID string) GeminiEmbedderOption

WithGeminiProjectID sets the GCP project ID for Vertex AI backend

func WithGeminiTaskType added in v0.2.18

func WithGeminiTaskType(taskType string) GeminiEmbedderOption

WithGeminiTaskType sets the task type for better embedding optimization. Valid values: "RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY", "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING", "FACT_VERIFICATION".

type MetadataFilter

type MetadataFilter struct {
	// Field is the metadata field to filter on
	Field string

	// Operator is the comparison operator
	// Supported operators: "=", "!=", ">", ">=", "<", "<=", "contains", "in", "not_in"
	Operator string

	// Value is the value to compare against
	Value interface{}
}

MetadataFilter represents a filter condition for document metadata. It defines a single condition to be applied on a specific field. Example: field="word_count", operator=">", value=10

func NewMetadataFilter

func NewMetadataFilter(field, operator string, value interface{}) MetadataFilter

NewMetadataFilter creates a new metadata filter

type MetadataFilterGroup

type MetadataFilterGroup struct {
	// Filters is the list of filters in this group
	Filters []MetadataFilter

	// SubGroups is the list of sub-groups in this group
	SubGroups []MetadataFilterGroup

	// Operator is the logical operator to apply between filters
	// Supported operators: "and", "or"
	Operator string
}

MetadataFilterGroup represents a group of filters with a logical operator. It allows for complex nested conditions with AND/OR logic. Example: (word_count > 10 AND type = "article") OR (category IN ["news", "blog"])

func NewMetadataFilterGroup

func NewMetadataFilterGroup(operator string, filters ...MetadataFilter) MetadataFilterGroup

NewMetadataFilterGroup creates a new metadata filter group

func (*MetadataFilterGroup) AddFilter

func (g *MetadataFilterGroup) AddFilter(filter MetadataFilter)

AddFilter adds a filter to the group

func (*MetadataFilterGroup) AddSubGroup

func (g *MetadataFilterGroup) AddSubGroup(subGroup MetadataFilterGroup)

AddSubGroup adds a sub-group to the group

type OpenAIEmbedder

type OpenAIEmbedder struct {
	// contains filtered or unexported fields
}

OpenAIEmbedder implements embedding generation using OpenAI API

func NewOpenAIEmbedder

func NewOpenAIEmbedder(apiKey, model string) *OpenAIEmbedder

NewOpenAIEmbedder creates a new OpenAIEmbedder instance with default configuration

func NewOpenAIEmbedderWithConfig

func NewOpenAIEmbedderWithConfig(apiKey string, config EmbeddingConfig) *OpenAIEmbedder

NewOpenAIEmbedderWithConfig creates a new OpenAIEmbedder with custom configuration

func (*OpenAIEmbedder) CalculateSimilarity

func (e *OpenAIEmbedder) CalculateSimilarity(vec1, vec2 []float32, metric string) (float32, error)

CalculateSimilarity calculates the similarity between two embeddings

func (*OpenAIEmbedder) Embed

func (e *OpenAIEmbedder) Embed(ctx context.Context, text string) ([]float32, error)

Embed generates an embedding using OpenAI API with default configuration

func (*OpenAIEmbedder) EmbedBatch

func (e *OpenAIEmbedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)

EmbedBatch generates embeddings for multiple texts using default configuration

func (*OpenAIEmbedder) EmbedBatchWithConfig

func (e *OpenAIEmbedder) EmbedBatchWithConfig(ctx context.Context, texts []string, config EmbeddingConfig) ([][]float32, error)

EmbedBatchWithConfig generates embeddings for multiple texts with custom configuration

func (*OpenAIEmbedder) EmbedWithConfig

func (e *OpenAIEmbedder) EmbedWithConfig(ctx context.Context, text string, config EmbeddingConfig) ([]float32, error)

EmbedWithConfig generates an embedding using OpenAI API with custom configuration

func (*OpenAIEmbedder) GetConfig

func (e *OpenAIEmbedder) GetConfig() EmbeddingConfig

GetConfig returns the current configuration

func (*OpenAIEmbedder) GetModel added in v0.2.18

func (e *OpenAIEmbedder) GetModel() string

GetModel returns the model name being used
