embedding

package embedding
Version: v0.1.3
Warning

This package is not in the latest version of its module.

Published: Mar 13, 2026 | License: MIT | Imports: 11 | Imported by: 0

Documentation

Overview

Package embedding provides interfaces and implementations for text embedding generation. It supports both local and remote embedding models with batch processing and caching capabilities.

Its downloader functionality (Downloader) downloads embedding models from remote sources. It supports progress tracking, caching, and multi-file downloads.

Its pre-configured providers (BGEProvider, SentenceBERTProvider, NewProvider) cover popular embedding models. They include automatic model type detection and factory functions for easy model loading.

Its local provider implementations (LocalProvider, Tokenizer) support various model formats and include a tokenizer for text preprocessing.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type BGEProvider added in v0.1.2

type BGEProvider struct {
	// contains filtered or unexported fields
}

BGEProvider is an embedding provider for BGE models.

func NewBGEProvider added in v0.1.2

func NewBGEProvider(modelPath string) (*BGEProvider, error)

NewBGEProvider creates a new BGE provider for the model at modelPath.

func (*BGEProvider) Dimension added in v0.1.2

func (p *BGEProvider) Dimension() int

Dimension returns the dimension of the embeddings produced by the provider.

func (*BGEProvider) Embed added in v0.1.2

func (p *BGEProvider) Embed(ctx context.Context, texts []string) ([][]float32, error)

Embed generates embeddings for the given texts, returning one embedding per text.

type BatchOptions

type BatchOptions struct {
	MaxBatchSize  int // Maximum batch size for embedding generation
	MaxConcurrent int // Maximum number of concurrent batches
}

BatchOptions contains configuration options for batch processing.

type BatchProcessor

type BatchProcessor struct {
	// contains filtered or unexported fields
}

BatchProcessor handles batch processing of embeddings with optimization features including caching, concurrent processing, and progress tracking.

func NewBatchProcessor

func NewBatchProcessor(provider Provider, options BatchOptions) *BatchProcessor

NewBatchProcessor creates a new batch processor with the given provider and options.

Parameters:
  - provider: The embedding provider to use for generating embeddings
  - options: Configuration options for batch processing

Returns:
  - *BatchProcessor: A new batch processor instance

func (*BatchProcessor) Process

func (bp *BatchProcessor) Process(ctx context.Context, texts []string) ([][]float32, error)

Process generates embeddings for texts, splitting them into batches according to the processor's BatchOptions.

func (*BatchProcessor) ProcessWithProgress

func (bp *BatchProcessor) ProcessWithProgress(
	ctx context.Context,
	texts []string,
	callback ProgressCallback,
) ([][]float32, error)

ProcessWithProgress works like Process but reports progress through callback. Returning false from callback cancels the operation (see ProgressCallback).

type Config added in v0.1.2

type Config struct {
	Model        EmbeddingModel // Local embedding model implementation
	Dimension    int            // Embedding dimension
	MaxBatchSize int            // Maximum batch size for embedding generation
}

Config contains configuration parameters for the local embedding provider.

type DownloadModelInfo added in v0.1.2

type DownloadModelInfo struct {
	Name        string   // Model name
	Type        string   // Model type (e.g., "bge", "sentence-bert")
	URLs        []string // URLs of files to download
	Size        string   // Approximate total size
	Description string   // Model description
}

DownloadModelInfo contains information about a model available for download.

type DownloadProgressCallback added in v0.1.2

type DownloadProgressCallback func(modelName, fileName string, downloaded, total int64)

DownloadProgressCallback is a callback function for tracking download progress.

Parameters:
  - modelName: Name of the model being downloaded
  - fileName: Name of the file being downloaded
  - downloaded: Number of bytes downloaded so far
  - total: Total number of bytes to download (0 if unknown)
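Because total may be 0 when the size is unknown, a callback must avoid dividing by it. A minimal sketch, assuming you simply want human-readable progress lines; formatProgress and newPrintingCallback are hypothetical helpers, and DownloadProgressCallback is redeclared locally so the block compiles by itself.

```go
package main

import "fmt"

// DownloadProgressCallback is redeclared here with the documented signature.
type DownloadProgressCallback func(modelName, fileName string, downloaded, total int64)

// formatProgress (hypothetical) renders one progress update. When total is 0
// (size unknown) it reports raw bytes instead of a percentage.
func formatProgress(modelName, fileName string, downloaded, total int64) string {
	if total <= 0 {
		return fmt.Sprintf("%s/%s: %d bytes", modelName, fileName, downloaded)
	}
	pct := float64(downloaded) * 100 / float64(total)
	return fmt.Sprintf("%s/%s: %.1f%% (%d/%d bytes)", modelName, fileName, pct, downloaded, total)
}

// newPrintingCallback (hypothetical) adapts any line printer into a
// DownloadProgressCallback.
func newPrintingCallback(print func(string)) DownloadProgressCallback {
	return func(modelName, fileName string, downloaded, total int64) {
		print(formatProgress(modelName, fileName, downloaded, total))
	}
}
```

A callback built this way could be passed to DownloadModel, for example `newPrintingCallback(func(s string) { fmt.Println(s) })`.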

type Downloader added in v0.1.2

type Downloader struct {
	// contains filtered or unexported fields
}

Downloader handles the downloading and caching of embedding models.

func NewDownloader added in v0.1.2

func NewDownloader(cacheDir string) *Downloader

NewDownloader creates a new downloader with the specified cache directory.

Parameters:
  - cacheDir: Directory in which to cache downloaded models (empty for the default location)

Returns:
  - *Downloader: A new downloader instance

func (*Downloader) DownloadModel added in v0.1.2

func (d *Downloader) DownloadModel(modelName string, callback DownloadProgressCallback) (string, error)

DownloadModel downloads the model named modelName, reporting progress through callback.

func (*Downloader) GetModelInfo added in v0.1.2

func (d *Downloader) GetModelInfo() []DownloadModelInfo

GetModelInfo returns information about the models available for download.

type EmbeddingModel added in v0.1.2

type EmbeddingModel interface {
	// Run performs inference on the given inputs and returns the model outputs.
	Run(inputs map[string]interface{}) (map[string]interface{}, error)

	// Close releases any resources associated with the model.
	Close() error
}

EmbeddingModel defines the interface for local embedding models. Implementations should handle model loading, inference, and resource cleanup.
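A toy implementation shows the shape of this interface. The map keys "input_ids" and "embeddings" and the mean-of-token-IDs "embedding" are assumptions chosen purely for illustration; a real model would run a neural network, and its input and output names depend on the model file. meanIDModel is hypothetical, and EmbeddingModel is redeclared locally.

```go
package main

import (
	"errors"
	"fmt"
)

// EmbeddingModel is redeclared here with the documented methods.
type EmbeddingModel interface {
	Run(inputs map[string]interface{}) (map[string]interface{}, error)
	Close() error
}

// meanIDModel (hypothetical) fills every embedding component with the mean token
// ID of its sequence. The "input_ids"/"embeddings" keys are assumptions.
type meanIDModel struct {
	dim    int
	closed bool
}

func (m *meanIDModel) Run(inputs map[string]interface{}) (map[string]interface{}, error) {
	if m.closed {
		return nil, errors.New("model is closed")
	}
	ids, ok := inputs["input_ids"].([][]int64)
	if !ok {
		return nil, fmt.Errorf("input_ids: want [][]int64, got %T", inputs["input_ids"])
	}
	out := make([][]float32, len(ids))
	for i, seq := range ids {
		var sum float32
		for _, id := range seq {
			sum += float32(id)
		}
		mean := float32(0)
		if len(seq) > 0 {
			mean = sum / float32(len(seq))
		}
		vec := make([]float32, m.dim)
		for j := range vec {
			vec[j] = mean
		}
		out[i] = vec
	}
	return map[string]interface{}{"embeddings": out}, nil
}

// Close marks the model as closed; later Run calls fail.
func (m *meanIDModel) Close() error {
	m.closed = true
	return nil
}

// Compile-time check that meanIDModel satisfies EmbeddingModel.
var _ EmbeddingModel = (*meanIDModel)(nil)
```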

type LocalProvider added in v0.1.2

type LocalProvider struct {
	// contains filtered or unexported fields
}

LocalProvider implements the embedding.Provider interface for local models. It handles tokenization, batch processing, and model inference.

func New added in v0.1.2

func New(config Config) (*LocalProvider, error)

New creates a new local embedding provider with the given configuration.

Parameters:
  - config: Configuration parameters for the provider

Returns:
  - *LocalProvider: A new local embedding provider instance
  - error: Error if the configuration is invalid or initialization fails
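The exact validation New performs is not documented. The following sketch shows plausible checks under stated assumptions: a nil Model or non-positive Dimension is rejected, and a zero MaxBatchSize falls back to a hypothetical default of 32. validateConfig, nopModel, and defaultMaxBatchSize are illustrative names, and Config and EmbeddingModel are redeclared so the block compiles by itself.

```go
package main

import (
	"errors"
	"fmt"
)

// EmbeddingModel and Config are redeclared here with the documented fields.
type EmbeddingModel interface {
	Run(inputs map[string]interface{}) (map[string]interface{}, error)
	Close() error
}

type Config struct {
	Model        EmbeddingModel
	Dimension    int
	MaxBatchSize int
}

// defaultMaxBatchSize is a hypothetical fallback, not a documented value.
const defaultMaxBatchSize = 32

// validateConfig (hypothetical) rejects unusable configurations and fills in
// defaults, returning the normalized copy.
func validateConfig(cfg Config) (Config, error) {
	if cfg.Model == nil {
		return cfg, errors.New("config: Model must not be nil")
	}
	if cfg.Dimension <= 0 {
		return cfg, fmt.Errorf("config: Dimension must be positive, got %d", cfg.Dimension)
	}
	if cfg.MaxBatchSize < 0 {
		return cfg, fmt.Errorf("config: MaxBatchSize must not be negative, got %d", cfg.MaxBatchSize)
	}
	if cfg.MaxBatchSize == 0 {
		cfg.MaxBatchSize = defaultMaxBatchSize
	}
	return cfg, nil
}

// nopModel is a do-nothing EmbeddingModel used only to build a valid Config.
type nopModel struct{}

func (nopModel) Run(map[string]interface{}) (map[string]interface{}, error) {
	return map[string]interface{}{}, nil
}

func (nopModel) Close() error { return nil }
```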

func (*LocalProvider) Dimension added in v0.1.2

func (p *LocalProvider) Dimension() int

Dimension returns the dimension of the embeddings produced by the provider.

func (*LocalProvider) Embed added in v0.1.2

func (p *LocalProvider) Embed(ctx context.Context, texts []string) ([][]float32, error)

Embed generates embeddings for the given texts, returning one embedding per text.

type Model added in v0.1.2

type Model struct {
	// contains filtered or unexported fields
}

Model is a generic EmbeddingModel implementation for embedding generation.

func NewModel added in v0.1.2

func NewModel(dimension int, modelPath string) (*Model, error)

NewModel creates a new Model with the given embedding dimension for the model file at modelPath.

func (*Model) Close added in v0.1.2

func (m *Model) Close() error

Close closes the model and releases any associated resources.

func (*Model) Run added in v0.1.2

func (m *Model) Run(inputs map[string]interface{}) (map[string]interface{}, error)

Run performs inference on the given inputs and returns the model outputs.

type ModelInfo added in v0.1.2

type ModelInfo struct {
	Type      ModelType
	Name      string
	Dimension int
	ModelPath string
}

ModelInfo contains information about a model including its type, name, and dimensions.

func NewModelInfo added in v0.1.2

func NewModelInfo(modelPath string) (*ModelInfo, error)

NewModelInfo creates a new ModelInfo instance by analyzing the model file path.

type ModelType added in v0.1.2

type ModelType string

ModelType defines the type of embedding model.

const (
	// ModelTypeBERT represents BERT models
	ModelTypeBERT ModelType = "bert"
	// ModelTypeSentenceBERT represents Sentence-BERT models
	ModelTypeSentenceBERT ModelType = "sentence-bert"
	// ModelTypeBGE represents BGE models
	ModelTypeBGE ModelType = "bge"
	// ModelTypeGPT represents GPT models
	ModelTypeGPT ModelType = "gpt"
	// ModelTypeFastText represents FastText models
	ModelTypeFastText ModelType = "fasttext"
	// ModelTypeGloVe represents GloVe models
	ModelTypeGloVe ModelType = "glove"
)
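NewModelInfo detects the model type by analyzing the model file path, but its rules are not documented here. A minimal sketch of one possible heuristic, assuming the type can be inferred from substrings of the file name; detectModelType is a hypothetical helper, not the package's implementation. More specific patterns are checked first so that, for example, a Sentence-BERT file is not classified as plain BERT.

```go
package main

import (
	"path/filepath"
	"strings"
)

// ModelType and its constants are redeclared here with the documented values.
type ModelType string

const (
	ModelTypeBERT         ModelType = "bert"
	ModelTypeSentenceBERT ModelType = "sentence-bert"
	ModelTypeBGE          ModelType = "bge"
	ModelTypeGPT          ModelType = "gpt"
	ModelTypeFastText     ModelType = "fasttext"
	ModelTypeGloVe        ModelType = "glove"
)

// detectModelType (hypothetical) matches lowercase substrings of the file name.
// Rules are ordered from most to least specific; the second result is false
// when no rule matches.
func detectModelType(modelPath string) (ModelType, bool) {
	name := strings.ToLower(filepath.Base(modelPath))
	rules := []struct {
		substr string
		typ    ModelType
	}{
		{"sentence-bert", ModelTypeSentenceBERT},
		{"sbert", ModelTypeSentenceBERT},
		{"bge", ModelTypeBGE},
		{"bert", ModelTypeBERT},
		{"gpt", ModelTypeGPT},
		{"fasttext", ModelTypeFastText},
		{"glove", ModelTypeGloVe},
	}
	for _, r := range rules {
		if strings.Contains(name, r.substr) {
			return r.typ, true
		}
	}
	return "", false
}
```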

type ProgressCallback

type ProgressCallback func(current, total int, err error) bool

ProgressCallback is a callback function for tracking batch processing progress.

Parameters:
  - current: Number of texts processed so far
  - total: Total number of texts to process
  - err: Error, if any, that occurred during processing

Returns:
  - bool: true to continue processing, false to cancel the operation

type Provider

type Provider interface {
	// Embed generates embeddings for the given texts.
	//
	// Parameters:
	// - ctx: Context for cancellation and timeout
	// - texts: Slice of texts to embed
	//
	// Returns:
	// - [][]float32: Slice of embeddings, one for each text
	// - error: Error if embedding generation fails
	Embed(ctx context.Context, texts []string) ([][]float32, error)

	// Dimension returns the dimension of the embeddings generated by this provider.
	//
	// Returns:
	// - int: Embedding dimension
	Dimension() int
}

Provider defines the interface for embedding providers.

This interface is implemented by all embedding model providers and allows the application to generate embeddings for text.

Example implementation:

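type LocalProvider struct {
	model     EmbeddingModel // local model used for inference
	dimension int            // length of each embedding vector
}

func (p *LocalProvider) Embed(ctx context.Context, texts []string) ([][]float32, error) {
	// Stop early if the caller has canceled or the deadline has passed.
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	embeddings := make([][]float32, len(texts))
	// Generate embeddings[i] (length p.dimension) for texts[i] using p.model.
	return embeddings, nil
}

func (p *LocalProvider) Dimension() int {
	return p.dimension
}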

func NewProvider added in v0.1.2

func NewProvider(modelPath string) (Provider, error)

NewProvider creates a provider for the model at modelPath, choosing the implementation based on the model type detected from the path.

type SentenceBERTProvider added in v0.1.2

type SentenceBERTProvider struct {
	// contains filtered or unexported fields
}

SentenceBERTProvider is an embedding provider for Sentence-BERT models.

func NewSentenceBERTProvider added in v0.1.2

func NewSentenceBERTProvider(modelPath string) (*SentenceBERTProvider, error)

NewSentenceBERTProvider creates a new Sentence-BERT provider for the model at modelPath.

func (*SentenceBERTProvider) Dimension added in v0.1.2

func (p *SentenceBERTProvider) Dimension() int

Dimension returns the dimension of the embeddings produced by the provider.

func (*SentenceBERTProvider) Embed added in v0.1.2

func (p *SentenceBERTProvider) Embed(ctx context.Context, texts []string) ([][]float32, error)

Embed generates embeddings for the given texts, returning one embedding per text.

type Tokenizer added in v0.1.2

type Tokenizer struct {
	// contains filtered or unexported fields
}

Tokenizer handles text tokenization for embedding models. It converts raw text into token IDs that can be processed by embedding models.

func NewTokenizer added in v0.1.2

func NewTokenizer() (*Tokenizer, error)

NewTokenizer creates a new tokenizer instance.

Note: This is a simple whitespace tokenizer for demonstration purposes. In a production implementation, you would use a proper tokenizer like BPE or WordPiece.

Returns:
  - *Tokenizer: A new tokenizer instance
  - error: Error if initialization fails

func (*Tokenizer) TokenizeBatch added in v0.1.2

func (t *Tokenizer) TokenizeBatch(texts []string) ([][]int64, [][]int64, error)

TokenizeBatch tokenizes a batch of texts.
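TokenizeBatch returns two [][]int64 values whose meaning is not documented here. Assuming they are token IDs and attention masks (a common convention for BERT-style models), a whitespace tokenizer in the spirit of the note on NewTokenizer could look like this; whitespaceTokenizer, its vocabulary, padID, and unkID are hypothetical.

```go
package main

import "strings"

// Hypothetical special-token IDs; real values depend on the model's vocabulary.
const (
	padID int64 = 0
	unkID int64 = 1
)

// whitespaceTokenizer (hypothetical) maps lowercase words to IDs via a fixed vocabulary.
type whitespaceTokenizer struct {
	vocab map[string]int64
}

// TokenizeBatch splits each text on whitespace, maps words to IDs (unknown words
// become unkID), and pads every sequence to the longest one. It assumes the two
// results are token IDs and attention masks (1 = real token, 0 = padding).
func (t *whitespaceTokenizer) TokenizeBatch(texts []string) ([][]int64, [][]int64, error) {
	words := make([][]string, len(texts))
	maxLen := 0
	for i, text := range texts {
		words[i] = strings.Fields(strings.ToLower(text))
		if len(words[i]) > maxLen {
			maxLen = len(words[i])
		}
	}
	ids := make([][]int64, len(texts))
	masks := make([][]int64, len(texts))
	for i, ws := range words {
		ids[i] = make([]int64, maxLen)
		masks[i] = make([]int64, maxLen)
		for j := 0; j < maxLen; j++ {
			if j >= len(ws) {
				ids[i][j] = padID // padding; mask stays 0
				continue
			}
			id, ok := t.vocab[ws[j]]
			if !ok {
				id = unkID
			}
			ids[i][j] = id
			masks[i][j] = 1
		}
	}
	return ids, masks, nil
}
```

Padding every sequence to the same length lets the batch be passed to a model as a rectangular tensor, while the mask tells the model which positions to ignore.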
