Documentation
Overview ¶
Package downloader provides functionality for downloading embedding models from remote sources. It supports progress tracking, caching, and multi-file downloads.
Package embedding provides interfaces and implementations for text embedding generation. It supports both local and remote embedding models with batch processing and caching capabilities.
Package models provides pre-configured providers for popular embedding models. It includes automatic model type detection and factory functions for easy model loading.
Package local provides implementations for local embedding model providers. It supports various model formats and includes a tokenizer for text preprocessing.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type BGEProvider ¶ added in v0.1.2
type BGEProvider struct {
// contains filtered or unexported fields
}
BGEProvider is an embedding provider for BGE models.
func NewBGEProvider ¶ added in v0.1.2
func NewBGEProvider(modelPath string) (*BGEProvider, error)
NewBGEProvider creates a new BGE provider
func (*BGEProvider) Dimension ¶ added in v0.1.2
func (p *BGEProvider) Dimension() int
Dimension returns the embedding dimension
type BatchOptions ¶
type BatchOptions struct {
MaxBatchSize int // Maximum batch size for embedding generation
MaxConcurrent int // Maximum number of concurrent batches
}
BatchOptions contains configuration options for batch processing.
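To illustrate what MaxBatchSize controls, here is a minimal sketch of splitting a slice of texts into batches. The helper splitBatches is hypothetical and not part of this package, and treating a non-positive MaxBatchSize as "one batch" is an assumption, not documented behavior.

```go
package main

import "fmt"

// BatchOptions mirrors the struct documented above.
type BatchOptions struct {
	MaxBatchSize  int // Maximum batch size for embedding generation
	MaxConcurrent int // Maximum number of concurrent batches
}

// splitBatches is a hypothetical helper (not part of the package). It cuts
// texts into consecutive slices of at most opts.MaxBatchSize items.
// Assumption: a non-positive size means "everything in one batch".
func splitBatches(texts []string, opts BatchOptions) [][]string {
	if len(texts) == 0 {
		return nil
	}
	size := opts.MaxBatchSize
	if size <= 0 {
		size = len(texts)
	}
	var batches [][]string
	for start := 0; start < len(texts); start += size {
		end := start + size
		if end > len(texts) {
			end = len(texts)
		}
		batches = append(batches, texts[start:end])
	}
	return batches
}

func main() {
	opts := BatchOptions{MaxBatchSize: 2, MaxConcurrent: 1}
	for i, b := range splitBatches([]string{"a", "b", "c", "d", "e"}, opts) {
		fmt.Printf("batch %d: %v\n", i, b)
	}
}
```

The batches share the backing array of the input slice, so no text is copied.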
type BatchProcessor ¶
type BatchProcessor struct {
// contains filtered or unexported fields
}
BatchProcessor handles batch processing of embeddings with optimization features including caching, concurrent processing, and progress tracking.
func NewBatchProcessor ¶
func NewBatchProcessor(provider Provider, options BatchOptions) *BatchProcessor
NewBatchProcessor creates a new batch processor with the given provider and options.
Parameters:
  - provider: The embedding provider to use for generating embeddings
  - options: Configuration options for batch processing
Returns:
  - *BatchProcessor: A new batch processor instance
func (*BatchProcessor) ProcessWithProgress ¶
func (bp *BatchProcessor) ProcessWithProgress(ctx context.Context, texts []string, callback ProgressCallback) ([][]float32, error)
ProcessWithProgress generates embeddings for the given texts and reports progress through callback.
type Config ¶ added in v0.1.2
type Config struct {
Model EmbeddingModel // Local embedding model implementation
Dimension int // Embedding dimension
MaxBatchSize int // Maximum batch size for embedding generation
}
Config contains configuration parameters for the local embedding provider.
type DownloadModelInfo ¶ added in v0.1.2
type DownloadModelInfo struct {
Name string // Model name
Type string // Model type (e.g., "bge", "sentence-bert")
URLs []string // URLs of files to download
Size string // Approximate total size
Description string // Model description
}
DownloadModelInfo contains information about a model available for download.
type DownloadProgressCallback ¶ added in v0.1.2
DownloadProgressCallback is a callback function for tracking download progress.
Parameters:
  - modelName: Name of the model being downloaded
  - fileName: Name of the file being downloaded
  - downloaded: Number of bytes downloaded so far
  - total: Total number of bytes to download (0 if unknown)
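As an illustration, a progress callback might format a status line like the one below. The type definition is not shown on this page, so the signature here (two strings and two int64 byte counts) is an assumption, and formatProgress is a hypothetical helper.

```go
package main

import "fmt"

// DownloadProgressCallback uses an assumed signature; only the parameter
// names and meanings are documented on this page.
type DownloadProgressCallback func(modelName, fileName string, downloaded, total int64)

// formatProgress is a hypothetical helper that renders one status line.
// A total of 0 means the size is unknown, so no percentage is shown.
func formatProgress(modelName, fileName string, downloaded, total int64) string {
	if total <= 0 {
		return fmt.Sprintf("%s/%s: %d bytes", modelName, fileName, downloaded)
	}
	pct := float64(downloaded) / float64(total) * 100
	return fmt.Sprintf("%s/%s: %.1f%% (%d/%d bytes)", modelName, fileName, pct, downloaded, total)
}

func main() {
	var cb DownloadProgressCallback = func(modelName, fileName string, downloaded, total int64) {
		fmt.Println(formatProgress(modelName, fileName, downloaded, total))
	}
	cb("example-model", "model.bin", 512, 2048)
	cb("example-model", "vocab.txt", 100, 0)
}
```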
type Downloader ¶ added in v0.1.2
type Downloader struct {
// contains filtered or unexported fields
}
Downloader handles the downloading and caching of embedding models.
func NewDownloader ¶ added in v0.1.2
func NewDownloader(cacheDir string) *Downloader
NewDownloader creates a new downloader with the specified cache directory.
Parameters:
  - cacheDir: Directory to cache downloaded models (empty for default)
Returns:
  - *Downloader: A new downloader instance
func (*Downloader) DownloadModel ¶ added in v0.1.2
func (d *Downloader) DownloadModel(modelName string, callback DownloadProgressCallback) (string, error)
DownloadModel downloads a model by name
func (*Downloader) GetModelInfo ¶ added in v0.1.2
func (d *Downloader) GetModelInfo() []DownloadModelInfo
GetModelInfo returns information about available models
type EmbeddingModel ¶ added in v0.1.2
type EmbeddingModel interface {
// Run performs inference on the given inputs and returns the model outputs.
Run(inputs map[string]interface{}) (map[string]interface{}, error)
// Close releases any resources associated with the model.
Close() error
}
EmbeddingModel defines the interface for local embedding models. Implementations should handle model loading, inference, and resource cleanup.
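A minimal sketch of an EmbeddingModel implementation, useful as a test double, might look like this. The type stubModel and the map keys "input_ids" and "embeddings" are assumptions; the page does not specify which keys real models expect or return.

```go
package main

import (
	"errors"
	"fmt"
)

// EmbeddingModel matches the interface documented above.
type EmbeddingModel interface {
	Run(inputs map[string]interface{}) (map[string]interface{}, error)
	Close() error
}

// stubModel is a hypothetical model that returns zero vectors.
// The keys "input_ids" and "embeddings" are assumptions for this sketch.
type stubModel struct {
	dim    int
	closed bool
}

// Run returns one zero vector of length dim per input sequence.
func (m *stubModel) Run(inputs map[string]interface{}) (map[string]interface{}, error) {
	if m.closed {
		return nil, errors.New("model is closed")
	}
	ids, ok := inputs["input_ids"].([][]int64)
	if !ok {
		return nil, errors.New(`missing or mistyped "input_ids"`)
	}
	out := make([][]float32, len(ids))
	for i := range ids {
		out[i] = make([]float32, m.dim)
	}
	return map[string]interface{}{"embeddings": out}, nil
}

// Close marks the model as released; later Run calls fail.
func (m *stubModel) Close() error {
	m.closed = true
	return nil
}

func main() {
	var model EmbeddingModel = &stubModel{dim: 3}
	defer model.Close()

	res, err := model.Run(map[string]interface{}{"input_ids": [][]int64{{1, 2}, {3}}})
	fmt.Println(res["embeddings"], err)
}
```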
type LocalProvider ¶ added in v0.1.2
type LocalProvider struct {
// contains filtered or unexported fields
}
LocalProvider implements the embedding.Provider interface for local models. It handles tokenization, batch processing, and model inference.
func New ¶ added in v0.1.2
func New(config Config) (*LocalProvider, error)
New creates a new local embedding provider with the given configuration.
Parameters:
  - config: Configuration parameters for the provider
Returns:
  - *LocalProvider: A new local embedding provider instance
  - error: Error if configuration is invalid or initialization fails
func (*LocalProvider) Dimension ¶ added in v0.1.2
func (p *LocalProvider) Dimension() int
Dimension returns the embedding dimension
type Model ¶ added in v0.1.2
type Model struct {
// contains filtered or unexported fields
}
Model is a generic model implementation for embedding generation
type ModelInfo ¶ added in v0.1.2
ModelInfo contains information about a model including its type, name, and dimensions.
func NewModelInfo ¶ added in v0.1.2
NewModelInfo creates a new ModelInfo instance by analyzing the model file path.
type ModelType ¶ added in v0.1.2
type ModelType string
ModelType defines the type of embedding model.
const (
	// ModelTypeBERT represents BERT models
	ModelTypeBERT ModelType = "bert"
	// ModelTypeSentenceBERT represents Sentence-BERT models
	ModelTypeSentenceBERT ModelType = "sentence-bert"
	// ModelTypeBGE represents BGE models
	ModelTypeBGE ModelType = "bge"
	// ModelTypeGPT represents GPT models
	ModelTypeGPT ModelType = "gpt"
	// ModelTypeFastText represents FastText models
	ModelTypeFastText ModelType = "fasttext"
	// ModelTypeGloVe represents GloVe models
	ModelTypeGloVe ModelType = "glove"
)
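The overview mentions automatic model type detection from the model path. How the package does this is not documented here; the sketch below shows one plausible heuristic, substring matching on the file name. The function guessModelType and the "sbert" alias are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// ModelType and its constants mirror the declarations documented above.
type ModelType string

const (
	ModelTypeBERT         ModelType = "bert"
	ModelTypeSentenceBERT ModelType = "sentence-bert"
	ModelTypeBGE          ModelType = "bge"
	ModelTypeGPT          ModelType = "gpt"
	ModelTypeFastText     ModelType = "fasttext"
	ModelTypeGloVe        ModelType = "glove"
)

// guessModelType is a hypothetical heuristic, not the package's detection
// logic. It looks for known substrings in the lower-cased file name and
// reports false when nothing matches.
func guessModelType(modelPath string) (ModelType, bool) {
	name := strings.ToLower(filepath.Base(modelPath))
	// Order matters: "sentence-bert" must be tested before plain "bert".
	candidates := []struct {
		needle string
		typ    ModelType
	}{
		{"sentence-bert", ModelTypeSentenceBERT},
		{"sbert", ModelTypeSentenceBERT}, // assumed alias
		{"bge", ModelTypeBGE},
		{"fasttext", ModelTypeFastText},
		{"glove", ModelTypeGloVe},
		{"gpt", ModelTypeGPT},
		{"bert", ModelTypeBERT},
	}
	for _, c := range candidates {
		if strings.Contains(name, c.needle) {
			return c.typ, true
		}
	}
	return "", false
}

func main() {
	for _, p := range []string{"/models/bge-small-en.onnx", "sentence-bert-mini.bin", "unknown.bin"} {
		t, ok := guessModelType(p)
		fmt.Printf("%s -> %q (matched: %v)\n", p, t, ok)
	}
}
```

Name-based matching is fragile, so a real implementation would likely also inspect model metadata.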
type ProgressCallback ¶
ProgressCallback is a callback function for tracking batch processing progress.
Parameters:
  - current: Number of texts processed so far
  - total: Total number of texts to process
  - error: Error if any occurred during processing
Returns:
  - bool: True to continue processing, false to cancel the operation
type Provider ¶
type Provider interface {
// Embed generates embeddings for the given texts.
//
// Parameters:
// - ctx: Context for cancellation and timeout
// - texts: Slice of texts to embed
//
// Returns:
// - [][]float32: Slice of embeddings, one for each text
// - error: Error if embedding generation fails
Embed(ctx context.Context, texts []string) ([][]float32, error)
// Dimension returns the dimension of the embeddings generated by this provider.
//
// Returns:
// - int: Embedding dimension
Dimension() int
}
Provider defines the interface for embedding providers.
This interface is implemented by all embedding model providers and allows the application to generate embeddings for text.
Example implementation:
type LocalProvider struct {
	model     *LocalModel
	dimension int
}

func (p *LocalProvider) Embed(ctx context.Context, texts []string) ([][]float32, error) {
	// Generate embeddings using the local model (inference elided here).
	return nil, errors.New("not implemented")
}

func (p *LocalProvider) Dimension() int {
	return p.dimension
}
func NewProvider ¶ added in v0.1.2
NewProvider creates a new provider based on the model path
type SentenceBERTProvider ¶ added in v0.1.2
type SentenceBERTProvider struct {
// contains filtered or unexported fields
}
SentenceBERTProvider is an embedding provider for Sentence-BERT models.
func NewSentenceBERTProvider ¶ added in v0.1.2
func NewSentenceBERTProvider(modelPath string) (*SentenceBERTProvider, error)
NewSentenceBERTProvider creates a new Sentence-BERT provider
func (*SentenceBERTProvider) Dimension ¶ added in v0.1.2
func (p *SentenceBERTProvider) Dimension() int
Dimension returns the embedding dimension
type Tokenizer ¶ added in v0.1.2
type Tokenizer struct {
// contains filtered or unexported fields
}
Tokenizer handles text tokenization for embedding models. It converts raw text into token IDs that can be processed by embedding models.
func NewTokenizer ¶ added in v0.1.2
NewTokenizer creates a new tokenizer instance.
Note: This is a simple whitespace tokenizer for demonstration purposes. In a production implementation, you would use a proper tokenizer like BPE or WordPiece.
Returns:
  - *Tokenizer: A new tokenizer instance
  - error: Error if initialization fails
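To make the idea of a whitespace tokenizer concrete, here is a small sketch that maps words to token IDs. It is not the package's Tokenizer: the type whitespaceTokenizer, its constructor, the Encode method, and the convention that ID 0 means "unknown" are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// whitespaceTokenizer is a hypothetical, simplified tokenizer.
// ID 0 is reserved for words missing from the vocabulary (an assumption).
type whitespaceTokenizer struct {
	vocab map[string]int64
}

// newWhitespaceTokenizer builds a vocabulary from words, assigning IDs
// starting at 1 in the order given.
func newWhitespaceTokenizer(words []string) *whitespaceTokenizer {
	vocab := make(map[string]int64, len(words))
	for i, w := range words {
		vocab[strings.ToLower(w)] = int64(i + 1)
	}
	return &whitespaceTokenizer{vocab: vocab}
}

// Encode lower-cases text, splits it on runs of whitespace, and looks up
// each word. Unknown words map to 0, the zero value of a missing map key.
func (t *whitespaceTokenizer) Encode(text string) []int64 {
	fields := strings.Fields(strings.ToLower(text))
	ids := make([]int64, len(fields))
	for i, f := range fields {
		ids[i] = t.vocab[f]
	}
	return ids
}

func main() {
	tok := newWhitespaceTokenizer([]string{"hello", "world"})
	fmt.Println(tok.Encode("Hello  big world"))
}
```

Subword schemes such as BPE or WordPiece avoid most unknown tokens by splitting rare words into smaller known pieces.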