reembedding

package
v0.9.17
Warning

This package is not in the latest version of its module.

Published: Feb 6, 2026 License: MIT Imports: 7 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CollectionReader

type CollectionReader struct {
	// contains filtered or unexported fields
}

CollectionReader reads documents from a source collection in paginated batches.

func NewCollectionReader

func NewCollectionReader(client vectordb.VectorDBClient, collectionName string, batchSize int) (*CollectionReader, error)

NewCollectionReader creates a new CollectionReader for the given collection.

func (*CollectionReader) HasMore

func (r *CollectionReader) HasMore() bool

HasMore reports whether there are more documents to read.

func (*CollectionReader) Progress

func (r *CollectionReader) Progress() (int, int64)

Progress returns the current progress as a pair: documents read, then total documents.

func (*CollectionReader) ReadBatch

func (r *CollectionReader) ReadBatch(ctx context.Context) ([]*vectordb.Document, error)

ReadBatch reads the next batch of documents from the collection.

func (*CollectionReader) Reset

func (r *CollectionReader) Reset()

Reset resets the reader to the beginning of the collection.
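
The method set above suggests a drain loop: call ReadBatch while HasMore reports true. The following is a minimal sketch of that pattern, not the package's implementation. Document and memReader are hypothetical in-memory stand-ins for vectordb.Document and CollectionReader, and the offset-based paging is an assumption made only so the example runs without a vector database.

```go
package main

import (
	"context"
	"fmt"
)

// Document is a hypothetical stand-in for vectordb.Document.
type Document struct {
	ID   string
	Text string
}

// memReader is a hypothetical in-memory reader that mirrors the method set
// documented for CollectionReader. Offset-based paging is an assumption.
type memReader struct {
	docs      []*Document
	batchSize int
	offset    int
}

func newMemReader(docs []*Document, batchSize int) (*memReader, error) {
	if batchSize <= 0 {
		return nil, fmt.Errorf("batch size must be positive, got %d", batchSize)
	}
	return &memReader{docs: docs, batchSize: batchSize}, nil
}

// HasMore reports whether unread documents remain.
func (r *memReader) HasMore() bool { return r.offset < len(r.docs) }

// Progress returns documents read so far and the total document count.
func (r *memReader) Progress() (int, int64) { return r.offset, int64(len(r.docs)) }

// Reset moves the reader back to the first document.
func (r *memReader) Reset() { r.offset = 0 }

// ReadBatch returns up to batchSize documents and advances the offset.
func (r *memReader) ReadBatch(ctx context.Context) ([]*Document, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // stop early if the caller cancelled
	}
	end := r.offset + r.batchSize
	if end > len(r.docs) {
		end = len(r.docs)
	}
	batch := r.docs[r.offset:end]
	r.offset = end
	return batch, nil
}

// demo drains a five-document reader in batches of two and returns the
// size of each batch together with the final progress counters.
func demo() (sizes []int, read int, total int64, err error) {
	docs := make([]*Document, 5)
	for i := range docs {
		docs[i] = &Document{ID: fmt.Sprintf("doc-%d", i)}
	}
	r, err := newMemReader(docs, 2)
	if err != nil {
		return nil, 0, 0, err
	}
	ctx := context.Background()
	for r.HasMore() {
		batch, err := r.ReadBatch(ctx)
		if err != nil {
			return sizes, 0, 0, err
		}
		sizes = append(sizes, len(batch))
	}
	read, total = r.Progress()
	return sizes, read, total, nil
}

func main() {
	sizes, read, total, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(sizes, read, total)
}
```

The last batch may be shorter than batchSize, so callers should rely on HasMore rather than on batch length to detect the end of the collection.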

type EmbeddingPipeline

type EmbeddingPipeline struct {
	// contains filtered or unexported fields
}

EmbeddingPipeline regenerates embeddings for documents.

func NewEmbeddingPipeline

func NewEmbeddingPipeline(modelName string) (*EmbeddingPipeline, error)

NewEmbeddingPipeline creates a new embedding pipeline with auto-detected dimensions.

func (*EmbeddingPipeline) GetDimensions

func (p *EmbeddingPipeline) GetDimensions() int

GetDimensions returns the vector dimensions for this pipeline.

func (*EmbeddingPipeline) GetModelName

func (p *EmbeddingPipeline) GetModelName() string

GetModelName returns the embedding model name.

func (*EmbeddingPipeline) GetProvider

func (p *EmbeddingPipeline) GetProvider() string

GetProvider returns the embedding provider.

func (*EmbeddingPipeline) IsSupported

func (p *EmbeddingPipeline) IsSupported() bool

IsSupported reports whether the model is supported for re-embedding.

func (*EmbeddingPipeline) ProcessBatch

func (p *EmbeddingPipeline) ProcessBatch(ctx context.Context, docs []*vectordb.Document) error

ProcessBatch generates new embeddings for a batch of documents.
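
ProcessBatch returns only an error, which suggests (as an assumption, not something the documentation states) that new vectors are written back into the documents it receives. The sketch below illustrates that in-place pattern under that assumption. The Document fields, the pipeline type, and toyEmbed are all hypothetical; toyEmbed is a deterministic placeholder, not a real embedding model or provider call.

```go
package main

import (
	"context"
	"fmt"
	"hash/fnv"
)

// Document is a hypothetical stand-in for vectordb.Document.
// The Embedding field name is an assumption.
type Document struct {
	ID        string
	Text      string
	Embedding []float32
}

// pipeline is a hypothetical stand-in for EmbeddingPipeline.
type pipeline struct {
	modelName  string
	dimensions int
}

// toyEmbed is a deterministic placeholder for a real model call: it hashes
// the text and expands the hash into a fixed-length vector in [0, 1).
func (p *pipeline) toyEmbed(text string) []float32 {
	h := fnv.New32a()
	h.Write([]byte(text))
	seed := h.Sum32()
	vec := make([]float32, p.dimensions)
	for i := range vec {
		seed = seed*1664525 + 1013904223 // linear congruential step
		vec[i] = float32(seed%1000) / 1000
	}
	return vec
}

// processBatch overwrites each document's embedding in place and stops
// at the first cancelled context or nil document.
func (p *pipeline) processBatch(ctx context.Context, docs []*Document) error {
	for i, d := range docs {
		if err := ctx.Err(); err != nil {
			return err
		}
		if d == nil {
			return fmt.Errorf("nil document at index %d", i)
		}
		d.Embedding = p.toyEmbed(d.Text)
	}
	return nil
}

// demo re-embeds three documents, two of which share the same text.
func demo() ([]*Document, error) {
	p := &pipeline{modelName: "toy-model", dimensions: 4}
	docs := []*Document{
		&Document{ID: "1", Text: "alpha"},
		&Document{ID: "2", Text: "beta"},
		&Document{ID: "3", Text: "alpha"},
	}
	if err := p.processBatch(context.Background(), docs); err != nil {
		return nil, err
	}
	return docs, nil
}

func main() {
	docs, err := demo()
	if err != nil {
		panic(err)
	}
	for _, d := range docs {
		fmt.Println(d.ID, len(d.Embedding))
	}
}
```

Checking the context between documents keeps a long batch cancellable; a real pipeline would more likely pass the context through to the provider request itself.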

type ProgressTracker

type ProgressTracker struct {
	// contains filtered or unexported fields
}

ProgressTracker tracks and displays progress for re-embedding operations.

func NewProgressTracker

func NewProgressTracker(total int64) *ProgressTracker

NewProgressTracker creates a new progress tracker.

func (*ProgressTracker) Display

func (p *ProgressTracker) Display()

Display shows the current progress with a progress bar and ETA.

func (*ProgressTracker) GetETA

func (p *ProgressTracker) GetETA() time.Duration

GetETA returns the estimated time remaining.

func (*ProgressTracker) GetPercentage

func (p *ProgressTracker) GetPercentage() float64

GetPercentage returns the completion percentage.

func (*ProgressTracker) GetSpeed

func (p *ProgressTracker) GetSpeed() float64

GetSpeed returns the current processing speed in documents per minute.

func (*ProgressTracker) GetStatusMessage

func (p *ProgressTracker) GetStatusMessage() string

GetStatusMessage returns a formatted progress message without printing it.

func (*ProgressTracker) IsComplete

func (p *ProgressTracker) IsComplete() bool

IsComplete reports whether all documents have been processed.

func (*ProgressTracker) Update

func (p *ProgressTracker) Update(count int)

Update increments the progress counter by count.
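
The percentage, speed, and ETA values above can all be derived from three numbers: documents processed, total documents, and elapsed time. The sketch below shows one plausible way to compute them. The snapshot function is hypothetical and is not ProgressTracker's implementation; in particular, averaging speed over the whole run (rather than over a recent window) is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// snapshot derives the completion percentage, the speed in documents per
// minute, and the estimated time remaining from raw counters.
// Hypothetical helper; speed is averaged over the whole elapsed time.
func snapshot(processed, total int64, elapsedSeconds float64) (pct, perMin float64, eta time.Duration) {
	if total > 0 {
		pct = float64(processed) / float64(total) * 100
	}
	if processed <= 0 || elapsedSeconds <= 0 {
		return pct, 0, 0 // no throughput measured yet, so speed and ETA are unknown
	}
	perMin = float64(processed) / (elapsedSeconds / 60)
	remaining := total - processed
	if remaining <= 0 {
		return pct, perMin, 0 // nothing left to process
	}
	// Remaining documents divided by documents-per-minute gives minutes left.
	eta = time.Duration(float64(remaining) / perMin * float64(time.Minute))
	return pct, perMin, eta
}

func main() {
	// 250 of 1000 documents processed in the first 60 seconds.
	pct, perMin, eta := snapshot(250, 1000, 60)
	fmt.Printf("%.1f%% done, %.0f docs/min, ETA %s\n", pct, perMin, eta)
}
```

Guarding the zero-progress case avoids a division by zero at the start of a run, when no speed estimate exists yet.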
