Documentation
Index ¶
- Constants
- func DefaultModelDir() string
- func DownloadModel(venvDir, modelPath, modelName string) error
- func EstimateTokens(text string) int
- func InstallDeps(venvDir string) error
- func MapResultsToFiles(batches []Batch, results []BatchResult, numFiles int) [][][]float32
- func ModelExists(modelPath string) bool
- func SetupVenv(venvDir, pythonPath string) error
- func SocketPath() string
- func VenvExists(venvDir string) bool
- func VenvPythonPath(venvDir string) string
- type Batch
- type BatchEmbedder
- type BatchEntry
- type BatchProgress
- type BatchResult
- type E5Embedder
- func (e *E5Embedder) Close() error
- func (e *E5Embedder) Dimensions() int
- func (e *E5Embedder) Embed(_ context.Context, text string) ([]float32, error)
- func (e *E5Embedder) EmbedBatch(_ context.Context, texts []string) ([][]float32, error)
- func (e *E5Embedder) EmbedBatches(ctx context.Context, batches []Batch, progress BatchProgress) ([]BatchResult, error)
- type Embedder
- type FileChunks
- type SocketEmbedder
- type SocketServer
Constants ¶
const DefaultModelName = "intfloat/multilingual-e5-large"
DefaultModelName is the HuggingFace model downloaded by default.
const MaxBatchSize = 64
MaxBatchSize is the maximum number of inputs per local E5 model embedding call. Limited to 64 to fit within local GPU memory constraints.
const MaxBatchTokens = 32000
MaxBatchTokens is the maximum total tokens per local E5 model embedding batch. The local E5 model has max_length=512; 32,000 is used as a safe total budget.
Variables ¶
This section is empty.
Functions ¶
func DefaultModelDir ¶
func DefaultModelDir() string
DefaultModelDir returns the global model path: ~/.grepai/models/multilingual-e5-large
func DownloadModel ¶
func DownloadModel(venvDir, modelPath, modelName string) error
DownloadModel downloads the model from HuggingFace via the Python venv.
func EstimateTokens ¶
func EstimateTokens(text string) int
EstimateTokens estimates the token count for a text string. Uses a conservative estimate of ~4 characters per token for English text. This is intentionally conservative to avoid hitting API limits.
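The ~4-characters-per-token heuristic can be sketched as follows. This is an illustrative stand-in, not the package's actual implementation; the rounding choice is an assumption.

```go
package main

import "fmt"

// estimateTokens is a hypothetical sketch of the conservative
// ~4 characters per token heuristic described above.
func estimateTokens(text string) int {
	// Round up so short non-empty strings never estimate to zero tokens.
	return (len(text) + 3) / 4
}

func main() {
	fmt.Println(estimateTokens("hello world")) // 11 chars -> 3 tokens
}
```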
func InstallDeps ¶
func InstallDeps(venvDir string) error
InstallDeps installs torch and transformers into the venv.
func MapResultsToFiles ¶
func MapResultsToFiles(batches []Batch, results []BatchResult, numFiles int) [][][]float32
MapResultsToFiles maps batch results back to per-file embeddings. Returns a slice where each index corresponds to a file, containing embeddings for that file's chunks.
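The mapping step can be sketched with simplified local types. The lowercase type and field names below mirror the exported Batch/BatchEntry/BatchResult types but are illustrative only; the package's actual code may pre-size slices differently.

```go
package main

import "fmt"

type batchEntry struct{ fileIndex, chunkIndex int }

type batch struct{ entries []batchEntry }

type batchResult struct {
	batchIndex int
	embeddings [][]float32
}

// mapResultsToFiles routes each embedding in a result back to its
// file/chunk slot using the batch's entry metadata.
func mapResultsToFiles(batches []batch, results []batchResult, numFiles int) [][][]float32 {
	// Determine each file's chunk count from the batch entries.
	counts := make([]int, numFiles)
	for _, b := range batches {
		for _, e := range b.entries {
			if e.chunkIndex+1 > counts[e.fileIndex] {
				counts[e.fileIndex] = e.chunkIndex + 1
			}
		}
	}
	out := make([][][]float32, numFiles)
	for i, n := range counts {
		out[i] = make([][]float32, n)
	}
	// Embeddings arrive in the same order as the batch entries.
	for _, r := range results {
		for i, emb := range r.embeddings {
			e := batches[r.batchIndex].entries[i]
			out[e.fileIndex][e.chunkIndex] = emb
		}
	}
	return out
}

func main() {
	bs := []batch{{entries: []batchEntry{{0, 0}, {1, 0}}}}
	rs := []batchResult{{batchIndex: 0, embeddings: [][]float32{{1}, {2}}}}
	out := mapResultsToFiles(bs, rs, 2)
	fmt.Println(out[0][0][0], out[1][0][0]) // 1 2
}
```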
func ModelExists ¶
func ModelExists(modelPath string) bool
ModelExists checks whether the model has already been downloaded.
func SocketPath ¶
func SocketPath() string
SocketPath returns the Unix socket path for the embed daemon.
func SetupVenv ¶
func SetupVenv(venvDir, pythonPath string) error
func VenvExists ¶
func VenvExists(venvDir string) bool
VenvExists checks whether the Python venv already exists.
func VenvPythonPath ¶
func VenvPythonPath(venvDir string) string
VenvPythonPath returns the path to the venv's python3 binary.
Types ¶
type Batch ¶
type Batch struct {
// Entries contains chunks with source file tracking
Entries []BatchEntry
// Index is the batch number for progress reporting (0-indexed)
Index int
}
Batch represents a collection of chunks to be embedded in a single API call.
func FormBatches ¶
func FormBatches(files []FileChunks) []Batch
FormBatches splits chunks from multiple files into batches respecting both MaxBatchSize (input count) and MaxBatchTokens (token limit). Chunks maintain their file/chunk index tracking for result mapping.
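A greedy packer that respects both limits can be sketched as below. This is a simplified illustration under assumed rules (start a new batch whenever adding a chunk would exceed either limit); the package's actual FormBatches may split or order differently.

```go
package main

import "fmt"

const (
	maxBatchSize   = 64    // maximum inputs per batch
	maxBatchTokens = 32000 // maximum total estimated tokens per batch
)

type entry struct {
	fileIndex, chunkIndex int
	content               string
}

type batch struct {
	entries []entry
	index   int
}

// formBatches greedily packs chunks into batches, preserving each
// chunk's file/chunk indices for later result mapping.
func formBatches(files [][]string) []batch {
	var batches []batch
	cur := batch{index: 0}
	tokens := 0
	for fi, chunks := range files {
		for ci, c := range chunks {
			t := (len(c) + 3) / 4 // conservative ~4 chars/token estimate
			if len(cur.entries) > 0 &&
				(len(cur.entries) >= maxBatchSize || tokens+t > maxBatchTokens) {
				batches = append(batches, cur)
				cur = batch{index: len(batches)}
				tokens = 0
			}
			cur.entries = append(cur.entries, entry{fi, ci, c})
			tokens += t
		}
	}
	if len(cur.entries) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	files := [][]string{{"alpha", "beta"}, {"gamma"}}
	bs := formBatches(files)
	fmt.Println(len(bs), len(bs[0].entries)) // 1 3
}
```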
type BatchEmbedder ¶
type BatchEmbedder interface {
Embedder
// EmbedBatches processes multiple batches of chunks concurrently.
// It returns results mapped back to their source files, or an error if any batch fails.
// The progress callback is called for each batch completion or retry attempt.
EmbedBatches(ctx context.Context, batches []Batch, progress BatchProgress) ([]BatchResult, error)
}
BatchEmbedder extends Embedder with cross-file batch embedding capabilities. Providers that support advanced batching (like OpenAI) implement this interface to enable parallel processing of multiple batches.
type BatchEntry ¶
type BatchEntry struct {
// FileIndex is the index of the source file in the files slice
FileIndex int
// ChunkIndex is the index of the chunk within the file's chunks
ChunkIndex int
// Content is the text content to embed
Content string
}
BatchEntry represents a single chunk with metadata for tracking its source.
type BatchProgress ¶
type BatchProgress func(batchIndex, totalBatches, completedChunks, totalChunks int, retrying bool, attempt int, statusCode int)
BatchProgress is a callback for reporting batch embedding progress. It receives the batch index, total batches, chunk progress info, and optional retry information. completedChunks and totalChunks track overall progress across all batches. statusCode is the HTTP status code when retrying (429 = rate limited, 5xx = server error).
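A minimal progress reporter built on this signature might look like the following. The formatProgress helper and its wording are illustrative, not part of the package; only the BatchProgress type mirrors the documented callback.

```go
package main

import "fmt"

// BatchProgress mirrors the callback signature documented above.
type BatchProgress func(batchIndex, totalBatches, completedChunks, totalChunks int, retrying bool, attempt int, statusCode int)

// formatProgress renders one progress update as a line of text
// (hypothetical helper for illustration).
func formatProgress(bi, tb, cc, tc int, retrying bool, attempt, status int) string {
	if retrying {
		return fmt.Sprintf("batch %d/%d: retry %d (HTTP %d)", bi+1, tb, attempt, status)
	}
	return fmt.Sprintf("batch %d/%d done: %d/%d chunks", bi+1, tb, cc, tc)
}

func main() {
	var progress BatchProgress = func(bi, tb, cc, tc int, retrying bool, attempt, status int) {
		fmt.Println(formatProgress(bi, tb, cc, tc, retrying, attempt, status))
	}
	progress(0, 2, 64, 128, false, 0, 0)  // batch 1/2 done: 64/128 chunks
	progress(1, 2, 64, 128, true, 1, 429) // batch 2/2: retry 1 (HTTP 429)
}
```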
type BatchResult ¶
type BatchResult struct {
// BatchIndex is the index of the batch this result belongs to
BatchIndex int
// Embeddings contains the embedding vectors in the same order as batch entries
Embeddings [][]float32
}
BatchResult contains the embeddings for a batch with file/chunk index mapping.
type E5Embedder ¶
type E5Embedder struct {
// contains filtered or unexported fields
}
E5Embedder runs a Python STDIO worker for local E5 embeddings.
func NewE5Embedder ¶
func NewE5Embedder(modelPath, venvDir string) (*E5Embedder, error)
NewE5Embedder spawns the Python E5 worker process. venvDir is the Python venv directory (.grepai/venv).
func (*E5Embedder) Close ¶
func (e *E5Embedder) Close() error
Close shuts down the Python worker process.
func (*E5Embedder) Dimensions ¶
func (e *E5Embedder) Dimensions() int
Dimensions returns the embedding vector size.
func (*E5Embedder) Embed ¶
func (e *E5Embedder) Embed(_ context.Context, text string) ([]float32, error)
Embed converts text into a vector embedding.
func (*E5Embedder) EmbedBatch ¶
func (e *E5Embedder) EmbedBatch(_ context.Context, texts []string) ([][]float32, error)
EmbedBatch converts multiple texts into vector embeddings (passage prefix).
func (*E5Embedder) EmbedBatches ¶
func (e *E5Embedder) EmbedBatches(ctx context.Context, batches []Batch, progress BatchProgress) ([]BatchResult, error)
EmbedBatches processes multiple batches sequentially (local GPU).
type Embedder ¶
type Embedder interface {
// Embed converts text into a vector embedding
Embed(ctx context.Context, text string) ([]float32, error)
// EmbedBatch converts multiple texts into vector embeddings
EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
// Dimensions returns the vector dimension size for this embedder
Dimensions() int
// Close cleanly shuts down the embedder
Close() error
}
Embedder defines the interface for text embedding providers.
type FileChunks ¶
type FileChunks struct {
// FileIndex is the index of this file in the original files slice
FileIndex int
// Chunks is the list of text chunks from this file
Chunks []string
}
FileChunks represents chunks from a single file for batch formation.
type SocketEmbedder ¶
type SocketEmbedder struct {
// contains filtered or unexported fields
}
SocketEmbedder is a client that connects to the daemon over a Unix socket. It implements the Embedder interface.
func NewSocketEmbedder ¶
func NewSocketEmbedder() (*SocketEmbedder, error)
NewSocketEmbedder connects to the daemon socket. It returns an error if the daemon is not running.
func (*SocketEmbedder) Close ¶
func (e *SocketEmbedder) Close() error
func (*SocketEmbedder) Dimensions ¶
func (e *SocketEmbedder) Dimensions() int
func (*SocketEmbedder) EmbedBatch ¶
func (e *SocketEmbedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
type SocketServer ¶
type SocketServer struct {
// contains filtered or unexported fields
}
SocketServer listens on a Unix socket and serves embedding requests.
func NewSocketServer ¶
func NewSocketServer(emb Embedder) (*SocketServer, error)
NewSocketServer creates a socket server that delegates to the provided embedder.
func (*SocketServer) Close ¶
func (s *SocketServer) Close()
Close stops the server and cleans up the socket.
func (*SocketServer) Serve ¶
func (s *SocketServer) Serve(ctx context.Context)
Serve accepts connections and handles requests. Blocking call.