Documentation
¶
Index ¶
- Constants
- func BuildCAGRACachePath(namespace string, vectorDim, graphDegree, count int) string
- func Compact(model *SimpleInt8Model512, text string, maxTokens int, query string) (string, error)
- func CompactBatchInt8Cosine(embeddings []*Int8Result512, refVec *simd.Vec512, refScale float32, ...) []int
- func CompactSentences(model *SimpleInt8Model512, sentences []string, maxTokens int, query string) ([]string, error)
- func CompactSentencesWithConfig(model *SimpleInt8Model512, sentences []string, cfg CompactConfig) ([]string, error)
- func CompactWithConfig(model *SimpleInt8Model512, text string, cfg CompactConfig) (string, error)
- func CosineSimilarity(a, b []float32) float32
- func CosineSimilarityFloat64(a, b []float64) float64
- func CosineSimilarityInt8(a, b []uint8) float32
- func CosineSimilarityInt8Fallback(a, b []uint8) float32
- func DebugLoggingEnabled() bool
- func Debugf(format string, args ...interface{})
- func Debugln(args ...interface{})
- func DisableDebugLogging()
- func EnableDebugLogging()
- func EstimateTokens(text string) int
- func EstimateTokensSlice(sentences []string) []int
- func ExampleVectorSearch()
- func FastQuantize(input []float32) ([]int8, float32)
- func FusedCAGRAAvailable() bool
- func GetCUDADeviceCount() int
- func GetCUDAVersion() string
- func GetEmbedBuffer() []float32
- func GetInt8Buffer() []int8
- func GetOptimalBatchSize() int
- func GetOptimalGPUBatchSize() int
- func GetSearchConfig(preset SearchPreset, estimatedSize int) search.Config
- func GetTokenBuffer() []int
- func IsCUDAAvailable() bool
- func LoadModelUnified(config *UnifiedModelConfig) (interface{}, error)
- func PutEmbedBuffer(buf []float32)
- func PutInt8Buffer(buf []int8)
- func PutTokenBuffer(buf []int)
- func SentenceSplit(text string) []string
- func SetDebugOutput(w io.Writer)
- func SetSimpleInt8Verbose(verbose bool)
- func Vec512FromInt8(emb *Int8Result512) *simd.Vec512
- func ZeroCopyInt32ToFloat32(src []int32) []float32
- type BatchConfig
- type BatchEmbeddingResult
- type BatchProcessor
- type BatchResult
- type BufferPool
- type CAGRAConfig
- type CPUBulkIndexer
- type CPUBulkIndexerStats
- type CachedEmbedding
- type CompactConfig
- type Document
- type EmbedInt8Result
- type EmbeddingModel
- func (m *EmbeddingModel) EmbedInt8(text string) (*EmbedInt8Result, error)
- func (m *EmbeddingModel) Encode(text string) ([]float32, error)
- func (m *EmbeddingModel) FindMostSimilar(query string, candidates []string, limit int) ([]SimilarityResult, error)
- func (m *EmbeddingModel) GetAvailableTexts() []string
- func (m *EmbeddingModel) OptimizedEmbedding(text string, cache *TokenPatternCache) ([]float32, error)
- func (m *EmbeddingModel) Similarity(text1, text2 string) (float32, error)
- type EmbeddingModelInt8
- type FusedCAGRAConfig
- type FusedCAGRAEngine
- func (engine *FusedCAGRAEngine) BuildIndex(embedWeights []int8, embedScales []float32, database []simd.Vec512, ...) error
- func (engine *FusedCAGRAEngine) Close()
- func (engine *FusedCAGRAEngine) GetStats() FusedCAGRAStats
- func (engine *FusedCAGRAEngine) Search(tokens []uint16) ([]SearchResult, error)
- func (engine *FusedCAGRAEngine) SearchBatch(tokenBatch [][]uint16, maxTokens int) ([][]SearchResult, error)
- type FusedCAGRAStats
- type GPUBatchProcessor
- type GPUBlockPool
- type GPUCagraConfig
- type GPUEmbeddingModel
- type GPUIndexer
- func (g *GPUIndexer) AddVectors(vectors [][]int8) error
- func (g *GPUIndexer) BatchSearch(queries [][]int8, k int) ([][]SearchResult, error)
- func (g *GPUIndexer) Close() error
- func (g *GPUIndexer) GetMemoryUsage() uint64
- func (g *GPUIndexer) GetStats() IndexStats
- func (g *GPUIndexer) IndexVectors(vectors []simd.Vec512, scales []float32) error
- func (g *GPUIndexer) Initialize() error
- func (g *GPUIndexer) IsReady() bool
- func (g *GPUIndexer) Search(query simd.Vec512, scale float32, k int) ([]int, []float32, error)
- func (g *GPUIndexer) TrainIndex(vectors [][]int8) error
- type GPUMemoryConfig
- type GPUMemoryManager
- func (m *GPUMemoryManager) AllocateQueryMemory() (unsafe.Pointer, error)
- func (m *GPUMemoryManager) AllocateResultMemory() (unsafe.Pointer, error)
- func (m *GPUMemoryManager) AllocateVectorMemory() (unsafe.Pointer, error)
- func (m *GPUMemoryManager) Close() error
- func (m *GPUMemoryManager) ForceGarbageCollection()
- func (m *GPUMemoryManager) FreeQueryMemory(ptr unsafe.Pointer)
- func (m *GPUMemoryManager) FreeResultMemory(ptr unsafe.Pointer)
- func (m *GPUMemoryManager) FreeVectorMemory(ptr unsafe.Pointer)
- func (m *GPUMemoryManager) GetMemoryStats() GPUMemoryStats
- func (m *GPUMemoryManager) StartMemoryMonitor(interval time.Duration)
- type GPUMemoryStats
- type GPUSearchServer
- type GPUServerConfig
- type GPUStats
- type IndexComparison
- type IndexConfig
- type IndexData
- type IndexProgress
- type IndexRequest
- type IndexResponse
- type IndexSnapshot
- type IndexStats
- type IndexingStats
- type Int8EmbeddingModel512
- func (m *Int8EmbeddingModel512) Embed(text string) ([]float32, error)
- func (m *Int8EmbeddingModel512) EmbedInt8(text string) (*Int8Result512, error)
- func (m *Int8EmbeddingModel512) EmbedTokens(tokens []int16) ([]float32, error)
- func (m *Int8EmbeddingModel512) GetInt8Weights() ([]int8, []float32)
- func (m *Int8EmbeddingModel512) GetMemoryUsage() string
- func (m *Int8EmbeddingModel512) Similarity(text1, text2 string) (float32, error)
- func (m *Int8EmbeddingModel512) Tokenize(text string) ([]int16, error)
- type Int8Result512
- type MemoryOptimizedCache
- type ModelCompatibilityWrapper
- type ObjectPool
- type OptimizedEmbeddingModel
- func (m *OptimizedEmbeddingModel) BatchEmbed(texts []string) ([]*EmbedInt8Result, error)
- func (m *OptimizedEmbeddingModel) EmbedInt8Optimized(text string) (*EmbedInt8Result, error)
- func (m *OptimizedEmbeddingModel) EmbedOptimized(text string) ([]float32, error)
- func (m *OptimizedEmbeddingModel) FastSearch(query string, limit int) ([]float32, error)
- func (m *OptimizedEmbeddingModel) GetStats() map[string]interface{}
- func (m *OptimizedEmbeddingModel) OptimizeForProduction(maxCacheSize int, gpuEnabled bool)
- func (m *OptimizedEmbeddingModel) PrecomputePatterns(patterns []string)
- func (m *OptimizedEmbeddingModel) WarmupCache()
- type ParallelIndexConfig
- type ParallelIndexStats
- type ParallelIndexer
- type ParallelProcessor
- type ParallelSearchEngine
- type PersistenceFormat
- type PersistenceStats
- type PoolStats
- type PrecomputedEmbeddings
- type PresetConfig
- type SaveOptions
- type SearchConfig
- type SearchEngine
- func FastSearchEngine(model *EmbeddingModel) *SearchEngine
- func NewAsyncSearchEngine(model *EmbeddingModel) *SearchEngine
- func NewAutoSearchEngine(model *EmbeddingModel) *SearchEngine
- func NewCAGRASearchEngine(model *EmbeddingModel) *SearchEngine
- func NewGPUSearchEngine(model *EmbeddingModel) *SearchEngine
- func NewSearchEngine(model *EmbeddingModel) *SearchEngine
- func NewSearchEngineWithConfig(model *EmbeddingModel, config SearchConfig) *SearchEngine
- func NewSearchEngineWithPreset(model *EmbeddingModel, preset SearchPreset) (*SearchEngine, error)
- func (se *SearchEngine) AutoSave(dir string, interval time.Duration)
- func (se *SearchEngine) Checkpoint(dir string) error
- func (se *SearchEngine) Clear()
- func (se *SearchEngine) Close() error
- func (se *SearchEngine) FindSimilar(documentID int, k int) ([]SearchResult, error)
- func (se *SearchEngine) Flush() error
- func (se *SearchEngine) GetAllDocuments() map[int]string
- func (se *SearchEngine) GetDocument(id int) (string, bool)
- func (se *SearchEngine) Index(text string) (int, error)
- func (se *SearchEngine) IndexBatch(texts []string) ([]int, error)
- func (se *SearchEngine) IndexBatchAsync(texts []string) <-chan IndexResponse
- func (se *SearchEngine) IndexBatchAsyncWithIDs(ids []int, texts []string) <-chan IndexResponse
- func (se *SearchEngine) IndexBatchWithIDs(ids []int, texts []string) error
- func (se *SearchEngine) IndexWithID(id int, text string) error
- func (se *SearchEngine) Load(path string) error
- func (se *SearchEngine) LoadFromDirectory(dir string) error
- func (se *SearchEngine) Optimize() error
- func (se *SearchEngine) QuickSave(path string) error
- func (se *SearchEngine) Save(path string, options SaveOptions) error
- func (se *SearchEngine) SaveToDirectory(dir string, options SaveOptions) error
- func (se *SearchEngine) Search(query string, k int) ([]SearchResult, error)
- func (se *SearchEngine) SearchWithOptions(query string, opts SearchOptions) ([]SearchResult, error)
- func (se *SearchEngine) Size() int
- func (se *SearchEngine) Stats() SearchEngineStats
- type SearchEngineStats
- type SearchOptions
- type SearchPreset
- type SearchRequest
- type SearchResponse
- type SearchResult
- type SearchServer
- type ServerConfig
- type ServerDocument
- type ServerIndexRequest
- type ServerIndexResponse
- type SharedIndexHeader
- type SharedIndexStats
- type SharedMemoryConfig
- type SharedMemoryIndex
- func (idx *SharedMemoryIndex) AddVector(vec *simd.Vec512, scale float32, id int) error
- func (idx *SharedMemoryIndex) BatchSearch(queries []*simd.Vec512, k int) [][]SearchResult
- func (idx *SharedMemoryIndex) Close() error
- func (idx *SharedMemoryIndex) GetVector(index int) (*simd.Vec512, error)
- func (idx *SharedMemoryIndex) ReleaseWriter()
- func (idx *SharedMemoryIndex) SearchTopK(query *simd.Vec512, k int) []SearchResult
- func (idx *SharedMemoryIndex) Stats() SharedIndexStats
- func (idx *SharedMemoryIndex) Sync() error
- func (idx *SharedMemoryIndex) TryAcquireWriter() bool
- func (idx *SharedMemoryIndex) WaitForWrites(targetSeq uint64)
- type SimilarityResult
- type SimpleInt8Model512
- func (m *SimpleInt8Model512) Close() error
- func (m *SimpleInt8Model512) Embed(text string) ([]float32, error)
- func (m *SimpleInt8Model512) EmbedBatchInt8(texts []string) ([]*Int8Result512, error)
- func (m *SimpleInt8Model512) EmbedBatchInt8Optimized(texts []string, progressCallback func(processed, total int)) ([]*Int8Result512, error)
- func (m *SimpleInt8Model512) EmbedDim() int
- func (m *SimpleInt8Model512) EmbedFast(text string) ([]float32, func())
- func (m *SimpleInt8Model512) EmbedInt8(text string) (*Int8Result512, error)
- func (m *SimpleInt8Model512) EmbedTokens(tokens []int16) ([]float32, error)
- func (m *SimpleInt8Model512) EmbedTokensInt8(tokens []int16) (*Int8Result512, error)
- func (m *SimpleInt8Model512) EmbedTokensInto(tokens []int16, result []float32) int
- func (m *SimpleInt8Model512) EmbedTokensIntoSIMD(tokens []int16, result []float32) int
- func (m *SimpleInt8Model512) EmbeddingTable() [][]int8
- func (m *SimpleInt8Model512) ScaleTable() []float32
- func (m *SimpleInt8Model512) Similarity(text1, text2 string) (float32, error)
- func (m *SimpleInt8Model512) SimpleTokenize(text string) []int16
- func (m *SimpleInt8Model512) VocabSize() int
- type SimplifiedSearchConfig
- type TensorInfo
- type TokenData
- type TokenFrequencyData
- type TokenFrequencySection
- type TokenPatternCache
- func (c *TokenPatternCache) BatchGetEmbeddings(tokenBatches [][]int) ([]*CachedEmbedding, []bool)
- func (c *TokenPatternCache) ComputeEmbeddingWithCache(tokens []int, computeFn func([]int) ([]float32, error)) ([]float32, error)
- func (c *TokenPatternCache) FilterStopwords(tokens []int, textLength int) []int
- func (c *TokenPatternCache) GetCachedEmbedding(tokens []int) (*CachedEmbedding, bool)
- func (c *TokenPatternCache) GetStats() map[string]interface{}
- func (c *TokenPatternCache) PrecomputeCommonPatterns(model *EmbeddingModel, patterns [][]int)
- type TokenPatternSection
- type TokenizerOptimizations
- type UnifiedModelConfig
- type VectorBuffer
- type VectorIndex
- func (idx *VectorIndex) AddDocument(doc Document) error
- func (idx *VectorIndex) AddDocuments(docs []Document) error
- func (idx *VectorIndex) AddDocumentsBulkGPU(docs []Document) error
- func (idx *VectorIndex) AddDocumentsWithMonitoring(docs []Document) (<-chan interface{}, error)
- func (idx *VectorIndex) Search(query string, k int) ([]SearchResult, error)
- func (idx *VectorIndex) Size() int
- func (idx *VectorIndex) Stats() VectorIndexStats
- func (idx *VectorIndex) Train(texts []string) error
- type VectorIndexConfig
- type VectorIndexStats
Constants ¶
const ( MaxErrorTextLength = 50 DefaultBatchSize = 256 DefaultTimeout = 30 * time.Second )
Constants for configuration
const ( // Model dimensions Int8EmbeddingDim = 512 Int8VocabSize = 30522 )
Variables ¶
This section is empty.
Functions ¶
func BuildCAGRACachePath ¶
BuildCAGRACachePath generates the cache path used in tests.
func Compact ¶ added in v1.0.4
Compact compresses text to fit within maxTokens by embedding all sentences, ranking by similarity to query (or mean embedding if query is empty), and greedily selecting top sentences that fit the budget. Returns the compacted text with sentences in original order.
func CompactBatchInt8Cosine ¶ added in v1.0.4
func CompactBatchInt8Cosine( embeddings []*Int8Result512, refVec *simd.Vec512, refScale float32, tokenCounts []int, maxTokens int, ) []int
CompactBatchInt8Cosine is the low-level hot path: given pre-computed embeddings, a reference vector, and token counts, returns selected indices within budget. This is useful when you already have embeddings (e.g. from GPU pipeline).
func CompactSentences ¶ added in v1.0.4
func CompactSentences(model *SimpleInt8Model512, sentences []string, maxTokens int, query string) ([]string, error)
CompactSentences takes pre-split sentences, embeds them all in batch, ranks by similarity, and returns the subset that fits within maxTokens in their original order.
func CompactSentencesWithConfig ¶ added in v1.0.4
func CompactSentencesWithConfig(model *SimpleInt8Model512, sentences []string, cfg CompactConfig) ([]string, error)
CompactSentencesWithConfig is the full-featured compaction entry point.
func CompactWithConfig ¶ added in v1.0.4
func CompactWithConfig(model *SimpleInt8Model512, text string, cfg CompactConfig) (string, error)
CompactWithConfig provides full configuration control.
func CosineSimilarity ¶
CosineSimilarity calculates cosine similarity between two float32 vectors
func CosineSimilarityFloat64 ¶
CosineSimilarityFloat64 calculates cosine similarity between two float64 vectors
func CosineSimilarityInt8 ¶
CosineSimilarityInt8 computes similarity between INT8 vectors using SIMD
func CosineSimilarityInt8Fallback ¶
CosineSimilarityInt8Fallback is a pure Go fallback for systems without AVX-512
func DebugLoggingEnabled ¶
func DebugLoggingEnabled() bool
DebugLoggingEnabled returns true when debug logging is currently enabled.
func Debugf ¶
func Debugf(format string, args ...interface{})
Debugf emits a formatted debug message when debug logging is enabled.
func Debugln ¶
func Debugln(args ...interface{})
Debugln emits a debug message with default formatting when enabled.
func DisableDebugLogging ¶
func DisableDebugLogging()
DisableDebugLogging globally disables debug logging output.
func EnableDebugLogging ¶
func EnableDebugLogging()
EnableDebugLogging globally enables debug logging output.
func EstimateTokens ¶ added in v1.0.4
EstimateTokens returns a fast heuristic token count (~3.5 chars/token for English).
func EstimateTokensSlice ¶ added in v1.0.4
EstimateTokensSlice returns per-sentence token estimates.
func FastQuantize ¶
FastQuantize performs optimized quantization with SIMD hints
func FusedCAGRAAvailable ¶
func FusedCAGRAAvailable() bool
FusedCAGRAAvailable reports whether the CUDA implementation is linked in.
func GetCUDADeviceCount ¶
func GetCUDADeviceCount() int
GetCUDADeviceCount returns 0 for non-GPU builds
func GetCUDAVersion ¶
func GetCUDAVersion() string
GetCUDAVersion returns empty string for non-GPU builds
func GetEmbedBuffer ¶
func GetEmbedBuffer() []float32
GetEmbedBuffer returns a reusable embedding buffer
func GetOptimalBatchSize ¶
func GetOptimalBatchSize() int
GetOptimalBatchSize returns the optimal batch size based on available memory
func GetOptimalGPUBatchSize ¶
func GetOptimalGPUBatchSize() int
GetOptimalGPUBatchSize returns optimal batch size for GPU processing
func GetSearchConfig ¶
func GetSearchConfig(preset SearchPreset, estimatedSize int) search.Config
GetSearchConfig returns the appropriate search configuration for a preset
func LoadModelUnified ¶
func LoadModelUnified(config *UnifiedModelConfig) (interface{}, error)
LoadModelUnified loads the best available model with consistent path handling
func PutEmbedBuffer ¶
func PutEmbedBuffer(buf []float32)
PutEmbedBuffer returns an embedding buffer to the pool
func PutInt8Buffer ¶
func PutInt8Buffer(buf []int8)
PutInt8Buffer returns an int8 buffer to the pool
func PutTokenBuffer ¶
func PutTokenBuffer(buf []int)
PutTokenBuffer returns a token buffer to the pool
func SentenceSplit ¶ added in v1.0.4
SentenceSplit splits text into sentences on ., !, ?, newlines, and semicolons. Keeps the delimiter attached to the preceding sentence.
func SetDebugOutput ¶
SetDebugOutput overrides the writer used for debug output.
func SetSimpleInt8Verbose ¶
func SetSimpleInt8Verbose(verbose bool)
SetSimpleInt8Verbose controls whether model loading logs are printed
func Vec512FromInt8 ¶ added in v1.0.4
func Vec512FromInt8(emb *Int8Result512) *simd.Vec512
Vec512FromInt8 converts an Int8Result512 vector into a simd.Vec512 without allocation by reinterpreting the underlying memory.
func ZeroCopyInt32ToFloat32 ¶
ZeroCopyInt32ToFloat32 performs zero-copy type conversions when possible
Types ¶
type BatchConfig ¶
BatchConfig holds optimized batch processing configuration
type BatchEmbeddingResult ¶
type BatchEmbeddingResult struct {
Embeddings [][]float32
Duration time.Duration
BatchSize int
ItemsPerSec float64
}
BatchEmbeddingResult contains results from batch processing
type BatchProcessor ¶
type BatchProcessor struct {
// contains filtered or unexported fields
}
BatchProcessor provides optimized batch processing
func NewBatchProcessor ¶
func NewBatchProcessor(batchSize, workers int) *BatchProcessor
NewBatchProcessor creates an optimized batch processor
func (*BatchProcessor) ProcessBatch ¶
func (bp *BatchProcessor) ProcessBatch(texts []string, model *EmbeddingModel) ([]simd.Vec512, []float32, error)
ProcessBatch processes documents in optimized batches
type BatchResult ¶
BatchResult represents the result of batch processing
type BufferPool ¶
type BufferPool struct {
// contains filtered or unexported fields
}
BufferPool manages reusable buffers to reduce allocations
type CAGRAConfig ¶
type CAGRAConfig struct {
MaxVectors int
VectorDim int
GraphDegree int
MaxIterations int
CachePath string
TargetLatencyUs int
TargetRecall float32
}
CAGRAConfig captures the subset of configuration fields that tests rely on. The full implementation lives behind the cagra build tag.
func DefaultCAGRAConfig ¶
func DefaultCAGRAConfig() CAGRAConfig
DefaultCAGRAConfig mirrors the exported API so callers do not need build tags.
func FastCAGRAConfig ¶
func FastCAGRAConfig() CAGRAConfig
FastCAGRAConfig returns a speed-optimized configuration without requiring CUDA.
func QualityCAGRAConfig ¶
func QualityCAGRAConfig() CAGRAConfig
QualityCAGRAConfig returns a quality-focused configuration for callers that expect it.
type CPUBulkIndexer ¶
type CPUBulkIndexer struct {
// contains filtered or unexported fields
}
CPUBulkIndexer provides fast bulk indexing using CPU with parallelization
func NewCPUBulkIndexer ¶
func NewCPUBulkIndexer(index *VectorIndex, batchSize int) *CPUBulkIndexer
NewCPUBulkIndexer creates a new CPU bulk indexer
func (*CPUBulkIndexer) IndexBatch ¶
func (idx *CPUBulkIndexer) IndexBatch(docs []Document) error
IndexBatch processes a batch of documents using CPU with parallelization
func (*CPUBulkIndexer) Stats ¶
func (idx *CPUBulkIndexer) Stats() CPUBulkIndexerStats
Stats returns indexing statistics
type CPUBulkIndexerStats ¶
type CPUBulkIndexerStats struct {
TotalIndexed int64
TotalTime time.Duration
BatchSize int
NumWorkers int
Throughput float64
}
CPUBulkIndexerStats contains CPU bulk indexing statistics
func (CPUBulkIndexerStats) LogStats ¶
func (stats CPUBulkIndexerStats) LogStats()
LogStats logs the indexing statistics
type CachedEmbedding ¶
type CachedEmbedding struct {
Vector []float32
VectorI8 []int8 // Quantized version
Scale float32 // Quantization scale
UseCount uint32 // Track usage for cache eviction
LastUsed int64 // Unix timestamp
}
CachedEmbedding stores a precomputed embedding
type CompactConfig ¶ added in v1.0.4
type EmbedInt8Result ¶
EmbedInt8Result represents an int8 quantized embedding
type EmbeddingModel ¶
EmbeddingModel provides a clean API for text embeddings using the real static-retrieval-mrl-en-v1 model
func LoadModel ¶
func LoadModel() (*EmbeddingModel, error)
LoadModel loads the real static-retrieval-mrl-en-v1 embedding model
func (*EmbeddingModel) EmbedInt8 ¶
func (m *EmbeddingModel) EmbedInt8(text string) (*EmbedInt8Result, error)
EmbedInt8 generates int8 quantized embeddings
func (*EmbeddingModel) Encode ¶
func (m *EmbeddingModel) Encode(text string) ([]float32, error)
Encode converts text to embedding vector using real model weights
func (*EmbeddingModel) FindMostSimilar ¶
func (m *EmbeddingModel) FindMostSimilar(query string, candidates []string, limit int) ([]SimilarityResult, error)
FindMostSimilar finds the most similar texts to a query from a list of candidates
func (*EmbeddingModel) GetAvailableTexts ¶
func (m *EmbeddingModel) GetAvailableTexts() []string
GetAvailableTexts returns all texts that can be encoded (from reference tokens)
func (*EmbeddingModel) OptimizedEmbedding ¶
func (m *EmbeddingModel) OptimizedEmbedding(text string, cache *TokenPatternCache) ([]float32, error)
OptimizedEmbedding wraps the embedding computation with caching
func (*EmbeddingModel) Similarity ¶
func (m *EmbeddingModel) Similarity(text1, text2 string) (float32, error)
Similarity calculates cosine similarity between two texts
type EmbeddingModelInt8 ¶
type EmbeddingModelInt8 struct {
VocabSize int
EmbedDim int
// contains filtered or unexported fields
}
EmbeddingModelInt8 provides INT8 quantized embeddings with SIMD acceleration
func LoadModelInt8 ¶
func LoadModelInt8(useInt8 bool) (*EmbeddingModelInt8, error)
LoadModelInt8 loads the model with INT8 quantization support
func (*EmbeddingModelInt8) ComputeEmbeddingFromTokens ¶
func (m *EmbeddingModelInt8) ComputeEmbeddingFromTokens(tokenIDs []int) ([]uint8, error)
ComputeEmbeddingFromTokens computes INT8 embedding from token IDs
type FusedCAGRAConfig ¶
FusedCAGRAConfig configures the fused CAGRA engine.
func DefaultFusedCAGRAConfig ¶
func DefaultFusedCAGRAConfig() FusedCAGRAConfig
DefaultFusedCAGRAConfig returns the default configuration used by production builds.
type FusedCAGRAEngine ¶
type FusedCAGRAEngine struct {
// contains filtered or unexported fields
}
FusedCAGRAEngine provides a CPU fallback that mimics the fused GPU pipeline when CUDA libraries are unavailable.
func NewFusedCAGRAEngine ¶
func NewFusedCAGRAEngine(config FusedCAGRAConfig) (*FusedCAGRAEngine, error)
NewFusedCAGRAEngine creates a CPU-backed fused engine when GPU kernels are not available.
func (*FusedCAGRAEngine) BuildIndex ¶
func (engine *FusedCAGRAEngine) BuildIndex( embedWeights []int8, embedScales []float32, database []simd.Vec512, dbScales []float32, ) error
BuildIndex loads quantized embeddings and prepares the CPU search buffers.
func (*FusedCAGRAEngine) Close ¶
func (engine *FusedCAGRAEngine) Close()
Close releases any CPU resources (no-op for fallback mode).
func (*FusedCAGRAEngine) GetStats ¶
func (engine *FusedCAGRAEngine) GetStats() FusedCAGRAStats
GetStats returns statistics collected during CPU fallback execution.
func (*FusedCAGRAEngine) Search ¶
func (engine *FusedCAGRAEngine) Search(tokens []uint16) ([]SearchResult, error)
Search runs a single-query search through the CPU fallback.
func (*FusedCAGRAEngine) SearchBatch ¶
func (engine *FusedCAGRAEngine) SearchBatch(tokenBatch [][]uint16, maxTokens int) ([][]SearchResult, error)
SearchBatch executes a CPU implementation of the fused pipeline.
type FusedCAGRAStats ¶
type FusedCAGRAStats struct {
VocabSize int
EmbedDim int
NumVectors int
TopK int
AvgSearchTimeMs float64
SearchCount int64
IsBuilt bool
}
FusedCAGRAStats mirrors the GPU implementation statistics payload.
type GPUBatchProcessor ¶
type GPUBatchProcessor struct {
// contains filtered or unexported fields
}
GPUBatchProcessor stub for non-GPU builds
func NewGPUBatchProcessor ¶
func NewGPUBatchProcessor(model *EmbeddingModel, cache *TokenPatternCache) *GPUBatchProcessor
NewGPUBatchProcessor returns a stub for non-GPU builds
func (*GPUBatchProcessor) GetMetrics ¶
func (p *GPUBatchProcessor) GetMetrics() map[string]interface{}
GetMetrics returns empty metrics for non-GPU builds
func (*GPUBatchProcessor) GetStats ¶
func (p *GPUBatchProcessor) GetStats() map[string]interface{}
GetStats returns empty stats for non-GPU builds
func (*GPUBatchProcessor) ProcessBatch ¶
func (p *GPUBatchProcessor) ProcessBatch(texts []string) ([]*EmbedInt8Result, error)
ProcessBatch returns error for non-GPU builds
func (*GPUBatchProcessor) Shutdown ¶
func (p *GPUBatchProcessor) Shutdown()
Shutdown is a no-op for non-GPU builds
type GPUBlockPool ¶
type GPUBlockPool struct {
// contains filtered or unexported fields
}
type GPUCagraConfig ¶
GPUCagraConfig is defined in gpu builds; provide a stub for non-GPU builds
type GPUEmbeddingModel ¶
type GPUEmbeddingModel struct {
*EmbeddingModel
// contains filtered or unexported fields
}
GPUEmbeddingModel provides GPU-accelerated batch embedding processing
func NewGPUEmbeddingModel ¶
func NewGPUEmbeddingModel(batchSize int, useGPU bool) (*GPUEmbeddingModel, error)
NewGPUEmbeddingModel creates a GPU-optimized embedding model
func (*GPUEmbeddingModel) EncodeBatch ¶
func (g *GPUEmbeddingModel) EncodeBatch(texts []string) (*BatchEmbeddingResult, error)
EncodeBatch processes multiple texts in optimized batches
func (*GPUEmbeddingModel) MemoryOptimizedEncodeBatch ¶
func (g *GPUEmbeddingModel) MemoryOptimizedEncodeBatch(texts []string, maxMemoryMB int) (*BatchEmbeddingResult, error)
MemoryOptimizedEncodeBatch processes with memory efficiency
func (*GPUEmbeddingModel) OptimalBatchSize ¶
func (g *GPUEmbeddingModel) OptimalBatchSize() int
OptimalBatchSize determines the best batch size for the current hardware
type GPUIndexer ¶
type GPUIndexer struct {
// contains filtered or unexported fields
}
GPUIndexer stub for non-GPU builds
func NewGPUIndexer ¶
func NewGPUIndexer(config IndexConfig) (*GPUIndexer, error)
NewGPUIndexer returns error for non-GPU builds
func (*GPUIndexer) AddVectors ¶
func (g *GPUIndexer) AddVectors(vectors [][]int8) error
AddVectors returns error for non-GPU builds
func (*GPUIndexer) BatchSearch ¶
func (g *GPUIndexer) BatchSearch(queries [][]int8, k int) ([][]SearchResult, error)
BatchSearch returns error for non-GPU builds
func (*GPUIndexer) GetMemoryUsage ¶
func (g *GPUIndexer) GetMemoryUsage() uint64
GetMemoryUsage returns 0 for non-GPU builds
func (*GPUIndexer) GetStats ¶
func (g *GPUIndexer) GetStats() IndexStats
GetStats returns empty stats for non-GPU builds
func (*GPUIndexer) IndexVectors ¶
func (g *GPUIndexer) IndexVectors(vectors []simd.Vec512, scales []float32) error
IndexVectors returns error for non-GPU builds
func (*GPUIndexer) Initialize ¶
func (g *GPUIndexer) Initialize() error
Initialize returns error for non-GPU builds
func (*GPUIndexer) IsReady ¶
func (g *GPUIndexer) IsReady() bool
IsReady returns false for non-GPU builds
func (*GPUIndexer) TrainIndex ¶
func (g *GPUIndexer) TrainIndex(vectors [][]int8) error
TrainIndex returns error for non-GPU builds
type GPUMemoryConfig ¶
type GPUMemoryConfig struct {
DeviceID int
MaxMemoryUsagePercent float64
VectorPoolBlockSize uint64
QueryPoolBlockSize uint64
ResultPoolBlockSize uint64
MaxVectorBlocks int
MaxQueryBlocks int
MaxResultBlocks int
ReserveMemoryMB uint64
}
func DefaultGPUMemoryConfig ¶
func DefaultGPUMemoryConfig() GPUMemoryConfig
type GPUMemoryManager ¶
type GPUMemoryManager struct {
// contains filtered or unexported fields
}
func NewGPUMemoryManager ¶
func NewGPUMemoryManager(config GPUMemoryConfig) (*GPUMemoryManager, error)
func (*GPUMemoryManager) AllocateQueryMemory ¶
func (m *GPUMemoryManager) AllocateQueryMemory() (unsafe.Pointer, error)
func (*GPUMemoryManager) AllocateResultMemory ¶
func (m *GPUMemoryManager) AllocateResultMemory() (unsafe.Pointer, error)
func (*GPUMemoryManager) AllocateVectorMemory ¶
func (m *GPUMemoryManager) AllocateVectorMemory() (unsafe.Pointer, error)
func (*GPUMemoryManager) Close ¶
func (m *GPUMemoryManager) Close() error
func (*GPUMemoryManager) ForceGarbageCollection ¶
func (m *GPUMemoryManager) ForceGarbageCollection()
func (*GPUMemoryManager) FreeQueryMemory ¶
func (m *GPUMemoryManager) FreeQueryMemory(ptr unsafe.Pointer)
func (*GPUMemoryManager) FreeResultMemory ¶
func (m *GPUMemoryManager) FreeResultMemory(ptr unsafe.Pointer)
func (*GPUMemoryManager) FreeVectorMemory ¶
func (m *GPUMemoryManager) FreeVectorMemory(ptr unsafe.Pointer)
func (*GPUMemoryManager) GetMemoryStats ¶
func (m *GPUMemoryManager) GetMemoryStats() GPUMemoryStats
func (*GPUMemoryManager) StartMemoryMonitor ¶
func (m *GPUMemoryManager) StartMemoryMonitor(interval time.Duration)
type GPUMemoryStats ¶
type GPUSearchServer ¶
type GPUSearchServer struct {
// contains filtered or unexported fields
}
GPUSearchServer provides a high-performance CUDA-accelerated HTTP search server
func NewGPUSearchServer ¶
func NewGPUSearchServer(model *EmbeddingModel, config GPUServerConfig) (*GPUSearchServer, error)
NewGPUSearchServer creates a new CUDA-accelerated search server
func (*GPUSearchServer) Start ¶
func (s *GPUSearchServer) Start() error
Start starts the GPU-accelerated search server
func (*GPUSearchServer) Stop ¶
func (s *GPUSearchServer) Stop() error
Stop gracefully stops the GPU search server
type GPUServerConfig ¶
type GPUServerConfig struct {
Port int
MaxVectors int
MaxConcurrency int
EnableProfiling bool
EnableMetrics bool
ReadOnly bool
PreloadEmbeddings bool
WorkerThreads int
// GPU-specific configuration
GPUDeviceID int
GPUBatchSize int
EnableGPUFallback bool
GPUMemoryLimitMB int
IndexingBatchSize int
}
GPUServerConfig configures the GPU-accelerated search server
func DefaultGPUServerConfig ¶
func DefaultGPUServerConfig() GPUServerConfig
DefaultGPUServerConfig returns optimized GPU server configuration
type GPUStats ¶
type GPUStats struct {
DeviceID int
MemoryUsed int64
MemoryTotal int64
Utilization float32
Temperature float32
}
GPUStats represents GPU statistics
type IndexComparison ¶
type IndexComparison struct {
NumDocuments int
SequentialTime time.Duration
AsyncTime time.Duration
ParallelTime time.Duration
GPUTime time.Duration
SequentialError error
AsyncError error
ParallelError error
GPUError error
AsyncSpeedup float64
ParallelSpeedup float64
GPUSpeedup float64
}
IndexComparison contains comparison results
type IndexConfig ¶
type IndexConfig struct {
VectorDim int
NumSubquantizers int
CodebookSize int
IVFClusters int
ProbeLists int
RerankK int
DeviceID int
}
IndexConfig stub
func DefaultGPUConfig ¶
func DefaultGPUConfig() IndexConfig
DefaultGPUConfig returns a default configuration for GPU indexing (stub)
type IndexData ¶
type IndexData struct {
// Core index data
Vectors [][]float32 `json:"-"` // Skip in JSON, too large
VectorsBinary []byte `json:"vectors_binary,omitempty"`
IDs []int `json:"ids"`
// Index state
IndexType string `json:"index_type"`
Trained bool `json:"trained"`
// Stats
MemoryUsageMB float64 `json:"memory_usage_mb"`
}
IndexData contains the serializable index structures
type IndexProgress ¶
type IndexProgress struct {
Current int
Total int
Percentage float64
DocsPerSec float64
TimeLeft time.Duration
}
IndexProgress represents indexing progress
type IndexRequest ¶
type IndexRequest struct {
IDs []int
Texts []string
Response chan IndexResponse
Context context.Context
}
IndexRequest represents an async indexing request
type IndexResponse ¶
type IndexResponse struct {
IDs []int
Error error
Stats IndexingStats
}
IndexResponse contains the result of async indexing
type IndexSnapshot ¶
type IndexSnapshot struct {
Version string `json:"version"`
CreatedAt time.Time `json:"created_at"`
NumDocuments int `json:"num_documents"`
Config SearchConfig `json:"config"`
Documents map[int]string `json:"documents"`
IndexData *IndexData `json:"index_data,omitempty"`
Metadata map[string]interface{} `json:"metadata,omitempty"`
}
IndexSnapshot represents a serializable snapshot of a SearchEngine
type IndexStats ¶
type IndexStats struct {
NumVectors int
VectorDim int
IVFClusters int
PQSubquantizers int
GPUMemoryMB float32
IsTrained bool
IndexBuilt bool
}
IndexStats provides indexer statistics
type IndexingStats ¶
type IndexingStats struct {
DocumentsProcessed int
ProcessingTime time.Duration
EmbeddingTime time.Duration
IndexingTime time.Duration
}
IndexingStats provides indexing performance metrics
type Int8EmbeddingModel512 ¶
type Int8EmbeddingModel512 struct {
// contains filtered or unexported fields
}
Int8EmbeddingModel512 represents the int8 quantized model with 512 dimensions
func LoadFastModel ¶
func LoadFastModel() (*Int8EmbeddingModel512, error)
LoadFastModel is a convenience function that always loads the fastest model
func LoadInt8Model512 ¶
func LoadInt8Model512() (*Int8EmbeddingModel512, error)
LoadInt8Model512 loads the int8 quantized model with 512 dimensions
func (*Int8EmbeddingModel512) Embed ¶
func (m *Int8EmbeddingModel512) Embed(text string) ([]float32, error)
Embed embeds text using int8 model
func (*Int8EmbeddingModel512) EmbedInt8 ¶
func (m *Int8EmbeddingModel512) EmbedInt8(text string) (*Int8Result512, error)
EmbedInt8 returns int8 quantized embedding
func (*Int8EmbeddingModel512) EmbedTokens ¶
func (m *Int8EmbeddingModel512) EmbedTokens(tokens []int16) ([]float32, error)
EmbedTokens embeds int16 token IDs directly (no tokenization needed)
func (*Int8EmbeddingModel512) GetInt8Weights ¶
func (m *Int8EmbeddingModel512) GetInt8Weights() ([]int8, []float32)
GetInt8Weights returns flattened embedding weights and scales for GPU use
func (*Int8EmbeddingModel512) GetMemoryUsage ¶
func (m *Int8EmbeddingModel512) GetMemoryUsage() string
GetMemoryUsage returns the memory usage of the model
func (*Int8EmbeddingModel512) Similarity ¶
func (m *Int8EmbeddingModel512) Similarity(text1, text2 string) (float32, error)
Similarity computes cosine similarity between two texts using int8 embeddings
type Int8Result512 ¶
type Int8Result512 struct {
Vector []int8 // 512-dimensional int8 vector
Scale float32 // Scale factor for dequantization
}
Int8Result512 represents the embedding result with an int8 vector and its scale factor
type MemoryOptimizedCache ¶
type MemoryOptimizedCache struct {
// contains filtered or unexported fields
}
MemoryOptimizedCache provides a cache with memory management
func NewMemoryOptimizedCache ¶
func NewMemoryOptimizedCache(maxSize int) *MemoryOptimizedCache
NewMemoryOptimizedCache creates a memory-optimized embedding cache
func (*MemoryOptimizedCache) Get ¶
func (c *MemoryOptimizedCache) Get(text string) (*EmbedInt8Result, bool)
Get retrieves an embedding from cache
func (*MemoryOptimizedCache) Put ¶
func (c *MemoryOptimizedCache) Put(text string, embedding *EmbedInt8Result)
Put stores an embedding in cache
func (*MemoryOptimizedCache) Size ¶
func (c *MemoryOptimizedCache) Size() int
Size returns current cache size
type ModelCompatibilityWrapper ¶
type ModelCompatibilityWrapper struct {
// contains filtered or unexported fields
}
ModelCompatibilityWrapper provides a common interface for both model types
func LoadCompatibleModel ¶
func LoadCompatibleModel() (*ModelCompatibilityWrapper, error)
LoadCompatibleModel loads a model that works with all existing code
func (*ModelCompatibilityWrapper) Encode ¶
func (w *ModelCompatibilityWrapper) Encode(text string) ([]float32, error)
Encode provides a unified interface for both model types
func (*ModelCompatibilityWrapper) EncodeInt8 ¶
func (w *ModelCompatibilityWrapper) EncodeInt8(text string) ([]int8, error)
EncodeInt8 provides fast int8 encoding when available
type ObjectPool ¶
type ObjectPool struct {
// contains filtered or unexported fields
}
ObjectPool provides reusable object pools to reduce allocations
func (*ObjectPool) GetEmbedding ¶
func (p *ObjectPool) GetEmbedding() *EmbedInt8Result
GetEmbedding gets a reusable embedding result from pool
func (*ObjectPool) GetSlice ¶
func (p *ObjectPool) GetSlice() *[]float32
GetSlice gets a reusable float32 slice from pool
func (*ObjectPool) GetVector ¶
func (p *ObjectPool) GetVector() *simd.Vec512
GetVector gets a reusable vector from pool
func (*ObjectPool) PutEmbedding ¶
func (p *ObjectPool) PutEmbedding(emb *EmbedInt8Result)
PutEmbedding returns embedding to pool
func (*ObjectPool) PutSlice ¶
func (p *ObjectPool) PutSlice(slice *[]float32)
PutSlice returns slice to pool
func (*ObjectPool) PutVector ¶
func (p *ObjectPool) PutVector(vec *simd.Vec512)
PutVector returns vector to pool
type OptimizedEmbeddingModel ¶
type OptimizedEmbeddingModel struct {
*EmbeddingModel
// contains filtered or unexported fields
}
OptimizedEmbeddingModel extends the base model with caching and batch processing
func LoadOptimizedModel ¶
func LoadOptimizedModel() (*OptimizedEmbeddingModel, error)
LoadOptimizedModel loads the model with all optimizations enabled
func (*OptimizedEmbeddingModel) BatchEmbed ¶
func (m *OptimizedEmbeddingModel) BatchEmbed(texts []string) ([]*EmbedInt8Result, error)
BatchEmbed processes multiple texts efficiently
func (*OptimizedEmbeddingModel) EmbedInt8Optimized ¶
func (m *OptimizedEmbeddingModel) EmbedInt8Optimized(text string) (*EmbedInt8Result, error)
EmbedInt8Optimized generates quantized embeddings with optimizations
func (*OptimizedEmbeddingModel) EmbedOptimized ¶
func (m *OptimizedEmbeddingModel) EmbedOptimized(text string) ([]float32, error)
EmbedOptimized generates embeddings with all optimizations
func (*OptimizedEmbeddingModel) FastSearch ¶
func (m *OptimizedEmbeddingModel) FastSearch(query string, limit int) ([]float32, error)
FastSearch performs optimized search for AI images
func (*OptimizedEmbeddingModel) GetStats ¶
func (m *OptimizedEmbeddingModel) GetStats() map[string]interface{}
GetStats returns performance statistics
func (*OptimizedEmbeddingModel) OptimizeForProduction ¶
func (m *OptimizedEmbeddingModel) OptimizeForProduction(maxCacheSize int, gpuEnabled bool)
OptimizeForProduction applies production-ready optimizations
func (*OptimizedEmbeddingModel) PrecomputePatterns ¶
func (m *OptimizedEmbeddingModel) PrecomputePatterns(patterns []string)
PrecomputePatterns adds new patterns to the cache
func (*OptimizedEmbeddingModel) WarmupCache ¶
func (m *OptimizedEmbeddingModel) WarmupCache()
WarmupCache preloads common search patterns
type ParallelIndexConfig ¶
type ParallelIndexConfig struct {
NumWorkers int // Number of parallel workers (default: NumCPU)
BatchSize int // Batch size for processing (default: 100)
EnableCache bool // Enable embedding cache (default: true)
QueueSize int // Task queue size (default: 1000)
}
ParallelIndexConfig configures parallel indexing
func DefaultParallelIndexConfig ¶
func DefaultParallelIndexConfig() ParallelIndexConfig
DefaultParallelIndexConfig returns optimized configuration
type ParallelIndexStats ¶
type ParallelIndexStats struct {
TotalIndexed uint64
TotalTime time.Duration
Errors uint32
NumWorkers int
BatchSize int
DocsPerSec float64
}
ParallelIndexStats contains parallel indexing statistics
type ParallelIndexer ¶
type ParallelIndexer struct {
// contains filtered or unexported fields
}
ParallelIndexer provides high-performance parallel indexing
func NewParallelIndexer ¶
func NewParallelIndexer(engine *SearchEngine, config ParallelIndexConfig) *ParallelIndexer
NewParallelIndexer creates a new parallel indexer
func (*ParallelIndexer) IndexDocumentsParallel ¶
func (p *ParallelIndexer) IndexDocumentsParallel(texts []string) ([]int, error)
IndexDocumentsParallel indexes documents using parallel processing
func (*ParallelIndexer) IndexWithProgress ¶
func (p *ParallelIndexer) IndexWithProgress(texts []string) (<-chan IndexProgress, error)
IndexWithProgress indexes documents with progress reporting
func (*ParallelIndexer) OptimizeWorkers ¶
func (p *ParallelIndexer) OptimizeWorkers(testDocs []string) (int, error)
OptimizeWorkers finds the optimal number of workers
func (*ParallelIndexer) Stats ¶
func (p *ParallelIndexer) Stats() ParallelIndexStats
Stats returns parallel indexer statistics
type ParallelProcessor ¶
type ParallelProcessor struct {
// contains filtered or unexported fields
}
ParallelProcessor handles parallel batch processing with controlled concurrency
func NewParallelProcessor ¶
func NewParallelProcessor() *ParallelProcessor
NewParallelProcessor creates a processor with optimal worker count
func (*ParallelProcessor) Close ¶
func (p *ParallelProcessor) Close()
Close shuts down the processor
func (*ParallelProcessor) ProcessBatch ¶
func (p *ParallelProcessor) ProcessBatch(items []func())
ProcessBatch processes items in parallel with controlled concurrency
type ParallelSearchEngine ¶
type ParallelSearchEngine struct {
*SearchEngine
// contains filtered or unexported fields
}
ParallelSearchEngine extends SearchEngine with parallel capabilities
func NewParallelSearchEngine ¶
func NewParallelSearchEngine(model *EmbeddingModel, config SearchConfig) *ParallelSearchEngine
NewParallelSearchEngine creates a search engine with parallel indexing
func (*ParallelSearchEngine) IndexBatchParallel ¶
func (e *ParallelSearchEngine) IndexBatchParallel(texts []string) ([]int, error)
IndexBatchParallel indexes documents using CPU parallelization
func (*ParallelSearchEngine) IndexBatchWithComparison ¶
func (e *ParallelSearchEngine) IndexBatchWithComparison(texts []string) (*IndexComparison, error)
IndexBatchWithComparison compares different indexing methods
type PersistenceFormat ¶
type PersistenceFormat string
PersistenceFormat represents the format for saving/loading indexes
const ( // FormatBinary uses Go's gob encoding (fastest, Go-specific) FormatBinary PersistenceFormat = "binary" // FormatJSON uses JSON encoding (portable but slower) FormatJSON PersistenceFormat = "json" )
type PersistenceStats ¶
type PersistenceStats struct {
LastSaved time.Time
SaveCount int
LoadCount int
LastLoadTime time.Duration
LastSaveTime time.Duration
}
PersistenceStats contains statistics about saved indexes
func GetPersistenceStats ¶
func GetPersistenceStats() PersistenceStats
GetPersistenceStats returns persistence statistics
type PrecomputedEmbeddings ¶
type PrecomputedEmbeddings struct {
Single map[string][]float32 `json:"single"`
Bigram map[string][]float32 `json:"bigram"`
Trigram map[string][]float32 `json:"trigram"`
Fourgram map[string][]float32 `json:"fourgram"`
}
PrecomputedEmbeddings stores actual embedding vectors
type PresetConfig ¶
type PresetConfig struct {
Preset SearchPreset
DatasetSize int // Estimated number of vectors
}
PresetConfig contains simplified configuration options
type SaveOptions ¶
type SaveOptions struct {
Format PersistenceFormat
Compress bool
IncludeTexts bool
Metadata map[string]interface{}
}
SaveOptions configures how the index is saved
func DefaultSaveOptions ¶
func DefaultSaveOptions() SaveOptions
DefaultSaveOptions returns recommended save options
type SearchConfig ¶
type SearchConfig struct {
// Automatic mode - let the engine choose optimal settings
AutoMode bool
// Preset configuration (when AutoMode is true)
Preset SearchPreset // Use predefined configuration preset
// Manual configuration (when AutoMode is false)
MaxExactSearchSize int // Use exact search below this size (default: 50000)
NumClusters int // Number of IVF clusters (default: auto)
SearchClusters int // Number of clusters to search (default: auto)
UseCompression bool // Use PQ compression for large datasets (default: auto)
UseGraphRouting bool // Use HNSW for centroid routing (default: auto)
CandidatesToRerank int // Number of candidates to rerank (default: auto)
// Async configuration
EnableAsync bool // Enable async indexing (default: false)
AsyncWorkers int // Number of async workers (default: 4)
AsyncQueueSize int // Size of async queue (default: 1000)
MaxConcurrency int // Maximum concurrent operations (default: runtime.NumCPU())
// GPU acceleration configuration
EnableGPU bool // Enable GPU acceleration for similarity search (default: false)
GPUDeviceID int // CUDA device ID to use (default: 0)
GPUBatchSize int // Batch size for GPU operations (default: 1000)
UseInt8 bool // Use int8 quantization for embeddings (75% memory savings)
}
SearchConfig configures the search engine
func AsyncSearchConfig ¶
func AsyncSearchConfig() SearchConfig
AsyncSearchConfig returns configuration optimized for async processing
func AutoOptimizedSearchConfig ¶
func AutoOptimizedSearchConfig() SearchConfig
AutoOptimizedSearchConfig returns the best configuration based on available hardware. It automatically detects and uses GPU acceleration when available
func DefaultSearchConfig ¶
func DefaultSearchConfig() SearchConfig
DefaultSearchConfig returns optimized default configuration. Automatically detects and enables GPU with CAGRA when available
func GPUSearchConfig ¶
func GPUSearchConfig() SearchConfig
GPUSearchConfig returns configuration optimized for GPU acceleration with CAGRA
type SearchEngine ¶
type SearchEngine struct {
// contains filtered or unexported fields
}
SearchEngine provides a high-level API for vector search
func FastSearchEngine ¶
func FastSearchEngine(model *EmbeddingModel) *SearchEngine
FastSearchEngine is an alias for NewAutoSearchEngine for backward compatibility
func NewAsyncSearchEngine ¶
func NewAsyncSearchEngine(model *EmbeddingModel) *SearchEngine
NewAsyncSearchEngine creates a search engine optimized for async operations
func NewAutoSearchEngine ¶
func NewAutoSearchEngine(model *EmbeddingModel) *SearchEngine
NewAutoSearchEngine creates a search engine with automatic hardware optimization. This is the recommended way to create a search engine - it will automatically use GPU acceleration if available, or optimized CPU settings otherwise
func NewCAGRASearchEngine ¶
func NewCAGRASearchEngine(model *EmbeddingModel) *SearchEngine
NewCAGRASearchEngine creates a CAGRA-powered search engine for ultra-fast search
func NewGPUSearchEngine ¶
func NewGPUSearchEngine(model *EmbeddingModel) *SearchEngine
NewGPUSearchEngine creates a GPU-accelerated search engine with CAGRA
func NewSearchEngine ¶
func NewSearchEngine(model *EmbeddingModel) *SearchEngine
NewSearchEngine creates a new search engine. It automatically uses GPU acceleration if available for a 39x performance boost
func NewSearchEngineWithConfig ¶
func NewSearchEngineWithConfig(model *EmbeddingModel, config SearchConfig) *SearchEngine
NewSearchEngineWithConfig creates a search engine with custom configuration
func NewSearchEngineWithPreset ¶
func NewSearchEngineWithPreset(model *EmbeddingModel, preset SearchPreset) (*SearchEngine, error)
NewSearchEngineWithPreset creates a search engine with a preset configuration
func (*SearchEngine) AutoSave ¶
func (se *SearchEngine) AutoSave(dir string, interval time.Duration)
AutoSave starts automatic periodic saving
func (*SearchEngine) Checkpoint ¶
func (se *SearchEngine) Checkpoint(dir string) error
Checkpoint creates a checkpoint of the current index state
func (*SearchEngine) Close ¶
func (se *SearchEngine) Close() error
Close shuts down the search engine and stops async workers
func (*SearchEngine) FindSimilar ¶
func (se *SearchEngine) FindSimilar(documentID int, k int) ([]SearchResult, error)
FindSimilar finds documents similar to a given document ID
func (*SearchEngine) Flush ¶
func (se *SearchEngine) Flush() error
Flush waits for all pending async indexing operations to complete
func (*SearchEngine) GetAllDocuments ¶
func (se *SearchEngine) GetAllDocuments() map[int]string
GetAllDocuments returns all indexed documents
func (*SearchEngine) GetDocument ¶
func (se *SearchEngine) GetDocument(id int) (string, bool)
GetDocument retrieves a document by ID
func (*SearchEngine) Index ¶
func (se *SearchEngine) Index(text string) (int, error)
Index adds and indexes a single text with auto-generated ID
func (*SearchEngine) IndexBatch ¶
func (se *SearchEngine) IndexBatch(texts []string) ([]int, error)
IndexBatch efficiently indexes multiple texts
func (*SearchEngine) IndexBatchAsync ¶
func (se *SearchEngine) IndexBatchAsync(texts []string) <-chan IndexResponse
IndexBatchAsync asynchronously indexes multiple texts and returns a channel for the result
func (*SearchEngine) IndexBatchAsyncWithIDs ¶
func (se *SearchEngine) IndexBatchAsyncWithIDs(ids []int, texts []string) <-chan IndexResponse
IndexBatchAsyncWithIDs asynchronously indexes texts with specific IDs
func (*SearchEngine) IndexBatchWithIDs ¶
func (se *SearchEngine) IndexBatchWithIDs(ids []int, texts []string) error
IndexBatchWithIDs efficiently indexes multiple texts with specific IDs
func (*SearchEngine) IndexWithID ¶
func (se *SearchEngine) IndexWithID(id int, text string) error
IndexWithID adds and indexes a text with a specific ID
func (*SearchEngine) Load ¶
func (se *SearchEngine) Load(path string) error
Load restores a SearchEngine from disk
func (*SearchEngine) LoadFromDirectory ¶
func (se *SearchEngine) LoadFromDirectory(dir string) error
LoadFromDirectory loads the index from a directory
func (*SearchEngine) Optimize ¶
func (se *SearchEngine) Optimize() error
Optimize rebuilds the index with optimal parameters for current data
func (*SearchEngine) QuickSave ¶
func (se *SearchEngine) QuickSave(path string) error
QuickSave saves the index with default options
func (*SearchEngine) Save ¶
func (se *SearchEngine) Save(path string, options SaveOptions) error
Save persists the SearchEngine to disk
func (*SearchEngine) SaveToDirectory ¶
func (se *SearchEngine) SaveToDirectory(dir string, options SaveOptions) error
SaveToDirectory saves the index to a directory with metadata
func (*SearchEngine) Search ¶
func (se *SearchEngine) Search(query string, k int) ([]SearchResult, error)
Search performs semantic search and returns top K results
func (*SearchEngine) SearchWithOptions ¶
func (se *SearchEngine) SearchWithOptions(query string, opts SearchOptions) ([]SearchResult, error)
SearchWithOptions performs search with advanced options
func (*SearchEngine) Size ¶
func (se *SearchEngine) Size() int
Size returns the number of indexed documents
func (*SearchEngine) Stats ¶
func (se *SearchEngine) Stats() SearchEngineStats
Stats returns search engine statistics
type SearchEngineStats ¶
type SearchEngineStats struct {
NumDocuments int
IndexType string
MemoryUsageMB float64
Initialized bool
IndexDetails map[string]interface{}
}
SearchEngineStats contains engine statistics
type SearchOptions ¶
type SearchOptions struct {
TopK int // Number of results to return
MinSimilarity float32 // Minimum similarity threshold (0-1)
MaxDistance float32 // Maximum distance threshold
IncludeVectors bool // Include embedding vectors in results
}
SearchOptions provides advanced search options
type SearchPreset ¶
type SearchPreset int
SearchPreset represents predefined search configurations
const ( // FastPreset prioritizes speed for small datasets (<50K vectors) FastPreset SearchPreset = iota // BalancedPreset balances speed and accuracy for medium datasets (50K-500K vectors) BalancedPreset // AccuratePreset prioritizes accuracy for large datasets (>500K vectors) AccuratePreset // CAGRAPreset uses NVIDIA CAGRA for ultra-fast search (sub-millisecond latency) CAGRAPreset // CustomPreset allows manual configuration CustomPreset )
type SearchRequest ¶
type SearchRequest struct {
Query string `json:"query"`
Queries []string `json:"queries,omitempty"` // Batch search
K int `json:"k"`
Timeout int `json:"timeout_ms,omitempty"`
RequestID string `json:"request_id,omitempty"`
}
SearchRequest represents a search API request
type SearchResponse ¶
type SearchResponse struct {
Results []SearchResult `json:"results,omitempty"`
Batch [][]SearchResult `json:"batch,omitempty"`
Latency int64 `json:"latency_us"`
RequestID string `json:"request_id,omitempty"`
Error string `json:"error,omitempty"`
}
SearchResponse represents a search API response
type SearchResult ¶
SearchResult represents a search result with similarity score
type SearchServer ¶
type SearchServer struct {
// contains filtered or unexported fields
}
SearchServer provides a high-performance HTTP search server with shared memory
func NewSearchServer ¶
func NewSearchServer(model *EmbeddingModel, config ServerConfig) (*SearchServer, error)
NewSearchServer creates a new high-performance search server
type ServerConfig ¶
type ServerConfig struct {
Port int
MaxVectors int
MaxConcurrency int
EnableProfiling bool
EnableMetrics bool
ReadOnly bool
PreloadEmbeddings bool
WorkerThreads int
}
ServerConfig configures the search server
func DefaultServerConfig ¶
func DefaultServerConfig() ServerConfig
DefaultServerConfig returns optimized server configuration
type ServerDocument ¶
ServerDocument represents a document to index
type ServerIndexRequest ¶
type ServerIndexRequest struct {
Documents []ServerDocument `json:"documents"`
Async bool `json:"async,omitempty"`
RequestID string `json:"request_id,omitempty"`
}
ServerIndexRequest represents an indexing API request
type ServerIndexResponse ¶
type ServerIndexResponse struct {
Indexed int `json:"indexed"`
Latency int64 `json:"latency_us"`
RequestID string `json:"request_id,omitempty"`
Error string `json:"error,omitempty"`
}
ServerIndexResponse represents an indexing API response
type SharedIndexHeader ¶
type SharedIndexHeader struct {
// Atomic fields for lock-free reads
// Index configuration
// Memory layout
// Write coordination
// Statistics
}
SharedIndexHeader is the header stored in shared memory
type SharedMemoryConfig ¶
type SharedMemoryConfig struct {
}
SharedMemoryConfig configures shared memory index
type SharedMemoryIndex ¶
type SharedMemoryIndex struct {
// contains filtered or unexported fields
}
SharedMemoryIndex provides zero-copy, cross-process vector search
func NewSharedMemoryIndex ¶
func NewSharedMemoryIndex(config SharedMemoryConfig) (*SharedMemoryIndex, error)
NewSharedMemoryIndex creates a new shared memory index
func (*SharedMemoryIndex) BatchSearch ¶
func (idx *SharedMemoryIndex) BatchSearch(queries []*simd.Vec512, k int) [][]SearchResult
BatchSearch performs multiple searches efficiently
func (*SharedMemoryIndex) Close ¶
func (idx *SharedMemoryIndex) Close() error
Close unmaps memory and closes files
func (*SharedMemoryIndex) GetVector ¶
func (idx *SharedMemoryIndex) GetVector(index int) (*simd.Vec512, error)
GetVector returns a zero-copy reference to a vector
func (*SharedMemoryIndex) ReleaseWriter ¶
func (idx *SharedMemoryIndex) ReleaseWriter()
ReleaseWriter releases writer lock
func (*SharedMemoryIndex) SearchTopK ¶
func (idx *SharedMemoryIndex) SearchTopK(query *simd.Vec512, k int) []SearchResult
SearchTopK performs zero-copy k-NN search directly on shared memory
func (*SharedMemoryIndex) Stats ¶
func (idx *SharedMemoryIndex) Stats() SharedIndexStats
Stats returns index statistics
func (*SharedMemoryIndex) Sync ¶
func (idx *SharedMemoryIndex) Sync() error
Sync forces synchronization to disk
func (*SharedMemoryIndex) TryAcquireWriter ¶
func (idx *SharedMemoryIndex) TryAcquireWriter() bool
TryAcquireWriter attempts to become the writer process
func (*SharedMemoryIndex) WaitForWrites ¶
func (idx *SharedMemoryIndex) WaitForWrites(targetSeq uint64)
WaitForWrites waits for pending writes to complete
type SimilarityResult ¶
SimilarityResult represents a similarity comparison
type SimpleInt8Model512 ¶
type SimpleInt8Model512 struct {
// contains filtered or unexported fields
}
SimpleInt8Model512 is a simple version that works without external C deps
func LoadSimpleInt8Model512 ¶
func LoadSimpleInt8Model512() (*SimpleInt8Model512, error)
LoadSimpleInt8Model512 loads the int8 model with built-in tokenizer (singleton)
func (*SimpleInt8Model512) Close ¶
func (m *SimpleInt8Model512) Close() error
Close is provided for API compatibility with heavier model implementations.
func (*SimpleInt8Model512) Embed ¶
func (m *SimpleInt8Model512) Embed(text string) ([]float32, error)
Embed embeds text using simple int8 model
func (*SimpleInt8Model512) EmbedBatchInt8 ¶
func (m *SimpleInt8Model512) EmbedBatchInt8(texts []string) ([]*Int8Result512, error)
EmbedBatchInt8 efficiently processes multiple texts in a single call
func (*SimpleInt8Model512) EmbedBatchInt8Optimized ¶
func (m *SimpleInt8Model512) EmbedBatchInt8Optimized(texts []string, progressCallback func(processed, total int)) ([]*Int8Result512, error)
EmbedBatchInt8Optimized provides the fastest batch processing with memory optimization
func (*SimpleInt8Model512) EmbedDim ¶
func (m *SimpleInt8Model512) EmbedDim() int
EmbedDim returns the embedding dimensionality for API compatibility.
func (*SimpleInt8Model512) EmbedFast ¶
func (m *SimpleInt8Model512) EmbedFast(text string) ([]float32, func())
EmbedFast is the zero-allocation embedding path using buffer pools
func (*SimpleInt8Model512) EmbedInt8 ¶
func (m *SimpleInt8Model512) EmbedInt8(text string) (*Int8Result512, error)
EmbedInt8 returns int8 quantized embedding
func (*SimpleInt8Model512) EmbedTokens ¶
func (m *SimpleInt8Model512) EmbedTokens(tokens []int16) ([]float32, error)
EmbedTokens embeds int16 token IDs directly
func (*SimpleInt8Model512) EmbedTokensInt8 ¶
func (m *SimpleInt8Model512) EmbedTokensInt8(tokens []int16) (*Int8Result512, error)
EmbedTokensInt8 converts pre-tokenized input directly into an int8 embedding result.
func (*SimpleInt8Model512) EmbedTokensInto ¶
func (m *SimpleInt8Model512) EmbedTokensInto(tokens []int16, result []float32) int
EmbedTokensInto embeds tokens into a pre-allocated buffer (zero-alloc hot path)
func (*SimpleInt8Model512) EmbedTokensIntoSIMD ¶
func (m *SimpleInt8Model512) EmbedTokensIntoSIMD(tokens []int16, result []float32) int
EmbedTokensIntoSIMD uses AVX2 SIMD for faster embedding accumulation
func (*SimpleInt8Model512) EmbeddingTable ¶
func (m *SimpleInt8Model512) EmbeddingTable() [][]int8
EmbeddingTable returns the vocab embedding matrix (read-only).
func (*SimpleInt8Model512) ScaleTable ¶
func (m *SimpleInt8Model512) ScaleTable() []float32
ScaleTable returns per-token quantization scales (read-only).
func (*SimpleInt8Model512) Similarity ¶
func (m *SimpleInt8Model512) Similarity(text1, text2 string) (float32, error)
Similarity computes cosine similarity between two texts
func (*SimpleInt8Model512) SimpleTokenize ¶
func (m *SimpleInt8Model512) SimpleTokenize(text string) []int16
SimpleTokenize performs basic tokenization (space-separated + subword)
func (*SimpleInt8Model512) VocabSize ¶
func (m *SimpleInt8Model512) VocabSize() int
VocabSize returns the size of the token vocabulary.
type SimplifiedSearchConfig ¶
type SimplifiedSearchConfig struct {
Preset SearchPreset
DatasetSize int
// Optional custom parameters (only used with CustomPreset)
CustomConfig *SearchConfig
}
SimplifiedSearchConfig provides a simpler configuration interface
type TensorInfo ¶
type TensorInfo struct {
Dtype string `json:"dtype"`
Shape []int `json:"shape"`
DataOffsets [2]int64 `json:"data_offsets"`
}
TensorInfo contains safetensors tensor metadata
type TokenFrequencyData ¶
type TokenFrequencyData struct {
TokenizerName string `json:"tokenizer_name"`
VocabSize int `json:"vocab_size"`
Stopwords []int `json:"stopwords"`
SingleTokens TokenFrequencySection `json:"single_tokens"`
Bigrams TokenPatternSection `json:"bigrams"`
Trigrams TokenPatternSection `json:"trigrams"`
Fourgrams TokenPatternSection `json:"fourgrams"`
Stats map[string]int `json:"stats"`
}
TokenFrequencyData represents the frequency analysis results
type TokenFrequencySection ¶
type TokenPatternCache ¶
type TokenPatternCache struct {
// contains filtered or unexported fields
}
TokenPatternCache stores precomputed embeddings for common token patterns
func NewTokenPatternCache ¶
func NewTokenPatternCache(freqFile, embeddingFile string) (*TokenPatternCache, error)
NewTokenPatternCache creates a new cache with precomputed embeddings
func (*TokenPatternCache) BatchGetEmbeddings ¶
func (c *TokenPatternCache) BatchGetEmbeddings(tokenBatches [][]int) ([]*CachedEmbedding, []bool)
BatchGetEmbeddings retrieves embeddings for multiple patterns in parallel
func (*TokenPatternCache) ComputeEmbeddingWithCache ¶
func (c *TokenPatternCache) ComputeEmbeddingWithCache(tokens []int, computeFn func([]int) ([]float32, error)) ([]float32, error)
ComputeEmbeddingWithCache computes embedding using cache where possible
func (*TokenPatternCache) FilterStopwords ¶
func (c *TokenPatternCache) FilterStopwords(tokens []int, textLength int) []int
FilterStopwords removes stopwords from token sequence if text is long
func (*TokenPatternCache) GetCachedEmbedding ¶
func (c *TokenPatternCache) GetCachedEmbedding(tokens []int) (*CachedEmbedding, bool)
GetCachedEmbedding tries to retrieve cached embedding for token pattern
func (*TokenPatternCache) GetStats ¶
func (c *TokenPatternCache) GetStats() map[string]interface{}
GetStats returns cache statistics
func (*TokenPatternCache) PrecomputeCommonPatterns ¶
func (c *TokenPatternCache) PrecomputeCommonPatterns(model *EmbeddingModel, patterns [][]int)
PrecomputeCommonPatterns adds embeddings for common patterns
type TokenPatternSection ¶
type TokenizerOptimizations ¶
type TokenizerOptimizations struct {
// contains filtered or unexported fields
}
TokenizerOptimizations provides fast tokenization helpers
func NewTokenizerOptimizations ¶
func NewTokenizerOptimizations(maxCacheSize int) *TokenizerOptimizations
NewTokenizerOptimizations creates optimized tokenizer wrapper
func (*TokenizerOptimizations) ClearCache ¶
func (t *TokenizerOptimizations) ClearCache()
ClearCache clears the tokenization cache
func (*TokenizerOptimizations) TokenizeCached ¶
func (t *TokenizerOptimizations) TokenizeCached(text string, tokenizeFn func(string) ([]uint32, error)) ([]int, error)
TokenizeCached performs cached tokenization
type UnifiedModelConfig ¶
type UnifiedModelConfig struct {
UseInt8 bool // Use optimized int8 model (recommended)
ModelDir string // Optional custom model directory
ForceFloat32 bool // Force float32 for compatibility (slower)
}
UnifiedModelConfig configures model loading behavior
func DefaultFastConfig ¶
func DefaultFastConfig() *UnifiedModelConfig
DefaultFastConfig returns optimized configuration for maximum performance
type VectorBuffer ¶
type VectorBuffer struct {
// contains filtered or unexported fields
}
VectorBuffer provides a reusable buffer for vector operations
func NewVectorBuffer ¶
func NewVectorBuffer(capacity int) *VectorBuffer
NewVectorBuffer creates a vector buffer with specified capacity
func (*VectorBuffer) Add ¶
func (vb *VectorBuffer) Add(vec simd.Vec512, scale float32)
Add adds a vector to the buffer
func (*VectorBuffer) GetVectors ¶
func (vb *VectorBuffer) GetVectors() ([]simd.Vec512, []float32)
GetVectors returns the current vectors and scales
func (*VectorBuffer) IsFull ¶
func (vb *VectorBuffer) IsFull() bool
IsFull returns true if buffer is at capacity
type VectorIndex ¶
type VectorIndex struct {
// contains filtered or unexported fields
}
VectorIndex provides high-performance vector search capabilities
func NewVectorIndex ¶
func NewVectorIndex(model *EmbeddingModel, config VectorIndexConfig) *VectorIndex
NewVectorIndex creates a new vector index
func (*VectorIndex) AddDocument ¶
func (idx *VectorIndex) AddDocument(doc Document) error
AddDocument adds a document to the index
func (*VectorIndex) AddDocuments ¶
func (idx *VectorIndex) AddDocuments(docs []Document) error
AddDocuments adds multiple documents efficiently
func (*VectorIndex) AddDocumentsBulkGPU ¶
func (idx *VectorIndex) AddDocumentsBulkGPU(docs []Document) error
AddDocumentsBulkGPU forces GPU bulk indexing regardless of size
func (*VectorIndex) AddDocumentsWithMonitoring ¶
func (idx *VectorIndex) AddDocumentsWithMonitoring(docs []Document) (<-chan interface{}, error)
AddDocumentsWithMonitoring adds documents with real-time GPU monitoring
func (*VectorIndex) Search ¶
func (idx *VectorIndex) Search(query string, k int) ([]SearchResult, error)
Search performs similarity search
func (*VectorIndex) Size ¶
func (idx *VectorIndex) Size() int
Size returns the number of indexed documents
func (*VectorIndex) Stats ¶
func (idx *VectorIndex) Stats() VectorIndexStats
Stats returns index statistics
func (*VectorIndex) Train ¶
func (idx *VectorIndex) Train(texts []string) error
Train trains the index on sample data for better performance
type VectorIndexConfig ¶
type VectorIndexConfig struct {
// Index configuration
MaxFlatSize int // Use flat index below this size (default: 50000)
NList int // Number of IVF clusters (default: 4096)
NProbe int // Number of clusters to search (default: 8)
UsePQ bool // Use product quantization (default: true for >100k)
UseHNSW bool // Use HNSW for routing (default: true)
// Search configuration
RerankSize int // Number of candidates to rerank (default: 128)
UseParallel bool // Use parallel search (default: true)
// Bulk indexing configuration
EnableBulkGPU bool // Enable GPU bulk indexing for large datasets
BulkBatchSize int // Batch size for GPU bulk indexing (default: 5000)
}
VectorIndexConfig configures the vector index
func DefaultVectorIndexConfig ¶
func DefaultVectorIndexConfig() VectorIndexConfig
DefaultVectorIndexConfig returns default configuration
Source Files
¶
- cagra_config_stub.go
- compact.go
- cosine_similarity.go
- cpu_bulk_indexer.go
- embedding_cache.go
- fused_cagra_stub.go
- gobed.go
- gobed_int8.go
- gobed_int8_512.go
- gobed_int8_512_simple.go
- gobed_optimized.go
- gpu_batch_processor_stub.go
- gpu_embedding.go
- gpu_indexer_stub.go
- gpu_memory_manager_stub.go
- gpu_search_server.go
- gpu_search_server_adapter.go
- gpu_stubs.go
- gpubridge_stub.go
- logging.go
- model_loader.go
- optimizations.go
- parallel_indexing.go
- performance_optimizations.go
- persistence.go
- search_api.go
- search_auto_gpu.go
- search_config_mapper.go
- search_presets.go
- search_server.go
- shared_memory_index.go
- vector_search.go
Directories
¶
| Path | Synopsis |
|---|---|
| ann | |
| bed (module) | |
| cmd | |
| bed (command) | |
| bed-search (command) | |
| bedfast (command) | |
| gpu_perf_test (command) | |
| gpu_scale_benchmark (command) | |
| real_data_benchmark (command) | |
| test_ndcg (command) | |
| internal | |
| benchmarks (command) | |
pkg
|
|