Documentation
¶
Overview ¶
Package vectorstore defines a provider-agnostic interface for vector databases. It models collections, documents with multi-modal content, embeddings, tags, scopes, and temporal metadata, and supports filtered similarity search, batch operations, and streaming query results. Concrete providers (for example the in-memory and Firestore implementations referenced in the examples) implement the VectorStore and Collection interfaces.
Example (AgentMemory) ¶
Example_agentMemory demonstrates using collections for agent memory.
package main
import (
"context"
"fmt"
"time"
"github.com/aixgo-dev/aixgo/pkg/vectorstore"
)
func main() {
var store vectorstore.VectorStore // assume an initialized implementation, e.g. memory.New()
defer func() { _ = store.Close() }()
// Create memory collection with scope requirements
memory := store.Collection("agent-memory",
vectorstore.WithScope("user", "session"),
vectorstore.WithMaxDocuments(1000),
)
ctx := context.Background()
// Store a memory
memoryDoc := &vectorstore.Document{
ID: "memory-1",
Content: vectorstore.NewTextContent("User prefers dark mode"),
Embedding: vectorstore.NewEmbedding(
[]float32{0.5, 0.6, 0.7},
"text-embedding-3-small",
),
Scope: vectorstore.NewScope("tenant1", "user123", "session456"),
Tags: []string{"preference", "ui"},
Temporal: &vectorstore.Temporal{
CreatedAt: time.Now(),
UpdatedAt: time.Now(),
},
}
_, err := memory.Upsert(ctx, memoryDoc)
if err != nil {
fmt.Printf("Error: %v\n", err)
return
}
// Retrieve memories for a specific user
query := &vectorstore.Query{
Embedding: vectorstore.NewEmbedding(
[]float32{0.5, 0.6, 0.7},
"text-embedding-3-small",
),
Filters: vectorstore.And(
vectorstore.UserFilter("user123"),
vectorstore.TagFilter("preference"),
),
Limit: 5,
}
result, _ := memory.Query(ctx, query)
for _, match := range result.Matches {
fmt.Printf("Memory: %s (score: %.2f)\n",
match.Document.Content.String(),
match.Score,
)
}
}
Example (Basic) ¶
Example demonstrates basic usage of the VectorStore interface.
package main
import (
"context"
"fmt"
"github.com/aixgo-dev/aixgo/pkg/vectorstore"
)
func main() {
// NOTE: This is a documentation example showing the API.
// It won't run without a real store implementation.
// Create a vector store (implementation-specific)
var store vectorstore.VectorStore // = memory.New() or firestore.New() etc.
defer func() { _ = store.Close() }()
// Create a collection for documents
docs := store.Collection("documents")
// Create a document
doc := &vectorstore.Document{
ID: "doc1",
Content: vectorstore.NewTextContent("The quick brown fox jumps over the lazy dog"),
Embedding: vectorstore.NewEmbedding(
[]float32{0.1, 0.2, 0.3, 0.4, 0.5},
"text-embedding-3-small",
),
Tags: []string{"example", "demo"},
}
// Insert the document
ctx := context.Background()
result, err := docs.Upsert(ctx, doc)
if err != nil {
fmt.Printf("Error: %v\n", err)
return
}
fmt.Printf("Inserted: %d documents\n", result.Inserted)
// Query for similar documents
query := &vectorstore.Query{
Embedding: vectorstore.NewEmbedding(
[]float32{0.1, 0.2, 0.3, 0.4, 0.5},
"text-embedding-3-small",
),
Limit: 10,
}
queryResult, _ := docs.Query(ctx, query)
fmt.Printf("Found: %d matches\n", queryResult.Count())
}
Example (BatchOperations) ¶
Example_batchOperations demonstrates batch upsert with progress tracking.
package main
import (
"context"
"fmt"
"github.com/aixgo-dev/aixgo/pkg/vectorstore"
)
func main() {
var store vectorstore.VectorStore // assume an initialized implementation, e.g. memory.New()
defer func() { _ = store.Close() }()
docs := store.Collection("documents")
ctx := context.Background()
// Create many documents
documents := make([]*vectorstore.Document, 1000)
for i := range documents {
documents[i] = &vectorstore.Document{
ID: fmt.Sprintf("doc-%d", i),
Content: vectorstore.NewTextContent(fmt.Sprintf("Document %d", i)),
Embedding: vectorstore.NewEmbedding(
[]float32{float32(i) / 1000.0, 0.5, 0.5},
"model",
),
}
}
// Batch insert with progress tracking
result, err := docs.UpsertBatch(ctx, documents,
vectorstore.WithBatchSize(100),
vectorstore.WithParallelism(4),
vectorstore.WithProgressCallback(func(processed, total int) {
pct := float64(processed) / float64(total) * 100
fmt.Printf("Progress: %d/%d (%.1f%%)\n", processed, total, pct)
}),
)
if err != nil {
fmt.Printf("Error: %v\n", err)
return
}
fmt.Printf("Inserted: %d, Failed: %d\n", result.Inserted, result.Failed)
}
Example (ComplexFilters) ¶
Example_complexFilters demonstrates complex filter queries.
package main
import (
"context"
"fmt"
"time"
"github.com/aixgo-dev/aixgo/pkg/vectorstore"
)
func main() {
var store vectorstore.VectorStore // assume an initialized implementation, e.g. memory.New()
defer func() { _ = store.Close() }()
docs := store.Collection("products")
ctx := context.Background()
// Complex filter: recent, high-rated electronics in stock
query := &vectorstore.Query{
Embedding: vectorstore.NewEmbedding(
[]float32{0.1, 0.2, 0.3},
"model",
),
Filters: vectorstore.And(
vectorstore.TagFilter("electronics"),
vectorstore.Gte("rating", 4.5),
vectorstore.Eq("in_stock", true),
vectorstore.CreatedAfter(time.Now().Add(-30*24*time.Hour)),
vectorstore.Or(
vectorstore.Contains("category", "phone"),
vectorstore.Contains("category", "laptop"),
),
),
SortBy: []vectorstore.SortBy{
vectorstore.SortByScore(),
vectorstore.SortByField("rating", true),
},
Limit: 20,
}
result, _ := docs.Query(ctx, query)
fmt.Printf("Found %d matching products\n", result.Count())
}
Example (ConversationHistory) ¶
Example_conversationHistory demonstrates storing conversation history.
package main
import (
"context"
"fmt"
"time"
"github.com/aixgo-dev/aixgo/pkg/vectorstore"
)
func main() {
var store vectorstore.VectorStore // assume an initialized implementation, e.g. memory.New()
defer func() { _ = store.Close() }()
// Create conversation collection
conversations := store.Collection("conversations",
vectorstore.WithScope("user", "thread"),
vectorstore.WithMaxDocuments(100000),
)
ctx := context.Background()
// Store a conversation turn
now := time.Now()
turn := &vectorstore.Document{
ID: "turn-1",
Content: vectorstore.NewTextContent("User: What's the weather?\nAssistant: It's sunny today."),
Embedding: vectorstore.NewEmbedding(
[]float32{0.2, 0.3, 0.4},
"text-embedding-3-small",
),
Scope: &vectorstore.Scope{
User: "user123",
Thread: "thread-abc",
},
Temporal: &vectorstore.Temporal{
CreatedAt: now,
EventTime: &now,
},
Tags: []string{"weather", "conversation"},
Metadata: map[string]any{
"turn_number": 1,
"user_id": "user123",
},
}
_, err := conversations.Upsert(ctx, turn)
if err != nil {
fmt.Printf("Error: %v\n", err)
return
}
// Retrieve recent conversation history
query := &vectorstore.Query{
Filters: vectorstore.And(
vectorstore.UserFilter("user123"),
vectorstore.Eq("thread", "thread-abc"),
vectorstore.CreatedAfter(time.Now().Add(-24*time.Hour)),
),
SortBy: []vectorstore.SortBy{
vectorstore.SortByCreatedAt(false), // Ascending (chronological)
},
Limit: 20,
}
result, _ := conversations.Query(ctx, query)
fmt.Printf("Found %d conversation turns\n", result.Count())
}
Example (Deduplication) ¶
Example_deduplication demonstrates content deduplication.
package main
import (
"context"
"fmt"
"github.com/aixgo-dev/aixgo/pkg/vectorstore"
)
func main() {
var store vectorstore.VectorStore // assume an initialized implementation, e.g. memory.New()
defer func() { _ = store.Close() }()
// Create collection with aggressive deduplication
docs := store.Collection("documents",
vectorstore.WithDeduplicationThreshold(0.95),
)
ctx := context.Background()
// Insert multiple similar documents
documents := []*vectorstore.Document{
{
ID: "doc1",
Content: vectorstore.NewTextContent("The quick brown fox"),
Embedding: vectorstore.NewEmbedding([]float32{0.1, 0.2, 0.3}, "model"),
},
{
ID: "doc2",
Content: vectorstore.NewTextContent("The quick brown fox"), // Duplicate
Embedding: vectorstore.NewEmbedding([]float32{0.1, 0.2, 0.3}, "model"),
},
{
ID: "doc3",
Content: vectorstore.NewTextContent("A different document"),
Embedding: vectorstore.NewEmbedding([]float32{0.9, 0.8, 0.7}, "model"),
},
}
result, err := docs.UpsertBatch(ctx, documents)
if err != nil {
fmt.Printf("Error: %v\n", err)
return
}
fmt.Printf("Inserted: %d, Deduplicated: %d\n",
result.Inserted,
result.Deduplicated,
)
}
Example (Multimodal) ¶
Example_multimodal demonstrates multi-modal content.
package main
import (
"context"
"fmt"
"github.com/aixgo-dev/aixgo/pkg/vectorstore"
)
func main() {
var store vectorstore.VectorStore // assume an initialized implementation, e.g. memory.New()
defer func() { _ = store.Close() }()
media := store.Collection("media",
vectorstore.WithDimensions(512),
)
ctx := context.Background()
// Store an image
imageDoc := &vectorstore.Document{
ID: "img1",
Content: vectorstore.NewImageURL(
"https://example.com/photo.jpg",
),
Embedding: vectorstore.NewEmbedding(
make([]float32, 512), // CLIP embedding
"clip-vit-base-patch32",
),
Tags: []string{"photo", "landscape"},
}
_, err := media.Upsert(ctx, imageDoc)
if err != nil {
fmt.Printf("Error: %v\n", err)
return
}
// Query with image embedding
query := &vectorstore.Query{
Embedding: vectorstore.NewEmbedding(
make([]float32, 512),
"clip-vit-base-patch32",
),
Filters: vectorstore.TagFilter("photo"),
Limit: 10,
}
result, _ := media.Query(ctx, query)
fmt.Printf("Found %d similar images\n", result.Count())
}
Example (Pagination) ¶
Example_pagination demonstrates paginated queries.
package main
import (
"context"
"fmt"
"github.com/aixgo-dev/aixgo/pkg/vectorstore"
)
func main() {
var store vectorstore.VectorStore // assume an initialized implementation, e.g. memory.New()
defer func() { _ = store.Close() }()
docs := store.Collection("documents")
ctx := context.Background()
pageSize := 20
page := 0
for {
query := &vectorstore.Query{
Embedding: vectorstore.NewEmbedding(
[]float32{0.1, 0.2, 0.3},
"model",
),
Limit: pageSize,
Offset: page * pageSize,
}
result, _ := docs.Query(ctx, query)
fmt.Printf("Page %d: %d results\n", page+1, result.Count())
// Process results...
for _, match := range result.Matches {
_ = match // Process match
}
// Check if there are more pages
if !result.HasMore() {
break
}
page++
}
}
Example (SemanticCache) ¶
Example_semanticCache demonstrates using collections for semantic caching.
package main
import (
"context"
"fmt"
"time"
"github.com/aixgo-dev/aixgo/pkg/vectorstore"
)
func main() {
var store vectorstore.VectorStore // assume an initialized implementation, e.g. memory.New()
defer func() { _ = store.Close() }()
// Create a cache collection with TTL and deduplication
cache := store.Collection("cache",
vectorstore.WithTTL(5*time.Minute),
vectorstore.WithDeduplication(true),
vectorstore.WithMaxDocuments(10000),
)
ctx := context.Background()
// Cache a query result
cacheDoc := &vectorstore.Document{
ID: "query-hash-123",
Content: vectorstore.NewTextContent("What is the capital of France?"),
Embedding: vectorstore.NewEmbedding(
[]float32{0.1, 0.2, 0.3},
"text-embedding-3-small",
),
Temporal: vectorstore.NewTemporalWithTTL(5 * time.Minute),
Tags: []string{"qa", "geography"},
Metadata: map[string]any{
"answer": "Paris",
"cached": time.Now(),
},
}
_, err := cache.Upsert(ctx, cacheDoc)
if err != nil {
fmt.Printf("Error: %v\n", err)
return
}
// Lookup cached result by similarity
query := &vectorstore.Query{
Embedding: vectorstore.NewEmbedding(
[]float32{0.11, 0.19, 0.31},
"text-embedding-3-small",
),
Limit: 1,
MinScore: 0.95, // High threshold for cache hits
}
result, _ := cache.Query(ctx, query)
if result.HasMatches() {
answer := result.TopMatch().Document.Metadata["answer"]
fmt.Printf("Cache hit! Answer: %s\n", answer)
}
}
Example (Streaming) ¶
Example_streaming demonstrates streaming query results.
package main
import (
"context"
"fmt"
"github.com/aixgo-dev/aixgo/pkg/vectorstore"
)
func main() {
var store vectorstore.VectorStore // assume an initialized implementation, e.g. memory.New()
defer func() { _ = store.Close() }()
docs := store.Collection("documents")
ctx := context.Background()
query := &vectorstore.Query{
Embedding: vectorstore.NewEmbedding(
[]float32{0.1, 0.2, 0.3},
"model",
),
Limit: 1000, // Large result set
}
// Stream results
iter, _ := docs.QueryStream(ctx, query)
defer func() { _ = iter.Close() }()
count := 0
for iter.Next() {
match := iter.Match()
if match.Score >= 0.8 {
count++
fmt.Printf("High score match: %s\n", match.Document.ID)
}
}
if err := iter.Err(); err != nil {
fmt.Printf("Error: %v\n", err)
}
fmt.Printf("Found %d high-score matches\n", count)
}
Example (TimeBasedQueries) ¶
Example_timeBasedQueries demonstrates temporal queries.
package main
import (
"context"
"fmt"
"time"
"github.com/aixgo-dev/aixgo/pkg/vectorstore"
)
func main() {
var store vectorstore.VectorStore // assume an initialized implementation, e.g. memory.New()
defer func() { _ = store.Close() }()
events := store.Collection("events")
ctx := context.Background()
// Query for events in the last week that haven't expired
query := &vectorstore.Query{
Filters: vectorstore.And(
vectorstore.CreatedAfter(time.Now().Add(-7*24*time.Hour)),
vectorstore.NotExpired(),
vectorstore.TagsFilter("important", "scheduled"),
),
SortBy: []vectorstore.SortBy{
vectorstore.SortByCreatedAt(true), // Most recent first
},
Limit: 50,
}
result, _ := events.Query(ctx, query)
for _, match := range result.Matches {
fmt.Printf("Event: %s at %s\n",
match.Document.Content.String(),
match.Document.Temporal.CreatedAt,
)
}
}
Index ¶
- func ForEach(iter ResultIterator, fn func(*Match) error) error
- func GetTagFilter(f Filter) (tag string, ok bool)
- func GetTimeFilter(f Filter) (field TimeField, op FilterOperator, value time.Time, ok bool)
- func IsAndFilter(f Filter) bool
- func IsNotFilter(f Filter) bool
- func IsOrFilter(f Filter) bool
- func Validate(doc *Document) error
- func ValidateContent(c *Content) error
- func ValidateEmbedding(e *Embedding) error
- func ValidateID(id string) error
- func ValidateMetadataKey(key string) error
- func ValidateScope(s *Scope) error
- func ValidateTag(tag string) error
- func ValidateTemporal(t *Temporal) error
- type BatchConfig
- type BatchOption
- func WithBatchSize(size int) BatchOption
- func WithContinueOnError(enabled bool) BatchOption
- func WithMaxRetries(max int) BatchOption
- func WithParallelism(n int) BatchOption
- func WithProgressCallback(callback func(processed, total int)) BatchOption
- func WithRetry(enabled bool) BatchOption
- func WithRetryDelay(delay time.Duration) BatchOption
- func WithValidation(enabled bool) BatchOption
- type Collection
- type CollectionConfig
- type CollectionOption
- func WithAuditLog(enabled bool) CollectionOption
- func WithAutoEmbeddings(fn EmbeddingFunction) CollectionOption
- func WithDeduplication(enabled bool) CollectionOption
- func WithDeduplicationThreshold(threshold float32) CollectionOption
- func WithDimensions(dimensions int) CollectionOption
- func WithIndexing(indexType IndexType) CollectionOption
- func WithMaxDocuments(max int64) CollectionOption
- func WithMaxVersions(max int) CollectionOption
- func WithMetadata(metadata map[string]any) CollectionOption
- func WithScope(fields ...string) CollectionOption
- func WithTTL(ttl time.Duration) CollectionOption
- func WithVersioning(enabled bool) CollectionOption
- type CollectionStats
- type Content
- type ContentType
- type DeleteResult
- type DistanceMetric
- type Document
- type Embedding
- type EmbeddingFunction
- type ExplainStep
- type Filter
- func And(filters ...Filter) Filter
- func AnyTagFilter(tags ...string) Filter
- func Contains(field string, substring string) Filter
- func CreatedAfter(t time.Time) Filter
- func CreatedBefore(t time.Time) Filter
- func EndsWith(field string, suffix string) Filter
- func Eq(field string, value any) Filter
- func Exists(field string) Filter
- func Expired() Filter
- func ExpiresAfter(t time.Time) Filter
- func ExpiresBefore(t time.Time) Filter
- func FieldFilter(field string, operator FilterOperator, value any) Filter
- func GetFilters(f Filter) []Filter
- func GetNotFilter(f Filter) (inner Filter, ok bool)
- func Gt(field string, value any) Filter
- func Gte(field string, value any) Filter
- func In(field string, values ...any) Filter
- func Lt(field string, value any) Filter
- func Lte(field string, value any) Filter
- func Ne(field string, value any) Filter
- func Not(filter Filter) Filter
- func NotExists(field string) Filter
- func NotExpired() Filter
- func NotIn(field string, values ...any) Filter
- func Or(filters ...Filter) Filter
- func ScopeFilter(scope *Scope) Filter
- func ScoreAbove(threshold float32) Filter
- func ScoreAtLeast(threshold float32) Filter
- func ScoreBelow(threshold float32) Filter
- func ScoreFilter(operator FilterOperator, value float32) Filter
- func SessionFilter(session string) Filter
- func StartsWith(field string, prefix string) Filter
- func TagFilter(tag string) Filter
- func TagsFilter(tags ...string) Filter
- func TenantFilter(tenant string) Filter
- func TimeFilter(field TimeField, operator FilterOperator, value time.Time) Filter
- func UpdatedAfter(t time.Time) Filter
- func UpdatedBefore(t time.Time) Filter
- func UserFilter(user string) Filter
- type FilterOperator
- type IndexType
- type Match
- type OperationTiming
- type Query
- type QueryExplain
- type QueryResult
- func (r *QueryResult) AvgScore() float32
- func (r *QueryResult) Count() int
- func (r *QueryResult) Documents() []*Document
- func (r *QueryResult) Empty() bool
- func (r *QueryResult) FilterByScore(minScore float32) []*Match
- func (r *QueryResult) FilterByTag(tag string) []*Match
- func (r *QueryResult) First(n int) []*Match
- func (r *QueryResult) GroupByTag() map[string][]*Match
- func (r *QueryResult) HasMatches() bool
- func (r *QueryResult) HasMore() bool
- func (r *QueryResult) IDs() []string
- func (r *QueryResult) Last(n int) []*Match
- func (r *QueryResult) MatchByID(id string) *Match
- func (r *QueryResult) MaxScore() float32
- func (r *QueryResult) MinScore() float32
- func (r *QueryResult) NextOffset() int
- func (r *QueryResult) PrevOffset() int
- func (r *QueryResult) Scores() []float32
- func (r *QueryResult) Slice(start, end int) []*Match
- func (r *QueryResult) TopMatch() *Match
- type QueryTiming
- type ResultIterator
- func FilterIterator(iter ResultIterator, predicate func(*Match) bool) ResultIterator
- func MapIterator(iter ResultIterator, fn func(*Match) *Match) ResultIterator
- func NewChannelIterator(matches <-chan *Match, errs <-chan error) ResultIterator
- func NewEmptyIterator() ResultIterator
- func NewErrorIterator(err error) ResultIterator
- func NewSliceIterator(matches []*Match) ResultIterator
- type Scope
- type SortBy
- type StoreStats
- type Temporal
- type TimeField
- type TimestampValue
- type UpsertResult
- type VectorStore
Examples ¶
- Example (AgentMemory)
- Example (Basic)
- Example (BatchOperations)
- Example (ComplexFilters)
- Example (ConversationHistory)
- Example (Deduplication)
- Example (Multimodal)
- Example (Pagination)
- Example (SemanticCache)
- Example (Streaming)
- Example (TimeBasedQueries)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ForEach ¶
func ForEach(iter ResultIterator, fn func(*Match) error) error
ForEach applies a function to each result in an iterator. The iterator is closed after iteration.
Example:
iter, err := coll.QueryStream(ctx, query)
if err != nil {
return err
}
err = vectorstore.ForEach(iter, func(match *Match) error {
fmt.Printf("Score: %.4f, ID: %s\n", match.Score, match.Document.ID)
return nil
})
func GetTagFilter ¶
GetTagFilter extracts the tag from a tag filter. Returns empty string if not a tag filter.
func GetTimeFilter ¶
GetTimeFilter extracts field, operator, and value from a time filter. Returns zero values if not a time filter.
func IsAndFilter ¶
IsAndFilter checks if the filter is an AND composite.
func IsNotFilter ¶
IsNotFilter checks if the filter is a NOT filter.
func IsOrFilter ¶
IsOrFilter checks if the filter is an OR composite.
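These introspection helpers exist mainly for provider implementations that need to translate a Filter tree into a backend-specific query. A rough sketch of walking a composite filter built with the constructors in this package (illustrative only; a real provider would emit its own query syntax instead of printing):

func translate(f Filter) {
    switch {
    case IsAndFilter(f), IsOrFilter(f):
        // Composite filter: recurse into the children.
        for _, sub := range GetFilters(f) {
            translate(sub)
        }
    case IsNotFilter(f):
        if inner, ok := GetNotFilter(f); ok {
            translate(inner) // negate the inner condition
        }
    default:
        if tag, ok := GetTagFilter(f); ok {
            fmt.Printf("tag == %q\n", tag)
        } else if field, op, t, ok := GetTimeFilter(f); ok {
            fmt.Printf("%s %s %s\n", field, op, t)
        }
    }
}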
func Validate ¶
Validate validates a document before storage. This performs comprehensive validation including:
- ID format and length
- Content presence and validity
- Embedding dimensions and values
- Metadata key safety
- Temporal constraints
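Collections validate documents during Upsert, but callers can also validate up front to fail fast before a large write. A minimal sketch, assuming coll is a Collection and documents is a []*Document prepared by the caller:

for _, doc := range documents {
    if err := Validate(doc); err != nil {
        return fmt.Errorf("document %q is invalid: %w", doc.ID, err)
    }
}
result, err := coll.Upsert(ctx, documents...)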
func ValidateContent ¶
ValidateContent validates document content.
func ValidateEmbedding ¶
ValidateEmbedding validates an embedding vector.
func ValidateID ¶
ValidateID validates a document ID.
func ValidateMetadataKey ¶
ValidateMetadataKey validates a metadata key.
func ValidateScope ¶
ValidateScope validates a document scope.
func ValidateTag ¶
ValidateTag validates a tag.
func ValidateTemporal ¶
ValidateTemporal validates temporal information.
Types ¶
type BatchConfig ¶
type BatchConfig struct {
// BatchSize is the number of documents to process per batch.
// Default: 100
BatchSize int
// Parallelism is the number of concurrent batches.
// Default: 1 (sequential)
Parallelism int
// ContinueOnError controls whether to continue on individual document errors.
// If false, the entire batch fails on first error.
// If true, failed documents are collected in UpsertResult.Errors.
// Default: true
ContinueOnError bool
// ProgressCallback is called after each batch completes.
// Receives (processed, total) counts.
ProgressCallback func(processed, total int)
// ValidateBeforeBatch validates all documents before starting batch.
// If true, invalid documents cause immediate failure.
// If false, validation happens per-batch.
// Default: false
ValidateBeforeBatch bool
// RetryOnError enables retry for failed batches.
// Default: false
RetryOnError bool
// MaxRetries is the maximum number of retries per batch.
// Only meaningful if RetryOnError is true.
// Default: 3
MaxRetries int
// RetryDelay is the delay between retries.
// Default: 1 second
RetryDelay time.Duration
}
BatchConfig contains configuration for batch operations.
func ApplyBatchOptions ¶
func ApplyBatchOptions(opts []BatchOption) *BatchConfig
ApplyBatchOptions applies a list of batch options to a config. This is used internally by collection implementations.
func (*BatchConfig) Validate ¶
func (c *BatchConfig) Validate() error
Validate validates a BatchConfig.
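ApplyBatchOptions and Validate are the building blocks for provider-side batch implementations. A simplified sketch of an UpsertBatch built on them (myCollection is a hypothetical provider type, the backend write is elided, and ApplyBatchOptions is assumed to fill in defaults such as BatchSize):

func (c *myCollection) UpsertBatch(ctx context.Context, docs []*Document, opts ...BatchOption) (*UpsertResult, error) {
    cfg := ApplyBatchOptions(opts)
    if err := cfg.Validate(); err != nil {
        return nil, err
    }
    res := &UpsertResult{}
    for start := 0; start < len(docs); start += cfg.BatchSize {
        end := start + cfg.BatchSize
        if end > len(docs) {
            end = len(docs)
        }
        // ... write docs[start:end] to the backend here ...
        res.Inserted += int64(end - start)
        if cfg.ProgressCallback != nil {
            cfg.ProgressCallback(end, len(docs))
        }
    }
    return res, nil
}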
type BatchOption ¶
type BatchOption func(*BatchConfig)
BatchOption configures batch operations.
func WithBatchSize ¶
func WithBatchSize(size int) BatchOption
WithBatchSize sets the batch size.
Example:
result, err := coll.UpsertBatch(ctx, docs,
WithBatchSize(100),
)
func WithContinueOnError ¶
func WithContinueOnError(enabled bool) BatchOption
WithContinueOnError sets whether to continue on errors.
Example:
result, err := coll.UpsertBatch(ctx, docs,
WithContinueOnError(false), // Fail fast
)
func WithMaxRetries ¶
func WithMaxRetries(max int) BatchOption
WithMaxRetries sets the maximum number of retries. Also enables retry.
Example:
result, err := coll.UpsertBatch(ctx, docs,
WithMaxRetries(5),
)
func WithParallelism ¶
func WithParallelism(n int) BatchOption
WithParallelism sets the number of concurrent batches.
Example:
result, err := coll.UpsertBatch(ctx, docs,
WithBatchSize(100),
WithParallelism(4),
)
func WithProgressCallback ¶
func WithProgressCallback(callback func(processed, total int)) BatchOption
WithProgressCallback sets a progress callback.
Example:
result, err := coll.UpsertBatch(ctx, docs,
WithProgressCallback(func(processed, total int) {
log.Printf("Progress: %d/%d (%.1f%%)",
processed, total,
float64(processed)/float64(total)*100)
}),
)
func WithRetry ¶
func WithRetry(enabled bool) BatchOption
WithRetry enables retry on batch failures.
Example:
result, err := coll.UpsertBatch(ctx, docs,
WithRetry(true),
WithMaxRetries(5),
WithRetryDelay(2*time.Second),
)
func WithRetryDelay ¶
func WithRetryDelay(delay time.Duration) BatchOption
WithRetryDelay sets the delay between retries. Also enables retry.
Example:
result, err := coll.UpsertBatch(ctx, docs,
WithRetryDelay(2*time.Second),
)
func WithValidation ¶
func WithValidation(enabled bool) BatchOption
WithValidation enables pre-batch validation.
Example:
result, err := coll.UpsertBatch(ctx, docs,
WithValidation(true),
)
type Collection ¶
type Collection interface {
// Name returns the collection name.
Name() string
// Upsert inserts or updates documents in the collection.
// If a document with the same ID exists, it is updated. Otherwise, a new document is created.
//
// Documents are validated before insertion. Invalid documents cause the entire
// operation to fail (all-or-nothing semantics).
//
// For large batches, consider using UpsertBatch for better performance.
//
// Example:
//
// doc := &Document{
// ID: "doc1",
// Content: NewTextContent("Hello world"),
// Embedding: NewEmbedding([]float32{0.1, 0.2, 0.3}, "text-embedding-3-small"),
// Tags: []string{"greeting", "english"},
// }
// result, err := coll.Upsert(ctx, doc)
Upsert(ctx context.Context, documents ...*Document) (*UpsertResult, error)
// UpsertBatch performs batch upsert with progress tracking and error handling.
// This is optimized for inserting large numbers of documents.
//
// Options can control batch size, parallelism, and progress callbacks.
//
// Example:
//
// result, err := coll.UpsertBatch(ctx, documents,
// WithBatchSize(100),
// WithProgressCallback(func(processed, total int) {
// log.Printf("Progress: %d/%d", processed, total)
// }),
// )
UpsertBatch(ctx context.Context, documents []*Document, opts ...BatchOption) (*UpsertResult, error)
// Query performs similarity search and returns matching documents.
// The query can include vector similarity, metadata filters, temporal constraints,
// scope filters, and more.
//
// Example:
//
// results, err := coll.Query(ctx, &Query{
// Embedding: queryEmbedding,
// Limit: 10,
// Filters: And(
// TagFilter("category", "product"),
// ScoreFilter(GreaterThan(0.8)),
// ),
// })
Query(ctx context.Context, query *Query) (*QueryResult, error)
// QueryStream performs similarity search and streams results via an iterator.
// This is useful for processing large result sets without loading everything into memory.
//
// The iterator must be closed when done to release resources.
//
// Example:
//
// iter, err := coll.QueryStream(ctx, query)
// if err != nil {
// log.Fatal(err)
// }
// defer iter.Close()
//
// for iter.Next() {
// match := iter.Match()
// fmt.Printf("Score: %.4f, Content: %s\n", match.Score, match.Document.Content)
// }
// if err := iter.Err(); err != nil {
// log.Fatal(err)
// }
QueryStream(ctx context.Context, query *Query) (ResultIterator, error)
// Get retrieves documents by their IDs.
// Documents that don't exist are omitted from the result (no error is returned).
//
// The order of returned documents may not match the order of requested IDs.
//
// Example:
//
// docs, err := coll.Get(ctx, "doc1", "doc2", "doc3")
Get(ctx context.Context, ids ...string) ([]*Document, error)
// Delete removes documents by their IDs.
// IDs that don't exist are silently ignored (no error is returned).
//
// Example:
//
// result, err := coll.Delete(ctx, "doc1", "doc2")
// fmt.Printf("Deleted %d documents\n", result.Deleted)
Delete(ctx context.Context, ids ...string) (*DeleteResult, error)
// DeleteByFilter removes all documents matching the filter.
// This is useful for bulk deletion based on criteria.
//
// WARNING: This can delete many documents. Use with caution.
//
// Example:
//
// // Delete all expired documents
// result, err := coll.DeleteByFilter(ctx,
// Expired(),
// )
DeleteByFilter(ctx context.Context, filter Filter) (*DeleteResult, error)
// Count returns the number of documents in the collection.
// If a filter is provided, only documents matching the filter are counted.
//
// Example:
//
// total, err := coll.Count(ctx, nil) // All documents
// active, err := coll.Count(ctx, Eq("status", "active"))
Count(ctx context.Context, filter Filter) (int64, error)
// Stats returns statistics about the collection.
// This includes document count, storage size, index info, etc.
Stats(ctx context.Context) (*CollectionStats, error)
// Clear removes all documents from the collection.
// This is primarily useful for testing.
//
// WARNING: This permanently deletes all data in the collection.
Clear(ctx context.Context) error
}
Collection represents an isolated namespace for documents and embeddings. Collections enable use-case specific configurations like TTL, deduplication, scoping, and indexing strategies.
All Collection methods are safe for concurrent use.
type CollectionConfig ¶
type CollectionConfig struct {
// TTL specifies the time-to-live for documents in this collection.
// Documents are automatically deleted after TTL expires.
// Zero means no TTL (documents never expire).
TTL time.Duration
// EnableDeduplication enables content-based deduplication.
// When enabled, documents with identical embeddings (or content hashes)
// are deduplicated automatically.
EnableDeduplication bool
// DeduplicationThreshold is the similarity threshold for deduplication (0.0-1.0).
// Documents with similarity >= threshold are considered duplicates.
// Default: 0.99 (nearly identical)
DeduplicationThreshold float32
// IndexType specifies the vector index type.
// Examples: "flat", "hnsw", "ivf"
IndexType IndexType
// EmbeddingDimensions is the expected dimensionality of embeddings.
// If set, documents with different dimensions will be rejected.
// Zero means no dimension validation.
EmbeddingDimensions int
// AutoGenerateEmbeddings enables automatic embedding generation.
// When enabled, documents without embeddings will have them generated
// using the specified embedding function.
AutoGenerateEmbeddings bool
// EmbeddingFunction is the function to generate embeddings.
// Only used if AutoGenerateEmbeddings is true.
EmbeddingFunction EmbeddingFunction
// ScopeRequired specifies required scope fields.
// Documents without these scope fields will be rejected.
ScopeRequired []string
// MaxDocuments limits the number of documents in the collection.
// Oldest documents are removed when limit is exceeded (FIFO).
// Zero means no limit.
MaxDocuments int64
// EnableVersioning enables document versioning.
// Previous versions are retained and can be queried.
EnableVersioning bool
// MaxVersions limits the number of versions per document.
// Only meaningful if EnableVersioning is true.
// Zero means unlimited versions.
MaxVersions int
// EnableAuditLog enables audit logging for all operations.
EnableAuditLog bool
// Metadata contains additional provider-specific configuration.
Metadata map[string]any
}
CollectionConfig contains the configuration for a collection. This is built from CollectionOption functions.
func ApplyOptions ¶
func ApplyOptions(opts []CollectionOption) *CollectionConfig
ApplyOptions applies a list of options to a config. This is used internally by collection implementations.
func (*CollectionConfig) Validate ¶
func (c *CollectionConfig) Validate() error
Validate validates a CollectionConfig.
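In the same way, a provider's Collection method would typically funnel options through ApplyOptions. A brief sketch (myStore and myCollection are hypothetical provider types):

func (s *myStore) Collection(name string, opts ...CollectionOption) Collection {
    cfg := ApplyOptions(opts)
    // A real provider would also validate cfg and honor TTL, deduplication,
    // dimension checks, scope requirements, and so on.
    return &myCollection{name: name, cfg: cfg}
}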
type CollectionOption ¶
type CollectionOption func(*CollectionConfig)
CollectionOption configures a Collection. Options are applied when creating or accessing a collection.
func WithAuditLog ¶
func WithAuditLog(enabled bool) CollectionOption
WithAuditLog enables audit logging for the collection.
Example:
docs := store.Collection("docs",
WithAuditLog(true),
)
func WithAutoEmbeddings ¶
func WithAutoEmbeddings(fn EmbeddingFunction) CollectionOption
WithAutoEmbeddings enables automatic embedding generation.
Example:
docs := store.Collection("docs",
WithAutoEmbeddings(myEmbeddingFunc),
)
func WithDeduplication ¶
func WithDeduplication(enabled bool) CollectionOption
WithDeduplication enables content-based deduplication. Documents with similarity >= 0.99 are considered duplicates.
Example:
docs := store.Collection("docs", WithDeduplication(true))
func WithDeduplicationThreshold ¶
func WithDeduplicationThreshold(threshold float32) CollectionOption
WithDeduplicationThreshold sets the similarity threshold for deduplication. Also enables deduplication.
Example:
docs := store.Collection("docs", WithDeduplicationThreshold(0.95))
func WithDimensions ¶
func WithDimensions(dimensions int) CollectionOption
WithDimensions sets the expected embedding dimensions. Documents with different dimensions will be rejected.
Example:
docs := store.Collection("docs", WithDimensions(768))
func WithIndexing ¶
func WithIndexing(indexType IndexType) CollectionOption
WithIndexing sets the vector index type.
Example:
docs := store.Collection("docs", WithIndexing(IndexTypeHNSW))
func WithMaxDocuments ¶
func WithMaxDocuments(max int64) CollectionOption
WithMaxDocuments limits the collection size. Oldest documents are removed when limit is exceeded.
Example:
cache := store.Collection("cache",
WithMaxDocuments(10000),
)
func WithMaxVersions ¶
func WithMaxVersions(max int) CollectionOption
WithMaxVersions limits the number of versions per document. Also enables versioning.
Example:
docs := store.Collection("docs",
WithMaxVersions(5),
)
func WithMetadata ¶
func WithMetadata(metadata map[string]any) CollectionOption
WithMetadata adds custom metadata to the collection config.
Example:
docs := store.Collection("docs",
WithMetadata(map[string]any{
"shard_count": 4,
"replication": 3,
}),
)
func WithScope ¶
func WithScope(fields ...string) CollectionOption
WithScope specifies required scope fields. Documents without these fields will be rejected.
Example:
memory := store.Collection("memory",
WithScope("user", "session"),
)
func WithTTL ¶
func WithTTL(ttl time.Duration) CollectionOption
WithTTL sets the time-to-live for documents in the collection.
Example:
cache := store.Collection("cache", WithTTL(5*time.Minute))
func WithVersioning ¶
func WithVersioning(enabled bool) CollectionOption
WithVersioning enables document versioning.
Example:
docs := store.Collection("docs",
WithVersioning(true),
)
type CollectionStats ¶
type CollectionStats struct {
// Name is the collection name
Name string
// Documents is the number of documents in the collection
Documents int64
// StorageBytes is the storage used by this collection in bytes
StorageBytes int64
// EmbeddingDimensions is the dimensionality of embeddings in this collection
EmbeddingDimensions int
// IndexType is the index type used (e.g., "flat", "hnsw", "ivf")
IndexType string
// CreatedAt is when the collection was created
CreatedAt *TimestampValue
// UpdatedAt is when the collection was last modified
UpdatedAt *TimestampValue
// Extra contains provider-specific statistics
Extra map[string]any
}
CollectionStats contains statistics about a specific collection.
type Content ¶
type Content struct {
// Type indicates the content type
Type ContentType
// Text is the text content (for ContentTypeText)
Text string
// Data is the binary data (for images, audio, video)
// Stored as base64-encoded string for JSON serialization
Data []byte
// MimeType is the MIME type of the content (e.g., "image/jpeg", "audio/mp3")
MimeType string
// URL is an optional external URL for the content
// Useful for referencing large media without storing it inline
URL string
// Chunks contains text chunks for long documents
// Useful for document splitting and retrieval
Chunks []string
}
Content represents multi-modal document content. A document can contain text, images, audio, or video.
func NewAudioContent ¶
NewAudioContent creates a Content with audio data.
func NewImageContent ¶
NewImageContent creates a Content with image data.
func NewImageURL ¶
NewImageURL creates a Content with an image URL reference.
func NewTextContent ¶
NewTextContent creates a Content with text.
func NewVideoContent ¶
NewVideoContent creates a Content with video data.
func (*Content) DataBase64 ¶
DataBase64 returns the binary data as a base64-encoded string. This is useful for JSON serialization.
type ContentType ¶
type ContentType string
ContentType represents the type of content in a document.
const (
// ContentTypeText represents text content
ContentTypeText ContentType = "text"
// ContentTypeImage represents image content (JPEG, PNG, WebP, etc.)
ContentTypeImage ContentType = "image"
// ContentTypeAudio represents audio content (MP3, WAV, etc.)
ContentTypeAudio ContentType = "audio"
// ContentTypeVideo represents video content (MP4, WebM, etc.)
ContentTypeVideo ContentType = "video"
// ContentTypeMultimodal represents mixed content types
ContentTypeMultimodal ContentType = "multimodal"
)
type DeleteResult ¶
type DeleteResult struct {
// Deleted is the number of documents actually deleted
Deleted int64
// NotFound is the number of IDs that were not found
NotFound int64
// NotFoundIDs contains the IDs that were not found
NotFoundIDs []string
// Timing contains operation timing information
Timing *OperationTiming
}
DeleteResult contains the results of a delete operation.
func (*DeleteResult) Success ¶
func (r *DeleteResult) Success() bool
Success returns true if at least one document was deleted.
type DistanceMetric ¶
type DistanceMetric string
DistanceMetric represents the method for calculating vector similarity.
const (
// DistanceMetricCosine calculates cosine similarity (default)
// Range: -1 (opposite) to 1 (identical)
// Best for: Most text embeddings (normalized vectors)
DistanceMetricCosine DistanceMetric = "cosine"
// DistanceMetricEuclidean calculates Euclidean (L2) distance
// Range: 0 (identical) to infinity (different)
// Best for: When magnitude matters
DistanceMetricEuclidean DistanceMetric = "euclidean"
// DistanceMetricDotProduct calculates dot product similarity
// Range: -infinity to +infinity
// Best for: Normalized vectors, faster than cosine
DistanceMetricDotProduct DistanceMetric = "dot_product"
// DistanceMetricManhattan calculates Manhattan (L1) distance
// Range: 0 (identical) to infinity (different)
// Best for: High-dimensional sparse vectors
DistanceMetricManhattan DistanceMetric = "manhattan"
// DistanceMetricHamming calculates Hamming distance (for binary vectors)
// Range: 0 (identical) to vector_length (completely different)
// Best for: Binary embeddings
DistanceMetricHamming DistanceMetric = "hamming"
)
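The metric is chosen per query via Query.Metric; cosine similarity is used when the field is left empty. A short sketch, assuming vec is a []float32 produced by your embedding model:

query := &Query{
    Embedding: NewEmbedding(vec, "text-embedding-3-small"),
    Metric:    DistanceMetricDotProduct, // assumes normalized embeddings
    Limit:     10,
}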
type Document ¶
type Document struct {
// ID is the unique identifier for the document.
// Must be unique within a collection.
// IDs should be URL-safe strings (alphanumeric, hyphens, underscores).
ID string
// Content is the multi-modal content of the document.
// Supports text, images, audio, video.
Content *Content
// Embedding is the vector representation of the content.
// Can be nil if embeddings are generated server-side.
Embedding *Embedding
// Scope defines hierarchical context for the document.
// Useful for multi-tenancy, user isolation, session tracking.
// Example: {Tenant: "acme", User: "user123", Session: "sess456"}
Scope *Scope
// Temporal contains time-related information for the document.
// Useful for TTL, time-based queries, event ordering.
Temporal *Temporal
// Tags are indexed labels for efficient filtering.
// Unlike metadata, tags are optimized for equality queries.
// Examples: ["product", "electronics", "featured"]
Tags []string
// Metadata contains additional free-form information.
// Use this for data that doesn't fit into typed fields.
// Keys should be alphanumeric with underscores (no special chars).
Metadata map[string]any
// Score is the similarity score (populated during queries).
// Not stored, only returned in query results.
Score float32 `json:"-"`
// Distance is the raw distance metric (populated during queries).
// Not stored, only returned in query results.
Distance float32 `json:"-"`
}
Document represents a document with embeddings, content, and metadata. Documents are the primary unit of storage in a vector database.
Unlike the previous version which only had a string Content field, this enhanced Document supports:
- Multi-modal content (text, image, audio, video)
- Typed scope fields for multi-tenancy
- Temporal information for time-based queries
- Tags for efficient filtering
- Structured metadata separate from free-form data
type Embedding ¶
type Embedding struct {
// Vector is the embedding vector
Vector []float32
// Model is the name of the model that generated this embedding
// Examples: "text-embedding-3-small", "clip-vit-base-patch32"
Model string
// Dimensions is the dimensionality of the vector
// Automatically set from len(Vector)
Dimensions int
// Normalized indicates whether the vector is normalized (unit length)
// Many distance metrics work better with normalized vectors
Normalized bool
}
Embedding represents a vector embedding with metadata.
func NewEmbedding ¶
NewEmbedding creates a new Embedding from a vector and model name.
func NewNormalizedEmbedding ¶
NewNormalizedEmbedding creates a normalized embedding (unit length). The vector is normalized in-place.
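A short sketch of both constructors. NewEmbedding's arguments match the usage shown in the package examples; NewNormalizedEmbedding is assumed here to take the same (vector, model) arguments:

emb := NewEmbedding([]float32{0.1, 0.2, 0.3}, "text-embedding-3-small")
fmt.Println(emb.Dimensions) // 3, set automatically from len(Vector)

// Normalizing is useful when querying with dot-product similarity.
unit := NewNormalizedEmbedding([]float32{3, 4}, "text-embedding-3-small")
fmt.Println(unit.Normalized) // expected: true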
type EmbeddingFunction ¶
EmbeddingFunction generates embeddings for content.
type ExplainStep ¶
type ExplainStep struct {
// Name is the step name
Name string
// Duration is how long this step took
Duration time.Duration
// Details contains step-specific information
Details map[string]any
}
ExplainStep represents a single step in query execution.
type Filter ¶
type Filter interface {
// contains filtered or unexported methods
}
Filter represents a condition for filtering documents. Filters can be combined using And(), Or(), Not().
func AnyTagFilter ¶
AnyTagFilter creates a filter that matches documents with any of the specified tags.
func CreatedAfter ¶
CreatedAfter filters documents created after a time.
func CreatedBefore ¶
CreatedBefore filters documents created before a time.
func ExpiresAfter ¶
ExpiresAfter filters documents that expire after a time.
func ExpiresBefore ¶
ExpiresBefore filters documents that expire before a time.
func FieldFilter ¶
func FieldFilter(field string, operator FilterOperator, value any) Filter
FieldFilter creates a filter on a metadata field.
func GetFilters ¶
GetFilters returns all filters in the composite filter. This is used internally by providers to decompose complex filters.
func GetNotFilter ¶
GetNotFilter extracts the inner filter from a NOT filter.
func ScopeFilter ¶
ScopeFilter creates a filter based on scope. Documents must match all non-empty scope fields.
func ScoreAbove ¶
ScoreAbove filters results with score > threshold.
func ScoreAtLeast ¶
ScoreAtLeast filters results with score >= threshold.
func ScoreBelow ¶
ScoreBelow filters results with score < threshold.
func ScoreFilter ¶
func ScoreFilter(operator FilterOperator, value float32) Filter
ScoreFilter creates a filter based on similarity score. Only applies during vector similarity queries.
func SessionFilter ¶
SessionFilter creates a filter for a specific session.
func StartsWith ¶
StartsWith creates a prefix filter.
func TagsFilter ¶
TagsFilter creates a filter that matches documents with all specified tags.
func TenantFilter ¶
TenantFilter creates a filter for a specific tenant.
func TimeFilter ¶
func TimeFilter(field TimeField, operator FilterOperator, value time.Time) Filter
TimeFilter creates a time-based filter.
func UpdatedAfter ¶
UpdatedAfter filters documents updated after a time.
func UpdatedBefore ¶
UpdatedBefore filters documents updated before a time.
func UserFilter ¶
UserFilter creates a filter for a specific user.
type FilterOperator ¶
type FilterOperator string
FilterOperator represents a comparison operator.
const (
// OpEqual checks for equality (==)
OpEqual FilterOperator = "eq"
// OpNotEqual checks for inequality (!=)
OpNotEqual FilterOperator = "ne"
// OpGreaterThan checks if field > value
OpGreaterThan FilterOperator = "gt"
// OpGreaterThanOrEqual checks if field >= value
OpGreaterThanOrEqual FilterOperator = "gte"
// OpLessThan checks if field < value
OpLessThan FilterOperator = "lt"
// OpLessThanOrEqual checks if field <= value
OpLessThanOrEqual FilterOperator = "lte"
// OpIn checks if field is in a set of values
OpIn FilterOperator = "in"
// OpNotIn checks if field is not in a set of values
OpNotIn FilterOperator = "nin"
// OpContains checks if a string field contains a substring
OpContains FilterOperator = "contains"
// OpStartsWith checks if a string field starts with a prefix
OpStartsWith FilterOperator = "startswith"
// OpEndsWith checks if a string field ends with a suffix
OpEndsWith FilterOperator = "endswith"
// OpExists checks if a field exists (value is ignored)
OpExists FilterOperator = "exists"
// OpNotExists checks if a field does not exist (value is ignored)
OpNotExists FilterOperator = "notexists"
)
func GetFieldFilter ¶
func GetFieldFilter(f Filter) (field string, op FilterOperator, value any, ok bool)
GetFieldFilter extracts field, operator, and value from a field filter. Returns empty values if not a field filter.
func GetScoreFilter ¶
func GetScoreFilter(f Filter) (op FilterOperator, value float32, ok bool)
GetScoreFilter extracts operator and value from a score filter. Returns zero values if not a score filter.
type IndexType ¶
type IndexType string
IndexType represents the type of vector index.
const (
// IndexTypeFlat performs brute-force (exact) search.
// Best for: Small collections (<10K documents), maximum accuracy.
IndexTypeFlat IndexType = "flat"
// IndexTypeHNSW uses Hierarchical Navigable Small World graph.
// Best for: Large collections, good balance of speed and accuracy.
IndexTypeHNSW IndexType = "hnsw"
// IndexTypeIVF uses Inverted File with Product Quantization.
// Best for: Very large collections, faster but less accurate.
IndexTypeIVF IndexType = "ivf"
// IndexTypeAuto lets the provider choose based on collection size.
IndexTypeAuto IndexType = "auto"
)
type Match ¶
type Match struct {
// Document is the matched document
Document *Document
// Score is the similarity score (higher is more similar)
// For cosine similarity: -1 (opposite) to 1 (identical)
// For euclidean: normalized to 0-1 range
Score float32
// Distance is the raw distance metric (optional)
// The interpretation depends on the distance metric used
Distance float32
// Rank is the result rank (1-based)
// Useful for hybrid ranking scenarios
Rank int
}
Match represents a single search result with similarity score.
func CollectAll ¶
func CollectAll(iter ResultIterator) ([]*Match, error)
CollectAll collects all results from an iterator into a slice. The iterator is closed after collection.
This is a convenience function for cases where you want to materialize all results. Use with caution on large result sets.
Example:
iter, err := coll.QueryStream(ctx, query)
if err != nil {
return err
}
matches, err := vectorstore.CollectAll(iter)
if err != nil {
return err
}
func CollectN ¶
func CollectN(iter ResultIterator, n int) ([]*Match, error)
CollectN collects up to N results from an iterator. The iterator is NOT closed (caller should close it).
Example:
iter, err := coll.QueryStream(ctx, query)
if err != nil {
return err
}
defer iter.Close()
// Get first 10 results
matches, err := vectorstore.CollectN(iter, 10)
type OperationTiming ¶
type OperationTiming struct {
// Total is the total operation time
Total time.Duration
// Validation is the time spent validating documents
Validation time.Duration
// Storage is the time spent writing to storage
Storage time.Duration
// IndexUpdate is the time spent updating indexes
IndexUpdate time.Duration
}
OperationTiming contains timing information for CRUD operations.
type Query ¶
type Query struct {
// Embedding is the query vector for similarity search.
// If nil, performs a pure metadata/filter query without vector similarity.
Embedding *Embedding
// Filters specifies conditions that documents must match.
// Can be combined using And(), Or(), Not() for complex queries.
Filters Filter
// Limit is the maximum number of results to return.
// Default: 10. Maximum: 10000.
Limit int
// Offset is the number of results to skip (for pagination).
// Default: 0.
Offset int
// MinScore is the minimum similarity score (0.0-1.0).
// Documents with lower scores are excluded.
// Default: 0 (no minimum).
MinScore float32
// Metric specifies how to calculate vector similarity.
// Default: Cosine similarity.
Metric DistanceMetric
// IncludeEmbeddings controls whether to return embeddings in results.
// Default: false (embeddings are large and often not needed).
IncludeEmbeddings bool
// IncludeContent controls whether to return document content in results.
// Default: true. Set to false to only get metadata/scores.
IncludeContent bool
// SortBy specifies additional sorting criteria (after similarity).
// Useful for hybrid ranking (e.g., similarity + recency).
SortBy []SortBy
// Explain requests query execution details (for debugging/optimization).
// Default: false.
Explain bool
}
Query defines parameters for similarity search. It supports vector similarity, metadata filters, temporal constraints, scope filters, pagination, and more.
func NewFilterQuery ¶
NewFilterQuery creates a filter-only query (no vector similarity).
type QueryExplain ¶
type QueryExplain struct {
// Strategy describes the query execution strategy
// Examples: "brute_force", "hnsw_index", "ivf_index"
Strategy string
// IndexUsed indicates which index was used (if any)
IndexUsed string
// ScannedDocuments is the number of documents scanned
ScannedDocuments int64
// FilteredDocuments is the number of documents after filtering
FilteredDocuments int64
// VectorComparisons is the number of vector similarity comparisons
VectorComparisons int64
// CacheHit indicates if results were served from cache
CacheHit bool
// Steps contains detailed execution steps
Steps []ExplainStep
}
QueryExplain contains detailed query execution information. This is useful for debugging and optimization.
func (*QueryExplain) ExplainString ¶
func (e *QueryExplain) ExplainString() string
ExplainString returns a human-readable explanation of query execution.
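To obtain an explanation, set Query.Explain and read QueryResult.Explain. A minimal sketch, assuming coll and queryEmbedding are set up as in the examples above:

result, err := coll.Query(ctx, &Query{
    Embedding: queryEmbedding,
    Limit:     10,
    Explain:   true,
})
if err == nil && result.Explain != nil {
    fmt.Println(result.Explain.ExplainString())
}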
type QueryResult ¶
type QueryResult struct {
// Matches are the matching documents with their scores
Matches []*Match
// Total is the total number of matches (before limit/offset)
// Useful for pagination
Total int64
// Offset is the offset that was applied
Offset int
// Limit is the limit that was applied
Limit int
// Timing contains query execution timing information
Timing *QueryTiming
// Explain contains query execution details (if requested)
Explain *QueryExplain
}
QueryResult contains the results of a similarity search query.
func (*QueryResult) AvgScore ¶
func (r *QueryResult) AvgScore() float32
AvgScore returns the average similarity score across all matches.
func (*QueryResult) Count ¶
func (r *QueryResult) Count() int
Count returns the number of matches in this result.
func (*QueryResult) Documents ¶
func (r *QueryResult) Documents() []*Document
Documents returns just the documents from matches (without scores).
func (*QueryResult) Empty ¶
func (r *QueryResult) Empty() bool
Empty returns true if there are no matches.
func (*QueryResult) FilterByScore ¶
func (r *QueryResult) FilterByScore(minScore float32) []*Match
FilterByScore returns matches with score >= minScore.
func (*QueryResult) FilterByTag ¶
func (r *QueryResult) FilterByTag(tag string) []*Match
FilterByTag returns matches with a specific tag.
func (*QueryResult) First ¶
func (r *QueryResult) First(n int) []*Match
First returns the first N matches.
func (*QueryResult) GroupByTag ¶
func (r *QueryResult) GroupByTag() map[string][]*Match
GroupByTag groups matches by tag. Each match may appear in multiple groups if it has multiple tags.
func (*QueryResult) HasMatches ¶
func (r *QueryResult) HasMatches() bool
HasMatches returns true if the query returned any matches.
func (*QueryResult) HasMore ¶
func (r *QueryResult) HasMore() bool
HasMore returns true if there are more results beyond the current page.
func (*QueryResult) IDs ¶
func (r *QueryResult) IDs() []string
IDs returns just the document IDs from matches.
func (*QueryResult) Last ¶
func (r *QueryResult) Last(n int) []*Match
Last returns the last N matches.
func (*QueryResult) MatchByID ¶
func (r *QueryResult) MatchByID(id string) *Match
MatchByID finds a match by document ID.
func (*QueryResult) MaxScore ¶
func (r *QueryResult) MaxScore() float32
MaxScore returns the highest similarity score.
func (*QueryResult) MinScore ¶
func (r *QueryResult) MinScore() float32
MinScore returns the lowest similarity score.
func (*QueryResult) NextOffset ¶
func (r *QueryResult) NextOffset() int
NextOffset returns the offset for the next page.
func (*QueryResult) PrevOffset ¶
func (r *QueryResult) PrevOffset() int
PrevOffset returns the offset for the previous page.
func (*QueryResult) Scores ¶
func (r *QueryResult) Scores() []float32
Scores returns just the scores from matches.
func (*QueryResult) Slice ¶
func (r *QueryResult) Slice(start, end int) []*Match
Slice returns a slice of matches [start:end].
func (*QueryResult) TopMatch ¶
func (r *QueryResult) TopMatch() *Match
TopMatch returns the highest scoring match, or nil if no matches.
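The helper methods above keep post-processing concise. A short sketch, assuming result is a *QueryResult and query is the *Query that produced it:

if result.HasMatches() {
    fmt.Printf("best: %s (%.2f), avg score: %.2f\n",
        result.TopMatch().Document.ID,
        result.MaxScore(),
        result.AvgScore(),
    )
}
for tag, matches := range result.GroupByTag() {
    fmt.Printf("%s: %d matches\n", tag, len(matches))
}
// Pagination: request the next page if one exists.
if result.HasMore() {
    query.Offset = result.NextOffset()
}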
type QueryTiming ¶
type QueryTiming struct {
// Total is the total query execution time
Total time.Duration
// VectorSearch is the time spent on vector similarity search
VectorSearch time.Duration
// FilterApplication is the time spent applying filters
FilterApplication time.Duration
// Retrieval is the time spent retrieving full documents
Retrieval time.Duration
// Scoring is the time spent calculating similarity scores
Scoring time.Duration
}
QueryTiming contains timing information for a query.
type ResultIterator ¶
type ResultIterator interface {
// Next advances to the next result.
// Returns true if there is a result, false if iteration is complete or an error occurred.
//
// Always check Err() after Next returns false to distinguish between
// normal completion and errors.
Next() bool
// Match returns the current search match.
// Only valid after Next returns true.
Match() *Match
// Err returns any error that occurred during iteration.
// Should be checked after Next returns false.
Err() error
// Close releases resources associated with the iterator.
// Always call Close when done, typically via defer.
Close() error
}
ResultIterator provides streaming access to query results. It follows the iterator pattern common in Go database libraries.
func FilterIterator ¶
func FilterIterator(iter ResultIterator, predicate func(*Match) bool) ResultIterator
FilterIterator applies a predicate to filter results from an iterator. Returns a new iterator with only matching results.
Example:
iter, err := coll.QueryStream(ctx, query)
if err != nil {
return err
}
defer iter.Close()
// Filter for high-scoring results
filtered := vectorstore.FilterIterator(iter, func(m *Match) bool {
return m.Score >= 0.8
})
defer filtered.Close()
func MapIterator ¶
func MapIterator(iter ResultIterator, fn func(*Match) *Match) ResultIterator
MapIterator applies a transformation function to each result. Returns a new iterator with transformed results.
Example:
iter, err := coll.QueryStream(ctx, query)
if err != nil {
return err
}
defer iter.Close()
// Boost scores
boosted := vectorstore.MapIterator(iter, func(m *Match) *Match {
m.Score *= 1.2
if m.Score > 1.0 {
m.Score = 1.0
}
return m
})
defer boosted.Close()
func NewChannelIterator ¶
func NewChannelIterator(matches <-chan *Match, errs <-chan error) ResultIterator
NewChannelIterator creates a ResultIterator from channels. The match channel should be closed when done. The error channel receives at most one error.
func NewEmptyIterator ¶
func NewEmptyIterator() ResultIterator
NewEmptyIterator creates a ResultIterator with no results.
func NewErrorIterator ¶
func NewErrorIterator(err error) ResultIterator
NewErrorIterator creates a ResultIterator that returns an error.
func NewSliceIterator ¶
func NewSliceIterator(matches []*Match) ResultIterator
NewSliceIterator creates a ResultIterator from a slice of matches. This is a helper for providers that materialize all results in memory.
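These constructors are intended for provider implementations of QueryStream. A provider that materializes results in memory can simply wrap them in a slice iterator; a sketch (myCollection is a hypothetical provider type):

func (c *myCollection) QueryStream(ctx context.Context, q *Query) (ResultIterator, error) {
    res, err := c.Query(ctx, q)
    if err != nil {
        return nil, err
    }
    return NewSliceIterator(res.Matches), nil
}

Providers that stream from a backend cursor would use NewChannelIterator instead, feeding matches from a producer goroutine.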
type Scope ¶
type Scope struct {
// Tenant is the top-level scope (organization, workspace, team)
Tenant string
// User is the user identifier
User string
// Session is the session identifier
Session string
// Agent is the agent identifier (for multi-agent systems)
Agent string
// Thread is the conversation thread identifier
Thread string
// Custom contains additional custom scope dimensions
// Use this for domain-specific scoping needs
Custom map[string]string
}
Scope defines hierarchical context for a document. This enables multi-tenancy, user isolation, and session tracking.
Scope fields are indexed separately for efficient filtering. All fields are optional and can be combined as needed.
func GetScopeFilter ¶
GetScopeFilter extracts the scope from a scope filter. Returns nil if not a scope filter.
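A short sketch of scoping a query to a single tenant and user (coll, ctx, and queryEmbedding are assumed to be set up as in the examples above):

scope := &Scope{
    Tenant: "acme",
    User:   "user123",
}
result, err := coll.Query(ctx, &Query{
    Embedding: queryEmbedding,
    Filters:   ScopeFilter(scope),
    Limit:     10,
})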
type SortBy ¶
type SortBy struct {
// Field is the field name to sort by
// Can be a metadata field, score, or temporal field
Field string
// Descending indicates descending order (default is ascending)
Descending bool
}
SortBy specifies a field to sort by.
func SortByCreatedAt ¶
SortByCreatedAt creates a sort by creation time.
func SortByField ¶
SortByField creates a sort by metadata field.
func SortByScore ¶
func SortByScore() SortBy
SortByScore creates a sort by similarity score (descending by default).
func SortByUpdatedAt ¶
SortByUpdatedAt creates a sort by update time.
type StoreStats ¶
type StoreStats struct {
// Collections is the total number of collections
Collections int64
// Documents is the total number of documents across all collections
Documents int64
// StorageBytes is the total storage used in bytes
StorageBytes int64
// Provider is the vector store provider name (e.g., "memory", "firestore", "qdrant")
Provider string
// Version is the provider version
Version string
// Extra contains provider-specific statistics
Extra map[string]any
}
StoreStats contains statistics about the entire vector store.
type Temporal ¶
type Temporal struct {
// CreatedAt is when the document was created
CreatedAt time.Time
// UpdatedAt is when the document was last updated
UpdatedAt time.Time
// ExpiresAt is when the document should expire (optional)
// Used for automatic cleanup in caching scenarios
ExpiresAt *time.Time
// EventTime is the time of the event this document represents (optional)
// Used for time-series data and event ordering
EventTime *time.Time
// ValidFrom is when this document becomes valid (optional)
// Used for future-dated content
ValidFrom *time.Time
// ValidUntil is when this document stops being valid (optional)
// Different from ExpiresAt - this is semantic validity, not storage
ValidUntil *time.Time
}
Temporal contains time-related information for a document. This enables TTL, time-based queries, and event ordering.
func NewTemporal ¶
func NewTemporal() *Temporal
NewTemporal creates a Temporal with creation time set to now.
func NewTemporalWithTTL ¶
NewTemporalWithTTL creates a Temporal with TTL.
func (*Temporal) IsValid ¶
IsValid checks if the document is currently valid (within ValidFrom/ValidUntil range).
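A short sketch of the caching pattern: give documents a TTL at write time, then exclude expired entries at query time (coll and ctx are assumed to be set up as in the examples above):

doc := &Document{
    ID:       "cache-entry-1",
    Content:  NewTextContent("cached answer"),
    Temporal: NewTemporalWithTTL(10 * time.Minute),
}
_, err := coll.Upsert(ctx, doc)

// Later: only consider entries that have not expired yet.
result, err := coll.Query(ctx, &Query{
    Filters: NotExpired(),
    Limit:   100,
})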
type TimeField ¶
type TimeField string
TimeField represents a temporal field to filter on.
const (
// TimeFieldCreatedAt filters on document creation time
TimeFieldCreatedAt TimeField = "created_at"
// TimeFieldUpdatedAt filters on document update time
TimeFieldUpdatedAt TimeField = "updated_at"
// TimeFieldExpiresAt filters on document expiration time
TimeFieldExpiresAt TimeField = "expires_at"
// TimeFieldEventTime filters on event time
TimeFieldEventTime TimeField = "event_time"
// TimeFieldValidFrom filters on validity start time
TimeFieldValidFrom TimeField = "valid_from"
// TimeFieldValidUntil filters on validity end time
TimeFieldValidUntil TimeField = "valid_until"
)
type TimestampValue ¶
TimestampValue represents a timestamp that can be stored and queried. This is a helper type for stats and results.
func NewTimestamp ¶
func NewTimestamp(t time.Time) *TimestampValue
NewTimestamp creates a TimestampValue from a time.Time.
type UpsertResult ¶
type UpsertResult struct {
// Inserted is the number of new documents inserted
Inserted int64
// Updated is the number of existing documents updated
Updated int64
// Failed is the number of documents that failed validation/insertion
Failed int64
// FailedIDs contains the IDs of documents that failed (if any)
FailedIDs []string
// Errors contains errors for failed documents (parallel to FailedIDs)
Errors []error
// Timing contains operation timing information
Timing *OperationTiming
// Deduplicated is the number of documents that were deduplicated
// Only set if deduplication is enabled
Deduplicated int64
// DeduplicatedIDs contains the IDs of deduplicated documents
DeduplicatedIDs []string
}
UpsertResult contains the results of an upsert operation.
func (*UpsertResult) PartialSuccess ¶
func (r *UpsertResult) PartialSuccess() bool
PartialSuccess returns true if some documents succeeded but some failed.
func (*UpsertResult) Success ¶
func (r *UpsertResult) Success() bool
Success returns true if all documents were successfully upserted.
func (*UpsertResult) TotalProcessed ¶
func (r *UpsertResult) TotalProcessed() int64
TotalProcessed returns the total number of documents processed.
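With ContinueOnError (the batch default), failures are reported through the result rather than aborting the call. A short sketch of inspecting a partial success:

result, err := coll.UpsertBatch(ctx, docs, WithContinueOnError(true))
if err != nil {
    return err
}
if result.PartialSuccess() {
    for i, id := range result.FailedIDs {
        fmt.Printf("document %s failed: %v\n", id, result.Errors[i])
    }
}
fmt.Printf("processed %d documents (%d inserted, %d updated)\n",
    result.TotalProcessed(), result.Inserted, result.Updated)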
type VectorStore ¶
type VectorStore interface {
// Collection returns a Collection with the specified name and options.
// Collections provide isolated namespaces for documents and embeddings.
//
// The name must be a valid collection identifier (alphanumeric, hyphens, underscores).
// Options configure behavior like TTL, deduplication, indexing, etc.
//
// Collections are created lazily on first use. Calling Collection multiple times
// with the same name returns the same logical collection.
//
// Example:
//
// cache := store.Collection("cache", WithTTL(5*time.Minute))
// docs := store.Collection("documents", WithIndexing(IndexTypeHNSW))
Collection(name string, opts ...CollectionOption) Collection
// ListCollections returns the names of all collections in this store.
// This is useful for administration and debugging.
ListCollections(ctx context.Context) ([]string, error)
// DeleteCollection permanently deletes a collection and all its documents.
// This operation cannot be undone.
//
// Returns an error if the collection doesn't exist or cannot be deleted.
DeleteCollection(ctx context.Context, name string) error
// Stats returns statistics about the vector store.
// This includes total collections, documents, storage size, etc.
Stats(ctx context.Context) (*StoreStats, error)
// Close closes the connection to the vector database and releases resources.
// After Close is called, the VectorStore should not be used.
Close() error
}
VectorStore is the top-level interface for vector database operations. It provides methods for creating and managing collections, which are isolated namespaces for documents and embeddings.
VectorStore acts as a factory for Collections, enabling multi-tenancy, use-case isolation, and organized storage of embeddings.
Example:
store, err := memory.New()
if err != nil {
log.Fatal(err)
}
defer store.Close()
// Create collection for semantic caching
cache := store.Collection("cache",
WithTTL(5*time.Minute),
WithDeduplication(true),
)
// Create collection for agent memory
memory := store.Collection("agent-memory",
WithScope("user", "session"),
)