pgvector

package module
v0.5.4 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 11, 2026 License: Apache-2.0 Imports: 7 Imported by: 0

README

goagent/memory/vector/pgvector

PostgreSQL + pgvector store for goagent.

Implements goagent.VectorStore over PostgreSQL with the pgvector extension. Supports cosine, L2, and inner-product distance functions with HNSW indexing.

go get github.com/Germanblandin1/goagent/memory/vector/pgvector

Documentation

Documentation

Overview

Package pgvector implements goagent.VectorStore over PostgreSQL with the pgvector extension.

The caller describes their existing table via TableConfig — this package does not impose any schema. For a quick start without an existing table, use Migrate.

Metadata filtering

When MetadataColumn is set to a JSONB column, Search supports goagent.WithFilter to restrict results server-side using PostgreSQL's JSONB containment operator (@>). All key-value pairs in the filter map must be present in the stored metadata (AND semantics).

For best performance on large tables, create a GIN index on the metadata column:

CREATE INDEX ON embeddings USING gin(metadata jsonb_path_ops);

Without the index, PostgreSQL falls back to a sequential scan. For tables under ~100k rows, the sequential scan is typically fast enough.

Score threshold

goagent.WithScoreThreshold is applied in Go after the database returns results. topK is applied by the database first, so a selective threshold may yield fewer than topK results.

Index

Examples

Constants

This section is empty.

Variables

View Source
var (
	// Cosine uses cosine distance (<=>).
	// Score is 1 - distance, in [0, 1] for normalised vectors.
	// Recommended for most text embedding models.
	// HNSW operator class: vector_cosine_ops.
	Cosine = DistanceFunc{"<=>", "vector_cosine_ops"}

	// L2 uses Euclidean distance (<->).
	// Score is 1 / (1 + distance), in (0, 1].
	// Use when the model produces non-normalised vectors.
	// HNSW operator class: vector_l2_ops.
	L2 = DistanceFunc{"<->", "vector_l2_ops"}

	// InnerProduct uses negative inner product (<#>).
	// Score is the inner product (negated back to positive).
	// Equivalent to cosine similarity for normalised vectors;
	// marginally faster on some hardware.
	// HNSW operator class: vector_ip_ops.
	InnerProduct = DistanceFunc{"<#>", "vector_ip_ops"}
)

Functions

func Migrate

func Migrate(ctx context.Context, db *sql.DB, cfg MigrateConfig) error

Migrate creates the vector extension, table, and HNSW index if they do not exist. It is idempotent — safe to call multiple times without error. The created table has columns: id TEXT PK, embedding vector(Dims), content TEXT, metadata JSONB, created_at TIMESTAMPTZ.

To use the created table with New:

cfg := pgvector.MigrateConfig{TableName: "goagent_embeddings", Dims: 1536}
if err := pgvector.Migrate(ctx, db, cfg); err != nil { ... }

store, err := pgvector.New(db, pgvector.TableConfig{
    Table:          cfg.TableName,
    IDColumn:       "id",
    VectorColumn:   "embedding",
    TextColumn:     "content",
    MetadataColumn: "metadata",
})
Example
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	"github.com/Germanblandin1/goagent/memory/vector/pgvector"
	_ "github.com/jackc/pgx/v5/stdlib"
)

func main() {
	ctx := context.Background()
	db, err := sql.Open("pgx", "postgres://user:pass@localhost/mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	cfg := pgvector.MigrateConfig{
		TableName: "goagent_embeddings",
		Dims:      1536, // match your embedding model (e.g. text-embedding-3-small)
	}
	if err := pgvector.Migrate(ctx, db, cfg); err != nil {
		log.Fatal(err)
	}

	store, err := pgvector.New(db, pgvector.TableConfig{
		Table:          cfg.TableName,
		IDColumn:       "id",
		VectorColumn:   "embedding",
		TextColumn:     "content",
		MetadataColumn: "metadata",
	})
	if err != nil {
		log.Fatal(err)
	}
	_ = store
	fmt.Println("store ready")
}

Types

type DistanceFunc

type DistanceFunc struct {
	// contains filtered or unexported fields
}

DistanceFunc defines the vector distance operator used for similarity search and the corresponding HNSW operator class for index creation.

The DistanceFunc passed to New and the operator class used in Migrate must match — a mismatch causes pgvector to skip the index and fall back to a sequential scan.

The three built-in values (Cosine, L2, InnerProduct) cover all operators supported by pgvector. For custom or future operators use NewDistanceFunc.

func NewDistanceFunc

func NewDistanceFunc(operator, opsClass string) DistanceFunc

NewDistanceFunc constructs a DistanceFunc for a custom or future pgvector operator. operator is the SQL infix operator (e.g. "<=>") and opsClass is the HNSW operator class (e.g. "vector_cosine_ops").

Use the built-in values (Cosine, L2, InnerProduct) for standard pgvector operators.

type MigrateConfig

type MigrateConfig struct {
	// TableName is the name of the table to create. Default: "goagent_embeddings".
	TableName string

	// Dims is the vector dimension. Required. Must match the embedding model
	// used (768, 1024, 1536, etc.).
	Dims int

	// DistanceFunc sets the HNSW operator class for the index.
	// Must match the WithDistanceFunc option passed to New.
	// Default: Cosine.
	DistanceFunc DistanceFunc

	// HNSWm is the m parameter of the HNSW index (connections per node).
	// Common values: 8 (reduced memory), 16 (default), 32 (higher recall).
	// Default: 16.
	HNSWm int

	// HNSWefConstruction is the candidate set size during index construction.
	// Higher value = more accurate but slower to build.
	// Default: 64.
	HNSWefConstruction int
}

MigrateConfig configures the table that Migrate will create. Use this when the caller has no existing table and wants a quick start. For production, review the HNSW index parameters based on expected volume.

type Querier

type Querier interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
	QueryContext(ctx context.Context, query string, args ...any) (*sql.Rows, error)
}

Querier is the minimal interface the Store needs from a connection. Both *sql.DB and *sql.Tx satisfy it, as does any pgx adapter that wraps them.

type Store

type Store struct {
	// contains filtered or unexported fields
}

Store implements goagent.VectorStore over PostgreSQL with the pgvector extension.

func New

func New(db Querier, cfg TableConfig, opts ...StoreOption) (*Store, error)

New creates a Store backed by db using the given TableConfig and options. Returns an error if any required field is missing or contains invalid characters.

Example
package main

import (
	"database/sql"
	"fmt"
	"log"

	"github.com/Germanblandin1/goagent/memory/vector/pgvector"
	_ "github.com/jackc/pgx/v5/stdlib"
)

func main() {
	db, err := sql.Open("pgx", "postgres://user:pass@localhost/mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	store, err := pgvector.New(db, pgvector.TableConfig{
		Table:          "embeddings",
		IDColumn:       "id",
		VectorColumn:   "embedding",
		TextColumn:     "content",
		MetadataColumn: "metadata", // optional; omit if your table has no metadata column
	})
	if err != nil {
		log.Fatal(err)
	}
	_ = store
	fmt.Println("store created")
}

func (*Store) BulkDelete added in v0.5.4

func (s *Store) BulkDelete(ctx context.Context, ids []string) error

BulkDelete removes all entries with the given ids in a single query using a parameterized IN clause. IDs that do not exist are silently ignored.

func (*Store) BulkUpsert added in v0.5.4

func (s *Store) BulkUpsert(ctx context.Context, entries []goagent.UpsertEntry) error

BulkUpsert stores or updates all entries in a single database transaction when the underlying connection supports BeginTx (e.g. *sql.DB). Otherwise entries are upserted sequentially with individual calls. The operation is idempotent; duplicate IDs within entries follow last-write-wins semantics.

func (*Store) Delete

func (s *Store) Delete(ctx context.Context, id string) error

Delete removes the entry with the given id from the store. It is a no-op if id does not exist.

func (*Store) Search

func (s *Store) Search(ctx context.Context, vec []float32, topK int, opts ...goagent.SearchOption) ([]goagent.ScoredMessage, error)

Search returns the topK messages most similar to vec, ordered by similarity descending. Each returned Message has RoleDocument so it is never forwarded to a provider.

WithFilter applies a JSONB containment filter (metadata @> filter::jsonb) server-side. Requires MetadataColumn to be set in TableConfig; silently ignored otherwise. topK is applied after the filter by the database, so fewer than topK results may be returned when the filter is selective. WithScoreThreshold is applied post-query in Go.

Example (WithFilter)

ExampleStore_Search_withFilter demonstrates metadata filtering using goagent.WithFilter.

The filter is applied server-side with PostgreSQL's JSONB containment operator (metadata @> filter::jsonb), so only rows whose metadata contains all specified key-value pairs are candidates for similarity ranking.

Scenario: a shared embedding table used by multiple teams. Each document is tagged with "team" and "language". A query for the engineering team must not surface documents owned by other teams, even if they are semantically close.

For large tables, create a GIN index to make the JSONB filter index-assisted:

CREATE INDEX ON goagent_embeddings USING gin(metadata jsonb_path_ops);
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	"github.com/Germanblandin1/goagent"
	"github.com/Germanblandin1/goagent/memory/vector/pgvector"
	_ "github.com/jackc/pgx/v5/stdlib"
)

func main() {
	ctx := context.Background()
	db, err := sql.Open("pgx", "postgres://user:pass@localhost/mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	store, err := pgvector.New(db, pgvector.TableConfig{
		Table:          "goagent_embeddings",
		IDColumn:       "id",
		VectorColumn:   "embedding",
		TextColumn:     "content",
		MetadataColumn: "metadata", // required for WithFilter
	})
	if err != nil {
		log.Fatal(err)
	}

	// Upsert documents with team and language metadata.
	// In production this step runs at index time, not at query time.
	docs := []struct {
		id   string
		vec  []float32
		text string
		meta map[string]any
	}{
		{
			id:   "eng-go-001",
			vec:  []float32{0.1, 0.9, 0.0},
			text: "Error handling patterns in Go use explicit return values.",
			meta: map[string]any{"team": "engineering", "language": "go"},
		},
		{
			id:   "eng-py-001",
			vec:  []float32{0.1, 0.85, 0.05},
			text: "Python exceptions propagate up the call stack automatically.",
			meta: map[string]any{"team": "engineering", "language": "python"},
		},
		{
			id:   "mkt-001",
			vec:  []float32{0.15, 0.8, 0.1},
			text: "Our error-free delivery guarantee is central to our brand.",
			meta: map[string]any{"team": "marketing", "language": "en"},
		},
	}
	for _, d := range docs {
		msg := goagent.Message{
			Role:     goagent.RoleDocument,
			Content:  []goagent.ContentBlock{goagent.TextBlock(d.text)},
			Metadata: d.meta,
		}
		if err := store.Upsert(ctx, d.id, d.vec, msg); err != nil {
			log.Fatal(err)
		}
	}

	// Query embedding for "how do I handle errors".
	queryVec := []float32{0.1, 0.88, 0.02}

	// Without filter: all three documents are candidates.
	// With filter: only the Go engineering document qualifies, even though the
	// marketing document has a similar embedding.
	results, err := store.Search(ctx, queryVec, 5,
		goagent.WithFilter(map[string]any{
			"team":     "engineering",
			"language": "go",
		}),
	)
	if err != nil {
		log.Fatal(err)
	}

	for _, r := range results {
		fmt.Printf("score=%.2f team=%s text=%s\n",
			r.Score,
			r.Message.Metadata["team"],
			r.Message.TextContent(),
		)
	}
}
Example (WithScoreThresholdAndFilter)

ExampleStore_Search_withScoreThresholdAndFilter demonstrates combining a score threshold with a metadata filter.

The threshold is applied after the database returns topK results. The filter is applied server-side before ranking. Together they let you express: "give me the top 10 engineering docs, but only if they are at least 80% similar to my query".

package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	"github.com/Germanblandin1/goagent"
	"github.com/Germanblandin1/goagent/memory/vector/pgvector"
	_ "github.com/jackc/pgx/v5/stdlib"
)

func main() {
	ctx := context.Background()
	db, err := sql.Open("pgx", "postgres://user:pass@localhost/mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	store, err := pgvector.New(db, pgvector.TableConfig{
		Table:          "goagent_embeddings",
		IDColumn:       "id",
		VectorColumn:   "embedding",
		TextColumn:     "content",
		MetadataColumn: "metadata",
	})
	if err != nil {
		log.Fatal(err)
	}

	queryVec := []float32{0.1, 0.88, 0.02}

	results, err := store.Search(ctx, queryVec, 10,
		goagent.WithFilter(map[string]any{"team": "engineering"}),
		goagent.WithScoreThreshold(0.80), // discard results below 80% similarity
	)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("results above threshold: %d\n", len(results))
}

func (*Store) Upsert

func (s *Store) Upsert(ctx context.Context, id string, vec []float32, msg goagent.Message) error

Upsert stores or updates the message and its embedding vector under id. The operation is idempotent: calling Upsert twice with the same id replaces the first entry with the second. Only the text content and metadata from msg are persisted — Role and ToolCalls are not stored.

type StoreOption

type StoreOption func(*storeOptions)

StoreOption configures the behaviour of a Store.

func WithDistanceFunc

func WithDistanceFunc(d DistanceFunc) StoreOption

WithDistanceFunc sets the distance operator used in Search. Must match the operator class of the table's vector index. Default: Cosine.

type TableConfig

type TableConfig struct {
	// Table is the table or view name. May include schema: "public.embeddings".
	Table string

	// IDColumn is the PRIMARY KEY column of type TEXT.
	IDColumn string

	// VectorColumn is the column of type vector(n) where n is the embedding
	// model dimension (e.g. 1536 for text-embedding-3-small, 768 for
	// nomic-embed-text, 1024 for voyage-3).
	VectorColumn string

	// TextColumn is the TEXT column containing the chunk text.
	// Its value is returned as a goagent.ContentBlock of type text in ScoredMessage.
	TextColumn string

	// MetadataColumn is optional. If non-empty, it must be a JSONB column.
	// Its content is deserialized into Message.Metadata of each ScoredMessage.
	MetadataColumn string
}

TableConfig describes the schema of the caller's vector table. No field has a default — the caller must be explicit. All fields are required except MetadataColumn.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL