vector

package
v0.1.8-rc.22

Warning

This package is not in the latest version of its module.

Published: Mar 29, 2026 License: Apache-2.0 Imports: 33 Imported by: 0

Documentation

Overview

Package vector provides the vector store used for Genie's semantic memory: embedding and searching over documents (runbooks, synced data from Drive, Gmail, Slack, etc.) so the agent can retrieve relevant context via memory_search.

It solves the problem of giving the agent access to large, heterogeneous corpora: items are embedded, stored in an IStore (in-memory or Qdrant), and queried by semantic similarity. The sync pipeline upserts NormalizedItems from data sources; tools expose search and (when configured) delete. Without this package, the agent would have no persistent, searchable memory across runs.

Index

Constants

const (
	MemoryStoreToolName  = "memory_store"
	MemorySearchToolName = "memory_search"
	MemoryDeleteToolName = "memory_delete"
	MemoryListToolName   = "memory_list"
	MemoryMergeToolName  = "memory_merge"
)

Tool name constants for the vector memory tools. Use these instead of magic strings when referencing memory tools elsewhere (e.g. retrieval tool classification, empty-memory guard, loop detection).

const MaxCharsPerEmbeddingChunk = 16_000

MaxCharsPerEmbeddingChunk is a safe character limit so that chunked text stays under embedding model token limits (e.g. OpenAI text-embedding-3-small 8191 tokens). Email/HTML can be ~2 chars per token; 8000 tokens ≈ 16000 chars.

const MetaAgentName = "__agent_name"

MetaAgentName is the metadata key used to scope documents to a specific agent, preventing one agent from reading or overwriting another agent's data in a shared vector store collection.

const MetaLogicalID = "_logical_id"

MetaLogicalID is the reserved metadata key that stores the caller-facing document ID. ScopedStore namespaces internal document IDs by visibility scope to prevent cross-scope collisions, and uses this key to recover the original logical ID on search output. Callers must never set this key manually — ScopedStore strips any caller-supplied value.
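The internal namespacing scheme is unexported, but the idea can be sketched with a toy example. Everything below (the `/`-joined ID format, the helper names) is illustrative only — it shows why a reserved `_logical_id` key is needed: two scopes can store the same logical ID without colliding, and search output can still map back to the caller-facing ID.

```go
package main

import "fmt"

const metaLogicalID = "_logical_id"

// namespacedWrite derives a hypothetical scope-qualified internal ID and
// stamps the logical ID into metadata so it can be recovered on read.
func namespacedWrite(scope, logicalID string) (internalID string, meta map[string]string) {
	return scope + "/" + logicalID, map[string]string{metaLogicalID: logicalID}
}

// recoverLogicalID maps a stored document back to its caller-facing ID,
// as ScopedStore does on search output.
func recoverLogicalID(meta map[string]string) string {
	return meta[metaLogicalID]
}

func main() {
	idA, metaA := namespacedWrite("private:user1", "note-42")
	idB, _ := namespacedWrite("global", "note-42")
	fmt.Println(idA != idB)              // distinct internal IDs, no cross-scope collision
	fmt.Println(recoverLogicalID(metaA)) // note-42
}
```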

const MetaVisibility = "visibility"

MetaVisibility is the reserved metadata key used to scope documents by visibility (private, group, or global). It is automatically injected by ScopedStore on every write and used as a filter on every read. Callers must never set this key manually — ScopedStore strips any caller-supplied value and replaces it with the derived scope.

const SkillNameKey = "skill_name"

const VisibilityGlobal = "global"

VisibilityGlobal is the visibility scope value for documents that should be accessible to all users and channels within a deployed agent. Used by raw write paths (data source sync, activity reports, learned skills, tool index, graph entities/relations) to stamp content that must remain searchable from any context.

Variables

var DefaultContentChunker = NewContentChunker(MaxCharsPerEmbeddingChunk, embeddingChunkOverlap)

DefaultContentChunker is the shared content-aware chunker using the standard embedding chunk size and overlap. Datasource connectors should use this instead of creating their own.

Functions

func ChunkTextForEmbedding

func ChunkTextForEmbedding(text string) []string

ChunkTextForEmbedding splits text into chunks using trpc-agent-go's RecursiveChunking strategy (paragraph→line→word→character boundaries, with 10% overlap). Returns a single chunk if text is short enough; otherwise multiple chunks suitable for embedding one at a time.

func NewMemoryDeleteTool

func NewMemoryDeleteTool(store IStore) tool.Tool

NewMemoryDeleteTool creates a tool that deletes entries from vector memory by ID. Use this to clean up stale, incorrect, or outdated memories.

func NewMemoryListTool

func NewMemoryListTool(store IStore, cfg *Config) tool.Tool

NewMemoryListTool creates a tool that lists entries from vector memory. Supports metadata filtering to browse specific categories.

func NewMemoryMergeTool

func NewMemoryMergeTool(store IStore, cfg *Config) tool.Tool

NewMemoryMergeTool creates a tool that merges multiple memory entries into one. The agent provides the consolidated text; the tool upserts it under the first ID and deletes the remaining IDs.

func NewMemorySearchTool

func NewMemorySearchTool(store IStore, cfg *Config) tool.Tool

NewMemorySearchTool creates a tool that searches the vector memory. When cfg.AllowedMetadataKeys is set, only those keys may be used in filter.

func NewMemoryStoreTool

func NewMemoryStoreTool(store IStore, cfg *Config, scorer MemoryImportanceScorer) tool.Tool

NewMemoryStoreTool creates a tool that stores text into the vector memory. When cfg.AllowedMetadataKeys is set, only those keys are accepted in metadata. If req.ID is set, the document is upserted (an existing document with that ID is replaced). If scorer is non-nil, each write is scored for importance and the score is stored as "_importance" metadata for retrieval-time quality filtering.

Types

type AddRequest

type AddRequest struct {
	Items []BatchItem
}

AddRequest holds the items to insert into the vector store.

type BatchItem

type BatchItem struct {
	ID       string
	Text     string
	Metadata map[string]string
}

BatchItem represents a single document to be stored via Add.

func ChunkContentToBatchItems

func ChunkContentToBatchItems(itemID, content string, baseMeta map[string]string) ([]BatchItem, error)

ChunkContentToBatchItems splits content into chunks and returns BatchItems ready for Upsert with stable IDs and chunk metadata. Uses the default RecursiveChunking strategy.
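The shape of the result can be sketched as follows. The exact ID format and metadata keys the package emits are not documented here, so the `#N` suffix and `chunk` key below are assumptions purely for illustration — the point is that chunk IDs are deterministic functions of the item ID, so re-ingestion overwrites:

```go
package main

import "fmt"

type BatchItem struct {
	ID       string
	Text     string
	Metadata map[string]string
}

// toBatchItems is a hypothetical sketch of turning chunks into BatchItems:
// stable per-chunk IDs derived from the item ID, plus chunk metadata merged
// with the base metadata.
func toBatchItems(itemID string, chunks []string, baseMeta map[string]string) []BatchItem {
	items := make([]BatchItem, 0, len(chunks))
	for i, c := range chunks {
		meta := map[string]string{"chunk": fmt.Sprintf("%d/%d", i+1, len(chunks))}
		for k, v := range baseMeta {
			meta[k] = v
		}
		items = append(items, BatchItem{
			ID:       fmt.Sprintf("%s#%d", itemID, i), // stable: re-ingestion overwrites
			Text:     c,
			Metadata: meta,
		})
	}
	return items
}

func main() {
	items := toBatchItems("gdrive:doc1", []string{"part one", "part two"},
		map[string]string{"source": "gdrive"})
	fmt.Println(items[0].ID, items[1].Metadata["chunk"]) // gdrive:doc1#0 2/2
}
```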

type Config

type Config struct {
	// PersistenceDir is the directory where the vector store snapshot is
	// saved as a JSON file. If empty, the store is ephemeral (in-memory only).
	// Note: PersistenceDir is ignored when using an external store (Qdrant),
	// as those handle persistence internally.
	PersistenceDir    string `yaml:"persistence_dir,omitempty" toml:"persistence_dir,omitempty"`
	EmbeddingProvider string `yaml:"embedding_provider,omitempty" toml:"embedding_provider,omitempty"` // "openai", "ollama", "huggingface", "gemini"
	APIKey            string `yaml:"api_key,omitempty" toml:"api_key,omitempty"`
	OllamaURL         string `yaml:"ollama_url,omitempty" toml:"ollama_url,omitempty"`
	OllamaModel       string `yaml:"ollama_model,omitempty" toml:"ollama_model,omitempty"`
	HuggingFaceURL    string `yaml:"huggingface_url,omitempty" toml:"huggingface_url,omitempty"`
	GeminiAPIKey      string `yaml:"gemini_api_key,omitempty" toml:"gemini_api_key,omitempty"`
	GeminiModel       string `yaml:"gemini_model,omitempty" toml:"gemini_model,omitempty"`
	// VectorStoreProvider specifies the vector store backend to use.
	// Options: "inmemory" (default), "qdrant"
	VectorStoreProvider string `yaml:"vector_store_provider,omitempty" toml:"vector_store_provider,omitempty"`
	// Qdrant configuration (only used when VectorStoreProvider is "qdrant")
	Qdrant qdrantstore.Config `yaml:"qdrant,omitempty" toml:"qdrant,omitempty"`
	// AllowedMetadataKeys optionally restricts which metadata keys may be used in
	// memory_store and memory_search. If non-empty, only these keys are accepted
	// for metadata (store) and filter (search), enabling product/category buckets.
	AllowedMetadataKeys []string `yaml:"allowed_metadata_keys,omitempty" toml:"allowed_metadata_keys,omitempty"`
}

Config holds the configuration for the vector store. It supports OpenAI, Ollama (via OpenAI-compatible endpoint), HuggingFace Text-Embeddings-Inference, Gemini, and a deterministic dummy embedder for development and testing.

func DefaultConfig

func DefaultConfig(ctx context.Context, sp security.SecretProvider) Config

DefaultConfig builds the default vector store configuration by resolving API keys and endpoints through the given SecretProvider. Callers that do not need a custom SecretProvider can pass security.NewEnvProvider() to preserve the legacy os.Getenv behavior.

func (Config) NewStore

func (cfg Config) NewStore(ctx context.Context) (*Store, error)

NewStore creates a new vector store backed by trpc-agent-go/knowledge. If cfg.PersistenceDir is set and using in-memory store, existing data is loaded from disk. If using Qdrant, persistence is handled by the external store itself.

type ContentChunker

type ContentChunker struct {
	// contains filtered or unexported fields
}

ContentChunker selects a chunking strategy based on content type. Reusable across data sources (GDrive, Slack, Gmail, etc.). Each datasource calls ChunkForType with the appropriate content type to get intelligent splitting.

func NewContentChunker

func NewContentChunker(chunkSize, overlap int) *ContentChunker

NewContentChunker creates a ContentChunker with the given size and overlap. It initialises both RecursiveChunking (for generic text) and MarkdownChunking (for structured documents) from trpc-agent-go.

func (*ContentChunker) ChunkForType

func (cc *ContentChunker) ChunkForType(text string, ct ContentType) []string

ChunkForType splits text using the strategy for the given content type and returns the text chunks.

func (*ContentChunker) ChunkToBatchItemsForType

func (cc *ContentChunker) ChunkToBatchItemsForType(itemID, content string, baseMeta map[string]string, ct ContentType) ([]BatchItem, error)

ChunkToBatchItemsForType splits content and returns BatchItems using the strategy appropriate for the content type.

func (*ContentChunker) StrategyFor

func (cc *ContentChunker) StrategyFor(ct ContentType) chunking.Strategy

StrategyFor returns the appropriate chunking strategy for the content type.

type ContentType

type ContentType int

ContentType classifies document content for strategy selection.

const (
	// ContentTypePlain is generic text content.
	ContentTypePlain ContentType = iota
	// ContentTypeMarkdown is markdown-structured content (Google Docs, .md files).
	ContentTypeMarkdown
)

func ContentTypeFromMIME

func ContentTypeFromMIME(mime string) ContentType

ContentTypeFromMIME maps a MIME type to a ContentType. Google Workspace document types are treated as markdown because exported text often contains heading-like structure. This function is reusable by any datasource.
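A minimal sketch of such a mapping, assuming `text/markdown` and the Google Workspace document MIME prefix as representative inputs (the package's actual match list may be broader):

```go
package main

import (
	"fmt"
	"strings"
)

type ContentType int

const (
	ContentTypePlain ContentType = iota
	ContentTypeMarkdown
)

// contentTypeFromMIME is an illustrative re-implementation in the spirit of
// ContentTypeFromMIME: markdown and Google Docs exports get the markdown
// strategy, everything else falls back to plain text.
func contentTypeFromMIME(mime string) ContentType {
	switch {
	case mime == "text/markdown",
		strings.HasPrefix(mime, "application/vnd.google-apps.document"):
		return ContentTypeMarkdown
	default:
		return ContentTypePlain
	}
}

func main() {
	fmt.Println(contentTypeFromMIME("text/markdown") == ContentTypeMarkdown) // true
	fmt.Println(contentTypeFromMIME("text/plain") == ContentTypePlain)       // true
}
```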

type DeleteRequest

type DeleteRequest struct {
	IDs []string
}

DeleteRequest holds the IDs to remove from the vector store.

type IStore

type IStore interface {
	Search(ctx context.Context, req SearchRequest) ([]SearchResult, error)
	Add(ctx context.Context, req AddRequest) error
	// Upsert replaces existing documents with the same ID, or inserts if not present.
	// Use a stable ID (e.g. source:external_id) to overwrite memory when appropriate.
	Upsert(ctx context.Context, req UpsertRequest) error
	Delete(ctx context.Context, req DeleteRequest) error
	Close(ctx context.Context) error
}

func NewScopedStore

func NewScopedStore(inner IStore) IStore

NewScopedStore returns an IStore that enforces per-user/per-channel visibility on every operation. Wrap the raw store before passing it to NewToolProvider so that all LLM-facing memory tools are automatically scoped. The orchestrator and other internal callers should continue using the unwrapped store (they manage their own visibility filters).

type MemoryDeleteRequest

type MemoryDeleteRequest struct {
	IDs []string `json:"ids" jsonschema:"description=List of memory IDs to delete,required"`
}

MemoryDeleteRequest is the input for the memory_delete tool.

type MemoryDeleteResponse

type MemoryDeleteResponse struct {
	Deleted int    `json:"deleted"`
	Message string `json:"message"`
}

MemoryDeleteResponse is the output for the memory_delete tool.

func (MemoryDeleteResponse) MarshalJSON

func (r MemoryDeleteResponse) MarshalJSON() ([]byte, error)

MarshalJSON implements custom JSON marshaling for tool responses.

type MemoryImportanceScorer

type MemoryImportanceScorer interface {
	// ScoreText returns an importance score (1-10) for the given text.
	ScoreText(ctx context.Context, text string) int
}

MemoryImportanceScorer scores text content for importance on a 1-10 scale. This is a local interface to avoid importing reactree/memory; callers can satisfy it with rtmemory.ImportanceScorer or a no-op implementation.

type MemoryListRequest

type MemoryListRequest struct {
	Filter map[string]string `json:"filter,omitempty" jsonschema:"description=Optional metadata filter to narrow results (e.g. type=accomplishment)"`
	Limit  int               `json:"limit,omitempty" jsonschema:"description=Maximum entries to return (default 20)"`
}

MemoryListRequest is the input for the memory_list tool.

type MemoryListResponse

type MemoryListResponse struct {
	Entries []MemorySearchResultItem `json:"entries"`
	Count   int                      `json:"count"`
}

MemoryListResponse is the output for the memory_list tool.

func (MemoryListResponse) MarshalJSON

func (r MemoryListResponse) MarshalJSON() ([]byte, error)

MarshalJSON implements custom JSON marshaling for tool responses.

type MemoryMergeRequest

type MemoryMergeRequest struct {
	IDs        []string          `json:"ids" jsonschema:"description=List of memory IDs to merge (minimum 2),required"`
	MergedText string            `json:"merged_text" jsonschema:"description=The consolidated text for the merged memory entry,required"`
	Metadata   map[string]string `` /* 136-byte string literal not displayed */
}

MemoryMergeRequest is the input for the memory_merge tool.

type MemoryMergeResponse

type MemoryMergeResponse struct {
	MergedID     string `json:"merged_id"`
	DeletedCount int    `json:"deleted_count"`
	Message      string `json:"message"`
}

MemoryMergeResponse is the output for the memory_merge tool.

func (MemoryMergeResponse) MarshalJSON

func (r MemoryMergeResponse) MarshalJSON() ([]byte, error)

MarshalJSON implements custom JSON marshaling for tool responses.

type MemorySearchRequest

type MemorySearchRequest struct {
	Query  string            `json:"query" jsonschema:"description=The search query to find relevant memories,required"`
	Limit  int               `json:"limit,omitempty" jsonschema:"description=Maximum number of results to return (default 5)"`
	Filter map[string]string `` /* 153-byte string literal not displayed */
}

MemorySearchRequest is the input for the memory_search tool.

type MemorySearchResponse

type MemorySearchResponse struct {
	Results []MemorySearchResultItem `json:"results"`
	Count   int                      `json:"count"`
}

MemorySearchResponse is the output for the memory_search tool.

func (MemorySearchResponse) MarshalJSON

func (r MemorySearchResponse) MarshalJSON() ([]byte, error)

MarshalJSON implements custom JSON marshaling for tool responses.

type MemorySearchResultItem

type MemorySearchResultItem struct {
	ID         string            `json:"id"`
	Content    string            `json:"content"`
	Metadata   map[string]string `json:"metadata,omitempty"`
	Similarity float64           `json:"similarity"`
}

MemorySearchResultItem represents a single search result from the memory_search tool.

type MemoryStoreRequest

type MemoryStoreRequest struct {
	Text     string            `json:"text" jsonschema:"description=The text content to store in memory,required"`
	Metadata map[string]string `` /* 144-byte string literal not displayed */
	ID       string            `` /* 133-byte string literal not displayed */
}

MemoryStoreRequest is the input for the memory_store tool.

type MemoryStoreResponse

type MemoryStoreResponse struct {
	ID      string `json:"id"`
	Message string `json:"message"`
}

MemoryStoreResponse is the output for the memory_store tool.

func (MemoryStoreResponse) MarshalJSON

func (r MemoryStoreResponse) MarshalJSON() ([]byte, error)

MarshalJSON implements custom JSON marshaling for tool responses.

type SearchRequest

type SearchRequest struct {
	Query  string
	Limit  int
	Filter map[string]string
}

SearchRequest holds the parameters for a vector store search. When Filter is non-nil, only documents whose metadata contains ALL specified entries are returned. This replaces the former SearchWithFilter method — an unfiltered search simply leaves Filter nil.
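The AND semantics described above can be sketched with a small predicate (illustrative only — the store applies this filtering internally, or pushes it down to Qdrant):

```go
package main

import "fmt"

// matchesFilter mirrors the documented semantics: a document matches only
// if its metadata contains ALL filter entries; a nil filter matches everything.
func matchesFilter(meta, filter map[string]string) bool {
	for k, want := range filter {
		if meta[k] != want {
			return false
		}
	}
	return true // nil or empty filter falls through: unfiltered search
}

func main() {
	meta := map[string]string{"source": "gmail", "sender": "alice"}
	fmt.Println(matchesFilter(meta, map[string]string{"source": "gmail"}))                  // true
	fmt.Println(matchesFilter(meta, map[string]string{"source": "gmail", "sender": "bob"})) // false
	fmt.Println(matchesFilter(meta, nil))                                                   // true
}
```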

type SearchResult

type SearchResult struct {
	ID       string            `json:"id"`
	Content  string            `json:"content"`
	Metadata map[string]string `json:"metadata,omitempty"`
	Score    float64           `json:"score"`
}

SearchResult represents a single result returned by Store.Search. It contains the matched document content, its metadata and the cosine similarity score (0.0–1.0, higher is more similar).

func (SearchResult) String

func (s SearchResult) String() string

type SearchResults

type SearchResults []SearchResult

type Store

type Store struct {
	// contains filtered or unexported fields
}

Store wraps a trpc-agent-go vector store and embedder to provide simple add/search operations for agent memory. When PersistenceDir is set and using in-memory store, the store snapshots its state to disk after every Add and restores it on startup. When using Qdrant, persistence is handled by the external store itself.

func (*Store) Add

func (s *Store) Add(ctx context.Context, req AddRequest) error

Add stores one or more documents in the vector store. When multiple items are provided, their embeddings are generated concurrently via errgroup, reducing wall-clock latency from N×round-trip to max(round-trip). A single disk snapshot is taken at the end.

func (*Store) Close

func (s *Store) Close(ctx context.Context) error

Close flushes any pending state to disk (if persistence is configured). For Qdrant stores, it closes the client connection. It is safe to call multiple times.

func (*Store) Delete

func (s *Store) Delete(ctx context.Context, req DeleteRequest) error

Delete removes one or more documents by their IDs from the vector store. A single snapshot is taken at the end. Errors from individual deletes are collected but do not stop processing of remaining items.

func (*Store) Search

func (s *Store) Search(ctx context.Context, req SearchRequest) ([]SearchResult, error)

Search finds semantically similar documents, optionally filtered by metadata key-value pairs. Only documents whose metadata contains ALL specified filter entries are returned. Pass nil Filter for unfiltered search. This enables source-based memory isolation (e.g. per-sender, per-channel).

func (*Store) Upsert

func (s *Store) Upsert(ctx context.Context, req UpsertRequest) error

Upsert replaces documents with the same ID (delete then add). Use a stable ID (e.g. source:external_id) so that re-ingestion overwrites rather than duplicates.
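A toy in-memory illustration of this contract: keying by a stable source-derived ID means re-ingesting the same upstream record replaces the document instead of accumulating duplicates. (The map stands in for the store; the real implementation does delete-then-add per ID.)

```go
package main

import "fmt"

type doc struct{ Text string }

// upsert is replace-or-insert keyed by a stable ID.
func upsert(store map[string]doc, id string, d doc) {
	store[id] = d
}

func main() {
	store := map[string]doc{}
	id := "gmail:msg-1001" // stable: derived from the upstream record, per the source:external_id convention
	upsert(store, id, doc{Text: "v1"})
	upsert(store, id, doc{Text: "v2 (re-ingested)"})
	fmt.Println(len(store), store[id].Text) // 1 v2 (re-ingested)
}
```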

type ToolProvider

type ToolProvider struct {
	// contains filtered or unexported fields
}

ToolProvider wraps an IStore and optional Config, and satisfies the tools.ToolProviders interface so vector memory tools can be passed directly to tools.NewRegistry. When cfg is non-nil and AllowedMetadataKeys is set, memory_store and memory_search only accept those keys for metadata and filter (product/category buckets).

func NewToolProvider

func NewToolProvider(store IStore, cfg *Config, scorer MemoryImportanceScorer) *ToolProvider

NewToolProvider creates a ToolProvider for the vector memory tools (memory_store and memory_search). cfg may be nil; when set with AllowedMetadataKeys, only those metadata keys are allowed. scorer may be nil; when set, memory_store writes are scored for importance and the score is stored as "_importance" metadata.

func (*ToolProvider) GetTools

func (p *ToolProvider) GetTools(_ context.Context) []tool.Tool

GetTools returns the memory store and memory search tools.

type UpsertRequest

type UpsertRequest struct {
	Items []BatchItem
}

UpsertRequest holds the items to upsert (replace-or-insert) in the vector store.

Directories

Path Synopsis
Package qdrantstore provides the Qdrant vector store backend for Genie's semantic memory.
Code generated by counterfeiter.
