knowledge

package
v0.2.0 Latest
Published: Apr 23, 2026 | License: MIT | Imports: 25 | Imported by: 0

Documentation

Overview

Package knowledge provides a side-effect-free knowledge base library: document storage, chunking, tokenization, and BM25 / semantic / hybrid retrieval over layered context (L0 abstract, L1 overview, L2 chunks).

Storage operations (Store.AddDocument, AddDocuments) only persist raw content and update search indexes; they do not call out to an LLM. To derive L0/L1, use the stateless GenerateDocumentContext and GenerateDatasetContext helpers and publish results back through the FSStore setters and sidecar writers. This keeps scheduling, retries, caching, and persistence concerns owned entirely by the caller.

Constants

const (
	// AbstractPrompt produces a single-sentence L0 summary (~100 tokens).
	AbstractPrompt = `` /* 155-byte string literal not displayed */

	// OverviewPrompt produces a structured L1 overview (~1000 tokens).
	OverviewPrompt = `` /* 187-byte string literal not displayed */

	// DatasetOverviewPrompt produces a dataset-level L1 from per-document
	// abstracts ("- name: abstract" lines).
	DatasetOverviewPrompt = `` /* 196-byte string literal not displayed */

)

Prompt templates used to derive layered context from raw documents. Exported so callers can override or compose their own pipelines while reusing the SDK's defaults for the common case.

const DefaultPromptInputLimit = 8000

DefaultPromptInputLimit is the maximum number of characters of document content fed into a prompt. Content beyond this is truncated to keep prompts within typical context windows.
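
The truncation described above can be sketched as follows. This is a minimal illustration, assuming the limit counts Unicode characters (runes) rather than bytes, which the documentation does not specify; truncatePromptInput and promptInputLimit are hypothetical names, not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// promptInputLimit mirrors DefaultPromptInputLimit from the documentation.
const promptInputLimit = 8000

// truncatePromptInput is a hypothetical helper. It keeps at most limit
// runes of content, so a multi-byte character is never split.
func truncatePromptInput(content string, limit int) string {
	if limit <= 0 {
		return ""
	}
	seen := 0
	for i := range content {
		if seen == limit {
			return content[:i] // i is the byte offset of rune number limit
		}
		seen++
	}
	return content // already within the limit
}

func main() {
	doc := strings.Repeat("知识", 5000) // 10000 runes, 30000 bytes
	out := truncatePromptInput(doc, promptInputLimit)
	fmt.Println(len([]rune(out))) // prints 8000
}
```

Counting runes instead of bytes matters for CJK content, where a byte-based cut could end in the middle of a character.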

const DefaultThreshold = 0.1

Variables

var (
	DetectTokenizer = textsearch.DetectTokenizer
	NewCorpusStats  = textsearch.NewCorpusStats
	ExtractKeywords = textsearch.ExtractKeywords
	ScoreText       = textsearch.ScoreText
)

Functions

func KnowledgeNodeSchema

func KnowledgeNodeSchema() node.NodeSchema

KnowledgeNodeSchema returns the NodeSchema for the knowledge node type, used by the schema registry for frontend metadata.

func NewAddTool

func NewAddTool(ks Store) tool.Tool

func NewSearchTool

func NewSearchTool(ks Store) tool.Tool

func RegisterNode

func RegisterNode(ks Store)

RegisterNode registers the "knowledge" node builder with the SDK's node factory, capturing the given Store via closure. Call this from bootstrap before constructing the node factory.

func ScoreChunk

func ScoreChunk(chunk *Chunk, keywords []string, corpus *CorpusStats, tokenizer Tokenizer) float64

ScoreChunk computes the BM25 score for a chunk against query keywords.
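
For readers unfamiliar with BM25, the scoring idea can be sketched as below. This is not the package's implementation: the corpus struct is a stand-in for CorpusStats (whose fields are not shown in this documentation), and the constants k1 = 1.2 and b = 0.75 are the conventional Okapi BM25 defaults, assumed here for illustration.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// corpus holds the statistics BM25 needs. It is a hypothetical stand-in
// for the package's CorpusStats.
type corpus struct {
	docCount int            // number of chunks in the corpus
	avgLen   float64        // average chunk length in tokens
	docFreq  map[string]int // number of chunks containing each term
}

// bm25 scores one tokenized chunk against query keywords using the
// standard Okapi BM25 formula.
func bm25(tokens, keywords []string, c corpus) float64 {
	const k1, b = 1.2, 0.75 // conventional defaults, assumed
	if c.avgLen <= 0 {
		return 0
	}
	tf := make(map[string]int, len(tokens))
	for _, t := range tokens {
		tf[t]++
	}
	var score float64
	for _, kw := range keywords {
		f := float64(tf[kw])
		if f == 0 {
			continue // keyword absent from this chunk
		}
		n := float64(c.docFreq[kw])
		idf := math.Log(1 + (float64(c.docCount)-n+0.5)/(n+0.5))
		norm := 1 - b + b*float64(len(tokens))/c.avgLen
		score += idf * f * (k1 + 1) / (f + k1*norm)
	}
	return score
}

func main() {
	c := corpus{docCount: 3, avgLen: 4, docFreq: map[string]int{"bm25": 1, "search": 2}}
	chunk := strings.Fields(strings.ToLower("BM25 ranks search results"))
	fmt.Printf("%.3f\n", bm25(chunk, []string{"bm25", "search"}, c))
}
```

Rarer keywords contribute more through the IDF term, and the length normalization keeps long chunks from winning on raw term counts alone.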

Types

type CJKTokenizer

type CJKTokenizer = textsearch.CJKTokenizer

Type and function aliases re-exported from sdk/textsearch for backward compatibility. Internal code and tests can use these without changing import paths.

type CacheOption

type CacheOption func(*CachedStore)

CacheOption configures a CachedStore.

func WithMaxItems

func WithMaxItems(n int) CacheOption

WithMaxItems sets the maximum number of cached items.

func WithTTL

func WithTTL(d time.Duration) CacheOption

WithTTL sets the cache time-to-live.

type CachedStore

type CachedStore struct {
	// contains filtered or unexported fields
}

CachedStore wraps a Store with TTL + LRU caching for read operations. Write operations are forwarded and evict related cache entries.

func NewCachedStore

func NewCachedStore(inner Store, opts ...CacheOption) *CachedStore

NewCachedStore wraps inner with caching.

func (*CachedStore) Abstract

func (s *CachedStore) Abstract(ctx context.Context, datasetID, name string) (string, error)

func (*CachedStore) AddDocument

func (s *CachedStore) AddDocument(ctx context.Context, datasetID, name, content string) error

func (*CachedStore) AddDocuments

func (s *CachedStore) AddDocuments(ctx context.Context, datasetID string, docs []DocInput) error

func (*CachedStore) DatasetAbstract

func (s *CachedStore) DatasetAbstract(ctx context.Context, datasetID string) (string, error)

func (*CachedStore) DatasetOverview

func (s *CachedStore) DatasetOverview(ctx context.Context, datasetID string) (string, error)

func (*CachedStore) DeleteDocument

func (s *CachedStore) DeleteDocument(ctx context.Context, datasetID, name string) error

func (*CachedStore) EvictDataset

func (s *CachedStore) EvictDataset(datasetID string)

EvictDataset removes all cached entries for a dataset. Callers should invoke this after mutating the underlying store out-of-band — for example, after refreshing layered context on the inner FSStore via SetDocAbstract / SetDatasetOverview / etc.

func (*CachedStore) GetDocument

func (s *CachedStore) GetDocument(ctx context.Context, datasetID, name string) (*Document, error)

func (*CachedStore) ListDocuments

func (s *CachedStore) ListDocuments(ctx context.Context, datasetID string) ([]Document, error)

func (*CachedStore) Overview

func (s *CachedStore) Overview(ctx context.Context, datasetID, name string) (string, error)

func (*CachedStore) Search

func (s *CachedStore) Search(ctx context.Context, datasetID, query string, opts SearchOptions) ([]SearchResult, error)

type ChangeNotifier

type ChangeNotifier interface {
	Events() <-chan struct{}
	Close() error
}

ChangeNotifier emits an opaque event whenever the underlying source changes.

Concrete implementations live in adapter packages (e.g. sdkx/knowledge/watcher uses fsnotify) so that the sdk core stays dependency-free.

Events on the Events channel must be coalesced by the consumer; the Reloader below applies a debounce window. Implementations should close Events when Close is called.
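
A minimal in-memory implementation of this interface might look like the sketch below. The interface is copied from the declaration above; manualNotifier and its Notify method are hypothetical and exist only for illustration (for example, in tests that trigger reloads by hand).

```go
package main

import (
	"fmt"
	"sync"
)

// ChangeNotifier is copied from the documentation above.
type ChangeNotifier interface {
	Events() <-chan struct{}
	Close() error
}

// manualNotifier is a hypothetical implementation driven by explicit
// Notify calls instead of a file watcher.
type manualNotifier struct {
	mu     sync.Mutex
	closed bool
	events chan struct{}
}

func newManualNotifier() *manualNotifier {
	return &manualNotifier{events: make(chan struct{}, 1)}
}

func (n *manualNotifier) Events() <-chan struct{} { return n.events }

// Notify signals a change without blocking. If an event is already
// pending, the new one is dropped, since events carry no payload.
func (n *manualNotifier) Notify() {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.closed {
		return
	}
	select {
	case n.events <- struct{}{}:
	default:
	}
}

// Close closes the Events channel, as the interface contract asks.
func (n *manualNotifier) Close() error {
	n.mu.Lock()
	defer n.mu.Unlock()
	if !n.closed {
		n.closed = true
		close(n.events)
	}
	return nil
}

func main() {
	m := newManualNotifier()
	var n ChangeNotifier = m
	m.Notify()
	m.Notify() // coalesced with the pending event
	<-n.Events()
	fmt.Println("one event received")
	n.Close()
	_, open := <-n.Events()
	fmt.Println("open:", open) // prints open: false
}
```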

type Chunk

type Chunk struct {
	DocName string `json:"doc_name"`
	Index   int    `json:"index"`
	Content string `json:"content"`
	Offset  int    `json:"offset"`
}

Chunk represents a segment of a document.

func ChunkDocument

func ChunkDocument(docName, content string, cfg ChunkConfig) []Chunk

ChunkDocument splits content into overlapping chunks, preferring to break at paragraph or sentence boundaries.
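
The overlap-and-boundary behaviour can be sketched as below. This is an illustration only, assuming sizes are measured in runes and treating newlines and sentence punctuation alike as preferred break points; chunkText and lastBoundary are hypothetical helpers, and the package's actual units and boundary rules are not stated in this documentation.

```go
package main

import "fmt"

type chunk struct {
	Index   int
	Offset  int // rune offset of the chunk start
	Content string
}

// chunkText splits content into chunks of at most size runes, where each
// chunk starts overlap runes before the end of the previous one.
func chunkText(content string, size, overlap int) []chunk {
	runes := []rune(content)
	if size <= 0 || len(runes) == 0 {
		return nil
	}
	if overlap < 0 || overlap >= size {
		overlap = 0 // guard against a window that cannot advance
	}
	var out []chunk
	for start := 0; start < len(runes); {
		end := start + size
		if end >= len(runes) {
			end = len(runes)
		} else if cut := lastBoundary(runes, start+overlap, end); cut > 0 {
			end = cut // prefer ending on a sentence or paragraph break
		}
		out = append(out, chunk{len(out), start, string(runes[start:end])})
		if end == len(runes) {
			break
		}
		start = end - overlap
	}
	return out
}

// lastBoundary returns the index just past the last terminator found in
// runes[lo:hi], or 0 if there is none.
func lastBoundary(runes []rune, lo, hi int) int {
	for i := hi - 1; i >= lo; i-- {
		switch runes[i] {
		case '\n', '.', '!', '?':
			return i + 1
		}
	}
	return 0
}

func main() {
	for _, c := range chunkText("aaaa. bbbb. cccc.", 8, 2) {
		fmt.Printf("%d @%d %q\n", c.Index, c.Offset, c.Content)
	}
}
```

Searching for a boundary only past start+overlap guarantees that every iteration moves forward, so the loop terminates even when the text has no punctuation.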

type ChunkConfig

type ChunkConfig struct {
	ChunkSize    int `json:"chunk_size,omitempty"`
	ChunkOverlap int `json:"chunk_overlap,omitempty"`
}

ChunkConfig controls document chunking.

func DefaultChunkConfig

func DefaultChunkConfig() ChunkConfig

DefaultChunkConfig returns the default chunking configuration.

type ContextLayer

type ContextLayer string

ContextLayer indicates the granularity of a search result.

const (
	LayerAbstract ContextLayer = "L0" // ~100 token one-sentence summary
	LayerOverview ContextLayer = "L1" // ~1k token structured overview
	LayerDetail   ContextLayer = "L2" // full chunk content
)

type CorpusStats

type CorpusStats = textsearch.CorpusStats

CorpusStats is an alias for textsearch.CorpusStats, re-exported from sdk/textsearch for backward compatibility so that internal code and tests can use it without changing import paths.

type DatasetContext

type DatasetContext struct {
	Abstract string // dataset-level L0
	Overview string // dataset-level L1
}

DatasetContext groups the layered context for an entire dataset.

func GenerateDatasetContext

func GenerateDatasetContext(ctx context.Context, l llm.LLM, summaries []DocumentSummary) (DatasetContext, error)

GenerateDatasetContext derives dataset-level L0 + L1 from per-document abstracts. The L1 overview is generated first, then distilled into L0. Returns an empty context with no error when summaries is empty.

type DatasetQuery

type DatasetQuery struct {
	DatasetID string `json:"dataset_id"`
	StateKey  string `json:"state_key"`
	TopK      int    `json:"top_k"`
}

DatasetQuery describes a single dataset search within a Knowledge node.

type DocInput

type DocInput struct {
	Name    string
	Content string
}

DocInput is a name+content pair for batch document ingestion.

type Document

type Document struct {
	Name     string            `json:"name"`
	Content  string            `json:"content"`
	Abstract string            `json:"abstract,omitempty"` // L0
	Overview string            `json:"overview,omitempty"` // L1
	Metadata map[string]string `json:"metadata,omitempty"`
}

Document represents a knowledge base document.

type DocumentContext

type DocumentContext struct {
	Abstract string // L0
	Overview string // L1
}

DocumentContext groups the layered context for a single document.

func GenerateDocumentContext

func GenerateDocumentContext(ctx context.Context, l llm.LLM, content string) (DocumentContext, error)

GenerateDocumentContext synthesizes L0 (abstract) and L1 (overview) for a document by issuing two LLM calls. Pure function: no I/O, no caching, no retries; callers own scheduling and persistence.

Returns a partial result on error: if abstract generation fails the zero-value context is returned with the error; if overview fails the already-generated abstract is preserved so callers can choose to persist it.

type DocumentSummary

type DocumentSummary struct {
	Name     string
	Abstract string
}

DocumentSummary pairs a document name with its L0 abstract, used as input to GenerateDatasetContext.

type Embedder

type Embedder = embedding.Embedder

Embedder is an alias for the SDK embedding.Embedder interface. It supports both single-text and batch embeddings.

type FSStore

type FSStore struct {
	// contains filtered or unexported fields
}

FSStore implements Store using a Workspace-backed file tree. Thread-safe via sync.RWMutex.

FSStore is intentionally side-effect free with respect to layered context: AddDocument persists raw content + builds the BM25/vector index, but does not synthesize L0/L1. Callers are expected to drive summarization explicitly via the GenerateDocumentContext / GenerateDatasetContext helpers and then publish results back through SetDocAbstract / SetDocOverview / SetDatasetAbstract / SetDatasetOverview (and WriteSidecar / WriteDatasetFile for persistence).

func NewFSStore

func NewFSStore(ws workspace.Workspace, opts ...FSStoreOption) *FSStore

NewFSStore creates a knowledge store backed by the given Workspace. The directory prefix it is rooted at beneath the workspace root is reported by Prefix.

func (*FSStore) Abstract

func (s *FSStore) Abstract(ctx context.Context, datasetID, name string) (string, error)

func (*FSStore) AddDocument

func (s *FSStore) AddDocument(ctx context.Context, datasetID, name, content string) error

func (*FSStore) AddDocuments

func (s *FSStore) AddDocuments(ctx context.Context, datasetID string, docs []DocInput) error

func (*FSStore) BuildIndex

func (s *FSStore) BuildIndex(ctx context.Context) error

BuildIndex scans all datasets and builds the in-memory search index.

func (*FSStore) DatasetAbstract

func (s *FSStore) DatasetAbstract(_ context.Context, datasetID string) (string, error)

func (*FSStore) DatasetOverview

func (s *FSStore) DatasetOverview(_ context.Context, datasetID string) (string, error)

func (*FSStore) DeleteDocument

func (s *FSStore) DeleteDocument(ctx context.Context, datasetID, name string) error

func (*FSStore) GetDocument

func (s *FSStore) GetDocument(ctx context.Context, datasetID, name string) (*Document, error)

func (*FSStore) ListDocuments

func (s *FSStore) ListDocuments(ctx context.Context, datasetID string) ([]Document, error)

func (*FSStore) Overview

func (s *FSStore) Overview(ctx context.Context, datasetID, name string) (string, error)

func (*FSStore) Prefix

func (s *FSStore) Prefix() string

Prefix returns the FSStore directory prefix beneath WorkspaceRoot.

The on-disk root is filepath.Join(WorkspaceRoot(), Prefix()).

func (*FSStore) ReindexVectors

func (s *FSStore) ReindexVectors(ctx context.Context) error

ReindexVectors regenerates vector embeddings for all indexed chunks. Safe to call after BuildIndex to restore semantic/hybrid search capability.

func (*FSStore) Search

func (s *FSStore) Search(ctx context.Context, datasetID, query string, opts SearchOptions) ([]SearchResult, error)

Search performs a two-level search: if datasetID is empty, first filter datasets by L0, then search within top datasets; otherwise search directly.

func (*FSStore) SetDatasetAbstract

func (s *FSStore) SetDatasetAbstract(datasetID, abstract string)

SetDatasetAbstract updates the in-memory dataset-level abstract.

func (*FSStore) SetDatasetOverview

func (s *FSStore) SetDatasetOverview(datasetID, overview string)

SetDatasetOverview updates the in-memory dataset-level overview.

func (*FSStore) SetDocAbstract

func (s *FSStore) SetDocAbstract(datasetID, name, abstract string)

SetDocAbstract updates the in-memory abstract for a document and refreshes abstract-layer corpus statistics. Intended to be called after deriving a new L0 (e.g. via GenerateDocumentContext); pair with WriteSidecar to make the change durable.

func (*FSStore) SetDocOverview

func (s *FSStore) SetDocOverview(datasetID, name, overview string)

SetDocOverview updates the in-memory overview for a document.

func (*FSStore) WorkspaceRoot

func (s *FSStore) WorkspaceRoot() string

WorkspaceRoot exposes the underlying workspace root when available.

Returns "" if the workspace does not implement Root().

func (*FSStore) WriteDatasetFile

func (s *FSStore) WriteDatasetFile(ctx context.Context, datasetID, filename, content string) error

WriteDatasetFile writes a dataset-level file (e.g. .abstract.md, .overview.md).

func (*FSStore) WriteSidecar

func (s *FSStore) WriteSidecar(ctx context.Context, datasetID, name, ext, content string) error

WriteSidecar writes a per-document sidecar file (e.g. ".abstract", ".overview") used to persist layered context derived externally.

type FSStoreOption

type FSStoreOption func(*FSStore)

FSStoreOption configures an FSStore.

func WithChunkConfig

func WithChunkConfig(cfg ChunkConfig) FSStoreOption

WithChunkConfig sets the chunking configuration.

func WithEmbedder

func WithEmbedder(e Embedder) FSStoreOption

WithEmbedder sets the embedder for semantic/hybrid search.

func WithTokenizer

func WithTokenizer(t Tokenizer) FSStoreOption

WithTokenizer sets the tokenizer for search and indexing.

type KnowledgeConfig

type KnowledgeConfig struct {
	Datasets []DatasetQuery `json:"datasets"`
	MaxLayer ContextLayer   `json:"max_layer,omitempty"` // L0, L1, L2 (default)
}

KnowledgeConfig configures a Knowledge node.

func KnowledgeConfigFromMap

func KnowledgeConfigFromMap(m map[string]any) KnowledgeConfig

KnowledgeConfigFromMap parses a KnowledgeConfig from a generic map.

type KnowledgeNode

type KnowledgeNode struct {
	// contains filtered or unexported fields
}

KnowledgeNode is a Go-native graph node for knowledge retrieval.

func NewKnowledgeNode

func NewKnowledgeNode(id string, store Store, config KnowledgeConfig) *KnowledgeNode

NewKnowledgeNode creates a Knowledge node. store may be nil, in which case retrieval returns no results.

func (*KnowledgeNode) Config

func (n *KnowledgeNode) Config() map[string]any

func (*KnowledgeNode) ExecuteBoard

func (n *KnowledgeNode) ExecuteBoard(ectx graph.ExecutionContext, board *graph.Board) error

func (*KnowledgeNode) ID

func (n *KnowledgeNode) ID() string

func (*KnowledgeNode) InputPorts

func (n *KnowledgeNode) InputPorts() []graph.Port

func (*KnowledgeNode) OutputPorts

func (n *KnowledgeNode) OutputPorts() []graph.Port

func (*KnowledgeNode) SetConfig

func (n *KnowledgeNode) SetConfig(c map[string]any)

func (*KnowledgeNode) Type

func (n *KnowledgeNode) Type() string

type Reloader

type Reloader struct {
	// contains filtered or unexported fields
}

Reloader debounces ChangeNotifier events and triggers Rebuild on a stable trailing edge.

Typical use:

notifier, _ := watcher.NewFSNotifier(store) // sdkx adapter
r := knowledge.NewReloader(store, notifier, knowledge.ReloaderOptions{Debounce: 500 * time.Millisecond})
go r.Run(ctx)

Rebuild defaults to FSStore.BuildIndex; callers can override to integrate with their own RetrievalStore implementations.

func NewReloader

func NewReloader(store *FSStore, notifier ChangeNotifier, opts ReloaderOptions) *Reloader

NewReloader wires a ChangeNotifier to a rebuild callback.

When opts.Rebuild is nil and store is non-nil, the reloader falls back to store.BuildIndex(ctx).

func (*Reloader) Close

func (r *Reloader) Close() error

Close stops Run and the underlying ChangeNotifier.

func (*Reloader) Run

func (r *Reloader) Run(ctx context.Context) error

Run blocks until Close is called or ctx is cancelled.

type ReloaderOptions

type ReloaderOptions struct {
	Debounce time.Duration               // default 500ms
	Rebuild  func(context.Context) error // overrides the default rebuild fn
}

ReloaderOptions configures a Reloader.

type RetrievalStore

type RetrievalStore struct {
	// contains filtered or unexported fields
}

RetrievalStore is a Store implementation backed by a retrieval.Index (Phase 2 swap-in).

Responsibilities — derived from the legacy FSStore but delegated to the unified retrieval layer:

  • Document persistence: one retrieval.Doc per chunk; document-level rows (L0/L1) live in dedicated namespaces so Search at MaxLayer=L0/L1 only scans the relevant tier.
  • Hybrid search: pipeline.Knowledge composes BM25/vector/RRFFusion; RetrievalStore does NOT contain its own ranking code (delegated).
  • Layered context: Abstract/Overview/DatasetAbstract/DatasetOverview are read by ID-prefix from a single namespace.

Namespace layout (one retrieval namespace per dataset+layer):

kb_<dataset>__chunks   — one Doc per chunk; metadata { dataset, doc_name, chunk_index }
kb_<dataset>__docmeta  — one Doc per document for L0/L1; ID = "abstract:<doc>" / "overview:<doc>" / "doc:<doc>"
kb__datasets           — dataset-level summaries; ID = "abstract:<dataset>" / "overview:<dataset>"

Caller responsibility: pre-compute embeddings for chunks (when supplying an Embedder) — RetrievalStore reuses GenericEmbedder if configured.
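
The namespace layout above can be expressed as a few small helpers. The function names are hypothetical and the sketch assumes dataset and document names are used verbatim; whether the package escapes or validates them is not stated in this documentation.

```go
package main

import "fmt"

// datasetsNamespace holds dataset-level summaries, per the layout above.
const datasetsNamespace = "kb__datasets"

// chunksNamespace returns the namespace holding one Doc per chunk.
func chunksNamespace(dataset string) string {
	return "kb_" + dataset + "__chunks"
}

// docMetaNamespace returns the namespace holding per-document L0/L1 rows.
func docMetaNamespace(dataset string) string {
	return "kb_" + dataset + "__docmeta"
}

// abstractID and overviewID build the row IDs used for layered context.
// The same scheme applies to documents and, in datasetsNamespace, datasets.
func abstractID(name string) string { return "abstract:" + name }
func overviewID(name string) string { return "overview:" + name }

func main() {
	fmt.Println(chunksNamespace("faq"))                      // kb_faq__chunks
	fmt.Println(docMetaNamespace("faq"), abstractID("a.md")) // kb_faq__docmeta abstract:a.md
	fmt.Println(datasetsNamespace, overviewID("faq"))        // kb__datasets overview:faq
}
```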

func NewRetrievalStore

func NewRetrievalStore(idx retrieval.Index, opts ...RetrievalStoreOption) *RetrievalStore

NewRetrievalStore wires a Store to a retrieval.Index. The store is safe for concurrent use; it does not own idx (caller must Close).

func (*RetrievalStore) Abstract

func (s *RetrievalStore) Abstract(ctx context.Context, datasetID, name string) (string, error)

Abstract implements Store.

func (*RetrievalStore) AddDocument

func (s *RetrievalStore) AddDocument(ctx context.Context, datasetID, name, content string) error

AddDocument implements Store.

func (*RetrievalStore) AddDocuments

func (s *RetrievalStore) AddDocuments(ctx context.Context, datasetID string, docs []DocInput) error

AddDocuments implements Store.

func (*RetrievalStore) DatasetAbstract

func (s *RetrievalStore) DatasetAbstract(ctx context.Context, datasetID string) (string, error)

DatasetAbstract implements Store.

func (*RetrievalStore) DatasetOverview

func (s *RetrievalStore) DatasetOverview(ctx context.Context, datasetID string) (string, error)

DatasetOverview implements Store.

func (*RetrievalStore) DeleteDocument

func (s *RetrievalStore) DeleteDocument(ctx context.Context, datasetID, name string) error

DeleteDocument implements Store.

func (*RetrievalStore) GetDocument

func (s *RetrievalStore) GetDocument(ctx context.Context, datasetID, name string) (*Document, error)

GetDocument implements Store. It assembles the document body by listing all chunks for the requested doc_name in chunk-index order.

func (*RetrievalStore) Index

func (s *RetrievalStore) Index() retrieval.Index

Index exposes the underlying retrieval.Index for callers that need to drop down to a non-Store API (e.g. List or Iterate for reindex).

func (*RetrievalStore) ListDocuments

func (s *RetrievalStore) ListDocuments(ctx context.Context, datasetID string) ([]Document, error)

ListDocuments implements Store.

func (*RetrievalStore) Overview

func (s *RetrievalStore) Overview(ctx context.Context, datasetID, name string) (string, error)

Overview implements Store.

func (*RetrievalStore) Search

func (s *RetrievalStore) Search(ctx context.Context, datasetID, query string, opts SearchOptions) ([]SearchResult, error)

Search implements Store via pipeline.Knowledge.

func (*RetrievalStore) SetAbstract

func (s *RetrievalStore) SetAbstract(ctx context.Context, datasetID, name, abstract string) error

SetAbstract / SetOverview let callers persist L0/L1 results without reusing the FSStore-specific sidecar mechanism.

func (*RetrievalStore) SetDatasetAbstract

func (s *RetrievalStore) SetDatasetAbstract(ctx context.Context, datasetID, abstract string) error

SetDatasetAbstract / SetDatasetOverview persist dataset-level summaries.

func (*RetrievalStore) SetDatasetOverview

func (s *RetrievalStore) SetDatasetOverview(ctx context.Context, datasetID, overview string) error

SetDatasetOverview persists a dataset-level L1 overview.

func (*RetrievalStore) SetOverview

func (s *RetrievalStore) SetOverview(ctx context.Context, datasetID, name, overview string) error

SetOverview persists the L1 overview for a document.

type RetrievalStoreOption

type RetrievalStoreOption func(*RetrievalStore)

RetrievalStoreOption configures a RetrievalStore.

func WithRetrievalChunkConfig

func WithRetrievalChunkConfig(c ChunkConfig) RetrievalStoreOption

WithRetrievalChunkConfig overrides the default chunk config.

func WithRetrievalEmbedder

func WithRetrievalEmbedder(e embedding.Embedder) RetrievalStoreOption

WithRetrievalEmbedder sets the embedder used to vectorize chunks at write time.

func WithRetrievalPipeline

func WithRetrievalPipeline(p *pipeline.Pipeline) RetrievalStoreOption

WithRetrievalPipeline overrides the default pipeline.Knowledge(emb, nil).

func WithRetrievalTokenizer

func WithRetrievalTokenizer(t Tokenizer) RetrievalStoreOption

WithRetrievalTokenizer overrides the BM25 tokenizer.

type SearchMode

type SearchMode string

const (
	ModeBM25     SearchMode = ""
	ModeSemantic SearchMode = "semantic"
	ModeHybrid   SearchMode = "hybrid"
)

type SearchOptions

type SearchOptions struct {
	TopK      int          `json:"top_k,omitempty"`
	MaxLayer  ContextLayer `json:"max_layer,omitempty"`
	Threshold float64      `json:"threshold,omitempty"`
	Mode      SearchMode   `json:"mode,omitempty"`
}

SearchOptions configures a knowledge search query.

type SearchResult

type SearchResult struct {
	Content    string         `json:"content"`
	Score      float64        `json:"score"`
	DocName    string         `json:"doc_name,omitempty"`
	ChunkIndex int            `json:"chunk_index,omitempty"`
	Layer      ContextLayer   `json:"layer"`
	Metadata   map[string]any `json:"metadata,omitempty"`
}

SearchResult represents a single search hit with its relevance score.

func RRFMerge

func RRFMerge(bm25Results, semanticResults []SearchResult, k int) []SearchResult
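
RRFMerge has no description here; its name and signature suggest reciprocal rank fusion, which can be sketched as below. This is the textbook formula (each list contributes 1/(k + rank) per item, with ranks starting at 1), not necessarily the package's exact behaviour. The result type, the identity key (document name plus chunk index), and the value k = 60 are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

type result struct {
	DocName    string
	ChunkIndex int
	Score      float64
}

// rrfMerge fuses two ranked lists (best first). The original scores are
// ignored; only each item's position in its list matters.
func rrfMerge(bm25, semantic []result, k int) []result {
	type key struct {
		doc string
		idx int
	}
	fused := map[key]*result{}
	var order []key // first-seen order, used as a stable tiebreak
	add := func(list []result) {
		for rank, r := range list {
			id := key{r.DocName, r.ChunkIndex}
			if _, ok := fused[id]; !ok {
				cp := r
				cp.Score = 0
				fused[id] = &cp
				order = append(order, id)
			}
			fused[id].Score += 1 / float64(k+rank+1)
		}
	}
	add(bm25)
	add(semantic)
	out := make([]result, 0, len(order))
	for _, id := range order {
		out = append(out, *fused[id])
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	bm := []result{{"A", 0, 9.1}, {"B", 0, 7.4}}
	sem := []result{{"B", 0, 0.93}, {"C", 0, 0.88}}
	for _, r := range rrfMerge(bm, sem, 60) {
		fmt.Printf("%s %.4f\n", r.DocName, r.Score)
	}
}
```

Working on ranks rather than raw scores is what lets BM25 and cosine-similarity results be combined without normalizing their very different score scales.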

func RankResults

func RankResults(results []SearchResult, topK int) []SearchResult

RankResults sorts by score descending and limits to topK.
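
A minimal sketch of this sort-and-truncate step follows. The result type is a local stand-in for SearchResult, and two details are assumptions of the sketch rather than documented behaviour: the input slice is copied instead of sorted in place, and a topK of zero or less means no limit.

```go
package main

import (
	"fmt"
	"sort"
)

type result struct {
	DocName string
	Score   float64
}

// rankResults returns the topK highest-scoring results, best first.
func rankResults(results []result, topK int) []result {
	sorted := append([]result(nil), results...) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Score > sorted[j].Score
	})
	if topK > 0 && len(sorted) > topK {
		sorted = sorted[:topK]
	}
	return sorted
}

func main() {
	hits := []result{{"a.md", 0.2}, {"b.md", 0.9}, {"c.md", 0.5}}
	fmt.Println(rankResults(hits, 2)) // [{b.md 0.9} {c.md 0.5}]
}
```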

type SimpleTokenizer

type SimpleTokenizer = textsearch.SimpleTokenizer

SimpleTokenizer is an alias for textsearch.SimpleTokenizer, re-exported from sdk/textsearch for backward compatibility so that internal code and tests can use it without changing import paths.

type Store

type Store interface {
	AddDocument(ctx context.Context, datasetID, name, content string) error
	AddDocuments(ctx context.Context, datasetID string, docs []DocInput) error
	GetDocument(ctx context.Context, datasetID, name string) (*Document, error)
	DeleteDocument(ctx context.Context, datasetID, name string) error
	ListDocuments(ctx context.Context, datasetID string) ([]Document, error)
	Search(ctx context.Context, datasetID, query string, opts SearchOptions) ([]SearchResult, error)

	// Layered reads
	Abstract(ctx context.Context, datasetID, name string) (string, error)
	Overview(ctx context.Context, datasetID, name string) (string, error)

	// Dataset-level summaries
	DatasetAbstract(ctx context.Context, datasetID string) (string, error)
	DatasetOverview(ctx context.Context, datasetID string) (string, error)
}

Store abstracts knowledge base storage. Documents are organized by dataset.

type Tokenizer

type Tokenizer = textsearch.Tokenizer

Tokenizer is an alias for textsearch.Tokenizer, re-exported from sdk/textsearch for backward compatibility so that internal code and tests can use it without changing import paths.
