kg

package
v1.18.0 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 13, 2026 License: MIT Imports: 11 Imported by: 0

Documentation

Overview

Package kg implements a bitemporal knowledge graph backed by SQLite.

Data Model

The KG stores SPO (Subject-Predicate-Object) triples about entities extracted from conversation memories. Each triple carries two time dimensions:

  • Valid time (valid_from/valid_until): when the fact is true in the real world
  • Transaction time (txn_from/txn_until): when the row exists in the database

Functional predicates enforce that only one triple per (subject, predicate) may be currently valid — inserting a new value automatically invalidates the old one.

Entity aliases enable case- and accent-insensitive matching: "Maria", "maria", and "María" all resolve to the same entity_id.
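One way to get this behavior is to normalize every name before it is stored or looked up. The sketch below is an assumption about the approach, not the package's code: `normalizeAlias`, `aliasIndex`, and the hand-written accent table are hypothetical, and the table only covers a few precomposed Latin letters.

```go
package main

import (
	"fmt"
	"strings"
)

// accentFolder maps a small, hand-picked set of accented lowercase letters
// to their base letter. A real implementation would more likely rely on
// Unicode normalization; this table is for illustration only and does not
// handle decomposed (combining-mark) input.
var accentFolder = strings.NewReplacer(
	"á", "a", "à", "a", "â", "a", "ã", "a", "ä", "a",
	"é", "e", "è", "e", "ê", "e", "ë", "e",
	"í", "i", "ì", "i", "î", "i", "ï", "i",
	"ó", "o", "ò", "o", "ô", "o", "õ", "o", "ö", "o",
	"ú", "u", "ù", "u", "û", "u", "ü", "u",
	"ç", "c", "ñ", "n",
)

// normalizeAlias trims whitespace, lowercases the name, and folds the
// accents listed in accentFolder.
func normalizeAlias(name string) string {
	return accentFolder.Replace(strings.ToLower(strings.TrimSpace(name)))
}

// aliasIndex maps normalized aliases to entity IDs.
type aliasIndex map[string]int64

// register stores alias under its normalized form.
func (a aliasIndex) register(id int64, alias string) {
	a[normalizeAlias(alias)] = id
}

// resolve looks name up by its normalized form.
func (a aliasIndex) resolve(name string) (int64, bool) {
	id, ok := a[normalizeAlias(name)]
	return id, ok
}

func main() {
	idx := aliasIndex{}
	idx.register(1, "Maria")
	for _, n := range []string{"Maria", "maria", "María"} {
		id, ok := idx.resolve(n)
		fmt.Println(n, id, ok)
	}
}
```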

ADR References

  • ADR-003: KG shares the same SQLite file as the memory store (no separate DB)
  • ADR-009: Parameterized SQL required for all user-facing queries

Wing Inheritance

Triples inherit the wing of their source memory. A triple with wing IS NULL is neutral — never filtered, never penalized, never downgraded.
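The documentation does not say how wings are applied at query time, so the following is only one reading of "never filtered": a wing filter that always lets neutral triples through. The `triple` type and `visibleIn` function are hypothetical, and the pointer field is used here only so that nil can stand for SQL NULL.

```go
package main

import "fmt"

// triple is a pared-down stand-in. Wing is a pointer so that nil can
// represent SQL NULL.
type triple struct {
	Object string
	Wing   *string
}

// visibleIn keeps triples whose wing equals the requested wing, plus every
// neutral (nil wing) triple, which is never filtered out.
func visibleIn(wing string, ts []triple) []triple {
	var out []triple
	for _, t := range ts {
		if t.Wing == nil || *t.Wing == wing {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	work, home := "work", "home"
	ts := []triple{{"a", &work}, {"b", &home}, {"c", nil}}
	for _, t := range visibleIn("work", ts) {
		fmt.Println(t.Object) // prints a, then c
	}
}
```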

Constants

const KgDropSchema = `` /* 608-byte string literal not displayed */

KgDropSchema reverses the KG migration by dropping the KG tables and indexes. It is safe to run against any database because every statement uses IF EXISTS.

const KgSchema = `` /* 3464-byte string literal not displayed */

KgSchema contains all CREATE TABLE and CREATE INDEX statements for the Knowledge Graph bitemporal store. All statements are idempotent (IF NOT EXISTS).

Variables

var PII_PREDICATE_BLACKLIST = map[string]bool{
	"ssn":          true,
	"credit_card":  true,
	"password":     true,
	"address":      true,
	"cpf":          true,
	"rg":           true,
	"phone":        true,
	"email":        true,
	"bank_account": true,
}

PII_PREDICATE_BLACKLIST lists predicates that are ALWAYS dropped before insertion, regardless of source. This is a data-type classification, not a language keyword list.

Functions

func DefaultPatternSets

func DefaultPatternSets() []*patterns.PatternSet

DefaultPatternSets returns the built-in pattern sets (pt-br + en). Parsed once and cached via sync.Once. Returns nil on parse error.
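The parse-once-and-cache behavior described here is the usual sync.Once idiom. A minimal sketch follows, with a hypothetical `patternSet` type and `parseBuiltins` function standing in for the real YAML loading.

```go
package main

import (
	"fmt"
	"sync"
)

// patternSet is a stand-in for patterns.PatternSet.
type patternSet struct{ Lang string }

var (
	once   sync.Once
	cached []*patternSet
	parses int // counts how often parsing actually ran
)

// parseBuiltins is a hypothetical parser for the built-in pattern sets.
func parseBuiltins() ([]*patternSet, error) {
	parses++
	return []*patternSet{{Lang: "pt-br"}, {Lang: "en"}}, nil
}

// defaultSets parses on the first call only and returns the cached slice
// afterwards. On a parse error the cache stays nil, and because sync.Once
// never reruns the function, later calls keep returning nil.
func defaultSets() []*patternSet {
	once.Do(func() {
		sets, err := parseBuiltins()
		if err != nil {
			return
		}
		cached = sets
	})
	return cached
}

func main() {
	fmt.Println(len(defaultSets()), len(defaultSets()), parses) // 2 2 1
}
```

Callers should check for nil before passing the result on, since a failed parse is cached just like a successful one.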

func IsAutoExtractEnabled

func IsAutoExtractEnabled(mode string) bool

IsAutoExtractEnabled reports whether mode enables at least one extraction mode. See ExtractionConfig.AutoExtract for the accepted values.

func IsPIIPredicate

func IsPIIPredicate(predicate string) bool

IsPIIPredicate returns true if the predicate is in the blacklist.
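A blacklist check like this is typically applied as a filter just before insertion. The sketch below copies the predicate names from PII_PREDICATE_BLACKLIST; `dropPII` and `tripleJSON` are hypothetical names. The lookup is exact, and whether the package normalizes case or whitespace first is not stated in the documentation.

```go
package main

import "fmt"

// piiPredicates mirrors the entries of PII_PREDICATE_BLACKLIST.
var piiPredicates = map[string]bool{
	"ssn": true, "credit_card": true, "password": true,
	"address": true, "cpf": true, "rg": true,
	"phone": true, "email": true, "bank_account": true,
}

// tripleJSON is a local stand-in for an extracted triple.
type tripleJSON struct{ Subject, Predicate, Object string }

// dropPII returns the triples whose predicate is not blacklisted.
// The match is an exact map lookup with no normalization.
func dropPII(in []tripleJSON) []tripleJSON {
	var out []tripleJSON
	for _, t := range in {
		if piiPredicates[t.Predicate] {
			continue
		}
		out = append(out, t)
	}
	return out
}

func main() {
	kept := dropPII([]tripleJSON{
		{"maria", "email", "maria@example.com"},
		{"maria", "lives_in", "Porto"},
	})
	fmt.Println(len(kept), kept[0].Predicate) // 1 lives_in
}
```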

Types

type Direction

type Direction int
const (
	Out Direction = iota
	In
	Both
)

type ExtractionConfig

type ExtractionConfig struct {
	// AutoExtract controls which extraction modes are active.
	// Values: "off" (default), "pattern", "llm", "both".
	AutoExtract string

	// LLMBudgetPerCycle is the max memories to process per dream cycle.
	// Default: 20.
	LLMBudgetPerCycle int

	// LLMConsentACK must be true when AutoExtract contains "llm".
	// This is a safety gate — the operator must explicitly acknowledge
	// that memory contents will be sent to an external LLM API.
	LLMConsentACK bool
}

ExtractionConfig holds the extraction configuration: the active extraction modes, the per-cycle LLM budget, and the LLM consent gate.

type ExtractionResponse

type ExtractionResponse struct {
	Triples []TripleJSON `json:"triples"`
}

ExtractionResponse is the structured output format expected from the LLM.
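As an illustration of the expected shape, the struct tags imply a JSON object with a single "triples" array. The sketch below redeclares the two types locally and decodes a sample payload; `parseResponse` is a hypothetical helper, and the sample values are invented.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TripleJSON and ExtractionResponse are copied from the package docs so
// that this example compiles on its own.
type TripleJSON struct {
	Subject   string `json:"subject"`
	Predicate string `json:"predicate"`
	Object    string `json:"object"`
}

type ExtractionResponse struct {
	Triples []TripleJSON `json:"triples"`
}

// parseResponse decodes the raw LLM output into triples.
func parseResponse(raw string) ([]TripleJSON, error) {
	var resp ExtractionResponse
	if err := json.Unmarshal([]byte(raw), &resp); err != nil {
		return nil, fmt.Errorf("parse extraction response: %w", err)
	}
	return resp.Triples, nil
}

func main() {
	raw := `{"triples":[{"subject":"maria","predicate":"lives_in","object":"Porto"}]}`
	triples, err := parseResponse(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range triples {
		fmt.Println(t.Subject, t.Predicate, t.Object)
	}
}
```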

type ExtractionResult

type ExtractionResult struct {
	Subject    string
	Predicate  string
	Object     string
	Confidence float64
	RawText    string
}

ExtractionResult is a single (subject, predicate, object) triple found in text.

type Extractor

type Extractor struct {
	// contains filtered or unexported fields
}

Extractor scans free text for SPO triples using configurable regex patterns.

func NewExtractor

func NewExtractor(patternSets []*patterns.PatternSet, logger *slog.Logger) (*Extractor, error)

NewExtractor compiles all templates from the given pattern sets into regexps. Templates are accent-stripped so that normalized input text matches regardless of diacritics. Clause separators from each PatternSet are collected for sentence splitting before pattern matching.

func (*Extractor) Extract

func (e *Extractor) Extract(text string) []ExtractionResult

Extract scans text and returns all SPO triples matched by the compiled patterns. The text is first split into clauses on the configured clause separators so that multi-predicate sentences are handled correctly. Subsequent clauses inherit the subject from the preceding clause when the clause begins with a verb phrase.
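Clause splitting with subject inheritance can be sketched as follows. The pattern, the separators, and the `extract` and `splitClauses` helpers are toy assumptions; the real templates come from the pattern sets and are not shown in this documentation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type result struct{ Subject, Predicate, Object string }

// clausePattern is a toy pattern: an optional capitalized subject, one of
// two verb phrases, then the object.
var clausePattern = regexp.MustCompile(`^(?:([A-Z][a-z]+) )?(works at|lives in) (.+)$`)

// splitClauses splits text on each separator in turn.
func splitClauses(text string, seps []string) []string {
	clauses := []string{text}
	for _, sep := range seps {
		var next []string
		for _, c := range clauses {
			next = append(next, strings.Split(c, sep)...)
		}
		clauses = next
	}
	return clauses
}

// extract matches each clause and lets a clause that starts with a verb
// phrase inherit the subject of the preceding matched clause.
func extract(text string, seps []string) []result {
	var out []result
	lastSubject := ""
	for _, clause := range splitClauses(text, seps) {
		m := clausePattern.FindStringSubmatch(strings.TrimSpace(clause))
		if m == nil {
			continue
		}
		subject := m[1]
		if subject == "" {
			subject = lastSubject // clause begins with a verb phrase
		}
		if subject == "" {
			continue // nothing to inherit from
		}
		lastSubject = subject
		out = append(out, result{subject, strings.ReplaceAll(m[2], " ", "_"), m[3]})
	}
	return out
}

func main() {
	seps := []string{" and ", ", "}
	for _, r := range extract("Maria works at Acme and lives in Porto", seps) {
		fmt.Println(r.Subject, r.Predicate, r.Object)
	}
}
```

Without the split, the single greedy object group would swallow "Acme and lives in Porto" as one object, which is why the clauses are separated before matching.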

func (*Extractor) ExtractAndStore

func (e *Extractor) ExtractAndStore(ctx context.Context, kg *KG, text string, wing string, sourceMemoryID string) (int, error)

ExtractAndStore extracts triples from text and persists them to the knowledge graph. Returns the count of successfully inserted triples. Individual insertion errors are logged but do not abort the batch.

type KG

type KG struct {
	// contains filtered or unexported fields
}

func NewKG

func NewKG(db *sql.DB, logger *slog.Logger) (*KG, error)

func (*KG) AddTriple

func (k *KG) AddTriple(ctx context.Context, subjectName, predicateName, objectText string, opts TripleOpts) (int64, error)

func (*KG) AutoRegisterAlias

func (k *KG) AutoRegisterAlias(ctx context.Context, entityID int64, canonicalName string) error

func (*KG) CurrentFacts

func (k *KG) CurrentFacts(ctx context.Context, subjectName string) ([]Triple, error)

func (*KG) DB

func (k *KG) DB() *sql.DB

DB returns the underlying database connection for direct queries.

func (*KG) DeleteEntity

func (k *KG) DeleteEntity(ctx context.Context, entityName string) error

func (*KG) EnsureEntity

func (k *KG) EnsureEntity(ctx context.Context, name string) (int64, error)

func (*KG) EnsurePredicate

func (k *KG) EnsurePredicate(ctx context.Context, name string, isFunctional bool, description string) (int64, error)

func (*KG) InvalidateTriple

func (k *KG) InvalidateTriple(ctx context.Context, tripleID int64) error

func (*KG) QueryEntity

func (k *KG) QueryEntity(ctx context.Context, entityName string, dir Direction) ([]Triple, error)

func (*KG) RegisterAlias

func (k *KG) RegisterAlias(ctx context.Context, entityID int64, alias string) error

func (*KG) ResolveAlias

func (k *KG) ResolveAlias(ctx context.Context, name string) (int64, error)

func (*KG) Timeline

func (k *KG) Timeline(ctx context.Context, opts TimelineOpts) ([]Triple, error)

type LLMExtractor

type LLMExtractor struct {
	// contains filtered or unexported fields
}

LLMExtractor extracts triples using an LLM provider.

It is off by default (AutoExtract="off") and requires explicit consent (LLMConsentACK=true) before any LLM call is made. A circuit breaker prevents cascading failures: after 3 consecutive errors the extractor pauses for 30 minutes.

func NewLLMExtractor

func NewLLMExtractor(kg *KG, provider LLMProvider, config ExtractionConfig, logger *slog.Logger) (*LLMExtractor, error)

NewLLMExtractor creates a new LLM-backed triple extractor.

Validation rules:

  • kg must not be nil
  • provider must not be nil
  • If config.AutoExtract contains "llm" then config.LLMConsentACK must be true
  • The prompt template at prompts/extractor.txt is loaded from the embedded FS; if it cannot be loaded, a default is used

func (*LLMExtractor) Config

func (e *LLMExtractor) Config() ExtractionConfig

Config returns a copy of the extraction config.

func (*LLMExtractor) ExtractAndStore

func (e *LLMExtractor) ExtractAndStore(ctx context.Context, text, wing, sourceMemoryID string, knownEntities []string) (int, error)

ExtractAndStore runs the full pipeline: extract triples from text via LLM, filter out PII predicates, and store valid triples in the knowledge graph.

Returns the count of successfully inserted triples. Individual insertion errors are logged but do not abort the batch. PII-blacklisted predicates are silently dropped.

func (*LLMExtractor) ExtractFromText

func (e *LLMExtractor) ExtractFromText(ctx context.Context, text string, knownEntities []string) ([]TripleJSON, error)

ExtractFromText calls the LLM provider with the configured prompt template and parses the JSON response into structured triples.

The caller is responsible for budget enforcement — this method does not track per-cycle counts.
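Since budget enforcement is left to the caller, a dream-cycle loop might cap the number of memories it hands to the extractor. The sketch below assumes every attempt counts against the budget, which the documentation does not specify; `runCycle` and the `extract` callback are hypothetical.

```go
package main

import "fmt"

// runCycle passes at most budget memories to extract and returns how many
// were attempted. extract stands in for a call such as ExtractFromText.
func runCycle(memories []string, budget int, extract func(string)) int {
	n := len(memories)
	if n > budget {
		n = budget
	}
	if n < 0 {
		n = 0
	}
	for _, m := range memories[:n] {
		extract(m)
	}
	return n
}

func main() {
	memories := []string{"m1", "m2", "m3"}
	n := runCycle(memories, 2, func(m string) { fmt.Println("extracting", m) })
	fmt.Println("processed", n) // processed 2
}
```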

type LLMProvider

type LLMProvider interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

LLMProvider is the interface for LLM calls (mockable for tests).

type TimelineOpts

type TimelineOpts struct {
	Subject   string
	From      string
	Until     string
	Direction Direction
	Predicate string
	Limit     int
}

type Triple

type Triple struct {
	TripleID       int64
	SubjectID      int64
	SubjectName    string
	PredicateID    int64
	PredicateName  string
	ObjectText     string
	ObjectEntityID *int64
	ObjectName     string
	ValidFrom      string
	ValidUntil     string
	TxnFrom        string
	TxnUntil       string
	Confidence     float64
	SourceMemoryID string
	Wing           string
	RawText        string
}

type TripleJSON

type TripleJSON struct {
	Subject   string `json:"subject"`
	Predicate string `json:"predicate"`
	Object    string `json:"object"`
}

TripleJSON is a single (subject, predicate, object) triple from the LLM response.

type TripleOpts

type TripleOpts struct {
	ObjectEntityName string
	Confidence       float64
	SourceMemoryID   string
	Wing             string
	RawText          string
	ValidFrom        string
}

Directories

Path Synopsis
Package patterns defines the YAML schema and loader for extraction rule sets.