This project is 100% AI-slop. Use at your own risk for life and limb.

gognee - A Go Knowledge Graph Memory System

gognee is an importable Go library that provides persistent knowledge graph memory for AI assistants. It enables applications to extract, store, and retrieve information relationships using a combination of vector search and graph traversal.

Features

  • 📚 Knowledge Graph Storage: Persistent storage of entities and relationships in SQLite
  • 🔍 Hybrid Search: Combine vector similarity and graph traversal for semantic retrieval
  • 🧠 LLM-Powered Extraction: Automatic entity and relationship extraction using OpenAI's APIs
  • 📝 Chunking: Intelligent text splitting with token awareness and overlap handling
  • 🔄 Deterministic Deduplication: Same entities across documents resolve to the same node
  • 💾 Persistent Memory: Knowledge persists across application restarts

Importing

gognee is a library package intended to be imported into your Go application. Import the package entrypoint:

import "github.com/dan-solli/gognee/pkg/gognee"

Minimal import-and-use example:

package main

import (
	"context"
	"fmt"
	"os"

	"github.com/dan-solli/gognee/pkg/gognee"
)

func main() {
	ctx := context.Background()

	g, err := gognee.New(gognee.Config{
		DBPath:    "./memory.db",
		OpenAIKey: os.Getenv("OPENAI_API_KEY"),
	})
	if err != nil {
		panic(err)
	}
	defer g.Close()

	// Add text and process
	_ = g.Add(ctx, "Gognee is a Go knowledge graph memory library.", gognee.AddOptions{})
	_, _ = g.Cognify(ctx, gognee.CognifyOptions{})

	// Search
	results, _ := g.Search(ctx, "What do I know about gognee?", gognee.SearchOptions{})
	fmt.Printf("Found %d results\n", len(results))
}

Types and convenience values are re-exported from the package (for example SearchOptions, SearchResult, SearchTypeHybrid, Node).

Quick Start

Prerequisites

CGO Requirement: gognee v1.2.0+ requires CGO for sqlite-vec vector indexing:

export CGO_ENABLED=1

Platform-specific notes:

  • Linux: Requires GCC or Clang
  • macOS: Requires Xcode Command Line Tools (xcode-select --install)
  • Windows: Requires MinGW-w64 or MSVC
Installation
CGO_ENABLED=1 go get github.com/dan-solli/gognee
Basic Usage
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/dan-solli/gognee/pkg/gognee"
)

func main() {
	ctx := context.Background()

	// Initialize Gognee with OpenAI API key
	g, err := gognee.New(gognee.Config{
		OpenAIKey: os.Getenv("OPENAI_API_KEY"),
		DBPath:    "./memory.db", // Persistent SQLite database
	})
	if err != nil {
		log.Fatal(err)
	}
	defer g.Close()

	// Add documents to the knowledge base
	documents := []string{
		"React is a JavaScript library for building user interfaces using components.",
		"We use TypeScript to add static typing to our React applications.",
		"PostgreSQL is our primary database for storing application data.",
		"The frontend uses React with TypeScript, and the backend uses PostgreSQL.",
	}

	for _, doc := range documents {
		if err := g.Add(ctx, doc, gognee.AddOptions{}); err != nil {
			log.Fatal(err)
		}
	}

	fmt.Printf("Buffered %d documents for processing\n", g.BufferedCount())

	// Process buffered documents through the extraction pipeline
	result, err := g.Cognify(ctx, gognee.CognifyOptions{})
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("Processed %d documents: %d nodes, %d edges\n",
		result.DocumentsProcessed,
		result.NodesCreated,
		result.EdgesCreated,
	)

	// Query the knowledge graph
	results, err := g.Search(ctx, "What technologies does the project use?", gognee.SearchOptions{
		Type:       gognee.SearchTypeHybrid,
		TopK:       5,
		GraphDepth: 1,
	})
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("\nSearch Results:")
	for i, result := range results {
		fmt.Printf("%d. %s (%s) - Score: %.4f\n", i+1, result.Node.Name, result.Node.Type, result.Score)
	}

	// Check statistics
	stats, err := g.Stats()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("\nKnowledge Graph Stats: %d nodes, %d edges\n", stats.NodeCount, stats.EdgeCount)
}

API Reference

Core Methods
New(cfg Config) (*Gognee, error)

Initializes a new Gognee instance.

Config fields:

  • OpenAIKey (required): OpenAI API key for embeddings and LLM
  • DBPath (optional): Path to SQLite database file. Defaults to :memory: if empty
  • EmbeddingModel (optional): Embedding model to use. Default: text-embedding-3-small
  • LLMModel (optional): LLM model for extraction. Default: gpt-4o-mini
  • ChunkSize (optional): Token size for text chunks. Default: 512
  • ChunkOverlap (optional): Token overlap between chunks. Default: 50
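
For reference, a config that sets every documented field explicitly to its default (a sketch; only the field names listed above are used):

cfg := gognee.Config{
    OpenAIKey:      os.Getenv("OPENAI_API_KEY"), // required
    DBPath:         "./memory.db",               // empty means :memory:
    EmbeddingModel: "text-embedding-3-small",    // default
    LLMModel:       "gpt-4o-mini",               // default
    ChunkSize:      512,                         // default
    ChunkOverlap:   50,                          // default
}
g, err := gognee.New(cfg)
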
Add(ctx context.Context, text string, opts AddOptions) error

Buffers text for later processing.

  • Parameters:
    • text: Document text to add (non-empty)
    • opts.Source (optional): Source identifier for the document
  • Returns: Error if text is empty
  • Note: Text is buffered but NOT processed until Cognify() is called
Cognify(ctx context.Context, opts CognifyOptions) (*CognifyResult, error)

Processes all buffered documents through the full extraction pipeline:

  1. Chunks text into segments
  2. Extracts entities (Person, Concept, System, etc.) via LLM
  3. Extracts relationships between entities via LLM
  4. Creates nodes and edges in the knowledge graph
  5. Generates embeddings for semantic search
  6. Clears the buffer

CognifyResult fields:

  • DocumentsProcessed: Count of documents actually processed from the buffer
  • DocumentsSkipped: Count of documents skipped by incremental deduplication (see Incremental Cognify below)
  • ChunksProcessed: Total chunks created
  • ChunksFailed: Chunks that failed extraction
  • NodesCreated: Entities added to graph
  • EdgesCreated: Relationships added to graph
  • Errors: Individual errors encountered (processing continues best-effort)

Note: The buffer is always cleared after Cognify, even if errors occur. The returned error indicates only catastrophic failures (context canceled, DB connection lost).

Search(ctx context.Context, query string, opts SearchOptions) ([]SearchResult, error)

Searches the knowledge graph.

SearchOptions fields:

  • Type (optional): Search type - SearchTypeVector, SearchTypeGraph, or SearchTypeHybrid. Default: SearchTypeHybrid
  • TopK (optional): Maximum results to return. Default: 10
  • GraphDepth (optional): Max depth for graph traversal. Default: 1
  • SeedNodeIDs (optional): Starting nodes for graph search

SearchResult fields:

  • Node: Full node data
  • Score: Relevance score (0-1)
  • Source: How the node was found ("vector" or "graph")
  • GraphDepth: Distance from search origin
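
A sketch combining the two modes: run a vector search to pick seed nodes, then expand from them with a graph search via SeedNodeIDs (assumes the re-exported Node type exposes an ID field):

// Pass 1: vector-only search to find seed nodes
seeds, err := g.Search(ctx, "database technology", gognee.SearchOptions{
    Type: gognee.SearchTypeVector,
    TopK: 3,
})
if err != nil {
    log.Fatal(err)
}

ids := make([]string, 0, len(seeds))
for _, r := range seeds {
    ids = append(ids, r.Node.ID) // assumption: Node carries its ID
}

// Pass 2: graph traversal starting from those seeds
related, err := g.Search(ctx, "database technology", gognee.SearchOptions{
    Type:        gognee.SearchTypeGraph,
    GraphDepth:  2,
    SeedNodeIDs: ids,
})
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Found %d related nodes\n", len(related))
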
Close() error

Releases all resources (database connections, buffered data).

Stats() (Stats, error)

Returns knowledge graph statistics:

  • NodeCount: Total entities in graph
  • EdgeCount: Total relationships in graph
  • BufferedDocs: Documents waiting for Cognify
  • LastCognified: Timestamp of last successful Cognify
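
For example:

stats, err := g.Stats()
if err != nil {
    log.Fatal(err)
}
fmt.Printf("%d nodes, %d edges, %d buffered docs (last cognified: %v)\n",
    stats.NodeCount, stats.EdgeCount, stats.BufferedDocs, stats.LastCognified)
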
Advanced Access

For custom pipelines, these components are accessible:

  • GetChunker(): Text chunking
  • GetEmbeddings(): Embedding client
  • GetLLM(): LLM client
  • GetGraphStore(): Graph storage
  • GetVectorStore(): Vector storage
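
A minimal sketch of pulling these components out for a custom pipeline; only the getters listed above are used, and their method sets are defined by the pkg/ subpackages:

chunker := g.GetChunker()       // text chunking
embedder := g.GetEmbeddings()   // embedding client
llmClient := g.GetLLM()         // LLM client
graphStore := g.GetGraphStore() // graph storage
vecStore := g.GetVectorStore()  // vector storage

// Their concrete interfaces live in the pkg/ subpackages
// (embeddings, extraction, llm, search, store).
fmt.Printf("%T %T %T %T %T\n", chunker, embedder, llmClient, graphStore, vecStore)
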

Type Re-exports

Common types are re-exported from the top-level package for convenience:

  • SearchResult, SearchOptions, SearchType
  • Node, Edge
  • SearchTypeVector, SearchTypeGraph, SearchTypeHybrid (constants)
Default Behavior

gognee uses SQLite for both graph storage and vector embeddings. Choose the storage mode with DBPath:

  • Persistent Storage (recommended): DBPath: "./memory.db" - Data persists across restarts
  • In-Memory Storage (testing/dev): DBPath: ":memory:" - Data is cleared when process exits
Persistence

When using a file-based DBPath, both the knowledge graph (nodes and edges) and vector embeddings persist across application restarts. This means:

  • ✅ No need to re-run Cognify() after restart
  • ✅ Instant search availability on startup
  • ✅ Zero-downtime deployment support

Example workflow:

// Session 1: Build knowledge graph
g1, _ := gognee.New(gognee.Config{
    DBPath: "./memory.db",
    OpenAIKey: apiKey,
})
g1.Add(ctx, "Document 1...", gognee.AddOptions{})
g1.Cognify(ctx, gognee.CognifyOptions{})
g1.Close()

// Session 2: Reopen and immediately search (no Cognify needed)
g2, _ := gognee.New(gognee.Config{
    DBPath: "./memory.db",  // Same database
    OpenAIKey: apiKey,
})
results, _ := g2.Search(ctx, "query", gognee.SearchOptions{})
// βœ… Results immediately available - embeddings were persisted
Migration from v0.6.0 and Earlier

In v0.6.0 and earlier, vector embeddings were stored in memory and lost on restart. If you're upgrading:

  • Existing databases will work without migration - simply run Cognify() once after upgrading to v0.7.0 to populate the persistent embeddings
  • New databases get persistent embeddings automatically
  • In-memory mode (:memory:) behavior is unchanged

MVP Limitations

This is the MVP (Minimum Viable Product). Known limitations and performance characteristics:

Performance

  • Vector search: sqlite-vec indexed ANN search (O(log n) complexity)
  • Graph traversal: in-memory BFS implementation (acceptable for <100K nodes)
  • Concurrent writes: serializable transactions may cause contention under heavy load
Persistent Storage

Provide a file path to DBPath to enable persistent storage:

cfg := gognee.Config{
    OpenAIKey: "sk-...",
    DBPath: "./knowledge.db",
}
g, _ := gognee.New(cfg)

The database file is created automatically if it doesn't exist.

Incremental Cognify

By default, gognee tracks processed documents to avoid redundant processing. This reduces costs and processing time when re-adding documents.

How It Works

When you call Cognify(), gognee:

  1. Computes a SHA-256 hash of each document's text (see the sketch below)
  2. Checks if that hash has been processed before
  3. Skips documents that have already been processed (incremental mode)
  4. Processes new/changed documents normally

Benefits:

  • ⚡ Near-instant processing for duplicate documents (~0ms vs 5-10s)
  • 💰 Zero LLM API costs for cached documents
  • 🔄 Enables continuous updates without full reprocessing
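
A sketch of the document-identity computation described in step 1 above (hex encoding of the digest is an assumption about gognee's internal representation):

import (
    "crypto/sha256"
    "encoding/hex"
)

// docHash mirrors the content-based identity: identical text yields an
// identical hash, so a re-added document is recognized and skipped.
func docHash(text string) string {
    sum := sha256.Sum256([]byte(text))
    return hex.EncodeToString(sum[:])
}
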
Default Behavior

Incremental mode is ON by default:

// Second Cognify() call skips already-processed documents
g.Add(ctx, "React is a UI library", gognee.AddOptions{})
g.Cognify(ctx, gognee.CognifyOptions{}) // Processes document (hash: abc123)

g.Add(ctx, "React is a UI library", gognee.AddOptions{}) // Same text
g.Cognify(ctx, gognee.CognifyOptions{}) // Skips (hash: abc123 already processed)
// DocumentsProcessed=0, DocumentsSkipped=1
Controlling Incremental Behavior

Use CognifyOptions to control incremental processing:

// Disable incremental mode (always reprocess)
skipProcessed := false
g.Cognify(ctx, gognee.CognifyOptions{
    SkipProcessed: &skipProcessed,
})

// Force reprocessing even with incremental mode enabled
g.Cognify(ctx, gognee.CognifyOptions{
    Force: true, // Overrides SkipProcessed
})

When to use Force: true:

  • After changing ChunkSize or ChunkOverlap settings
  • To rebuild the knowledge graph from scratch
  • After updating extraction prompts or LLM models
Document Identity

Documents are identified by exact text content (SHA-256 hash). Any change to the text (including whitespace) creates a new document:

g.Add(ctx, "React is great", gognee.AddOptions{Source: "file-a"})
g.Cognify(ctx, gognee.CognifyOptions{}) // Processes

g.Add(ctx, "React is great", gognee.AddOptions{Source: "file-b"}) // Different source
g.Cognify(ctx, gognee.CognifyOptions{}) // Skips (same text)

g.Add(ctx, "React is great!", gognee.AddOptions{}) // Punctuation changed
g.Cognify(ctx, gognee.CognifyOptions{}) // Processes (different hash)

Note: The Source field is metadata only and does NOT affect document identity. Identity is purely content-based.
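
Because identity is exact, an application that wants whitespace-insensitive deduplication can normalize text before calling Add. A small app-side sketch (not a gognee feature):

import "strings"

// normalize collapses runs of whitespace to single spaces so cosmetic
// edits do not create new document identities.
func normalize(text string) string {
    return strings.Join(strings.Fields(text), " ")
}

// usage: g.Add(ctx, normalize(raw), gognee.AddOptions{})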

Persistence

Document tracking persists in the SQLite database (table: processed_documents). Tracking survives application restarts when using file-based DBPath.

For :memory: mode, tracking is lost on restart (incremental behavior only applies within a single session).

Resetting Tracking

To clear processed document history without deleting the knowledge graph:

// Access the DocumentTracker interface (store is the
// github.com/dan-solli/gognee/pkg/store package)
tracker := g.GetGraphStore().(store.DocumentTracker)
tracker.ClearProcessedDocuments(ctx)

// Now all documents will be reprocessed
g.Cognify(ctx, gognee.CognifyOptions{})

Memory Decay and Forgetting

gognee supports time-based memory decay to keep the knowledge graph relevant and bounded. Older or rarely-accessed nodes receive lower scores in search results, and can be explicitly pruned.

Configuration

Enable decay by setting decay-related fields in Config:

cfg := gognee.Config{
    OpenAIKey:         "sk-...",
    DBPath:            "./knowledge.db",
    DecayEnabled:      true,          // Enable time-based decay (default: false)
    DecayHalfLifeDays: 30,            // Nodes' scores halve after 30 days (default: 30)
    DecayBasis:        "access",      // Decay based on last access ("access" or "creation", default: "access")
}

Decay Options:

  • DecayEnabled: When true, search results are scored with decay multipliers. Off by default for backward compatibility
  • DecayHalfLifeDays: Number of days after which a node's score is multiplied by 0.5. Shorter values mean faster decay
  • DecayBasis:
    • "access": Decay based on last_accessed_at timestamp (nodes accessed recently resist decay)
    • "creation": Decay based on created_at timestamp (age since creation)
    • If a node has never been accessed, falls back to created_at
Access Reinforcement

When decay is enabled, nodes returned in search results have their last_accessed_at timestamp updated automatically. This means frequently searched nodes resist decay (mimicking human memory reinforcement).

Pruning Nodes

Use Prune() to permanently delete nodes that are too old or have decayed below a threshold:

// Preview what would be pruned (dry run)
result, err := g.Prune(ctx, gognee.PruneOptions{
    MaxAgeDays:    60,      // Prune nodes older than 60 days
    MinDecayScore: 0.1,     // Prune nodes with decay score < 0.1
    DryRun:        true,    // Don't actually delete
})
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Would prune %d nodes and %d edges\n", result.NodesPruned, result.EdgesPruned)

// Actually prune
result, err = g.Prune(ctx, gognee.PruneOptions{
    MaxAgeDays: 60,
    DryRun:     false,
})
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Pruned %d nodes\n", result.NodesPruned)

PruneOptions:

  • MaxAgeDays: Remove nodes older than this many days (based on DecayBasis). If 0, this criterion is not used
  • MinDecayScore: Remove nodes with decay score below this value. If 0, this criterion is not used. Requires DecayEnabled=true
  • DryRun: If true, reports what would be pruned without actually deleting

PruneResult:

  • NodesEvaluated: Total number of nodes checked
  • NodesPruned: Number of nodes deleted
  • EdgesPruned: Number of edges deleted (cascade deletion when endpoints are removed)
  • NodeIDs: List of pruned node IDs (for verification)

Important: Pruning is permanent. Use DryRun=true first to preview the impact.

Decay Math

Decay uses an exponential formula:

score_multiplier = 0.5 ^ (age_days / half_life_days)

Examples with 30-day half-life:

  • 0 days old: multiplier = 1.0 (no decay)
  • 30 days old: multiplier = 0.5 (half score)
  • 60 days old: multiplier = 0.25 (quarter score)
  • 90 days old: multiplier = 0.125
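
The same formula in Go (a standalone sketch, not gognee's internal code):

import "math"

// decayMultiplier returns the exponential decay factor applied to a
// node's search score.
func decayMultiplier(ageDays, halfLifeDays float64) float64 {
    return math.Pow(0.5, ageDays/halfLifeDays)
}

// decayMultiplier(60, 30) == 0.25, matching the table above.
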
Best Practices
  1. Start with decay disabled to build your knowledge graph, then enable it once populated
  2. Use access-based decay (DecayBasis="access") to preserve frequently queried nodes
  3. Run dry-run prunes periodically to understand decay behavior before committing
  4. Adjust half-life based on your domain:
    • News/events: 7-14 days
    • Product documentation: 90-180 days
    • Reference knowledge: 365+ days

Observability and Logging (v1.6.0+)

gognee v1.6.0 adds structured logging for the memory decay subsystem using Go's standard log/slog package. Logging is completely optional and has zero overhead when not enabled.

Configuration

Enable logging by injecting a *slog.Logger via WithLogger():

import "log/slog"

// Create a logger (JSON or text format)
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
    Level: slog.LevelInfo, // or LevelDebug for verbose logging
}))

// Create Gognee instance
g, err := gognee.New(cfg)
if err != nil {
    log.Fatal(err)
}

// Inject logger - triggers configuration logging
g.WithLogger(logger)
What Gets Logged

Startup (INFO level):

  • Decay configuration when WithLogger() is called
  • Config values: decay_enabled, half_life_days, decay_basis, access_frequency_enabled, reference_access_count

Prune Operations:

  • Start (INFO): options (dry_run, max_age_days, min_decay_score, etc.)
  • Per-memory evaluation (DEBUG): memory_id, status, retention_policy, pinned, decision
  • Per-node evaluation (DEBUG): node_id, age_days, decay_score, decision
  • Complete (INFO): summary with counts (memories_evaluated, memories_pruned, nodes_pruned, duration_ms)

Search Decay (DEBUG level):

  • Per-node decay score calculation
  • Retention policy overrides
  • Filtered nodes (score < threshold)
Log Levels
  • INFO: Operational events (config, prune start/complete)
  • DEBUG: Detailed evaluation (per-memory, per-node decisions)
  • WARN: Recoverable errors (node fetch failures)

Recommendation: Use LevelInfo for production, LevelDebug for troubleshooting decay behavior.

Security and Privacy

Logging follows strict data classification to prevent sensitive content exposure:

NEVER logged:

  • Memory content fields: Topic, Context, Decisions, Rationale
  • Node content fields: Name, Description
  • Credentials: OpenAIKey, API tokens
  • User-supplied metadata values

Safe to log:

  • IDs: memory_id, node_id, edge_id (UUIDs are opaque)
  • Timestamps: created_at, updated_at, last_accessed_at
  • Counts and scores: access_count, decay_score, nodes_pruned
  • Enums: status (Active/Superseded), retention_policy (permanent/standard/ephemeral)
  • Config values: decay settings, thresholds
Example Log Output
{"time":"2026-02-18T10:00:00Z","level":"INFO","msg":"decay config initialized","decay_enabled":true,"half_life_days":30,"decay_basis":"access","access_frequency_enabled":true,"reference_access_count":10}
{"time":"2026-02-18T10:05:00Z","level":"INFO","msg":"prune started","dry_run":false,"max_age_days":90,"min_decay_score":0.1}
{"time":"2026-02-18T10:05:00Z","level":"DEBUG","msg":"node evaluated","node_id":"abc123","age_days":120,"decay_score":0.03,"decision":"prune"}
{"time":"2026-02-18T10:05:01Z","level":"INFO","msg":"prune complete","memories_evaluated":245,"memories_pruned":12,"nodes_evaluated":1523,"nodes_pruned":89,"edges_pruned":142,"duration_ms":1234}
Disabling Logging

Simply don't call WithLogger(). When the logger is nil (default), all logging code paths are no-ops with zero allocations.

Memory Management (v1.0.0+)

gognee v1.0.0 introduces first-class memory CRUD - a higher-level abstraction for managing discrete units of knowledge with full lifecycle management and provenance tracking.

Overview

Instead of using Add() + Cognify() to process raw text, you can now use Memory APIs for structured knowledge management:

  • AddMemory: Create a memory with topic, context, decisions, rationale
  • GetMemory: Retrieve a specific memory by ID
  • ListMemories: List all memories with pagination
  • UpdateMemory: Modify an existing memory (re-cognifies automatically)
  • DeleteMemory: Remove a memory and run garbage collection
  • Search: Now includes MemoryIDs field showing which memories contributed to each result

Key Benefits:

  • 🔖 Structured Storage: Memories have explicit fields (topic, context, decisions, rationale)
  • 🔗 Provenance Tracking: Know which knowledge artifacts came from which memory
  • ♻️ Garbage Collection: Deleting/updating a memory cleans up orphaned nodes/edges automatically
  • 🎯 Deduplication: Identical memories (same content) are not re-processed
  • 🔄 Re-Cognify on Update: Updating a memory automatically re-extracts entities and relationships
API Overview
MemoryInput
type MemoryInput struct {
    Topic     string                 // Required: 3-7 word title
    Context   string                 // Required: 300-1500 char summary
    Decisions []string               // Optional: list of decisions made
    Rationale []string               // Optional: explanations for decisions
    Metadata  map[string]interface{} // Optional: arbitrary metadata
    Source    string                 // Optional: source identifier
}
MemoryResult
type MemoryResult struct {
    Memory       store.MemoryRecord   // Full memory record
    NodeIDs      []string             // IDs of extracted nodes
    EdgeIDs      []string             // IDs of extracted edges
    NodesCreated int                  // Count of new nodes
    EdgesCreated int                  // Count of new edges
}
Adding a Memory
memory := gognee.MemoryInput{
    Topic:   "Phase 4 Storage Layer Implementation",
    Context: "Implemented SQLite-backed graph store with nodes, edges, and vector storage. Added provenance tracking for memory CRUD operations. Foreign keys enabled for CASCADE deletes.",
    Decisions: []string{
        "Use SQLite for both graph and vector storage",
        "Enable PRAGMA foreign_keys=ON for automatic cascade deletes",
        "Implement two-phase transaction model for memory updates",
    },
    Rationale: []string{
        "SQLite provides ACID guarantees and simplifies deployment",
        "Foreign keys ensure provenance integrity without manual cleanup",
        "Two-phase model prevents long transactions during LLM calls",
    },
    Source: "implementation-doc-004",
}

result, err := g.AddMemory(ctx, memory)
if err != nil {
    log.Fatal(err)
}

fmt.Printf("Created memory %s: %d nodes, %d edges\n",
    result.Memory.ID,
    result.NodesCreated,
    result.EdgesCreated,
)

Deduplication: If a memory with identical content already exists, AddMemory returns the existing memory without reprocessing.

Retrieving Memories
// Get a specific memory by ID
memory, err := g.GetMemory(ctx, memoryID)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Memory: %s\n", memory.Topic)
fmt.Printf("Decisions: %d\n", len(memory.Decisions))

// List all memories with pagination
memories, err := g.ListMemories(ctx, gognee.ListMemoriesOptions{
    Limit:  10,
    Offset: 0,
})
if err != nil {
    log.Fatal(err)
}

for _, summary := range memories {
    fmt.Printf("- %s (%s)\n", summary.Topic, summary.ID)
}
Updating a Memory

Updating a memory triggers automatic re-cognify:

  1. Unlinks old provenance (nodes/edges from previous version)
  2. Runs garbage collection on orphaned artifacts
  3. Re-cognifies with new content
  4. Links new provenance
updates := gognee.MemoryUpdate{
    Context: stringPtr("Updated context with new findings..."),
    Decisions: &[]string{
        "Decision 1 (updated)",
        "Decision 2 (new)",
    },
}

result, err := g.UpdateMemory(ctx, memoryID, updates)
if err != nil {
    log.Fatal(err)
}

fmt.Printf("Updated: %d old nodes, %d new nodes\n",
    len(result.OldNodeIDs),
    result.NodesCreated,
)

Important: Only provide fields you want to update. Omitted fields are preserved from the original memory.

Deleting a Memory

Deleting a memory removes it and runs garbage collection:

err := g.DeleteMemory(ctx, memoryID)
if err != nil {
    log.Fatal(err)
}

Garbage Collection Behavior:

  • Deletes nodes/edges that only belong to this memory
  • Preserves shared nodes/edges (used by other memories or legacy Add/Cognify)
  • Legacy data (from Add/Cognify) is never deleted by GC
Search Integration

Search results now include MemoryIDs showing which memories contributed to each node:

results, err := g.Search(ctx, "storage implementation", gognee.SearchOptions{
    TopK:              5,
    IncludeMemoryIDs:  boolPtr(true), // Default: true
})
if err != nil {
    log.Fatal(err)
}

for _, result := range results {
    fmt.Printf("Node: %s\n", result.Node.Name)
    fmt.Printf("From memories: %v\n", result.MemoryIDs)
}

Memory IDs are sorted by updated_at DESC, showing the most recent memory first.

Migration from Legacy Add/Cognify

The legacy Add() + Cognify() workflow continues to work:

// Legacy workflow (still supported)
g.Add(ctx, "Some text...", gognee.AddOptions{})
g.Cognify(ctx, gognee.CognifyOptions{})

Differences:

| Feature             | Legacy Add/Cognify        | Memory CRUD                       |
|---------------------|---------------------------|-----------------------------------|
| Structured Storage  | No (raw text only)        | Yes (topic, decisions, rationale) |
| Provenance Tracking | No                        | Yes                               |
| Garbage Collection  | No                        | Yes                               |
| Update Support      | No (must delete + re-add) | Yes (UpdateMemory re-cognifies)   |
| Deduplication       | Document-level (doc_hash) | Content-level (doc_hash)          |
| Search Integration  | Results only              | Results + MemoryIDs               |

When to use each:

  • Memory CRUD: Structured knowledge management, agent memory, decision logs, planning artifacts
  • Legacy Add/Cognify: Bulk document ingestion, unstructured text processing

Interoperability: Both systems share the same graph store. Nodes/edges created by legacy Add/Cognify are visible in Search, and vice versa. However, legacy artifacts are not tracked by provenance and won't be affected by garbage collection.
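
A short sketch of the shared graph store in practice; nodes ingested via the legacy path show up in the same search, just without memory provenance:

// Legacy ingestion
_ = g.Add(ctx, "Redis is used for hot-path caching.", gognee.AddOptions{})
_, _ = g.Cognify(ctx, gognee.CognifyOptions{})

// Memory CRUD ingestion (Context shortened for the sketch; the field
// expects a 300-1500 char summary)
_, _ = g.AddMemory(ctx, gognee.MemoryInput{
    Topic:   "Caching Strategy",
    Context: "Adopted Redis for hot-path caching to cut p99 latency...",
})

// One search spans both pipelines; legacy-only nodes have empty MemoryIDs.
results, _ := g.Search(ctx, "caching", gognee.SearchOptions{})
for _, r := range results {
    fmt.Printf("%s (memories: %v)\n", r.Node.Name, r.MemoryIDs)
}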

Two-Phase Transaction Model

Memory operations use a two-phase transaction model: two short transactions bracket the slow, non-transactional LLM work, so no database transaction stays open during API calls:

Phase 1 (Transaction):

  • Persist memory record with status="pending"
  • Compute doc_hash for deduplication

Phase 2 (No Transaction):

  • Call LLM APIs for entity/relationship extraction
  • Generate embeddings

Phase 3 (Transaction):

  • Update graph with nodes/edges
  • Link provenance
  • Set status="complete"

This design:

  • ✅ Prevents database locks during slow LLM calls
  • ✅ Allows crash recovery (pending memories can be retried)
  • ✅ Maintains transactional integrity for metadata
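
The pattern as a generic sketch over database/sql; the table names, columns, and extractEntities helper are illustrative, not gognee's actual schema or internals:

import (
    "context"
    "database/sql"
)

// extractEntities stands in for the slow LLM extraction call.
func extractEntities(text string) []string {
    return []string{"ExampleEntity"}
}

// processMemory holds no transaction open during the slow phase.
func processMemory(ctx context.Context, db *sql.DB, id, text, hash string) error {
    // Phase 1: short transaction - record the memory as pending.
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback() // no-op once committed
    if _, err := tx.ExecContext(ctx,
        `INSERT INTO memories (id, doc_hash, status) VALUES (?, ?, 'pending')`,
        id, hash); err != nil {
        return err
    }
    if err := tx.Commit(); err != nil {
        return err
    }

    // Phase 2: slow LLM work with no transaction open.
    entities := extractEntities(text)

    // Phase 3: short transaction - persist artifacts and mark complete.
    tx2, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx2.Rollback()
    for _, e := range entities {
        if _, err := tx2.ExecContext(ctx,
            `INSERT INTO nodes (memory_id, name) VALUES (?, ?)`, id, e); err != nil {
            return err
        }
    }
    if _, err := tx2.ExecContext(ctx,
        `UPDATE memories SET status = 'complete' WHERE id = ?`, id); err != nil {
        return err
    }
    return tx2.Commit()
}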
Garbage Collection Details

Garbage collection uses reference counting via the memory_nodes and memory_edges junction tables:

-- Check if a node is orphaned
SELECT COUNT(*) FROM memory_nodes WHERE node_id = ?
-- If count = 0, node is safe to delete

Preserved Artifacts:

  • Nodes/edges with COUNT(*) > 0 (shared across memories)
  • Nodes/edges without any provenance records (legacy data)

Deleted Artifacts:

  • Nodes/edges with COUNT(*) = 0 after unlinking

Foreign Key Cascade: Deleting a memory automatically deletes its provenance records via ON DELETE CASCADE.

Helper Functions
// Helper for optional string fields
func stringPtr(s string) *string {
    return &s
}

// Helper for optional bool fields
func boolPtr(b bool) *bool {
    return &b
}

Intelligent Memory Lifecycle (v1.1.0)

gognee v1.1.0 introduces intelligent memory lifecycle management to keep knowledge graphs relevant and bounded as they grow over time. Instead of simple time-based aging, memories are managed based on usage patterns, explicit supersession, and retention policies.

Access Frequency Scoring

Memories that are frequently accessed resist decay, regardless of age:

cfg := gognee.Config{
    DecayEnabled: true,
    DecayHalfLifeDays: 30,
    AccessFrequencyEnabled: true,  // Enable frequency-based decay modification
    ReferenceAccessCount: 10,      // Memories with 10+ accesses get full protection
}
  • Zero-access memories: Decay to 50% of time-based score (still visible, but de-prioritized)
  • High-access memories: Maintain full score despite age (important knowledge stays relevant)
  • Access tracking: Automatic - GetMemory() and Search() increment access counters
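
Only the endpoints are documented (zero accesses halves the time-based score; at or above ReferenceAccessCount the score is fully protected). A sketch assuming a linear ramp between them, which is an assumption rather than gognee's published formula:

// frequencyFactor scales the time-based decay score by access frequency.
// Endpoints per the docs: 0 accesses -> 0.5, >= reference -> 1.0.
// The linear interpolation in between is an assumption.
func frequencyFactor(accessCount, referenceAccessCount int) float64 {
    ratio := float64(accessCount) / float64(referenceAccessCount)
    if ratio > 1 {
        ratio = 1
    }
    return 0.5 + 0.5*ratio
}
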
Explicit Supersession

Mark when one memory replaces another to maintain provenance chains:

// Create new memory that supersedes old ones
result, err := g.AddMemory(ctx, gognee.MemoryInput{
    Topic: "Updated API Design",
    Context: "We've switched from REST to GraphQL...",
    Supersedes: []string{"old-memory-id"},
    SupersessionReason: "REST approach had scaling issues",
})

// Query supersession chain
chain, _ := g.GetSupersessionChain(ctx, "old-memory-id")
// Returns: [old → intermediate → current]

Superseded memories:

  • Automatically marked as status="Superseded"
  • Remain searchable during grace period (default: 30 days)
  • Eligible for pruning after grace period expires
Retention Policies

Different memory types have different lifespans:

| Policy    | Half-Life    | Prunable             | Use Case                                  |
|-----------|--------------|----------------------|-------------------------------------------|
| permanent | ∞ (no decay) | Never                | Core facts, system knowledge              |
| decision  | 365 days     | Only when superseded | Planning decisions, architectural choices |
| standard  | 90 days      | Yes                  | General knowledge (default)               |
| ephemeral | 7 days       | Yes                  | Temporary notes, scratch work             |
| session   | 1 day        | Yes                  | Conversation context                      |

result, err := g.AddMemory(ctx, gognee.MemoryInput{
    Topic: "Core System Architecture",
    Context: "...",
    RetentionPolicy: "permanent",  // Never decays or pruned
})
User Pinning

Exempt critical memories from automatic lifecycle management:

// Pin a memory
err := g.PinMemory(ctx, memoryID, "Critical customer requirement")

// Pinned memories:
// - Never decay (score stays at 1.0)
// - Never pruned (even if old or unused)
// - Marked as status="Pinned"

// Unpin when no longer critical
err = g.UnpinMemory(ctx, memoryID)
Prune with Retention Awareness

Prune operations respect retention policies:

result, err := g.Prune(ctx, gognee.PruneOptions{
    PruneSuperseded: true,
    SupersededAgeDays: 30,  // Grace period before pruning superseded memories
    DryRun: true,           // Preview what would be deleted
})

fmt.Printf("Would prune %d superseded memories\n", result.SupersededMemoriesPruned)

Prune guarantees:

  • Permanent memories never pruned
  • Pinned memories never pruned
  • Decision memories only pruned when superseded + grace period passed
  • retention_until override: explicit expiration timestamp (if set and past, memory pruned regardless of policy)
Enhanced ListMemories

Filter and sort memories for management UIs:

pinnedOnly := true
activeStatus := "Active"

memories, err := g.ListMemories(ctx, store.ListMemoriesOptions{
    Status: &activeStatus,
    Pinned: &pinnedOnly,
    OrderBy: "access_count",
    OrderDesc: true,  // Most accessed first
    Limit: 50,
})

for _, mem := range memories {
    fmt.Printf("%s - %d accesses, %s retention\n", 
        mem.Topic, mem.AccessCount, mem.RetentionPolicy)
}

Available filters:

  • Status: Filter by status (Active, Superseded, Pinned, etc.)
  • RetentionPolicy: Filter by retention policy
  • Pinned: Show only pinned memories
  • OrderBy: Sort by created_at, updated_at, access_count, last_accessed_at
  • OrderDesc: Sort direction (true = descending, false = ascending)
Lifecycle Best Practices
  1. Use retention policies intentionally:

    • permanent for facts that should never change
    • decision for rationale you want to preserve even after supersession
    • ephemeral for debugging notes or temporary context
  2. Supersede instead of delete:

    • Maintains provenance chain
    • Allows rollback if new approach fails
    • Better for auditing and learning
  3. Pin sparingly:

    • Overuse defeats automatic lifecycle management
    • Reserve for truly critical knowledge
    • Review pinned memories periodically
  4. Configure decay for your use case:

    • Short-lived assistants: aggressive decay (7-day half-life)
    • Long-term knowledge bases: conservative decay (90-day half-life)
    • Reference systems: disable decay entirely

Example: Agent Memory Loop

// Agent stores a decision
memory, _ := g.AddMemory(ctx, gognee.MemoryInput{
    Topic:   "API Design Decision",
    Context: "Chose REST over GraphQL for simplicity...",
    Decisions: []string{"Use REST API"},
    Rationale: []string{"Team familiarity", "Lower complexity"},
})

// Later: Search recalls the decision
results, _ := g.Search(ctx, "API design approach", gognee.SearchOptions{})
for _, r := range results {
    fmt.Printf("Found in memories: %v\n", r.MemoryIDs)
    if len(r.MemoryIDs) == 0 {
        continue // node has no memory provenance (e.g. legacy Add/Cognify)
    }
    // Retrieve full memory for context
    mem, _ := g.GetMemory(ctx, r.MemoryIDs[0])
    if len(mem.Decisions) > 0 {
        fmt.Printf("Decision: %s\n", mem.Decisions[0])
    }
}

// Update the decision with new findings
g.UpdateMemory(ctx, memory.Memory.ID, gognee.MemoryUpdate{
    Rationale: &[]string{
        "Team familiarity",
        "Lower complexity",
        "Better caching support (discovered)",
    },
})

Additional Limitations

Beyond the performance notes under MVP Limitations above, the current release has these known limitations:

  1. No Parallelization: Document processing is sequential. Large batches may take time
  2. Single LLM Provider: Only OpenAI is supported
  3. Basic Chunking: Token-based chunking without semantic awareness

Future Enhancements

  • Multiple LLM providers (Anthropic, Ollama, local models)
  • Parallel processing of documents
  • Graph visualization

Error Handling

gognee uses a best-effort model for batch processing:

  • Per-Chunk Errors: If a chunk fails extraction, that chunk is skipped; processing continues with remaining chunks
  • Buffer Clearing: The buffer is always cleared after Cognify(), regardless of errors. Inspect CognifyResult.Errors to see what failed
  • Return Error: Only returned for catastrophic failures (DB connection lost, context canceled)
result, err := g.Cognify(ctx, gognee.CognifyOptions{})

if err != nil {
    log.Fatal(err) // Catastrophic error
}

if len(result.Errors) > 0 {
    log.Printf("Processing had %d errors:\n", len(result.Errors))
    for _, perr := range result.Errors {
        log.Printf("  - %v\n", perr)
    }
}

Integration Testing

Integration tests with real OpenAI API are gated behind a build tag:

# Run only unit tests (no API calls)
go test ./...

# Run unit + integration tests (requires OPENAI_API_KEY)
OPENAI_API_KEY=sk-... go test -tags=integration ./...

Testing

The library includes:

  • Unit Tests: Fast, offline tests with mocked dependencies (~80% coverage)
  • Integration Tests: End-to-end tests with real OpenAI API (gated)
  • SQLite Tests: Storage layer tests including concurrent access

Run tests:

go test ./... -v

Development

This library follows:

  • TDD: Tests are written before implementation
  • Interfaces: Core components (LLM, Embeddings, Storage) use interfaces for easy mocking and extension
  • Minimal Dependencies: Only SQLite beyond the standard library

License

MIT License - See LICENSE file for details

Acknowledgments

Inspired by Cognee - a Python knowledge graph library.

Directories

pkg/
  • embeddings: Package embeddings provides Ollama embedding client implementation
  • extraction: Package extraction provides entity and relationship extraction from text
  • llm: Package llm provides interfaces and implementations for LLM completion clients
  • search: Package search provides search implementations for gognee's knowledge graph
  • store: Package store provides storage implementations for gognee's knowledge graph
