knowing

Module version v0.10.1 (latest). Published: May 26, 2026. License: MIT

Repository

github.com/blackwell-systems/knowing

README

Self-adapting code intelligence engine. Observes its own graph density and adjusts retrieval strategy automatically. 34 edge types, 28 MCP tools, cryptographic proofs. Gets smarter with scale, not dumber.

Note: Research paper: Content-Addressing as a Computation Primitive for Software Relationship Intelligence (DOI: 10.5281/zenodo.20342255)

Your architecture diagram says service A calls service B. Can you prove it?

knowing can. It builds a content-addressed graph of extracted code relationships, snapshots it as a Merkle tree tied to a git commit, and generates cryptographic proofs that verify offline. Agents use it for ranked context. Security teams use it for audit. Platform teams use it to compare code against production traces.

It gets better every time you use it. When code changes, stale knowledge expires automatically.

brew install blackwell-systems/tap/knowing

{
  "mcpServers": {
    "knowing": {
      "command": "knowing",
      "args": ["mcp", "--watch"]
    }
  }
}

That's it. The MCP server auto-indexes your repo on first launch. Your agent now has ranked context (one call replaces grep-read loops), blast radius, test scope, and memory that compounds.

Three Things, One Architecture

knowing is three products built on one foundation (content-addressed graph with hierarchical Merkle trees):

1. Context engine for AI agents. One call returns the most relevant symbols for a task, ranked by graph centrality, recency, and learned usefulness, and packed to fit your token budget. 47% fewer tool calls. 84% fewer tokens. Results improve with feedback.

2. Audit primitive for compliance. Every graph state is a Merkle root tied to a git commit. knowing prove generates a cryptographic proof that a relationship existed. knowing verify checks it offline. knowing fsck verifies the entire graph in 98ms.

3. Memory layer that learns. Feedback from agents compounds across sessions. When code changes, feedback expires automatically (verified via package Merkle roots). The system gets smarter over time, not noisier. That is the property knowing is built around.
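
The expiry rule in item 3 can be sketched in a few lines of Go. This is a minimal illustration, not knowing's implementation: the `Feedback` and `Store` types and the root strings are hypothetical. The only idea taken from the text is that each feedback record carries the package Merkle root it was recorded under, and a lookup ignores records whose root no longer matches.

```go
package main

import "fmt"

// Feedback is a hypothetical record. PkgRoot is the package Merkle root
// that was current when the agent reported the feedback.
type Feedback struct {
	Symbol  string
	Useful  bool
	PkgRoot string
}

// Store is a hypothetical in-memory feedback log.
type Store struct{ items []Feedback }

// Record appends a feedback entry to the log.
func (s *Store) Record(f Feedback) { s.items = append(s.items, f) }

// Visible returns only the feedback for symbol that was recorded under
// currentRoot. Entries recorded under an older root are skipped, so they
// expire without any explicit cleanup pass.
func (s *Store) Visible(symbol, currentRoot string) []Feedback {
	var out []Feedback
	for _, f := range s.items {
		if f.Symbol == symbol && f.PkgRoot == currentRoot {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	s := &Store{}
	s.Record(Feedback{Symbol: "SessionHandler", Useful: true, PkgRoot: "root-v1"})

	fmt.Println(len(s.Visible("SessionHandler", "root-v1"))) // prints 1: package unchanged
	fmt.Println(len(s.Visible("SessionHandler", "root-v2"))) // prints 0: package changed
}
```

Nothing is deleted in this sketch; stale entries simply stop matching, which is one way to read "old feedback is invisible".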

These aren't separate features. They're structural consequences of content-addressing: the same hash that makes context cacheable also makes it provable, and the same Merkle root that detects staleness also expires stale feedback.
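
To make "provable" concrete, here is a minimal sketch of a Merkle inclusion proof in Go. It is an assumption-laden illustration rather than knowing's actual tree layout: the leaf strings, the `ProofStep` type, the odd-node promotion rule, and the 0x00/0x01 domain-separation bytes (borrowed from the style of RFC 6962) are all choices made for this example.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// ProofStep is one sibling hash on the path from a leaf to the root.
type ProofStep struct {
	Sibling []byte
	Left    bool // true if the sibling sits to the left of the running hash
}

// hashLeaf hashes leaf data with a 0x00 prefix.
func hashLeaf(data []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, data...))
	return h[:]
}

// hashPair hashes two child hashes with a 0x01 prefix.
func hashPair(l, r []byte) []byte {
	buf := append([]byte{0x01}, l...)
	buf = append(buf, r...)
	h := sha256.Sum256(buf)
	return h[:]
}

// BuildProof returns the Merkle root and the inclusion proof for leaves[idx].
// leaves must be non-empty. An unpaired node at the end of a level is
// promoted to the next level unchanged.
func BuildProof(leaves [][]byte, idx int) (root []byte, proof []ProofStep) {
	level := make([][]byte, len(leaves))
	for i, l := range leaves {
		level[i] = hashLeaf(l)
	}
	for len(level) > 1 {
		var next [][]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i]) // promote the odd node
				continue
			}
			if i == idx {
				proof = append(proof, ProofStep{Sibling: level[i+1], Left: false})
			} else if i+1 == idx {
				proof = append(proof, ProofStep{Sibling: level[i], Left: true})
			}
			next = append(next, hashPair(level[i], level[i+1]))
		}
		idx /= 2
		level = next
	}
	return level[0], proof
}

// Verify recomputes the root from the leaf and the proof. It needs no
// access to the other leaves, which is what makes offline checks possible.
func Verify(leaf []byte, proof []ProofStep, root []byte) bool {
	h := hashLeaf(leaf)
	for _, s := range proof {
		if s.Left {
			h = hashPair(s.Sibling, h)
		} else {
			h = hashPair(h, s.Sibling)
		}
	}
	return bytes.Equal(h, root)
}

func main() {
	// Hypothetical edge leaves; real leaves would be content hashes.
	leaves := [][]byte{
		[]byte("AuthService->SessionStore"),
		[]byte("AuthService->TokenSigner"),
		[]byte("Gateway->AuthService"),
	}
	root, proof := BuildProof(leaves, 0)
	fmt.Println(Verify(leaves[0], proof, root))                     // prints true
	fmt.Println(Verify([]byte("AuthService->Billing"), proof, root)) // prints false
}
```

The proof is a list of sibling hashes whose length grows with the tree depth, so the verifier only needs the leaf, the proof, and a trusted root.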

What It Answers

For your agent:

"I'm changing this function. What breaks?" (blast radius across callers, tests, routes, repos)
"Give me 50,000 tokens of context for this task." (graph-ranked, not grep-searched)
"Which tests should run?" (call-graph traversal, 98% precision)

For your platform team:

"Is this route used in production?" (static analysis + OTel runtime traces)
"What did the service graph look like at a specific snapshot?" (snapshot chain, each root tied to a git commit)

For your security team:

"Prove service A calls service B at this commit." (Merkle proof, verifiable offline)
"Prove this dependency does NOT exist." (absence proof via sorted leaves)
"Generate a compliance report." (knowing audit -proofs, one command)

Numbers

What	Result
Agent context precision	+20pp after 1 round, +34pp after 5
Tool calls saved	47% fewer (one context call replaces repeated grep+read)
Token savings	84% fewer tokens (GCF wire format)
Repeat query speed	93x faster (Merkle-keyed subgraph cache)
Merkle diff	517x faster than full edge scan at 100K edges
Test scope	98% precision, 82% recall
Graph integrity check	98ms (24,936 edges)
Proof generation	72 µs to generate, 1.2 µs to verify
Feedback expiration	100% expire on code change, 11% overhead
Cross-repo retrieval	46.7% R@10 on foreign codebase, zero config
Cross-system retrieval	P@10=0.217, 1.63x vs codegraph (19K stars), 4.3x vs Aider, 11x vs grep
Indexing throughput	7 repos (47,150 files) in ~52s
Language coverage	7/7 repos pass (Go, Python, TS, Rust, Java, C#). codegraph: 5/7

All benchmarks are reproducible: GOWORK=off go test ./bench/... -timeout 5m

Quick Start

# Install
brew install blackwell-systems/tap/knowing
# Or: go install github.com/blackwell-systems/knowing/cmd/knowing@latest
# Or: npm install -g @blackwell-systems/knowing
# Or: pip install knowing

# That's it. Add the MCP config and start a session.
# The server auto-indexes your repo on first launch.

# Or index manually for CLI usage:
knowing add .

# Remove a repo (evicts all data: nodes, edges, snapshots, feedback)
knowing remove ./path/to/repo

# Get context for a task
knowing context -task "refactor auth middleware" -format gcf

# Find affected tests
knowing test-scope -files internal/auth/middleware.go

# Explain why a symbol ranked where it did
knowing why -task "refactor auth" -symbol "SessionHandler"

# Prove a relationship exists (cryptographic Merkle proof)
knowing prove -source "AuthService" -target "SessionStore"

# Verify offline (no database needed)
knowing verify proof.json

# Check graph integrity
knowing fsck

# Check if the graph is stale (CI gate: exits 1 if stale)
knowing stale

MCP Integration

{
  "mcpServers": {
    "knowing": {
      "command": "knowing",
      "args": ["mcp", "--watch"],
      "transport": "stdio"
    }
  }
}

The --watch flag re-indexes on file changes, so your agent always queries fresh data. No manual indexing step or database path is needed: the MCP server auto-indexes the git repository on first launch and registers it in the roster for future sessions.

For HTTP transport (multi-agent, daemon mode):

knowing serve -addr :8100 .

{
  "mcpServers": {
    "knowing": {
      "url": "http://localhost:8100",
      "transport": "streamable-http"
    }
  }
}

Why This Works

Git versions files. knowing versions the understanding of code.

The entire system is built on one idea: content-addressed identity. Every symbol, relationship, and snapshot is SHA-256 hashed. This single choice gives you:

Staleness detection for free. Changed file = new hash = stale edges are known without scanning.
Caching for free. Same package root = same results. 93x speedup on unchanged queries.
Integrity for free. Verify all stored hashes and snapshot chain continuity. 98ms.
History for free. Each snapshot is a Merkle root tied to a git commit. Walk the chain.
Feedback expiration for free. Feedback stores the package Merkle root. Code changes = root changes = old feedback is invisible.
Proofs for free. Merkle path from leaf to root is a self-contained cryptographic proof.
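
Inclusion proofs have a counterpart: the absence proofs mentioned earlier rely on sorted leaves. The sketch below shows only the core step, under the assumption that leaf keys are kept in sorted order: if a target key is missing, the two adjacent leaves that bracket it are evidence of the gap. The `Neighbors` function and the key format are hypothetical; a complete proof would presumably also carry inclusion proofs for both neighbors and show that they are adjacent.

```go
package main

import (
	"fmt"
	"sort"
)

// Neighbors reports whether target is absent from the sorted slice and,
// if so, returns the adjacent leaves that bracket the gap. An empty
// string means there is no neighbor on that side.
func Neighbors(sorted []string, target string) (lo, hi string, absent bool) {
	i := sort.SearchStrings(sorted, target) // first index with sorted[i] >= target
	if i < len(sorted) && sorted[i] == target {
		return "", "", false // the leaf exists, so no absence proof is possible
	}
	if i > 0 {
		lo = sorted[i-1]
	}
	if i < len(sorted) {
		hi = sorted[i]
	}
	return lo, hi, true
}

func main() {
	// Hypothetical edge keys, already sorted.
	leaves := []string{"a->b", "a->d", "c->d"}

	lo, hi, absent := Neighbors(leaves, "a->c")
	fmt.Println(lo, hi, absent) // prints: a->b a->d true
}
```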

	Git	knowing
What it versions	File contents	Code relationships and their meaning
Unit of storage	blob	node + edge + provenance + confidence
Identity	`sha1("blob " + size + "\0" + content)` (SHA-1 by default)	`sha256("node\0" + repo + package + name + kind)`
Snapshot	tree of blobs	Hierarchical Merkle: repo -> package -> edge-type -> leaf
Diff	Which lines changed	Which packages changed, what broke, what's new
History	What code looked like	What the codebase understood about itself

How It Works

+--------------------------------------------------------------------+
|                           knowing daemon                           |
+----------------+------------------------+--------------------------+
|   Indexer      |     Graph Store        |      MCP Server          |
|                |                        |                          |
| 26 extractors  | Content-addressed      | 28 tools + 8 resources   |
| tree-sitter    | SQLite + Merkle tree   | stdio / HTTP (1.8s index)|
| LSP + SCIP     | Hierarchical snapshots | GCF / GCB / JSON         |
| OTel traces    | Subgraph cache (93x)   | PackRoot dedup (99%)     |
|                | Community detection    |                          |
+----------------+------------------------+--------------------------+

Two planes:

Execution: indexes repos, extracts symbols and relationships, ingests traces, stores snapshots.
Intelligence: computes blast radius, context packs, test scope, feedback, communities from the stored graph.

The boundary matters: intelligence features read the graph and produce derived results. They cannot corrupt graph facts. A bad ranking produces a bad recommendation; it cannot invalidate a proof.
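
Context packing is one such derived computation. The following Go sketch shows a greedy, budget-aware packer that only reads its input. It is a simplified assumption of how packing could work: the `Symbol` type, the single combined `Score`, and the skip-if-too-large rule are hypothetical and stand in for the ranking signals described earlier.

```go
package main

import (
	"fmt"
	"sort"
)

// Symbol is a hypothetical ranked candidate with an estimated token cost.
type Symbol struct {
	Name   string
	Score  float64 // assumed combined rank (centrality, recency, feedback)
	Tokens int
}

// Pack selects symbols in descending score order until the token budget
// is exhausted. It copies the input, so the caller's data is not modified.
func Pack(cands []Symbol, budget int) []Symbol {
	ranked := append([]Symbol(nil), cands...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].Score > ranked[j].Score
	})
	var out []Symbol
	used := 0
	for _, s := range ranked {
		if used+s.Tokens > budget {
			continue // too large for the remaining budget; a smaller one may still fit
		}
		out = append(out, s)
		used += s.Tokens
	}
	return out
}

func main() {
	cands := []Symbol{
		{"AuthMiddleware", 0.9, 600},
		{"SessionStore", 0.8, 500},
		{"Logger", 0.5, 300},
	}
	for _, s := range Pack(cands, 1000) {
		fmt.Println(s.Name, s.Tokens)
	}
}
```

With a budget of 1000 tokens, the sketch keeps AuthMiddleware and Logger and skips SessionStore, because the second-ranked symbol no longer fits after the first is packed.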

Capabilities

Languages And Formats

Language/Format	Extractor	Framework/Pattern Detection
Go	tree-sitter + `go/packages` + SCIP	net/http, gin, echo, chi, gorilla/mux
TypeScript/JavaScript	tree-sitter	Express.js, Fastify, Hono, NestJS, Next.js
Python	tree-sitter	Flask, FastAPI, Django
Rust	tree-sitter	Actix, Axum, Rocket
Java	tree-sitter	Spring annotations
C#	tree-sitter	ASP.NET attributes
Protocol Buffers	tree-sitter	service, message, enum, RPC declarations
Terraform (HCL)	tree-sitter	resource, data, module, variable declarations
SQL	tree-sitter	tables, views, functions, procedures, FK edges
Kubernetes YAML	yaml.v3	deployments, services, configmaps, label-selector edges
CloudFormation/SAM	yaml.v3	resources, !Ref/!GetAtt/!Sub cross-references
Docker Compose	yaml.v3	services, ports, networks, depends_on links
GitHub Actions	yaml.v3	workflows, jobs, steps, action references
Serverless Framework	yaml.v3	functions, events, resource references
CSS/SCSS	tree-sitter	selectors, custom properties, var() dependencies
Event/MQ patterns	multi-language	Kafka, NATS, SQS, RabbitMQ publish/subscribe
OpenAPI/JSON Schema	json/yaml	endpoints, models, $ref resolution
Dockerfile	parser	FROM base images, COPY --from multi-stage deps, EXPOSE ports
Makefile	parser	target dependencies, include directives, variable references
Helm Charts	yaml.v3	chart dependencies, template references, values injection
GitLab CI	yaml.v3	job needs, extends templates, include files, artifacts
package.json (npm)	json	dependencies, devDependencies, peerDependencies, scripts
GraphQL	parser	type definitions, field type references, interface implementations
Ruby	tree-sitter	classes, modules, method definitions, require edges
.env files	parser	environment variable declarations, cross-file references

All extractors fire per file via multi-dispatch; results are merged. Tree-sitter produces edges at confidence 0.7 (ast_inferred); go/packages and SCIP at 0.95-1.0 (ast_resolved, scip_resolved).
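
One plausible way to merge results from several extractors is to keep, for each (source, target, type) triple, the edge with the highest confidence. The Go sketch below assumes that rule; the text above states only that results are merged, so the `Edge` type, the `Merge` function, and the tie-breaking behavior are hypothetical.

```go
package main

import "fmt"

// Edge is a hypothetical extracted relationship.
type Edge struct {
	Source, Target, Type string
	Confidence           float64
	Provenance           string
}

// Merge combines edge batches from several extractors. When two batches
// report the same (source, target, type), the higher-confidence edge wins.
// Output order follows first appearance.
func Merge(batches ...[]Edge) []Edge {
	type key struct{ s, t, ty string }
	best := map[key]Edge{}
	var order []key
	for _, batch := range batches {
		for _, e := range batch {
			k := key{e.Source, e.Target, e.Type}
			cur, seen := best[k]
			if !seen {
				order = append(order, k)
			}
			if !seen || e.Confidence > cur.Confidence {
				best[k] = e
			}
		}
	}
	out := make([]Edge, 0, len(order))
	for _, k := range order {
		out = append(out, best[k])
	}
	return out
}

func main() {
	treeSitter := []Edge{
		{"Login", "HashPassword", "calls", 0.7, "ast_inferred"},
		{"Login", "Audit", "calls", 0.7, "ast_inferred"},
	}
	goPackages := []Edge{
		{"Login", "HashPassword", "calls", 0.95, "ast_resolved"},
	}
	for _, e := range Merge(treeSitter, goPackages) {
		fmt.Println(e.Source, e.Target, e.Confidence, e.Provenance)
	}
}
```

In this example the Login to HashPassword edge is reported by both extractors and ends up with the resolved provenance, while the edge seen only by tree-sitter keeps its 0.7 confidence.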

MCP Tools

Tool	Purpose
`index_repo`, `graph_query`, `repo_graph`	Build and inspect the graph
`cross_repo_callers`, `blast_radius`, `trace_dataflow`, `flow_between`	Understand impact and paths
`snapshot_diff`, `semantic_diff`, `pr_impact`, `stale_edges`	Compare graph states and review changes
`runtime_traffic`, `dead_routes`, `trace_stats`	Query runtime-observed relationships
`context_for_task`, `context_for_files`, `context_for_pr`, `explain_symbol`	Ranked context for agents
`ownership`, `ownership_query`, `test_scope`, `communities`, `plan_turn`, `feedback`	Route work, query code owners/authors, select tests, improve ranking
`prove`, `prove_absent`, `fsck`	Cryptographic proofs, absence proofs, integrity verification
`untrack_repo`	Evict all data for a repository (nodes, edges, files, snapshots, feedback, task memory, graph notes)

MCP prompts: refactor_safely, review_pr, investigate_dead_code.

MCP Resources

8 read-only resources for agent orientation without a tool call:

Resource	What it returns
`knowing://report`	Graph size, top kinds, hotspot count, snapshot age
`knowing://schema`	Node kinds, edge types, provenance tiers, hash format
`knowing://stats`	Counts by repo, kind, and edge type
`knowing://repos`	All tracked repos with counts and last-indexed time
`knowing://session`	Context calls, symbols served, cache hits/misses, uptime
`knowing://index-health`	Healthy/stale/corrupted status, integrity check
`knowing://communities`	Community list with cohesion and Merkle roots
`knowing://community/{id}`	Single community detail (resource template)

Wire Formats

Format	Purpose	Savings vs JSON
GCF (Graph Compact Format)	LLM consumption: line-oriented, positional fields	84% fewer tokens
GCB (Graph Compact Binary)	Service transport and caching: varint, length-prefixed	74% fewer bytes
JSON	Human debugging, generic consumers	Baseline

GCF uses |-separated fields and local IDs ($1 -> $3) instead of repeated qualified names. Parseable by LLMs while fitting 5x more graph context into the same token budget. Session-stateful deduplication reduces repeated symbols by 47%.
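
The "varint, length-prefixed" encoding listed for GCB in the table above is a general framing technique, sketched here with Go's `encoding/binary` package. This is not the GCB format itself, whose field layout is not described in this README; `AppendString` and `ReadString` are hypothetical helpers that show only the framing idea.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// AppendString appends a uvarint length prefix followed by the raw bytes.
func AppendString(buf []byte, s string) []byte {
	buf = binary.AppendUvarint(buf, uint64(len(s)))
	return append(buf, s...)
}

// ReadString decodes one length-prefixed string and returns the rest of
// the buffer so that calls can be chained.
func ReadString(buf []byte) (s string, rest []byte, err error) {
	n, width := binary.Uvarint(buf)
	if width <= 0 {
		return "", nil, errors.New("bad length prefix")
	}
	if uint64(len(buf)-width) < n {
		return "", nil, errors.New("truncated payload")
	}
	end := width + int(n)
	return string(buf[width:end]), buf[end:], nil
}

func main() {
	var buf []byte
	buf = AppendString(buf, "calls")
	buf = AppendString(buf, "auth.Login")
	fmt.Println(len(buf)) // prints 17: two 1-byte prefixes plus 5 and 10 payload bytes

	first, rest, _ := ReadString(buf)
	second, _, _ := ReadString(rest)
	fmt.Println(first, second) // prints: calls auth.Login
}
```

Short strings cost a single prefix byte, which is where a binary format saves space over JSON's quotes, keys, and separators.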

Current Boundaries

Breaking hash change (v0.3.0): hash domain prefixes were added, so databases created before v0.3.0 must be re-indexed. Run knowing fsck afterward to verify the result.
Static blast radius follows calls edges; other edge types provide context, not traversal.
Runtime tools require OpenTelemetry trace ingestion; without traces they have no observations.
LSP enrichment: Go, TypeScript, Python, Rust, Java, C#. Auto-detected from project markers. Others fall back to tree-sitter.
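
The blast radius boundary above (traversal follows calls edges only) can be illustrated with a reverse breadth-first search. This Go sketch is an assumption about the general approach, not knowing's implementation: the `Edge` type, the "imports" edge type used as a counterexample, and the symbol names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Edge is a hypothetical directed relationship between two symbols.
type Edge struct{ From, To, Type string }

// BlastRadius returns every symbol that transitively calls changed.
// Only "calls" edges are traversed; other edge types are ignored.
func BlastRadius(edges []Edge, changed string) []string {
	callers := map[string][]string{} // callee -> direct callers
	for _, e := range edges {
		if e.Type == "calls" {
			callers[e.To] = append(callers[e.To], e.From)
		}
	}
	seen := map[string]bool{changed: true}
	queue := []string{changed}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, c := range callers[cur] {
			if !seen[c] {
				seen[c] = true
				out = append(out, c)
				queue = append(queue, c)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	edges := []Edge{
		{"Handler", "Login", "calls"},
		{"Login", "HashPassword", "calls"},
		{"TestLogin", "Login", "calls"},
		{"Config", "HashPassword", "imports"}, // not traversed
	}
	fmt.Println(BlastRadius(edges, "HashPassword")) // prints [Handler Login TestLogin]
}
```

Filtering the result to test symbols gives a rough picture of how call-graph traversal can drive test selection as well.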

Documentation

Doc	Contents
Architecture	System design, schemas, content addressing, daemon model
Features	Implementation inventory, entry points, limitations
Audit & Compliance	Merkle proofs, fsck, snapshot chain, CI gates
CLI Reference	Commands, flags, examples
MCP Tools	Tool schemas, parameters, return formats
Edge Types	Relationship semantics and provenance
Context Packing	RWR, HITS, ranking, token budgeting
Runtime Traces	OTel ingestion and runtime confidence
Wire Formats	GCF, GCB, JSON formats and benchmarks
Roadmap	Completed workstreams and next priorities
Benchmarks	Reproducible value benchmarks with performance contracts
Whitepaper	Hierarchical Identity Architecture thesis (DOI: 10.5281/zenodo.20342255)
Hooks	Claude Code hook integration

License

MIT

Directories

Path	Synopsis
bench
bench/agent-efficiency
bench/cross-system/adapters	Package adapters provides system-specific implementations of the benchmark Adapter interface.
bench/cross-system/benchtype	Package benchtype defines shared types for the cross-system context retrieval benchmark.
bench/cross-system/cmd/failure-analysis (command)	Command failure-analysis examines what knowing returns vs ground truth for each task, categorizing misses into: related-but-unlisted, noise, wrong-package, correct-package-wrong-symbol.
bench/cross-system/cmd/session-bench (command)
bench/cross-system/cmd/validate-fixtures (command)	Command validate-fixtures checks each ground truth symbol against the actual DB contents and reports mismatches.
bench/cross-system/metrics	Package metrics computes retrieval quality metrics for the cross-system benchmark.
bench/cross-system/normalize	Package normalize provides symbol name canonicalization for cross-system comparison.
cmd
cmd/knowing (command)	Package main is the entry point for the knowing CLI.
internal
internal/cache	Package cache provides a thread-safe, TTL-bounded subgraph result cache keyed by Merkle subgraph roots.
internal/community	Package community provides pluggable graph community detection algorithms.
internal/context	Package context implements graph-aware context packing for AI agent consumption.
internal/daemon	Package daemon provides file watching, reindex coordination, and daemon lifecycle management for the knowing system of record.
internal/diff	Package diff computes semantic diffs and PR impact analysis between graph snapshots.
internal/edgetype
internal/embedding	Package embedding provides semantic vector embeddings for symbols.
internal/enrichment	Package enrichment provides an LSP-based enrichment pass that upgrades ast_inferred edges to lsp_resolved by querying language servers via the agent-lsp public API.
internal/indexer	Package indexer orchestrates source code extraction and graph indexing.
internal/indexer/authorship	Package authorship extracts authored_by edges from git blame data.
internal/indexer/cloudextractor	Package cloudextractor extracts cloud infrastructure and CI/CD resource definitions and their relationships from YAML configuration files.
internal/indexer/csharpextractor	Package csharpextractor provides C# extraction with ASP.NET attribute route detection.
internal/indexer/cssextractor	Package cssextractor extracts CSS/SCSS selectors, custom properties, and import relationships.
internal/indexer/docextract	Package docextract provides language-agnostic docstring extraction from tree-sitter AST nodes.
internal/indexer/dockerfileextractor	Package dockerfileextractor provides an extractor for Dockerfile files.
internal/indexer/envextractor	Package envextractor provides an extractor for environment variable files.
internal/indexer/eventextractor	Package eventextractor provides a supplementary extractor that detects message queue producer and consumer patterns across Go, TypeScript, Python, and Java source code.
internal/indexer/gitlabciextractor	Package gitlabciextractor provides an extractor for GitLab CI configuration files.
internal/indexer/goextractor	Package goextractor provides Go-specific extraction using go/packages for full type resolution.
internal/indexer/gotsextractor	Package gotsextractor provides Go extraction using tree-sitter for fast AST parsing with route detection.
internal/indexer/graphqlextractor	Package graphqlextractor provides an extractor for GraphQL schema files.
internal/indexer/helmextractor	Package helmextractor provides an extractor for Helm chart files.
internal/indexer/javaextractor	Package javaextractor provides Java extraction with Spring annotation route detection.
internal/indexer/k8sextractor	Package k8sextractor extracts Kubernetes resource definitions and their deployment relationships.
internal/indexer/makefileextractor	Package makefileextractor provides an extractor for Makefile and .mk files.
internal/indexer/ownership	Package ownership parses CODEOWNERS files and emits owned_by edges from file nodes to synthetic team/user nodes.
internal/indexer/packagejsonextractor	Package packagejsonextractor provides an extractor for package.json files.
internal/indexer/protoextractor	Package protoextractor provides a tree-sitter based extractor for Protocol Buffer (.proto) files.
internal/indexer/rubyextractor	Package rubyextractor provides a tree-sitter based extractor for Ruby files.
internal/indexer/rustextractor	Package rustextractor provides Rust extraction with Actix/Axum/Rocket route detection.
internal/indexer/schemaextractor	Package schemaextractor provides an extractor for OpenAPI 3.x, Swagger 2.x, and JSON Schema files.
internal/indexer/scipingest	Package scipingest parses SCIP (Source Code Intelligence Protocol) index files and imports their symbol definitions and references into the knowing knowledge graph.
internal/indexer/sqlextractor	Package sqlextractor extracts SQL tables, views, functions, and their relationships.
internal/indexer/terraformextractor	Package terraformextractor extracts Terraform HCL resources, modules, and dependency relationships.
internal/indexer/treesitter	Package treesitter provides a Python extractor using tree-sitter grammars.
internal/indexer/tsextractor	Package tsextractor provides TypeScript/JavaScript extraction with framework route detection.
internal/mcp	Package mcp exposes the knowing knowledge graph as MCP (Model Context Protocol) tools over stdio and HTTP transports.
internal/resolve	Package resolve provides shared utilities for determining whether an import path refers to an external dependency, a standard library module, or a local/relative import.
internal/resolver	Package resolver finds dangling cross-repo edges and retargets them to the correct node by matching across repos using hash recomputation.
internal/roster	Package roster manages the global registry of tracked repositories.
internal/snapshot	Package snapshot manages Merkle-based graph snapshots for the knowing knowledge graph.
internal/store	Package store provides the SQLite-backed implementation of types.GraphStore.
internal/testutil	Package testutil provides shared test infrastructure for the knowing project.
internal/trace	Package trace implements OpenTelemetry span ingestion and runtime confidence scoring.
internal/types	Package types result types for graph queries and traversals.
internal/wire	Package wire implements the GCF (Graph Compact Format) encoder and decoder.
test
test/demo (command)	setup-runtime-demo populates a knowing database with simulated microservice nodes and runtime-observed edges for demo purposes.