knowing

v0.10.1
Published: May 26, 2026 License: MIT

README

knowing



Self-adapting code intelligence engine. Observes its own graph density and adjusts retrieval strategy automatically. 34 edge types, 28 MCP tools, cryptographic proofs. Gets smarter with scale, not dumber.


Note: Research paper: Content-Addressing as a Computation Primitive for Software Relationship Intelligence (DOI: 10.5281/zenodo.20342255)

Your architecture diagram says service A calls service B. Can you prove it?

knowing can. It builds a content-addressed graph of extracted code relationships, snapshots it as a Merkle tree tied to a git commit, and generates cryptographic proofs that verify offline. Agents use it for ranked context. Security teams use it for audit. Platform teams use it to compare code against production traces.

It gets better every time you use it. When code changes, stale knowledge expires automatically.

brew install blackwell-systems/tap/knowing

{
  "mcpServers": {
    "knowing": {
      "command": "knowing",
      "args": ["mcp", "--watch"]
    }
  }
}

That's it. The MCP server auto-indexes your repo on first launch. Your agent now has ranked context (one call replaces grep-read loops), blast radius, test scope, and memory that compounds.


Three Things, One Architecture

knowing is three products built on one foundation (content-addressed graph with hierarchical Merkle trees):

1. Context engine for AI agents. One call returns the most relevant symbols for a task, ranked by graph centrality, recency, and learned usefulness, packed to fit your token budget. 47% fewer tool calls. 84% fewer tokens. Results improve with feedback.

2. Audit primitive for compliance. Every graph state is a Merkle root tied to a git commit. knowing prove generates a cryptographic proof that a relationship existed. knowing verify checks it offline. knowing fsck verifies the entire graph in 98ms.

3. Memory layer that learns. Feedback from agents compounds across sessions. When code changes, feedback expires automatically (verified via package Merkle roots). The system gets smarter over time, not noisier. That is the property knowing is built around.

These aren't separate features. They're structural consequences of content-addressing: the same hash that makes context cacheable also makes it provable, and the same Merkle root that detects staleness also expires stale feedback.
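
The expiry half of that claim can be sketched in a few lines of Go. This is a minimal illustration, assuming each feedback record is stamped with the package Merkle root that was current when it was given. The names `Feedback`, `Store`, `Record`, and `Visible` are hypothetical and are not knowing's API.

```go
package main

import "fmt"

// Feedback is a hypothetical record: a usefulness score for a symbol,
// stamped with the package Merkle root current when it was recorded.
type Feedback struct {
	Symbol      string
	Score       float64
	PackageRoot string // hex Merkle root of the package at record time
}

// Store keeps feedback per package. Illustrative only.
type Store struct {
	byPackage map[string][]Feedback
}

func NewStore() *Store {
	return &Store{byPackage: make(map[string][]Feedback)}
}

// Record stamps the feedback with the root under which it was observed.
func (s *Store) Record(pkg, root, symbol string, score float64) {
	s.byPackage[pkg] = append(s.byPackage[pkg], Feedback{
		Symbol:      symbol,
		Score:       score,
		PackageRoot: root,
	})
}

// Visible returns only feedback whose stamped root equals the current root.
// Nothing is deleted: stale entries simply stop matching.
func (s *Store) Visible(pkg, currentRoot string) []Feedback {
	var out []Feedback
	for _, f := range s.byPackage[pkg] {
		if f.PackageRoot == currentRoot {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	s := NewStore()
	s.Record("internal/auth", "root-v1", "SessionHandler", 0.9)

	// Code unchanged: the root matches and the feedback is served.
	fmt.Println(len(s.Visible("internal/auth", "root-v1")))
	// Code changed: the package root differs, so the old feedback is invisible.
	fmt.Println(len(s.Visible("internal/auth", "root-v2")))
}
```

No expiry job or timestamp comparison is needed in this sketch: the root comparison is the staleness check.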


What It Answers

For your agent:

  • "I'm changing this function. What breaks?" (blast radius across callers, tests, routes, repos)
  • "Give me 50,000 tokens of context for this task." (graph-ranked, not grep-searched)
  • "Which tests should run?" (call-graph traversal, 98% precision)

For your platform team:

  • "Is this route used in production?" (static analysis + OTel runtime traces)
  • "What did the service graph look like at a specific snapshot?" (snapshot chain, each root tied to a git commit)

For your security team:

  • "Prove service A calls service B at this commit." (Merkle proof, verifiable offline)
  • "Prove this dependency does NOT exist." (absence proof via sorted leaves)
  • "Generate a compliance report." (knowing audit -proofs, one command)
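
The "absence proof via sorted leaves" idea can be sketched as follows. This is a simplified illustration, assuming edges are stored as sorted string leaves. A real verifier would also check Merkle inclusion proofs for both neighbours at the stated positions, which this sketch omits. `Witness`, `ProveAbsent`, and `Check` are hypothetical names, not knowing's API.

```go
package main

import (
	"fmt"
	"sort"
)

// Witness brackets a missing key between two adjacent leaves of the sorted list.
type Witness struct {
	LoIdx, HiIdx int    // neighbour positions; LoIdx == -1 or HiIdx == n at a boundary
	Lo, Hi       string // neighbour values ("" at a boundary)
}

// ProveAbsent returns the neighbours that would surround target if it existed.
// It reports false when target is actually present.
func ProveAbsent(sorted []string, target string) (Witness, bool) {
	i := sort.SearchStrings(sorted, target)
	if i < len(sorted) && sorted[i] == target {
		return Witness{}, false
	}
	w := Witness{LoIdx: i - 1, HiIdx: i}
	if i > 0 {
		w.Lo = sorted[i-1]
	}
	if i < len(sorted) {
		w.Hi = sorted[i]
	}
	return w, true
}

// Check confirms the neighbours are adjacent and strictly bracket target.
// n is the leaf count committed to by the snapshot.
func Check(w Witness, target string, n int) bool {
	if w.LoIdx < -1 || w.HiIdx > n || w.HiIdx != w.LoIdx+1 {
		return false
	}
	if w.LoIdx >= 0 && w.Lo >= target {
		return false
	}
	if w.HiIdx < n && target >= w.Hi {
		return false
	}
	return true
}

func main() {
	// Leaves are already in sorted order.
	leaves := []string{"API->AuthService", "API->Billing", "AuthService->SessionStore"}
	target := "AuthService->Billing"

	w, ok := ProveAbsent(leaves, target)
	fmt.Println(ok, Check(w, target, len(leaves)))
	fmt.Printf("bracketed by %q and %q\n", w.Lo, w.Hi)
}
```

Because the leaves are sorted, two adjacent leaves that straddle the target leave no position where the target could be.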

Numbers

| What | Result |
| --- | --- |
| Agent context precision | +20pp after 1 round, +34pp after 5 |
| Tool calls saved | 47% fewer (one context call replaces repeated grep+read) |
| Token savings | 84% fewer tokens (GCF wire format) |
| Repeat query speed | 93x faster (Merkle-keyed subgraph cache) |
| Merkle diff | 517x faster than full edge scan at 100K edges |
| Test scope | 98% precision, 82% recall |
| Graph integrity check | 98ms (24,936 edges) |
| Proof generation | 72 µs generate, 1.2 µs verify |
| Feedback expiration | 100% expire on code change, 11% overhead |
| Cross-repo retrieval | 46.7% R@10 on foreign codebase, zero config |
| Cross-system retrieval | P@10=0.217, 1.63x vs codegraph (19K stars), 4.3x vs Aider, 11x vs grep |
| Indexing throughput | 7 repos (47,150 files) in ~52s |
| Language coverage | 7/7 repos pass (Go, Python, TS, Rust, Java, C#). codegraph: 5/7 |

All benchmarks are reproducible: GOWORK=off go test ./bench/... -timeout 5m
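
The Merkle diff figure rests on comparing roots instead of edges. The sketch below shows the shape of that comparison, assuming a snapshot can be reduced to a map from package path to package Merkle root. `Snapshot`, `Diff`, and `DiffSnapshots` are hypothetical names, and knowing's actual diff may be organized differently.

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot maps a package path to the hex Merkle root of that package's edges.
type Snapshot map[string]string

// Diff lists packages by how their root changed between two snapshots.
type Diff struct {
	Added, Removed, Changed []string
}

// DiffSnapshots compares package roots only. Packages with equal roots are
// skipped without looking at a single edge.
func DiffSnapshots(old, cur Snapshot) Diff {
	var d Diff
	for pkg, root := range cur {
		prev, ok := old[pkg]
		switch {
		case !ok:
			d.Added = append(d.Added, pkg)
		case prev != root:
			d.Changed = append(d.Changed, pkg)
		}
	}
	for pkg := range old {
		if _, ok := cur[pkg]; !ok {
			d.Removed = append(d.Removed, pkg)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(d.Added)
	sort.Strings(d.Removed)
	sort.Strings(d.Changed)
	return d
}

func main() {
	before := Snapshot{"internal/auth": "aa11", "internal/store": "bb22", "internal/old": "cc33"}
	after := Snapshot{"internal/auth": "aa11", "internal/store": "dd44", "internal/new": "ee55"}

	d := DiffSnapshots(before, after)
	fmt.Println("added:", d.Added)
	fmt.Println("removed:", d.Removed)
	fmt.Println("changed:", d.Changed)
}
```

The work here scales with the number of packages, not the number of edges. Only packages reported as changed would need an edge-level comparison.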


Quick Start

# Install
brew install blackwell-systems/tap/knowing
# Or: go install github.com/blackwell-systems/knowing/cmd/knowing@latest
# Or: npm install -g @blackwell-systems/knowing
# Or: pip install knowing

# That's it. Add the MCP config and start a session.
# The server auto-indexes your repo on first launch.

# Or index manually for CLI usage:
knowing add .

# Remove a repo (evicts all data: nodes, edges, snapshots, feedback)
knowing remove ./path/to/repo

# Get context for a task
knowing context -task "refactor auth middleware" -format gcf

# Find affected tests
knowing test-scope -files internal/auth/middleware.go

# Explain why a symbol ranked where it did
knowing why -task "refactor auth" -symbol "SessionHandler"

# Prove a relationship exists (cryptographic Merkle proof)
knowing prove -source "AuthService" -target "SessionStore"

# Verify offline (no database needed)
knowing verify proof.json

# Check graph integrity
knowing fsck

# Check if the graph is stale (CI gate: exits 1 if stale)
knowing stale

MCP Integration

{
  "mcpServers": {
    "knowing": {
      "command": "knowing",
      "args": ["mcp", "--watch"],
      "transport": "stdio"
    }
  }
}

The --watch flag re-indexes on file changes. Your agent always queries fresh data. No manual knowing index or database path needed: the MCP server auto-indexes the git repository on first launch and registers it in the roster for future sessions.

For HTTP transport (multi-agent, daemon mode):

knowing serve -addr :8100 .

{
  "mcpServers": {
    "knowing": {
      "url": "http://localhost:8100",
      "transport": "streamable-http"
    }
  }
}

Why This Works

Git versions files. knowing versions the understanding of code.

The entire system is built on one idea: content-addressed identity. Every symbol, relationship, and snapshot is SHA-256 hashed. This single choice gives you:

  • Staleness detection for free. Changed file = new hash = stale edges are known without scanning.
  • Caching for free. Same package root = same results. 93x speedup on unchanged queries.
  • Integrity for free. Verify all stored hashes and snapshot chain continuity. 98ms.
  • History for free. Each snapshot is a Merkle root tied to a git commit. Walk the chain.
  • Feedback expiration for free. Feedback stores the package Merkle root. Code changes = root changes = old feedback is invisible.
  • Proofs for free. Merkle path from leaf to root is a self-contained cryptographic proof.
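
To make the last bullet concrete, here is a minimal Merkle inclusion proof in Go. It is a sketch under stated assumptions, not knowing's implementation: it uses a flat binary tree over sorted leaves instead of the hierarchical layout, the `0x00`/`0x01` domain prefixes follow the RFC 6962 convention, and an unpaired node is promoted unchanged. `Step`, `Prove`, and `Verify` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// Step is one sibling hash on the path from a leaf to the root.
type Step struct {
	Sibling [32]byte
	Left    bool // true if the sibling sits to the left of the running hash
}

func leafHash(data string) [32]byte {
	return sha256.Sum256(append([]byte{0x00}, data...))
}

func nodeHash(l, r [32]byte) [32]byte {
	buf := make([]byte, 0, 65)
	buf = append(buf, 0x01)
	buf = append(buf, l[:]...)
	buf = append(buf, r[:]...)
	return sha256.Sum256(buf)
}

// Prove returns the root over the sorted leaves and the sibling path for target.
func Prove(leaves []string, target string) (root [32]byte, path []Step, ok bool) {
	sorted := append([]string(nil), leaves...)
	sort.Strings(sorted)
	idx := sort.SearchStrings(sorted, target)
	if idx == len(sorted) || sorted[idx] != target {
		return root, nil, false
	}
	level := make([][32]byte, len(sorted))
	for i, l := range sorted {
		level[i] = leafHash(l)
	}
	for len(level) > 1 {
		var next [][32]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) { // unpaired node: promote unchanged
				next = append(next, level[i])
				continue
			}
			next = append(next, nodeHash(level[i], level[i+1]))
		}
		// Record the sibling at this level, if the node has one.
		if sib := idx ^ 1; sib < len(level) {
			path = append(path, Step{Sibling: level[sib], Left: sib < idx})
		}
		idx /= 2
		level = next
	}
	return level[0], path, true
}

// Verify recomputes the root from the leaf and its path. It needs no database.
func Verify(leaf string, path []Step, root [32]byte) bool {
	h := leafHash(leaf)
	for _, s := range path {
		if s.Left {
			h = nodeHash(s.Sibling, h)
		} else {
			h = nodeHash(h, s.Sibling)
		}
	}
	return h == root
}

func main() {
	edges := []string{"AuthService->SessionStore", "API->AuthService", "API->Billing"}
	root, path, ok := Prove(edges, "AuthService->SessionStore")
	fmt.Println(ok, Verify("AuthService->SessionStore", path, root))
	// A different edge does not verify against the same path and root.
	fmt.Println(Verify("AuthService->Billing", path, root))
}
```

The verifier needs only the leaf, the path, and the root, which is what makes the proof self-contained.
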

| | Git | knowing |
| --- | --- | --- |
| What it versions | File contents | Code relationships and their meaning |
| Unit of storage | blob | node + edge + provenance + confidence |
| Identity | sha256(content) | sha256("node\0" + repo + package + name + kind) |
| Snapshot | tree of blobs | Hierarchical Merkle: repo -> package -> edge-type -> leaf |
| Diff | Which lines changed | Which packages changed, what broke, what's new |
| History | What code looked like | What the codebase understood about itself |
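
The node identity formula from the comparison above can be written out directly. This sketch follows the formula literally and is an assumption about details the README does not state. In particular, it concatenates the fields with no delimiter, so a real implementation may well separate or length-prefix them. `NodeID` is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// NodeID computes sha256("node\0" + repo + package + name + kind) as hex.
// The "node\0" prefix is a hash domain: it keeps node hashes from colliding
// with hashes of other object kinds built from the same strings.
func NodeID(repo, pkg, name, kind string) string {
	h := sha256.New()
	h.Write([]byte("node\x00"))
	for _, field := range []string{repo, pkg, name, kind} {
		h.Write([]byte(field))
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := NodeID("github.com/example/app", "internal/auth", "SessionHandler", "function")
	b := NodeID("github.com/example/app", "internal/auth", "SessionHandler", "function")
	c := NodeID("github.com/example/app", "internal/auth", "SessionHandler", "type")

	fmt.Println(a == b) // same inputs give the same identity
	fmt.Println(a == c) // changing any field gives a different identity
}
```

The identity depends only on what the node is, not on when or where it was indexed, so two independent indexing runs agree on it.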

How It Works

+--------------------------------------------------------------------+
|                           knowing daemon                           |
+----------------+------------------------+--------------------------+
|   Indexer      |     Graph Store        |      MCP Server          |
|                |                        |                          |
| 26 extractors  | Content-addressed      | 28 tools + 8 resources   |
| tree-sitter    | SQLite + Merkle tree   | stdio / HTTP (1.8s index)|
| LSP + SCIP     | Hierarchical snapshots | GCF / GCB / JSON         |
| OTel traces    | Subgraph cache (93x)   | PackRoot dedup (99%)     |
|                | Community detection    |                          |
+----------------+------------------------+--------------------------+

Two planes:

  • Execution: indexes repos, extracts symbols and relationships, ingests traces, stores snapshots.
  • Intelligence: computes blast radius, context packs, test scope, feedback, communities from the stored graph.

The boundary matters: intelligence features read the graph and produce derived results. They cannot corrupt graph facts. A bad ranking produces a bad recommendation; it cannot invalidate a proof.


Capabilities

Languages And Formats

| Language/Format | Extractor | Framework/Pattern Detection |
| --- | --- | --- |
| Go | tree-sitter + go/packages + SCIP | net/http, gin, echo, chi, gorilla/mux |
| TypeScript/JavaScript | tree-sitter | Express.js, Fastify, Hono, NestJS, Next.js |
| Python | tree-sitter | Flask, FastAPI, Django |
| Rust | tree-sitter | Actix, Axum, Rocket |
| Java | tree-sitter | Spring annotations |
| C# | tree-sitter | ASP.NET attributes |
| Protocol Buffers | tree-sitter | service, message, enum, RPC declarations |
| Terraform (HCL) | tree-sitter | resource, data, module, variable declarations |
| SQL | tree-sitter | tables, views, functions, procedures, FK edges |
| Kubernetes YAML | yaml.v3 | deployments, services, configmaps, label-selector edges |
| CloudFormation/SAM | yaml.v3 | resources, !Ref/!GetAtt/!Sub cross-references |
| Docker Compose | yaml.v3 | services, ports, networks, depends_on links |
| GitHub Actions | yaml.v3 | workflows, jobs, steps, action references |
| Serverless Framework | yaml.v3 | functions, events, resource references |
| CSS/SCSS | tree-sitter | selectors, custom properties, var() dependencies |
| Event/MQ patterns | multi-language | Kafka, NATS, SQS, RabbitMQ publish/subscribe |
| OpenAPI/JSON Schema | json/yaml | endpoints, models, $ref resolution |
| Dockerfile | parser | FROM base images, COPY --from multi-stage deps, EXPOSE ports |
| Makefile | parser | target dependencies, include directives, variable references |
| Helm Charts | yaml.v3 | chart dependencies, template references, values injection |
| GitLab CI | yaml.v3 | job needs, extends templates, include files, artifacts |
| package.json (npm) | json | dependencies, devDependencies, peerDependencies, scripts |
| GraphQL | parser | type definitions, field type references, interface implementations |
| Ruby | tree-sitter | classes, modules, method definitions, require edges |
| .env files | parser | environment variable declarations, cross-file references |

All extractors fire per file via multi-dispatch; results are merged. Tree-sitter produces edges at confidence 0.7 (ast_inferred); go/packages and SCIP at 0.95-1.0 (ast_resolved, scip_resolved).
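
One way the merge step could work is sketched below. The README says only that results are merged, so the rule used here (keep the highest-confidence edge per source, target, and type) is an assumption, as are the `Edge` type and the `Merge` function.

```go
package main

import (
	"fmt"
	"sort"
)

// Edge is a hypothetical extracted relationship with its provenance tier.
type Edge struct {
	Source, Target, Type string
	Provenance           string
	Confidence           float64
}

type edgeKey struct{ source, target, typ string }

// Merge keeps one edge per (source, target, type), preferring the
// highest confidence seen across all extractor batches.
func Merge(batches ...[]Edge) []Edge {
	best := make(map[edgeKey]Edge)
	for _, batch := range batches {
		for _, e := range batch {
			k := edgeKey{e.Source, e.Target, e.Type}
			if cur, ok := best[k]; !ok || e.Confidence > cur.Confidence {
				best[k] = e
			}
		}
	}
	out := make([]Edge, 0, len(best))
	for _, e := range best {
		out = append(out, e)
	}
	// Sort for deterministic output; map iteration order is random.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Source != out[j].Source {
			return out[i].Source < out[j].Source
		}
		if out[i].Target != out[j].Target {
			return out[i].Target < out[j].Target
		}
		return out[i].Type < out[j].Type
	})
	return out
}

func main() {
	treeSitter := []Edge{
		{Source: "Login", Target: "NewSession", Type: "calls", Provenance: "ast_inferred", Confidence: 0.7},
		{Source: "Login", Target: "Audit", Type: "calls", Provenance: "ast_inferred", Confidence: 0.7},
	}
	goPackages := []Edge{
		{Source: "Login", Target: "NewSession", Type: "calls", Provenance: "ast_resolved", Confidence: 0.95},
	}
	for _, e := range Merge(treeSitter, goPackages) {
		fmt.Printf("%s -> %s (%s, %.2f)\n", e.Source, e.Target, e.Provenance, e.Confidence)
	}
}
```

Under this rule a type-resolved edge replaces its tree-sitter counterpart, while edges that only tree-sitter found are kept at the lower confidence.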

MCP Tools

| Tool | Purpose |
| --- | --- |
| index_repo, graph_query, repo_graph | Build and inspect the graph |
| cross_repo_callers, blast_radius, trace_dataflow, flow_between | Understand impact and paths |
| snapshot_diff, semantic_diff, pr_impact, stale_edges | Compare graph states and review changes |
| runtime_traffic, dead_routes, trace_stats | Query runtime-observed relationships |
| context_for_task, context_for_files, context_for_pr, explain_symbol | Ranked context for agents |
| ownership, ownership_query, test_scope, communities, plan_turn, feedback | Route work, query code owners/authors, select tests, improve ranking |
| prove, prove_absent, fsck | Cryptographic proofs, absence proofs, integrity verification |
| untrack_repo | Evict all data for a repository (nodes, edges, files, snapshots, feedback, task memory, graph notes) |

MCP prompts: refactor_safely, review_pr, investigate_dead_code.

MCP Resources

8 read-only resources for agent orientation without a tool call:

| Resource | What it returns |
| --- | --- |
| knowing://report | Graph size, top kinds, hotspot count, snapshot age |
| knowing://schema | Node kinds, edge types, provenance tiers, hash format |
| knowing://stats | Counts by repo, kind, and edge type |
| knowing://repos | All tracked repos with counts and last-indexed time |
| knowing://session | Context calls, symbols served, cache hits/misses, uptime |
| knowing://index-health | Healthy/stale/corrupted status, integrity check |
| knowing://communities | Community list with cohesion and Merkle roots |
| knowing://community/{id} | Single community detail (resource template) |

Wire Formats

| Format | Purpose | Savings vs JSON |
| --- | --- | --- |
| GCF (Graph Compact Format) | LLM consumption: line-oriented, positional fields | 84% fewer tokens |
| GCB (Graph Compact Binary) | Service transport and caching: varint, length-prefixed | 74% fewer bytes |
| JSON | Human debugging, generic consumers | Baseline |

GCF uses |-separated fields and local IDs ($1 -> $3) instead of repeated qualified names. It stays parseable by LLMs while fitting 5x more graph context into the same token budget. Session-stateful deduplication reduces repeated symbols by 47%.
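
The local-ID idea is easy to show in isolation. The encoder below is not the GCF grammar, which this README does not specify. It is a made-up line format (`N` lines define a name once, `E` lines reference it by ID) that illustrates why replacing repeated qualified names with short IDs shrinks the payload. `Edge` and `Encode` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Edge is a relationship between two fully qualified symbol names.
type Edge struct{ Source, Target, Type string }

// Encode writes each qualified name once, on first use, and refers to it
// afterwards by a local ID such as $1. Fields are separated by '|'.
func Encode(edges []Edge) string {
	ids := make(map[string]int)
	var b strings.Builder

	// id returns the local ID for name, emitting a definition line the
	// first time the name is seen.
	id := func(name string) int {
		if n, ok := ids[name]; ok {
			return n
		}
		n := len(ids) + 1
		ids[name] = n
		fmt.Fprintf(&b, "N|$%d|%s\n", n, name)
		return n
	}

	for _, e := range edges {
		s := id(e.Source)
		t := id(e.Target)
		fmt.Fprintf(&b, "E|$%d|%s|$%d\n", s, e.Type, t)
	}
	return b.String()
}

func main() {
	edges := []Edge{
		{Source: "github.com/example/app/auth.Login", Target: "github.com/example/app/session.Create", Type: "calls"},
		{Source: "github.com/example/app/auth.Logout", Target: "github.com/example/app/session.Create", Type: "calls"},
	}
	fmt.Print(Encode(edges))
}
```

The second edge costs one short definition line plus a reference line, because `session.Create` is already known as `$2`. Savings grow with how often symbols repeat.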


Current Boundaries

  • Breaking hash change (v0.3.0): Hash domain prefixes added. Databases from before v0.3.0 must be re-indexed. Run knowing fsck after.
  • Static blast radius follows calls edges; other edge types provide context, not traversal.
  • Runtime tools require OpenTelemetry trace ingestion; without traces they have no observations.
  • LSP enrichment: Go, TypeScript, Python, Rust, Java, C#. Auto-detected from project markers. Others fall back to tree-sitter.
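
The blast radius boundary above amounts to a reverse traversal restricted to one edge type. A minimal sketch, assuming edges are available as simple source/target/type triples, where `Edge` and `BlastRadius` are hypothetical names and not knowing's API:

```go
package main

import (
	"fmt"
	"sort"
)

// Edge is a typed relationship from Source to Target.
type Edge struct{ Source, Target, Type string }

// BlastRadius returns every symbol that directly or transitively calls
// changed. Only "calls" edges are traversed; other edge types are ignored.
func BlastRadius(edges []Edge, changed string) []string {
	// Index callers by callee so the graph can be walked in reverse.
	callers := make(map[string][]string)
	for _, e := range edges {
		if e.Type == "calls" {
			callers[e.Target] = append(callers[e.Target], e.Source)
		}
	}

	seen := map[string]bool{changed: true}
	queue := []string{changed}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, c := range callers[cur] {
			if !seen[c] { // the seen set also guards against call cycles
				seen[c] = true
				out = append(out, c)
				queue = append(queue, c)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	edges := []Edge{
		{Source: "Handler", Target: "Auth", Type: "calls"},
		{Source: "Auth", Target: "Store", Type: "calls"},
		{Source: "TestHandler", Target: "Handler", Type: "calls"},
		{Source: "Docs", Target: "Store", Type: "imports"}, // context only, not traversed
	}
	fmt.Println(BlastRadius(edges, "Store"))
}
```

`Docs` is left out of the result because it reaches `Store` only through an `imports` edge.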

Documentation

| Doc | Contents |
| --- | --- |
| Architecture | System design, schemas, content addressing, daemon model |
| Features | Implementation inventory, entry points, limitations |
| Audit & Compliance | Merkle proofs, fsck, snapshot chain, CI gates |
| CLI Reference | Commands, flags, examples |
| MCP Tools | Tool schemas, parameters, return formats |
| Edge Types | Relationship semantics and provenance |
| Context Packing | RWR, HITS, ranking, token budgeting |
| Runtime Traces | OTel ingestion and runtime confidence |
| Wire Formats | GCF, GCB, JSON formats and benchmarks |
| Roadmap | Completed workstreams and next priorities |
| Benchmarks | Reproducible value benchmarks with performance contracts |
| Whitepaper | Hierarchical Identity Architecture thesis (DOI: 10.5281/zenodo.20342255) |
| Hooks | Claude Code hook integration |

License

MIT

Directories

| Path | Synopsis |
| --- | --- |
| bench/cross-system/adapters | Package adapters provides system-specific implementations of the benchmark Adapter interface. |
| bench/cross-system/benchtype | Package benchtype defines shared types for the cross-system context retrieval benchmark. |
| bench/cross-system/cmd/failure-analysis (command) | Command failure-analysis examines what knowing returns vs ground truth for each task, categorizing misses into: related-but-unlisted, noise, wrong-package, correct-package-wrong-symbol. |
| bench/cross-system/cmd/validate-fixtures (command) | Command validate-fixtures checks each ground truth symbol against the actual DB contents and reports mismatches. |
| bench/cross-system/metrics | Package metrics computes retrieval quality metrics for the cross-system benchmark. |
| bench/cross-system/normalize | Package normalize provides symbol name canonicalization for cross-system comparison. |
| cmd/knowing (command) | Package main is the entry point for the knowing CLI. |
| internal/cache | Package cache provides a thread-safe, TTL-bounded subgraph result cache keyed by Merkle subgraph roots. |
| internal/community | Package community provides pluggable graph community detection algorithms. |
| internal/context | Package context implements graph-aware context packing for AI agent consumption. |
| internal/daemon | Package daemon provides file watching, reindex coordination, and daemon lifecycle management for the knowing system of record. |
| internal/diff | Package diff computes semantic diffs and PR impact analysis between graph snapshots. |
| internal/embedding | Package embedding provides semantic vector embeddings for symbols. |
| internal/enrichment | Package enrichment provides an LSP-based enrichment pass that upgrades ast_inferred edges to lsp_resolved by querying language servers via the agent-lsp public API. |
| internal/indexer | Package indexer orchestrates source code extraction and graph indexing. |
| internal/indexer/authorship | Package authorship extracts authored_by edges from git blame data. |
| internal/indexer/cloudextractor | Package cloudextractor extracts cloud infrastructure and CI/CD resource definitions and their relationships from YAML configuration files. |
| internal/indexer/csharpextractor | Package csharpextractor provides C# extraction with ASP.NET attribute route detection. |
| internal/indexer/cssextractor | Package cssextractor extracts CSS/SCSS selectors, custom properties, and import relationships. |
| internal/indexer/docextract | Package docextract provides language-agnostic docstring extraction from tree-sitter AST nodes. |
| internal/indexer/dockerfileextractor | Package dockerfileextractor provides an extractor for Dockerfile files. |
| internal/indexer/envextractor | Package envextractor provides an extractor for environment variable files. |
| internal/indexer/eventextractor | Package eventextractor provides a supplementary extractor that detects message queue producer and consumer patterns across Go, TypeScript, Python, and Java source code. |
| internal/indexer/gitlabciextractor | Package gitlabciextractor provides an extractor for GitLab CI configuration files. |
| internal/indexer/goextractor | Package goextractor provides Go-specific extraction using go/packages for full type resolution. |
| internal/indexer/gotsextractor | Package gotsextractor provides Go extraction using tree-sitter for fast AST parsing with route detection. |
| internal/indexer/graphqlextractor | Package graphqlextractor provides an extractor for GraphQL schema files. |
| internal/indexer/helmextractor | Package helmextractor provides an extractor for Helm chart files. |
| internal/indexer/javaextractor | Package javaextractor provides Java extraction with Spring annotation route detection. |
| internal/indexer/k8sextractor | Package k8sextractor extracts Kubernetes resource definitions and their deployment relationships. |
| internal/indexer/makefileextractor | Package makefileextractor provides an extractor for Makefile and .mk files. |
| internal/indexer/ownership | Package ownership parses CODEOWNERS files and emits owned_by edges from file nodes to synthetic team/user nodes. |
| internal/indexer/packagejsonextractor | Package packagejsonextractor provides an extractor for package.json files. |
| internal/indexer/protoextractor | Package protoextractor provides a tree-sitter based extractor for Protocol Buffer (.proto) files. |
| internal/indexer/rubyextractor | Package rubyextractor provides a tree-sitter based extractor for Ruby files. |
| internal/indexer/rustextractor | Package rustextractor provides Rust extraction with Actix/Axum/Rocket route detection. |
| internal/indexer/schemaextractor | Package schemaextractor provides an extractor for OpenAPI 3.x, Swagger 2.x, and JSON Schema files. |
| internal/indexer/scipingest | Package scipingest parses SCIP (Source Code Intelligence Protocol) index files and imports their symbol definitions and references into the knowing knowledge graph. |
| internal/indexer/sqlextractor | Package sqlextractor extracts SQL tables, views, functions, and their relationships. |
| internal/indexer/terraformextractor | Package terraformextractor extracts Terraform HCL resources, modules, and dependency relationships. |
| internal/indexer/treesitter | Package treesitter provides a Python extractor using tree-sitter grammars. |
| internal/indexer/tsextractor | Package tsextractor provides TypeScript/JavaScript extraction with framework route detection. |
| internal/mcp | Package mcp exposes the knowing knowledge graph as MCP (Model Context Protocol) tools over stdio and HTTP transports. |
| internal/resolve | Package resolve provides shared utilities for determining whether an import path refers to an external dependency, a standard library module, or a local/relative import. |
| internal/resolver | Package resolver finds dangling cross-repo edges and retargets them to the correct node by matching across repos using hash recomputation. |
| internal/roster | Package roster manages the global registry of tracked repositories. |
| internal/snapshot | Package snapshot manages Merkle-based graph snapshots for the knowing knowledge graph. |
| internal/store | Package store provides the SQLite-backed implementation of types.GraphStore. |
| internal/testutil | Package testutil provides shared test infrastructure for the knowing project. |
| internal/trace | Package trace implements OpenTelemetry span ingestion and runtime confidence scoring. |
| internal/types | Package types result types for graph queries and traversals. |
| internal/wire | Package wire implements the GCF (Graph Compact Format) encoder and decoder. |
| test/demo (command) | setup-runtime-demo populates a knowing database with simulated microservice nodes and runtime-observed edges for demo purposes. |
