# Memory Pipeline

A CLI tool that extracts structured knowledge notes from source files using a local LLM, generates embeddings, and stores them as a searchable knowledge base.
## Overview
Memory Pipeline scans your codebase for files with configurable extensions (.md, .txt, .go by default), processes them through an LLM to extract categorized notes, generates vector embeddings, and persists everything to JSON files. It's designed to work with local LLMs like LM Studio via an OpenAI-compatible API.
## Note Categories
- Learning — General knowledge, facts, or concepts
- Pattern — Reusable patterns, best practices, or conventions
- Cookbook — Step-by-step instructions or recipes
- Decision — Architectural decisions, trade-offs, or rationale
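
The four categories above can be modeled as a small enum-like type on the note struct. This is an illustrative sketch, not the project's actual identifiers — the field and type names (`Note`, `Category`, `Embedding`) are assumptions:

```go
package main

import "fmt"

// Category labels a note with one of the four knowledge types.
type Category string

const (
	CategoryLearning Category = "Learning"
	CategoryPattern  Category = "Pattern"
	CategoryCookbook Category = "Cookbook"
	CategoryDecision Category = "Decision"
)

// Note is one extracted knowledge item plus its embedding vector.
// The JSON tags are hypothetical; the real schema may differ.
type Note struct {
	Category  Category  `json:"category"`
	Text      string    `json:"text"`
	Source    string    `json:"source"`
	Embedding []float32 `json:"embedding"`
}

func main() {
	n := Note{Category: CategoryPattern, Text: "Prefer small interfaces", Source: "docs/style.md"}
	fmt.Println(n.Category, n.Text)
}
```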
## Features
- 🔍 File Discovery — Recursively scans directories for matching file extensions
- 🤖 LLM Extraction — Uses local LLMs to extract structured knowledge
- 📊 Vector Embeddings — Generates embeddings for semantic search
- 💾 State Tracking — Tracks processed files to avoid redundant work
- 🔄 Change Detection — Re-processes files when content changes
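
One common way to implement the change detection described above is to compare a content hash against the value recorded in the state file. The sketch below uses SHA-256; the actual pipeline may use a different fingerprint (e.g., modification times), so treat this as an assumption:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns the SHA-256 hex digest of a file's bytes.
func contentHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// needsProcessing reports whether a file changed since the last run,
// given the hash recorded in the state file (empty means never seen).
func needsProcessing(data []byte, storedHash string) bool {
	return contentHash(data) != storedHash
}

func main() {
	old := contentHash([]byte("v1"))
	fmt.Println(needsProcessing([]byte("v1"), old)) // false: content unchanged
	fmt.Println(needsProcessing([]byte("v2"), old)) // true: content changed
}
```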
## Requirements
- Go 1.25 or later
- A local LLM server with OpenAI-compatible API (e.g., LM Studio)
- just (optional, for task running)
## Installation

```bash
# Clone the repository
git clone https://github.com/andygeiss/memory-pipeline.git
cd memory-pipeline

# Install dependencies (macOS)
just setup

# Or manually install Go dependencies
go mod download
```
## Quick Start

1. Start your local LLM server (e.g., LM Studio on `localhost:1234`)

2. Run the pipeline:

   ```bash
   just run
   ```

   Or directly:

   ```bash
   go run ./cmd/cli/main.go
   ```

3. Check the output:

   - `.memory-state.json` — Processing state for each file
   - `.memory-notes.json` — Extracted notes with embeddings
## Commands

```bash
just run              # Run the CLI locally
just test             # Run tests with coverage
just test-integration # Run integration tests (requires LM Studio)
just profile          # Run benchmarks and generate CPU profile for PGO
just fmt              # Format code
just lint             # Lint code
just build            # Build Docker image
just up               # Start services
just down             # Stop services
just setup            # Install dependencies (macOS)
```
## Configuration

Configuration is done via environment variables. Create a `.env` file or export variables directly:

| Variable | Default | Description |
|----------|---------|-------------|
| `MEMORY_SOURCE_DIR` | `.` | Directory to scan for files |
| `MEMORY_STATE_FILE` | `.memory-state.json` | Processing state file |
| `MEMORY_FILE` | `.memory-notes.json` | Output notes file |
| `APP_FILE_EXTENSIONS` | `.md,.txt,.go` | Comma-separated file extensions |
| `OPENAI_BASE_URL` | `http://localhost:1234/v1` | LLM API endpoint |
| `OPENAI_API_KEY` | `not-used-in-local-llm-mode` | API key (if required) |
| `OPENAI_CHAT_MODEL` | `qwen/qwen3-coder-30b` | Chat model name |
| `OPENAI_EMBED_MODEL` | `text-embedding-qwen3-embedding-0.6b` | Embedding model name |
### Example

```bash
# Process only markdown files in ./docs
MEMORY_SOURCE_DIR=./docs APP_FILE_EXTENSIONS=.md just run
```
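
Loading these variables with fallback defaults is straightforward in Go. This is a minimal sketch of the pattern, not the project's actual `internal/config` code — the `Config` field names are assumptions:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// getenv returns the value of key, or def if the variable is unset.
func getenv(key, def string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return def
}

// Config mirrors a subset of the environment variables listed above.
type Config struct {
	SourceDir  string
	StateFile  string
	NotesFile  string
	Extensions []string
	BaseURL    string
}

// Load reads the environment, falling back to the documented defaults.
func Load() Config {
	return Config{
		SourceDir:  getenv("MEMORY_SOURCE_DIR", "."),
		StateFile:  getenv("MEMORY_STATE_FILE", ".memory-state.json"),
		NotesFile:  getenv("MEMORY_FILE", ".memory-notes.json"),
		Extensions: strings.Split(getenv("APP_FILE_EXTENSIONS", ".md,.txt,.go"), ","),
		BaseURL:    getenv("OPENAI_BASE_URL", "http://localhost:1234/v1"),
	}
}

func main() {
	cfg := Load()
	fmt.Println(cfg.SourceDir, cfg.Extensions)
}
```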
## Project Structure

```
memory-pipeline/
├── cmd/cli/                 # Application entry point + benchmarks
├── internal/
│   ├── adapters/
│   │   ├── inbound/         # File walker (input adapter)
│   │   └── outbound/        # LLM, embedding, and storage adapters
│   ├── config/              # Environment configuration
│   └── domain/
│       └── extraction/      # Core business logic
├── .justfile                # Task runner commands
├── Dockerfile               # Container build
└── docker-compose.yml       # Service orchestration
```
The project follows Hexagonal Architecture (Ports and Adapters) with Domain-Driven Design principles. See CONTEXT.md for detailed architectural documentation.
## How It Works

1. Scan — FileWalker discovers files matching configured extensions
2. Track — State manager tracks which files need processing
3. Extract — LLM analyzes file content and extracts structured notes
4. Embed — Embedding client generates vector representations
5. Store — Notes with embeddings are persisted to JSON

```
Files → FileWalker → LLMClient → EmbeddingClient → NoteStore
             ↓                                         ↓
     .memory-state.json                        .memory-notes.json
```
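
The five stages can be sketched as a simple loop. The step functions here are hypothetical stand-ins for the real components in `internal/domain/extraction`, shown only to make the data flow concrete:

```go
package main

import "fmt"

// Placeholder implementations of the five pipeline stages.
func scan(dir string) []string     { return []string{dir + "/README.md"} } // FileWalker
func needsWork(path string) bool   { return true }                         // state tracking
func extract(path string) []string { return []string{"note from " + path} } // LLM extraction
func embed(note string) []float32  { return []float32{0.1, 0.2} }          // embedding client

func main() {
	for _, path := range scan("docs") {
		if !needsWork(path) {
			continue // unchanged since last run
		}
		for _, note := range extract(path) {
			vec := embed(note)
			// NoteStore would persist note + vec to .memory-notes.json here.
			fmt.Printf("store %q (%d dims)\n", note, len(vec))
		}
	}
}
```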
## Development

### Running Tests

```bash
# Unit tests with coverage
just test

# Integration tests (requires running LM Studio)
just test-integration
```

### Building

```bash
# Build binary
go build -o bin/cli ./cmd/cli

# Build with PGO optimization (after running just profile)
go build -pgo=cpuprofile.pprof -o bin/cli ./cmd/cli

# Build Docker image
just build
```

### Code Quality

```bash
# Format code
just fmt

# Run linter
just lint
```
## License

MIT © Andreas Geiß