Published: Jan 14, 2026 License: MIT


Memory Pipeline


A CLI tool that extracts structured knowledge notes from source files using a local LLM, generates embeddings, and stores them as a searchable knowledge base.

Overview

Memory Pipeline scans your codebase for files with configurable extensions (.md, .txt, .go by default), processes them through an LLM to extract categorized notes, generates vector embeddings, and persists everything to JSON files. It's designed to work with local LLMs like LM Studio via an OpenAI-compatible API.

Note Categories
  • Learning — General knowledge, facts, or concepts
  • Pattern — Reusable patterns, best practices, or conventions
  • Cookbook — Step-by-step instructions or recipes
  • Decision — Architectural decisions, trade-offs, or rationale

Features

  • 🔍 File Discovery — Recursively scans directories for matching file extensions
  • 🤖 LLM Extraction — Uses local LLMs to extract structured knowledge
  • 📊 Vector Embeddings — Generates embeddings for semantic search
  • 💾 State Tracking — Tracks processed files to avoid redundant work
  • 🔄 Change Detection — Re-processes files when content changes
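
One plausible way the state-tracking and change-detection features fit together is a content fingerprint per file, compared against the value recorded in the state file. This is an illustrative sketch, not the project's actual implementation:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashContent fingerprints file content; SHA-256 is an assumption here.
func hashContent(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// needsProcessing reports whether a file is new or has changed since the
// fingerprint recorded in the state map (standing in for .memory-state.json).
func needsProcessing(state map[string]string, path string, data []byte) bool {
	return state[path] != hashContent(data)
}

func main() {
	state := map[string]string{}
	data := []byte("# notes")
	fmt.Println(needsProcessing(state, "README.md", data)) // true: never seen
	state["README.md"] = hashContent(data)
	fmt.Println(needsProcessing(state, "README.md", data)) // false: unchanged
}
```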

Requirements

  • Go 1.25 or later
  • A local LLM server with OpenAI-compatible API (e.g., LM Studio)
  • just (optional, for task running)

Installation

# Clone the repository
git clone https://github.com/andygeiss/memory-pipeline.git
cd memory-pipeline

# Install dependencies (macOS)
just setup

# Or manually install Go dependencies
go mod download

Quick Start

  1. Start your local LLM server (e.g., LM Studio on localhost:1234)

  2. Run the pipeline:

    just run
    

    Or directly:

    go run ./cmd/cli/main.go
    
  3. Check the output:

    • .memory-state.json — Processing state for each file
    • .memory-notes.json — Extracted notes with embeddings

Commands

just run              # Run the CLI locally
just test             # Run tests with coverage
just test-integration # Run integration tests (requires LM Studio)
just profile          # Run benchmarks and generate CPU profile for PGO
just fmt              # Format code
just lint             # Lint code
just build            # Build Docker image
just up               # Start services
just down             # Stop services
just setup            # Install dependencies (macOS)

Configuration

Configuration is done via environment variables. Create a .env file or export variables directly:

Variable              Default                                Description
MEMORY_SOURCE_DIR     .                                      Directory to scan for files
MEMORY_STATE_FILE     .memory-state.json                     Processing state file
MEMORY_FILE           .memory-notes.json                     Output notes file
APP_FILE_EXTENSIONS   .md,.txt,.go                           Comma-separated file extensions
OPENAI_BASE_URL       http://localhost:1234/v1               LLM API endpoint
OPENAI_API_KEY        not-used-in-local-llm-mode             API key (if required)
OPENAI_CHAT_MODEL     qwen/qwen3-coder-30b                   Chat model name
OPENAI_EMBED_MODEL    text-embedding-qwen3-embedding-0.6b    Embedding model name

Example

# Process only markdown files in ./docs
MEMORY_SOURCE_DIR=./docs APP_FILE_EXTENSIONS=.md just run

Project Structure

memory-pipeline/
├── cmd/cli/              # Application entry point + benchmarks
├── internal/
│   ├── adapters/
│   │   ├── inbound/      # File walker (input adapter)
│   │   └── outbound/     # LLM, embedding, and storage adapters
│   ├── config/           # Environment configuration
│   └── domain/
│       └── extraction/   # Core business logic
├── .justfile             # Task runner commands
├── Dockerfile            # Container build
└── docker-compose.yml    # Service orchestration

The project follows Hexagonal Architecture (Ports and Adapters) with Domain-Driven Design principles. See CONTEXT.md for detailed architectural documentation.
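
Under Ports and Adapters, the domain code in internal/domain/extraction would depend only on interfaces, with the adapters in internal/adapters supplying concrete implementations. The interface names below come from the pipeline diagram in this README; the method signatures and the in-memory adapter are illustrative assumptions:

```go
package main

import "fmt"

type Note struct{ Category, Text string }

// LLMClient is an outbound port: extract structured notes from raw content.
type LLMClient interface {
	Extract(content string) ([]Note, error)
}

// EmbeddingClient is an outbound port: turn note text into a vector.
type EmbeddingClient interface {
	Embed(text string) ([]float64, error)
}

// NoteStore is an outbound port: persist notes (e.g. to .memory-notes.json).
type NoteStore interface {
	Save(notes []Note) error
}

// memoryStore is a trivial in-memory adapter, shown only to illustrate
// how an adapter satisfies a port.
type memoryStore struct{ notes []Note }

func (m *memoryStore) Save(notes []Note) error {
	m.notes = append(m.notes, notes...)
	return nil
}

func main() {
	var store NoteStore = &memoryStore{}
	if err := store.Save([]Note{{Category: "decision", Text: "use hexagonal architecture"}}); err != nil {
		panic(err)
	}
	fmt.Println("stored")
}
```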

How It Works

  1. Scan — FileWalker discovers files matching configured extensions
  2. Track — State manager tracks which files need processing
  3. Extract — LLM analyzes file content and extracts structured notes
  4. Embed — Embedding client generates vector representations
  5. Store — Notes with embeddings are persisted to JSON

Files → FileWalker → LLMClient → EmbeddingClient → NoteStore
            ↓                                          ↓
      .memory-state.json                      .memory-notes.json
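
The five steps above can be wired as a single loop. In this sketch each stage is a stand-in function value; in the real project these would be the adapters named in the diagram:

```go
package main

import "fmt"

type note struct {
	Text      string
	Embedding []float64
}

// run illustrates the order of the five pipeline steps; only files the
// state check flags as changed are extracted, embedded, and stored.
func run(
	walk func() []string, // 1. Scan
	changed func(path string) bool, // 2. Track
	extract func(path string) []note, // 3. Extract
	embed func(text string) []float64, // 4. Embed
	store func(notes []note) error, // 5. Store
) error {
	for _, path := range walk() {
		if !changed(path) {
			continue // state says this file is up to date
		}
		notes := extract(path)
		for i := range notes {
			notes[i].Embedding = embed(notes[i].Text)
		}
		if err := store(notes); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	err := run(
		func() []string { return []string{"README.md"} },
		func(string) bool { return true },
		func(path string) []note { return []note{{Text: "note from " + path}} },
		func(string) []float64 { return []float64{0.1} },
		func(ns []note) error { fmt.Println("stored", len(ns), "notes"); return nil },
	)
	if err != nil {
		fmt.Println("pipeline failed:", err)
	}
}
```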

Development

Running Tests

# Unit tests with coverage
just test

# Integration tests (requires running LM Studio)
just test-integration

Building

# Build binary
go build -o bin/cli ./cmd/cli

# Build with PGO optimization (after running just profile)
go build -pgo=cpuprofile.pprof -o bin/cli ./cmd/cli

# Build Docker image
just build

Code Quality

# Format code
just fmt

# Run linter
just lint

License

MIT © Andreas Geiß
