ragx — a terminal-first local RAG assistant

[!NOTE]
Experimental project: an exploration of LLM programming. Expect rough edges and ongoing changes as it evolves.
ragx is a minimal and hackable Retrieval-Augmented Generation (RAG) CLI tool designed for the terminal. It embeds your local files (or stdin), retrieves relevant chunks with KNN search, and queries OpenAI-compatible LLMs (local or remote) via a CLI/TUI workflow.
Installation
Option 1: Install via Go
go install github.com/ladzaretti/ragx-cli/cmd/ragx@latest
Option 2: Install via curl
curl -sSL https://raw.githubusercontent.com/ladzaretti/ragx-cli/main/install.sh | bash
This auto-detects your OS/arch, downloads the latest release, and installs ragx to /usr/local/bin.
Option 3: Download a release
Visit the Releases page for a list of available downloads.
Overview
ragx focuses on the essentials of RAG:
- Embed: split content into chunks and generate embeddings with your chosen embedding model.
- Retrieve: run KNN over embeddings to select the most relevant chunks.
- Generate: send a prompt (system + user template + retrieved context) to an OpenAI-API compatible chat model.
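To make the Retrieve step concrete, here is a minimal sketch of brute-force KNN over chunk embeddings using cosine similarity. The types, function names, and toy vectors below are illustrative assumptions, not ragx's internals.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// chunk pairs a piece of source text with its embedding vector.
// The names here are illustrative; they are not ragx's internal types.
type chunk struct {
	Source  string
	Content string
	Vec     []float64
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns the k chunks most similar to the query embedding.
func topK(query []float64, chunks []chunk, k int) []chunk {
	sort.Slice(chunks, func(i, j int) bool {
		return cosine(query, chunks[i].Vec) > cosine(query, chunks[j].Vec)
	})
	if k > len(chunks) {
		k = len(chunks)
	}
	return chunks[:k]
}

func main() {
	docs := []chunk{
		{Source: "a.md", Content: "install with go install", Vec: []float64{1, 0}},
		{Source: "b.md", Content: "chunking and overlap", Vec: []float64{0, 1}},
	}
	for _, c := range topK([]float64{0.9, 0.1}, docs, 1) {
		fmt.Println(c.Source, c.Content)
	}
}
```

In the pipeline diagram below, this corresponds to the "Vector Index / KNN" node that both the ingest and query paths feed into.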
Key features
- OpenAI API v1 compatible: point ragx at any base URL (local Ollama or remote).
- Per-provider/per-model overrides: control temperature and context length.
- TUI chat: a lightweight Bubble Tea interface for iterative querying.
- Terminal-first: pipe text in, embed directories/files, and print results.
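For context on what "OpenAI API v1 compatible" means here, the sketch below sends a plain chat-completions request to a local Ollama endpoint. The base URL and model id are examples taken from elsewhere in this README, and the snippet is independent of ragx itself.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Any OpenAI v1-compatible endpoint works; a local Ollama server is used here.
	base := "http://localhost:11434/v1"

	body, _ := json.Marshal(map[string]any{
		"model": "qwen3:8b", // example model id from this README
		"messages": []map[string]string{
			{"role": "user", "content": "Say hello."},
		},
	})

	resp, err := http.Post(base+"/chat/completions", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	raw, _ := io.ReadAll(resp.Body)
	fmt.Println(string(raw)) // standard OpenAI-style chat completion JSON
}
```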
Use cases
- Local knowledge bases: notes, READMEs, docs.
- Quick “ask my files” workflows.
Pipeline Overview
flowchart LR
subgraph Ingest
A["Files / stdin"] --> B["Chunker"]
B --> C["Embedder"]
C --> D["Vector Index / KNN"]
end
subgraph Query
Q["User Query"] --> QE["Embed Query"]
QE --> D
D --> K["Top-K Chunks"]
K --> P["Prompt Builder (system + template + context)"]
P --> M["LLM (OpenAI-compatible)"]
M --> R["Answer"]
end
Usage
$ ragx --help
ragx is a terminal-first RAG assistant.
Embed data, run retrieval, and query local or remote OpenAI API-compatible LLMs.
Usage:
ragx [command]
Available Commands:
chat Start the interactive terminal chat UI
config Show and inspect configuration
help Help about any command
list List available models
query Embed data from paths or stdin and query the LLM
version Show version
Flags:
-h, --help help for ragx
Use "ragx [command] --help" for more information about a command.
Configuration file
The optional configuration file can be generated with the ragx config generate command:
[llm]
# Default model to use
default_model = ''
# LLM providers (uncomment and duplicate as needed)
# [[llm.providers]]
# base_url = 'http://localhost:11434'
# api_key = '<KEY>' # optional
# temperature = 0.7 # optional (provider default)
# Optional model definitions for context length control (uncomment and duplicate as needed)
# [[llm.models]]
# id = 'qwen:8b' # Model identifier
# context = 4096 # Maximum context length in tokens
# temperature = 0.7 # optional (model override)
[prompt]
# System prompt to override the default assistant behavior
# system_prompt = ''
# Go text/template for building the USER QUERY + CONTEXT block.
# Supported template vars:
# .Query — the user's raw query string
# .Chunks — slice of retrieved chunks (may be empty). Each chunk has:
# .ID — numeric identifier of the chunk
# .Source — source file/path of the chunk
# .Content — text content of the chunk
# user_prompt_tmpl = ''
[embedding]
# Model used for embeddings
embedding_model = ''
# Number of characters per chunk
# chunk_size = 2000
# Number of characters overlapped between chunks (must be less than chunk_size)
# overlap = 200
# Number of chunks to retrieve during RAG
# top_k = 20
# [logging]
# Directory where log file will be stored (default: XDG_STATE_HOME or ~/.local/state/ragx)
# log_dir = '/home/gbi/.local/state/ragx'
# Filename for the log file
# log_filename = '.log'
# log_level = 'info'
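To illustrate how the user_prompt_tmpl variables above are rendered, here is a standalone Go sketch using standard text/template semantics with .Query and .Chunks. The template string and struct are examples for illustration, not ragx's built-in default prompt.

```go
package main

import (
	"os"
	"text/template"
)

// Chunk mirrors the fields documented for .Chunks above
// (.ID, .Source, .Content); the struct name itself is illustrative.
type Chunk struct {
	ID      int
	Source  string
	Content string
}

// An example user_prompt_tmpl value, not the built-in default.
const userTmpl = `Question: {{ .Query }}

Context:
{{- range .Chunks }}
[{{ .ID }}] ({{ .Source }})
{{ .Content }}
{{- end }}`

func main() {
	t := template.Must(template.New("user").Parse(userTmpl))
	data := struct {
		Query  string
		Chunks []Chunk
	}{
		Query: "how do i tune chunk_size?",
		Chunks: []Chunk{
			{ID: 1, Source: "readme.md", Content: "chunk_size controls characters per chunk."},
		},
	}
	if err := t.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
}
```

Anything valid in Go's text/template syntax can be used, so the layout, labels, and ordering of the context block are entirely up to the template.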
Default prompts
System Prompt
User Query Template
Config precedence (highest -> lowest)
- CLI flags
- Environment variables (if supported)
  - OpenAI environment variables are auto-detected: OPENAI_API_BASE, OPENAI_API_KEY
- Config file
- Defaults
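As a rough illustration of this precedence order (not ragx's actual resolution code), the sketch below picks the first non-empty value in flag > environment > config file > default order, using OPENAI_API_BASE as the example setting.

```go
package main

import (
	"fmt"
	"os"
)

// firstNonEmpty returns the first non-empty value, mirroring the
// flag > environment > config file > default order described above.
func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if v != "" {
			return v
		}
	}
	return ""
}

func main() {
	flagAPIBase := ""                          // e.g. a value parsed from a CLI flag
	envAPIBase := os.Getenv("OPENAI_API_BASE") // auto-detected environment variable
	fileAPIBase := "http://localhost:11434"    // value from the config file
	defaultAPIBase := ""                       // built-in default (none)

	fmt.Println(firstNonEmpty(flagAPIBase, envAPIBase, fileAPIBase, defaultAPIBase))
}
```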
Examples
Example: Listing available models
$ ragx list
http://localhost:11434/v1
jina/jina-embeddings-v2-base-en:latest
gpt-oss:20b
qwen3:8b-fast
nomic-embed-text:latest
mxbai-embed-large:latest
llama3.1:8b
qwen2.5-coder:14b
deepseek-r1:8b
qwen3:8b
nomic-embed-text:v1.5
hf.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_XL
Example: TUI session
Example: CLI one-shot query
$ ragx query readme.md \
--model qwen3:8b \
--embedding-model jina/jina-embeddings-v2-base-en:latest \
"how do i tune chunk_size and overlap for large docs?"
- Tune `chunk_size` (chars per chunk) and `overlap` (chars overlapped between chunks) via config or CLI flags. For large documents, increase `chunk_size` (e.g., 2000+ chars) but keep `overlap` < `chunk_size` (e.g., 200). Adjust based on your content type and retrieval needs. [1]
Sources:
[1] (chunk 2) /home/gbi/GitHub/Gabriel-Ladzaretti/ragx-cli/readme.md
These are minimal examples to get you started.
For detailed usage and more examples, run each subcommand with --help.
Common command patterns
[!NOTE]
These examples assume you already have a valid config file with at least one provider, a default chat model, and an embedding model set.
Generate a starter config with: ragx config generate > ~/.ragx.toml.
# embed all .go files in current dir and query via --query/-q
ragx query . -M '\.go$' -q "<query>"
# embed a single file and provide query after flag terminator --
ragx query readme.md -- "<query>"
# embed stdin and provide query as the last positional argument
cat readme.md | ragx query "<query>"
# embed multiple paths with filter
ragx query docs src -M '(?i)\.(md|txt)$' -q "<query>"
# embed all .go files in current dir and start the TUI
ragx chat . -M '\.go$'
# embed multiple paths (markdown and txt) and start the TUI
ragx chat ./docs ./src -M '(?i)\.(md|txt)$'
# embed stdin and start the TUI
cat readme.md | ragx chat
Notes & Limitations
- Chunking is character-based by default; adjust chunk_size/overlap for your content and use case (see the sketch below).
- The vector database is ephemeral: created fresh per session and not saved to disk.
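The following sketch shows what character-based chunking with chunk_size and overlap amounts to. It illustrates the meaning of the two parameters, assuming a simple sliding window; it is not ragx's implementation.

```go
package main

import "fmt"

// chunkByChars splits text into fixed-size character chunks with overlap,
// illustrating what chunk_size and overlap mean. It works on runes so
// multi-byte characters are not split.
func chunkByChars(text string, chunkSize, overlap int) []string {
	if chunkSize <= 0 || overlap >= chunkSize {
		return nil
	}
	runes := []rune(text)
	step := chunkSize - overlap

	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + chunkSize
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}

func main() {
	// Tiny values instead of the documented defaults (2000/200) so the output is readable.
	for i, c := range chunkByChars("abcdefghijklmnopqrstuvwxyz", 10, 3) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

With the documented defaults (chunk_size = 2000, overlap = 200), each chunk shares its last 200 characters with the start of the next one, which helps keep sentences that straddle a chunk boundary retrievable.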