ragx — a terminal-first local RAG assistant

[!NOTE]
Experimental project: an exploration of LLM programming. Expect rough edges and ongoing changes as it evolves.
ragx is a minimal and hackable Retrieval-Augmented Generation (RAG) CLI tool designed for the terminal. It embeds your local files (or stdin), retrieves relevant chunks with KNN search, and queries OpenAI-compatible LLMs (local or remote) via a CLI/TUI workflow.
Installation
Option 1: Install via Go
go install github.com/ladzaretti/ragx-cli/cmd/ragx@latest
Option 2: Install via curl
curl -sSL https://raw.githubusercontent.com/ladzaretti/ragx-cli/main/install.sh | bash
This auto-detects your OS/arch, downloads the latest release, and installs ragx to /usr/local/bin.
Option 3: Download a release
Visit the Releases page for a list of available downloads.
Overview
ragx focuses on the essentials of RAG:
- Embed: split content into chunks and generate embeddings with your chosen embedding model.
- Retrieve: run KNN over embeddings to select the most relevant chunks.
- Generate: send a prompt (system + user template + retrieved context) to an OpenAI-API compatible chat model.
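To make the Retrieve step concrete, here is a minimal sketch of brute-force KNN over chunk embeddings using cosine similarity. The types, function names, and toy vectors below are illustrative assumptions, not ragx's internals.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// chunk pairs a piece of source text with its embedding vector.
// The names here are illustrative; they are not ragx's internal types.
type chunk struct {
	Source  string
	Content string
	Vec     []float64
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns the k chunks most similar to the query embedding.
func topK(query []float64, chunks []chunk, k int) []chunk {
	sort.Slice(chunks, func(i, j int) bool {
		return cosine(query, chunks[i].Vec) > cosine(query, chunks[j].Vec)
	})
	if k > len(chunks) {
		k = len(chunks)
	}
	return chunks[:k]
}

func main() {
	docs := []chunk{
		{Source: "a.md", Content: "install with go install", Vec: []float64{1, 0}},
		{Source: "b.md", Content: "chunking and overlap", Vec: []float64{0, 1}},
	}
	for _, c := range topK([]float64{0.9, 0.1}, docs, 1) {
		fmt.Println(c.Source, c.Content)
	}
}
```

In the pipeline diagram below, this corresponds to the "Vector Index / KNN" node that both the ingest and query paths feed into.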
Key features
- OpenAI API v1 compatible: point ragx at any base URL (local Ollama or remote).
- Per-provider/per-model overrides: control temperature and context length.
- TUI chat: a lightweight Bubble Tea interface for iterative querying.
- Terminal-first: pipe text in, embed directories/files, and print results.
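For context on what "OpenAI API v1 compatible" means here, the sketch below sends a plain chat-completions request to a local Ollama endpoint. The base URL and model id are examples taken from elsewhere in this README, and the snippet is independent of ragx itself.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Any OpenAI v1-compatible endpoint works; a local Ollama server is used here.
	base := "http://localhost:11434/v1"

	body, _ := json.Marshal(map[string]any{
		"model": "qwen3:8b", // example model id from this README
		"messages": []map[string]string{
			{"role": "user", "content": "Say hello."},
		},
	})

	resp, err := http.Post(base+"/chat/completions", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	raw, _ := io.ReadAll(resp.Body)
	fmt.Println(string(raw)) // standard OpenAI-style chat completion JSON
}
```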
Use cases
- Local knowledge bases: notes, READMEs, docs.
- Quick “ask my files” workflows.
Pipeline Overview
flowchart LR
subgraph Ingest
A["Files / stdin"] --> B["Chunker"]
B --> C["Embedder"]
C --> D["Vector Index / KNN"]
end
subgraph Query
Q["User Query"] --> QE["Embed Query"]
QE --> D
D --> K["Top-K Chunks"]
K --> P["Prompt Builder (system + template + context)"]
P --> M["LLM (OpenAI-compatible)"]
M --> R["Answer"]
end
Usage
$ ragx --help
ragx is a terminal-first RAG assistant.
Embed data, run retrieval, and query local or remote OpenAI API-compatible LLMs.
Usage:
ragx [command]
Available Commands:
chat Start the interactive terminal chat UI
config Show and inspect configuration
help Help about any command
list List available models
query Embed data from paths or stdin and query the LLM
version Show version
Flags:
-h, --help help for ragx
Use "ragx [command] --help" for more information about a command.
Configuration file
The optional configuration file can be generated with the ragx config generate command:
[llm]
# Default model to use
default_model = ''
# LLM providers (uncomment and duplicate as needed)
# [[llm.providers]]
# base_url = 'http://localhost:11434'
# api_key = '<KEY>' # optional
# temperature = 0.7 # optional (provider default)
# Optional model definitions for context length control (uncomment and duplicate as needed)
# [[llm.models]]
# id = 'qwen:8b' # Model identifier
# context = 4096 # Maximum context length in tokens
# temperature = 0.7 # optional (model override)
[prompt]
# System prompt to override the default assistant behavior
# system_prompt = ''
# Go text/template for building the USER QUERY + CONTEXT block.
# Supported template vars:
# .Query — the user's raw query string
# .Chunks — slice of retrieved chunks (may be empty). Each chunk has:
# .ID — numeric identifier of the chunk
# .Source — source file/path of the chunk
# .Content — text content of the chunk
# user_prompt_tmpl = ''
[embedding]
# Model used for embeddings
embedding_model = ''
# Number of characters per chunk
# chunk_size = 2000
# Number of characters overlapped between chunks (must be less than chunk_size)
# overlap = 200
# Number of chunks to retrieve during RAG
# top_k = 20
# [logging]
# Directory where log file will be stored (default: XDG_STATE_HOME or ~/.local/state/ragx)
# log_dir = '/home/gbi/.local/state/ragx'
# Filename for the log file
# log_filename = '.log'
# log_level = 'info'
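To illustrate how the user_prompt_tmpl variables above are rendered, here is a standalone Go sketch using standard text/template semantics with .Query and .Chunks. The template string and struct are examples for illustration, not ragx's built-in default prompt.

```go
package main

import (
	"os"
	"text/template"
)

// Chunk mirrors the fields documented for .Chunks above
// (.ID, .Source, .Content); the struct name itself is illustrative.
type Chunk struct {
	ID      int
	Source  string
	Content string
}

// An example user_prompt_tmpl value, not the built-in default.
const userTmpl = `Question: {{ .Query }}

Context:
{{- range .Chunks }}
[{{ .ID }}] ({{ .Source }})
{{ .Content }}
{{- end }}`

func main() {
	t := template.Must(template.New("user").Parse(userTmpl))
	data := struct {
		Query  string
		Chunks []Chunk
	}{
		Query: "how do i tune chunk_size?",
		Chunks: []Chunk{
			{ID: 1, Source: "readme.md", Content: "chunk_size controls characters per chunk."},
		},
	}
	if err := t.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
}
```

Anything valid in Go's text/template syntax can be used, so the layout, labels, and ordering of the context block are entirely up to the template.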
Default prompts
System Prompt
User Query Template
Config precedence (highest -> lowest)
- CLI flags
- Environment variables (if supported)
  - OpenAI environment variables are auto-detected: OPENAI_API_BASE, OPENAI_API_KEY
- Config file
- Defaults
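As a rough illustration of this precedence order (not ragx's actual resolution code), the sketch below picks the first non-empty value in flag > environment > config file > default order, using OPENAI_API_BASE as the example setting.

```go
package main

import (
	"fmt"
	"os"
)

// firstNonEmpty returns the first non-empty value, mirroring the
// flag > environment > config file > default order described above.
func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if v != "" {
			return v
		}
	}
	return ""
}

func main() {
	flagAPIBase := ""                          // e.g. a value parsed from a CLI flag
	envAPIBase := os.Getenv("OPENAI_API_BASE") // auto-detected environment variable
	fileAPIBase := "http://localhost:11434"    // value from the config file
	defaultAPIBase := ""                       // built-in default (none)

	fmt.Println(firstNonEmpty(flagAPIBase, envAPIBase, fileAPIBase, defaultAPIBase))
}
```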
Examples
Example: Listing available models
$ ragx list
http://localhost:11434/v1
jina/jina-embeddings-v2-base-en:latest
gpt-oss:20b
qwen3:8b-fast
nomic-embed-text:latest
mxbai-embed-large:latest
llama3.1:8b
qwen2.5-coder:14b
deepseek-r1:8b
qwen3:8b
nomic-embed-text:v1.5
hf.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_XL
Example: TUI session
Example: CLI one-shot query
$ ragx query readme.md \
--model qwen3:8b \
--embedding-model jina/jina-embeddings-v2-base-en:latest \
"how do i tune chunk_size and overlap for large docs?"
- Tune `chunk_size` (chars per chunk) and `overlap` (chars overlapped between chunks) via config or CLI flags. For large documents, increase `chunk_size` (e.g., 2000+ chars) but keep `overlap` < `chunk_size` (e.g., 200). Adjust based on your content type and retrieval needs. [1]
Sources:
[1] (chunk 2) /home/gbi/GitHub/Gabriel-Ladzaretti/ragx-cli/readme.md
These are minimal examples to get you started.
For detailed usage and more examples, run each subcommand with --help.
Common command patterns
[!NOTE]
These examples assume you already have a valid config file with at least one provider, a default chat model, and an embedding model set.
Generate a starter config with: ragx config generate > ~/.ragx.toml.
# embed all .go files in current dir and query via --query/-q
ragx query . -M '\.go$' -q "<query>"
# embed a single file and provide query after flag terminator --
ragx query readme.md -- "<query>"
# embed stdin and provide query as the last positional argument
cat readme.md | ragx query "<query>"
# embed multiple paths with filter
ragx query docs src -M '(?i)\.(md|txt)$' -q "<query>"
# embed all .go files in current dir and start the TUI
ragx chat . -M '\.go$'
# embed multiple paths (markdown and txt) and start the TUI
ragx chat ./docs ./src -M '(?i)\.(md|txt)$'
# embed stdin and start the TUI
cat readme.md | ragx chat
Notes & Limitations
- Chunking is character-based by default; adjust chunk_size/overlap for your content and use case (see the sketch below).
- The vector database is ephemeral: created fresh per session and not saved to disk.
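The following sketch shows what character-based chunking with chunk_size and overlap amounts to. It illustrates the meaning of the two parameters, assuming a simple sliding window; it is not ragx's implementation.

```go
package main

import "fmt"

// chunkByChars splits text into fixed-size character chunks with overlap,
// illustrating what chunk_size and overlap mean. It works on runes so
// multi-byte characters are not split.
func chunkByChars(text string, chunkSize, overlap int) []string {
	if chunkSize <= 0 || overlap >= chunkSize {
		return nil
	}
	runes := []rune(text)
	step := chunkSize - overlap

	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + chunkSize
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}

func main() {
	// Tiny values instead of the documented defaults (2000/200) so the output is readable.
	for i, c := range chunkByChars("abcdefghijklmnopqrstuvwxyz", 10, 3) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

With the documented defaults (chunk_size = 2000, overlap = 200), each chunk shares its last 200 characters with the start of the next one, which helps keep sentences that straddle a chunk boundary retrievable.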