Version: v1.0.4 (latest) | Published: Mar 17, 2026 | License: MIT | Imports: 0 | Imported by: 0

Repository

github.com/DotNetAge/gorag

README

🦖 GoRAG

A Production-Ready, High-Performance Modular RAG Framework for Go

GoRAG is an enterprise-grade Retrieval-Augmented Generation (RAG) framework written entirely in Go. Designed for developers who are tired of Python dependency hell and slow async loops, GoRAG brings high concurrency, memory efficiency, and static type safety to the AI engineering world.

Whether you are building a simple document Q&A bot or a complex Agentic RAG system with multi-hop reasoning, GoRAG provides the foundational building blocks you need with zero bloat.

✨ Why GoRAG?

🚀 Blazing Fast: Built-in concurrent workers (10+ goroutines by default) and streaming parsers that keep memory use constant regardless of input size (O(1)). Built to index 100M+ scale document repositories.
🧩 Lego-like Modularity: Strictly follows Clean Architecture. Swap out LLMs, Vector Stores, or Document Parsers with a single line of code.
🧠 Advanced RAG Patterns Built-in: Out-of-the-box support for HyDE, RAG-Fusion, Semantic Chunking, Cross-Encoder Reranking, and Context Pruning.
☁️ Cloud-Native & Production-Ready: Compiles to a single binary. Features built-in circuit breakers, rate limiters, graceful degradation, and observability metrics.
📦 Zero-Dependency Quickstart: Deeply integrated with govector (a pure-Go embedded vector database) and gochat (a unified LLM SDK). Run a 100% local, privacy-first RAG pipeline without deploying external databases like Milvus or Qdrant.
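
The "swap with a single line" claim above usually rests on coding against small interfaces. The following is a minimal sketch of that idea, not GoRAG's actual API: `VectorStore`, `memoryStore`, `Engine`, and `NewEngine` are hypothetical names invented for illustration, and the toy store matches by substring rather than by vector similarity.

```go
package main

import (
	"fmt"
	"strings"
)

// VectorStore is a hypothetical interface for illustration only;
// it is not GoRAG's actual API.
type VectorStore interface {
	Add(id, text string)
	Search(query string, topK int) []string
}

// memoryStore is a toy in-memory implementation that matches by
// case-insensitive substring instead of by vector similarity.
type memoryStore struct {
	ids   []string
	texts map[string]string
}

func newMemoryStore() *memoryStore {
	return &memoryStore{texts: make(map[string]string)}
}

func (m *memoryStore) Add(id, text string) {
	if _, exists := m.texts[id]; !exists {
		m.ids = append(m.ids, id)
	}
	m.texts[id] = text
}

func (m *memoryStore) Search(query string, topK int) []string {
	var hits []string
	q := strings.ToLower(query)
	for _, id := range m.ids {
		if len(hits) == topK {
			break
		}
		if strings.Contains(strings.ToLower(m.texts[id]), q) {
			hits = append(hits, id)
		}
	}
	return hits
}

// Engine depends only on the interface, so changing the backend means
// changing the single constructor argument passed to NewEngine.
type Engine struct{ store VectorStore }

func NewEngine(store VectorStore) *Engine { return &Engine{store: store} }

func (e *Engine) Retrieve(query string, topK int) []string {
	return e.store.Search(query, topK)
}

func main() {
	store := newMemoryStore() // swap this line to change the backend
	store.Add("doc1", "GoRAG is a RAG framework written in Go.")
	store.Add("doc2", "Qdrant is a vector database.")

	engine := NewEngine(store)
	fmt.Println(engine.Retrieve("vector", 3))
}
```

Because `Engine` never names a concrete store type, any other type that satisfies the interface could be passed to `NewEngine` without touching the rest of the program.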

🧰 Ecosystem & Integrations

🤖 LLM Providers (Powered by `gochat`)

Global: OpenAI, Anthropic (Claude 3), Azure OpenAI.
Local/Open-Source: Ollama (Llama 3, Qwen, Mistral, etc.).
Chinese AI: Kimi, DeepSeek, GLM-4, Minimax, Baichuan, etc.

🗄️ Vector Databases

govector 🌟 (Pure Go embedded vector store - Zero external dependencies!)
Milvus / Zilliz (Enterprise standard)
Qdrant (High-performance Rust engine)
Weaviate (Leading semantic search)
Pinecone (Fully managed cloud DB)

📄 Universal Parsers

Native streaming support for 16+ formats including: Text, PDF, DOCX, Markdown, HTML, CSV, JSON, and source code (Go, Python, Java, TS/JS).

🚀 Quick Start

Installation

go get github.com/DotNetAge/gorag

10 Lines to Your Private Knowledge Base

Using Ollama and our built-in govector engine, you can build a 100% local, privacy-first RAG system without any API keys or external database deployments:

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/DotNetAge/gochat/pkg/client/base"
    "github.com/DotNetAge/gochat/pkg/client/ollama"
    "github.com/DotNetAge/gorag/rag"
    "github.com/DotNetAge/gorag/vectorstore/govector"
)

func main() {
    ctx := context.Background()

    // 1. Init LLM client (via gochat)
    llmClient, err := ollama.New(ollama.Config{
        Config: base.Config{Model: "qwen:0.5b"},
    })
    if err != nil {
        log.Fatalf("init LLM client: %v", err)
    }

    // 2. Init pure-Go vector store (zero dependencies)
    vectorStore, err := govector.NewStore(govector.Config{
        Dimension:  1536,
        Collection: "my_knowledge",
    })
    if err != nil {
        log.Fatalf("init vector store: %v", err)
    }

    // 3. Build RAG engine
    engine, err := rag.New(
        rag.WithLLM(llmClient),
        rag.WithVectorStore(vectorStore),
    )
    if err != nil {
        log.Fatalf("build engine: %v", err)
    }

    // 4. Index your private data (auto-chunking & vectorization).
    // Assumes Index returns a single error, like IndexDirectory.
    if err := engine.Index(ctx, rag.Source{
        Type:    "text",
        Content: "GoRAG is a high-performance RAG framework written in pure Go.",
    }); err != nil {
        log.Fatalf("index: %v", err)
    }

    // 5. Query
    resp, err := engine.Query(ctx, "What is GoRAG?", rag.QueryOptions{TopK: 3})
    if err != nil {
        log.Fatalf("query: %v", err)
    }
    fmt.Println("Answer:", resp.Answer)
}

High-Concurrency Directory Indexing

Need to process a massive codebase or 50GB of company documents? GoRAG handles it concurrently:

// 🚀 Index an entire directory in one call.
// Auto-detects .pdf, .go, .md, .docx, etc., and routes each file to the matching parser.
if err := engine.IndexDirectory(ctx, "./my-company-docs"); err != nil {
    log.Fatalf("index directory: %v", err)
}

// Stream the response back (typewriter effect for frontend UX)
ch, err := engine.QueryStream(ctx, "Summarize the Q3 financial report from the docs", rag.QueryOptions{
    Stream: true,
})
if err != nil {
    log.Fatalf("query stream: %v", err)
}

for resp := range ch {
    fmt.Print(resp.Chunk)
}
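
To make the idea of concurrent, extension-routed indexing concrete, here is a minimal worker-pool sketch. It is an illustration under stated assumptions, not GoRAG's internal implementation: `parserFor` and `indexConcurrently` are hypothetical helpers, the parser names merely echo the `parser/...` package names, and the workers only count files where a real pipeline would parse, chunk, and embed them.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"sync"
)

// parserFor maps a file extension to a parser name. The mapping is a
// hypothetical stand-in for a real parser registry.
func parserFor(path string) (string, bool) {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".md":
		return "markdown", true
	case ".pdf":
		return "pdf", true
	case ".go":
		return "gocode", true
	case ".docx":
		return "docx", true
	}
	return "", false
}

// indexConcurrently fans paths out to a fixed number of workers and returns
// how many files each parser handled. Unsupported files are skipped.
func indexConcurrently(paths []string, workers int) map[string]int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	counts := make(map[string]int)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				name, ok := parserFor(p)
				if !ok {
					continue
				}
				// A real worker would parse, chunk, and embed the file here.
				mu.Lock()
				counts[name]++
				mu.Unlock()
			}
		}()
	}

	for _, p := range paths {
		jobs <- p
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return counts
}

func main() {
	paths := []string{"README.md", "report.pdf", "main.go", "notes.txt", "spec.MD"}
	counts := indexConcurrently(paths, 10)
	fmt.Println(counts["markdown"], counts["pdf"], counts["gocode"])
}
```

The unbuffered channel gives natural backpressure: the producer only advances as fast as the workers consume, so memory use does not grow with the number of files.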

⚡ Advanced Capabilities

GoRAG is not just a glue framework; it implements cutting-edge retrieval paradigms natively:

Agentic RAG / CRAG: Intelligent routing, self-reflection, and fallback mechanisms for complex queries.
RAG-Fusion & Multi-Query: Rewrites the user query from multiple perspectives, retrieves results for each variant, and merges them with Reciprocal Rank Fusion (RRF) for higher accuracy.
Context Pruning & Cross-Encoder: Extracts only the most relevant sentences from chunks and reranks them, saving LLM tokens and reducing hallucinations.
Graph RAG: Native support for Neo4j and ArangoDB for cross-node multi-hop reasoning.
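
For readers unfamiliar with Reciprocal Rank Fusion, the following is a minimal, self-contained sketch of the technique, not GoRAG's own code. The function name `rrf` and the document IDs are hypothetical. Each document scores the sum of `1 / (k + rank)` over every ranked list it appears in, with ranks starting at 1; `k = 60` is the value commonly used in practice.

```go
package main

import (
	"fmt"
	"sort"
)

// rrf merges several ranked lists of document IDs with Reciprocal Rank Fusion.
// Each document scores sum(1 / (k + rank)) over the lists it appears in,
// where rank starts at 1.
func rrf(rankings [][]string, k float64) []string {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}

	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break
	})
	return ids
}

func main() {
	// One ranked list per rewritten query.
	fused := rrf([][]string{
		{"doc-a", "doc-b", "doc-c"},
		{"doc-b", "doc-a", "doc-d"},
		{"doc-b", "doc-c", "doc-a"},
	}, 60)
	fmt.Println(fused)
}
```

RRF uses only rank positions, so lists produced by different retrievers can be merged without normalizing their raw similarity scores.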

🛠️ CLI Tool

GoRAG comes with a powerful built-in CLI for rapid testing and administration:

# Install the CLI
go install github.com/DotNetAge/gorag/cmd/gorag@latest

# Index a file directly from the terminal
gorag index --api-key "$OPENAI_API_KEY" --file ./docs/architecture.md

# Query your knowledge base
gorag query --api-key "$OPENAI_API_KEY" "How does the circuit breaker work?"

📈 Roadmap

Core architecture and pluggable interfaces
Advanced Enhancers (HyDE, Context Pruning, Reranking)
Native Graph RAG integration (Neo4j, ArangoDB)
Multi-modal RAG (Image & Video indexing)
Enterprise Dashboard and API server

🤝 Contributing

We welcome contributions! Whether it's adding a new vector store driver, improving the documentation, or fixing a bug, please check out our Contributing Guidelines.

Give us a ⭐️ if this project helped you build faster and safer AI applications!

📄 License

GoRAG is licensed under the MIT License.

Documentation

Overview

Package gorag provides a Retrieval-Augmented Generation (RAG) framework for Go.

GoRAG is a comprehensive framework for building RAG applications that combine large language models (LLMs) with vector databases for efficient information retrieval.

Key features include:
- Circuit breaker pattern for service resilience
- Graceful degradation for unreliable services
- Lazy loading for efficient memory usage
- Observability with metrics, logging, and tracing
- Plugin system for extensibility
- Connection pooling for efficient resource management
- Support for multiple vector stores (Memory, Milvus, Pinecone, Qdrant, Weaviate)
- Support for multiple embedding providers (Cohere, Ollama, OpenAI, Voyage)
- Support for multiple LLM clients (Anthropic, Azure OpenAI, Ollama, OpenAI)
- Support for multiple document parsers (CSV, JSON, Markdown, PDF, etc.)

To get started, see the examples in the cmd/gorag directory or refer to the documentation in the docs directory.

Source Files

gorag.go

Directories

Path	Synopsis
examples
01_native_rag	Package nativerag demonstrates a basic Native RAG pipeline using GoRAG Steps.
02_hybrid_rag (command)	Package hybridrag demonstrates how to compose a Hybrid RAG pipeline with multiple retrieval strategies.
infra
cache
chunker/semantic	Package semantic provides semantic chunking utilities for RAG systems.
enhancer	Package enhancer provides query and document enhancement utilities for RAG systems.
evaluation
fusion
generation
graph	Package graph provides graph-related utilities for RAG systems.
graphstore
indexer
indexing
parser/base
parser/config
parser/config/types
parser/csv
parser/dbschema
parser/docx
parser/email
parser/excel
parser/gocode
parser/html
parser/image
parser/javacode
parser/jscode
parser/json
parser/log
parser/markdown
parser/pdf
parser/ppt
parser/pycode
parser/text
parser/tscode
parser/xml
parser/yaml
resilience
searcher/agentic	Package agentic provides the Agentic RAG searcher:
searcher/core	Package core provides shared default factory functions used by all searcher sub-packages (native, rerank, hybrid, multimodal, agentic, graphlocal, graphglobal, multiagent).
searcher/crag	Package crag provides CRAG (Corrective RAG) implementation.
searcher/graphglobal	Package graphglobal provides the Graph-Global RAG searcher (community-summary-based macro pipeline):
searcher/graphlocal	Package graphlocal provides the Graph-Local RAG searcher (entity-centric N-Hop pipeline):
searcher/hybrid	Package hybrid provides the Hybrid RAG searcher:
searcher/hyde	Package hyde provides HyDE (Hypothetical Document Embeddings) RAG implementation.
searcher/multiagent	Package multiagent provides the Multi-Agent RAG searcher:
searcher/native	Package native provides the NativeRAG searcher:
searcher/selfquery	Package selfquery provides Self-Query RAG implementation.
searcher/stepback	Package stepback provides StepBack RAG implementation.
service
steps
steps/agentic	Package agentic provides agentic orchestration steps for autonomous RAG workflows.
steps/post_retrieval	Package post_retrieval provides steps that process and optimize retrieval results.
steps/pre_retrieval	Package pre_retrieval provides query enhancement steps that occur before retrieval.
steps/retrieval	Package retrieval provides retrieval strategy steps that execute different search algorithms.
vectorstore
vectorstore/govector
vectorstore/memory
vectorstore/milvus
vectorstore/pinecone
vectorstore/qdrant
vectorstore/weaviate
pkg
di
domain/abstraction	Package abstraction defines the storage abstraction interfaces for the goRAG framework.
domain/entity	Package entity defines the core entities for the goRAG framework.
logging
observability
testkit
usecase/dataprep	Package dataprep provides data preparation utilities for RAG systems.
usecase/evaluation
usecase/retrieval
usecase/service