Published: Nov 28, 2025 · License: Apache-2.0

DeepSeek RAG Example

This comprehensive example demonstrates how to build a production-ready Retrieval-Augmented Generation (RAG) system using DeepSeek LLM and the GoAgent framework.

Overview

This example showcases:

  1. Basic RAG Setup: Initialize DeepSeek LLM and Qdrant vector store
  2. Document Management: Add and manage knowledge base documents
  3. Semantic Search: Retrieve relevant documents using vector similarity
  4. RAG Chain: Combine retrieval with generation for contextual answers
  5. Advanced Features:
    • TopK configuration for controlling result count
    • Score threshold filtering for quality control
    • Multi-query retrieval for improved recall
    • Document reranking strategies (MMR, Cross-Encoder, Rank Fusion)
    • Custom prompt templates

Prerequisites

1. DeepSeek API Key

Get your API key from DeepSeek:

export DEEPSEEK_API_KEY="your-api-key-here"
2. Qdrant Vector Database

You have several options for running Qdrant:

Option A: Docker
docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant
Option B: Qdrant Cloud
  1. Sign up at Qdrant Cloud
  2. Create a cluster
  3. Get your connection URL and API key
  4. Set environment variables:
export QDRANT_URL="https://your-cluster.qdrant.io:6334"
export QDRANT_API_KEY="your-api-key"
Option C: Local Installation
# macOS
brew install qdrant

# Linux
wget https://github.com/qdrant/qdrant/releases/download/v1.7.0/qdrant-x86_64-unknown-linux-gnu.tar.gz
tar -xvf qdrant-x86_64-unknown-linux-gnu.tar.gz
./qdrant
3. Go Dependencies

Ensure you have Go 1.25.0 or later installed, then download the module dependencies:

go mod download

Running the Example

Basic Execution
cd examples/rag
go run deepseek_rag_example.go
With Custom Qdrant URL
QDRANT_URL="localhost:6334" go run deepseek_rag_example.go
With Qdrant Cloud
QDRANT_URL="https://your-cluster.qdrant.io:6334" \
QDRANT_API_KEY="your-api-key" \
go run deepseek_rag_example.go

Example Output

=== Step 1: Setting up DeepSeek LLM Client ===
✓ DeepSeek client initialized successfully

=== Step 2: Setting up Qdrant Vector Store ===
✓ Qdrant vector store initialized successfully

=== Step 3: Adding Sample Documents ===
✓ Sample documents added successfully

=== Step 4: Creating RAG Retriever ===
✓ RAG retriever created successfully

=== Step 5: Creating RAG Chain ===
✓ RAG chain created successfully

=== Step 6: Basic RAG Query ===
Query: What is machine learning and how does it work?
Retrieving relevant documents and generating answer...

Answer:
Machine Learning is a subset of artificial intelligence that enables systems
to learn and improve from experience without being explicitly programmed...
[Detailed answer generated by DeepSeek based on retrieved documents]

Query completed in: 2.3s

=== Step 7: TopK Configuration ===
--- TopK = 2 ---
Retrieved 2 documents:
1. [Score: 0.8542] Machine Learning is a subset of artificial intelligence...
2. [Score: 0.7834] Deep Learning is a specialized subset of machine learning...

=== Step 8: Score Threshold Filtering ===
--- Score Threshold = 0.50 ---
Retrieved 3 documents (filtered by threshold):
1. [Score: 0.8542] Topic: machine_learning
2. [Score: 0.7834] Topic: deep_learning
3. [Score: 0.6123] Topic: neural_networks

=== Step 9: Multi-Query Retrieval ===
Original Query: How do neural networks learn?
Generating query variations and retrieving documents...
Retrieved 5 unique documents (merged from multiple queries)...

=== Step 10: Document Reranking ===
Query: What are the applications of artificial intelligence?
--- Original Ranking (by similarity score) ---
--- MMR Reranking (lambda=0.7) ---
--- Cross-Encoder Reranking ---
--- Rank Fusion (RRF) ---

=== Step 11: Advanced RAG with Custom Prompts ===
Query: Explain transformers in simple terms
Using custom educational prompt template...
Custom Prompt Response: [Educational explanation tailored for beginners]

=== RAG Demo Completed Successfully ===

Configuration Options

RAG Retriever Configuration
config := retrieval.RAGRetrieverConfig{
    VectorStore:      store,        // Vector store instance
    TopK:             4,             // Number of documents to retrieve
    ScoreThreshold:   0.3,           // Minimum similarity score (0-1)
    IncludeMetadata:  true,          // Include document metadata
    MaxContentLength: 500,           // Max characters per document
}
DeepSeek LLM Configuration
config := &llm.Config{
    Provider:    llm.ProviderDeepSeek,
    APIKey:      apiKey,
    Model:       "deepseek-chat",    // or "deepseek-coder"
    Temperature: 0.7,                 // 0.0 = deterministic, 1.0 = creative
    MaxTokens:   2000,                // Max response length
    Timeout:     60,                  // Request timeout in seconds
}
Qdrant Configuration
config := retrieval.QdrantConfig{
    URL:            "localhost:6334",      // Qdrant server URL
    APIKey:         "",                    // Optional API key
    CollectionName: "my_knowledge_base",   // Collection name
    VectorSize:     384,                   // Embedding dimension
    Distance:       "cosine",              // cosine, euclidean, or dot
    Embedder:       embedder,              // Embedding model
}

Features Demonstrated

1. Basic RAG Query

Retrieves relevant documents and generates contextual answers:

ragChain := retrieval.NewRAGChain(ragRetriever, llmClient)
answer, err := ragChain.Run(ctx, "What is machine learning?")
2. TopK Configuration

Control the number of retrieved documents:

retriever.SetTopK(4)  // Retrieve top 4 documents
docs, err := retriever.Retrieve(ctx, query)
3. Score Threshold Filtering

Filter low-quality results:

retriever.SetScoreThreshold(0.5)  // Only keep scores >= 0.5
docs, err := retriever.Retrieve(ctx, query)
4. Multi-Query Retrieval

Generate query variations for improved recall:

multiQueryRetriever := retrieval.NewRAGMultiQueryRetriever(
    baseRetriever,
    3,          // Generate 3 variations
    llmClient,  // Use LLM to generate variations
)
docs, err := multiQueryRetriever.Retrieve(ctx, query)
5. Document Reranking
MMR Reranking (Maximal Marginal Relevance)

Balances relevance and diversity:

mmrReranker := retrieval.NewMMRReranker(
    0.7,  // Lambda: 0.0 = diversity, 1.0 = relevance
    4,    // TopN results
)
reranked, err := mmrReranker.Rerank(ctx, query, docs)
Cross-Encoder Reranking

Uses a cross-encoder model for precise relevance scoring:

ceReranker := retrieval.NewCrossEncoderReranker(
    "cross-encoder/ms-marco-MiniLM-L-6-v2",
    4,  // TopN results
)
reranked, err := ceReranker.Rerank(ctx, query, docs)
Rank Fusion

Combines multiple ranking strategies:

rankFusion := retrieval.NewRankFusion("rrf")  // Reciprocal Rank Fusion
fusedDocs := rankFusion.Fuse([][]*retrieval.Document{
    ranking1,
    ranking2,
    ranking3,
})
6. Custom Prompt Templates

Create domain-specific prompt templates:

customTemplate := `You are an AI tutor helping students.

Reference Materials:
{documents}

Question: {query}

Provide a beginner-friendly explanation.`

formattedPrompt, err := retriever.RetrieveAndFormat(ctx, query, customTemplate)

Knowledge Base Documents

The example includes sample documents about:

  • Machine Learning fundamentals
  • Deep Learning concepts
  • Natural Language Processing
  • Computer Vision
  • Reinforcement Learning
  • Neural Networks architecture
  • Transformers architecture
  • RAG techniques

Production Considerations

1. Embedding Models

For production use, replace the SimpleEmbedder with a real embedding model:

// Using OpenAI embeddings
embedder := openai.NewEmbedder(openaiClient, "text-embedding-3-small")

// Or using other models via LangChain
embedder := langchain.NewEmbedder("sentence-transformers/all-MiniLM-L6-v2")
2. Scalability
  • Use Qdrant Cloud for production workloads
  • Enable horizontal scaling with multiple Qdrant nodes
  • Implement caching for frequently accessed documents
  • Use batch operations for bulk document uploads
3. Error Handling
// Implement retry logic with simple linear backoff; declare docs and err
// outside the loop so the successful result survives it
var docs []*retrieval.Document
var err error
maxRetries := 3
for i := 0; i < maxRetries; i++ {
    docs, err = retriever.Retrieve(ctx, query)
    if err == nil {
        break
    }
    time.Sleep(time.Second * time.Duration(i+1))
}

// Handle context cancellation
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
4. Monitoring
// Add observability
import "github.com/kart-io/goagent/observability"

// Enable tracing
tracer := observability.NewTracer("rag-service")
ctx = tracer.StartSpan(ctx, "rag_query")
defer tracer.EndSpan(ctx)

// Track metrics
metrics.RecordRetrievalLatency(elapsed)
metrics.RecordDocumentCount(len(docs))
5. Document Chunking

For large documents, implement chunking:

// Split documents into chunks
chunker := document.NewRecursiveTextSplitter(
    1000,  // Chunk size
    200,   // Overlap
)
chunks := chunker.Split(largeDocument)

// Add chunks to vector store
for _, chunk := range chunks {
    store.AddDocuments(ctx, []*retrieval.Document{chunk})
}

Troubleshooting

"DEEPSEEK_API_KEY not set"

Make sure you've exported your API key:

export DEEPSEEK_API_KEY="your-key"
"Failed to connect to Qdrant"
  1. Check if Qdrant is running: curl http://localhost:6333
  2. Verify the URL in environment variable: echo $QDRANT_URL
  3. Check Docker logs: docker logs <container-id>
"Collection already exists"

The example creates a collection named goagent_rag_demo. If it exists from a previous run:

# Delete the collection via Qdrant API
curl -X DELETE http://localhost:6333/collections/goagent_rag_demo

# Or use a different collection name
export QDRANT_COLLECTION="my_new_collection"
Low Retrieval Quality
  1. Increase TopK: Retrieve more documents
  2. Adjust threshold: Lower the score threshold
  3. Use multi-query: Enable query variations
  4. Apply reranking: Use MMR or cross-encoder
  5. Improve embeddings: Use better embedding models

Performance Tuning

Query Optimization
// Use smaller TopK for faster retrieval
retriever.SetTopK(3)

// Set timeout for LLM calls
ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()

// Use streaming for large responses
stream, err := llmClient.Stream(ctx, prompt)
Batch Operations
// Add documents in batches
batchSize := 100
for i := 0; i < len(allDocs); i += batchSize {
    end := min(i+batchSize, len(allDocs))
    batch := allDocs[i:end]
    store.AddDocuments(ctx, batch)
}

Advanced Use Cases

1. Conversational RAG

Maintain conversation history:

conversationHistory := []llm.Message{}
for {
    // Add user message
    conversationHistory = append(conversationHistory,
        llm.UserMessage(userQuery))

    // Retrieve context for this turn (named ragContext so it does not
    // shadow the standard context package)
    ragContext := retriever.RetrieveWithContext(ctx, userQuery)

    // Prepend the retrieved context as a system message for this call
    // only, so system messages do not accumulate turn after turn
    messages := append([]llm.Message{
        llm.SystemMessage(ragContext),
    }, conversationHistory...)

    // Generate response
    response, _ := llmClient.Chat(ctx, messages)

    // Add assistant response
    conversationHistory = append(conversationHistory,
        llm.AssistantMessage(response.Content))
}
2. Multi-Document RAG

Retrieve from multiple collections:

// Create multiple stores for different domains
techStore := setupQdrantStore(ctx, "tech_docs")
legalStore := setupQdrantStore(ctx, "legal_docs")

// Retrieve from both
techDocs, _ := techStore.Search(ctx, query, 3)
legalDocs, _ := legalStore.Search(ctx, query, 3)

// Merge and rerank
allDocs := append(techDocs, legalDocs...)
reranked, _ := reranker.Rerank(ctx, query, allDocs)

3. Hybrid Search

Combine vector search with keyword search:

// Vector search
vectorDocs, _ := vectorStore.Search(ctx, query, 5)

// Keyword search (BM25)
keywordDocs, _ := keywordIndex.Search(query, 5)

// Fuse results
fusion := retrieval.NewRankFusion("rrf")
finalDocs := fusion.Fuse([][]*retrieval.Document{
    vectorDocs,
    keywordDocs,
})

License

This example is part of the GoAgent project and is licensed under the same terms.
