DeepSeek RAG Example
This example demonstrates how to build a Retrieval-Augmented Generation (RAG) system using the DeepSeek LLM and the GoAgent framework, from basic setup through production considerations.
Overview
This example showcases:
- Basic RAG Setup: Initialize DeepSeek LLM and Qdrant vector store
- Document Management: Add and manage knowledge base documents
- Semantic Search: Retrieve relevant documents using vector similarity
- RAG Chain: Combine retrieval with generation for contextual answers
- Advanced Features:
- TopK configuration for controlling result count
- Score threshold filtering for quality control
- Multi-query retrieval for improved recall
- Document reranking strategies (MMR, Cross-Encoder, Rank Fusion)
- Custom prompt templates
Prerequisites
1. DeepSeek API Key
Get your API key from DeepSeek:
export DEEPSEEK_API_KEY="your-api-key-here"
2. Qdrant Vector Database
You have several options to run Qdrant:
Option A: Docker (Recommended)
docker run -p 6333:6333 -p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage:z \
qdrant/qdrant
Option B: Qdrant Cloud
- Sign up at Qdrant Cloud
- Create a cluster
- Get your connection URL and API key
- Set environment variables:
export QDRANT_URL="https://your-cluster.qdrant.io:6334"
export QDRANT_API_KEY="your-api-key"
Option C: Local Installation
# macOS
brew install qdrant
# Linux
wget https://github.com/qdrant/qdrant/releases/download/v1.7.0/qdrant-x86_64-unknown-linux-gnu.tar.gz
tar -xvf qdrant-x86_64-unknown-linux-gnu.tar.gz
./qdrant
3. Go Dependencies
Ensure you have Go 1.25.0 or later, then fetch the module dependencies:
go mod download
Running the Example
Basic Execution
cd examples/rag
go run deepseek_rag_example.go
With Custom Qdrant URL
QDRANT_URL="localhost:6334" go run deepseek_rag_example.go
With Qdrant Cloud
QDRANT_URL="https://your-cluster.qdrant.io:6334" \
QDRANT_API_KEY="your-api-key" \
go run deepseek_rag_example.go
Example Output
=== Step 1: Setting up DeepSeek LLM Client ===
✓ DeepSeek client initialized successfully
=== Step 2: Setting up Qdrant Vector Store ===
✓ Qdrant vector store initialized successfully
=== Step 3: Adding Sample Documents ===
✓ Sample documents added successfully
=== Step 4: Creating RAG Retriever ===
✓ RAG retriever created successfully
=== Step 5: Creating RAG Chain ===
✓ RAG chain created successfully
=== Step 6: Basic RAG Query ===
Query: What is machine learning and how does it work?
Retrieving relevant documents and generating answer...
Answer:
Machine Learning is a subset of artificial intelligence that enables systems
to learn and improve from experience without being explicitly programmed...
[Detailed answer generated by DeepSeek based on retrieved documents]
Query completed in: 2.3s
=== Step 7: TopK Configuration ===
--- TopK = 2 ---
Retrieved 2 documents:
1. [Score: 0.8542] Machine Learning is a subset of artificial intelligence...
2. [Score: 0.7834] Deep Learning is a specialized subset of machine learning...
=== Step 8: Score Threshold Filtering ===
--- Score Threshold = 0.50 ---
Retrieved 3 documents (filtered by threshold):
1. [Score: 0.8542] Topic: machine_learning
2. [Score: 0.7834] Topic: deep_learning
3. [Score: 0.6123] Topic: neural_networks
=== Step 9: Multi-Query Retrieval ===
Original Query: How do neural networks learn?
Generating query variations and retrieving documents...
Retrieved 5 unique documents (merged from multiple queries)...
=== Step 10: Document Reranking ===
Query: What are the applications of artificial intelligence?
--- Original Ranking (by similarity score) ---
--- MMR Reranking (lambda=0.7) ---
--- Cross-Encoder Reranking ---
--- Rank Fusion (RRF) ---
=== Step 11: Advanced RAG with Custom Prompts ===
Query: Explain transformers in simple terms
Using custom educational prompt template...
Custom Prompt Response: [Educational explanation tailored for beginners]
=== RAG Demo Completed Successfully ===
Configuration Options
RAG Retriever Configuration
config := retrieval.RAGRetrieverConfig{
VectorStore: store, // Vector store instance
TopK: 4, // Number of documents to retrieve
ScoreThreshold: 0.3, // Minimum similarity score (0-1)
IncludeMetadata: true, // Include document metadata
MaxContentLength: 500, // Max characters per document
}
DeepSeek LLM Configuration
config := &llm.Config{
Provider: llm.ProviderDeepSeek,
APIKey: apiKey,
Model: "deepseek-chat", // or "deepseek-coder"
Temperature: 0.7, // 0.0 = deterministic, 1.0 = creative
MaxTokens: 2000, // Max response length
Timeout: 60, // Request timeout in seconds
}
Qdrant Configuration
config := retrieval.QdrantConfig{
URL: "localhost:6334", // Qdrant server URL
APIKey: "", // Optional API key
CollectionName: "my_knowledge_base", // Collection name
VectorSize: 384, // Embedding dimension
Distance: "cosine", // cosine, euclidean, or dot
Embedder: embedder, // Embedding model
}
Features Demonstrated
1. Basic RAG Query
Retrieves relevant documents and generates contextual answers:
ragChain := retrieval.NewRAGChain(ragRetriever, llmClient)
answer, err := ragChain.Run(ctx, "What is machine learning?")
2. TopK Configuration
Control the number of retrieved documents:
retriever.SetTopK(4) // Retrieve top 4 documents
docs, err := retriever.Retrieve(ctx, query)
3. Score Threshold Filtering
Filter low-quality results:
retriever.SetScoreThreshold(0.5) // Only keep scores >= 0.5
docs, err := retriever.Retrieve(ctx, query)
4. Multi-Query Retrieval
Generate query variations for improved recall:
multiQueryRetriever := retrieval.NewRAGMultiQueryRetriever(
baseRetriever,
3, // Generate 3 variations
llmClient, // Use LLM to generate variations
)
docs, err := multiQueryRetriever.Retrieve(ctx, query)
5. Document Reranking
MMR Reranking (Maximal Marginal Relevance)
Balances relevance and diversity:
mmrReranker := retrieval.NewMMRReranker(
0.7, // Lambda: 0.0 = diversity, 1.0 = relevance
4, // TopN results
)
reranked, err := mmrReranker.Rerank(ctx, query, docs)
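To make the lambda trade-off concrete, here is a minimal, self-contained MMR selection over precomputed scores (this illustrates the algorithm, not the internals of NewMMRReranker):

```go
package main

import "fmt"

// mmrSelect greedily picks topN document indices, scoring each
// candidate as lambda*rel[d] - (1-lambda)*max(sim[d][s]) over the
// already-selected set s. lambda=1 is pure relevance, lambda=0 pure
// diversity. rel holds query similarity; sim is pairwise similarity.
func mmrSelect(rel []float64, sim [][]float64, lambda float64, topN int) []int {
	selected := []int{}
	remaining := map[int]bool{}
	for i := range rel {
		remaining[i] = true
	}
	for len(selected) < topN && len(remaining) > 0 {
		best, bestScore := -1, 0.0
		for d := range remaining {
			maxSim := 0.0
			for _, s := range selected {
				if sim[d][s] > maxSim {
					maxSim = sim[d][s]
				}
			}
			score := lambda*rel[d] - (1-lambda)*maxSim
			if best == -1 || score > bestScore {
				best, bestScore = d, score
			}
		}
		selected = append(selected, best)
		delete(remaining, best)
	}
	return selected
}

func main() {
	rel := []float64{0.9, 0.85, 0.5} // similarity of each doc to the query
	sim := [][]float64{              // pairwise document similarity
		{1.0, 0.95, 0.1},
		{0.95, 1.0, 0.1},
		{0.1, 0.1, 1.0},
	}
	// Doc 1 is nearly a duplicate of doc 0, so with lambda=0.7 MMR
	// picks the less relevant but more diverse doc 2 second.
	fmt.Println(mmrSelect(rel, sim, 0.7, 2)) // [0 2]
}
```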
Cross-Encoder Reranking
Uses a cross-encoder model for precise relevance scoring:
ceReranker := retrieval.NewCrossEncoderReranker(
"cross-encoder/ms-marco-MiniLM-L-6-v2",
4, // TopN results
)
reranked, err := ceReranker.Rerank(ctx, query, docs)
Rank Fusion
Combines multiple ranking strategies:
rankFusion := retrieval.NewRankFusion("rrf") // Reciprocal Rank Fusion
fusedDocs := rankFusion.Fuse([][]*retrieval.Document{
ranking1,
ranking2,
ranking3,
})
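Reciprocal Rank Fusion scores each document as the sum of 1/(k + rank) over every ranking it appears in (1-based rank; k=60 is the common default), so documents that rank well in several lists rise to the top. A self-contained sketch of the technique, independent of the NewRankFusion API:

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges several rankings of document IDs with Reciprocal
// Rank Fusion: score(id) = sum over rankings of 1/(k + rank).
func rrfFuse(rankings [][]string, k float64) []string {
	scores := map[string]float64{}
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break
	})
	return ids
}

func main() {
	fused := rrfFuse([][]string{
		{"ml", "dl", "nn"},  // e.g. vector-search ranking
		{"dl", "rag", "ml"}, // e.g. keyword-search ranking
	}, 60)
	// "dl" ranks high in both lists, so it fuses to the top.
	fmt.Println(fused)
}
```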
6. Custom Prompt Templates
Create domain-specific prompt templates:
customTemplate := `You are an AI tutor helping students.
Reference Materials:
{documents}
Question: {query}
Provide a beginner-friendly explanation.`
formattedPrompt, err := retriever.RetrieveAndFormat(ctx, query, customTemplate)
Knowledge Base Documents
The example includes sample documents about:
- Machine Learning fundamentals
- Deep Learning concepts
- Natural Language Processing
- Computer Vision
- Reinforcement Learning
- Neural Networks architecture
- Transformers architecture
- RAG techniques
Production Considerations
1. Embedding Models
For production use, replace the SimpleEmbedder with a real embedding model:
// Using OpenAI embeddings
embedder := openai.NewEmbedder(openaiClient, "text-embedding-3-small")
// Or using other models via LangChain
embedder := langchain.NewEmbedder("sentence-transformers/all-MiniLM-L6-v2")
2. Scalability
- Use Qdrant Cloud for production workloads
- Enable horizontal scaling with multiple Qdrant nodes
- Implement caching for frequently accessed documents
- Use batch operations for bulk document uploads
3. Error Handling
// Implement retry logic with simple linear backoff
var docs []*retrieval.Document
var err error
maxRetries := 3
for i := 0; i < maxRetries; i++ {
    docs, err = retriever.Retrieve(ctx, query) // assign, don't shadow
    if err == nil {
        break
    }
    time.Sleep(time.Second * time.Duration(i+1))
}
// Handle context cancellation
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
4. Monitoring
// Add observability
import "github.com/kart-io/goagent/observability"
// Enable tracing
tracer := observability.NewTracer("rag-service")
ctx = tracer.StartSpan(ctx, "rag_query")
defer tracer.EndSpan(ctx)
// Track metrics
metrics.RecordRetrievalLatency(elapsed)
metrics.RecordDocumentCount(len(docs))
5. Document Chunking
For large documents, implement chunking:
// Split documents into chunks
chunker := document.NewRecursiveTextSplitter(
1000, // Chunk size
200, // Overlap
)
chunks := chunker.Split(largeDocument)
// Add all chunks to the vector store in a single batch call
store.AddDocuments(ctx, chunks)
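The chunking idea can be shown with a minimal fixed-size splitter. This sketches the mechanics only (a real recursive splitter prefers paragraph/sentence boundaries, and this version indexes bytes, so it assumes ASCII-ish text):

```go
package main

import "fmt"

// splitText cuts text into chunks of at most chunkSize bytes, with
// each chunk repeating the last overlap bytes of the previous one so
// context is not lost at chunk boundaries.
func splitText(text string, chunkSize, overlap int) []string {
	if chunkSize <= 0 || overlap >= chunkSize {
		return nil // invalid configuration
	}
	var chunks []string
	step := chunkSize - overlap
	for start := 0; start < len(text); start += step {
		end := start + chunkSize
		if end > len(text) {
			end = len(text)
		}
		chunks = append(chunks, text[start:end])
		if end == len(text) {
			break
		}
	}
	return chunks
}

func main() {
	// 4-byte chunks, each overlapping the previous by 1 byte.
	fmt.Println(splitText("abcdefghij", 4, 1)) // [abcd defg ghij]
}
```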
Troubleshooting
"DEEPSEEK_API_KEY not set"
Make sure you've exported your API key:
export DEEPSEEK_API_KEY="your-key"
"Failed to connect to Qdrant"
- Check if Qdrant is running:
  curl http://localhost:6333
- Verify the URL environment variable:
  echo $QDRANT_URL
- Check the Docker logs:
  docker logs <container-id>
"Collection already exists"
The example creates a collection named goagent_rag_demo. If it exists from a previous run:
# Delete the collection via Qdrant API
curl -X DELETE http://localhost:6333/collections/goagent_rag_demo
# Or use a different collection name
export QDRANT_COLLECTION="my_new_collection"
Low Retrieval Quality
- Increase TopK: Retrieve more documents
- Adjust threshold: Lower the score threshold
- Use multi-query: Enable query variations
- Apply reranking: Use MMR or cross-encoder
- Improve embeddings: Use better embedding models
Performance Tuning
Query Optimization
// Use smaller TopK for faster retrieval
retriever.SetTopK(3)
// Set timeout for LLM calls
ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()
// Use streaming for large responses
stream, err := llmClient.Stream(ctx, prompt)
Batch Operations
// Add documents in batches
batchSize := 100
for i := 0; i < len(allDocs); i += batchSize {
end := min(i+batchSize, len(allDocs))
batch := allDocs[i:end]
store.AddDocuments(ctx, batch)
}
Advanced Use Cases
1. Conversational RAG
Maintain conversation history:
conversationHistory := []llm.Message{}
for {
    // Add the user message to the stored history
    conversationHistory = append(conversationHistory,
        llm.UserMessage(userQuery))
    // Retrieve fresh context for this turn
    retrievedContext := retriever.RetrieveWithContext(ctx, userQuery)
    // Prepend the context as a system message for this call only,
    // so stale context does not accumulate in the stored history
    messages := append([]llm.Message{
        llm.SystemMessage(retrievedContext),
    }, conversationHistory...)
    // Generate a response
    response, _ := llmClient.Chat(ctx, messages)
    // Add the assistant response to the stored history
    conversationHistory = append(conversationHistory,
        llm.AssistantMessage(response.Content))
}
2. Multi-Document RAG
Retrieve from multiple collections:
// Create multiple stores for different domains
techStore := setupQdrantStore(ctx, "tech_docs")
legalStore := setupQdrantStore(ctx, "legal_docs")
// Retrieve from both
techDocs, _ := techStore.Search(ctx, query, 3)
legalDocs, _ := legalStore.Search(ctx, query, 3)
// Merge and rerank
allDocs := append(techDocs, legalDocs...)
reranked, _ := reranker.Rerank(ctx, query, allDocs)
3. Hybrid Search
Combine vector search with keyword search:
// Vector search
vectorDocs, _ := vectorStore.Search(ctx, query, 5)
// Keyword search (BM25)
keywordDocs, _ := keywordIndex.Search(query, 5)
// Fuse results
fusion := retrieval.NewRankFusion("rrf")
finalDocs := fusion.Fuse([][]*retrieval.Document{
vectorDocs,
keywordDocs,
})
License
This example is part of the GoAgent project and is licensed under the same terms.
Support
For issues or questions:
- Open an issue on GitHub
- Check the documentation
- Review other examples