Version: v0.36.3 (latest) | Published: Mar 21, 2026 | License: MIT


GoFrame

A Go RAG library built for code understanding. GoFrame handles the plumbing — document loading, AST-based chunking, embedding, hybrid vector search, and dependency graph traversal — so you can focus on building applications on top of it.

It is the library underlying Code-Warden, a self-hosted GitHub App that performs context-aware code reviews using a 6-stage RAG pipeline.

What Makes It Different

Most RAG libraries treat code as plain text. GoFrame understands it:

AST-aware chunking — splits at function and type boundaries, not arbitrary character counts; file-level metadata (package name, imports) propagates to every chunk
Multi-language parsing — Go, TypeScript/TSX, Markdown, JSON, YAML, Terraform, Protobuf, PDF, CSV, HTML, RSS; each parser extracts language-specific metadata
Dependency graph traversal — DependencyRetriever answers "who imports this package?" and "what does this file depend on?" using metadata stored at index time
Code-aware sparse tokenization — splits camelCase (processPayment → process, payment) and acronyms (XMLParser → xml, parser) before hashing into a sparse vector; hybrid search combines this with dense embeddings for better identifier recall
Test linkage — indexes test files with tested_symbols metadata so tests can be retrieved by the symbols they exercise, not just by text similarity
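The AST-aware chunking idea above can be sketched with nothing but the Go standard library. This is a minimal illustration, not GoFrame's splitter: the `Chunk` type, `chunkGoSource`, and the `sample` source are hypothetical names invented for this example. It splits a file at top-level function and type declarations and copies the package name and imports onto every chunk.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// Chunk is a hypothetical type for this sketch, not GoFrame's schema.Document.
type Chunk struct {
	Identifier string
	Kind       string   // "function" or "type"
	Content    string   // exact source text of the declaration
	Package    string   // file-level metadata copied onto every chunk
	Imports    []string // file-level metadata copied onto every chunk
}

// sample is a made-up input file used by main.
const sample = `package users

import (
	"errors"
	"fmt"
)

type User struct{ ID string }

func GetUserByID(id string) (*User, error) {
	if id == "" {
		return nil, errors.New("empty id")
	}
	fmt.Println("lookup", id)
	return &User{ID: id}, nil
}
`

// chunkGoSource splits a Go file at top-level function and type declarations.
func chunkGoSource(filename, src string) ([]Chunk, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.ParseComments)
	if err != nil {
		return nil, err
	}

	// Collect import paths once; they are attached to every chunk below.
	var imports []string
	for _, imp := range file.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		imports = append(imports, path)
	}

	// text returns the source text covered by an AST node.
	text := func(n ast.Node) string {
		return src[fset.Position(n.Pos()).Offset:fset.Position(n.End()).Offset]
	}
	pkg := file.Name.Name

	var chunks []Chunk
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			chunks = append(chunks, Chunk{Identifier: d.Name.Name, Kind: "function",
				Content: text(d), Package: pkg, Imports: imports})
		case *ast.GenDecl:
			if d.Tok != token.TYPE {
				continue // skip import, const, and var declarations
			}
			for _, spec := range d.Specs {
				ts, ok := spec.(*ast.TypeSpec)
				if !ok {
					continue
				}
				content := text(d)
				if d.Lparen.IsValid() { // grouped "type ( ... )": one chunk per spec
					content = "type " + text(ts)
				}
				chunks = append(chunks, Chunk{Identifier: ts.Name.Name, Kind: "type",
					Content: content, Package: pkg, Imports: imports})
			}
		}
	}
	return chunks, nil
}

func main() {
	chunks, err := chunkGoSource("service.go", sample)
	if err != nil {
		panic(err)
	}
	for _, c := range chunks {
		fmt.Printf("%s %s (package %s, imports %v)\n", c.Kind, c.Identifier, c.Package, c.Imports)
	}
}
```

Because each chunk carries its own package and import list, a retrieved chunk stays interpretable even when the rest of the file is not in the context window.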

Quick Start

package main

import (
    "context"
    "fmt"

    "github.com/sevigo/goframe/chains"
    "github.com/sevigo/goframe/embeddings"
    "github.com/sevigo/goframe/llms/ollama"
    "github.com/sevigo/goframe/schema"
    "github.com/sevigo/goframe/vectorstores"
    "github.com/sevigo/goframe/vectorstores/qdrant"
)

func main() {
    ctx := context.Background()

    llm, _ := ollama.New(ollama.WithModel("qwen2.5-coder:7b"))
    embedder, _ := embeddings.NewEmbedder(llm)
    store, _ := qdrant.New(
        qdrant.WithCollectionName("my-repo"),
        qdrant.WithEmbedder(embedder),
    )

    docs := []schema.Document{
        schema.NewDocument("func getUserByID(id string) (*User, error) { ... }", map[string]any{
            "source":       "internal/users/service.go",
            "chunk_type":   "definition",
            "identifier":   "getUserByID",
            "package_name": "users",
        }),
    }
    store.AddDocuments(ctx, docs)

    retriever := vectorstores.ToRetriever(store, 5)
    chain, _ := chains.NewRetrievalQA(retriever, llm)
    answer, _ := chain.Call(ctx, "How does user lookup work?")
    fmt.Println(answer)
}

Installation

go get github.com/sevigo/goframe@latest

Requires Go 1.21+, Ollama for local LLMs and embeddings, and Qdrant for vector storage.

Core Pipeline

[GitLoader] → [ParserRegistry] → [CodeAwareSplitter] → [Embedder + SparseProvider] → [Qdrant]
   (load)      (AST metadata)       (chunk at          (dense + code sparse          (store with
               (imports, pkg)        boundaries)        vectors per chunk)             metadata)

At query time:

[Query] → [SparseProvider] → [SimilaritySearch with sparse+dense] → [Reranker] → [LLM Chain]

Key Packages

Package	What it does
`schema/`	Core types: `Document`, `SparseVector`, `Retriever`, `Reranker`
`llms/ollama`	Ollama LLM client — chat, completion, streaming
`embeddings/`	`Embedder` interface + batch embedding with retry
`embeddings/sparse/`	Sparse vector generation — default BoW provider, pluggable
`embeddings/sparse/code`	Code-aware sparse tokenizer (camelCase/snake_case splitting + FNV32a)
`vectorstores/qdrant`	Qdrant store — hybrid search, metadata filtering, binary quantization
`vectorstores/`	`DependencyRetriever`, `DefinitionRetriever`, `ToRetriever` helpers
`parsers/`	Language parser plugins — Go, TypeScript, Markdown, JSON, YAML, Terraform, Protobuf, PDF, CSV, HTML, RSS
`textsplitter/`	`CodeAwareTextSplitter` — AST-boundary splitting with metadata propagation
`documentloaders/`	`GitLoader` — streaming file ingestion from git repos with metadata
`chains/`	`LLMChain[T]`, `RetrievalQA`, `MapReduceChain`
`agent/`	OpenCode agent SDK — session management, MCP server config, streaming

Examples

Hybrid Search (Dense + Sparse)

import (
    "github.com/sevigo/goframe/embeddings/sparse"
    sparsecode "github.com/sevigo/goframe/embeddings/sparse/code"
    "github.com/sevigo/goframe/schema"
    "github.com/sevigo/goframe/vectorstores"
    "github.com/sevigo/goframe/vectorstores/qdrant"
)

// Register code-aware sparse tokenizer (once at startup)
sparse.RegisterProvider(sparsecode.NewCodeSparseProvider())

// Create store with sparse vector support
store, _ := qdrant.New(
    qdrant.WithCollectionName("code"),
    qdrant.WithEmbedder(embedder),
    qdrant.WithSparseVector("code_sparse"),
)

// Index with sparse vectors
doc := schema.NewDocument("func getUserByID(id string) (*User, error)", nil)
doc.Sparse, _ = sparse.GenerateSparseVector(ctx, doc.PageContent)
store.AddDocuments(ctx, []schema.Document{doc})

// Hybrid search
sparseQuery, _ := sparse.GenerateSparseVector(ctx, "getUserByID")
results, _ := store.SimilaritySearch(ctx, "getUserByID", 5,
    vectorstores.WithSparseQuery(sparseQuery),
)
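The README does not say how the dense and sparse result lists are merged into one ranking. Reciprocal rank fusion (RRF) is one common way to do it and is shown here purely as an illustration, not as GoFrame's or Qdrant's actual behavior in this call. The `rrf` function, the constant `k = 60`, and the file names are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// rrf fuses ranked ID lists with reciprocal rank fusion:
// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
func rrf(k float64, rankings ...[]string) []string {
	scores := map[string]float64{}
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}

	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest fused score first; ties are broken by ID for a stable order.
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	// Hypothetical result lists from a dense query and a sparse query.
	dense := []string{"users/service.go", "users/cache.go", "auth/login.go"}
	sparse := []string{"users/repo.go", "users/service.go"}
	fmt.Println(rrf(60, dense, sparse))
}
```

A document that appears in both lists accumulates two contributions, so an identifier match from the sparse side can lift a chunk that the dense side already ranked well.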

Dependency Graph Traversal

retriever, _ := vectorstores.NewDependencyRetriever(store)

// Who imports this package? (impact analysis)
network, _ := retriever.GetContextNetwork(ctx, "github.com/my/project/pkg/users", nil)
for _, dependent := range network.Dependents {
    fmt.Println("Affected file:", dependent.Metadata["source"])
}

// What does this file depend on?
network, _ = retriever.GetContextNetwork(ctx, "github.com/my/project/pkg/users",
    []string{"context", "database/sql"})
for _, dep := range network.Dependencies {
    fmt.Println("Dependency:", dep.Metadata["source"])
}
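Both questions above reduce to lookups over import metadata captured at index time. The sketch below shows the idea on an in-memory slice; `fileMeta`, `dependents`, `dependencies`, and the `example.com/app/...` paths are hypothetical and do not reflect how `DependencyRetriever` queries Qdrant.

```go
package main

import (
	"fmt"
	"sort"
)

// fileMeta mimics per-file metadata stored at index time (hypothetical shape).
type fileMeta struct {
	Source  string   // file path
	Package string   // import path of the package the file belongs to
	Imports []string // import paths used by the file
}

// sampleIndex is made-up data for the example.
var sampleIndex = []fileMeta{
	{"pkg/users/service.go", "example.com/app/pkg/users", []string{"context", "database/sql"}},
	{"pkg/api/handler.go", "example.com/app/pkg/api", []string{"net/http", "example.com/app/pkg/users"}},
	{"pkg/billing/invoice.go", "example.com/app/pkg/billing", []string{"example.com/app/pkg/users"}},
	{"pkg/db/conn.go", "example.com/app/pkg/db", []string{"database/sql"}},
}

// dependents answers "who imports this package?".
func dependents(index []fileMeta, pkg string) []string {
	var out []string
	for _, f := range index {
		for _, imp := range f.Imports {
			if imp == pkg {
				out = append(out, f.Source)
				break
			}
		}
	}
	sort.Strings(out)
	return out
}

// dependencies answers "what does this file depend on?" by returning the
// indexed files that belong to any of the given imported packages.
func dependencies(index []fileMeta, imports []string) []string {
	wanted := make(map[string]bool, len(imports))
	for _, imp := range imports {
		wanted[imp] = true
	}
	var out []string
	for _, f := range index {
		if wanted[f.Package] {
			out = append(out, f.Source)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(dependents(sampleIndex, "example.com/app/pkg/users"))
	fmt.Println(dependencies(sampleIndex, sampleIndex[1].Imports))
}
```

Imports that are not indexed, such as standard library packages, simply produce no dependency entries.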

Git Repository Ingestion

import (
    "context"

    "github.com/sevigo/goframe/documentloaders"
    "github.com/sevigo/goframe/embeddings/sparse"
    "github.com/sevigo/goframe/parsers"
    "github.com/sevigo/goframe/schema"
    "github.com/sevigo/goframe/textsplitter"
)

registry := parsers.NewRegistry(logger)
splitter := textsplitter.NewCodeAwareTextSplitter(registry,
    textsplitter.WithChunkSize(800),
    textsplitter.WithChunkOverlap(100),
)

loader, _ := documentloaders.NewGit(repoPath, registry,
    documentloaders.WithSplitter(splitter),
    documentloaders.WithBatchSize(50),
)

// Stream directly into vector store
loader.LoadAndProcessStream(ctx, func(ctx context.Context, batch []schema.Document) error {
    for i := range batch {
        batch[i].Sparse, _ = sparse.GenerateSparseVector(ctx, batch[i].PageContent)
    }
    _, err := store.AddDocuments(ctx, batch)
    return err
})

Multi-Model Consensus (MapReduceChain)

import (
    "github.com/sevigo/goframe/chains"
    "github.com/sevigo/goframe/llms"
)

models := []llms.Model{model1, model2, model3}
chain, _ := chains.NewMapReduceChain(models, reducerModel, prompt,
    chains.WithMaxParallel(3),
    chains.WithQuorum(0.66), // Proceed when 66% of models finish
)
// codeContext is the retrieved code context as text, not the context.Context
result, _ := chain.Call(ctx, map[string]any{"context": codeContext, "diff": diff})

Exact Definition Lookup

defRetriever, _ := vectorstores.NewDefinitionRetriever(store)

// Fast path: exact filter on identifier + chunk_type
exactDocs, err := store.SimilaritySearch(ctx, symbol, 1,
    vectorstores.WithFilters(map[string]any{
        "chunk_type": "definition",
        "identifier": symbol,
    }),
)

// Semantic fallback
if err != nil || len(exactDocs) == 0 {
    exactDocs, _ = defRetriever.GetDefinition(ctx, symbol)
}

Running the Example

go run ./examples/qdrant-ultimate-rag/main.go

The example demonstrates full repository ingestion (Go + TypeScript), hybrid search, and dependency graph verification against a real Qdrant instance.

Sparse Vector Provider

The default sparse provider uses a pretrained bag-of-words (BoW) tokenizer. For source code, register the code-aware provider instead:

import (
    "github.com/sevigo/goframe/embeddings/sparse"
    sparsecode "github.com/sevigo/goframe/embeddings/sparse/code"
)

// Call once at application startup
sparse.RegisterProvider(sparsecode.NewCodeSparseProvider())

The code provider:

Splits camelCase: processPayment → ["process", "payment"]
Splits acronyms: XMLParser → ["xml", "parser"], HTTPClient → ["http", "client"]
Handles mixed: get_HTTPClient → ["get", "http", "client"]
Filters Go/JS/Python/Rust language keywords
Hashes each token via FNV32a into a 50,000-dimension sparse space, then applies L2 normalization

How to Contribute

make lint      # Run linters
make test      # Run all tests
make pre-push  # lint + test combined

For a single package:

go test ./vectorstores/qdrant/... -v
go test -run TestStoreSimilaritySearch ./vectorstores/qdrant/...

See TODO.md for what's next.

License

MIT — see LICENSE.

Directories

Path	Synopsis
agent	Package agent provides an abstraction layer for managing communication with OpenCode in agent mode, with a focus on MCP (Model Context Protocol) server configuration.
examples/basic command	Example demonstrates basic usage of the agent package with MCP servers
examples/feedback-loop command
examples/mcp-feedback command
chains	Package chains provides composable chains for LLM workflows.
contextpacker	Package contextpacker provides utilities for packing and optimizing context for LLM workflows.
fake	Package fake provides mock tokenizers for testing context packing.
documentloaders	Package documentloaders provides utilities for loading documents from various sources.
embeddings	Package embeddings provides interfaces and utilities for text embedding.
fastapi	Package fastapi provides an embedder client for remote FastAPI embedding servers.
sparse	Package sparse provides utilities for generating sparse vectors for hybrid search.
sparse/code	Package code provides code-aware sparse vector generation for source code.
examples
gemini-chat-example command
html-parser-demo command
hybrid-search command
kokoro-captioned-dialogue command	Package main demonstrates captioned dialogue synthesis with timestamp-based timing.
kokoro-dialogue command
kokoro-streaming command
kokoro-tts command
ollama-chat-example command
ollama-git-terraform-qa command
ollama-qdrant-vectorstore-example command
ollama-retrieval-qa command
openai-dialogue command
openai-tts command	Package main demonstrates OpenAI Text-to-Speech API usage.
qdrant-rerank command
qdrant-ultimate-rag command
rag-evaluation-msmarco-csv command
rag-with-validation command
rss-ingestion command
rss-with-html-parser command
gitutil	Package gitutil provides utilities for working with Git repositories.
httpclient	Package httpclient provides a shared HTTP client with sensible defaults for connection pooling, timeouts, and retry logic.
llms	Package llms provides interfaces and utilities for LLM providers.
fake	Package fake provides mock LLM implementations for testing.
gemini	Package gemini provides LLM and embedding support for Google's Gemini models.
ollama	Package ollama provides a client for interacting with Ollama's local LLM server.
output	Package output provides parsers for structured LLM output.
parsers	Package parsers provides a registry for language-specific parser plugins.
csv	Package csv parses CSV files with header support.
golang	Package golang parses Go source files for code analysis and documentation extraction.
html	Package html provides an HTML parser plugin for transforming HTML content into clean Markdown suitable for LLM consumption and RAG applications.
json	Package json parses JSON files for structure extraction.
markdown	Package markdown parses Markdown files for structured content extraction.
pdf	Package pdf parses PDF documents for text extraction.
protobuf	Package protobuf parses Protocol Buffer definition files.
terraform	Package terraform parses Terraform configuration files.
testing	Package testing provides test utilities for parser development.
text	Package text parses plain text files for content extraction.
typescript	Package typescript parses TypeScript source files for code analysis.
yaml	Package yaml parses YAML configuration files.
prompts	Package prompts provides prompt templates for LLM interactions.
schema	Package schema defines core data structures and interfaces used throughout the goframe library.
fake	Package fake provides mock implementations for schema interfaces in tests.
textsplitter	Package textsplitter provides text splitting utilities for chunking documents.
vectorstores	Package vectorstores provides interfaces and implementations for vector databases.
fake	Package fake provides an in-memory vector store for testing.
qdrant	Package qdrant provides a Qdrant vector database integration for GoFrame.
voice	Package voice provides interfaces and types for Text-to-Speech synthesis.
elevenlabs	Package elevenlabs provides text-to-speech synthesis using the ElevenLabs API.
openai	Package openai provides an OpenAI-compatible Text-to-Speech implementation.