goframe

module
v0.36.3
Published: Mar 21, 2026 License: MIT

GoFrame

A Go RAG library built for code understanding. GoFrame handles the plumbing — document loading, AST-based chunking, embedding, hybrid vector search, and dependency graph traversal — so you can focus on building applications on top of it.

It is the library underlying Code-Warden, a self-hosted GitHub App that performs context-aware code reviews using a 6-stage RAG pipeline.


What Makes It Different

Most RAG libraries treat code as plain text. GoFrame understands it:

  • AST-aware chunking: splits at function and type boundaries, not arbitrary character counts; file-level metadata (package name, imports) propagates to every chunk
  • Multi-language parsing: Go, TypeScript/TSX, Markdown, JSON, YAML, Terraform, Protobuf, PDF, CSV, HTML, RSS; each parser extracts language-specific metadata
  • Dependency graph traversal: DependencyRetriever answers "who imports this package?" and "what does this file depend on?" using metadata stored at index time
  • Code-aware sparse tokenization: splits camelCase (processPayment → process, payment) and acronyms (XMLParser → xml, parser) before hashing into a sparse vector; hybrid search combines this with dense embeddings for better identifier recall
  • Test linkage: indexes test files with tested_symbols metadata so tests can be retrieved by the symbols they exercise, not just by text similarity
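
To make the AST-aware chunking idea concrete, here is a minimal sketch that uses only the Go standard library (go/parser, go/ast). It is not GoFrame's splitter: the Chunk type, chunkGoSource, and the sample source are hypothetical names introduced for illustration. It shows one chunk per top-level function or type declaration, with the file's package name and imports copied onto each chunk.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// Chunk is a hypothetical type for this sketch; it is not GoFrame's schema.Document.
type Chunk struct {
	Identifier  string
	Kind        string // "function" or "type"
	Content     string
	PackageName string
	Imports     []string
}

// sample is an illustrative Go file used as input.
const sample = `package users

import (
	"errors"
	"fmt"
)

type User struct{ ID string }

func getUserByID(id string) (*User, error) {
	if id == "" {
		return nil, errors.New("empty id")
	}
	fmt.Println("lookup", id)
	return &User{ID: id}, nil
}
`

// chunkGoSource splits Go source at top-level function and type declarations
// and copies the file-level package name and imports onto every chunk.
func chunkGoSource(src string) ([]Chunk, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, parser.ParseComments)
	if err != nil {
		return nil, err
	}

	var imports []string
	for _, imp := range file.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		imports = append(imports, path)
	}

	// text returns the exact source text covered by an AST node.
	text := func(n ast.Node) string {
		return src[fset.Position(n.Pos()).Offset:fset.Position(n.End()).Offset]
	}

	var chunks []Chunk
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			chunks = append(chunks, Chunk{
				Identifier: d.Name.Name, Kind: "function", Content: text(d),
				PackageName: file.Name.Name, Imports: imports,
			})
		case *ast.GenDecl:
			if d.Tok != token.TYPE {
				continue // skip import, const, and var declarations
			}
			// A grouped declaration yields one chunk per type, each carrying the whole group.
			for _, spec := range d.Specs {
				ts := spec.(*ast.TypeSpec)
				chunks = append(chunks, Chunk{
					Identifier: ts.Name.Name, Kind: "type", Content: text(d),
					PackageName: file.Name.Name, Imports: imports,
				})
			}
		}
	}
	return chunks, nil
}

func main() {
	chunks, err := chunkGoSource(sample)
	if err != nil {
		panic(err)
	}
	for _, c := range chunks {
		fmt.Printf("%s %s (package %s, imports %v)\n", c.Kind, c.Identifier, c.PackageName, c.Imports)
	}
}
```

Because every chunk carries the package name and import list, a retriever can filter or expand results by package without re-reading the original file.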

Quick Start

package main

import (
    "context"
    "fmt"

    "github.com/sevigo/goframe/chains"
    "github.com/sevigo/goframe/embeddings"
    "github.com/sevigo/goframe/llms/ollama"
    "github.com/sevigo/goframe/schema"
    "github.com/sevigo/goframe/vectorstores"
    "github.com/sevigo/goframe/vectorstores/qdrant"
)

func main() {
    ctx := context.Background()

    // Errors are discarded to keep the example short; check them in real code.
    llm, _ := ollama.New(ollama.WithModel("qwen2.5-coder:7b"))
    embedder, _ := embeddings.NewEmbedder(llm)
    store, _ := qdrant.New(
        qdrant.WithCollectionName("my-repo"),
        qdrant.WithEmbedder(embedder),
    )

    docs := []schema.Document{
        schema.NewDocument("func getUserByID(id string) (*User, error) { ... }", map[string]any{
            "source":       "internal/users/service.go",
            "chunk_type":   "definition",
            "identifier":   "getUserByID",
            "package_name": "users",
        }),
    }
    store.AddDocuments(ctx, docs)

    retriever := vectorstores.ToRetriever(store, 5)
    chain, _ := chains.NewRetrievalQA(retriever, llm)
    answer, _ := chain.Call(ctx, "How does user lookup work?")
    fmt.Println(answer)
}

Installation

go get github.com/sevigo/goframe@latest

Requires Go 1.21+, Ollama for local LLMs and embeddings, and Qdrant for vector storage.


Core Pipeline

[GitLoader] → [ParserRegistry] → [CodeAwareSplitter] → [Embedder + SparseProvider] → [Qdrant]
   (load)      (AST metadata)       (chunk at          (dense + code sparse          (store with
               (imports, pkg)        boundaries)        vectors per chunk)             metadata)

At query time:

[Query] → [SparseProvider] → [SimilaritySearch with sparse+dense] → [Reranker] → [LLM Chain]
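
The README does not say how the sparse and dense result lists are merged. One common technique is reciprocal rank fusion (RRF), sketched below with only the standard library. This is an assumption for illustration, not a description of what GoFrame or its Qdrant store does internally; rrfFuse and the document IDs are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges several ranked lists of document IDs with reciprocal rank fusion:
// score(d) = sum over lists of 1 / (k + rank), where rank starts at 1.
func rrfFuse(k float64, rankings ...[]string) []string {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}

	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break
	})
	return ids
}

func main() {
	// Hypothetical result lists from a dense and a sparse search for "getUserByID".
	dense := []string{"auth.go#Login", "users.go#getUserByID", "db.go#Query"}
	sparse := []string{"users.go#getUserByID", "users_test.go#TestGetUserByID"}
	fmt.Println(rrfFuse(60, dense, sparse))
}
```

A document that appears in both lists accumulates two contributions, so an exact identifier match found by the sparse search can outrank a chunk that is only semantically close.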

Key Packages

Package                 What it does
schema/                 Core types: Document, SparseVector, Retriever, Reranker
llms/ollama             Ollama LLM client: chat, completion, streaming
embeddings/             Embedder interface + batch embedding with retry
embeddings/sparse/      Sparse vector generation: default BoW provider, pluggable
embeddings/sparse/code  Code-aware sparse tokenizer (camelCase/snake_case splitting + FNV32a)
vectorstores/qdrant     Qdrant store: hybrid search, metadata filtering, binary quantization
vectorstores/           DependencyRetriever, DefinitionRetriever, ToRetriever helpers
parsers/                Language parser plugins: Go, TypeScript, Markdown, JSON, YAML, Terraform, Protobuf, PDF, CSV, HTML, RSS
textsplitter/           CodeAwareTextSplitter: AST-boundary splitting with metadata propagation
documentloaders/        GitLoader: streaming file ingestion from git repos with metadata
chains/                 LLMChain[T], RetrievalQA, MapReduceChain
agent/                  OpenCode agent SDK: session management, MCP server config, streaming

Examples

Hybrid Search (Dense + Sparse)

import (
    "github.com/sevigo/goframe/embeddings/sparse"
    sparsecode "github.com/sevigo/goframe/embeddings/sparse/code"
    "github.com/sevigo/goframe/schema"
    "github.com/sevigo/goframe/vectorstores"
    "github.com/sevigo/goframe/vectorstores/qdrant"
)

// Register code-aware sparse tokenizer (once at startup)
sparse.RegisterProvider(sparsecode.NewCodeSparseProvider())

// Create store with sparse vector support
store, _ := qdrant.New(
    qdrant.WithCollectionName("code"),
    qdrant.WithEmbedder(embedder),
    qdrant.WithSparseVector("code_sparse"),
)

// Index with sparse vectors
doc := schema.NewDocument("func getUserByID(id string) (*User, error)", nil)
doc.Sparse, _ = sparse.GenerateSparseVector(ctx, doc.PageContent)
store.AddDocuments(ctx, []schema.Document{doc})

// Hybrid search
sparseQuery, _ := sparse.GenerateSparseVector(ctx, "getUserByID")
results, _ := store.SimilaritySearch(ctx, "getUserByID", 5,
    vectorstores.WithSparseQuery(sparseQuery),
)

Dependency Graph Traversal

retriever, _ := vectorstores.NewDependencyRetriever(store)

// Who imports this package? (impact analysis)
network, _ := retriever.GetContextNetwork(ctx, "github.com/my/project/pkg/users", nil)
for _, dependent := range network.Dependents {
    fmt.Println("Affected file:", dependent.Metadata["source"])
}

// What does this file depend on?
network, _ = retriever.GetContextNetwork(ctx, "github.com/my/project/pkg/users",
    []string{"context", "database/sql"})
for _, dep := range network.Dependencies {
    fmt.Println("Dependency:", dep.Metadata["source"])
}
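
The traversal above relies on import metadata recorded at index time. As a rough, self-contained sketch of the underlying idea, the code below builds a two-way import index in memory. The fileMeta and depIndex types and the file paths are hypothetical; GoFrame's DependencyRetriever queries the vector store and may work quite differently.

```go
package main

import (
	"fmt"
	"sort"
)

// fileMeta is a hypothetical record of what a parser might store per file at index time.
type fileMeta struct {
	Source  string   // file path inside the repository
	Imports []string // import paths referenced by the file
}

// depIndex answers both directions of the import relation from stored metadata.
type depIndex struct {
	importsOf  map[string][]string // file -> packages it imports
	importedBy map[string][]string // package -> files that import it
}

func buildIndex(files []fileMeta) *depIndex {
	idx := &depIndex{
		importsOf:  map[string][]string{},
		importedBy: map[string][]string{},
	}
	for _, f := range files {
		idx.importsOf[f.Source] = f.Imports
		for _, imp := range f.Imports {
			idx.importedBy[imp] = append(idx.importedBy[imp], f.Source)
		}
	}
	for _, sources := range idx.importedBy {
		sort.Strings(sources) // stable output regardless of input order
	}
	return idx
}

// Dependents answers "who imports this package?".
func (d *depIndex) Dependents(pkg string) []string { return d.importedBy[pkg] }

// Dependencies answers "what does this file depend on?".
func (d *depIndex) Dependencies(file string) []string { return d.importsOf[file] }

// demoIndex builds an index over three illustrative files.
func demoIndex() *depIndex {
	return buildIndex([]fileMeta{
		{Source: "internal/billing/invoice.go", Imports: []string{"context", "github.com/my/project/pkg/users"}},
		{Source: "cmd/api/main.go", Imports: []string{"net/http", "github.com/my/project/pkg/users"}},
		{Source: "pkg/users/service.go", Imports: []string{"context", "database/sql"}},
	})
}

func main() {
	idx := demoIndex()
	fmt.Println("Dependents:", idx.Dependents("github.com/my/project/pkg/users"))
	fmt.Println("Dependencies:", idx.Dependencies("pkg/users/service.go"))
}
```

Storing the import list once per file is enough to answer both questions; the reverse map is simply derived from it.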

Git Repository Ingestion

import (
    "context"

    "github.com/sevigo/goframe/documentloaders"
    "github.com/sevigo/goframe/embeddings/sparse"
    "github.com/sevigo/goframe/parsers"
    "github.com/sevigo/goframe/schema"
    "github.com/sevigo/goframe/textsplitter"
)

registry := parsers.NewRegistry(logger)
splitter := textsplitter.NewCodeAwareTextSplitter(registry,
    textsplitter.WithChunkSize(800),
    textsplitter.WithChunkOverlap(100),
)

loader, _ := documentloaders.NewGit(repoPath, registry,
    documentloaders.WithSplitter(splitter),
    documentloaders.WithBatchSize(50),
)

// Stream directly into vector store
loader.LoadAndProcessStream(ctx, func(ctx context.Context, batch []schema.Document) error {
    for i := range batch {
        batch[i].Sparse, _ = sparse.GenerateSparseVector(ctx, batch[i].PageContent)
    }
    _, err := store.AddDocuments(ctx, batch)
    return err
})

Multi-Model Consensus (MapReduceChain)

import (
    "github.com/sevigo/goframe/chains"
    "github.com/sevigo/goframe/llms"
)

models := []llms.Model{model1, model2, model3}
chain, _ := chains.NewMapReduceChain(models, reducerModel, prompt,
    chains.WithMaxParallel(3),
    chains.WithQuorum(0.66), // Proceed when 66% of models finish
)
result, _ := chain.Call(ctx, map[string]any{"context": ctx, "diff": diff})
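
The quorum option can be pictured as "start every model, continue once enough of them have answered". The sketch below shows that pattern with goroutines and a buffered channel. It is an illustration under assumptions, not the MapReduceChain implementation: the worker type, collectQuorum, and the rule of counting only successful results and rounding the threshold up are all choices made for this example.

```go
package main

import (
	"context"
	"fmt"
	"math"
	"time"
)

// worker stands in for one model call (hypothetical).
type worker func(ctx context.Context) (string, error)

// collectQuorum runs all workers concurrently and returns as soon as
// ceil(quorum * len(workers)) of them have succeeded, cancelling the rest.
func collectQuorum(ctx context.Context, workers []worker, quorum float64) ([]string, error) {
	need := int(math.Ceil(quorum * float64(len(workers))))
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // stops stragglers once we return

	type result struct {
		out string
		err error
	}
	results := make(chan result, len(workers)) // buffered so late workers never block
	for _, w := range workers {
		go func(w worker) {
			out, err := w(ctx)
			results <- result{out, err}
		}(w)
	}

	var outs []string
	for i := 0; i < len(workers) && len(outs) < need; i++ {
		if r := <-results; r.err == nil {
			outs = append(outs, r.out)
		}
	}
	if len(outs) < need {
		return outs, fmt.Errorf("quorum not reached: got %d of %d required results", len(outs), need)
	}
	return outs, nil
}

// reply returns a worker that answers after a delay unless the context is cancelled first.
func reply(text string, delay time.Duration) worker {
	return func(ctx context.Context) (string, error) {
		select {
		case <-time.After(delay):
			return text, nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

// runDemo uses two fast workers and one slow one with a 0.66 quorum.
func runDemo() ([]string, error) {
	workers := []worker{
		reply("model1: looks fine", 10*time.Millisecond),
		reply("model2: needs tests", 20*time.Millisecond),
		reply("model3: slow answer", 5*time.Second),
	}
	return collectQuorum(context.Background(), workers, 0.66)
}

func main() {
	outs, err := runDemo()
	fmt.Println(len(outs), "results, error:", err)
}
```

With three workers and a quorum of 0.66 the threshold is two results, so the call returns after the two fast workers finish and the slow one is cancelled.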

Exact Definition Lookup

defRetriever, _ := vectorstores.NewDefinitionRetriever(store)

// Fast path: exact filter on identifier + chunk_type
exactDocs, err := store.SimilaritySearch(ctx, symbol, 1,
    vectorstores.WithFilters(map[string]any{
        "chunk_type": "definition",
        "identifier": symbol,
    }),
)

// Semantic fallback
if err != nil || len(exactDocs) == 0 {
    exactDocs, _ = defRetriever.GetDefinition(ctx, symbol)
}

Running the Example

go run ./examples/qdrant-ultimate-rag/main.go

The example demonstrates full repository ingestion (Go + TypeScript), hybrid search, and dependency graph verification against a real Qdrant instance.


Sparse Vector Provider

The default sparse provider uses a pretrained BoW tokenizer. For source code, register the code-aware provider instead:

import (
    "github.com/sevigo/goframe/embeddings/sparse"
    sparsecode "github.com/sevigo/goframe/embeddings/sparse/code"
)

// Call once at application startup
sparse.RegisterProvider(sparsecode.NewCodeSparseProvider())

The code provider:

  • Splits camelCase: processPayment → ["process", "payment"]
  • Splits acronyms: XMLParser → ["xml", "parser"], HTTPClient → ["http", "client"]
  • Handles mixed: get_HTTPClient → ["get", "http", "client"]
  • Filters Go/JS/Python/Rust language keywords
  • Hashes via FNV32a into 50,000-dimension sparse space with L2 normalization
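
The steps in this list can be sketched with the standard library alone. The code below is an approximation written for illustration, not the code provider's actual source: splitIdentifier, sparseVector, the SparseVector struct, and the tiny keyword set are hypothetical, and details such as term weighting and digit handling are assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"sort"
	"strings"
	"unicode"
)

const dims = 50000 // size of the sparse space named in the list above

// keywords is a deliberately tiny stand-in for a real language keyword list.
var keywords = map[string]bool{"func": true, "return": true, "if": true, "for": true}

// SparseVector is a hypothetical type for this sketch.
type SparseVector struct {
	Indices []uint32
	Values  []float64
}

// splitIdentifier lowercases and splits on separators, camelCase, and acronym boundaries.
func splitIdentifier(s string) []string {
	runes := []rune(s)
	var tokens []string
	start := 0
	flush := func(end int) {
		if end > start {
			tokens = append(tokens, strings.ToLower(string(runes[start:end])))
		}
		start = end
	}
	for i, r := range runes {
		switch {
		case !unicode.IsLetter(r) && !unicode.IsDigit(r):
			flush(i)
			start = i + 1 // skip the separator itself
		case i > 0 && unicode.IsUpper(r) && unicode.IsLower(runes[i-1]):
			flush(i) // camelCase boundary, e.g. process|Payment
		case i > 0 && i+1 < len(runes) && unicode.IsUpper(r) &&
			unicode.IsUpper(runes[i-1]) && unicode.IsLower(runes[i+1]):
			flush(i) // acronym boundary, e.g. XML|Parser
		}
	}
	flush(len(runes))
	return tokens
}

// sparseVector counts tokens per FNV-1a bucket and L2-normalizes the counts.
func sparseVector(text string) SparseVector {
	counts := map[uint32]float64{}
	for _, field := range strings.Fields(text) {
		for _, tok := range splitIdentifier(field) {
			if keywords[tok] {
				continue
			}
			h := fnv.New32a()
			h.Write([]byte(tok))
			counts[h.Sum32()%dims]++
		}
	}

	var norm float64
	for _, c := range counts {
		norm += c * c
	}
	norm = math.Sqrt(norm)

	vec := SparseVector{}
	for idx := range counts {
		vec.Indices = append(vec.Indices, idx)
	}
	sort.Slice(vec.Indices, func(a, b int) bool { return vec.Indices[a] < vec.Indices[b] })
	for _, idx := range vec.Indices {
		vec.Values = append(vec.Values, counts[idx]/norm)
	}
	return vec
}

func main() {
	fmt.Println(splitIdentifier("processPayment"))
	fmt.Println(splitIdentifier("get_HTTPClient"))
	fmt.Println(len(sparseVector("func processPayment(p XMLParser)").Indices), "non-zero dimensions")
}
```

Hashing tokens into a fixed number of buckets avoids maintaining a vocabulary, at the cost of occasional collisions between unrelated tokens.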

How to Contribute

make lint      # Run linters
make test      # Run all tests
make pre-push  # lint + test combined

For a single package:

go test ./vectorstores/qdrant/... -v
go test -run TestStoreSimilaritySearch ./vectorstores/qdrant/...

See TODO.md for what's next.

License

MIT — see LICENSE.

Directories

Path                                          Synopsis
agent                                         Package agent provides an abstraction layer for managing communication with OpenCode in agent mode, with a focus on MCP (Model Context Protocol) server configuration.
agent/examples/basic (command)                Example demonstrates basic usage of the agent package with MCP servers
chains                                        Package chains provides composable chains for LLM workflows.
contextpacker                                 Package contextpacker provides utilities for packing and optimizing context for LLM workflows.
contextpacker/fake                            Package fake provides mock tokenizers for testing context packing.
documentloaders                               Package documentloaders provides utilities for loading documents from various sources.
embeddings                                    Package embeddings provides interfaces and utilities for text embedding.
embeddings/fastapi                            Package fastapi provides an embedder client for remote FastAPI embedding servers.
embeddings/sparse                             Package sparse provides utilities for generating sparse vectors for hybrid search.
embeddings/sparse/code                        Package code provides code-aware sparse vector generation for source code.
examples
examples/hybrid-search (command)
examples/kokoro-captioned-dialogue (command)  Package main demonstrates captioned dialogue synthesis with timestamp-based timing.
examples/kokoro-dialogue (command)
examples/kokoro-tts (command)
examples/openai-dialogue (command)
examples/openai-tts (command)                 Package main demonstrates OpenAI Text-to-Speech API usage.
examples/qdrant-rerank (command)
examples/rss-ingestion (command)
gitutil                                       Package gitutil provides utilities for working with Git repositories.
httpclient                                    Package httpclient provides a shared HTTP client with sensible defaults for connection pooling, timeouts, and retry logic.
llms                                          Package llms provides interfaces and utilities for LLM providers.
llms/fake                                     Package fake provides mock LLM implementations for testing.
llms/gemini                                   Package gemini provides LLM and embedding support for Google's Gemini models.
llms/ollama                                   Package ollama provides a client for interacting with Ollama's local LLM server.
output                                        Package output provides parsers for structured LLM output.
parsers                                       Package parsers provides a registry for language-specific parser plugins.
parsers/csv                                   Package csv parses CSV files with header support.
parsers/golang                                Package golang parses Go source files for code analysis and documentation extraction.
parsers/html                                  Package html provides an HTML parser plugin for transforming HTML content into clean Markdown suitable for LLM consumption and RAG applications.
parsers/json                                  Package json parses JSON files for structure extraction.
parsers/markdown                              Package markdown parses Markdown files for structured content extraction.
parsers/pdf                                   Package pdf parses PDF documents for text extraction.
parsers/protobuf                              Package protobuf parses Protocol Buffer definition files.
parsers/terraform                             Package terraform parses Terraform configuration files.
parsers/testing                               Package testing provides test utilities for parser development.
parsers/text                                  Package text parses plain text files for content extraction.
parsers/typescript                            Package typescript parses TypeScript source files for code analysis.
parsers/yaml                                  Package yaml parses YAML configuration files.
prompts                                       Package prompts provides prompt templates for LLM interactions.
schema                                        Package schema defines core data structures and interfaces used throughout the goframe library.
schema/fake                                   Package fake provides mock implementations for schema interfaces in tests.
textsplitter                                  Package textsplitter provides text splitting utilities for chunking documents.
vectorstores                                  Package vectorstores provides interfaces and implementations for vector databases.
vectorstores/fake                             Package fake provides an in-memory vector store for testing.
vectorstores/qdrant                           Package qdrant provides a Qdrant vector database integration for GoFrame.
voice                                         Package voice provides interfaces and types for Text-to-Speech synthesis.
voice/elevenlabs                              Package elevenlabs provides text-to-speech synthesis using the ElevenLabs API.
voice/openai                                  Package openai provides an OpenAI-compatible Text-to-Speech implementation.