corpus

Version: v0.0.0-...-7b76843
Published: Apr 6, 2026 License: AGPL-3.0

Corpus

An easy-to-deploy retrieval-augmented generation (RAG) service.

⚠️ Disclaimer

Corpus is currently under active development and should be considered a work in progress. The API is in a preliminary stage and may not be stable. Please be aware that changes, including modifications, updates, or deprecations, can occur at any time without prior notice.

Features

  • OIDC authentication with email-based role mapping and access whitelist
  • Markdown-based chunking
  • Full-text and vector-based indexes (via Bleve and SQLite Vec)
  • Web interface and REST API
  • Backup and restore via the REST API
  • CLI with abstract filesystem watching and auto-indexing (local, S3, FTP, SFTP, WebDAV, SMB...)

Getting started

With Docker

# Create data volume
docker volume create corpus_data

# Start container
docker run \
  -it --rm \
  -v corpus_data:/data \
  --net=host \
  -e CORPUS_LLM_PROVIDER_KEY="<LLM_SERVICE_API_KEY>" \
  -e CORPUS_LLM_PROVIDER_BASE_URL="<LLM_SERVICE_BASE_URL>" \
  -e CORPUS_LLM_PROVIDER_CHAT_COMPLETION_MODEL="<LLM_SERVICE_CHAT_COMPLETION_MODEL>" \
  -e CORPUS_LLM_PROVIDER_EMBEDDINGS_MODEL="<LLM_SERVICE_EMBEDDINGS_MODEL>" \
  -e CORPUS_HTTP_AUTHN_PROVIDERS_GITHUB_KEY="<github_oauth2_app_key>" \
  -e CORPUS_HTTP_AUTHN_PROVIDERS_GITHUB_SECRET="<github_oauth2_app_secret>" \
  -e CORPUS_HTTP_AUTHN_PROVIDERS_GITHUB_SCOPES=openid,user \
  ghcr.io/bornholm/corpus-server:latest

Then open http://localhost:3002 in your browser.

Examples

With a local ollama instance:

CORPUS_LLM_PROVIDER_BASE_URL=http://127.0.0.1:11434/v1/
CORPUS_LLM_PROVIDER_CHAT_COMPLETION_MODEL=qwen2.5:7b
CORPUS_LLM_PROVIDER_EMBEDDINGS_MODEL=mxbai-embed-large

With Mistral:

CORPUS_LLM_PROVIDER_BASE_URL=https://api.mistral.ai/v1/
CORPUS_LLM_PROVIDER_CHAT_COMPLETION_MODEL=mistral-small-latest
CORPUS_LLM_PROVIDER_EMBEDDINGS_MODEL=mistral-embed
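
Before starting the container, it can help to confirm that the configured endpoint is reachable and serves the model names you plan to use. The Go sketch below is a standalone check, not part of Corpus. It assumes the endpoint exposes the OpenAI-style GET /models route and accepts a Bearer token; modelsURL and parseModelIDs are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// modelsURL builds the model-listing URL from a base URL such as
// "http://127.0.0.1:11434/v1/", with or without a trailing slash.
func modelsURL(baseURL string) string {
	return strings.TrimRight(baseURL, "/") + "/models"
}

// parseModelIDs extracts model identifiers from an OpenAI-style
// {"data": [{"id": "..."}]} body. encoding/json matches the lowercase
// keys to the exported fields case-insensitively.
func parseModelIDs(body []byte) ([]string, error) {
	var payload struct {
		Data []struct {
			ID string
		}
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return nil, err
	}
	ids := make([]string, 0, len(payload.Data))
	for _, m := range payload.Data {
		ids = append(ids, m.ID)
	}
	return ids, nil
}

func main() {
	baseURL := os.Getenv("CORPUS_LLM_PROVIDER_BASE_URL")
	if baseURL == "" {
		fmt.Println("CORPUS_LLM_PROVIDER_BASE_URL is not set; nothing to check")
		return
	}

	req, err := http.NewRequest(http.MethodGet, modelsURL(baseURL), nil)
	if err != nil {
		fmt.Println("invalid base URL:", err)
		return
	}
	// Assumption: the service accepts the key as a Bearer token.
	if key := os.Getenv("CORPUS_LLM_PROVIDER_KEY"); key != "" {
		req.Header.Set("Authorization", "Bearer "+key)
	}

	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("reading response failed:", err)
		return
	}
	if resp.StatusCode != http.StatusOK {
		fmt.Println("unexpected status:", resp.Status)
		return
	}

	ids, err := parseModelIDs(body)
	if err != nil {
		fmt.Println("unexpected response format:", err)
		return
	}
	for _, id := range ids {
		fmt.Println(id)
	}
}
```

Run it with the same CORPUS_LLM_PROVIDER_* values you intend to pass to docker run, and check that the chat completion and embeddings model names appear in the output.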

Usage as a library

Corpus can be embedded directly in a Go project to index and search documents without running a server.

Installation

go get github.com/bornholm/corpus

Note: The sqlitevec adapter uses CGO. Make sure a C compiler is available in your build environment.

Quick start

import (
    "context"
    "log"

    "github.com/bornholm/corpus/pkg/corpus"
    "github.com/bornholm/genai/llm/provider"
    providerenv "github.com/bornholm/genai/llm/provider/env"
)

ctx := context.Background()

// Create an LLM client from environment variables (see "LLM configuration" below)
llmClient, err := provider.Create(ctx, providerenv.With("LLM_", ".env"))
if err != nil {
    log.Fatal(err)
}

// Initialise an embedded corpus (SQLite + Bleve + SQLiteVec, auto-composed)
c, err := corpus.New(ctx,
    corpus.WithStoragePath("./data"),
    corpus.WithLLMClient(llmClient),
)
if err != nil {
    log.Fatal(err)
}

// Create a collection, index a file, search, ask.
// reader supplies the contents of notes.md. Errors are discarded below for
// brevity and should be checked in real code.
collID, _ := c.CreateCollection(ctx, "my-docs")
taskID, _ := c.IndexFile(ctx, collID, "notes.md", reader)

results, _ := c.Search(ctx, "my question",
    corpus.WithSearchCollections(collID),
    corpus.WithSearchMaxResults(5),
)

answer, _, _ := c.Ask(ctx, "my question", results)

LLM configuration

The providerenv.With(prefix) helper reads LLM configuration from environment variables:

| Variable | Description | Example (ollama) |
|---|---|---|
| LLM_CHAT_COMPLETION_PROVIDER | Chat completion provider | openai |
| LLM_CHAT_COMPLETION_OPENAI_BASE_URL | Chat completion endpoint | http://localhost:11434/v1 |
| LLM_CHAT_COMPLETION_OPENAI_MODEL | Chat completion model | qwen2.5:7b |
| LLM_EMBEDDINGS_PROVIDER | Embeddings provider | openai |
| LLM_EMBEDDINGS_OPENAI_BASE_URL | Embeddings endpoint | http://localhost:11434/v1 |
| LLM_EMBEDDINGS_OPENAI_MODEL | Embeddings model | mxbai-embed-large |

Variables can also be loaded from a .env file by passing its path to providerenv.With.
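
A missing variable is easier to diagnose before the client is created than after. The sketch below checks the process environment for the variables listed above. It is not part of Corpus or the genai library: requiredSuffixes and missingVars are hypothetical names, and treating all six variables as required is an assumption that fits the openai provider example only. Values supplied solely through a .env file are not visible to os.LookupEnv, so this check covers the process environment only.

```go
package main

import (
	"fmt"
	"os"
)

// requiredSuffixes holds the variable names from the configuration table,
// without their prefix. Assumption: all six are needed for the openai provider.
var requiredSuffixes = []string{
	"CHAT_COMPLETION_PROVIDER",
	"CHAT_COMPLETION_OPENAI_BASE_URL",
	"CHAT_COMPLETION_OPENAI_MODEL",
	"EMBEDDINGS_PROVIDER",
	"EMBEDDINGS_OPENAI_BASE_URL",
	"EMBEDDINGS_OPENAI_MODEL",
}

// missingVars returns the prefixed variable names that lookup reports as
// unset or empty, in table order. lookup has the signature of os.LookupEnv
// so that it can be replaced in tests.
func missingVars(prefix string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, suffix := range requiredSuffixes {
		name := prefix + suffix
		if value, ok := lookup(name); !ok || value == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	missing := missingVars("LLM_", os.LookupEnv)
	if len(missing) > 0 {
		fmt.Println("missing LLM configuration:", missing)
		return
	}
	fmt.Println("LLM configuration looks complete")
}
```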

More advanced options are available; example/embedded/ contains a full runnable example.

Sponsors

Cadoles

License

AGPL-3.0

Directories

Path Synopsis
cmd
client command
desktop command
server command
example
embedded command
Package main demonstrates two ways to use the corpus embedded library.
internal
desktop/component
templ: version: v0.3.1001
http/handler/webui/admin/component
templ: version: v0.3.1001
http/handler/webui/ask/component
templ: version: v0.3.1001
http/handler/webui/collection/component
templ: version: v0.3.1001
http/handler/webui/common/component
templ: version: v0.3.1001
http/handler/webui/profile/component
templ: version: v0.3.1001
http/handler/webui/pubshare/component
templ: version: v0.3.1001
http/handler/webui/templui/component/icon
templui component icon - version: v1.5.0 installed by templui v1.5.0 📚 Documentation: https://templui.io/docs/components/icon
http/handler/webui/templui/utils
templui util templui.go - version: v1.5.0 installed by templui v1.5.0
http/middleware/authn/oidc/component
templ: version: v0.3.1001
http/middleware/authn/token/component
templ: version: v0.3.1001
llm
pkg
templx
form/renderer/templui
templ: version: v0.3.1001
