kodit

package module
v1.3.6
Published: Apr 21, 2026 License: Apache-2.0 Imports: 34 Imported by: 0

README

Kodit

A code and document intelligence server that indexes Git repositories and provides search through MCP and REST APIs.

AI coding assistants work better when they have access to real examples from your codebase. Kodit indexes your repositories, splits source files into searchable snippets, and serves them to any MCP-compatible assistant. When your assistant needs to write new code, it queries Kodit first and gets back relevant, up-to-date examples drawn from your own projects.

Kodit also handles documents. PDFs, Word files, PowerPoint decks, and spreadsheets are rasterized and indexed so you can search across both code and documentation in one place.

What you get:

  • Multiple search strategies including BM25 keyword search, semantic vector search, regex grep, and visual document search, each exposed as a separate MCP tool so your assistant picks the right approach for each query
  • MCP server that works with Claude Code, Cursor, Cline, Kilo Code, and any other MCP-compatible assistant
  • REST API for programmatic access to search, repositories, enrichments, and indexing status
  • AI enrichments (optional) including architecture docs, API docs, database schema detection, cookbook examples, and commit summaries, all generated by an LLM
  • Document intelligence with visual search across PDF pages, Office documents, and images using multimodal embeddings
  • No external dependencies required for basic operation, with a built-in embedding model and SQLite storage

Quickstart

docker run -p 8080:8080 registry.helix.ml/helix/kodit:latest

This starts Kodit with SQLite storage and a built-in embedding model. No API keys needed.

Pre-built binaries

Download a binary from the releases page, then:

chmod +x kodit
./kodit serve

Verify it works

Open the interactive API docs at http://localhost:8080/docs.

Or index a small repository and run a search:

# Index a repository
curl http://localhost:8080/api/v1/repositories \
  -X POST -H "Content-Type: application/json" \
  -d '{
    "data": {
      "type": "repository",
      "attributes": {
        "remote_uri": "https://gist.github.com/philwinder/7aa38185e20433c04c533f2b28f4e217.git"
      }
    }
  }'

# Check indexing progress
curl http://localhost:8080/api/v1/repositories/1/status

# Search (once indexing is complete)
curl http://localhost:8080/api/v1/search \
  -X POST -H "Content-Type: application/json" \
  -d '{
    "data": {
      "type": "search",
      "attributes": {
        "keywords": ["orders"],
        "text": "code to get all orders"
      }
    }
  }'
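
The same search can be issued from Go with only the standard library. The sketch below assumes a Kodit server on http://localhost:8080 and reuses the request body from the curl example. The helper name buildSearchBody and the struct names are hypothetical, and the response is printed raw because its schema is not shown here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// searchRequest mirrors the JSON body used in the curl example.
type searchRequest struct {
	Data searchData `json:"data"`
}

type searchData struct {
	Type       string           `json:"type"`
	Attributes searchAttributes `json:"attributes"`
}

type searchAttributes struct {
	Keywords []string `json:"keywords"`
	Text     string   `json:"text"`
}

// buildSearchBody (hypothetical helper) encodes a search request as JSON.
func buildSearchBody(keywords []string, text string) ([]byte, error) {
	return json.Marshal(searchRequest{
		Data: searchData{
			Type:       "search",
			Attributes: searchAttributes{Keywords: keywords, Text: text},
		},
	})
}

func main() {
	body, err := buildSearchBody([]string{"orders"}, "code to get all orders")
	if err != nil {
		fmt.Println("encode:", err)
		return
	}

	// Assumes a local server; change the base URL for a remote one.
	resp, err := http.Post("http://localhost:8080/api/v1/search", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read:", err)
		return
	}
	fmt.Println(resp.Status)
	fmt.Println(string(raw))
}
```

Run it only after indexing has finished; otherwise the response may contain no results.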

Connecting to AI Assistants

Kodit exposes an MCP endpoint at /mcp. Connect your assistant to start using Kodit as a code search tool.

Claude Code

claude mcp add --transport http kodit http://localhost:8080/mcp

Cursor

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "kodit": {
      "url": "http://localhost:8080/mcp"
    }
  }
}

Cline

Add to the MCP Servers configuration (Remote Servers tab):

{
  "mcpServers": {
    "kodit": {
      "autoApprove": [],
      "disabled": false,
      "timeout": 60,
      "type": "streamableHttp",
      "url": "http://localhost:8080/mcp"
    }
  }
}

Kilo Code

Add to the MCP configuration (Edit Project/Global MCP):

{
  "mcpServers": {
    "kodit": {
      "type": "streamable-http",
      "url": "http://localhost:8080/mcp",
      "alwaysAllow": [],
      "disabled": false
    }
  }
}

Replace http://localhost:8080 with your server URL if running remotely.

Encouraging assistants to use Kodit

Some assistants may not call Kodit tools automatically. Add this to your project rules or system prompt to enforce usage:

For every request that involves writing or modifying code, the assistant's first
action must be to call the kodit search MCP tools. Only produce or edit code after
the tool call returns results.

In Cursor, save this as .cursor/rules/kodit.mdc with alwaysApply: true frontmatter.

MCP Tools

Kodit exposes these tools to connected AI assistants:

Tool                      Description
kodit_repositories        List all indexed repositories
kodit_semantic_search     Semantic similarity search across code
kodit_keyword_search      BM25 keyword search
kodit_visual_search       Search document page images
kodit_grep                Regex pattern matching
kodit_ls                  List files by glob pattern
kodit_read_resource       Read file content by URI
kodit_architecture_docs   Architecture documentation for a repo
kodit_api_docs            Public API documentation
kodit_database_schema     Database schema documentation
kodit_cookbook            Usage examples and patterns
kodit_commit_description  Commit description
kodit_wiki                Wiki table of contents
kodit_wiki_page           Read a specific wiki page
kodit_version             Server version

The enrichment tools (kodit_architecture_docs, kodit_api_docs, kodit_database_schema, kodit_cookbook, kodit_wiki, kodit_commit_description) require an LLM provider to be configured. See Enrichment Providers under Configuration Reference.

Go Library

Kodit can be embedded directly as a Go library. This is how Helix integrates Kodit into its platform.

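import (
    "context"
    "fmt"
    "log"

    "github.com/helixml/kodit"
    // Import path inferred from the Directories listing (application/service).
    "github.com/helixml/kodit/application/service"
)

ctx := context.Background()

client, err := kodit.New(
    kodit.WithSQLite(".kodit/data.db"),
)
if err != nil {
    log.Fatal(err)
}
defer client.Close()

// Index a repository
repo, err := client.Repositories.Add(ctx, &service.RepositoryAddParams{
    URL: "https://github.com/kubernetes/kubernetes",
})
if err != nil {
    log.Fatal(err)
}

// Search
results, err := client.Search.Query(ctx, "create a deployment",
    service.WithLimit(10),
)
if err != nil {
    log.Fatal(err)
}

// Print the path and name of every matching snippet
for _, snippet := range results.Snippets() {
    fmt.Println(snippet.Path(), snippet.Name())
}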

Library options

Option                        Description
WithSQLite(path)              Use SQLite for storage
WithPostgresVectorchord(dsn)  Use PostgreSQL with VectorChord
WithOpenAI(apiKey)            OpenAI for embeddings and text
WithAnthropic(apiKey)         Anthropic Claude for text (needs separate embedding provider)
WithTextProvider(p)           Custom text generation provider
WithEmbeddingProvider(p)      Custom embedding provider
WithRAGPipeline()             Skip LLM enrichments, index and search only
WithFullPipeline()            Require all enrichments (errors without a text provider)
WithDataDir(dir)              Data directory (default: ~/.kodit)
WithCloneDir(dir)             Repository clone directory (default: {dataDir}/repos)
WithAPIKeys(keys...)          API keys for HTTP authentication
WithWorkerCount(n)            Number of background workers (default: 1)
WithPeriodicSyncConfig(cfg)   Automatic repository sync settings

Search options

Option                    Description
WithSemanticWeight(w)     Weight for semantic vs keyword search (0.0 to 1.0)
WithLimit(n)              Maximum number of results
WithOffset(n)             Offset for pagination
WithLanguages(langs...)   Filter by programming languages
WithRepositories(ids...)  Filter by repository IDs
WithMinScore(score)       Minimum score threshold
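
To make the effect of a semantic weight concrete, the sketch below blends a keyword score and a semantic score linearly. It is only an illustration: the function blendScores, the assumption that both scores are normalized to the range 0 to 1, and the linear formula itself are hypothetical, and Kodit's actual fusion method may differ.

```go
package main

import "fmt"

// blendScores (hypothetical) combines two normalized scores.
// w is the semantic weight; values outside [0, 1] are clamped.
func blendScores(keyword, semantic, w float64) float64 {
	if w < 0 {
		w = 0
	}
	if w > 1 {
		w = 1
	}
	return w*semantic + (1-w)*keyword
}

func main() {
	// One snippet that matches weakly by keyword but strongly by meaning.
	for _, w := range []float64{0.0, 0.5, 1.0} {
		fmt.Printf("w=%.1f score=%.2f\n", w, blendScores(0.2, 0.8, w))
	}
}
```

Under this model a weight of 0.0 ranks purely by keyword score and 1.0 purely by semantic score.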

Go HTTP client

A generated HTTP client is available for calling a remote Kodit server from Go:

go get github.com/helixml/kodit/clients/go

import koditclient "github.com/helixml/kodit/clients/go"

// ctx is a context.Context, for example context.Background()
client, err := koditclient.NewClient("https://kodit.example.com")
if err != nil {
    log.Fatal(err)
}

// List repositories
repos, err := client.GetApiV1Repositories(ctx)
if err != nil {
    log.Fatal(err)
}

// Search
results, err := client.PostApiV1SearchMulti(ctx, koditclient.PostApiV1SearchMultiJSONRequestBody{
    TextQuery: "create a deployment",
    TopK:      10,
})
if err != nil {
    log.Fatal(err)
}

Types are auto-generated from the OpenAPI spec. See the interactive API docs at /docs for the full endpoint list.

Production Deployment

For production use, deploy with PostgreSQL (VectorChord) for scalable vector search and a dedicated LLM provider for enrichments.

Docker Compose

Save this as docker-compose.yaml:

services:
  kodit:
    image: registry.helix.ml/helix/kodit:latest
    ports:
      - "8080:8080"
    command: ["serve"]
    restart: unless-stopped
    depends_on:
      - vectorchord
    environment:
      DATA_DIR: /data
      DB_URL: postgresql://postgres:mysecretpassword@vectorchord:5432/kodit

      # Enrichment LLM (optional, enables AI-generated docs)
      # Assumes an Ollama instance reachable at the hostname "ollama"; this file does not define one
      ENRICHMENT_ENDPOINT_BASE_URL: http://ollama:11434
      ENRICHMENT_ENDPOINT_MODEL: ollama/qwen3:1.7b

      # External embedding provider (optional, replaces built-in model)
      # EMBEDDING_ENDPOINT_API_KEY: sk-proj-xxxx
      # EMBEDDING_ENDPOINT_MODEL: openai/text-embedding-3-small

      LOG_LEVEL: INFO
      API_KEYS: ${KODIT_API_KEYS:-}
    volumes:
      - kodit-data:/data

  vectorchord:
    image: tensorchord/vchord-suite:pg17-20250601
    environment:
      POSTGRES_DB: kodit
      POSTGRES_PASSWORD: mysecretpassword
    volumes:
      - vectorchord-data:/var/lib/postgresql/data
    restart: unless-stopped

volumes:
  kodit-data:
  vectorchord-data:

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vectorchord
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vectorchord
  template:
    metadata:
      labels:
        app: vectorchord
    spec:
      containers:
        - name: vectorchord
          image: tensorchord/vchord-suite:pg17-20250601
          env:
            - name: POSTGRES_DB
              value: kodit
            - name: POSTGRES_PASSWORD
              value: mysecretpassword
          ports:
            - containerPort: 5432
---
apiVersion: v1
kind: Service
metadata:
  name: vectorchord
spec:
  selector:
    app: vectorchord
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kodit
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kodit
  template:
    metadata:
      labels:
        app: kodit
    spec:
      containers:
        - name: kodit
          image: registry.helix.ml/helix/kodit:latest # pin to a specific version
          args: ["serve"]
          env: [] # see Configuration Reference for environment variables
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: kodit
spec:
  type: LoadBalancer
  selector:
    app: kodit
  ports:
    - port: 8080

Authentication

Set the API_KEYS environment variable to a comma-separated list of keys. Write endpoints (creating repositories, triggering syncs) require a valid key in the Authorization: Bearer <key> header. Search endpoints are open by default.
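
As a minimal sketch of calling a write endpoint with a key, the Go snippet below builds a request that triggers a sync (POST /api/v1/repositories/{id}/sync) and attaches the bearer token. The helper name newSyncRequest, the base URL, and the key value are placeholders chosen for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newSyncRequest (hypothetical helper) builds an authenticated request that
// triggers a sync for one repository.
func newSyncRequest(baseURL string, repoID int64, apiKey string) (*http.Request, error) {
	url := fmt.Sprintf("%s/api/v1/repositories/%d/sync", strings.TrimRight(baseURL, "/"), repoID)
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return nil, err
	}
	// The key must be one of the values listed in API_KEYS on the server.
	req.Header.Set("Authorization", "Bearer "+apiKey)
	return req, nil
}

func main() {
	req, err := newSyncRequest("http://localhost:8080", 1, "my-secret-key")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// Sending is left to the caller, for example http.DefaultClient.Do(req).
}
```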

Configuration Reference

Configuration is done through environment variables. You can also use a .env file:

kodit serve --env-file .env

Server

Variable           Default   Description
HOST               0.0.0.0   Listen address
PORT               8080      Listen port
DATA_DIR           ~/.kodit  Data directory for models, clones, and database
DB_URL             (empty)   PostgreSQL connection string (uses SQLite if empty)
LOG_LEVEL          INFO      Logging verbosity: DEBUG, INFO, WARN, ERROR
LOG_FORMAT         pretty    Log format: pretty or json
API_KEYS           (empty)   Comma-separated API keys for write endpoints
WORKER_COUNT       1         Number of background workers
SEARCH_LIMIT       10        Default search result limit
DISABLE_TELEMETRY  false     Disable anonymous usage telemetry

Embedding Provider

These configure an external embedding model. If unset, Kodit uses its built-in model.

Variable                                 Default  Description
EMBEDDING_ENDPOINT_BASE_URL              (empty)  Base URL of embedding service
EMBEDDING_ENDPOINT_MODEL                 (empty)  Model identifier
EMBEDDING_ENDPOINT_API_KEY               (empty)  API key
EMBEDDING_ENDPOINT_MAX_TOKENS            0        Max tokens per request (0 = provider default)
EMBEDDING_ENDPOINT_MAX_BATCH_CHARS       16000    Max total characters per embedding batch
EMBEDDING_ENDPOINT_MAX_BATCH_SIZE        1        Max items per batch
EMBEDDING_ENDPOINT_TIMEOUT               60       Request timeout in seconds
EMBEDDING_ENDPOINT_NUM_PARALLEL_TASKS    1        Concurrent embedding requests
EMBEDDING_ENDPOINT_EXTRA_PARAMS          (empty)  JSON-encoded extra parameters for the embedding provider
EMBEDDING_ENDPOINT_QUERY_INSTRUCTION     (empty)  Instruction prepended to queries for asymmetric retrieval
EMBEDDING_ENDPOINT_DOCUMENT_INSTRUCTION  (empty)  Instruction prepended to documents for asymmetric retrieval

Vision Embedding Provider

These configure a remote service for image and text vision embeddings. If unset, Kodit uses its built-in SigLIP2 model.

Variable                                        Default  Description
VISION_EMBEDDING_ENDPOINT_BASE_URL              (empty)  Base URL of vision embedding service
VISION_EMBEDDING_ENDPOINT_MODEL                 (empty)  Model identifier
VISION_EMBEDDING_ENDPOINT_API_KEY               (empty)  API key
VISION_EMBEDDING_ENDPOINT_MAX_TOKENS            0        Max tokens per request (0 = provider default)
VISION_EMBEDDING_ENDPOINT_MAX_BATCH_CHARS       16000    Max total characters per embedding batch
VISION_EMBEDDING_ENDPOINT_MAX_BATCH_SIZE        1        Max items per batch
VISION_EMBEDDING_ENDPOINT_TIMEOUT               60       Request timeout in seconds
VISION_EMBEDDING_ENDPOINT_NUM_PARALLEL_TASKS    1        Concurrent vision embedding requests
VISION_EMBEDDING_ENDPOINT_EXTRA_PARAMS          (empty)  JSON-encoded extra parameters for the vision embedding provider
VISION_EMBEDDING_ENDPOINT_QUERY_INSTRUCTION     (empty)  Instruction prepended to queries for asymmetric retrieval
VISION_EMBEDDING_ENDPOINT_DOCUMENT_INSTRUCTION  (empty)  Instruction prepended to documents for asymmetric retrieval

Enrichment Providers

These configure an LLM for generating architecture docs, API docs, database schemas, cookbooks, commit summaries, and wiki pages. Without this, Kodit indexes and searches code but does not generate any AI documentation.

Variable                                Default  Description
ENRICHMENT_ENDPOINT_BASE_URL            (empty)  Base URL of LLM service
ENRICHMENT_ENDPOINT_MODEL               (empty)  Model identifier
ENRICHMENT_ENDPOINT_API_KEY             (empty)  API key
ENRICHMENT_ENDPOINT_NUM_PARALLEL_TASKS  1        Concurrent enrichment requests
ENRICHMENT_ENDPOINT_TIMEOUT             60       Request timeout in seconds
ENRICHMENT_ENDPOINT_EXTRA_PARAMS        (empty)  JSON-encoded extra parameters for the LLM

Enrichment is typically the slowest part of indexing because each enrichment requires a round-trip to the LLM provider. Increase ENRICHMENT_ENDPOINT_NUM_PARALLEL_TASKS to speed things up, but stay within your provider's rate limits. Start low and increase gradually.
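
The trade-off behind a parallelism setting can be illustrated with a bounded worker pattern in Go. The sketch below is not Kodit's implementation: runBounded is a hypothetical function that uses a buffered channel as a semaphore so that no more than limit tasks run at once, and each task stands in for one LLM round-trip.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runBounded (hypothetical) runs every task with at most limit running at once.
// It returns the highest number of tasks observed running at the same time.
func runBounded(tasks []func(), limit int) int {
	if limit < 1 {
		limit = 1
	}
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		running int
		peak    int
	)
	slots := make(chan struct{}, limit) // buffered channel used as a semaphore
	for _, task := range tasks {
		wg.Add(1)
		slots <- struct{}{} // blocks while limit tasks are in flight
		go func(run func()) {
			defer wg.Done()
			defer func() { <-slots }() // free the slot once the task is done

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			run()

			mu.Lock()
			running--
			mu.Unlock()
		}(task)
	}
	wg.Wait()
	return peak
}

func main() {
	tasks := make([]func(), 8)
	for i := range tasks {
		// Stands in for one enrichment request to the LLM provider.
		tasks[i] = func() { time.Sleep(10 * time.Millisecond) }
	}
	fmt.Println("peak concurrency:", runBounded(tasks, 2))
}
```

Raising the limit shortens total wall-clock time until the provider starts throttling, which is why a gradual increase is the safer approach.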

Provider examples:

# OpenAI
ENRICHMENT_ENDPOINT_BASE_URL=https://api.openai.com/v1
ENRICHMENT_ENDPOINT_MODEL=gpt-4o-mini
ENRICHMENT_ENDPOINT_API_KEY=sk-proj-xxxx

# Ollama (local)
ENRICHMENT_ENDPOINT_BASE_URL=http://localhost:11434
ENRICHMENT_ENDPOINT_MODEL=ollama/qwen3:1.7b

# Helix (private cloud)
ENRICHMENT_ENDPOINT_BASE_URL=https://app.helix.ml/v1
ENRICHMENT_ENDPOINT_MODEL=Qwen/Qwen3-8B
ENRICHMENT_ENDPOINT_API_KEY=your-helix-key

Periodic Sync

Variable                        Default  Description
PERIODIC_SYNC_ENABLED           true     Auto-sync repositories on an interval
PERIODIC_SYNC_INTERVAL_SECONDS  1800     Sync interval in seconds (1800 = 30 minutes)
PERIODIC_SYNC_RETRY_ATTEMPTS    3        Retry count on sync failure

Chunking

Variable        Default  Description
CHUNK_SIZE      1500     Characters per chunk
CHUNK_OVERLAP   200      Overlap between adjacent chunks
CHUNK_MIN_SIZE  50       Minimum chunk size
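
How the three chunking settings interact can be sketched in Go as follows. This is an illustration under stated assumptions, not Kodit's chunker: chunkText is a hypothetical function, sizes are counted in runes, and a final chunk shorter than the minimum is simply dropped.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkText (hypothetical) splits text into windows of at most size runes,
// where each window starts size-overlap runes after the previous one.
// A final window shorter than minSize is dropped.
func chunkText(text string, size, overlap, minSize int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil // invalid parameters: the window would never advance
	}
	runes := []rune(text)
	step := size - overlap
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		if end-start >= minSize {
			chunks = append(chunks, string(runes[start:end]))
		}
		if end == len(runes) {
			break // the last window reached the end of the text
		}
	}
	return chunks
}

func main() {
	// Use the documented defaults on a synthetic 4000-character input.
	text := strings.Repeat("x", 4000)
	chunks := chunkText(text, 1500, 200, 50)
	for i, c := range chunks {
		fmt.Printf("chunk %d: %d characters\n", i, len(c))
	}
}
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of indexing more text overall.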

REST API

The full API is documented interactively at /docs on a running Kodit instance. The OpenAPI 3.0 specification is available at /docs/openapi.json.

Key endpoints:

Method  Path                              Description
POST    /api/v1/repositories              Add a repository for indexing
GET     /api/v1/repositories              List indexed repositories
GET     /api/v1/repositories/{id}/status  Indexing progress
POST    /api/v1/repositories/{id}/sync    Trigger a sync
DELETE  /api/v1/repositories/{id}         Remove a repository
POST    /api/v1/search                    Combined search (keyword + semantic)
GET     /api/v1/search/semantic           Semantic search only
GET     /api/v1/search/keyword            Keyword search only
GET     /api/v1/search/visual             Visual search on document pages
GET     /api/v1/search/grep               Regex pattern search
GET     /api/v1/search/ls                 List files by glob

All write endpoints require an Authorization: Bearer <key> header when API_KEYS is set.

How Indexing Works

When you add a repository, Kodit runs a pipeline:

  1. Clone the Git repository to local storage
  2. Scan commits, branches, and tags to extract metadata
  3. Extract snippets by splitting source files into overlapping text chunks
  4. Build search indexes with BM25 (keyword) and vector embeddings (semantic)
  5. Generate enrichments (if an LLM provider is configured): architecture docs, API docs, database schemas, cookbook examples, commit summaries, and wiki pages

Kodit tracks which files have changed between syncs and only reprocesses modified content. Repositories sync automatically on a configurable interval (default: every 30 minutes).
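
One generic way to detect modified content between two syncs is to compare content digests per path. The Go sketch below shows that idea only; fingerprint and changedPaths are hypothetical names, and Kodit's actual change tracking is not described here and may rely on Git metadata instead.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// fingerprint returns a hex SHA-256 digest of file content.
func fingerprint(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// changedPaths (hypothetical) compares two path-to-digest snapshots and returns
// the paths that are new or modified in current, sorted for stable output.
func changedPaths(previous, current map[string]string) []string {
	var changed []string
	for path, digest := range current {
		if old, ok := previous[path]; !ok || old != digest {
			changed = append(changed, path)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	previous := map[string]string{
		"main.go":   fingerprint([]byte("package main")),
		"README.md": fingerprint([]byte("v1")),
	}
	current := map[string]string{
		"main.go":   fingerprint([]byte("package main")), // unchanged
		"README.md": fingerprint([]byte("v2")),           // modified
		"util.go":   fingerprint([]byte("package main")), // new file
	}
	fmt.Println(changedPaths(previous, current))
}
```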

Supported sources

Kodit indexes any Git repository accessible via HTTPS, SSH, or the Git protocol. This includes GitHub, GitLab, Bitbucket, Azure DevOps, and self-hosted servers.

Private repositories

Private repositories are supported through personal access tokens or SSH keys:

# HTTPS with token
https://username:token@github.com/username/repo.git

# SSH (ensure your SSH key is configured)
git@github.com:username/repo.git
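
When the HTTPS form is built programmatically, Go's net/url package can insert the credentials and percent-encode any special characters in the token. The helper name withCredentials below is hypothetical, and the sketch assumes the input is a plain https clone URL.

```go
package main

import (
	"fmt"
	"net/url"
)

// withCredentials (hypothetical) embeds a username and token in an HTTPS clone URL.
func withCredentials(rawURL, username, token string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if u.Scheme != "https" {
		return "", fmt.Errorf("expected an https URL, got scheme %q", u.Scheme)
	}
	// UserPassword takes care of escaping characters such as '@' or ':'.
	u.User = url.UserPassword(username, token)
	return u.String(), nil
}

func main() {
	remote, err := withCredentials("https://github.com/username/repo.git", "username", "token")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(remote)
}
```

The resulting string can be used as the remote_uri value when adding a repository. Treat it as a secret, since it contains the token.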

Privacy

Kodit respects .gitignore and .noindex files. Files matching these patterns are excluded from indexing.

Storage Backends

SQLite (default)

No configuration needed. Kodit creates a SQLite database in the data directory with FTS5 for keyword search and in-process vector storage. Good for single-user and small-team deployments.

PostgreSQL with VectorChord

For larger deployments, use PostgreSQL with the VectorChord extension. This provides scalable vector search and concurrent access. Set the DB_URL environment variable to your connection string.

The recommended Docker image is tensorchord/vchord-suite:pg17-20250601, which bundles PostgreSQL 17 with VectorChord, vchord_bm25, and pg_tokenizer.

Building from Source

git clone https://github.com/helixml/kodit.git
cd kodit
make tools          # Install development tools
make download-model # Download the built-in embedding model
make build          # Build the binary
./bin/kodit version
./bin/kodit serve

Run the tests:

make test                         # All tests
make test PKG=./internal/foo/...  # Specific package
make check                        # Format, vet, lint, and test

Troubleshooting

MCP connection error after restart: If you see "No valid session ID provided" after restarting the Kodit server, reload the MCP client in your assistant. MCP sessions do not survive server restarts.

No search results: Check that indexing has completed by calling GET /api/v1/repositories/{id}/status. If status shows errors, check the server logs with LOG_LEVEL=DEBUG.

Enrichments not generating: Enrichments require an LLM provider. Check that ENRICHMENT_ENDPOINT_BASE_URL and ENRICHMENT_ENDPOINT_MODEL are set. Without these, Kodit indexes and searches code but does not generate AI documentation.

Telemetry

Kodit collects limited anonymous telemetry (usage metadata only, no user data) to guide development. Disable it with:

DISABLE_TELEMETRY=true

Commercial Support

Helix provides a managed platform built on Kodit with additional features including a management UI, repository browsing, team collaboration, and hosted infrastructure. For commercial support or enterprise integration, contact founders@helix.ml.

Contributing

See CONTRIBUTING.md for guidelines.

License

Apache 2.0

Documentation

Overview

Package kodit provides a library for code understanding, indexing, and search.

Kodit indexes Git repositories, extracts semantic code snippets using AST parsing, and provides hybrid search (BM25 + vector embeddings) with LLM-powered enrichments.

Basic usage:

client, err := kodit.New(
    kodit.WithSQLite(".kodit/data.db"),
    kodit.WithOpenAI(os.Getenv("OPENAI_API_KEY")),
)
if err != nil {
    log.Fatal(err)
}
defer client.Close()

// Index a repository
repo, err := client.Repositories.Add(ctx, &service.RepositoryAddParams{
    URL: "https://github.com/kubernetes/kubernetes",
})

// Hybrid search
results, err := client.Search.Query(ctx, "create a deployment",
    service.WithSemanticWeight(0.7),
    service.WithLimit(10),
)

// Iterate results
for _, snippet := range results.Snippets() {
    fmt.Println(snippet.Path(), snippet.Name())
}

Pipeline presets

By default, kodit runs all indexing operations including LLM enrichments when a text provider is configured. Use WithRAGPipeline to skip LLM enrichments and run only the operations needed for retrieval-augmented generation:

client, err := kodit.New(
    kodit.WithSQLite(".kodit/data.db"),
    kodit.WithRAGPipeline(), // skip wiki, summaries, architecture docs, etc.
)

Use WithFullPipeline to explicitly require all enrichments (returns an error if no text provider is configured):

client, err := kodit.New(
    kodit.WithSQLite(".kodit/data.db"),
    kodit.WithOpenAI(os.Getenv("OPENAI_API_KEY")),
    kodit.WithFullPipeline(),
)

Constants

This section is empty.

Variables

var (
	// ErrEmptySource indicates a source with no content to process.
	ErrEmptySource = errors.New("kodit: source is empty")

	// ErrNotFound indicates a requested resource was not found.
	ErrNotFound = errors.New("kodit: not found")

	// ErrValidation indicates a validation error.
	ErrValidation = errors.New("kodit: validation error")

	// ErrConflict indicates a conflict with existing data.
	ErrConflict = errors.New("kodit: conflict")

	// ErrNoDatabase indicates no database was configured.
	ErrNoDatabase = errors.New("kodit: no database configured")

	// ErrNoProvider indicates no AI provider was configured.
	ErrNoProvider = errors.New("kodit: no AI provider configured")

	// ErrProviderNotCapable indicates the provider lacks required capability.
	ErrProviderNotCapable = errors.New("kodit: provider does not support required capability")

	// ErrClientClosed is the canonical error for a closed client.
	// It references the service-level error so errors.Is works across packages.
	ErrClientClosed = service.ErrClientClosed
)

Exported errors for library consumers.
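
Sentinel errors like these are normally matched with errors.Is, which also sees through wrapping done with %w. The sketch below uses local stand-ins rather than the real kodit and service packages so that it runs on its own. The error text of the closed-client stand-in and the helpers lookup, isNotFound, and isClientClosed are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// serviceErrClientClosed stands in for service.ErrClientClosed (message assumed).
var serviceErrClientClosed = errors.New("client closed")

// ErrClientClosed aliases the same error value, mirroring the declaration above.
var ErrClientClosed = serviceErrClientClosed

// ErrNotFound stands in for kodit.ErrNotFound.
var ErrNotFound = errors.New("kodit: not found")

// lookup (hypothetical) wraps a sentinel with extra context using %w.
func lookup(id int64) error {
	return fmt.Errorf("repository %d: %w", id, ErrNotFound)
}

// isNotFound reports whether err is, or wraps, ErrNotFound.
func isNotFound(err error) bool { return errors.Is(err, ErrNotFound) }

// isClientClosed reports whether err is, or wraps, ErrClientClosed.
func isClientClosed(err error) bool { return errors.Is(err, ErrClientClosed) }

func main() {
	err := lookup(42)
	fmt.Println(err)
	fmt.Println("not found:", isNotFound(err))

	// An error produced with the service-level value matches the alias too,
	// because both names refer to the same error value.
	closed := fmt.Errorf("query: %w", serviceErrClientClosed)
	fmt.Println("client closed:", isClientClosed(closed))
}
```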

Functions

func NewScopedMCPHandler added in v1.1.7

func NewScopedMCPHandler(client *Client, repoIDs []int64) http.Handler

NewScopedMCPHandler creates an HTTP handler for the MCP protocol scoped to the given repository IDs. Only repositories in repoIDs are visible through the returned handler's tools and searches.

When repoIDs is nil or empty, the handler is unscoped — identical to the full MCP endpoint that sees all repositories.

Types

type Client

type Client struct {
	// Public resource fields (direct service access)
	Repositories *service.Repository
	Commits      *service.Commit
	Tags         *service.Tag
	Files        *service.File
	Blobs        *service.Blob
	Enrichments  *service.Enrichment
	Tasks        *service.Queue
	Tracking     *service.Tracking
	Search       *service.Search
	Grep         *service.Grep
	Pipelines    *service.Pipeline

	// MCPServer describes the MCP server's tools and instructions.
	MCPServer MCPServer
	// contains filtered or unexported fields
}

Client is the main entry point for the kodit library. The background worker starts automatically on creation.

Access resources via struct fields:

client.Repositories.Find(ctx)
client.Commits.Find(ctx, repository.WithRepoID(id))
client.Search.Query(ctx, "query")

func New

func New(opts ...Option) (*Client, error)

New creates a new Client with the given options. The background worker is started automatically.

func (*Client) Close

func (c *Client) Close() error

Close releases all resources and stops the background worker.

func (*Client) Logger

func (c *Client) Logger() zerolog.Logger

Logger returns the client's logger.

func (*Client) Rasterizers added in v1.3.0

func (c *Client) Rasterizers() *rasterization.Registry

Rasterizers returns the document rasterization registry, or nil if unavailable.

func (*Client) TextRenderers added in v1.3.1

func (c *Client) TextRenderers() *extraction.TextRendererRegistry

TextRenderers returns the document text rendering registry.

func (*Client) WorkerIdle

func (c *Client) WorkerIdle() bool

WorkerIdle reports whether the background worker has no in-flight tasks.

type MCPServer added in v1.1.3

type MCPServer struct {
	// contains filtered or unexported fields
}

MCPServer describes the metadata of a kodit MCP server: its usage instructions and the tools it provides.

func NewMCPServer added in v1.1.3

func NewMCPServer(instructions string, tools []Tool) MCPServer

NewMCPServer creates an MCPServer.

func (MCPServer) Instructions added in v1.1.3

func (s MCPServer) Instructions() string

Instructions returns the server's usage instructions.

func (MCPServer) Tools added in v1.1.3

func (s MCPServer) Tools() []Tool

Tools returns a copy of the server's tools.

type Option

type Option func(*clientConfig)

Option configures the Client.

func WithAPIKeys

func WithAPIKeys(keys ...string) Option

WithAPIKeys sets the API keys for HTTP API authentication.

func WithAnthropic

func WithAnthropic(apiKey string) Option

WithAnthropic sets Anthropic Claude as the text generation provider. Requires a separate embedding provider since Anthropic doesn't provide embeddings.

func WithAnthropicConfig

func WithAnthropicConfig(cfg provider.AnthropicConfig) Option

WithAnthropicConfig sets Anthropic Claude with custom configuration.

func WithChunkParams

func WithChunkParams(params chunking.ChunkParams) Option

WithChunkParams sets the chunk parameters for chunking.

func WithCloneDir

func WithCloneDir(dir string) Option

WithCloneDir sets the directory where repositories are cloned. If not specified, defaults to {dataDir}/repos.

func WithCloser

func WithCloser(c io.Closer) Option

WithCloser registers a resource to be closed when the Client shuts down.

func WithDataDir

func WithDataDir(dir string) Option

WithDataDir sets the data directory for cloned repositories and database storage.

func WithEmbeddingBudget

func WithEmbeddingBudget(b search.TokenBudget) Option

WithEmbeddingBudget sets the token budget for code embedding batches.

func WithEmbeddingParallelism

func WithEmbeddingParallelism(n int) Option

WithEmbeddingParallelism sets how many embedding batches are dispatched concurrently. Defaults to 1. Values <= 0 are ignored.

func WithEmbeddingProvider

func WithEmbeddingProvider(p search.Embedder) Option

WithEmbeddingProvider sets a custom embedding provider.

func WithEnricherParallelism

func WithEnricherParallelism(n int) Option

WithEnricherParallelism sets how many enrichment LLM requests are dispatched concurrently. Defaults to 1. Values <= 0 are ignored.

func WithEnrichmentBudget

func WithEnrichmentBudget(b search.TokenBudget) Option

WithEnrichmentBudget sets the token budget for enrichment embedding batches.

func WithEnrichmentParallelism

func WithEnrichmentParallelism(n int) Option

WithEnrichmentParallelism sets how many enrichment embedding batches are dispatched concurrently. Defaults to 1. Values <= 0 are ignored.

func WithFullPipeline added in v1.2.1

func WithFullPipeline() Option

WithFullPipeline runs all indexing operations including LLM enrichments. A text provider must be configured or New() returns an error. This is the default when a text provider is configured.

func WithLogger

func WithLogger(l zerolog.Logger) Option

WithLogger sets a custom logger.

func WithModelDir

func WithModelDir(dir string) Option

WithModelDir sets the directory where built-in model files are stored. Defaults to {dataDir}/models if not specified.

func WithOpenAI

func WithOpenAI(apiKey string) Option

WithOpenAI sets OpenAI as the AI provider (text + embeddings).

func WithOpenAIConfig

func WithOpenAIConfig(cfg provider.OpenAIConfig) Option

WithOpenAIConfig sets OpenAI with custom configuration.

func WithPeriodicSyncConfig

func WithPeriodicSyncConfig(cfg config.PeriodicSyncConfig) Option

WithPeriodicSyncConfig sets the periodic sync configuration.

func WithPostgresVectorchord

func WithPostgresVectorchord(dsn string) Option

WithPostgresVectorchord configures PostgreSQL with VectorChord extension. VectorChord provides both BM25 and vector search.

func WithRAGPipeline added in v1.2.1

func WithRAGPipeline() Option

WithRAGPipeline configures the indexing pipeline for RAG use cases. Snippet extraction, BM25 indexing, code embeddings, and AST-based API docs run. All LLM enrichments (commit descriptions, architecture docs, database schema, cookbook, wiki) are skipped even if a text provider is configured.

func WithSQLite

func WithSQLite(path string) Option

WithSQLite configures SQLite as the database. BM25 uses FTS5; vector search uses the configured embedding provider.

func WithSkipProviderValidation

func WithSkipProviderValidation() Option

WithSkipProviderValidation skips the provider configuration validation. This is intended for testing only. In production, embedding and text providers are required for full functionality.

func WithTextProvider

func WithTextProvider(p provider.TextGenerator) Option

WithTextProvider sets a custom text generation provider.

func WithVisionEmbedder added in v1.3.1

func WithVisionEmbedder(e search.Embedder) Option

WithVisionEmbedder sets the vision embedder. The embedder must accept both image items and text items and produce vectors in the same embedding space. When set, replaces the local SigLIP2 model.

func WithWorkerCount

func WithWorkerCount(n int) Option

WithWorkerCount sets the number of background worker goroutines. Defaults to 1 if not specified.

func WithWorkerPollPeriod

func WithWorkerPollPeriod(d time.Duration) Option

WithWorkerPollPeriod sets how often the background worker checks for new tasks. Defaults to 1 second. Lower values speed up task processing at the cost of more frequent polling — useful in tests.

type Parameter added in v1.1.3

type Parameter struct {
	// contains filtered or unexported fields
}

Parameter describes a single parameter accepted by an MCP tool.

func NewParameter added in v1.1.3

func NewParameter(name, description, typ string, required bool) Parameter

NewParameter creates a Parameter.

func (Parameter) Description added in v1.1.3

func (p Parameter) Description() string

Description returns the parameter description.

func (Parameter) Name added in v1.1.3

func (p Parameter) Name() string

Name returns the parameter name.

func (Parameter) Required added in v1.1.3

func (p Parameter) Required() bool

Required reports whether the parameter is required.

func (Parameter) Type added in v1.1.3

func (p Parameter) Type() string

Type returns the parameter type (e.g. "string", "number").

type Tool added in v1.1.3

type Tool struct {
	// contains filtered or unexported fields
}

Tool describes an MCP tool with its parameters.

func NewTool added in v1.1.3

func NewTool(name, description string, parameters []Parameter) Tool

NewTool creates a Tool.

func (Tool) Description added in v1.1.3

func (t Tool) Description() string

Description returns the tool description.

func (Tool) Name added in v1.1.3

func (t Tool) Name() string

Name returns the tool name.

func (Tool) Parameters added in v1.1.3

func (t Tool) Parameters() []Parameter

Parameters returns a copy of the tool's parameters.

Directories

Path Synopsis
application
handler
Package handler provides task handlers for processing queued operations.
Package handler provides task handlers for processing queued operations.
handler/enrichment
Package enrichment provides task handlers for enrichment operations.
Package enrichment provides task handlers for enrichment operations.
service
Package service provides application layer services that orchestrate domain operations.
Package service provides application layer services that orchestrate domain operations.
clients
go
Package kodit provides primitives to interact with the openapi HTTP API.
Package kodit provides primitives to interact with the openapi HTTP API.
cmd
download-model command
Standalone tool that converts the st-codesearch-distilroberta-base model to ONNX format for hugot embedding.
Standalone tool that converts the st-codesearch-distilroberta-base model to ONNX format for hugot embedding.
download-siglip2 command
Standalone tool that downloads the pre-converted SigLIP2 ONNX model from onnx-community on Hugging Face.
Standalone tool that downloads the pre-converted SigLIP2 ONNX model from onnx-community on Hugging Face.
kodit command
Package main is the entry point for the kodit CLI.
domain
enrichment
Package enrichment provides domain types for AI-generated semantic metadata.
repository
Package repository provides Git repository domain types.
search
Package search provides search domain types for hybrid code retrieval.
service
Package service provides domain service interfaces.
sourcelocation
Package sourcelocation provides metadata about where an enrichment's content originates within a source file.
task
Package task provides task queue domain types for async work processing.
tracking
Package tracking provides progress tracking and reporting types for long-running tasks.
infrastructure
api
Package api provides HTTP server and API documentation.
api/jsonapi
Package jsonapi provides JSON:API specification compliant types for API responses.
api/middleware
Package middleware provides HTTP middleware for the API server.
api/v1
Package v1 provides the v1 API routes.
api/v1/dto
Package dto provides data transfer objects for the API layer.
chunking
Package chunking provides fixed-size text chunking with overlap for RAG indexing.
enricher
Package enricher provides AI-powered enrichment generation.
enricher/example
Package example provides extraction of code examples from documentation.
git
Package git provides Git repository infrastructure implementations.
persistence
Package persistence provides database storage implementations.
provider
Package provider provides AI provider implementations for text generation and embedding generation.
rasterization
Package rasterization converts document pages to images.
internal
config
Package config provides application configuration.
database
Package database provides database connection and session management using GORM.
log
Package log provides structured logging with correlation IDs.
mcp
Package mcp provides Model Context Protocol server functionality.
testdb
Package testdb provides a shared test database helper for fast, realistic testing against an in-memory SQLite database.
tools
download-ort command
Build-time tool that downloads the ONNX Runtime shared library and the HuggingFace tokenizers static library for the current platform.
