Documentation ¶
Overview ¶
Package kodit provides a library for code understanding, indexing, and search.
Kodit indexes Git repositories, extracts semantic code snippets using AST parsing, and provides hybrid search (BM25 + vector embeddings) with LLM-powered enrichments.
Basic usage:
	client, err := kodit.New(
		kodit.WithSQLite(".kodit/data.db"),
		kodit.WithOpenAI(os.Getenv("OPENAI_API_KEY")),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Index a repository
	_, err = client.Repositories.Add(ctx, &service.RepositoryAddParams{
		URL: "https://github.com/kubernetes/kubernetes",
	})
	if err != nil {
		log.Fatal(err)
	}

	// Hybrid search
	results, err := client.Search.Query(ctx, "create a deployment",
		service.WithSemanticWeight(0.7),
		service.WithLimit(10),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Iterate results
	for _, snippet := range results.Snippets() {
		fmt.Println(snippet.Path(), snippet.Name())
	}
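To give a feel for what a semantic weight such as 0.7 could mean, here is a minimal sketch of a weighted linear blend of a keyword score and a semantic score. It is an illustration only: the `hit` type and the `blend` and `rank` functions are hypothetical names invented for this example, and this page does not state how kodit actually fuses BM25 and vector results.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a hypothetical search candidate whose scores are assumed to be
// normalized to the range [0, 1] already.
type hit struct {
	Name     string
	Keyword  float64 // e.g. a normalized BM25 score
	Semantic float64 // e.g. a cosine similarity
}

// blend returns a weighted linear combination of the two scores.
func blend(h hit, semanticWeight float64) float64 {
	return semanticWeight*h.Semantic + (1-semanticWeight)*h.Keyword
}

// rank returns a sorted copy of hits, best blended score first,
// truncated to at most limit entries.
func rank(hits []hit, semanticWeight float64, limit int) []hit {
	out := append([]hit(nil), hits...)
	sort.SliceStable(out, func(i, j int) bool {
		return blend(out[i], semanticWeight) > blend(out[j], semanticWeight)
	})
	if limit >= 0 && len(out) > limit {
		out = out[:limit]
	}
	return out
}

func main() {
	hits := []hit{
		{Name: "NewDeployment", Keyword: 0.9, Semantic: 0.4},
		{Name: "createDeploy", Keyword: 0.3, Semantic: 0.9},
	}
	for _, h := range rank(hits, 0.7, 10) {
		fmt.Printf("%s %.2f\n", h.Name, blend(h, 0.7))
	}
}
```

With a weight of 0.7 the semantically closer candidate ranks first in this toy data; with a weight of 0 the ordering falls back to the keyword score alone.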
Pipeline presets ¶
By default, kodit runs all indexing operations including LLM enrichments when a text provider is configured. Use WithRAGPipeline to skip LLM enrichments and run only the operations needed for retrieval-augmented generation:
	client, err := kodit.New(
		kodit.WithSQLite(".kodit/data.db"),
		kodit.WithRAGPipeline(), // skip wiki, summaries, architecture docs, etc.
	)
Use WithFullPipeline to explicitly require all enrichments (returns an error if no text provider is configured):
	client, err := kodit.New(
		kodit.WithSQLite(".kodit/data.db"),
		kodit.WithOpenAI(os.Getenv("OPENAI_API_KEY")),
		kodit.WithFullPipeline(),
	)
Index ¶
- Variables
- func NewScopedMCPHandler(client *Client, repoIDs []int64) http.Handler
- type Client
- type MCPServer
- type Option
- func WithAPIKeys(keys ...string) Option
- func WithAnthropic(apiKey string) Option
- func WithAnthropicConfig(cfg provider.AnthropicConfig) Option
- func WithChunkParams(params chunking.ChunkParams) Option
- func WithCloneDir(dir string) Option
- func WithCloser(c io.Closer) Option
- func WithDataDir(dir string) Option
- func WithEmbeddingBudget(b search.TokenBudget) Option
- func WithEmbeddingParallelism(n int) Option
- func WithEmbeddingProvider(p search.Embedder) Option
- func WithEnricherParallelism(n int) Option
- func WithEnrichmentBudget(b search.TokenBudget) Option
- func WithEnrichmentParallelism(n int) Option
- func WithFullPipeline() Option
- func WithLogger(l zerolog.Logger) Option
- func WithModelDir(dir string) Option
- func WithOpenAI(apiKey string) Option
- func WithOpenAIConfig(cfg provider.OpenAIConfig) Option
- func WithPeriodicSyncConfig(cfg config.PeriodicSyncConfig) Option
- func WithPostgresVectorchord(dsn string) Option
- func WithRAGPipeline() Option
- func WithSQLite(path string) Option
- func WithSkipProviderValidation() Option
- func WithTextProvider(p provider.TextGenerator) Option
- func WithVisionEmbedder(e search.Embedder) Option
- func WithWorkerCount(n int) Option
- func WithWorkerPollPeriod(d time.Duration) Option
- type Parameter
- type Tool
Constants ¶
This section is empty.
Variables ¶
	var (
		// ErrEmptySource indicates a source with no content to process.
		ErrEmptySource = errors.New("kodit: source is empty")

		// ErrNotFound indicates a requested resource was not found.
		ErrNotFound = errors.New("kodit: not found")

		// ErrValidation indicates a validation error.
		ErrValidation = errors.New("kodit: validation error")

		// ErrConflict indicates a conflict with existing data.
		ErrConflict = errors.New("kodit: conflict")

		// ErrNoDatabase indicates no database was configured.
		ErrNoDatabase = errors.New("kodit: no database configured")

		// ErrNoProvider indicates no AI provider was configured.
		ErrNoProvider = errors.New("kodit: no AI provider configured")

		// ErrProviderNotCapable indicates the provider lacks required capability.
		ErrProviderNotCapable = errors.New("kodit: provider does not support required capability")

		// ErrClientClosed is the canonical error for a closed client.
		// It references the service-level error so errors.Is works across packages.
		ErrClientClosed = service.ErrClientClosed
	)
Exported errors for library consumers.
Functions ¶
func NewScopedMCPHandler ¶ added in v1.1.7
func NewScopedMCPHandler(client *Client, repoIDs []int64) http.Handler
NewScopedMCPHandler creates an HTTP handler for the MCP protocol scoped to the given repository IDs. Only repositories in repoIDs are visible through the returned handler's tools and searches.
When repoIDs is nil or empty, the handler is unscoped and behaves identically to the full MCP endpoint, which sees all repositories.
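The scoping rule, including the nil-or-empty case, can be sketched as an allow-list check. The `scope` type and its methods are hypothetical and exist only for this example; they are not part of kodit.

```go
package main

import "fmt"

// scope is a hypothetical allow-list of repository IDs.
// An empty scope means "unscoped": every repository is visible.
type scope map[int64]struct{}

// newScope builds a scope from a slice; nil and empty slices both yield
// an empty (unscoped) set.
func newScope(repoIDs []int64) scope {
	s := make(scope, len(repoIDs))
	for _, id := range repoIDs {
		s[id] = struct{}{}
	}
	return s
}

// visible reports whether a repository may be seen through the scope.
func (s scope) visible(repoID int64) bool {
	if len(s) == 0 {
		return true
	}
	_, ok := s[repoID]
	return ok
}

// filter keeps only the visible IDs, preserving their order.
func (s scope) filter(ids []int64) []int64 {
	var out []int64
	for _, id := range ids {
		if s.visible(id) {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	all := []int64{1, 2, 3}
	fmt.Println(newScope([]int64{2, 3}).filter(all))
	fmt.Println(newScope(nil).filter(all))
}
```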
Types ¶
type Client ¶
type Client struct {
// Public resource fields (direct service access)
Repositories *service.Repository
Commits *service.Commit
Tags *service.Tag
Files *service.File
Blobs *service.Blob
Enrichments *service.Enrichment
Tasks *service.Queue
Tracking *service.Tracking
Search *service.Search
Grep *service.Grep
Pipelines *service.Pipeline
// MCPServer describes the MCP server's tools and instructions.
MCPServer MCPServer
// contains filtered or unexported fields
}
Client is the main entry point for the kodit library. The background worker starts automatically on creation.
Access resources via struct fields:
	client.Repositories.Find(ctx)
	client.Commits.Find(ctx, repository.WithRepoID(id))
	client.Search.Query(ctx, "query")
func New ¶
New creates a new Client with the given options. The background worker is started automatically.
func (*Client) Rasterizers ¶ added in v1.3.0
func (c *Client) Rasterizers() *rasterization.Registry
Rasterizers returns the document rasterization registry, or nil if unavailable.
func (*Client) TextRenderers ¶ added in v1.3.1
func (c *Client) TextRenderers() *extraction.TextRendererRegistry
TextRenderers returns the document text rendering registry.
func (*Client) WorkerIdle ¶
WorkerIdle reports whether the background worker has no in-flight tasks.
type MCPServer ¶ added in v1.1.3
type MCPServer struct {
// contains filtered or unexported fields
}
MCPServer describes the metadata of a kodit MCP server: its usage instructions and the tools it provides.
func NewMCPServer ¶ added in v1.1.3
NewMCPServer creates an MCPServer.
func (MCPServer) Instructions ¶ added in v1.1.3
Instructions returns the server's usage instructions.
type Option ¶
type Option func(*clientConfig)
Option configures the Client.
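Option follows the functional options pattern: each option is a function that mutates a configuration struct, and the constructor applies them in order. The sketch below mirrors that shape with hypothetical lowercase names (`config`, `option`, `withParallelism`, `withDataDir`, `newConfig`); the fields and the `.kodit` default are assumptions, since clientConfig is unexported.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the unexported clientConfig.
type config struct {
	parallelism int
	dataDir     string
}

// option has the same shape as kodit.Option: a function that mutates config.
type option func(*config)

// withParallelism sets the parallelism; non-positive values are ignored,
// which leaves the previous value (or the default) in place.
func withParallelism(n int) option {
	return func(c *config) {
		if n > 0 {
			c.parallelism = n
		}
	}
}

// withDataDir sets the data directory.
func withDataDir(dir string) option {
	return func(c *config) { c.dataDir = dir }
}

// newConfig starts from defaults and applies options in order,
// so a later option overrides an earlier one.
func newConfig(opts ...option) config {
	c := config{parallelism: 1, dataDir: ".kodit"} // assumed defaults
	for _, o := range opts {
		o(&c)
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", newConfig())
	fmt.Printf("%+v\n", newConfig(withParallelism(4), withParallelism(0), withDataDir("/tmp/kodit")))
}
```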
func WithAPIKeys ¶
func WithAPIKeys(keys ...string) Option
WithAPIKeys sets the API keys for HTTP API authentication.
func WithAnthropic ¶
func WithAnthropic(apiKey string) Option
WithAnthropic sets Anthropic Claude as the text generation provider. Requires a separate embedding provider since Anthropic doesn't provide embeddings.
func WithAnthropicConfig ¶
func WithAnthropicConfig(cfg provider.AnthropicConfig) Option
WithAnthropicConfig sets Anthropic Claude with custom configuration.
func WithChunkParams ¶
func WithChunkParams(params chunking.ChunkParams) Option
WithChunkParams sets the parameters used to split text into chunks for indexing.
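The chunking package is described in the directory listing as fixed-size text chunking with overlap. A minimal sketch of that technique follows. The `chunkParams` struct and its `Size` and `Overlap` fields are hypothetical, since the real chunking.ChunkParams fields are not shown on this page, and measuring size in runes is an assumption.

```go
package main

import "fmt"

// chunkParams is a hypothetical stand-in for chunking.ChunkParams.
type chunkParams struct {
	Size    int // runes per chunk
	Overlap int // runes shared between consecutive chunks
}

// chunk splits text into fixed-size windows; each window starts
// Size-Overlap runes after the previous one. The last chunk may be shorter.
func chunk(text string, p chunkParams) ([]string, error) {
	if p.Size <= 0 || p.Overlap < 0 || p.Overlap >= p.Size {
		return nil, fmt.Errorf("invalid chunk params: %+v", p)
	}
	runes := []rune(text)
	step := p.Size - p.Overlap
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + p.Size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break // the text is fully covered
		}
	}
	return chunks, nil
}

func main() {
	chunks, err := chunk("abcdefghij", chunkParams{Size: 4, Overlap: 1})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(chunks) // prints [abcd defg ghij]
}
```

The overlap means a sentence or identifier cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of indexing some text twice.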
func WithCloneDir ¶
func WithCloneDir(dir string) Option
WithCloneDir sets the directory where repositories are cloned. If not specified, defaults to {dataDir}/repos.
func WithCloser ¶
func WithCloser(c io.Closer) Option
WithCloser registers a resource to be closed when the Client shuts down.
func WithDataDir ¶
func WithDataDir(dir string) Option
WithDataDir sets the data directory for cloned repositories and database storage.
func WithEmbeddingBudget ¶
func WithEmbeddingBudget(b search.TokenBudget) Option
WithEmbeddingBudget sets the token budget for code embedding batches.
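One common way to apply a token budget is to pack inputs greedily into batches whose estimated token total stays under a limit. The sketch below illustrates that idea only: `estimateTokens` and `batchByBudget` are hypothetical, the four-bytes-per-token estimate is a crude heuristic, and the real search.TokenBudget type and its semantics are not shown on this page.

```go
package main

import "fmt"

// estimateTokens is a crude heuristic (about four bytes per token, rounded
// up); a real tokenizer would give different numbers.
func estimateTokens(s string) int {
	return (len(s) + 3) / 4
}

// batchByBudget greedily packs texts, in order, into batches whose estimated
// token total stays within maxTokens. A single text that exceeds the budget
// on its own is placed alone in a batch rather than dropped.
func batchByBudget(texts []string, maxTokens int) [][]string {
	var batches [][]string
	var current []string
	used := 0
	for _, t := range texts {
		n := estimateTokens(t)
		if len(current) > 0 && used+n > maxTokens {
			batches = append(batches, current)
			current, used = nil, 0
		}
		current = append(current, t)
		used += n
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	texts := []string{"func a(){}", "func b(){}", "func c(){}", "x"}
	for i, b := range batchByBudget(texts, 6) {
		fmt.Println(i, b)
	}
}
```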
func WithEmbeddingParallelism ¶
func WithEmbeddingParallelism(n int) Option
WithEmbeddingParallelism sets how many embedding batches are dispatched concurrently. Defaults to 1. Values <= 0 are ignored.
func WithEmbeddingProvider ¶
func WithEmbeddingProvider(p search.Embedder) Option
WithEmbeddingProvider sets a custom embedding provider.
func WithEnricherParallelism ¶
func WithEnricherParallelism(n int) Option
WithEnricherParallelism sets how many enrichment LLM requests are dispatched concurrently. Defaults to 1. Values <= 0 are ignored.
func WithEnrichmentBudget ¶
func WithEnrichmentBudget(b search.TokenBudget) Option
WithEnrichmentBudget sets the token budget for enrichment embedding batches.
func WithEnrichmentParallelism ¶
func WithEnrichmentParallelism(n int) Option
WithEnrichmentParallelism sets how many enrichment embedding batches are dispatched concurrently. Defaults to 1. Values <= 0 are ignored.
func WithFullPipeline ¶ added in v1.2.1
func WithFullPipeline() Option
WithFullPipeline runs all indexing operations including LLM enrichments. A text provider must be configured or New() returns an error. This is the default when a text provider is configured.
func WithModelDir ¶
func WithModelDir(dir string) Option
WithModelDir sets the directory where built-in model files are stored. Defaults to {dataDir}/models if not specified.
func WithOpenAI ¶
func WithOpenAI(apiKey string) Option
WithOpenAI sets OpenAI as the AI provider (text + embeddings).
func WithOpenAIConfig ¶
func WithOpenAIConfig(cfg provider.OpenAIConfig) Option
WithOpenAIConfig sets OpenAI with custom configuration.
func WithPeriodicSyncConfig ¶
func WithPeriodicSyncConfig(cfg config.PeriodicSyncConfig) Option
WithPeriodicSyncConfig sets the periodic sync configuration.
func WithPostgresVectorchord ¶
func WithPostgresVectorchord(dsn string) Option
WithPostgresVectorchord configures PostgreSQL with VectorChord extension. VectorChord provides both BM25 and vector search.
func WithRAGPipeline ¶ added in v1.2.1
func WithRAGPipeline() Option
WithRAGPipeline configures the indexing pipeline for RAG use cases. Snippet extraction, BM25 indexing, code embeddings, and AST-based API docs run. All LLM enrichments (commit descriptions, architecture docs, database schema, cookbook, wiki) are skipped even if a text provider is configured.
func WithSQLite ¶
func WithSQLite(path string) Option
WithSQLite configures SQLite as the database. BM25 search uses FTS5; vector search uses the configured embedding provider.
func WithSkipProviderValidation ¶
func WithSkipProviderValidation() Option
WithSkipProviderValidation skips the provider configuration validation. This is intended for testing only. In production, embedding and text providers are required for full functionality.
func WithTextProvider ¶
func WithTextProvider(p provider.TextGenerator) Option
WithTextProvider sets a custom text generation provider.
func WithVisionEmbedder ¶ added in v1.3.1
func WithVisionEmbedder(e search.Embedder) Option
WithVisionEmbedder sets the vision embedder. The embedder must accept both image items and text items and produce vectors in the same embedding space. When set, it replaces the local SigLIP2 model.
func WithWorkerCount ¶
func WithWorkerCount(n int) Option
WithWorkerCount sets the number of background worker goroutines. Defaults to 1 if not specified.
func WithWorkerPollPeriod ¶
func WithWorkerPollPeriod(d time.Duration) Option
WithWorkerPollPeriod sets how often the background worker checks for new tasks. Defaults to 1 second. Lower values speed up task processing at the cost of more frequent polling, which is useful in tests.
type Parameter ¶ added in v1.1.3
type Parameter struct {
// contains filtered or unexported fields
}
Parameter describes a single parameter accepted by an MCP tool.
func NewParameter ¶ added in v1.1.3
NewParameter creates a Parameter.
func (Parameter) Description ¶ added in v1.1.3
Description returns the parameter description.
type Tool ¶ added in v1.1.3
type Tool struct {
// contains filtered or unexported fields
}
Tool describes an MCP tool with its parameters.
func (Tool) Description ¶ added in v1.1.3
Description returns the tool description.
func (Tool) Parameters ¶ added in v1.1.3
Parameters returns a copy of the tool's parameters.
Directories ¶
| Path | Synopsis |
|---|---|
| application/handler | Package handler provides task handlers for processing queued operations. |
| application/handler/enrichment | Package enrichment provides task handlers for enrichment operations. |
| application/service | Package service provides application layer services that orchestrate domain operations. |
| clients/go | Package kodit provides primitives to interact with the openapi HTTP API. |
| cmd/download-model (command) | Standalone tool that converts the st-codesearch-distilroberta-base model to ONNX format for hugot embedding. |
| cmd/download-siglip2 (command) | Standalone tool that downloads the pre-converted SigLIP2 ONNX model from onnx-community on Hugging Face. |
| cmd/kodit (command) | Package main is the entry point for the kodit CLI. |
| domain/enrichment | Package enrichment provides domain types for AI-generated semantic metadata. |
| domain/repository | Package repository provides Git repository domain types. |
| domain/search | Package search provides search domain types for hybrid code retrieval. |
| domain/service | Package service provides domain service interfaces. |
| domain/sourcelocation | Package sourcelocation provides metadata about where an enrichment's content originates within a source file. |
| domain/task | Package task provides task queue domain types for async work processing. |
| domain/tracking | Package tracking provides progress tracking and reporting types for long-running tasks. |
| infrastructure/api | Package api provides HTTP server and API documentation. |
| infrastructure/api/jsonapi | Package jsonapi provides JSON:API specification compliant types for API responses. |
| infrastructure/api/middleware | Package middleware provides HTTP middleware for the API server. |
| infrastructure/api/v1 | Package v1 provides the v1 API routes. |
| infrastructure/api/v1/dto | Package dto provides data transfer objects for the API layer. |
| infrastructure/chunking | Package chunking provides fixed-size text chunking with overlap for RAG indexing. |
| infrastructure/enricher | Package enricher provides AI-powered enrichment generation. |
| infrastructure/enricher/example | Package example provides extraction of code examples from documentation. |
| infrastructure/git | Package git provides Git repository infrastructure implementations. |
| infrastructure/persistence | Package persistence provides database storage implementations. |
| infrastructure/provider | Package provider provides AI provider implementations for text generation and embedding generation. |
| infrastructure/rasterization | Package rasterization converts document pages to images. |
| internal/config | Package config provides application configuration. |
| internal/database | Package database provides database connection and session management using GORM. |
| internal/log | Package log provides structured logging with correlation IDs. |
| internal/mcp | Package mcp provides Model Context Protocol server functionality. |
| internal/testdb | Package testdb provides a shared test database helper for fast, realistic testing against an in-memory SQLite database. |
| tools/download-ort (command) | Build-time tool that downloads the ONNX Runtime shared library and the HuggingFace tokenizers static library for the current platform. |
