documentloaders

package
v0.27.0 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 25, 2026 License: MIT Imports: 15 Imported by: 0

Documentation

Overview

Package documentloaders provides document loading utilities for RAG applications. It includes loaders for git repositories and other document sources with support for streaming, batch processing, and memory protection.

Constants

This section is empty.

Variables

var (
	// ErrInvalidPath is returned when the repository path is invalid.
	ErrInvalidPath = errors.New("documentloaders: invalid repository path")
	// ErrNilRegistry is returned when the parser registry is nil.
	ErrNilRegistry = errors.New("documentloaders: parser registry is nil")
	// ErrPathNotExist is returned when the path does not exist.
	ErrPathNotExist = errors.New("documentloaders: path does not exist")
	// ErrMemoryLimitExceeded is returned when memory limit is exceeded during loading.
	ErrMemoryLimitExceeded = errors.New("documentloaders: memory limit exceeded")
)

Error variables for document loading operations.
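Because these are sentinel errors, callers can test for them with errors.Is even after the error has been wrapped. The sketch below redeclares two of the variables locally so it compiles on its own; openRepo and classify are hypothetical helpers, not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins that mirror two of the package's sentinel errors so the
// sketch compiles on its own. Real code would reference the exported
// variables from documentloaders instead.
var (
	ErrInvalidPath  = errors.New("documentloaders: invalid repository path")
	ErrPathNotExist = errors.New("documentloaders: path does not exist")
)

// openRepo is a hypothetical helper that wraps a sentinel with extra context.
func openRepo(path string) error {
	if path == "" {
		return fmt.Errorf("open repo %q: %w", path, ErrInvalidPath)
	}
	return nil
}

// classify maps an error to a short label using errors.Is, which sees
// through wrapping done with %w.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrInvalidPath):
		return "invalid path"
	case errors.Is(err, ErrPathNotExist):
		return "missing path"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(classify(openRepo("")))
	fmt.Println(classify(openRepo("/tmp/repo")))
}
```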

Functions

This section is empty.

Types

type CLICommandLoader added in v0.2.0

type CLICommandLoader struct {
	Command string
	Args    []string
}

func NewCLICommandLoader added in v0.2.0

func NewCLICommandLoader(command string, args ...string) *CLICommandLoader

func (*CLICommandLoader) Load added in v0.2.0

type FileData added in v0.15.0

type FileData struct {
	// Path is the file path relative to the repository root.
	Path string
	// Content is the file content.
	Content string
	// FileInfo contains file metadata.
	FileInfo fs.FileInfo
}

FileData is an in-memory representation of a file to be processed.

type GitLoader

type GitLoader struct {
	// contains filtered or unexported fields
}

GitLoader loads and processes documents from a git repository on the local file system. It supports batch processing, parallel file processing, and memory protection.

func NewGit

func NewGit(path string, registry parsers.ParserRegistry, opts ...GitLoaderOption) (*GitLoader, error)

NewGit creates a new GitLoader for the specified repository path. It returns an error if the path is invalid, the registry is nil, or the path does not exist.

func (*GitLoader) Load

func (g *GitLoader) Load(ctx context.Context) ([]schema.Document, error)

func (*GitLoader) LoadAndProcessStream added in v0.10.0

func (g *GitLoader) LoadAndProcessStream(ctx context.Context, processFn func(ctx context.Context, docs []schema.Document) error) error

LoadAndProcessStream loads documents through a pipeline with controlled memory usage and passes each batch to processFn.
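To illustrate the callback contract, here is a minimal sketch of batch delivery to a processFn-style function. Document is a local stand-in for schema.Document, and streamInBatches and batchSizes are hypothetical helpers; the real pipeline's internals are not shown in this documentation.

```go
package main

import (
	"context"
	"fmt"
)

// Document is a local stand-in for schema.Document (assumed shape).
type Document struct {
	PageContent string
}

// streamInBatches is a hypothetical helper: it hands docs to processFn in
// slices of at most batchSize and stops at the first error or cancellation.
func streamInBatches(ctx context.Context, docs []Document, batchSize int,
	processFn func(ctx context.Context, docs []Document) error) error {
	if batchSize <= 0 {
		return fmt.Errorf("batch size must be positive, got %d", batchSize)
	}
	for start := 0; start < len(docs); start += batchSize {
		if err := ctx.Err(); err != nil {
			return err
		}
		end := start + batchSize
		if end > len(docs) {
			end = len(docs)
		}
		if err := processFn(ctx, docs[start:end]); err != nil {
			return fmt.Errorf("process batch at %d: %w", start, err)
		}
	}
	return nil
}

// batchSizes records the size of every batch delivered for total documents.
func batchSizes(total, batchSize int) ([]int, error) {
	var sizes []int
	err := streamInBatches(context.Background(), make([]Document, total), batchSize,
		func(_ context.Context, batch []Document) error {
			sizes = append(sizes, len(batch))
			return nil
		})
	return sizes, err
}

func main() {
	sizes, err := batchSizes(5, 2)
	fmt.Println(sizes, err)
}
```

Only one batch is referenced by the callback at a time, which is what keeps memory bounded when the consumer writes each batch to a store and then drops it.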

type GitLoaderOption added in v0.2.0

type GitLoaderOption func(*gitLoaderOptions)

GitLoaderOption configures a GitLoader.
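The type above has the shape of Go's functional options pattern. The following sketch shows how such options are typically applied; loaderOptions, its fields, and the default values are assumptions for illustration, since gitLoaderOptions is unexported and its contents are not documented here.

```go
package main

import "fmt"

// loaderOptions is a hypothetical stand-in for the unexported
// gitLoaderOptions. Its fields and defaults are assumptions.
type loaderOptions struct {
	batchSize   int
	workerCount int
}

// Option mirrors the shape of GitLoaderOption: a function that mutates options.
type Option func(*loaderOptions)

// withBatchSize and withWorkerCount are local analogues of the exported options.
func withBatchSize(n int) Option   { return func(o *loaderOptions) { o.batchSize = n } }
func withWorkerCount(n int) Option { return func(o *loaderOptions) { o.workerCount = n } }

// applyOptions starts from assumed defaults and applies each option in order,
// so later options override earlier ones.
func applyOptions(opts ...Option) loaderOptions {
	o := loaderOptions{batchSize: 100, workerCount: 4}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	o := applyOptions(withBatchSize(50))
	fmt.Println(o.batchSize, o.workerCount)
}
```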

func WithBatchSize added in v0.15.0

func WithBatchSize(size int) GitLoaderOption

WithBatchSize sets the batch size for document processing.

func WithExcludeDirs added in v0.4.0

func WithExcludeDirs(dirs []string) GitLoaderOption

WithExcludeDirs sets directory names to exclude from loading.
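One common way to skip directories by name during a walk is to return fs.SkipDir from the walk callback. The sketch below uses an in-memory file system from testing/fstest so it runs without touching disk; listFiles and demoList are hypothetical helpers, and whether GitLoader matches on the base name or the full path is an assumption.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// listFiles walks fsys and returns every file path, skipping any directory
// whose base name appears in excludeDirs (assumed matching rule).
func listFiles(fsys fs.FS, excludeDirs []string) ([]string, error) {
	skip := make(map[string]bool, len(excludeDirs))
	for _, d := range excludeDirs {
		skip[d] = true
	}
	var files []string
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if skip[d.Name()] {
				return fs.SkipDir // do not descend into this directory
			}
			return nil
		}
		files = append(files, p)
		return nil
	})
	return files, err
}

// demoList builds a small in-memory tree and lists it with two exclusions.
func demoList() ([]string, error) {
	fsys := fstest.MapFS{
		"main.go":           {Data: []byte("package main")},
		"docs/readme.md":    {Data: []byte("# docs")},
		"vendor/lib/x.go":   {Data: []byte("package lib")},
		"node_modules/a.js": {Data: []byte("")},
	}
	return listFiles(fsys, []string{"vendor", "node_modules"})
}

func main() {
	files, err := demoList()
	fmt.Println(files, err)
}
```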

func WithExcludeExts added in v0.4.0

func WithExcludeExts(exts []string) GitLoaderOption

WithExcludeExts sets file extensions to exclude from loading. Extensions can be provided with or without the leading dot.
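Accepting extensions with or without the leading dot implies some normalization step before matching. A minimal sketch of one way to do that follows; normalizeExt and isExcluded are hypothetical helpers, and the case-insensitive comparison is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// normalizeExt lowercases an extension and ensures it has a leading dot.
// Lowercasing is an assumption made for this sketch.
func normalizeExt(ext string) string {
	ext = strings.ToLower(strings.TrimSpace(ext))
	if ext == "" {
		return ""
	}
	if !strings.HasPrefix(ext, ".") {
		ext = "." + ext
	}
	return ext
}

// isExcluded reports whether path's extension matches any excluded extension.
func isExcluded(path string, exts []string) bool {
	fileExt := strings.ToLower(filepath.Ext(path))
	for _, e := range exts {
		if n := normalizeExt(e); n != "" && n == fileExt {
			return true
		}
	}
	return false
}

func main() {
	exts := []string{"png", ".md"}
	fmt.Println(isExcluded("assets/logo.PNG", exts))
	fmt.Println(isExcluded("main.go", exts))
}
```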

func WithGeneratedCodeDetection added in v0.15.0

func WithGeneratedCodeDetection(enable bool) GitLoaderOption

WithGeneratedCodeDetection enables or disables detection of auto-generated code. When enabled, files detected as generated will be skipped.
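This documentation does not say how generated files are recognized. As an illustration only, the sketch below checks for the marker line that the Go toolchain documents for generated source ("// Code generated ... DO NOT EDIT.") appearing before the first non-comment, non-blank line. looksGenerated is a hypothetical helper, it ignores block comments for brevity, and the loader's real heuristic may differ or cover other languages.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// generatedRe is the marker pattern documented for generated Go source.
var generatedRe = regexp.MustCompile(`^// Code generated .* DO NOT EDIT\.$`)

// looksGenerated reports whether src carries the marker line before its
// first non-comment, non-blank line. Block comments are not handled.
func looksGenerated(src string) bool {
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := strings.TrimRight(sc.Text(), "\r")
		if generatedRe.MatchString(line) {
			return true
		}
		trimmed := strings.TrimSpace(line)
		if trimmed == "" || strings.HasPrefix(trimmed, "//") {
			continue // still in the leading comment or blank region
		}
		return false // reached code without seeing the marker
	}
	return false
}

func main() {
	gen := "// Code generated by protoc-gen-go. DO NOT EDIT.\npackage pb\n"
	hand := "// Package app does things.\npackage app\n"
	fmt.Println(looksGenerated(gen), looksGenerated(hand))
}
```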

func WithIncludeExts added in v0.2.0

func WithIncludeExts(exts []string) GitLoaderOption

WithIncludeExts sets file extensions to include in loading. If set, only files with these extensions will be loaded.

func WithLogger added in v0.2.0

func WithLogger(logger *slog.Logger) GitLoaderOption

WithLogger sets the logger for the loader.

func WithMaxMemoryBuffer added in v0.15.0

func WithMaxMemoryBuffer(bytes int64) GitLoaderOption

WithMaxMemoryBuffer sets the maximum memory buffer in bytes. Processing will pause when memory usage exceeds this limit.
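One way to get "pause when over the limit" behavior is a byte budget that producers must acquire before buffering content and release once it has been processed. The sketch below is an assumption about the general technique, not the package's implementation; byteBudget and peakUsage are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// byteBudget blocks callers while the buffered byte count would exceed limit.
type byteBudget struct {
	mu    sync.Mutex
	cond  *sync.Cond
	limit int64
	used  int64
	peak  int64 // highest value of used observed, for demonstration
}

func newByteBudget(limit int64) *byteBudget {
	b := &byteBudget{limit: limit}
	b.cond = sync.NewCond(&b.mu)
	return b
}

// acquire waits until n bytes fit under the limit. A request larger than the
// limit is admitted once the buffer is empty, so it cannot block forever.
func (b *byteBudget) acquire(n int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for b.used > 0 && b.used+n > b.limit {
		b.cond.Wait()
	}
	b.used += n
	if b.used > b.peak {
		b.peak = b.used
	}
}

// release returns n bytes to the budget and wakes any waiting callers.
func (b *byteBudget) release(n int64) {
	b.mu.Lock()
	b.used -= n
	b.mu.Unlock()
	b.cond.Broadcast()
}

// peakUsage runs one goroutine per item and reports the peak buffered bytes.
func peakUsage(limit int64, sizes []int64) int64 {
	b := newByteBudget(limit)
	var wg sync.WaitGroup
	for _, n := range sizes {
		wg.Add(1)
		go func(n int64) {
			defer wg.Done()
			b.acquire(n)
			b.release(n)
		}(n)
	}
	wg.Wait()
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.peak
}

func main() {
	peak := peakUsage(10, []int64{4, 4, 4, 4, 4})
	fmt.Println("peak within limit:", peak <= 10)
}
```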

func WithWorkerCount added in v0.15.0

func WithWorkerCount(count int) GitLoaderOption

WithWorkerCount sets the number of parallel workers for file processing.
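Parallel file processing with a fixed worker count is commonly built as a worker pool fed by a channel. The sketch below shows that general shape; processFiles is a hypothetical helper, and how GitLoader actually schedules work is not described in this documentation.

```go
package main

import (
	"fmt"
	"sync"
)

// processFiles applies fn to every path using the given number of workers.
// Results keep the input order because each worker writes to its own index.
func processFiles(paths []string, workers int, fn func(string) string) []string {
	if workers < 1 {
		workers = 1
	}
	results := make([]string, len(paths))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = fn(paths[i])
			}
		}()
	}

	// Feed indices to the workers, then signal that no more work is coming.
	for i := range paths {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	out := processFiles([]string{"a.go", "b.go", "c.go"}, 2, func(p string) string {
		return "parsed:" + p
	})
	fmt.Println(out)
}
```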

type Loader

type Loader interface {
	// Load loads all documents from the source.
	Load(ctx context.Context) ([]schema.Document, error)
	// LoadAndProcessStream loads documents in batches and processes them.
	LoadAndProcessStream(ctx context.Context, processFn func(ctx context.Context, docs []schema.Document) error) error
}

Loader is the interface for document loaders.
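As a rough illustration of what satisfying this interface involves, the sketch below implements both methods for an in-memory source and builds Load on top of LoadAndProcessStream. Document and Loader are redeclared locally so the example compiles on its own; sliceLoader and loadTexts are hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// Document is a local stand-in for schema.Document (assumed shape).
type Document struct {
	PageContent string
}

// Loader mirrors the interface above, using the local Document type.
type Loader interface {
	Load(ctx context.Context) ([]Document, error)
	LoadAndProcessStream(ctx context.Context, processFn func(ctx context.Context, docs []Document) error) error
}

// sliceLoader is a hypothetical in-memory loader that emits one document per batch.
type sliceLoader struct {
	docs []Document
}

// Compile-time check that *sliceLoader satisfies Loader.
var _ Loader = (*sliceLoader)(nil)

func (l *sliceLoader) LoadAndProcessStream(ctx context.Context, processFn func(ctx context.Context, docs []Document) error) error {
	for i := range l.docs {
		if err := ctx.Err(); err != nil {
			return err
		}
		if err := processFn(ctx, l.docs[i:i+1]); err != nil {
			return err
		}
	}
	return nil
}

// Load collects every streamed batch into a single slice.
func (l *sliceLoader) Load(ctx context.Context) ([]Document, error) {
	var all []Document
	err := l.LoadAndProcessStream(ctx, func(_ context.Context, batch []Document) error {
		all = append(all, batch...)
		return nil
	})
	return all, err
}

// loadTexts wraps the given strings in a sliceLoader and returns what Load yields.
func loadTexts(texts ...string) ([]string, error) {
	l := &sliceLoader{}
	for _, t := range texts {
		l.docs = append(l.docs, Document{PageContent: t})
	}
	docs, err := l.Load(context.Background())
	if err != nil {
		return nil, err
	}
	out := make([]string, 0, len(docs))
	for _, d := range docs {
		out = append(out, d.PageContent)
	}
	return out, nil
}

func main() {
	out, err := loadTexts("a", "b")
	fmt.Println(out, err)
}
```

Implementing Load in terms of the streaming method keeps a single code path for traversal, at the cost of holding every document in memory when Load is used.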

type RemoteGitRepoLoader added in v0.2.0

type RemoteGitRepoLoader struct {
	RepoURL        string
	ParserRegistry parsers.ParserRegistry
	Logger         *slog.Logger
}

func NewRemoteGitRepoLoader added in v0.2.0

func NewRemoteGitRepoLoader(repoURL string, registry parsers.ParserRegistry, logger *slog.Logger) *RemoteGitRepoLoader

func (*RemoteGitRepoLoader) Load added in v0.2.0
