Documentation
Overview
Package documentloaders provides document loading utilities for RAG applications. It includes loaders for git repositories and other document sources with support for streaming, batch processing, and memory protection.
Index
- Variables
- type CLICommandLoader
- type FileData
- type GitLoader
- type GitLoaderOption
- func WithBatchSize(size int) GitLoaderOption
- func WithExcludeDirs(dirs []string) GitLoaderOption
- func WithExcludeExts(exts []string) GitLoaderOption
- func WithGeneratedCodeDetection(enable bool) GitLoaderOption
- func WithIncludeExts(exts []string) GitLoaderOption
- func WithLogger(logger *slog.Logger) GitLoaderOption
- func WithMaxMemoryBuffer(bytes int64) GitLoaderOption
- func WithWorkerCount(count int) GitLoaderOption
- type Loader
- type RemoteGitRepoLoader
Variables
var (
	// ErrInvalidPath is returned when the repository path is invalid.
	ErrInvalidPath = errors.New("documentloaders: invalid repository path")
	// ErrNilRegistry is returned when the parser registry is nil.
	ErrNilRegistry = errors.New("documentloaders: parser registry is nil")
	// ErrPathNotExist is returned when the path does not exist.
	ErrPathNotExist = errors.New("documentloaders: path does not exist")
	// ErrMemoryLimitExceeded is returned when memory limit is exceeded during loading.
	ErrMemoryLimitExceeded = errors.New("documentloaders: memory limit exceeded")
)
Error variables for document loading operations.
Types
type CLICommandLoader (added in v0.2.0)
func NewCLICommandLoader (added in v0.2.0)
func NewCLICommandLoader(command string, args ...string) *CLICommandLoader
type FileData (added in v0.15.0)
type FileData struct {
	// Path is the file path relative to the repository root.
	Path string
	// Content is the file content.
	Content string
	// FileInfo contains file metadata.
	FileInfo fs.FileInfo
}
FileData is an in-memory representation of a file to be processed.
type GitLoader
type GitLoader struct {
	// contains filtered or unexported fields
}
GitLoader loads and processes documents from a git repository on the local file system. It supports batch processing, parallel file processing, and memory protection.
func NewGit
func NewGit(path string, registry parsers.ParserRegistry, opts ...GitLoaderOption) (*GitLoader, error)
NewGit creates a new GitLoader for the specified repository path. It returns an error if the path is invalid, the registry is nil, or the path does not exist.
type GitLoaderOption (added in v0.2.0)
type GitLoaderOption func(*gitLoaderOptions)
GitLoaderOption configures a GitLoader.
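GitLoaderOption follows the functional options pattern: each With... function returns a closure that mutates an unexported options struct. The sketch below illustrates the pattern in isolation. The loaderOptions struct, its fields, the lowercase option functions, and the default values are all hypothetical stand-ins, since the real gitLoaderOptions type is unexported and its defaults are not documented here.

```go
package main

import "fmt"

// loaderOptions is a hypothetical stand-in for the unexported
// gitLoaderOptions struct; the real fields are not documented.
type loaderOptions struct {
	batchSize   int
	workerCount int
}

// option has the same shape as GitLoaderOption: a function that
// mutates the options struct it receives.
type option func(*loaderOptions)

// withBatchSize returns an option that overrides the batch size.
func withBatchSize(size int) option {
	return func(o *loaderOptions) { o.batchSize = size }
}

// withWorkerCount returns an option that overrides the worker count.
func withWorkerCount(count int) option {
	return func(o *loaderOptions) { o.workerCount = count }
}

// applyOptions starts from defaults (values chosen for illustration
// only) and applies each option in order, so later options win.
func applyOptions(opts ...option) loaderOptions {
	o := loaderOptions{batchSize: 100, workerCount: 4}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	o := applyOptions(withBatchSize(50))
	fmt.Printf("batch=%d workers=%d\n", o.batchSize, o.workerCount)
}
```

With the real package, the equivalent call would pass options as trailing arguments, for example NewGit(path, registry, WithBatchSize(50), WithWorkerCount(8)).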
func WithBatchSize (added in v0.15.0)
func WithBatchSize(size int) GitLoaderOption
WithBatchSize sets the batch size for document processing.
func WithExcludeDirs (added in v0.4.0)
func WithExcludeDirs(dirs []string) GitLoaderOption
WithExcludeDirs sets directory names to exclude from loading.
func WithExcludeExts (added in v0.4.0)
func WithExcludeExts(exts []string) GitLoaderOption
WithExcludeExts sets file extensions to exclude from loading. Extensions can be provided with or without the leading dot.
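Accepting extensions with or without the leading dot implies some normalization before comparison. One way this could work is sketched below; normalizeExt and excluded are hypothetical helpers, and the loader's actual matching rules (for example, case sensitivity) are not documented here.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// normalizeExt returns ext with a single leading dot, so that "go"
// and ".go" compare equal. Empty input stays empty.
func normalizeExt(ext string) string {
	ext = strings.TrimSpace(ext)
	if ext == "" {
		return ""
	}
	if !strings.HasPrefix(ext, ".") {
		return "." + ext
	}
	return ext
}

// excluded reports whether the extension of path matches any entry
// in exts after normalization. Matching here is case-sensitive.
func excluded(path string, exts []string) bool {
	fileExt := filepath.Ext(path)
	for _, e := range exts {
		if n := normalizeExt(e); n != "" && n == fileExt {
			return true
		}
	}
	return false
}

func main() {
	exts := []string{"png", ".jpg"}
	fmt.Println(excluded("img/logo.png", exts))
	fmt.Println(excluded("img/photo.jpg", exts))
	fmt.Println(excluded("main.go", exts))
}
```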
func WithGeneratedCodeDetection (added in v0.15.0)
func WithGeneratedCodeDetection(enable bool) GitLoaderOption
WithGeneratedCodeDetection enables or disables detection of auto-generated code. When enabled, files detected as generated will be skipped.
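The heuristics this option uses are not documented here. As an illustration of what such detection can look like, the sketch below checks for the standard Go marker comment ("// Code generated ... DO NOT EDIT.") before the package clause. The isGenerated function is hypothetical, and the real loader may use different or additional signals.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// generatedRe is the marker pattern from the Go convention for
// generated source files.
var generatedRe = regexp.MustCompile(`^// Code generated .* DO NOT EDIT\.$`)

// isGenerated reports whether content carries the Go generated-code
// marker before its package clause. It is a simplified check that
// only understands Go sources.
func isGenerated(content string) bool {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := sc.Text()
		if generatedRe.MatchString(line) {
			return true
		}
		// Stop at the package clause: the marker is expected above it.
		if strings.HasPrefix(line, "package ") {
			return false
		}
	}
	return false
}

func main() {
	gen := "// Code generated by protoc-gen-go. DO NOT EDIT.\n\npackage pb\n"
	hand := "// Package main is written by hand.\npackage main\n"
	fmt.Println(isGenerated(gen), isGenerated(hand))
}
```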
func WithIncludeExts (added in v0.2.0)
func WithIncludeExts(exts []string) GitLoaderOption
WithIncludeExts sets file extensions to include in loading. If set, only files with these extensions will be loaded.
func WithLogger (added in v0.2.0)
func WithLogger(logger *slog.Logger) GitLoaderOption
WithLogger sets the logger for the loader.
func WithMaxMemoryBuffer (added in v0.15.0)
func WithMaxMemoryBuffer(bytes int64) GitLoaderOption
WithMaxMemoryBuffer sets the maximum memory buffer in bytes. Processing will pause when memory usage exceeds this limit.
func WithWorkerCount (added in v0.15.0)
func WithWorkerCount(count int) GitLoaderOption
WithWorkerCount sets the number of parallel workers for file processing.
type Loader
type Loader interface {
	// Load loads all documents from the source.
	Load(ctx context.Context) ([]schema.Document, error)
	// LoadAndProcessStream loads documents in batches and processes them.
	LoadAndProcessStream(ctx context.Context, processFn func(ctx context.Context, docs []schema.Document) error) error
}
Loader is the interface for document loaders.
type RemoteGitRepoLoader (added in v0.2.0)
type RemoteGitRepoLoader struct {
	RepoURL string
	ParserRegistry parsers.ParserRegistry
	Logger *slog.Logger
}
func NewRemoteGitRepoLoader (added in v0.2.0)
func NewRemoteGitRepoLoader(repoURL string, registry parsers.ParserRegistry, logger *slog.Logger) *RemoteGitRepoLoader