loader

package
v0.1.0 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 20, 2026 License: MIT Imports: 12 Imported by: 0

Documentation

Overview

Package loader provides a unified DocumentLoader interface and common file loaders for the RAG pipeline.

It bridges the gap between raw data sources (files, URLs, APIs) and the rag.Document type used by chunkers, retrievers, and vector stores. Each loader reads a specific format and produces []rag.Document with appropriate metadata.

Supported formats out of the box:

  • Plain text (.txt)
  • Markdown (.md)
  • CSV (.csv)
  • JSON / JSONL (.json, .jsonl)

Use LoaderRegistry to route loading by file extension:

registry := loader.NewLoaderRegistry()
docs, err := registry.Load(ctx, "/path/to/data.csv")

Custom loaders can be registered for any extension:

registry.Register(".xml", myXMLLoader)

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ArxivSourceAdapter

type ArxivSourceAdapter struct {
	// contains filtered or unexported fields
}

ArxivSourceAdapter adapts sources.ArxivSource to the DocumentLoader interface. It searches arXiv for papers matching a query and converts each result into a rag.Document.

func NewArxivSourceAdapter

func NewArxivSourceAdapter(source *sources.ArxivSource, maxResults int) *ArxivSourceAdapter

NewArxivSourceAdapter creates an adapter around an existing ArxivSource.

func (*ArxivSourceAdapter) Load

func (a *ArxivSourceAdapter) Load(ctx context.Context, source string) ([]rag.Document, error)

Load interprets source as a search query and returns matching papers as Documents.

func (*ArxivSourceAdapter) SupportedTypes

func (a *ArxivSourceAdapter) SupportedTypes() []string

SupportedTypes returns an empty slice; this adapter is query-based, not file-based.

type CSVLoader

type CSVLoader struct {
	// contains filtered or unexported fields
}

CSVLoader loads CSV files. The first row is treated as a header that supplies the column names; each subsequent row, or each group of rows when RowsPerDocument is greater than 1, becomes a Document.

func NewCSVLoader

func NewCSVLoader(config CSVLoaderConfig) *CSVLoader

NewCSVLoader creates a CSVLoader with the given config.

func (*CSVLoader) Load

func (l *CSVLoader) Load(ctx context.Context, source string) ([]rag.Document, error)

Load reads a CSV file and returns Documents.

func (*CSVLoader) SupportedTypes

func (l *CSVLoader) SupportedTypes() []string

SupportedTypes returns the extensions handled by CSVLoader.

type CSVLoaderConfig

type CSVLoaderConfig struct {
	// Delimiter is the field separator. Defaults to ','.
	Delimiter rune
	// RowsPerDocument controls how many rows are grouped into a single Document.
	// 0 or 1 means each row becomes its own Document.
	RowsPerDocument int
	// ContentColumns lists column names (from the header) to include in Document.Content.
	// If empty, all columns are concatenated.
	ContentColumns []string
}

CSVLoaderConfig configures the CSV loader.

type DocumentLoader

type DocumentLoader interface {
	// Load reads the source and returns documents.
	// source is typically a file path, but loaders may interpret it as a URL or query.
	Load(ctx context.Context, source string) ([]rag.Document, error)

	// SupportedTypes returns the file extensions this loader handles (e.g. ".txt", ".md").
	SupportedTypes() []string
}

DocumentLoader is the unified interface for loading documents from any source.

type GitHubSourceAdapter

type GitHubSourceAdapter struct {
	// contains filtered or unexported fields
}

GitHubSourceAdapter adapts sources.GitHubSource to the DocumentLoader interface. It searches GitHub for repositories matching a query and converts each result into a rag.Document.

func NewGitHubSourceAdapter

func NewGitHubSourceAdapter(source *sources.GitHubSource, maxResults int) *GitHubSourceAdapter

NewGitHubSourceAdapter creates an adapter around an existing GitHubSource.

func (*GitHubSourceAdapter) Load

func (a *GitHubSourceAdapter) Load(ctx context.Context, source string) ([]rag.Document, error)

Load interprets source as a search query and returns matching repos as Documents.

func (*GitHubSourceAdapter) SupportedTypes

func (a *GitHubSourceAdapter) SupportedTypes() []string

SupportedTypes returns an empty slice; this adapter is query-based, not file-based.

type JSONLoader

type JSONLoader struct {
	// contains filtered or unexported fields
}

JSONLoader loads JSON files (a single object or an array) and JSONL files (one JSON value per line).

func NewJSONLoader

func NewJSONLoader(config JSONLoaderConfig) *JSONLoader

NewJSONLoader creates a JSONLoader.

func (*JSONLoader) Load

func (l *JSONLoader) Load(ctx context.Context, source string) ([]rag.Document, error)

Load reads a JSON or JSONL file and returns Documents.

func (*JSONLoader) SupportedTypes

func (l *JSONLoader) SupportedTypes() []string

SupportedTypes returns the extensions handled by JSONLoader.

type JSONLoaderConfig

type JSONLoaderConfig struct {
	// ContentField is the JSON field name to use as Document.Content.
	// If empty, the entire JSON object is serialized as content.
	ContentField string
	// IDField is the JSON field name to use as Document.ID.
	// If empty, a path-based ID is generated.
	IDField string
}

JSONLoaderConfig configures the JSON/JSONL loader.
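A minimal sketch of the JSON/JSONL behavior described above, under stated assumptions: JSONL is selected by the `.jsonl` extension, a non-string ContentField value is formatted with `fmt.Sprint`, a record missing ContentField is an error, and the path-based fallback ID has the form `path#index`. `loadJSONBytes`, `parseRecords`, `JSONConfig`, and the local `Document` type are hypothetical, not the package's API.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"path/filepath"
	"strings"
)

// Document is a hypothetical stand-in for rag.Document.
type Document struct {
	ID       string
	Content  string
	Metadata map[string]any
}

// JSONConfig mirrors the two fields of JSONLoaderConfig.
type JSONConfig struct {
	ContentField string
	IDField      string
}

// parseRecords decodes JSONL (one object per line), a JSON array of objects,
// or a single JSON object. bufio.Scanner's default 64 KiB line limit applies.
func parseRecords(data []byte, jsonl bool) ([]map[string]any, error) {
	if jsonl {
		var recs []map[string]any
		sc := bufio.NewScanner(bytes.NewReader(data))
		for sc.Scan() {
			line := bytes.TrimSpace(sc.Bytes())
			if len(line) == 0 {
				continue // tolerate blank lines
			}
			var m map[string]any
			if err := json.Unmarshal(line, &m); err != nil {
				return nil, err
			}
			recs = append(recs, m)
		}
		return recs, sc.Err()
	}
	trimmed := bytes.TrimSpace(data)
	if len(trimmed) > 0 && trimmed[0] == '[' {
		var recs []map[string]any
		if err := json.Unmarshal(trimmed, &recs); err != nil {
			return nil, err
		}
		return recs, nil
	}
	var m map[string]any
	if err := json.Unmarshal(trimmed, &m); err != nil {
		return nil, err
	}
	return []map[string]any{m}, nil
}

// loadJSONBytes chooses JSONL parsing by extension and maps each record to a Document.
func loadJSONBytes(path string, data []byte, cfg JSONConfig) ([]Document, error) {
	recs, err := parseRecords(data, strings.EqualFold(filepath.Ext(path), ".jsonl"))
	if err != nil {
		return nil, fmt.Errorf("parse %s: %w", path, err)
	}
	docs := make([]Document, 0, len(recs))
	for i, rec := range recs {
		var content string
		if cfg.ContentField == "" {
			raw, err := json.Marshal(rec) // whole object; map keys come out sorted
			if err != nil {
				return nil, err
			}
			content = string(raw)
		} else if v, ok := rec[cfg.ContentField]; ok {
			content = fmt.Sprint(v)
		} else {
			return nil, fmt.Errorf("%s record %d: missing field %q", path, i, cfg.ContentField)
		}
		id := fmt.Sprintf("%s#%d", path, i) // path-based fallback ID
		if v, ok := rec[cfg.IDField]; cfg.IDField != "" && ok {
			id = fmt.Sprint(v)
		}
		docs = append(docs, Document{ID: id, Content: content, Metadata: map[string]any{"source": path, "index": i}})
	}
	return docs, nil
}

func main() {
	data := []byte("{\"id\":\"a1\",\"text\":\"first\"}\n{\"id\":\"a2\",\"text\":\"second\"}\n")
	docs, err := loadJSONBytes("notes.jsonl", data, JSONConfig{ContentField: "text", IDField: "id"})
	if err != nil {
		panic(err)
	}
	for _, d := range docs {
		fmt.Println(d.ID, d.Content)
	}
}
```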

type LoaderRegistry

type LoaderRegistry struct {
	// contains filtered or unexported fields
}

LoaderRegistry routes Load calls to the appropriate DocumentLoader based on file extension.

func NewLoaderRegistry

func NewLoaderRegistry() *LoaderRegistry

NewLoaderRegistry creates a registry pre-populated with the built-in loaders.

func (*LoaderRegistry) Load

func (r *LoaderRegistry) Load(ctx context.Context, source string) ([]rag.Document, error)

Load determines the loader from the source's file extension and delegates to it.

func (*LoaderRegistry) Register

func (r *LoaderRegistry) Register(ext string, loader DocumentLoader)

Register adds or replaces a loader for the given file extension. ext should include the leading dot (e.g. ".pdf").

func (*LoaderRegistry) SupportedTypes

func (r *LoaderRegistry) SupportedTypes() []string

SupportedTypes returns all registered extensions, sorted.

type MarkdownLoader

type MarkdownLoader struct{}

MarkdownLoader loads Markdown files, splitting by top-level headings. Each heading section becomes a separate Document with the heading preserved in metadata. If the file has no headings, the entire content is returned as a single Document.

func NewMarkdownLoader

func NewMarkdownLoader() *MarkdownLoader

NewMarkdownLoader creates a MarkdownLoader.

func (*MarkdownLoader) Load

func (l *MarkdownLoader) Load(ctx context.Context, source string) ([]rag.Document, error)

Load reads a Markdown file and splits it into Documents by heading.

func (*MarkdownLoader) SupportedTypes

func (l *MarkdownLoader) SupportedTypes() []string

SupportedTypes returns the extensions handled by MarkdownLoader.

type TextLoader

type TextLoader struct{}

TextLoader loads plain text files as a single Document.

func NewTextLoader

func NewTextLoader() *TextLoader

NewTextLoader creates a TextLoader.

func (*TextLoader) Load

func (l *TextLoader) Load(ctx context.Context, source string) ([]rag.Document, error)

Load reads a text file and returns it as a single Document.

func (*TextLoader) SupportedTypes

func (l *TextLoader) SupportedTypes() []string

SupportedTypes returns the extensions handled by TextLoader.
