Documentation
Overview ¶
Package loader provides a unified DocumentLoader interface and common file loaders for the RAG pipeline.
It bridges the gap between raw data sources (files, URLs, APIs) and the rag.Document type used by chunkers, retrievers, and vector stores. Each loader reads a specific format and produces []rag.Document with appropriate metadata.
Supported formats out of the box:
- Plain text (.txt)
- Markdown (.md)
- CSV (.csv)
- JSON / JSONL (.json, .jsonl)
Use LoaderRegistry to route loading by file extension:
registry := loader.NewLoaderRegistry()
docs, err := registry.Load(ctx, "/path/to/data.csv")
Custom loaders can be registered for any extension:
registry.Register(".xml", myXMLLoader)
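As a rough illustration of what a custom loader such as myXMLLoader might look like, the sketch below implements the two DocumentLoader methods for a made-up XML layout. The Document struct is a local stand-in for rag.Document (its fields are assumed, not taken from the package), and xmlLoader, parseItems, and the `<item id="...">` layout are hypothetical names chosen for the example.

```go
package main

import (
	"context"
	"encoding/xml"
	"fmt"
	"os"
	"strings"
)

// Document is a local stand-in for rag.Document; its fields are assumed.
type Document struct {
	ID       string
	Content  string
	Metadata map[string]string
}

// xmlLoader is a hypothetical loader for files shaped like
// <items><item id="a">text</item>...</items>.
type xmlLoader struct{}

// parseItems converts an XML payload into one Document per <item> element.
func parseItems(data []byte, source string) ([]Document, error) {
	var root struct {
		Items []struct {
			ID   string `xml:"id,attr"`
			Text string `xml:",chardata"`
		} `xml:"item"`
	}
	if err := xml.Unmarshal(data, &root); err != nil {
		return nil, fmt.Errorf("parse %s: %w", source, err)
	}
	docs := make([]Document, 0, len(root.Items))
	for _, it := range root.Items {
		docs = append(docs, Document{
			ID:       it.ID,
			Content:  strings.TrimSpace(it.Text),
			Metadata: map[string]string{"source": source},
		})
	}
	return docs, nil
}

// Load mirrors the DocumentLoader method: it reads the file and parses it.
func (xmlLoader) Load(ctx context.Context, source string) ([]Document, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // stop early if the caller already cancelled
	}
	data, err := os.ReadFile(source)
	if err != nil {
		return nil, fmt.Errorf("read %s: %w", source, err)
	}
	return parseItems(data, source)
}

// SupportedTypes reports the single extension this loader handles.
func (xmlLoader) SupportedTypes() []string { return []string{".xml"} }

func main() {
	payload := `<items><item id="a">first</item><item id="b">second</item></items>`
	docs, err := parseItems([]byte(payload), "inline.xml")
	if err != nil {
		panic(err)
	}
	for _, d := range docs {
		fmt.Println(d.ID, d.Content)
	}
}
```

To register a loader like this with the real registry, its Load method would need to return []rag.Document rather than the stand-in type used here.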
Types ¶
type ArxivSourceAdapter ¶
type ArxivSourceAdapter struct {
// contains filtered or unexported fields
}
ArxivSourceAdapter adapts sources.ArxivSource to the DocumentLoader interface. It searches arXiv papers by query and converts each result into a rag.Document.
func NewArxivSourceAdapter ¶
func NewArxivSourceAdapter(source *sources.ArxivSource, maxResults int) *ArxivSourceAdapter
NewArxivSourceAdapter creates an adapter around an existing ArxivSource.
func (*ArxivSourceAdapter) Load ¶
func (a *ArxivSourceAdapter) Load(ctx context.Context, source string) ([]rag.Document, error)
Load interprets source as a search query and returns matching papers as Documents.
func (*ArxivSourceAdapter) SupportedTypes ¶
func (a *ArxivSourceAdapter) SupportedTypes() []string
SupportedTypes returns an empty slice; this adapter is query-based, not file-based.
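The query-based adapter pattern described here can be sketched roughly as follows. None of these types come from the package: Document stands in for rag.Document, while paper, searcher, and fakeSource are hypothetical placeholders for sources.ArxivSource and its result type, whose real API is not shown in this documentation.

```go
package main

import (
	"context"
	"fmt"
)

// Document is a local stand-in for rag.Document; its fields are assumed.
type Document struct {
	ID       string
	Content  string
	Metadata map[string]string
}

// paper and searcher are hypothetical; the real sources.ArxivSource API may differ.
type paper struct{ ID, Title, Abstract string }

type searcher interface {
	Search(ctx context.Context, query string, max int) ([]paper, error)
}

// queryAdapter wraps a searcher so it can be used like a DocumentLoader.
type queryAdapter struct {
	src        searcher
	maxResults int
}

// toDocument maps one search result onto a Document.
func toDocument(p paper) Document {
	return Document{
		ID:       p.ID,
		Content:  p.Title + "\n\n" + p.Abstract,
		Metadata: map[string]string{"title": p.Title},
	}
}

// Load treats source as a search query rather than a file path.
func (a *queryAdapter) Load(ctx context.Context, source string) ([]Document, error) {
	papers, err := a.src.Search(ctx, source, a.maxResults)
	if err != nil {
		return nil, fmt.Errorf("search %q: %w", source, err)
	}
	docs := make([]Document, 0, len(papers))
	for _, p := range papers {
		docs = append(docs, toDocument(p))
	}
	return docs, nil
}

// SupportedTypes is empty because no file extension maps to a query.
func (a *queryAdapter) SupportedTypes() []string { return []string{} }

// fakeSource returns canned results, truncated to max.
type fakeSource struct{ papers []paper }

func (f fakeSource) Search(_ context.Context, _ string, max int) ([]paper, error) {
	if max > 0 && len(f.papers) > max {
		return f.papers[:max], nil
	}
	return f.papers, nil
}

func main() {
	src := fakeSource{papers: []paper{{"1", "A", "x"}, {"2", "B", "y"}, {"3", "C", "z"}}}
	a := &queryAdapter{src: src, maxResults: 2}
	docs, err := a.Load(context.Background(), "retrieval augmented generation")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(docs), len(a.SupportedTypes()))
}
```

Because SupportedTypes is empty, an adapter like this would presumably be called directly instead of being routed through an extension-based registry.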
type CSVLoader ¶
type CSVLoader struct {
// contains filtered or unexported fields
}
CSVLoader loads CSV files. Each row (or group of rows) becomes a Document. The first row is treated as a header.
func NewCSVLoader ¶
func NewCSVLoader(config CSVLoaderConfig) *CSVLoader
NewCSVLoader creates a CSVLoader with the given config.
func (*CSVLoader) SupportedTypes ¶
func (l *CSVLoader) SupportedTypes() []string
SupportedTypes returns the extensions handled by CSVLoader.
type CSVLoaderConfig ¶
type CSVLoaderConfig struct {
// Delimiter is the field separator. Defaults to ','.
Delimiter rune
// RowsPerDocument controls how many rows are grouped into a single Document.
// 0 or 1 means each row becomes its own Document.
RowsPerDocument int
// ContentColumns lists column names (from the header) to include in Document.Content.
// If empty, all columns are concatenated.
ContentColumns []string
}
CSVLoaderConfig configures the CSV loader.
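To make the interaction between Delimiter, RowsPerDocument, and ContentColumns concrete, here is a minimal sketch using only encoding/csv. groupRows is a hypothetical helper, not part of the package, and the "column: value" rendering of each row is an assumption; the real loader may format content differently.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// groupRows parses CSV text, treats the first record as the header, and
// returns one content string per group of rowsPerDoc rows.
func groupRows(data string, delimiter rune, rowsPerDoc int, contentColumns []string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	if delimiter != 0 {
		r.Comma = delimiter // a zero value keeps the default ','
	}
	records, err := r.ReadAll()
	if err != nil {
		return nil, fmt.Errorf("parse csv: %w", err)
	}
	if len(records) < 2 {
		return nil, nil // header only, or empty input
	}
	header, rows := records[0], records[1:]
	if rowsPerDoc < 1 {
		rowsPerDoc = 1 // 0 or 1 means one row per document
	}

	// An empty keep set means every column is included.
	keep := make(map[string]bool, len(contentColumns))
	for _, c := range contentColumns {
		keep[c] = true
	}

	var docs []string
	for start := 0; start < len(rows); start += rowsPerDoc {
		end := start + rowsPerDoc
		if end > len(rows) {
			end = len(rows)
		}
		var lines []string
		for _, row := range rows[start:end] {
			var fields []string
			for i, name := range header {
				if len(keep) > 0 && !keep[name] {
					continue
				}
				fields = append(fields, name+": "+row[i])
			}
			lines = append(lines, strings.Join(fields, ", "))
		}
		docs = append(docs, strings.Join(lines, "\n"))
	}
	return docs, nil
}

func main() {
	data := "name;age\nann;30\nbob;41\ncy;25\n"
	docs, err := groupRows(data, ';', 2, nil)
	if err != nil {
		panic(err)
	}
	for i, d := range docs {
		fmt.Printf("doc %d:\n%s\n", i, d)
	}
}
```

With three data rows and a group size of two, the last group holds the single leftover row rather than being dropped.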
type DocumentLoader ¶
type DocumentLoader interface {
// Load reads the source and returns documents.
// source is typically a file path, but loaders may interpret it as a URL or query.
Load(ctx context.Context, source string) ([]rag.Document, error)
// SupportedTypes returns the file extensions this loader handles (e.g. ".txt", ".md").
SupportedTypes() []string
}
DocumentLoader is the unified interface for loading documents from any source.
type GitHubSourceAdapter ¶
type GitHubSourceAdapter struct {
// contains filtered or unexported fields
}
GitHubSourceAdapter adapts sources.GitHubSource to the DocumentLoader interface. It searches GitHub repos by query and converts each result into a rag.Document.
func NewGitHubSourceAdapter ¶
func NewGitHubSourceAdapter(source *sources.GitHubSource, maxResults int) *GitHubSourceAdapter
NewGitHubSourceAdapter creates an adapter around an existing GitHubSource.
func (*GitHubSourceAdapter) Load ¶
func (a *GitHubSourceAdapter) Load(ctx context.Context, source string) ([]rag.Document, error)
Load interprets source as a search query and returns matching repos as Documents.
func (*GitHubSourceAdapter) SupportedTypes ¶
func (a *GitHubSourceAdapter) SupportedTypes() []string
SupportedTypes returns an empty slice; this adapter is query-based, not file-based.
type JSONLoader ¶
type JSONLoader struct {
// contains filtered or unexported fields
}
JSONLoader loads JSON (single object or array) and JSONL files.
func NewJSONLoader ¶
func NewJSONLoader(config JSONLoaderConfig) *JSONLoader
NewJSONLoader creates a JSONLoader with the given config.
func (*JSONLoader) SupportedTypes ¶
func (l *JSONLoader) SupportedTypes() []string
SupportedTypes returns the extensions handled by JSONLoader.
type JSONLoaderConfig ¶
type JSONLoaderConfig struct {
// ContentField is the JSON field name to use as Document.Content.
// If empty, the entire JSON object is serialized as content.
ContentField string
// IDField is the JSON field name to use as Document.ID.
// If empty, a path-based ID is generated.
IDField string
}
JSONLoaderConfig configures the JSON/JSONL loader.
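The effect of ContentField and IDField can be approximated with the sketch below, which handles JSONL input using encoding/json. toDocument and loadJSONL are hypothetical helpers, Document is a stand-in for rag.Document, and the "path#line" fallback ID format is an assumption, since the documentation only says the ID is path-based. Falling back to full serialization when ContentField is set but absent from an object is also an assumption.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// Document is a local stand-in for rag.Document; its fields are assumed.
type Document struct {
	ID      string
	Content string
}

// toDocument applies the two config fields to one decoded JSON object.
func toDocument(obj map[string]any, contentField, idField, fallbackID string) (Document, error) {
	doc := Document{ID: fallbackID}
	if idField != "" {
		if v, ok := obj[idField]; ok {
			doc.ID = fmt.Sprint(v)
		}
	}
	if contentField != "" {
		if v, ok := obj[contentField].(string); ok {
			doc.Content = v
			return doc, nil
		}
	}
	// No usable content field: serialize the whole object instead.
	raw, err := json.Marshal(obj)
	if err != nil {
		return Document{}, err
	}
	doc.Content = string(raw)
	return doc, nil
}

// loadJSONL decodes one JSON object per non-blank line.
func loadJSONL(data, path, contentField, idField string) ([]Document, error) {
	var docs []Document
	sc := bufio.NewScanner(strings.NewReader(data))
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var obj map[string]any
		if err := json.Unmarshal([]byte(line), &obj); err != nil {
			return nil, fmt.Errorf("%s line %d: %w", path, n, err)
		}
		doc, err := toDocument(obj, contentField, idField, fmt.Sprintf("%s#%d", path, n))
		if err != nil {
			return nil, err
		}
		docs = append(docs, doc)
	}
	return docs, sc.Err()
}

func main() {
	data := `{"id":"a","text":"hello"}` + "\n" + `{"text":"world"}` + "\n"
	docs, err := loadJSONL(data, "data.jsonl", "text", "id")
	if err != nil {
		panic(err)
	}
	for _, d := range docs {
		fmt.Printf("%s: %s\n", d.ID, d.Content)
	}
}
```

Note that bufio.Scanner caps line length at 64 KiB by default, so a loader handling large records would need a bigger buffer or a json.Decoder.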
type LoaderRegistry ¶
type LoaderRegistry struct {
// contains filtered or unexported fields
}
LoaderRegistry routes Load calls to the appropriate DocumentLoader based on file extension.
func NewLoaderRegistry ¶
func NewLoaderRegistry() *LoaderRegistry
NewLoaderRegistry creates a registry pre-populated with the built-in loaders.
func (*LoaderRegistry) Load ¶
func (r *LoaderRegistry) Load(ctx context.Context, source string) ([]rag.Document, error)
Load determines the loader from the source's file extension and delegates to it.
func (*LoaderRegistry) Register ¶
func (r *LoaderRegistry) Register(ext string, loader DocumentLoader)
Register adds or replaces a loader for the given file extension. ext should include the leading dot (e.g. ".pdf").
func (*LoaderRegistry) SupportedTypes ¶
func (r *LoaderRegistry) SupportedTypes() []string
SupportedTypes returns all registered extensions, sorted.
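Extension-based routing of this kind might be implemented along the following lines. This is a sketch under stated assumptions, not the package's actual code: registry, loaderFor, and stubLoader are hypothetical names, Document stands in for rag.Document, and lower-casing extensions before lookup is an assumed behavior. A production registry would likely also guard the map with a mutex.

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// Document is a local stand-in for rag.Document; its fields are assumed.
type Document struct{ ID, Content string }

// DocumentLoader mirrors the interface documented on this page.
type DocumentLoader interface {
	Load(ctx context.Context, source string) ([]Document, error)
	SupportedTypes() []string
}

// registry maps a file extension (with leading dot) to a loader.
type registry struct {
	loaders map[string]DocumentLoader
}

func newRegistry() *registry {
	return &registry{loaders: make(map[string]DocumentLoader)}
}

// Register adds or replaces the loader for ext.
func (r *registry) Register(ext string, l DocumentLoader) {
	r.loaders[strings.ToLower(ext)] = l
}

// loaderFor picks a loader from the source's file extension.
func (r *registry) loaderFor(source string) (DocumentLoader, error) {
	ext := strings.ToLower(filepath.Ext(source))
	l, ok := r.loaders[ext]
	if !ok {
		return nil, fmt.Errorf("no loader registered for extension %q", ext)
	}
	return l, nil
}

// Load delegates to whichever loader matches the extension.
func (r *registry) Load(ctx context.Context, source string) ([]Document, error) {
	l, err := r.loaderFor(source)
	if err != nil {
		return nil, err
	}
	return l.Load(ctx, source)
}

// SupportedTypes returns all registered extensions in sorted order.
func (r *registry) SupportedTypes() []string {
	exts := make([]string, 0, len(r.loaders))
	for ext := range r.loaders {
		exts = append(exts, ext)
	}
	sort.Strings(exts)
	return exts
}

// stubLoader is a placeholder loader used only for this demonstration.
type stubLoader struct{ ext string }

func (s stubLoader) Load(_ context.Context, source string) ([]Document, error) {
	return []Document{{ID: source, Content: "loaded by " + s.ext}}, nil
}

func (s stubLoader) SupportedTypes() []string { return []string{s.ext} }

func main() {
	r := newRegistry()
	r.Register(".txt", stubLoader{ext: ".txt"})
	r.Register(".csv", stubLoader{ext: ".csv"})
	docs, err := r.Load(context.Background(), "/path/to/data.csv")
	if err != nil {
		panic(err)
	}
	fmt.Println(docs[0].Content, r.SupportedTypes())
}
```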
type MarkdownLoader ¶
type MarkdownLoader struct{}
MarkdownLoader loads Markdown files, splitting by top-level headings. Each heading section becomes a separate Document with the heading preserved in metadata. If the file has no headings, the entire content is returned as a single Document.
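The splitting behavior described above could look roughly like this. splitTopLevel and section are hypothetical names, and two details are assumptions rather than documented behavior: text before the first heading is kept as a section with an empty heading, and lines inside fenced code blocks are never treated as headings.

```go
package main

import (
	"fmt"
	"strings"
)

// section holds one top-level heading and the text beneath it.
type section struct {
	Heading string
	Body    string
}

// splitTopLevel splits Markdown on lines that start with "# ".
// Deeper headings ("##", "###") stay inside their parent section.
func splitTopLevel(md string) []section {
	var sections []section
	var cur section
	var body []string
	inFence := false

	// flush stores the section built so far, skipping fully empty ones.
	flush := func() {
		cur.Body = strings.TrimSpace(strings.Join(body, "\n"))
		if cur.Heading != "" || cur.Body != "" {
			sections = append(sections, cur)
		}
		body = body[:0]
	}

	for _, line := range strings.Split(md, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), "```") {
			inFence = !inFence // toggle on opening and closing fences
		}
		if !inFence && strings.HasPrefix(line, "# ") {
			flush()
			cur = section{Heading: strings.TrimSpace(line[2:])}
			continue
		}
		body = append(body, line)
	}
	flush()
	return sections
}

func main() {
	md := "intro\n# One\nalpha\n## Sub\nbeta\n# Two\ngamma\n"
	for _, s := range splitTopLevel(md) {
		fmt.Printf("heading=%q body=%q\n", s.Heading, s.Body)
	}
}
```

A file with no "# " lines yields a single section with an empty heading, which corresponds to the single-Document case in the description.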
func NewMarkdownLoader ¶
func NewMarkdownLoader() *MarkdownLoader
NewMarkdownLoader creates a MarkdownLoader.
func (*MarkdownLoader) SupportedTypes ¶
func (l *MarkdownLoader) SupportedTypes() []string
SupportedTypes returns the extensions handled by MarkdownLoader.
type TextLoader ¶
type TextLoader struct{}
TextLoader loads plain text files as a single Document.
func (*TextLoader) SupportedTypes ¶
func (l *TextLoader) SupportedTypes() []string
SupportedTypes returns the extensions handled by TextLoader.