loader

package

v0.1.0 | Published: Feb 20, 2026 | License: MIT | Imports: 12 | Imported by: 0

Repository

github.com/BaSui01/agentflow

Documentation ¶

Overview ¶

Package loader provides a unified DocumentLoader interface and common file loaders for the RAG pipeline.

It bridges the gap between raw data sources (files, URLs, APIs) and the rag.Document type used by chunkers, retrievers, and vector stores. Each loader reads a specific format and produces []rag.Document with appropriate metadata.

Supported formats out of the box:

- Plain text (.txt)
- Markdown (.md)
- CSV (.csv)
- JSON / JSONL (.json, .jsonl)

Use LoaderRegistry to route loading by file extension:

	registry := loader.NewLoaderRegistry()
	docs, err := registry.Load(ctx, "/path/to/data.csv")

Custom loaders can be registered for any extension:

	registry.Register(".xml", myXMLLoader)
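
As an illustration of what a custom loader such as myXMLLoader might look like, here is a minimal sketch. It is not part of the package: Document is a hypothetical stand-in for rag.Document (whose real fields may differ), and the XML layout, the XMLLoader type, and the parseXML helper are assumptions made only for this example.

```go
package main

import (
	"context"
	"encoding/xml"
	"fmt"
	"os"
)

// Document is a hypothetical stand-in for rag.Document; the real type may differ.
type Document struct {
	ID       string
	Content  string
	Metadata map[string]any
}

// xmlItems models an assumed layout: <items><item id="...">text</item></items>.
type xmlItems struct {
	Items []struct {
		ID   string `xml:"id,attr"`
		Text string `xml:",chardata"`
	} `xml:"item"`
}

// XMLLoader mirrors the method set of the documented DocumentLoader interface.
type XMLLoader struct{}

// parseXML converts raw XML bytes into one Document per <item> element.
func parseXML(data []byte, source string) ([]Document, error) {
	var parsed xmlItems
	if err := xml.Unmarshal(data, &parsed); err != nil {
		return nil, fmt.Errorf("parse %s: %w", source, err)
	}
	docs := make([]Document, 0, len(parsed.Items))
	for _, it := range parsed.Items {
		docs = append(docs, Document{
			ID:       it.ID,
			Content:  it.Text,
			Metadata: map[string]any{"source": source},
		})
	}
	return docs, nil
}

// Load reads the file at source and returns its items as Documents.
func (XMLLoader) Load(ctx context.Context, source string) ([]Document, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // honor cancellation before touching the filesystem
	}
	data, err := os.ReadFile(source)
	if err != nil {
		return nil, err
	}
	return parseXML(data, source)
}

// SupportedTypes reports the extension this loader handles.
func (XMLLoader) SupportedTypes() []string { return []string{".xml"} }

func main() {
	raw := []byte(`<items><item id="a">hello</item><item id="b">world</item></items>`)
	docs, err := parseXML(raw, "inline.xml")
	if err != nil {
		panic(err)
	}
	for _, d := range docs {
		fmt.Println(d.ID, d.Content)
	}
}
```

Keeping the parsing in a helper that takes bytes, separate from the file read in Load, makes the loader easy to test without touching the filesystem.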

Index ¶

type ArxivSourceAdapter
- func NewArxivSourceAdapter(source *sources.ArxivSource, maxResults int) *ArxivSourceAdapter
- func (a *ArxivSourceAdapter) Load(ctx context.Context, source string) ([]rag.Document, error)
- func (a *ArxivSourceAdapter) SupportedTypes() []string
type CSVLoader
- func NewCSVLoader(config CSVLoaderConfig) *CSVLoader
- func (l *CSVLoader) Load(ctx context.Context, source string) ([]rag.Document, error)
- func (l *CSVLoader) SupportedTypes() []string
type CSVLoaderConfig
type DocumentLoader
type GitHubSourceAdapter
- func NewGitHubSourceAdapter(source *sources.GitHubSource, maxResults int) *GitHubSourceAdapter
- func (a *GitHubSourceAdapter) Load(ctx context.Context, source string) ([]rag.Document, error)
- func (a *GitHubSourceAdapter) SupportedTypes() []string
type JSONLoader
- func NewJSONLoader(config JSONLoaderConfig) *JSONLoader
- func (l *JSONLoader) Load(ctx context.Context, source string) ([]rag.Document, error)
- func (l *JSONLoader) SupportedTypes() []string
type JSONLoaderConfig
type LoaderRegistry
- func NewLoaderRegistry() *LoaderRegistry
- func (r *LoaderRegistry) Load(ctx context.Context, source string) ([]rag.Document, error)
- func (r *LoaderRegistry) Register(ext string, loader DocumentLoader)
- func (r *LoaderRegistry) SupportedTypes() []string
type MarkdownLoader
- func NewMarkdownLoader() *MarkdownLoader
- func (l *MarkdownLoader) Load(ctx context.Context, source string) ([]rag.Document, error)
- func (l *MarkdownLoader) SupportedTypes() []string
type TextLoader
- func NewTextLoader() *TextLoader
- func (l *TextLoader) Load(ctx context.Context, source string) ([]rag.Document, error)
- func (l *TextLoader) SupportedTypes() []string

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type ArxivSourceAdapter ¶

type ArxivSourceAdapter struct {
	// contains filtered or unexported fields
}

ArxivSourceAdapter adapts sources.ArxivSource to the DocumentLoader interface. It searches arXiv papers by query and converts each result into a rag.Document.

func NewArxivSourceAdapter ¶

func NewArxivSourceAdapter(source *sources.ArxivSource, maxResults int) *ArxivSourceAdapter

NewArxivSourceAdapter creates an adapter around an existing ArxivSource.

func (*ArxivSourceAdapter) Load ¶

func (a *ArxivSourceAdapter) Load(ctx context.Context, source string) ([]rag.Document, error)

Load interprets source as a search query and returns matching papers as Documents.

func (*ArxivSourceAdapter) SupportedTypes ¶

func (a *ArxivSourceAdapter) SupportedTypes() []string

SupportedTypes returns an empty slice; this adapter is query-based, not file-based.
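
The adapter pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: Paper, PaperSearcher, QueryAdapter, and fakeSearcher are hypothetical names, the real sources.ArxivSource API is not shown on this page, and Document stands in for rag.Document.

```go
package main

import (
	"context"
	"fmt"
)

// Document is a hypothetical stand-in for rag.Document.
type Document struct {
	ID       string
	Content  string
	Metadata map[string]any
}

// Paper and PaperSearcher are hypothetical; the real sources.ArxivSource API may differ.
type Paper struct {
	ID, Title, Abstract string
}

type PaperSearcher interface {
	Search(ctx context.Context, query string, maxResults int) ([]Paper, error)
}

// QueryAdapter exposes a search backend through Load and SupportedTypes.
type QueryAdapter struct {
	searcher   PaperSearcher
	maxResults int
}

// Load treats source as a search query rather than a file path.
func (a *QueryAdapter) Load(ctx context.Context, source string) ([]Document, error) {
	papers, err := a.searcher.Search(ctx, source, a.maxResults)
	if err != nil {
		return nil, fmt.Errorf("search %q: %w", source, err)
	}
	docs := make([]Document, 0, len(papers))
	for _, p := range papers {
		docs = append(docs, Document{
			ID:       p.ID,
			Content:  p.Title + "\n\n" + p.Abstract,
			Metadata: map[string]any{"title": p.Title, "query": source},
		})
	}
	return docs, nil
}

// SupportedTypes is empty because no file extension maps to this adapter.
func (a *QueryAdapter) SupportedTypes() []string { return []string{} }

// fakeSearcher returns canned results so the sketch runs offline.
type fakeSearcher struct{}

func (fakeSearcher) Search(_ context.Context, _ string, maxResults int) ([]Paper, error) {
	all := []Paper{
		{ID: "p1", Title: "First", Abstract: "alpha"},
		{ID: "p2", Title: "Second", Abstract: "beta"},
		{ID: "p3", Title: "Third", Abstract: "gamma"},
	}
	if maxResults >= 0 && maxResults < len(all) {
		all = all[:maxResults]
	}
	return all, nil
}

// search runs a query through an adapter backed by the fake searcher.
func search(query string, maxResults int) ([]Document, error) {
	a := &QueryAdapter{searcher: fakeSearcher{}, maxResults: maxResults}
	return a.Load(context.Background(), query)
}

func main() {
	docs, err := search("retrieval augmented generation", 2)
	if err != nil {
		panic(err)
	}
	for _, d := range docs {
		fmt.Println(d.ID, d.Metadata["title"])
	}
}
```

Because SupportedTypes is empty, an adapter like this would presumably be called directly rather than routed through an extension-based registry.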

type CSVLoader ¶

type CSVLoader struct {
	// contains filtered or unexported fields
}

CSVLoader loads CSV files. Each row (or group of rows) becomes a Document. The first row is treated as a header.

func NewCSVLoader ¶

func NewCSVLoader(config CSVLoaderConfig) *CSVLoader

NewCSVLoader creates a CSVLoader with the given config.

func (*CSVLoader) Load ¶

func (l *CSVLoader) Load(ctx context.Context, source string) ([]rag.Document, error)

Load reads a CSV file and returns Documents.

func (*CSVLoader) SupportedTypes ¶

func (l *CSVLoader) SupportedTypes() []string

SupportedTypes returns the extensions handled by CSVLoader.

type CSVLoaderConfig ¶

type CSVLoaderConfig struct {
	// Delimiter is the field separator. Defaults to ','.
	Delimiter rune
	// RowsPerDocument controls how many rows are grouped into a single Document.
	// 0 or 1 means each row becomes its own Document.
	RowsPerDocument int
	// ContentColumns lists column names (from the header) to include in Document.Content.
	// If empty, all columns are concatenated.
	ContentColumns []string
}

CSVLoaderConfig configures the CSV loader.
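
To make the three options concrete, the sketch below shows one way the grouping could work using the standard encoding/csv package. It is an assumption-laden illustration, not the loader's actual code: the Config type mirrors the documented fields, groupRows is a hypothetical helper, and the "column: value" content format is invented for the example.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Config mirrors the documented CSVLoaderConfig fields.
type Config struct {
	Delimiter       rune
	RowsPerDocument int
	ContentColumns  []string
}

// groupRows parses CSV text and returns one content string per document.
// The "column: value" layout is an assumption made for this sketch.
func groupRows(input string, cfg Config) ([]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	if cfg.Delimiter != 0 {
		r.Comma = cfg.Delimiter // zero value keeps the default ','
	}
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, nil
	}
	header, rows := records[0], records[1:]

	// Resolve which column indexes contribute to the content.
	// Names that do not appear in the header are silently skipped here.
	var cols []int
	if len(cfg.ContentColumns) == 0 {
		for i := range header {
			cols = append(cols, i)
		}
	} else {
		for _, name := range cfg.ContentColumns {
			for i, h := range header {
				if h == name {
					cols = append(cols, i)
				}
			}
		}
	}

	per := cfg.RowsPerDocument
	if per < 1 {
		per = 1 // 0 or 1 means one row per document
	}

	var contents []string
	for start := 0; start < len(rows); start += per {
		end := start + per
		if end > len(rows) {
			end = len(rows)
		}
		var lines []string
		for _, row := range rows[start:end] {
			var fields []string
			for _, c := range cols {
				fields = append(fields, header[c]+": "+row[c])
			}
			lines = append(lines, strings.Join(fields, ", "))
		}
		contents = append(contents, strings.Join(lines, "\n"))
	}
	return contents, nil
}

func main() {
	input := "name,city\nAda,London\nBob,Paris\nCy,Rome\n"
	contents, err := groupRows(input, Config{RowsPerDocument: 2})
	if err != nil {
		panic(err)
	}
	for i, c := range contents {
		fmt.Printf("doc %d:\n%s\n", i, c)
	}
}
```

With three data rows and RowsPerDocument set to 2, the last group holds the single remaining row rather than being dropped.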

type DocumentLoader ¶

type DocumentLoader interface {
	// Load reads the source and returns documents.
	// source is typically a file path, but loaders may interpret it as a URL or query.
	Load(ctx context.Context, source string) ([]rag.Document, error)

	// SupportedTypes returns the file extensions this loader handles (e.g. ".txt", ".md").
	SupportedTypes() []string
}

DocumentLoader is the unified interface for loading documents from any source.

type GitHubSourceAdapter ¶

type GitHubSourceAdapter struct {
	// contains filtered or unexported fields
}

GitHubSourceAdapter adapts sources.GitHubSource to the DocumentLoader interface. It searches GitHub repos by query and converts each result into a rag.Document.

func NewGitHubSourceAdapter ¶

func NewGitHubSourceAdapter(source *sources.GitHubSource, maxResults int) *GitHubSourceAdapter

NewGitHubSourceAdapter creates an adapter around an existing GitHubSource.

func (*GitHubSourceAdapter) Load ¶

func (a *GitHubSourceAdapter) Load(ctx context.Context, source string) ([]rag.Document, error)

Load interprets source as a search query and returns matching repos as Documents.

func (*GitHubSourceAdapter) SupportedTypes ¶

func (a *GitHubSourceAdapter) SupportedTypes() []string

SupportedTypes returns an empty slice; this adapter is query-based, not file-based.

type JSONLoader ¶

type JSONLoader struct {
	// contains filtered or unexported fields
}

JSONLoader loads JSON (single object or array) and JSONL files.

func NewJSONLoader ¶

func NewJSONLoader(config JSONLoaderConfig) *JSONLoader

NewJSONLoader creates a JSONLoader.

func (*JSONLoader) Load ¶

func (l *JSONLoader) Load(ctx context.Context, source string) ([]rag.Document, error)

Load reads a JSON or JSONL file and returns Documents.

func (*JSONLoader) SupportedTypes ¶

func (l *JSONLoader) SupportedTypes() []string

SupportedTypes returns the extensions handled by JSONLoader.

type JSONLoaderConfig ¶

type JSONLoaderConfig struct {
	// ContentField is the JSON field name to use as Document.Content.
	// If empty, the entire JSON object is serialized as content.
	ContentField string
	// IDField is the JSON field name to use as Document.ID.
	// If empty, a path-based ID is generated.
	IDField string
}

JSONLoaderConfig configures the JSON/JSONL loader.
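
The effect of ContentField and IDField can be sketched for the JSONL case as follows. This is a rough illustration rather than the loader's implementation: loadJSONL is a hypothetical helper, Document stands in for rag.Document, and both the "source#line" fallback ID and the fallback to the raw line when a configured field is missing are assumptions.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// Document is a hypothetical stand-in for rag.Document.
type Document struct {
	ID      string
	Content string
}

// Config mirrors the documented JSONLoaderConfig fields.
type Config struct {
	ContentField string
	IDField      string
}

// loadJSONL turns each non-empty line of JSONL text into a Document.
func loadJSONL(input, source string, cfg Config) ([]Document, error) {
	var docs []Document
	sc := bufio.NewScanner(strings.NewReader(input))
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // skip blank lines but keep counting them
		}
		var obj map[string]any
		if err := json.Unmarshal([]byte(line), &obj); err != nil {
			return nil, fmt.Errorf("%s line %d: %w", source, n, err)
		}
		// Defaults: whole object as content, assumed path-based ID.
		doc := Document{ID: fmt.Sprintf("%s#%d", source, n), Content: line}
		if cfg.ContentField != "" {
			if v, ok := obj[cfg.ContentField]; ok {
				doc.Content = fmt.Sprint(v)
			}
		}
		if cfg.IDField != "" {
			if v, ok := obj[cfg.IDField]; ok {
				doc.ID = fmt.Sprint(v)
			}
		}
		docs = append(docs, doc)
	}
	return docs, sc.Err()
}

func main() {
	input := "{\"id\":\"a\",\"text\":\"hello\"}\n\n{\"text\":\"world\"}\n"
	docs, err := loadJSONL(input, "data.jsonl", Config{ContentField: "text", IDField: "id"})
	if err != nil {
		panic(err)
	}
	for _, d := range docs {
		fmt.Printf("%s => %s\n", d.ID, d.Content)
	}
}
```

Note that bufio.Scanner has a default per-line size limit, so a loader handling very large records would likely need Scanner.Buffer or a json.Decoder instead.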

type LoaderRegistry ¶

type LoaderRegistry struct {
	// contains filtered or unexported fields
}

LoaderRegistry routes Load calls to the appropriate DocumentLoader based on file extension.

func NewLoaderRegistry ¶

func NewLoaderRegistry() *LoaderRegistry

NewLoaderRegistry creates a registry pre-populated with the built-in loaders.

func (*LoaderRegistry) Load ¶

func (r *LoaderRegistry) Load(ctx context.Context, source string) ([]rag.Document, error)

Load determines the loader from the source's file extension and delegates to it.

func (*LoaderRegistry) Register ¶

func (r *LoaderRegistry) Register(ext string, loader DocumentLoader)

Register adds or replaces a loader for the given file extension. ext should include the leading dot (e.g. ".pdf").

func (*LoaderRegistry) SupportedTypes ¶

func (r *LoaderRegistry) SupportedTypes() []string

SupportedTypes returns all registered extensions, sorted.
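
Extension-based routing of this kind can be sketched as below. It is a minimal illustration, not the package's source: Registry, stubLoader, and route are hypothetical names, Document stands in for rag.Document, and the lowercase normalization and mutex are assumptions about sensible behavior rather than documented guarantees.

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"
	"sort"
	"strings"
	"sync"
)

// Document is a hypothetical stand-in for rag.Document.
type Document struct {
	ID      string
	Content string
}

// DocumentLoader mirrors the documented interface, using the stand-in Document.
type DocumentLoader interface {
	Load(ctx context.Context, source string) ([]Document, error)
	SupportedTypes() []string
}

// Registry maps file extensions to loaders.
type Registry struct {
	mu      sync.RWMutex
	loaders map[string]DocumentLoader
}

func NewRegistry() *Registry {
	return &Registry{loaders: make(map[string]DocumentLoader)}
}

// Register adds or replaces the loader for ext (assumed case-insensitive here).
func (r *Registry) Register(ext string, l DocumentLoader) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.loaders[strings.ToLower(ext)] = l
}

// Load picks a loader from the source's extension and delegates to it.
func (r *Registry) Load(ctx context.Context, source string) ([]Document, error) {
	ext := strings.ToLower(filepath.Ext(source))
	r.mu.RLock()
	l, ok := r.loaders[ext]
	r.mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("no loader registered for extension %q", ext)
	}
	return l.Load(ctx, source)
}

// SupportedTypes returns all registered extensions in sorted order.
func (r *Registry) SupportedTypes() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	exts := make([]string, 0, len(r.loaders))
	for ext := range r.loaders {
		exts = append(exts, ext)
	}
	sort.Strings(exts)
	return exts
}

// stubLoader records which extension handled the call; it reads no files.
type stubLoader struct{ ext string }

func (s stubLoader) Load(_ context.Context, source string) ([]Document, error) {
	return []Document{{ID: source, Content: "loaded by " + s.ext}}, nil
}

func (s stubLoader) SupportedTypes() []string { return []string{s.ext} }

// route loads source with a background context.
func route(r *Registry, source string) ([]Document, error) {
	return r.Load(context.Background(), source)
}

func main() {
	r := NewRegistry()
	r.Register(".txt", stubLoader{ext: ".txt"})
	r.Register(".csv", stubLoader{ext: ".csv"})
	docs, err := route(r, "/path/to/data.csv")
	if err != nil {
		panic(err)
	}
	fmt.Println(docs[0].Content, r.SupportedTypes())
}
```

Returning an error for an unregistered extension, rather than falling back to a default loader, is a design choice of this sketch; the page does not say how the real registry handles that case.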

type MarkdownLoader ¶

type MarkdownLoader struct{}

MarkdownLoader loads Markdown files, splitting by top-level headings. Each heading section becomes a separate Document with the heading preserved in metadata. If the file has no headings, the entire content is returned as a single Document.
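
The splitting behavior described above can be approximated with the following sketch. It is not the loader's code: Section and splitByH1 are hypothetical names, "top-level heading" is assumed to mean a line starting with "# ", and skipping such lines inside fenced code blocks is an extra safeguard that the real loader may or may not implement.

```go
package main

import (
	"fmt"
	"strings"
)

// Section is a hypothetical intermediate form: one heading plus its body text.
type Section struct {
	Heading string
	Content string
}

// splitByH1 splits Markdown text at lines that start with "# ".
// Text before the first heading becomes a section with an empty Heading.
func splitByH1(md string) []Section {
	var (
		sections []Section
		heading  string
		body     []string
		inFence  bool
	)
	// flush emits the section collected so far, if it has any substance.
	flush := func() {
		text := strings.TrimSpace(strings.Join(body, "\n"))
		if heading != "" || text != "" {
			sections = append(sections, Section{Heading: heading, Content: text})
		}
		body = body[:0]
	}
	for _, line := range strings.Split(md, "\n") {
		// Track fenced code blocks so "# comment" lines inside them are not headings.
		if strings.HasPrefix(strings.TrimSpace(line), "```") {
			inFence = !inFence
		}
		if !inFence && strings.HasPrefix(line, "# ") {
			flush()
			heading = strings.TrimSpace(strings.TrimPrefix(line, "# "))
			continue
		}
		body = append(body, line)
	}
	flush()
	return sections
}

func main() {
	md := "intro\n# One\nalpha\n# Two\nbeta\n"
	for _, s := range splitByH1(md) {
		fmt.Printf("%q: %q\n", s.Heading, s.Content)
	}
}
```

A file with no headings yields a single section with an empty Heading, which matches the single-Document fallback described above; a loader would then copy Heading into each Document's metadata.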

func NewMarkdownLoader ¶

func NewMarkdownLoader() *MarkdownLoader

NewMarkdownLoader creates a MarkdownLoader.

func (*MarkdownLoader) Load ¶

func (l *MarkdownLoader) Load(ctx context.Context, source string) ([]rag.Document, error)

Load reads a Markdown file and splits it into Documents by heading.

func (*MarkdownLoader) SupportedTypes ¶

func (l *MarkdownLoader) SupportedTypes() []string

SupportedTypes returns the extensions handled by MarkdownLoader.

type TextLoader ¶

type TextLoader struct{}

TextLoader loads plain text files as a single Document.

func NewTextLoader ¶

func NewTextLoader() *TextLoader

NewTextLoader creates a TextLoader.

func (*TextLoader) Load ¶

func (l *TextLoader) Load(ctx context.Context, source string) ([]rag.Document, error)

Load reads a text file and returns it as a single Document.

func (*TextLoader) SupportedTypes ¶

func (l *TextLoader) SupportedTypes() []string

SupportedTypes returns the extensions handled by TextLoader.
