base

package
v1.0.4 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 17, 2026 | License: MIT | Imports: 5 | Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ContentExtractor

type ContentExtractor func(ctx context.Context, r io.Reader) (string, error)

ContentExtractor is a function type that takes a context.Context and an io.Reader and returns the full extracted text as a string. It is used for legacy parsers and for file types that cannot easily be streamed (such as DOCX or PDF).

type GenericStreamWrapper

type GenericStreamWrapper struct {
	// contains filtered or unexported fields
}

GenericStreamWrapper adapts a full-read ContentExtractor to the streaming dataprep.Parser interface. It serves as a bridge for legacy parsers.

func NewGenericStreamWrapper

func NewGenericStreamWrapper(name string, types []string, extractor ContentExtractor) *GenericStreamWrapper

func (*GenericStreamWrapper) GetSupportedTypes

func (w *GenericStreamWrapper) GetSupportedTypes() []string

func (*GenericStreamWrapper) ParseStream

func (w *GenericStreamWrapper) ParseStream(ctx context.Context, r io.Reader, metadata map[string]any) (<-chan *entity.Document, error)
