ingest

package
v0.0.2 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 11, 2026 License: Apache-2.0 Imports: 12 Imported by: 0

Documentation

Types

type Config

type Config struct {
	BatchSize      int
	ContentField   string
	IDField        string
	MetadataFields []string // specific fields to extract
	AllMetadata    bool     // extract all fields except content/id
	Limit          int      // max records to ingest, 0 = unlimited
}
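
The page does not say how BatchSize and Limit interact, so the following is only a sketch of one plausible reading: the limit caps the total number of items first, and the capped items are then grouped into batches. The local Config type copies just the two fields used here, and the batches helper is hypothetical, not part of the package.

```go
package main

import "fmt"

// Config mirrors only the two documented fields this sketch needs.
type Config struct {
	BatchSize int
	Limit     int // max records to ingest, 0 = unlimited
}

// batches is a hypothetical helper: it truncates items to cfg.Limit
// (when Limit > 0) and splits the rest into groups of at most
// cfg.BatchSize.
func batches(items []string, cfg Config) [][]string {
	if cfg.Limit > 0 && len(items) > cfg.Limit {
		items = items[:cfg.Limit]
	}
	size := cfg.BatchSize
	if size <= 0 {
		// Assumption: a non-positive batch size means "one batch".
		size = len(items)
	}
	var out [][]string
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	items := []string{"a", "b", "c", "d", "e"}
	fmt.Println(batches(items, Config{BatchSize: 2, Limit: 4}))
	// prints [[a b] [c d]]
}
```

Applying the limit before batching keeps the final batch from exceeding the cap; the real Processor may order these steps differently.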

func DefaultConfig

func DefaultConfig() *Config

type Processor

type Processor struct {
	// contains filtered or unexported fields
}

func NewProcessor

func NewProcessor(cfg *Config) *Processor

func (*Processor) ProcessJSONL

func (p *Processor) ProcessJSONL(filePath string) (<-chan *Record, <-chan error)

ProcessJSONL reads a JSONL file and streams records on the returned record channel; errors are delivered on the returned error channel. The record channel is closed when processing completes or an unrecoverable error occurs.

func (*Processor) ProcessJSONLFull

func (p *Processor) ProcessJSONLFull(filePath string) ([]*Record, error)

ProcessJSONLFull reads the entire file into memory and returns all records. It is intended for small files.

func (*Processor) ProcessParquet

func (p *Processor) ProcessParquet(filePath string) (<-chan *Record, <-chan error)

ProcessParquet reads a Parquet file and streams records on the returned record channel; errors are delivered on the returned error channel. It uses parquet-go's GenericReader to read rows as map[string]any, then converts each row to a Record using the same extractRecord logic as ProcessJSONL.

type Record

type Record struct {
	ID       string
	Content  string
	Metadata map[string]any
}
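
The unexported extractRecord step is not shown on this page. Based only on the Config field comments, a row of type map[string]any might be turned into a Record roughly as sketched below. The toRecord function, its error behaviour, and the conversion of non-string IDs with fmt.Sprint are all assumptions for illustration.

```go
package main

import "fmt"

// Config mirrors the documented fields that affect field extraction.
type Config struct {
	ContentField   string
	IDField        string
	MetadataFields []string // specific fields to extract
	AllMetadata    bool     // extract all fields except content/id
}

// Record mirrors the documented Record type.
type Record struct {
	ID       string
	Content  string
	Metadata map[string]any
}

// toRecord is a hypothetical stand-in for the unexported extractRecord.
func toRecord(row map[string]any, cfg Config) (*Record, error) {
	content, ok := row[cfg.ContentField].(string)
	if !ok {
		return nil, fmt.Errorf("field %q missing or not a string", cfg.ContentField)
	}
	rec := &Record{Content: content, Metadata: map[string]any{}}
	if id, ok := row[cfg.IDField]; ok {
		rec.ID = fmt.Sprint(id) // assumption: non-string IDs are stringified
	}
	if cfg.AllMetadata {
		// Keep every field except the content and ID fields.
		for k, v := range row {
			if k != cfg.ContentField && k != cfg.IDField {
				rec.Metadata[k] = v
			}
		}
		return rec, nil
	}
	// Otherwise keep only the explicitly listed fields that are present.
	for _, k := range cfg.MetadataFields {
		if v, ok := row[k]; ok {
			rec.Metadata[k] = v
		}
	}
	return rec, nil
}

func main() {
	row := map[string]any{"id": "doc-1", "text": "hello", "lang": "en", "year": 2024}
	cfg := Config{ContentField: "text", IDField: "id", MetadataFields: []string{"lang"}}
	rec, err := toRecord(row, cfg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rec.ID, rec.Content, rec.Metadata)
	// prints doc-1 hello map[lang:en]
}
```

In this sketch AllMetadata takes precedence over MetadataFields when both are set; the package may resolve that case differently.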