Documentation
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type Config
type Config struct {
	BatchSize      int
	ContentField   string
	IDField        string
	MetadataFields []string // specific fields to extract
	AllMetadata    bool     // extract all fields except content/id
	Limit          int      // max records to ingest, 0 = unlimited
}
func DefaultConfig
func DefaultConfig() *Config
type Processor
type Processor struct {
	// contains filtered or unexported fields
}
func NewProcessor
func (*Processor) ProcessJSONL
ProcessJSONL reads a JSONL file and streams records through the provided channel. The channel is closed when processing is complete or an unrecoverable error occurs.
func (*Processor) ProcessJSONLFull
ProcessJSONLFull reads the entire file and returns all records at once. Because every record is held in memory, it is intended for small files.
func (*Processor) ProcessParquet
ProcessParquet reads a Parquet file and streams records through the provided channel. It uses parquet-go's GenericReader to read each row as a map[string]any, then converts each row to a Record with the same extractRecord logic used for JSONL.
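Because both readers reduce a row to `map[string]any`, one converter can serve both formats. The sketch below is a hypothetical illustration of how the documented `Config` fields could drive that conversion; `Record`, its fields, the error on missing content, and `AllMetadata` taking precedence over `MetadataFields` are all assumptions, not the package's actual `extractRecord`.

```go
package main

import "fmt"

// Config mirrors only the documented fields that affect extraction.
type Config struct {
	ContentField   string
	IDField        string
	MetadataFields []string
	AllMetadata    bool
}

// Record is a hypothetical output type.
type Record struct {
	ID       string
	Content  string
	Metadata map[string]any
}

// extractRecord is a hypothetical shared converter: any reader that yields
// map[string]any rows (JSONL or Parquet) can call it.
func extractRecord(row map[string]any, cfg *Config) (Record, error) {
	content, ok := row[cfg.ContentField].(string)
	if !ok {
		// Assumption: a missing or non-string content field is an error.
		return Record{}, fmt.Errorf("field %q missing or not a string", cfg.ContentField)
	}
	rec := Record{Content: content, Metadata: map[string]any{}}
	if v, ok := row[cfg.IDField]; ok {
		rec.ID = fmt.Sprint(v) // numeric IDs are stringified in this sketch
	}
	if cfg.AllMetadata {
		// Every field except the content and ID columns.
		for k, v := range row {
			if k != cfg.ContentField && k != cfg.IDField {
				rec.Metadata[k] = v
			}
		}
	} else {
		// Only the listed fields that are actually present.
		for _, k := range cfg.MetadataFields {
			if v, ok := row[k]; ok {
				rec.Metadata[k] = v
			}
		}
	}
	return rec, nil
}

func main() {
	row := map[string]any{"id": int64(7), "text": "hello", "lang": "en", "score": 0.9}
	cfg := &Config{ContentField: "text", IDField: "id", MetadataFields: []string{"lang"}}
	rec, _ := extractRecord(row, cfg)
	fmt.Println(rec.ID, rec.Content, rec.Metadata) // 7 hello map[lang:en]
	cfg.AllMetadata = true
	rec, _ = extractRecord(row, cfg)
	fmt.Println(rec.Metadata) // map[lang:en score:0.9]
}
```

Keeping conversion in one function means the JSONL and Parquet paths cannot drift apart in how they interpret `ContentField`, `IDField`, and the metadata options.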