Documentation ¶
Overview ¶
Package io defines the I/O pipeline framework for Pulse: Reader/Writer interfaces, schema inference, and job types (ImportJob, ExportJob, ConvertJob).
Format-specific adapters live in sub-packages (csv, tsv, ndjson, jsonarray, parquet, arrow, excel).
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ErrStopIteration ¶
func ErrStopIteration() error
ErrStopIteration returns the stop iteration sentinel for use by readers.
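A sentinel like this is typically returned from a row callback to end iteration early and is detected with errors.Is. The sketch below illustrates that pattern with local stand-ins: errStop, stopIteration, each, and firstN are hypothetical names, and treating the sentinel as a clean exit rather than a failure is an assumption, not documented Pulse behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// errStop is a local stand-in for the value returned by ErrStopIteration().
var errStop = errors.New("stop iteration")

// stopIteration mirrors the accessor-function shape of ErrStopIteration.
func stopIteration() error { return errStop }

// each calls fn for every row. It treats the sentinel as a clean early
// exit (assumed behaviour) and propagates any other error unchanged.
func each(rows [][]string, fn func(row []string) error) error {
	for _, row := range rows {
		if err := fn(row); err != nil {
			if errors.Is(err, stopIteration()) {
				return nil
			}
			return err
		}
	}
	return nil
}

// firstN collects at most n rows by returning the sentinel from the callback.
func firstN(rows [][]string, n int) ([][]string, error) {
	var out [][]string
	err := each(rows, func(row []string) error {
		if len(out) == n {
			return stopIteration()
		}
		out = append(out, row)
		return nil
	})
	return out, err
}

func main() {
	rows := [][]string{{"1"}, {"2"}, {"3"}}
	got, err := firstN(rows, 2)
	fmt.Println(len(got), err)
}
```

Exposing the sentinel through a function rather than a package variable keeps callers from reassigning it.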
Types ¶
type ConvertJob ¶
type ConvertJob struct {
Source Reader
Target Writer
Schema *encoding.Schema
KeepPulseAt string // optional: also write intermediate .pulse
SampleRows int
FS afero.Fs
// Includes restricts the export half to the named schema fields,
// in schema order. Nil / empty means write every field. The
// intermediate .pulse file (when KeepPulseAt is set) always
// carries the full schema — projection is an output-time overlay,
// not an on-disk schema change. See ExportJob.Includes.
Includes []string
// Labels apply to the export phase only — the import side reads
// raw source bytes and has no use for label translation. See
// ExportJob.Labels.
Labels []*types.LabelBinding
// LabelResolver carries the runtime resolver. See
// ExportJob.LabelResolver.
LabelResolver LabelResolver
}
ConvertJob chains import and export with no intermediate file on disk (unless KeepPulseAt is set).
func NewConvertJob ¶
func NewConvertJob(source Reader, target Writer) *ConvertJob
NewConvertJob creates a ConvertJob with default settings.
func (*ConvertJob) Predict ¶
func (j *ConvertJob) Predict(ctx context.Context) (*PredictReport, error)
Predict validates the conversion without writing.
func (*ConvertJob) Run ¶
func (j *ConvertJob) Run(ctx context.Context) (*ConvertReport, error)
Run executes the convert job, streaming from source to target. If KeepPulseAt is set, also writes an intermediate .pulse file.
type ConvertReport ¶
type ConvertReport struct {
RowsConverted int
Schema *encoding.Schema
RowErrors []RowError
LabelWarnings []LabelWarning
}
ConvertReport summarizes the result of a convert operation.
type ExportJob ¶
type ExportJob struct {
Source string // input .pulse path
Target Writer
FS afero.Fs
// Includes restricts the export to the named source-schema fields,
// in source-schema order. Nil / empty means export every field
// (prior behaviour). Names must match Schema.Fields[i].Name
// exactly. Unknown names return PULSE_EXPORT_FIELD_UNKNOWN with
// the offending name + list of known fields. Label augment
// siblings ("<field>_label") are emitted only for included source
// fields; replace mode applies to included fields as before.
Includes []string
// Labels rewrites or augments categorical column values during
// export using embedder-registered label tables. Bindings name a
// categorical field plus a label table; mode=replace overwrites
// the column value, mode=augment emits a sibling "<field>_label"
// column. See types.LabelBinding for semantics. Nil/empty means
// the exporter writes raw resolved categorical values, matching
// pre-label behaviour.
Labels []*types.LabelBinding
// LabelResolver carries the runtime resolver built from Labels +
// the pulse Service's registered LabelTables. The pulse.Export
// facade builds the resolver and sets this field; callers using
// io.ExportJob directly without a Pulse instance must supply a
// satisfying implementation (typically wrapping
// processing.BuildLabelResolver). Nil means no label translation.
LabelResolver LabelResolver
}
ExportJob converts a .pulse file into tabular output.
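The Includes rules (exact name match, output in source-schema order, error on unknown names) can be sketched as a small projection helper. This is an illustration only: project is a hypothetical function, the schema is reduced to a slice of field names, and the error text is an assumed format built around the documented PULSE_EXPORT_FIELD_UNKNOWN code.

```go
package main

import (
	"fmt"
	"strings"
)

// project returns the indexes of the schema fields named in includes,
// in schema order. Nil or empty includes selects every field.
func project(schemaFields, includes []string) ([]int, error) {
	if len(includes) == 0 {
		all := make([]int, len(schemaFields))
		for i := range schemaFields {
			all[i] = i
		}
		return all, nil
	}
	known := make(map[string]bool, len(schemaFields))
	for _, name := range schemaFields {
		known[name] = true
	}
	wanted := make(map[string]bool, len(includes))
	for _, name := range includes {
		if !known[name] { // exact, case-sensitive match
			return nil, fmt.Errorf("PULSE_EXPORT_FIELD_UNKNOWN: %q (known fields: %s)",
				name, strings.Join(schemaFields, ", "))
		}
		wanted[name] = true
	}
	var idx []int
	for i, name := range schemaFields { // walk the schema to preserve its order
		if wanted[name] {
			idx = append(idx, i)
		}
	}
	return idx, nil
}

func main() {
	fields := []string{"id", "region", "amount"}
	idx, err := project(fields, []string{"amount", "id"})
	fmt.Println(idx, err)
}
```

Because the helper walks the schema rather than the Includes slice, the order in which the caller lists names has no effect on column order.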
func NewExportJob ¶
NewExportJob creates an ExportJob.
type ExportReport ¶
type ExportReport struct {
RowsExported int
RowErrors []RowError
LabelWarnings []LabelWarning
}
ExportReport summarizes the result of an export operation.
type ImportJob ¶
type ImportJob struct {
Source Reader
Target string // output .pulse path
Schema *encoding.Schema
SampleRows int // default 500, min 50
FS afero.Fs
}
ImportJob converts tabular source data into a .pulse file.
func NewImportJob ¶
NewImportJob creates an ImportJob with default settings.
type ImportReport ¶
ImportReport summarizes the result of an import operation.
type InferenceWarning ¶
InferenceWarning records a non-fatal observation during inference.
func InferSchema ¶
InferSchema samples up to sampleRows rows from reader and proposes a Schema. If sampleRows <= 0, defaultSampleRows is used. The minimum is minSampleRows.
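One plausible reading of the sampleRows rules is sketched below. The constant values come from the ImportJob.SampleRows comment ("default 500, min 50"); effectiveSampleRows is a hypothetical helper, and raising small positive values to the minimum (rather than rejecting them) is an assumption.

```go
package main

import "fmt"

// Values taken from the ImportJob.SampleRows comment ("default 500, min 50").
const (
	defaultSampleRows = 500
	minSampleRows     = 50
)

// effectiveSampleRows applies the rules as read here: non-positive requests
// fall back to the default, and small positive requests are raised to the
// minimum (assumed clamping behaviour).
func effectiveSampleRows(requested int) int {
	if requested <= 0 {
		return defaultSampleRows
	}
	if requested < minSampleRows {
		return minSampleRows
	}
	return requested
}

func main() {
	for _, n := range []int{0, 10, 50, 2000} {
		fmt.Println(n, "->", effectiveSampleRows(n))
	}
}
```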
type LabelResolver ¶ added in v0.10.1
type LabelResolver interface {
// Has reports whether the resolver carries any binding for the
// named field.
Has(field string) bool
// Mode returns the binding mode for the named field. Empty when
// no binding exists.
Mode(field string) types.LabelMode
// Apply translates a raw value through the binding for the named
// field. See processing.LabelResolver.Apply for semantics.
Apply(field, raw string) (out string, sibling string, ok bool)
// FieldsWithAugment returns the field names that should emit a
// "<field>_label" sibling column.
FieldsWithAugment() []string
// Warnings flushes the per-binding collision and miss summaries
// once the job finishes.
Warnings() []LabelWarning
}
LabelResolver is the io-package projection of the processing-layer label resolver. ExportJob / ConvertJob consume the interface so io stays free of the processing dependency.
processing.LabelResolver satisfies this interface as-is; the pulse-package facade builds the resolver from the user-facing LabelBinding slice + LabelTable registry and hands the value to the job via ExportJob.LabelResolver.
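For callers supplying their own resolver, a map-backed sketch of the lookup half of the interface might look like the following. Everything here is a stand-in: LabelMode and its constants replace types.LabelMode, binding and mapResolver are hypothetical, and the Apply semantics (replace returns the label as out; augment returns the raw value plus the label as sibling; ok is false on a miss) are assumptions, since the authoritative description lives in processing.LabelResolver.Apply. FieldsWithAugment and Warnings are omitted for brevity.

```go
package main

import "fmt"

// LabelMode is a stand-in for types.LabelMode; the two values mirror the
// replace and augment modes named in the ExportJob.Labels comment.
type LabelMode string

const (
	ModeReplace LabelMode = "replace"
	ModeAugment LabelMode = "augment"
)

// binding is a hypothetical in-memory label table bound to one field.
type binding struct {
	mode   LabelMode
	labels map[string]string
}

// mapResolver implements Has, Mode, and Apply with the documented signatures.
type mapResolver map[string]binding

func (m mapResolver) Has(field string) bool { _, ok := m[field]; return ok }

// Mode returns the empty LabelMode when no binding exists.
func (m mapResolver) Mode(field string) LabelMode { return m[field].mode }

// Apply assumes: replace returns the label as out; augment returns the raw
// value as out and the label as sibling; ok is false when nothing matches.
func (m mapResolver) Apply(field, raw string) (out, sibling string, ok bool) {
	b, bound := m[field]
	if !bound {
		return raw, "", false
	}
	label, found := b.labels[raw]
	if !found {
		return raw, "", false
	}
	if b.mode == ModeAugment {
		return raw, label, true
	}
	return label, "", true
}

func main() {
	r := mapResolver{
		"status": {mode: ModeReplace, labels: map[string]string{"1": "active"}},
		"region": {mode: ModeAugment, labels: map[string]string{"eu": "Europe"}},
	}
	fmt.Println(r.Apply("status", "1"))
	fmt.Println(r.Apply("region", "eu"))
	fmt.Println(r.Apply("region", "xx"))
}
```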
type LabelWarning ¶ added in v0.10.1
LabelWarning is the io-package projection of one resolver diagnostic record. ExportReport surfaces these for the caller; the CLI / MCP envelope wrapper promotes them to envelope warnings.
type PredictReport ¶
type PredictReport struct {
Schema *encoding.Schema
EstimatedRows int
Warnings []InferenceWarning
}
PredictReport summarizes a validation-only run.
type Reader ¶
type Reader interface {
// ReadHeader returns column names from the source.
ReadHeader() ([]string, error)
// ReadRows streams rows; calls fn for each row.
// The context controls cancellation.
ReadRows(ctx context.Context, fn func(row []string) error) error
// Close releases underlying resources.
Close() error
}
Reader reads tabular data from a source format.
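A minimal in-memory implementation helps show the contract. The sketch below redeclares the Reader interface locally so it compiles without the Pulse module; sliceReader, countRows, and run are hypothetical names, and checking the context between rows is one reasonable way to honour cancellation, not necessarily what the bundled adapters do.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Reader is a local copy of the interface documented above.
type Reader interface {
	ReadHeader() ([]string, error)
	ReadRows(ctx context.Context, fn func(row []string) error) error
	Close() error
}

// sliceReader is a hypothetical Reader backed by in-memory slices.
type sliceReader struct {
	header []string
	rows   [][]string
	closed bool
}

func (r *sliceReader) ReadHeader() ([]string, error) {
	if r.closed {
		return nil, errors.New("reader closed")
	}
	return r.header, nil
}

func (r *sliceReader) ReadRows(ctx context.Context, fn func(row []string) error) error {
	for _, row := range r.rows {
		if err := ctx.Err(); err != nil { // honour cancellation between rows
			return err
		}
		if err := fn(row); err != nil {
			return err
		}
	}
	return nil
}

func (r *sliceReader) Close() error { r.closed = true; return nil }

// countRows drains a Reader and reports how many rows it produced.
func countRows(ctx context.Context, r Reader) (int, error) {
	n := 0
	err := r.ReadRows(ctx, func([]string) error { n++; return nil })
	return n, err
}

// run drains a two-row reader; when cancelFirst is true the context is
// cancelled before reading starts.
func run(cancelFirst bool) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	r := &sliceReader{header: []string{"id", "name"}, rows: [][]string{{"1", "a"}, {"2", "b"}}}
	defer r.Close()
	return countRows(ctx, r)
}

func main() {
	fmt.Println(run(false))
	fmt.Println(run(true))
}
```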
type ResetReader ¶
ResetReader is an optional interface for readers that support rewinding to the beginning. This is needed for schema inference followed by import.
type SchemaAwareWriter ¶ added in v0.2.0
SchemaAwareWriter is an optional extension of Writer for targets that emit native typed columns (Arrow, Parquet, Excel) and want the source .pulse schema to drive column-type selection. ExportJob calls SetPulseSchema before WriteHeader on writers that implement this interface, then passes typed values through WriteRow:
- encoding.Decimal128 for decimal128 / nullable_decimal128 columns
- canonical strings for narrow types (current behavior)
Writers that do not implement SchemaAwareWriter receive only canonical strings, which is the prior text-only export contract.
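The optional-interface check described above is usually a type assertion on the Writer. The sketch below shows the call ordering only. The method signatures are assumptions (the page names SetPulseSchema, WriteHeader, and WriteRow but does not show their parameters), Schema stands in for *encoding.Schema, and textWriter, typedWriter, and begin are hypothetical.

```go
package main

import "fmt"

// Schema stands in for *encoding.Schema.
type Schema struct{ Fields []string }

// Writer and SchemaAwareWriter use assumed signatures; only the method
// names appear in the documentation.
type Writer interface {
	WriteHeader(cols []string) error
	WriteRow(values []any) error
}

type SchemaAwareWriter interface {
	Writer
	SetPulseSchema(s *Schema) error
}

// textWriter is a plain Writer that records which methods were called.
type textWriter struct{ calls []string }

func (w *textWriter) WriteHeader([]string) error {
	w.calls = append(w.calls, "WriteHeader")
	return nil
}

func (w *textWriter) WriteRow([]any) error {
	w.calls = append(w.calls, "WriteRow")
	return nil
}

// typedWriter additionally accepts the source schema.
type typedWriter struct{ textWriter }

func (w *typedWriter) SetPulseSchema(*Schema) error {
	w.calls = append(w.calls, "SetPulseSchema")
	return nil
}

// begin follows the documented ordering: SetPulseSchema, when the writer
// supports it, runs before WriteHeader.
func begin(w Writer, s *Schema) error {
	if sw, ok := w.(SchemaAwareWriter); ok {
		if err := sw.SetPulseSchema(s); err != nil {
			return err
		}
	}
	return w.WriteHeader(s.Fields)
}

func main() {
	s := &Schema{Fields: []string{"id", "amount"}}
	tw := &typedWriter{}
	fmt.Println(begin(tw, s), tw.calls)
	pw := &textWriter{}
	fmt.Println(begin(pw, s), pw.calls)
}
```

Keeping the schema hook in a separate interface lets text-only writers stay unchanged while typed targets opt in.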
Directories ¶
| Path | Synopsis |
|---|---|
| arrow | Package arrow provides Arrow IPC (Feather V2) import and export for the pulse I/O pipeline, plus shared Arrow<->Pulse type-mapping helpers used by both this package and io/parquet. |
| csv | Package csv provides CSV format adapters for the Pulse I/O pipeline. |
| excel | Package excel provides Excel import and export for the pulse I/O pipeline. |
| format | Package format dispatches tabular Reader construction by format identifier, sitting between the io/ interface definitions and the per-format leaf packages (io/csv, io/tsv, io/ndjson, io/jsonarray, io/parquet, io/arrow, io/excel). |
| jsonarray | Package jsonarray provides JSON-array import and export for the pulse I/O pipeline. |
| jsonshared | Package jsonshared holds value coercion helpers shared by the ndjson and jsonarray packages. |
| ndjson | Package ndjson provides NDJSON (newline-delimited JSON) import and export for the pulse I/O pipeline. |
| parquet | Package parquet provides Parquet import and export for the pulse I/O pipeline. |
| tsv | Package tsv provides TSV import and export for the pulse I/O pipeline. |