arrow

package

v0.8.3 Latest Latest Go to latest Published: May 18, 2026 License: MIT Imports: 13 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/frankbardon/pulse

Links

Open Source Insights

Documentation ¶

Overview ¶

Package arrow provides Arrow IPC (Feather V2) import and export for the pulse I/O pipeline, plus shared Arrow<->Pulse type-mapping helpers used by both this package and io/parquet.

Format scope:

This package implements the Arrow IPC FILE format (random-access, footer-indexed). The standard file extensions are .arrow and .feather; Feather V2 is identical to the Arrow IPC file format on disk.
The streaming variant (Arrow IPC stream, .arrows) is intentionally NOT supported. The file format is sufficient for batch import/export and trivially seekable, which lets the reader iterate batches without speculative buffering.

Memory model:

The reader materializes the full table into memory at init time, mirroring the policy in io/parquet. Files larger than available memory are out of scope for v1; users that need bounded streaming should use NDJSON or CSV.

Type policy:

All values written by Writer are encoded as Arrow String. Type-promotion on write is intentionally deferred until a real consumer demands it; this matches the parquet writer policy and keeps the export path simple.

Index ¶

Constants
func AppendDecimal128(b array.Builder, f encoding.Field, v any) error
func FieldFromPulse(f encoding.Field) arrow.Field
func FormatValue(arr arrow.Array, idx int) string
func PulseFieldFromArrow(af arrow.Field) (encoding.Field, bool)
func TypeFromPulse(ft encoding.FieldType) arrow.DataType
func TypeToPulse(dt arrow.DataType, nullable bool) encoding.FieldType
type Reader
- func NewReader(fs afero.Fs, path string) *Reader
- func NewReaderFromBytes(data []byte) *Reader
type Writer
- func NewWriter(fs afero.Fs, path string) *Writer
- func NewWriterToBuffer() *Writer

Constants ¶

View Source

const (
	PulseTypeMetadataKey = "pulse:type"
)

Pulse field metadata key carried as an Arrow Field.Metadata pair so a reader can recover the original Pulse type for fields that map to a generic Arrow type. Decimal128 carries its precision/scale natively in arrow.Decimal128Type and does not need an extension marker.

Variables ¶

This section is empty.

Functions ¶

func AppendDecimal128 ¶

func AppendDecimal128(b array.Builder, f encoding.Field, v any) error

AppendDecimal128 appends a decimal value to an Arrow Decimal128 builder, accepting either an encoding.Decimal128 typed value or a canonical decimal string at the field's scale. nil and "" append a null. Used by the Arrow IPC and Parquet writers.

func FieldFromPulse ¶

func FieldFromPulse(f encoding.Field) arrow.Field

FieldFromPulse builds an Arrow Field for a Pulse schema entry, including per-type details (decimal128 precision/scale).

func FormatValue ¶

func FormatValue(arr arrow.Array, idx int) string

FormatValue renders a single element of an Arrow array as a string suitable for emission through the pio.Reader interface. Lifted from the original io/parquet implementation. Null handling is the caller's responsibility: FormatValue assumes idx is a non-null position.

Formatting rules:

Integers: base-10, unsigned where possible.
Float32 / Float64: shortest round-trippable decimal at the given precision (strconv 'f', precision -1).
Boolean: lowercase "true" / "false".
String: the raw value verbatim.
Date32: rendered as YYYY-MM-DD using days-since-epoch.
Date64: rendered as YYYY-MM-DD; the sub-day milliseconds are dropped.
Dictionary with a string value type: the resolved string label.
Anything else: falls back to fmt-formatting GetOneForMarshal, which yields the Arrow library's canonical text representation.

func PulseFieldFromArrow ¶

func PulseFieldFromArrow(af arrow.Field) (encoding.Field, bool)

PulseFieldFromArrow inverts FieldFromPulse for decimal128 columns. Returns (field, true) if a Pulse decimal type was identified; (zero, false) otherwise — the caller should fall back to TypeToPulse.

func TypeFromPulse ¶

func TypeFromPulse(ft encoding.FieldType) arrow.DataType

TypeFromPulse maps a Pulse FieldType to an Arrow DataType for use when constructing an Arrow schema for export. Lifted from the original io/parquet implementation.

Mapping rules:

Unsigned integer widths map to the matching Arrow primitive.
Float widths map to the matching Arrow primitive.
Date maps to Arrow Date32 (days-since-epoch), the more compact of the two Arrow date types and the natural fit for Pulse's storage.
All bool variants (packed and nullable) collapse to Arrow Boolean; nullability is carried by the field's Nullable flag, not the data type.
All categorical widths collapse to Arrow String. Writers that want dictionary encoding configure that as a column-encoding hint at write time, not as a data-type choice.
Anything unrecognized falls back to Float64.

func TypeToPulse ¶

func TypeToPulse(dt arrow.DataType, nullable bool) encoding.FieldType

TypeToPulse maps an Arrow data type (with its nullability flag) to a Pulse FieldType. Lifted from the original io/parquet implementation so both the Arrow IPC reader and the Parquet reader share a single source of truth.

Mapping rules:

Signed and unsigned integers of the same width collapse to Pulse's unsigned type at that width (Pulse has no signed integer types).
Nullable variants are chosen for u8/u16 widths and for bool when the Arrow field is marked nullable.
Both Date32 and Date64 collapse to FieldTypeDate; Pulse stores dates as days-since-epoch and discards the sub-day precision Date64 carries.
String and binary (in both standard and large variants), and any dictionary-encoded type, collapse to FieldTypeCategoricalU8. The import pipeline upgrades to wider categorical widths when the dictionary overflows.
Anything unrecognized falls back to FieldTypeF64. This is a deliberate conservative default: it lets unknown numeric Arrow types load without error, at the cost of precision for unusual cases.

Types ¶

type Reader ¶

type Reader struct {
	// contains filtered or unexported fields
}

Reader reads Arrow IPC (Feather V2) data and implements pio.Reader and pio.ResetReader. The full file is materialized in memory at init time — see the package doc for the rationale.

func NewReader ¶

func NewReader(fs afero.Fs, path string) *Reader

NewReader creates an Arrow IPC reader that loads from the given filesystem path on first read. The file is not opened until ReadHeader, ReadRows, or InferPulseSchema is called.

func NewReaderFromBytes ¶

func NewReaderFromBytes(data []byte) *Reader

NewReaderFromBytes creates an Arrow IPC reader over an in-memory byte slice. The slice is not copied; callers must not mutate it after handing it to the reader.

func (*Reader) Close ¶

func (r *Reader) Close() error

Close releases all materialized record batches and the underlying file reader. Safe to call multiple times.

func (*Reader) InferPulseSchema ¶

func (r *Reader) InferPulseSchema() (*encoding.Schema, error)

InferPulseSchema builds a Pulse schema from the Arrow file's schema using the shared TypeToPulse mapping.

func (*Reader) ReadHeader ¶

func (r *Reader) ReadHeader() ([]string, error)

ReadHeader returns column names from the Arrow schema.

func (*Reader) ReadRows ¶

func (r *Reader) ReadRows(ctx context.Context, fn func(row []string) error) error

ReadRows iterates rows across all materialized record batches, calling fn for each row. Null cells produce empty strings.

func (*Reader) Reset ¶

func (r *Reader) Reset() error

Reset rewinds the reader to the beginning. The next Read call will re-open the underlying file and re-materialize batches.

type Writer ¶

type Writer struct {
	// contains filtered or unexported fields
}

Writer writes tabular data as Arrow IPC (Feather V2), accumulating rows into an Arrow RecordBuilder and flushing in batch-sized chunks to bound peak memory.

func NewWriter ¶

func NewWriter(fs afero.Fs, path string) *Writer

NewWriter creates an Arrow IPC writer targeting a filesystem path. The file is materialized in an internal buffer and flushed to fs on Close.

func NewWriterToBuffer ¶

func NewWriterToBuffer() *Writer

NewWriterToBuffer creates an Arrow IPC writer that writes to an internal buffer accessible via Bytes after Close.

func (*Writer) Bytes ¶

func (w *Writer) Bytes() []byte

Bytes returns the buffered Arrow IPC output. Only meaningful after Close.

func (*Writer) Close ¶

func (w *Writer) Close() error

Close flushes any pending batch, closes the underlying Arrow IPC writer, and writes the buffered file to fs if a path was configured. If no header was written, Close is a no-op and Bytes returns an empty slice. If a header was written but no rows, an empty record batch is emitted so the file is structurally valid.

func (*Writer) SetPulseSchema ¶

func (w *Writer) SetPulseSchema(s *encoding.Schema)

SetPulseSchema records the source .pulse schema so subsequent initWriter can build native typed Arrow columns. Implements pio.SchemaAwareWriter.

func (*Writer) WriteHeader ¶

func (w *Writer) WriteHeader(columns []string) error

WriteHeader records the column names. The Arrow schema and writer are constructed lazily on the first WriteRow so that callers can WriteHeader and immediately Close to produce a schema-only file.

func (*Writer) WriteRow ¶

func (w *Writer) WriteRow(values []any) error

WriteRow writes a single row of values. With a Pulse schema set the row dispatches per-column to the native typed builder for that column's Pulse type. Without a schema every value is stringified via fmt.Sprintf and appended to a StringBuilder. Buffered rows flush in defaultRecordBatchSize chunks; batch boundaries fall between rows.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL