Documentation
¶
Overview ¶
Package arrow provides Arrow IPC (Feather V2) import and export for the pulse I/O pipeline, plus shared Arrow<->Pulse type-mapping helpers used by both this package and io/parquet.
Format scope:
- This package implements the Arrow IPC FILE format (random-access, footer-indexed). The standard file extensions are .arrow and .feather; Feather V2 is identical to the Arrow IPC file format on disk.
- The streaming variant (Arrow IPC stream, .arrows) is intentionally NOT supported. The file format is sufficient for batch import/export and trivially seekable, which lets the reader iterate batches without speculative buffering.
Memory model:
The reader materializes the full table into memory at init time, mirroring the policy in io/parquet. Files larger than available memory are out of scope for v1; users that need bounded streaming should use NDJSON or CSV.
Type policy:
All values written by Writer are encoded as Arrow String. Type-promotion on write is intentionally deferred until a real consumer demands it; this matches the parquet writer policy and keeps the export path simple.
Index ¶
- Constants
- func AppendDecimal128(b array.Builder, f encoding.Field, v any) error
- func FieldFromPulse(f encoding.Field) arrow.Field
- func FormatValue(arr arrow.Array, idx int) string
- func PulseFieldFromArrow(af arrow.Field) (encoding.Field, bool)
- func TypeFromPulse(ft encoding.FieldType) arrow.DataType
- func TypeToPulse(dt arrow.DataType) encoding.FieldType
- type Reader
- type Writer
Constants ¶
const (
PulseTypeMetadataKey = "pulse:type"
)
Pulse field metadata key carried as an Arrow Field.Metadata pair so a reader can recover the original Pulse type for fields that map to a generic Arrow type. Decimal128 carries its precision/scale natively in arrow.Decimal128Type and does not need an extension marker.
Variables ¶
This section is empty.
Functions ¶
func AppendDecimal128 ¶
AppendDecimal128 appends a decimal value to an Arrow Decimal128 builder, accepting either an encoding.Decimal128 typed value or a canonical decimal string at the field's scale. nil and "" append a null. Used by the Arrow IPC and Parquet writers.
func FieldFromPulse ¶
FieldFromPulse builds an Arrow Field for a Pulse schema entry, including per-type details (decimal128 precision/scale). Arrow's nullable flag is driven by the Pulse field's Nullable attribute.
func FormatValue ¶
FormatValue renders a single element of an Arrow array as a string suitable for emission through the pio.Reader interface. Lifted from the original io/parquet implementation. Null handling is the caller's responsibility: FormatValue assumes idx is a non-null position.
Formatting rules:
- Integers: base-10, unsigned where possible.
- Float32 / Float64: shortest round-trippable decimal at the given precision (strconv 'f', precision -1).
- Boolean: lowercase "true" / "false".
- String: the raw value verbatim.
- Date32: rendered as YYYY-MM-DD using days-since-epoch.
- Date64: rendered as YYYY-MM-DD; the sub-day milliseconds are dropped.
- Dictionary with a string value type: the resolved string label.
- Anything else: falls back to fmt-formatting GetOneForMarshal, which yields the Arrow library's canonical text representation.
func PulseFieldFromArrow ¶
PulseFieldFromArrow inverts FieldFromPulse for decimal128 columns. Returns (field, true) if a Pulse decimal type was identified; (zero, false) otherwise — the caller should fall back to TypeToPulse.
func TypeFromPulse ¶
TypeFromPulse maps a Pulse FieldType to an Arrow DataType for use when constructing an Arrow schema for export. Nullability is carried by the arrow.Field's Nullable flag, not the data type.
func TypeToPulse ¶
TypeToPulse maps an Arrow data type to a Pulse FieldType. Nullability is carried separately by encoding.Field.Nullable (driven by the Arrow Field's nullable flag at the call site).
Mapping rules:
- Signed and unsigned integers of the same width collapse to Pulse's unsigned type at that width (Pulse has no signed integer types).
- Both Date32 and Date64 collapse to FieldTypeDate; Pulse stores dates as days-since-epoch and discards the sub-day precision Date64 carries.
- String and binary (in both standard and large variants), and any dictionary-encoded type, collapse to FieldTypeCategoricalU8. The import pipeline upgrades to wider categorical widths when the dictionary overflows.
- Anything unrecognized falls back to FieldTypeF64.
Types ¶
type Reader ¶
type Reader struct {
// contains filtered or unexported fields
}
Reader reads Arrow IPC (Feather V2) data and implements pio.Reader and pio.ResetReader. The full file is materialized in memory at init time — see the package doc for the rationale.
func NewReader ¶
NewReader creates an Arrow IPC reader that loads from the given filesystem path on first read. The file is not opened until ReadHeader, ReadRows, or InferPulseSchema is called.
func NewReaderFromBytes ¶
NewReaderFromBytes creates an Arrow IPC reader over an in-memory byte slice. The slice is not copied; callers must not mutate it after handing it to the reader.
func (*Reader) Close ¶
Close releases all materialized record batches and the underlying file reader. Safe to call multiple times.
func (*Reader) InferPulseSchema ¶
InferPulseSchema builds a Pulse schema from the Arrow file's schema using the shared TypeToPulse mapping.
func (*Reader) ReadHeader ¶
ReadHeader returns column names from the Arrow schema.
type Writer ¶
type Writer struct {
// contains filtered or unexported fields
}
Writer writes tabular data as Arrow IPC (Feather V2), accumulating rows into an Arrow RecordBuilder and flushing in batch-sized chunks to bound peak memory.
func NewWriter ¶
NewWriter creates an Arrow IPC writer targeting a filesystem path. The file is materialized in an internal buffer and flushed to fs on Close.
func NewWriterToBuffer ¶
func NewWriterToBuffer() *Writer
NewWriterToBuffer creates an Arrow IPC writer that writes to an internal buffer accessible via Bytes after Close.
func (*Writer) Close ¶
Close flushes any pending batch, closes the underlying Arrow IPC writer, and writes the buffered file to fs if a path was configured. If no header was written, Close is a no-op and Bytes returns an empty slice. If a header was written but no rows, an empty record batch is emitted so the file is structurally valid.
func (*Writer) SetPulseSchema ¶
SetPulseSchema records the source .pulse schema so subsequent initWriter can build native typed Arrow columns. Implements pio.SchemaAwareWriter.
func (*Writer) WriteHeader ¶
WriteHeader records the column names. The Arrow schema and writer are constructed lazily on the first WriteRow so that callers can WriteHeader and immediately Close to produce a schema-only file.
func (*Writer) WriteRow ¶
WriteRow writes a single row of values. With a Pulse schema set the row dispatches per-column to the native typed builder for that column's Pulse type. Without a schema every value is stringified via fmt.Sprintf and appended to a StringBuilder. Buffered rows flush in defaultRecordBatchSize chunks; batch boundaries fall between rows.