arrowagg

package
v3.6.3
Published: Dec 11, 2025 License: AGPL-3.0 Imports: 5 Imported by: 0

Documentation

Overview

Package arrowagg provides utilities for aggregating Apache Arrow data structures.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Arrays

type Arrays struct {
	// contains filtered or unexported fields
}

Arrays allows for aggregating a set of [arrow.Array]s together into a new, combined array.

func NewArrays

func NewArrays(mem memory.Allocator, dt arrow.DataType) *Arrays

NewArrays creates a new Arrays that aggregates a set of arrays of the same data type. The data type of incoming arrays is not checked until calling Arrays.Aggregate.

func (*Arrays) Aggregate

func (a *Arrays) Aggregate() (arrow.Array, error)

Aggregate all appended arrays into a single array. The returned array must be Release'd after use. If no arrays have been appended, Aggregate returns a zero-length array.

Aggregate returns an error if any of the appended arrays do not match the data type passed to NewArrays.

After calling Aggregate, a is reset and can be reused to append more arrays. This reset is done even if Aggregate returns an error.
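Taken together, a typical use of Arrays looks like the sketch below. The arrowagg import path is not shown on this page, so the one used here is a placeholder, and the Apache Arrow Go module version may differ:

```go
package main

import (
	"fmt"

	"github.com/apache/arrow-go/v18/arrow"
	"github.com/apache/arrow-go/v18/arrow/array"
	"github.com/apache/arrow-go/v18/arrow/memory"

	"path/to/arrowagg" // placeholder: the real import path is not shown above
)

func main() {
	mem := memory.NewGoAllocator()

	// Build two int64 arrays to aggregate.
	b := array.NewInt64Builder(mem)
	defer b.Release()
	b.AppendValues([]int64{1, 2, 3}, nil)
	first := b.NewInt64Array()
	defer first.Release()
	b.AppendValues([]int64{4, 5}, nil)
	second := b.NewInt64Array()
	defer second.Release()

	agg := arrowagg.NewArrays(mem, arrow.PrimitiveTypes.Int64)
	agg.Append(first)
	agg.Append(second)

	// Aggregate concatenates everything appended so far; data types
	// are only checked here, not in Append.
	combined, err := agg.Aggregate()
	if err != nil {
		panic(err)
	}
	defer combined.Release() // the returned array must be Release'd
	fmt.Println(combined.Len()) // 5 rows total
}
```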

func (*Arrays) Append

func (a *Arrays) Append(arr arrow.Array)

Append appends the entirety of the given array to the builder. The data type of arr is not checked until calling Arrays.Aggregate.

func (*Arrays) AppendNulls

func (a *Arrays) AppendNulls(n int)

AppendNulls appends n null values to the builder.

func (*Arrays) AppendSlice

func (a *Arrays) AppendSlice(arr arrow.Array, i, j int64)

AppendSlice appends a slice of the given array to the builder. The data type of arr is not checked until calling Arrays.Aggregate.
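The documentation does not state whether the j bound is exclusive; assuming half-open [i, j) bounds as with Go slices, appending only the middle rows of an array would look like:

```go
// a is an *arrowagg.Arrays; arr is an arrow.Array with at least 4 rows.
// Append rows 1, 2, and 3 of arr (assuming j is exclusive).
a.AppendSlice(arr, 1, 4)
```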

func (*Arrays) Len

func (a *Arrays) Len() int

Len returns the total number of rows currently appended to the builder.

func (*Arrays) Reset

func (a *Arrays) Reset()

Reset releases all arrays currently appended to a and resets it for reuse.

type Mapper

type Mapper struct {
	// contains filtered or unexported fields
}

Mapper is a utility for quickly finding the index of common fields in a schema.

Mapper caches mappings to speed up repeated lookups. Cached mappings are cleared when Mapper.RemoveSchema or Mapper.Reset is called.

func NewMapper

func NewMapper(target []arrow.Field) *Mapper

NewMapper creates a new Mapper that locates the target fields in a schema.

func (*Mapper) FieldIndex

func (m *Mapper) FieldIndex(schema *arrow.Schema, targetIndex int) int

FieldIndex returns the index in schema of the target field at targetIndex, or -1 if that field does not exist in schema. targetIndex corresponds to the index of the field in the target slice passed to NewMapper.

FieldIndex returns -1 if targetIndex is out of bounds for the target slice passed to NewMapper.
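A sketch of locating target fields in a schema whose column order differs from the target slice (how fields are matched beyond name and type is not stated above, so the exact-match fields here are an assumption):

```go
// The target fields we want to find in every incoming schema.
target := []arrow.Field{
	{Name: "timestamp", Type: arrow.FixedWidthTypes.Timestamp_ns},
	{Name: "value", Type: arrow.PrimitiveTypes.Float64},
}
m := arrowagg.NewMapper(target)

// A schema with the same fields in a different order.
schema := arrow.NewSchema([]arrow.Field{
	{Name: "value", Type: arrow.PrimitiveTypes.Float64},
	{Name: "timestamp", Type: arrow.FixedWidthTypes.Timestamp_ns},
}, nil)

tsIdx := m.FieldIndex(schema, 0)  // index of "timestamp" in schema
valIdx := m.FieldIndex(schema, 1) // index of "value" in schema
_ = m.FieldIndex(schema, 2)       // out of bounds for target: -1
_, _ = tsIdx, valIdx
```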

func (*Mapper) RemoveSchema

func (m *Mapper) RemoveSchema(schema *arrow.Schema)

RemoveSchema removes an individual schema from the mapper.

func (*Mapper) Reset

func (m *Mapper) Reset()

Reset resets the mapper, immediately clearing any cached mappings.

type Records

type Records struct {
	// contains filtered or unexported fields
}

Records allows for aggregating a set of [arrow.Record]s together into a single, combined record with a combined schema.

The returned record will have a schema composed of the union of all fields from the input records. Fields will be placed in the order in which they are first seen. When aggregating records, fields that do not exist in the source record will be filled with null values in the output record.

func NewRecords

func NewRecords(mem memory.Allocator) *Records

NewRecords creates a new Records that aggregates a set of records.

func (*Records) Aggregate

func (r *Records) Aggregate() (arrow.Record, error)

Aggregate all appended records into a single record. The returned record must be Release'd after use. If no records have been appended, Aggregate returns an error.

The returned record will have a schema composed of the union of all fields from the input records, sorted by the order in which each field was first seen.

Fields that do not exist in source records will be filled with null values in the output record.

After calling Aggregate, r is reset and can be reused to append more records. This reset is done even if Aggregate returns an error.
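A sketch of combining two records with disjoint schemas (placeholder arrowagg import path as before; the null-filled output follows the description above):

```go
mem := memory.NewGoAllocator()

// Record A: one column "a" with two rows.
schemaA := arrow.NewSchema([]arrow.Field{
	{Name: "a", Type: arrow.PrimitiveTypes.Int64, Nullable: true},
}, nil)
ab := array.NewInt64Builder(mem)
ab.AppendValues([]int64{1, 2}, nil)
colA := ab.NewInt64Array()
recA := array.NewRecord(schemaA, []arrow.Array{colA}, 2)

// Record B: one column "b" with one row.
schemaB := arrow.NewSchema([]arrow.Field{
	{Name: "b", Type: arrow.PrimitiveTypes.Float64, Nullable: true},
}, nil)
bb := array.NewFloat64Builder(mem)
bb.AppendValues([]float64{3.5}, nil)
colB := bb.NewFloat64Array()
recB := array.NewRecord(schemaB, []arrow.Array{colB}, 1)

r := arrowagg.NewRecords(mem)
r.Append(recA)
r.Append(recB)

// The output has 3 rows and both fields "a" and "b"; cells with no
// source value ("b" for recA's rows, "a" for recB's row) are null.
out, err := r.Aggregate()
if err != nil {
	panic(err)
}
defer out.Release()
```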

func (*Records) Append

func (r *Records) Append(rec arrow.Record)

Append appends the entirety of rec to r. If rec contains a field that has not been seen before, it will be added to the schema of r.

Fields are compared based on endianness, name, type, and metadata; two fields that are equal except for metadata are treated as different.

Any metadata from rec.Schema().Metadata() is ignored when computing the combined schema for r.

rec is Retain'd by this method until calling Records.Aggregate.

Append panics if encountering an unlikely hash collision between two different schemas.

func (*Records) AppendSlice

func (r *Records) AppendSlice(rec arrow.Record, i, j int64)

AppendSlice behaves like Records.Append but instead appends a slice of the input record from i to j.

func (*Records) Reset

func (r *Records) Reset()

Reset releases all resources held by r and clears its state, allowing it to be reused for aggregating new records.
