bufarrow

Version: v0.5.2
Published: Jul 4, 2025 License: Apache-2.0 Imports: 23 Imported by: 2

README

bufarrow 🦬

Go library to build Apache Arrow records from Protocol Buffers

Features

  • generate Arrow and Parquet schemas from Protobuf structs
  • augment Protobuf data with custom fields
  • build Arrow records from Protobuf
  • build Arrow records with normalized flattened Protobuf

🚀 Install

Using bufarrow is easy. First, use go get to install the latest version of the library.

go get -u github.com/loicalleyne/bufarrow@latest

💡 Usage

You can import bufarrow using:

import "github.com/loicalleyne/bufarrow"

💫 Show your support

Give a ⭐️ if this project helped you! Feedback and PRs welcome.

License

Bufarrow is released under the Apache 2.0 license. See LICENCE.txt

Documentation

Constants

This section is empty.

Variables

var ErrMxDepth = errors.New("max depth reached, either the message is deeply nested or a circular dependency was introduced")
var ErrPathNotFound = errors.New("path not found")

Functions

This section is empty.

Types

type Cardinality (added in v0.5.1)

type Cardinality protoreflect.Cardinality

Cardinality determines whether a field is optional, required, or repeated.

const (
	Optional Cardinality = 1 // appears zero or one times
	Required Cardinality = 2 // appears exactly one time; invalid with Proto3
	Repeated Cardinality = 3 // appears zero or more times
)

Constants as defined by the google.protobuf.Cardinality enumeration.

func (*Cardinality) Get (added in v0.5.1)

type CustomField (added in v0.5.1)

type CustomField struct {
	// Name must not conflict with existing proto.Message field names.
	Name string `toml:"name"`
	// Supported types:
	// 		BOOL	bool
	// 		BYTES	[]byte
	//		STRING	string
	//		INT64	int64
	//		FLOAT64	float64
	Type FieldType `toml:"type"`
	// FieldCardinality is a type alias of protoreflect.Cardinality.
	// Cardinality determines whether a field is optional, required, or repeated.
	// const (
	// 	Optional Cardinality = 1 // appears zero or one times
	// 	Required Cardinality = 2 // appears exactly one time; invalid with Proto3
	// 	Repeated Cardinality = 3 // appears zero or more times
	// )
	// Constants as defined by the google.protobuf.Cardinality enumeration.
	FieldCardinality Cardinality `toml:"field_cardinality"`
	// IsPacked reports whether repeated primitive numeric kinds should be
	// serialized using a packed encoding.
	// If true, then it implies Cardinality is Repeated.
	IsPacked bool `toml:"is_packed"`
}

type FieldType (added in v0.5.1)

type FieldType fieldType

const (
	BOOL    FieldType = "bool"
	BYTES   FieldType = "[]byte"
	STRING  FieldType = "string"
	INT64   FieldType = "int64"
	FLOAT64 FieldType = "float64"
)

type Opt (added in v0.5.0)

type Opt struct {
	// contains filtered or unexported fields
}

type Option (added in v0.5.0)

type Option func(config)

func WithCustomFields (added in v0.5.1)

func WithCustomFields(c []CustomField) Option

WithCustomFields adds user-defined fields to the message schema which can be populated with AppendWithCustom().

func WithNormalizer (added in v0.5.0)

func WithNormalizer(fields, aliases []string, failOnRangeError bool) Option

WithNormalizer configures the scalars to add to a flat Arrow Record suitable for efficient aggregation. Fields should be specified by their path (field names separated by a period, e.g. 'field1.field2.field3'). The Arrow field types of the selected fields will be used to build the new schema. If coalescing data between multiple fields of the same type, specify only one of the paths. List fields should specify the index to retrieve; otherwise all elements are retrieved. Ranges are not yet implemented.

Current functionality is limited to validating that the fields and aliases match in `New()`, `NormalizerBuilder()` returning an `*array.RecordBuilder` to be used externally to append data, and `NewNormalizerRecord()` to get an `arrow.Record` from the normalizer RecordBuilder. Future development may include Append methods that accept protopath operations to normalize protobuf messages in-flight, internally to the package.

failOnRangeError indicates whether to fail on a list[start:end] where end > len(list) (TODO).
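The path and alias conventions described above can be checked before calling New. The following is a minimal sketch using only the standard library; checkNormalizerArgs is a hypothetical helper, not part of bufarrow, and it assumes that "match" means one alias per field path. The field names are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// checkNormalizerArgs is a hypothetical pre-flight check, not bufarrow code.
// It assumes fields and aliases must have the same length and that a path
// is a list of non-empty field names separated by periods.
func checkNormalizerArgs(fields, aliases []string) error {
	if len(fields) != len(aliases) {
		return fmt.Errorf("got %d fields but %d aliases", len(fields), len(aliases))
	}
	for _, f := range fields {
		for _, seg := range strings.Split(f, ".") {
			if seg == "" {
				return fmt.Errorf("empty segment in path %q", f)
			}
		}
	}
	return nil
}

func main() {
	// Illustrative paths and aliases only.
	fields := []string{"user.address.city", "user.id"}
	aliases := []string{"city", "user_id"}

	fmt.Println(strings.Split(fields[0], ".")) // [user address city]
	fmt.Println(checkNormalizerArgs(fields, aliases))
	fmt.Println(checkNormalizerArgs(fields, aliases[:1]))
}
```

Validated slices like these would then be passed as the fields and aliases arguments of WithNormalizer.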

type Schema

type Schema[T proto.Message] struct {
	// contains filtered or unexported fields
}

func New

func New[T proto.Message](mem memory.Allocator, opts ...Option) (schema *Schema[T], err error)

New returns a new bufarrow.Schema. Options include WithNormalizer and WithCustomFields. WithNormalizer creates a separate Arrow record whilst WithCustomFields expands the schema of the proto.Message used as the type parameter.
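New takes its options as variadic arguments, which is the usual Go functional-options pattern. The sketch below shows that pattern in isolation with local stand-in types; settings, option, withCustomFields, withNormalizer and newSettings are all hypothetical names, and nothing here reflects how bufarrow stores its configuration internally.

```go
package main

import "fmt"

// settings is a local stand-in; bufarrow's own config type is unexported.
type settings struct {
	customFields []string
	normalize    []string
}

// option mutates settings, in the style of a functional option.
type option func(*settings)

func withCustomFields(names ...string) option {
	return func(s *settings) { s.customFields = append(s.customFields, names...) }
}

func withNormalizer(paths ...string) option {
	return func(s *settings) { s.normalize = append(s.normalize, paths...) }
}

// newSettings applies each option in order to a zero-value settings.
func newSettings(opts ...option) *settings {
	s := &settings{}
	for _, opt := range opts {
		opt(s)
	}
	return s
}

func main() {
	s := newSettings(withCustomFields("ingested_at"), withNormalizer("user.id"))
	fmt.Println(s.customFields, s.normalize) // [ingested_at] [user.id]
}
```

Because each option is independent, callers can pass none, one, or both without the constructor signature changing.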

func (*Schema[T]) Append

func (s *Schema[T]) Append(value T)

Append appends protobuf value to the schema builder. This method is not safe for concurrent use.

func (*Schema[T]) AppendWithCustom (added in v0.5.1)

func (s *Schema[T]) AppendWithCustom(value T, c ...any) error

AppendWithCustom appends protobuf value and custom field values to the schema builder. This method is not safe for concurrent use. The number of custom field values must match the number of custom fields. Supported types:

bool
[]byte
string
int64
float64
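Since the custom values are passed as ...any, a mismatch in count or type is only detectable at run time. The following is a minimal sketch of a caller-side check, assuming the list of supported types above is exhaustive; checkCustomValues is a hypothetical helper and not bufarrow's own validation.

```go
package main

import "fmt"

// checkCustomValues is a hypothetical pre-flight check, not bufarrow code.
// It verifies the value count and that each value has one of the Go types
// listed as supported for custom fields.
func checkCustomValues(numCustomFields int, vals ...any) error {
	if len(vals) != numCustomFields {
		return fmt.Errorf("got %d values for %d custom fields", len(vals), numCustomFields)
	}
	for i, v := range vals {
		switch v.(type) {
		case bool, []byte, string, int64, float64:
			// supported
		default:
			return fmt.Errorf("value %d has unsupported type %T", i, v)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkCustomValues(2, "eu-west", int64(42)))
	// An untyped constant 42 becomes int, not int64, when passed as any.
	fmt.Println(checkCustomValues(2, "eu-west", 42))
}
```

The second call illustrates a common pitfall: integer literals need an explicit int64 conversion to match the INT64 field type.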

func (*Schema[T]) Clone (added in v0.4.0)

func (s *Schema[T]) Clone(mem memory.Allocator) (schema *Schema[T], err error)

Clone returns an identical bufarrow.Schema. Use a separate clone per goroutine in concurrent code, as Schema methods are not safe for concurrent use.

func (*Schema[T]) FieldNames (added in v0.3.0)

func (s *Schema[T]) FieldNames() []string

FieldNames returns the top-level field names.

func (*Schema[T]) NewNormalizerRecord (added in v0.5.0)

func (s *Schema[T]) NewNormalizerRecord() arrow.Record

NewNormalizerRecord returns buffered builder value as an arrow.Record. The builder is reset and can be reused to build new records.

func (*Schema[T]) NewRecord

func (s *Schema[T]) NewRecord() arrow.Record

NewRecord returns buffered builder value as an arrow.Record. The builder is reset and can be reused to build new records.

func (*Schema[T]) NormalizerBuilder (added in v0.5.0)

func (s *Schema[T]) NormalizerBuilder() *array.RecordBuilder

NormalizerBuilder returns the Normalizer's Arrow array.RecordBuilder, to be used to append normalized data.

func (*Schema[T]) Parquet

func (s *Schema[T]) Parquet() *schema.Schema

Parquet returns the schema as a Parquet schema.

func (*Schema[T]) Proto

func (s *Schema[T]) Proto(r arrow.Record, rows []int) []T

Proto decodes rows and returns them as proto messages.

func (*Schema[T]) ReadParquet

func (s *Schema[T]) ReadParquet(ctx context.Context, r parquet.ReaderAtSeeker, columns []int) (arrow.Record, error)

ReadParquet reads the specified columns from Parquet source r and returns them as an Arrow record. The returned record must be released by the caller.

func (*Schema[T]) Release

func (s *Schema[T]) Release()

Release releases the reference on the message builder.

func (*Schema[T]) Schema

func (s *Schema[T]) Schema() *arrow.Schema

Schema returns the schema as an Arrow schema.

func (*Schema[T]) WriteParquet

func (s *Schema[T]) WriteParquet(w io.Writer) error

WriteParquet writes Parquet to an io.Writer.

func (*Schema[T]) WriteParquetRecords

func (s *Schema[T]) WriteParquetRecords(w io.Writer, records ...arrow.Record) error

WriteParquetRecords writes one or more Arrow records to Parquet.

Directories

Path Synopsis
gen
