geoarrow

package
v0.2.0
Published: Mar 19, 2026 | License: MIT | Imports: 12 | Imported by: 0

Documentation

Overview

Package geoarrow provides zero-allocation WKB→GeoArrow conversion for Apache Arrow record batches.

Usage:

converter := geoarrow.NewConverter(reader, geoarrow.WithBufferSize(2))
defer converter.Release()
// use converter as array.RecordReader

Constants

const (
	ExtMultiPoint      = "geoarrow.multipoint"
	ExtMultiLineString = "geoarrow.multilinestring"
	ExtMultiPolygon    = "geoarrow.multipolygon"
)

GeoArrow extension type names.

Variables

var (
	// MultiPoint: List<FixedSizeList<Float64>[2]>
	MultiPointType = arrow.ListOf(CoordType)

	// MultiLineString: List<List<FixedSizeList<Float64>[2]>>
	MultiLineStringType = arrow.ListOf(arrow.ListOf(CoordType))

	// MultiPolygon: List<List<List<FixedSizeList<Float64>[2]>>>
	MultiPolygonType = arrow.ListOf(arrow.ListOf(arrow.ListOf(CoordType)))
)

GeoArrow nested types — always Multi* for schema consistency.

var CoordType = arrow.FixedSizeListOf(2, arrow.PrimitiveTypes.Float64)

CoordType is FixedSizeList(2, Float64) — interleaved XY coordinates.
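To make the interleaved layout concrete, here is a minimal stdlib-only sketch of how a List<FixedSizeList<Float64>[2]> column can be pictured: one flat coordinate buffer plus list offsets. The multiPoints type and its row method are hypothetical and are not part of this package or of Arrow; validity bitmaps are left out.

```go
package main

import "fmt"

// multiPoints is a hypothetical, simplified stand-in for a
// List<FixedSizeList<Float64>[2]> column (no validity bitmap).
type multiPoints struct {
	offsets []int32   // len = rows + 1; row i spans points offsets[i]..offsets[i+1]
	coords  []float64 // interleaved x0, y0, x1, y1, ...
}

// row returns the (x, y) pairs that belong to row i.
func (m multiPoints) row(i int) [][2]float64 {
	start, end := m.offsets[i], m.offsets[i+1]
	pts := make([][2]float64, 0, end-start)
	for p := start; p < end; p++ {
		// Point p occupies two consecutive slots in the flat buffer.
		pts = append(pts, [2]float64{m.coords[2*p], m.coords[2*p+1]})
	}
	return pts
}

func main() {
	m := multiPoints{
		offsets: []int32{0, 1, 3}, // row 0 has 1 point, row 1 has 2 points
		coords:  []float64{10, 20, 30, 40, 50, 60},
	}
	fmt.Println(m.row(0)) // [[10 20]]
	fmt.Println(m.row(1)) // [[30 40] [50 60]]
}
```

MultiLineString and MultiPolygon add one and two further levels of offsets on top of the same flat coordinate buffer.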

Functions

func ArrowTypeForGeo

func ArrowTypeForGeo(gt GeoType) (arrow.DataType, string)

ArrowTypeForGeo returns the Arrow DataType and extension name for a GeoType.

func GeoArrowField

func GeoArrowField(name string, gt GeoType, extensionMeta string) (arrow.Field, error)

GeoArrowField creates a new Arrow Field with GeoArrow extension metadata. extensionMeta is the original ARROW:extension:metadata value (passed through as-is). If empty, defaults to {"srid":4326}.
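The defaulting rule described above can be sketched with plain maps. This is an illustration only: fieldMetadata is a hypothetical helper, not the package's implementation, and it assumes the extension name is stored under the ARROW:extension:name key next to ARROW:extension:metadata.

```go
package main

import "fmt"

// defaultExtensionMeta is the fallback named in the GeoArrowField docs.
const defaultExtensionMeta = `{"srid":4326}`

// fieldMetadata is a hypothetical helper that builds the two extension
// metadata entries for a geometry field. A non-empty extensionMeta is
// passed through unchanged.
func fieldMetadata(extName, extensionMeta string) map[string]string {
	if extensionMeta == "" {
		extensionMeta = defaultExtensionMeta
	}
	return map[string]string{
		"ARROW:extension:name":     extName,
		"ARROW:extension:metadata": extensionMeta,
	}
}

func main() {
	md := fieldMetadata("geoarrow.multipolygon", "")
	fmt.Println(md["ARROW:extension:metadata"]) // {"srid":4326}
}
```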

func NewProjector

func NewProjector(source array.RecordReader, cols map[string]bool) array.RecordReader

NewProjector creates a RecordReader that outputs only the columns named in cols. Names not found in the schema are silently ignored. If cols is empty, or if it already selects every column in the schema, NewProjector returns source unchanged (no wrapper).
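The selection rules can be sketched without Arrow by working on column names alone. projectIndices below is a hypothetical helper, not the package's code; it assumes a column counts as requested when cols[name] is true, and that "no wrapper" applies when the selection would keep every column.

```go
package main

import "fmt"

// projectIndices returns the schema positions to keep and reports whether a
// projection is needed at all. Names in cols that are missing from the
// schema are never matched, so they are silently ignored.
func projectIndices(schema []string, cols map[string]bool) ([]int, bool) {
	if len(cols) == 0 {
		return nil, false // nothing requested: pass the source through
	}
	keep := make([]int, 0, len(schema))
	for i, name := range schema {
		if cols[name] {
			keep = append(keep, i)
		}
	}
	if len(keep) == len(schema) {
		return nil, false // every column selected: nothing to strip
	}
	return keep, true
}

func main() {
	schema := []string{"id", "geom", "name"}
	cols := map[string]bool{"id": true, "geom": true, "missing": true}
	fmt.Println(projectIndices(schema, cols)) // [0 1] true
}
```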

Types

type Converter

type Converter struct {
	// contains filtered or unexported fields
}

Converter wraps an array.RecordReader and converts WKB geometry columns to native GeoArrow format on-the-fly. It implements array.RecordReader.

The source schema should already be flattened (Struct/Map/Union expanded to top-level columns) before passing to the Converter. This ensures that geometry columns nested inside structs are accessible by column index.

Multiple geometry columns per schema are supported — each is detected and converted independently (e.g., one column can be points, another polygons).

Conversion is pipelined: up to bufferSize batches are converted ahead in parallel goroutines while the consumer reads previous results. All goroutines respect the provided context for cancellation.

func NewConverter

func NewConverter(source array.RecordReader, opts ...Option) *Converter

NewConverter creates a new WKB→GeoArrow converting RecordReader.

It reads batches from source, detects WKB geometry columns (by ARROW:extension:name metadata), and converts them to native GeoArrow (MultiPoint/MultiLineString/MultiPolygon) columns.

The source should already be flattened — if geometry is inside a struct, flatten first so the Converter can find it by column index.

Multiple geometry columns are supported; each is converted independently.

The conversion runs in a background pipeline with bufferSize goroutines. The output schema is determined after the first batch is converted (geometry column types are auto-detected from WKB content).

func (*Converter) Err

func (c *Converter) Err() error

func (*Converter) Next

func (c *Converter) Next() bool

func (*Converter) Record

func (c *Converter) Record() arrow.RecordBatch

Record is a deprecated alias for RecordBatch.

func (*Converter) RecordBatch

func (c *Converter) RecordBatch() arrow.RecordBatch

func (*Converter) Release

func (c *Converter) Release()

func (*Converter) Retain

func (c *Converter) Retain()

func (*Converter) Schema

func (c *Converter) Schema() *arrow.Schema

type GeoType

type GeoType int

GeoType represents the canonical geometry type for a column.

const (
	GeoTypeUnknown GeoType = iota
	GeoTypePoint
	GeoTypeLine
	GeoTypePolygon
)

func ConvertBatch

func ConvertBatch(
	rec arrow.RecordBatch,
	col GeometryColumn,
	geoType GeoType,
	mem memory.Allocator,
) (arrow.RecordBatch, GeoType, error)

ConvertBatch replaces the WKB binary column described by col with a native GeoArrow column. It returns a new RecordBatch, which the caller must Release. The geoType is auto-detected from the first non-null geometry and returned; subsequent calls should pass the same geoType.
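Detecting the geometry type from WKB content only needs the 5-byte header: a byte-order flag followed by a uint32 type code. The sketch below is hypothetical and not taken from this package; geoKind mirrors GeoType for illustration, only the 2D codes 1 to 6 are handled, and a nil slice stands in for a null row.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// geoKind is a hypothetical mirror of GeoType.
type geoKind int

const (
	kindUnknown geoKind = iota
	kindPoint
	kindLine
	kindPolygon
)

// detectKind reads the WKB header and folds single and multi geometries
// into one canonical kind.
func detectKind(wkb []byte) (geoKind, error) {
	if len(wkb) < 5 {
		return kindUnknown, errors.New("wkb: header too short")
	}
	var order binary.ByteOrder
	switch wkb[0] {
	case 0:
		order = binary.BigEndian
	case 1:
		order = binary.LittleEndian
	default:
		return kindUnknown, fmt.Errorf("wkb: bad byte order %d", wkb[0])
	}
	switch order.Uint32(wkb[1:5]) {
	case 1, 4: // Point, MultiPoint
		return kindPoint, nil
	case 2, 5: // LineString, MultiLineString
		return kindLine, nil
	case 3, 6: // Polygon, MultiPolygon
		return kindPolygon, nil
	}
	return kindUnknown, nil
}

// firstKind returns the kind of the first non-null geometry in a column.
func firstKind(column [][]byte) (geoKind, error) {
	for _, wkb := range column {
		if wkb != nil {
			return detectKind(wkb)
		}
	}
	return kindUnknown, nil
}

func main() {
	column := [][]byte{nil, {0x01, 0x03, 0x00, 0x00, 0x00}} // null, then a polygon header
	fmt.Println(firstKind(column))                           // 3 <nil>
}
```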

type GeometryColumn

type GeometryColumn struct {
	Name          string
	Index         int
	SRID          int
	Format        string // "WKB", "GeoJSON", "H3Cell"
	ExtensionMeta string // original ARROW:extension:metadata (passed through)
}

GeometryColumn describes a geometry column to convert.

func DetectGeometryColumns

func DetectGeometryColumns(schema *arrow.Schema) []GeometryColumn

DetectGeometryColumns finds geometry columns in an Arrow schema by checking ARROW:extension:name metadata.
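Metadata-based detection amounts to a lookup on each field's ARROW:extension:name value. The sketch below uses a hypothetical field type instead of arrow.Field, and it assumes the WKB extension names are geoarrow.wkb and ogc.wkb (the names listed for NewStringReplacer); the exact set DetectGeometryColumns accepts is not stated here.

```go
package main

import "fmt"

// field is a hypothetical stand-in for an Arrow field: a name plus
// key/value metadata.
type field struct {
	name     string
	metadata map[string]string
}

// wkbExtensionNames is an assumed set of extension names that mark WKB columns.
var wkbExtensionNames = map[string]bool{
	"geoarrow.wkb": true,
	"ogc.wkb":      true,
}

// detectWKBColumns returns the indices of fields tagged as WKB geometry.
func detectWKBColumns(fields []field) []int {
	var idx []int
	for i, f := range fields {
		// Reading from a nil metadata map yields "", which is not in the set.
		if wkbExtensionNames[f.metadata["ARROW:extension:name"]] {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	fields := []field{
		{name: "id"},
		{name: "geom", metadata: map[string]string{"ARROW:extension:name": "geoarrow.wkb"}},
	}
	fmt.Println(detectWKBColumns(fields)) // [1]
}
```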

type Option

type Option func(*converterConfig)

Option configures the Converter.

func WithAllocator

func WithAllocator(mem memory.Allocator) Option

WithAllocator sets the memory allocator for Arrow arrays.

func WithBufferSize

func WithBufferSize(n int) Option

WithBufferSize sets the number of batches to buffer ahead. This equals the number of goroutines doing conversion in parallel. Default: 1 (no parallelism, but still pipelined).

func WithColumns

func WithColumns(cols []GeometryColumn) Option

WithColumns overrides auto-detection and specifies which columns to convert.

func WithContext

func WithContext(ctx context.Context) Option

WithContext sets the context for cancellation of the background pipeline.

type Projector

type Projector struct {
	// contains filtered or unexported fields
}

Projector wraps a RecordReader and projects (selects) only the requested columns. Implements array.RecordReader.

func (*Projector) Err

func (p *Projector) Err() error

func (*Projector) Next

func (p *Projector) Next() bool

func (*Projector) Record

func (p *Projector) Record() arrow.RecordBatch

func (*Projector) RecordBatch

func (p *Projector) RecordBatch() arrow.RecordBatch

func (*Projector) Release

func (p *Projector) Release()

func (*Projector) Retain

func (p *Projector) Retain()

func (*Projector) Schema

func (p *Projector) Schema() *arrow.Schema

type StringReplacer

type StringReplacer struct {
	// contains filtered or unexported fields
}

StringReplacer wraps a RecordReader and replaces geometry columns (WKB binary or native GeoArrow) with a Utf8 string column containing "{geometry}" for display in viewers that don't support binary/nested types.

Implements array.RecordReader.

func NewStringReplacer

func NewStringReplacer(source array.RecordReader) *StringReplacer

NewStringReplacer creates a RecordReader that replaces geometry columns with "{geometry}". It auto-detects geometry columns by ARROW:extension:name metadata (geoarrow.wkb, ogc.wkb, geoarrow.point, geoarrow.multi*, etc.).

func (*StringReplacer) Err

func (r *StringReplacer) Err() error

func (*StringReplacer) Next

func (r *StringReplacer) Next() bool

func (*StringReplacer) Record

func (r *StringReplacer) Record() arrow.RecordBatch

func (*StringReplacer) RecordBatch

func (r *StringReplacer) RecordBatch() arrow.RecordBatch

func (*StringReplacer) Release

func (r *StringReplacer) Release()

func (*StringReplacer) Retain

func (r *StringReplacer) Retain()

func (*StringReplacer) Schema

func (r *StringReplacer) Schema() *arrow.Schema
