parquet

package
v0.2.5
Warning: This package is not in the latest version of its module.
Published: Mar 9, 2026 | License: Apache-2.0 | Imports: 9 | Imported by: 1

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Decode

func Decode(r io.ReaderAt, size int64) ([]cloudevent.RawEvent, error)

Decode reads a parquet file from r and returns the decoded CloudEvents. The size parameter must be the total size of the parquet data in bytes.
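Decode takes an io.ReaderAt plus the total byte size rather than a plain io.Reader, so callers need both values up front. The sketch below shows two common ways to obtain them with the standard library only. The helpers fromBytes and fromFile are hypothetical names introduced for illustration and are not part of this package, and the final call to Decode is shown only as a comment.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// fromBytes wraps an in-memory payload. *bytes.Reader implements
// io.ReaderAt, and the payload length is the size to pass along.
// Hypothetical helper, not part of the parquet package.
func fromBytes(b []byte) (io.ReaderAt, int64) {
	return bytes.NewReader(b), int64(len(b))
}

// fromFile opens a file on disk. *os.File implements io.ReaderAt, and
// Stat reports the total size. The caller must close the returned file.
// Hypothetical helper, not part of the parquet package.
func fromFile(path string) (*os.File, int64, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, 0, err
	}
	info, err := f.Stat()
	if err != nil {
		f.Close()
		return nil, 0, err
	}
	return f, info.Size(), nil
}

func main() {
	// Placeholder bytes stand in for a real parquet payload.
	r, size := fromBytes([]byte("placeholder bytes, not real parquet"))

	// Random access works at any offset below size.
	buf := make([]byte, 11)
	if _, err := r.ReadAt(buf, 0); err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println(size, string(buf))

	// With the real package, r and size would be passed on, for example:
	//   events, err := parquet.Decode(r, size)
}
```

The same (reader, size) pair is what SeekToRow and OpenReader accept, so one helper can serve all three entry points.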

func Encode

func Encode(w io.Writer, events []cloudevent.RawEvent, objectKey string, opts ...Option) (map[int]string, error)

Encode writes events as Snappy-compressed Parquet to w. Each event is assigned an index key in the format "objectKey#rowOffset". The returned map associates each event's index in events with its index key.
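To make the shape of the returned map concrete, here is a minimal sketch that builds such a mapping by hand. It is not the package's implementation: buildIndexKeys is a hypothetical helper, the object key is made up, and the sketch assumes that an event's row offset equals its position in the input slice.

```go
package main

import "fmt"

// buildIndexKeys is a hypothetical stand-in for the mapping Encode returns.
// Assumption: the row offset of event i is i.
func buildIndexKeys(objectKey string, n int) map[int]string {
	keys := make(map[int]string, n)
	for i := 0; i < n; i++ {
		// Index key format from the documentation: "objectKey#rowOffset".
		keys[i] = fmt.Sprintf("%s#%d", objectKey, i)
	}
	return keys
}

func main() {
	keys := buildIndexKeys("events/batch-0001.parquet", 3)
	// Iterate by index so the output order is stable.
	for i := 0; i < len(keys); i++ {
		fmt.Println(i, keys[i])
	}
}
```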

func IsParquetRef

func IsParquetRef(indexKey string) bool

IsParquetRef returns true if the index key references a row within a parquet file, indicated by the presence of a '#' separator.

func ParseIndexKey

func ParseIndexKey(indexKey string) (objectKey string, rowOffset int64, err error)

ParseIndexKey splits a parquet index key in the format "objectKey#rowOffset" into its component parts.
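The index key format can be illustrated with a small standalone sketch. The functions below are hypothetical re-implementations written for this example, not the package's code. In particular, splitting on the last '#' and rejecting negative offsets are assumptions; the documentation does not say how keys containing several '#' characters are handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isParquetRef mirrors the documented rule: a key is a parquet reference
// when it contains a '#' separator. Hypothetical re-implementation.
func isParquetRef(indexKey string) bool {
	return strings.Contains(indexKey, "#")
}

// parseIndexKey splits "objectKey#rowOffset" into its parts.
// Assumption: the last '#' is the separator, so object keys may contain '#'.
func parseIndexKey(indexKey string) (string, int64, error) {
	i := strings.LastIndex(indexKey, "#")
	if i < 0 {
		return "", 0, fmt.Errorf("index key %q has no '#' separator", indexKey)
	}
	offset, err := strconv.ParseInt(indexKey[i+1:], 10, 64)
	if err != nil {
		return "", 0, fmt.Errorf("index key %q has an invalid row offset: %w", indexKey, err)
	}
	if offset < 0 {
		// Assumption: negative offsets are never valid row positions.
		return "", 0, fmt.Errorf("index key %q has a negative row offset", indexKey)
	}
	return indexKey[:i], offset, nil
}

func main() {
	key := "events/batch-0001.parquet#42"
	if !isParquetRef(key) {
		fmt.Println("not a parquet reference")
		return
	}
	objectKey, rowOffset, err := parseIndexKey(key)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(objectKey, rowOffset)
}
```

A caller would typically check IsParquetRef first, then pass the parsed object key to storage and the row offset to SeekToRow.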

func SeekToRow

func SeekToRow(r io.ReaderAt, size int64, rowIndex int64) (cloudevent.RawEvent, error)

SeekToRow retrieves a single row by index without reading the entire file. The rowIndex must be in the range [0, numRows). For multiple seeks on the same file, use OpenReader instead to avoid re-parsing the parquet footer on each call.

Types

type EncoderConfig

type EncoderConfig struct {
	// MaxRowsPerRowGroup controls how many rows are written per row group.
	// Zero means use the parquet-go default.
	MaxRowsPerRowGroup int64

	// PageBufferSize controls the page buffer size in bytes.
	// Zero means use the parquet-go default.
	PageBufferSize int

	// WriteBufferSize controls the write buffer size in bytes.
	// Zero means use the parquet-go default.
	WriteBufferSize int
}

EncoderConfig holds tunable parameters for the parquet encoder.

type Option

type Option func(*EncoderConfig)

Option is a functional option for configuring the parquet encoder.

func WithMaxRowsPerRowGroup

func WithMaxRowsPerRowGroup(n int64) Option

WithMaxRowsPerRowGroup sets the maximum number of rows per row group.

func WithPageBufferSize

func WithPageBufferSize(n int) Option

WithPageBufferSize sets the page buffer size in bytes.

func WithWriteBufferSize

func WithWriteBufferSize(n int) Option

WithWriteBufferSize sets the write buffer size in bytes.
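The three option constructors above follow Go's functional options pattern. The sketch below reproduces that pattern with local stand-in types so it runs on its own. The function bodies and the applyOptions helper are assumptions, since the documentation shows only the signatures; they illustrate how such options are usually applied, not how this package applies them.

```go
package main

import "fmt"

// EncoderConfig is a local copy of the documented struct, for illustration.
type EncoderConfig struct {
	MaxRowsPerRowGroup int64 // zero means "use the parquet-go default"
	PageBufferSize     int   // bytes; zero means default
	WriteBufferSize    int   // bytes; zero means default
}

// Option matches the documented type: a function that mutates the config.
type Option func(*EncoderConfig)

// The bodies below are assumed; only the signatures come from the docs.
func WithMaxRowsPerRowGroup(n int64) Option {
	return func(c *EncoderConfig) { c.MaxRowsPerRowGroup = n }
}

func WithPageBufferSize(n int) Option {
	return func(c *EncoderConfig) { c.PageBufferSize = n }
}

func WithWriteBufferSize(n int) Option {
	return func(c *EncoderConfig) { c.WriteBufferSize = n }
}

// applyOptions is a hypothetical helper: start from the zero value
// (all defaults) and apply each option in order, so later options win.
func applyOptions(opts ...Option) EncoderConfig {
	var cfg EncoderConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := applyOptions(
		WithMaxRowsPerRowGroup(10_000),
		WithPageBufferSize(64*1024),
	)
	// WriteBufferSize was not set, so it keeps the zero "default" marker.
	fmt.Printf("%+v\n", cfg)
}
```

With the real package, the same options would be passed as the trailing arguments of Encode.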

type ParquetRow

type ParquetRow struct {
	Subject         string    `parquet:"subject"`
	Time            time.Time `parquet:"time,timestamp(millisecond)"`
	Type            string    `parquet:"type"`
	ID              string    `parquet:"id"`
	Source          string    `parquet:"source"`
	Producer        string    `parquet:"producer"`
	DataContentType string    `parquet:"data_content_type"`
	DataVersion     string    `parquet:"data_version"`
	Extras          string    `parquet:"extras"`
	Data            *string   `parquet:"data,optional"`
	DataBase64      []byte    `parquet:"data_base64,optional"`
}

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader wraps an opened parquet file so that multiple row seeks reuse the parsed file metadata (footer) instead of re-reading it from the underlying io.ReaderAt on every call.

func OpenReader

func OpenReader(r io.ReaderAt, size int64) (*Reader, error)

OpenReader opens a parquet file and returns a Reader for repeated row access. Call Close when done to release resources.

func (*Reader) Close

func (pr *Reader) Close() error

Close releases the underlying parquet reader resources.

func (*Reader) NumRows

func (pr *Reader) NumRows() int64

NumRows returns the total number of rows in the parquet file.

func (*Reader) SeekToRow

func (pr *Reader) SeekToRow(rowIndex int64) (cloudevent.RawEvent, error)

SeekToRow retrieves a single row by index. The rowIndex must be in the range [0, NumRows()).
