parquet

package

v0.2.0 Latest Latest Go to latest Published: Mar 7, 2026 License: Apache-2.0 Imports: 9 Imported by: 1

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/DIMO-Network/cloudevent

Links

Open Source Insights

Documentation ¶

Index ¶

func Decode(r io.ReaderAt, size int64) ([]cloudevent.RawEvent, error)
func Encode(w io.Writer, events []cloudevent.RawEvent, objectKey string, opts ...Option) (map[int]string, error)
func IsParquetRef(indexKey string) bool
func ParseIndexKey(indexKey string) (objectKey string, rowOffset int64, err error)
func SeekToRow(r io.ReaderAt, size int64, rowIndex int64) (cloudevent.RawEvent, error)
type EncoderConfig
type Option
type ParquetRow
type Reader
- func OpenReader(r io.ReaderAt, size int64) (*Reader, error)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func Decode ¶

func Decode(r io.ReaderAt, size int64) ([]cloudevent.RawEvent, error)

Decode reads a parquet file from r and returns the decoded CloudEvents. The size parameter must be the total size of the parquet data in bytes.

func Encode ¶

func Encode(w io.Writer, events []cloudevent.RawEvent, objectKey string, opts ...Option) (map[int]string, error)

Encode writes events as Snappy-compressed Parquet to w. Each event is assigned an index key in the format "objectKey#rowOffset". The returned map contains the event index to index key mapping.

func IsParquetRef ¶

func IsParquetRef(indexKey string) bool

IsParquetRef returns true if the index key references a row within a parquet file, indicated by the presence of a '#' separator.

func ParseIndexKey ¶

func ParseIndexKey(indexKey string) (objectKey string, rowOffset int64, err error)

ParseIndexKey splits a parquet index key in the format "objectKey#rowOffset" into its component parts.

func SeekToRow ¶

func SeekToRow(r io.ReaderAt, size int64, rowIndex int64) (cloudevent.RawEvent, error)

SeekToRow retrieves a single row by index without reading the entire file. The rowIndex must be in the range [0, numRows). For multiple seeks on the same file, use OpenReader instead to avoid re-parsing the parquet footer on each call.

Types ¶

type EncoderConfig ¶

type EncoderConfig struct {
	// MaxRowsPerRowGroup controls how many rows are written per row group.
	// Zero means use the parquet-go default.
	MaxRowsPerRowGroup int64

	// PageBufferSize controls the page buffer size in bytes.
	// Zero means use the parquet-go default.
	PageBufferSize int

	// WriteBufferSize controls the write buffer size in bytes.
	// Zero means use the parquet-go default.
	WriteBufferSize int
}

EncoderConfig holds tunable parameters for the parquet encoder.

type Option ¶

type Option func(*EncoderConfig)

Option is a functional option for configuring the parquet encoder.

func WithMaxRowsPerRowGroup ¶

func WithMaxRowsPerRowGroup(n int64) Option

WithMaxRowsPerRowGroup sets the maximum number of rows per row group.

func WithPageBufferSize ¶

func WithPageBufferSize(n int) Option

WithPageBufferSize sets the page buffer size in bytes.

func WithWriteBufferSize ¶

func WithWriteBufferSize(n int) Option

WithWriteBufferSize sets the write buffer size in bytes.

type ParquetRow ¶

type ParquetRow struct {
	SpecVersion     string    `parquet:"specversion"`
	Type            string    `parquet:"type"`
	Source          string    `parquet:"source"`
	Subject         string    `parquet:"subject"`
	ID              string    `parquet:"id"`
	Time            time.Time `parquet:"time,timestamp(millisecond)"`
	DataContentType string    `parquet:"data_content_type"`
	DataVersion     string    `parquet:"data_version"`
	Producer        string    `parquet:"producer"`
	Extras          string    `parquet:"extras"`
	Data            *string   `parquet:"data,optional"`
	DataBase64      []byte    `parquet:"data_base64,optional"`
}

ParquetRow is the row layout for storing CloudEvents in Parquet files. Field order matches the JSON Event Format: specversion, type, source, subject, id, time, datacontenttype, then extension attributes, then data/data_base64. See https://github.com/cloudevents/spec/blob/main/cloudevents/formats/json-format.md

type Reader ¶

type Reader struct {
	// contains filtered or unexported fields
}

Reader wraps an opened parquet file so that multiple row seeks reuse the parsed file metadata (footer) instead of re-reading it from the underlying io.ReaderAt on every call.

func OpenReader ¶

func OpenReader(r io.ReaderAt, size int64) (*Reader, error)

OpenReader opens a parquet file and returns a Reader for repeated row access. Call Close when done to release resources.

func (*Reader) Close ¶

func (pr *Reader) Close() error

Close releases the underlying parquet reader resources.

func (*Reader) NumRows ¶

func (pr *Reader) NumRows() int64

NumRows returns the total number of rows in the parquet file.

func (*Reader) SeekToRow ¶

func (pr *Reader) SeekToRow(rowIndex int64) (cloudevent.RawEvent, error)

SeekToRow retrieves a single row by index. The rowIndex must be in the range [0, NumRows()).

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL