Documentation
¶
Index ¶
- func Decode(r io.ReaderAt, size int64) ([]cloudevent.RawEvent, error)
- func Encode(w io.Writer, events []cloudevent.RawEvent, objectKey string, opts ...Option) (map[int]string, error)
- func IsParquetRef(indexKey string) bool
- func ParseIndexKey(indexKey string) (objectKey string, rowOffset int64, err error)
- func SeekToRow(r io.ReaderAt, size int64, rowIndex int64) (cloudevent.RawEvent, error)
- type EncoderConfig
- type Option
- type ParquetRow
- type Reader
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Decode ¶
Decode reads a parquet file from r and returns the decoded CloudEvents. The size parameter must be the total size of the parquet data in bytes.
func Encode ¶
func Encode(w io.Writer, events []cloudevent.RawEvent, objectKey string, opts ...Option) (map[int]string, error)
Encode writes events as Snappy-compressed Parquet to w. Each event is assigned an index key in the format "objectKey#rowOffset". The returned map contains the event index to index key mapping.
func IsParquetRef ¶
IsParquetRef returns true if the index key references a row within a parquet file, indicated by the presence of a '#' separator.
func ParseIndexKey ¶
ParseIndexKey splits a parquet index key in the format "objectKey#rowOffset" into its component parts.
Types ¶
type EncoderConfig ¶
type EncoderConfig struct {
// MaxRowsPerRowGroup controls how many rows are written per row group.
// Zero means use the parquet-go default.
MaxRowsPerRowGroup int64
// PageBufferSize controls the page buffer size in bytes.
// Zero means use the parquet-go default.
PageBufferSize int
// WriteBufferSize controls the write buffer size in bytes.
// Zero means use the parquet-go default.
WriteBufferSize int
}
EncoderConfig holds tunable parameters for the parquet encoder.
type Option ¶
type Option func(*EncoderConfig)
Option is a functional option for configuring the parquet encoder.
func WithMaxRowsPerRowGroup ¶
WithMaxRowsPerRowGroup sets the maximum number of rows per row group.
func WithPageBufferSize ¶
WithPageBufferSize sets the page buffer size in bytes.
func WithWriteBufferSize ¶
WithWriteBufferSize sets the write buffer size in bytes.
type ParquetRow ¶
type ParquetRow struct {
SpecVersion string `parquet:"specversion"`
Type string `parquet:"type"`
Source string `parquet:"source"`
Subject string `parquet:"subject"`
ID string `parquet:"id"`
Time time.Time `parquet:"time,timestamp(millisecond)"`
DataContentType string `parquet:"data_content_type"`
DataVersion string `parquet:"data_version"`
Producer string `parquet:"producer"`
Extras string `parquet:"extras"`
Data *string `parquet:"data,optional"`
DataBase64 []byte `parquet:"data_base64,optional"`
}
ParquetRow is the row layout for storing CloudEvents in Parquet files. Field order matches the JSON Event Format: specversion, type, source, subject, id, time, datacontenttype, then extension attributes, then data/data_base64. See https://github.com/cloudevents/spec/blob/main/cloudevents/formats/json-format.md
type Reader ¶
type Reader struct {
// contains filtered or unexported fields
}
Reader wraps an opened parquet file so that multiple row seeks reuse the parsed file metadata (footer) instead of re-reading it from the underlying io.ReaderAt on every call.
func OpenReader ¶
OpenReader opens a parquet file and returns a Reader for repeated row access. Call Close when done to release resources.