Documentation
Overview ¶
Package parquetreader provides utilities for reading individual rows from Parquet files stored on S3-compatible object storage.
Index key format (produced by the DPS Benthos output dimo_parquet_writer):
[{full_uri}|]{object_key}#{row_offset}
Full URI form: s3://bucket/prefix/.../file.parquet#row (both bucket and key are parsed from the index key).
Relative form: prefix/.../file.parquet#row (the caller must supply the bucket).
Use ParseIndexKey to split an index key into its bucket (if present), object key, and 0-based row offset. Use IsParquetRef to distinguish Parquet references from legacy JSON object keys, which contain no '#'.
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func IsParquetRef ¶
IsParquetRef reports whether the given index_key uses the new Parquet reference format, that is, whether it contains '#'. Legacy keys (individual S3 JSON files) do not contain '#'.
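As an illustration of the check described above, the following minimal sketch routes a key by whether it contains '#'. The names isParquetRefSketch and describe are hypothetical local stand-ins rather than part of this package, and the sample keys are invented; the real IsParquetRef may apply additional checks.

```go
package main

import (
	"fmt"
	"strings"
)

// isParquetRefSketch is a hypothetical stand-in for IsParquetRef. It assumes
// the check is nothing more than "does the key contain '#'", as documented.
func isParquetRefSketch(indexKey string) bool {
	return strings.Contains(indexKey, "#")
}

// describe shows how a caller might branch between the two key styles.
func describe(indexKey string) string {
	if isParquetRefSketch(indexKey) {
		return "parquet row reference"
	}
	return "legacy JSON object key"
}

func main() {
	// Invented sample keys, one per style.
	keys := []string{
		"s3://my-bucket/events/file.parquet#42",
		"events/file.parquet#0",
		"events/2024/object.json",
	}
	for _, k := range keys {
		fmt.Printf("%s -> %s\n", k, describe(k))
	}
}
```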
Types ¶
type IndexKeyRef ¶
type IndexKeyRef struct {
// Bucket is set when index_key is a full s3://bucket/key#row URI; otherwise empty.
Bucket string
// ObjectKey is the S3 object key (path) of the Parquet file, without bucket.
ObjectKey string
// RowOffset is the 0-based row index within the Parquet file.
RowOffset int
}
IndexKeyRef holds the parsed components of a Parquet index_key reference.
func ParseIndexKey ¶
func ParseIndexKey(indexKey string) (IndexKeyRef, error)
ParseIndexKey parses an index_key string into a bucket (set only for s3:// URIs), an object key, and a row offset. It supports two forms: "s3://bucket/key.parquet#row", which sets Bucket, and "key.parquet#row", which leaves Bucket empty.
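To make the two supported forms concrete, here is a rough, self-contained sketch of how such parsing could work. It does not reproduce the package's implementation: indexKeyRef and parseIndexKeySketch are hypothetical local names, and the choices to split on the last '#', reject negative offsets, and reject an empty bucket or key are assumptions not stated in the documentation above.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// indexKeyRef is a local stand-in mirroring the documented IndexKeyRef fields.
type indexKeyRef struct {
	Bucket    string
	ObjectKey string
	RowOffset int
}

// parseIndexKeySketch is a hypothetical re-implementation for illustration only.
func parseIndexKeySketch(indexKey string) (indexKeyRef, error) {
	// Assumption: the row offset follows the last '#' in the key.
	i := strings.LastIndex(indexKey, "#")
	if i < 0 {
		return indexKeyRef{}, errors.New("index key has no '#' row separator")
	}
	path, rowStr := indexKey[:i], indexKey[i+1:]

	row, err := strconv.Atoi(rowStr)
	if err != nil || row < 0 {
		return indexKeyRef{}, fmt.Errorf("invalid row offset %q", rowStr)
	}

	ref := indexKeyRef{ObjectKey: path, RowOffset: row}

	// Full URI form: drop the scheme, then split bucket from key at the first '/'.
	if strings.HasPrefix(path, "s3://") {
		bucket, key, found := strings.Cut(strings.TrimPrefix(path, "s3://"), "/")
		if !found || bucket == "" {
			return indexKeyRef{}, fmt.Errorf("malformed s3 URI %q", path)
		}
		ref.Bucket, ref.ObjectKey = bucket, key
	}
	if ref.ObjectKey == "" {
		return indexKeyRef{}, errors.New("index key has an empty object key")
	}
	return ref, nil
}

func main() {
	// Invented sample keys: full URI form, relative form, and a legacy key.
	for _, k := range []string{
		"s3://my-bucket/events/file.parquet#42",
		"events/file.parquet#0",
		"events/2024/object.json",
	} {
		ref, err := parseIndexKeySketch(k)
		if err != nil {
			fmt.Printf("%s: error: %v\n", k, err)
			continue
		}
		fmt.Printf("%s: %+v\n", k, ref)
	}
}
```

With the relative form, Bucket stays empty, so the caller has to supply the bucket from its own configuration before fetching the object.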
type ObjectGetter ¶
type ObjectGetter interface {
GetObject(ctx context.Context, params *s3.GetObjectInput, optFns ...func(*s3.Options)) (*s3.GetObjectOutput, error)
}
ObjectGetter is an interface for fetching objects from S3-compatible storage. *s3.Client implements it; use this for testing or alternate backends.
type Reader ¶
type Reader struct {
// contains filtered or unexported fields
}
Reader reads individual row payloads from Parquet files on S3-compatible storage. Schema is compatible with CloudEvent Parquet files written by dimo_parquet_writer.
func New ¶
func New(objGetter ObjectGetter) *Reader
New returns a Reader that uses the given ObjectGetter (e.g. *s3.Client).