Documentation
Overview
This worker implements the ECS benchmark task for high-performance CSV-to-Parquet conversion in Go.
It evaluates the ingestion speed, memory usage, and overall efficiency of a Go-based pipeline against alternatives such as AWS Glue and EMR Serverless.
Key features:
- Reads compressed CSV (.csv.gz) files from S3
- Parses records into strongly typed structs (bronze.Review) using concurrent workers
- Writes Parquet output using the parquet-go library with Snappy compression and batching
- Uploads the final Parquet file and a detailed benchmark result (.json) back to S3
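The stages above can be pictured roughly as follows. This is a minimal, self-contained sketch rather than the worker's actual code: it assumes a placeholder Review struct standing in for bronze.Review, a local .csv.gz file in place of the S3 stream, a fixed worker count, and the github.com/parquet-go/parquet-go GenericWriter API for the Snappy-compressed, batched output.

    package main

    import (
        "compress/gzip"
        "encoding/csv"
        "io"
        "log"
        "os"
        "strconv"
        "sync"

        "github.com/parquet-go/parquet-go"
    )

    // Review stands in for bronze.Review; the real field set differs.
    type Review struct {
        ID     string  `parquet:"id"`
        Rating float64 `parquet:"rating"`
        Body   string  `parquet:"body"`
    }

    func main() {
        // The real task streams the .csv.gz object from S3; a local file keeps the sketch short.
        f, err := os.Open("reviews.csv.gz")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()
        gz, err := gzip.NewReader(f)
        if err != nil {
            log.Fatal(err)
        }

        rows := make(chan []string, 1024)
        parsed := make(chan Review, 1024)

        // Feeder: stream CSV records off the gzip reader.
        go func() {
            defer close(rows)
            r := csv.NewReader(gz)
            if _, err := r.Read(); err != nil { // skip the header row
                log.Fatal(err)
            }
            for {
                rec, err := r.Read()
                if err == io.EOF {
                    return
                }
                if err != nil {
                    log.Fatal(err)
                }
                rows <- rec
            }
        }()

        // Workers: convert raw CSV records into typed structs concurrently.
        var wg sync.WaitGroup
        for i := 0; i < 4; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for rec := range rows {
                    rating, _ := strconv.ParseFloat(rec[1], 64) // column layout is illustrative
                    parsed <- Review{ID: rec[0], Rating: rating, Body: rec[2]}
                }
            }()
        }
        go func() { wg.Wait(); close(parsed) }()

        // Writer: batch structs into a Snappy-compressed Parquet file.
        out, err := os.Create("reviews.parquet")
        if err != nil {
            log.Fatal(err)
        }
        defer out.Close()
        w := parquet.NewGenericWriter[Review](out, parquet.Compression(&parquet.Snappy))
        batch := make([]Review, 0, 10000)
        flush := func() {
            if len(batch) == 0 {
                return
            }
            if _, err := w.Write(batch); err != nil {
                log.Fatal(err)
            }
            batch = batch[:0]
        }
        for rec := range parsed {
            batch = append(batch, rec)
            if len(batch) == cap(batch) {
                flush()
            }
        }
        flush()
        if err := w.Close(); err != nil {
            log.Fatal(err)
        }
    }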
The benchmark is triggered as an ECS Fargate task and controlled via environment variables:
- BENCHMARK_BUCKET: S3 bucket used for input and output
- BENCHMARK_INPUT: Path to the input CSV file (.csv.gz)
- BENCHMARK_OUTPUT: Path to write the Parquet file
- BENCHMARK_RESULT: Path to write the benchmark summary in JSON format
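For illustration, the variables above might be gathered at startup along these lines; Config and loadConfig are hypothetical names, not part of the worker.

    package main

    import (
        "fmt"
        "log"
        "os"
    )

    // Config holds the S3 locations read from the environment at task startup.
    type Config struct {
        Bucket    string // BENCHMARK_BUCKET
        InputKey  string // BENCHMARK_INPUT
        OutputKey string // BENCHMARK_OUTPUT
        ResultKey string // BENCHMARK_RESULT
    }

    // loadConfig is an illustrative helper; it fails fast when a variable is missing.
    func loadConfig() (Config, error) {
        cfg := Config{
            Bucket:    os.Getenv("BENCHMARK_BUCKET"),
            InputKey:  os.Getenv("BENCHMARK_INPUT"),
            OutputKey: os.Getenv("BENCHMARK_OUTPUT"),
            ResultKey: os.Getenv("BENCHMARK_RESULT"),
        }
        if cfg.Bucket == "" || cfg.InputKey == "" || cfg.OutputKey == "" || cfg.ResultKey == "" {
            return Config{}, fmt.Errorf("all four BENCHMARK_* environment variables must be set")
        }
        return cfg, nil
    }

    func main() {
        cfg, err := loadConfig()
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("reading s3://%s/%s\n", cfg.Bucket, cfg.InputKey)
    }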
The benchmark output includes CSV read time, Parquet write time, total task duration, and memory usage.
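A result summary covering those metrics could be shaped roughly like this; the struct and its JSON field names are illustrative, and the real task uploads the JSON to BENCHMARK_RESULT rather than printing it.

    package main

    import (
        "encoding/json"
        "fmt"
        "runtime"
        "time"
    )

    // BenchmarkResult mirrors the metrics listed above; field names are illustrative.
    type BenchmarkResult struct {
        CSVReadMS      int64  `json:"csv_read_ms"`
        ParquetWriteMS int64  `json:"parquet_write_ms"`
        TotalMS        int64  `json:"total_ms"`
        HeapAllocMB    uint64 `json:"heap_alloc_mb"`
    }

    func main() {
        taskStart := time.Now()

        csvStart := time.Now()
        // ... stream and parse the .csv.gz input here ...
        csvRead := time.Since(csvStart)

        writeStart := time.Now()
        // ... write the Parquet output here ...
        parquetWrite := time.Since(writeStart)

        // Sample heap usage at the end of the run.
        var ms runtime.MemStats
        runtime.ReadMemStats(&ms)

        res := BenchmarkResult{
            CSVReadMS:      csvRead.Milliseconds(),
            ParquetWriteMS: parquetWrite.Milliseconds(),
            TotalMS:        time.Since(taskStart).Milliseconds(),
            HeapAllocMB:    ms.HeapAlloc / (1 << 20),
        }
        out, err := json.MarshalIndent(res, "", "  ")
        if err != nil {
            panic(err)
        }
        fmt.Println(string(out)) // the real task writes this JSON to BENCHMARK_RESULT in S3
    }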
This code serves both as a performance reference and a validation tool for the ingestion layer.