benchmark-go

command
v0.1.0 Latest
This package is not in the latest version of its module.
Published: Aug 11, 2025 License: MIT Imports: 16 Imported by: 0

Documentation

Overview

This worker implements the ECS benchmark task for high-performance CSV-to-Parquet conversion in Go.

This benchmark is designed to evaluate the ingestion speed, memory usage, and overall efficiency of a Go-based pipeline against managed alternatives such as AWS Glue or EMR Serverless.

Key features:

  • Reads compressed CSV (.csv.gz) files from S3
  • Parses records into strongly typed structs (bronze.Review) using concurrent workers
  • Writes Parquet output using the parquet-go library with Snappy compression and batching
  • Uploads the final Parquet file and a detailed benchmark result (.json) back to S3

The benchmark is triggered as an ECS Fargate task and controlled via environment variables:

  • BENCHMARK_BUCKET: S3 bucket used for input and output
  • BENCHMARK_INPUT: Path to the input CSV file (.csv.gz)
  • BENCHMARK_OUTPUT: Path to write the Parquet file
  • BENCHMARK_RESULT: Path to write the benchmark summary in JSON format

The benchmark output includes CSV read time, Parquet write time, total task duration, and memory usage.

This code serves as both a performance reference and a validation tool for the ingestion layer.
