Spark EKS Plugin

The Spark EKS plugin enables submitting Spark jobs to an AWS EKS cluster using the Kubeflow Spark Operator. It supports IAM role assumption, S3 integration for queries/results/logs, and custom Spark application templates.

Features

  • Submit Spark jobs to EKS via Kubeflow Spark Operator
  • Upload SQL queries to S3 and fetch results from S3
  • Collect Spark pod logs and store them in S3
  • Supports custom SparkApplication YAML templates
  • IAM role assumption for cross-account EKS access
  • Configurable Spark job resources and properties

Configuration

Cluster Context
{
  "name": "my-eks-cluster",
  "context": {
    "role_arn": "arn:aws:iam::ACCOUNT:role/EKSAccessRole",
    "region": "us-west-2",
    "image": "myrepo/spark:latest",
    "spark_application_file": "/path/to/spark-application.yaml",
    "properties": {
      "spark.hadoop.fs.s3a.access.key": "...",
      "spark.hadoop.fs.s3a.secret.key": "..."
    }
  }
}
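
If spark_application_file points at a custom template, the file follows the Kubeflow Spark Operator's SparkApplication CRD. A minimal sketch is below; the metadata, image, paths, and resource values are illustrative placeholders, not defaults shipped with the plugin:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-query          # placeholder; typically overridden per job
  namespace: default
spec:
  type: Python
  mode: cluster
  image: myrepo/spark:latest
  mainApplicationFile: s3a://mybucket/wrapper.py   # placeholder path
  sparkVersion: "3.4.0"
  driver:
    cores: 1
    memory: 2g
    serviceAccount: spark    # placeholder service account
  executor:
    instances: 2
    cores: 1
    memory: 2g
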
Job Context
{
  "query": "SELECT * FROM my_table",
  "properties": {
    "spark.driver.memory": "2g",
    "spark.executor.instances": "2"
  },
  "return_result": true
}

Command Context
{
  "queries_uri": "s3://mybucket/queries",
  "results_uri": "s3://mybucket/results",
  "logs_uri": "s3://mybucket/logs",
  "wrapper_uri": "s3://mybucket/wrapper.py",
  "properties": {
    "spark.some.config": "value"
  },
  "kube_namespace": "default"
}

Usage

Submit a job using the API:

{
  "name": "run-spark-query",
  "version": "0.0.1",
  "command_criteria": [
    "type:sparkeks"
  ],
  "cluster_criteria": [
    "data:prod"
  ],
  "context": {
    "query": "SELECT * from table limit 10;"
  }
}
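
The README does not document the submission endpoint, so the example below only assumes an HTTP API that accepts the job definition above; the host and the /api/v1/job path are hypothetical placeholders:

# Endpoint and host are hypothetical; substitute your deployment's job-submission URL.
curl -X POST https://jobs.example.com/api/v1/job \
  -H "Content-Type: application/json" \
  -d '{
        "name": "run-spark-query",
        "version": "0.0.1",
        "command_criteria": ["type:sparkeks"],
        "cluster_criteria": ["data:prod"],
        "context": {"query": "SELECT * FROM my_table LIMIT 10;"}
      }'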

Result Format

If return_result is true, the plugin fetches results from S3 and returns them in tabular format:

{
  "columns": [
    {"name": "col1", "type": "string"},
    {"name": "col2", "type": "int"}
  ],
  "data": [
    ["foo", 1],
    ["bar", 2]
  ]
}
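
A minimal client-side sketch of decoding this shape in Go, assuming the result JSON matches the structure above (the Column and Result types are illustrative, not exported by this package):

package main

import (
	"encoding/json"
	"fmt"
)

// Column and Result mirror the tabular result shape shown above.
// They are illustrative types, not part of the plugin's public API.
type Column struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

type Result struct {
	Columns []Column        `json:"columns"`
	Data    [][]interface{} `json:"data"`
}

func main() {
	raw := []byte(`{"columns":[{"name":"col1","type":"string"},{"name":"col2","type":"int"}],"data":[["foo",1],["bar",2]]}`)

	var res Result
	if err := json.Unmarshal(raw, &res); err != nil {
		panic(err)
	}
	// Print each row as name=value pairs, pairing cells with column metadata.
	for _, row := range res.Data {
		for i, col := range res.Columns {
			fmt.Printf("%s=%v ", col.Name, row[i])
		}
		fmt.Println()
	}
}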

Testing

  • Local Docker: Set AWS credentials in your environment or docker-compose.yml (see the sketch after this list).
  • ECS Production: The plugin uses the ECS task role for AWS authentication.
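
For local testing, the standard AWS SDK environment variables can be passed through Compose; a minimal docker-compose.yml fragment (the service name and image are placeholders):

# docker-compose.yml (fragment); service name and image are placeholders.
services:
  sparkeks-plugin:
    image: myrepo/sparkeks-plugin:latest
    environment:
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN}
      - AWS_REGION=us-west-2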

Notes

  • To use custom SparkApplication templates, provide the file path in spark_application_file.
  • S3 URIs must be accessible by the Spark job and the plugin.
  • For troubleshooting, check the logs the plugin writes to the configured logs_uri (see the example below).
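
For example, with the AWS CLI (the bucket and prefix come from your logs_uri; file names depend on the plugin's layout):

# List collected pod logs under the configured logs URI.
aws s3 ls s3://mybucket/logs/
# Download one log file for inspection.
aws s3 cp s3://mybucket/logs/<log-file> .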
