Documentation
Overview
Package spark provides shared utilities for Spark-based execution environments such as EMR Serverless and Dataproc Serverless.
Constants
This section is empty.
Variables
var BruinExcludes = []string{
"README.md",
".bruin.yml",
"pipeline.yml",
"pipeline.yaml",
}
BruinExcludes contains files that should be excluded from Spark context packages.
var DirExcludes = []*regexp.Regexp{
	regexp.MustCompile(`(^|[/\\])\.venv([/\\]|$)`),
	regexp.MustCompile(`(^|[/\\])venv([/\\]|$)`),
	regexp.MustCompile(`^logs([/\\]|$)`),
	regexp.MustCompile(`^\.git([/\\]|$)`),
}
DirExcludes contains regex patterns for directories that should be excluded from Spark context packages.
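As a rough illustration of how these patterns behave, the sketch below copies them into a local variable and checks a few sample paths. The helper name isExcludedDir and the sample paths are assumptions made for this example; they are not part of the package. Note that the .venv and venv patterns match at any depth, while the logs and .git patterns are anchored to the start of the path.

```go
package main

import (
	"fmt"
	"regexp"
)

// dirExcludes is a local copy of the DirExcludes patterns documented above,
// repeated here so the example is self-contained.
var dirExcludes = []*regexp.Regexp{
	regexp.MustCompile(`(^|[/\\])\.venv([/\\]|$)`),
	regexp.MustCompile(`(^|[/\\])venv([/\\]|$)`),
	regexp.MustCompile(`^logs([/\\]|$)`),
	regexp.MustCompile(`^\.git([/\\]|$)`),
}

// isExcludedDir is a hypothetical helper (not part of the package) that
// reports whether any pattern matches the given path.
func isExcludedDir(p string) bool {
	for _, re := range dirExcludes {
		if re.MatchString(p) {
			return true
		}
	}
	return false
}

func main() {
	samples := []string{
		".venv/lib",         // virtual environment at the root
		"jobs/venv/bin",     // virtual environment in a subdirectory
		"logs",              // top-level logs directory
		"jobs/logs",         // nested logs directory (pattern is root-anchored)
		".git/config",       // top-level .git directory
		".github/workflows", // shares a prefix with .git but is a different name
		"myvenv",            // contains "venv" but is not a full path segment
	}
	for _, p := range samples {
		fmt.Printf("%-20s excluded=%v\n", p, isExcludedDir(p))
	}
}
```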
Functions
func PackageContext
PackageContext creates a zip archive from the given filesystem, suitable for Spark execution. It is a modified version of (*zip.Writer).AddFS with:
- Exclusion of Bruin configuration files and virtual environments
- Automatic creation of __init__.py files in directories for Python package support
Spark requires directories to contain __init__.py to be treated as packages.
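The behaviour described above can be approximated with the standard library alone. The sketch below is not the package's implementation, and the signature of PackageContext is not shown on this page, so the function packageSketch, its parameters, and the helpers demoFS and listArchive are all hypothetical. It also assumes that file exclusions are matched by base name and that __init__.py is added to every subdirectory but not to the archive root; the real function may differ on both points.

```go
package main

import (
	"archive/zip"
	"bytes"
	"errors"
	"fmt"
	"io"
	"io/fs"
	"path"
	"regexp"
	"sort"
	"testing/fstest"
)

// Local copies of the documented exclusion lists, so the sketch is self-contained.
var fileExcludes = map[string]bool{
	"README.md": true, ".bruin.yml": true, "pipeline.yml": true, "pipeline.yaml": true,
}

var dirExcludes = []*regexp.Regexp{
	regexp.MustCompile(`(^|[/\\])\.venv([/\\]|$)`),
	regexp.MustCompile(`(^|[/\\])venv([/\\]|$)`),
	regexp.MustCompile(`^logs([/\\]|$)`),
	regexp.MustCompile(`^\.git([/\\]|$)`),
}

// packageSketch (hypothetical) walks fsys and writes a zip archive to w,
// skipping excluded entries and adding an empty __init__.py where one is missing.
func packageSketch(w io.Writer, fsys fs.FS) error {
	zw := zip.NewWriter(w)
	err := fs.WalkDir(fsys, ".", func(name string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			for _, re := range dirExcludes {
				if re.MatchString(name) {
					return fs.SkipDir // drop the whole subtree
				}
			}
			if name == "." {
				return nil // assumption: the archive root gets no __init__.py
			}
			initPath := path.Join(name, "__init__.py")
			if _, statErr := fs.Stat(fsys, initPath); errors.Is(statErr, fs.ErrNotExist) {
				_, err := zw.Create(initPath) // empty marker file
				return err
			}
			return nil
		}
		if fileExcludes[path.Base(name)] {
			return nil
		}
		dst, err := zw.Create(name)
		if err != nil {
			return err
		}
		src, err := fsys.Open(name)
		if err != nil {
			return err
		}
		defer src.Close()
		_, err = io.Copy(dst, src)
		return err
	})
	if err != nil {
		return err
	}
	return zw.Close()
}

// demoFS returns a small in-memory project tree used only for this example.
func demoFS() fs.FS {
	return fstest.MapFS{
		"pipeline.yml":           {Data: []byte("name: demo\n")},
		"README.md":              {Data: []byte("# demo\n")},
		"assets/job.py":          {Data: []byte("print('hi')\n")},
		"assets/lib/util.py":     {Data: []byte("X = 1\n")},
		"assets/lib/__init__.py": {Data: []byte("")},
		".venv/lib/x.py":         {Data: []byte("")},
		"logs/run.log":           {Data: []byte("ok\n")},
	}
}

// listArchive packages fsys in memory and returns the sorted entry names.
func listArchive(fsys fs.FS) ([]string, error) {
	var buf bytes.Buffer
	if err := packageSketch(&buf, fsys); err != nil {
		return nil, err
	}
	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(zr.File))
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	names, err := listArchive(demoFS())
	if err != nil {
		panic(err)
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```

In this sketch the existing assets/lib/__init__.py is copied as-is, while assets receives a generated empty one; the excluded files and the .venv and logs trees never reach the archive.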
Types
This section is empty.