PySpark Pipeline Framework

Configuration-driven PySpark pipeline framework with HOCON configuration, resilience patterns, lifecycle hooks, and streaming support.

Build batch and streaming data pipelines from composable components wired together in HOCON configuration files. Operational features include retry policies, circuit breakers, data quality checks, audit trails, secrets management, and checkpoint/resume.
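This page does not show the framework's own API. As a rough illustration of the configuration-driven idea only, here is a minimal sketch in plain Python (no Spark, no HOCON parser) in which a configuration lists component names, a registry maps those names to callables, and a retry policy wraps each step. Every name here (`COMPONENTS`, `with_retry`, `run_pipeline`, and the `components` / `retry` / `max-attempts` / `backoff-seconds` keys) is hypothetical and is not taken from the framework.

```python
import time
from typing import Callable

# Hypothetical registry: maps a component name used in the configuration
# to a callable that transforms a list of records (a stand-in for a DataFrame).
COMPONENTS: dict[str, Callable[[list[dict]], list[dict]]] = {
    "drop_nulls": lambda rows: [r for r in rows if all(v is not None for v in r.values())],
    "uppercase_names": lambda rows: [{**r, "name": r["name"].upper()} for r in rows],
}


def with_retry(fn, max_attempts=3, backoff_seconds=0.0):
    """Wrap fn so that a raised exception is retried up to max_attempts times."""
    def wrapper(rows):
        for attempt in range(1, max_attempts + 1):
            try:
                return fn(rows)
            except Exception:
                if attempt == max_attempts:
                    raise  # retries exhausted: surface the last error
                time.sleep(backoff_seconds * attempt)  # linear backoff between attempts
    return wrapper


def run_pipeline(config: dict, rows: list[dict]) -> list[dict]:
    """Run each configured component in order, applying the configured retry policy."""
    retry = config.get("retry", {})
    for name in config["components"]:
        step = with_retry(
            COMPONENTS[name],
            max_attempts=retry.get("max-attempts", 3),
            backoff_seconds=retry.get("backoff-seconds", 0.0),
        )
        rows = step(rows)
    return rows


# Hypothetical stand-in for the `pipeline` block of a parsed HOCON file such as:
#   pipeline { components = [drop_nulls, uppercase_names], retry { max-attempts = 2 } }
config = {"components": ["drop_nulls", "uppercase_names"], "retry": {"max-attempts": 2}}
result = run_pipeline(config, [{"name": "ada"}, {"name": None}])
print(result)  # [{'name': 'ADA'}]
```

Keeping the component list in configuration rather than in code means a pipeline can be reordered or extended by editing a file; consult the framework's User Guide for its actual configuration schema.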

Note

Scala/JVM users may also be interested in spark-pipeline-framework, the Scala implementation using PureConfig and Typesafe Config.

Supported Versions:

  • Python 3.10 – 3.13

  • Apache Spark 3.4+ (optional runtime dependency)
