PySpark Pipeline Framework

A configuration-driven PySpark pipeline framework with HOCON-based pipeline definitions, resilience patterns, lifecycle hooks, and streaming support.

Build batch and streaming data pipelines using composable components, HOCON configuration files, and a rich set of operational features including retry policies, circuit breakers, data quality checks, audit trails, secrets management, and checkpoint/resume.
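To make the configuration-driven idea concrete, here is a minimal pure-Python sketch of a pipeline assembled from a parsed configuration and run under a retry policy. It does not use this framework's API: `RetryPolicy`, `run_with_retry`, `COMPONENTS`, and `run_pipeline` are hypothetical names invented for illustration, the `config` dict stands in for a parsed HOCON file, and plain lists of dicts stand in for Spark DataFrames.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


@dataclass
class RetryPolicy:
    """Hypothetical retry settings; not the framework's own class."""
    max_attempts: int = 3
    initial_delay_s: float = 0.0
    backoff: float = 2.0


def run_with_retry(step: Callable[[], Any], policy: RetryPolicy) -> Any:
    """Call step() until it succeeds or max_attempts is exhausted."""
    delay = policy.initial_delay_s
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == policy.max_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
            delay *= policy.backoff  # exponential backoff between attempts


def filter_rows(rows: Rows, cfg: Dict[str, Any]) -> Rows:
    """Keep rows whose configured column is at least cfg['min']."""
    return [r for r in rows if r[cfg["column"]] >= cfg["min"]]


def rename_column(rows: Rows, cfg: Dict[str, Any]) -> Rows:
    """Rename the key cfg['from'] to cfg['to'] in every row."""
    return [
        {(cfg["to"] if k == cfg["from"] else k): v for k, v in r.items()}
        for r in rows
    ]


# Registry mapping a component "type" string to its implementation.
COMPONENTS: Dict[str, Callable[[Rows, Dict[str, Any]], Rows]] = {
    "filter": filter_rows,
    "rename": rename_column,
}


def run_pipeline(config: Dict[str, Any], rows: Rows) -> Rows:
    """Run each configured component in order, retrying on failure."""
    policy = RetryPolicy(**config.get("retry", {}))
    for spec in config["components"]:
        component = COMPONENTS[spec["type"]]
        rows = run_with_retry(lambda: component(rows, spec["config"]), policy)
    return rows


# Stand-in for a parsed HOCON configuration (assumed shape).
config = {
    "retry": {"max_attempts": 2},
    "components": [
        {"type": "filter", "config": {"column": "amount", "min": 10}},
        {"type": "rename", "config": {"from": "amount", "to": "total"}},
    ],
}

result = run_pipeline(config, [{"id": 1, "amount": 5}, {"id": 2, "amount": 20}])
```

In this sketch, changing the pipeline means editing `config` rather than the Python code. The Components, Configuration, and Resilience pages of the User Guide describe the framework's real interfaces.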

Note

Scala/JVM users may also be interested in spark-pipeline-framework, the Scala implementation using PureConfig and Typesafe Config.

Supported Versions:

Python 3.10 – 3.13
Apache Spark 3.4+ (optional runtime dependency)

Getting Started

Getting Started
Scope & Design

User Guide

Components
Configuration
Configuration Validation
Streaming Pipelines
Resilience
Lifecycle Hooks
Data Quality Checks
Audit Trail
Secrets Management
Schema Contracts
Checkpoint & Resume
Scala Migration Guide

Indices and tables

Index
Module Index