Resilience¶
pyspark-pipeline-framework provides two resilience patterns for handling transient failures: retry with exponential backoff and circuit breaker. Both can be configured per-component in HOCON or used programmatically.
Retry Policy¶
Configure per-component retries with exponential backoff and optional jitter:
components: [
{
name: "flaky_source"
component_type: source
class_path: "my.module.FlakySource"
retry {
max_attempts: 3
initial_delay_seconds: 1.0
max_delay_seconds: 30.0
backoff_multiplier: 2.0
}
}
]
Retry parameters:
Parameter |
Default |
Description |
|---|---|---|
|
3 |
Maximum number of attempts (including the first) |
|
1.0 |
Delay before the first retry |
|
60.0 |
Upper bound on delay between retries |
|
2.0 |
Multiply delay by this factor after each retry |
Programmatic Usage¶
from pyspark_pipeline_framework.core.resilience.retry import RetryExecutor
executor = RetryExecutor(
max_attempts=3,
initial_delay=1.0,
max_delay=30.0,
backoff_multiplier=2.0,
)
result = executor.execute(my_function, args=(arg1,))
Circuit Breaker¶
Prevent repeated calls to a failing component. The circuit breaker tracks consecutive failures and opens the circuit when a threshold is reached, rejecting calls until a timeout expires:
components: [
{
name: "external_api"
component_type: source
class_path: "my.module.ApiSource"
circuit_breaker {
failure_threshold: 5
timeout_seconds: 60.0
}
}
]
Circuit breaker parameters:
Parameter |
Default |
Description |
|---|---|---|
|
5 |
Consecutive failures before opening the circuit |
|
60.0 |
Seconds to wait before allowing a test request |
State machine:
┌──────────┐ failure_threshold ┌──────────┐ timeout expires ┌───────────┐
│ CLOSED │ ─────────────────→ │ OPEN │ ────────────────→ │ HALF_OPEN │
└──────────┘ └──────────┘ └───────────┘
↑ │
│ success │
└────────────────────────────────────────────────────────────────┘
failure → back to OPEN
Programmatic Usage¶
from pyspark_pipeline_framework.core.resilience.circuit_breaker import (
CircuitBreaker,
)
breaker = CircuitBreaker(failure_threshold=5, timeout_seconds=60.0)
try:
result = breaker.call(my_function)
except CircuitBreakerOpenError:
# Circuit is open, handle gracefully
pass
Combining Retry and Circuit Breaker¶
Use both together for maximum resilience. The runner applies retry first, then circuit breaker:
components: [
{
name: "external_api"
component_type: source
class_path: "my.module.ApiSource"
retry {
max_attempts: 3
initial_delay_seconds: 1.0
}
circuit_breaker {
failure_threshold: 5
timeout_seconds: 60.0
}
}
]
See Also¶
Lifecycle Hooks - Lifecycle hooks for monitoring retries
Checkpoint & Resume - Resume failed pipelines