Session (runtime.session)¶
Wrapper¶
Spark session lifecycle management.
- class pyspark_pipeline_framework.runtime.session.wrapper.SparkSessionWrapper(config=None)[source]¶
Bases:
objectManages SparkSession lifecycle for pipeline execution.
Supports: - Local mode (master=”local[*]”) - Cluster mode (master=”yarn”, “spark://…”) - Spark Connect (connect_string=”sc://…”)
Example
>>> wrapper = SparkSessionWrapper.get_or_create(config) >>> df = wrapper.spark.read.parquet("data.parquet") >>> wrapper.stop()
# Or as context manager: >>> with SparkSessionWrapper(config) as wrapper: … df = wrapper.spark.read.parquet(“data.parquet”)
- Parameters:
config (SparkConfig | None)
- __init__(config=None)[source]¶
Initialize wrapper with optional config.
- Parameters:
config (SparkConfig | None) – Spark configuration. If None, uses defaults with app_name=”pyspark-pipeline”.
- Return type:
None
- classmethod get_or_create(config=None)[source]¶
Get singleton instance or create new one.
Thread-safe singleton access. First call creates the instance, subsequent calls return the same instance (ignoring config).
- Parameters:
config (SparkConfig | None) – Spark configuration for first initialization.
- Returns:
The singleton SparkSessionWrapper instance.
- Return type:
- classmethod reset()[source]¶
Reset singleton instance.
Stops any owned session and clears the singleton. Primarily used for testing.
- Return type:
None
- property spark: SparkSession¶
Get or create SparkSession.
Lazily creates the session on first access. Thread-safe.
- Returns:
Active SparkSession instance.
- property spark_context: Any¶
Get SparkContext from session.
Note: Not available when using Spark Connect mode.
- Returns:
SparkContext instance.
- Raises:
RuntimeError – If using Spark Connect (no SparkContext available).
- property sql_context: Any¶
Get SQLContext from session.
Deprecated since version SQLContext: is deprecated since Spark 2.0. Use SparkSession directly. This property may be removed in future versions.
- Returns:
SQLContext instance.
- Raises:
RuntimeError – If using Spark Connect or PySpark 4.0+.