Session (runtime.session)

Wrapper

Spark session lifecycle management.

class pyspark_pipeline_framework.runtime.session.wrapper.SparkSessionWrapper(config=None)

Bases: object

Manages SparkSession lifecycle for pipeline execution.

Supports:

- Local mode (master="local[*]")
- Cluster mode (master="yarn", "spark://…")
- Spark Connect (connect_string="sc://…")

Example

>>> wrapper = SparkSessionWrapper.get_or_create(config)
>>> df = wrapper.spark.read.parquet("data.parquet")
>>> wrapper.stop()

>>> # Or as context manager:
>>> with SparkSessionWrapper(config) as wrapper:
...     df = wrapper.spark.read.parquet("data.parquet")

Parameters:

config (SparkConfig | None)

__init__(config=None)

Initialize wrapper with optional config.

Parameters:

config (SparkConfig | None) – Spark configuration. If None, uses defaults with app_name="pyspark-pipeline".

Return type:

None

classmethod get_or_create(config=None)

Get singleton instance or create new one.

Thread-safe singleton access. The first call creates the instance; subsequent calls return that same instance and ignore any config passed to them.

Parameters:

config (SparkConfig | None) – Spark configuration for first initialization.

Returns:

The singleton SparkSessionWrapper instance.

Return type:

SparkSessionWrapper

classmethod reset()

Reset singleton instance.

Stops any owned session and clears the singleton. Primarily used for testing.

Return type:

None
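The singleton behaviour of get_or_create and reset can be sketched with double-checked locking. This is a minimal illustration, not the framework's implementation: WrapperSketch is a hypothetical stand-in class, and the locking strategy is an assumption about how "thread-safe" might be achieved.

```python
import threading
from typing import Any, Optional


class WrapperSketch:
    """Hypothetical stand-in; not the real SparkSessionWrapper."""

    _instance: Optional["WrapperSketch"] = None
    _lock = threading.Lock()

    def __init__(self, config: Optional[Any] = None) -> None:
        self.config = config

    @classmethod
    def get_or_create(cls, config: Optional[Any] = None) -> "WrapperSketch":
        # Fast path: skip the lock once the instance exists.
        if cls._instance is None:
            with cls._lock:
                # Re-check under the lock; another thread may have created it.
                if cls._instance is None:
                    cls._instance = cls(config)
        return cls._instance

    @classmethod
    def reset(cls) -> None:
        # Clear the singleton so the next get_or_create builds a new one.
        with cls._lock:
            cls._instance = None


first = WrapperSketch.get_or_create({"app_name": "first"})
# The second config is ignored because the instance already exists.
second = WrapperSketch.get_or_create({"app_name": "second"})
```

One consequence worth noting: because later configs are ignored, code that needs a different configuration (for example, in tests) has to call reset first.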

property spark: SparkSession

Get or create SparkSession.

Lazily creates the session on first access. Thread-safe.

Returns:

Active SparkSession instance.
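Lazy, thread-safe creation of this kind can be sketched as follows. The class LazySessionSketch and the fake_factory function are hypothetical; a plain object stands in for the SparkSession so the example runs without Spark installed. How the real property builds its session is not shown in this reference.

```python
import threading
from typing import Callable, List, Optional


class LazySessionSketch:
    """Hypothetical stand-in showing lazy, lock-guarded creation."""

    def __init__(self, factory: Callable[[], object]) -> None:
        self._factory = factory
        self._session: Optional[object] = None
        self._lock = threading.Lock()

    @property
    def spark(self) -> object:
        # Nothing is built until the first access.
        if self._session is None:
            with self._lock:
                # Re-check under the lock so only one thread calls the factory.
                if self._session is None:
                    self._session = self._factory()
        return self._session


calls: List[int] = []


def fake_factory() -> object:
    # Records each invocation so we can count how often a session is built.
    calls.append(1)
    return object()


holder = LazySessionSketch(fake_factory)
created_before_access = len(calls)

threads = [threading.Thread(target=lambda: holder.spark) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```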

property spark_context: Any

Get SparkContext from session.

Note: Not available when using Spark Connect mode.

Returns:

SparkContext instance.

Raises:

RuntimeError – If using Spark Connect (no SparkContext available).

property sql_context: Any

Get SQLContext from session.

Deprecated: SQLContext has been deprecated since Spark 2.0. Use SparkSession directly. This property may be removed in future versions.

Returns:

SQLContext instance.

Raises:

RuntimeError – If using Spark Connect or PySpark 4.0+.

property is_connect_mode: bool

Check if using Spark Connect mode.
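Callers that need a SparkContext can check this flag first instead of catching the RuntimeError. The sketch below uses hypothetical stand-ins (ConfigSketch, ModeSketch, describe_backend), and it assumes that a non-empty connect_string is what selects Spark Connect; the real detection logic is not documented here. The connection string in the usage lines is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfigSketch:
    """Hypothetical config with the two options named in the class summary."""

    master: Optional[str] = None
    connect_string: Optional[str] = None


class ModeSketch:
    """Hypothetical stand-in for the wrapper's mode-related properties."""

    def __init__(self, config: ConfigSketch) -> None:
        self._config = config

    @property
    def is_connect_mode(self) -> bool:
        # Assumption: a non-empty connect_string means Spark Connect.
        return bool(self._config.connect_string)

    @property
    def spark_context(self) -> str:
        if self.is_connect_mode:
            raise RuntimeError("SparkContext is not available in Spark Connect mode")
        # A string stands in for the real SparkContext object.
        return f"context for {self._config.master}"


def describe_backend(wrapper: ModeSketch) -> str:
    # Check the flag before touching spark_context to avoid the RuntimeError.
    if wrapper.is_connect_mode:
        return "connect"
    return wrapper.spark_context


local = ModeSketch(ConfigSketch(master="local[*]"))
remote = ModeSketch(ConfigSketch(connect_string="sc://placeholder-host"))
```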

set_spark_session(spark)

Inject an existing SparkSession.

Use when running in an existing Spark environment (e.g., Databricks, EMR notebooks).

Parameters:

spark (SparkSession) – Existing SparkSession to use.

Return type:

None

stop()

Stop the SparkSession if this wrapper owns it.

Return type:

None
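The ownership rule can be illustrated with a small sketch. It assumes that a session the wrapper creates itself is "owned", while one passed in through set_spark_session belongs to the caller and is left running; that reading is an inference from the descriptions above, not confirmed behaviour. FakeSession and OwnershipSketch are hypothetical stand-ins.

```python
from typing import Optional


class FakeSession:
    """Stand-in for a SparkSession; only records whether stop() ran."""

    def __init__(self) -> None:
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True


class OwnershipSketch:
    """Hypothetical stand-in tracking who owns the session."""

    def __init__(self) -> None:
        self._session: Optional[FakeSession] = None
        self._owned = False

    def create(self) -> FakeSession:
        # A session built here is owned by this object.
        self._session = FakeSession()
        self._owned = True
        return self._session

    def set_spark_session(self, spark: FakeSession) -> None:
        # Assumption: an injected session stays under the caller's control.
        self._session = spark
        self._owned = False

    def stop(self) -> None:
        # Only stop sessions this object created itself.
        if self._session is not None and self._owned:
            self._session.stop()
        self._session = None
        self._owned = False


owner = OwnershipSketch()
owned_session = owner.create()
owner.stop()

injector = OwnershipSketch()
external_session = FakeSession()
injector.set_spark_session(external_session)
injector.stop()
```

Under this assumption, notebook environments that hand over a shared session keep it alive after the pipeline finishes.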