Audit (`core.audit`)¶

Types¶

Audit event types and models.

class pyspark_pipeline_framework.core.audit.types.AuditAction(*values)[source]¶

Bases: str, Enum

Standard audit actions.

PIPELINE_STARTED = 'pipeline_started'¶

PIPELINE_COMPLETED = 'pipeline_completed'¶

PIPELINE_FAILED = 'pipeline_failed'¶

COMPONENT_STARTED = 'component_started'¶

COMPONENT_COMPLETED = 'component_completed'¶

COMPONENT_FAILED = 'component_failed'¶

COMPONENT_RETRIED = 'component_retried'¶

SECRET_ACCESSED = 'secret_accessed'¶

CONFIG_LOADED = 'config_loaded'¶

class pyspark_pipeline_framework.core.audit.types.AuditStatus(*values)[source]¶

Bases: str, Enum

Audit event status.

SUCCESS = 'success'¶

FAILURE = 'failure'¶

RETRY = 'retry'¶

WARNING = 'warning'¶

class pyspark_pipeline_framework.core.audit.types.AuditEvent(action, actor, resource, status, timestamp=<factory>, metadata=<factory>, trace_id=None)[source]¶

Bases: object

A single audit event.

Contains all information about an auditable action.

Parameters:

action (AuditAction | str) – The action that occurred (enum or custom string).
actor (str) – Who performed the action (e.g. runner name, component name).
resource (str) – What was acted upon (e.g. pipeline name, component class).
status (AuditStatus) – Outcome of the action.
timestamp (datetime) – When the event occurred.
metadata (dict[str, str]) – Additional key-value data.
trace_id (str | None) – Correlation ID for distributed tracing.

action: AuditAction | str¶

actor: str¶

resource: str¶

status: AuditStatus¶

timestamp: datetime¶

metadata: dict[str, str]¶

trace_id: str | None = None¶

to_dict()[source]¶

Convert to a JSON-serializable dictionary.

Return type:: dict[str, Any]

Sinks¶

Audit event sinks for emitting audit trails.

class pyspark_pipeline_framework.core.audit.sinks.AuditSink[source]¶

Bases: ABC

Base class for audit event sinks.

abstractmethod emit(event)[source]¶

Emit a single audit event.

Parameters:: event (AuditEvent)
Return type:: None

emit_all(events)[source]¶

Emit multiple audit events.

Parameters:: events (list[AuditEvent])
Return type:: None

close()[source]¶

Close the sink and release resources.

Return type:: None

class pyspark_pipeline_framework.core.audit.sinks.LoggingAuditSink(logger_name='ppf.audit')[source]¶

Bases: AuditSink

Emit audit events to Python logging.

Parameters:: logger_name (str) – Logger name to use. Defaults to "ppf.audit".

emit(event)[source]¶

Emit a single audit event.

Parameters:: event (AuditEvent)
Return type:: None

class pyspark_pipeline_framework.core.audit.sinks.FileAuditSink(path)[source]¶

Bases: AuditSink

Emit audit events to a JSON-lines file.

The file is opened lazily on the first emit() call.

Parameters:: path (str | Path) – Path to the audit log file.

emit(event)[source]¶

Emit a single audit event.

Parameters:: event (AuditEvent)
Return type:: None

close()[source]¶

Close the sink and release resources.

Return type:: None

class pyspark_pipeline_framework.core.audit.sinks.CompositeAuditSink(*sinks)[source]¶

Bases: AuditSink

Fan out audit events to multiple sinks.

Exceptions raised by individual sinks are caught and logged so that one failing sink does not prevent others from receiving events.

Parameters:: sinks (AuditSink) – One or more audit sinks to fan out to.

emit(event)[source]¶

Emit a single audit event.

Parameters:: event (AuditEvent)
Return type:: None

close()[source]¶

Close the sink and release resources.

Return type:: None

Filters¶

Configuration filter for redacting sensitive values.

class pyspark_pipeline_framework.core.audit.filters.ConfigFilter[source]¶

Bases: object

Filter sensitive values from configuration dictionaries.

Uses substring matching against common sensitive key patterns.

classmethod scrub(data, replacement='***REDACTED***')[source]¶

Recursively scrub sensitive values from data.

Keys whose lowercase form contains any pattern in SENSITIVE_PATTERNS are replaced with replacement.

Parameters:

data (dict[str, Any])
replacement (str)

Return type:

dict[str, Any]

Audit (core.audit)¶

Types¶

Sinks¶

Filters¶

Audit (`core.audit`)¶