Audit (core.audit)

Types

Audit event types and models.

class pyspark_pipeline_framework.core.audit.types.AuditAction(*values)

Bases: str, Enum

Standard audit actions.

PIPELINE_STARTED = 'pipeline_started'
PIPELINE_COMPLETED = 'pipeline_completed'
PIPELINE_FAILED = 'pipeline_failed'
COMPONENT_STARTED = 'component_started'
COMPONENT_COMPLETED = 'component_completed'
COMPONENT_FAILED = 'component_failed'
COMPONENT_RETRIED = 'component_retried'
SECRET_ACCESSED = 'secret_accessed'
CONFIG_LOADED = 'config_loaded'

class pyspark_pipeline_framework.core.audit.types.AuditStatus(*values)

Bases: str, Enum

Audit event status.

SUCCESS = 'success'
FAILURE = 'failure'
RETRY = 'retry'
WARNING = 'warning'

class pyspark_pipeline_framework.core.audit.types.AuditEvent(action, actor, resource, status, timestamp=<factory>, metadata=<factory>, trace_id=None)

Bases: object

A single audit event.

Contains all information about an auditable action.

Parameters:
  • action (AuditAction | str) – The action that occurred (enum or custom string).

  • actor (str) – Who performed the action (e.g. runner name, component name).

  • resource (str) – What was acted upon (e.g. pipeline name, component class).

  • status (AuditStatus) – Outcome of the action.

  • timestamp (datetime) – When the event occurred.

  • metadata (dict[str, str]) – Additional key-value data.

  • trace_id (str | None) – Correlation ID for distributed tracing.

action: AuditAction | str
actor: str
resource: str
status: AuditStatus
timestamp: datetime
metadata: dict[str, str]
trace_id: str | None = None

to_dict()

Convert to a JSON-serializable dictionary.

Return type:

dict[str, Any]
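
The following is a minimal sketch of how an event with these fields might be built and serialized. The classes are local stand-ins that mirror the documented signature, not the framework's source; the exact keys and the ISO 8601 timestamp format produced by to_dict() are assumptions, as are the example actor, resource, and trace ID values.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class AuditAction(str, Enum):
    PIPELINE_STARTED = "pipeline_started"


class AuditStatus(str, Enum):
    SUCCESS = "success"


@dataclass
class AuditEvent:
    """Stand-in carrying the documented fields (not the framework's code)."""

    action: AuditAction | str
    actor: str
    resource: str
    status: AuditStatus
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    metadata: dict[str, str] = field(default_factory=dict)
    trace_id: str | None = None

    def to_dict(self) -> dict[str, Any]:
        # Assumption: enum members are written as their string values and
        # the timestamp as an ISO 8601 string, so the result is JSON-safe.
        action = self.action.value if isinstance(self.action, Enum) else self.action
        return {
            "action": action,
            "actor": self.actor,
            "resource": self.resource,
            "status": self.status.value,
            "timestamp": self.timestamp.isoformat(),
            "metadata": dict(self.metadata),
            "trace_id": self.trace_id,
        }


# A standard action taken from the enum.
event = AuditEvent(
    action=AuditAction.PIPELINE_STARTED,
    actor="example_runner",
    resource="daily_sales",
    status=AuditStatus.SUCCESS,
    metadata={"env": "dev"},
    trace_id="trace-123",
)
record = event.to_dict()

# A custom action passed as a plain string, which the action field also allows.
custom = AuditEvent(
    action="table_vacuumed",
    actor="example_runner",
    resource="daily_sales",
    status=AuditStatus.SUCCESS,
)

encoded = json.dumps(record)
```

Because timestamp and metadata use default factories, each event gets its own timestamp and its own empty metadata dictionary when these arguments are omitted.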

Sinks

Audit event sinks for emitting audit trails.

class pyspark_pipeline_framework.core.audit.sinks.AuditSink

Bases: ABC

Base class for audit event sinks.

abstractmethod emit(event)

Emit a single audit event.

Parameters:

event (AuditEvent)

Return type:

None

emit_all(events)

Emit multiple audit events.

Parameters:

events (list[AuditEvent])

Return type:

None

close()

Close the sink and release resources.

Return type:

None
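
Since emit() is the only abstract method, a custom sink presumably needs to implement just that one method. The sketch below illustrates the pattern with a local stand-in base class. AuditSinkSketch and InMemoryAuditSink are hypothetical names, events are plain dictionaries for brevity, and the default behavior shown for emit_all() and close() is an assumption rather than the framework's code.

```python
from abc import ABC, abstractmethod
from typing import Any


class AuditSinkSketch(ABC):
    """Stand-in for the documented base class (assumed behavior)."""

    @abstractmethod
    def emit(self, event: Any) -> None:
        """Emit a single audit event."""

    def emit_all(self, events: list) -> None:
        # Assumption: the default emits each event in order via emit().
        for event in events:
            self.emit(event)

    def close(self) -> None:
        # Assumption: the base class has nothing to release.
        return None


class InMemoryAuditSink(AuditSinkSketch):
    """Hypothetical sink that keeps events in a list, e.g. for unit tests."""

    def __init__(self) -> None:
        self.events: list = []

    def emit(self, event: Any) -> None:
        self.events.append(event)


sink = InMemoryAuditSink()
sink.emit({"action": "pipeline_started"})
sink.emit_all([{"action": "component_started"}, {"action": "component_completed"}])
sink.close()
```

A subclass that holds a resource, such as a file handle or network connection, would also override close().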

class pyspark_pipeline_framework.core.audit.sinks.LoggingAuditSink(logger_name='ppf.audit')

Bases: AuditSink

Emit audit events to Python logging.

Parameters:

logger_name (str) – Logger name to use. Defaults to "ppf.audit".

emit(event)

Emit a single audit event.

Parameters:

event (AuditEvent)

Return type:

None
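
Because this sink writes through Python's standard logging module, audit output can be routed with ordinary logging configuration on the named logger. The sketch below attaches a handler to "ppf.audit" and emits through a stand-in sink. LoggingSinkSketch and ListHandler are hypothetical names, and the INFO level and JSON message format are assumptions, since the reference does not state how events are rendered.

```python
import json
import logging


class LoggingSinkSketch:
    """Stand-in for a logging-backed sink (level and format are assumed)."""

    def __init__(self, logger_name: str = "ppf.audit") -> None:
        self._logger = logging.getLogger(logger_name)

    def emit(self, event: dict) -> None:
        # Assumption: one INFO record per event, rendered as JSON.
        self._logger.info(json.dumps(event, sort_keys=True))


class ListHandler(logging.Handler):
    """Collects formatted messages so the routing can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


# Standard logging setup: enable the audit logger and attach a handler to it.
handler = ListHandler()
audit_logger = logging.getLogger("ppf.audit")
audit_logger.setLevel(logging.INFO)
audit_logger.addHandler(handler)

LoggingSinkSketch().emit({"action": "config_loaded", "status": "success"})
```

In a real deployment the ListHandler would typically be replaced by a logging.FileHandler or whichever handler the application's log shipping expects.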

class pyspark_pipeline_framework.core.audit.sinks.FileAuditSink(path)

Bases: AuditSink

Emit audit events to a JSON-lines file.

The file is opened lazily on the first emit() call.

Parameters:

path (str | Path) – Path to the audit log file.

emit(event)

Emit a single audit event.

Parameters:

event (AuditEvent)

Return type:

None

close()

Close the sink and release resources.

Return type:

None
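
The lazy-open, JSON-lines behavior described for this sink can be sketched as follows. JsonLinesSinkSketch is a hypothetical stand-in, not the framework's implementation; append mode, UTF-8 encoding, and flushing after each write are assumptions, and events are plain dictionaries for brevity.

```python
import json
import tempfile
from pathlib import Path


class JsonLinesSinkSketch:
    """Stand-in that writes one JSON object per line, opening the file lazily."""

    def __init__(self, path) -> None:
        self._path = Path(path)
        self._handle = None  # nothing is opened until the first emit()

    def emit(self, event: dict) -> None:
        if self._handle is None:
            # Assumption: append mode, so earlier audit lines are preserved.
            self._handle = self._path.open("a", encoding="utf-8")
        self._handle.write(json.dumps(event) + "\n")
        self._handle.flush()

    def close(self) -> None:
        if self._handle is not None:
            self._handle.close()
            self._handle = None


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "audit.jsonl"
    sink = JsonLinesSinkSketch(log_path)
    exists_before_emit = log_path.exists()  # the file has not been created yet

    sink.emit({"action": "pipeline_started", "status": "success"})
    sink.emit({"action": "pipeline_completed", "status": "success"})
    sink.close()

    # Each line of the file parses independently as one event.
    lines = log_path.read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines]
```

One consequence of lazy opening is that constructing the sink never touches the file system, so a pipeline that emits no events leaves no empty audit file behind.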

class pyspark_pipeline_framework.core.audit.sinks.CompositeAuditSink(*sinks)

Bases: AuditSink

Fan out audit events to multiple sinks.

Exceptions raised by individual sinks are caught and logged so that one failing sink does not prevent others from receiving events.

Parameters:

sinks (AuditSink) – One or more audit sinks to fan out to.

emit(event)

Emit a single audit event.

Parameters:

event (AuditEvent)

Return type:

None

close()

Close the sink and release resources.

Return type:

None
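
The fault isolation described for this class can be sketched as a loop that wraps each delegate call in its own try/except. FanOutSketch, BrokenSink, and RecordingSink are hypothetical names used only for illustration. Applying the same isolation in close(), and the logger name, are assumptions.

```python
import logging

logger = logging.getLogger("fan_out_sketch")


class FanOutSketch:
    """Stand-in that forwards each event to every sink it was given."""

    def __init__(self, *sinks) -> None:
        self._sinks = sinks

    def emit(self, event) -> None:
        for sink in self._sinks:
            try:
                sink.emit(event)
            except Exception:
                # Log the failure and continue with the remaining sinks.
                logger.exception("audit sink %r failed to emit", sink)

    def close(self) -> None:
        # Assumption: close() isolates failures the same way emit() does.
        for sink in self._sinks:
            try:
                sink.close()
            except Exception:
                logger.exception("audit sink %r failed to close", sink)


class BrokenSink:
    """Always fails, to demonstrate the isolation."""

    def emit(self, event) -> None:
        raise RuntimeError("sink unavailable")

    def close(self) -> None:
        pass


class RecordingSink:
    def __init__(self) -> None:
        self.events: list = []
        self.closed = False

    def emit(self, event) -> None:
        self.events.append(event)

    def close(self) -> None:
        self.closed = True


recorder = RecordingSink()
fan_out = FanOutSketch(BrokenSink(), recorder)
fan_out.emit({"action": "component_failed"})  # the broken sink's error is logged
fan_out.close()
```

Placing the failing sink first shows the point of the design: the recorder still receives the event even though an earlier sink raised.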

Filters

Configuration filter for redacting sensitive values.

class pyspark_pipeline_framework.core.audit.filters.ConfigFilter

Bases: object

Filter sensitive values from configuration dictionaries.

Uses substring matching against common sensitive key patterns.

classmethod scrub(data, replacement='***REDACTED***')

Recursively scrub sensitive values from data.

The value of any key whose lowercase form contains a pattern in SENSITIVE_PATTERNS is replaced with replacement.

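Parameters:
  • data – The configuration dictionary to scrub.

  • replacement (str) – Text substituted for each sensitive value. Defaults to '***REDACTED***'.

Return type: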

dict[str, Any]
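
The matching rule can be illustrated with a short sketch. scrub_sketch and ASSUMED_PATTERNS are hypothetical: the actual contents of SENSITIVE_PATTERNS are not listed in this reference, so the three patterns below are placeholders. Returning a new dictionary instead of modifying the input, and recursing only into nested dictionaries, are likewise assumptions.

```python
from typing import Any

# Placeholder patterns; the framework's SENSITIVE_PATTERNS may differ.
ASSUMED_PATTERNS = ("password", "secret", "token")


def scrub_sketch(data: dict, replacement: str = "***REDACTED***") -> dict:
    """Return a copy of data with values under sensitive-looking keys replaced."""
    cleaned: dict = {}
    for key, value in data.items():
        lowered = str(key).lower()
        if any(pattern in lowered for pattern in ASSUMED_PATTERNS):
            # Substring match on the lowercased key: replace the whole value.
            cleaned[key] = replacement
        elif isinstance(value, dict):
            # Recurse so nested configuration sections are scrubbed as well.
            cleaned[key] = scrub_sketch(value, replacement)
        else:
            cleaned[key] = value
    return cleaned


config: dict[str, Any] = {
    "app_name": "etl",
    "api_token": "abc123",
    "database": {"host": "db.internal", "DB_PASSWORD": "hunter2"},
}
scrubbed = scrub_sketch(config)
```

Substring matching is deliberately broad: a key such as DB_PASSWORD or api_token matches without being listed exactly, at the cost of occasionally redacting a harmless key that happens to contain a pattern.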