Audit Trail¶
Record an immutable audit trail of pipeline execution for compliance and debugging. Audit events capture component starts, completions, failures, and configuration details.
Basic Usage¶
from pyspark_pipeline_framework.core.audit import (
    LoggingAuditSink, FileAuditSink, CompositeAuditSink,
)
from pyspark_pipeline_framework.runner import (
    AuditHooks, CompositeHooks, LoggingHooks,
    SimplePipelineRunner,
)

# Send every audit event to both the Python logger and a JSON Lines file.
audit_sink = CompositeAuditSink(
    LoggingAuditSink(),
    FileAuditSink("/var/log/pipeline-audit.jsonl"),
)

# Combine the regular logging hooks with hooks that emit audit events.
hooks = CompositeHooks(LoggingHooks(), AuditHooks(audit_sink))

# `config` is the pipeline configuration, loaded elsewhere.
runner = SimplePipelineRunner(config, hooks=hooks)
result = runner.run()
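To illustrate how lifecycle hooks can produce the start, completion, and failure events described at the top of this page, here is a minimal standalone sketch. It does not use the framework: `ListSink`, `audited`, and the event keys (`event`, `component`, `duration_ms`, `error`) are hypothetical names chosen for illustration, and the real `AuditHooks` may be implemented differently.

```python
import time
from contextlib import contextmanager


class ListSink:
    """Hypothetical in-memory sink that collects events in a list."""

    def __init__(self):
        self.events = []

    def emit(self, event):
        self.events.append(event)


@contextmanager
def audited(sink, component):
    """Emit a start event, then a completion or failure event, around a block."""
    sink.emit({"event": "component_started", "component": component})
    started = time.perf_counter()
    try:
        yield
    except Exception as exc:
        sink.emit({
            "event": "component_failed",
            "component": component,
            "duration_ms": (time.perf_counter() - started) * 1000,
            "error": str(exc),
        })
        raise  # auditing must never swallow the original failure
    sink.emit({
        "event": "component_completed",
        "component": component,
        "duration_ms": (time.perf_counter() - started) * 1000,
    })


sink = ListSink()

with audited(sink, "load_orders"):
    pass  # the component's work would run here

try:
    with audited(sink, "transform_orders"):
        raise ValueError("bad schema")
except ValueError:
    pass  # the failure has already been recorded

event_names = [e["event"] for e in sink.events]
```

Re-raising after recording the failure keeps the audit layer passive: it observes the pipeline but does not change whether a run succeeds.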
Audit Sinks¶
| Sink | Description |
|---|---|
| LoggingAuditSink | Writes audit events to the Python logger |
| FileAuditSink | Appends audit events as JSON lines to a file |
| CompositeAuditSink | Fans out events to multiple sinks |
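The sink behaviors in the table can be sketched in a few lines of plain Python. This is an illustration only, not the framework's code: `JsonLinesSink`, `MemorySink`, `FanOutSink`, and the `emit` method name are hypothetical stand-ins, and the event keys are made up for the example.

```python
import json
import os
import tempfile


class JsonLinesSink:
    """Hypothetical file sink: appends one JSON object per line."""

    def __init__(self, path):
        self.path = path

    def emit(self, event):
        # Append mode leaves previously written events untouched.
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")


class MemorySink:
    """Hypothetical sink that keeps events in memory."""

    def __init__(self):
        self.events = []

    def emit(self, event):
        self.events.append(event)


class FanOutSink:
    """Hypothetical composite: forwards each event to every child sink."""

    def __init__(self, *sinks):
        self.sinks = sinks

    def emit(self, event):
        for sink in self.sinks:
            sink.emit(event)


path = os.path.join(tempfile.mkdtemp(), "audit.jsonl")
memory = MemorySink()
composite = FanOutSink(memory, JsonLinesSink(path))

composite.emit({"event": "pipeline_started", "pipeline": "daily_orders"})
composite.emit({"event": "pipeline_completed", "pipeline": "daily_orders"})

# Each line of the file parses independently as one event.
with open(path, encoding="utf-8") as fh:
    lines = [json.loads(line) for line in fh]
```

One JSON object per line means the file can be tailed, grepped, or loaded incrementally without parsing the whole log.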
Audit Events¶
Each event includes:
| Field | Description |
|---|---|
| | |
| | |
| | ISO 8601 timestamp |
| | Name of the pipeline |
| | Name of the component (if applicable) |
| | Execution duration in milliseconds |
| | Additional context (error messages, config, etc.) |
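As a rough picture of what one such event might look like when written as a JSON line, consider the sketch below. The class `AuditRecord` and every field name in it (`event_type`, `timestamp`, `pipeline_name`, `component_name`, `duration_ms`, `metadata`) are assumptions made for illustration; check the framework's own event class for the actual names.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical event shape; the field names are illustrative only."""

    event_type: str
    timestamp: str  # ISO 8601 string
    pipeline_name: str
    component_name: Optional[str] = None  # absent for pipeline-level events
    duration_ms: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_json_line(record: AuditRecord) -> str:
    """Serialize a record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


record = AuditRecord(
    event_type="component_failed",
    timestamp=datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc).isoformat(),
    pipeline_name="daily_orders",
    component_name="transform_orders",
    duration_ms=1250.0,
    metadata={"error": "bad schema"},
)
line = to_json_line(record)
parsed = json.loads(line)
```

Using a timezone-aware timestamp avoids ambiguity when audit files from different hosts are merged.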
Configuration Filtering¶
Use ConfigFilter to redact sensitive values before they appear in audit
events:
from pyspark_pipeline_framework.core.audit import FileAuditSink
from pyspark_pipeline_framework.core.audit.filters import ConfigFilter

# Redact keys containing "password", "secret", or "key"
config_filter = ConfigFilter(redact_patterns=["password", "secret", "key"])
sink = FileAuditSink("/var/log/audit.jsonl", config_filter=config_filter)
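The redaction idea can be sketched independently of the framework. The function below is a hypothetical illustration, not `ConfigFilter` itself: it assumes case-insensitive substring matching on key names, recursion into nested dicts and lists, and a placeholder value of `***REDACTED***`, none of which is confirmed by this page.

```python
from typing import Any, Iterable

REDACTED = "***REDACTED***"  # assumed placeholder; the framework's may differ


def redact(config: Any, patterns: Iterable[str]) -> Any:
    """Return a copy of config with the values of matching keys replaced."""
    lowered = [p.lower() for p in patterns]
    if isinstance(config, dict):
        return {
            key: (
                REDACTED
                if any(p in str(key).lower() for p in lowered)
                else redact(value, lowered)
            )
            for key, value in config.items()
        }
    if isinstance(config, list):
        return [redact(item, lowered) for item in config]
    return config  # scalars pass through unchanged


config = {
    "db": {"user": "etl", "password": "hunter2"},
    "api_secret": "abc123",
    "partition_key": "order_date",
    "batch_size": 500,
}
safe = redact(config, ["password", "secret", "key"])
```

Under substring matching, a broad pattern such as "key" also redacts `partition_key`. That errs on the safe side, but it can hide values that would help debugging, so narrower patterns may be worth considering.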
See Also¶
Lifecycle Hooks
Data Quality Checks
Secrets Management