Executing a DELETE and running VACUUM physically removes personal data from your Databricks lakehouse. That is the act of erasure. What produces defensible technical documentation of that act is a separate problem, and Unity Catalog’s system tables are the primary tool for solving it.
Under DPDP Rule 8(3), all Data Fiduciaries must retain personal data, associated traffic data, and processing logs for a minimum of one year from the date of processing, for the purposes specified in the Seventh Schedule. The Data Protection Board can request these under Rule 23 during an investigation. These logs are not operational telemetry. They are technical records that support a regulatory inquiry.
This article builds the complete Unity Catalog audit evidence layer for DPDP compliance: the three-system-table architecture, illustrative query patterns that document deletion for an inquiry, the column lineage tool for erasure scope discovery, and the one-year immutable archival pipeline that satisfies Rule 8(3).
In the five-artifact DPDP Erasure Evidence Package defined in the Hub article (DPDP Retention and Erasure on Databricks: How to Prove Deletion and Audit Evidence), Evidence Artifact 3 is the Unity Catalog audit log export. This spoke builds that artifact from first principles.
Section 1: The Unity Catalog System Table Architecture for Compliance
Three Tables, Three Evidence Roles
Unity Catalog exposes three system tables that together constitute the technical audit and lineage record for a DPDP erasure event.
| System Table | What It Captures | DPDP Evidence Role |
|---|---|---|
| system.access.audit | Every data access, DML operation, and administrative event with identity, IP, timestamp, and action name | Primary deletion and VACUUM execution record; who performed what action and when |
| system.access.table_lineage | Every read/write event on Unity Catalog tables with compute context (job, notebook, pipeline) | Confirms which tables were involved in the deletion pipeline; supports cross-layer propagation evidence |
| system.access.column_lineage | Read/write events at column level for supported compute entities, tracing source-to-target flows | Scope-discovery input for identifying downstream PII-derived columns before VACUUM; subject to known limitations |
1.1 Important: system.access.audit Is in Public Preview
As of publication, system.access.audit is labeled Public Preview across all Databricks cloud platforms (AWS, Azure, GCP). This has direct implications for compliance architecture:
- Public Preview features are subject to schema adaptations and behavioral changes without the guarantees of a Generally Available product.
- Regional availability is not uniform across all Databricks deployments.
- Public Preview features do not carry the SLA commitments of production GA services.
For these reasons, compliance teams should not rely solely on system.access.audit as the primary evidence substrate. The traditional account-level JSON audit log delivery path (routing structured JSON logs directly to S3, ADLS, or GCS via the Databricks audit log delivery API or Azure Diagnostic Settings) provides a more stable, production-grade supplement. Run both architectures in parallel: use the system table for operational querying and evidence assembly, and the direct delivery path for the immutable long-term archival record.
1.2 The system.access.audit Schema for Compliance Practitioners
The compliance-relevant fields in system.access.audit:
- event_time: the timestamp of the action. The DPB’s clock for a specific event runs against this value.
- user_identity: a JSON struct containing the email or service principal ID that executed the action.
- action_name: the name of the specific operation logged. Exact values depend on how the operation was submitted and whether verbose audit logging is enabled.
- request_params: a JSON map containing operation-specific details. Exact keys vary by action type and service.
- source_ip_address: the origin IP of the request.
- audit_level: either ‘ACCOUNT_LEVEL’ (Unity Catalog metadata and DML operations) or ‘WORKSPACE_LEVEL’ (compute, notebooks, SQL warehouses).
1.3 The Account-Level vs Workspace-Level Distinction
Unity Catalog DML operations (DELETE, VACUUM, table creates and drops on UC-managed tables) are logged at audit_level = 'ACCOUNT_LEVEL', with workspace_id = 0. The originating workspace is in request_params.workspace_id. Compute-level operations (notebook execution, SQL warehouse queries) are logged at audit_level = 'WORKSPACE_LEVEL'.
A compliance query filtering only for workspace-level events will silently miss all Unity Catalog DML records. Filter for both:
-- Account-level events: Unity Catalog DML operations
WHERE audit_level = 'ACCOUNT_LEVEL'
AND service_name = 'unityCatalog'
-- Workspace-level compute events: who ran the job
WHERE audit_level = 'WORKSPACE_LEVEL'
AND service_name IN ('databrickssql', 'notebook')
1.4 Enabling System Tables and Granting Access
system.access.audit is not automatically queryable. An account admin must enable system table schemas for the metastore. Once enabled, grant SELECT using the explicit Unity Catalog ON TABLE syntax:
GRANT SELECT ON TABLE system.access.audit
TO `compliance-service-principal@company.com`;
GRANT SELECT ON TABLE system.access.table_lineage
TO `compliance-service-principal@company.com`;
GRANT SELECT ON TABLE system.access.column_lineage
TO `compliance-service-principal@company.com`;
Enable these permissions proactively. If grants are not in place when a deletion event occurs, that event’s audit record may be present in the system table but unavailable to the compliance team at the time of an inquiry.
Section 2: Illustrative Deletion Evidence Query Patterns
A Query Library for DPDP Erasure Documentation
The following query patterns illustrate how to extract deletion-related evidence from system.access.audit. They are designed as starting points for exploration and adaptation, not as guaranteed production filters with fixed action name strings.
Critical note on action names: Interactive DML operations executed via Databricks notebooks or Databricks SQL frequently appear in audit logs as commandSubmit or commandFinish events with the actual SQL text embedded in request_params, rather than as discrete action names like ‘delete’ or ‘vacuum’. The exact values logged depend on how the operation was submitted, the workspace configuration, and whether verbose audit logging is enabled.
Run this discovery query in your workspace first to understand the actual action names and event structures generated for your specific deletion and VACUUM operations:
-- Discovery query: find all audit events in a time window after a known deletion
SELECT DISTINCT
service_name,
action_name,
audit_level,
COUNT(*) AS event_count
FROM system.access.audit
WHERE event_time BETWEEN '2025-06-10T02:00:00' AND '2025-06-10T05:00:00'
GROUP BY service_name, action_name, audit_level
ORDER BY event_count DESC;
Use the output of this discovery query to calibrate the action_name filters in the patterns below before building production evidence pipelines.
2.1 Pattern 1: The Deletion Action Record
SELECT
event_time,
user_identity:email AS executed_by,
service_name,
action_name,
request_params:tableName AS table_name,
request_params:numDeletedRows AS rows_deleted,
source_ip_address,
response:statusCode AS status_code
FROM system.access.audit
WHERE (
action_name = 'delete'
OR (action_name = 'commandSubmit'
AND request_params:commandText LIKE '%DELETE%customer_profiles%')
)
AND event_time >= '2025-06-10T00:00:00'
AND event_time < '2025-06-11T00:00:00'
ORDER BY event_time ASC;
The event_time is the timestamp. The executed_by identity links the action to an accountable principal. The status_code confirms whether the operation succeeded.
2.2 Pattern 2: The VACUUM Execution Record
SELECT
event_time,
user_identity:email AS executed_by,
action_name,
request_params:tableName AS table_name,
request_params:numDeletedFiles AS files_deleted,
request_params:numVacuumedDirectories AS directories_vacuumed,
source_ip_address,
response:statusCode AS status_code
FROM system.access.audit
WHERE (
action_name IN ('vacuum', 'vacuumEnd')
OR (action_name = 'commandSubmit'
AND request_params:commandText LIKE '%VACUUM%customer_profiles%')
)
AND event_time >= '2025-06-10T03:00:00'
ORDER BY event_time ASC;
Validate that the numDeletedFiles field is populated in your workspace’s audit output for VACUUM events. For some configurations, this detail may appear only in the Delta transaction log DESCRIBE HISTORY rather than in system.access.audit.
2.3 Pattern 3: The Complete Chronological Chain for One Erasure Request
SELECT
event_time,
user_identity:email AS actor,
service_name,
action_name,
COALESCE(
request_params:tableName,
request_params:commandText
) AS operation_context,
response:statusCode AS status
FROM system.access.audit
WHERE event_time BETWEEN '2025-06-08T14:00:00' AND '2025-06-10T04:00:00'
AND (
request_params:tableName IN (
'catalog.bronze.customer_raw',
'catalog.silver.customer_profiles',
'catalog.gold.customer_segments'
)
OR request_params:commandText LIKE '%customer_profiles%'
OR request_params:commandText LIKE '%dp-00142%'
)
ORDER BY event_time ASC;
2.4 Pattern 4: Post-Erasure No-Access Verification
This pattern is most effective when verbose audit logging is enabled for the workspace. Verbose logging captures the full command text in commandSubmit events, enabling searches for specific Data Principal identifiers in queries run after erasure.
SELECT
event_time,
user_identity:email AS accessor,
action_name,
request_params:commandText AS query_text,
source_ip_address
FROM system.access.audit
WHERE event_time > '2025-06-10T03:15:18' -- After VACUUM END timestamp
AND service_name IN ('databrickssql', 'notebook')
AND (
request_params:commandText LIKE '%customer_profiles%'
OR request_params:commandText LIKE '%dp-00142%'
)
ORDER BY event_time ASC;
-- Zero rows: no post-erasure access recorded in audit logs
-- (limited to events captured by verbose audit logging)
2.5 Pattern 5: The 48-Hour Notification Confirmation
For Third Schedule platform entities, DPDP Rule 8 requires 48 hours’ notice before inactivity-based erasure. This pattern joins the erasure_requests registry with the audit table to produce evidence that the notification interval was honored before the DELETE was executed.
SELECT
er.request_id,
er.data_principal_id,
er.notification_sent,
er.erasure_executed,
TIMESTAMPDIFF(HOUR, er.notification_sent, er.erasure_executed)
AS hours_elapsed,
a.event_time AS audit_delete_time,
a.user_identity:email AS executed_by,
a.source_ip_address
FROM compliance.dpdp.erasure_requests er
JOIN system.access.audit a
ON a.event_time > er.notification_sent
AND a.event_time < TIMESTAMPADD(DAY, 4, er.notification_sent)
AND (
a.action_name = 'delete'
OR (a.action_name = 'commandSubmit'
AND a.request_params:commandText LIKE '%DELETE%customer_profiles%')
)
WHERE er.request_id = 'ER-2025-00142'
ORDER BY a.event_time;
-- hours_elapsed should be >= 48 to confirm the notification window was honored
Section 3: Using Column Lineage to Determine the Erasure Scope
Column Lineage as a Scope-Discovery Input
Before executing cross-layer VACUUM, you must know every downstream table that contains columns derived from the PII fields you are deleting. Running VACUUM on Bronze without having deleted PII from a Gold aggregate that was computed from Bronze email or phone fields is a compliance failure: the personal data still exists in derived form.
system.access.column_lineage records read/write events at column level for supported Databricks compute entities. It is a powerful scope-discovery input, not an authoritative or complete map of all derived PII in all circumstances. Known limitations:
- Delta Live Tables and Lakeflow Declarative Pipelines: these capture table-level lineage only. Column-level lineage is not recorded for transformations run within DLT or Lakeflow pipelines. Supplement with manual data mapping review for these compute types.
- Path-based table references: tables referenced by cloud storage path rather than full
catalog.schema.table_namelose column-level mapping. All PII tables should be referenced by catalog name in compliance-relevant workloads. - External compute systems: lineage from Spark jobs run outside Databricks Unity Catalog governance may not appear in
system.access.column_lineage.
Treat the output of the column lineage query as the starting point for erasure scope definition, and verify it alongside a manual data mapping review before executing cross-layer VACUUM.
This lineage discovery is also the input to the medallion purging order. Executing physical VACUUM purges in Gold-first, Silver-then-Bronze order is the recommended default pattern for this evidence-preserving workflow, unless lineage and downstream recomputation are otherwise proven for your specific architecture. For the complete VACUUM orchestration that follows this scope query, see Managing Delta Lake VACUUM and Time Travel for DPDP Right to Erasure Compliance.
3.1 The PII Column Downstream Scope Query
SELECT DISTINCT
cl.target_table_full_name AS downstream_table,
cl.target_column_name AS derived_column,
cl.source_table_full_name AS source_table,
cl.source_column_name AS source_pii_column,
cl.entity_type AS compute_type
FROM system.access.column_lineage cl
WHERE cl.source_table_full_name =
'catalog.bronze.customer_raw'
AND cl.source_column_name IN (
'email', 'phone', 'aadhaar_hash', 'pan_hash'
)
ORDER BY
cl.target_table_full_name,
cl.target_column_name;
Every table in the result set should be included in the DELETE and VACUUM scope, subject to manual verification that the lineage captured reflects the current state of all pipelines.
3.2 Table Lineage for Cross-Workspace Erasure Scope
When a Databricks account has multiple workspaces, PII may flow through one and be read in another. Since Unity Catalog is account-scoped, system.access.table_lineage captures cross-workspace flows:
SELECT
tl.source_table_full_name,
tl.target_table_full_name,
tl.workspace_id,
tl.entity_type,
tl.event_time
FROM system.access.table_lineage tl
WHERE tl.source_table_full_name =
'catalog.bronze.customer_raw'
AND tl.event_time >= '2025-01-01'
ORDER BY tl.event_time DESC;
Any workspace_id in the results represents a separate erasure obligation. That workspace must also execute DELETE and VACUUM against the tables it consumed from the source.
Section 4: Building the One-Year Immutable Log Archive
The Native Retention Window and Why an Archive Is Still Required
system.access.audit natively retains each individual event for 365 days from that specific event’s own timestamp, providing a full year of native availability per event. However, building a dedicated immutable export pipeline remains mandatory for three critical reasons:
First, system.access.audit is in Public Preview. Schema changes, regional availability gaps, or feature deprecations can occur without the guarantees of a GA product. A compliance substrate that depends on a preview-status table carries inherent architectural risk for long-term regulatory evidence.
Second, the system table alone provides no tamper-evidence guarantee. An immutable export backed by S3 Object Lock, Azure Immutable Blob, or GCS Bucket Lock creates a write-once archive that cannot be altered retroactively, providing a stronger chain of custody for regulatory purposes.
Third, the traditional JSON audit log delivery path (direct delivery to cloud storage) is a more stable, production-grade substrate that operates independently of system table availability. Combining both architectures provides operational queryability through the Delta table and maximum durability through the object-locked delivery path.
4.1 Architectural Archive Options
| Option | Access Pattern | Queryability | DPDP Suitability |
|---|---|---|---|
| Option A: Append-only archive Delta table backed by object-locked storage | SQL queryable from Databricks | Full SQL with all system table query patterns; ideal for on-demand evidence assembly | Recommended for operational compliance querying |
| Option B: Direct JSON delivery to S3/ADLS with Object Lock | JSON files in cloud storage; requires Spark or Athena to query | Maximum tamper-evidence guarantee; independent of Public Preview system table | Best for long-term immutable regulatory archival; more stable production substrate |
Combine both in production: Option A for operational querying and evidence assembly, Option B as the immutable legal archive that is independent of system table availability.
4.2 The Daily Export Pipeline
-- Create archive table (appendOnly appropriate here: write-once event log)
CREATE TABLE IF NOT EXISTS compliance.logs.uc_audit_archive
USING DELTA
TBLPROPERTIES ('delta.appendOnly' = 'true')
AS SELECT * FROM system.access.audit WHERE 1=0;
-- Daily incremental export: run as compliance-service-principal
INSERT INTO compliance.logs.uc_audit_archive
SELECT *
FROM system.access.audit
WHERE event_date = CURRENT_DATE - INTERVAL 1 DAY;
Back this archive table to S3 Object Lock (Compliance mode, 366-day minimum), Azure Immutable Blob Storage, or GCS Bucket Lock. Build retry logic and alerting for missed days.
4.3 DENY Grants to Prevent Archive Modification
Use DENY MODIFY rather than individual delete grants. DENY MODIFY comprehensively blocks unauthorized INSERT, UPDATE, DELETE, and MERGE operations. Validate privilege syntax in your workspace before deploying, as behavior can vary by securable and metastore setup:
DENY MODIFY ON TABLE compliance.logs.uc_audit_archive
TO `data_engineers`;
DENY MODIFY ON TABLE compliance.logs.uc_audit_archive
TO `workspace_admins`;
Even an accidental modification attempt by an admin generates a DENY event in system.access.audit, preserving the chain of custody.
Section 5: Assembling the DPB Inquiry Response Package
5.1 What the Response Must Contain
A formal information request under Rule 23 requires technical documentation answering six questions:
- Was the Data Principal’s personal data held and under what stated purpose?
- Was the 48-hour notification dispatched and when?
- Was the DELETE executed by a verified identity?
- Was VACUUM executed after DELETE, confirming physical removal from the primary storage layer?
- Did any access to the deleted data occur after VACUUM completed?
- Are all processing logs retained and retrievable?
5.2 Python SDK Automation Template
This script runs evidence queries and compiles a single timestamped JSON report. Adapt the action_name filters based on your workspace’s discovery query output:
from databricks.sdk import WorkspaceClient
import json
def build_evidence_package(request_id, start_ts, end_ts):
deletion_q = (
"SELECT event_time, user_identity:email AS executed_by,"
" service_name, action_name,"
" COALESCE(request_params:tableName,"
" request_params:commandText) AS operation_context,"
" response:statusCode AS status_code"
" FROM system.access.audit"
f" WHERE event_time BETWEEN '{start_ts}' AND '{end_ts}'"
" AND (action_name = 'delete'"
" OR (action_name = 'commandSubmit'"
" AND request_params:commandText LIKE '%DELETE%'))"
)
evidence = {
"request_id": request_id,
"generated_at": str(spark.sql(
"SELECT current_timestamp()"
).collect()[0][0]),
"deletion_record": spark.sql(deletion_q).toPandas().to_dict("records"),
}
output_path = (
f"/dbfs/compliance-archive/evidence/"
f"{request_id}/audit_evidence.json"
)
with open(output_path, "w") as f:
json.dump(evidence, f, indent=2, default=str)
print(f"Evidence package written to {output_path}")
return output_path
5.3 Chain of Custody via Self-Referential Auditing
The execution of this script by the compliance service principal generates its own audit event in system.access.audit. The write to the output path, the notebook run, and the SQL queries are all logged, creating a self-referential chain: the evidence package documents the deletion, and the package’s creation is itself auditable from the same system.
Section 6: Common Audit Log Compliance Mistakes
Treating the native retention as sufficient without an immutable export. While each event natively retains for 365 days, system.access.audit is in Public Preview. Build a daily export to an immutable archive to protect against schema changes and provide a tamper-evident substrate.
Not enabling system table access before an investigation opens. Account admin grants and SELECT permissions must be in place before deletion events occur.
Filtering only workspace-level events in compliance queries. Unity Catalog DML events are at audit_level = 'ACCOUNT_LEVEL'. Workspace-only filters miss them entirely.
Assuming discrete action_name values will always match. Interactive operations via notebooks or SQL warehouses often appear under commandSubmit with command text in parameters. Run the discovery query first to understand your workspace’s actual event signatures.
Not enabling verbose audit logging for PII tables. Standard logging does not capture query text. Verbose mode is required for the post-erasure no-access verification pattern.
Treating column_lineage as authoritative and complete erasure scope. Use it as a scope-discovery input and verify alongside manual data mapping. Known gaps: DLT/Lakeflow pipelines and path-based references.
Writing the archive under a human identity. Use a dedicated service principal. Human-identity writes cannot be clearly distinguished from potential tampering events in a later audit.
Using DENY DELETE instead of DENY MODIFY on the audit archive. DENY MODIFY is the correct and comprehensive control. It blocks INSERT, UPDATE, DELETE, and MERGE in a single grant.
Section 7: Implementation Checklist: Unity Catalog Audit Logs for DPDP
- Enable system table access as account admin for the metastore.
- Grant
SELECT ON TABLE(not justON) forsystem.access.audit,system.access.table_lineage, andsystem.access.column_lineageto the compliance service principal. - Enable verbose audit logging for all workspaces processing PII data to capture command text in audit records.
- Run the discovery query against
system.access.auditafter a test deletion to identify the actualaction_namevalues and event structures in your workspace. - Set up the traditional JSON audit log delivery path to S3/ADLS/GCS as a stable supplementary archive alongside the system table.
- Create the archive table (
compliance.logs.uc_audit_archive) withdelta.appendOnly = 'true'. - Build the daily export Databricks Workflow with retry logic and alerting for missed days.
- Apply
DENY MODIFY ON TABLEto the archive table for all non-service-principal identities. Validate privilege syntax in your workspace before deploying. - Back the archive to object-locked storage: S3 Object Lock (Compliance mode), Azure Immutable Blob, or GCS Bucket Lock with 366-day minimum retention.
- Before every erasure: run the column_lineage scope query (Section 3.1) and verify the result against a manual data mapping review.
- After every erasure: run the evidence query patterns and assemble the audit evidence JSON report using the SDK script in Section 5.2.
- Store the evidence JSON in the per-request compliance archive path.
- Verify self-referential audit: confirm the evidence package creation event is queryable in
system.access.audit. - Retain each per-request report for one year from the date of the deletion event.
Section 8: Conclusion
System Tables Are Now Technical Evidence Infrastructure
Under India’s DPDP Rules 2025, notified November 13, 2025 with full applicability from May 13, 2027, Unity Catalog system tables are no longer passive infrastructure metrics. They are the technical evidence infrastructure that supports provable compliance with DPDP erasure obligations. The query patterns in this article, combined with the immutable archive architecture and the column lineage scope tool, produce a structured, audit-supporting evidence set for every erasure event.
Building this infrastructure correctly means: daily export pipelines backed by tamper-evident storage, DENY MODIFY grants that prevent archive modification, column lineage queries verified against manual data mapping, and an automated evidence report that assembles under a traceable service principal identity.
Sinki.ai’s Data Erasure solution covers the complete DPDP deletion lifecycle
Natively inside your Databricks workspace — automated pre-flight lineage discovery, multi-layer deletion verification, structured JSON audit evidence generation, and immutable archival storage pipelines for long-term compliance assurance.
Disclaimer: This article provides technical architecture and implementation guidance only and does not constitute formal legal advice. Organizations should consult qualified legal counsel to assess their specific compliance obligations under the DPDP Act 2023 and applicable sectoral regulations.
Frequently Asked Questions
system.access.audit records every data access, DML operation, and administrative event in your Databricks account with identity, IP, timestamp, and action details. This table is in Public Preview. The exact action_name values vary by how the operation was submitted and whether verbose audit logging is enabled. Run the discovery query in Section 2 to identify specific event signatures in your workspace before building production evidence pipelines.
system.access.audit natively retains each event for 365 days from that specific event’s own timestamp, providing a full year of native availability. However, building a dedicated immutable export pipeline remains mandatory because the system table is in Public Preview (subject to schema changes and regional gaps), provides no tamper-evidence guarantee, and relies on a preview feature as the sole compliance substrate. The traditional JSON delivery path to object-locked cloud storage provides a more stable, production-grade supplement.
Unity Catalog DML operations are logged at audit_level = 'ACCOUNT_LEVEL' with workspace_id = 0. Compute-level operations (notebooks, SQL warehouses) are at audit_level = 'WORKSPACE_LEVEL'. A compliance query filtering only workspace-level events misses all Unity Catalog DML records entirely.
Query system.access.column_lineage filtering on source PII columns. The result set returns downstream tables and columns derived from those sources. Treat this as a scope-discovery input, not an authoritative complete map. Known gaps include DLT/Lakeflow pipelines (table-level only) and path-based table references (lose column mapping). Verify alongside a manual data mapping review.
Verbose audit logging captures the full command text in commandSubmit and commandFinish events. Without verbose mode, interactive DML operations appear in audit logs with an action name but without the SQL text. Enable verbose logging for all workspaces processing PII data before building the post-erasure no-access verification pattern.
Combine two options. Option A: create an append-only Delta table backed by object-locked storage and run a daily Workflow exporting the previous day’s audit events. Apply DENY MODIFY ON TABLE to all non-service-principal identities. Option B: configure account-level audit log delivery to S3 or ADLS with S3 Object Lock in Compliance mode, Azure Immutable Blob, or GCS Bucket Lock at 366-day minimum retention. Option B is the more stable substrate because it is independent of the Public Preview system table.
Five patterns cover the core DPDP evidence requirement: (1) the Deletion Action Record, (2) the VACUUM Execution Record, (3) the Complete Chronological Chain, (4) the Post-Erasure No-Access Verification (requires verbose logging), and (5) the 48-Hour Notification Confirmation. Run the discovery query first to calibrate action_name values for your workspace before using these patterns in production.
The Hub article defines a five-artifact evidence package. Artifact 3 (this article) provides a second, independent technical record of the deletion event from the access control layer, corroborating the transaction history evidence in Artifact 2 (DESCRIBE HISTORY) and creating a two-layer chain of technical documentation.