DPDP Retention & Erasure on Databricks: Audit Proof

Your legal team has signed off on the DPDP compliance policy. Now someone has to make the data platform actually comply. That someone is the data engineer.

India’s Digital Personal Data Protection Rules 2025 were notified on November 13, 2025. The substantive compliance obligations (covering erasure, Data Principal rights, and breach notification) come into force 18 months from notification, landing on May 13, 2027. The gap between today and that date is not breathing room. It is the engineering runway for building deletion workflows, evidence archives, and retention clocks that will stand up to scrutiny from the Data Protection Board of India in a live investigation.

This article is the blueprint. It maps every DPDP erasure obligation directly to Databricks architecture, exposes the compliance traps that no other guide addresses (including the deletion vector purge failure that leaves PII physically intact after a logical delete), and builds a structured, audit-supporting evidence package from first principles.

Whether you run a fintech lakehouse on AWS, an e-commerce platform on Azure, or a SaaS product on GCP, if your Databricks workspaces hold personal data of Indian residents, the obligations below apply to you.


Section 1 : The Stakes

Why DPDP Erasure Is Now a Data Engineering Problem

DPDP is not GDPR with an Indian flag. Several obligations are structurally different and technically harder to automate.

The first is the 48-hour pre-erasure notification. For covered platforms in the Third Schedule (e-commerce, online gaming, and social media entities above specified user thresholds), before scheduled automated erasure due to user inactivity, a Data Fiduciary must notify the affected Data Principal at least 48 hours in advance. For a Databricks shop running automated VACUUM pipelines on large platforms, this means the erasure pipeline cannot be a simple cron deletion. It must be a two-phase orchestrated workflow: notify, wait, then physically purge.

The second difference is the inactivity-triggered erasure deadline defined in the Third Schedule. For e-commerce entities with over two crore users, online gaming intermediaries with over fifty lakh users, and social media intermediaries with over two crore users, the purpose of processing is deemed to be no longer served if the Data Principal has not engaged for three years. These entities must then erase the data. This is an erasure deadline, not a minimum retention guarantee.

The third is the mandatory one-year log retention under Rule 8(3). All Data Fiduciaries must retain personal data, associated traffic data, and processing logs for a minimum of one year from the date of processing, for the purposes specified in the Seventh Schedule. These logs are legal evidence in a DPB investigation.

Under the Schedule to the DPDP Act 2023, failure to implement reasonable security safeguards can attract penalties up to Rs 250 crore. Failure to notify the Board or Data Principals of a personal data breach carries up to Rs 200 crore.


Section 2 : The Legal Foundation

What DPDP Actually Requires You to Do

2.1 The Retention Obligation (Section 8(7) and Rule 8)

Personal data must be erased as soon as the purpose for which it was collected is no longer being served. Three triggers fire this obligation:

  • Consent withdrawal: The Data Principal revokes consent and no overriding legal basis exists.
  • Purpose completion: The specified processing purpose has been fulfilled and data is no longer needed.
  • Inactivity threshold: For Third Schedule entities, the purpose is deemed no longer served when the Data Principal has not engaged within the prescribed period.

Absent a valid legal basis, explicit consent, or overriding sectoral and statutory retention requirements (such as legal holds), a purpose-end trigger mandates immediate erasure.

2.2 The Third Schedule: Inactivity-Triggered Erasure Deadlines

Sector | User Volume Threshold | Deemed Purpose-End (Erasure Deadline)
E-Commerce | 2 crore+ users | 3 years from last transaction or login
Online Gaming | 50 lakh+ users | 3 years from last login
Social Media | 2 crore+ users | 3 years from last login
General (all other fiduciaries) | Any size | When actual purpose ends or consent withdrawn

These periods define when erasure must occur, not how long data must be kept. Sectoral laws (RBI, SEBI, GST) can mandate longer retention and override the DPDP erasure trigger for regulated records.
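
The deadline arithmetic in the table can be sketched in a few lines of Python. This is a minimal illustration, not a statement of how the Rules must be computed: the THIRD_SCHEDULE lookup, the erasure_deadline function, and the choice to treat the thresholds as "at least" are all assumptions made for the example.

```python
from datetime import datetime, timezone

# Hypothetical lookup mirroring the table above: sector -> (user threshold, inactivity years)
THIRD_SCHEDULE = {
    "ecommerce": (20_000_000, 3),      # 2 crore users
    "online_gaming": (5_000_000, 3),   # 50 lakh users
    "social_media": (20_000_000, 3),   # 2 crore users
}

def erasure_deadline(sector, registered_users, last_engagement):
    """Return the deemed purpose-end date, or None when no Third Schedule row applies."""
    rule = THIRD_SCHEDULE.get(sector)
    if rule is None:
        return None
    threshold, years = rule
    if registered_users < threshold:
        return None
    try:
        return last_engagement.replace(year=last_engagement.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to 28 February
        return last_engagement.replace(year=last_engagement.year + years, day=28)

last_login = datetime(2024, 6, 8, 14, 30, tzinfo=timezone.utc)
print(erasure_deadline("ecommerce", 25_000_000, last_login))
```

A fiduciary below the threshold gets None back, which corresponds to the "General" row: erasure is then driven by actual purpose end or consent withdrawal rather than by a fixed inactivity clock.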

2.3 The 48-Hour Pre-Erasure Notification (Rule 8)

For Third Schedule entities triggering inactivity-based erasure, a Data Fiduciary must notify the Data Principal at least 48 hours in advance. Automated erasure jobs must operate in two phases: Phase 1 identifies records at threshold, inserts into the erasure queue, and dispatches notification. Phase 2 executes only after 48 hours elapse without re-engagement. Both phases must be timestamped and logged.

2.4 The Log Retention Obligation (Rule 8(3))

Rule 8(3) requires all Data Fiduciaries to retain personal data, associated traffic data, and processing logs for a minimum of one year from the date of processing, for the purposes specified in the Seventh Schedule. After this one-year period, the data and logs must themselves be erased unless a longer retention period is required by law. The Data Protection Board can request these logs under Rule 23 during an investigation.

2.5 Data Processor Liability and the 90-Day DSAR Window

Data Fiduciaries must ensure their Data Processors also delete personal data within DPDP timelines and provide documented proof of deletion. Data Principal erasure requests submitted under Section 12 must be addressed within 90 days (Rule 14).

2.6 The Erasure Documentation Requirement

Deletion dates and proof of erasure must be documented. In practice this means verifiable, dated, queryable evidence that the data was deleted, when it was deleted, and that the underlying storage was physically cleared.


Section 3 : The Technical Landscape

How Databricks Stores and Erases Data

3.1 The Soft Delete Problem

When you run a standard DELETE FROM statement on a Delta Lake table without deletion vectors enabled, Delta rewrites the affected data file groups entirely, producing new Parquet files that exclude the deleted rows. The old Parquet files remain physically intact on object storage. Delta Lake’s default data file retention threshold (delta.deletedFileRetentionDuration) is 7 days. The transaction log retention (delta.logRetentionDuration) defaults to 30 days. Until VACUUM runs and physically removes those old files, deleted row data remains on storage and is accessible via time travel:

-- PII in old Parquet versions is still readable via Time Travel
SELECT * FROM my_table TIMESTAMP AS OF '2025-06-01'
WHERE customer_id = 'dp-00142';

3.2 The Deletion Vector Compliance Trap

When deletion vectors are enabled, a DELETE writes a small sidecar bitmap file marking specific row positions as logically absent. The original Parquet file is completely untouched. After DELETE on a deletion-vector-enabled table, both the original Parquet files and the sidecar files remain actively referenced by the current state of the table, making them ineligible for standard VACUUM removal. REORG TABLE APPLY (PURGE) must be executed first to physically rewrite the data blocks and sever these active file references, producing new compacted files that exclude the deleted rows. Only after REORG are the older files unreferenced and eligible for removal by a subsequent zero-hour VACUUM.

Step 1: Execute the logical delete.

DELETE FROM catalog.schema.customer_profiles
WHERE customer_id = 'dp-00142';

Step 2: Run REORG TABLE APPLY (PURGE) to physically rewrite data files and sever active references.

REORG TABLE catalog.schema.customer_profiles
APPLY (PURGE);

Step 3: Run VACUUM to remove all unreferenced files.

-- Disable the safety check to allow immediate purge (use with caution in production)
SET spark.databricks.delta.retentionDurationCheck.enabled = false;

VACUUM catalog.schema.customer_profiles RETAIN 0 HOURS;

-- Re-enable after purge
SET spark.databricks.delta.retentionDurationCheck.enabled = true;

After this three-step sequence, no readable form of the deleted PII remains within the Delta table storage layer, provided that cloud object-store bucket versioning is disabled, shallow or deep table clones are addressed, and the Databricks disk cache is cleared on cluster restart. Object-store soft-delete features, automated cloud backups, raw ingestion environments (Kafka topics, S3 landing zones), and downstream exports are separate persistence vectors that must be addressed independently.

3.3 Standard Table Erasure Sequence (No Deletion Vectors)

-- Step 1: Logical delete (Delta rewrites affected file groups; old files remain on storage)
DELETE FROM catalog.schema.orders_bronze
WHERE customer_id = 'dp-00142';

-- Step 2: Configure retention to allow immediate physical purge
-- (interval values are given in fixed-length units; 365 days covers the one-year log window)
ALTER TABLE catalog.schema.orders_bronze
SET TBLPROPERTIES (
  'delta.logRetentionDuration'         = 'interval 365 days',
  'delta.deletedFileRetentionDuration' = 'interval 0 hours'
);

-- Step 3: Physical purge
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
VACUUM catalog.schema.orders_bronze RETAIN 0 HOURS;
SET spark.databricks.delta.retentionDurationCheck.enabled = true;
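
The sequences in 3.2 and 3.3 differ only in whether REORG runs, so an erasure job can derive the statement list from one flag. The sketch below is one possible way to do that; the erasure_statements function and its input validation are illustrative assumptions, and it only builds the SQL strings, leaving execution to whatever runner the workflow uses.

```python
import re

def erasure_statements(table, principal_id, deletion_vectors_enabled):
    """Return the ordered SQL statements for one forced erasure on one table."""
    # Reject values that could break out of the SQL literal or the table name
    if not re.fullmatch(r"[A-Za-z0-9_-]+", principal_id):
        raise ValueError("unexpected characters in principal_id")
    if not re.fullmatch(r"\w+\.\w+\.\w+", table):
        raise ValueError("expected catalog.schema.table")

    statements = [f"DELETE FROM {table} WHERE customer_id = '{principal_id}'"]
    if deletion_vectors_enabled:
        # Rewrite data files so the soft-deleted rows are no longer referenced
        statements.append(f"REORG TABLE {table} APPLY (PURGE)")
    statements += [
        "SET spark.databricks.delta.retentionDurationCheck.enabled = false",
        f"VACUUM {table} RETAIN 0 HOURS",
        "SET spark.databricks.delta.retentionDurationCheck.enabled = true",
    ]
    return statements

for sql in erasure_statements("catalog.schema.customer_profiles", "dp-00142", True):
    print(sql)
```

Keeping the plan as data makes it easy to archive the exact statements alongside the DESCRIBE HISTORY snapshot for the same request.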

3.4 The Erasure Registry Pattern

This table is modeled as a standard mutable Delta table secured via Unity Catalog access grants (not append-only, as the workflow updates status columns through the lifecycle):

CREATE TABLE IF NOT EXISTS compliance.dpdp.erasure_requests (
  request_id          STRING NOT NULL,
  data_principal_id   STRING NOT NULL,
  request_source      STRING,
  purpose_id          STRING,
  tables_affected     ARRAY<STRING>,
  request_timestamp   TIMESTAMP NOT NULL,
  notification_sent   TIMESTAMP,
  erasure_executed    TIMESTAMP,
  status              STRING,
  evidence_path       STRING,
  legal_hold          BOOLEAN DEFAULT FALSE,
  legal_hold_basis    STRING
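)
USING DELTA
-- Column DEFAULT values on Delta tables require this table feature
TBLPROPERTIES ('delta.feature.allowColumnDefaults' = 'supported');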

GRANT SELECT ON TABLE compliance.dpdp.erasure_requests
  TO `compliance-investigators`;
GRANT MODIFY ON TABLE compliance.dpdp.erasure_requests
  TO `compliance-service-principal@company.com`;
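-- Unity Catalog has no DENY statement: a privilege is withheld unless it is granted.
-- Revoke any MODIFY previously granted to engineering groups, including at schema or catalog level.
REVOKE MODIFY ON TABLE compliance.dpdp.erasure_requests
  FROM `data_engineers`;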

Unity Catalog grants prevent unauthorized modifications while allowing the compliance workflow to update status, notification_sent, and erasure_executed as the lifecycle progresses.

3.5 Medallion Propagation: Cascading Deletes Across Bronze, Silver, and Gold

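import dlt
from pyspark.sql.functions import col, expr

# The target streaming table must be declared before apply_changes writes to it
dlt.create_streaming_table("silver_customer_profiles")

# Assumes bronze_customer_raw is a change-feed view in the same pipeline
# that exposes the _change_type and _commit_timestamp columns
dlt.apply_changes(
    target           = "silver_customer_profiles",
    source           = "bronze_customer_raw",
    keys             = ["customer_id"],
    sequence_by      = col("_commit_timestamp"),
    apply_as_deletes = expr("_change_type = 'delete'")
)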

Delta Lake erasure covers only Databricks-managed storage. Personal data may also exist in Kafka topics, raw S3 or ADLS landing zones, and upstream operational databases. Those sources must be addressed separately. For the complete VACUUM configuration reference and streaming delete propagation patterns, read Managing Delta Lake VACUUM and Time Travel for DPDP Right to Erasure Compliance.


Section 4 : Retention Policy Architecture

Managing the Clock

4.1 Aligning Delta Properties to DPDP Obligations

The default delta.deletedFileRetentionDuration is 7 days. For DPDP PII tables requiring on-demand forced erasure, this must be overridden to zero hours. Do not apply global VACUUM schedules to compliance-sensitive tables. Manage retention at the table level, driven by the erasure registry.

4.2 Purpose-Based Retention Engine

ALTER TABLE catalog.schema.customer_profiles
SET TAGS (
  'dpdp_purpose'               = 'account_management',
  'dpdp_retention_years'       = '3',
  'dpdp_retention_start_event' = 'last_login',
  'dpdp_sector'                = 'ecommerce',
  'pii_class'                  = 'direct_identifier'
);

4.3 Handling Legal Hold Overrides

ALTER TABLE catalog.schema.transaction_records
SET TAGS (
  'legal_hold'       = 'true',
  'legal_hold_basis' = 'RBI_Master_Direction_KYC_2016',
  'legal_hold_expiry'= '2030-03-31'
);

The erasure workflow must check for legal_hold = 'true' before executing any deletion. Tables carrying this tag are skipped, and the corresponding erasure request is set to status 'held' with the hold basis recorded in the registry.

4.4 The 48-Hour Notification Workflow

Phase 1: A daily job queries erasure_requests for 'pending' rows, dispatches notifications, and updates the notification_sent timestamp.

Phase 2: After the 48-hour offset elapses, the job executes DELETE, REORG, VACUUM, captures DESCRIBE HISTORY, and updates erasure_executed and status = 'completed'.
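
The gating logic behind the two phases can be sketched as a small pure function that the daily job calls per registry row. The function name next_action, the returned action labels, and the rule that any engagement after the notice cancels the erasure are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

NOTICE_PERIOD = timedelta(hours=48)

def next_action(notification_sent, last_engagement, now, legal_hold=False):
    """Decide what the erasure job should do for one registry row."""
    if legal_hold:
        return "hold"      # skip deletion and mark the request as held
    if notification_sent is None:
        return "notify"    # Phase 1: dispatch the notice and record the timestamp
    if last_engagement is not None and last_engagement > notification_sent:
        return "cancel"    # the Data Principal re-engaged after being notified
    if now - notification_sent < NOTICE_PERIOD:
        return "wait"      # the 48-hour window has not fully elapsed
    return "erase"         # Phase 2: DELETE, REORG, VACUUM

sent = datetime(2025, 6, 7, 15, 0, tzinfo=timezone.utc)
print(next_action(sent, None, sent + timedelta(hours=47)))  # wait
print(next_action(sent, None, sent + timedelta(hours=48)))  # erase
```

Keeping the decision separate from the side effects (sending mail, running SQL) makes the 48-hour rule easy to unit test.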


Section 5 : Generating Audit Evidence

Building a Structured, Defensible Evidence Package

5.1 What the Regulator Will Ask For

  1. Was the Data Principal’s personal data held, and for what stated purpose?
  2. What event triggered the erasure obligation?
  3. Was a 48-hour notification sent before deletion, and when?
  4. When was the DELETE operation executed and by which identity?
  5. Was the deletion physically complete, not merely logical?
  6. Are processing logs preserved for the mandatory one-year retention period?

5.2 Evidence Artifact 1: The Erasure Registry Query

SELECT
  request_id,
  SHA2(data_principal_id, 256)                       AS principal_hash,
  request_source, purpose_id, tables_affected,
  request_timestamp, notification_sent,
  TIMESTAMPDIFF(HOUR, request_timestamp, notification_sent)
                                                     AS hours_to_notification,
  erasure_executed,
  TIMESTAMPDIFF(HOUR, notification_sent, erasure_executed)
                                                     AS hours_from_notification_to_erasure,
  status, evidence_path
FROM compliance.dpdp.erasure_requests
WHERE data_principal_id = 'dp-00142'
  AND status = 'completed'
ORDER BY erasure_executed DESC;

5.3 Evidence Artifact 2: Delta DESCRIBE HISTORY

Delta Lake’s versioned, append-oriented transaction history records every operation with timestamps and operation parameters. DESCRIBE HISTORY exposes this as a queryable table:

DESCRIBE HISTORY catalog.schema.customer_profiles;
version | timestamp | operation | operationParameters
47 | 2025-06-10 03:12:44 | DELETE | predicates: ["customer_id = 'dp-00142'"]
48 | 2025-06-10 03:14:02 | REORG | applyPurge: true
49 | 2025-06-10 03:15:18 | VACUUM END | numDeletedFiles: 3, numVacuumedDirectories: 1

5.4 Evidence Artifact 3: Unity Catalog Audit Logs

The system.access.audit table (currently in Public Preview) captures who ran what operation, on which resource, from which IP. Action name values should be validated in your workspace before building production pipelines:

-- user_identity and response are structs; request_params is a map
SELECT event_time, user_identity.email AS executed_by,
       service_name, action_name,
       request_params['full_name_arg'] AS table_name,
       source_ip_address, response.status_code AS status_code
FROM system.access.audit
WHERE event_time BETWEEN '2025-06-10T03:00:00' AND '2025-06-10T04:00:00'
  -- The map key that carries the table name varies by action; validate it in your workspace
  AND request_params['full_name_arg'] = 'catalog.schema.customer_profiles'
ORDER BY event_time ASC;

For the full query library, daily export pipeline, and immutable archive architecture, read How to Use Unity Catalog Audit Logs for DPDP Deletion and Audit Evidence.

5.5 Evidence Artifact 4: Storage-Level Verification

table_location = spark.sql(
    "DESCRIBE DETAIL catalog.schema.customer_profiles"
).select("location").collect()[0][0]

# List remaining files using native Databricks API
files = dbutils.fs.ls(table_location)
print(f"Storage verification at {table_location}:")
for f in files:
    print(f.path, f.size)

print(f"Verification timestamp: {spark.sql('SELECT current_timestamp()').collect()[0][0]}")
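
A raw listing is easier to defend when it is compared against the files the current table version actually references. The helper below is a sketch of that comparison. It assumes listed_paths comes from a recursive listing of the table location and referenced_paths from something like spark.read.table(...).inputFiles(); both inputs may need path normalisation (scheme, URL encoding) before comparing, and the function name is hypothetical.

```python
def orphaned_data_files(listed_paths, referenced_paths):
    """Return Parquet data files present on storage but not referenced by the current snapshot."""
    referenced = set(referenced_paths)
    return sorted(
        path for path in listed_paths
        if path.endswith(".parquet")
        and "/_delta_log/" not in path   # checkpoints are Parquet too, but are not data files
        and path not in referenced
    )

listed = [
    "s3://bucket/tbl/part-0001.parquet",
    "s3://bucket/tbl/part-0002.parquet",
    "s3://bucket/tbl/_delta_log/00000000000000000010.checkpoint.parquet",
]
print(orphaned_data_files(listed, ["s3://bucket/tbl/part-0001.parquet"]))
```

After a successful zero-hour VACUUM the result should be an empty list; any remaining path is a file that may still hold deleted rows.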

5.6 Assembling the DPDP Erasure Evidence Package

DPDP Erasure Evidence Package
==============================
Request ID        : ER-2025-00142
Data Principal    : dp-00142 [SHA-256 hash stored in registry]
Request Source    : Data Principal DSAR (Section 12)
Purpose ID        : account_management
Request Received  : 2025-06-07 14:30:00 UTC
Notification Sent : 2025-06-07 15:02:17 UTC (via email, logged)
Erasure Executed  : 2025-06-10 03:15:18 UTC
Hours Elapsed     : 60.2 hours from notification to erasure (48-hour window: HONORED)

Tables Affected:
  - catalog.bronze.customer_raw
  - catalog.silver.customer_profiles
  - catalog.gold.customer_segments (rows recomputed)

Evidence Artifacts:
  [1] Erasure Registry Row   : compliance.dpdp.erasure_requests
  [2] Delta History Snapshot : s3://compliance-archive/history/ER-2025-00142/
  [3] UC Audit Log Export    : s3://compliance-archive/audit/ER-2025-00142/
  [4] Storage Verification   : s3://compliance-archive/verify/ER-2025-00142/
  [5] Notification Log       : s3://compliance-archive/notify/ER-2025-00142/

Log Retention Expiry  : 2026-06-08 (1-year from date of processing)
Retained By           : compliance-automation-service-principal
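
A package like this can be emitted as a machine-readable manifest so the elapsed-hours figure is computed rather than typed. The sketch below is illustrative only: build_evidence_manifest, its field names, and the idea of sealing the manifest with a SHA-256 digest over its canonical JSON are assumptions, not part of any Databricks API.

```python
import hashlib
import json
from datetime import datetime, timezone

def build_evidence_manifest(request_id, principal_id, notification_sent, erasure_executed, artifacts):
    """Assemble one evidence manifest and seal it with a digest of its own contents."""
    hours = (erasure_executed - notification_sent).total_seconds() / 3600
    manifest = {
        "request_id": request_id,
        # Store only a hash of the identifier so the package itself holds no raw PII
        "principal_hash": hashlib.sha256(principal_id.encode("utf-8")).hexdigest(),
        "notification_sent": notification_sent.isoformat(),
        "erasure_executed": erasure_executed.isoformat(),
        "hours_elapsed": round(hours, 1),
        "notice_window_honored": hours >= 48,
        "artifacts": artifacts,
    }
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    manifest["manifest_sha256"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return manifest

manifest = build_evidence_manifest(
    "ER-EXAMPLE-0001", "dp-example",
    datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc),
    datetime(2025, 1, 3, 21, 30, tzinfo=timezone.utc),
    {"delta_history": "history/", "uc_audit": "audit/"},
)
print(json.dumps(manifest, indent=2))
```

Anyone holding the manifest can recompute the digest over the other fields to check that nothing was edited after the fact.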

5.7 One-Year Immutable Log Architecture

Unity Catalog system tables have a 365-day native retention window per event. To satisfy Rule 8(3) and ensure logs are never at risk from schema changes or preview-feature gaps, an explicit archival pipeline is required:

  • Export system.access.audit to a dedicated Delta table (compliance.logs.uc_audit_archive) on a daily schedule.
  • Grant MODIFY on the archive table only to the archival service principal. Unity Catalog has no DENY statement, so INSERT, UPDATE, DELETE, and MERGE by other identities are blocked by not granting (or by revoking) MODIFY at the table, schema, and catalog levels. Validate the effective privileges in your workspace before deploying, as inherited grants can vary by securable and metastore setup.
  • Back the archive to AWS S3 Object Lock (Compliance mode), Azure Immutable Blob Storage, or GCS Bucket Lock with 366-day minimum retention.

Section 6 : Pseudonymization

Does It Satisfy DPDP Erasure?

DPDP Section 12 grants Data Principals the explicit right to request erasure. The DPDP framework does not codify pseudonymization as an equivalent or acceptable substitute for erasure. For any explicit Data Principal erasure request, complete physical deletion is the only unambiguous path to compliance under the current DPDP framework.


Section 7 : Common Mistakes and How to Avoid Them

Mistake 1: Treating Delta DELETE as physical erasure. A DELETE rewrites affected file groups and marks old files as removed, but those old Parquet files remain on object storage until VACUUM runs.

Mistake 2: Missing the REORG step on deletion-vector tables. On deletion-vector tables, both the original Parquet files and the sidecar remain actively referenced. REORG must run first to sever those references before VACUUM can remove the old files.

Mistake 3: Forgetting raw files in upstream ingestion layers. Kafka topics, raw S3 landing zones, and upstream databases are not cleared by Delta VACUUM. Address these sources independently.

Mistake 4: Leaving derived personal data unmonitored in Gold tables. Use Unity Catalog column-level lineage to trace where every PII field propagates.

Mistake 5: Relying on manual processes for the 48-hour notification. Automate within Databricks Workflows with timestamped logging of every dispatched notification.

Mistake 6: Trusting ephemeral system tables without archiving them. system.access.audit is in Public Preview. Export and archive audit logs daily to an immutable external store.

Mistake 7: Applying a single global VACUUM schedule to all tables. Retention must be managed at the table level, driven by metadata.

Mistake 8: No documented deletion evidence from downstream Data Processor workspaces. Obtain proof of deletion from all Data Processors via contractual DPA clauses and API-level deletion confirmation.


Section 8 : Implementation Checklist

DPDP Erasure Readiness on Databricks

  1. Tag all personal data in Unity Catalog with purpose, PII class, sector, and inactivity period.
  2. Document the processing purpose for every dataset with a defined start event.
  3. Identify your Third Schedule sector and configure inactivity-triggered erasure deadlines in table-level tags.
  4. Audit all tables for deletion vector status using DESCRIBE DETAIL.
  5. Create the erasure_requests registry table as a mutable, access-controlled Delta table, with MODIFY granted only to the compliance service principal in Unity Catalog.
  6. Build the two-phase erasure Workflow: Phase 1 sends 48-hour notifications; Phase 2 executes physical deletion.
  7. Configure delta.deletedFileRetentionDuration = 'interval 0 hours' on erasure-completed tables before VACUUM.
  8. Set up the daily audit log export pipeline from system.access.audit to an immutable archive backed by object-locked storage.
  9. Archive DESCRIBE HISTORY snapshots for each table at the time of erasure.
  10. Implement delete propagation from Bronze through Silver to Gold using CDF or Lakeflow apply_changes.
  11. Address upstream source deletion for Kafka topics, raw S3 or ADLS landing zones, and upstream databases.
  12. Apply legal hold tags (legal_hold=true) to all datasets subject to RBI, SEBI, or other sectoral retention obligations.
  13. Assemble the DPDP Erasure Evidence Package for every completed request using the five-artifact structure in Section 5.6.
  14. Test the evidence package against the inquiry question framework in Section 5.1 before enforcement goes live.

Section 9 : Conclusion

Compliance Is an Engineering Deliverable, Not a Policy Document

DPDP erasure compliance on Databricks is not satisfied by a policy PDF or a legal sign-off. It is satisfied by an automated, auditable, evidence-generating data engineering system that operates continuously, scales with your Data Principal volume, and produces a structured evidence package before anyone ever asks for one.

Eliminate manual DPDP erasure workflows on Databricks

Sinki.ai delivers a fully automated data erasure framework that manages deletion requests, cross-layer purge execution, audit log collection, and compliance evidence generation across your Databricks environment.

Disclaimer: This article provides technical architecture and implementation guidance only and does not constitute formal legal advice. Organizations should consult qualified legal counsel to assess their specific compliance obligations under the DPDP Act 2023 and applicable sectoral regulations.


Frequently Asked Questions

What does DPDP Rule 8 require for data erasure on cloud platforms?

DPDP Rule 8 requires Data Fiduciaries to erase personal data as soon as the processing purpose is no longer served, whether due to consent withdrawal, purpose completion, or the inactivity-triggered deadline defined in the Third Schedule. For Third Schedule platform classes, the fiduciary must notify the Data Principal at least 48 hours before scheduled inactivity-based erasure. All Data Fiduciaries must also retain personal data, traffic data, and processing logs for a minimum of one year from the date of processing.

Is running Delta Lake DELETE sufficient for DPDP compliance?

No. A standard DELETE rewrites affected data file groups and marks old files as removed, but those old files remain on object storage until VACUUM physically removes them. For deletion-vector-enabled tables, only a sidecar bitmap is written, leaving the original Parquet data entirely intact. REORG TABLE APPLY (PURGE) followed by VACUUM is required for those tables.

What is the DPDP Third Schedule and how does it affect data retention in Databricks?

The Third Schedule defines inactivity-triggered erasure deadlines for specified fiduciary classes. E-commerce, online gaming, and social media entities above the specified thresholds must erase personal data when a Data Principal has been inactive for three years. This is an erasure deadline, not a minimum retention guarantee. Sectoral laws (RBI, SEBI, GST) can mandate longer retention for specific regulated data categories.

How do I build audit-supporting evidence for a DPDP erasure inquiry?

A structured DPDP Erasure Evidence Package includes five artifacts: the erasure registry row; a DESCRIBE HISTORY snapshot showing DELETE, REORG, and VACUUM operations in sequence; a Unity Catalog audit log export; a storage-level verification; and a timestamped notification log. This package is designed to support a DPB inquiry and should be reviewed by qualified legal counsel before reliance in a formal regulatory context.

What is the 48-hour pre-erasure notification requirement under DPDP?

Rule 8 of the DPDP Rules 2025 requires covered Data Fiduciaries to notify the Data Principal at least 48 hours before their personal data is scheduled for inactivity-based erasure. Erasure workflows must operate in two phases: a notification phase that dispatches the alert and records the timestamp, and a deletion phase that only executes after the 48-hour buffer has fully elapsed.

How long must I retain processing logs under DPDP?

Rule 8(3) requires all Data Fiduciaries to retain personal data, associated traffic data, and processing logs for a minimum of one year from the date of processing. After this period, the data and logs must themselves be erased unless a longer retention period is required by law. In Databricks, this means daily export of Unity Catalog system table events to an immutable archive backed by object-locked cloud storage with a minimum 366-day retention period.

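Do deletion vectors in Delta Lake satisfy DPDP physical erasure requirements?

Not on their own. With deletion vectors enabled, a DELETE writes only a sidecar bitmap that marks rows as logically absent, while the original Parquet files remain intact and referenced by the table. Physical erasure requires REORG TABLE APPLY (PURGE) to rewrite the data files, followed by VACUUM to remove the old files.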

What audit evidence is required for DPDP compliance on Databricks?

A defensible DPDP compliance record requires: the Data Principal’s identity and erasure trigger; proof of 48-hour pre-erasure notification where applicable; the DELETE operation record from the Delta versioned transaction history; the REORG and VACUUM operation records confirming physical file removal; and an audit log record corroborating the identity and timestamp of the deletion command. These artifacts, retained for one year from the date of processing, constitute the complete technical audit trail. This architecture guidance does not substitute for legal review.

Managing Delta Lake VACUUM & Time Travel for DPDP Erasure

Delta Lake’s time travel feature is one of the most operationally powerful capabilities in the modern lakehouse. It is also, if managed without regulatory intent, a serious compliance liability under India’s Digital Personal Data Protection Act.

Every SQL DELETE you execute on a Delta table is a promise, not a delivery. The Parquet files containing that personal data remain physically intact on your cloud object storage until VACUUM removes them. Under DPDP Section 12 and Rule 8, the Data Principal’s right to erasure is not satisfied by a logical deletion record. It is satisfied only when the underlying bytes no longer exist on the primary storage layer covered by the Delta table.

Delta’s default data file retention threshold (delta.deletedFileRetentionDuration) is 7 days, which means old Parquet files containing deleted rows remain on storage and are eligible for VACUUM removal only after that window elapses. This default is built for operational recovery, not regulatory compliance. For DPDP PII tables, it must be managed deliberately.

This article delivers the complete VACUUM configuration reference for DPDP compliance: how to manage time travel windows without breaking operational needs, how to execute forced zero-hour purges safely, how the deletion vector default change in April 2025 creates a hidden compliance gap in Lakeflow pipelines, and how to build the per-table retention architecture that maps each legal obligation to a precise Delta property setting.

For the broader erasure architecture, the five-artifact evidence package, and the erasure registry control table pattern, see the Hub article: DPDP Retention and Erasure on Databricks: How to Prove Deletion and Audit Evidence.


Section 1 : Understanding What VACUUM Actually Does

The Delta File Lifecycle and the Two Properties That Govern It

1.1 The Delta Lake File Lifecycle

Every DML operation on a Delta table writes new Parquet files to object storage and records “add” and “remove” entries in the transaction log. When you run DELETE FROM customer_profiles WHERE customer_id = 'dp-00142' on a table without deletion vectors enabled, Delta rewrites the entire affected data file group, producing new Parquet files that exclude the deleted rows. The old Parquet files are marked as “removed” in the transaction log but remain physically intact on object storage.

VACUUM finds those “remove” entries older than the configured retention threshold and physically deletes the corresponding files from cloud storage. It records VACUUM START and VACUUM END entries in the transaction log. The VACUUM END entry, containing the count of deleted files, is the primary technical proof of physical erasure within the Delta table’s storage layer.

1.2 The Two Properties That Govern VACUUM

Property | What It Controls | Default | DPDP Setting
delta.deletedFileRetentionDuration | Minimum age before VACUUM can physically delete removed data files | 7 days | 'interval 0 hours' on PII tables requiring on-demand forced erasure
delta.logRetentionDuration | How long Delta transaction log entries are retained | 30 days | 'interval 365 days' to satisfy DPDP Rule 8(3) one-year log retention

These properties have independent clocks. A critical constraint from Databricks Runtime 18.0: logRetentionDuration must be greater than or equal to deletedFileRetentionDuration. The 1-year and 0-hour combination is valid and is the correct DPDP configuration.

ALTER TABLE catalog.schema.customer_profiles
SET TBLPROPERTIES (
  'delta.logRetentionDuration'         = 'interval 365 days',
  'delta.deletedFileRetentionDuration' = 'interval 0 hours'
);
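
Before applying properties across many tables, the two constraints described above (log retention at least as long as deleted-file retention, and at least one year of log history) can be checked offline. The following is a minimal sketch: parse_interval and check_retention are hypothetical helpers, the parser handles only the simple "interval N unit" form, and a year is approximated as 365 days for comparison purposes.

```python
import re
from datetime import timedelta

# Hours per unit; "year" is approximated as 365 days purely for comparison
_UNIT_HOURS = {"hour": 1, "day": 24, "week": 168, "year": 8760}

def parse_interval(value):
    """Parse strings such as 'interval 7 days' into a timedelta."""
    match = re.fullmatch(r"interval\s+(\d+)\s+(hour|day|week|year)s?", value.strip().lower())
    if not match:
        raise ValueError(f"unsupported interval: {value!r}")
    return timedelta(hours=int(match.group(1)) * _UNIT_HOURS[match.group(2)])

def check_retention(props):
    """Return a list of problems found in a table's retention properties."""
    log = parse_interval(props["delta.logRetentionDuration"])
    files = parse_interval(props["delta.deletedFileRetentionDuration"])
    problems = []
    if log < files:
        problems.append("logRetentionDuration is shorter than deletedFileRetentionDuration")
    if log < timedelta(days=365):
        problems.append("logRetentionDuration is under the one-year log retention target")
    return problems

print(check_retention({
    "delta.logRetentionDuration": "interval 30 days",
    "delta.deletedFileRetentionDuration": "interval 7 days",
}))
```

The property dictionary can be built from SHOW TBLPROPERTIES output, so the same check can run as a scheduled drift report.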

1.3 The VACUUM RETAIN Override Trap

The explicit RETAIN N HOURS clause in any VACUUM command overrides the table-level deletedFileRetentionDuration property. If a global maintenance job runs VACUUM table_name RETAIN 168 HOURS, the PII files that should have been purged at 0 hours are kept for another 7 days. The job completes without errors. The personal data remains on disk.

The fix: exclude PII and legal-hold tables from all global VACUUM jobs. Query Unity Catalog tags to build the exclusion list:

SELECT DISTINCT catalog_name || '.' || schema_name || '.' || table_name AS full_table_name
FROM system.information_schema.table_tags
WHERE tag_name IN ('pii_class', 'legal_hold')
  AND tag_value IS NOT NULL;

Route these tables to a dedicated compliance VACUUM workflow instead.
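
Once the tag query has been collected into rows, the routing itself is a set operation. This sketch assumes the rows arrive as (full_table_name, tag_name, tag_value) tuples; split_vacuum_targets is a hypothetical helper, not an existing Databricks function.

```python
PROTECTED_TAGS = {"pii_class", "legal_hold"}

def split_vacuum_targets(all_tables, tag_rows):
    """Split tables into (global_schedule, compliance_workflow) lists, preserving input order."""
    protected = {
        table for table, tag_name, tag_value in tag_rows
        if tag_name in PROTECTED_TAGS and tag_value is not None
    }
    global_targets = [t for t in all_tables if t not in protected]
    compliance_targets = [t for t in all_tables if t in protected]
    return global_targets, compliance_targets

tables = ["main.sales.customer_profiles", "main.sales.daily_totals", "main.fin.transaction_records"]
tags = [
    ("main.sales.customer_profiles", "pii_class", "direct_identifier"),
    ("main.fin.transaction_records", "legal_hold", "true"),
    ("main.sales.daily_totals", "owner", "analytics"),
]
print(split_vacuum_targets(tables, tags))
```

Running the split at the start of every global maintenance job, rather than caching the list, means a newly tagged table is excluded from the very next run.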


Section 2 : Time Travel, Erasure, and the DPDP Compliance Tension

Why Time Travel Is Both the Problem and Part of the Solution

2.1 Time Travel as a Compliance Liability

Without a targeted compliance VACUUM, a time travel query returns deleted PII weeks after the erasure request was processed:

SELECT * FROM catalog.schema.customer_profiles
TIMESTAMP AS OF '2025-06-08 14:00:00'
WHERE customer_id = 'dp-00142';
-- Returns deleted PII if VACUUM has not run with 0-hour retention

Under DPDP Rule 14, Data Fiduciaries have 90 days to address erasure requests. PII that remains physically accessible via time travel after a request has been marked complete, or beyond that window, is a compliance failure.

2.2 Time Travel as a Compliance Asset

The Delta transaction log, preserved via logRetentionDuration, is the evidence chain that a DPB inquiry will require. DESCRIBE HISTORY returns every operation performed on a table with timestamps and operation parameters. The chronological sequence of DELETE, REORG TABLE, and VACUUM END entries constitutes the technical audit chain of custody.

DPDP Rule 8(3) requires one-year retention of processing logs from the date of processing. Setting logRetentionDuration = 'interval 365 days' keeps the table's operation history available for that period. The key architectural insight: keep logRetentionDuration at one year (audit history preserved) while setting deletedFileRetentionDuration to 0 hours (immediate physical purge enabled). These clocks are independent.

2.3 Per-Table Compliance Retention Matrix

Table Classification | Legal Basis | logRetentionDuration | deletedFileRetentionDuration | VACUUM Strategy
Core Customer PII (e-commerce, gaming, social media) | DPDP Third Schedule (3-yr inactivity deadline) + Rule 8(3) | 1 year | 0 hours (on erasure) | Dedicated on-demand compliance VACUUM. Excluded from global schedule.
Financial and KYC Records | RBI KYC (5-yr minimum) + DPDP Rule 8(3) | 1 year | 180 days | Legal hold tag. Manual compliance VACUUM after hold is cleared.
Trade and Order Records | SEBI (8-yr minimum) + DPDP Rule 8(3) | 1 year | 365 days | Legal hold tag. VACUUM scheduled only after hold expiry.
Gold Aggregates (non-PII) | Operational only | 30 days (default) | 7 days (default) | Standard Predictive Optimization or global schedule.
Operational Logs (non-personal) | Operational | 7 days | 7 days | Standard global VACUUM schedule.

Apply legal hold tags in Unity Catalog to protect datasets from automated erasure:

ALTER TABLE catalog.schema.transaction_records
SET TAGS (
  'legal_hold'        = 'true',
  'legal_hold_basis'  = 'RBI_Master_Direction_KYC_2016',
  'legal_hold_expiry' = '2030-03-31'
);
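
A compliance workflow then has to read those tags back and decide whether a table may be purged. The sketch below shows one possible rule; treating a hold without an expiry tag as indefinitely active is an assumption made here, not something the tags themselves enforce, and the function name is hypothetical.

```python
from datetime import date


def legal_hold_active(tags, today):
    """Return True while a table must be skipped by automated erasure."""
    if tags.get("legal_hold") != "true":
        return False
    expiry = tags.get("legal_hold_expiry")
    # Assumption: a hold with no expiry tag stays active until cleared manually
    if expiry is None:
        return True
    return today <= date.fromisoformat(expiry)


# Tags as set by the ALTER TABLE statement above
tags = {
    "legal_hold": "true",
    "legal_hold_basis": "RBI_Master_Direction_KYC_2016",
    "legal_hold_expiry": "2030-03-31",
}
```

Whether an expired hold should release a table automatically or only after manual review is a policy decision; the stricter option is to require that the legal_hold tag itself be cleared.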

Section 3 : The Safe VACUUM Execution Playbook for DPDP

Step-by-Step Production Procedure for Forced Erasure

Step 0: Audit table configuration

DESCRIBE DETAIL catalog.schema.customer_profiles;
SHOW TBLPROPERTIES catalog.schema.customer_profiles;

Confirm deletedFileRetentionDuration = 'interval 0 hours' and logRetentionDuration = 'interval 365 days' (one year) before proceeding.

Step 1: Set table properties for on-demand erasure

Use the ALTER TABLE statement shown in Section 1.2.

Step 2: Run VACUUM DRY RUN as a compliance pre-flight

DRY RUN is an audit-supporting artifact, not just an operational preview. The file manifest it returns proves which specific Parquet files were in the deletion queue before execution. Archive this output alongside the final VACUUM END entry.

VACUUM catalog.schema.customer_profiles DRY RUN;
-- Archive output to: s3://compliance-archive/dryrun/ER-2025-00142/preflight_manifest.json

Step 3: Forced zero-hour purge

VACUUM RETAIN 0 HOURS with the safety check disabled can delete uncommitted files from concurrent write transactions, causing data loss. Mandatory mitigation: pause all streaming and batch write jobs targeting the table before executing, then resume after.

-- Pause all concurrent writers before this block

SET spark.databricks.delta.retentionDurationCheck.enabled = false;
VACUUM catalog.schema.customer_profiles FULL RETAIN 0 HOURS;
SET spark.databricks.delta.retentionDurationCheck.enabled = true;

-- Resume ingestion pipelines after this block
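
If this block runs from a Python task, the main risk is a failed VACUUM leaving the safety check disabled for the rest of the session. One way to guard against that, sketched here with hypothetical helper names, is a context manager that always restores the setting. The run_sql argument is any callable that executes a SQL string; on Databricks that would typically be spark.sql.

```python
from contextlib import contextmanager

CHECK = "spark.databricks.delta.retentionDurationCheck.enabled"


@contextmanager
def retention_check_disabled(run_sql):
    """Disable the retention safety check, restoring it even if VACUUM fails."""
    run_sql(f"SET {CHECK} = false")
    try:
        yield
    finally:
        run_sql(f"SET {CHECK} = true")


def forced_purge(run_sql, table):
    """Run the zero-hour purge with the safety check disabled only for its duration."""
    with retention_check_disabled(run_sql):
        run_sql(f"VACUUM {table} FULL RETAIN 0 HOURS")


# Stand-in executor that only records statements, for illustration
issued = []
forced_purge(issued.append, "catalog.schema.customer_profiles")
```

Pausing and resuming writers still has to happen outside this helper; it only protects the session setting.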

Step 4: Capture DESCRIBE HISTORY as an audit-supporting artifact

DESCRIBE HISTORY catalog.schema.customer_profiles LIMIT 5;
-- Locate VACUUM END: operation, numDeletedFiles, numVacuumedDirectories, timestamp
-- Archive to: s3://compliance-archive/history/ER-2025-00142/vacuum_end_snapshot.json
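
The archiving step can be sketched as a small function that picks the VACUUM END row out of the history output and wraps it with the request identifier. The rows below are invented sample data shaped like DESCRIBE HISTORY output, and the function name is hypothetical; writing the resulting JSON to the archive path is left out.

```python
import json


def vacuum_end_evidence(history_rows, request_id):
    """Pick the newest VACUUM END row and wrap it as a JSON evidence record."""
    ends = [r for r in history_rows if r.get("operation") == "VACUUM END"]
    if not ends:
        raise LookupError("no VACUUM END entry found; do not mark the request completed")
    latest = max(ends, key=lambda r: r["version"])
    return json.dumps(
        {"request_id": request_id, "vacuum_end": latest}, sort_keys=True, default=str
    )


# Invented sample rows for illustration only
rows = [
    {"version": 42, "operation": "DELETE"},
    {"version": 43, "operation": "VACUUM START"},
    {
        "version": 44,
        "operation": "VACUUM END",
        "operationMetrics": {"numDeletedFiles": "3", "numVacuumedDirectories": "1"},
    },
]
evidence = vacuum_end_evidence(rows, "ER-2025-00142")
```

Raising when no VACUUM END row exists is deliberate: a request should not be marked completed without the record that proves the purge ran.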

What VACUUM FULL covers and what it does not

After VACUUM FULL RETAIN 0 HOURS completes, no readable form of the deleted PII remains within the Delta table’s managed storage layer on the primary object store. Copies can still survive elsewhere, so each of the following must be confirmed or handled separately:

  • Cloud object-store bucket versioning is disabled; with versioning enabled, the store retains deleted file versions outside Delta’s view of storage.
  • Shallow or deep Delta table clones created before erasure are mapped and purged independently.
  • Clusters that read the table are restarted, because the Databricks disk cache on a running cluster may temporarily hold data from old file versions.
  • Automated cloud backup snapshots, which may retain copies outside Delta’s control, are covered by their own erasure or expiry process.
  • Raw landing zone data (Kafka topic retention, S3 raw ingestion paths, upstream operational databases) is a separate persistence layer and is addressed outside the Delta VACUUM workflow.

VACUUM FULL is not a guarantee of global erasure across all systems. It is the correct mechanism for physical erasure within the Delta table’s primary storage path.

VACUUM FULL vs VACUUM LITE

VACUUM LITE (DBR 16.1+, Public Preview) uses the transaction log rather than listing the full table directory, which makes it faster for routine non-PII maintenance. For compliance erasure, always use VACUUM FULL, for two reasons. First, VACUUM LITE fails with DELTA_CANNOT_VACUUM_LITE when the Delta log has been pruned, and log state at purge time cannot be guaranteed even on tables configured with a long logRetentionDuration. Second, LITE only removes files it can identify through the log, whereas VACUUM FULL lists the actual directory and also removes files the table no longer references.

-- Compliance erasure: always FULL
VACUUM catalog.schema.customer_profiles FULL RETAIN 0 HOURS;

-- Routine non-PII maintenance: LITE acceptable
VACUUM catalog.schema.analytics_gold LITE;

Section 4 : Deletion Vectors and VACUUM: The Complete Compliance Sequence

Why the April 2025 Default Change Creates an Invisible Gap

4.1 The Optimization Trap

Deletion vectors are a performance optimization that changes how DELETE operations work on Delta tables. When deletion vectors are enabled, a DELETE does not rewrite any data files. Instead, it writes a small sidecar bitmap file (the deletion vector) that marks specific row positions as logically absent. The original Parquet file, containing the personal data, is completely untouched on object storage.

After DELETE on a deletion-vector table, both the original Parquet files and the sidecar files remain actively referenced by the current state of the table, making them ineligible for standard VACUUM removal. Running DELETE followed by VACUUM FULL RETAIN 0 HOURS alone does not remove the original Parquet data from physical storage. REORG TABLE APPLY (PURGE) must execute first to physically rewrite the data blocks and sever these active file references, producing new compacted files that exclude the deleted rows.

As of April 28, 2025, deletion vectors are enabled by default for materialized views and streaming tables in Lakeflow Declarative Pipelines. Any team using Lakeflow for PII ingestion is now operating in deletion-vector territory by default.

4.2 The Mandatory Three-Step Compliance Sequence

Step 1: DELETE

DELETE FROM catalog.schema.customer_profiles
WHERE customer_id = 'dp-00142';

Step 2: REORG TABLE APPLY (PURGE) to physically rewrite data files and sever active references

-- Advanced tuning flag: validate against current runtime support before enabling
SET spark.databricks.delta.reorg.purgeMode = 'rows';

REORG TABLE catalog.schema.customer_profiles APPLY (PURGE);
-- REORG is idempotent: safe to run twice if the job fails partway through
-- The REORG completion timestamp is the binding anchor for VACUUM retention

Step 3: VACUUM FULL

SET spark.databricks.delta.retentionDurationCheck.enabled = false;
VACUUM catalog.schema.customer_profiles FULL RETAIN 0 HOURS;
SET spark.databricks.delta.retentionDurationCheck.enabled = true;
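
The ordering constraint in this sequence is easy to get wrong in a scheduler, so here is a small sketch that generates the statements in the required order and includes REORG only for deletion-vector tables. The function name is hypothetical, and the quote check is a deliberately crude guard; a production job should prefer parameterized SQL for the DELETE predicate.

```python
def erasure_statements(table, key_column, key_value, deletion_vectors):
    """Return the SQL statements for one erasure request, in execution order."""
    if "'" in key_value:
        raise ValueError("key value must not contain quotes")
    check = "spark.databricks.delta.retentionDurationCheck.enabled"
    stmts = [f"DELETE FROM {table} WHERE {key_column} = '{key_value}'"]
    if deletion_vectors:
        # REORG must precede VACUUM so the original files become unreferenced
        stmts.append(f"REORG TABLE {table} APPLY (PURGE)")
    stmts += [
        f"SET {check} = false",
        f"VACUUM {table} FULL RETAIN 0 HOURS",
        f"SET {check} = true",
    ]
    return stmts


plan = erasure_statements(
    "catalog.schema.customer_profiles", "customer_id", "dp-00142", deletion_vectors=True
)
```

Generating the plan as data first also gives the workflow something to log before any statement is executed.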

4.3 The Streaming and Materialized View Compliance Gap

Lakeflow Declarative Pipelines run VACUUM automatically within 24 hours of a streaming table or materialized view update (when Predictive Optimization is not enabled). This automated VACUUM does not run REORG TABLE APPLY (PURGE) first. For compliance tables with deletion vectors enabled, the automated Lakeflow VACUUM is insufficient. A dedicated orchestration workflow must trigger REORG followed by VACUUM FULL RETAIN 0 HOURS on all affected streaming tables, independent of Lakeflow’s automatic cycle.

4.4 Auditing Deletion Vector Status Across Unity Catalog

SELECT t.table_catalog, t.table_schema, t.table_name,
       p.tag_value AS pii_classification
FROM system.information_schema.tables     t
JOIN system.information_schema.table_tags p
  ON  t.table_catalog = p.catalog_name
  AND t.table_schema  = p.schema_name
  AND t.table_name    = p.table_name
WHERE p.tag_name = 'pii_class'
ORDER BY t.table_catalog, t.table_schema, t.table_name;
-- For each result: run DESCRIBE DETAIL and check tableFeatures for 'deletionVectors'
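
The per-table check in that last comment could look roughly like the following. It takes one DESCRIBE DETAIL row converted to a dict and treats either the deletionVectors table feature or the delta.enableDeletionVectors property as a signal that the three-step sequence is needed. The function name is hypothetical, and erring on the side of running REORG is a choice made here.

```python
def has_deletion_vectors(detail):
    """Return True if a DESCRIBE DETAIL row indicates deletion vectors may be in use."""
    features = detail.get("tableFeatures") or []
    props = detail.get("properties") or {}
    return (
        "deletionVectors" in features
        or props.get("delta.enableDeletionVectors") == "true"
    )


# Invented sample rows for illustration only
with_dv = {"tableFeatures": ["appendOnly", "deletionVectors"], "properties": {}}
without_dv = {"tableFeatures": ["appendOnly"], "properties": {}}
```

On Databricks the dict would typically come from something like spark.sql("DESCRIBE DETAIL ...").first().asDict().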

Section 5 : Automating VACUUM for DPDP: Orchestration Architecture

Two Workflows, One Compliance Standard

5.1 Why Global VACUUM Schedules Are a Compliance Anti-Pattern

A single global VACUUM job creates three simultaneous problems for DPDP compliance. The RETAIN N HOURS clause overrides per-table property settings, leaving PII files intact despite correct table configuration. Global jobs cannot be safely quiesced around specific tables without halting broad ingestion. And global jobs run on a fixed schedule rather than responding to specific erasure events, breaking the per-request evidence chain needed for an inquiry response.

5.2 The Two-Workflow Architecture

Workflow A: Global Maintenance VACUUM runs daily, excludes all pii_class and legal_hold tagged tables, uses VACUUM LITE for performance, and is supplemented by Predictive Optimization where available.

Workflow B: Compliance Purge VACUUM runs on-demand, triggered by the erasure_requests registry. Task dependency chain:

  1. Read erasure_requests: select requests whose erasure window has elapsed
  2. Check tags: skip if legal_hold = 'true'
  3. Pause streaming/batch write jobs on affected tables
  4. VACUUM DRY RUN: archive manifest
  5. If deletion vectors present: REORG TABLE APPLY (PURGE)
  6. Disable safety check
  7. VACUUM FULL RETAIN 0 HOURS
  8. Re-enable safety check
  9. Resume pipelines
  10. DESCRIBE HISTORY: archive VACUUM END entry
  11. Update erasure_requests: status = 'completed', evidence_path set

Each task has explicit dependencies. A failure at any step halts the workflow before proceeding.
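
The halt-on-failure behaviour can be illustrated outside any particular orchestrator with a few lines of Python. This is a simplified stand-in, not the Databricks Workflows API: the runner and the task names are hypothetical, and each task is just a callable.

```python
def run_chain(tasks):
    """Run (name, callable) tasks in order and stop at the first failure."""
    completed = []
    for name, task in tasks:
        try:
            task()
        except Exception as exc:
            return {
                "status": "halted",
                "failed_task": name,
                "completed": completed,
                "error": str(exc),
            }
        completed.append(name)
    return {"status": "completed", "failed_task": None, "completed": completed, "error": None}


log = []


def record(name):
    return lambda: log.append(name)


def failing_dry_run():
    raise RuntimeError("dry run failed")


result = run_chain([
    ("pause_writers", record("pause_writers")),
    ("vacuum_dry_run", failing_dry_run),
    ("vacuum_full", record("vacuum_full")),
])
```

In this example the purge never runs because the dry run failed. A real workflow would probably also need cleanup tasks, such as resuming pipelines and re-enabling the safety check, that run regardless of where the chain halted.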

5.3 Predictive Optimization: What It Provides and What It Does Not

Databricks Predictive Optimization automates VACUUM and other maintenance tasks for Unity Catalog managed tables as an asynchronous, cost-driven background process. Because it operates on its own schedule and optimization criteria, it cannot be relied upon as a deterministic substitute for on-demand, zero-hour compliance VACUUM execution. It also does not run REORG TABLE APPLY (PURGE) before VACUUM for deletion-vector tables, is not available in all Databricks cloud regions, and does not apply to external tables. Treat it as the maintenance layer for non-PII tables only.


Section 6 : Streaming Tables, CDF, and Delete Propagation

Completing the Medallion Erasure Chain

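Delta Lake streaming tables are append-only by design. A DELETE on a Bronze source does not automatically propagate to a Silver streaming table. Enable Change Data Feed on the Bronze table and use apply_changes in Lakeflow to carry delete events downstream:

import dlt
from pyspark.sql.functions import col, expr

dlt.create_streaming_table("silver_customer_profiles")  # target must exist first

@dlt.view
def bronze_customer_changes():
    # Change feed of the Bronze table; pre-update images are not needed downstream
    return (spark.readStream.option("readChangeFeed", "true")
              .table("bronze_customer_raw")
              .filter("_change_type != 'update_preimage'"))

dlt.apply_changes(
    target           = "silver_customer_profiles",
    source           = "bronze_customer_changes",
    keys             = ["customer_id"],
    sequence_by      = col("_commit_timestamp"),
    apply_as_deletes = expr("_change_type = 'delete'")
)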

For downstream consumers that should not process delete events directly, add skipChangeCommits to prevent pipeline failures when physical blocks are removed upstream:

df = (spark.readStream
        .option("skipChangeCommits", "true")
        .table("silver_customer_profiles"))

Executing physical VACUUM purges in Gold-first, Silver-then-Bronze order is the recommended default pattern for this evidence-preserving workflow, unless lineage and downstream recomputation are otherwise proven for your specific architecture. If Silver is vacuumed before Gold has processed the deletion, Gold tables may contain PII rows whose lineage Silver can no longer confirm. Complete all layer deletions and verify propagation before running any VACUUM. Use Unity Catalog column-level lineage to identify every downstream table containing derived PII fields before initiating the multi-layer purge sequence. For the full audit log query framework and lineage trace procedure, see How to Use Unity Catalog Audit Logs for DPDP Deletion and Audit Evidence.
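
Given a lineage map, the Gold-first ordering can be derived rather than hard-coded. The sketch below uses the standard library's topological sorter; the table names and the lineage dictionary are hypothetical, and in practice the map would be assembled from Unity Catalog lineage.

```python
from graphlib import TopologicalSorter


def purge_order(downstream_of):
    """Order tables so each is purged only after every table derived from it."""
    # TopologicalSorter expects node -> nodes that must come first;
    # here, a table's downstream consumers must be purged before it.
    return list(TopologicalSorter(downstream_of).static_order())


# Hypothetical lineage: each table mapped to the tables derived from it
lineage = {
    "bronze.customer_raw": {"silver.customer_profiles"},
    "silver.customer_profiles": {"gold.customer_ltv"},
    "gold.customer_ltv": set(),
}
order = purge_order(lineage)
```

For this chain the result lists the Gold table first and the Bronze table last. A cycle in the lineage map raises graphlib.CycleError, which is a useful signal that the map needs review before any purge runs.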


Section 7 : Common VACUUM Compliance Mistakes

Mistake 1: Assuming DELETE alone satisfies DPDP erasure. Without deletion vectors, DELETE rewrites affected file groups and marks old files as removed, but those old Parquet files remain on storage until VACUUM runs. Always follow DELETE with REORG (on deletion-vector tables) and VACUUM FULL.

Mistake 2: Missing REORG TABLE APPLY (PURGE) on deletion-vector tables. After DELETE on a deletion-vector table, both the original Parquet files and the sidecar remain actively referenced. REORG must run first to sever those references before VACUUM can remove the old files. Check for ‘deletionVectors’ in DESCRIBE DETAIL on every PII table.

Mistake 3: Allowing global VACUUM RETAIN to override per-table properties. Exclude all pii_class and legal_hold tables from global VACUUM jobs. Route them to Workflow B.

Mistake 4: Running VACUUM RETAIN 0 HOURS with concurrent writers active. Quiesce all concurrent write pipelines before disabling the safety check.

Mistake 5: Treating Predictive Optimization as a compliance VACUUM mechanism. It operates asynchronously and cannot deliver on-demand zero-hour purges. It also omits REORG for deletion-vector tables. Use it for non-PII maintenance only.

Mistake 6: Using VACUUM LITE for compliance erasure. VACUUM LITE fails with DELTA_CANNOT_VACUUM_LITE on pruned logs. Always use VACUUM FULL for forced erasure.

Mistake 7: Skipping VACUUM DRY RUN. The pre-flight file manifest is an audit-supporting artifact. Archive it before every live purge.

Mistake 8: Not archiving DESCRIBE HISTORY after VACUUM. The VACUUM END entry is a primary technical record of physical erasure. Capture and archive it immediately, before the log ages out.


Section 8 : Implementation Checklist: VACUUM for DPDP Compliance

  1. Audit deletion vector status on every PII table using DESCRIBE DETAIL. Identify which require the three-step sequence.
  2. Set logRetentionDuration = 'interval 365 days' (one year) on all compliance tables to satisfy Rule 8(3).
  3. Set deletedFileRetentionDuration = 'interval 0 hours' on PII tables requiring on-demand forced erasure.
  4. Verify the DBR 18.0 constraint: logRetentionDuration must be greater than or equal to deletedFileRetentionDuration. The 1-year and 0-hour combination is valid.
  5. Exclude PII-tagged and legal-hold-tagged tables from the global VACUUM maintenance schedule.
  6. Apply legal hold tags in Unity Catalog for all RBI, SEBI, or sectoral retention-governed tables.
  7. Build Workflow B as a Databricks Workflow triggered by the erasure_requests registry, with all 11 task dependencies in Section 5.2.
  8. Set spark.databricks.delta.reorg.purgeMode = 'rows' on large deletion-vector tables before REORG as an advanced tuning option; validate against current runtime support before enabling.
  9. Run VACUUM DRY RUN first and archive the manifest for every erasure request.
  10. Quiesce all concurrent write pipelines before running VACUUM RETAIN 0 HOURS with the safety check disabled.
  11. Always use VACUUM FULL for compliance erasure operations, never LITE.
  12. Archive DESCRIBE HISTORY immediately after every compliance VACUUM run.
  13. Re-enable spark.databricks.delta.retentionDurationCheck.enabled after every compliance VACUUM session.
  14. For Lakeflow streaming tables and materialized views: build a standalone REORG + VACUUM compliance workflow outside Lakeflow’s automatic cycle.
  15. Use Unity Catalog column-level lineage to map all downstream Gold tables containing derived PII fields before starting the cross-layer VACUUM sequence.

Section 9 : Conclusion

Every Retention Setting Is Now a Legal Metric

Delta Lake retention properties, time travel windows, and VACUUM execution modes are now legal compliance instruments under India’s DPDP Rules 2025, notified November 13, 2025 with full applicability from May 13, 2027. The configuration decisions in this article translate directly into whether your organization can produce defensible technical structures in response to a DPB inquiry.

Sinki.ai’s Data Erasure solution is designed to automate this entire lifecycle on Databricks. It handles the two-phase 48-hour notification workflow, deletion vector detection and REORG orchestration, cross-layer medallion cascade purging, session safety override management, and automated DPDP evidence package generation as a managed service on your existing environment. Explore the full architecture at sinki.ai/solutions/data-erasure.


Disclaimer: This article provides technical architecture and implementation guidance only and does not constitute formal legal advice. Organizations should consult qualified legal counsel to assess their specific compliance obligations under the DPDP Act 2023 and applicable sectoral regulations.


Frequently Asked Questions

Does running Delta Lake DELETE satisfy the DPDP right to erasure?

No. On tables without deletion vectors, DELETE rewrites the affected Parquet data file groups and marks the old files as removed in the transaction log, but those old files remain physically on object storage. On deletion-vector tables, the original Parquet file is not touched at all; only a sidecar bitmap is written, and both files remain actively referenced by the current table state. In both cases, VACUUM must run to physically remove the old files. For deletion-vector tables, REORG TABLE APPLY (PURGE) must sever the active file references first.

What is the correct VACUUM retention configuration for DPDP compliance tables?

Set delta.deletedFileRetentionDuration = 'interval 0 hours' to make removed data files immediately eligible for physical deletion, and delta.logRetentionDuration = 'interval 365 days' (one year, expressed in days because Delta interval properties do not accept month or year units) to satisfy the Rule 8(3) one-year log retention obligation. From DBR 18.0, logRetentionDuration must be greater than or equal to deletedFileRetentionDuration; the 1-year and 0-hour combination satisfies this constraint.

What happens to Delta Lake time travel after VACUUM RETAIN 0 HOURS?

Time travel queries targeting versions older than the vacuumed point fail because the required Parquet files no longer exist in the table’s primary storage path. The Delta transaction log entries (DESCRIBE HISTORY) remain accessible for the duration of logRetentionDuration, preserving the audit-supporting chain of operations while the underlying data is no longer accessible.

How do deletedFileRetentionDuration and logRetentionDuration interact for DPDP?

These properties control independent retention clocks. deletedFileRetentionDuration governs when VACUUM can physically delete removed data files. logRetentionDuration governs how long transaction log entries survive. For DPDP, set data files to 0 hours (immediate purge) and log entries to 1 year (Rule 8(3)). Each property is set on its own, so shortening one does not shorten the other.

Do deletion vectors require extra steps before VACUUM for DPDP compliance?

Yes. After DELETE on a deletion-vector table, both the original Parquet files and the sidecar remain actively referenced by the current table state, making them ineligible for standard VACUUM. REORG TABLE APPLY (PURGE) must run first to physically rewrite the data blocks and sever these active references. Only then can VACUUM FULL RETAIN 0 HOURS remove the older unreferenced files.

Why does Predictive Optimization not satisfy DPDP compliance VACUUM requirements?

Predictive Optimization is an asynchronous, cost-driven background service. It operates on its own schedule and optimization criteria and cannot deliver on-demand, zero-hour compliance purges triggered by specific erasure requests. It also does not run REORG TABLE APPLY (PURGE) before VACUUM on deletion-vector tables, is unavailable in some Databricks cloud regions, and does not apply to external tables.

What is VACUUM LITE and should it be used for DPDP erasure?

VACUUM LITE uses the transaction log rather than listing all table directory files, making it faster for routine maintenance. For compliance erasure it is not appropriate because it fails with DELTA_CANNOT_VACUUM_LITE if the Delta log has been pruned, and log state at purge time cannot be guaranteed even on tables configured with a long logRetentionDuration. Always use VACUUM FULL for forced compliance erasure.

How do I safely run VACUUM RETAIN 0 HOURS without risking data corruption?

The safe sequence: (1) pause all streaming and batch write jobs targeting the table, (2) confirm no active write transactions are in flight, (3) disable spark.databricks.delta.retentionDurationCheck.enabled at session scope, (4) execute VACUUM FULL RETAIN 0 HOURS, (5) immediately re-enable the setting, (6) resume ingestion pipelines. In Databricks Workflows, model this as a task dependency chain with explicit upstream pause and downstream resume gate tasks.

DPDP vs GDPR: 4 Structural Differences for Data Architecture (2026)

Your GDPR compliance program will not save you under DPDP. Not because the frameworks are incompatible — they share surface-level concepts like consent, data subject rights, and breach notification. They diverge precisely where it hurts most: at the data architecture level, where the infrastructure differences require new builds, not policy rewrites.

This is “The GDPR Assumption”: the belief that existing GDPR-compliant architecture can be repurposed for DPDP with minimal changes. It is the most expensive assumption an Indian enterprise with a European compliance history can make in 2026.

This guide covers the 4 structural differences between DPDP and GDPR — and exactly what each one demands from your data platform.

What you will master in this guide:

  • Why DPDP’s digital-only scope changes your ingestion layer design
  • Why the absence of legitimate interest as a lawful basis rebuilds your consent architecture from scratch
  • What the Consent Manager role means for your data pipeline orchestration
  • Why the absence of a sensitive data sub-category in DPDP changes your PII tagging strategy

For the full DPDP obligation framework, read the DPDP readiness on Databricks: complete guide 2026.

Difference 1: DPDP Is Digital-Only — GDPR Covers Everything

GDPR applies to all personal data, regardless of format: digital files, paper records, CCTV footage, physical HR files. DPDP applies exclusively to digital personal data — data that exists in digital form, or data that was originally non-digital and has been digitized.

This sounds like a narrower scope. It isn’t — not for a Databricks-based enterprise. Here’s why that matters.

Under GDPR, data architecture teams frequently argue that certain processing activities fall outside scope because they involve offline-only data paths. Under DPDP, that argument is not available. Every digital touchpoint is in scope. Every API call that moves personal data, every Spark job that processes it, every Delta table that stores it — all of it is subject to the Act the moment it exists in digital form.

The architectural implication: your bronze-layer ingestion pipeline must classify and tag personal data as DPDP-governed from the moment it arrives, without exception. The GDPR-era practice of selective scoping — treating some digital processing as out-of-scope because it connects to offline processes — does not apply under DPDP.

If your Databricks estate has even 1 digital touchpoint with Indian residents’ personal data, the Act applies. Full stop.

Difference 2: No Legitimate Interest — Consent Is the Only Basis That Scales

GDPR provides 6 lawful bases for processing: consent, contract, legal obligation, vital interests, public task, and legitimate interests. Most enterprise processing programs rely heavily on legitimate interests — it is the most flexible basis and the hardest for data subjects to challenge.

DPDP eliminates this. Consent is the dominant lawful basis. The Act does recognize limited exemptions — state functions, certain research and archiving activities, employment-related processing — but none of these substitute for consent in a commercial enterprise context.

The result: if your GDPR program was designed around legitimate interests for marketing, analytics, behavioral profiling, or cross-product personalization, every one of those processing activities requires a consent record under DPDP. There is no migration path — you build a consent architecture from scratch.

On Databricks, this means:

  • Every pipeline processing Indian residents’ personal data must join to a consent store before executing → A Spark job with no consent check is DPDP non-compliant regardless of technical quality
  • Consent must be purpose-specific — one consent record per processing purpose, not a blanket approval → “We use your data to improve our services” is not a DPDP-compliant consent statement
  • Consent must be withdrawable with the same ease as it was given — and withdrawal must cascade to downstream processing immediately → This requires event-driven pipeline architecture, not batch consent syncs

The absence of legitimate interest under DPDP is not a technicality. It rebuilds your processing justification layer from zero.
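
The consent join described in the list above can be sketched in plain Python to show the logic before it is expressed as a Spark join. The field names (principal_id, purpose, status) and the sample records are hypothetical; the point is that the filter is keyed on both the person and the specific purpose.

```python
def consented_rows(rows, consents, purpose):
    """Keep only rows whose Data Principal has an active consent for this purpose."""
    allowed = {
        c["principal_id"]
        for c in consents
        if c["purpose"] == purpose and c["status"] == "granted"
    }
    return [r for r in rows if r["principal_id"] in allowed]


# Hypothetical sample data
rows = [
    {"principal_id": "dp-1", "email": "a@example.com"},
    {"principal_id": "dp-2", "email": "b@example.com"},
    {"principal_id": "dp-3", "email": "c@example.com"},
]
consents = [
    {"principal_id": "dp-1", "purpose": "marketing", "status": "granted"},
    {"principal_id": "dp-2", "purpose": "marketing", "status": "withdrawn"},
    {"principal_id": "dp-3", "purpose": "analytics", "status": "granted"},
]
marketing_rows = consented_rows(rows, consents, "marketing")
```

Here only dp-1 survives the marketing filter: dp-2 has withdrawn, and dp-3 consented to a different purpose.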

Difference 3: The Consent Manager Role — No GDPR Equivalent

GDPR introduced the concept of a Data Protection Officer (DPO). DPDP introduces something GDPR has no equivalent for: the Consent Manager — a government-registered intermediary that manages the relationship between Data Fiduciaries and Data Principals on consent.

Under DPDP Phase 2 (effective November 13, 2026), Data Fiduciaries must be able to interact with registered Consent Managers — third-party platforms that handle consent capture, storage, and revocation on behalf of data principals in an interoperable, standardized way.

Here is where the architecture diverges from GDPR entirely. A GDPR-compliant consent management tool is a first-party system — your organization builds and owns it. A DPDP-compliant consent infrastructure must be able to receive, process, and honor consent signals from external registered Consent Managers. Your pipeline cannot assume it owns the consent record.

| Feature | GDPR Consent Management | DPDP Consent Management |
| --- | --- | --- |
| Consent infrastructure | First-party, organization-owned | Must integrate with registered third-party Consent Managers |
| Lawful basis alternatives | 6 bases including legitimate interest | Consent-dominant; narrow statutory exemptions only |
| Consent language requirements | Local language encouraged | 22 scheduled Indian languages on request (mandatory) |
| Withdrawal handling | Must be honored | Must be honored AND cascade to all downstream processing |
| Intermediary role | No equivalent | Consent Manager (registered, interoperable, Phase 2 mandatory) |
| Scope | All personal data (digital and physical) | Digital personal data only |

The architectural implication for your Databricks estate: your consent store must be designed to receive consent signals from external Consent Managers — not just from your own application layer. A closed consent store that only accepts first-party signals is not DPDP-compliant after November 2026.
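
One way to picture a consent store that is open to external signals is to record the origin of every signal and resolve the current state per person and purpose by recency. This is only a sketch under assumed names: the ConsentSignal fields, the source labels, and the latest-signal-wins rule are illustrative choices, not a format defined by the DPDP Rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentSignal:
    principal_id: str
    purpose: str
    status: str      # "granted" or "withdrawn"
    source: str      # first-party app or an external Consent Manager identifier
    issued_at: int   # epoch seconds


def current_consent(signals):
    """Resolve the latest signal per (principal, purpose), whatever its source."""
    latest = {}
    for s in sorted(signals, key=lambda s: s.issued_at):
        latest[(s.principal_id, s.purpose)] = s
    return latest


# Hypothetical signals: a first-party grant later withdrawn via a Consent Manager
signals = [
    ConsentSignal("dp-1", "marketing", "granted", "first_party_app", 1_000),
    ConsentSignal("dp-1", "marketing", "withdrawn", "consent_manager_x", 2_000),
]
state = current_consent(signals)
```

In this example the withdrawal received from the external Consent Manager overrides the earlier first-party grant, which is the behaviour a closed, first-party-only store cannot provide.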

Difference 4: No Sensitive Data Sub-Category — How DPDP Changes PII Tagging Strategy

GDPR Article 9 establishes a special category of sensitive personal data — racial or ethnic origin, political opinions, religious beliefs, trade union membership, genetic data, biometric data, health data, sex life and sexual orientation — with heightened protection requirements and explicit consent obligations.

DPDP does not create a separate sensitive data category. All personal data is treated under the same consent and protection regime. The Central Government can designate specific categories of data for enhanced protection via notification, but as of 2026 no such category notification has been issued.

This changes your PII tagging strategy on Databricks in a non-obvious way. Under GDPR, most teams built a 2-tier classification: standard PII (less restrictive controls) and special category / sensitive PII (stricter controls, explicit consent required). Under DPDP, that tiering is irrelevant. Every field tagged as personal data carries the same consent and access control obligations. A name and phone number combination demands the same architectural controls as a health record.

The implication: if your Databricks Unity Catalog is configured with GDPR-era tiered PII policies — lighter controls on standard PII, heavier controls on special category data — you need to flatten that architecture for DPDP compliance. Every PII column, regardless of sensitivity, must be consent-linked and access-controlled at the same standard.

Your GDPR PII governance architecture is a starting point for DPDP. It is not a finish line.

The GDPR-to-DPDP Architecture Migration Checklist

This information has not been clearly laid out elsewhere — until now.

  • Audit all processing activities currently justified under legitimate interests — each requires a consent record under DPDP
  • Redesign consent store to accept external signals from registered Consent Managers (Phase 2 deadline: November 2026)
  • Flatten PII governance tiers in Unity Catalog — all personal data fields to carry the same DPDP control set
  • Add multi-lingual notice capability — DPDP requires notices in the data principal’s preferred language on request
  • Configure cascade revocation: consent withdrawal must trigger immediate downstream processing halt across all pipelines
  • Review digital-only scope definition — all digital personal data of Indian residents is in scope regardless of offline connections

Final Verdict

GDPR and DPDP share a vocabulary. They do not share an architecture. Consent-dominant processing, registered Consent Manager integration, uniform PII governance across all data categories, and cascade revocation workflows — none of these exist in a standard GDPR implementation. Every one of them requires new infrastructure.

The organizations that treat DPDP as a GDPR policy update will discover the gap at the worst possible time: under a DPBI investigation, with a 72-hour breach notification window running and a consent store that wasn’t designed for Indian enforcement.

Build the DPDP architecture deliberately. “The GDPR Assumption” has a ₹250 crore failure mode.

For the full technical architecture, read implementing DPDP readiness on Databricks: architecture reference.

FAQ: DPDP vs GDPR Differences

Is DPDP similar to GDPR?

They share surface concepts — consent, data subject rights, breach notification, and data protection officers — but diverge structurally. DPDP is digital-only, consent-dominant (no legitimate interest basis), introduces a unique Consent Manager role, and applies a uniform standard to all personal data without a sensitive data sub-category.

Can existing GDPR compliance infrastructure be used for DPDP?

Partially. GDPR infrastructure covers DPO governance, rights request workflows, and breach notification protocols — all of which map to DPDP requirements. However, your consent architecture, PII tagging strategy, and any processing activity currently justified under legitimate interests must be rebuilt for DPDP compliance.

What is the difference between DPDP consent and GDPR consent?

Both require freely given, specific, informed, and withdrawable consent. The key difference is that DPDP makes consent the dominant lawful basis for commercial processing — GDPR’s legitimate interest basis does not exist under DPDP. DPDP also requires consent to be purpose-specific and to integrate with registered Consent Managers from November 2026.

What is the Consent Manager under DPDP and does GDPR have an equivalent?

The Consent Manager is a government-registered third-party intermediary that manages consent on behalf of data principals. GDPR has no equivalent — under GDPR, consent management is a first-party responsibility. Under DPDP, Data Fiduciaries must be able to interact with external Consent Managers from Phase 2 onwards.

How does DPDP’s digital-only scope differ from GDPR?

GDPR applies to all personal data regardless of format — digital, paper, CCTV, physical records. DPDP applies only to digital personal data and to non-digital personal data that has been digitized. For Databricks-based enterprises, all digital processing of Indian residents’ personal data is in scope without exception.

Does DPDP have a sensitive data category like GDPR Article 9?

No. DPDP does not create a separate sensitive data sub-category. The Central Government can designate specific data types for enhanced protection via notification, but no such categories have been notified as of 2026. All personal data carries the same consent and access control obligations under DPDP.

Which is stricter — DPDP or GDPR?

On consent requirements, DPDP is stricter: it eliminates legitimate interest as a lawful basis, which enterprise GDPR programs rely on heavily. On breach notification, both set a 72-hour window for notifying the regulator. Neither is categorically “stricter”; they create different obligations that require different infrastructure.

Talk to Sinki.ai about migrating your data architecture from GDPR to DPDP

Consent store redesign, Unity Catalog PII governance, and Consent Manager integration, all native to your Databricks workspace.

DPDP Act 2023 Requirements and Commencement Timeline

₹250 crore is the maximum penalty for organizations that fail to maintain reasonable security safeguards under India’s Digital Personal Data Protection (DPDP) Act, 2023. That penalty is not conditional on intent. It applies from the moment full enforcement begins on May 13, 2027.

Most DPDP compliance briefings treat the Act as a single event. It isn’t. The DPDP Rules 2025 operate on a 3-phase commencement schedule — and the first phase is already active. “The Compliance Countdown” started running on November 13, 2025, whether or not your data team knows it.

This guide covers exactly what the Act requires, when each obligation activates, and what your legal and engineering teams must have operational before enforcement arrives.

What you will master in this guide:

  • The core obligations of the DPDP Act 2023 and DPDP Rules 2025
  • The exact 3-phase commencement timeline and what triggers each phase
  • The data principal rights your platform must support and the response windows attached
  • A deadline checklist for 2026 and 2027

For the technical implementation path on Databricks, read the DPDP readiness on Databricks: complete guide 2026.

What Does the DPDP Act 2023 Actually Require of Your Organization?

Most compliance briefings reduce the DPDP Act 2023 requirements to “you need consent.” That’s not wrong — but it understates the engineering problem by a significant margin.

The Act, enacted on August 11, 2023, establishes 8 categories of obligation. Each maps to a specific technical or operational requirement your data platform must support:

Lawful processing — personal data can only be processed for a specified, consented purpose → Every pipeline processing personal data must have a matching consent record before it executes

Notice and consent — data principals must receive a clear notice in their preferred language before consent is obtained → Existing privacy policies buried in terms of service do not satisfy this requirement

Data minimization — only the personal data necessary for the stated purpose may be collected → This changes how ingestion pipelines are designed, not just what data is retained

Storage limitation — personal data must be retained only for as long as the consented purpose requires → Automated retention policies are required; manual deletion is not an acceptable substitute

Security safeguards — reasonable technical and organizational measures must protect personal data → ₹250 crore is the penalty for security failures — the highest in the schedule

Accountability — data fiduciaries are responsible for compliance across their entire data estate → Third-party processors do not absorb your liability

Breach notification — the Data Protection Board and affected data principals must be notified within 72 hours of awareness → The window starts from detection, not from when the breach occurred

Data principal rights — all 5 rights must be supported with defined response timelines → Rights requests require automated workflows; ticket queues cannot meet the 7-day Rule 14 window
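The first mapping above (a matching consent record before a pipeline executes) can be sketched as a consent gate that filters rows before processing. This is a minimal illustration, not a reference implementation: the `ConsentRecord` shape and the function names are hypothetical, and a production version on Databricks would more likely run as a join against a Delta consent table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ConsentRecord:
    # Hypothetical schema for one consent grant; not a DPDP-mandated format.
    principal_id: str
    purpose: str
    granted_at: datetime
    withdrawn_at: Optional[datetime] = None


def has_valid_consent(records, principal_id, purpose, at):
    """Return True if an unwithdrawn consent for this purpose exists at time `at`."""
    for r in records:
        if r.principal_id != principal_id or r.purpose != purpose:
            continue
        if r.granted_at <= at and (r.withdrawn_at is None or at < r.withdrawn_at):
            return True
    return False


def filter_processable(rows, records, purpose, at):
    """Keep only rows whose data principal has valid consent for `purpose`."""
    return [
        row for row in rows
        if has_valid_consent(records, row["principal_id"], purpose, at)
    ]
```

Consent is checked per purpose, so a grant for one purpose never unlocks processing for another, and a withdrawal takes effect from its own timestamp onward.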

Reliance Jio holds personal data for over 430 million subscribers across telecom, payments, and digital services. Under DPDP, every one of those records requires consent linkage, purpose mapping, retention enforcement, and a fulfilled erasure path. That obligation is a data engineering problem — not a legal one.

The DPDP Act 2023 does not distinguish between organizations that understand their obligations and those that don’t. Both are held to the same standard.

What Are the 3 Phases of DPDP Commencement in 2026?

This is the section most compliance briefings get wrong.

The DPDP Act does not have a single enforcement date. The DPDP Rules 2025, notified on November 13, 2025, operate on a 3-phase schedule. Each phase activates specific provisions — and the phases do not pause for organizations that are still planning.

Phase | Effective Date | What Activates | Engineering Work Required
Phase 1 | November 13, 2025 | Data Protection Board operational; can investigate and penalize today | PII mapping, Unity Catalog governance foundation
Phase 2 | November 13, 2026 | Consent Manager registration framework live; interoperable consent platforms active | Consent store, multi-lingual notice engine, revocation workflows
Phase 3 | May 13, 2027 | Full enforcement: all rights obligations, complete penalty schedule, breach notification | Rights fulfillment automation, breach detection, SDF obligations, audit readiness

Phase 1 is already active. The Data Protection Board can investigate complaints and issue penalties today. Phase 2 — the activation of India’s formal Consent Manager ecosystem — arrives in November 2026.

Here’s why that matters: Phase 2 is not more planning time. It activates a new technical requirement. Your consent architecture must be operational before November 2026 to integrate correctly with India’s Consent Manager framework.

Organizations treating May 2027 as their planning start date are already in breach of Phase 1 obligations.

What Are the Core DPDP Rules 2025 Obligations and Their Penalty Schedule?

The DPDP Rules 2025 translate the Act’s principles into operational requirements with specific response windows and penalties attached to each failure.

Obligation | Requirement | Response Window | Penalty for Failure
Security safeguards | Reasonable technical and organizational measures | Ongoing | Up to ₹250 crore
Breach notification | Notify Data Protection Board and data principals | 72 hours from awareness | Up to ₹200 crore
Rights fulfillment | Respond to access, correction, erasure, nomination requests | 7 days (Rule 14) | Up to ₹50 crore per violation
Consent management | Obtain free, specific, informed, unconditional consent | Before processing | Up to ₹200 crore
Data retention | Delete personal data when purpose is fulfilled | Ongoing | Up to ₹150 crore
Grievance redressal | Appoint officer, resolve complaints within defined timelines | 30 days | Up to ₹50 crore

Most organizations focus their compliance programs on consent. The real engineering pressure comes from breach notification and rights fulfillment — both require automated systems to meet their timelines at enterprise scale.

The 72-hour breach notification window is the hardest to meet. A delayed detection system does not extend your window. It eliminates it.

Rights fulfillment is the second pressure point. 7 days at enterprise volume means automated workflows — and every organization still relying on engineering tickets will fail this requirement.
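The two response windows discussed above reduce to simple deadline arithmetic once the triggering timestamp is recorded. The sketch below assumes the awareness and validation timestamps are supplied by your own detection and intake tooling; the function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Windows as stated in the obligations table of this guide.
BREACH_WINDOW = timedelta(hours=72)
RIGHTS_WINDOW = timedelta(days=7)


def breach_deadline(aware_at: datetime) -> datetime:
    """Latest notification time, counted from when the breach became known."""
    return aware_at + BREACH_WINDOW


def rights_deadline(validated_at: datetime) -> datetime:
    """Latest response time for a validated data principal request."""
    return validated_at + RIGHTS_WINDOW


def time_remaining(deadline: datetime, now: datetime) -> timedelta:
    """Time left before the deadline, floored at zero once it has passed."""
    return max(deadline - now, timedelta(0))
```

Every hour spent on triage after awareness comes straight out of `time_remaining`, which is why alert routing and escalation need to be automated rather than queued.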

What Are the DPDP Compliance Deadlines Your Team Must Hit in 2026 and 2027?

This information has not been clearly laid out elsewhere — until now.

Before November 13, 2026:

  • PII discovery and classification complete across the full data estate → You cannot consent-map, erase, or audit what you haven’t found
  • Consent store architecture deployed and operational → Must be ready to interact with India’s Consent Manager ecosystem from day one
  • Multi-lingual notice engine deployed → DPDP requires notices in the data principal’s preferred language
  • Consent revocation workflows operational → Withdrawal must trigger automatic downstream data restriction

Before May 13, 2027:

  • All 5 data principal rights supported with automated fulfillment workflows → Access, correction, erasure, grievance redressal, and nomination — all within 7 days
  • Breach detection and 72-hour notification pipeline operational → Detection lag does not extend the notification window — it eliminates it
  • Data retention policies automated across the estate → Manual deletion processes cannot meet enforcement-era volume requirements
  • SDF assessment completed and obligations implemented if applicable → Significant Data Fiduciary designation adds DPO, DPIA, and data localization requirements
  • Audit trail infrastructure verified and defensible → Policy documentation is not evidence — technical logs are
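For the audit trail item above, one common way to make technical logs defensible is a hash chain, where each entry commits to the one before it. The sketch below is an assumption about how such a log could be structured, not a DPDP requirement or a Databricks feature; it makes tampering detectable rather than impossible, and the entry layout is hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event: dict, prev_hash: str) -> str:
    """SHA-256 over the event plus the previous entry's hash."""
    body = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(chain: list, event: dict) -> list:
    """Return a new chain with `event` appended and linked to the last entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(event, prev_hash)}
    return chain + [entry]


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True
```

Storing the latest hash somewhere outside the log itself (for example, in a separate system of record) lets an auditor confirm that the whole history is unchanged.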

A realistic enterprise DPDP program on Databricks takes 3 to 6 months to implement. That means the latest safe start date for Phase 3 engineering work is November 2026, not May 2027.

The organizations that start now are the ones that will be defensible when the first enforcement notice arrives.

Final Verdict

The DPDP Act 2023 requirements are not ambiguous. They establish a clear set of obligations — consent, rights, breach notification, retention, security — with defined response windows and a published penalty schedule attached to each failure.

What is ambiguous is whether your data platform can actually fulfill those obligations at enterprise scale. A policy document cannot be audited. A Unity Catalog governance configuration can.

The 3-phase enforcement schedule is running. Phase 1 is active. Phase 2 arrives November 2026. Phase 3 carries the full penalty exposure in May 2027.

The only question left is whether your Databricks estate will be compliant before the first enforcement notice — or after.

For the technical architecture that fulfills these obligations, read implementing DPDP readiness on Databricks: architecture reference.

For the full program roadmap, see the DPDP readiness roadmap: implementation, operating model, and audit preparation.

FAQ: DPDP Act 2023 Requirements and Timeline

What is the DPDP Act 2023?

The Digital Personal Data Protection Act, 2023 is India’s primary data privacy legislation, governing how digital personal data of Indian residents is collected, processed, stored, and deleted. It applies to Indian organizations and foreign organizations that process personal data of Indian residents, regardless of where their servers are located.

When does DPDP enforcement actually begin?

Full enforcement begins May 13, 2027 — but the Data Protection Board became operational November 13, 2025 and can investigate complaints today. The Consent Manager framework activates November 13, 2026. Organizations do not have until May 2027 to begin; the Phase 2 deadline of November 2026 is the practical engineering cutoff.

What are the key sections of the DPDP Act?

Section 4 (lawful processing), Section 5 (notice), Section 6 (consent), Sections 8–9 (data fiduciary obligations), Section 10 (Significant Data Fiduciary obligations), Sections 11–14 (data principal rights), Sections 18–28 (Data Protection Board), and Section 33 with the Schedule (penalties).

What are the DPDP penalties for non-compliance?

Penalties range from ₹50 crore to ₹250 crore depending on the violation. ₹250 crore applies to security safeguard failures — the highest in the schedule. Breach notification failure carries ₹200 crore. Penalties are per violation and cumulative.

What are the 5 rights of data principals under DPDP?

Right to access information about processing; right to correction of inaccurate data; right to erasure of personal data; right to grievance redressal; and right to nominate a trusted person to exercise rights in the event of death or incapacity.

What is a Significant Data Fiduciary?

An organization designated by the government based on volume and sensitivity of data processed, national security risks, or risks to data principals. SDFs face additional obligations: India-resident DPO, annual Data Protection Impact Assessments, and data localization requirements — with up to ₹150 crore in additional penalty exposure.

How does DPDP differ from GDPR?

DPDP is digital-only; GDPR covers all personal data. DPDP relies primarily on consent — there is no legitimate interest basis. DPDP introduces the Consent Manager role (no GDPR equivalent) and does not separately categorize sensitive data. The structural data architecture requirements also diverge significantly.

What does DPDP require from a data engineering team specifically?

Automated PII discovery and classification, a consent store linked to all processing pipelines, rights fulfillment workflows that respond within 7 days, automated retention enforcement, breach detection capable of 72-hour notification, and immutable audit trails. Each of these requires specific technical implementation — not policy documentation.

Build a DPDP-Compliant Data Program on Databricks

Talk to Sinki.ai about building a DPDP-compliant program on Databricks before the November 2026 deadline. Book an assessment →

Who Must Comply With DPDP? Applicability, Exemptions, and Tiers (2026)

The DPDP Act applies more broadly than most legal teams initially assess. The 2 most common misreadings: assuming the Act only applies to Indian-registered companies, and assuming it only applies to organizations above a certain size. Both are wrong — and both are expensive mistakes to discover after enforcement begins.

This guide maps exactly who the DPDP Act applies to, what is genuinely exempt, and which tier of obligation applies to your organization.

What you will master in this guide:

  • The 3 applicability tests that determine whether your organization is in scope
  • The genuine exemptions — and why most organizations cannot rely on them
  • The 2 obligation tiers (Standard Data Fiduciary vs. Significant Data Fiduciary) and what separates them
  • The cross-border applicability rule that catches foreign organizations processing Indian data

For the full obligations breakdown, read the DPDP Act 2023 requirements and commencement timeline.

Who Does the DPDP Act Apply To in 2026?

The DPDP Act, 2023 applies to the processing of digital personal data where:

  1. The data is collected within India (regardless of where it is processed or stored), OR
  2. The data is collected outside India in connection with offering goods or services to individuals in India

This is the cross-border hook. A Singapore-headquartered SaaS company with Indian customers processes their personal data under DPDP jurisdiction. A US-based analytics firm running behavioral models on Indian user data is subject to the Act. Geography of incorporation is irrelevant. Geography of the data principal is what matters.

The entity responsible for compliance is the Data Fiduciary — any person, company, government body, or other entity that determines the purpose and means of processing personal data. This is functionally equivalent to a GDPR Data Controller.

The key question your legal team must answer: does your organization determine what personal data of Indian residents is collected and why it is processed? If yes, you are a Data Fiduciary, and you are in scope.
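The applicability tests above can be written down as a small decision function, which is useful when scoping many processing activities in a data inventory. This is a simplified sketch of the tests as this guide states them, not legal advice; the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingActivity:
    # Hypothetical inventory fields describing one processing activity.
    digital: bool
    collected_in_india: bool
    offers_goods_or_services_in_india: bool
    determines_purpose_and_means: bool


def in_dpdp_scope(activity: ProcessingActivity) -> bool:
    """Digital personal data that meets either territorial test is in scope."""
    return activity.digital and (
        activity.collected_in_india or activity.offers_goods_or_services_in_india
    )


def is_data_fiduciary(activity: ProcessingActivity) -> bool:
    """In scope, and the organization decides why and how the data is processed."""
    return in_dpdp_scope(activity) and activity.determines_purpose_and_means
```

The Singapore SaaS example maps to `collected_in_india=False` and `offers_goods_or_services_in_india=True`, which still returns `True` from both functions when the company sets the purpose of processing.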

What Are the DPDP Act Exemptions — And Who Can Actually Use Them?

The Act specifies 4 categories of exemption. These are narrow. Most commercial enterprises cannot rely on any of them.

Exemption Category | Scope | Condition | Commercial Enterprise Applicability
State security and sovereignty | Central/state government bodies processing data for national security, public order, or sovereignty | No additional conditions; full exemption | None
Research, archiving, and statistics | Processing for research, archival, or statistical purposes | Data must not be used for any decision affecting the data principal | Narrow; cannot be used to exempt commercial analytics
Personal or domestic use | Processing by an individual for purely personal or household activities | No commercial element; strict personal use only | None
Journalistic purposes | Processing by journalists for reporting, editorial, or public interest purposes | Subject to journalistic ethics standards | Media organizations only

Here’s the thing: the research and statistics exemption is the one most enterprises attempt to invoke for analytics workloads. It does not apply if the output of that analysis is used in any decision-making that affects individual data principals — pricing decisions, credit scoring, marketing targeting, fraud detection. Those are not research activities. They are commercial processing activities that require consent.

If your analytics output touches individual decision-making, you are not exempt. You are a Data Fiduciary with full compliance obligations.

What Are the 2 Obligation Tiers Under DPDP?

The DPDP Act creates 2 distinct tiers of Data Fiduciary obligation. Every entity in scope sits in one of these tiers.

Tier 1 — Standard Data Fiduciary

All organizations that process digital personal data of Indian residents, subject to the standard obligation set:

  • Obtain valid consent before processing
  • Provide notices in the data principal’s preferred language
  • Fulfill all 5 data principal rights within mandated timelines
  • Implement reasonable security safeguards
  • Notify the DPBI and affected principals within 72 hours of a breach
  • Delete personal data when purpose is fulfilled
  • Appoint a grievance officer accessible to Indian data principals

Tier 2 — Significant Data Fiduciary (SDF)

Organizations designated by the Central Government under Section 10, based on volume, sensitivity, national risk, or electoral risk criteria. SDFs must fulfill all Tier 1 obligations PLUS:

  • Appoint an India-resident Data Protection Officer with board-level access
  • Conduct annual Data Protection Impact Assessments (DPIAs)
  • Submit to annual independent data audits
  • Comply with data localization requirements if notified
  • Implement algorithmic accountability measures

Obligation | Standard Data Fiduciary | Significant Data Fiduciary
Consent and notice | Required | Required
Data principal rights fulfillment | Required (7-day window) | Required (7-day window)
Security safeguards | Required | Required
Breach notification | 72 hours | 72 hours
DPO appointment | Grievance officer only | India-resident DPO with board access
Annual DPIA | Not required | Required
Independent data audit | Not required | Required annually
Data localization | Not required (currently) | Subject to Central Government notification
Maximum penalty | Up to ₹250 crore | Up to ₹250 crore, plus up to ₹150 crore for SDF violations

Does DPDP Apply to Small Businesses and Startups?

The Act does not create a small business exemption based on revenue or employee count. Every organization processing digital personal data of Indian residents is in scope, regardless of size.

The Central Government has the authority to exempt specific classes of Data Fiduciaries through notification, but no such exemption has been issued as of May 2026. The absence of a small business carve-out is a deliberate policy choice — India’s startup ecosystem processes vast quantities of personal data, and a size-based exemption would create a compliance vacuum in the highest-growth segment of the market.

The practical implication: a seed-stage fintech with 10,000 Indian users processing payment data is a Data Fiduciary subject to the full standard obligation tier. The consent store, rights fulfillment workflows, and breach notification pipeline must exist regardless of headcount or ARR.

Size does not determine DPDP applicability. Data processing of Indian residents’ personal data does.

Final Verdict

The DPDP Act applies to any organization processing digital personal data of Indian residents — Indian or foreign, large or small, commercial or non-profit. The exemptions are narrow and most commercial enterprises cannot access them. The Significant Data Fiduciary tier adds a materially heavier obligation set for organizations at scale. Both tiers carry the same ₹250 crore security safeguard penalty ceiling.

The only applicability assessment that matters is whether your organization processes digital personal data of Indian residents. If the answer is yes, you are in scope — and the compliance clock is running.

FAQ: DPDP Act Applicability and Exemptions

Who must comply with the DPDP Act?

Any organization — Indian or foreign — that processes digital personal data collected within India, or collected outside India in connection with offering goods or services to Indian residents. This applies regardless of company size, incorporation geography, or industry.

Does DPDP apply to foreign companies? 

Yes. Any company outside India that collects or processes digital personal data of Indian residents — including through apps, websites, or digital services — is subject to the Act. The jurisdiction trigger is the location of the data principal, not the location of the company.

What organizations are exempt from DPDP?

Genuine exemptions are limited to state security and sovereignty functions, certain research and statistical activities (where outputs are not used in individual decisions), personal/domestic use, and journalistic purposes. Commercial enterprises do not qualify for any of these exemptions in their normal operations.

Is there a small business or startup exemption under DPDP?

No. The Act does not create a size-based exemption. Every organization processing digital personal data of Indian residents is in scope. The Central Government may issue class-based exemptions through notification, but none have been issued as of May 2026.

What is a Significant Data Fiduciary under DPDP?

An organization designated by the Central Government based on data volume processed, sensitivity of data, risk to data principals’ rights, potential impact on national security, or risk to electoral democracy. SDFs face additional obligations including a mandatory India-resident DPO, annual DPIAs, and independent data audits.

How do you know if your organization is a Significant Data Fiduciary?

Designation is made by the Central Government through notification under Section 10. Organizations processing high-volume sensitive data — particularly in BFSI, healthtech, and large consumer platforms — should conduct an SDF self-assessment and prepare for potential designation before May 2027.

Talk to Sinki.ai about scoping your DPDP applicability

Build the compliance architecture your obligation tier requires – all native to your Databricks workspace.

When Do DPDP Obligations Come Into Force? The 3-Phase Timeline 2026

DPDP obligations are not all live at once. They roll out in 3 phases across an 18-month window from November 2025 to May 2027. Most organizations are operating under “The Phasing Misread”: they treat Phase 3 as the starting line. It is the deadline.

Phase 1 activated on November 13, 2025. The Data Protection Board of India became operational on that date. It can receive complaints, conduct inquiries, and impose penalties today. The organizations waiting for May 2027 to begin compliance are already operating under an active enforcement regime.

What you will master in this guide:

  • Which obligations are live and enforceable today under Phase 1
  • What Phase 2 activates on November 13, 2026
  • What becomes fully enforceable on May 13, 2027 under Phase 3
  • How to map the 3 phases to your internal planning calendar
  • The obligations that require the longest lead time and must start now

For the complete compliance architecture, read DPDP readiness on Databricks: complete guide 2026.

What Is the Legal Basis for DPDP’s 3-Phase Timeline?

The phased commencement structure flows from the DPDP Act, 2023 and the DPDP Rules, 2025. The Act was passed in August 2023 but required delegated rules before obligations could be enforced. The DPDP Rules were notified on November 13, 2025, establishing the 3-phase schedule.

Phase 1: November 13, 2025. The DPDP Rules came into force. The Data Protection Board of India was constituted and became operational. Phase 1 obligations became immediately enforceable.

Phase 2: November 13, 2026. Phase 2 obligations activate, including the Consent Manager framework. Registered third-party Consent Managers become operational. Data Fiduciaries must be able to accept consent signals from external Consent Managers.

Phase 3: May 13, 2027. Full enforcement. All remaining obligations are active. The complete penalty schedule is enforceable. There is no further phase after May 2027.

This is not a grace period structure. “The Phasing Misread” is treating Phase 1 as the warning shot and Phase 3 as the real enforcement date. Phase 1 is enforcement. The DPBI is operational and accepting complaints today.

The 3-phase timeline is a rolling compliance requirement, not a single enforcement cliff on May 13, 2027.
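Because the schedule is rolling, it can help to encode the three activation dates once and query them from planning or monitoring jobs. The sketch below uses the dates given in this guide; the function names are illustrative.

```python
from datetime import date

# Activation dates for each phase, as stated in this guide.
PHASES = [
    ("Phase 1", date(2025, 11, 13)),
    ("Phase 2", date(2026, 11, 13)),
    ("Phase 3", date(2027, 5, 13)),
]


def active_phases(on: date) -> list:
    """Names of every phase whose activation date is on or before `on`."""
    return [name for name, start in PHASES if on >= start]


def days_until_next_phase(on: date):
    """Days until the next activation date, or None once all phases are live."""
    upcoming = [(start - on).days for _, start in PHASES if start > on]
    return min(upcoming) if upcoming else None
```

A call such as `active_phases(date(2026, 5, 1))` returns only Phase 1, which matches the position described above: enforcement is already running well before the final date.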

What Obligations Are Active Right Now Under Phase 1 of DPDP?

Phase 1 obligations became enforceable on November 13, 2025. Every Data Fiduciary in scope must already be compliant with these requirements.

Security safeguards (Section 8(5)): Reasonable security safeguards must be implemented. For Databricks deployments, this means: encryption of personal data at rest and in transit, Unity Catalog column masking and row-level security on PII-tagged tables, audit logging of all access to personal data, and anomaly detection on data access patterns. → Penalty for non-compliance: up to ₹250 crore

Breach notification (Section 8(6)): Within 72 hours of becoming aware of a personal data breach, the DPBI must be notified. Affected data principals must be notified as soon as possible thereafter. → Penalty for non-compliance: up to ₹200 crore

Consent obligation (Section 6): Personal data may only be processed with valid, purpose-specific, withdrawable consent. Consent must be linked to a specific notice in the data principal’s preferred language. → Penalty for non-compliance: up to ₹200 crore

Data principal rights fulfillment (Section 11): All 5 data principal rights must be honored within 30 days of a validated request. → Penalty for non-compliance: up to ₹50 crore per violation

Grievance officer appointment (Section 13): A named grievance officer must be accessible to data principals through your platform.

Data retention and purpose limitation (Section 8(7)): Personal data may only be retained as long as necessary for the specified processing purpose. It must be deleted or anonymized when the purpose is fulfilled. → Penalty for non-compliance: up to ₹150 crore

The DPBI became operational on November 13, 2025. All Phase 1 obligations are in scope for enforcement today. There is no waiting period.
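Of the obligations listed above, retention is the one that most directly becomes a scheduled job: compare each record's purpose-fulfilled date against a retention period and flag what is due for erasure. The sketch below is illustrative only. The retention periods are made-up placeholders rather than values taken from the Act or Rules, and on Databricks the same comparison would typically run as a query over a Delta table.

```python
from datetime import date, timedelta

# Hypothetical retention periods per purpose, in days. Real values come from
# your own retention policy and applicable law, not from this example.
RETENTION_DAYS = {"order_fulfilment": 365, "marketing": 180}


def erasure_due_date(purpose: str, purpose_fulfilled_on: date) -> date:
    """Date on which a record for `purpose` becomes due for erasure."""
    return purpose_fulfilled_on + timedelta(days=RETENTION_DAYS[purpose])


def overdue_records(records: list, today: date) -> list:
    """IDs of records whose erasure due date is today or earlier."""
    return [
        r["record_id"]
        for r in records
        if erasure_due_date(r["purpose"], r["purpose_fulfilled_on"]) <= today
    ]
```

Keeping the retention periods in a single mapping means a policy change is one edit, and the output list can feed directly into an erasure workflow and its evidence log.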

What Obligations Activate Under Phase 2 on November 13, 2026?

Phase 2 adds one major framework obligation to the Phase 1 set.

Consent Manager Integration: From November 13, 2026, registered third-party Consent Managers become operational. Data Fiduciaries must be able to:

  • Receive consent signals from registered Consent Managers through a standardized API endpoint → A first-party-only consent store is non-compliant from this date
  • Honor consent withdrawal signals from external Consent Managers immediately → Cascade revocation must work for externally originated withdrawal signals, not just signals from your own application layer
  • Maintain an accurate consent ledger that reflects signals from both internal and external sources → Consent records must be reconcilable across first-party and third-party origins

The architectural implication for Databricks: your consent store must expose an API endpoint that accepts external consent events and writes them to the internal Delta consent ledger in the correct schema. Sinki.ai’s Consent Manager includes this Phase 2 API endpoint as part of its standard deployment, ready for the November 2026 activation date from day one.
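A rough sketch of that ingestion step is shown below: an external event is validated, tagged with its source, and replayed alongside first-party events to derive the current consent state. The payload fields and the `ConsentEvent` schema are assumptions made for illustration, since the actual Consent Manager interface is not described here, and a real deployment would write to the Delta consent ledger rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConsentEvent:
    # Hypothetical ledger row; not a published Consent Manager schema.
    principal_id: str
    purpose: str
    action: str        # "grant" or "withdraw"
    source: str        # "first_party" or "consent_manager"
    occurred_at: datetime


def parse_external_event(payload: dict) -> ConsentEvent:
    """Validate an external payload and convert it to a ledger event."""
    action = payload["action"]
    if action not in ("grant", "withdraw"):
        raise ValueError(f"unsupported action: {action!r}")
    return ConsentEvent(
        principal_id=payload["principal_id"],
        purpose=payload["purpose"],
        action=action,
        source="consent_manager",
        occurred_at=datetime.fromisoformat(payload["occurred_at"]),
    )


def current_consent(events) -> dict:
    """Replay events in time order; the latest event per key decides the state."""
    state = {}
    for e in sorted(events, key=lambda e: e.occurred_at):
        state[(e.principal_id, e.purpose)] = (e.action == "grant")
    return state
```

Because the ledger is append-only and the state is derived by replay, an externally originated withdrawal overrides an earlier first-party grant without any special handling, and both origins stay reconcilable through the `source` field.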

Phase 2 does not introduce new consent or rights requirements. It extends the existing consent architecture to accept external signals. Organizations that have already built a DPDP-compliant consent store in Phase 1 need only add the Consent Manager API endpoint before November 2026.

Organizations that delay Phase 1 consent store construction past Q2 2026 will not have enough time to add Phase 2 Consent Manager integration before the November 13, 2026 deadline.

What Becomes Fully Enforceable Under Phase 3 on May 13, 2027?

Phase 3 is full enforcement. From May 13, 2027:

  • All Phase 1 and Phase 2 obligations are enforceable with the complete penalty schedule
  • Significant Data Fiduciary obligations are fully enforced for all designated organizations
  • Cross-border data transfer restrictions, if any are notified by the Central Government, take effect
  • Any remaining notifications or rules issued under the Act are in full force

Phase 3 is often described as the “enforcement deadline.” That framing is inaccurate. The DPBI can and does investigate complaints under Phase 1 obligations today, and Phase 2 obligations become actionable in November 2026. Phase 3 is better understood as the point at which every obligation in the Act is active and no further transitional arrangements exist.

Organizations that are not compliant with Phase 1 obligations by Phase 3 face compounding exposure: back-dated penalty risk for Phase 1 violations that have been accumulating since November 2025, plus Phase 3 obligations, all simultaneously.

How Does the 3-Phase Timeline Map to a DPDP Implementation Plan?

Phase | Activation Date | Key Obligations | Planning Action Required Now
Phase 1 | November 13, 2025 | Security safeguards, breach notification, consent, rights fulfillment, grievance officer, data retention | Immediate: gap assessment and compliance build for all Phase 1 obligations
Phase 2 | November 13, 2026 | Consent Manager API integration | By Q3 2026: Consent Manager API endpoint deployed and tested
Phase 3 | May 13, 2027 | Full enforcement of all obligations | By Q1 2027: all obligations verified, SDF requirements complete, audit trail ready
SDF-specific | Ongoing post-designation | DPO appointment, DPIA, annual audit, algorithmic accountability | Immediate upon designation: begin DPO hiring and DPIA framework

The planning calendar implication is direct. Organizations that have not completed Phase 1 compliance have 2 simultaneous problems. They are already behind on Phase 1 enforcement. And they need to complete Phase 1 before Phase 2 preparation can begin.

Which DPDP Obligations Have the Longest Lead Time and Must Start Now?

3 obligations require the longest build time and should begin immediately regardless of which phase they formally activate in.

Consent Store Architecture: Building a DPDP-compliant consent store from scratch takes 6 to 10 weeks. If this work has not begun, it should begin immediately. Sinki.ai’s Consent Manager compresses this to days, but the configuration, validation, and stakeholder review still take 2 to 3 weeks.

PII Discovery and Unity Catalog Tagging: A complete PII inventory is required before any other compliance control can be configured. Manual discovery takes 4 to 6 weeks at enterprise scale. Automated discovery with Audit Gap Finder takes days, but the validation, stakeholder review, and Unity Catalog configuration still take 2 to 3 weeks.

DPO Appointment for SDF Organizations: Hiring an India-resident DPO with the required compliance expertise takes 3 to 6 months. This is the single longest-lead-time obligation in the entire DPDP program. Organizations likely to be designated as SDFs that have not begun hiring may miss the SDF obligation even if all technical components are on schedule.

“The Phasing Misread” costs organizations 6 to 12 months of lead time they cannot recover. The compliance clock started on November 13, 2025.

Final Verdict

The DPDP timeline is not a single enforcement date. It is a rolling compliance schedule with obligations active at 3 distinct points. Phase 1 is live. The DPBI is operational. Phase 2 adds the Consent Manager integration requirement in November 2026. Phase 3 is full enforcement in May 2027.

Organizations that treat May 2027 as the starting line will arrive at Phase 3 with Phase 1 violations already on the board, no consent infrastructure, and a DPBI that has been operational for 18 months.

The compliance window is not closing in May 2027. It opened in November 2025. The only question left is how much of it you have used.

FAQ: DPDP Obligations Timeline

When did DPDP obligations come into force?

The first phase of DPDP obligations came into force on November 13, 2025, when the DPDP Rules were notified and the Data Protection Board of India was constituted. Phase 1 includes security safeguards, breach notification, consent, rights fulfillment, and data retention obligations.

What are the 3 phases of DPDP commencement?

Phase 1 activated on November 13, 2025, covering core obligations including security safeguards, breach notification, consent, and data principal rights. Phase 2 activates on November 13, 2026, adding the Consent Manager framework. Phase 3 is full enforcement from May 13, 2027.

What DPDP obligations are active today in 2026?

As of May 2026, all Phase 1 obligations are active and enforceable: reasonable security safeguards (₹250 crore penalty), 72-hour breach notification (₹200 crore), consent with purpose specificity (₹200 crore), data principal rights fulfillment within 30 days (₹50 crore per violation), grievance officer appointment, and data retention and purpose limitation (₹150 crore).

What does the November 2026 DPDP deadline require?

By November 13, 2026, Data Fiduciaries must be able to accept consent signals from registered third-party Consent Managers through a standardized API endpoint. This requires modifying the consent store architecture to accept external consent events, not just first-party application signals.

Can the DPBI impose penalties before May 2027?

Yes. The DPBI became operational on November 13, 2025 and can investigate complaints and impose penalties for Phase 1 violations today. The May 2027 date is the full enforcement deadline, not the start of enforcement. Phase 1 violations are already actionable.

What should organizations prioritize given the 3-phase timeline?

Organizations should treat Phase 1 as the current compliance requirement and begin or complete the gap assessment and architecture build immediately. Phase 2 preparation (Consent Manager API endpoint) should be completed by Q3 2026 to allow testing time before the November 2026 activation. SDF organizations should begin DPO hiring immediately given the 3 to 6 month hiring timeline.

What happens if an organization is non-compliant at Phase 3?

An organization non-compliant at Phase 3 faces the full DPDP penalty schedule with no further phasing. More significantly, Phase 1 violations that have been accumulating since November 2025 may be subject to retrospective investigation. The combined exposure from multi-phase non-compliance can reach ₹800 crore or more for organizations with simultaneous violations across security, breach notification, consent, and SDF obligations.

Sinki.ai’s DPDP compliance suite deploys Phase 1 and Phase 2 compliance infrastructure natively inside your Databricks workspace, with Phase 2 Consent Manager API readiness built in from day one.

What Are the Penalties Under the DPDP Act? The Full 2026 Schedule

₹250 crore is not a ceiling that applies only to the most egregious violations. It is the penalty for a single category of failure — inadequate security safeguards — that most Indian enterprises are currently exposed to. The DPDP Act, 2023’s penalty schedule is not designed to warn. It is designed to hold.

Most penalty guides list the numbers. This one explains what triggers them and what your Databricks estate must do to prevent each one.

What you will master in this guide:

  • The complete DPDP penalty schedule and the section behind each fine
  • What specific failure triggers each penalty — not just the amount
  • The data engineering controls that directly shield against the top 3 penalties
  • The cumulative exposure your organization carries right now

For the full compliance roadmap, read the DPDP readiness on Databricks: complete guide 2026.

What Is the Full DPDP Penalty Schedule for 2026?

The DPDP Act’s penalty schedule is defined in the Schedule appended to the Act and in Section 33. The Data Protection Board of India (DPBI) — established under Section 18 — has the authority to investigate complaints, conduct inquiries, and impose penalties. The Board became operational on November 13, 2025. It can act today.

Here is the complete 2026 DPDP penalty schedule:

Violation | Governing Section | Maximum Penalty
Failure to implement reasonable security safeguards | Section 8(5) | ₹250 crore
Failure to notify DPBI of a personal data breach | Section 8(6) | ₹200 crore
Failure to notify affected data principals of a breach | Section 8(6) | ₹200 crore
Non-fulfillment of data principal rights requests | Section 11 / Rule 14 | ₹50 crore per violation
Violation of Significant Data Fiduciary obligations | Section 10 | ₹150 crore
Non-compliance with DPBI directions or orders | Section 33 | ₹50 crore
Violation of consent obligations | Section 6 | ₹200 crore
Violation of data retention / purpose limitation obligations | Section 8 | ₹150 crore
Other violations not separately specified | Schedule | ₹50 crore

Penalties are per violation — not per incident. A single breach that triggers both the security safeguard failure (₹250 crore) and the notification failure (₹200 crore) simultaneously creates ₹450 crore in combined exposure. This is not a theoretical risk. It is the default outcome of a poorly instrumented data estate.

What Triggers the ₹250 Crore Security Safeguard Penalty?

This is the penalty most boards don’t fully understand. The ₹250 crore exposure does not require a breach to have occurred. It applies when a Data Fiduciary fails to implement reasonable security safeguards — whether or not that failure results in a breach.

“Reasonable security safeguards” is defined by the Board through guidelines, but current best practice maps to:

  • Encryption of personal data at rest and in transit → Unencrypted Delta tables containing personal data are a direct exposure
  • Access controls at column and row level → Unity Catalog column masking and row-level security are not optional governance — they are penalty shields
  • Audit logging of all access to personal data → Logs that are collected but never monitored, with no alerting on top of them, are not “reasonable”
  • Anomaly detection on data access patterns → A breach you detect 2 weeks late still starts the 72-hour clock at detection, but the 2-week blind spot is itself evidence that safeguards were not reasonable
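As a rough illustration of what "actively monitored" can mean, the sketch below compares each user's PII read count in the current window against that user's own history and flags large deviations. It is plain Python under stated assumptions: the event shape, the function name `flag_unusual_access`, and the z-score threshold are hypothetical, and in a real estate the events would be derived from your Unity Catalog audit logs rather than an in-memory list.

```python
from collections import Counter
from statistics import mean, pstdev

def flag_unusual_access(baseline_counts, current_events, z_threshold=3.0):
    """Return users whose PII reads in the current window exceed their baseline.

    baseline_counts: dict mapping user -> list of historical per-window read counts
    current_events: iterable of (user, table_name) tuples for the current window
    """
    current = Counter(user for user, _table in current_events)
    flagged = []
    for user, count in current.items():
        history = baseline_counts.get(user)
        if not history:
            # No baseline at all: treat any PII access as worth reviewing.
            flagged.append(user)
            continue
        mu = mean(history)
        sigma = pstdev(history) or 1.0  # avoid division by zero on flat baselines
        if (count - mu) / sigma > z_threshold:
            flagged.append(user)
    return sorted(flagged)

# Hypothetical data: one analyst reads far more than usual, one new user appears.
baseline = {"analyst_a": [10, 12, 11, 9], "svc_etl": [500, 510, 495, 505]}
events = (
    [("analyst_a", "silver.user_transactions")] * 60
    + [("svc_etl", "silver.user_transactions")] * 502
    + [("new_user", "silver.user_transactions")] * 3
)
alerts = flag_unusual_access(baseline, events)
```

A per-user baseline is only one possible design; it keeps a high-volume service account from masking a low-volume analyst whose access suddenly spikes.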

Zomato processes delivery addresses, payment data, and behavioral profiles for over 100 million users. An estate of that scale without column-level access controls and active anomaly detection on PII access is not a compliance gap. It is a ₹250 crore liability waiting for the Board to receive the first complaint.

The ₹250 crore penalty does not require a breach. It requires the absence of reasonable safeguards. Most Databricks estates are already exposed.

What Triggers the ₹200 Crore Breach Notification Penalty?

The 72-hour rule is the most operationally demanding requirement in the Act. Section 8(6) requires notification to the DPBI within 72 hours of becoming aware of a personal data breach. Affected data principals must be notified immediately thereafter.

Here’s what most compliance teams get wrong: the window starts from detection, not from when the breach occurred. A breach that happened 3 weeks ago but was detected today gives you 72 hours from today. Slow detection therefore does not shorten the notification window, but every day a breach goes undetected extends the exposure of data principals and strengthens the case that security safeguards were not reasonable.

The failure mode is not missing the deadline intentionally. The failure mode is:

  • No automated breach detection configured on Unity Catalog audit logs → Manual investigation timelines cannot meet the 72-hour window at enterprise scale
  • Breach notification process not pre-built and tested → A notification process being drafted after detection is already too slow
  • Notification template not pre-approved by legal → Every hour spent drafting is an hour closer to ₹200 crore

The 72-hour breach notification penalty is not a compliance problem. It is a detection and automation problem.
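Because the clock runs from detection, the deadline is simple date arithmetic once the detection timestamp is recorded reliably. The following is a minimal sketch; the function names are illustrative, and it assumes timestamps are stored timezone-aware in UTC.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)

def notification_deadline(detected_at):
    """Deadline for notifying the Board, measured from detection (awareness)."""
    return detected_at + NOTIFICATION_WINDOW

def hours_remaining(detected_at, now):
    """Hours left before the deadline; a negative value means it has lapsed."""
    return (notification_deadline(detected_at) - now).total_seconds() / 3600

# Hypothetical incident detected on 4 May 2026 at 09:00 UTC.
detected = datetime(2026, 5, 4, 9, 0, tzinfo=timezone.utc)
deadline = notification_deadline(detected)
left = hours_remaining(detected, datetime(2026, 5, 5, 21, 0, tzinfo=timezone.utc))
```

Surfacing `hours_remaining` on the incident dashboard keeps the legal and engineering teams working against the same clock.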

What Triggers the SDF Penalty of ₹150 Crore?

The Significant Data Fiduciary (SDF) designation carries a separate penalty tier of up to ₹150 crore for violations of SDF-specific obligations. Those obligations include:

  • Failure to appoint an India-resident Data Protection Officer
  • Failure to conduct annual Data Protection Impact Assessments (DPIAs)
  • Failure to submit to annual data audits by an independent auditor
  • Non-compliance with data localization requirements if notified

The SDF penalty is cumulative with the base penalty schedule. An SDF that also fails on security safeguards faces ₹250 crore + ₹150 crore = ₹400 crore in combined exposure from 2 simultaneous violations.

The assessment question your compliance team must answer before May 2027: has your organization been designated as an SDF, or is it at risk of designation? Large BFSI, fintech, and healthtech enterprises processing data for millions of Indian users are the most exposed.

The Cumulative DPDP Penalty Exposure Model

This is the section most DPDP penalty guides skip.

Penalties are not mutually exclusive. A single compliance failure can trigger multiple simultaneous penalties. Here is a realistic exposure model:

Scenario | Violations Triggered | Combined Exposure
Security breach with no automated detection | Security safeguard (₹250Cr) + Breach notification failure (₹200Cr) | ₹450 crore
SDF designation missed, DPO not appointed | SDF obligation (₹150Cr) | ₹150 crore
Rights requests ignored for 60 days | Rights fulfillment failure × volume of requests (₹50Cr each) | ₹50 crore+
Consent collected without purpose specification | Consent violation (₹200Cr) | ₹200 crore
Worst-case combined scenario | Security + Breach notification + SDF + Consent | ₹800 crore+

The worst-case combined scenario is not hypothetical. It is the outcome of an organization that ran workshops but never built compliant infrastructure.
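The arithmetic behind the exposure model can be reproduced in a few lines. This is a sketch using the maximum figures from the schedule in this guide; the dictionary keys and the `combined_exposure` helper are illustrative names, and the result is an upper bound, not a prediction of what the Board would impose.

```python
# Maximum penalties in ₹ crore, as listed in the penalty schedule in this guide.
MAX_PENALTY_CRORE = {
    "security_safeguards": 250,
    "breach_notification": 200,
    "consent": 200,
    "sdf_obligations": 150,
    "rights_request": 50,
}

def combined_exposure(violations):
    """Sum the maximum penalty per violation; repeated entries count separately."""
    return sum(MAX_PENALTY_CRORE[v] for v in violations)

breach_no_detection = combined_exposure(["security_safeguards", "breach_notification"])
worst_case = combined_exposure(
    ["security_safeguards", "breach_notification", "sdf_obligations", "consent"]
)
three_ignored_requests = combined_exposure(["rights_request"] * 3)
```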

Final Verdict

The DPDP penalty schedule is not a deterrent for bad actors. It is a financial consequence for organizations that understood their obligations and still didn’t build the systems to fulfill them. ₹250 crore for security failures. ₹200 crore for missed breach notification. ₹150 crore for SDF non-compliance. All simultaneously available. All Board-enforceable from November 2025.

The only question is whether your Databricks estate closes the exposure before the first enforcement notice arrives — or after.

For the implementation roadmap that directly mitigates these penalties, read DPDP readiness roadmap: implementation, operating model, and audit preparation.

FAQ: DPDP Act Penalties

What is the maximum penalty under the DPDP Act?

₹250 crore — for failure to implement reasonable security safeguards under Section 8(5). This is the highest single penalty in the schedule and can be combined with other simultaneous violations, creating total exposure well above ₹250 crore.

Can multiple DPDP penalties be applied simultaneously?

Yes. Penalties are per violation, not per incident. A data breach that also triggers a notification failure creates ₹250 crore (security) + ₹200 crore (notification) = ₹450 crore in combined exposure from a single event.

What triggers the ₹200 crore breach notification penalty? 

Failure to notify the Data Protection Board of India within 72 hours of becoming aware of a personal data breach, or failure to notify affected data principals immediately thereafter. The window starts from detection, so slow detection does not shorten it, but it prolongs the breach and weakens any claim that safeguards were reasonable.

What is the penalty for failing to fulfill data principal rights requests? 

Up to ₹50 crore per violation under Section 11 and Rule 14. Since rights requests are individual — each unfulfilled request is a separate violation — this penalty can multiply rapidly at enterprise scale.

When can the DPBI begin imposing penalties?

The Data Protection Board of India became operational on November 13, 2025 and can investigate complaints and impose penalties today. Full enforcement of all penalty provisions activates May 13, 2027, but Phase 1 violations are already actionable.

What engineering controls directly reduce DPDP penalty exposure?

Unity Catalog column masking and row-level security (₹250Cr security penalty shield), automated breach detection alerting on PII access (₹200Cr notification penalty shield), and automated rights fulfillment pipelines with 7-day SLA (₹50Cr rights penalty shield). Each technical control maps directly to a penalty category.

Sinki.ai’s DPDP compliance suite (Audit Gap Finder, Consent Manager, and Data Erasure) directly addresses the top 3 penalty exposures natively inside your Databricks workspace.

Significant Data Fiduciary Under DPDP: Obligations and Impact (2026)

There is a tier of DPDP compliance where the standard obligation set is not enough. For organizations designated as Significant Data Fiduciaries (SDFs) under Section 10 of the DPDP Act, the rules are different — harder, more specific, and enforced with a separate penalty tier of up to ₹150 crore on top of the standard schedule.

Most large Indian enterprises in BFSI, healthtech, and e-commerce will face SDF designation. Most of them have not yet assessed what that designation technically demands from their Databricks estate.

This guide closes that gap.

What you will master in this guide:

  • The 5 criteria that trigger SDF designation
  • The 5 additional obligations SDFs must fulfill beyond the standard tier
  • The specific Databricks configuration changes each SDF obligation requires
  • The penalty structure that applies exclusively to SDF violations

For the full DPDP compliance architecture, return to the DPDP readiness on Databricks: complete guide 2026.

What Is a Significant Data Fiduciary Under DPDP — and How Is It Designated?

Significant Data Fiduciary (SDF) is a Data Fiduciary designated by the Central Government under Section 10 of the DPDP Act, 2023 and Rule 13 of the DPDP Rules, 2025.

Designation is triggered by an assessment of 5 criteria. Any one criterion is sufficient for designation — organizations do not need to meet all 5:

  • Volume of personal data processed — no specific threshold has been published, but large consumer platforms processing millions of Indian users are the primary target → BFSI, e-commerce, and telecom enterprises with 10M+ users are at elevated designation risk
  • Sensitivity of personal data — organizations processing financial, health, or children’s data are at elevated risk → A healthtech processing 1 million patient records faces the same designation pressure as a platform 10x its size
  • Potential risk to rights of data principals — processing that enables discrimination, identity theft, or financial harm at scale → Credit scoring, insurance underwriting, and behavioral ad targeting models qualify
  • Potential impact on sovereignty and integrity of India — cross-border data flows involving sensitive national datasets → Any organization transferring large-scale Indian data to foreign jurisdictions
  • Risk to electoral democracy — organizations with access to voter-linked data or political behavioral profiles → Social platforms and political analytics firms face heightened scrutiny under this criterion

SDF designation is not a question of “if” for large Indian enterprises — it is a question of “when.”

What Are the 5 Additional Obligations for Significant Data Fiduciaries?

SDFs must fulfill all standard Data Fiduciary obligations plus 5 additional requirements specific to their designation:

1. India-Resident Data Protection Officer (DPO): The DPO must be based in India, report to the highest management level (board or CEO), and be the primary point of contact for data principals and the DPBI. This is not a part-time CISO responsibility. The DPO must have direct authority over data processing decisions and the ability to halt non-compliant processing activities.

On Databricks: the DPO must have query access to Unity Catalog audit logs, the consent store, rights request logs, and erasure certificates — without depending on engineering tickets. A DPO who cannot independently verify compliance cannot fulfill their SDF obligation.

2. Annual Data Protection Impact Assessment (DPIA): SDFs must conduct an annual DPIA covering all significant processing activities, identifying risks to data principals’ rights, and documenting controls in place. The DPIA must assess both existing and new processing activities.

On Databricks: the DPIA requires a complete inventory of all tables containing personal data, the processing pipelines that access them, and the consent records linked to each. Sinki.ai’s Audit Gap Finder provides this inventory automatically — running PII classification across 30+ sources within your Unity Catalog without moving data outside your workspace.

3. Annual Independent Data Audit: SDFs must engage an independent auditor to audit their data processing activities, consent architecture, rights fulfillment records, and security controls annually. The auditor’s report is submitted to the DPBI.

On Databricks: the audit requires exportable, immutable evidence of every compliance-relevant event — consent capture, rights fulfillment, erasure completion, breach detection and notification. Passive audit logs are not sufficient. The estate must generate audit-ready reports on demand.
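One common way to make an exported audit trail tamper-evident is to chain each entry to the hash of the previous one. The sketch below shows that technique in plain Python. It is an assumption about how "immutable evidence" might be demonstrated to an auditor, not something the Act or Databricks prescribes, and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64

def append_event(chain, event):
    """Append an event whose hash also covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    payload = json.dumps(event, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    chain.append({"event": event, "prev_hash": prev_hash, "hash": entry_hash})
    return chain

def verify_chain(chain):
    """Recompute every hash in order; an edited entry makes verification fail."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

trail = []
append_event(trail, {"type": "consent_captured", "principal_id": "p-001"})
append_event(trail, {"type": "erasure_completed", "principal_id": "p-001"})
intact = verify_chain(trail)

# Simulate tampering with an already-recorded event.
trail[0]["event"]["type"] = "tampered"
intact_after_edit = verify_chain(trail)
```

Note that a hash chain alone does not reveal truncation of the most recent entries; publishing or separately storing the latest hash at export time is one way to cover that case.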

4. Data Localization (If Notified): The Central Government may notify specific categories of personal data that SDFs must store and process within India. No such notification has been issued as of May 2026, but the architecture for data localization (India-region Databricks workspace deployment, restricted data egress controls) must be ready to activate on short notice.

5. Algorithmic Accountability: SDFs that use automated decision-making systems (ML models, recommendation engines, fraud detection algorithms) must document the logic, assess the impact on data principals, and ensure the ability to explain decisions to individuals who request it.

On Databricks: this requires MLflow model documentation, lineage tracking between training data and model outputs, and a process for explaining individual predictions in response to data principal requests.

Significant Data Fiduciary Obligations and Databricks Configuration Requirements

SDF Obligation | Standard Databricks | SDF-Compliant Databricks Configuration
India-resident DPO | No specific configuration | DPO dashboard with direct Unity Catalog audit query access; self-service rights request verification
Annual DPIA | No specific configuration | Automated PII inventory from Audit Gap Finder; processing activity register in Unity Catalog
Independent data audit | Passive audit log storage | Exportable immutable audit trail; compliance report generation on demand
Data localization | No specific configuration | India-region workspace deployment; data egress restrictions in Unity Catalog
Algorithmic accountability | Standard MLflow logging | Extended MLflow documentation; decision explanation pipeline; training data lineage

What Are the Penalties for SDF Non-Compliance?

The SDF penalty tier is separate from and cumulative with the standard penalty schedule:

  • Violation of SDF-specific obligations: up to ₹150 crore
  • This stacks on top of any standard violation — an SDF that also fails on security safeguards faces ₹250 crore + ₹150 crore = ₹400 crore combined exposure

The SDF penalty is not the largest in the schedule — but it is the most avoidable. Unlike a breach (which is an operational event), SDF non-compliance results from not appointing a DPO, not conducting a DPIA, or not engaging an auditor. These are organizational decisions, not system failures.

An SDF designation with no DPO appointed is a ₹150 crore liability that requires a single hiring decision to remove.

Final Verdict

Significant Data Fiduciary designation is coming for the largest data processors in India. The obligations it triggers — DPO, annual DPIA, independent audit, algorithmic accountability — are not incremental additions to a standard compliance program. They require dedicated organizational infrastructure and Databricks configurations that standard deployments do not have.

The enterprises that conduct their SDF self-assessment now, build the DPO access infrastructure, and configure audit-ready reporting before designation are the ones that will absorb the additional obligations without operational disruption. The enterprises that receive designation notices without those systems in place face ₹150 crore in additional exposure on top of their existing standard obligation gaps.

FAQ: Significant Data Fiduciary Under DPDP

What is a Significant Data Fiduciary under DPDP?

An organization designated by the Central Government under Section 10 of the DPDP Act based on the volume and sensitivity of data processed, risk to data principals’ rights, and potential national security or electoral implications. SDFs face 5 additional obligations beyond the standard Data Fiduciary tier.

How is an organization designated as an SDF?

The Central Government issues a notification designating specific entities or classes of entities as SDFs based on the Rule 13 criteria. Organizations cannot self-designate or self-exempt. Large BFSI, healthtech, and consumer platform enterprises are the most likely candidates for early designation.

What are the SDF obligations under DPDP?

Appointment of an India-resident DPO with board-level reporting, annual Data Protection Impact Assessments, annual independent data audits, potential data localization requirements upon notification, and algorithmic accountability documentation for automated decision-making systems.

What is the penalty for SDF non-compliance?

Up to ₹150 crore for violation of SDF-specific obligations under Section 10. This penalty is cumulative with standard DPDP penalties — an SDF that also has a security safeguard failure faces ₹250 crore + ₹150 crore in combined exposure.

Does DPDP require SDFs to store data in India?

Data localization requirements for SDFs are subject to Central Government notification. No such notification has been issued as of May 2026. However, SDF-designated organizations should architect their Databricks deployment to support India-region data residency as a precautionary measure.

How does SDF designation impact a Databricks-based data estate?

SDFs need a DPO-accessible audit query interface, automated PII inventory for annual DPIAs, exportable compliance audit reports for independent auditors, India-region deployment readiness, and MLflow-based algorithmic accountability documentation. None of these exist in a standard Databricks deployment without deliberate configuration.

Sinki.ai’s Audit Gap Finder delivers the SDF-required PII inventory and audit trail natively inside your Databricks workspace: no data egress, no manual classification, enforcement-ready.

How to Manage DPDP Consent on Databricks: Consent Store Pattern (2026)

A Databricks pipeline that processes personal data without a linked consent record is not just non-compliant. It is processing data that — under the DPDP Act — it has no legal right to touch. Every row it reads, transforms, or aggregates is unlawful processing. At enterprise volume, that is not a compliance gap. It is a systemic liability.

The consent store pattern is the architecture that closes it. Not a consent checkbox in a UI. Not a privacy policy timestamp in a database. A Delta table-based consent ledger, natively deployed inside your Databricks workspace, that every downstream pipeline joins before it runs — and that stops automatically when a data principal withdraws.

What you will master in this guide:

  • Why a standard Databricks deployment has no consent enforcement layer by default
  • The exact Delta table schema for a DPDP-compliant consent store
  • How consent-filtered views work and why they are the correct enforcement pattern
  • The revocation workflow architecture that makes withdrawal immediate across all pipelines
  • How Sinki.ai’s Consent Manager deploys this pattern without data leaving your workspace

For the full technical architecture, read implementing DPDP readiness on Databricks: architecture reference. For the obligation that makes consent infrastructure mandatory, read DPDP Act 2023 requirements and commencement timeline.

Why Does a Standard Databricks Deployment Have No Consent Enforcement?

Standard Databricks deployments are built for analytical performance — fast reads, high-throughput writes, efficient aggregations. Consent is an application-layer concept. The Databricks lakehouse, by default, has no knowledge of whether the data principal whose record is being processed has consented to that specific processing activity.

“The Consent Blindspot”: a Databricks estate where Unity Catalog governs who can access data (access controls) but has no mechanism to enforce whether that data can be processed (consent controls). Access control and consent control are not the same thing. A pipeline can be fully authorized by Unity Catalog role-based access and still be processing data from a principal who withdrew consent 3 days ago.

Paytm processes financial transaction data for over 300 million users. A standard Unity Catalog deployment can enforce that only the fraud detection team accesses fraud-related tables. It cannot enforce that only users who have active, non-withdrawn consent for fraud detection processing have their records included in fraud model training runs. That is the consent blindspot — and it exists in every standard Databricks deployment.

DPDP requires consent control, not just access control. Every pipeline that touches personal data needs to know the consent status of the data principal before the first row is read.

What Is the DPDP Consent Store Pattern on Databricks?

The consent store pattern is the architecture that adds consent control to a Databricks lakehouse. It has 3 components:

Component 1 — The Consent Delta Table (Consent Ledger)

A Delta table storing one record per consent event. The schema must capture:

Column | Type | Description
principal_id | STRING | Hashed or tokenized data principal identifier
purpose_code | STRING | Specific processing purpose (e.g., fraud_detection, marketing_targeting)
consent_version | STRING | Version of the notice under which consent was given
consent_status | STRING | active, withdrawn, or expired
consent_timestamp | TIMESTAMP | When consent was given
withdrawal_timestamp | TIMESTAMP | When consent was withdrawn (null if active)
channel | STRING | Channel through which consent was captured (app, web, SMS)
notice_language | STRING | Language of the notice presented

Every consent event — capture, renewal, and withdrawal — writes a new record. The table is append-only. Historical consent state is fully queryable for audit purposes.
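Because the ledger is append-only, the "current" consent state is never stored directly; it is derived from the most recent event for a principal and purpose. The sketch below models that in plain Python using the schema fields above. The `ConsentEvent` class, the `current_status` helper, and the rule that a withdrawal event is ordered by its withdrawal timestamp are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class ConsentEvent:
    principal_id: str
    purpose_code: str
    consent_version: str
    consent_status: str  # 'active', 'withdrawn', or 'expired'
    consent_timestamp: datetime
    withdrawal_timestamp: Optional[datetime] = None
    channel: str = "app"
    notice_language: str = "en"

    @property
    def event_time(self):
        # Assumption: a withdrawal event is ordered by its withdrawal timestamp.
        return self.withdrawal_timestamp or self.consent_timestamp

def current_status(ledger, principal_id, purpose_code):
    """Return the status of the most recent event, or None if none exists."""
    events = [e for e in ledger
              if e.principal_id == principal_id and e.purpose_code == purpose_code]
    if not events:
        return None
    return max(events, key=lambda e: e.event_time).consent_status

ledger = []  # append-only: events are added, never updated in place
ledger.append(ConsentEvent("p-001", "fraud_detection", "1.0", "active",
                           datetime(2026, 1, 10, 8, 0)))
ledger.append(ConsentEvent("p-001", "fraud_detection", "1.0", "withdrawn",
                           datetime(2026, 1, 10, 8, 0),
                           withdrawal_timestamp=datetime(2026, 3, 2, 14, 30)))
status = current_status(ledger, "p-001", "fraud_detection")
```

The original 'active' row stays in the ledger after withdrawal, which is what makes historical state queryable, and also why any consumer must look at the latest event rather than at any row marked active.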

Component 2 — Consent-Filtered Views

Every silver-layer table containing personal data is accessed through a consent-filtered view: a view that joins the underlying table to the consent store and returns only records with active, non-expired consent for the specific processing purpose.

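CREATE OR REPLACE VIEW silver.user_transactions_consent_filtered AS
-- The ledger is append-only, so evaluate only the latest event per principal.
WITH latest_consent AS (
  SELECT principal_id, consent_status,
         ROW_NUMBER() OVER (
           PARTITION BY principal_id
           ORDER BY COALESCE(withdrawal_timestamp, consent_timestamp) DESC
         ) AS rn
  FROM gold.consent_store
  WHERE purpose_code = 'transaction_analytics'
)
SELECT t.*
FROM silver.user_transactions t
INNER JOIN latest_consent cs
  ON t.principal_id = cs.principal_id
WHERE cs.rn = 1 AND cs.consent_status = 'active';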

Every downstream pipeline reads from the consent-filtered view — never from the raw table directly. A principal who withdraws consent disappears from the view automatically, without pipeline modification.
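The behaviour described here can be tried locally before touching a workspace. The sketch below uses Python's built-in `sqlite3` as a stand-in for Delta tables, purely to illustrate the pattern: table names are shortened, the ledger uses a single hypothetical `event_ts` column instead of separate consent and withdrawal timestamps, and the view keeps a row only when the principal's latest event for the purpose is active.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_transactions (principal_id TEXT, amount REAL);
CREATE TABLE consent_store (
    principal_id TEXT, purpose_code TEXT, consent_status TEXT, event_ts TEXT
);
-- Keep a row only when the principal's latest event for the purpose is 'active'.
CREATE VIEW user_transactions_consent_filtered AS
SELECT t.*
FROM user_transactions t
JOIN consent_store cs ON cs.principal_id = t.principal_id
WHERE cs.purpose_code = 'transaction_analytics'
  AND cs.consent_status = 'active'
  AND cs.event_ts = (
      SELECT MAX(c2.event_ts) FROM consent_store c2
      WHERE c2.principal_id = cs.principal_id
        AND c2.purpose_code = cs.purpose_code
  );
INSERT INTO user_transactions VALUES ('p-001', 120.0), ('p-002', 75.5);
INSERT INTO consent_store VALUES
  ('p-001', 'transaction_analytics', 'active', '2026-01-10T08:00:00'),
  ('p-002', 'transaction_analytics', 'active', '2026-01-11T09:00:00');
""")

def visible_principals():
    """Principal IDs currently returned by the consent-filtered view."""
    rows = conn.execute(
        "SELECT principal_id FROM user_transactions_consent_filtered "
        "ORDER BY principal_id"
    )
    return [r[0] for r in rows]

before = visible_principals()
# Withdrawal is a new ledger row: no UPDATE and no pipeline change.
conn.execute(
    "INSERT INTO consent_store VALUES "
    "('p-002', 'transaction_analytics', 'withdrawn', '2026-03-02T14:30:00')"
)
after = visible_principals()
```

After the withdrawal row is appended, `p-002` drops out of the view on the next read even though its earlier 'active' row is still present in the ledger.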

Component 3 — The Revocation Cascade Workflow

Consent withdrawal must propagate immediately. When a data principal withdraws consent through your application or through a registered Consent Manager, the workflow:

  1. Writes a withdrawal record to the consent store Delta table (consent_status = 'withdrawn', withdrawal_timestamp populated) → This is the source-of-truth update; all downstream effects flow from this single record
  2. Consent-filtered views automatically exclude the principal’s records on the next read — no pipeline modification required → The pipeline does not need to know about the withdrawal — the view handles it
  3. A Databricks job triggers to verify the principal’s records are excluded from all active processing pipelines → Verification is automated — not dependent on engineering confirmation
  4. An erasure job evaluates whether the withdrawal also triggers a right-to-erasure obligation → Withdrawal and erasure are separate obligations — both must be evaluated on every revocation event
  5. The withdrawal event is logged to the audit trail with the exact timestamp → DPBI audit evidence requires the withdrawal timestamp to be immutable and queryable

The revocation cascade is event-driven, not batch. A withdrawal processed overnight is a DPDP violation.
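The five steps above can be sketched as a single event handler. This is an orchestration outline in plain Python, not a Databricks job definition: lists stand in for append-only tables, and `process_withdrawal`, `active_views`, and `erasure_required` are hypothetical names introduced for illustration.

```python
from datetime import datetime, timezone

def process_withdrawal(principal_id, purpose_code, ledger, active_views,
                       audit_log, erasure_required):
    """Run the withdrawal steps in order and return a summary.

    ledger, audit_log: lists used as stand-ins for append-only tables
    active_views: dict of view name -> callable returning visible principal IDs
    erasure_required: callable deciding whether erasure must also be evaluated
    """
    now = datetime.now(timezone.utc)
    # Step 1: write the withdrawal record (the source-of-truth update).
    ledger.append({"principal_id": principal_id, "purpose_code": purpose_code,
                   "consent_status": "withdrawn", "withdrawal_timestamp": now})
    # Steps 2-3: views filter on the ledger; verify the principal is excluded.
    leaks = [name for name, read in active_views.items() if principal_id in read()]
    # Step 4: withdrawal and erasure are separate obligations; evaluate both.
    needs_erasure = erasure_required(principal_id, purpose_code)
    # Step 5: log the event with its exact timestamp.
    audit_log.append({"event": "consent_withdrawn", "principal_id": principal_id,
                      "purpose_code": purpose_code, "at": now.isoformat(),
                      "views_still_exposing": leaks,
                      "erasure_queued": needs_erasure})
    return {"verified_excluded": not leaks, "erasure_queued": needs_erasure}

ledger, audit_log = [], []

def filtered_view():
    # Stand-in for a consent-filtered view: hides principals with a withdrawal.
    withdrawn = {e["principal_id"] for e in ledger
                 if e["consent_status"] == "withdrawn"}
    return {"p-001", "p-002"} - withdrawn

result = process_withdrawal(
    "p-002", "transaction_analytics", ledger,
    {"silver.user_transactions_consent_filtered": filtered_view},
    audit_log, erasure_required=lambda pid, purpose: True,
)
```

Recording `views_still_exposing` in the audit entry means a failed verification leaves evidence of its own instead of disappearing silently.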

What Does DPDP Require From Consent Notices and How Does This Affect Your Architecture?

DPDP consent is only valid if it follows a valid notice. The notice requirements create specific technical obligations:

  • Language: Notices must be available in all 22 scheduled Indian languages on request → Your notice engine must support dynamic language selection, not a static English-only privacy policy
  • Purpose specificity: Each processing purpose requires a separate, named notice and a separate consent record → A blanket “we use your data to improve our services” notice does not create valid DPDP consent for any specific purpose
  • Notice versioning: When a notice is updated, affected data principals must be re-notified and must re-consent → Your consent store must track notice versions — consent given under version 1.0 does not authorize processing under version 2.0
  • Consent Manager integration: From November 13, 2026, consent must be receivable from registered third-party Consent Managers → Your consent store must expose an API that accepts consent signals from external platforms, not just from your own application layer

Consent Requirement | Standard Implementation | DPDP-Compliant Implementation
Notice language | English-only, static | Dynamic, 22 scheduled Indian languages on request
Purpose specificity | Blanket consent statement | Per-purpose consent records with named purpose codes
Consent versioning | Single timestamp | Notice version tracked per consent record; re-consent on update
Withdrawal handling | Account deletion or opt-out flag | Consent store write + cascade to all pipelines immediately
Consent Manager integration | First-party only | API endpoint accepting external Consent Manager signals
Audit trail | Application log | Immutable Delta consent ledger with full event history
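Notice versioning has a direct operational consequence: whenever a notice changes, someone has to work out who must be asked again. A minimal sketch of that check follows, assuming the input already holds the latest consent event per principal and purpose; the function name and record shape are illustrative.

```python
def principals_needing_reconsent(latest_consents, current_notice_versions):
    """Return (principal_id, purpose_code) pairs with consent under an old notice.

    latest_consents: list of dicts, the latest event per principal and purpose
    current_notice_versions: dict mapping purpose_code -> current notice version
    """
    stale = []
    for c in latest_consents:
        current = current_notice_versions.get(c["purpose_code"])
        if c["consent_status"] == "active" and c["consent_version"] != current:
            stale.append((c["principal_id"], c["purpose_code"]))
    return stale

# Hypothetical records: the marketing notice has moved from version 1.0 to 2.0.
consents = [
    {"principal_id": "p-001", "purpose_code": "marketing_targeting",
     "consent_version": "1.0", "consent_status": "active"},
    {"principal_id": "p-002", "purpose_code": "marketing_targeting",
     "consent_version": "2.0", "consent_status": "active"},
    {"principal_id": "p-003", "purpose_code": "marketing_targeting",
     "consent_version": "1.0", "consent_status": "withdrawn"},
]
stale = principals_needing_reconsent(consents, {"marketing_targeting": "2.0"})
```

Principals who have already withdrawn are skipped because there is no active consent left to renew.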

How Does Sinki.ai’s Consent Manager Deploy This Pattern?

Sinki.ai’s Consent Manager product deploys the full consent store pattern natively inside your Databricks workspace — zero data movement, zero external APIs routing your consent records through third-party infrastructure.

The deployment covers:

  • Delta consent ledger instantiation with DPDP-compliant schema → Pre-built, tested schema with all required fields and append-only write configuration
  • Automated consent-filtered view generation for all PII-tagged tables in Unity Catalog → No manual view creation — Audit Gap Finder tags identify the tables, Consent Manager generates the views
  • Multi-lingual notice engine with 22 scheduled Indian language support → Dynamic notice delivery without maintaining 22 separate notice templates manually
  • Revocation cascade workflow configuration → Event-driven, not scheduled — withdrawal takes effect on the next pipeline read
  • Consent Manager API endpoint for Phase 2 external integration → Ready for November 2026 Consent Manager framework activation from day one

The zero-data-movement principle is critical. Your consent records are the most sensitive records in your estate — they map personal identities to specific data processing activities. Sinki.ai’s Consent Manager keeps consent data inside your Databricks workspace. No consent record ever leaves your environment.

Final Verdict

The consent store pattern is not optional architecture for DPDP on Databricks. It is the technical layer that transforms a well-governed analytics platform into a legally compliant one. Without it, every pipeline run against personal data is a potential consent violation — regardless of how clean the pipeline is or how well the access controls are configured.

The Delta consent ledger, consent-filtered views, and revocation cascade workflow are buildable inside a standard Databricks workspace. Sinki.ai’s Consent Manager deploys this pattern without data egress, with multi-lingual notice support, and with Phase 2 Consent Manager API readiness out of the box.

Every day your silver-layer pipelines run without consent-filtered views is a day of unlawful processing under DPDP — at whatever volume your estate processes.

FAQ: DPDP Consent Management on Databricks

What is the consent store pattern on Databricks?

A Delta table-based consent ledger deployed inside your Databricks workspace that stores one record per consent event (principal ID, purpose, notice version, status, timestamps). Silver-layer tables use consent-filtered views that join to this ledger, ensuring only records with active consent are processed by downstream pipelines.

What does DPDP require for consent on a Databricks platform?

Purpose-specific consent records linked to every processing pipeline, multi-lingual notice delivery (22 scheduled Indian languages on request), consent version tracking, immediate revocation cascade across all pipelines, and from November 2026, an API endpoint accepting consent signals from registered third-party Consent Managers.

How do consent-filtered views work in Databricks for DPDP? 

A consent-filtered view is a SQL view over your Delta tables that joins a personal data table to the consent store on principal ID and purpose code. Because the consent ledger is append-only, the view evaluates the most recent consent event for each principal and purpose, and returns only records where that event has consent_status = 'active' and no withdrawal timestamp. Downstream pipelines read from the view, never the raw table, so withdrawn consent automatically excludes a principal’s records without pipeline modification.
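
A minimal sketch of such a view follows. It uses Python's built-in sqlite3 as a stand-in so it runs anywhere; on Databricks the same SQL shape would be a view over Delta tables. The table names, column names, and purpose code are assumptions for the example. Since the ledger described in this article is append-only, the view keys on the latest event per principal and purpose rather than on any single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Append-only consent ledger (hypothetical schema); event_id orders events.
CREATE TABLE consent_ledger (
    event_id       INTEGER PRIMARY KEY,
    principal_id   TEXT,
    purpose_code   TEXT,
    consent_status TEXT
);

-- Stand-in for a silver-layer table holding personal data.
CREATE TABLE silver_customers (principal_id TEXT, email TEXT);

-- Consent-filtered view: keep a row only if the LATEST ledger event
-- for that principal and purpose is 'active'.
CREATE VIEW silver_customers_marketing AS
SELECT c.principal_id, c.email
FROM silver_customers c
JOIN consent_ledger l ON l.principal_id = c.principal_id
WHERE l.purpose_code = 'MARKETING_EMAIL'
  AND l.consent_status = 'active'
  AND l.event_id = (
      SELECT MAX(event_id) FROM consent_ledger
      WHERE principal_id = c.principal_id
        AND purpose_code = 'MARKETING_EMAIL'
  );
""")

conn.executemany(
    "INSERT INTO silver_customers VALUES (?, ?)",
    [("p-001", "a@example.com"), ("p-002", "b@example.com")],
)
conn.executemany(
    "INSERT INTO consent_ledger (principal_id, purpose_code, consent_status) "
    "VALUES (?, ?, ?)",
    [
        ("p-001", "MARKETING_EMAIL", "active"),
        ("p-002", "MARKETING_EMAIL", "active"),
        ("p-002", "MARKETING_EMAIL", "withdrawn"),  # later event wins
    ],
)

visible = [
    row[0]
    for row in conn.execute(
        "SELECT principal_id FROM silver_customers_marketing ORDER BY principal_id"
    )
]
print(visible)  # ['p-001']
```

The pipeline never sees p-002, even though an older 'active' row for that principal still exists in the ledger for audit purposes.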

How does consent revocation work on Databricks under DPDP? 

Withdrawal appends a new record to the consent store Delta table with consent_status = 'withdrawn' and a populated withdrawal_timestamp. Consent-filtered views exclude the principal from the next pipeline read onward. An event-driven job then verifies exclusion across all active pipelines and evaluates whether erasure obligations are triggered.

What is the difference between access control and consent control on Databricks? 

Access control (Unity Catalog RBAC, column masking, row-level security) governs who can read a table. Consent control governs whether a specific data principal’s records can be processed by an authorized pipeline. Both are required for DPDP compliance — Unity Catalog access control alone is insufficient.

How does Phase 2 Consent Manager integration work on Databricks?

From November 13, 2026, Data Fiduciaries must accept consent signals from registered third-party Consent Manager platforms. On Databricks, this requires an API endpoint that receives external consent events and writes them to the internal consent store Delta table in the correct schema. Sinki.ai’s Consent Manager includes this endpoint as part of its deployment.

Sinki.ai’s Consent Manager Deploys the Full Consent Store Pattern

Delta ledger, consent-filtered views, revocation cascade, and Consent Manager API — natively inside your Databricks workspace with zero data movement.

DPDP Readiness Roadmap: Implementation and Audit Preparation (2026)

Your GDPR compliance program will not save you under DPDP. Not because the frameworks are incompatible — they share surface-level concepts like consent, data subject rights, and breach notification. They diverge precisely where it hurts most: at the data architecture level, where the infrastructure differences require new builds, not policy rewrites.

This is “The GDPR Assumption”: the belief that existing GDPR-compliant architecture can be repurposed for DPDP with minimal changes. It is the most expensive assumption an Indian enterprise with a European compliance history can make in 2026.

This guide covers the 4 structural differences between DPDP and GDPR — and exactly what each one demands from your data platform.

What you will master in this guide:

  • Why DPDP’s digital-only scope changes your ingestion layer design
  • Why the absence of legitimate interest as a lawful basis rebuilds your consent architecture from scratch
  • What the Consent Manager role means for your data pipeline orchestration
  • Why the absence of a sensitive data sub-category in DPDP changes your PII tagging strategy

For the full DPDP obligation framework, read the DPDP readiness on Databricks: complete guide 2026.

Difference 1: DPDP Is Digital-Only — GDPR Covers Everything

GDPR applies to all personal data, regardless of format: digital files, paper records, CCTV footage, physical HR files. DPDP applies exclusively to digital personal data — data that exists in digital form, or data that was originally non-digital and has been digitized.

This sounds like a narrower scope. It isn’t — not for a Databricks-based enterprise. Here’s why that matters.

Under GDPR, data architecture teams frequently argue that certain processing activities fall outside scope because they involve offline-only data paths. Under DPDP, that argument is not available. Every digital touchpoint is in scope. Every API call that moves personal data, every Spark job that processes it, every Delta table that stores it — all of it is subject to the Act the moment it exists in digital form.

The architectural implication: your bronze-layer ingestion pipeline must classify and tag personal data as DPDP-governed from the moment it arrives, without exception. The GDPR-era practice of selective scoping — treating some digital processing as out-of-scope because it connects to offline processes — does not apply under DPDP.
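
As a rough illustration of tagging at ingestion, the sketch below classifies incoming column names before a bronze table is written. The name patterns and tag values are hypothetical, and name matching alone will produce both false positives and misses, so a real classifier would also need content inspection and human review. Applying the resulting tags in Unity Catalog is left out.

```python
import re
from typing import Dict, List

# Hypothetical name patterns; tune these for your own estate.
PERSONAL_DATA_PATTERNS = [
    r"name",
    r"phone|mobile",
    r"email",
    r"address",
    r"dob|birth",
]


def tag_columns(columns: List[str]) -> Dict[str, str]:
    """Tag each incoming column as DPDP-governed or not, by column name only."""
    tags = {}
    for col in columns:
        is_personal = any(
            re.search(pattern, col.lower()) for pattern in PERSONAL_DATA_PATTERNS
        )
        tags[col] = "dpdp_personal_data" if is_personal else "non_personal"
    return tags


tags = tag_columns(["customer_name", "mobile_no", "order_total", "email_id"])
for column, tag in tags.items():
    print(column, "->", tag)
```

Running the classifier inside the ingestion job, rather than as a later batch scan, is what keeps untagged personal data from ever landing in the bronze layer.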

If your Databricks estate has even 1 digital touchpoint with Indian residents’ personal data, the Act applies. Full stop.

Difference 2: No Legitimate Interest — Consent Is the Only Basis That Scales

GDPR provides 6 lawful bases for processing: consent, contract, legal obligation, vital interests, public task, and legitimate interests. Most enterprise processing programs rely heavily on legitimate interests — it is the most flexible basis and the hardest for data subjects to challenge.

DPDP eliminates this. Consent is the dominant lawful basis. The Act does recognize limited exemptions — state functions, certain research and archiving activities, employment-related processing — but none of these substitute for consent in a commercial enterprise context.

The result: if your GDPR program was designed around legitimate interests for marketing, analytics, behavioral profiling, or cross-product personalization, every one of those processing activities requires a consent record under DPDP. There is no migration path — you build a consent architecture from scratch.

On Databricks, this means:

  • Every pipeline processing Indian residents’ personal data must join to a consent store before executing → A Spark job with no consent check is DPDP non-compliant regardless of technical quality
  • Consent must be purpose-specific — one consent record per processing purpose, not a blanket approval → “We use your data to improve our services” is not a DPDP-compliant consent statement
  • Consent must be withdrawable with the same ease as it was given — and withdrawal must cascade to downstream processing immediately → This requires event-driven pipeline architecture, not batch consent syncs
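
The difference between an event-driven cascade and a batch consent sync can be sketched with a small in-process event bus. Everything here is illustrative: the class, the event name consent_withdrawn, and the in-memory blocked set stand in for whatever messaging and state layer your platform actually uses.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple


class ConsentEventBus:
    """Tiny synchronous publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, event: dict) -> None:
        # Handlers run as soon as the event is published, not on a schedule.
        for handler in self._handlers[event_type]:
            handler(event)


# (principal_id, purpose_code) pairs that pipelines must skip.
blocked: Set[Tuple[str, str]] = set()


def block_processing(event: dict) -> None:
    blocked.add((event["principal_id"], event["purpose_code"]))


def pipeline_may_process(principal_id: str, purpose_code: str) -> bool:
    return (principal_id, purpose_code) not in blocked


bus = ConsentEventBus()
bus.subscribe("consent_withdrawn", block_processing)

# A withdrawal takes effect the moment it is published.
bus.publish(
    "consent_withdrawn",
    {"principal_id": "p-002", "purpose_code": "MARKETING_EMAIL"},
)

print(pipeline_may_process("p-002", "MARKETING_EMAIL"))  # False
print(pipeline_may_process("p-002", "ANALYTICS"))        # True
```

Because the block is keyed on principal and purpose together, withdrawing consent for one purpose does not silently stop processing that the principal still consents to.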

The absence of legitimate interest under DPDP is not a technicality. It rebuilds your processing justification layer from zero.

Difference 3: The Consent Manager Role — No GDPR Equivalent

GDPR introduced the concept of a Data Protection Officer (DPO). DPDP introduces something GDPR has no equivalent for: the Consent Manager — a government-registered intermediary that manages the relationship between Data Fiduciaries and Data Principals on consent.

Under DPDP Phase 2 (effective November 13, 2026), Data Fiduciaries must be able to interact with registered Consent Managers — third-party platforms that handle consent capture, storage, and revocation on behalf of data principals in an interoperable, standardized way.

Here is where the architecture diverges from GDPR entirely. A GDPR-compliant consent management tool is a first-party system — your organization builds and owns it. A DPDP-compliant consent infrastructure must be able to receive, process, and honor consent signals from external registered Consent Managers. Your pipeline cannot assume it owns the consent record.

Feature | GDPR Consent Management | DPDP Consent Management
Consent infrastructure | First-party, organization-owned | Must integrate with registered third-party Consent Managers
Lawful basis alternatives | 6 bases including legitimate interest | Consent-dominant; narrow statutory exemptions only
Consent language requirements | Local language encouraged | 22 scheduled Indian languages on request (mandatory)
Withdrawal handling | Must be honored | Must be honored AND cascade to all downstream processing
Intermediary role | No equivalent | Consent Manager (registered, interoperable, Phase 2 mandatory)
Scope | All personal data (digital and physical) | Digital personal data only

The architectural implication for your Databricks estate: your consent store must be designed to receive consent signals from external Consent Managers — not just from your own application layer. A closed consent store that only accepts first-party signals is not DPDP-compliant after November 2026.
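
One way to picture an open consent store is a normalization step that turns an external signal into an internal ledger record. The sketch below assumes a payload shape (consent_manager_id, principal_ref, purpose, action, notice_version) purely for illustration; it does not reflect any published Consent Manager wire format, and the internal field names are equally hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict

# Assumed payload fields; not taken from any official specification.
REQUIRED_FIELDS = (
    "consent_manager_id", "principal_ref", "purpose", "action", "notice_version",
)
ACTION_TO_STATUS = {"grant": "active", "withdraw": "withdrawn"}


def to_ledger_record(signal: Dict[str, str], received_at: datetime) -> Dict[str, str]:
    """Validate an external consent signal and map it to the internal schema."""
    missing = [field for field in REQUIRED_FIELDS if not signal.get(field)]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if signal["action"] not in ACTION_TO_STATUS:
        raise ValueError(f"unknown action: {signal['action']}")
    return {
        "principal_id": signal["principal_ref"],
        "purpose_code": signal["purpose"],
        "notice_version": signal["notice_version"],
        "consent_status": ACTION_TO_STATUS[signal["action"]],
        # Record where the signal came from so the audit trail can
        # separate first-party events from external ones.
        "source": f"consent_manager:{signal['consent_manager_id']}",
        "event_timestamp": received_at.isoformat(),
    }


record = to_ledger_record(
    {
        "consent_manager_id": "cm-42",
        "principal_ref": "p-002",
        "purpose": "MARKETING_EMAIL",
        "action": "withdraw",
        "notice_version": "2.0",
    },
    received_at=datetime(2026, 11, 13, tzinfo=timezone.utc),
)
print(record["consent_status"], record["source"])
```

Rejecting malformed signals at the boundary keeps the ledger schema uniform, so the same consent-filtered views work regardless of where a consent event originated.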

Difference 4: No Sensitive Data Sub-Category — How DPDP Changes PII Tagging Strategy

GDPR Article 9 establishes a special category of sensitive personal data — racial or ethnic origin, political opinions, religious beliefs, trade union membership, genetic data, biometric data, health data, sex life and sexual orientation — with heightened protection requirements and explicit consent obligations.

DPDP does not create a separate sensitive data category. All personal data is treated under the same consent and protection regime. The Central Government can designate specific categories of data for enhanced protection via notification, but as of 2026 no such category notification has been issued.

This changes your PII tagging strategy on Databricks in a non-obvious way. Under GDPR, most teams built a 2-tier classification: standard PII (less restrictive controls) and special category / sensitive PII (stricter controls, explicit consent required). Under DPDP, that tiering is irrelevant. Every field tagged as personal data carries the same consent and access control obligations. A name and phone number combination demands the same architectural controls as a health record.

The implication: if your Databricks Unity Catalog is configured with GDPR-era tiered PII policies — lighter controls on standard PII, heavier controls on special category data — you need to flatten that architecture for DPDP compliance. Every PII column, regardless of sensitivity, must be consent-linked and access-controlled at the same standard.
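
Flattening can be pictured as a mapping from the old tier tags to one shared control set. The tag names (pii_standard, pii_special_category) and control labels below are assumptions for the example, not Unity Catalog identifiers.

```python
from typing import Dict, FrozenSet

# Hypothetical GDPR-era tier tags found on existing columns.
GDPR_TIER_TAGS = {"pii_standard", "pii_special_category"}

# Single control set applied to every personal data column under DPDP.
DPDP_CONTROLS: FrozenSet[str] = frozenset({"consent_linked", "access_controlled"})


def flatten_policy(column_tags: Dict[str, str]) -> Dict[str, FrozenSet[str]]:
    """Give every PII column the same controls, whatever its old tier was."""
    return {
        column: DPDP_CONTROLS if tag in GDPR_TIER_TAGS else frozenset()
        for column, tag in column_tags.items()
    }


policy = flatten_policy({
    "phone": "pii_standard",
    "diagnosis": "pii_special_category",
    "order_total": "none",
})

print(policy["phone"] == policy["diagnosis"])  # True: tiers no longer differ
```

Keeping the old tier tag as input rather than deleting it lets a team retain the GDPR classification for European data while enforcing the flat standard for Indian residents' data.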

Your GDPR PII governance architecture is a starting point for DPDP. It is not a finish line.

The GDPR-to-DPDP Architecture Migration Checklist

The checklist below turns the four differences above into concrete migration tasks:

  • Audit all processing activities currently justified under legitimate interests — each requires a consent record under DPDP
  • Redesign consent store to accept external signals from registered Consent Managers (Phase 2 deadline: November 2026)
  • Flatten PII governance tiers in Unity Catalog — all personal data fields to carry the same DPDP control set
  • Add multi-lingual notice capability — DPDP requires notices in the data principal’s preferred language on request
  • Configure cascade revocation: consent withdrawal must trigger immediate downstream processing halt across all pipelines
  • Review digital-only scope definition — all digital personal data of Indian residents is in scope regardless of offline connections

Final Verdict

GDPR and DPDP share a vocabulary. They do not share an architecture. Consent-dominant processing, registered Consent Manager integration, uniform PII governance across all data categories, and cascade revocation workflows — none of these exist in a standard GDPR implementation. Every one of them requires new infrastructure.

The organizations that treat DPDP as a GDPR policy update will discover the gap at the worst possible time: under a Data Protection Board of India investigation, with a 72-hour breach notification window running and a consent store that wasn’t designed for Indian enforcement.

Build the DPDP architecture deliberately. “The GDPR Assumption” has a ₹250 crore failure mode.

For the full technical architecture, read implementing DPDP readiness on Databricks: architecture reference.

FAQ: DPDP vs GDPR Differences

Is DPDP similar to GDPR? 

They share surface concepts — consent, data subject rights, breach notification, and data protection officers — but diverge structurally. DPDP is digital-only, consent-dominant (no legitimate interest basis), introduces a unique Consent Manager role, and applies a uniform standard to all personal data without a sensitive data sub-category.

Can existing GDPR compliance infrastructure be used for DPDP?

Partially. GDPR infrastructure covers DPO governance, rights request workflows, and breach notification protocols — all of which map to DPDP requirements. However, your consent architecture, PII tagging strategy, and any processing activity currently justified under legitimate interests must be rebuilt for DPDP compliance.

What is the difference between DPDP consent and GDPR consent?

Both require freely given, specific, informed, and withdrawable consent. The key difference is that DPDP makes consent the dominant lawful basis for commercial processing — GDPR’s legitimate interest basis does not exist under DPDP. DPDP also requires consent to be purpose-specific and to integrate with registered Consent Managers from November 2026.

What is the Consent Manager under DPDP and does GDPR have an equivalent?

The Consent Manager is a government-registered third-party intermediary that manages consent on behalf of data principals. GDPR has no equivalent — under GDPR, consent management is a first-party responsibility. Under DPDP, Data Fiduciaries must be able to interact with external Consent Managers from Phase 2 onwards.

How does DPDP’s digital-only scope differ from GDPR? 

GDPR applies to all personal data regardless of format — digital, paper, CCTV, physical records. DPDP applies only to digital personal data and to non-digital personal data that has been digitized. For Databricks-based enterprises, all digital processing of Indian residents’ personal data is in scope without exception.

Does DPDP have a sensitive data category like GDPR Article 9? 

No. DPDP does not create a separate sensitive data sub-category. The Central Government can designate specific data types for enhanced protection via notification, but no such categories have been notified as of 2026. All personal data carries the same consent and access control obligations under DPDP.

Which is stricter — DPDP or GDPR? 

On consent requirements, DPDP is stricter: it eliminates legitimate interest as a lawful basis, which enterprise GDPR programs rely on heavily. On breach notification, both set 72-hour windows for reporting to the regulator. Neither is categorically “stricter”; they create different obligations that require different infrastructure.

Migrate Your Data Architecture from GDPR to DPDP

Talk to Sinki.ai about migrating your data architecture from GDPR to DPDP — consent store redesign, Unity Catalog PII governance, and Consent Manager integration, all native to your Databricks workspace.