Functional Data Engineering - A Blueprint for adopting functional principles in data pipeline

Ananth Packkildurai

Data Engineer, Slack
Principal Data Engineer, Zendesk
Creator of Schemata, a data contract platform
Author of Data Engineering Weekly

Key Principles of Functional Data Engineering
1. Reproducibility
2. Re-Computability
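The two principles can be illustrated with a small sketch. It assumes a transform written as a pure function, so the output for a partition depends only on the input rows and the partition key `ds`. The function name `build_user_partition` and the row fields are hypothetical and chosen only for illustration; they are not from the deck.

```python
from typing import Dict, Iterable, List


def build_user_partition(rows: Iterable[Dict], ds: str) -> List[Dict]:
    """Pure transform (hypothetical example).

    No clock reads, no random values, no hidden state: the same rows and
    the same ds always produce the same output, so a partition can be
    recomputed at any time and yield an identical result.
    """
    cleaned = [
        {
            "user_id": row["user_id"],
            "user_name": row["user_name"].strip().lower(),
            "ds": ds,
        }
        for row in rows
    ]
    # Sort so the output does not depend on input ordering.
    return sorted(cleaned, key=lambda row: row["user_id"])
```

Running the function twice on the same input gives equal results, which is the property that makes backfills and re-runs safe.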

State of the Data 2023
The Modern Data Cloud = LakeHouse & Warehouse
1. Separation of storage and compute
2. Unlimited scale data repository
3. ACID transaction and mutation support

Schema Classification

DateTime Partition Table Design

Warehouse

    CREATE TABLE dw.user (
        user_id BIGINT,
        user_name STRING,
        created_at DATE
    )
    PARTITIONED BY (ds STRING);  -- ds = date timestamp of the snapshot

LakeHouse

    s3://dw/user/2022-12-20/<all users data at the time of snapshot>
    s3://dw/user/2022-12-21/<all users data at the time of snapshot>
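A minimal sketch of how a snapshot could be written into a date partition so that a re-run replaces the partition instead of appending to it. It uses a local directory in place of `s3://dw`, and the helper names `partition_path` and `write_snapshot`, the JSON format, and the file name `part-0000.json` are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def partition_path(root: Path, table: str, ds: str) -> Path:
    """Map a table and snapshot date to its partition directory."""
    return root / table / ds


def write_snapshot(root: Path, table: str, ds: str, rows: List[Dict]) -> Path:
    """Overwrite the whole ds partition with the given rows."""
    target = partition_path(root, table, ds)
    target.mkdir(parents=True, exist_ok=True)
    # Drop files left by an earlier run so the partition is replaced, not grown.
    for old_file in target.glob("*.json"):
        old_file.unlink()
    out = target / "part-0000.json"
    out.write_text(json.dumps(rows, sort_keys=True))
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_snapshot(Path(tmp), "user", "2022-12-20", [{"user_id": 1}])
        print(path.relative_to(tmp))
```

Because each run rewrites exactly one partition, running the same day twice leaves the table in the same state as running it once.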

Entity Modeling
1. Incremental Snapshot
2. Full Snapshot

Entity Modeling

    CREATE OR REPLACE VIEW dw.user_latest AS
    SELECT user_id, user_name, created_at, ds
    FROM dw.user
    WHERE ds = <current DateTime partition>;
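The view above leaves the current partition as a placeholder. One possible way to resolve it, sketched here in Python over an in-memory stand-in for the table, is to pick the greatest `ds` value. This assumes `ds` is an ISO `YYYY-MM-DD` string, so string order matches date order; the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def latest_partition(partitions: Iterable[str]) -> str:
    """Return the most recent ds, relying on ISO dates sorting lexicographically."""
    return max(partitions)


def user_latest(table: Dict[str, List[Dict]]) -> List[Dict]:
    """Emulate dw.user_latest: every row of the newest snapshot, tagged with ds."""
    ds = latest_partition(table.keys())
    return [dict(row, ds=ds) for row in table[ds]]
```

Consumers that read through such a view always see one complete snapshot, while older partitions stay untouched for historical queries.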

Event Modeling

Key Challenges
1. Late Arriving Data
2. Data Deletion

Apply Window Functions
[Figure: two panels, Tumbling Window and Sliding Window, showing Hour T1, T2, and T3 data feeding the Hour T1, T2, and T3 pipelines]
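A short sketch of the difference between the two window types, assuming half-open windows `[start, end)` aligned to the Unix epoch. The one-hour tumbling size matches the hourly pipelines on the slide; the three-hour sliding size and one-hour slide are illustrative assumptions only.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

EPOCH = datetime(1970, 1, 1)
Window = Tuple[datetime, datetime]


def tumbling_window(ts: datetime, size: timedelta = timedelta(hours=1)) -> Window:
    """Each event belongs to exactly one fixed, non-overlapping window."""
    start = EPOCH + ((ts - EPOCH) // size) * size
    return (start, start + size)


def sliding_windows(
    ts: datetime,
    size: timedelta = timedelta(hours=3),
    slide: timedelta = timedelta(hours=1),
) -> List[Window]:
    """Each event belongs to every overlapping window that contains it."""
    start = EPOCH + ((ts - EPOCH) // slide) * slide
    windows = []
    # Walk backwards over window starts while the window still covers ts.
    while start + size > ts:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)
```

With a tumbling window, data for an hour is read by one pipeline run. With a sliding window, each run also re-reads earlier hours, which gives late data a chance to be picked up.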

Apply Watermark
[Figure: Hour T1 data, the window time, and the point where the Hour T1 pipeline starts]

Adopt Reconciliation
[Figure: Hour T1, T2, and T3 pipelines followed by a reconciliation pipeline]
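Both approaches can be sketched in a few lines. The watermark rule below assumes the watermark is the latest event time seen minus an allowed lateness, and that an hourly pipeline starts once the watermark passes the end of its window. The reconciliation step assumes a later job recomputes each hour and reports the hours whose result no longer matches what was published. All names and the use of row counts as the compared value are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def window_is_ready(
    window_end: datetime,
    max_event_time_seen: datetime,
    allowed_lateness: timedelta,
) -> bool:
    """Start the pipeline only when the watermark has passed the window end."""
    watermark = max_event_time_seen - allowed_lateness
    return watermark >= window_end


def hours_to_republish(
    published: Dict[str, int], recomputed: Dict[str, int]
) -> List[str]:
    """Return the hours whose recomputed value differs from the published one."""
    return sorted(
        hour for hour, value in recomputed.items() if published.get(hour) != value
    )
```

A longer allowed lateness delays each pipeline but catches more late events up front; reconciliation accepts an early, possibly incomplete result and corrects it afterwards.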

Choose your Confidence Window of Correctness

Data Deletion
1. Reprocessing
2. Deletion Audit Log
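The deck names the two techniques without detail, so the following is only one way they might fit together: a deletion audit log records which users must be removed, and the affected partitions are recomputed without those users. The log format (a list of entries with a `user_id` field) and both function names are assumptions.

```python
from typing import Dict, Iterable, List, Set


def deleted_ids(deletion_log: Iterable[Dict]) -> Set[int]:
    """Collect the user ids recorded in the (hypothetical) deletion audit log."""
    return {entry["user_id"] for entry in deletion_log}


def partitions_to_reprocess(
    partitions: Dict[str, List[Dict]], deletion_log: Iterable[Dict]
) -> List[str]:
    """List the ds partitions that still contain at least one deleted user."""
    removed = deleted_ids(deletion_log)
    return sorted(
        ds
        for ds, rows in partitions.items()
        if any(row["user_id"] in removed for row in rows)
    )


def reprocess_partition(rows: List[Dict], deletion_log: Iterable[Dict]) -> List[Dict]:
    """Rebuild a partition without the deleted users; the input is not mutated."""
    removed = deleted_ids(deletion_log)
    return [row for row in rows if row["user_id"] not in removed]
```

Keeping the log separate from the data means the same deletions can be reapplied whenever a partition is recomputed, so a backfill does not bring deleted records back.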

https://schemata.app
https://www.linkedin.com/in/ananthdurai
