Architect your Data for the 21st Century: Hasta la vista, Data Lake?

Frank Munz, @frankmunz

Big Data Processing & Tools

Big Data Processing
• Supercomputers / COTS cluster: HPC with PVM/MPI
• Hadoop cluster: map/reduce
• Apache Spark: Python / Scala or SQL, streaming & batch, on-premises / multi-cloud

Apache Spark: Dataframes and SQL

Apache Spark: Batch and Streaming

Different APIs, Powered by the Same Engine
• Scala DataFrame
• Python DataFrame
• pandas APIs
• SQL Language

Data in AI / Machine Learning

Machine Learning: Classification (twitter: teenybiscuit)

Machine Learning: Forecasting

BI / Data Analytics

Analytics

Data Warehouse: great for BI; SQL only, expensive, poor for ML (future)
Data Lake: cheap, open formats; poor for BI (past), data swamps?

Data lakes become data swamps
• Reliability & quality
• Performance & latency
• Governance

Delta Lake solves the challenges with data lakes
• Reliability & quality: ACID transactions
• Performance & latency: advanced indexing & caching
• Governance: fine-grained access control

Lakehouse: Delta Lake Publication
https://databricks.com/wp-content/uploads/2020/12/cidr_lakehouse.pdf

Lakehouse adoption across industries

Demo 1 Getting Started with Delta.io and CLI

Demo 2 Jupyter Notebook: Delta Tables, UPDATEs, time travel etc.
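
The demo notebook itself is not part of this transcript. As a rough illustration of why an UPDATE and time travel can coexist, here is a toy model of an append-only commit log in plain Python. It is a sketch of the general idea (immutable data files plus a log of add/remove actions), not the Delta Lake API; the class ToyTableLog, its methods, and the file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTableLog:
    """Toy append-only commit log. Hypothetical; NOT the Delta Lake API."""
    commits: list = field(default_factory=list)  # commits[v] = (added, removed)
    files: dict = field(default_factory=dict)    # file name -> rows, never mutated

    def commit(self, add=None, remove=()):
        """Append one commit and return its version number."""
        add = add or {}
        self.files.update(add)
        self.commits.append((tuple(add), tuple(remove)))
        return len(self.commits) - 1

    def live_files(self, version=None):
        """Replay the log up to `version` to find the files that are live."""
        if version is None:
            version = len(self.commits) - 1
        live = []
        for added, removed in self.commits[: version + 1]:
            live = [f for f in live if f not in removed] + list(added)
        return live

    def snapshot(self, version=None):
        """Rows of the table as of `version` (latest if omitted)."""
        return [row for f in self.live_files(version) for row in self.files[f]]

    def update(self, predicate, change):
        """Copy-on-write update: rewrite each live file that has a match."""
        add, remove = {}, []
        for name in self.live_files():
            rows = self.files[name]
            if any(predicate(r) for r in rows):
                remove.append(name)
                new_name = f"{name}.v{len(self.commits)}"
                add[new_name] = [change(r) if predicate(r) else r for r in rows]
        return self.commit(add=add, remove=remove)


log = ToyTableLog()
v0 = log.commit(add={"part-0": [{"id": 1, "qty": 10}, {"id": 2, "qty": 20}]})
v1 = log.update(lambda r: r["id"] == 2, lambda r: {**r, "qty": 25})

latest = log.snapshot()             # reflects the update
original = log.snapshot(version=0)  # "time travel" to the first commit
```

Because old files are only marked as removed in the log and never overwritten, reading an earlier version is just a replay of the log up to that commit.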

Delta Sharing
• Data Provider: Delta Lake Table, Delta Sharing Server, access permissions
• Delta Sharing Protocol
• Data Recipient: any sharing client
• Video: delta.io/sharing

Delta Sharing from Pandas / Jupyter Notebook
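
The slide shows a notebook that is not reproduced in this transcript. A minimal sketch of what reading a shared table into pandas can look like, assuming the delta-sharing Python connector is installed and the data provider has supplied a profile file. The profile path, share, schema, and table names below are hypothetical placeholders, and the helper functions are illustrative, not part of the connector.

```python
def table_url(profile_path, share, schema, table):
    """Build the '<profile>#<share>.<schema>.<table>' address of a shared table."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path, share, schema, table):
    """Load a shared table as a pandas DataFrame.

    The import is done lazily so this file can be read and imported without
    the connector; install it with `pip install delta-sharing`.
    """
    import delta_sharing

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table)
    )


# Hypothetical names; replace them with the values from your data provider.
url = table_url("config.share", "my_share", "default", "my_table")
```

Calling load_shared_table with real values contacts the sharing server named in the profile file, so it needs network access and valid credentials.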

Demo 3: Delta Sharing

Multicloud Databricks Workspace
• Delta Live Tables
• Unity Catalog
• MLflow
• AutoML
• Feature Store

Hasta la vista, data lake?
Delta Lake fixes the problems with a data lake and provides reliability, performance and governance.
The Lakehouse is a unified platform for Data, Analytics & ML.
Check it out: delta.io or databricks.com/try-databricks

How to engage?
• delta.io
• delta-users Slack
• delta-users Google Group
• Delta Lake YouTube channel
• Delta Lake GitHub Issues
• Delta Lake RS
• Bi-weekly meetings

Summary Posting: https://hackernoon.com/top-7-announcements-from-data-and-ai-summit-202-hd2735c0
Slice&DAIS 2021 EMEA Event: https://www.meetup.com/Spark-Munich/events/278396185/

@frankmunz
https://fmunz.medium.com
https://github.com/fmunz
https://www.linkedin.com/in/frankmunz
https://speakerdeck.com/fmunz
