Slide 1

Slide 1 text

Architect your Data for the 21st Century
Hasta la vista, data lake?
Frank Munz
@frankmunz

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Big Data Processing & Tools

Slide 4

Slide 4 text

Big Data Processing
● Supercomputers / COTS cluster: HPC with PVM/MPI
● Hadoop cluster: map/reduce
● Apache Spark: Python / Scala or SQL, Streaming & Batch, on-premises / multi-cloud
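The map/reduce model named on this slide can be illustrated with a minimal single-process sketch in plain Python. It is not Hadoop or Spark code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are assumptions chosen for illustration only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input
lines = ["data lake data swamp", "data warehouse"]
counts = word_count(lines)
print(counts)
```

On a real cluster the map and reduce steps run in parallel on many nodes and the shuffle moves data across the network; the sketch only shows the three phases and how data flows between them.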

Slide 5

Slide 5 text

Apache Spark: Dataframes and SQL

Slide 6

Slide 6 text

Apache Spark: Batch and Streaming

Slide 7

Slide 7 text

Different APIs, Powered by the Same Engine
● SQL Language
● Scala DataFrame
● Python DataFrame
● pandas APIs

Slide 8

Slide 8 text

Data in AI / Machine Learning

Slide 9

Slide 9 text

Machine Learning: Classification
twitter: teenybiscuit

Slide 10

Slide 10 text

Machine Learning: Forecasting

Slide 11

Slide 11 text

BI / Data Analytics

Slide 12

Slide 12 text

Analytics

Slide 13

Slide 13 text

Data Warehouse: great for BI; SQL only, expensive, poor for ML (future)
Data Lake: cheap, open formats; poor for BI (past), data swamps?

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Data lakes become data swamps
● RELIABILITY & QUALITY
● PERFORMANCE & LATENCY
● GOVERNANCE

Slide 16

Slide 16 text

Delta Lake solves the challenges with data lakes
● RELIABILITY & QUALITY: ACID transactions
● PERFORMANCE & LATENCY: Advanced indexing & caching
● GOVERNANCE: Fine-grained access control

Slide 17

Slide 17 text

Lakehouse: Delta Lake Publication
https://databricks.com/wp-content/uploads/2020/12/cidr_lakehouse.pdf

Slide 18

Slide 18 text

Lakehouse adoption across industries

Slide 19

Slide 19 text

Demo 1 Getting Started with Delta.io and CLI

Slide 20

Slide 20 text

Demo 2 Jupyter Notebook: Delta Tables, UPDATEs, time travel etc.
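The idea behind versioned tables, UPDATEs and time travel can be sketched with a toy in-memory table in plain Python. This is not Delta Lake's implementation: Delta Lake keeps data in Parquet files and records changes in a transaction log, whereas this sketch simply stores a full snapshot per commit. The class `ToyVersionedTable`, its methods and the `version_as_of` parameter are hypothetical names used for illustration only.

```python
import copy


class ToyVersionedTable:
    """Toy table: every commit appends a full snapshot to an append-only log."""

    def __init__(self):
        self._log = [[]]  # version 0 is the empty table

    @property
    def version(self):
        """Number of the latest committed version."""
        return len(self._log) - 1

    def _commit(self, rows):
        # A commit becomes visible only once the complete snapshot is appended.
        self._log.append(rows)
        return self.version

    def insert(self, new_rows):
        """Commit a new version containing the old rows plus new_rows."""
        rows = copy.deepcopy(self._log[-1]) + copy.deepcopy(new_rows)
        return self._commit(rows)

    def update(self, predicate, changes):
        """Commit a new version in which matching rows carry the changes."""
        rows = copy.deepcopy(self._log[-1])
        for row in rows:
            if predicate(row):
                row.update(changes)
        return self._commit(rows)

    def read(self, version_as_of=None):
        """Read the latest version, or an older one (time travel)."""
        v = self.version if version_as_of is None else version_as_of
        return copy.deepcopy(self._log[v])


# Hypothetical sample data
table = ToyVersionedTable()
v1 = table.insert([{"id": 1, "city": "Munich"}, {"id": 2, "city": "Berlin"}])
v2 = table.update(lambda row: row["id"] == 2, {"city": "Hamburg"})

latest = table.read()
before = table.read(version_as_of=v1)
print(latest)
print(before)
```

Because older snapshots are never modified, a reader asking for version `v1` still sees the row as it was before the update, which is the behaviour time travel relies on.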

Slide 21

Slide 21 text

Delta Sharing
[Diagram: Data Provider (Delta Lake Table, Delta Sharing Server, Access permissions) connected via the Delta Sharing Protocol to Data Recipient (… Any Sharing Client)]
delta.io/sharing
Video

Slide 22

Slide 22 text

Delta Sharing from Pandas / Jupyter Notebook
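A minimal sketch of how a recipient might prepare access from Python: a sharing client reads a small JSON profile file with the server endpoint and a bearer token, and addresses a table as `<profile-file>#<share>.<schema>.<table>`. The endpoint, token, share, schema and table names below are placeholders, and the helpers `write_profile` and `table_url` are hypothetical and not part of any library.

```python
import json
import os
import tempfile


def write_profile(path, endpoint, bearer_token):
    """Write a Delta Sharing profile file (JSON) to the given path."""
    profile = {
        "shareCredentialsVersion": 1,
        "endpoint": endpoint,
        "bearerToken": bearer_token,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(profile, f)


def table_url(profile_path, share, schema, table):
    """Build the '<profile>#<share>.<schema>.<table>' address of a shared table."""
    return f"{profile_path}#{share}.{schema}.{table}"


# Placeholder values, assumed for illustration only
profile_path = os.path.join(tempfile.mkdtemp(), "example.share")
write_profile(profile_path, "https://sharing.example.com/delta-sharing/", "<token>")
url = table_url(profile_path, "example_share", "default", "example_table")
print(url)

# With the delta-sharing package installed and a real profile from a data
# provider, the table could then be loaded into pandas, for example:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(url)
```

The profile file is normally issued by the data provider, so in practice only the table address and the load call are written by the recipient.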

Slide 23

Slide 23 text

Demo 3: Delta Sharing

Slide 24

Slide 24 text

Multicloud Databricks Workspace
● Delta Live Tables
● Unity Catalog
● MLflow
● AutoML
● Feature Store

Slide 25

Slide 25 text

Hasta la vista, data lake?
Delta Lake fixes the problems with a data lake and provides reliability, performance and governance.
The Lakehouse is a unified platform for Data, Analytics & ML.
Check it out: delta.io or databricks.com/try-databricks

Slide 26

Slide 26 text

How to engage?
● delta.io
● delta-users Slack
● delta-users Google Group
● Delta Lake YouTube channel
● Delta Lake GitHub Issues
● Delta Lake RS
● Bi-weekly meetings

Slide 27

Slide 27 text

Summary Posting: https://hackernoon.com/top-7-announcements-from-data-and-ai-summit-202-hd2735c0
Slice&DAIS 2021 EMEA Event: https://www.meetup.com/Spark-Munich/events/278396185/

Slide 28

Slide 28 text

@frankmunz
https://fmunz.medium.com
https://github.com/fmunz
https://www.linkedin.com/in/frankmunz
https://speakerdeck.com/fmunz