Slide 1

Slide 1 text

©2022 Databricks Inc. — All rights reserved The Data Lakehouse Data Natives Conference 2022 Dr Frank Munz @frankmunz

Slide 2

Slide 2 text

Databricks: The Data + AI Company
Global adoption: over 7000 customers, from F500 to unicorns
Inventor and pioneer of the data lakehouse
Gartner-recognized leader in both
● Database Management Systems
● Data Science and Machine Learning Platforms
Creator of highly successful OSS data projects: Delta Lake, Apache Spark, Delta Sharing, and MLflow
Raised over $3B in investment
4000+ employees across the globe

Slide 3

Slide 3 text

Data, analytics, and AI enabled tech’s leaders to disrupt industries

Slide 4

Slide 4 text

Most enterprises still struggle with data, analytics, and AI

Slide 5

Slide 5 text

Realizing this requires two disparate, incompatible data platforms
[Figure: Data Maturity Curve. Axes: Data + AI Maturity and Competitive Advantage. Stages: Reports, Clean Data, Ad Hoc Queries, Data Exploration, Predictive Modeling, Prescriptive Analytics, Automated Decision Making. Platform labels: Data Warehouse for BI, Data Lake for AI. Questions along the curve: What happened? What will happen? How should we respond? Automatically make the best decision]

Slide 6

Slide 6 text

Two platforms side by side:
● Data Warehouse: Business Intelligence, SQL Analytics; structured tables; Governance and Security with Table ACLs; highly reliable and efficient
● Data Lake: Data Science & ML, Data Streaming; structured and unstructured files; Governance and Security with Files and Blobs; all of the data and very adaptable
Resulting problems:
● Copy subsets of data
● Disjointed and duplicative data silos
● Incompatible security and governance models
● Incomplete support for use cases
One platform:
● Business Intelligence, SQL Analytics, Data Science & ML, Data Streaming
● Governance and Security: Files and Blobs and Table ACLs
● Structured tables and unstructured files
There is no need to have two disparate platforms

Slide 7

Slide 7 text

Databricks Lakehouse Platform
● Simple: Unify your data warehousing and AI use cases on a single platform
● Multicloud: One consistent data platform across clouds
● Open: Built on open source and open standards
Platform layers:
● Data Warehousing, Data Engineering, Data Science and ML, Data Streaming
● Unity Catalog: Fine-grained governance for data and AI
● Delta Lake: Data reliability and performance
● Cloud Data Lake: All structured and unstructured data

Slide 8

Slide 8 text

Databricks thrives within your modern data stack
[Diagram: the platform (Data Warehousing, Data Engineering, Data Science and ML, Data Streaming, Unity Catalog, Delta Lake, Cloud Data Lake) shown with the surrounding categories Data Ingestion, Data Pipelines, Data Governance, BI and Dashboards, Machine Learning, Data Science, and Consulting & SI Partners]

Slide 9

Slide 9 text

Supporting enterprises in every industry
● Healthcare & Life Sciences
● Media & Entertainment
● Financial Services
● Public Sector
● Energy & Utilities
● Digital Native
● Manufacturing & Logistics
● Retail & CPG

Slide 10

Slide 10 text

An open approach to bringing data management and governance to data lakes
● Better reliability with transactions
● 48x faster data processing with indexing
● Data governance at scale with fine-grained access control lists
[Diagram labels: Data Warehouse, Data Lake]

Slide 11

Slide 11 text

All of Delta Lake 2.0 is open
● ACID Transactions
● Scalable Metadata
● Time Travel
● Open Source
● Unified Batch/Streaming
● Schema Evolution/Enforcement
● Audit History
● DML Operations
● OPTIMIZE Compaction
● OPTIMIZE ZORDER
● Change data feed
● Table Restore
● S3 Multi-cluster writes
● MERGE Enhancements
● Stream Enhancements
● Simplified Logstore
● Data Skipping via Column Stats
● Multi-part checkpoint writes
● Generated Columns
● Column Mapping
● Generated column support w/ partitioning
● Identity Columns
● Subqueries in deletes and updates
● Clones
● Iceberg to Delta converter
● Fast metadata only deletes
Coming Soon!
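
To make the "ACID Transactions" and "Time Travel" items above more concrete, here is a minimal sketch of the general idea: an append-only commit log that can be replayed up to any version. It is a toy in plain Python, not the Delta Lake implementation or its API; the class `ToyVersionedTable` and the row values are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionedTable:
    """Toy append-only commit log; NOT the Delta Lake implementation."""

    commits: list = field(default_factory=list)  # one dict of actions per commit

    def commit(self, add=(), remove=()):
        """Record one commit as a single unit and return its version number."""
        self.commits.append({"add": list(add), "remove": list(remove)})
        return len(self.commits) - 1

    def snapshot(self, version=None):
        """Replay the log up to `version` (inclusive) to rebuild the rows."""
        if version is None:
            version = len(self.commits) - 1
        rows = []
        for actions in self.commits[: version + 1]:
            rows = [r for r in rows if r not in actions["remove"]]
            rows.extend(actions["add"])
        return rows


table = ToyVersionedTable()
v0 = table.commit(add=["tweet-1", "tweet-2"])
v1 = table.commit(add=["tweet-3"], remove=["tweet-1"])

latest = table.snapshot()      # current state of the table
as_of_v0 = table.snapshot(v0)  # "time travel" back to the first commit
```

Because every change is an entry in the log, reading an older version only means replaying fewer entries; nothing has to be copied up front.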

Slide 12

Slide 12 text

Databricks SQL
● Photon: Vectorized C++ exec engine, Apache Spark API
● Serverless: Eliminate compute infrastructure management
  ● Instant, Elastic Compute
  ● Zero Management
  ● Lower TCO

Slide 13

Slide 13 text

Amgen
Challenge: Amgen is relentlessly focused on invention and optimization, but disjointed data platforms prevented its departments from collaborating to uncover new avenues of revenue growth with machine learning.
Solution: With an open Databricks lakehouse, Amgen delivered almost 300 cross-functional analytics and machine learning projects using a wide variety of tools in the first year to improve drug delivery and patient outcomes.
Impact:
● $100M saved in clinical trial costs
● 11% uplift in sales success with physicians
● $6.4M saved in infrastructure costs

Slide 14

Slide 14 text

Challenge: Goldman Sachs wanted the Apple Card to reach as many customers as possible without significantly increasing risk, but their data architecture could not easily support the real-time machine learning required to make it happen.
Solution: Using Databricks, Goldman Sachs deployed a lakehouse that processes 30TB a day across a large portfolio of data providers to accurately predict constantly evolving lender risk profiles.
Impact:
● $50M in revenue from improved credit risk approval models
● $53M in revenue from better cross-selling promotions

Slide 15

Slide 15 text

Demo Time!

Slide 16

Slide 16 text


Slide 17

Slide 17 text

Delta Live Tables: Cleanse and Transform Tweets

Slide 18

Slide 18 text

Tweepy API: Streaming Twitter Feed

Slide 19

Slide 19 text

Auto Loader: Streaming Data Ingestion
Ingest Streaming Data with Automatic Schema Detection
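
As a rough illustration of what "automatic schema detection" means for incoming JSON records, the sketch below merges the fields seen across batches into one schema. It uses only the Python standard library; `infer_schema`, the sample records, and the fall-back-to-string rule are assumptions made for this example and do not reproduce Auto Loader's actual inference rules.

```python
import json


def infer_schema(json_lines, schema=None):
    """Merge field names and Python type names seen in JSON records.

    Hypothetical helper for illustration only; this is not Auto Loader's API.
    """
    schema = dict(schema or {})
    for line in json_lines:
        record = json.loads(line)
        for name, value in record.items():
            seen = type(value).__name__
            if name not in schema:
                schema[name] = seen    # a new column is added to the schema
            elif schema[name] != seen:
                schema[name] = "str"   # conflicting types fall back to string
    return schema


batch_1 = ['{"id": 1, "text": "hello lakehouse"}']
batch_2 = ['{"id": 2, "text": "streaming", "lang": "en"}']

schema = infer_schema(batch_1)
schema = infer_schema(batch_2, schema)  # the new "lang" column is picked up
```

Passing the previous schema back in is what lets the schema evolve as new fields show up in later batches.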

Slide 20

Slide 20 text

Declarative, auto scaling Data Pipelines in SQL
CTAS Pattern: Create Table As Select …
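
The CTAS pattern named above can be tried out with any SQL engine that supports it. The sketch below uses SQLite through Python's standard library, not Delta Live Tables, whose syntax differs in detail; the table names `raw_tweets` and `clean_tweets`, the columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_tweets (id INTEGER, text TEXT, lang TEXT)")
conn.executemany(
    "INSERT INTO raw_tweets VALUES (?, ?, ?)",
    [
        (1, "  Lakehouse is great  ", "en"),
        (2, None, "en"),           # missing text: dropped by the filter below
        (3, "Hallo Berlin", "de"),  # non-English: dropped by the filter below
    ],
)

# CTAS: the new table is defined by the query that produces it.
conn.execute(
    """
    CREATE TABLE clean_tweets AS
    SELECT id, TRIM(text) AS text
    FROM raw_tweets
    WHERE text IS NOT NULL AND lang = 'en'
    """
)

clean_rows = conn.execute("SELECT id, text FROM clean_tweets").fetchall()
```

The statement says what the cleansed table should contain rather than how to build it step by step, which is the sense in which such a pipeline is declarative.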

Slide 21

Slide 21 text

Declarative, auto scaling Data Pipelines

Slide 22

Slide 22 text

DWH / SQL Persona

Slide 23

Slide 23 text

Hugging Face -> Sentiment Analysis (POS, NEG, NEU) + probability
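
The slide does not show how a label and its probability are produced. A common approach for a three-class classifier is to apply a softmax to the model's raw scores and report the top class; the sketch below shows only that step in plain Python, without calling any Hugging Face library. The label order in `LABELS` and the input scores are made up for illustration.

```python
import math

LABELS = ("NEG", "NEU", "POS")  # assumed label order, for illustration only


def classify(logits):
    """Turn raw scores into (label, probability) using a softmax."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


label, probability = classify([0.2, 0.5, 2.3])  # made-up scores
```

The returned probability is the model's share of confidence for the winning class, so the three class probabilities always sum to one.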

Slide 24

Slide 24 text


Slide 25

Slide 25 text


Slide 26

Slide 26 text

Built-in Orchestration for all Tasks
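
Orchestration boils down to running tasks in an order that respects their dependencies. As a minimal sketch of that idea, and not of the Databricks job scheduler itself, the example below orders the demo's steps with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The task names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks it depends on (hypothetical task names).
dependencies = {
    "ingest_tweets": set(),
    "clean_tweets": {"ingest_tweets"},
    "score_sentiment": {"clean_tweets"},
    "refresh_dashboard": {"score_sentiment"},
}

# static_order() yields every task after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
```

A real orchestrator adds scheduling, retries, and monitoring on top, but the dependency ordering is the core that the rest builds on.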

Slide 27

Slide 27 text

Watch the live demo from Data + AI Summit: Databricks.com / Watch Demos
● Demo recording
● Notebooks on GitHub
● Hot off the press: Kafka+DLT blog

Slide 28

Slide 28 text

@frankmunz
https://fmunz.medium.com
https://www.linkedin.com/in/frankmunz
https://speakerdeck.com/fmunz
www.databricks.com/try-databricks