Slide 1

Slide 1 text

©2022 Databricks Inc. — All rights reserved Evolution of Data Architectures and how to build a Lakehouse Big Data Expo 15/09/2022 Ivo Everts, Strategic Architect

Slide 2

Slide 2 text

Evolution of Data Architectures
And how to build a Lakehouse
About me
• 2001: ML Research
• 2014: DS Consultancy
• 2020: Field Engineering
Agenda
I. Simplifying Data+AI with Databricks Lakehouse
II. Customer stories
I’ll make it quick and punchy for a crisp conference end!

Slide 3

Slide 3 text

Databricks: The Data + AI Company
• Global adoption: over 7000 customers, from F500 to unicorns
• Inventor and pioneer of the data lakehouse
• Gartner recognized leader in both
  ● Database Management Systems
  ● Data Science and Machine Learning Platforms
• Creator of highly successful OSS data projects: Delta Lake, Apache Spark, and MLflow
• Raised over $3B in investment
• 3000+ employees across the globe

Slide 4

Slide 4 text

Data, analytics, and AI enabled tech’s leaders to disrupt industries

Slide 5

Slide 5 text

Tech leaders are to the right of the Data Maturity Curve
[Figure: Data Maturity Curve. Axes: Data + AI Maturity, Competitive Advantage]
Stages, from hindsight to foresight: Clean Data, Reports, Ad Hoc Queries, Data Exploration, Predictive Modeling, Prescriptive Analytics, Automated Decision Making
Questions along the curve: What happened? What will happen? How should we respond? Automatically make the best decision

Slide 6

Slide 6 text

Most enterprises still struggle with data, analytics, and AI

Slide 7

Slide 7 text

Realizing this requires two disparate, incompatible data platforms
[Figure: Data Maturity Curve. Axes: Data + AI Maturity, Competitive Advantage]
Stages: Clean Data, Reports, Ad Hoc Queries, Data Exploration, Predictive Modeling, Prescriptive Analytics, Automated Decision Making
• Data Warehouse for BI: What happened?
• Data Lake for AI: What will happen?

Slide 8

Slide 8 text

Realizing this requires two disparate, incompatible data platforms
• Data Warehouse: structured tables; governance and security via table ACLs; serves Business Intelligence and SQL Analytics
• Data Lake: unstructured files (logs, text, images, video); governance and security on files and blobs; serves Data Science & ML and Data Streaming
• Copy subsets of data

Slide 9

Slide 9 text

Realizing this requires two disparate, incompatible data platforms
• Data Warehouse: structured tables; governance and security via table ACLs; serves Business Intelligence and SQL Analytics
• Data Lake: unstructured files (logs, text, images, video); governance and security on files and blobs; serves Data Science & ML and Data Streaming
• Copy subsets of data
Problems
• Disjointed and duplicative data silos
• Incompatible security and governance models
• Incomplete support for use cases

Slide 10

Slide 10 text

Realizing this requires two disparate, incompatible data platforms
• Data Warehouse: structured tables; governance and security via table ACLs; serves Business Intelligence and SQL Analytics
• Data Lake: unstructured files (logs, text, images, video); governance and security on files and blobs; serves Data Science & ML and Data Streaming
• Copy subsets of data
Problems
• Disjointed and duplicative data silos
• Incomplete support for use cases
• Incompatible security and governance models
This is too complex and expensive, making it hard to achieve the full potential of Data, Analytics, and AI

Slide 11

Slide 11 text

Realizing this requires two disparate, incompatible data platforms
• Data Warehouse: structured tables; governance and security via table ACLs; serves Business Intelligence and SQL Analytics
• Data Lake: unstructured files (logs, text, images, video); governance and security on files and blobs; serves Data Science & ML and Data Streaming
• Copy subsets of data
Lakehouse Platform
• Instead of disjointed and duplicative data silos: an open and reliable data platform to efficiently handle all data types
• Instead of incomplete support for use cases: all machine learning, SQL, BI, and streaming use cases
• Instead of incompatible security and governance models: one security and governance approach for all data assets on all clouds

Slide 12

Slide 12 text

Databricks Lakehouse Platform
• Simple: Unify your data warehousing and AI use cases on a single platform
• Open: Built on open source and open standards
• Multicloud: One consistent data platform across clouds
Lakehouse Platform
• Data Warehousing, Data Engineering, Data Science and ML, Data Streaming
• Unity Catalog: Fine-grained governance for data and AI
• Delta Lake: Data reliability and performance
• Cloud Data Lake: All structured and unstructured data

Slide 13

Slide 13 text

Databricks thrives within your modern data stack
• Lakehouse Platform: Data Warehousing, Data Engineering, Data Science and ML, Data Streaming, Unity Catalog, Delta Lake, Cloud Data Lake
• Surrounding stack: Data Ingestion, Data Pipelines, Data Governance, BI and Dashboards, Machine Learning, Data Science, Consulting & SI Partners

Slide 14

Slide 14 text

Data engineering workloads on Databricks
• Databricks Workflows handles data orchestration
• Delta Live Tables manages your full data pipelines
• Delta Lake simplifies data engineering with a curated data lake approach

Slide 15

Slide 15 text

ML & data science workloads on Databricks
Machine Learning
• Model registry, reproducibility, productionization
• Leverages Delta Lake for reproducibility
• AutoML for citizen data scientists
Data Science
• Collaborative notebooks and dashboards for interactive analysis
• Native support for Python, Java, R, Scala
• Delta Lake data natively supported

Slide 16

Slide 16 text

SQL workloads on Databricks
• Great performance and concurrency for BI and SQL workloads on Delta Lake
• Native SQL interface for analysts
• Support for BI tools to directly query your most recent data in Delta Lake

Slide 17

Slide 17 text

Unity Catalog for Lakehouse Governance
Govern and manage all data assets
• Warehouse, Tables, Columns
• Data Lake, Files
• Machine Learning Models
• Dashboards and Notebooks
Capabilities
• Data lineage
• Attribute-based access control
• Security policies
• Table or column level tags
• Auditing
• Data sharing
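
The interplay of column tags, attribute-based access control, and auditing listed on this slide can be illustrated with a small sketch. This is plain Python and not the Unity Catalog API: the `Column`, `User`, `POLICY`, `can_read`, and `visible_columns` names, the tag names, and the attribute names are all hypothetical, chosen only to show the idea that a tag on a column determines which user attribute is needed to read it, and that every decision is recorded.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    name: str
    tags: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class User:
    name: str
    attributes: frozenset = field(default_factory=frozenset)


# Hypothetical policy: each column tag maps to the user attribute required to read it.
POLICY = {"pii": "pii_reader", "financial": "finance_team"}


def can_read(user, column, policy=POLICY):
    """Allow access only if the user holds every attribute the column's tags demand."""
    required = {policy[tag] for tag in column.tags if tag in policy}
    return required <= user.attributes


def visible_columns(user, columns, audit_log):
    """Return the readable column names and append each decision to the audit log."""
    visible = []
    for column in columns:
        allowed = can_read(user, column)
        audit_log.append((user.name, column.name, allowed))
        if allowed:
            visible.append(column.name)
    return visible


columns = [
    Column("order_id"),
    Column("email", frozenset({"pii"})),
    Column("revenue", frozenset({"financial"})),
]
analyst = User("analyst", frozenset({"finance_team"}))
audit_log = []
print(visible_columns(analyst, columns, audit_log))
print(audit_log)
```

In this sketch the analyst sees the untagged column and the financial one, but not the column tagged as PII, and the audit log keeps one entry per decision. A governed platform would enforce such rules centrally rather than in application code.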

Slide 18

Slide 18 text

Customer stories

Slide 19

Slide 19 text

Shell
➔ Lakehouse central to global Multi-cloud Enterprise Data Platform
➔ Center of Excellence central to way of working
➔ Instrumental to the Energy Transition
Energy Transition Campus Amsterdam

Slide 20

Slide 20 text

Seminal work: PI in the Sky
• DATA PREPARATION: 70 billion rows of sensor data ingested, enriched and prepared for data science (5 years from Europe’s largest refinery).
• MODEL TRAINING: 160,000 models trained using Databricks optimised Spark runtime (one for each sensor).
• MODEL MANAGEMENT: Every model is tracked in MLflow, providing a record of how each was trained, and a model registry for easy selection and deployment.
• SQL: Real-time insights, monitoring and alerting with ad-hoc and scheduled SQL queries
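
The "one model for each sensor" pattern can be sketched in a few lines. This is a minimal illustration in plain Python, not the approach used in the project: the function names, the sample readings, and the "model" itself (a mean with a tolerance band) are all assumptions made for the example. In the setup described on the slide, the per-sensor training would run distributed on Spark and each model would be tracked in MLflow, neither of which is shown here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def train_per_sensor(readings, threshold_sigmas=3.0):
    """Fit one tiny baseline model per sensor.

    `readings` is an iterable of (sensor_id, value) pairs. The "model" is
    just a mean and a tolerance band, a stand-in for a real estimator.
    """
    by_sensor = defaultdict(list)
    for sensor_id, value in readings:
        by_sensor[sensor_id].append(value)

    models = {}
    for sensor_id, values in by_sensor.items():
        models[sensor_id] = {
            "mean": mean(values),
            "tolerance": threshold_sigmas * pstdev(values),
            "n_rows": len(values),
        }
    return models


def is_anomalous(models, sensor_id, value):
    """Flag a reading that falls outside its own sensor's tolerance band."""
    model = models[sensor_id]
    return abs(value - model["mean"]) > model["tolerance"]


# Hypothetical sample data: two sensors with very different normal ranges.
readings = [
    ("pump-1", 10.0), ("pump-1", 10.0), ("pump-1", 12.0), ("pump-1", 8.0),
    ("valve-7", 100.0), ("valve-7", 102.0), ("valve-7", 98.0),
]
models = train_per_sensor(readings)
print(is_anomalous(models, "pump-1", 101.0))
print(is_anomalous(models, "valve-7", 101.0))
```

The same reading of 101.0 is flagged for `pump-1` but not for `valve-7`, which is the reason for training a separate model per sensor instead of one global model.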

Slide 21

Slide 21 text

“The usage of Databricks over the years has broadened significantly. We started out using Databricks as a big data and AI platform but the scope has broadened. We have an entirely different class of citizen engineers and data scientists who are using it as a modern business intelligence tool to make smarter business decisions.”
Dan Jeavons, VP Computational Science & Digital Innovation at Shell

Slide 22

Slide 22 text

ABN AMRO
Databricks Lakehouse for Data Mesh and Detecting Financial Crime
“Databricks has provided one platform for our data and analytics teams to access and share data across ABN AMRO, delivering ML-based solutions that drive automation and insight throughout the company.”
Stefan Groot, Engineering Manager | AI | ML | BI at ABN AMRO

Slide 23

Slide 23 text

Case Study link
[Diagram labels: D, T, A, P; D & T, A & P]
Feature Store: The only option for a governed and integral MLOps solution 🔥

Slide 24

Slide 24 text

Data Mesh
Case Study link
• Domain 1: DS/DE
• Domain 2: DS/DE
• Domain 3: DS/DE
• Domain 4: DS/DE
• Domain 5: SQL

Slide 25

Slide 25 text

More goodness
• Human Longevity Inc.
• Spark NLP: Roche automates knowledge extraction from pathology reports with Spark NLP

Slide 26

Slide 26 text

Databricks Lakehouse Platform
• Simple: Unify your data warehousing and AI use cases on a single platform
• Open: Built on open source and open standards
• Multicloud: One consistent data platform across clouds
Lakehouse Platform
• Data Warehousing, Data Engineering, Data Science and ML, Data Streaming
• Unity Catalog: Fine-grained governance for data and AI
• Delta Lake: Data reliability and performance
• Cloud Data Lake: All structured and unstructured data

Slide 27

Slide 27 text

Building your Lakehouse
Comprehensive investment into your success
• Solution Accelerators
• In-person and Virtual Training
• Co-located Professional Services
Supported by 24/7/365 global, production operations at scale

Slide 28

Slide 28 text

Thank you
[email protected]