Evolution of Data Architectures

Marketing OGZ
September 20, 2022

Transcript

  1. ©2022 Databricks Inc. — All rights reserved Evolution of Data

    Architectures and how to build a Lakehouse Big Data Expo 15/09/2022 Ivo Everts, Strategic Architect
  2. Evolution of Data Architectures and how to build a Lakehouse

    About me: 2001: ML Research; 2014: DS Consultancy; 2020: Field Engineering. Agenda: I. Simplifying Data+AI with Databricks Lakehouse; II. Customer stories. I’ll make it quick and punchy for a crisp conference end!
  3. Databricks: The Data + AI Company

    Global adoption: over 7000 customers, from F500 to unicorns. Inventor and pioneer of the data lakehouse. Gartner-recognized leader in both Database Management Systems and Data Science and Machine Learning Platforms. Creator of highly successful OSS data projects: Delta Lake, Apache Spark, and MLflow. Raised over $3B in investment. 3000+ employees across the globe.
  4. Data, analytics, and AI enabled tech’s leaders to disrupt industries
  5. Tech leaders are to the right of the Data Maturity Curve

    Figure: Data Maturity Curve, from hindsight to foresight. Axes: Data + AI Maturity, Competitive Advantage. Stages: Clean Data, Reports, Ad Hoc Queries, Data Exploration, Predictive Modeling, Prescriptive Analytics, Automated Decision Making. Questions along the curve: What happened? What will happen? How should we respond? Automatically make the best decision.
  6. Realizing this requires two disparate, incompatible data platforms

    Figure: Data Maturity Curve. Axes: Data + AI Maturity, Competitive Advantage. Stages: Clean Data, Reports, Ad Hoc Queries, Data Exploration, Predictive Modeling, Prescriptive Analytics, Automated Decision Making. Platforms: Data Warehouse for BI, Data Lake for AI. Questions: What happened? What will happen?
  7. Realizing this requires two disparate, incompatible data platforms

    Data Warehouse: structured tables; governance and security through table ACLs; serves Business Intelligence and SQL Analytics. Data Lake: unstructured files (logs, text, images, video); governance and security through files and blobs; serves Data Science & ML and Data Streaming. Copy subsets of data between the two.
  8. Realizing this requires two disparate, incompatible data platforms

    Data Warehouse: structured tables; governance and security through table ACLs; serves Business Intelligence and SQL Analytics. Data Lake: unstructured files (logs, text, images, video); governance and security through files and blobs; serves Data Science & ML and Data Streaming. Copy subsets of data between the two. Problems: disjointed and duplicative data silos; incompatible security and governance models; incomplete support for use cases.
  9. Realizing this requires two disparate, incompatible data platforms

    Data Warehouse: structured tables; governance and security through table ACLs; serves Business Intelligence and SQL Analytics. Data Lake: unstructured files (logs, text, images, video); governance and security through files and blobs; serves Data Science & ML and Data Streaming. Copy subsets of data between the two. Problems: disjointed and duplicative data silos; incomplete support for use cases; incompatible security and governance models. This is too complex and expensive, making it hard to achieve the full potential of Data, Analytics, and AI.
  10. Realizing this requires two disparate, incompatible data platforms

    Data Warehouse: structured tables; governance and security through table ACLs; serves Business Intelligence and SQL Analytics. Data Lake: unstructured files (logs, text, images, video); governance and security through files and blobs; serves Data Science & ML and Data Streaming. Copy subsets of data between the two. Lakehouse Platform: disjointed and duplicative data silos become an open and reliable data platform to efficiently handle all data types; incomplete support for use cases becomes all machine learning, SQL, BI, and streaming use cases; incompatible security and governance models become one security and governance approach for all data assets on all clouds.
  11. Databricks Lakehouse Platform

    Simple: unify your data warehousing and AI use cases on a single platform. Open: built on open source and open standards. Multicloud: one consistent data platform across clouds. Lakehouse Platform workloads: Data Warehousing, Data Engineering, Data Science and ML, Data Streaming. Unity Catalog: fine-grained governance for data and AI. Delta Lake: data reliability and performance. Cloud Data Lake: all structured and unstructured data.
  12. Databricks thrives within your modern data stack

    Lakehouse layers: Data Warehousing, Data Engineering, Data Science and ML, Data Streaming; Unity Catalog; Delta Lake; Cloud Data Lake. Surrounding stack: Data Governance, Data Ingestion, Data Pipelines, BI and Dashboards, Machine Learning, Data Science, Consulting & SI Partners.
  13. Data engineering workloads on Databricks

    • Data orchestration through Databricks Workflows
    • Delta Live Tables manages your full data pipelines
    • Delta Lake simplifies data engineering with a curated data lake approach
  14. ML & data science workloads on Databricks

    Machine Learning
    • Model registry, reproducibility, productionization
    • Leverages Delta Lake for reproducibility
    • AutoML for citizen data scientists
    Data Science
    • Collaborative notebooks and dashboards for interactive analysis
    • Native support for Python, Java, R, Scala
    • Delta Lake data natively supported
  15. SQL workloads on Databricks

    • Great performance and concurrency for BI and SQL workloads on Delta Lake
    • Native SQL interface for analysts
    • Support for BI tools to directly query your most recent data in Delta Lake
  16. Unity Catalog for Lakehouse Governance

    Govern and manage all data assets
    • Warehouse, Tables, Columns
    • Data Lake, Files
    • Machine Learning Models
    • Dashboards and Notebooks
    Capabilities
    • Data lineage
    • Attribute-based access control
    • Security policies
    • Table or column level tags
    • Auditing
    • Data sharing
  17. Shell

    ➔ Lakehouse central to global Multi-cloud Enterprise Data Platform
    ➔ Center of Excellence central to way of working
    ➔ Instrumental to the Energy Transition
    Image: Energy Transition Campus Amsterdam
  18. Seminal work: PI in the Sky

    DATA PREPARATION: 70 billion rows of sensor data ingested, enriched and prepared for data science (5 years from Europe’s largest refinery).
    SQL: Real-time insights, monitoring and alerting with ad-hoc and scheduled SQL queries.
    MODEL MANAGEMENT: Every model is tracked in MLflow, providing a record of how each was trained, and a model registry for easy selection and deployment.
    MODEL TRAINING: 160,000 models trained using Databricks optimised Spark runtime (one for each sensor).
  19. “The usage of Databricks over the years has broadened significantly. We started out using Databricks as a big data and AI platform but the scope has broadened. We have an entirely different class of citizen engineers and data scientists who are using it as a modern business intelligence tool to make smarter business decisions.”

    Dan Jeavons - VP Computational Science & Digital Innovation at Shell
  20. ABN AMRO: Databricks Lakehouse for Data Mesh and Detecting Financial Crime

    “Databricks has provided one platform for our data and analytics teams to access and share data across ABN AMRO, delivering ML-based solutions that drive automation and insight throughout the company.”
    Stefan Groot - Engineering Manager | AI | ML | BI at ABN AMRO
  21. Case Study

    Diagram labels: D, T, A, P; D & T; A & P. Feature Store: the only option for a governed and integral MLOps solution 🔥 link
  22. Data Mesh Case Study

    Domain 1: DS/DE; Domain 2: DS/DE; Domain 3: DS/DE; Domain 4: DS/DE; Domain 5: SQL. link
  23. More goodness

    Human Longevity Inc. Spark NLP: Roche automates knowledge extraction from pathology reports with Spark NLP.
  24. Databricks Lakehouse Platform

    Simple: unify your data warehousing and AI use cases on a single platform. Open: built on open source and open standards. Multicloud: one consistent data platform across clouds. Lakehouse Platform workloads: Data Warehousing, Data Engineering, Data Science and ML, Data Streaming. Unity Catalog: fine-grained governance for data and AI. Delta Lake: data reliability and performance. Cloud Data Lake: all structured and unstructured data.
  25. Building your Lakehouse: Comprehensive investment into your success

    Your success: Solution Accelerators; In-person and Virtual Training; Co-located Professional Services. Supported by 24/7/365 global, production operations at scale.
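
As a rough illustration of the "one model for each sensor" pattern mentioned on slide 18 (PI in the Sky), the sketch below groups readings by sensor and fits an independent baseline for each one. The slides do not say what kind of model was trained, so the mean and standard deviation baseline, the function names `train_per_sensor` and `is_anomaly`, the threshold, and the sample sensor IDs are all assumptions made for illustration. The workload described in the deck ran on Spark with models tracked in MLflow; this plain-Python sketch does not attempt to reproduce that.

```python
from collections import defaultdict
from statistics import mean, pstdev


def train_per_sensor(readings):
    """Fit one small baseline "model" per sensor.

    `readings` is an iterable of (sensor_id, value) pairs. The baseline
    (mean and population standard deviation) is a hypothetical stand-in
    for whatever model type the real project used.
    """
    grouped = defaultdict(list)
    for sensor_id, value in readings:
        grouped[sensor_id].append(value)
    return {
        sensor_id: {
            "mean": mean(values),
            "std": pstdev(values),
            "n": len(values),
        }
        for sensor_id, values in grouped.items()
    }


def is_anomaly(models, sensor_id, value, threshold=3.0):
    """Flag a reading that lies more than `threshold` std devs from the mean."""
    model = models[sensor_id]
    if model["std"] == 0:
        # A constant sensor: any deviation at all counts as anomalous.
        return value != model["mean"]
    return abs(value - model["mean"]) / model["std"] > threshold


# Made-up sample data: two sensors, each gets its own model.
readings = [
    ("pump-1", 10.0),
    ("pump-1", 12.0),
    ("pump-1", 11.0),
    ("valve-7", 100.0),
    ("valve-7", 100.0),
]
models = train_per_sensor(readings)
```

Because each sensor's model depends only on that sensor's readings, the per-sensor fits are independent of one another, which is what makes this kind of workload straightforward to distribute across a cluster.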