Data Warehouse or Data Lake: Which one do I use?

Ahana
August 19, 2021

In this webinar, industry analyst John Santaferraro and Ahana cofounder and CPO Dipti Borkar discuss the data landscape and how companies are approaching their data warehouse/data lake strategy. They share their perspective on how to decide what fits best based on your use case and workloads, and how some real-world customers are using Presto, a SQL query engine, to bring analytics to the data lake.


Transcript

  1. Today’s Speakers

    John Santaferraro is an industry analyst with Ferraro Consulting. He has 26 years of experience in data and analytics, including research, implementation, and product marketing.
    Dipti Borkar is the cofounder and CPO at Ahana and an expert in relational and non-relational database engines. She’s also Presto Foundation Outreach Chair.
    John Santaferraro, Industry Analyst, Ferraro Consulting
    Dipti Borkar, Cofounder & Chief Product Officer, Ahana
    © 2021 Enterprise Management Associates, Inc.
  2. Event Recording and Questions

    Event Recording: An archived version of the event recording will be available at www.ahana.com
    Questions: Log questions in the chat panel located in the lower left-hand corner of your screen. Questions will be addressed during the Q&A session of the event.
  3. Agenda

    1. Traditional Data Lakes and Data Warehouses
    2. Modern Data Lakes and Data Warehouses
    3. Merging the Data Lake and the Data Warehouse
    4. Use Cases for Modern Platforms
    5. The Uber Technical Case Study
    6. Questions & Answers
  4. The Traditional Data Warehouse

    • Relational Database
    • Columnar Structure
    • In-Database Analytics
    • Structured Data
    • Modeled Data
    • Extract, Transform, Load
    • SQL Access
    • Expensive
    • Difficult to Manage
    • Costly to Maintain
    • Limited Data
    • Limited Access
  5. The Traditional Data Lake

    • File System Data Store
    • Semi-Structured Data
    • Ingestion
    • Discovery
    • Data Science
    • Notebook and Python Access
    • Less expensive, but…
    • Limited Performance
    • Limited Analytics
    • Limited SQL Access
    • Difficult to Govern
  6. The Drivers Behind Modernization

    • Digital Transformation
    • Real Time Events
    • Automation of Everything
    • More Data
    • Fast Data
    • Smart Data
  7. The Modern Data Warehouse v. The Modern Data Lake

    The Modern Data Warehouse: • Cloud-First • In-Memory Capabilities • Complex Data Types • Separate Storage & Compute • Expanded Analytics • Improved Performance • Storage Options • SQL Access
    The Modern Data Lake: • Cloud-First • In-Memory Capabilities • Columnar Data Types • Separate Storage & Compute • Expanded Analytics • Improved Performance • Storage Options • SQL Access
  8. From Data to Insight - The SQL Query Engine

    [Diagram labels: Data; SQL Query Processing; Data Warehouse; Cloud Data Lake; Open Source Data Warehouse; 1-10 TB; 1 TB -> PB; Reporting & Dashboarding; Data Science; In-data lake transformation]
  9. Use Cases Criteria for the Modern Data Lake and Data Warehouse

    Modern Data Lake
    1. High-Performance, Data Intensive
    2. Lower Cost Storage
    3. Massive Scale
    4. Diverse Data Types
    5. Diverse Analytical Types
    6. Diverse Access Types
    7. Enterprise Capabilities
    8. High Concurrency of Analytics
    Modern Data Warehouse
    1. High-Performance, Compute Intensive
    2. Lower Cost Storage
    3. Enterprise Capabilities
    4. High Concurrency of Analytics
    5. Diverse Analytical Types
    6. Massive Scale
    7. Diverse Data Types
    8. Diverse Access Types
  10. Merging the Cloud Data Warehouse and the Cloud Data Lake

    1. From two platforms to one
    2. From two resource types to one
    3. From self-managed to fully managed
    4. From complex query joins to simple
    5. From disparate to connected intelligence
  11. Merging the Data Warehouse and the Data Lake with a Distributed Query Engine

    1. SQL Access
    2. Data Lake and Data Warehouse Access
    3. Unified Analytics
    4. Distributed Queries
    5. Limitless Scale
    6. Complex Data Types
    • Leverage Resources
    • Better Insight
    • More Use Cases
    • Leverage Platforms
    • Remove Limits
    • Amplified Insight
  12. Considerations for Any Unified Analytics Decision

    Data, Analytics, Users, Platform, Cloud, Enterprise, Business, Cost
  13. Considerations for Any Unified Analytics Decision

    Data: Structured, Semi-Structured, Real Time, Structured, Complex Data Types, Textual, Streaming
  14. Considerations for Any Unified Analytics Decision

    Analytics: SQL, Python, Notebook, Search
  15. Considerations for Any Unified Analytics Decision

    Users: Engineer, Analyst, Scientist, Business
  16. Considerations for Any Unified Analytics Decision

    Data, Analytics, Users, Platform, Cloud, Enterprise, Business, Cost
  17. Considerations for Any Unified Analytics Decision

    Cloud: Elasticity, Scale, Mobility, Globality
  18. Considerations for Any Unified Analytics Decision

    Enterprise: Security, Privacy, Governance, Unification
  19. Considerations for Any Unified Analytics Decision

    Business: Semantics, Logic, Value, Optimization
  20. Considerations for Any Unified Analytics Decision

    Cost: Forecast, Containment, Chargeback, Scale
  21. Considerations for Any Unified Analytics Decision

    Data, Analytics, Users, Platform, Cloud, Enterprise, Business, Cost
  22. Uber: A User and Developer of Presto

    Presto and Hyperscale Analytics: 10,000 cities, 18+ million trips per day, 256 petabytes of stored data, 35 petabytes of new data every day, 12,000 monthly active users of analytics running more than 400,000 queries every single day
    Presto and Enterprise Readiness: Automation, Workload Management, Complex Queries, Security
    Presto and Technical Value:
    • Extended Analytics: Analytical Functions, Complex Data Types
    • Expanded Use Cases: ETL, Data Science, Exploration, Online Analytical Processing, Federated Queries
    Presto and the Future: Realtime, Exabyte Scale, Sampling, Optimization; Project Aria, Project Presto Unlimited, Fireball
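To put the Uber figures on this slide in per-second terms, here is a small arithmetic sketch. It uses only the numbers quoted on the slide; treating "more than 400,000" and "18+ million" as exact values, and using decimal units (1 PB = 1,000 TB), are assumptions made for illustration, so the results are rough lower bounds rather than figures reported by Uber.

```python
# Figures quoted on the slide. "More than 400,000" and "18+ million" are
# treated as exact here (an assumption), so derived rates are lower bounds.
QUERIES_PER_DAY = 400_000
MONTHLY_ACTIVE_USERS = 12_000
NEW_DATA_PB_PER_DAY = 35
TRIPS_PER_DAY = 18_000_000

SECONDS_PER_DAY = 24 * 60 * 60
TB_PER_PB = 1_000  # assumption: decimal units


def derived_rates() -> dict:
    """Convert the daily figures into average rates."""
    return {
        "queries_per_second": QUERIES_PER_DAY / SECONDS_PER_DAY,
        "queries_per_day_per_mau": QUERIES_PER_DAY / MONTHLY_ACTIVE_USERS,
        "new_data_tb_per_second": NEW_DATA_PB_PER_DAY * TB_PER_PB / SECONDS_PER_DAY,
        "trips_per_second": TRIPS_PER_DAY / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in derived_rates().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the averages come to roughly 4.6 queries per second and about 33 queries per day for each monthly active user. Averages hide peaks, so they say nothing about the concurrency Presto actually has to sustain.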
  24. How Ahana Cloud Works

    • ~30 mins to create the compute plane
    • https://app.ahana.cloud/signup
    • Create Presto clusters in your account
  25. Ahana Cloud – Reference Architecture

    • Distributed SQL engine with proven scalability
    • Interactive ANSI SQL queries
    • Query data where it lives with Federated Connectors (no ETL)
    • High concurrency
    • Separation of compute and storage
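The "query data where it lives" idea on this slide can be illustrated with a toy sketch. It is not Presto and does not use Ahana's connectors: Python's built-in `sqlite3` module stands in for the query engine, and two separate in-memory databases stand in for a warehouse catalog and a lake catalog. The table names, columns, and rows are all hypothetical. What it shows is the shape of a federated query, where one SQL statement joins tables that live in different stores without first copying either into the other.

```python
import sqlite3


def federated_join_demo():
    """Join tables from two separate databases in a single SQL statement."""
    # The main connection plays the role of the "warehouse" catalog.
    conn = sqlite3.connect(":memory:")
    # A second, independent in-memory database stands in for the "lake" catalog.
    conn.execute("ATTACH DATABASE ':memory:' AS lake")

    # Hypothetical tables and data, one per store.
    conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, city TEXT)")
    conn.execute("CREATE TABLE lake.trips (customer_id INTEGER, fare REAL)")
    conn.executemany(
        "INSERT INTO main.customers VALUES (?, ?)",
        [(1, "Austin"), (2, "Boston")],
    )
    conn.executemany(
        "INSERT INTO lake.trips VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 7.5)],
    )

    # One query spans both stores; no ETL step moves data between them.
    rows = conn.execute(
        """
        SELECT c.city, COUNT(*) AS trips, SUM(t.fare) AS total_fare
        FROM main.customers AS c
        JOIN lake.trips AS t ON t.customer_id = c.id
        GROUP BY c.city
        ORDER BY c.city
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in federated_join_demo():
        print(row)
```

In Presto the equivalent query would typically address each table by a catalog-qualified name, with the engine distributing the work across a cluster. The sketch only mirrors the single-statement, cross-store join.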
  26. Questions?

    Please log your questions in the Q&A window