MySQL HeatWave Lakehouse - Modernize MySQL & non-MySQL workloads with MySQL HeatWave

MySQL HeatWave enables users to process and query hundreds of terabytes of data in the object store — in a variety of file formats, such as CSV, Parquet, and Aurora/Redshift export files.
The data remains in the object store, and customers can query it with standard SQL syntax.

With this capability, MySQL HeatWave provides one service for transaction processing, analytics across data warehouses and data lakes, and machine learning — without ETL across cloud services.
This capability comes at no additional cost beyond the cost of storing the data in the object store.

Olivier DASINI

October 03, 2023

Transcript

  1. MySQL HeatWave Lakehouse: Modernize MySQL & non-MySQL workloads with MySQL HeatWave

    Olivier Dasini, MySQL Cloud Principal Solutions Architect EMEA
    [email protected]
    Blogs: www.dasini.net/blog/en, www.dasini.net/blog/fr
    LinkedIn: www.linkedin.com/in/olivier-dasini
    August 2023
  2. Me, Myself & I

    Olivier DASINI
    • MySQL Geek
    • Addicted to MySQL for 15+ years
    • Playing with databases for 20+ years
    • MySQL Writer, Blogger and Speaker
    • Also: DBA, Consultant, Architect, Trainer, ...
    • MySQL Cloud Principal Solutions Architect EMEA at Oracle
    • Stay up to date!
      • Blog: www.dasini.net/blog/en
      • LinkedIn: www.linkedin.com/in/olivier-dasini/
      • Twitter: @freshdaz
    Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
  3. Oracle Cloud Infrastructure Global Locations

    • 45 regions in 23 countries, including Paris & Marseille; 12 Azure Interconnect Regions
    • MySQL HeatWave database services are part of all of them
    • Also: Cloud@Customer & EU Sovereign Cloud
    • 100% renewable energy by 2025
    https://www.oracle.com/cloud/public-cloud-regions
  4. Oracle Cloud Infrastructure Europe Locations

    MySQL HeatWave database services are part of all of them
    https://www.oracle.com/cloud/public-cloud-regions
  5. MySQL HeatWave – optimized for analytics, machine learning and OLTP

  6. Existing applications work without changes

    Oracle Analytics Cloud is integrated with MySQL HeatWave
  7. Already best performance in industry for data warehouse

    TPC-H 10TB
    • 4.2x faster than Redshift (10x ra3.4xlarge)
    • 3.3x faster than Snowflake (X-Large cluster)
    • 5.6x faster than BigQuery (800 slots)
    • 7.4x faster than Databricks (Large cluster)
    • 13x better than Redshift, 28x better than Snowflake, 28x better than BigQuery, 62x better than Databricks
    Benchmark queries are derived from the TPC-H benchmarks, but results are not comparable to published TPC-H benchmark results since these do not comply with the TPC-H specifications.
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-on-oci
  8. Already lowest cost in industry for data warehouse

    TPC-H 10TB price-performance comparison
    • 13x better than Redshift (3 year reserved, paid upfront)
    • 28x better than Snowflake (Standard Edition)
    • 28x better than BigQuery (1 year reserved)
    • 62x better than Databricks (1 year reserved)
    Benchmark queries are derived from the TPC-H benchmarks, but results are not comparable to published TPC-H benchmark results since these do not comply with the TPC-H specifications.
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-on-oci
  9. Fully automated in-database machine learning

    Training, inference, explanation with HeatWave AutoML
    • In-database
    • Secure
    • Fully automated
    • 25x faster than Redshift ML
    • Explainable
    • No additional cost
    Task types: Classification, Regression, Recommender System, Anomaly Detection, Time-series forecasting
    Example use cases: classify warranty claims, identify similar users, recommend movies, loan default prediction, predict flight delay, rainfall prediction, predict advertising spend ROI, demand forecasting, detect anomalous credit card spend, identify game hacker
  10. MySQL HeatWave is available in multiple clouds

    Optimized for price-performance in each cloud
  11. MySQL HeatWave on AWS

    • MySQL HeatWave runs natively on AWS, optimized for AWS infrastructure
    • Data doesn’t leave AWS – saves egress cost, and avoids compliance approvals
    • Lowest latency access to MySQL HeatWave
    • Tight integration with the AWS ecosystem – S3, CloudWatch, PrivateLink
    • Easier migration from other databases (e.g., Amazon Aurora, Redshift, Snowflake)
    OCI and AWS Regions – August 2023
    Create & manage a MySQL DB System with a HeatWave Cluster to use with AWS applications
    https://dev.mysql.com/doc/heatwave-aws/en
  12. MySQL HeatWave on Azure

    • Familiar Azure-native user experience
    • Automated identity, networking, and monitoring integration
    • Private interconnect and networking with < 2 ms latency
    • Use Microsoft Azure services with MySQL HeatWave
    • Collaborative support
    Connecting to MySQL HeatWave on OCI from Azure VNET
    https://www.oracle.com/cloud/azure/oracle-database-for-azure
  13. MySQL HeatWave customer momentum

    Data warehouse, machine learning and OLTP workloads
    https://www.oracle.com/customers/?product=mpd-cld-infra:db-services:mysql-heatwave
  14. Massive amount of data stored outside databases

    • Databases are systems of record
    • Files are repositories for other types of data (e.g., IoT, web content, log files)
    • 99.5% of collected data remains unused
    Diagram labels: Object Store, Devices, Sensors, Social, Events
  15. HeatWave Lakehouse can query object store and MySQL database

    Data stays in object store, processed by HeatWave
    Diagram labels: Object Store, Query, InnoDB, AWS Aurora export, Redshift export, OLTP, Analytics, Machine Learning, Lakehouse, MySQL Autopilot
  16. HeatWave Lakehouse: main benefits

    1. Query data in object storage
    • Querying in HeatWave
    • Scale to 512 nodes, 512 TB
    • CSV, Parquet, Aurora & Redshift exports
    • Fastest load, best price-performance
    2. Use standard MySQL syntax
    • 100% compatible with MySQL syntax
    • Use MySQL Autopilot to auto-infer schema, estimate capacity and load times, and generate load scripts
    3. Combine OLTP data with object store data
    • Treat data lake data as tables
    • Use select, join, aggregations, filters, etc. to combine data in OLTP tables with data lake tables
  17. HeatWave scales out: flexible, fast, and highly scalable

    Scale to any cluster size
    • Scale to any size up to 512 nodes
    • Scale up or scale down
    Real-time scaling
    • System is always available for all operations
    • Data in the cluster is always balanced
    Highly scalable
    • Query performance scales very well with cluster size
    • Load performance scales very well with cluster size
  18. Three simple steps to query data in object store

    Fully compatible MySQL syntax generated by Autopilot, no human required
    1. Run MySQL Autopilot on data in object store
       mysql> CALL sys.heatwave_load(<db_names>,<info_about_file_in_OS>);
       OUTPUT: DDLs automatically generated
    2. Execute DDLs generated by Autopilot
       mysql> CREATE TABLE `cust1DB`.`Sensor` (date DATE, degree INT)
           -> ENGINE=Lakehouse SECONDARY_ENGINE=Rapid
           -> ENGINE_ATTRIBUTE = '{"file":[{"prefix":"sensor1-April", "par":"<PAR URL>"}]}';
       mysql> ALTER TABLE `cust1DB`.`Sensor` SECONDARY_LOAD;
    3. Query across file and table
       mysql> SELECT count(*) FROM Sensor, SALES WHERE Sensor.degree > 30 AND Sensor.date = SALES.date;
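The DDL in step 2 can also be assembled programmatically, which avoids the typographic quotes that slide software tends to substitute for plain ASCII quotes. The following is a minimal Python sketch, not Autopilot's own generator: the helper name `lakehouse_ddl` is hypothetical, the schema, table, column, and prefix values are the ones shown on the slide, and `<PAR URL>` is left as the slide's placeholder. It does not escape backticks inside identifiers.

```python
import json


def lakehouse_ddl(schema: str, table: str, columns: dict[str, str],
                  prefix: str, par_url: str) -> list[str]:
    """Build the CREATE TABLE and SECONDARY_LOAD statements from step 2."""
    # json.dumps always emits plain ASCII double quotes.
    attribute = json.dumps({"file": [{"prefix": prefix, "par": par_url}]})
    # Escape backslashes and single quotes so the JSON can sit inside
    # a single-quoted SQL string literal.
    attribute = attribute.replace("\\", "\\\\").replace("'", "''")
    column_sql = ", ".join(
        f"`{name}` {sql_type}" for name, sql_type in columns.items()
    )
    create = (
        f"CREATE TABLE `{schema}`.`{table}` ({column_sql}) "
        f"ENGINE=Lakehouse SECONDARY_ENGINE=Rapid "
        f"ENGINE_ATTRIBUTE='{attribute}';"
    )
    load = f"ALTER TABLE `{schema}`.`{table}` SECONDARY_LOAD;"
    return [create, load]


statements = lakehouse_ddl(
    "cust1DB", "Sensor", {"date": "DATE", "degree": "INT"},
    prefix="sensor1-April", par_url="<PAR URL>",
)
for statement in statements:
    print(statement)
```

The printed statements can be pasted into a MySQL client once the placeholder is replaced with a real pre-authenticated request URL.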
  19. MySQL Autopilot: machine learning-powered automation

    Workload-aware automation for analytics, OLTP and object store
  20. Auto provisioning with MySQL HeatWave Lakehouse

    How to determine the right cluster size required for processing data in object store?
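As a rough intuition for the sizing question, the deck's stated limits (512 nodes, 512 TB) suggest about 1 TB of object store data per node. The Python sketch below is only a back-of-envelope estimate under that assumption; it is not how Autopilot provisions a cluster, and the function name `estimate_nodes` is hypothetical.

```python
import math

MAX_NODES = 512          # upper bound stated in the deck
TB_PER_NODE = 512 / 512  # assumption: capacity spread evenly, about 1 TB per node


def estimate_nodes(data_tb: float) -> int:
    """Rough node count for a data set of `data_tb` terabytes."""
    if data_tb <= 0:
        raise ValueError("data size must be positive")
    nodes = math.ceil(data_tb / TB_PER_NODE)
    if nodes > MAX_NODES:
        raise ValueError(f"{data_tb} TB exceeds the {MAX_NODES}-node limit")
    return nodes


for size_tb in (0.5, 10, 500):
    print(f"{size_tb} TB -> {estimate_nodes(size_tb)} node(s)")
```

A real estimate depends on data types, compression, and the workload, which is why the deck delegates it to Autopilot.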
  21. Auto schema inference with HeatWave Lakehouse

    …Even for files that don’t have metadata!
  22. Inferring schema mapping from file metadata has limitations

    Adaptive data sampling in MySQL Autopilot is fast and accurate
    MySQL Autopilot performance on 1 node:
    TPCH LINEITEM (TB)   Autopilot time (sec)
    1 TB                 8
    20 TB                13
    75 TB                15
    200 TB               25
    300 TB               40
    400 TB               47
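The Autopilot timings above grow far more slowly than the data size. A short Python sketch makes that explicit, using only the figures from the slide; the variable names are illustrative.

```python
# (data size in TB, Autopilot time in seconds), copied from the slide.
measurements = [(1, 8), (20, 13), (75, 15), (200, 25), (300, 40), (400, 47)]

base_tb, base_sec = measurements[0]
for size_tb, seconds in measurements[1:]:
    data_growth = size_tb / base_tb
    time_growth = seconds / base_sec
    print(f"{size_tb:>3} TB: {data_growth:>4.0f}x data, {time_growth:.2f}x time")

# Compare the largest measurement with the smallest one.
largest_tb, largest_sec = measurements[-1]
data_ratio = largest_tb / base_tb
time_ratio = largest_sec / base_sec
```

Going from 1 TB to 400 TB multiplies the data by 400 while the inference time grows by a factor of under 6, which is consistent with sampling rather than scanning the full data set.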
  23. Load performance

    MySQL HeatWave is faster to load data and less expensive
    500 TB TPCH
    System              Annual cost  Pricing term      Load time (hrs)
    HeatWave Lakehouse  $1,742,036   PAYG              4.43
    Snowflake           $2,300,160   Standard Edition  9.04 (2x)
    Redshift            $1,544,268   1 year upfront    40.86 (9.2x)
    Databricks          $1,822,817   1 year reserved   25.42 (5.7x)
    Google BigQuery     $1,446,900   1 year reserved   38.2 (8.6x)
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse
  24. Load & Query performance comparison – best in the industry

    MySQL HeatWave is faster to load & query data and still less expensive
    500 TB TPCH
    System              Annual cost  Pricing term      Load time (hrs)      Query time (sec)
    HeatWave Lakehouse  $1,742,036   PAYG              4.43                 2,150
    Snowflake           $2,300,160   Standard Edition  9.04 (2x slower)     39,040 (18x slower)
    Redshift            $1,544,268   1 year upfront    40.86 (9.2x slower)  32,715 (15x slower)
    Databricks          $1,822,817   1 year reserved   25.42 (5.7x slower)  37,729 (17x slower)
    Google BigQuery     $1,446,900   1 year reserved   38.2 (8.6x slower)   76,180 (35x slower)
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse
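The "x slower" multipliers on this slide follow directly from the raw load and query times. The Python sketch below recomputes them from the slide's figures, assuming each value belongs to the system in the same position of the header row.

```python
# Figures copied from the 500 TB TPCH slide: (load hours, query seconds).
results = {
    "HeatWave Lakehouse": (4.43, 2150),
    "Snowflake": (9.04, 39040),
    "Redshift": (40.86, 32715),
    "Databricks": (25.42, 37729),
    "Google BigQuery": (38.2, 76180),
}

base_load, base_query = results["HeatWave Lakehouse"]
# Ratio of each system's time to the HeatWave Lakehouse time.
slowdown = {
    name: (load / base_load, query / base_query)
    for name, (load, query) in results.items()
    if name != "HeatWave Lakehouse"
}
for name, (load_ratio, query_ratio) in slowdown.items():
    print(f"{name}: load {load_ratio:.1f}x slower, query {query_ratio:.1f}x slower")
```

The load ratios match the slide to one decimal place; the query ratios match once truncated to whole numbers, which appears to be how the slide reports them.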
  25. MySQL Autopilot boosts query performance of HeatWave Lakehouse

    Optimizer learns and improves query plan based on previous queries executed
    Diagram: join plans for Query #1 and Query #2, with Autopilot statistics collected between them
  26. Same query performance when data inside MySQL or in object store

    Provides flexibility to develop applications on object store without any performance or cost impact
    10 TB TPC-H query performance, query time (seconds):
    HeatWave            14.2
    HeatWave Lakehouse  14.2
    Snowflake           47
    Redshift            59
    Google BigQuery     79
    Databricks          105
    Configurations: 10 HeatWave nodes; X-Large cluster for Snowflake; 10 nodes of ra3.4xlarge for Redshift; 800 slots for Google BigQuery; Large cluster for Databricks
    *Benchmark queries are derived from the TPC-H benchmark, but results are not comparable to published TPC-H benchmark results since these do not comply with the TPC-H specifications.
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse
  27. Same price-performance when data inside MySQL or in object store

    Provides flexibility to develop applications on object store without any performance or cost impact
    10 TB TPC-H price-performance (cents):
    HeatWave            1.5
    HeatWave Lakehouse  1.5
    Snowflake           41.9
    Redshift            20.2
    Google BigQuery     41.4
    Databricks          92.5
    • 10 HeatWave nodes; X-Large cluster for Snowflake; 10 nodes of ra3.4xlarge for Redshift; 800 slots for Google BigQuery; Large cluster for Databricks
    • Standard Edition price for Snowflake; 3 year upfront price for Redshift; 1 year reserved price for Google BigQuery and Databricks
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse
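The price-performance values on this slide can be cross-checked against the "13x / 28x / 28x / 62x better" claims made earlier in the deck. The Python sketch below divides each competitor's value by the HeatWave Lakehouse value and rounds to the nearest whole number; it assumes the chart values are listed in the same order as the system names.

```python
# Price-performance in cents, copied from the 10 TB TPC-H chart.
cents = {
    "HeatWave": 1.5,
    "HeatWave Lakehouse": 1.5,
    "Snowflake": 41.9,
    "Redshift": 20.2,
    "Google BigQuery": 41.4,
    "Databricks": 92.5,
}

baseline = cents["HeatWave Lakehouse"]
# How many times more each competitor costs per unit of work.
advantage = {
    name: round(value / baseline)
    for name, value in cents.items()
    if not name.startswith("HeatWave")
}
for name, factor in advantage.items():
    print(f"{factor}x better than {name}")
```

The rounded factors agree with the multipliers quoted on the earlier price-performance slide.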
  28. HeatWave Lakehouse is integrated with MySQL Shell for VS Code
  29. Summary

    1. Designed to process non-MySQL workloads
    2. Best query performance and load performance for data warehouse
    3. Query data in object store and OLTP data in MySQL database
    4. Data in object store remains in object store
    5. MySQL Autopilot for automating data management
    6. HeatWave scales to 512 HeatWave nodes and 1/2 petabyte of data
    Functionality available with MySQL HeatWave in all OCI regions
  30. Get $300 in credits and try free for 30 days

    • Get started with MySQL HeatWave: oracle.com/mysql/free
    • Learn more about MySQL HeatWave: oracle.com/mysql
    • Request a guided workshop: ask your account manager
  31. Follow us on Social Media

    “Data is the Oxygen of Business”
  32. Thank you! Q&A

    Olivier Dasini, MySQL Cloud Principal Solutions Architect EMEA
    [email protected]
    Blogs: www.dasini.net/blog/en, www.dasini.net/blog/fr
    LinkedIn: www.linkedin.com/in/olivier-dasini
    Twitter: @freshdaz
  33. “HeatWave Lakehouse scales out very well for loading data from object storage and for running queries on object store… This scale out characteristic of HeatWave Lakehouse for data management is key to efficiently process very large amounts of data.”

    Henry Tullis, Leader, Cloud Infrastructure and Engineering, Deloitte Consulting
  34. What are industry analysts saying about MySQL HeatWave Lakehouse?

    • “MySQL HeatWave demonstrates that Lakehouse performance can be identical to transaction query performance—unheard of and even unthinkable.”
    • “For HeatWave Lakehouse to deliver record performance for both loading data and querying data is an unprecedented innovation in cloud data services.”
    • “The ability of HeatWave to load and query data on such a massive number of nodes in parallel is the first in the industry.”
    • “MySQL HeatWave Lakehouse is not your typical analytical database architecture, and its design engineering will continue to push the competitive market forward.”
    • “Data lakehouses are meant to bridge the gap between data warehouses and data lakes... MySQL HeatWave Lakehouse takes that a step further by making cloud object storage a first-class citizen.”
    • “Simply put: MySQL HeatWave Lakehouse enables you to stay ahead of the competition by taking swift action on meaningful business insights.”
    • “Organizations looking for the best value in the cloud data lakehouse landscape must seriously consider MySQL HeatWave Lakehouse.”
    • “MySQL HeatWave Lakehouse takes customers to a new level of capabilities”
    • “MySQL HeatWave Lakehouse can simplify the life of data management professionals and should improve the customer experience.”
    • “In the era of AI, the ability to process data is the absolute demarcation between companies that are going to get productivity and outcomes and those that won't…”
    • “The performance against the big names is pretty incredible you know…when you talk about a highly specialized accelerated workload, this is a tremendously powerful use case...”