The MySQL HeatWave Lakehouse Revolution - From Basic Query to Advanced Analytics and Machine Learning

Big Data & AI Paris 2023

Throughout its journey, MySQL has constantly reinvented itself, staying at the forefront of innovation in the ever-evolving technological landscape. For a multitude of companies, MySQL has become the essential technological anchor.

The arrival of HeatWave Lakehouse has propelled MySQL far beyond its traditional role as a database optimized for OLTP.
Imagine a platform where querying structured and semi-structured data happens in the blink of an eye, powered by the query accelerator HeatWave.

But innovation doesn't stop there. With MySQL HeatWave AutoML, machine learning seamlessly and efficiently integrates into your database.

Come and discover how, by combining advanced analysis, machine learning, and a visionary architecture, MySQL positions itself at the forefront of future data management!

Olivier DASINI

October 03, 2023

Transcript

  1. The MySQL HeatWave Lakehouse Revolution - From Basic Query to Advanced Analytics and Machine Learning

    Olivier Dasini, MySQL Cloud Principal Solutions Architect EMEA. [email protected] Blogs: www.dasini.net/blog/en, www.dasini.net/blog/fr. LinkedIn: www.linkedin.com/in/olivier-dasini. Big Data & AI Paris - September 2023
  2. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    Me, Myself & I: Olivier DASINI • MySQL Geek • Addicted to MySQL for 15+ years • Playing with databases for 20+ years • MySQL Writer, Blogger and Speaker • Also: DBA, Consultant, Architect, Trainer, ... • MySQL Cloud Principal Solutions Architect EMEA at Oracle • Stay up to date! Blog: www.dasini.net/blog/en, LinkedIn: www.linkedin.com/in/olivier-dasini/, Twitter: @freshdaz
  3. Agenda

    1. MySQL HeatWave Overview 2. MySQL HeatWave AutoML 3. MySQL HeatWave Lakehouse 4. Demo
  4. MySQL HeatWave

    In-Memory Query Accelerator with Built-in ML
  5. MySQL HeatWave – Optimized for analytics, machine learning & OLTP

    OLTP, Analytics, Autopilot, Machine Learning. https://www.oracle.com/mysql
  6. Already best performance in industry for data warehouse: TPC-H 10TB

    Performance: 4.2x faster than Redshift (10x ra3.4xlarge), 3.3x faster than Snowflake (X-Large cluster), 5.6x faster than BigQuery (800 slots), 7.4x faster than Databricks (Large cluster). Price-performance: 13x better than Redshift, 28x better than Snowflake, 28x better than BigQuery, 62x better than Databricks. Benchmark queries are derived from the TPC-H benchmarks, but results are not comparable to published TPC-H benchmark results since these do not comply with the TPC-H specifications. https://www.oracle.com/mysql/heatwave/performance/#heatwave-on-oci
  7. Already lowest cost in industry for data warehouse: TPC-H 10TB price / performance comparison

    13x better than Redshift (3 year reserved, paid upfront), 28x better than Snowflake (Standard Edition), 28x better than BigQuery (1 year reserved), 62x better than Databricks (1 year reserved). Benchmark queries are derived from the TPC-H benchmarks, but results are not comparable to published TPC-H benchmark results since these do not comply with the TPC-H specifications. https://www.oracle.com/mysql/heatwave/performance/#heatwave-on-oci
  8. MySQL Autopilot

    Machine Learning Based Automation
  9. MySQL Autopilot: machine learning-powered automation

    Workload aware automation for analytics, OLTP and Lakehouse
  10. NEW MySQL Autopilot: machine learning-powered automation. Help improve performance and scalability without requiring database tuning expertise

    • MySQL Autopilot indexing (limited availability) – Helps customers eliminate the time-consuming tasks of creating optimal indexes for their OLTP workloads and maintaining those over time as workloads evolve. MySQL Autopilot automatically determines the indexes customers should create or drop from their tables to optimize their OLTP throughput, using machine learning to make a prediction based on individual application workloads. In addition, Autopilot indexing predicts the expected improvement with the recommended indexes without creating those indexes and without incurring compute or storage overhead on the users’ tenancy.
    • Auto compression – Helps customers determine the optimal compression algorithm for each column, which improves load and query performance with faster data compression and decompression. By reducing memory usage, customers can cut costs by up to 25 percent.
    • Adaptive query execution – Helps customers optimize the execution plan of a query after the query has started to execute, improving the performance of ad hoc queries by up to 25 percent. It uses information obtained from the partial execution of the query to adjust data structures and system resources, and then independently optimizes query execution for each HeatWave node based on actual data distribution at run time.
    • Auto load and unload – Autopilot automatically loads the columns used by an application workload into HeatWave and automatically unloads tables that were never or rarely queried. This frees up memory and reduces costs for customers, who no longer have to perform this task manually.
  11. MySQL HeatWave AutoML

    In-database machine learning with AutoML
  12. HeatWave AutoML automates the ML lifecycle & all models can be explained

    Leverages Oracle AutoML technology to automate the process of training a machine learning model. Dataset, Data preprocessing, Algorithm selection, Adaptive sampling, Feature selection, Hyper-parameter tuning, Tuned model. Model explainer, Prediction explainer. Regulatory compliance, Fairness, Repeatability, Causality, Trust. https://dev.mysql.com/doc/heatwave/en/heatwave-machine-learning.html
  13. Fully automated in-database machine learning: training, inference, explanation with MySQL HeatWave AutoML

    Task types: Classification, Recommender System, Regression, Anomaly Detection, Time-series forecasting. Example use cases: Classify warranty claims, Identify similar users, Recommend movies, Loan default prediction, Predict flight delay, Rain fall prediction, Predict Advt spend ROI, Demand forecasting, Detect anomalous credit card spend, Identify game hacker. • In-database • Secure • Multiple ML algorithms • Fully automated • 25x faster than Redshift ML • Explainable • No additional cost. https://www.oracle.com/mysql/#automl
  14. MySQL HeatWave is available in multiple clouds

    Optimized for price performance in each cloud
  15. MySQL HeatWave Lakehouse

    NEW! Fast analytics across databases and object storage
  16. HeatWave Lakehouse can query object store and MySQL database

    Data stays in object store, processed by HeatWave. Diagram labels: Object Store, Query, InnoDB, AWS Aurora export, Redshift export, OLTP, Analytics, Autopilot, Machine Learning, Lakehouse, MySQL Autopilot. https://www.oracle.com/mysql/#lakehouse
  17. HeatWave Lakehouse: main benefits

    1. Query data in object storage • Querying in HeatWave • Scale to 512 nodes, 512 TB • CSV, Parquet, Aurora & Redshift exports • Fastest load, best price-performance
    2. Use standard MySQL syntax • 100% compatible with MySQL syntax • Use MySQL Autopilot to auto-infer schema, estimate capacity and load times, and generate load scripts
    3. Combine OLTP data with object store data • Treat data lake data as tables • Use select, join, aggregations, filters, etc. to combine data in OLTP tables with data lake tables
  18. Three simple steps to query data in object store

    Fully compatible MySQL syntax generated by Autopilot, no human required.
    1. Run MySQL Autopilot on data in object store
       mysql> CALL sys.heatwave_load(<db_names>,<info_about_file_in_OS>);
       OUTPUT: DDLs automatically generated
    2. Execute DDLs generated by Autopilot
       mysql> CREATE TABLE `cust1DB`.`Sensor` (date DATE, degrees INT)
           -> ENGINE=Lakehouse SECONDARY_ENGINE=Rapid
           -> ENGINE_ATTRIBUTE = '{"file":[{"prefix":"sensor1-April", "par":"<PAR URL>"}]}';
       mysql> ALTER TABLE `cust1DB`.`Sensor` SECONDARY_LOAD;
    3. Query across file and table
       mysql> SELECT count(*) FROM Sensor, SALES WHERE Sensor.degrees > 30 AND Sensor.date = SALES.date;
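
The DDL in step 2 carries its file list as a JSON document inside ENGINE_ATTRIBUTE, so the quoting has to be exact. As a minimal sketch, the statement text can be assembled with a JSON serializer instead of by hand. The helper name `lakehouse_ddl` is hypothetical, the column is called `degrees` to match the query in step 3, and the code only formats strings: it does not connect to MySQL and does not escape its inputs.

```python
import json


def lakehouse_ddl(schema, table, columns, prefix, par_url):
    """Return (CREATE TABLE, load) statements shaped like step 2 of the slide.

    Hypothetical helper: it only builds text, with no escaping of its inputs.
    """
    # json.dumps emits straight double quotes, which the JSON attribute requires.
    attribute = json.dumps({"file": [{"prefix": prefix, "par": par_url}]})
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    create = (
        f"CREATE TABLE `{schema}`.`{table}` ({column_list}) "
        "ENGINE=Lakehouse SECONDARY_ENGINE=Rapid "
        f"ENGINE_ATTRIBUTE = '{attribute}';"
    )
    load = f"ALTER TABLE `{schema}`.`{table}` SECONDARY_LOAD;"
    return create, load


create_stmt, load_stmt = lakehouse_ddl(
    "cust1DB",
    "Sensor",
    [("date", "DATE"), ("degrees", "INT")],
    prefix="sensor1-April",
    par_url="<PAR URL>",  # placeholder kept exactly as on the slide
)
print(create_stmt)
print(load_stmt)
```

In practice the slide's point is that Autopilot generates these statements for you; the sketch is only meant to make the structure of the attribute visible.
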
  19. MySQL Autopilot for MySQL HeatWave Lakehouse: machine learning-powered automation

    Auto Provisioning • Adaptively sample raw files and collect statistics • Estimate memory footprint of the data to be loaded
    Auto Schema Inference • Sample raw files to infer column data types • Generate DDLs to create tables
    Adaptive Sampling • Adaptively samples a fraction of files to collect stats • Use collected stats for various Autopilot features
    Auto Load • Predict load time • Load script generation
    Adaptive Data Flow • System adapts to the performance of object store • Improves system performance and reliability
    Auto Query Plan Improvement • Continuously collect statistics while running queries • Enhance future execution plans
    Four of the six features carry a NEW tag on the slide.
  20. Auto provisioning with MySQL HeatWave Lakehouse

    How to determine the right cluster size required for processing data in object store?
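
To make the sizing question concrete, here is a rough sketch of the kind of arithmetic involved: measure a sample of the files, extrapolate to the whole data set, and divide by a per-node capacity. This is not Autopilot's actual model. The function name and its inputs are hypothetical, and the figure of 1 TB per node is only an assumption read off the deck's "512 nodes, 512 TB" limit.

```python
# Assumption: roughly 1 TB (10**12 bytes) per node, implied by "512 nodes, 512 TB".
BYTES_PER_NODE = 10**12
MAX_NODES = 512


def estimate_nodes(sample_bytes, sampled_files, total_files):
    """Extrapolate a sampled footprint to all files and return a node count.

    sample_bytes:  estimated in-memory size of the sampled files (hypothetical input)
    sampled_files: number of files that were sampled
    total_files:   number of files to be loaded
    """
    if not 0 < sampled_files <= total_files:
        raise ValueError("need 0 < sampled_files <= total_files")
    # Integer arithmetic avoids floating-point surprises at node boundaries.
    estimated_bytes = sample_bytes * total_files // sampled_files
    nodes = max(1, -(-estimated_bytes // BYTES_PER_NODE))  # ceiling division
    if nodes > MAX_NODES:
        raise ValueError(f"estimate of {nodes} nodes exceeds the {MAX_NODES}-node limit")
    return nodes


# One sampled file out of 100 measures 50 GB, so about 5 TB in total.
print(estimate_nodes(sample_bytes=50 * 10**9, sampled_files=1, total_files=100))
```

A uniform extrapolation like this is the simplest possible choice; the deck describes the real feature as adaptive sampling, which presumably handles files of uneven size better.
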
  21. Auto schema inference with MySQL HeatWave Lakehouse

    …Even for files that don’t have metadata!
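
As an illustration of what inferring a schema from a headerless file can look like, the sketch below samples the first rows of a CSV, picks the narrowest type that fits every sampled value in a column, and emits a CREATE TABLE statement. It is a generic technique, not Autopilot's algorithm: the function names, the `col_N` column names, and the small set of candidate types are all assumptions made for the example.

```python
import csv
import io
from datetime import date


def infer_type(values):
    """Pick the narrowest candidate type that parses every sampled value."""
    def all_parse(parser):
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "BIGINT"
    if all_parse(float):
        return "DOUBLE"
    if all_parse(date.fromisoformat):
        return "DATE"
    # Fall back to a string column wide enough for the longest sampled value.
    return f"VARCHAR({max(1, max(len(v) for v in values))})"


def infer_ddl(csv_text, table, sample_rows=100):
    """Infer column types from the first rows of a headerless CSV."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        rows.append(row)
        if len(rows) >= sample_rows:
            break
    if not rows:
        raise ValueError("no rows to sample")
    columns = zip(*rows)  # transpose rows into columns
    # Without a header there are no names, so generate placeholder ones.
    defs = [f"col_{i} {infer_type(col)}" for i, col in enumerate(columns, start=1)]
    return f"CREATE TABLE `{table}` ({', '.join(defs)});"


sample = "2023-04-01,31,Paris\n2023-04-02,28.5,Marseille\n"
print(infer_ddl(sample, "Sensor"))
```

Because only a sample is read, a later row can contradict the inferred type; a real system has to decide how to widen types or reject such rows.
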
  22. HeatWave scales out: flexible, fast, and highly scalable

    Scale to any cluster size • Scale to any size up to 512 nodes • Scale up or scale down
    Real-time scaling • System is always available for all operations • Data in the cluster is always balanced
    Highly scalable • Query performance scales very well with cluster size • Load performance scales very well with cluster size
  23. Load & Query performance comparison – Best in the industry

    500 TB TPC-H. MySQL HeatWave is faster to load & query data and still less expensive.

    System              Annual Cost   Pricing Term       Load Time (hrs)       Query Time
    HeatWave Lakehouse  $1,742,036    PAYG               4.43                  2,150 sec
    Snowflake           $2,300,160    Standard Edition   9.04 (2x slower)      39,040 sec (18x slower)
    Redshift            $1,544,268    1 year upfront     40.86 (9.2x slower)   32,715 sec (15x slower)
    Databricks          $1,822,817    1 year reserved    25.42 (5.7x slower)   37,729 sec (17x slower)
    Google BigQuery     $1,446,900    1 year reserved    38.2 (8.6x slower)    76,180 sec (35x slower)

    https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse
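
The "x slower" factors on this slide are each system's time divided by the HeatWave Lakehouse time. The short check below recomputes them from the load and query times quoted above; the variable names are illustrative only.

```python
# Load time in hours and query time in seconds, copied from the slide.
results = {
    "HeatWave Lakehouse": (4.43, 2150),
    "Snowflake": (9.04, 39040),
    "Redshift": (40.86, 32715),
    "Databricks": (25.42, 37729),
    "Google BigQuery": (38.2, 76180),
}

base_load, base_query = results["HeatWave Lakehouse"]

# Load factors are rounded to one decimal; query factors are truncated,
# which is what reproduces the slide's whole-number figures.
slowdown = {
    name: (round(load / base_load, 1), int(query / base_query))
    for name, (load, query) in results.items()
    if name != "HeatWave Lakehouse"
}

for name, (load_x, query_x) in slowdown.items():
    print(f"{name}: load {load_x}x slower, query {query_x}x slower")
```

The query factors match the slide only when truncated: Databricks works out to about 17.5, which the slide reports as 17x.
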
  24. Same price-performance when data inside MySQL or in object store

    Provides flexibility to develop applications on object store without any performance, cost impact. 10TB TPC-H price-performance (cents): HeatWave 1.5, HeatWave Lakehouse 1.5, Snowflake 41.9, Redshift 20.2, Google BigQuery 41.4, Databricks 92.5. • 10 HeatWave nodes; X-Large cluster for Snowflake; 10 nodes of ra3.4xlarge for Redshift; 800 slots for Google BigQuery; Large cluster for Databricks • Standard edition price for Snowflake; 3 yr upfront price for Redshift; 1 year reserved price for Google BigQuery and Databricks. https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse
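
The chart values can be cross-checked against the "13x / 28x / 28x / 62x better" price-performance claims made earlier in the deck. The sketch assumes the six values are listed in the same order as the six system labels; the ratios it produces support that reading.

```python
# Price-performance in cents, assumed to follow the order of the chart labels.
cents = {
    "HeatWave": 1.5,
    "HeatWave Lakehouse": 1.5,
    "Snowflake": 41.9,
    "Redshift": 20.2,
    "Google BigQuery": 41.4,
    "Databricks": 92.5,
}

baseline = cents["HeatWave Lakehouse"]

# Ratio of each competitor's cost figure to the HeatWave figure, rounded.
ratios = {
    name: round(value / baseline)
    for name, value in cents.items()
    if not name.startswith("HeatWave")
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x")
```

Rounded to whole numbers, the ratios come out as 28 for Snowflake, 13 for Redshift, 28 for Google BigQuery and 62 for Databricks, in line with the earlier slides.
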
  25. MySQL HeatWave Lakehouse

    1. Designed to process non-MySQL workloads 2. Best query performance and load performance for data warehouse 3. Query data in object store and OLTP data in MySQL database 4. Data in object store remains in object store 5. MySQL Autopilot for automating data management 6. HeatWave scales to 512 HeatWave nodes and 1/2 Petabyte data. Functionality available with MySQL HeatWave in all OCI regions
  26. Demo

    MySQL HeatWave Lakehouse
  27. Meet us at booth A28

    Tuesday, September 26 • 11:00 - 11:15 / ORACLE booth A28: Discover MySQL HeatWave AutoML: machine learning for everyone • 16:00 - 16:15 / ORACLE booth A28: Unlock the power of Big Data analytics with MySQL HeatWave Lakehouse!
  28. Follow us on Social Media

    “Data is the Oxygen of Business”
  29. Get $300 in credits and try free for 30 days

    Get started with MySQL HeatWave: oracle.com/mysql/free. Learn more about MySQL HeatWave: oracle.com/mysql. Request a guided workshop: ask your account manager.
  30. Thank you! Q&A

    Olivier Dasini, MySQL Cloud Principal Solutions Architect EMEA. [email protected] Blogs: www.dasini.net/blog/en, www.dasini.net/blog/fr. LinkedIn: www.linkedin.com/in/olivier-dasini. Twitter: @freshdaz
  31. Oracle Cloud Infrastructure Global Locations

    45 regions in 23 countries including Paris & Marseille; 12 Azure Interconnect Regions. MySQL HeatWave Database Service is part of all of them. And also Cloud@Customer & EU Sovereign Cloud. 100% renewable energy by 2025. August 2023. https://www.oracle.com/cloud/public-cloud-regions
  32. MySQL HeatWave on AWS

    Create & manage a MySQL DB System with a HeatWave Cluster to use with AWS applications • MySQL HeatWave runs natively on AWS, optimized for AWS infrastructure • Data doesn’t leave AWS – saves egress cost, and avoids compliance approvals • Lowest latency access to MySQL HeatWave • Tight integration with the AWS ecosystem – S3, CloudWatch, PrivateLink • Easier migration from other databases (e.g., Amazon Aurora, Redshift, Snowflake). Map: OCI and AWS Regions – August 2023. https://dev.mysql.com/doc/heatwave-aws/en
  33. MySQL HeatWave on Azure

    Connecting to MySQL HeatWave on OCI from Azure VNET • Familiar Azure-native user experience • Automated identity, networking, and monitoring integration • Private interconnect and networking with < 2 ms latency • Use Microsoft Azure services with MySQL HeatWave • Collaborative support. https://www.oracle.com/cloud/azure/oracle-database-for-azure
  34. MySQL Autopilot boosts query performance of HeatWave Lakehouse

    Optimizer learns and improves query plan based on previous queries executed. [Diagram: query plans for Query #1 (A, B, C) and Query #2 (A, B, D), linked by Autopilot Statistics]
  35. MySQL HeatWave customer momentum

    Data warehouse, machine learning and OLTP workloads. https://www.oracle.com/customers/?product=mpd-cld-infra:db-services:mysql-heatwave
  36. “HeatWave Lakehouse scales out very well for loading data from

    object storage and for running queries on object store… This scale out characteristic of HeatWave Lakehouse for data management is key to efficiently process very large amounts of data.” Henry Tullis Leader, Cloud Infrastructure and Engineering Deloitte Consulting
  37. What are industry analysts saying about MySQL HeatWave Lakehouse? “MySQL

    HeatWave demonstrates that Lakehouse performance can be identical to transaction query performance—unheard of and even unthinkable.” “For HeatWave Lakehouse to deliver record performance for both loading data and querying data is an unprecedented innovation in cloud data services.” “The ability of HeatWave to load and query data on such a massive number of nodes in parallel is the first in the industry.” “MySQL HeatWave Lakehouse is not your typical analytical database architecture, and its design engineering will continue to push the competitive market forward.” “Data lakehouses are meant to bridge the gap between data warehouses and data lakes... MySQL HeatWave Lakehouse takes that a step further by making cloud object storage a first-class citizen.” “Simply put: MySQL HeatWave Lakehouse enables you to stay ahead of the competition by taking swift action on meaningful business insights.” “Organizations looking for the best value in the cloud data lakehouse landscape must seriously consider MySQL HeatWave Lakehouse.” “MySQL HeatWave Lakehouse takes customers to a new level of capabilities” “MySQL HeatWave Lakehouse can simplify the life of data management professionals and should improve the customer experience.” “In the era of AI, the ability to process data is the absolute demarcation between companies that are going to get productivity and outcomes and those that won't…” “The performance against the big names is pretty incredible you know…when you talk about a highly specialized accelerated workload, this is a tremendously powerful use case...”