EMEA [email protected] Blogs : www.dasini.net/blog/en : www.dasini.net/blog/fr Linkedin: www.linkedin.com/in/olivier-dasini Déverrouillez le pouvoir de l'analyse Big Data avec MySQL HeatWave Lakehouse ! Big Data & AI Paris - September 2023
4 … the MySQL Cloud services made by the MySQL Team The MySQL HeatWave “Family” “MySQL HeatWave” MySQL HeatWave for OLTP a.k.a MDS: (OLTP) General Purpose Optimized for OLTP MySQL HeatWave Lakehouse (Lakehouse) MySQL HeatWave (Analytics / ML) OLTP + Analytics (OLAP) + Machine Learning InnoDB Lakehouse InnoDB RAPID InnoDB RAPID
7 MySQL HeatWave Lakehouse • Query data across MySQL, the object store, or both—using standard MySQL commands • Up to 400 TB of data—the HeatWave cluster scales to 512 nodes • Querying the database is as fast as querying the object store NEW Process and query 100’s of TB of data in the object store — in a variety of file formats
8 HeatWave Lakehouse Query data in object storage • Querying in HeatWave • Scale to 512 nodes, 512 TB • CSV, Parquet, Aurora & Redshift exports • Fastest load, Best price-performance Use standard MySQL syntax Combine OLTP data with object store data • 100% compatible with MySQL syntax • Use MySQL Autopilot to auto-infer schema, estimate capacity, load times, and generate load scripts • Treat data lake data as tables • Use select, join, aggregations, filters, etc… to combine data in OLTP tables with data lake tables Main benefits 1 2 3
9 InnoDB mysql> CREATE TABLE Sensor (date DATE, degree INT) ENGINE=DATALAKE ENGINE_ATTRIBUTE = ‘{“dialect”:{“format”:”csv”}, “file”: [{“prefix”:”temp_sensor_1.csv”}]}’ SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE Sensor SECONDARY_LOAD; mysql> SELECT count(*) FROM Sensor, SALES WHERE Sensor.degrees >30 and Sensor.date=SALES.date; Object Store Devices Social Voice Sensors Events Process and query 100’s of TB of data in the object store — in a variety of file formats MySQL HeatWave Lakehouse
10 MySQL HeatWave can process data from multiple data sources e.g. Oracle Golden Gate, ... OCI Object Storage AWS Aurora AWS Redshift Data can be in a file or other databases → No requirement to have data in MySQL https://www.mysql.com/products/mysqlheatwave/lakehouse
11 Load performance MySQL HeatWave is faster to load data and less expensive 500 TB TPCH HeatWave Lakehouse Snowflake Redshift Databricks Google BigQuery Annual Cost $1,742,036 $2,300,160 $1,544,268 $1,822,817 $1,446,900 Pricing Term PAYG Standard Edition 1 year upfront 1 year reserved 1 year reserved Load Time (hrs) 4.43 9.04 (2x) 40.86 (9.2x) 25.42 (5.7x) 38.2 (8.6x) https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse
12 Load & Query performance comparison – best in the industry 500 TB TPCH HeatWave Lakehouse Snowflake Redshift Databricks Google BigQuery Annual Cost $1,742,036 $2,300,160 $1,544,268 $1,822,817 $1,446,900 Pricing Term PAYG Standard Edition 1 year upfront 1 year reserved 1 year reserved Load Time (hrs) 4.43 9.04 (2x slower) 40.86 (9.2x slower) 25.42 (5.7x slower) 38.2 (8.6x slower) Query Time 2,150 sec 39,040 sec (18x slower) 32,715 sec (15x slower) 37,729 sec (17x slower) 76,180 sec (35x slower) MySQL HeatWave is faster to load & query data and still less expensive https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse
14 Fully compatible MySQL syntax generated by Autopilot, no human required Three simple steps to query data in object store 1. Run MySQL Autopilot on data in object store mysql> CALL sys.heatwave_load(<db_names>,<info_about_file_in_OS>); OUTPUT: DDLs automatically generated 2. Execute DDLs generated by Autopilot mysql> CREATE TABLE `cust1DB`.`Sensor` (date DATE, degree INT) -> ENGINE=Lakehouse SECONDARY_ENGINE=Rapid -> ENGINE_ATTRIBUTE = ‘{“file”:[{“prefix”:”sensor1-April”, “par”:”<PAR URL>”}]}’; mysql> ALTER TABLE `cust1DB`.`Sensor` SECONDARY_LOAD; 3. Query across file and table mysql> SELECT count(*) FROM Sensor, SALES WHERE Sensor.degrees > 30 AND Sensor.date = SALES.date;
16 1. Designed to process non-MySQL workloads 2. Best query performance and load performance for data warehouse 3. Query data in object store and OLTP data in MySQL database 4. Data in object store remains in object store 5. MySQL Autopilot for automating data management 6. HeatWave scales to 512 HeatWave nodes and 1/2 Petabyte data Process and query hundreds of terabytes of data in the object store MySQL HeatWave Lakehouse
21 “Oracle announced MySQL HeatWave with Autopilot last August, which may very well have been the single greatest innovation in open source cloud databases in the last 20 years to that point. Now Oracle has gone beyond its original unifying of OLTP and OLAP in HeatWave, with MySQL HeatWave ML. Oracle is bringing all of the machine learning processing and models inside the database, so that customers not only avoid managing ML databases apart from the core database, but also eliminate the hassles of ETL, gaining speed, accuracy, and cost-effectiveness in the bargain.” “This latest announcement from Oracle is the third major release of MySQL HeatWave in just over 12 months. Oracle has delivered more cloud database innovations during that timeframe than most cloud database vendors have delivered in the last decade. Not only does the in-database HeatWave ML make Redshift ML look like yesterday’s tech in terms of engineering, performance and cost, but the latest MySQL HeatWave TPC-DS benchmarks demonstrate that Amazon Redshift, Snowflake, Azure Synapse and Google BigQuery are all slower and more expensive. It’s rather clear who’s innovating in cloud databases and who’s being complacent.” Feedback from analysts
object storage and for running queries on object store… This scale out characteristic of HeatWave Lakehouse for data management is key to efficiently process very large amounts of data.” Henry Tullis Leader, Cloud Infrastructure and Engineering Deloitte Consulting
HeatWave demonstrates that Lakehouse performance can be identical to transaction query performance—unheard of and even unthinkable.” “For HeatWave Lakehouse to deliver record performance for both loading data and querying data is an unprecedented innovation in cloud data services.” “The ability of HeatWave to load and query data on such a massive number of nodes in parallel is the first in the industry.” “MySQL HeatWave Lakehouse is not your typical analytical database architecture, and its design engineering will continue to push the competitive market forward.” “Data lakehouses are meant to bridge the gap between data warehouses and data lakes... MySQL HeatWave Lakehouse takes that a step further by making cloud object storage a first-class citizen.” “Simply put: MySQL HeatWave Lakehouse enables you to stay ahead of the competition by taking swift action on meaningful business insights.” “Organizations looking for the best value in the cloud data lakehouse landscape must seriously consider MySQL HeatWave Lakehouse.” “MySQL HeatWave Lakehouse takes customers to a new level of capabilities” “MySQL HeatWave Lakehouse can simplify the life of data management professionals and should improve the customer experience.” “In the era of AI, the ability to process data is the absolute demarcation between companies that are going to get productivity and outcomes and those that won't…” “The performance against the big names is pretty incredible you know…when you talk about a highly specialized accelerated workload, this is a tremendously powerful use case...”