The MySQL HeatWave Lakehouse Revolution - From Basic Query to Advanced Analytics and Machine Learning

Big Data & AI Paris 2023

Throughout its journey, MySQL has constantly reinvented itself, staying at the forefront of innovation in the ever-evolving technological landscape. For a multitude of companies, MySQL has become the essential technological anchor.

The arrival of HeatWave Lakehouse has propelled MySQL far beyond its traditional role as a database optimized for OLTP.
Imagine a platform where querying structured and semi-structured data happens in the blink of an eye, powered by the query accelerator HeatWave.

But innovation doesn't stop there. With MySQL HeatWave AutoML, machine learning seamlessly and efficiently integrates into your database.

Come and discover how, by combining advanced analysis, machine learning, and a visionary architecture, MySQL positions itself at the forefront of future data management!

Olivier DASINI

October 03, 2023

Transcript

  1. The MySQL HeatWave Lakehouse Revolution
    From Basic Query to Advanced Analytics and Machine Learning
    Olivier Dasini
    MySQL Cloud Principal Solutions Architect EMEA
    [email protected]
    Blogs: www.dasini.net/blog/en, www.dasini.net/blog/fr
    LinkedIn: www.linkedin.com/in/olivier-dasini
    Big Data & AI Paris - September 2023

  2. Me, Myself & I
    Olivier DASINI

    MySQL Geek
    • Addicted to MySQL for 15+ years
    • Playing with databases for 20+ years

    MySQL Writer, Blogger and Speaker
    • Also: DBA, Consultant, Architect, Trainer, ...

    MySQL Cloud Principal Solutions Architect EMEA at Oracle

    Stay up to date!
    • Blog: www.dasini.net/blog/en
    • LinkedIn: www.linkedin.com/in/olivier-dasini/
    • Twitter: @freshdaz

    Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

  3. Agenda
    1. MySQL HeatWave Overview
    2. MySQL HeatWave AutoML
    3. MySQL HeatWave Lakehouse
    4. Demo

  4. MySQL HeatWave
    In-Memory Query Accelerator with Built-in ML

  5. MySQL HeatWave – Optimized for analytics, machine learning & OLTP
    OLTP, Analytics, Autopilot, Machine Learning
    https://www.oracle.com/mysql

  6. Already best performance in industry for data warehouse
    TPC-H 10TB

    Compared with   Configuration      HeatWave query performance
    Redshift        10 x ra3.4xlarge   4.2x faster
    Snowflake       X-Large cluster    3.3x faster
    BigQuery        800 slots          5.6x faster
    Databricks      Large cluster      7.4x faster

    Benchmark queries are derived from the TPC-H benchmarks, but results are not comparable to published TPC-H benchmark results since these do not comply with the TPC-H specifications.
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-on-oci

  7. TPC-H 10TB - Price / performance comparison
    Already lowest cost in industry for data warehouse

    Compared with   Pricing term                    HeatWave price-performance
    Redshift        3 year reserved, paid upfront   13x better
    Snowflake       Standard Edition                28x better
    BigQuery        1 year reserved                 28x better
    Databricks      1 year reserved                 62x better

    Benchmark queries are derived from the TPC-H benchmarks, but results are not comparable to published TPC-H benchmark results since these do not comply with the TPC-H specifications.
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-on-oci

  8. MySQL Autopilot
    Machine Learning Based Automation

  9. MySQL Autopilot: machine learning-powered automation
    Workload-aware automation for analytics, OLTP and Lakehouse


  10. MySQL Autopilot: machine learning-powered automation
    Help improve performance and scalability without requiring database tuning expertise
    NEW

    MySQL Autopilot indexing (limited availability)
    – Helps customers eliminate the time-consuming tasks of creating optimal indexes for their OLTP workloads
    and maintaining them over time as workloads evolve. MySQL Autopilot automatically determines the
    indexes customers should create or drop from their tables to optimize their OLTP throughput, using
    machine learning to make a prediction based on individual application workloads. In addition, Autopilot
    indexing predicts the expected improvement with the recommended indexes without creating those
    indexes and without incurring compute or storage overhead on the users’ tenancy.

    Auto compression
    – Helps customers determine the optimal compression algorithm for each column, which improves load and
    query performance with faster data compression and decompression. By reducing memory usage,
    customers can cut costs by up to 25 percent.

    Adaptive query execution
    – Helps customers optimize the execution plan of a query after the query has started to execute, improving
    the performance of ad hoc queries by up to 25 percent. It uses information obtained from the partial
    execution of the query to adjust data structures and system resources and then independently optimizes
    query execution for each HeatWave node based on actual data distribution at run time.

    Auto load and unload
    – Autopilot automatically loads the columns being used in an application workload to HeatWave and
    automatically unloads tables that were never or rarely queried. This helps free up memory and reduce
    costs for customers, without requiring them to perform this task manually.

  11. MySQL HeatWave AutoML
    In-database machine learning with AutoML

  12. HeatWave AutoML automates the ML lifecycle & all models can be explained
    Leverages Oracle AutoML technology to automate the process of training a machine learning model
    Dataset
    Data preprocessing
    Algorithm selection
    Adaptive sampling
    Feature selection
    Hyper-parameter tuning
    Tuned model
    Model explainer
    Prediction explainer
    Regulatory compliance
    Fairness
    Repeatability
    Causality
    Trust
    https://dev.mysql.com/doc/heatwave/en/heatwave-machine-learning.html

  13. Fully automated in-database machine learning
    Training, inference, explanation with MySQL HeatWave AutoML
    Task types: Classification, Regression, Time-series forecasting, Anomaly Detection, Recommender System
    Example use cases:
    • Classify warranty claims
    • Loan default prediction
    • Predict flight delay
    • Rainfall prediction
    • Predict advertising spend ROI
    • Demand forecasting
    • Detect anomalous credit card spend
    • Identify game hackers
    • Identify similar users
    • Recommend movies
    Key properties:
    • In-database
    • Secure
    • Multiple ML algorithms
    • Fully automated
    • 25x faster than Redshift ML
    • Explainable
    • No additional cost
    https://www.oracle.com/mysql/#automl

  14. MySQL HeatWave is available in multiple clouds
    Optimized for price performance in each cloud

  15. MySQL HeatWave Lakehouse
    Fast analytics across databases and object storage
    NEW!

  16. HeatWave Lakehouse can query object store and MySQL database
    Data stays in object store, processed by HeatWave
    [Diagram labels: Object Store, Query, InnoDB, AWS Aurora export, Redshift export, MySQL Autopilot; OLTP, Analytics, Autopilot, Machine Learning, Lakehouse]
    https://www.oracle.com/mysql/#lakehouse

  17. HeatWave Lakehouse
    Main benefits
    1. Query data in object storage
    • Querying in HeatWave
    • Scale to 512 nodes, 512 TB
    • CSV, Parquet, Aurora & Redshift exports
    • Fastest load, best price-performance
    2. Use standard MySQL syntax
    • 100% compatible with MySQL syntax
    • Use MySQL Autopilot to auto-infer schema, estimate capacity, load times, and generate load scripts
    3. Combine OLTP data with object store data
    • Treat data lake data as tables
    • Use select, join, aggregations, filters, etc. to combine data in OLTP tables with data lake tables

  18. Three simple steps to query data in object store
    Fully compatible MySQL syntax generated by Autopilot, no human required
    1. Run MySQL Autopilot on data in object store
    mysql> CALL sys.heatwave_load(,);
    OUTPUT: DDLs automatically generated
    2. Execute DDLs generated by Autopilot
    mysql> CREATE TABLE `cust1DB`.`Sensor` (date DATE, degree INT)
    -> ENGINE=Lakehouse SECONDARY_ENGINE=Rapid
    -> ENGINE_ATTRIBUTE = '{"file":[{"prefix":"sensor1-April", "par":""}]}';
    mysql> ALTER TABLE `cust1DB`.`Sensor` SECONDARY_LOAD;
    3. Query across file and table
    mysql> SELECT count(*) FROM Sensor, SALES WHERE Sensor.degree > 30 AND Sensor.date = SALES.date;
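
The DDL in step 2 carries its object-store location as a JSON document inside ENGINE_ATTRIBUTE, so the quoting has to be valid JSON. A minimal Python sketch of assembling such a statement, assuming only the keys visible on the slide ("file", "prefix", "par"); the helper name `lakehouse_ddl` is hypothetical, the empty `par` value is copied from the slide, and the statement Autopilot generates for a real bucket may contain more fields:

```python
import json


def lakehouse_ddl(schema, table, columns, prefix, par=""):
    """Build a CREATE TABLE statement shaped like the one on the slide.

    Hypothetical helper: it mirrors only the slide's keys ("file", "prefix",
    "par"), does not escape identifiers or values, and never contacts a server.
    """
    # json.dumps emits straight double quotes, so the attribute parses as JSON.
    attribute = json.dumps({"file": [{"prefix": prefix, "par": par}]})
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE `{schema}`.`{table}` ({column_list}) "
        f"ENGINE=Lakehouse SECONDARY_ENGINE=Rapid "
        f"ENGINE_ATTRIBUTE='{attribute}';"
    )


ddl = lakehouse_ddl("cust1DB", "Sensor", {"date": "DATE", "degree": "INT"}, "sensor1-April")
print(ddl)
```

Generating the attribute with a JSON serializer rather than typing it by hand avoids typographic quotes creeping in from copied slides or documents.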

  19. MySQL Autopilot for MySQL HeatWave Lakehouse
    Machine learning-powered automation
    Auto Provisioning
    • Adaptively sample raw files and collect statistics
    • Estimate memory footprint of the data to be loaded
    Auto Schema Inference
    • Sample raw files to infer column data types
    • Generate DDLs to create tables
    Adaptive Sampling
    • Adaptively samples a fraction of files to collect stats
    • Use collected stats for various Autopilot features
    Auto Load
    • Predict load time
    • Load script generation
    Adaptive Data Flow
    • System adapts to the performance of object store
    • Improves system performance and reliability
    Auto Query Plan Improvement
    • Continuously collect statistics while running queries
    • Enhance future execution plans
    Four of these features carry a NEW badge.

  20. Auto provisioning with MySQL HeatWave Lakehouse
    How to determine the right cluster size required for processing data in object store?
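
The deck describes Auto Provisioning as sampling raw files and estimating the memory footprint of the data to be loaded. The arithmetic behind that kind of estimate can be sketched as below. This is an illustration only, not Autopilot's actual model: the function name, the sample sizes, and the 1 TB-per-node capacity (derived from the deck's "512 nodes, 512 TB") are all assumptions.

```python
MAX_NODES = 512  # upper bound quoted in the deck


def estimate_nodes(sample_disk_bytes, sample_memory_bytes, total_disk_bytes, node_capacity_bytes):
    """Extrapolate a sample's in-memory footprint to the full data set, then size the cluster."""
    if min(sample_disk_bytes, sample_memory_bytes, total_disk_bytes, node_capacity_bytes) <= 0:
        raise ValueError("all sizes must be positive")
    # Estimated in-memory size = total on-disk size * (sample memory / sample disk).
    # Integer ceiling division keeps the result exact for large byte counts.
    numerator = total_disk_bytes * sample_memory_bytes
    denominator = sample_disk_bytes * node_capacity_bytes
    nodes = max(1, -(-numerator // denominator))
    if nodes > MAX_NODES:
        raise ValueError(f"estimate needs {nodes} nodes, above the {MAX_NODES}-node limit")
    return nodes


GB = 10**9
TB = 10**12
# Assumed figures: a 10 GB sample that occupies 8 GB in memory, 50 TB of files
# in object store, and 1 TB of data per node.
nodes = estimate_nodes(10 * GB, 8 * GB, 50 * TB, 1 * TB)
print(nodes)
```

The sample's memory-to-disk ratio stands in for the whole data set, which is why the quality of the sample matters more than its size.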

  21. Auto schema inference with MySQL HeatWave Lakehouse
    …Even for files that don’t have metadata!
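
To make the idea of inferring a schema from files without metadata concrete, here is a minimal Python sketch that samples rows from a headerless CSV and picks a column type for each position. It is a simplified illustration and not Autopilot's algorithm; the `col_N` names, the four candidate types, and the sample data are assumptions.

```python
import csv
import io
from datetime import date


def infer_type(values):
    """Return the first of INT, DOUBLE, DATE that parses every sampled value, else VARCHAR."""
    def all_parse(parser):
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "INT"
    if all_parse(float):
        return "DOUBLE"
    if all_parse(date.fromisoformat):
        return "DATE"
    return "VARCHAR(255)"


def infer_schema(csv_text, sample_rows=100):
    """Infer column types from at most sample_rows rows of a headerless CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    rows = [row for _, row in zip(range(sample_rows), reader)]
    columns = list(zip(*rows))  # transpose sampled rows into columns
    return [(f"col_{i + 1}", infer_type(col)) for i, col in enumerate(columns)]


sample = "2023-04-01,31,Paris\n2023-04-02,28.5,Marseille\n"
schema = infer_schema(sample)
print(schema)
```

A single non-integer value (28.5 here) widens the second column from INT to DOUBLE, which is the reason inference works on a sample of rows rather than on the first row alone.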

  22. HeatWave scales out
    Flexible, fast, and highly scalable
    Scale to any cluster size
    • Scale to any size up to 512 nodes
    • Scale up or scale down
    Real-time scaling
    • System is always available for all operations
    • Data in the cluster is always balanced
    Highly scalable
    • Query performance scales very well with cluster size
    • Load performance scales very well with cluster size

  23. Load & Query performance comparison – Best in the industry
    MySQL HeatWave is faster to load & query data and still less expensive
    500 TB TPC-H

    System               Annual cost   Pricing term       Load time (hrs)       Query time
    HeatWave Lakehouse   $1,742,036    PAYG               4.43                  2,150 sec
    Snowflake            $2,300,160    Standard Edition   9.04 (2x slower)      39,040 sec (18x slower)
    Redshift             $1,544,268    1 year upfront     40.86 (9.2x slower)   32,715 sec (15x slower)
    Databricks           $1,822,817    1 year reserved    25.42 (5.7x slower)   37,729 sec (17x slower)
    Google BigQuery      $1,446,900    1 year reserved    38.2 (8.6x slower)    76,180 sec (35x slower)

    https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse
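
The "x slower" figures on this slide are ratios of each system's time to the HeatWave Lakehouse time. A short Python sketch that recomputes them from the load and query times quoted on the slide; the variable and function names are illustrative. Note that the Databricks query ratio is about 17.5, so the slide's "17x" matches truncation rather than rounding.

```python
HEATWAVE = "HeatWave Lakehouse"

# Load times in hours and query times in seconds, as quoted on the slide.
load_hours = {HEATWAVE: 4.43, "Snowflake": 9.04, "Redshift": 40.86,
              "Databricks": 25.42, "Google BigQuery": 38.2}
query_seconds = {HEATWAVE: 2150, "Snowflake": 39040, "Redshift": 32715,
                 "Databricks": 37729, "Google BigQuery": 76180}


def slowdowns(measurements, baseline=HEATWAVE):
    """Return each system's time divided by the baseline system's time."""
    base = measurements[baseline]
    return {name: value / base for name, value in measurements.items() if name != baseline}


# Load ratios rounded to one decimal; query ratios truncated to whole numbers.
load_ratio = {name: round(r, 1) for name, r in slowdowns(load_hours).items()}
query_ratio = {name: int(r) for name, r in slowdowns(query_seconds).items()}
print(load_ratio)
print(query_ratio)
```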

  24. Same price-performance whether data is inside MySQL or in object store
    Provides flexibility to develop applications on object store without any performance or cost impact
    10TB TPC-H Price-Performance (cents)

    System               Price-performance (cents)
    HeatWave             1.5
    HeatWave Lakehouse   1.5
    Snowflake            41.9
    Redshift             20.2
    Google BigQuery      41.4
    Databricks           92.5

    • 10 HeatWave Nodes; X-Large cluster for Snowflake; 10 nodes of ra3.4xlarge for Redshift; 800 slots for Google BigQuery; Large cluster for Databricks
    • Standard edition price for Snowflake; 3 yr upfront price for Redshift; 1 year reserved price for Google BigQuery and Databricks
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse
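
Dividing each system's price-performance figure by HeatWave's 1.5 cents reproduces the 13x, 28x, 28x and 62x multipliers quoted earlier in the deck for the 10TB TPC-H comparison. A short Python check, using only the values from the chart; the names are illustrative:

```python
# Price-performance in cents from the 10TB TPC-H chart (lower is better).
price_performance_cents = {
    "HeatWave": 1.5,
    "HeatWave Lakehouse": 1.5,
    "Snowflake": 41.9,
    "Redshift": 20.2,
    "Google BigQuery": 41.4,
    "Databricks": 92.5,
}

baseline = price_performance_cents["HeatWave Lakehouse"]
advantage = {
    name: round(cents / baseline)
    for name, cents in price_performance_cents.items()
    if not name.startswith("HeatWave")
}
print(advantage)
```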

  25. MySQL HeatWave Lakehouse
    Functionality available with MySQL HeatWave in all OCI regions
    1. Designed to process non-MySQL workloads
    2. Best query performance and load performance for data warehouse
    3. Query data in object store and OLTP data in MySQL database
    4. Data in object store remains in object store
    5. MySQL Autopilot for automating data management
    6. HeatWave scales to 512 HeatWave nodes and 1/2 Petabyte data

  26. Demo
    MySQL HeatWave Lakehouse

  27. For more information:
    https://www.oracle.com/heatwave/#lakehouse

  28. Meet us at booth A28
    Tuesday, September 26

    11:00 - 11:15 / ORACLE booth A28
    Discover MySQL HeatWave AutoML: machine learning for everyone

    16:00 - 16:15 / ORACLE booth A28
    Unlock the power of Big Data analytics with MySQL HeatWave Lakehouse!

  29. Follow us on Social Media
    “Data is the Oxygen of Business”

  30. Get started with MySQL HeatWave
    Get $300 in credits and try free for 30 days: oracle.com/mysql/free
    Learn more about MySQL HeatWave: oracle.com/mysql
    Request a guided workshop: ask your account manager

  31. Thank you!
    Q&A
    Olivier Dasini
    MySQL Cloud Principal Solutions Architect EMEA
    [email protected]
    Blogs: www.dasini.net/blog/en, www.dasini.net/blog/fr
    LinkedIn: www.linkedin.com/in/olivier-dasini
    Twitter: @freshdaz

  32. Oracle Cloud Infrastructure Global Locations
    45 regions in 23 countries including Paris & Marseille; 12 Azure Interconnect Regions
    MySQL HeatWave Database Service is available in all of them
    Also Cloud@Customer & EU Sovereign Cloud
    100% renewable energy by 2025
    August 2023
    https://www.oracle.com/cloud/public-cloud-regions

  33. MySQL HeatWave on AWS
    Create & manage a MySQL DB System with a HeatWave Cluster to use with AWS applications
    • MySQL HeatWave runs natively on AWS, optimized for AWS infrastructure
    • Data doesn’t leave AWS – saves egress cost, and avoids compliance approvals
    • Lowest latency access to MySQL HeatWave
    • Tight integration with the AWS ecosystem – S3, CloudWatch, PrivateLink
    • Easier migration from other databases (e.g., Amazon Aurora, Redshift, Snowflake)
    OCI and AWS Regions – August 2023
    https://dev.mysql.com/doc/heatwave-aws/en

  34. MySQL HeatWave on Azure
    Connecting to MySQL HeatWave on OCI from Azure VNET
    • Familiar Azure-native user experience
    • Automated identity, networking, and monitoring integration
    • Private interconnect and networking with < 2 ms latency
    • Use Microsoft Azure services with MySQL HeatWave
    • Collaborative support
    https://www.oracle.com/cloud/azure/oracle-database-for-azure

  35. MySQL Autopilot boosts query performance of HeatWave Lakehouse
    Optimizer learns and improves query plan based on previous queries executed
    [Diagram: Query #1 over tables A, B, C (joins); Autopilot statistics; Query #2 over tables A, B, D (join and union)]

  36. MySQL HeatWave customer momentum
    Data warehouse, machine learning and OLTP workloads
    https://www.oracle.com/customers/?product=mpd-cld-infra:db-services:mysql-heatwave

  37. “HeatWave Lakehouse scales out very well for loading data from object storage and for running queries on object store… This scale out characteristic of HeatWave Lakehouse for data management is key to efficiently process very large amounts of data.”
    Henry Tullis
    Leader, Cloud Infrastructure and Engineering
    Deloitte Consulting

  38. What are industry analysts saying about MySQL HeatWave Lakehouse?
    “MySQL HeatWave demonstrates that Lakehouse performance can be identical to transaction query performance—unheard of and even unthinkable.”
    “For HeatWave Lakehouse to deliver record performance for both loading data and querying data is an unprecedented innovation in cloud data services.”
    “The ability of HeatWave to load and query data on such a massive number of nodes in parallel is the first in the industry.”
    “MySQL HeatWave Lakehouse is not your typical analytical database architecture, and its design engineering will continue to push the competitive market forward.”
    “Data lakehouses are meant to bridge the gap between data warehouses and data lakes... MySQL HeatWave Lakehouse takes that a step further by making cloud object storage a first-class citizen.”
    “Simply put: MySQL HeatWave Lakehouse enables you to stay ahead of the competition by taking swift action on meaningful business insights.”
    “Organizations looking for the best value in the cloud data lakehouse landscape must seriously consider MySQL HeatWave Lakehouse.”
    “MySQL HeatWave Lakehouse takes customers to a new level of capabilities”
    “MySQL HeatWave Lakehouse can simplify the life of data management professionals and should improve the customer experience.”
    “In the era of AI, the ability to process data is the absolute demarcation between companies that are going to get productivity and outcomes and those that won't…”
    “The performance against the big names is pretty incredible you know…when you talk about a highly specialized accelerated workload, this is a tremendously powerful use case...”