MySQL HeatWave Lakehouse - Modernize MySQL & non-MySQL workloads with MySQL HeatWave

MySQL HeatWave enables users to process and query hundreds of terabytes of data in the object store, in a variety of file formats such as CSV, Parquet, and Aurora/Redshift export files.
The data remains in the object store, and customers can query it with standard SQL syntax.

With this capability, MySQL HeatWave provides one service for transaction processing, analytics across data warehouses and data lakes, and machine learning, without ETL across cloud services.
The capability carries no additional cost beyond the cost of storing the data in the object store.

Olivier DASINI

October 03, 2023

Transcript

  1. MySQL HeatWave Lakehouse
    Olivier Dasini
    MySQL Cloud Principal Solutions Architect EMEA
    Blogs: www.dasini.net/blog/en, www.dasini.net/blog/fr
    Linkedin: www.linkedin.com/in/olivier-dasini
    August 2023
    Modernize MySQL & non-MySQL workloads with MySQL HeatWave

  2. Me, Myself & I
    Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    MySQL Geek
    • Addicted to MySQL for 15+ years
    • Playing with databases for 20+ years

    MySQL Writer, Blogger and Speaker
    • Also: DBA, Consultant, Architect, Trainer, ...

    MySQL Cloud Principal Solutions Architect EMEA at Oracle

    Stay up to date!
    • Blog: www.dasini.net/blog/en
    • Linkedin: www.linkedin.com/in/olivier-dasini/
    • Twitter: @freshdaz
    Olivier DASINI

  3. Oracle Cloud Infrastructure Global Locations
    The MySQL HeatWave Database Service is part of all of them
    45 regions in 23 countries, including Paris & Marseille; 12 Azure Interconnect Regions
    Also Cloud@Customer & EU Sovereign Cloud
    100% renewable energy by 2025
    August 2023
    https://www.oracle.com/cloud/public-cloud-regions

  4. Oracle Cloud Infrastructure Europe Locations
    The MySQL HeatWave Database Service is part of all of them
    August 2023
    https://www.oracle.com/cloud/public-cloud-regions

  5. MySQL HeatWave – optimized for analytics, machine learning and OLTP

  6. Existing applications work without changes
    Oracle Analytics Cloud is integrated with MySQL HeatWave

  7. Already best performance in industry for data warehouse
    TPC-H 10TB
    Competitor    Configuration      HeatWave better by   HeatWave faster by
    Redshift      10X ra3.4xlarge    13x                  4.2x
    Snowflake     X-Large Cluster    28x                  3.3x
    BigQuery      800 slots          28x                  5.6x
    Databricks    Large Cluster      62x                  7.4x
    Benchmark queries are derived from the TPC-H benchmarks, but results are not comparable to published TPC-H benchmark results since these do not comply with the TPC-H specifications.
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-on-oci

  8. Already lowest cost in industry for data warehouse
    TPC-H 10TB price performance comparison
    Competitor    Pricing term                     HeatWave better by
    Redshift      3 year reserved, paid upfront    13x
    Snowflake     Standard Edition                 28x
    BigQuery      1 year reserved                  28x
    Databricks    1 year reserved                  62x
    Benchmark queries are derived from the TPC-H benchmarks, but results are not comparable to published TPC-H benchmark results since these do not comply with the TPC-H specifications.
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-on-oci

  9. Fully automated in-database machine learning
    Training, inference, explanation with HeatWave AutoML
    • In-database
    • Secure
    • Fully automated
    • 25x faster than Redshift ML
    • Explainable
    • No additional cost
    Model types: Classification, Regression, Time-series forecasting, Anomaly Detection, Recommender System
    Example use cases: classify warranty claims; identify similar users; recommend movies; loan default prediction; predict flight delay; rainfall prediction; predict advertising spend ROI; demand forecasting; detect anomalous credit card spend; identify game hacker

  10. MySQL HeatWave is available in multiple clouds
    Optimized for price performance in each cloud

  11. MySQL HeatWave on AWS
    Create & manage a MySQL DB System with a HeatWave Cluster to use with AWS applications
    • MySQL HeatWave runs natively on AWS, optimized for AWS infrastructure
    • Data doesn’t leave AWS – saves egress cost, and avoids compliance approvals
    • Lowest latency access to MySQL HeatWave
    • Tight integration with the AWS ecosystem – S3, CloudWatch, PrivateLink
    • Easier migration from other databases (e.g., Amazon Aurora, Redshift, Snowflake)
    OCI and AWS Regions – August 2023
    https://dev.mysql.com/doc/heatwave-aws/en

  12. MySQL HeatWave on Azure
    Connecting to MySQL HeatWave on OCI from Azure VNET
    • Familiar Azure-native user experience
    • Automated identity, networking, and monitoring integration
    • Private interconnect and networking with < 2 ms latency
    • Use Microsoft Azure services with MySQL HeatWave
    • Collaborative support
    https://www.oracle.com/cloud/azure/oracle-database-for-azure

  13. MySQL HeatWave customer momentum
    Data warehouse, machine learning and OLTP workloads
    https://www.oracle.com/customers/?product=mpd-cld-infra:db-services:mysql-heatwave

  14. Massive amount of data stored outside databases
    • Databases are systems of record
    • Files are repositories for other types of data (e.g., IoT, web content, log files)
    • 99.5% of collected data remains unused
    [Diagram labels: Object Store; Devices; Sensors; Social; Events]

  15. Announced General Availability of MySQL HeatWave Lakehouse

  16. HeatWave Lakehouse can query object store and MySQL database
    Data stays in object store, processed by HeatWave
    [Diagram labels: Object Store; Query; InnoDB; AWS Aurora export; Redshift export; OLTP; Analytics; Autopilot; Machine Learning; Lakehouse; MySQL Autopilot]

  17. HeatWave Lakehouse
    Main benefits
    1. Query data in object storage
    • Querying in HeatWave
    • Scale to 512 nodes, 512 TB
    • CSV, Parquet, Aurora & Redshift exports
    • Fastest load, best price-performance
    2. Use standard MySQL syntax
    • 100% compatible with MySQL syntax
    • Use MySQL Autopilot to auto-infer schema, estimate capacity and load times, and generate load scripts
    3. Combine OLTP data with object store data
    • Treat data lake data as tables
    • Use select, join, aggregations, filters, etc. to combine data in OLTP tables with data lake tables

  18. HeatWave scales out
    Flexible, fast, and highly scalable
    Scale to any cluster size
    • Scale to any size up to 512 nodes
    • Scale up or scale down
    Real-time scaling
    • System is always available for all operations
    • Data in the cluster is always balanced
    Highly scalable
    • Query performance scales very well with cluster size
    • Load performance scales very well with cluster size

  19. Three simple steps to query data in object store
    Fully compatible MySQL syntax generated by Autopilot, no human required
    1. Run MySQL Autopilot on data in object store
    mysql> CALL sys.heatwave_load(<db_list>, <options>);
    OUTPUT: DDLs automatically generated
    2. Execute DDLs generated by Autopilot
    mysql> CREATE TABLE `cust1DB`.`Sensor` (date DATE, degree INT)
    -> ENGINE=Lakehouse SECONDARY_ENGINE=Rapid
    -> ENGINE_ATTRIBUTE = '{"file":[{"prefix":"sensor1-April", "par":""}]}';
    mysql> ALTER TABLE `cust1DB`.`Sensor` SECONDARY_LOAD;
    3. Query across file and table
    mysql> SELECT count(*) FROM Sensor, SALES WHERE Sensor.degree > 30 AND Sensor.date = SALES.date;
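
    The three steps above can be sketched end to end as follows. This is a minimal sketch and not the deck's own script: it assumes the `external_tables` option of `sys.heatwave_load` and the `sys.heatwave_autopilot_report` table as described in the MySQL HeatWave 8.x documentation. The comma-delimited CSV dialect, the PAR placeholder, and a `SALES` table that is already loaded into HeatWave are illustrative assumptions.

    ```sql
    -- Step 1: run Autopilot in dry-run mode so it only infers the schema and
    -- generates the load script (nothing is loaded yet).
    -- Assumptions: 'external_tables' option format (MySQL HeatWave 8.x docs),
    -- a comma-delimited CSV file, and a placeholder PAR URL to be replaced.
    SET @db_list = '["cust1DB"]';
    SET @dl_tables = '[{
      "db_name": "cust1DB",
      "tables": [{
        "table_name": "Sensor",
        "dialect": {"format": "csv", "field_delimiter": ",", "record_delimiter": "\\n"},
        "file": [{"par": "<pre-authenticated-request-URL>"}]
      }]
    }]';
    SET @options = JSON_OBJECT('mode', 'dryrun',
                               'external_tables', CAST(@dl_tables AS JSON));
    CALL sys.heatwave_load(@db_list, @options);

    -- Step 2: review the generated DDL before executing it.
    SELECT log->>'$.sql' AS load_script
    FROM sys.heatwave_autopilot_report
    WHERE type = 'sql'
    ORDER BY id;

    -- Step 3: before running the join, check that it is offloaded to HeatWave.
    -- When offloaded, the Extra column of EXPLAIN mentions the secondary engine.
    EXPLAIN
    SELECT COUNT(*)
    FROM cust1DB.Sensor AS s
    JOIN SALES AS t ON s.date = t.date
    WHERE s.degree > 30;
    ```

    Running Autopilot in dry-run mode first keeps schema inference separate from the load itself, so the generated DDL can be reviewed before any data is read into the cluster.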

  20. MySQL Autopilot: machine learning-powered automation
    Workload aware automation for analytics, OLTP and object store

  21. Auto provisioning with MySQL HeatWave Lakehouse
    How to determine the right cluster size required for processing data in object store?

  22. Auto schema inference with HeatWave Lakehouse
    …Even for files that don’t have metadata!

  23. Inferring schema mapping from file metadata has limitations
    Adaptive data sampling in MySQL Autopilot is fast and accurate
    [Diagram: adaptive data sampling]
    MySQL Autopilot performance on 1 node
    TPC-H LINEITEM size    Autopilot time (sec)
    1 TB                   8
    20 TB                  13
    75 TB                  15
    200 TB                 25
    300 TB                 40
    400 TB                 47

  24. Load performance
    MySQL HeatWave is faster to load data and less expensive
    500 TB TPC-H
                       HeatWave Lakehouse   Snowflake          Redshift         Databricks        Google BigQuery
    Annual cost        $1,742,036           $2,300,160         $1,544,268       $1,822,817        $1,446,900
    Pricing term       PAYG                 Standard Edition   1 year upfront   1 year reserved   1 year reserved
    Load time (hrs)    4.43                 9.04 (2x)          40.86 (9.2x)     25.42 (5.7x)      38.2 (8.6x)
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse

  25. Load & Query performance comparison – best in the industry
    MySQL HeatWave is faster to load & query data and still less expensive
    500 TB TPC-H
                       HeatWave Lakehouse   Snowflake             Redshift              Databricks            Google BigQuery
    Annual cost        $1,742,036           $2,300,160            $1,544,268            $1,822,817            $1,446,900
    Pricing term       PAYG                 Standard Edition      1 year upfront        1 year reserved       1 year reserved
    Load time (hrs)    4.43                 9.04 (2x slower)      40.86 (9.2x slower)   25.42 (5.7x slower)   38.2 (8.6x slower)
    Query time (sec)   2,150                39,040 (18x slower)   32,715 (15x slower)   37,729 (17x slower)   76,180 (35x slower)
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse
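
    The "x slower" factors on this slide can be reproduced with plain arithmetic. The query below is a small sketch that copies the load and query times from the slide and divides each by the HeatWave Lakehouse figure. The CTE and column names are illustrative, and it assumes MySQL 8.0 or later for common table expressions.

    ```sql
    -- Figures copied from the 500 TB TPC-H comparison on this slide.
    WITH results (service, load_hours, query_sec) AS (
      SELECT 'HeatWave Lakehouse', 4.43, 2150 UNION ALL
      SELECT 'Snowflake',          9.04, 39040 UNION ALL
      SELECT 'Redshift',           40.86, 32715 UNION ALL
      SELECT 'Databricks',         25.42, 37729 UNION ALL
      SELECT 'Google BigQuery',    38.2, 76180
    ),
    -- HeatWave Lakehouse is the baseline every other service is divided by.
    baseline AS (
      SELECT load_hours, query_sec
      FROM results
      WHERE service = 'HeatWave Lakehouse'
    )
    SELECT r.service,
           ROUND(r.load_hours / b.load_hours, 1) AS load_times_slower,
           ROUND(r.query_sec / b.query_sec, 1)   AS query_times_slower
    FROM results AS r
    CROSS JOIN baseline AS b
    ORDER BY r.query_sec;
    ```

    The load ratios agree with the slide to one decimal place. The query ratios come out at about 18.2, 15.2, 17.5 and 35.4, so the whole-number labels on the slide appear to be truncated rather than rounded.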

  26. MySQL Autopilot boosts query performance of HeatWave Lakehouse
    Optimizer learns and improves query plan based on previous queries executed
    [Diagram: Query #1 on tables A, B, C; Autopilot statistics; Query #2 on tables A, B, D]

  27. Same query performance when data inside MySQL or in object store
    Provides flexibility to develop applications on object store without any performance or cost impact
    10 TB TPC-H query performance*
    Service               Query time (seconds)
    HeatWave              14.2
    HeatWave Lakehouse    14.2
    Snowflake             47
    Redshift              59
    Google BigQuery       79
    Databricks            105
    10 HeatWave nodes; X-Large cluster for Snowflake; 10 nodes of ra3.4xlarge for Redshift; 800 slots for Google BigQuery; Large cluster for Databricks
    *Benchmark queries are derived from the TPC-H benchmark, but results are not comparable to published TPC-H benchmark results since these do not comply with the TPC-H specifications.
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse

  28. Same price-performance when data inside MySQL or in object store
    Provides flexibility to develop applications on object store without any performance or cost impact
    10 TB TPC-H price-performance
    Service               Price-performance (cents)
    HeatWave              1.5
    HeatWave Lakehouse    1.5
    Snowflake             41.9
    Redshift              20.2
    Google BigQuery       41.4
    Databricks            92.5
    • 10 HeatWave nodes; X-Large cluster for Snowflake; 10 nodes of ra3.4xlarge for Redshift; 800 slots for Google BigQuery; Large cluster for Databricks
    • Standard edition price for Snowflake; 3 yr upfront price for Redshift; 1 year reserved price for Google BigQuery and Databricks
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse
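
    The multipliers quoted earlier in the deck for the 10 TB benchmark (for example, 4.2x faster and 13x better than Redshift) follow from the query times on the previous slide and the price-performance values on this one. A minimal sketch of that arithmetic, with illustrative CTE and column names and assuming MySQL 8.0 or later:

    ```sql
    -- Query times (seconds) and price-performance (cents) copied from the
    -- two 10 TB TPC-H charts; HeatWave Lakehouse is the baseline.
    WITH results (service, query_sec, cents) AS (
      SELECT 'HeatWave Lakehouse', 14.2, 1.5 UNION ALL
      SELECT 'Snowflake',          47,   41.9 UNION ALL
      SELECT 'Redshift',           59,   20.2 UNION ALL
      SELECT 'Google BigQuery',    79,   41.4 UNION ALL
      SELECT 'Databricks',         105,  92.5
    ),
    baseline AS (
      SELECT query_sec, cents
      FROM results
      WHERE service = 'HeatWave Lakehouse'
    )
    SELECT r.service,
           ROUND(r.query_sec / b.query_sec, 1) AS times_slower,
           ROUND(r.cents / b.cents, 1)         AS price_performance_ratio
    FROM results AS r
    CROSS JOIN baseline AS b
    WHERE r.service <> 'HeatWave Lakehouse'
    ORDER BY r.query_sec;
    ```

    The speed ratios work out to 3.3, 4.2, 5.6 and 7.4, and the price-performance ratios to roughly 27.9, 13.5, 27.6 and 61.7, which correspond to the 28x, 13x, 28x and 62x figures on the earlier cost slide.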

  29. HeatWave Lakehouse is integrated with MySQL Shell for VS Code

  30. Summary
    Functionality available with MySQL HeatWave in all OCI regions
    1. Designed to process non-MySQL workloads
    2. Best query performance and load performance for data warehouse
    3. Query data in object store and OLTP data in MySQL database
    4. Data in object store remains in object store
    5. MySQL Autopilot for automating data management
    6. HeatWave scales to 512 HeatWave nodes and half a petabyte of data

  31. For more information:
    https://www.oracle.com/heatwave/#lakehouse

  32. Get $300 in credits and try free for 30 days
    Get started with
    MySQL HeatWave
    oracle.com/mysql/free
    Learn more about MySQL HeatWave
    oracle.com/mysql
    Request a guided workshop
    Ask your account manager

  33. Follow us on Social Media
    “Data is the Oxygen of Business”

  34. Thank you!
    Q&A
    Olivier Dasini
    MySQL Cloud Principal Solutions Architect EMEA
    Blogs: www.dasini.net/blog/en, www.dasini.net/blog/fr
    Linkedin: www.linkedin.com/in/olivier-dasini
    Twitter: @freshdaz

  35. “HeatWave Lakehouse scales out very well for loading data
    from object storage and for running queries on object store…
    This scale out characteristic of HeatWave Lakehouse for data
    management is key to efficiently process very large amounts of
    data.”
    Henry Tullis
    Leader, Cloud Infrastructure and Engineering
    Deloitte Consulting

  36. What are industry analysts saying about MySQL HeatWave Lakehouse?
    “MySQL HeatWave demonstrates that
    Lakehouse performance can be
    identical to transaction query
    performance—unheard of and even
    unthinkable.”
    “For HeatWave Lakehouse to deliver record
    performance for both loading data and
    querying data is an unprecedented innovation
    in cloud data services.”
    “The ability of HeatWave to load and query data on such
    a massive number of nodes in parallel is the first in the
    industry.”
    “MySQL HeatWave Lakehouse is not your
    typical analytical database architecture, and
    its design engineering will continue to
    push the competitive market forward.”
    “Data lakehouses are meant to bridge the gap between
    data warehouses and data lakes... MySQL HeatWave
    Lakehouse takes that a step further by making cloud
    object storage a first-class citizen.”
    “Simply put: MySQL HeatWave Lakehouse enables you
    to stay ahead of the competition by taking swift action
    on meaningful business insights.”
    “Organizations looking for the best value in the cloud
    data lakehouse landscape must seriously consider
    MySQL HeatWave Lakehouse.”
    “MySQL HeatWave Lakehouse takes customers
    to a new level of capabilities”
    “MySQL HeatWave Lakehouse can simplify
    the life of data management professionals
    and should improve the customer experience.”
    “In the era of AI, the ability to process data is the absolute
    demarcation between companies that are going to get productivity
    and outcomes and those that won't…”
    “The performance against the big names is pretty
    incredible you know…when you talk about a highly
    specialized accelerated workload, this is a
    tremendously powerful use case...”
