Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MySQL HeatWave AutoML - Build, train, deploy, & explain machine learning models within MySQL HeatWave

MySQL HeatWave AutoML - Build, train, deploy, & explain machine learning models within MySQL HeatWave

MySQL HeatWave is the only MySQL database service for OLTP, OLAP, ML, and Lakehouse. In this webinar, we will focus on the machine learning capabilities and will explain why it makes it easy to create new ML models for your MySQL applications.

MySQL HeatWave AutoML includes everything users need to build, train, deploy, and explain machine learning models within MySQL HeatWave, at no additional cost.

You will learn:
- What is needed to start using ML with MySQL
- How MySQL HeatWave AutoML works
- What ML algorithms can be used with MySQL HeatWave AutoML
- How to configure ML models with your data
- How to explain the results provided by MySQL HeatWave AutoML

Olivier DASINI

October 03, 2023
Tweet

More Decks by Olivier DASINI

Other Decks in Technology

Transcript

  1. MySQL HeatWave AutoML
    Olivier Dasini
    MySQL Cloud Principal Solutions Architect EMEA
    [email protected]
    Blogs : www.dasini.net/blog/en
    : www.dasini.net/blog/fr
    Linkedin: www.linkedin.com/in/olivier-dasini
    September 2023
    Build, train, deploy, & explain machine learning models within MySQL HeatWave

    View full-size slide

  2. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    2
    Me, Myself & I

    MySQL Geek
     Addicted to MySQL for 15+ years
     Playing with databases for 20+ years

    MySQL Writer, Blogger and Speaker
     Also: DBA, Consultant, Architect, Trainer, ...

    MySQL Cloud Principal Solutions Architect EMEA at Oracle

    Stay up to date!
     Blog: www.dasini.net/blog/en
     Linkedin: www.linkedin.com/in/olivier-dasini/
     Twitter: @freshdaz
    Olivier DASINI

    View full-size slide

  3. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    Agenda
    1. MySQL HeatWave Overview
    2. MySQL HeatWave AutoML
    3. Demo
    4. Summary
    3

    View full-size slide

  4. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    4
    MySQL HeatWave
    Overview

    View full-size slide

  5. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    5
    MySQL Versions Release Cadence
    https://blogs.oracle.com/mysql/post/introducing-mysql-innovation-and-longterm-support-lts-versions
    MySQL Innovation and Long-Term Support (LTS) versions

    View full-size slide

  6. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    6
    MySQL Innovation and Long-Term Support (LTS) versions

    We are now transitioning to a versioning model where you can choose between 2 kinds of releases:
    – Innovation
    – Long-Term Support (LTS)

    Both Innovation and LTS are production-grade quality

    If you desire access to the latest features and improvements and enjoy staying on top of the latest
    technologies, the MySQL Innovation releases might be the best fit for you
    – These releases are ideal for developers and DBAs working in fast-paced development environments with high
    levels of automated tests and modern continuous integration techniques for faster upgrade cycles

    On the other hand, if your environment requires established behavior, then the LTS releases are the way
    to go
    – These releases contain only necessary fixes, so they reduce the risks associated with changes in the database
    software's behavior
    https://blogs.oracle.com/mysql/post/introducing-mysql-innovation-and-longterm-support-lts-versions

    View full-size slide

  7. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    7
    The MySQL universe - The view from the moon…
    MySQL Commercial/Enterprise
    MySQL Community +
    MySQL Enterprise Backup
    MySQL Enterprise Monitor
    MySQL Enterprise Authentication
    MySQL Enterprise Audit
    MySQL Enterprise TDE
    MySQL Enterprise Masking
    MySQL Enterprise Firewall
    MySQL Technical Support

    MySQL Cluster CGE
    MySQL Cluster NDB + MySQL Enterprise +
    MySQL Cluster Manager
    MySQL Community
    MySQL Server
    MySQL Client, Workbench
    MySQL Shell
    MySQL GR plugin & InnoDB Cluster & Router
    MySQL Operator for Kubernetes
    MySQL Connector (C API, Java, Node.js, others)
    MySQL Support for MS VS Code (Preview)

    MySQL Cluster NDB
    MySQL NDB Storage Engine
    MySQL NDB Operator for Kubernetes
    MySQL Cloud Services
    MySQL HeatWave Databases Services (for OLTP)
    MySQL HeatWave (for Analytics)
    MySQL HeatWave AutoML (for Machine Learning)
    MySQL HeatWave on AWS
    MySQL HeatWave on Azure
    MySQL HeatWave Lakehouse
    Community, Enterprise, Cloud Service (HeatWave)
    https://www.mysql.com/products

    View full-size slide

  8. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    8
    … the MySQL Cloud services made by the MySQL Team
    The MySQL HeatWave “Family”
    “MySQL HeatWave”
    MySQL HeatWave for OLTP
    a.k.a MDS: (OLTP)
    General Purpose
    Optimized for OLTP
    MySQL HeatWave Lakehouse
    (Lakehouse)
    MySQL Heatwave
    (Analytics)
    OLTP + Analytics (OLAP)
    + Machine Learning
    InnoDB
    Lakehouse
    InnoDB RAPID InnoDB RAPID

    View full-size slide

  9. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    9
    MySQL HeatWave Cloud Service
    100% developed, managed, and supported by the MySQL team
    Automation MySQL HeatWave
    Database
    High Availability
    Read Replicas
    Backup
    Query Acceleration
    AutoPilot
    AutoML
    Security Patch & Upgrade
    Provision & Configure
    OS
    OS Security Patch & Upgrade
    OS Installation
    Server Hardware Provisioning & Maintenance
    Storage Storage Provisioning & Maintenance
    Data Center
    Rack & Space
    Power, HVAC, Networking

    View full-size slide

  10. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    10
    *Benchmark queries are derived from the TPC-H benchmark, but results are not comparable to published TPC-H benchmark results since they do not comply with the TPC-H specification
    400G, 64 cores
    MySQL HeatWave dramatically speeds up analytic queries

    View full-size slide

  11. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    11
    See documented performance comparisons that show how HeatWave is 6.5X faster than Amazon Redshift at half the cost, 1400X faster than Amazon Aurora at half the cost, and 5400X faster than Amazon RDS for MySQL at two-thirds the cost
    30TB TPCH, MySQL HeatWave is faster, cheaper & easier to use than all the competitive database services
    MySQL HeatWave performance and price comparison
    www.oracle.com/mysql/heatwave/performance

    View full-size slide

  12. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    12
    45 regions in 23 countries including Paris & Marseille;
    12 Azure Interconnect Regions
    Oracle Cloud Infrastructure Global Locations
    MySQL HeatWave Databases Service(s) is/are part of all of them
    And also Cloud @Customer & EU Soveriegn Cloud
    100% renewable energy by 2025
    August 2023
    https://www.oracle.com/cloud/public-cloud-regions

    View full-size slide

  13. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    13
    Oracle Cloud Infrastructure Europe Locations
    MySQL HeatWave Databases Service(s) is/are part of all of them
    https://www.oracle.com/cloud/public-cloud-regions
    August 2023

    View full-size slide

  14. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    14
    MySQL HeatWave is available in multiple clouds
    Optimized for price performance in each cloud

    View full-size slide

  15. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    15
    MySQL HeatWave on AWS
    • MySQL HeatWave runs natively on AWS, optimized for AWS infrastructure
    • Data doesn’t leave AWS – saves egress cost, and avoids compliance approvals
    • Lowest latency access to MySQL HeatWave
    • Tight integration with the AWS ecosystem – S3, CloudWatch, PrivateLink
    • Easier migration from other databases (e.g., Amazon Aurora, Redshift, Snowflake)
    OCI and AWS Regions – August 2023
    Create & manage a MySQL DB System with a HeatWave Cluster to use with AWS applications
    https://dev.mysql.com/doc/heatwave-aws/en

    View full-size slide

  16. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    16
    MySQL HeatWave on Azure
    Connecting to MySQL HeatWave on OCI from Azure VNET
    • Familiar Azure-native user
    experience
    • Automated identity,
    networking, and monitoring
    integration
    • Private interconnect and
    networking with < 2 ms
    latency
    • Use Microsoft Azure services
    with MySQL HeatWave
    • Collaborative support
    https://www.oracle.com/cloud/azure/oracle-database-for-azure

    View full-size slide

  17. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    17
    MySQL HeatWave Lakehouse
    • Query data across MySQL, the
    object store, or both—using
    standard MySQL commands
    • Up to 400 TB of data—the HeatWave
    cluster scales to 512 nodes
    • Querying the database is as fast as
    querying the object store
    NEW
    Process and query 100’s of TB of data in the object store — in a variety of file formats

    View full-size slide

  18. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    18
    MySQL HeatWave can process data from multiple data sources
    e.g. Oracle Golden Gate, ...
    OCI Object Storage
    AWS Aurora
    AWS Redshift
    Data can be in a file or other databases → No requirement to have data in MySQL
    https://www.mysql.com/products/mysqlheatwave/lakehouse

    View full-size slide

  19. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    19
    Provides flexibility to develop applications on object store without any performance, cost impact
    Same price-performance when data inside MySQL or in object store
    HeatWave HeatWave
    Lakehouse
    Snowflake Redshift Google Big Query Databricks
    0
    10
    20
    30
    40
    50
    60
    70
    80
    90
    100
    1.5 1.5
    41.9
    20.2
    41.4
    92.5
    10TB TPC-H Price-Performance
    Price-Performance (cents)
    • 10 HeatWave Nodes, X-Large cluster for Snowflake; 10 nodes of ra3.4xlarge for Redshift; 800 slots for Google BigQuery; Large cluster for Databricks
    • Standard edition price for Snowflake; 3 yr upfront price for Redshift; 1 year reserved price for Google BigQuery and Databricks
    https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse

    View full-size slide

  20. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    20
    MySQL Autopilot: Machine Learning-powered automation
    Workload aware automation for analytics, OLTP and object store

    View full-size slide

  21. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    21
    MySQL HeatWave AutoML
    Build, train, deploy, & explain machine learning
    models within
    MySQL HeatWave, at no additional cost

    View full-size slide

  22. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    22
    Machine Learning workflow

    View full-size slide

  23. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    23
    Need to ETL data to a separate ML solution for training & inference
    And it gets worse when
    using other databases...
    • Complex, time-consuming
    • Increases costs and risks
    • Need to learn new tools/languages

    View full-size slide

  24. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    24
    In-database machine learning with MySQL HeatWave
    Accelerate ML initiatives, increase security, and reduce costs

    View full-size slide

  25. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    25
    HeatWave AutoML automates the ML lifecycle & all models can be explained
    Dataset
    Data preprocessing
    Algorithm selection
    Adaptive sampling
    Feature selection
    Hyper-parameter tuning
    Tuned model
    Model explainer
    Prediction explainer
    Regulatory compliance
    Fairness
    Repeatability
    Causality
    Trust
    Leverages Oracle AutoML technology to automate the process of training a machine learning model
    https://dev.mysql.com/doc/heatwave/en/heatwave-machine-learning.html

    View full-size slide

  26. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    26
    MySQL HeatWave AutoML uses a set of SQL routines
    Machine Learning with MySQL HeatWave is so simple

    You only need to use a limited set of SQL routines:

    ML_TRAIN: Trains a machine learning model for a given training dataset

    ML_PREDICT_ROW: Makes predictions for one or more rows of data

    ML_PREDICT_TABLE: Makes predictions for a table of data

    ML_EXPLAIN_ROW: Explains predictions for one or more rows of data

    ML_EXPLAIN_TABLE: Explains predictions for a table of data

    ML_SCORE: Computes the quality of a model

    ML_MODEL_LOAD: Loads a machine learning model for predictions and explanations

    ML_MODEL_UNLOAD: Unloads a machine learning model

    In addition, with MySQL HeatWave ML, there is no need to move or reformat your data

    Data and machine learning models never leave the MySQL HeatWave Database Service,
    which saves you time and effort while keeping your data and models secure

    View full-size slide

  27. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    27
    Classification
    Classify warranty claims
    Identify similar users
    Recommend movies
    Recommender System
    Loan default prediction
    Predict flight delay
    Rain fall prediction
    Regression
    Predict Advt spend ROI
    Demand forecasting
    Anomaly Detection
    Detect anomalous credit
    card spend
    Identify game hacker
    Fully automated in-database machine learning
    • In-database
    • Secure
    • Fully automated
    • 25x faster than Redshift ML
    • Explainable
    • No additional cost
    Time-series
    forecasting
    Training, inference, explanation with HeatWave AutoML

    View full-size slide

  28. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    28
    MySQL HeatWave AutoML is 25x faster than Redshift ML
    25x faster on average without compromising accuracy
    www.oracle.com/mysql/heatwave/performance

    View full-size slide

  29. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    29
    MySQL HeatWave AutoML is 1% of cost of Redshift ML
    No additional cost for MySQL HeatWave customers
    www.oracle.com/mysql/heatwave/performance

    View full-size slide

  30. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    30
     Produces more accurate results
     Trains models 25X faster on average
     1% of the cost
     Scales as more modes are added
    HeatWave AutoML vs Redshift ML: Benchmarks summary
    See Benchmark details:
    https://www.oracle.com/mysql/heatwave/performance/

    View full-size slide

  31. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    31
    Machine learning in action with MySQL HeatWave
    OLTP
    Applications
    Social
    ECommerce
    FinTech
    SaaS
    Analytics
    Tools
    Real-time ML
    recommendations
    Real-time
    analytics on
    trends

    View full-size slide

  32. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    32
    MySQL HeatWave - OLTP + OLAP + ML
    Example : Loan Approval

    View full-size slide

  33. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    33
    MySQL HeatWave AutoML – Run Machine Learning on existing cluster
    Build, train, deploy, & explain ML models within MySQL HeatWave, at no additional cost
    Single MySQL database for all
    applications
    All existing applications work
    without any changes
    Train, inference & explain
    within the database
    No need to learn new
    language or ML packages

    View full-size slide

  34. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    34
    Demo
    xxxxxxxx

    View full-size slide

  35. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    35
    Classification task – Iris dataset
    Some domain expertise – a little botany!
    Live Demo
    An Iris
    • Variant shown is an Iris Versicolor
    • Iris Virginica & Iris Setosa also available in the dataset
    The parts of an iris that we believe might
    help identify the iris variant:
    • Petal length
    • Petal width
    • Sepal length
    • Sepal width

    View full-size slide

  36. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    36
    MySQL HeatWave AutoML
    Set up the environment
    • MySQL HeatWave
    Create Model
    • Prepare and load data
    • Train a machine learning model (use training data)
    • Explain how the model works & score it for accuracy
    (use validation data)
    Load & Invoke Model
    • Load the model into HeatWave
    • Make predictions on new sets of data
    • Explain the reasons for the predictions
    Check Model Quality
    • With new current validation data
    • Score the model for accuracy
    • If the score has deteriorated
    • Revisit model training, etc.
    Usage Overview

    View full-size slide

  37. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    37
    Connect
    Train
    Predict
    Explain
    Environment Setup
    • Interactive data science environments such as Zeppelin or Jupyter can be used
    • Alternatively just use MySQL Shell ( or any MySQL Client )
    Client Connectivity

    View full-size slide

  38. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    38
    HeatWave cluster, ML schema and user creation
    Create an instance of HeatWave in either OCI (preferred) or AWS
    • In OCI use 1 or more 512GB cluster nodes
    • In AWS you must use the 256GB nodes
    • Version options
    • Either the stable release (8.0 series): currently 8.0.34; next release will be 8.0.35
    • Or the latest innovation release (8.n where n >= 1): currently 8.1, next release will be 8.2
    Connect to HeatWave (as the admin user) and create a schema and user
    • The schema is where data used to train and validate the model will be held.
    • It will later hold a table of test data on which the model will make predictions
    • In the example below the schema is called ml_iris; it could be called anything
    • User must have select and alter grants on this schema
    • User must have select and execute grants on the sys schema objects
    • The sys schema is where the HeatWave ML routines reside
    mysql> CREATE SCHEMA ml_iris;
    mysql> GRANT SELECT, ALTER ON ml_iris.* TO 'freshdaz'@'%';
    mysql> GRANT SELECT, EXECUTE ON sys.* TO 'freshdaz'@'%';
    Environment Setup

    View full-size slide

  39. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    39
    Data Preparation
    Generic constraints on data
    • Maximum size of tables
    • 10GB, up to100 million rows and1017 columns
    • Only supported data types can be used
    • Check here https://dev.mysql.com/doc/heatwave/en/hwml-supported-data-types.html
    • Date types – use numeric equivalents or possibly convert then to char or varchar
    • NaN values must be replaced with NULL
    Category based constraints
    • Each ML Category (classification, regression, forecasting, anomaly detection &
    recommendation) has requirements with respect to the column makeup of the training,
    validation and test tables, for example:
    • Regression use cases require that the target column must be of a numeric type
    • Forecasting requires a datetime index column
    • Classification use cases demand that the target column has at least 2 distinct values and each
    distinct value should appear in 5 or more rows
    • Always read the documentation for the category you are investigating
    Constraints and limitations

    View full-size slide

  40. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    40
    Data Preparation
    Tables
    • Training table
    • From which we will derive our model
    • Structure consists of a number of feature columns and one target column
    • The feature columns are inputs
    • The target column is an output (a.k.a. the label)
    • Validation table
    • To allow us to score the model’s accuracy and reliability
    • Has the same structure as the training table (i.e. with target/label column)
    • Ground truths – i.e. we know the output (target/label) is correct for the inputs (features)
    • Test table (not immediately required)
    • Holds the data which we will apply to the model and make predictions on
    • Structure is the same as the training table but without a target column
    Tables

    View full-size slide

  41. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    41
    Labelled and unlabelled data illustrated (in the context of our classification use-case)
    Data Preparation
    inputs (feature columns)
    outputs
    (target column)
    Sepal Length Sepal Width Petal Length Petal Width Class
    6.4 2.8 5.6 2.2 Iris-virginica
    5 2.3 3.3 1 Iris-setosa
    4.9 2.5 4.5 1.7 Iris-virginica
    4.9 3.1 1.5 0.1 Iris-versicolor
    7.7 3.8 1.7 0.3 Iris-versicolor
    … … … … …
    Sepal Length Sepal Width Petal Length Petal Width
    5.9 3 4.2 1.5
    6.9 3.1 5.4 2.1
    5.1 3.3 1.7 0.5
    6 3.4 4.5 1.6
    5.5 2.3 4 1.3
    … … … …
    inputs only
    Labelled Data
    • Each row has a label (target column):
    • iris-virginica, iris-setosa, etc., are labels
    • The training table will take this form
    • The validation table will take this form
    • different row data to the training table
    • known truths
    Unlabelled Data
    • Only inputs, no target column (labels)
    • Used to store test data
    • Model will be applied to table and for each row a
    prediction will be made as to which type of Iris can
    be inferred from the petal/sepal data
    • When explained both the prediction and
    determining features will be provided

    View full-size slide

  42. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    42
    Summary
    1. Understand your use case and select the appropriate category of machine learning
    2. Understand which inputs you are going to use and the output (label) for each row of inputs
    3. Source your data for training and validation, for example
    i. From a flat file (e.g. a csv)
    ii. From another table in another schema (e.g. CREATE TABLE AS SELECT)
    iii. Or by some other method (e.g. sql script)
    4. Remember that you will need sufficient data for both training and validation and that the
    data in the validation table must be different to that in the training table
    Data Preparation

    View full-size slide

  43. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    43
    Training
    Using the ML_TRAIN stored procedure
    Training a model is as simple as calling a stored procedure:
    • In the following example we want to use machine learning to identify/differentiate iris plants based on
    their features:
    • A classification type use case
    • Assume the following
    • Our machine learning schema is called, ml_iris (which was created in the data preparation stage)
    • Our training table is called, train
    • The target column in the train table is called, class
    • The variable @iris_model will be assigned the resultant model’s handle
    mysql> CALL sys.ML_TRAIN('ml_iris.train', 'class', JSON_OBJECT('task','classification'),@iris_model);
    Query OK, 0 rows affected (17.765 sec)
    • The JSON_OBJECT parameter provides additional information to allow training to take place
    • Its arguments take the form of zero, one or more comma separated key-value pairs
    • If an argument is not specified then defaults will be used
    • It makes sense to always provide the task key-value pair which describes the category of ML
    • Some arguments are task specific, others like ‘model_list’ are more generic – read the docs

    View full-size slide

  44. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    44
    Training
    Each user of HeatWave ML will have their own model catalog
    The first time ML_TRAIN is called by a user (e.g. freshdaz@’%’) it will:
    • Create a schema whose name will take the form:
    • ML_SCHEMA_ , e.g. ML_SCHEMA_freshdaz
    • Create a table called, MODEL_CATALOG, in this schema
    • In this table it will create a row with a handle to the created model
    • Note the model’s handle will also be assign to the variable, @iris_model, by ML_TRAIN
    Subsequent ML_TRAIN calls will create further rows in the MODEL_CATALOG table
    Predictions and explanations will use the model handle.
    The model catalog

    View full-size slide

  45. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    45
    Load the Model into HeatWave
    Before we can make test the model and make predictions we need to load the model
    into HeatWave
    • NULL argument indicates procedure is being run by the schema/model owner
    • See also https://dev.mysql.com/doc/heatwave/en/hwml-model-sharing.html
    mysql> CALL sys.ML_LOAD(@iris_model,NULL);
    Query OK, 0 rows affected (0.8615 sec)
    Remember to unload the model
    • When not in use – saves resource
    • When updating the model
    • It’s easy to create a new model and continue to use the old model by mistake
    mysql> CALL sys.ML_UNLOAD(@iris_model);
    Query OK, 0 rows affected (0.704 sec)
    All interactions with the model use the HeatWave cluster nodes

    View full-size slide

  46. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    46
    Understanding how the model works
    By default ML_TRAIN will train the Permutation Importance explainer for both model and
    prediction explanations (version 8.0.34, earlier and later versions may vary)
    HeatWave ML provides alternative explainers:
    • Partial Dependence
    • Shap
    • Fast Shap
    • Permutation Importance
    The explainer your model is using can be found by querying the MODEL_CATALOG table
    • This tells us which features are the most important in making a prediction
    Checking the model for accuracy
    mysql> SELECT model_explanation FROM ML_SCHEMA_freshdaz.MODEL_CATALOG WHERE model_handle=@iris_model\G
    *************************** 1. row ***************************
    model_explanation: {"permutation_importance": {"petal width": 0.478, "sepal width": 0.0, "petal length":
    0.3338, "sepal length": 0.0}}
    1 row in set (0.0006 sec)

    View full-size slide

  47. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    47
    Changing explainers
    To change and retrain the explainer for either the model or predictions or both, use
    ML_EXPLAIN() and provide the following parameters:
    • The same training data used to create the model (ml_iris.train)
    • The model’s label/target column (class)
    • A JSON_OBJECT to specify the model and prediction explainers to be used
    • Explainer specific options can also be added to this object – read the docs
    Review the outcome by querying the model catalog
    Checking the model for accuracy
    mysql> CALL sys.ML_EXPLAIN('ml_iris.train', 'class', @iris_model, JSON_OBJECT('model_explainer',
    'shap', 'prediction_explainer', 'permutation_importance'));
    Query OK, 0 rows affected (8.6091 sec)
    mysql> SELECT model_explanation FROM ML_SCHEMA_freshdaz.MODEL_CATALOG WHERE model_handle=@iris_model\G
    *************************** 1. row ***************************
    model_explanation: {"shap": {"petal width": 0.358, "sepal width": 0.0, "petal length": 0.2815, "sepal
    length": 0.0}, "permutation_importance": {"petal width": 0.478, "sepal width": 0.0, "petal length":
    0.3338, "sepal length": 0.0}}
    1 row in set (0.0006 sec)

    View full-size slide

  48. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    48
    Scoring the model
    To score the model we need to use the ML_SCORE procedure and pass it the following
    parameters:
    • A table of test data that is different to the training data – our validation data table, ml_iris.validate
    • The label/target column, class
    • The model, @iris_model
    • A metric type – balanced_accuracy is used others exist – read the docs
    • A variable to store the result in, @score
    • The last parameter is used with anomaly_detection tasks and has no utility here, hence it is set to NULL
    Checking the model for accuracy
    mysql> CALL sys.ML_SCORE('ml_iris.validate', 'class', @iris_model, 'balanced_accuracy', @score, NULL);
    Query OK, 0 rows affected (1.0497 sec)
    mysql> SELECT @score;
    +--------------------+
    | @score |
    +--------------------+
    | 0.9583333134651184 |
    +--------------------+
    1 row in set (0.0005 sec)

    View full-size slide

  49. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    49
    Using the Model
    There are two methods for making predictions
    1. ML_PREDICT_ROW (implemented as a function)
    • Generates predictions for one or more rows of input features (unlabeled data) specified in JSON format. For
    convenience assign the JSON object to a variable, e.g. @row_input
    • The third (options) parameter should be set to NULL for all tasks (categories) other than anomaly detection
    and recommendation. Read the docs – be aware of MySQL version differences (8.0.34 shown).
    • Output (i.e. the prediction(s)) is printed to screen.
    2. ML_PREDICT_TABLE (implemented as a stored procedure)
    • Takes a table (e.g. ml_iris.test) with one or more rows of input features.
    • The fourth (options) parameter should be set to NULL for all tasks (categories) other than anomaly
    detection and recommendation. Read the docs – be aware of MySQL version differences (8.0.34 shown).
    • Output is written to the table given by the third parameter (e.g. ml_iris.predict). This table will be created.
    mysql> SET @row_input = JSON_OBJECT("sepal length", 7.3, "sepal width", 2.9, "petal length", 6.3,
    "petal width", 1.8);
    mysql> SELECT sys.ML_PREDICT_ROW(@row_input, @iris_model, NULL);
    ...
    mysql> CALL sys.ML_PREDICT_TABLE('ml_iris.test', @iris_model, 'ml_iris.predict', NULL);
    ...
    Making predictions

    View full-size slide

  50. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    50
    Prediction output
    Using the Model
    mysql> CALL sys.ML_PREDICT_TABLE('ml_iris.test, @iris_model, 'ml_iris.predict', NULL);
    Query OK, 0 rows affected (1.0630 sec)
    mysql> SELECT * FROM ml_iris.predict LIMIT 2\G
    *************************** 1. row ***************************
    _id: 1
    sepal length: 5.9
    sepal width: 3
    petal length: 4.2
    petal width: 1.5
    Prediction: Iris-setosa
    ml_results: {"predictions": {"class": "Iris-setosa"}, "probabilities": {"Iris-setosa": 0.99,
    "Iris-versicolor": 0.0, "Iris-virginica": 0.01}}
    *************************** 2. row ***************************
    _id: 2
    sepal length: 6.9
    sepal width: 3.1
    petal length: 5.4
    petal width: 2.1
    Prediction: Iris-virginica
    ml_results: {"predictions": {"class": "Iris-virginica"}, "probabilities": {"Iris-setosa": 0.0,
    "Iris-versicolor": 0.0, "Iris-virginica": 1.0}}
    2 rows in set (0.0006 sec)
    mysql>
    inputs
    inputs
    Prediction: the model is 99% sure the Iris is an Iris-setosa,
    but there is a 1% chance it could be an Iris-virginica
    Prediction: the model is certain the Iris is an Iris-virginica
    (but remember the model’s accuracy score is 95.8%)

    View full-size slide

  51. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    51
    Explaining the predictions
    So far we only have predictions and probabilities as to their accuracy, we now need to explain them. Two methods:
    1. ML_EXPLAIN_ROW (implemented as a function)
    • The first parameter must be the data used in the ML_PREDICT_ROW call, in our case the variable
    @row_input
    • The second parameter must be the model, in our case, @iris_model
    • The third parameter is a JSON object reiterating the prediction_explainer being used
    2. ML_EXPLAIN_TABLE (implemented as a procedure)
    • Similar set of parameters to ML_EXPLAIN_ROW but
    ‒ instead of @row_input we will use the test table that was used with ML_PREDICT_TABLE
    ‒ And we insert a third parameter which gives the name of the table, explanations, where we will store the
    explanations. Note the stored procedure call will create this table.
    Using the Model
    mysql> SELECT sys.ML_EXPLAIN_ROW(@row_input, @iris_model, JSON_OBJECT('prediction_explainer',
    'permutation_importance'));
    ...
    mysql> CALL sys.ML_EXPLAIN_TABLE('ml_iris.test', @iris_model, 'ml_iris.explanations',
    JSON_OBJECT('prediction_explainer', 'permutation_importance'));
    ...
    mysql>

    View full-size slide

  52. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    52
    Examining explanation output
    Using the Model
    mysql CALL sys.ML_EXPLAIN_TABLE('ml_iris.test', @iris_model, 'ml_iris.explanations',
    JSON_OBJECT('prediction_explainer', 'permutation_importance'));
    Query OK, 0 rows affected (7.4690 sec)
    mysql> SELECT * FROM ml_iris.explanations LIMIT 2,1\G
    _id: 2
    sepal length: 6.9
    sepal width: 3.1
    petal length: 5.4
    petal width: 2.1
    Prediction: Iris-virginica
    Notes: petal length (5.4) had the largest impact towards predicting Iris-virginica
    petal length_attribution: 0.97
    petal width_attribution: 0.44
    ml_results: {'attributions': {'petal length': 0.97, 'petal width': 0.44},
    'predictions': {'class': 'Iris-virginica'}, 'notes': 'petal length (5.4) had the largest impact
    towards predicting Iris-virginica'}
    2 rows in set (0.0006 sec)
    mysql>
    inputs
    Explanation – plain English summary (notes) with attributions
    (reiterated in JSON formatted ml_results column)

    View full-size slide

  53. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    53
    Using the Model
    • Explained output differs from predicted output because it shows attributions
    • Attributions detail the amount of influence a feature has (or has not) on a prediction
    • Attributions range from -1 to 1
    • Positive values indicate that a feature contributed towards the prediction
    • Negative values indicate that a feature contributed towards a different prediction
    • Zero or near-zero values indicate that the feature made no impact on the prediction
    Explanation Attribution

    View full-size slide

  54. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    54
    Summary
    MySQL HeatWave AutoML democratizes ML

    View full-size slide

  55. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    55
    MySQL HeatWave AutoML democratizes machine learning
    • Fully automated training enables citizen data scientists
    • Keeping everything in the database simplifies the solution and reduces cost
    • No ETL to implement and maintain
    • No additional licenses
    • No dependency matrix of software versions
    • MySQL HeatWave AutoML is affordable
    • New customers pay 1-2% compared to RedShift ML
    • Customers already using MySQL HeatWave effectively get it for free
    • Enables small-medium sized business to gain competitive advantage from machine learning
    • MySQL HeatWave AutoML is explainable
    • Both model and predictions
    • Consumers will trust and regulators will approve of
    • MySQL HeatWave is secure
    • Data remains in the database
    Summary
    Documentation: https://dev.mysql.com/doc/heatwave/en/mys-hwaml-machine-learning.html
    In-database Machine Learning

    View full-size slide

  56. Get $300 in credits and try free for 30 days
    Get started with
    MySQL HeatWave
    oracle.com/mysql/free
    Learn more about MySQL HeatWave
    oracle.com/mysql
    Request a guided workshop
    Ask your account manager
    56 Copyright © 2023, Oracle and/or its affiliates

    View full-size slide

  57. Follow us on Social Media
    “Data is the Oxygen of Business”
    57 Copyright © 2023, Oracle and/or its affiliates

    View full-size slide

  58. Merci!
    Q&R
    Olivier Dasini
    MySQL Cloud Principal Solutions Architect EMEA
    [email protected]
    Blogs : www.dasini.net/blog/en
    : www.dasini.net/blog/fr
    Linkedin: www.linkedin.com/in/olivier-dasini
    Twitter : @freshdaz

    View full-size slide

  59. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    60
    “Oracle announced MySQL HeatWave with Autopilot last August, which may very well have been the
    single greatest innovation in open source cloud databases in the last 20 years to that point. Now Oracle
    has gone beyond its original unifying of OLTP and OLAP in HeatWave, with MySQL HeatWave ML. Oracle
    is bringing all of the machine learning processing and models inside the database, so that customers not
    only avoid managing ML databases apart from the core database, but also eliminate the hassles of ETL,
    gaining speed, accuracy, and cost-effectiveness in the bargain.”
    “This latest announcement from Oracle is the third major release of MySQL HeatWave in just over 12
    months. Oracle has delivered more cloud database innovations during that timeframe than most cloud
    database vendors have delivered in the last decade. Not only does the in-database HeatWave ML make
    Redshift ML look like yesterday’s tech in terms of engineering, performance and cost, but the latest
    MySQL HeatWave TPC-DS benchmarks demonstrate that Amazon Redshift, Snowflake, Azure Synapse
    and Google BigQuery are all slower and more expensive. It’s rather clear who’s innovating in cloud
    databases and who’s being complacent.”
    Feedback from analysts

    View full-size slide

  60. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.
    61
    "We recently had an opportunity to use the machine learning capabilities of HeatWave ML. We found it
    very innovative, easy to use, very fast and most important it is secure since the data or the model don’t
    leave the database. We believe that providing native in-database machine learning is of significant
    interest to our clients and will further accelerate the adoption of MySQL HeatWave“
    Arvind Rajan, CEO
    “To satisfy the growing need for explainability of ML models and outcomes, HeatWave ML delivers robust
    and comprehensive explanation capabilities focused on usability, interpretability, quality, performance,
    and repeatability at scale…it’s no wonder that enterprises continue to look to HeatWave to set
    themselves up for transformational data success.”
    Feedback from analysts, customers

    View full-size slide