Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MySQL HeatWave AutoML - Build, train, deploy, &...

MySQL HeatWave AutoML - Build, train, deploy, & explain machine learning models within MySQL HeatWave

MySQL HeatWave is the only MySQL database service for OLTP, OLAP, ML, and Lakehouse. In this webinar, we will focus on the machine learning capabilities and will explain why it makes it easy to create new ML models for your MySQL applications.

MySQL HeatWave AutoML includes everything users need to build, train, deploy, and explain machine learning models within MySQL HeatWave, at no additional cost.

You will learn:
- What is needed to start using ML with MySQL
- How MySQL HeatWave AutoML works
- What ML algorithms can be used with MySQL HeatWave AutoML
- How to configure ML models with your data
- How to explain the results provided by MySQL HeatWave AutoML

Olivier DASINI

October 03, 2023
Tweet

More Decks by Olivier DASINI

Other Decks in Technology

Transcript

  1. MySQL HeatWave AutoML Olivier Dasini MySQL Cloud Principal Solutions Architect

    EMEA [email protected] Blogs : www.dasini.net/blog/en : www.dasini.net/blog/fr Linkedin: www.linkedin.com/in/olivier-dasini September 2023 Build, train, deploy, & explain machine learning models within MySQL HeatWave
  2. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    2 Me, Myself & I  MySQL Geek  Addicted to MySQL for 15+ years  Playing with databases for 20+ years  MySQL Writer, Blogger and Speaker  Also: DBA, Consultant, Architect, Trainer, ...  MySQL Cloud Principal Solutions Architect EMEA at Oracle  Stay up to date!  Blog: www.dasini.net/blog/en  Linkedin: www.linkedin.com/in/olivier-dasini/  Twitter: @freshdaz Olivier DASINI
  3. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    Agenda 1. MySQL HeatWave Overview 2. MySQL HeatWave AutoML 3. Demo 4. Summary 3
  4. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    5 MySQL Versions Release Cadence https://blogs.oracle.com/mysql/post/introducing-mysql-innovation-and-longterm-support-lts-versions MySQL Innovation and Long-Term Support (LTS) versions
  5. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    6 MySQL Innovation and Long-Term Support (LTS) versions • We are now transitioning to a versioning model where you can choose between 2 kinds of releases: – Innovation – Long-Term Support (LTS) • Both Innovation and LTS are production-grade quality • If you desire access to the latest features and improvements and enjoy staying on top of the latest technologies, the MySQL Innovation releases might be the best fit for you – These releases are ideal for developers and DBAs working in fast-paced development environments with high levels of automated tests and modern continuous integration techniques for faster upgrade cycles • On the other hand, if your environment requires established behavior, then the LTS releases are the way to go – These releases contain only necessary fixes, so they reduce the risks associated with changes in the database software's behavior https://blogs.oracle.com/mysql/post/introducing-mysql-innovation-and-longterm-support-lts-versions
  6. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    7 The MySQL universe - The view from the moon… MySQL Commercial/Enterprise MySQL Community + MySQL Enterprise Backup MySQL Enterprise Monitor MySQL Enterprise Authentication MySQL Enterprise Audit MySQL Enterprise TDE MySQL Enterprise Masking MySQL Enterprise Firewall MySQL Technical Support … MySQL Cluster CGE MySQL Cluster NDB + MySQL Enterprise + MySQL Cluster Manager MySQL Community MySQL Server MySQL Client, Workbench MySQL Shell MySQL GR plugin & InnoDB Cluster & Router MySQL Operator for Kubernetes MySQL Connector (C API, Java, Node.js, others) MySQL Support for MS VS Code (Preview) … MySQL Cluster NDB MySQL NDB Storage Engine MySQL NDB Operator for Kubernetes MySQL Cloud Services MySQL HeatWave Databases Services (for OLTP) MySQL HeatWave (for Analytics) MySQL HeatWave AutoML (for Machine Learning) MySQL HeatWave on AWS MySQL HeatWave on Azure MySQL HeatWave Lakehouse Community, Enterprise, Cloud Service (HeatWave) https://www.mysql.com/products
  7. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    8 … the MySQL Cloud services made by the MySQL Team The MySQL HeatWave “Family” “MySQL HeatWave” MySQL HeatWave for OLTP a.k.a MDS: (OLTP) General Purpose Optimized for OLTP MySQL HeatWave Lakehouse (Lakehouse) MySQL Heatwave (Analytics) OLTP + Analytics (OLAP) + Machine Learning InnoDB Lakehouse InnoDB RAPID InnoDB RAPID
  8. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    9 MySQL HeatWave Cloud Service 100% developed, managed, and supported by the MySQL team Automation MySQL HeatWave Database High Availability Read Replicas Backup Query Acceleration AutoPilot AutoML Security Patch & Upgrade Provision & Configure OS OS Security Patch & Upgrade OS Installation Server Hardware Provisioning & Maintenance Storage Storage Provisioning & Maintenance Data Center Rack & Space Power, HVAC, Networking
  9. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    10 *Benchmark queries are derived from the TPC-H benchmark, but results are not comparable to published TPC-H benchmark results since they do not comply with the TPC-H specification 400G, 64 cores MySQL HeatWave dramatically speeds up analytic queries
  10. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    11 See documented performance comparisons that show how HeatWave is 6.5X faster than Amazon Redshift at half the cost, 1400X faster than Amazon Aurora at half the cost, and 5400X faster than Amazon RDS for MySQL at two-thirds the cost 30TB TPCH, MySQL HeatWave is faster, cheaper & easier to use than all the competitive database services MySQL HeatWave performance and price comparison www.oracle.com/mysql/heatwave/performance
  11. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    12 45 regions in 23 countries including Paris & Marseille; 12 Azure Interconnect Regions Oracle Cloud Infrastructure Global Locations MySQL HeatWave Databases Service(s) is/are part of all of them And also Cloud @Customer & EU Soveriegn Cloud 100% renewable energy by 2025 August 2023 https://www.oracle.com/cloud/public-cloud-regions
  12. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    13 Oracle Cloud Infrastructure Europe Locations MySQL HeatWave Databases Service(s) is/are part of all of them https://www.oracle.com/cloud/public-cloud-regions August 2023
  13. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    14 MySQL HeatWave is available in multiple clouds Optimized for price performance in each cloud
  14. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    15 MySQL HeatWave on AWS • MySQL HeatWave runs natively on AWS, optimized for AWS infrastructure • Data doesn’t leave AWS – saves egress cost, and avoids compliance approvals • Lowest latency access to MySQL HeatWave • Tight integration with the AWS ecosystem – S3, CloudWatch, PrivateLink • Easier migration from other databases (e.g., Amazon Aurora, Redshift, Snowflake) OCI and AWS Regions – August 2023 Create & manage a MySQL DB System with a HeatWave Cluster to use with AWS applications https://dev.mysql.com/doc/heatwave-aws/en
  15. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    16 MySQL HeatWave on Azure Connecting to MySQL HeatWave on OCI from Azure VNET • Familiar Azure-native user experience • Automated identity, networking, and monitoring integration • Private interconnect and networking with < 2 ms latency • Use Microsoft Azure services with MySQL HeatWave • Collaborative support https://www.oracle.com/cloud/azure/oracle-database-for-azure
  16. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    17 MySQL HeatWave Lakehouse • Query data across MySQL, the object store, or both—using standard MySQL commands • Up to 400 TB of data—the HeatWave cluster scales to 512 nodes • Querying the database is as fast as querying the object store NEW Process and query 100’s of TB of data in the object store — in a variety of file formats
  17. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    18 MySQL HeatWave can process data from multiple data sources e.g. Oracle Golden Gate, ... OCI Object Storage AWS Aurora AWS Redshift Data can be in a file or other databases → No requirement to have data in MySQL https://www.mysql.com/products/mysqlheatwave/lakehouse
  18. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    19 Provides flexibility to develop applications on object store without any performance, cost impact Same price-performance when data inside MySQL or in object store HeatWave HeatWave Lakehouse Snowflake Redshift Google Big Query Databricks 0 10 20 30 40 50 60 70 80 90 100 1.5 1.5 41.9 20.2 41.4 92.5 10TB TPC-H Price-Performance Price-Performance (cents) • 10 HeatWave Nodes, X-Large cluster for Snowflake; 10 nodes of ra3.4xlarge for Redshift; 800 slots for Google BigQuery; Large cluster for Databricks • Standard edition price for Snowflake; 3 yr upfront price for Redshift; 1 year reserved price for Google BigQuery and Databricks https://www.oracle.com/mysql/heatwave/performance/#heatwave-lakehouse
  19. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    20 MySQL Autopilot: Machine Learning-powered automation Workload aware automation for analytics, OLTP and object store
  20. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    21 MySQL HeatWave AutoML Build, train, deploy, & explain machine learning models within MySQL HeatWave, at no additional cost
  21. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    23 Need to ETL data to a separate ML solution for training & inference And it gets worse when using other databases... • Complex, time-consuming • Increases costs and risks • Need to learn new tools/languages
  22. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    24 In-database machine learning with MySQL HeatWave Accelerate ML initiatives, increase security, and reduce costs
  23. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    25 HeatWave AutoML automates the ML lifecycle & all models can be explained Dataset Data preprocessing Algorithm selection Adaptive sampling Feature selection Hyper-parameter tuning Tuned model Model explainer Prediction explainer Regulatory compliance Fairness Repeatability Causality Trust Leverages Oracle AutoML technology to automate the process of training a machine learning model https://dev.mysql.com/doc/heatwave/en/heatwave-machine-learning.html
  24. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    26 MySQL HeatWave AutoML uses a set of SQL routines Machine Learning with MySQL HeatWave is so simple • You only need to use a limited set of SQL routines: ✔ ML_TRAIN: Trains a machine learning model for a given training dataset ✔ ML_PREDICT_ROW: Makes predictions for one or more rows of data ✔ ML_PREDICT_TABLE: Makes predictions for a table of data ✔ ML_EXPLAIN_ROW: Explains predictions for one or more rows of data ✔ ML_EXPLAIN_TABLE: Explains predictions for a table of data ✔ ML_SCORE: Computes the quality of a model ✔ ML_MODEL_LOAD: Loads a machine learning model for predictions and explanations ✔ ML_MODEL_UNLOAD: Unloads a machine learning model • In addition, with MySQL HeatWave ML, there is no need to move or reformat your data • Data and machine learning models never leave the MySQL HeatWave Database Service, which saves you time and effort while keeping your data and models secure
  25. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    27 Classification Classify warranty claims Identify similar users Recommend movies Recommender System Loan default prediction Predict flight delay Rain fall prediction Regression Predict Advt spend ROI Demand forecasting Anomaly Detection Detect anomalous credit card spend Identify game hacker Fully automated in-database machine learning • In-database • Secure • Fully automated • 25x faster than Redshift ML • Explainable • No additional cost Time-series forecasting Training, inference, explanation with HeatWave AutoML
  26. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    28 MySQL HeatWave AutoML is 25x faster than Redshift ML 25x faster on average without compromising accuracy www.oracle.com/mysql/heatwave/performance
  27. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    29 MySQL HeatWave AutoML is 1% of cost of Redshift ML No additional cost for MySQL HeatWave customers www.oracle.com/mysql/heatwave/performance
  28. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    30  Produces more accurate results  Trains models 25X faster on average  1% of the cost  Scales as more modes are added HeatWave AutoML vs Redshift ML: Benchmarks summary See Benchmark details: https://www.oracle.com/mysql/heatwave/performance/
  29. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    31 Machine learning in action with MySQL HeatWave OLTP Applications Social ECommerce FinTech SaaS Analytics Tools Real-time ML recommendations Real-time analytics on trends
  30. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    32 MySQL HeatWave - OLTP + OLAP + ML Example : Loan Approval
  31. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    33 MySQL HeatWave AutoML – Run Machine Learning on existing cluster Build, train, deploy, & explain ML models within MySQL HeatWave, at no additional cost Single MySQL database for all applications All existing applications work without any changes Train, inference & explain within the database No need to learn new language or ML packages
  32. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    35 Classification task – Iris dataset Some domain expertise – a little botany! Live Demo An Iris • Variant shown is an Iris Versicolor • Iris Virginica & Iris Setosa also available in the dataset The parts of an iris that we believe might help identify the iris variant: • Petal length • Petal width • Sepal length • Sepal width
  33. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    36 MySQL HeatWave AutoML Set up the environment • MySQL HeatWave Create Model • Prepare and load data • Train a machine learning model (use training data) • Explain how the model works & score it for accuracy (use validation data) Load & Invoke Model • Load the model into HeatWave • Make predictions on new sets of data • Explain the reasons for the predictions Check Model Quality • With new current validation data • Score the model for accuracy • If the score has deteriorated • Revisit model training, etc. Usage Overview
  34. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    37 Connect Train Predict Explain Environment Setup • Interactive data science environments such as Zeppelin or Jupyter can be used • Alternatively just use MySQL Shell ( or any MySQL Client ) Client Connectivity
  35. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    38 HeatWave cluster, ML schema and user creation Create an instance of HeatWave in either OCI (preferred) or AWS • In OCI use 1 or more 512GB cluster nodes • In AWS you must use the 256GB nodes • Version options • Either the stable release (8.0 series): currently 8.0.34; next release will be 8.0.35 • Or the latest innovation release (8.n where n >= 1): currently 8.1, next release will be 8.2 Connect to HeatWave (as the admin user) and create a schema and user • The schema is where data used to train and validate the model will be held. • It will later hold a table of test data on which the model will make predictions • In the example below the schema is called ml_iris; it could be called anything • User must have select and alter grants on this schema • User must have select and execute grants on the sys schema objects • The sys schema is where the HeatWave ML routines reside mysql> CREATE SCHEMA ml_iris; mysql> GRANT SELECT, ALTER ON ml_iris.* TO 'freshdaz'@'%'; mysql> GRANT SELECT, EXECUTE ON sys.* TO 'freshdaz'@'%'; Environment Setup
  36. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    39 Data Preparation Generic constraints on data • Maximum size of tables • 10GB, up to100 million rows and1017 columns • Only supported data types can be used • Check here https://dev.mysql.com/doc/heatwave/en/hwml-supported-data-types.html • Date types – use numeric equivalents or possibly convert then to char or varchar • NaN values must be replaced with NULL Category based constraints • Each ML Category (classification, regression, forecasting, anomaly detection & recommendation) has requirements with respect to the column makeup of the training, validation and test tables, for example: • Regression use cases require that the target column must be of a numeric type • Forecasting requires a datetime index column • Classification use cases demand that the target column has at least 2 distinct values and each distinct value should appear in 5 or more rows • Always read the documentation for the category you are investigating Constraints and limitations
  37. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    40 Data Preparation Tables • Training table • From which we will derive our model • Structure consists of a number of feature columns and one target column • The feature columns are inputs • The target column is an output (a.k.a. the label) • Validation table • To allow us to score the model’s accuracy and reliability • Has the same structure as the training table (i.e. with target/label column) • Ground truths – i.e. we know the output (target/label) is correct for the inputs (features) • Test table (not immediately required) • Holds the data which we will apply to the model and make predictions on • Structure is the same as the training table but without a target column Tables
  38. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    41 Labelled and unlabelled data illustrated (in the context of our classification use-case) Data Preparation inputs (feature columns) outputs (target column) Sepal Length Sepal Width Petal Length Petal Width Class 6.4 2.8 5.6 2.2 Iris-virginica 5 2.3 3.3 1 Iris-setosa 4.9 2.5 4.5 1.7 Iris-virginica 4.9 3.1 1.5 0.1 Iris-versicolor 7.7 3.8 1.7 0.3 Iris-versicolor … … … … … Sepal Length Sepal Width Petal Length Petal Width 5.9 3 4.2 1.5 6.9 3.1 5.4 2.1 5.1 3.3 1.7 0.5 6 3.4 4.5 1.6 5.5 2.3 4 1.3 … … … … inputs only Labelled Data • Each row has a label (target column): • iris-virginica, iris-setosa, etc., are labels • The training table will take this form • The validation table will take this form • different row data to the training table • known truths Unlabelled Data • Only inputs, no target column (labels) • Used to store test data • Model will be applied to table and for each row a prediction will be made as to which type of Iris can be inferred from the petal/sepal data • When explained both the prediction and determining features will be provided
  39. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    42 Summary 1. Understand your use case and select the appropriate category of machine learning 2. Understand which inputs you are going to use and the output (label) for each row of inputs 3. Source your data for training and validation, for example i. From a flat file (e.g. a csv) ii. From another table in another schema (e.g. CREATE TABLE AS SELECT) iii. Or by some other method (e.g. sql script) 4. Remember that you will need sufficient data for both training and validation and that the data in the validation table must be different to that in the training table Data Preparation
  40. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    43 Training Using the ML_TRAIN stored procedure Training a model is as simple as calling a stored procedure: • In the following example we want to use machine learning to identify/differentiate iris plants based on their features: • A classification type use case • Assume the following • Our machine learning schema is called, ml_iris (which was created in the data preparation stage) • Our training table is called, train • The target column in the train table is called, class • The variable @iris_model will be assigned the resultant model’s handle mysql> CALL sys.ML_TRAIN('ml_iris.train', 'class', JSON_OBJECT('task','classification'),@iris_model); Query OK, 0 rows affected (17.765 sec) • The JSON_OBJECT parameter provides additional information to allow training to take place • Its arguments take the form of zero, one or more comma separated key-value pairs • If an argument is not specified then defaults will be used • It makes sense to always provide the task key-value pair which describes the category of ML • Some arguments are task specific, others like ‘model_list’ are more generic – read the docs
  41. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    44 Training Each user of HeatWave ML will have their own model catalog The first time ML_TRAIN is called by a user (e.g. freshdaz@’%’) it will: • Create a schema whose name will take the form: • ML_SCHEMA_<username> , e.g. ML_SCHEMA_freshdaz • Create a table called, MODEL_CATALOG, in this schema • In this table it will create a row with a handle to the created model • Note the model’s handle will also be assign to the variable, @iris_model, by ML_TRAIN Subsequent ML_TRAIN calls will create further rows in the MODEL_CATALOG table Predictions and explanations will use the model handle. The model catalog
  42. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    45 Load the Model into HeatWave Before we can make test the model and make predictions we need to load the model into HeatWave • NULL argument indicates procedure is being run by the schema/model owner • See also https://dev.mysql.com/doc/heatwave/en/hwml-model-sharing.html mysql> CALL sys.ML_LOAD(@iris_model,NULL); Query OK, 0 rows affected (0.8615 sec) Remember to unload the model • When not in use – saves resource • When updating the model • It’s easy to create a new model and continue to use the old model by mistake mysql> CALL sys.ML_UNLOAD(@iris_model); Query OK, 0 rows affected (0.704 sec) All interactions with the model use the HeatWave cluster nodes
  43. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    46 Understanding how the model works By default ML_TRAIN will train the Permutation Importance explainer for both model and prediction explanations (version 8.0.34, earlier and later versions may vary) HeatWave ML provides alternative explainers: • Partial Dependence • Shap • Fast Shap • Permutation Importance The explainer your model is using can be found by querying the MODEL_CATALOG table • This tells us which features are the most important in making a prediction Checking the model for accuracy mysql> SELECT model_explanation FROM ML_SCHEMA_freshdaz.MODEL_CATALOG WHERE model_handle=@iris_model\G *************************** 1. row *************************** model_explanation: {"permutation_importance": {"petal width": 0.478, "sepal width": 0.0, "petal length": 0.3338, "sepal length": 0.0}} 1 row in set (0.0006 sec)
  44. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    47 Changing explainers To change and retrain the explainer for either the model or predictions or both, use ML_EXPLAIN() and provide the following parameters: • The same training data used to create the model (ml_iris.train) • The model’s label/target column (class) • A JSON_OBJECT to specify the model and prediction explainers to be used • Explainer specific options can also be added to this object – read the docs Review the outcome by querying the model catalog Checking the model for accuracy mysql> CALL sys.ML_EXPLAIN('ml_iris.train', 'class', @iris_model, JSON_OBJECT('model_explainer', 'shap', 'prediction_explainer', 'permutation_importance')); Query OK, 0 rows affected (8.6091 sec) mysql> SELECT model_explanation FROM ML_SCHEMA_freshdaz.MODEL_CATALOG WHERE model_handle=@iris_model\G *************************** 1. row *************************** model_explanation: {"shap": {"petal width": 0.358, "sepal width": 0.0, "petal length": 0.2815, "sepal length": 0.0}, "permutation_importance": {"petal width": 0.478, "sepal width": 0.0, "petal length": 0.3338, "sepal length": 0.0}} 1 row in set (0.0006 sec)
  45. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    48 Scoring the model To score the model we need to use the ML_SCORE procedure and pass it the following parameters: • A table of test data that is different to the training data – our validation data table, ml_iris.validate • The label/target column, class • The model, @iris_model • A metric type – balanced_accuracy is used others exist – read the docs • A variable to store the result in, @score • The last parameter is used with anomaly_detection tasks and has no utility here, hence it is set to NULL Checking the model for accuracy mysql> CALL sys.ML_SCORE('ml_iris.validate', 'class', @iris_model, 'balanced_accuracy', @score, NULL); Query OK, 0 rows affected (1.0497 sec) mysql> SELECT @score; +--------------------+ | @score | +--------------------+ | 0.9583333134651184 | +--------------------+ 1 row in set (0.0005 sec)
  46. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    49 Using the Model There are two methods for making predictions 1. ML_PREDICT_ROW (implemented as a function) • Generates predictions for one or more rows of input features (unlabeled data) specified in JSON format. For convenience assign the JSON object to a variable, e.g. @row_input • The third (options) parameter should be set to NULL for all tasks (categories) other than anomaly detection and recommendation. Read the docs – be aware of MySQL version differences (8.0.34 shown). • Output (i.e. the prediction(s)) is printed to screen. 2. ML_PREDICT_TABLE (implemented as a stored procedure) • Takes a table (e.g. ml_iris.test) with one or more rows of input features. • The fourth (options) parameter should be set to NULL for all tasks (categories) other than anomaly detection and recommendation. Read the docs – be aware of MySQL version differences (8.0.34 shown). • Output is written to the table given by the third parameter (e.g. ml_iris.predict). This table will be created. mysql> SET @row_input = JSON_OBJECT("sepal length", 7.3, "sepal width", 2.9, "petal length", 6.3, "petal width", 1.8); mysql> SELECT sys.ML_PREDICT_ROW(@row_input, @iris_model, NULL); ... mysql> CALL sys.ML_PREDICT_TABLE('ml_iris.test', @iris_model, 'ml_iris.predict', NULL); ... Making predictions
  47. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    50 Prediction output Using the Model mysql> CALL sys.ML_PREDICT_TABLE('ml_iris.test, @iris_model, 'ml_iris.predict', NULL); Query OK, 0 rows affected (1.0630 sec) mysql> SELECT * FROM ml_iris.predict LIMIT 2\G *************************** 1. row *************************** _id: 1 sepal length: 5.9 sepal width: 3 petal length: 4.2 petal width: 1.5 Prediction: Iris-setosa ml_results: {"predictions": {"class": "Iris-setosa"}, "probabilities": {"Iris-setosa": 0.99, "Iris-versicolor": 0.0, "Iris-virginica": 0.01}} *************************** 2. row *************************** _id: 2 sepal length: 6.9 sepal width: 3.1 petal length: 5.4 petal width: 2.1 Prediction: Iris-virginica ml_results: {"predictions": {"class": "Iris-virginica"}, "probabilities": {"Iris-setosa": 0.0, "Iris-versicolor": 0.0, "Iris-virginica": 1.0}} 2 rows in set (0.0006 sec) mysql> inputs inputs Prediction: the model is 99% sure the Iris is an Iris-setosa, but there is a 1% chance it could be an Iris-virginica Prediction: the model is certain the Iris is an Iris-virginica (but remember the model’s accuracy score is 95.8%)
  48. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    51 Explaining the predictions So far we only have predictions and probabilities as to their accuracy, we now need to explain them. Two methods: 1. ML_EXPLAIN_ROW (implemented as a function) • The first parameter must be the data used in the ML_PREDICT_ROW call, in our case the variable @row_input • The second parameter must be the model, in our case, @iris_model • The third parameter is a JSON object reiterating the prediction_explainer being used 2. ML_EXPLAIN_TABLE (implemented as a procedure) • Similar set of parameters to ML_EXPLAIN_ROW but ‒ instead of @row_input we will use the test table that was used with ML_PREDICT_TABLE ‒ And we insert a third parameter which gives the name of the table, explanations, where we will store the explanations. Note the stored procedure call will create this table. Using the Model mysql> SELECT sys.ML_EXPLAIN_ROW(@row_input, @iris_model, JSON_OBJECT('prediction_explainer', 'permutation_importance')); ... mysql> CALL sys.ML_EXPLAIN_TABLE('ml_iris.test', @iris_model, 'ml_iris.explanations', JSON_OBJECT('prediction_explainer', 'permutation_importance')); ... mysql>
  49. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    52 Examining explanation output Using the Model mysql CALL sys.ML_EXPLAIN_TABLE('ml_iris.test', @iris_model, 'ml_iris.explanations', JSON_OBJECT('prediction_explainer', 'permutation_importance')); Query OK, 0 rows affected (7.4690 sec) mysql> SELECT * FROM ml_iris.explanations LIMIT 2,1\G _id: 2 sepal length: 6.9 sepal width: 3.1 petal length: 5.4 petal width: 2.1 Prediction: Iris-virginica Notes: petal length (5.4) had the largest impact towards predicting Iris-virginica petal length_attribution: 0.97 petal width_attribution: 0.44 ml_results: {'attributions': {'petal length': 0.97, 'petal width': 0.44}, 'predictions': {'class': 'Iris-virginica'}, 'notes': 'petal length (5.4) had the largest impact towards predicting Iris-virginica'} 2 rows in set (0.0006 sec) mysql> inputs Explanation – plain English summary (notes) with attributions (reiterated in JSON formatted ml_results column)
  50. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    53 Using the Model • Explained output differs from predicted output because it shows attributions • Attributions detail the amount of influence a feature has (or has not) on a prediction • Attributions range from -1 to 1 • Positive values indicate that a feature contributed towards the prediction • Negative values indicate that a feature contributed towards a different prediction • Zero or near-zero values indicate that the feature made no impact on the prediction Explanation Attribution
  51. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    54 Summary MySQL HeatWave AutoML democratizes ML
  52. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    55 MySQL HeatWave AutoML democratizes machine learning • Fully automated training enables citizen data scientists • Keeping everything in the database simplifies the solution and reduces cost • No ETL to implement and maintain • No additional licenses • No dependency matrix of software versions • MySQL HeatWave AutoML is affordable • New customers pay 1-2% compared to RedShift ML • Customers already using MySQL HeatWave effectively get it for free • Enables small-medium sized business to gain competitive advantage from machine learning • MySQL HeatWave AutoML is explainable • Both model and predictions • Consumers will trust and regulators will approve of • MySQL HeatWave is secure • Data remains in the database Summary Documentation: https://dev.mysql.com/doc/heatwave/en/mys-hwaml-machine-learning.html In-database Machine Learning
  53. Get $300 in credits and try free for 30 days

    Get started with MySQL HeatWave oracle.com/mysql/free Learn more about MySQL HeatWave oracle.com/mysql Request a guided workshop Ask your account manager 56 Copyright © 2023, Oracle and/or its affiliates
  54. Follow us on Social Media “Data is the Oxygen of

    Business” 57 Copyright © 2023, Oracle and/or its affiliates
  55. Merci! Q&R Olivier Dasini MySQL Cloud Principal Solutions Architect EMEA

    [email protected] Blogs : www.dasini.net/blog/en : www.dasini.net/blog/fr Linkedin: www.linkedin.com/in/olivier-dasini Twitter : @freshdaz
  56. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    60 “Oracle announced MySQL HeatWave with Autopilot last August, which may very well have been the single greatest innovation in open source cloud databases in the last 20 years to that point. Now Oracle has gone beyond its original unifying of OLTP and OLAP in HeatWave, with MySQL HeatWave ML. Oracle is bringing all of the machine learning processing and models inside the database, so that customers not only avoid managing ML databases apart from the core database, but also eliminate the hassles of ETL, gaining speed, accuracy, and cost-effectiveness in the bargain.” “This latest announcement from Oracle is the third major release of MySQL HeatWave in just over 12 months. Oracle has delivered more cloud database innovations during that timeframe than most cloud database vendors have delivered in the last decade. Not only does the in-database HeatWave ML make Redshift ML look like yesterday’s tech in terms of engineering, performance and cost, but the latest MySQL HeatWave TPC-DS benchmarks demonstrate that Amazon Redshift, Snowflake, Azure Synapse and Google BigQuery are all slower and more expensive. It’s rather clear who’s innovating in cloud databases and who’s being complacent.” Feedback from analysts
  57. Copyright © 2023, Oracle and/or its affiliates. All rights reserved.

    61 "We recently had an opportunity to use the machine learning capabilities of HeatWave ML. We found it very innovative, easy to use, very fast and most important it is secure since the data or the model don’t leave the database. We believe that providing native in-database machine learning is of significant interest to our clients and will further accelerate the adoption of MySQL HeatWave“ Arvind Rajan, CEO “To satisfy the growing need for explainability of ML models and outcomes, HeatWave ML delivers robust and comprehensive explanation capabilities focused on usability, interpretability, quality, performance, and repeatability at scale…it’s no wonder that enterprises continue to look to HeatWave to set themselves up for transformational data success.” Feedback from analysts, customers