Introduction to Treasure AutoML

Makoto Yui

July 04, 2025
Transcript

  1. AutoML operator — supports notebooks prepared by TD ML experts.
    Specify the training data, the model name to be saved, and the target
    column to predict. AutoML does (almost) everything that data scientists
    do: feature preprocessing, feature selection, model selection, model
    ensembling, EDA, feature importance visualization, hyperparameter
    tuning, and more → data scientists can focus on business value
    creation. Bring-your-own notebooks and Docker images are not supported
    due to security concerns; instead, TD prepares what customers want.
  2. Supported task memory options — 64g, 128g, 256g, 384g, and 512g are
    supported (128 GiB by default). The maximum memory size depends on
    your contract.
  3. How Treasure AutoML works — a Workflow (orchestration) reads data
    from PlazmaDB to build a model and writes (prediction) results back to
    TD, from where they can be syndicated to Advertising/Marketing/CRM
    destinations such as Ad Networks, DSPs, Email, Messaging, Push
    Notification, Facebook / Twitter, Call Center, and Web Personalization.
    ➢ AutoML is implemented as Workflow operators ◦ operation friendly:
    scheduling, job orchestration, connectors 👍 ➢ TD tables in, TD tables
    out. ➢ Runs as isolated container tasks on Amazon ECS. ➢ Runs
    notebooks-as-code prepared by distinguished ML experts.
  4. (Jupyter) Notebook in a nutshell — notebooks combine software code,
    computational output, explanatory text, and rich visualization in a
    single document. In notebooks, code is written in blocks called
    "cells" (starting with In[n]), and the code in each cell is executed
    sequentially. Further reading: Beyond Interactive: Notebook Innovation
    at Netflix. Notebook as code.
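As a toy illustration of that cell-by-cell execution model, the two "cells" below are sketched as one script (the data and variable names are just an example, not from the deck); the second cell can read any state the first one created:

```python
# Two notebook "cells" sketched sequentially: cell 2 runs after cell 1
# and can read variables that cell 1 defined.

# --- In[1]: load and aggregate some toy data ---
data = [1, 2, 3]
total = sum(data)

# --- In[2]: reuse `total` computed by the previous cell ---
mean = total / len(data)
print(mean)  # → 2.0
```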
  5. What TD AutoML covers (current) — types of problems and application
    examples:
    Regression — demand forecasting (e.g., electric power), LTV prediction
    Multi-class classification — predicting gender/age
    Binary classification — predicting churn or not
    Timeseries forecasting — predicting stock prices, store sales
  6. ML adoption process — Business problems → ML problem framing → Data
    collection/ingestion → Data preparation → Exploratory Data Analysis →
    Feature Engineering → Model selection and training → Model evaluation;
    if the evaluation passes (yes), move to production and business goal
    evaluation; if not (no), iterate.
  7. ML adoption process — Business problems → ML problem framing → Data
    collection/ingestion → Data preparation → Exploratory Data Analysis →
    Feature Engineering → Model selection and training → Model evaluation →
    (yes) production, (no) iterate. AutoML automates this iterative
    data-science loop.
  8. ML adoption process — the same pipeline, from business problems
    through model evaluation to production. Seamless integration with
    existing TD components — Data Connector, Activation (Connector), and
    the data workbench (Presto/Hive/Workflow) — is a big advantage of
    running AutoML inside TD.
  9. Conventional PDCA cycles for applying ML — ML problem framing → EDA →
    Data preparation → Model tuning and creation → Evaluation →
    Productization (optional) → Activation. Marketers bring the business
    problems and business goals; data engineers collect data from silos;
    prototyping happens on a data scientist's laptop; product development
    and deployment are done by engineers, and operations by SREs. This is
    a time-consuming iterative process: 1-3 data scientists for 2-5 weeks,
    then 2 or more engineers plus SREs for a month or more to reach
    production-level quality and operation; asking other departments or a
    vendor took several months to achieve the goal.
  10. PDCA cycles with/without Treasure AutoML — Conventional: ML problem
    framing → EDA → Data preparation → Model tuning and creation →
    Evaluation → Productization (optional) → Activation (1-3 data
    scientists for 2-5 weeks, plus 2 or more engineers and SREs for a
    month or more). With TD AutoML: everything from EDA to productization
    is automated, so teams can focus on business value creation — fewer
    PDCA cycles, taking 3 hours to a few days. A large reduction in time
    spent on the iterative ML process.
  11. Performance of AutoGluon on ML datasets — a comparison on 39 ML
    benchmark datasets (https://arxiv.org/abs/2003.06505). AutoGluon
    performs better than GCP AutoML Tables, and one of our customers
    confirmed that TD AutoML is competitive with DataRobot. Note: we plan
    to support H2O AutoML later as well.
  12. Performance of AutoGluon on Kaggle competitions — a comparison on 11
    Kaggle competitions (https://arxiv.org/abs/2003.06505). AutoGluon beat
    99% of data scientists in Kaggle competitions.
  13. Why is AutoGluon so good? A: stacked generalization — an ensemble
    technique used by top Kagglers. It concatenates each model's
    predictions to the feature set as new features for the next layer, and
    the models used for ensembling are automatically selected in each
    layer.
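The technique can be sketched with scikit-learn's StackingClassifier — an illustration of stacked generalization, not AutoGluon's actual stacker (which selects models per layer automatically); the dataset and model choices here are assumptions:

```python
# Minimal sketch of stacked generalization: base models' out-of-fold
# predictions become new features for a meta-learner on the next layer.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(),
    cv=5,              # k-fold CV yields leakage-free meta-features
    passthrough=True,  # also concatenate the original features
)
stack.fit(X_tr, y_tr)
print(stack.score(X_te, y_te))
```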
  14. Why is AutoGluon so good? A: n-repeated k-fold bagging — bagging
    over repeated k-fold splits further improves the generalization
    ability of the stacking ensemble 👍👍.
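A simplified sketch of the idea (an assumption — AutoGluon's implementation is more involved): train k models per repeat, collect each row's out-of-fold prediction, and average over n repeats to reduce variance in the meta-features.

```python
# n-repeated k-fold bagging: every row is held out exactly once per
# repeat, so averaging over repeats gives a low-variance, leakage-free
# out-of-fold (OOF) prediction for each row.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold

X, y = make_classification(n_samples=300, random_state=0)
n_repeats, k = 3, 5
oof = np.zeros(len(y))

rkf = RepeatedKFold(n_splits=k, n_repeats=n_repeats, random_state=0)
for train_idx, val_idx in rkf.split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    oof[val_idx] += model.predict_proba(X[val_idx])[:, 1]

oof /= n_repeats  # average the n repeats into one OOF probability per row
```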
  15. TD AutoML goes further than plain AutoGluon. ➢ We apply bug fixes
    and workarounds for the underlying OSS libraries (SHAP, AutoGluon) and
    bundle our best knowledge into the EDA and model explanation (XAI)
    steps. ➢ We extend AutoGluon with feature engineering (e.g., adding
    holiday features, identity feature elimination), support new models
    (e.g., TabNet), and apply bug fixes. ➢ We plan to support other
    supervised learning options, such as H2O AutoML and a custom AutoML
    implementation. ➢ TD AutoML goes beyond supervised learning ◦ supports
    clustering, timeseries forecasting, recommendation, and causal
    analysis ◦ supports ML-based solution notebooks such as PII data
    masking, timeseries forecasting, text analysis, and next-best-action
    solutions.
  16. Timeseries Forecasting notebook — internally uses FLAML AutoML.
    Supported models are ARIMA, Prophet, LightGBM, and XGBoost (and
    SARIMAX); hyperparameters are automatically tuned and the best model
    is selected. Forecasts time series such as store sales and stock
    prices.
  17. Multi-touch attribution (MTA) notebook — multi-touch attribution is
    a method of marketing measurement that determines the value of each
    customer touchpoint leading to a conversion. MTA helps marketers
    figure out which marketing channels or campaigns should be credited
    with the conversion, so they can optimize marketing channels using
    this notebook. (Customer journey example: Social → Display → Paid
    Search → conversion, with credit allocated across the channels.)
  18. Multi-touch attribution (MTA) notebook — example input table:

    tstamp      user        channel    conversion
    1596012307  yl38g61s2x  sfmc       0
    1596012340  d4dbvpwcyj  instagram  0
    1596012427  egeaf1po46  facebook   0
    1596012553  gls9vyk2de  google     1
    1596012645  ps6cc25f24  instagram  0
    ...         ...         ...        ...
  19. Multi-touch attribution notebook — the attribution of channels to
    conversions is estimated using various models: last touch, first
    touch, time decay, linear, position-based (U-shape), Shapley, and
    Markov.
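The simpler rule-based models in that list can be sketched in a few lines of pure Python (the `attribute` helper below is hypothetical, not the notebook's API; Shapley and Markov models need more machinery):

```python
# Toy rule-based attribution over one ordered customer journey.
# Returns {channel: credit}, with total credit summing to 1.0.
def attribute(journey, model="linear"):
    credits = {ch: 0.0 for ch in journey}
    n = len(journey)
    if model == "last_touch":      # all credit to the final touchpoint
        credits[journey[-1]] += 1.0
    elif model == "first_touch":   # all credit to the first touchpoint
        credits[journey[0]] += 1.0
    elif model == "linear":        # equal credit to every touchpoint
        for ch in journey:
            credits[ch] += 1.0 / n
    elif model == "position_based":  # U-shape: 40/40 ends, 20 in middle
        if n == 1:
            credits[journey[0]] += 1.0
        elif n == 2:
            credits[journey[0]] += 0.5
            credits[journey[-1]] += 0.5
        else:
            credits[journey[0]] += 0.4
            credits[journey[-1]] += 0.4
            for ch in journey[1:-1]:
                credits[ch] += 0.2 / (n - 2)
    return credits

print(attribute(["social", "display", "paid_search"], "position_based"))
```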
  20. Multi-touch attribution notebook — shows the importance of each
    channel to conversions and the probability of interaction between
    pairs of channels in transitions.
  21. The NBA notebook performs reinforcement learning (Q-learning) to
    infer next best actions based on past rewards for each state. It can
    be used to recommend next best marketing actions. (Diagram: states
    with rewards, e.g., State 1 [+1.0], State 2 [+3.0], State 3 [+0],
    State 4 [+1.5], State 5 [+4.0], and candidate actions 1-5 — what is
    the next best action?)
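A minimal tabular Q-learning sketch of the idea (the states, actions, rewards, and transitions below are toy assumptions, not the notebook's data or code):

```python
# Tabular Q-learning on a toy MDP: learn Q(state, action) from observed
# rewards, then recommend argmax_a Q(s, a) as the next best action.
import random
from collections import defaultdict

random.seed(0)
states = ["s1", "s2"]
actions = ["email", "ad", "push"]
# Toy reward table (assumption): acting in a state yields a fixed reward.
reward = {("s1", "email"): 1.0, ("s1", "ad"): 0.0, ("s1", "push"): 0.2,
          ("s2", "email"): 0.0, ("s2", "ad"): 3.0, ("s2", "push"): 1.5}

Q = defaultdict(float)
alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount, exploration
state = "s1"
for _ in range(2000):
    # epsilon-greedy: mostly exploit the current Q, sometimes explore
    if random.random() < eps:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda x: Q[(state, x)])
    r = reward[(state, a)]
    next_state = random.choice(states)  # toy random transition
    best_next = max(Q[(next_state, x)] for x in actions)
    Q[(state, a)] += alpha * (r + gamma * best_next - Q[(state, a)])
    state = next_state

# Recommended next best action per state
next_best = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
print(next_best)
```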
  22. Next Best Action (NBA) notebook — example input table:

    user_id              tstamp               state               action                       reward (optional)
    2a644f3f-ad33-48b17  2021-06-14 08:58:59  /custom-demo/       client_domain_organic_visit  1.0
    1740eb3c-03de-485b   2021-06-14 08:58:25  /customers/lion/    google ads                   0.0
    bd378622-0905-44d4   2021-06-14 08:25:57  /learn/cdp-vs-dmp/  facebook                     0.0

    Given this input table, the notebook outputs the next best action:

    user_id              ...  next_action (recommended)
    2a644f3f-ad33-48b3   ...  cpc
    1740eb3c-03de-4856   ...  social
    bd378622-0905-44d4   ...  display
  23. (standalone) Shapley notebook — can explain predictions (give the
    reasoning for each prediction) using Shapley additive explanations
    (SHAP). Usage is as simple as follows:
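The code cell itself is not reproduced in the transcript, so here is a pure-Python sketch of the Shapley computation the notebook's explanations rest on, using a toy payoff function (the notebook wraps the SHAP library; this `value` function and its coefficients are assumptions for illustration):

```python
# Exact Shapley values for a toy 3-feature payoff: each feature's value is
# its average marginal contribution over all orderings of the features.
from itertools import combinations
from math import factorial

def value(coalition):
    # Toy "model payoff" for a set of present features (an assumption).
    v = 0.0
    if "f1" in coalition: v += 1.0
    if "f2" in coalition: v += 2.0
    if "f1" in coalition and "f3" in coalition: v += 0.5  # interaction
    return v

def shapley(features):
    n = len(features)
    phi = {}
    for f in features:
        others = [x for x in features if x != f]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                # Weight = |S|! (n - |S| - 1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (value(set(S) | {f}) - value(set(S)))
        phi[f] = total
    return phi

phi = shapley(["f1", "f2", "f3"])
print(phi)  # the f1–f3 interaction of 0.5 is split evenly: 0.25 each
```

Note that the values sum to the full-coalition payoff (3.5), the efficiency property that makes SHAP a faithful per-prediction decomposition.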