Introduction to Treasure AutoML

Makoto Yui

July 04, 2025
Transcript

  1. AutoML operator — supports notebooks prepared by TD ML experts.
    Specify the training data, the model name to be saved, and the target
    column to predict. AutoML does (almost) everything that data scientists
    do: feature preprocessing, feature selection, model selection, model
    ensembling, EDA, feature importance visualization, hyperparameter
    tuning, and more → data scientists can focus on business value
    creation. Bring-your-own notebooks and Docker images are not supported
    due to security concerns; instead, TD prepares what customers want.
  2. Supported task memory options — 64g, 128g, 256g, 384g, and 512g are
    supported (128 GiB by default). The maximum memory size depends on
    your contract.
  3. How Treasure AutoML works — a Workflow (orchestration) reads data
    from PlazmaDB to build a model and writes (prediction) results back to
    TD, from where they can be syndicated to Advertising/Marketing/CRM
    destinations such as Ad Networks, DSPs, Email, Messaging, Push
    Notification, Facebook / Twitter, Call Center, and Web Personalization.
    ➢ AutoML is implemented as Workflow operators ◦ operation friendly:
    scheduling, job orchestration, connectors 👍 ➢ TD tables in, TD tables
    out. ➢ Runs as isolated container tasks on Amazon ECS. ➢ Runs
    notebooks-as-code prepared by distinguished ML experts.
  4. (Jupyter) Notebook in a nutshell — notebooks combine software code,
    computational output, explanatory text, and rich visualization in a
    single document. In notebooks, code is written in blocks called
    "cells" (starting with In[n]), and the code in each cell is executed
    sequentially. Further reading: Beyond Interactive: Notebook Innovation
    at Netflix. Notebook as code.
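As a toy illustration of that cell-by-cell execution model, the two "cells" below are sketched as one script (the data and variable names are just an example, not from the deck); the second cell can read any state the first one created:

```python
# Two notebook "cells" sketched sequentially: cell 2 runs after cell 1
# and can read variables that cell 1 defined.

# --- In[1]: load and aggregate some toy data ---
data = [1, 2, 3]
total = sum(data)

# --- In[2]: reuse `total` computed by the previous cell ---
mean = total / len(data)
print(mean)  # → 2.0
```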
  5. What TD AutoML covers (current) — types of problems and application
    examples:
    Regression — demand forecasting (e.g., electric power), LTV prediction
    Multi-class classification — predicting gender/age
    Binary classification — predicting churn or not
    Timeseries forecasting — predicting stock prices, store sales
  6. ML adoption process — Business problems → ML problem framing → Data
    collection/ingestion → Data preparation → Exploratory Data Analysis →
    Feature Engineering → Model selection and training → Model evaluation;
    if the evaluation passes (yes), move to production and business goal
    evaluation; if not (no), iterate.
  7. ML adoption process — Business problems → ML problem framing → Data
    collection/ingestion → Data preparation → Exploratory Data Analysis →
    Feature Engineering → Model selection and training → Model evaluation →
    (yes) production, (no) iterate. AutoML automates this iterative
    data-science loop.
  8. ML adoption process — the same pipeline, from business problems
    through model evaluation to production. Seamless integration with
    existing TD components — Data Connector, Activation (Connector), and
    the data workbench (Presto/Hive/Workflow) — is a big advantage of
    running AutoML inside TD.
  9. Conventional PDCA cycles for applying ML — ML problem framing → EDA →
    Data preparation → Model tuning and creation → Evaluation →
    Productization (optional) → Activation. Marketers bring the business
    problems and business goals; data engineers collect data from silos;
    prototyping happens on a data scientist's laptop; product development
    and deployment are done by engineers, and operations by SREs. This is
    a time-consuming iterative process: 1-3 data scientists for 2-5 weeks,
    then 2 or more engineers plus SREs for a month or more to reach
    production-level quality and operation; asking other departments or a
    vendor took several months to achieve the goal.
  10. PDCA cycles with/without Treasure AutoML — Conventional: ML problem
    framing → EDA → Data preparation → Model tuning and creation →
    Evaluation → Productization (optional) → Activation (1-3 data
    scientists for 2-5 weeks, plus 2 or more engineers and SREs for a
    month or more). With TD AutoML: everything from EDA to productization
    is automated, so teams can focus on business value creation — fewer
    PDCA cycles, taking 3 hours to a few days. A large reduction in time
    spent on the iterative ML process.
  11. Performance of AutoGluon on ML datasets — a comparison on 39 ML
    benchmark datasets (https://arxiv.org/abs/2003.06505). AutoGluon
    performs better than GCP AutoML Tables, and one of our customers
    confirmed that TD AutoML is competitive with DataRobot. Note: we plan
    to support H2O AutoML later as well.
  12. Performance of AutoGluon on Kaggle competitions — a comparison on 11
    Kaggle competitions (https://arxiv.org/abs/2003.06505). AutoGluon beat
    99% of data scientists in Kaggle competitions.
  13. Why is AutoGluon so good? A: stacked generalization — an ensemble
    technique used by top Kagglers. It concatenates each model's
    predictions to the feature set as new features for the next layer, and
    the models used for ensembling are automatically selected in each
    layer.
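The technique can be sketched with scikit-learn's StackingClassifier — an illustration of stacked generalization, not AutoGluon's actual stacker (which selects models per layer automatically); the dataset and model choices here are assumptions:

```python
# Minimal sketch of stacked generalization: base models' out-of-fold
# predictions become new features for a meta-learner on the next layer.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(),
    cv=5,              # k-fold CV yields leakage-free meta-features
    passthrough=True,  # also concatenate the original features
)
stack.fit(X_tr, y_tr)
print(stack.score(X_te, y_te))
```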
  14. Why is AutoGluon so good? A: n-repeated k-fold bagging — bagging
    over repeated k-fold splits further improves the generalization
    ability of the stacking ensemble 👍👍.
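A simplified sketch of the idea (an assumption — AutoGluon's implementation is more involved): train k models per repeat, collect each row's out-of-fold prediction, and average over n repeats to reduce variance in the meta-features.

```python
# n-repeated k-fold bagging: every row is held out exactly once per
# repeat, so averaging over repeats gives a low-variance, leakage-free
# out-of-fold (OOF) prediction for each row.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold

X, y = make_classification(n_samples=300, random_state=0)
n_repeats, k = 3, 5
oof = np.zeros(len(y))

rkf = RepeatedKFold(n_splits=k, n_repeats=n_repeats, random_state=0)
for train_idx, val_idx in rkf.split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    oof[val_idx] += model.predict_proba(X[val_idx])[:, 1]

oof /= n_repeats  # average the n repeats into one OOF probability per row
```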
  15. TD AutoML goes further than plain AutoGluon. ➢ We apply bug fixes
    and workarounds for the underlying OSS libraries (SHAP, AutoGluon) and
    bundle our best knowledge into the EDA and model explanation (XAI)
    steps. ➢ We extend AutoGluon with feature engineering (e.g., adding
    holiday features, identity feature elimination), support new models
    (e.g., TabNet), and apply bug fixes. ➢ We plan to support other
    supervised learning options, such as H2O AutoML and a custom AutoML
    implementation. ➢ TD AutoML goes beyond supervised learning ◦ supports
    clustering, timeseries forecasting, recommendation, and causal
    analysis ◦ supports ML-based solution notebooks such as PII data
    masking, timeseries forecasting, text analysis, and next-best-action
    solutions.
  16. Timeseries Forecasting notebook — internally uses FLAML AutoML.
    Supported models are ARIMA, Prophet, LightGBM, and XGBoost (and
    SARIMAX); hyperparameters are automatically tuned and the best model
    is selected. Forecasts time series such as store sales and stock
    prices.
  17. Multi-touch attribution (MTA) notebook — multi-touch attribution is
    a method of marketing measurement that determines the value of each
    customer touchpoint leading to a conversion. MTA helps marketers
    figure out which marketing channels or campaigns should be credited
    with the conversion, so they can optimize marketing channels using
    this notebook. (Customer journey example: Social → Display → Paid
    Search → conversion, with credit allocated across the channels.)
  18. Multi-touch attribution (MTA) notebook — example input table:

    tstamp      user        channel    conversion
    1596012307  yl38g61s2x  sfmc       0
    1596012340  d4dbvpwcyj  instagram  0
    1596012427  egeaf1po46  facebook   0
    1596012553  gls9vyk2de  google     1
    1596012645  ps6cc25f24  instagram  0
    ...         ...         ...        ...
  19. Multi-touch attribution notebook — the attribution of channels to
    conversions is estimated using various models: last touch, first
    touch, time decay, linear, position-based (U-shape), Shapley, and
    Markov.
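The simpler rule-based models in that list can be sketched in a few lines of pure Python (the `attribute` helper below is hypothetical, not the notebook's API; Shapley and Markov models need more machinery):

```python
# Toy rule-based attribution over one ordered customer journey.
# Returns {channel: credit}, with total credit summing to 1.0.
def attribute(journey, model="linear"):
    credits = {ch: 0.0 for ch in journey}
    n = len(journey)
    if model == "last_touch":      # all credit to the final touchpoint
        credits[journey[-1]] += 1.0
    elif model == "first_touch":   # all credit to the first touchpoint
        credits[journey[0]] += 1.0
    elif model == "linear":        # equal credit to every touchpoint
        for ch in journey:
            credits[ch] += 1.0 / n
    elif model == "position_based":  # U-shape: 40/40 ends, 20 in middle
        if n == 1:
            credits[journey[0]] += 1.0
        elif n == 2:
            credits[journey[0]] += 0.5
            credits[journey[-1]] += 0.5
        else:
            credits[journey[0]] += 0.4
            credits[journey[-1]] += 0.4
            for ch in journey[1:-1]:
                credits[ch] += 0.2 / (n - 2)
    return credits

print(attribute(["social", "display", "paid_search"], "position_based"))
```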
  20. Multi-touch attribution notebook — shows the importance of each
    channel to conversions and the probability of interaction between
    pairs of channels in transitions.
  21. The NBA notebook performs reinforcement learning (Q-learning) to
    infer next best actions based on past rewards for each state. It can
    be used to recommend next best marketing actions. (Diagram: states
    with rewards, e.g., State 1 [+1.0], State 2 [+3.0], State 3 [+0],
    State 4 [+1.5], State 5 [+4.0], and candidate actions 1-5 — what is
    the next best action?)
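A minimal tabular Q-learning sketch of the idea (the states, actions, rewards, and transitions below are toy assumptions, not the notebook's data or code):

```python
# Tabular Q-learning on a toy MDP: learn Q(state, action) from observed
# rewards, then recommend argmax_a Q(s, a) as the next best action.
import random
from collections import defaultdict

random.seed(0)
states = ["s1", "s2"]
actions = ["email", "ad", "push"]
# Toy reward table (assumption): acting in a state yields a fixed reward.
reward = {("s1", "email"): 1.0, ("s1", "ad"): 0.0, ("s1", "push"): 0.2,
          ("s2", "email"): 0.0, ("s2", "ad"): 3.0, ("s2", "push"): 1.5}

Q = defaultdict(float)
alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount, exploration
state = "s1"
for _ in range(2000):
    # epsilon-greedy: mostly exploit the current Q, sometimes explore
    if random.random() < eps:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda x: Q[(state, x)])
    r = reward[(state, a)]
    next_state = random.choice(states)  # toy random transition
    best_next = max(Q[(next_state, x)] for x in actions)
    Q[(state, a)] += alpha * (r + gamma * best_next - Q[(state, a)])
    state = next_state

# Recommended next best action per state
next_best = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
print(next_best)
```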
  22. Next Best Action (NBA) notebook — example input table:

    user_id              tstamp               state               action                       reward (optional)
    2a644f3f-ad33-48b17  2021-06-14 08:58:59  /custom-demo/       client_domain_organic_visit  1.0
    1740eb3c-03de-485b   2021-06-14 08:58:25  /customers/lion/    google ads                   0.0
    bd378622-0905-44d4   2021-06-14 08:25:57  /learn/cdp-vs-dmp/  facebook                     0.0

    Given this input table, the notebook outputs the next best action:

    user_id              ...  next_action (recommended)
    2a644f3f-ad33-48b3   ...  cpc
    1740eb3c-03de-4856   ...  social
    bd378622-0905-44d4   ...  display
  23. (standalone) Shapley notebook — can explain predictions (give the
    reasoning for each prediction) using Shapley additive explanations
    (SHAP). Usage is as simple as follows:
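The code cell itself is not reproduced in the transcript, so here is a pure-Python sketch of the Shapley computation the notebook's explanations rest on, using a toy payoff function (the notebook wraps the SHAP library; this `value` function and its coefficients are assumptions for illustration):

```python
# Exact Shapley values for a toy 3-feature payoff: each feature's value is
# its average marginal contribution over all orderings of the features.
from itertools import combinations
from math import factorial

def value(coalition):
    # Toy "model payoff" for a set of present features (an assumption).
    v = 0.0
    if "f1" in coalition: v += 1.0
    if "f2" in coalition: v += 2.0
    if "f1" in coalition and "f3" in coalition: v += 0.5  # interaction
    return v

def shapley(features):
    n = len(features)
    phi = {}
    for f in features:
        others = [x for x in features if x != f]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                # Weight = |S|! (n - |S| - 1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (value(set(S) | {f}) - value(set(S)))
        phi[f] = total
    return phi

phi = shapley(["f1", "f2", "f3"])
print(phi)  # the f1–f3 interaction of 0.5 is split evenly: 0.25 each
```

Note that the values sum to the full-coalition payoff (3.5), the efficiency property that makes SHAP a faithful per-prediction decomposition.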