OML feature highlight: New OML Notebook templates for ML Feature Extraction

In this weekly Office Hours for Oracle Machine Learning on Autonomous Database, we introduced the latest Notebook templates for Machine Learning Feature Extraction problems. This was a follow-along session: the OML Notebook templates are available in any Autonomous Database tenancy, so attendees were able to run them while we demonstrated them.

Marcos Arancibia

January 18, 2022
Transcript

  1. OML feature highlight: New OML Notebook templates for ML Feature Extraction

    OML Office Hours
    Marcos Arancibia, Senior Principal Product Manager, Machine Learning
    Supported by Mark Hornick and Sherry LaMonica
    Senior Director, Product Management, Data Science and Machine Learning
    Move the Algorithms; Not the Data!
    This Session will be Recorded
    Copyright © 2022, Oracle and/or its affiliates.
  2. Topics for today

    • Upcoming Sessions
    • Follow-along OML Notebooks ML Feature Extraction demos
    • Q&A
  3. Upcoming Sessions

    We will begin a series of follow-along reviews of the Example Template notebooks every week, with one subject per week. These will be hands-on if you have access to any Autonomous Database, even the Always-Free one.
    • Classification - done
    • Regression - done
    • Clustering
    • Feature Extraction I - Dimensionality Reduction
    • Feature Extraction II - Explicit Semantic Analysis
    • Time Series
  4. Upcoming Sessions

    • Plus a Hands-on Lab Session on OML4Py in Portuguese, to be scheduled on February 11, 2022 at 8AM Pacific, 11AM Eastern, 1PM Brazil time.
  5. Feature Extraction

    Algorithms
    Some of the methods for Feature Extraction include:
    - Attribute importance using Minimum Description Length
    - Feature Extraction methods that use a transformation/translation/rotation of the original attribute axes, or a decomposition of the original variables into a set of matrices, like:
      - (PCA) Principal Component Analysis
      - (SVD) Singular Value Decomposition
      - (NMF) Non-Negative Matrix Factorization
      - (EM) Expectation-Maximization
      - CUR Matrix Decomposition
      - Explicit Semantic Analysis for NLP and information retrieval
    Using transformations, or simply excluding variables/columns with a weaker relationship with the target, is helpful when building predictive models with machine learning. Because good data preparation is usually 90% of the work, Feature Extraction might be a key element in building a better model.
  6. Attribute Importance

    • Compute the relative importance of predictor variables for predicting a response (target) variable
    • Gain insight into relevance of variables to guide manual variable selection or reduction, with the goal to reduce predictive model build time and/or improve model accuracy
    • Attribute Importance uses a Minimum Description Length (MDL) based algorithm that ranks the relative importance of predictor variables in predicting a specified response (target) variable
    • Pairwise only: each predictor with the target
    • Supports categorical target (classification) and numeric target (regression)
  7. OML Attribute Importance

    • Includes Auto Data Preparation (normalization, binning)
    • Can allow or ignore missing values
    • Supports partitioned model builds
    • Does NOT have a "Scoring" action; it only presents the attribute importance so the user can explore the results
    • Returns a relative metric indicating how much the variable contributes to predicting the target
    • Values > 0 contribute to prediction
    • Values <= 0 do not contribute or add noise
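The pairwise ranking idea on the two Attribute Importance slides can be sketched in plain Python. This is a simplified two-part-code (MDL-style) score written for illustration only; it is not Oracle's algorithm, and the function names and toy data are hypothetical. Each predictor is scored on its own against the target: the bits per row it saves when coding the target, minus a penalty for the extra parameters it needs.

```python
import math
from collections import Counter

def entropy_bits(labels):
    """Shannon entropy (bits per row) of a sequence of categorical labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mdl_style_importance(predictor, target):
    """Bits per row saved by coding the target given one predictor,
    minus a penalty for the extra parameters of the conditional table.
    Simplified illustration; not the OML implementation."""
    n = len(target)
    # Group target values by predictor value to get H(target | predictor).
    groups = {}
    for x, y in zip(predictor, target):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy_bits(g) for g in groups.values())
    # Extra free parameters compared with a predictor-free model,
    # each charged 0.5 * log2(n) bits, spread over the n rows.
    extra_params = (len(groups) - 1) * (len(set(target)) - 1)
    penalty = extra_params * 0.5 * math.log2(n) / n
    return entropy_bits(target) - conditional - penalty

# Hypothetical toy data: one predictor tracks the target, one is unrelated.
target = [0] * 8 + [1] * 8
informative = ["a"] * 8 + ["b"] * 8
noise = ["x", "y"] * 8

imp_informative = mdl_style_importance(informative, target)
imp_noise = mdl_style_importance(noise, target)
ranking = sorted(
    {"informative": imp_informative, "noise": imp_noise}.items(),
    key=lambda item: item[1],
    reverse=True,
)
```

In this sketch the informative predictor scores above zero and the unrelated one scores at or below zero, which mirrors how the slide describes reading the metric.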
  8. Singular Value Decomposition - SVD

    • Feature extraction algorithm
    • Orthogonal linear transformations capture the underlying variance of data by decomposing a rectangular matrix into three matrices: U, D and V
    • Matrix D is a diagonal matrix and its singular values reflect the amount of data variance captured by the singular vectors
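The decomposition into U, D and V can be tried outside the database with NumPy. The sketch below uses `numpy.linalg.svd` on a small hypothetical matrix; it illustrates the general technique and is not the OML in-database implementation.

```python
import numpy as np

# Hypothetical 6x3 data matrix (rows = records, columns = attributes).
X = np.array([
    [2.0, 4.1, 1.0],
    [1.0, 2.0, 0.9],
    [3.0, 6.2, 1.1],
    [4.0, 7.9, 1.0],
    [5.0, 10.1, 0.8],
    [6.0, 12.0, 1.2],
])

# Thin SVD: X = U @ diag(d) @ Vt. U has orthonormal columns, Vt has
# orthonormal rows, and d holds the singular values in descending order.
U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Share of the total sum of squares captured by each singular vector.
# (This equals a variance share only if the columns of X are centered first.)
share = d**2 / np.sum(d**2)

# Keeping the first k singular triplets gives a rank-k approximation of X.
k = 1
X_k = U[:, :k] @ np.diag(d[:k]) @ Vt[:k, :]
```

Because the first two columns of this toy matrix are nearly proportional, almost all of the signal lands on the first singular vector, which is the situation where a low-rank approximation loses little.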
  9. Oracle Machine Learning SVD implementation

    • Supports narrow data via Tall and Skinny solvers
    • Supports wide data via stochastic solvers
    • Provides Eigen Solvers for faster analysis with sparse data
    • Provides traditional SVD for more stable results
    • Provides PCA (Principal Components Analysis) scores
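The link between SVD and PCA scores mentioned on this slide can be sketched with NumPy: center the columns, take the SVD, and project the centered data onto the right singular vectors. The data here is randomly generated and hypothetical, and the sketch says nothing about which solver OML uses internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 rows, 3 attributes, two of them strongly correlated.
base = rng.normal(size=(100, 1))
X = np.hstack([
    base,
    2 * base + 0.1 * rng.normal(size=(100, 1)),
    rng.normal(size=(100, 1)),
])

# PCA via SVD: center each column, then decompose.
Xc = X - X.mean(axis=0)
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)

# PCA scores: coordinates of each row on the principal axes (same as U * d).
scores = Xc @ Vt.T

# Sample variance carried by each principal component.
explained = d**2 / (len(X) - 1)
explained_ratio = explained / explained.sum()
```

The score columns are uncorrelated with each other, so the first few can stand in for the original attributes when most of the variance sits in them.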
  10. Non-negative Matrix Factorization - NMF

    • State-of-the-art algorithm for Feature Extraction
    • Dimensionality reduction technique
    • Creates new features of existing attributes (in contrast to Attribute Importance, which reduces attributes by taking a subset)
    • NMF derives fewer new “features” taking into account interactions among original attributes
    • Supports text mining, life sciences and marketing applications
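To make the idea of deriving fewer new features concrete, here is a minimal NMF sketch using the classic multiplicative update rules with NumPy. The `nmf` function, its parameters, and the toy matrix are hypothetical names chosen for illustration; the OML implementation may use a different solver and settings.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and
    H (rank x m), both non-negative, using multiplicative updates that
    reduce the Frobenius reconstruction error. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # never become negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical non-negative data: 5 records described by 4 attributes.
V = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 0.0, 4.0],
    [0.0, 1.0, 5.0, 4.0],
])

# W holds each record's value on 2 new features; H maps features to attributes.
W, H = nmf(V, rank=2)
error = np.linalg.norm(V - W @ H)
```

Each row of `W` is the record expressed in the two derived features, and each row of `H` shows how a derived feature combines the original attributes, which is what separates this approach from simply picking a subset of columns.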
  11. Two options to access the Template Examples
  12. There are several Feature Extraction demos specific to certain algorithms

    Type "attr":
    • Attribute Importance using MDL
    • Attribute Importance using EM
  13. There are several Feature Extraction demos specific to certain algorithms

    Type "dim":
    • Dimensionality Reduction using NMF
    • Dimensionality Reduction using SVD
  14. There are several Feature Extraction demos specific to certain algorithms

    Type "extr":
    • Feature Extraction using ESA Wikipedia model
  15. Click Create Notebook to create a copy for yourself

    • Click on "Create Notebook"
    • Give it a name (and optionally a description)
    • Click OK
  16. The new notebook will show up in the notebooks listing.