OML feature highlight: New OML Notebook templates for ML Feature Extraction

In this weekly Office Hours for Oracle Machine Learning on Autonomous Database, we introduced the latest Notebook templates for Machine Learning Feature Extraction problems. This was a follow-along session: because the OML Notebook templates are available in any Autonomous Database tenancy, attendees could run them while we demonstrated them.

Marcos Arancibia

January 18, 2022

Transcript

  1. OML feature highlight: New OML Notebook
    templates for ML Feature Extraction
    OML Office Hours
    Marcos Arancibia, Senior Principal Product Manager, Machine Learning
    Supported by Mark Hornick and Sherry LaMonica
    Senior Director, Product Management, Data Science and Machine Learning
    Move the Algorithms; Not the Data!
    Copyright © 2022, Oracle and/or its affiliates.
    This Session will be Recorded

  2. • Upcoming Sessions
    • Follow-along OML Notebooks ML Feature Extraction demos
    • Q&A
    Topics for today

  3. We will begin a Series of Follow-along reviews of the Example Template notebooks every week, with
    one subject per week. These will be Hands-on if you have access to any Autonomous Database, even
    the Always-Free one.
    • Classification - done
    • Regression - done
    • Clustering
    • Feature Extraction I – Dimensionality Reduction
    • Feature Extraction II - Explicit Semantic Analysis
    • Time Series
    Upcoming Sessions

  4. • Plus a Hands-on Lab Session on OML4Py in
    Portuguese, to be held on February 11,
    2022 at 8 AM Pacific, 11 AM Eastern, 1 PM Brazil
    time.
    Upcoming Sessions

  5. Algorithms
    Some of the methods for Feature Extraction include:
    - Attribute importance using Minimum Description Length
    - Feature Extraction methods that use a transformation/translation/rotation of the original attribute
    axes, or a decomposition of the original variables into a set of matrices, such as:
    - (PCA) Principal Component Analysis,
    - (SVD) Singular Value Decomposition,
    - (NMF) Non-Negative Matrix Factorization,
    - (EM) Expectation-Maximization,
    - CUR Matrix Decomposition,
    - Explicit Semantic Analysis for NLP and information retrieval.
    Transforming the variables, or simply excluding variables/columns that have a weak relationship with
    the target, helps when building predictive models with machine learning. Because good data
    preparation usually accounts for most of the work (often quoted as 90%), Feature Extraction can be a
    key step toward a better model.
    Feature Extraction

  6. • Compute the relative importance of predictor variables for predicting
    a response (target) variable
    • Gain insight into relevance of variables to guide manual variable selection
    or reduction, with the goal to reduce predictive model build time and/or
    improve model accuracy
    • Attribute Importance uses a Minimum Description Length (MDL) based
    algorithm that ranks the relative importance of predictor variables in
    predicting a specified response (target) variable
    • Pairwise only – each predictor with the target
    • Supports categorical target (classification) and numeric target (regression)
    Attribute Importance

  7. • Includes Auto Data Preparation (Normalization, binning)
    • Can allow or ignore missing values
    • Supports partitioned model builds
    • Does NOT have a "Scoring" action, only presents the attribute importance
    so the user can explore the results
    • Returns a relative metric indicating how much the variable contributes to
    predicting the target
    • Values > 0 contribute to prediction
    • Values <= 0 do not contribute or add noise
    OML Attribute Importance
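
The ranking idea on the two Attribute Importance slides can be sketched in plain Python. This is a simplified, MDL-style illustration and not Oracle's actual algorithm: it assumes already-binned predictors, scores each predictor against the target on its own (pairwise), and charges a penalty for the extra parameters the predictor introduces. The function names and the toy columns `plan` and `coin` are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(labels):
    """Shannon entropy, in bits per row, of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdl_style_importance(predictor, target):
    """Share of the target's description length saved by one predictor.

    Assumes discrete (already binned) predictor values and a target with
    at least two distinct classes, so the baseline cost is non-zero.
    """
    n = len(target)
    n_classes = len(set(target))
    baseline = n * entropy_bits(target)  # bits to encode the target alone

    # Group target labels by predictor value, then cost each group separately.
    groups = {}
    for x, y in zip(predictor, target):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) * entropy_bits(ys) for ys in groups.values())

    # Penalty for the extra class-probability parameters the predictor adds.
    extra_params = (len(groups) - 1) * (n_classes - 1)
    model_cost = 0.5 * extra_params * math.log2(n)

    return (baseline - conditional - model_cost) / baseline


# Hypothetical toy data: "plan" determines the target, "coin" is pure noise.
target = ["yes"] * 8 + ["no"] * 8
predictors = {
    "plan": ["gold"] * 8 + ["basic"] * 8,
    "coin": ["h" if i % 2 == 0 else "t" for i in range(16)],
}

# Pairwise only: each predictor is scored against the target on its own.
scores = {name: mdl_style_importance(col, target) for name, col in predictors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    verdict = "contributes" if scores[name] > 0 else "does not contribute"
    print(f"{name}: {scores[name]:+.3f} ({verdict})")
```

On this toy data the perfectly predictive column scores 0.875 and the coin flip scores -0.125, which mirrors the slide's reading of positive versus non-positive values. As on the slide, there is no scoring step: the output is only a ranking to explore.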

  8. • Feature extraction algorithm
    • Orthogonal linear transformations capture the underlying variance of data
    by decomposing a rectangular matrix into three matrices: U, D and V
    • Matrix D is a diagonal matrix and its singular values reflect the amount of
    data variance captured by the singular vectors
    Singular Value Decomposition - SVD
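
As a rough illustration of the U, D, V decomposition described on this slide, the sketch below recovers singular triplets with power iteration and deflation, using only the Python standard library. It is a teaching sketch and not one of the solvers in the Oracle implementation. The helper names and the toy matrix are hypothetical, and it assumes the requested rank does not exceed the true rank of the matrix.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def top_singular_triplet(a, iters=200):
    """Leading (u, sigma, v) of matrix a via power iteration on A^T A."""
    a_t = [list(col) for col in zip(*a)]
    # Non-uniform start vector, so it is unlikely to be orthogonal to v.
    v = [1.0 / (j + 1) for j in range(len(a[0]))]
    for _ in range(iters):
        w = matvec(a_t, matvec(a, v))
        length = norm(w)
        v = [x / length for x in w]
    av = matvec(a, v)
    sigma = norm(av)
    u = [x / sigma for x in av]
    return u, sigma, v


def svd_by_deflation(a, rank):
    """First `rank` singular triplets; assumes rank <= true rank of a."""
    residual = [row[:] for row in a]
    triplets = []
    for _ in range(rank):
        u, sigma, v = top_singular_triplet(residual)
        triplets.append((u, sigma, v))
        # Remove the component just found, then repeat on what is left.
        residual = [[x - sigma * ui * vj for x, vj in zip(row, v)]
                    for row, ui in zip(residual, u)]
    return triplets


# Hypothetical column-centered data: 4 rows, 2 attributes.
data = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
triplets = svd_by_deflation(data, rank=2)

# Diagonal of D. For centered data, squared singular values are
# proportional to the variance captured along each singular vector.
singular_values = [sigma for _, sigma, _ in triplets]
total = sum(s * s for s in singular_values)
variance_share = [s * s / total for s in singular_values]

# Rebuild the data from the triplets to confirm the decomposition.
rebuilt = [[sum(s * u[i] * v[j] for u, s, v in triplets) for j in range(2)]
           for i in range(4)]
max_error = max(abs(rebuilt[i][j] - data[i][j]) for i in range(4) for j in range(2))

print("singular values:", [round(s, 4) for s in singular_values])
print("variance share: ", [round(p, 4) for p in variance_share])
```

Here the first singular vector captures 90% of the variance, which is the kind of signal used to decide how many components to keep. Projecting the centered data onto the V vectors gives the PCA scores.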

  9. • Supports narrow data via Tall and Skinny solvers
    • Supports wide data via stochastic solvers
    • Provides Eigen Solvers for faster analysis with sparse data
    • Provides traditional SVD for more stable results
    • Provides PCA (Principal Components Analysis) scores
    Oracle Machine Learning SVD implementation

  10. • State-of-the-art algorithm for Feature Extraction
    • Dimensionality reduction technique
    • Creates new features of existing attributes (in contrast to Attribute
    Importance, which reduces attributes by taking a subset)
    • NMF derives fewer new “features” taking into account interactions among
    original attributes
    • Supports text mining, life sciences and marketing applications
    Non-negative Matrix Factorization - NMF
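
To make the "fewer new features" idea concrete, here is a minimal NMF sketch using the classic multiplicative update rules in plain Python. The choice of update rule is an assumption made for illustration and is not necessarily the solver Oracle uses. The helper names and the toy count matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of v - w*h."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf(v, k, iters=300, seed=0, eps=1e-9):
    """Factor non-negative v (m x n) into w (m x k) and h (k x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    # Strictly positive start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    start_error = frobenius_error(v, w, h)
    for _ in range(iters):
        # Update h: h *= (w^T v) / (w^T w h)
        w_t = transpose(w)
        num, den = matmul(w_t, v), matmul(matmul(w_t, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # Update w: w *= (v h^T) / (w h h^T)
        h_t = transpose(h)
        num, den = matmul(v, h_t), matmul(w, matmul(h, h_t))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h, start_error


# Hypothetical counts: 4 rows described by 4 original attributes,
# with two visible groups of attributes that tend to occur together.
counts = [
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 1.0, 4.0, 3.0],
]

# Two new features replace the four original attributes.
W, H, start_error = nmf(counts, k=2)
final_error = frobenius_error(counts, W, H)

print("row-by-feature weights (W):")
for row in W:
    print([round(x, 2) for x in row])
print("reconstruction error:", round(start_error, 3), "->", round(final_error, 3))
```

Each row of `W` describes an original row in terms of the two new features, and each row of `H` shows how a feature combines the original attributes. All entries stay non-negative, which is what makes the features readable as additive parts.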

  11. Two options to access the Template Examples

  12. There are Several Feature Extraction demos specific to certain algorithms
    Type "attr"
    • Attribute Importance using MDL
    • Attribute Importance using EM

  13. There are Several Feature Extraction demos specific to certain algorithms
    Type "dim"
    • Dimensionality Reduction using NMF
    • Dimensionality Reduction using SVD

  14. There are Several Feature Extraction demos specific to certain algorithms
    Type "extr"
    • Feature Extraction using ESA
    • Feature Extraction using ESA Wikipedia model

  15. Click Create Notebook to create a copy for yourself
    • Click on "Create Notebook"
    • Give it a name (and optionally a description)
    • Click OK

  16. The new notebook will show up in the notebooks listing.

  17. Q & A

  18. Thank you