Slide 1

OML feature highlight: New OML Notebook templates for ML Feature Extraction
OML Office Hours
Marcos Arancibia, Senior Principal Product Manager, Machine Learning
Supported by Mark Hornick and Sherry LaMonica, Senior Director, Product Management, Data Science and Machine Learning
Move the Algorithms; Not the Data!
Copyright © 2022, Oracle and/or its affiliates.
This session will be recorded.

Slide 2

Topics for today
• Upcoming Sessions
• Follow-along OML Notebooks: ML Feature Extraction demos
• Q&A

Slide 3

Upcoming Sessions
We will begin a series of follow-along reviews of the Example Template notebooks, covering one subject per week. These are hands-on if you have access to any Autonomous Database, even the Always Free one.
• Classification (done)
• Regression (done)
• Clustering
• Feature Extraction I: Dimensionality Reduction
• Feature Extraction II: Explicit Semantic Analysis
• Time Series

Slide 4

Upcoming Sessions
• Plus a hands-on lab session on OML4Py in Portuguese, scheduled for February 11, 2022, at 8 AM Pacific, 11 AM Eastern, 1 PM Brazil time.

Slide 5

Feature Extraction
Some of the methods for Feature Extraction include:
• Attribute Importance using Minimum Description Length (MDL)
• Feature Extraction methods that use a transformation/translation/rotation of the original attribute axes, or a decomposition of the original variables into a set of matrices, such as:
  - Principal Component Analysis (PCA)
  - Singular Value Decomposition (SVD)
  - Non-Negative Matrix Factorization (NMF)
  - Expectation-Maximization (EM)
  - CUR Matrix Decomposition
  - Explicit Semantic Analysis (ESA), for NLP and information retrieval
Using transformations, or simply excluding variables/columns with a weaker relationship to the target, is helpful when building predictive models with machine learning. Because good data preparation is usually 90% of the work, Feature Extraction can be a key element in building a better model.

Slide 6

Attribute Importance
• Computes the relative importance of predictor variables for predicting a response (target) variable
• Provides insight into the relevance of variables to guide manual variable selection or reduction, with the goal of reducing predictive model build time and/or improving model accuracy
• Uses a Minimum Description Length (MDL)-based algorithm that ranks the relative importance of predictor variables in predicting a specified response (target) variable
• Pairwise only: each predictor is evaluated against the target
• Supports categorical targets (classification) and numeric targets (regression)

Slide 7

OML Attribute Importance
• Includes Automatic Data Preparation (normalization, binning)
• Can allow or ignore missing values
• Supports partitioned model builds
• Does NOT have a "scoring" action; it only presents the attribute importance so the user can explore the results
• Returns a relative metric indicating how much each variable contributes to predicting the target
  - Values > 0 contribute to the prediction
  - Values <= 0 do not contribute, or add noise
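The ranking idea above can be illustrated with a toy NumPy sketch. Note this is not OML's MDL algorithm: as a crude stand-in, it scores each predictor pairwise against a numeric target using absolute Pearson correlation, so that a strong predictor ranks first and a pure-noise column ranks last. All data and names here are made up for illustration.

```python
import numpy as np

# Crude stand-in for Attribute Importance: rank each predictor pairwise
# against the target (here via absolute Pearson correlation, NOT MDL).
rng = np.random.default_rng(42)
n = 500
x1 = rng.standard_normal(n)   # strong predictor
x2 = rng.standard_normal(n)   # weak predictor
x3 = rng.standard_normal(n)   # pure noise
y = 2.0 * x1 + 0.3 * x2 + 0.1 * rng.standard_normal(n)

X = np.column_stack([x1, x2, x3])
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
ranking = np.argsort(scores)[::-1]   # most important predictor first
print(ranking)                       # x1 should rank first, the noise column last
```

As on the slide, there is no "scoring" step: the output is just a per-attribute metric the user can inspect to decide which columns to keep.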

Slide 8

Singular Value Decomposition (SVD)
• Feature extraction algorithm
• Orthogonal linear transformations capture the underlying variance of the data by decomposing a rectangular matrix into three matrices: U, D, and V
• Matrix D is a diagonal matrix, and its singular values reflect the amount of data variance captured by the singular vectors
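The decomposition on this slide can be sketched in a few lines of NumPy (an illustration, not OML's in-database implementation): a rectangular matrix X factors as U · D · Vᵀ, with the singular values on the diagonal of D sorted from largest to smallest.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))              # rectangular data matrix

# Thin SVD: X = U @ diag(d) @ Vt, with d holding the singular values
U, d, Vt = np.linalg.svd(X, full_matrices=False)

assert np.allclose(X, U @ np.diag(d) @ Vt)   # exact reconstruction
assert np.all(d[:-1] >= d[1:])               # singular values sorted descending
```

Truncating to the top-k singular values/vectors is what turns this exact factorization into dimensionality reduction.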

Slide 9

Oracle Machine Learning SVD implementation
• Supports narrow data via Tall and Skinny solvers
• Supports wide data via stochastic solvers
• Provides eigensolvers for faster analysis with sparse data
• Provides traditional SVD for more stable results
• Provides PCA (Principal Component Analysis) scores
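The PCA scores mentioned in the last bullet follow directly from the SVD of the centered data; here is a minimal NumPy sketch (again illustrative, not OML's solvers). The synthetic data deliberately gives the three columns very different variances so the first component dominates.

```python
import numpy as np

rng = np.random.default_rng(1)
# Three columns with very different variances (hypothetical data)
X = rng.standard_normal((100, 3)) * np.array([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)                      # center the data
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = Xc @ Vt.T                           # PCA scores: projections on principal axes
explained = d**2 / np.sum(d**2)              # fraction of variance per component
print(explained)                             # first component captures most variance
```

Keeping only the first few columns of `scores` gives the reduced-dimension representation of the data.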

Slide 10

Non-Negative Matrix Factorization (NMF)
• State-of-the-art algorithm for feature extraction
• Dimensionality reduction technique
• Creates new features from combinations of existing attributes (in contrast to Attribute Importance, which reduces attributes by taking a subset)
• Derives fewer new "features" that take into account interactions among the original attributes
• Supports text mining, life sciences, and marketing applications
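The factorization idea can be sketched with Lee–Seung multiplicative updates in plain NumPy. This is a toy version for intuition only; OML's NMF adds data preparation, convergence checks, and scoring. It approximates a non-negative matrix X as W · H, where the k rows of H are the new "features" and W holds each row's non-negative weights on them.

```python
import numpy as np

def nmf(X, k, iters=500, eps=1e-9):
    """Toy NMF via Lee-Seung multiplicative updates: X ~= W @ H with W, H >= 0."""
    rng = np.random.default_rng(0)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update the k extracted features
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update the per-row weights
    return W, H

# Hypothetical non-negative data of exact rank 2, so k=2 can fit it well
rng = np.random.default_rng(7)
X = rng.random((8, 2)) @ rng.random((2, 6))

W, H = nmf(X, k=2)
err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {err:.4f}")
```

Because the factors stay non-negative, each new feature is an additive combination of original attributes, which is what makes NMF features interpretable in text mining and similar applications.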

Slide 11

Two options to access the Template Examples

Slide 12

There are several Feature Extraction demos specific to certain algorithms. Type "attr" to filter, for example:
• Attribute Importance using MDL
• Attribute Importance using EM

Slide 13

There are several Feature Extraction demos specific to certain algorithms. Type "dim" to filter, for example:
• Dimensionality Reduction using NMF
• Dimensionality Reduction using SVD

Slide 14

There are several Feature Extraction demos specific to certain algorithms. Type "extr" to filter, for example:
• Feature Extraction using ESA
• Feature Extraction using ESA Wikipedia model

Slide 15

Click Create Notebook to create a copy for yourself:
• Click on "Create Notebook"
• Give it a name (and optionally a description)
• Click OK

Slide 16

The new notebook will show up in the notebooks listing.

Slide 17

Q & A

Slide 18

Thank you