Slide 1

Slide 1 text

Oracle Machine Learning Office Hours
Machine Learning 101 – Feature Extraction

With Marcos Arancibia, Product Manager, Data Science and Big Data (@MarcosArancibia)
and Mark Hornick, Senior Director, Product Management, Data Science and Machine Learning (@MarkHornick)

oracle.com/machine-learning
Copyright © 2020, Oracle and/or its affiliates. All rights reserved

Slide 2

Slide 2 text

Today’s Agenda
• Upcoming session
• Speaker: Marcos Arancibia – Machine Learning 101: Feature Extraction
• Q&A
Copyright © 2020 Oracle and/or its affiliates.

Slide 3

Slide 3 text

Next Session
January 2021: Oracle Machine Learning Office Hours – Machine Learning 102: Feature Extraction
This eighth session in the series will cover Feature Extraction 102, where we continue to learn about methods to extract meaningful attributes from a large number of columns in datasets, explore dimensionality reduction in wider datasets and documents using Explicit Semantic Analysis, and compare the benefits of Feature Extraction as data pre-processing for Machine Learning models.
Copyright © 2020, Oracle and/or its affiliates. All rights reserved

Slide 4

Slide 4 text

For product info… https://www.oracle.com/machine-learning Copyright © 2020 Oracle and/or its affiliates.

Slide 5

Slide 5 text

Copyright © 2020, Oracle and/or its affiliates https://www.oracle.com/cloud/free/

Slide 6

Slide 6 text

Today’s Session: Machine Learning 101 – Feature Extraction
This seventh session in the series will cover Feature Extraction 101, where we learn about methods to extract meaningful attributes from a large number of columns in datasets, explore dimensionality reduction, and see how it can be beneficial as pre-processing for machine learning models.
Copyright © 2020, Oracle and/or its affiliates. All rights reserved

Slide 7

Slide 7 text

Feature Extraction – Introduction
What is Feature Extraction?
"Feature extraction involves reducing the number of resources required to describe a large set of data" (Wikipedia: https://en.wikipedia.org/wiki/Feature_extraction)
The term "Feature Extraction" denotes several different methods that try to extract as much "information" as possible from a set of data by using combinations of the original variables/columns.
In general, we can consider Feature Extraction in machine learning to be part of the pre-processing/data preparation cycle, one that iterates back and forth with the modeling stage.
[Diagram: CRISP-DM, the cross-industry standard process for data mining – Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment]
Copyright © 2020, Oracle and/or its affiliates

Slide 8

Slide 8 text

Feature Selection
Part of the capabilities of Feature Extraction tools
Feature Selection, also known as dimensionality reduction, variable selection, or attribute selection, is the process of selecting a subset of relevant features (variables, predictors, columns) for use in machine learning model construction.
Basic benefits of reducing the number of features:
• Simpler models focused on the core relevant features
• Faster to train and score
• Potentially reduced variance, avoiding overfitting (and the curse of dimensionality)
Several supervised machine learning algorithms perform a "natural" selection of the best attributes via a "weight" given to the features. Other methods perform an unsupervised selection of features by looking at the natural dispersion of the data, trying to select features that capture most of the information (variability) of the entire dataset with as few features as possible, as in the sketch below.
Copyright © 2020, Oracle and/or its affiliates
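To make the two flavors concrete, here is a minimal sketch in Python (scikit-learn is an illustrative stand-in; the slides do not prescribe a library): a supervised model exposing per-feature weights, and an unsupervised variance filter that never looks at the target.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import VarianceThreshold

# Synthetic data: 20 columns, only 5 of which carry signal.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Supervised: many algorithms assign a "weight" to each feature.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranked = np.argsort(rf.feature_importances_)[::-1]
print("Top 5 features by model weight:", ranked[:5])

# Unsupervised: keep features whose dispersion (variance) clears a
# threshold, without ever looking at the target.
selector = VarianceThreshold(threshold=0.8)
X_reduced = selector.fit_transform(X)
print("Columns kept by variance filter:", selector.get_support(indices=True))
```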

Slide 9

Slide 9 text

Feature Extraction – Algorithms
Some of the methods for Feature Extraction include:
• Attribute Importance using Minimum Description Length
• Feature Extraction methods that use a transformation/translation/rotation of the original attribute axes, or a decomposition of the original variables into a set of matrices, such as:
  - Principal Component Analysis
  - Singular Value Decomposition
  - Non-Negative Matrix Factorization
  - CUR Matrix Decomposition
  - Explicit Semantic Analysis (for NLP and information retrieval)
Using such transformations, or simply excluding variables/columns weakly related to the target, is helpful when building predictive models with machine learning. Because good data preparation is usually 90% of the work, Feature Extraction can be a key element in building a better model.
Copyright © 2020, Oracle and/or its affiliates

Slide 10

Slide 10 text

Attribute Importance
• Computes the relative importance of predictor variables for predicting a response (target) variable
• Gives insight into the relevance of variables to guide manual variable selection or reduction, with the goal of reducing predictive model build time and/or improving model accuracy
• Uses a Minimum Description Length (MDL) based algorithm that ranks the relative importance of predictor variables in predicting a specified response (target) variable
• Pairwise only – each predictor is evaluated individually against the target
• Supports a categorical target (classification) and a numeric target (regression)
Copyright © 2020, Oracle and/or its affiliates

Slide 11

Slide 11 text

OML Attribute Importance
• Includes Auto Data Preparation (normalization, binning)
• Can allow or ignore missing values
• Supports partitioned model builds
• Does NOT have a "scoring" action; it only presents the attribute importance so the user can explore the results
• Returns a relative metric indicating how much each variable contributes to predicting the target
  - Values > 0 contribute to the prediction
  - Values <= 0 do not contribute, or add noise
A minimal sketch of this ranking idea follows.
Copyright © 2020, Oracle and/or its affiliates
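OML's MDL-based ranking itself is not reproduced here; as an illustration of the same pairwise idea, the sketch below scores each predictor against the target with mutual information (an assumed stand-in metric, not MDL) and keeps only positively scoring columns, mirroring the "values > 0 contribute" rule.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

# Pairwise relevance: each predictor is scored against the target on its own.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

scores = mutual_info_classif(X, y, random_state=0)  # stand-in for MDL ranking
ranking = pd.Series(scores, index=X.columns).sort_values(ascending=False)
print(ranking.head(10))

# Keep only attributes with a positive score; zero or below means the
# attribute adds no usable signal (or only noise) under this metric.
kept = ranking[ranking > 0].index.tolist()
print(f"{len(kept)} of {X.shape[1]} attributes retained")
```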

Slide 12

Slide 12 text

Singular Value Decomposition – SVD
• Feature extraction algorithm
• Orthogonal linear transformations capture the underlying variance of the data by decomposing a rectangular matrix X into three matrices: X = U D Vᵀ
• Matrix D is a diagonal matrix, and its singular values reflect the amount of data variance captured by each singular vector
Copyright © 2020, Oracle and/or its affiliates
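A minimal NumPy sketch of the factorization (NumPy is an illustrative choice, not the OML implementation): the singular values in D show how much variance each singular vector captures, which is what makes truncating to the top k components a dimensionality reduction.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))   # rectangular data matrix (rows x columns)
X = X - X.mean(axis=0)          # center so variance is well defined

# X = U @ diag(d) @ Vt
U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Share of total variance captured by each singular vector.
explained = d**2 / np.sum(d**2)
print("variance captured:", np.round(explained, 3))

# Reduce to k extracted features: project rows onto the top-k singular vectors.
k = 2
features = X @ Vt[:k].T         # shape (100, k)
print("reduced shape:", features.shape)
```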

Slide 13

Slide 13 text

Oracle Machine Learning SVD Implementation
• Supports narrow data via Tall and Skinny solvers
• Supports wide data via stochastic solvers
• Provides Eigen solvers for faster analysis with sparse data
• Provides traditional SVD for more stable results
Copyright © 2020, Oracle and/or its affiliates

Slide 14

Slide 14 text

Non-negative Matrix Factorization – NMF
• State-of-the-art algorithm for Feature Extraction
• Dimensionality reduction technique
• Creates new features from combinations of the existing attributes
• Compare to Attribute Importance, which reduces attributes by taking a subset
• NMF derives fewer new "features" that take into account interactions among the original attributes
• Supports text mining, life sciences, and marketing applications
Copyright © 2020, Oracle and/or its affiliates
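As a minimal sketch (scikit-learn again chosen for illustration), NMF factors a non-negative matrix X into W·H: the rows of H are the new "features" expressed as mixtures of the original attributes, and W holds each record's non-negative encoding in terms of them.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((50, 12))        # non-negative data: 50 rows, 12 attributes

# Factor X ~= W @ H with 4 extracted features.
model = NMF(n_components=4, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(X)      # per-row encodings, shape (50, 4)
H = model.components_           # features as attribute mixtures, shape (4, 12)

print("reconstruction error:", round(model.reconstruction_err_, 3))
print("encoding of first row:", np.round(W[0], 2))
```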

Slide 15

Slide 15 text

Intuition on NMF
• Useful where there are many attributes, each with weak, even ambiguous, predictability
• Taken in combination, they produce meaningful patterns, topics, or themes
• Example: text
  - The same word can predict different documents, e.g., "hike" can apply to the outdoors or to interest rates
  - NMF introduces context, which is essential for predictive power:
    "hike" + "mountain" -> "outdoor sports"
    "hike" + "interest" -> "interest rates"
Copyright © 2020, Oracle and/or its affiliates
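The sketch below plays out that intuition on a toy corpus (an illustrative toy example, not the session's demo): TF-IDF turns the documents into non-negative vectors, and NMF separates "hike"+"mountain" from "hike"+"interest" into distinct topics.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = [
    "hike up the mountain trail with a backpack",
    "mountain trail hike through the outdoors",
    "the fed may hike interest rates this quarter",
    "interest rate hike worries the markets",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)   # non-negative TF-IDF matrix

nmf = NMF(n_components=2, random_state=0)
W = nmf.fit_transform(X)

# Top words per extracted topic: "hike" contributes to both topics,
# but its companion words disambiguate the theme.
terms = tfidf.get_feature_names_out()
for t, row in enumerate(nmf.components_):
    top = [terms[i] for i in row.argsort()[::-1][:3]]
    print(f"topic {t}: {top}")
```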

Slide 16

Slide 16 text

Intuition on NMF
[Diagram: original attribute values (a, b, c, d, e, f, g, h, x, y, z, …) with target values 1 and 2 are condensed into extracted features (Feat 1, Feat 2, Feat 3, Feat 4) that still align with the target values]
Copyright © 2020, Oracle and/or its affiliates

Slide 17

Slide 17 text

Methods for Face Representation: Vector Quantization
[Figure: reconstruction = basis images × one-hot encoding (0,0,0,…,1,…,0,0) of the original image]
Copyright © 2020, Oracle and/or its affiliates

Slide 18

Slide 18 text

Methods for Face Representation: Principal Component Analysis
[Figure: reconstruction = basis images × dense signed encoding (.9,.6,-.5,…,.9,-.3) of the original image]
Copyright © 2020, Oracle and/or its affiliates

Slide 19

Slide 19 text

Methods for Face Representation: Non-negative Matrix Factorization
[Figure: reconstruction = basis images × sparse non-negative encoding (0,.5,.3,0,1,…,.3,0) of the original image]
Copyright © 2020, Oracle and/or its affiliates

Slide 20

Slide 20 text

Explicit Semantic Analysis (ESA)
A more interpretable model than LDA (Latent Dirichlet Allocation)
In NLP and information retrieval, ESA is a vectorial representation of text (individual words or entire documents) that uses a document corpus as a knowledge base:
• A word is represented as a column vector in the TF-IDF matrix of the text corpus
• A document (string of words) is represented as the centroid of the vectors representing its words
The text corpus is often English Wikipedia, though other corpora can be used.
Designed to improve text categorization:
• Computes "semantic relatedness" using cosine similarity between the aforementioned vectors, collectively interpreted as a space of "concepts explicitly defined and described by humans"
• Wikipedia articles are equated with concepts
Usual objectives:
• Calculate semantic similarity between text documents or between mixed data
• Explicit topic modeling for text
Copyright © 2020, Oracle and/or its affiliates
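A minimal sketch of the ESA idea (a three-line toy "corpus" stands in for Wikipedia; all names here are illustrative): each word maps to its TF-IDF weights across the concept documents, a new document becomes the centroid of its words' vectors, and relatedness is the cosine between centroids.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny stand-in knowledge base: each entry plays the role of a Wikipedia article.
concepts = [
    "mountain hiking trails outdoors climbing",
    "central bank raises interest rates economy",
    "soccer football match goal players",
]

vec = TfidfVectorizer()
tfidf = vec.fit_transform(concepts).toarray()  # shape: (concepts, vocabulary)
vocab = vec.vocabulary_                        # word -> column index

def esa_vector(text):
    """Centroid, over the text's known words, of each word's concept profile."""
    words = [w for w in text.lower().split() if w in vocab]
    # Column vocab[w] of the TF-IDF matrix is word w's weight per concept.
    return np.mean([tfidf[:, vocab[w]] for w in words], axis=0)

a = esa_vector("a hiking trip in the mountain")
b = esa_vector("the bank will raise rates")
print("semantic relatedness:", round(cosine_similarity([a], [b])[0, 0], 3))
```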

Slide 21

Slide 21 text

Demo of Feature Extraction 101 on OML4Py (ESA will be part of the 102 tutorial)
Copyright © 2020, Oracle and/or its affiliates

Slide 22

Slide 22 text

For more information… oracle.com/machine-learning Copyright © 2020 Oracle and/or its affiliates.

Slide 23

Slide 23 text

Q & A
Copyright © 2020, Oracle and/or its affiliates

Slide 24

Slide 24 text

Thank You
Marcos Arancibia | marcos.arancibia@oracle.com
Mark Hornick | mark.hornick@oracle.com
Oracle Machine Learning Product Management