
Machine Learning 101: Feature Extraction

Have you always been curious about what machine learning can do for your business problem, but could never find the time to learn the necessary practical skills? Do you wish to learn what Classification, Regression, Clustering and Feature Extraction techniques do, and how to apply them using the Oracle Machine Learning family of products?

Join us for this special series “Oracle Machine Learning Office Hours – Machine Learning 101”, where we go through the main steps of solving a business problem from beginning to end, using the different components available in Oracle Machine Learning: its interfaces and programming languages, including notebooks with SQL, the UI, and languages like R and Python.

This seventh session in the series covered Feature Extraction 101, where we learned about methods to extract meaningful attributes from a large number of columns in datasets, explored dimensionality reduction, and saw how it can be beneficial as a pre-processing step for machine learning models.

Marcos Arancibia

December 17, 2020

Transcript

  1. Oracle Machine Learning Office Hours: Machine Learning 101 – Feature Extraction. With Marcos Arancibia, Product Manager, Data Science and Big Data (@MarcosArancibia) and Mark Hornick, Senior Director, Product Management, Data Science and Machine Learning (@MarkHornick). oracle.com/machine-learning
  2. Today’s Agenda: Upcoming session; Speaker: Marcos Arancibia, Machine Learning 101: Feature Extraction; Q&A
  3. Next Session (January 2021): Oracle Machine Learning Office Hours, Machine Learning 102 – Feature Extraction. This eighth session in the series will cover Feature Extraction 102, where we continue to learn about methods to extract meaningful attributes from a large number of columns in datasets, explore dimensionality reduction in wider datasets and documents using Explicit Semantic Analysis, and compare the benefits of Feature Extraction as data pre-processing for Machine Learning models.
  4. Today’s Session: Machine Learning 101 – Feature Extraction. This seventh session in the series will cover Feature Extraction 101, where we learn about methods to extract meaningful attributes from a large number of columns in datasets, explore dimensionality reduction, and see how it can be beneficial as a pre-processing step for machine learning models.
  5. What is Feature Extraction? "Feature extraction involves reducing the number of resources required to describe a large set of data" (Wikipedia: https://en.wikipedia.org/wiki/Feature_extraction). The term "Feature Extraction" is used to denote several different methods that try to extract as much "information" as possible from a set of data by using combinations of the original variables/columns. In general, we can consider Feature Extraction in machine learning to be part of the pre-processing/data preparation cycle, which goes back and forth with the modeling stage. [Figure: CRISP-DM, the Cross-Industry Standard Process for Data Mining, cycling through Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment]
  6. Feature Selection (part of the capabilities of Feature Extraction tools). Feature Selection, also known as dimensionality reduction, variable selection, or attribute selection, is the process of selecting a subset of relevant features (variables, predictors, columns) for use in machine learning model construction. Basic benefits of reducing the number of features:
  • Simplifies models down to the core relevant features
  • Makes models faster to train and score
  • Potentially reduces variance and avoids overfitting (and the curse of dimensionality)
  Several supervised machine learning algorithms can do a "natural" selection of the best attributes via a "weight" given to the features. Other methods can do an unsupervised selection of features by looking at the natural dispersion of the data, trying to select the features that capture most of the information (variability) of the entire dataset with as few features as possible; a small sketch of this idea follows.
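
  As a minimal sketch of that unsupervised idea, here is a variance-based filter using scikit-learn (an illustration only, not the OML implementation; the data and threshold are made up):

  ```python
  # Sketch only: variance-based (unsupervised) feature selection with
  # scikit-learn, standing in for the idea described above.
  import numpy as np
  from sklearn.feature_selection import VarianceThreshold

  rng = np.random.default_rng(0)
  X = np.column_stack([
      rng.normal(0, 5.0, 100),    # high-variance column: likely kept
      rng.normal(0, 0.01, 100),   # near-constant column: likely dropped
      rng.normal(0, 2.0, 100),    # moderate-variance column
  ])

  selector = VarianceThreshold(threshold=0.5)   # hypothetical cutoff
  X_reduced = selector.fit_transform(X)

  print(selector.variances_)      # estimated variance of each column
  print(selector.get_support())   # e.g. [ True False  True ]
  print(X_reduced.shape)          # (100, 2)
  ```
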
  7. Algorithms. Some of the methods for Feature Extraction include:
  • Attribute Importance using Minimum Description Length (MDL)
  • Feature Extraction methods that use a transformation/translation/rotation of the original attribute axes, or a decomposition of the original variables into a set of matrices, such as:
    - Principal Component Analysis
    - Singular Value Decomposition
    - Non-Negative Matrix Factorization
    - CUR Matrix Decomposition
    - Explicit Semantic Analysis, for NLP and information retrieval
  Using these transformations, or simply excluding variables/columns with a weaker relationship to the target, is helpful when building predictive models with machine learning; because good data preparation is usually 90% of the work, Feature Extraction can be a key element in building a better model.
  8. Attribute Importance
  • Compute the relative importance of predictor variables for predicting a response (target) variable
  • Gain insight into the relevance of variables to guide manual variable selection or reduction, with the goal of reducing predictive model build time and/or improving model accuracy
  • Attribute Importance uses a Minimum Description Length (MDL) based algorithm that ranks the relative importance of predictor variables in predicting a specified response (target) variable
  • Pairwise only – each predictor with the target
  • Supports categorical targets (classification) and numeric targets (regression)
  9. OML Attribute Importance (see the sketch below)
  • Includes Auto Data Preparation (normalization, binning)
  • Can allow or ignore missing values
  • Supports partitioned model builds
  • Does NOT have a "Scoring" action; it only presents the attribute importance so the user can explore the results
  • Returns a relative metric indicating how much the variable contributes to predicting the target
  • Values > 0 contribute to prediction
  • Values <= 0 do not contribute, or add noise
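
  OML's Attribute Importance uses MDL; as a rough stand-in for the idea of scoring each predictor pairwise against the target, the sketch below uses scikit-learn's mutual information, which is not MDL but yields a comparable relevance ranking (dataset and names are illustrative):

  ```python
  # Sketch: pairwise predictor-vs-target relevance ranking, analogous in
  # spirit to Attribute Importance (mutual information here, not MDL).
  import pandas as pd
  from sklearn.datasets import load_breast_cancer
  from sklearn.feature_selection import mutual_info_classif

  data = load_breast_cancer()
  scores = mutual_info_classif(data.data, data.target, random_state=0)

  ranking = (pd.Series(scores, index=data.feature_names)
               .sort_values(ascending=False))
  print(ranking.head(10))  # top 10 attributes by estimated relevance
  ```
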
  10. Singular Value Decomposition – SVD (see the sketch below)
  • Feature extraction algorithm
  • Orthogonal linear transformations capture the underlying variance of the data by decomposing a rectangular matrix into three matrices: U, D, and V
  • Matrix D is a diagonal matrix, and its singular values reflect the amount of data variance captured by the singular vectors
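
  A minimal numpy sketch of this decomposition, assuming a small random matrix X (illustrative only); the squared singular values on the diagonal of D give the variance captured per component:

  ```python
  # Sketch: X = U @ D @ V^T via numpy, with variance captured per component.
  import numpy as np

  rng = np.random.default_rng(42)
  X = rng.normal(size=(100, 6))        # 100 rows, 6 columns (illustrative)
  X = X - X.mean(axis=0)               # center so "variance" is meaningful

  U, d, Vt = np.linalg.svd(X, full_matrices=False)
  D = np.diag(d)

  # Reconstruction check: U D V^T recovers X
  assert np.allclose(U @ D @ Vt, X)

  # Proportion of total variance captured by each singular vector
  explained = d**2 / np.sum(d**2)
  print(explained)
  ```
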
  11. Oracle Machine Learning SVD implementation
  • Supports narrow data via Tall and Skinny solvers
  • Supports wide data via stochastic solvers
  • Provides Eigen solvers for faster analysis with sparse data
  • Provides traditional SVD for more stable results
  12. Non-negative Matrix Factorization – NMF (see the sketch below)
  • State-of-the-art algorithm for feature extraction
  • Dimensionality reduction technique
  • Creates new features from combinations of the existing attributes; compare with Attribute Importance, which reduces attributes by taking a subset
  • NMF derives fewer new "features" that take into account interactions among the original attributes
  • Supports text mining, life sciences, and marketing applications
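
  A minimal sketch using scikit-learn's NMF (not the OML implementation): a non-negative matrix X is factorized into W, whose rows express each record in the new features, and H, which defines each feature as a mix of the original attributes:

  ```python
  # Sketch: X ~ W @ H with scikit-learn's NMF (illustrative sizes only).
  import numpy as np
  from sklearn.decomposition import NMF

  rng = np.random.default_rng(0)
  X = rng.random((50, 8))              # 50 rows, 8 non-negative attributes

  nmf = NMF(n_components=3, init="nndsvda", random_state=0, max_iter=500)
  W = nmf.fit_transform(X)             # (50, 3): rows in the 3 new features
  H = nmf.components_                  # (3, 8): each feature mixes attributes

  print(np.linalg.norm(X - W @ H))     # reconstruction error of the approximation
  ```
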
  13. Intuition on NMF
  • Useful where there are many attributes, each with weak, even ambiguous, predictability
  • When taken in combination, however, they produce meaningful patterns, topics, or themes
  • Example – text: the same word can predict different documents; e.g., "hike" can be applied to the outdoors or to interest rates
  • NMF introduces context, which is essential for predictive power: "hike" + "mountain" -> "outdoor sports"; "hike" + "interest" -> "interest rates" (see the sketch below)
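
  A toy sketch of the "hike" intuition, assuming a handful of made-up documents: TF-IDF followed by NMF groups co-occurring words into topic-like features, so the same word contributes to different features depending on context:

  ```python
  # Sketch: word context via TF-IDF + NMF topics (toy corpus, 2 components).
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.decomposition import NMF

  docs = [
      "hike mountain trail outdoors",
      "mountain trail camping outdoors",
      "interest rate hike by the central bank",
      "bank raises interest rates again",
  ]

  tfidf = TfidfVectorizer(stop_words="english")
  X = tfidf.fit_transform(docs)

  nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
  W = nmf.fit_transform(X)             # document-to-feature weights
  terms = tfidf.get_feature_names_out()

  for i, topic in enumerate(nmf.components_):
      top = [terms[j] for j in topic.argsort()[::-1][:3]]
      print(f"Feature {i}: {top}")     # outdoors-like vs. rates-like word groups
  ```
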
  14. Intuition on NMF [Figure: a wide table of original attribute values (a, b, c, d, e, f, g, h, x, y, z, …) plus target values (1, 2) is transformed into a narrower table of extracted features (Feat 1, Feat 2, Feat 3, Feat 4) with the same target values]
  15. Methods for Face Representation: Vector Quantization [Figure: an original face image approximated as a reconstruction built from basis images combined with a sparse binary encoding (0,0,0,…,1,…,0,0)]
  16. Methods for Face Representation: Principal Component Analysis [Figure: an original face approximated as a reconstruction built from basis images combined with a dense, signed encoding (.9,.6,-.5,…,.9,-.3)]
  17. Methods for Face Representation: Non-negative Matrix Factorization [Figure: an original face approximated as a reconstruction built from basis images combined with a sparse, non-negative encoding (0,.5,.3,0,1,…,.3,0)]
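
  The three figures above contrast the encodings each method produces. The sketch below illustrates the same contrast on scikit-learn's bundled digits images (standing in for the face dataset, which is not included): PCA encodings are dense and signed, while NMF encodings are non-negative and typically sparser:

  ```python
  # Sketch: dense signed PCA encodings vs. sparse non-negative NMF encodings,
  # on scikit-learn's digits images as a stand-in for the face dataset.
  from sklearn.datasets import load_digits
  from sklearn.decomposition import PCA, NMF

  X = load_digits().data               # (1797, 64) non-negative pixel values

  pca_codes = PCA(n_components=10, random_state=0).fit_transform(X)
  nmf_codes = NMF(n_components=10, init="nndsvda", random_state=0,
                  max_iter=500).fit_transform(X)

  print(pca_codes[0].round(1))         # dense, mixed-sign encoding
  print(nmf_codes[0].round(1))         # non-negative, typically sparser encoding
  ```
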
  18. Explicit Semantic Analysis (ESA): a more interpretable model than LDA (Latent Dirichlet Allocation). In NLP and information retrieval, ESA is a vectorial representation of text (individual words or entire documents) that uses a document corpus as a knowledge base.
  • A word is represented as a column vector in the TF-IDF matrix of the text corpus
  • A document (string of words) is represented as the centroid of the vectors representing its words
  The text corpus is often the English Wikipedia, though other corpora can be used. ESA was designed to improve text categorization:
  • It computes "semantic relatedness" using cosine similarity between the aforementioned vectors, collectively interpreted as a space of "concepts explicitly defined and described by humans" (see the sketch below)
  • Wikipedia articles are equated with concepts
  Usual objectives:
  • Calculate semantic similarity between text documents or between mixed data
  • Explicit topic modeling for text
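
  ESA itself measures relatedness against a concept corpus such as Wikipedia; the sketch below shows only the cosine-similarity step on TF-IDF document vectors, with a toy corpus standing in for a real concept space:

  ```python
  # Sketch: cosine similarity between TF-IDF document vectors, the same
  # similarity measure ESA applies in its concept space (toy corpus here).
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.metrics.pairwise import cosine_similarity

  docs = [
      "feature extraction reduces the number of columns in a dataset",
      "feature extraction finds combinations of the original columns",
      "the stock market closed higher today",
  ]

  X = TfidfVectorizer().fit_transform(docs)
  sim = cosine_similarity(X)           # (3, 3) pairwise similarity matrix
  print(sim.round(2))                  # docs 0 and 1 score closer than 0 and 2
  ```
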
  19. Demo of Feature Extraction 101 with OML4Py (ESA will be part of the 102 tutorial)