
2016 - Rumman Chowdhury - Deep Dive into Principal Components Analysis

Description
Commonly used in image recognition, speech-to-text, and text analysis, Principal Components Analysis (PCA) separates the signal from the noise in your data and reduces its dimensionality so that meaningful analyses can be performed.

Abstract
PCA is vital for reducing high-dimensional models with sparsity issues without sacrificing the information contributed by each feature. In this talk, I will explain what happens under the hood during PCA, making both the code and the math accessible and interpretable.

Bio
Rumman comes to data science from a quantitative social science background. Prior to joining Metis, she was a data scientist at Quotient Technology, where she used retailer transaction data to build an award-winning media targeting model. Her industry experience spans public policy, economics, and consulting. Her prior clients include the World Bank, the Vera Institute of Justice, and the Los Angeles County Museum of Art. She holds two undergraduate degrees from MIT and a Master's in Quantitative Methods in the Social Sciences from Columbia, and she is currently finishing her Political Science PhD at the University of California, San Diego. Her dissertation uses machine learning techniques to determine whether single-industry towns have a broken political process. Her passion lies in teaching, and in learning from teaching. In her spare time, she teaches and practices yoga, reads comic books, and works on her podcast.

PyBay

August 21, 2016

Transcript

  1. Dimensionality Reduction using Principal Components Analysis. Rumman Chowdhury, Senior Data Scientist. @ruchowdh | rummanchowdhury.com | thisismetis.com
  2. Who is Rumman? What's a Metis? Me: Political Science PhD, Data Scientist, Teacher, Do-Gooder. Check me out on twitter: @ruchowdh, or on my website: rummanchowdhury.com (psst, I post cool jobs there). What's Metis? Metis accelerates the careers of data scientists by providing full-time immersive bootcamps, evening part-time professional development courses, online training, and corporate programs.
  3. What is PCA? Why do we need dimensionality reduction? Intuition behind Principal Components Analysis. Coding example.
  4. What is PCA? - A shift in perspective - A reduction in the number of dimensions
  5. Curse of Dimensionality. Two dimensions: more space, but still not so much; being close is not improbable. (Scatter plot: Height vs. Cigarettes per day.)
  6. Curse of Dimensionality. Three dimensions: a much larger space; being close is less probable. (Scatter plot: Height vs. Cigarettes per day vs. Exercise.)
  7. Curse of Dimensionality. Four dimensions: omg, so much space; being close is quite improbable. (Axes: Age, Height, Cigarettes per day, Exercise.)
  8. Curse of Dimensionality. A thousand dimensions: I specified you with such high resolution, with so much detail, that you don't look like anybody else anymore. You're unique.
  9. Curse of Dimensionality. Classification, clustering, and other analysis methods become exponentially more difficult with increasing dimensions. (Scatter plot: Height vs. Cigarettes per day.)
  10. Curse of Dimensionality. Classification, clustering, and other analysis methods become exponentially more difficult with increasing dimensions. To understand how to divide that huge space, we need a whole lot more data (usually much more than we do or can have). (Scatter plot: Height vs. Cigarettes per day.)
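
A quick numerical illustration of this point (not from the talk; the data is random and scipy is assumed to be available): as the number of dimensions grows, randomly scattered points drift apart, and even the nearest pair stops being especially close.

    import numpy as np
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(0)
    for d in (2, 3, 4, 1000):
        X = rng.uniform(size=(500, d))   # 500 random points in the unit hypercube
        dists = pdist(X)                 # all unique pairwise Euclidean distances
        print(f"d={d:4d}  nearest pair={dists.min():.2f}  mean={dists.mean():.2f}")
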
  11. Dimensionality Reduction. Lots of features and lots of data is best, but what if you don't have the luxury of ginormous amounts of data? Not all features provide the same amount of information, so we can reduce the dimensions (compress the data) without necessarily losing too much information. (Scatter plot: Height vs. Cigarettes per day.)
  12. Feature Extraction. Do I have to choose the dimensions among existing features? (Scatter plot: Height vs. Cigarettes per day.)
  13. Feature Extraction. Do I have to choose the dimensions among existing features? (Same scatter plot: Height vs. Cigarettes per day.)
  14. Why do we need dimensionality reduction? - To better perform analyses - …without sacrificing the information we get from our features - To better visualize our data
  15. Feature selection vs. feature extraction:
     2D: Healthy_or_not = logit(β1·Height + β2·Cigarettes per day)
     Feature selection (1D): Healthy_or_not = logit(β1·Height)
     Feature extraction (1D): Healthy_or_not = logit(β1·(0.4·Height + 0.6·Cigarettes per day))
     Advantage of extraction: you retain more information. Disadvantage: you lose interpretability.
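
A minimal sketch of that contrast in Python. The synthetic data and the toy label are assumptions for illustration; only the 0.4/0.6 weights come from the slide.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    height = rng.normal(170, 10, size=200)               # cm, synthetic
    cigarettes = rng.poisson(5, size=200).astype(float)  # per day, synthetic
    healthy = (cigarettes < 5).astype(int)               # toy label, assumed

    # Feature selection (1D): keep one of the existing features.
    X_sel = height.reshape(-1, 1)
    # Feature extraction (1D): one new feature blending both originals,
    # using the 0.4/0.6 weights from the slide.
    X_ext = (0.4 * height + 0.6 * cigarettes).reshape(-1, 1)

    for name, X in [("selection", X_sel), ("extraction", X_ext)]:
        acc = LogisticRegression(max_iter=1000).fit(X, healthy).score(X, healthy)
        print(f"feature {name}: training accuracy = {acc:.2f}")
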
  16. 3D → 2D Feature Extraction (PCA). Project Height, Cigarettes, and Exercise onto the optimum plane, whose two axes are the linear combinations:
     A1·Height + B1·Cigarettes + C1·Exercise
     A2·Height + B2·Cigarettes + C2·Exercise
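
In scikit-learn the same 3D → 2D projection looks like this (a sketch with made-up data; the A/B/C weights the slide names come out as pca.components_):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(2)
    n = 300
    height = rng.normal(170, 10, n)
    cigarettes = rng.poisson(4, n) + 0.05 * (180 - height)  # mildly correlated
    exercise = rng.normal(3, 1, n) - 0.2 * cigarettes       # mildly correlated
    X = np.column_stack([height, cigarettes, exercise])

    pca = PCA(n_components=2).fit(X)
    # Each row of components_ holds the (A_i, B_i, C_i) weights defining one
    # new axis as a linear combination of Height, Cigarettes, Exercise.
    print(pca.components_)
    X_2d = pca.transform(X)   # the data re-expressed on the optimum plane
    print(X_2d.shape)         # (300, 2)
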
  17. PCA Math: Singular Value Decomposition. The eigenvectors and eigenvalues of a covariance (or correlation) matrix represent the "core" of a PCA: the eigenvectors (principal components) determine the directions of the new feature space, and the eigenvalues determine their magnitude. In other words, the eigenvalues explain the variance of the data along the new feature axes.
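
A bare-numpy sketch of that math, plus the SVD connection the slide's title points at (data is synthetic; assumptions are noted in the comments):

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # synthetic, correlated

    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]        # sort by variance explained, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    scores = Xc @ eigvecs                    # the data along the new feature axes

    # SVD of the centered data gives the same decomposition:
    # right singular vectors = eigenvectors, and s**2 / (n - 1) = eigenvalues.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    print(np.allclose(s**2 / (len(Xc) - 1), eigvals))   # True
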
  18. Matrix Selection: correlation or covariance matrix? Use the correlation matrix to calculate the principal components if the variables are measured on different scales and you want to standardize them, or if the variances differ widely between variables. In all other situations you can use either the covariance or the correlation matrix.
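
One way to see the equivalence behind that rule (a sketch on synthetic data): the correlation matrix is exactly the covariance matrix of the standardized features.

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(200, 2))
    X[:, 1] *= 100                             # wildly different variances

    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized features
    print(np.allclose(np.corrcoef(X, rowvar=False),
                      np.cov(Z, rowvar=False, ddof=0)))   # True
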
  19. How do I know how many dimensions to reduce by?
     Kaiser method: retain any components with eigenvalues greater than 1.
     Scree test: a bar plot that shows the variance explained by each component; ideally you will see a clear drop-off (elbow).
     Percent variance explained: calculate the cumulative variance explained by the components and stop once you reach a chosen threshold (e.g., 90-95%).
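
All three stopping rules fall out of scikit-learn's explained-variance output; a sketch on a stand-in dataset (iris, chosen here only for convenience):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    X = StandardScaler().fit_transform(load_iris().data)
    pca = PCA().fit(X)

    # Kaiser method: keep components whose eigenvalue exceeds 1
    # (meaningful here because the data is standardized).
    print("Kaiser keeps:", (pca.explained_variance_ > 1).sum())

    # Scree test: inspect the per-component variance for an elbow
    # (in practice, bar-plot explained_variance_ratio_).
    print("variance per component:", pca.explained_variance_ratio_.round(3))

    # Percent variance explained: stop at a cumulative threshold, e.g. 95%.
    cum = np.cumsum(pca.explained_variance_ratio_)
    print("components for 95%:", np.searchsorted(cum, 0.95) + 1)
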
  20. What is the intuition behind PCA? - We are attempting to resolve the curse of dimensionality - by shifting our perspective - and keeping the eigenvectors that explain the highest amount of variance. - We select those components based on our end goal, or by particular methods (Kaiser, scree test, % variance explained).
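
The transcript doesn't capture the live-coding segment promised in the agenda; as a hedged stand-in, here is the end-to-end PCA workflow on a placeholder dataset (digits, chosen arbitrarily), including a reconstruction check of how much information the compression keeps:

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    X = StandardScaler().fit_transform(load_digits().data)   # 64 features

    pca = PCA(n_components=0.90)   # keep enough components for 90% of the variance
    X_reduced = pca.fit_transform(X)
    print("kept", pca.n_components_, "of", X.shape[1], "dimensions")

    # Map back to the original space to see what the compression threw away.
    X_restored = pca.inverse_transform(X_reduced)
    print(f"mean squared reconstruction error: {np.mean((X - X_restored) ** 2):.3f}")
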