A Guide to Dimension Reduction

Leland McInnes
October 18, 2018

Talk given at PyData NYC 2018 on Dimension Reduction: a quick tour of a broad swathe of the field with a focus on core ideas and intuitions rather than technical details.

Transcript

  1. Matrix Factorization: Principal Component Analysis, Non-negative Matrix Factorization, Latent Dirichlet Allocation, Word2Vec, GloVe, Generalised Low Rank Models, Linear Autoencoder, Probabilistic PCA, Sparse PCA
  2. Neighbour Graphs: Locally Linear Embedding, Laplacian Eigenmaps, Hessian Eigenmaps, Local Tangent Space Alignment, t-SNE, UMAP, Isomap, JSE, Spectral Embedding, LargeVis, NeRV
  3. X ≈ UV, where X is an N × D matrix, U is an N × d matrix, and V is a d × D matrix.
  4. Minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \mathrm{Loss}\left(X_{ij}, (UV)_{ij}\right)$, subject to constraints…
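
As a rough illustration of this general framing (my own sketch, not part of the deck): a few lines of NumPy can minimize such a pointwise loss by plain gradient descent, with squared error standing in for whatever loss and constraints a particular method chooses.

```python
import numpy as np

# Sketch of the generic "minimize a pointwise loss over X ~ UV" framing,
# using squared error and no constraints, fit by plain gradient descent.
rng = np.random.default_rng(42)
N, D, d = 100, 20, 3
X = rng.normal(size=(N, D))

U = rng.normal(scale=0.1, size=(N, d))   # N x d
V = rng.normal(scale=0.1, size=(d, D))   # d x D

lr = 0.01
for _ in range(500):
    R = U @ V - X          # residual; the squared-error gradient w.r.t. (UV)_ij is proportional to this
    U -= lr * (R @ V.T)    # chain rule: gradient with respect to U
    V -= lr * (U.T @ R)    # chain rule: gradient with respect to V

print("mean squared reconstruction error:", np.mean((X - U @ V) ** 2))
```
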
  5. Classic PCA: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \left(X_{ij} - (UV)_{ij}\right)^2$ with no constraints.
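
Concretely (my example, not the speaker's code), this unconstrained squared-error factorization of centered data is what scikit-learn's PCA returns: the scores give U and the principal directions give V.

```python
import numpy as np
from sklearn.decomposition import PCA

# Classic PCA as the unconstrained squared-error factorization: the rank-d
# factorization of the centered data minimizing the squared reconstruction error.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
Xc = X - X.mean(axis=0)             # PCA works on centered data

pca = PCA(n_components=3).fit(Xc)
U = pca.transform(Xc)               # N x d scores
V = pca.components_                 # d x D principal directions

print("squared reconstruction error:", np.sum((Xc - U @ V) ** 2))
```
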
  6. Sparse PCA: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \left(X_{ij} - (UV)_{ij}\right)^2$ subject to $\|U\|_2 = 1$ and $\|U\|_0 \le k$.
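
For comparison (again my own sketch), scikit-learn's SparsePCA pursues the same idea but with an ℓ1 penalty on the component matrix rather than the hard ℓ0 constraint shown on the slide.

```python
import numpy as np
from sklearn.decomposition import SparsePCA

# Sparse PCA via an l1 penalty: alpha trades reconstruction error against
# the number of zero entries in the component matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))

spca = SparsePCA(n_components=3, alpha=1.0, random_state=0).fit(X)
U = spca.transform(X)               # N x d codes
V = spca.components_                # d x D components, many entries exactly zero

print("fraction of zero entries in the components:", np.mean(V == 0))
```
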
  7. K-Means*: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \left(X_{ij} - (UV)_{ij}\right)^2$ subject to $\|U\|_2 = 1$ and $\|U\|_0 = 1$.
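
One way to see the k-means correspondence concretely (my own sketch): run ordinary k-means, then assemble U from one-hot cluster assignments and V from the centroids, so that UV replaces each point with its cluster centre.

```python
import numpy as np
from sklearn.cluster import KMeans

# k-means read as a constrained factorization: each row of U is a one-hot
# cluster indicator, and the rows of V are the cluster centroids.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
U = np.eye(3)[km.labels_]           # N x d, exactly one non-zero per row
V = km.cluster_centers_             # d x D centroids

print("within-cluster sum of squares:", np.sum((X - U @ V) ** 2))
```
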
  8. NMF: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \left(X_{ij} - (UV)_{ij}\right)^2$ subject to $U_{ij} \ge 0$ and $V_{ij} \ge 0$.
  9. NMF: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} (UV)_{ij} - X_{ij} \log\left((UV)_{ij}\right)$ subject to $U_{ij} \ge 0$ and $V_{ij} \ge 0$.
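
Both NMF objectives are available off the shelf; for instance, scikit-learn's NMF exposes them through its beta_loss parameter ('frobenius' for the squared-error form, 'kullback-leibler' for the log form, which matches the objective above up to terms constant in U and V). A small sketch on synthetic non-negative data:

```python
import numpy as np
from sklearn.decomposition import NMF

# Non-negative matrix factorization under the two losses above; the
# multiplicative-update solver handles both.
rng = np.random.default_rng(0)
X = rng.random((100, 20))           # NMF requires non-negative data

for loss in ("frobenius", "kullback-leibler"):
    model = NMF(n_components=3, beta_loss=loss, solver="mu",
                init="random", max_iter=500, random_state=0)
    U = model.fit_transform(X)      # N x d, all entries >= 0
    V = model.components_           # d x D, all entries >= 0
    print(loss, "->", model.reconstruction_err_)
```
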
  10. In general, for an exponential family distribution, $-\log\left(P(X_i \mid \Theta_i)\right) \propto G(\Theta_i) - X_i \cdot \Theta_i$.
  11. Normal Matrix Factorization: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \tfrac{1}{2}\left((UV)_{ij}\right)^2 - X_{ij} \cdot (UV)_{ij}$ with no constraints.
  12. Normal Matrix Factorization: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \tfrac{1}{2}\left((UV)_{ij}\right)^2 - X_{ij} \cdot (UV)_{ij} + \tfrac{1}{2}\left(X_{ij}\right)^2$ with no constraints (adding the constant $\tfrac{1}{2}(X_{ij})^2$ does not change the minimizer).
  13. Normal Matrix Factorization: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \tfrac{1}{2}\left(X_{ij} - (UV)_{ij}\right)^2$ with no constraints.
  14. Poisson Matrix Factorization: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \exp\left((UV)_{ij}\right) - X_{ij} \cdot (UV)_{ij}$ with no constraints.
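
A hand-rolled sketch of the Poisson case (illustrative only, on synthetic count data): plain gradient descent on the loss above, with $(UV)_{ij}$ playing the role of the log-rate.

```python
import numpy as np

# Poisson matrix factorization: minimize sum_ij exp((UV)_ij) - X_ij * (UV)_ij,
# i.e. the Poisson negative log-likelihood with UV as the log-rate.
rng = np.random.default_rng(0)
N, D, d = 100, 20, 3
X = rng.poisson(lam=2.0, size=(N, D)).astype(float)

U = rng.normal(scale=0.1, size=(N, d))
V = rng.normal(scale=0.1, size=(d, D))

lr = 0.01
for _ in range(2000):
    G = np.exp(U @ V) - X           # gradient of the loss w.r.t. (UV)_ij
    U -= lr * (G @ V.T)
    V -= lr * (U.T @ G)

print("fitted mean rate:", np.exp(U @ V).mean(), "vs. data mean:", X.mean())
```
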
  15. Multinomial Matrix Factorization: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} -X_{ij} \cdot \log\left((UV)_{ij}\right)$ subject to $(UV)\mathbf{1} = \mathbf{1}$ and $(UV)_{ij} \ge 0$.
  16. Let $U_{ik} = P(i \mid k)$ and $V_{kj} = P(k \mid j)$. Then $\Theta_{ij} = \sum_k U_{ik} \cdot V_{kj} = \sum_k P(i \mid k) \cdot P(k \mid j) = P(i \mid j)$.
  17. Probabilistic Latent Semantic Indexing: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} -X_{ij} \cdot \log\left((UV)_{ij}\right)$ subject to $U\mathbf{1} = \mathbf{1}$, $V\mathbf{1} = \mathbf{1}$, and $U_{ij} \ge 0$, $V_{ij} \ge 0$.
  18. And that’s LDA* (modulo all the technical details involved in the Bayesian inference used for optimization).
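
Concretely (a toy example of mine, not from the talk), scikit-learn's LatentDirichletAllocation fits this factorization on a document-word count matrix; the two factors play the roles of the document-topic and topic-word probability tables from the pLSI formulation above.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# LDA on a toy bag-of-words count matrix: one factor holds per-document
# topic proportions, the other (after normalization) per-topic word probabilities.
rng = np.random.default_rng(0)
X = rng.poisson(lam=1.0, size=(200, 50))      # 200 "documents", 50 "words"

lda = LatentDirichletAllocation(n_components=5, random_state=0)
doc_topics = lda.fit_transform(X)             # rows sum to 1: P(topic | doc)
topic_words = lda.components_
topic_words = topic_words / topic_words.sum(axis=1, keepdims=True)  # P(word | topic)

print(doc_topics.shape, topic_words.shape)
```
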
  19. * (no transcribed text)
  20. * (no transcribed text)
  21. Compute the graph Laplacian*: $L_{ij} = -w(i, j) / \sqrt{d_i d_j}$ if $i \ne j$, and $L_{ii} = 1 - w(i, i) / d_i$, where $d_i$ is the total weight of row $i$.
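
A sketch of the shared neighbour-graph pipeline (my example, using a plain connectivity-weighted k-nearest-neighbour graph rather than any particular method's weighting): build the graph, form the normalized Laplacian above, and embed with its smallest non-trivial eigenvectors.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph

# Build a k-NN graph, compute the normalized graph Laplacian, and use its
# smallest non-trivial eigenvectors as a 2D spectral embedding.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

W = kneighbors_graph(X, n_neighbors=10, mode="connectivity")
W = 0.5 * (W + W.T)                 # symmetrize the adjacency matrix
L = laplacian(W, normed=True)       # the normalized Laplacian from the slide

vals, vecs = np.linalg.eigh(L.toarray())
embedding = vecs[:, 1:3]            # skip the trivial constant-direction eigenvector
print(embedding.shape)
```
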
  22. Graph Construction: K-Nearest Neighbours, weighted according to fancy math* (*I have fun mathematics to explain this, which this margin is too small to contain).
  23. Framing the problem as a matrix factorization or neighbour graph algorithm captures most of the core intuitions.