Leland McInnes
October 18, 2018

# A Guide to Dimension Reduction

Talk given at PyData NYC 2018 on Dimension Reduction: a quick tour of a broad swathe of the field with a focus on core ideas and intuitions rather than technical details.

## Transcript

2. ### Bluffer’s Guides are lighthearted and humorous surveys providing a condensed overview of a potentially complicated subject.

7. ### Matrix Factorization: Principal Component Analysis, Non-negative Matrix Factorization, Latent Dirichlet Allocation, Word2Vec, GloVe, Generalised Low Rank Models, Linear Autoencoder, Probabilistic PCA, Sparse PCA
8. ### Neighbour Graphs: Locally Linear Embedding, Laplacian Eigenmaps, Hessian Eigenmaps, Local Tangent Space Alignment, t-SNE, UMAP, Isomap, JSE, Spectral Embedding, LargeVis, NeRV

11. ### $X \approx UV$, where $X$ is an $N \times D$ matrix, $U$ is an $N \times d$ matrix, and $V$ is a $d \times D$ matrix
12. ### Minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \mathrm{Loss}\left(X_{ij}, (UV)_{ij}\right)$ subject to constraints…

16. ### Classic PCA: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \left(X_{ij} - (UV)_{ij}\right)^2$ with no constraints
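As a concrete illustration (my own, not from the talk), this unconstrained squared-loss problem is solved exactly by a truncated SVD; a minimal NumPy sketch, omitting the mean-centering that full PCA would also perform:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))  # X: an N x D data matrix (N=100, D=20)
d = 5                           # target dimension

# The truncated SVD gives the optimal rank-d factorization X ~ UV under
# squared loss (Eckart-Young); full PCA would mean-centre X first.
U_full, s, Vt = np.linalg.svd(X, full_matrices=False)
U = U_full[:, :d] * s[:d]  # N x d
V = Vt[:d]                 # d x D

squared_error = np.sum((X - U @ V) ** 2)
```

The rank-`d` product `U @ V` is the best possible rank-`d` approximation of `X` in this loss.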
17. ### We can make PCA more interpretable by constraining how many archetypes can be combined
18. ### Sparse PCA: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \left(X_{ij} - (UV)_{ij}\right)^2$ subject to $\|U\|_2 = 1$ and $\|U\|_0 \leq k$

20. ### K-Means*: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \left(X_{ij} - (UV)_{ij}\right)^2$ subject to $\|U\|_2 = 1$ and $\|U\|_0 = 1$
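Read per row of $U$, these constraints force each row to be a one-hot indicator, so $UV$ replaces each point by its cluster centroid. A small sketch (my own illustration, with made-up data) of Lloyd's algorithm phrased as alternating updates of a one-hot $U$ and a centroid matrix $V$:

```python
import numpy as np

rng = np.random.default_rng(42)
# Two well-separated blobs in the plane
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(5.0, 0.1, (20, 2))])
k = 2

V = np.vstack([X[0], X[-1]])  # k x D centroids, one seeded in each blob
for _ in range(10):
    # Assignment step: U gets one-hot rows, satisfying the constraints above
    dists = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)  # N x k
    labels = dists.argmin(1)
    U = np.eye(k)[labels]
    # Update step: each row of V becomes the centroid of its cluster
    V = (U.T @ X) / U.sum(0)[:, None]

reconstruction = U @ V  # row i of UV is the centroid of point i's cluster
```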

23. ### NMF: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \left(X_{ij} - (UV)_{ij}\right)^2$ subject to $U_{ij} \geq 0$ and $V_{ij} \geq 0$
24. ### NMF: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} (UV)_{ij} - X_{ij} \log\left((UV)_{ij}\right)$ subject to $U_{ij} \geq 0$ and $V_{ij} \geq 0$
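This KL-style objective is classically minimized with the Lee–Seung multiplicative update rules, which preserve non-negativity automatically. A minimal NumPy sketch (my own illustration, not the talk's code; data and sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(3.0, size=(50, 10)).astype(float)  # non-negative count data
d = 4

U = rng.random((50, d))  # positive random init keeps updates well-defined
V = rng.random((d, 10))

def kl_loss(U, V):
    T = U @ V
    return np.sum(T - X * np.log(T))  # the objective from the slide

before = kl_loss(U, V)
for _ in range(200):
    R = X / (U @ V)                          # ratio matrix X_ij / (UV)_ij
    U *= (R @ V.T) / V.sum(axis=1)           # multiplicative update for U
    R = X / (U @ V)                          # recompute with the updated U
    V *= (U.T @ R) / U.sum(axis=0)[:, None]  # multiplicative update for V
after = kl_loss(U, V)
```

Each update multiplies by a non-negative factor, so the constraints $U_{ij} \geq 0$, $V_{ij} \geq 0$ hold for free, and the loss is non-increasing.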

27. ### Let the loss be the negative log likelihood of observing $X$ given $\Theta$
28. ### How to parameterize $\Pr(\cdot \mid \Theta)$? Use the exponential family of distributions!
29. ### In general, for an exponential family distribution: $-\log\left(P(X_i \mid \Theta_i)\right) \propto G(\Theta_i) - X_i \cdot \Theta_i$
30. ### Normal Matrix Factorization: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \frac{1}{2}\left((UV)_{ij}\right)^2 - X_{ij} \cdot (UV)_{ij}$ with no constraints
31. ### Normal Matrix Factorization: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \frac{1}{2}\left((UV)_{ij}\right)^2 - X_{ij} \cdot (UV)_{ij} + \frac{1}{2}\left(X_{ij}\right)^2$ with no constraints
32. ### Normal Matrix Factorization: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \frac{1}{2}\left(X_{ij} - (UV)_{ij}\right)^2$ with no constraints
33. ### Poisson Matrix Factorization: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} \exp\left((UV)_{ij}\right) - X_{ij} \cdot (UV)_{ij}$ with no constraints
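Unlike the Normal case there is no closed-form solution, but this loss is easy to attack with plain gradient descent, since $\partial \mathrm{Loss} / \partial \Theta_{ij} = \exp(\Theta_{ij}) - X_{ij}$. A small NumPy sketch (my own illustration; the sizes, learning rate, and iteration count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.poisson(2.0, size=(30, 8)).astype(float)  # count data
d = 3

U = rng.normal(0.0, 0.1, (30, d))  # small random init
V = rng.normal(0.0, 0.1, (d, 8))

def poisson_loss(U, V):
    T = U @ V
    return np.sum(np.exp(T) - X * T)  # the objective from the slide

before = poisson_loss(U, V)
lr = 0.01
for _ in range(500):
    G = np.exp(U @ V) - X        # gradient of the loss w.r.t. Theta = UV
    gU, gV = G @ V.T, U.T @ G    # chain rule through the factorization
    U -= lr * gU
    V -= lr * gV
after = poisson_loss(U, V)
```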
34. ### Binomial Matrix Factorization, Bernoulli Matrix Factorization, Gamma Matrix Factorization, Beta Matrix Factorization, Exponential Matrix Factorization, …

37. ### Multinomial Matrix Factorization: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} -X_{ij} \cdot \log\left((UV)_{ij}\right)$ subject to $(UV)\mathbf{1} = \mathbf{1}$ and $(UV)_{ij} \geq 0$

39. ### Let $U_{ik} = P(i|k)$, $V_{kj} = P(k|j)$. Then $\Theta_{ij} = \sum_k U_{ik} \cdot V_{kj} = \sum_k P(i|k) \cdot P(k|j) = P(i|j)$
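This identity is easy to sanity-check numerically: if each column of $U$ and of $V$ is a probability distribution, every column of $UV$ is one too. A quick check (made-up sizes, my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 4, 3, 5

U = rng.random((n, k))
U /= U.sum(axis=0)   # column k of U is P(i|k): sums to 1 over i
V = rng.random((k, m))
V /= V.sum(axis=0)   # column j of V is P(k|j): sums to 1 over k

P = U @ V  # P[i, j] = sum_k P(i|k) P(k|j) = P(i|j)
```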
40. ### Probabilistic Latent Semantic Indexing: minimize $\sum_{i=1}^{N} \sum_{j=1}^{D} -X_{ij} \cdot \log\left((UV)_{ij}\right)$ subject to $U\mathbf{1} = \mathbf{1}$, $V\mathbf{1} = \mathbf{1}$ and $U_{ij} \geq 0$, $V_{ij} \geq 0$

42. ### We can apply a Dirichlet prior over the multinomial distributions for $U$ and $V$
43. ### And that’s LDA* (modulo all the technical details involved in the Bayesian inference used for optimization)
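For completeness, an illustration (my own, with synthetic count data and arbitrary parameters) of fitting such a model with scikit-learn's `LatentDirichletAllocation`:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
# 40 synthetic "documents" over a 30-word vocabulary, as a count matrix
X = rng.poisson(1.0, size=(40, 30))

lda = LatentDirichletAllocation(n_components=5, random_state=0)
doc_topics = lda.fit_transform(X)  # N x d document-topic distributions (rows U)
topic_words = lda.components_      # d x D topic-word weights (rows V, unnormalized)
```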


52. ### Consider the weighted adjacency matrix $A_{ij} = \begin{cases} w(i, j) & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases}$

56. ### Compute the graph Laplacian* $L_{ij} = \begin{cases} -\frac{w(i, j)}{\sqrt{d_i \times d_j}} & \text{if } i \neq j \\ 1 - \frac{w(i, i)}{d_i} & \text{if } i = j \end{cases}$ where $d_i$ is the total weight of row $i$
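Putting the last two slides together: a small NumPy sketch (my own example graph, not from the talk) that builds a weighted adjacency matrix, forms this normalized Laplacian, and uses its low eigenvectors as a spectral embedding:

```python
import numpy as np

# Two triangles joined by one weak edge: a graph with clear cluster structure
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 0.1)]
n = 6
A = np.zeros((n, n))
for i, j, w in edges:
    A[i, j] = A[j, i] = w  # symmetric adjacency for an undirected graph

d = A.sum(axis=1)                            # d_i: total weight of row i
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt  # normalized graph Laplacian

# Spectral embedding: eigenvectors for the smallest non-trivial eigenvalues
vals, vecs = np.linalg.eigh(L)  # ascending eigenvalues; vals[0] is ~0
embedding = vecs[:, 1:3]        # a 2-dimensional embedding of the vertices
```

The second eigenvector (the Fiedler vector) separates the two triangles, which is exactly the cluster structure a neighbour-graph method is trying to preserve.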

66. ### Graph Construction: K-Nearest Neighbours weighted according to fancy math* (I have fun mathematics to explain this which this margin is too small to contain)
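For the graph-construction step, scikit-learn can build the k-nearest-neighbour graph directly; the fancy weighting is method-specific and omitted here, so this sketch (my own, with made-up data) uses plain connectivity weights:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # N=100 points in 10 dimensions

# Sparse adjacency of the 5-nearest-neighbour graph (no self edges)
A = kneighbors_graph(X, n_neighbors=5, mode='connectivity')
A = 0.5 * (A + A.T)  # symmetrize: kNN relations are not symmetric by default
```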

70. ### Framing the problem as a matrix factorization or neighbour graph algorithm captures most of the core intuitions
71. ### This provides a general framework for understanding almost all dimension reduction techniques