Slide 1

Slide 1 text

Modern Techniques for Dimensional Reduction
Machine Learning Tech Sessions, 7th December 2017
Ilya Feige, Head of ML Research

Slide 2

Slide 2 text

Ilya Feige, Head of Research

Slide 3

Slide 3 text

Artificial Intelligence for everyone

Slide 4

Slide 4 text

About ASI Data Science: Technology, People, Training, Expertise, Consulting

Slide 5

Slide 5 text

Outline
1. Dimensional reduction motivation
2. PCA reminder
3. t-SNE is amazing!

Slide 6

Slide 6 text

Dimensional Reduction
The process of reducing the number of variables (features) under consideration in a statistical / machine learning analysis.

Slide 7

Slide 7 text

Why is it needed?
• Nowadays data is very high-dimensional
• Many features are highly correlated
• Manual feature selection is often impossible
• Visualising data requires 2-D representations
• Most models break down in high dimensions!

Curse of dimensionality: if the data spans ~1/10 of each axis, it occupies ~1/10 of the space in 1-D, ~(1/10)^2 in 2-D, and ~(1/10)^n in n-D.
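A rough numerical illustration of the scaling above (a minimal Python sketch; the 1/10-per-axis coverage is just the toy figure from the slide):

# Fraction of the space occupied if the data spans ~1/10 of each axis:
# the coverage shrinks exponentially with the dimension n.
for n in (1, 2, 10, 100):
    print(f"{n}-D: data occupies ~{0.1 ** n:.0e} of the space")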

Slide 8

Slide 8 text

PCA Reminder

Slide 9

Slide 9 text

Principal Component Analysis (PCA)
Idea: find a basis that better expresses the data
How: eigenvalue decomposition of the covariance matrix (or SVD of the data)
What for: directions of low variance can then be thrown away

Dimensional reduction with PCA:
• Rotate to uncorrelated coordinates
• Project onto the largest-variance axes
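A minimal numpy sketch of this procedure (toy Gaussian data; in practice one would reach for a library implementation such as scikit-learn's PCA):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))          # toy data: 500 samples, 5 features
Xc = X - X.mean(axis=0)                # centre each feature

# SVD of the centred data: the rows of Vt are the principal axes,
# ordered by decreasing variance (the singular values S are sorted).
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                                  # keep the two largest-variance axes
X_reduced = Xc @ Vt[:k].T              # rotate and project in one step
print(S**2 / np.sum(S**2))             # fraction of variance per component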

Slide 10

Slide 10 text

MNIST is a dataset of images of handwritten digits.

Slide 11

Slide 11 text

PCA on MNIST does not separate the digits.
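This is easy to reproduce; here is a short scikit-learn sketch (an assumption: it uses the library's small 8x8 digits dataset as a stand-in for full MNIST):

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()                             # 8x8 handwritten digits
X_2d = PCA(n_components=2).fit_transform(digits.data)

# Colour by digit label: the classes overlap heavily in the PCA plane.
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=digits.target, cmap="tab10", s=5)
plt.colorbar(label="digit")
plt.title("PCA projection of handwritten digits")
plt.show()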

Slide 12

Slide 12 text

t-SNE is amazing!

Slide 13

Slide 13 text

Definition of t-SNE
“t-distributed stochastic neighbour embedding”

Step 1: Construct a distribution in the high-dimensional space based on pairwise distances:

p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)}, \qquad p_{ij} = \frac{1}{2N}\left(p_{i|j} + p_{j|i}\right)

Step 2: Construct a similar distribution (but with wider tails) in the low-dimensional space:

q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}{\sum_{k \neq \ell} \left(1 + \|y_k - y_\ell\|^2\right)^{-1}}

Step 3: Make the two distributions as similar as possible by minimising their KL divergence:

\{y_i^*\} = \operatorname{argmin}_{\{y_i\}} \mathrm{KL}(P \,\|\, Q) = \operatorname{argmin}_{\{y_i\}} \sum_{j \neq k} p_{jk} \log \frac{p_{jk}}{q_{jk}}
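The two distributions map to code fairly directly. A minimal numpy sketch (assumptions: the bandwidths sigma_i are passed in directly, whereas in practice each one is set by a binary search to match a target perplexity; the gradient-descent minimisation of the KL divergence in step 3 is omitted):

import numpy as np

def conditional_p(X, sigma):
    # Step 1: Gaussian affinities p_{j|i} in the high-dimensional space.
    D = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)  # squared distances
    P = np.exp(-D / (2 * sigma[:, None] ** 2))
    np.fill_diagonal(P, 0.0)                             # exclude k = i
    return P / P.sum(axis=1, keepdims=True)

def joint_p(X, sigma):
    # Symmetrised affinities p_ij = (p_{i|j} + p_{j|i}) / 2N.
    P = conditional_p(X, sigma)
    return (P + P.T) / (2 * len(X))

def joint_q(Y):
    # Step 2: Student-t affinities q_ij in the low-dimensional space.
    D = np.sum((Y[:, None] - Y[None, :]) ** 2, axis=-1)
    Q = 1.0 / (1.0 + D)
    np.fill_diagonal(Q, 0.0)                             # exclude k = l
    return Q / Q.sum()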

Slide 14

Slide 14 text

t-SNE is able to separate MNIST digits incredibly well.
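A picture like this can be generated in a few lines with scikit-learn's TSNE (again using the small 8x8 digits dataset as a stand-in for full MNIST; perplexity=30 is a common default, not a tuned choice):

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()
Y = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(digits.data)

plt.scatter(Y[:, 0], Y[:, 1], c=digits.target, cmap="tab10", s=5)
plt.title("t-SNE embedding of handwritten digits")
plt.show()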

Slide 15

Slide 15 text

Q & A

Slide 16

Slide 16 text

PCA on Genotypes: PCA effectively separates the data in low dimensions.

Slide 17

Slide 17 text

PCA vs t-SNE on Twitter data
