Slide 1

Slide 1 text

Modern Techniques for Dimensional Reduction
Machine Learning Tech Sessions, 7th December 2017
Ilya Feige, Head of ML Research

Slide 2

Slide 2 text

Ilya Feige, Head of Research

Slide 3

Slide 3 text

Artificial Intelligence for everyone

Slide 4

Slide 4 text

About ASI Data Science: Technology, People, Training, Expertise, Consulting

Slide 5

Slide 5 text

Outline
1. Dimensional reduction motivation
2. PCA reminder
3. t-SNE is amazing!

Slide 6

Slide 6 text

Dimensional Reduction
The process of reducing the number of variables (features) under consideration in a statistical / machine learning analysis.

Slide 7

Slide 7 text

Why is it needed?
• Nowadays data is very high-dimensional
• Many features are highly correlated
• Manual feature selection is often impossible
• Visualising data requires 2-D representations
• Most models break down in high dimensions!

Curse of dimensionality: if the data spans ~1/10 of each axis, it occupies ~1/10 of the space in 1-D, ~(1/10)^2 in 2-D, and ~(1/10)^n in n-D.
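A rough numerical illustration of the scaling above (a minimal Python sketch; the 1/10-per-axis coverage is just the toy figure from the slide):

# Fraction of the space occupied if the data spans ~1/10 of each axis:
# the coverage shrinks exponentially with the dimension n.
for n in (1, 2, 10, 100):
    print(f"{n}-D: data occupies ~{0.1 ** n:.0e} of the space")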

Slide 8

Slide 8 text

PCA Reminder

Slide 9

Slide 9 text

Principal Component Analysis (PCA)
Idea: find a basis that better expresses the data
How: eigenvalue decomposition of the covariance matrix (or SVD of the data)
What for: directions of low variance can then be thrown away

Dimensional reduction with PCA:
• Rotate to uncorrelated coordinates
• Project onto the largest-variance axes
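A minimal numpy sketch of this procedure (toy Gaussian data; in practice one would reach for a library implementation such as scikit-learn's PCA):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))          # toy data: 500 samples, 5 features
Xc = X - X.mean(axis=0)                # centre each feature

# SVD of the centred data: the rows of Vt are the principal axes,
# ordered by decreasing variance (the singular values S are sorted).
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                                  # keep the two largest-variance axes
X_reduced = Xc @ Vt[:k].T              # rotate and project in one step
print(S**2 / np.sum(S**2))             # fraction of variance per component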

Slide 10

Slide 10 text

MNIST is a dataset of images of handwritten digits.

Slide 11

Slide 11 text

PCA on MNIST does not separate the digits.
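This is easy to reproduce; here is a short scikit-learn sketch (an assumption: it uses the library's small 8x8 digits dataset as a stand-in for full MNIST):

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()                             # 8x8 handwritten digits
X_2d = PCA(n_components=2).fit_transform(digits.data)

# Colour by digit label: the classes overlap heavily in the PCA plane.
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=digits.target, cmap="tab10", s=5)
plt.colorbar(label="digit")
plt.title("PCA projection of handwritten digits")
plt.show()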

Slide 12

Slide 12 text

t-SNE is amazing!

Slide 13

Slide 13 text

Definition of t-SNE
“t-distributed stochastic neighbour embedding”

Step 1: Construct a distribution in the high-dimensional space based on pairwise distances:

p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)}, \qquad p_{ij} = \frac{1}{2N}\left(p_{i|j} + p_{j|i}\right)

Step 2: Construct a similar distribution (but with wider tails) in the low-dimensional space:

q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}{\sum_{k \neq \ell} \left(1 + \|y_k - y_\ell\|^2\right)^{-1}}

Step 3: Make the two distributions as similar as possible by minimising their KL divergence:

\{y_i^*\} = \operatorname{argmin}_{\{y_i\}} \mathrm{KL}(P \,\|\, Q) = \operatorname{argmin}_{\{y_i\}} \sum_{j \neq k} p_{jk} \log \frac{p_{jk}}{q_{jk}}
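The two distributions map to code fairly directly. A minimal numpy sketch (assumptions: the bandwidths sigma_i are passed in directly, whereas in practice each one is set by a binary search to match a target perplexity; the gradient-descent minimisation of the KL divergence in step 3 is omitted):

import numpy as np

def conditional_p(X, sigma):
    # Step 1: Gaussian affinities p_{j|i} in the high-dimensional space.
    D = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)  # squared distances
    P = np.exp(-D / (2 * sigma[:, None] ** 2))
    np.fill_diagonal(P, 0.0)                             # exclude k = i
    return P / P.sum(axis=1, keepdims=True)

def joint_p(X, sigma):
    # Symmetrised affinities p_ij = (p_{i|j} + p_{j|i}) / 2N.
    P = conditional_p(X, sigma)
    return (P + P.T) / (2 * len(X))

def joint_q(Y):
    # Step 2: Student-t affinities q_ij in the low-dimensional space.
    D = np.sum((Y[:, None] - Y[None, :]) ** 2, axis=-1)
    Q = 1.0 / (1.0 + D)
    np.fill_diagonal(Q, 0.0)                             # exclude k = l
    return Q / Q.sum()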

Slide 14

Slide 14 text

t-SNE is able to separate MNIST digits incredibly well.
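A picture like this can be generated in a few lines with scikit-learn's TSNE (again using the small 8x8 digits dataset as a stand-in for full MNIST; perplexity=30 is a common default, not a tuned choice):

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()
Y = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(digits.data)

plt.scatter(Y[:, 0], Y[:, 1], c=digits.target, cmap="tab10", s=5)
plt.title("t-SNE embedding of handwritten digits")
plt.show()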

Slide 15

Slide 15 text

Q & A

Slide 16

Slide 16 text

PCA on Genotypes: PCA effectively separates the data in low dimensions.

Slide 17

Slide 17 text

PCA vs t-SNE on Twitter data
