Slide 1

Slide 1 text

UMAP Uniform Manifold Approximation and Projection for dimension reduction

Slide 2

Slide 2 text

Who am I? I am a research mathematician at the Tutte Institute for Mathematics and Computing. My Ph.D. was in Profinite Lie Rings (no, you don’t care). I now work on applying topological techniques to unsupervised learning problems.

Slide 3

Slide 3 text

What is Dimension Reduction?

Slide 4

Slide 4 text

Find the “latent” features in your data

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Matrix Factorization / Neighbour Graphs

Slide 8

Slide 8 text

Matrix Factorization: Principal Component Analysis, Non-negative Matrix Factorization, Latent Dirichlet Allocation, Word2Vec, GloVe, Generalised Low Rank Models, Linear Autoencoder

Slide 9

Slide 9 text

Neighbour Graphs: Locally Linear Embedding, Laplacian Eigenmaps, Hessian Eigenmaps, Local Tangent Space Alignment, t-SNE, UMAP, Isomap, JSE

Slide 10

Slide 10 text

PCA is the prototypical matrix factorization
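
To make "matrix factorization" concrete, here is a minimal pure-Python sketch of the simplest case: a rank-1 PCA, where each centred data row is approximated by a score times a single direction vector. It uses power iteration, which is one of several ways to find the leading component; the function name and the toy data are illustrative assumptions, not part of the talk.

```python
import math
import random


def first_principal_component(rows, iterations=100, seed=0):
    """Return (direction, scores) for the leading principal component."""
    n, d = len(rows), len(rows[0])
    # Centre each column so the component describes variance, not the mean.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]

    rng = random.Random(seed)
    direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        # One power-iteration step: direction <- X^T (X direction), normalised.
        scores = [sum(c[j] * direction[j] for j in range(d)) for c in centred]
        updated = [sum(centred[i][j] * scores[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in updated))
        direction = [x / norm for x in updated]

    scores = [sum(c[j] * direction[j] for j in range(d)) for c in centred]
    return direction, scores


# Toy data lying exactly on the line y = 2x.
data = [(t, 2.0 * t) for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
direction, scores = first_principal_component(data)
# Rank-1 factorization: each centred row is approximately score * direction.
approx = [[s * x for x in direction] for s in scores]
```

For data with non-zero variance the normalisation is safe; constant data would need a guard against a zero norm.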

Slide 11

Slide 11 text

PCA on MNIST digits

Slide 12

Slide 12 text

PCA on Fashion MNIST

Slide 13

Slide 13 text

t-SNE is the current state of the art for neighbour graphs

Slide 14

Slide 14 text

t-SNE on MNIST digits

Slide 15

Slide 15 text

t-SNE on Fashion MNIST

Slide 16

Slide 16 text

Uniform Manifold Approximation and Projection

Slide 17

Slide 17 text

UMAP builds mathematical theory to justify the graph-based approach

Slide 18

Slide 18 text

First, a little bit of topological data analysis…

Slide 19

Slide 19 text

Simplices

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

Theorem 1 (Nerve theorem). Let $\mathcal{U} = \{U_i\}_{i \in I}$ be a cover of a topological space $X$. If, for all $\sigma \subset I$, $\bigcap_{i \in \sigma} U_i$ is either contractible or empty, then $\mathcal{N}(\mathcal{U})$ is homotopically equivalent to $X$.
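
As a rough illustration of turning a cover by balls into simplices, the sketch below builds a Vietoris-Rips complex: a simplex is added whenever all of its vertices have pairwise overlapping balls. This is a common simplification of the nerve (which strictly requires a common intersection of all the balls), so treat it as an assumption made for brevity; the function name is hypothetical.

```python
import math
from itertools import combinations


def rips_complex(points, radius, max_dim=2):
    """List simplices (as index tuples) up to dimension max_dim."""
    n = len(points)

    def balls_overlap(i, j):
        # Two balls of the same radius overlap when centres are within 2 * radius.
        return math.dist(points[i], points[j]) <= 2.0 * radius

    simplices = [(i,) for i in range(n)]  # 0-simplices: the points themselves
    for size in range(2, max_dim + 2):
        for combo in combinations(range(n), size):
            if all(balls_overlap(i, j) for i, j in combinations(combo, 2)):
                simplices.append(combo)
    return simplices


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
complex_ = rips_complex(points, radius=0.75)
```

Here the first three points form a filled triangle, while the distant fourth point stays an isolated vertex.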

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

If the data is uniformly distributed on the manifold then the cover will be “good”

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

When is data that nicely behaved?

Slide 28

Slide 28 text

Assumption: Data is uniformly distributed on the manifold

Slide 29

Slide 29 text

Define a Riemannian metric on the manifold to make this assumption true
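
One way to picture a locally varying metric is to rescale distances around each point by the distance to its k-th nearest neighbour, so that every point sees its k neighbours inside a ball of the same size. The sketch below is an illustrative assumption about how such a rescaling could look, not the talk's exact construction; note that the result is not symmetric, because each point uses its own scale.

```python
import math


def local_distances(points, k):
    """Distances from each point, measured in units of its k-th neighbour distance."""
    n = len(points)
    result = []
    for i in range(n):
        dists = [math.dist(points[i], points[j]) for j in range(n)]
        # sorted(dists)[0] is the point itself, so index k is the k-th neighbour.
        scale = sorted(dists)[k]
        result.append([d / scale for d in dists])
    return result


points = [(0.0,), (1.0,), (3.0,)]
local = local_distances(points, k=1)
# local[1][2] and local[2][1] differ: each point measures with its own ruler.
```

Duplicate points would give a zero scale, which a fuller version would need to handle.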

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

Why choose a fixed radius? Why not have a fuzzy cover?

Slide 33

Slide 33 text

Theorem 2 (UMAP Adjunction). The functors $\mathrm{FinReal} : \mathbf{sFuzz} \to \mathbf{FinEPMet}$ and $\mathrm{FinSing} : \mathbf{FinEPMet} \to \mathbf{sFuzz}$ form an adjunction $\mathrm{FinReal} \dashv \mathrm{FinSing}$.

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

Assumption: The manifold is locally connected
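
A fuzzy cover with local connectivity can be sketched as follows: each point's nearest neighbour gets membership 1, and membership decays exponentially with the distance beyond that nearest neighbour. The decay scale is found by bisection so that the memberships sum to log2(k), which is my understanding of the published UMAP approach; the function and variable names are hypothetical, and k is assumed to be at least 2.

```python
import math


def membership_strengths(knn_dists, n_iter=64, tol=1e-5):
    """Return (rho, sigma, weights) for one point's sorted neighbour distances."""
    k = len(knn_dists)
    positive = [d for d in knn_dists if d > 0.0]
    rho = min(positive) if positive else 0.0  # distance to the nearest neighbour
    target = math.log2(k)

    def weights_for(sigma):
        return [math.exp(-max(0.0, d - rho) / sigma) for d in knn_dists]

    lo, hi, sigma = 0.0, math.inf, 1.0
    for _ in range(n_iter):
        total = sum(weights_for(sigma))
        if abs(total - target) < tol:
            break
        if total > target:  # memberships too large: shrink sigma
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:  # memberships too small: grow sigma
            lo = sigma
            sigma = sigma * 2.0 if hi == math.inf else (lo + hi) / 2.0
    return rho, sigma, weights_for(sigma)


rho, sigma, weights = membership_strengths([0.5, 1.0, 1.5, 2.0])
```

Because the nearest neighbour always has weight 1, no point is left disconnected from the graph.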

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

But our local metrics are all incompatible!

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

Theorem 2 (UMAP Adjunction). The functors $\mathrm{FinReal} : \mathbf{sFuzz} \to \mathbf{FinEPMet}$ and $\mathrm{FinSing} : \mathbf{FinEPMet} \to \mathbf{sFuzz}$ form an adjunction $\mathrm{FinReal} \dashv \mathrm{FinSing}$.

Slide 44

Slide 44 text

Under a probabilistic fuzzy union the combination of weights on edges is given by $f(\alpha, \beta) = \alpha + \beta - \alpha \cdot \beta$
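
The probabilistic fuzzy union can be applied edge by edge to merge the two directed weights between a pair of points into one undirected weight. The sketch below assumes the directed weights are stored in a dictionary keyed by (source, target), which is a representation chosen here for illustration.

```python
def fuzzy_union(directed):
    """Combine directed edge weights a (i->j) and b (j->i) as a + b - a*b."""
    combined = {}
    for (i, j), a in directed.items():
        key = (min(i, j), max(i, j))
        if key in combined:
            continue  # already handled when the reverse edge was visited
        b = directed.get((j, i), 0.0)  # a missing reverse edge has weight 0
        combined[key] = a + b - a * b
    return combined


directed = {(0, 1): 0.5, (1, 0): 0.2, (0, 2): 1.0}
undirected = fuzzy_union(directed)
```

Reading the weights as probabilities that an edge exists, the result is the probability that at least one of the two directed edges exists.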

Slide 45

Slide 45 text

No content

Slide 46

Slide 46 text

Suppose we were given a low dimensional representation

Slide 47

Slide 47 text

We can apply the same process to get a fuzzy graph!

Slide 48

Slide 48 text

Except we know the manifold, and don’t know the “correct” nearest neighbour distance

Slide 49

Slide 49 text

Now measure the distance between the graphs using cross-entropy and optimize

Slide 50

Slide 50 text

$\sum_{a \in A} \mu(a) \log\left(\frac{\mu(a)}{\nu(a)}\right) + (1 - \mu(a)) \log\left(\frac{1 - \mu(a)}{1 - \nu(a)}\right)$

Slide 51

Slide 51 text

We are just embedding the graph

Slide 52

Slide 52 text

$\sum_{a \in A} \mu(a) \log\left(\frac{\mu(a)}{\nu(a)}\right) + (1 - \mu(a)) \log\left(\frac{1 - \mu(a)}{1 - \nu(a)}\right)$
First term: get the clumps right.
Second term: get the gaps right.
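
The two parts of the cross-entropy can be computed separately to see how each contributes. In this sketch `mu` holds the high-dimensional edge weights and `nu` the low-dimensional ones, both as dictionaries keyed by edge; the clamping constant `eps` and the function name are assumptions added to keep the logarithms finite.

```python
import math


def fuzzy_cross_entropy(mu, nu, eps=1e-12):
    """Return (clumps_term, gaps_term) of the fuzzy set cross-entropy."""
    clumps, gaps = 0.0, 0.0
    for edge, m in mu.items():
        # Clamp nu away from 0 and 1 so both logarithms stay finite.
        v = min(max(nu.get(edge, 0.0), eps), 1.0 - eps)
        if m > 0.0:
            clumps += m * math.log(m / v)  # penalises missing attraction
        if m < 1.0:
            gaps += (1.0 - m) * math.log((1.0 - m) / (1.0 - v))  # penalises missing repulsion
    return clumps, gaps


mu = {("a", "b"): 0.9, ("a", "c"): 0.1}
same = fuzzy_cross_entropy(mu, dict(mu))
swapped = fuzzy_cross_entropy(mu, {("a", "b"): 0.1, ("a", "c"): 0.9})
```

When the two graphs agree the total is zero; any disagreement makes the total positive, even though an individual term can be negative for a single edge.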

Slide 53

Slide 53 text

No content

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

On real data?

Slide 57

Slide 57 text

UMAP on MNIST digits

Slide 58

Slide 58 text

UMAP on Fashion MNIST

Slide 59

Slide 59 text

Implementation

Slide 60

Slide 60 text

Need to find (approximate) nearest neighbours very efficiently, even in high dimensional space

Slide 61

Slide 61 text

RP-trees (Dasgupta + Freund 2008) + NN-descent (Dong, Charikar + Li 2011)
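
A simplified random projection tree can be sketched in a few lines: pick two random points, split the data by the hyperplane halfway between them, and recurse until leaves are small. Points sharing a leaf become candidate neighbours that a refinement step such as NN-descent could then improve. The split rule and all names below are illustrative assumptions and may differ from the cited papers and from any real library.

```python
import random


def rp_tree_leaves(points, indices=None, leaf_size=3, rng=None):
    """Recursively partition point indices with random hyperplane splits."""
    rng = rng or random.Random(0)
    if indices is None:
        indices = list(range(len(points)))
    if len(indices) <= leaf_size:
        return [indices]

    # Hyperplane: perpendicular bisector of two randomly chosen points.
    a, b = rng.sample(indices, 2)
    direction = [pb - pa for pa, pb in zip(points[a], points[b])]
    offset = sum(d * (pa + pb) / 2.0 for d, pa, pb in zip(direction, points[a], points[b]))

    left, right = [], []
    for i in indices:
        side = sum(d * x for d, x in zip(direction, points[i])) - offset
        (left if side < 0.0 else right).append(i)
    if not left or not right:  # degenerate split, e.g. duplicate points
        half = len(indices) // 2
        left, right = indices[:half], indices[half:]

    return (rp_tree_leaves(points, left, leaf_size, rng)
            + rp_tree_leaves(points, right, leaf_size, rng))


gen = random.Random(1)
points = [(gen.random(), gen.random()) for _ in range(20)]
leaves = rp_tree_leaves(points, leaf_size=3)
```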

Slide 62

Slide 62 text

Need to optimize the layout subquadratically

Slide 63

Slide 63 text

SGD + negative sampling (Mikolov et al. 2013; Tang et al. 2016)
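
The following is a minimal sketch of layout optimisation by stochastic gradient descent with negative sampling. It assumes a low-dimensional similarity of 1 / (1 + d^2) between points, which is a simplifying choice made here; the learning-rate schedule, gradient clipping, and all names are likewise illustrative rather than a description of the actual implementation.

```python
import math
import random


def optimise_layout(edges, n_points, n_epochs=200, n_negative=3, lr=1.0, seed=0):
    """Embed points in 2-D: pull along edges, push from random samples."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n_points)]

    def clip(x):
        return max(-4.0, min(4.0, x))

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    for epoch in range(n_epochs):
        alpha = lr * (1.0 - epoch / n_epochs)  # linearly decaying step size
        for (i, j), weight in edges.items():
            if rng.random() > weight:
                continue  # sample each edge in proportion to its weight
            # Attraction: descend the gradient of -log(1 / (1 + d^2)).
            coeff = -2.0 / (1.0 + sq_dist(emb[i], emb[j]))
            for dim in range(2):
                step = clip(coeff * (emb[i][dim] - emb[j][dim]))
                emb[i][dim] += alpha * step
                emb[j][dim] -= alpha * step
            # Repulsion: random points stand in for all non-neighbours.
            for _ in range(n_negative):
                k = rng.randrange(n_points)
                if k == i:
                    continue
                d2 = sq_dist(emb[i], emb[k])
                coeff = 2.0 / ((0.001 + d2) * (1.0 + d2))
                for dim in range(2):
                    emb[i][dim] += alpha * clip(coeff * (emb[i][dim] - emb[k][dim]))
    return emb


# Two triangles with no edges between them.
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0, (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0}
embedding = optimise_layout(edges, n_points=6)
```

Each epoch costs time proportional to the number of edges times the number of negative samples, rather than the number of point pairs, which is what keeps the optimisation subquadratic.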

Slide 64

Slide 64 text

Need to be high level but still fast

Slide 65

Slide 65 text

+

Slide 66

Slide 66 text

Numba is awesome!

Slide 67

Slide 67 text

• High performance
• Clean code
• Custom distance metrics

Slide 68

Slide 68 text

Performance Comparison

Dataset          t-SNE         UMAP          UMAP speed-up over t-SNE
COIL20           20 seconds    7 seconds     3x
MNIST            22 minutes    98 seconds    13x
Fashion MNIST    15 minutes    78 seconds    11x
GoogleNews       4.5 hours     14 minutes    19x
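
The speed-up figures can be roughly recomputed from the reported timings by converting everything to seconds. The recomputed ratios come out near 2.9, 13.5, 11.5 and 19.3, so they match the reported factors only approximately, presumably because the displayed timings are rounded (that explanation is an assumption).

```python
# Reported (t-SNE, UMAP) timings converted to seconds.
reported_seconds = {
    "COIL20": (20, 7),
    "MNIST": (22 * 60, 98),
    "Fashion MNIST": (15 * 60, 78),
    "GoogleNews": (4.5 * 3600, 14 * 60),
}

speed_up = {
    name: tsne / umap_time
    for name, (tsne, umap_time) in reported_seconds.items()
}

for name, ratio in speed_up.items():
    print(f"{name}: {ratio:.1f}x")
```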

Slide 69

Slide 69 text

Where Next?

Slide 70

Slide 70 text

Given the mathematical foundation, a number of options are available

Slide 71

Slide 71 text

Embed new unseen points into an existing embedding
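
One plausible starting point for placing a new point is a weighted average of the embedded positions of its nearest neighbours in the original data, which could then be refined by the same optimisation used for the original layout. The sketch below uses inverse-distance weights, which is an assumption chosen for simplicity; the function name is hypothetical.

```python
import math


def initial_position(new_point, train_points, train_embedding, k=2, eps=1e-9):
    """Average the embeddings of the k nearest training points."""
    nearest = sorted(
        (math.dist(new_point, p), idx) for idx, p in enumerate(train_points)
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]  # closer neighbours count more
    total = sum(weights)
    dims = len(train_embedding[0])
    return [
        sum(w * train_embedding[idx][c] for w, (_, idx) in zip(weights, nearest)) / total
        for c in range(dims)
    ]


train_points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
train_embedding = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
position = initial_position((1.0, 0.0), train_points, train_embedding)
```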

Slide 72

Slide 72 text

No content

Slide 73

Slide 73 text

Make use of labels for supervised dimension reduction
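
Labels can be folded into the fuzzy graph by treating them as a second, categorical metric and down-weighting edges whose endpoints disagree. The sketch below scales such edges by exp(-far_dist) and edges touching unlabelled points by exp(-unknown_dist); the parameter names and default values are assumptions chosen for illustration.

```python
import math


def apply_labels(weights, labels, far_dist=5.0, unknown_dist=1.0):
    """Down-weight edges between differently labelled or unlabelled points."""
    adjusted = {}
    for (i, j), w in weights.items():
        if labels[i] is None or labels[j] is None:
            w *= math.exp(-unknown_dist)  # label missing: mild penalty
        elif labels[i] != labels[j]:
            w *= math.exp(-far_dist)  # labels disagree: strong penalty
        adjusted[(i, j)] = w
    return adjusted


weights = {(0, 1): 0.8, (1, 2): 0.8, (2, 3): 0.8}
labels = ["cat", "cat", "dog", None]
supervised = apply_labels(weights, labels)
```

Edges inside a class keep their weight, so the layout optimisation pulls same-class points together more strongly than mixed-class ones.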

Slide 74

Slide 74 text

No content

Slide 75

Slide 75 text

And combine those for metric learning

Slide 76

Slide 76 text

No content

Slide 77

Slide 77 text

Derived from Adam Bielski’s Siamese/Triplet repository: https://github.com/adambielski/siamese-triplet

Slide 78

Slide 78 text

Adding one categorical variable is no harder theoretically than adding many

Slide 79

Slide 79 text

Combine spaces with different metrics: continuous, categorical, ordinal, Haversine, Levenshtein, and more …

Slide 80

Slide 80 text

As long as you can provide a metric for the datatype, UMAP can combine it with other datatypes!
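
As an example of supplying a metric for a particular datatype, here is a plain-Python Haversine distance for (latitude, longitude) pairs given in radians. It returns the great-circle angle on a unit sphere; multiplying by a planet's radius gives a physical distance. The function signature is an illustrative assumption, not a required interface.

```python
import math


def haversine(p, q):
    """Great-circle angle in radians between two (lat, lon) points in radians."""
    lat1, lon1 = p
    lat2, lon2 = q
    a = (
        math.sin((lat2 - lat1) / 2.0) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2.0) ** 2
    )
    # Guard against rounding pushing a slightly above 1.
    return 2.0 * math.asin(math.sqrt(min(1.0, a)))


origin = (0.0, 0.0)
antipode = (0.0, math.pi)
half_turn = haversine(origin, antipode)
```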

Slide 81

Slide 81 text

UMAP for pandas dataframes!

Slide 82

Slide 82 text

https://github.com/lmcinnes/umap
conda install -c conda-forge umap-learn
pip install umap-learn
[email protected]
@leland_mcinnes