UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

This talk presents a new approach to dimension reduction called UMAP. UMAP is grounded in manifold learning and topology, and aims to preserve the topological structure of the data. The resulting algorithm provides both 2D visualizations of comparable quality to t-SNE and general-purpose dimension reduction. UMAP has been implemented as a scikit-learn compatible Python library that performs efficient dimension reduction and scales to much larger datasets than t-SNE or other comparable algorithms (see http://github.com/lmcinnes/umap).

Leland McInnes

July 12, 2018

Transcript

  1. UMAP Uniform Manifold Approximation and Projection for dimension reduction

  2. Who am I? I am a research mathematician at the Tutte Institute for Mathematics and Computing. My Ph.D. was in Profinite Lie Rings (no, you don’t care). I now work on applying topological techniques to unsupervised learning problems.
  3. What is Dimension Reduction?

  4. Find the “latent” features in your data

  5. None
  6. None
  7. Matrix Factorization Neighbour Graphs

  8. Matrix Factorization: Principal Component Analysis, Non-negative Matrix Factorization, Latent Dirichlet Allocation, Word2Vec, GloVe, Generalised Low Rank Models, Linear Autoencoder
  9. Neighbour Graphs: Locally Linear Embedding, Laplacian Eigenmaps, Hessian Eigenmaps, Local Tangent Space Alignment, t-SNE, UMAP, Isomap, JSE
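The neighbour-graph methods listed on this slide all start from some form of nearest-neighbour graph over the data. As a rough illustration only, here is a brute-force k-nearest-neighbour graph in plain Python. The function name `knn_graph` and the toy points are made up for this sketch; it is not how any of the listed methods actually build their graphs.

```python
import math


def knn_graph(points, k):
    """Return {i: [(j, dist), ...]} holding the k nearest neighbours of each point.

    Brute force: O(n^2) distance computations, fine for a toy example only.
    """
    graph = {}
    for i, p in enumerate(points):
        # Distance from point i to every other point, closest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [(j, d) for d, j in dists[:k]]
    return graph


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
graph = knn_graph(points, k=2)
```

Each method in the list then does something different with this graph; the sketch only covers the shared starting point.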
  10. PCA is the prototypical matrix factorization
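To make "matrix factorization" concrete: PCA expresses the centred data as coordinates along a small number of orthogonal directions. The sketch below estimates only the leading direction, using power iteration on the sample covariance matrix. Power iteration is a choice made for this illustration (it keeps the example dependency-free), and `first_principal_component` is a hypothetical name, not something from the talk.

```python
import math


def first_principal_component(rows, iterations=100):
    """Estimate the leading principal direction by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[c] for r in rows) / n for c in range(d)]
    centred = [[r[c] - means[c] for c in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # arbitrary start; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Points lying exactly on the line y = 2x.
rows = [(x, 2.0 * x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
direction = first_principal_component(rows)
```

For these points the recovered direction is proportional to (1, 2), the direction of the line they lie on.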

  11. PCA on MNIST digits

  12. PCA on Fashion MNIST

  13. t-SNE is the current state of the art for neighbour graphs

  14. t-SNE on MNIST digits

  15. t-SNE on Fashion MNIST

  16. Uniform Manifold Approximation and Projection

  17. UMAP builds mathematical theory to justify the graph based approach

  18. First, a little bit of topological data analysis…

  19. Simplices

  20. None
  21. Theorem 1 (Nerve theorem). Let $\mathcal{U} = \{U_i\}_{i \in I}$ be a cover of a topological space $X$. If, for all $\sigma \subset I$, $\bigcap_{i \in \sigma} U_i$ is either contractible or empty, then $\mathcal{N}(\mathcal{U})$ is homotopically equivalent to $X$.
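To see how the nerve theorem is used with data, cover the points with balls of a fixed radius and record which balls overlap. The sketch below is an assumption-laden illustration: it builds the Vietoris-Rips complex up to triangles, which agrees with the nerve on vertices and edges but adds a triangle whenever its three edges exist, rather than testing for a genuine triple intersection. The name `rips_complex` and the sample points are invented here.

```python
import math
from itertools import combinations


def rips_complex(points, radius):
    """Simplices (up to dimension 2) of the Vietoris-Rips complex.

    Two balls of the given radius overlap exactly when their centres are at
    most 2 * radius apart, so the edges match the 1-simplices of the nerve.
    Triangles use the Rips shortcut: all three edges must be present.
    """
    n = len(points)
    vertices = [(i,) for i in range(n)]
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= 2 * radius]
    edge_set = set(edges)
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if {(i, j), (i, k), (j, k)} <= edge_set]
    return vertices, edges, triangles


points = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.8), (4.0, 4.0)]
vertices, edges, triangles = rips_complex(points, radius=0.6)
```

Here the first three points form one filled triangle and the fourth point stays isolated, so the complex has two connected components.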
  22. None
  23. None
  24. None
  25. If the data is uniformly distributed on the manifold then the cover will be “good”
  26. None
  27. When is data that nicely behaved?

  28. Assumption: Data is uniformly distributed on the manifold

  29. Define a Riemannian metric on the manifold to make this assumption true
  30. None
  31. None
  32. Why choose a fixed radius? Why not have a fuzzy cover?
  33. Theorem 2 (UMAP Adjunction). The functors $\mathrm{FinReal} : \mathbf{sFuzz} \to \mathbf{FinEPMet}$ and $\mathrm{FinSing} : \mathbf{FinEPMet} \to \mathbf{sFuzz}$ form an adjunction $\mathrm{FinReal} \dashv \mathrm{FinSing}$.
  34. None
  35. Assumption: The manifold is locally connected
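One way to turn local connectivity into edge weights, following the construction described in the UMAP paper as best understood here: each point's nearest neighbour gets membership 1, and membership decays exponentially with the extra distance beyond that neighbour. The per-point scale `sigma` is found by bisection so that the weights sum to log2(k). The function name, the bisection bracket and the iteration count are choices made for this sketch.

```python
import math


def local_fuzzy_weights(neighbour_dists, iterations=64):
    """Directed membership weights from one point to its k nearest neighbours.

    rho is the distance to the closest neighbour (that edge gets weight 1).
    sigma is a per-point scale chosen so the weights sum to log2(k).
    """
    k = len(neighbour_dists)
    rho = min(neighbour_dists)
    target = math.log2(k)

    def total(sigma):
        return sum(math.exp(-max(0.0, d - rho) / sigma) for d in neighbour_dists)

    lo, hi = 0.0, 1.0
    while total(hi) < target:  # grow the bracket until it contains the target
        hi *= 2.0
    for _ in range(iterations):  # bisection: total() increases with sigma
        mid = (lo + hi) / 2.0
        if total(mid) < target:
            lo = mid
        else:
            hi = mid
    sigma = hi
    weights = [math.exp(-max(0.0, d - rho) / sigma) for d in neighbour_dists]
    return weights, rho, sigma


weights, rho, sigma = local_fuzzy_weights([1.0, 2.0, 3.0])
```

Because `rho` and `sigma` are computed separately for every point, the weight from x to y generally differs from the weight from y to x.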

  36. None
  37. None
  38. None
  39. None
  40. But our local metrics are all incompatible!

  41. None
  42. None
  43. Theorem 2 (UMAP Adjunction). The functors $\mathrm{FinReal} : \mathbf{sFuzz} \to \mathbf{FinEPMet}$ and $\mathrm{FinSing} : \mathbf{FinEPMet} \to \mathbf{sFuzz}$ form an adjunction $\mathrm{FinReal} \dashv \mathrm{FinSing}$.
  44. Under a probabilistic fuzzy union the combination of weights on edges is given by $f(\alpha, \beta) = \alpha + \beta - \alpha \cdot \beta$
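The probabilistic fuzzy union combines the two directed weights on an edge, α from x to y and β from y to x, as α + β − α·β. A minimal sketch of applying it to a dictionary of directed weights follows; the data layout and the name `fuzzy_union` are assumptions made for illustration.

```python
def fuzzy_union(directed):
    """Symmetrise directed edge weights with f(a, b) = a + b - a * b."""
    combined = {}
    for (i, j), a in directed.items():
        b = directed.get((j, i), 0.0)  # a missing reverse edge counts as weight 0
        key = (min(i, j), max(i, j))
        combined[key] = a + b - a * b
    return combined


directed = {(0, 1): 1.0, (1, 0): 0.5, (1, 2): 0.4, (2, 1): 0.5, (0, 2): 0.3}
undirected = fuzzy_union(directed)
```

The combined weight is never smaller than either input, and an edge that has full weight in one direction keeps full weight after the union.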
  45. None
  46. Suppose we were given a low dimensional representation

  47. We can apply the same process to get a fuzzy graph!
  48. Except we know the manifold, and don’t know the “correct” nearest neighbour distance
  49. Now measure the distance between the graphs using cross-entropy and optimize
  50. $\sum_{a \in A} \mu(a) \log\left(\frac{\mu(a)}{\nu(a)}\right) + (1 - \mu(a)) \log\left(\frac{1 - \mu(a)}{1 - \nu(a)}\right)$
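The cross-entropy between the high-dimensional memberships μ and the low-dimensional memberships ν can be evaluated edge by edge. In the sketch below the two fuzzy graphs are plain dictionaries keyed by edge. The clamping constant `eps` is an addition for numerical safety and is not part of the formula on the slide.

```python
import math


def fuzzy_cross_entropy(mu, nu, eps=1e-12):
    """Cross-entropy between two fuzzy sets over the same edge set A.

    mu and nu map each edge a to a membership strength in [0, 1].
    eps clamps values away from 0 and 1 so the logarithms stay finite.
    """
    total = 0.0
    for a, m in mu.items():
        m = min(max(m, eps), 1.0 - eps)
        n = min(max(nu[a], eps), 1.0 - eps)
        total += m * math.log(m / n) + (1.0 - m) * math.log((1.0 - m) / (1.0 - n))
    return total


mu = {("a", "b"): 0.9, ("a", "c"): 0.1}
same = fuzzy_cross_entropy(mu, mu)
different = fuzzy_cross_entropy(mu, {("a", "b"): 0.2, ("a", "c"): 0.8})
```

The value is zero when the two graphs agree on every edge and grows as they disagree, which is what makes it usable as an optimisation objective.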
  51. We are just embedding the graph

  52. $\sum_{a \in A} \mu(a) \log\left(\frac{\mu(a)}{\nu(a)}\right) + (1 - \mu(a)) \log\left(\frac{1 - \mu(a)}{1 - \nu(a)}\right)$ Get the clumps right (first term). Get the gaps right (second term).
  53. None
  54. None
  55. None
  56. On real data?

  57. UMAP on MNIST digits

  58. UMAP on Fashion MNIST

  59. Implementation

  60. Need to find (approximate) nearest neighbours very efficiently, even in high dimensional space
  61. RP-trees + NN-descent (Dasgupta + Freund 2008; Dong, Charikar + Li 2011)
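The idea behind a random projection tree can be sketched in a few lines: project the points onto a random direction, split at the median, and recurse until the leaves are small. Points that land in the same leaf become candidate neighbours, which a refinement step such as NN-descent can then improve. This is a simplified sketch with a hypothetical function name; it is not the exact splitting rule from the cited papers or from the library.

```python
import random


def rp_tree_leaves(points, indices=None, leaf_size=4, rng=None):
    """Partition point indices with a random projection tree.

    Each split projects the points onto a random direction and cuts at the
    median projection. Points sharing a leaf are candidate neighbours.
    """
    rng = rng or random.Random(0)
    if indices is None:
        indices = list(range(len(points)))
    if len(indices) <= leaf_size:
        return [indices]
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    ordered = sorted(
        indices,
        key=lambda i: sum(direction[c] * points[i][c] for c in range(dim)),
    )
    half = len(ordered) // 2
    return (rp_tree_leaves(points, ordered[:half], leaf_size, rng)
            + rp_tree_leaves(points, ordered[half:], leaf_size, rng))


data_rng = random.Random(42)
points = [(data_rng.random(), data_rng.random(), data_rng.random()) for _ in range(20)]
leaves = rp_tree_leaves(points, leaf_size=4, rng=random.Random(1))
```

Building several such trees with different random directions gives each point several candidate lists, so a true neighbour that one tree splits away can still be found by another.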
  62. Need to optimize the layout subquadratically

  63. SGD + negative sampling (Mikolov et al 2013; Tang et al 2016)
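A toy version of the optimisation loop shows why negative sampling avoids quadratic cost: each edge triggers one attractive update plus a handful of repulsive updates against randomly sampled points, instead of a repulsion against every other point. The sketch assumes unit edge weights, the low-dimensional similarity 1 / (1 + d²), a linearly decaying learning rate and gradient clipping. All of these, along with the name `optimise_layout`, are simplifications chosen here rather than the library's actual settings.

```python
import math
import random


def optimise_layout(edges, n_points, dim=2, epochs=200, n_negative=2,
                    learning_rate=0.1, seed=0):
    """Toy SGD layout: attract along edges, repel random non-neighbours."""
    rng = random.Random(seed)
    y = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    neighbours = {i: set() for i in range(n_points)}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)

    def clip(x):
        return max(-4.0, min(4.0, x))

    for epoch in range(epochs):
        lr = learning_rate * (1.0 - epoch / epochs)  # linear decay
        for i, j in edges:
            # Attractive step: descend log(1 + d^2), moving both endpoints.
            d2 = sum((a - b) ** 2 for a, b in zip(y[i], y[j]))
            coeff = -2.0 / (1.0 + d2)
            for c in range(dim):
                g = clip(coeff * (y[i][c] - y[j][c]))
                y[i][c] += lr * g
                y[j][c] -= lr * g
            # Repulsive steps: push i away from a few sampled non-neighbours.
            for _ in range(n_negative):
                k = rng.randrange(n_points)
                if k == i or k in neighbours[i]:
                    continue
                d2 = sum((a - b) ** 2 for a, b in zip(y[i], y[k]))
                coeff = 2.0 / ((0.001 + d2) * (1.0 + d2))
                for c in range(dim):
                    y[i][c] += lr * clip(coeff * (y[i][c] - y[k][c]))
    return y


edges = [(0, 1), (1, 2), (3, 4), (4, 5)]
layout = optimise_layout(edges, n_points=6)
```

The work per epoch is proportional to the number of edges times `n_negative`, not to the square of the number of points.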
  64. Need to be high level but still fast

  65. +

  66. Numba is awesome!

  67. • High performance • Clean code • Custom distance metrics

  68. Performance Comparison

    Dataset         t-SNE        UMAP         UMAP speed-up over t-SNE
    COIL20          20 seconds   7 seconds    3x
    MNIST           22 minutes   98 seconds   13x
    Fashion MNIST   15 minutes   78 seconds   11x
    GoogleNews      4.5 hours    14 minutes   19x
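As a quick check of the arithmetic, the speed-up factors can be recomputed from the quoted timings once everything is converted to seconds. The timings on the slide are rounded, so the ratios below only approximate the slide's factors, which were presumably computed from unrounded measurements.

```python
# Timings as quoted on the slide, converted to seconds: (t-SNE, UMAP).
timings = {
    "COIL20": (20, 7),
    "MNIST": (22 * 60, 98),
    "Fashion MNIST": (15 * 60, 78),
    "GoogleNews": (4.5 * 3600, 14 * 60),
}

speedups = {name: tsne_s / umap_s for name, (tsne_s, umap_s) in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```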
  69. Where Next?

  70. Given the mathematical foundation, a number of options are available

  71. Embed new unseen points into an existing embedding

  72. None
  73. Make use of labels for supervised dimension reduction

  74. None
  75. And combine those for metric learning

  76. None
  77. Derived from Adam Bielski’s Siamese/Triplet repository: https://github.com/adambielski/siamese-triplet

  78. Adding one categorical variable is no harder theoretically than adding many
  79. Combine spaces with different metrics: continuous, categorical, ordinal, Haversine, Levenshtein, and more …
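To illustrate what "a metric for the datatype" can look like, here is the Haversine distance written as a plain function of two (latitude, longitude) pairs in radians on the unit sphere. It is a sketch of one such metric only; how a function like this would be registered with the library is not shown and is not assumed here.

```python
import math


def haversine(p, q):
    """Great-circle distance on the unit sphere; p and q are (lat, lon) in radians."""
    dlat = q[0] - p[0]
    dlon = q[1] - p[1]
    h = (math.sin(dlat / 2) ** 2
         + math.cos(p[0]) * math.cos(q[0]) * math.sin(dlon / 2) ** 2)
    # Clamp to guard against rounding pushing the argument just above 1.
    return 2 * math.asin(min(1.0, math.sqrt(h)))


quarter_turn = haversine((0.0, 0.0), (0.0, math.pi / 2))
```

Multiplying the result by a sphere's radius converts it to a physical distance; two points a quarter of the way around the equator are π/2 apart on the unit sphere.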
  80. As long as you can provide a metric for the datatype, UMAP can combine it with other datatypes!
  81. UMAP for pandas dataframes!

  82. https://github.com/lmcinnes/umap
    conda install -c conda-forge umap-learn
    pip install umap-learn
    leland.mcinnes@gmail.com
    @leland_mcinnes