Learning Topology: topological methods for unsupervised learning

Leland McInnes
February 23, 2019

A whirlwind tour of how topological ideas and methods can provide powerful solutions to unsupervised learning problems.

Transcript

  1. Learning Topology: Topological Methods for Unsupervised Learning

  2. A whirlwind tour of some topological data analysis techniques

  3. Sound theory; practical application

  4. A Topology Primer

  5. Simplices

  6. None
  7. Theorem 1 (Nerve theorem). Let $\mathcal{U} = \{U_i\}_{i \in I}$ be a cover of a topological space $X$. If, for all $\sigma \subset I$, $\bigcap_{i \in \sigma} U_i$ is either contractible or empty, then $\mathcal{N}(\mathcal{U})$ is homotopically equivalent to $X$.
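
To make Theorem 1 concrete, here is a minimal sketch of building the nerve of a finite cover: a simplex is any set of cover indices whose sets have a common point. The function name `nerve`, the `max_dim` cutoff, and the toy cover are assumptions made for illustration, not something shown in the talk.

```python
from itertools import combinations

def nerve(cover, max_dim=2):
    """Return the simplices of the nerve of `cover` up to dimension `max_dim`.

    `cover` maps an index to a set of points; a simplex is a tuple of indices
    whose sets share at least one point.
    """
    indices = sorted(cover)
    simplices = []
    for size in range(1, max_dim + 2):
        for sigma in combinations(indices, size):
            common = set.intersection(*(cover[i] for i in sigma))
            if common:
                simplices.append(sigma)
    return simplices

# Three arcs covering a circle of six sample points (hypothetical toy data).
cover = {0: {0, 1, 2}, 1: {2, 3, 4}, 2: {4, 5, 0}}
simplices = nerve(cover)
```

For this cover the result has three vertices and three edges but no filled triangle, so the nerve is a hollow triangle, which has the same shape (up to homotopy) as the circle being covered.
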
  8. None
  9. None
  10. None
  11. Functor, Adjunction, Limit, Colimit

  12. Functor: A function between domains of discourse

  13. Adjunction: A near equivalence between domains of discourse

  14. Limit: A solution to a system of constraints

  15. Colimit: Gluing together a system of objects

  16. Dimension Reduction

  17. None
  18. If the data is uniformly distributed on the manifold, then the cover will be “good”
  19. None
  20. When is data that nicely behaved?

  21. Assumption: Data is uniformly distributed on the manifold

  22. Define a Riemannian metric on the manifold to make this assumption true
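
One way to read slide 22 in code: rescale distances around each point so that its k-th nearest neighbor sits at unit distance. Every point then sees the same number of neighbors inside its unit ball, which is what uniform density looks like locally. This is a simplified sketch; the names `local_scales` and `local_distance` and the toy data are assumptions for illustration.

```python
import math

def local_scales(points, k):
    """Distance from each point to its k-th nearest neighbor."""
    scales = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scales.append(dists[k - 1])
    return scales

def local_distance(points, scales, i, j):
    """Distance from point i to point j measured in point i's own units."""
    return math.dist(points[i], points[j]) / scales[i]

# Hypothetical 1-D data: a dense clump on the left, a sparse stretch on the right.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (7.0,), (9.0,)]
scales = local_scales(points, k=2)
```

Note that `local_distance(points, scales, 2, 3)` and `local_distance(points, scales, 3, 2)` disagree, because each point measures with its own ruler. That disagreement is the incompatibility raised on slide 26.
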
  23. None
  24. Assumption: The manifold is locally connected
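
In the UMAP paper, local connectivity is enforced by subtracting the distance to the nearest neighbor before converting distances to fuzzy edge weights, so every point is fully connected to at least one other point. The sketch below follows that formula; treating `sigma` as a fixed input is a simplification assumed here (how the scale is tuned is left out), and the function name is hypothetical.

```python
import math

def membership_strengths(neighbor_dists, sigma=1.0):
    """Fuzzy edge weights from one point to its neighbors.

    rho is the distance to the nearest neighbor; subtracting it means the
    nearest neighbor always gets weight 1, so no point is left isolated.
    """
    rho = min(neighbor_dists)
    return [math.exp(-max(0.0, d - rho) / sigma) for d in neighbor_dists]

weights = membership_strengths([0.5, 1.5, 2.5])
```
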

  25. None
  26. But our local metrics are all incompatible!

  27. None
  28. Glue things together with colimits?
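
In practice the gluing can be carried out as a fuzzy union of the per-point graphs. The UMAP paper uses the probabilistic sum a + b - ab to merge the two directed weights on each edge; the sketch below assumes that choice. The dictionary layout and the name `fuzzy_union` are made up for illustration.

```python
def fuzzy_union(directed):
    """Symmetrize directed fuzzy edge weights with the probabilistic sum.

    `directed` maps (i, j) to the weight of the edge as seen from point i.
    """
    merged = {}
    for (i, j), a in directed.items():
        b = directed.get((j, i), 0.0)
        merged[frozenset((i, j))] = a + b - a * b
    return merged

directed = {(0, 1): 1.0, (1, 0): 0.5, (1, 2): 0.4, (2, 1): 0.5}
merged = fuzzy_union(directed)
```

The merged weight is never smaller than either input, so an edge that one point considers certain stays certain after gluing.
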

  29. Theorem 2 (UMAP Adjunction). The functors $\mathrm{FinReal} : \mathbf{sFuzz} \to \mathbf{FinEPMet}$ and $\mathrm{FinSing} : \mathbf{FinEPMet} \to \mathbf{sFuzz}$ form an adjunction $\mathrm{FinReal} \dashv \mathrm{FinSing}$.
  30. None
  31. Suppose we were given a low dimensional representation

  32. We can apply the same process to get a probabilistic graph!
  33. $\sum_{a \in A} \mu(a) \log\left(\frac{\mu(a)}{\nu(a)}\right) + (1 - \mu(a)) \log\left(\frac{1 - \mu(a)}{1 - \nu(a)}\right)$
  34. $\sum_{a \in A} \mu(a) \log\left(\frac{\mu(a)}{\nu(a)}\right) + (1 - \mu(a)) \log\left(\frac{1 - \mu(a)}{1 - \nu(a)}\right)$ Get the clumps right (first term); get the gaps right (second term)
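
The cross entropy on slides 33 and 34 can be evaluated directly once both fuzzy graphs are stored as edge-to-weight mappings. This is a minimal sketch under that assumption; the clamping constant `eps` and the name `fuzzy_cross_entropy` are additions for numerical safety and illustration, not part of the talk.

```python
import math

def fuzzy_cross_entropy(mu, nu, eps=1e-12):
    """Cross entropy between two fuzzy sets over the same edges.

    `mu` and `nu` map each edge to a membership strength in [0, 1].
    Values are clamped away from 0 and 1 so the logarithms stay finite.
    """
    def clamp(x):
        return min(max(x, eps), 1.0 - eps)

    total = 0.0
    for edge, m in mu.items():
        m, n = clamp(m), clamp(nu[edge])
        attract = m * math.log(m / n)                        # "clumps" term
        repel = (1.0 - m) * math.log((1.0 - m) / (1.0 - n))  # "gaps" term
        total += attract + repel
    return total

mu = {("a", "b"): 0.9, ("a", "c"): 0.1}
same = fuzzy_cross_entropy(mu, mu)
swapped = fuzzy_cross_entropy(mu, {("a", "b"): 0.1, ("a", "c"): 0.9})
```

The value is zero when the low dimensional graph matches the high dimensional one and grows as strong edges are pulled apart or weak edges are pushed together.
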
  35. UMAP on MNIST digits

  36. UMAP on Fashion MNIST

  37. UMAP on Kuzushiji-MNIST

  38. Metric Learning. Derived from Adam Bielski’s Siamese/Triplet repository: https://github.com/adambielski/siamese-triplet

  39. None
  40. Word Embeddings

  41. “You shall know a word by the company it keeps” (John Rupert Firth)
  42. Represent a word as a multinomial distribution over the words that co-occur with it
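
As a rough illustration of slide 42, the sketch below counts which words appear within a fixed window of each word and normalizes the counts into a distribution. The window size, the tokenized toy sentences, and the function name are all assumptions; the talk does not specify how co-occurrence was counted.

```python
from collections import Counter, defaultdict

def cooccurrence_distributions(sentences, window=2):
    """Map each word to a distribution over words seen within `window` tokens."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return {
        word: {ctx: c / sum(ctr.values()) for ctx, c in ctr.items()}
        for word, ctr in counts.items()
    }

sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]
dists = cooccurrence_distributions(sentences)
```

Stacking one such distribution per vocabulary word gives the very large, very sparse matrix mentioned on the next slide.
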
  43. A very large and very sparse matrix

  44. Find the manifold on which the words lie

  45. Assumptions: uniform distribution; locally connected

  46. None
  47. None
  48. None
  49. Use the correct metric for multinomial parameter space
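
The slide does not name the metric. One common choice for comparing multinomial distributions is the Hellinger distance, which is closely tied to the Fisher information geometry of the probability simplex; the sketch below assumes that choice purely for illustration. The sparse dict representation and the sample distributions are hypothetical.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two sparse distributions given as dicts."""
    keys = set(p) | set(q)
    total = sum(
        (math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys
    )
    return math.sqrt(total / 2.0)

cat = {"the": 0.5, "sat": 0.5}
dog = {"the": 0.5, "sat": 0.5}
car = {"drove": 1.0}
```

Words with identical contexts are at distance 0, and words with no shared context are at the maximum distance of 1.
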

  50. Example embedding of Yelp reviews

  51. Clustering

  52. What do we mean by a cluster?

  53. A cluster is …

  54. A connected component of a level set of the probability density function of the underlying (and unknown) distribution from which our data samples are drawn.
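
The definition on slide 54 can be sketched for a finite sample: keep the points whose density is at least some level, then take connected components of what remains. The density values and neighbor structure below are assumed inputs invented for the example, since the real density is unknown.

```python
def level_set_clusters(density, neighbors, level):
    """Connected components of the points whose density is at least `level`.

    `density` maps point -> estimated density; `neighbors` maps point -> list
    of adjacent points. Both are assumed inputs for this illustration.
    """
    kept = {p for p, d in density.items() if d >= level}
    seen, clusters = set(), []
    for start in sorted(kept):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            p = stack.pop()
            component.append(p)
            for q in neighbors[p]:
                if q in kept and q not in seen:
                    seen.add(q)
                    stack.append(q)
        clusters.append(sorted(component))
    return clusters

# Hypothetical bimodal density sampled at seven points on a line.
density = {0: 0.1, 1: 0.8, 2: 0.9, 3: 0.2, 4: 0.7, 5: 0.9, 6: 0.1}
neighbors = {}
for i in density:
    neighbors[i] = [j for j in (i - 1, i + 1) if 0 <= j <= 6]

high = level_set_clusters(density, neighbors, level=0.5)
low = level_set_clusters(density, neighbors, level=0.15)
```

At the higher level the two modes form separate clusters; lowering the level lets the valley between them through and the clusters merge.
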
  55. None
  56. None
  57. None
  58. None
  59. None
  60. How do we compute that without knowing the PDF?
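
One standard way to estimate a density from samples alone, assumed here rather than taken from the slides, is the k-nearest-neighbor estimate: the density at a point is inversely proportional to the volume of the ball that reaches its k-th nearest neighbor. The 1-D sketch below uses the basic form k / (n * volume); the function name and data are hypothetical.

```python
def knn_density_1d(xs, k):
    """k-nearest-neighbor density estimate at each sample of a 1-D data set."""
    n = len(xs)
    estimates = []
    for i, x in enumerate(xs):
        dists = sorted(abs(x - y) for j, y in enumerate(xs) if j != i)
        radius = dists[k - 1]  # distance to the k-th nearest neighbor
        # k points fall in an interval of length 2 * radius around x.
        estimates.append(k / (n * 2.0 * radius))
    return estimates

xs = [0.0, 0.1, 0.2, 0.3, 5.0, 9.0]
density = knn_density_1d(xs, k=2)
```
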

  61. None
  62. Assumption: Data is distributed on the manifold according to some PDF
  63. Choose a Riemannian metric that preserves the distribution

  64. But our local metrics may be incompatible…

  65. Solve the system of constraints using limits?

  66. Theorem 2 (UMAP Adjunction). The functors $\mathrm{FinReal} : \mathbf{sFuzz} \to \mathbf{FinEPMet}$ and $\mathrm{FinSing} : \mathbf{FinEPMet} \to \mathbf{sFuzz}$ form an adjunction $\mathrm{FinReal} \dashv \mathrm{FinSing}$.
  67. None
  68. We have captured the topology of the PDF

  69. The connected components functor $\pi_0$ produces a fuzzy set of connected components
  70. Exclude components below a threshold cluster size and sort points by component membership
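
A simplified sketch of slide 70, assuming the fuzzy graph is cut at a single membership threshold (the talk works with the whole fuzzy set of components, so fixing one threshold is a simplification). Points are grouped with union-find, groups smaller than `min_size` are excluded, and each remaining point gets a cluster label. All names and the toy edges are hypothetical.

```python
def clusters_at_threshold(n_points, edges, threshold, min_size):
    """Group points joined by edges of weight >= threshold; drop small groups.

    `edges` is a list of (i, j, weight) with fuzzy weights in [0, 1].
    Returns a cluster label per point, with -1 marking excluded points.
    """
    parent = list(range(n_points))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j, w in edges:
        if w >= threshold:
            parent[find(i)] = find(j)

    members = {}
    for i in range(n_points):
        members.setdefault(find(i), []).append(i)

    labels = [-1] * n_points
    big = sorted(m for m in members.values() if len(m) >= min_size)
    for label, group in enumerate(big):
        for i in group:
            labels[i] = label
    return labels

edges = [(0, 1, 0.9), (1, 2, 0.8), (2, 3, 0.2), (3, 4, 0.9), (4, 5, 0.1)]
labels = clusters_at_threshold(6, edges, threshold=0.5, min_size=2)
```

Point 5 ends up in a component of size one, so it is excluded and labeled -1.
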
  71. None
  72. None
  73. None
  74. None
  75. Conclusions

  76. Topology and category theory provide a different language to frame problems
  77. Topological techniques can provide powerful solutions

  78. Hopefully I have motivated you to learn more! Contact: leland.mcinnes@gmail.com, @leland_mcinnes