Topological Data Analysis

An introduction to topological data analysis presented as a mini-course at the CMS winter meeting 2019. Some of the animated content did not survive the translation to PDF.

Leland McInnes

December 06, 2019

Transcript

  1. Topological Data Analysis Leland McInnes CMS Winter Meeting 2019

  2. Leland McInnes. Researcher at the Tutte Institute for Mathematics and Computing. Maintainer for many machine learning packages: umap-learn, hdbscan, pynndescent, enstop. Scikit-learn and Scikit-TDA contributor. @leland_mcinnes, leland.mcinnes@gmail.com
  3. I want to empower users to explore their data

  4. Data Analysis

  5. Name | Birthdate | Height | Weight
    Alice | 1985-10-12 | 178 | 55
    Bob | 1991-01-02 | 189 | 85
    Carmen | 1978-05-18 | 170 | 54
    David | 1996-11-30 | 175 | 72
    Eva | 1975-09-21 | 159 | 45
    Frank | 1999-06-28 | 192 | 80
    Gertrude | 1943-10-19 | 181 | 63
    Harold | 1982-11-08 | 176 | 65
  6. Name | Birthdate | Height | Weight
    Mohsen | March 21st 1985 | 5’8” | 55 kg
    Type Name Here | 12/31/1991 | almost 6 feet | 178
    Rahul Pushpakumara | 30-05-2001 | 170 cm | ???
    Luuk Sander van der Berg | 02/07/05 | 1.75 metres | N/A
    Yumiko❤✨ | Feb 31 1978 | short |
    Akwesi Olatunji | 1970-01-01 | 5 foot 11 | 80
    王秀英 | | 5’9” | 130
    Robert Garcia-Smith Jr. | January I think? | 176 | 65
  7. What questions do you want answered?

  8. What questions should you be asking?

  9. Look at your data!

  10. Name | Birthdate | Height | Weight
    Mohsen | March 21st 1985 | 5’8” | 55 kg
    Type Name Here | 12/31/1991 | almost 6 feet | 178
    Rahul Pushpakumara | 30-05-2001 | 170 cm | ???
    Luuk Sander van der Berg | 02/07/05 | 1.75 metres | N/A
    Yumiko❤✨ | Feb 31 1978 | short |
    Akwesi Olatunji | 1970-01-01 | 5 foot 11 | 80
    王秀英 | | 5’9” | 130
    Robert Garcia-Smith Jr. | January I think? | 176 | 65
  11. Sepal length | Sepal width | Petal length | Petal width | Species
    5.1 | 3.5 | 1.4 | 0.2 | Setosa
    4.9 | 3 | 1.4 | 0.2 | Setosa
    4.7 | 3.2 | 1.3 | 0.2 | Setosa
    4.6 | 3.1 | 1.5 | 0.2 | Setosa
    5 | 3.6 | 1.4 | 0.2 | Setosa
    5.4 | 3.9 | 1.7 | 0.4 | Setosa
    4.6 | 3.4 | 1.4 | 0.3 | Setosa
    5 | 3.4 | 1.5 | 0.2 | Setosa
    4.4 | 2.9 | 1.4 | 0.2 | Setosa
    4.9 | 3.1 | 1.5 | 0.1 | Setosa
    5.4 | 3.7 | 1.5 | 0.2 | Setosa
    4.8 | 3.4 | 1.6 | 0.2 | Setosa
    4.8 | 3 | 1.4 | 0.1 | Setosa
  12. [Slide shows the full Iris table in several side-by-side column blocks, continuing the rows of the previous slide through the Setosa, Versicolor and Virginica samples. Columns: sepal length, sepal width, petal length, petal width, species.]
  13. None
  14. [Slide shows a dense grid of raw decimal values, all between 0.10 and 0.80.]
  15. None
  16. There are many different lenses through which to view data

  17. Pairs plot, Heatmap, Flat clustering, Topic modelling, Dimension reduction, Correlation plot, Density plot, Vector quantization, Persistence barcode, Cluster tree, Data table, Scatter plot, Outlier analysis, Histograms, Tree map, Fourier analysis, Time series matrix profiling, Clustergram, Swarm plot
  18. Each approach reveals some things and hides other things

  19. Look at your data!

  20. Exploring your data through many different lenses can help you find the right questions to ask
  21. Beyond plotting, the most powerful tools come from machine learning

  22. Machine Learning

  23. Classical Machine Learning

  24. None
  25. None
  26. Are we really just fitting multivariate polynomials?

  27. Image by Randall Munroe: https://xkcd.com/1838/

  28. Unsupervised Learning

  29. None
  30. None
  31. None
  32. Two clusters?

  33. None
  34. None
  35. Look at your data!

  36. What is going to determine which way the data might cluster?
  37. How do you compare two data samples?

  38. For unsupervised learning dissimilarity is what matters

  39. What is the geometry of the data under a chosen dissimilarity?
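
To make the point about dissimilarity concrete, here is a minimal sketch (the point names and coordinates are invented for illustration) in which the nearest neighbour of the same query changes when Euclidean distance is swapped for cosine dissimilarity.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_dissimilarity(a, b):
    """One minus the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, candidates, dissimilarity):
    """Name of the candidate with the smallest dissimilarity to the query."""
    return min(candidates, key=lambda name: dissimilarity(query, candidates[name]))

query = (1.0, 1.0)
candidates = {
    "scaled_copy": (10.0, 10.0),  # same direction as the query, but far away
    "close_by": (1.0, 2.0),       # nearby, but pointing in a different direction
}

nearest_euclidean = nearest(query, candidates, euclidean)          # "close_by"
nearest_cosine = nearest(query, candidates, cosine_dissimilarity)  # "scaled_copy"
```

Any neighbour-based method run on this data would inherit that difference, which is why the choice of dissimilarity shapes the geometry that later steps see.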
  40. Some Topology

  41. Topology (noun): 1. The study of geometrical properties and spatial relations unaffected by the continuous change of shape or size of figures. 2. The way in which constituent parts are interrelated or arranged. (Oxford English Dictionary)
  42. None
  43. Continuity, infinity, and the continuum can be hard to work with.
  44. Can we make topology finite and combinatorial?

  45. Simplicial Complexes

  46. Build simple pieces of varying dimension in a combinatorial way

  47. None
  48. None
  49. None
  50. None
  51. None
  52. We can build up a vast array of topological spaces in this purely combinatorial way
  53. None
  54. None
  55. Face maps provide the combinatorics to glue simplicial complexes together

  56. $\cdots \to X_2 \to X_1 \to X_0$, with four parallel face maps into $X_2$, three from $X_2$ to $X_1$, and two from $X_1$ to $X_0$
  57. This provides us with simple combinatorial and algebraic tools for working with topological spaces.
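
As an illustration of how combinatorial this is, the sketch below stores a simplex as a sorted tuple of vertex labels and implements a face map as "drop the vertex at position l". The helper names `face` and `closure` are made up for this example.

```python
from itertools import combinations

def face(simplex, l):
    """Face map d_l: drop the vertex at position l of an ordered simplex."""
    return simplex[:l] + simplex[l + 1:]

def closure(top_simplices):
    """All non-empty faces of the given simplices, grouped by dimension."""
    by_dim = {}
    for simplex in top_simplices:
        vertices = tuple(sorted(simplex))
        for size in range(1, len(vertices) + 1):
            for sub in combinations(vertices, size):
                by_dim.setdefault(size - 1, set()).add(sub)
    return by_dim

triangle = (0, 1, 2)
faces_of_triangle = [face(triangle, l) for l in range(3)]  # the three edges

# A filled triangle with an extra edge hanging off vertex 2.
complex_by_dim = closure([(0, 1, 2), (2, 3)])
```

Here `complex_by_dim[0]`, `complex_by_dim[1]` and `complex_by_dim[2]` play the roles of $X_0$, $X_1$ and $X_2$.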
  58. Homology

  59. Boundary map $\partial_i : \mathbb{Z}[X_i] \to \mathbb{Z}[X_{i-1}]$

  60. $\partial_i\left(\sum_k a_k \sigma_k\right) = \sum_{k,\ell} a_k (-1)^{\ell} d_{\ell}(\sigma_k)$
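
The boundary formula translates almost directly into code. The sketch below is one possible encoding, not a library API: a chain is a dictionary from simplices (sorted vertex tuples) to integer coefficients, and the boundary of a vertex is assumed to be zero.

```python
from collections import defaultdict

def boundary(chain):
    """Apply the boundary map to a chain given as {simplex: integer coefficient}."""
    result = defaultdict(int)
    for simplex, coeff in chain.items():
        if len(simplex) == 1:
            continue  # assumption: the boundary of a vertex is zero
        for l in range(len(simplex)):
            face = simplex[:l] + simplex[l + 1:]  # d_l drops the l-th vertex
            result[face] += coeff * (-1) ** l     # alternating sign (-1)^l
    # Drop faces whose coefficients cancelled out.
    return {s: c for s, c in result.items() if c != 0}

triangle = {(0, 1, 2): 1}
edges = boundary(triangle)   # signed sum of the three edges
vertices = boundary(edges)   # every vertex cancels, leaving the empty chain
```

The signs on the edges record an orientation, and applying the map twice leaves nothing behind.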
  61. None
  62. None
  63. The sign can provide a direction for a simplex

  64. None
  65. None
  66. None
  67. Boundaries form cycles. What if we apply the boundary map to a cycle?
  68. None
  69. None
  70. None
  71. None
  72. None
  73. The boundary map will map cycles to zero

  74. None
  75. None
  76. None
  77. Cycles form a group

  78. $\partial_1 : \mathbb{Z}[X_1] \to \mathbb{Z}[X_0]$

  79. $\ker(\partial_1) \triangleleft \mathbb{Z}[X_1]$

  80. None
  81. None
  82. None
  83. None
  84. None
  85. None
  86. There is a group of cycles modulo filled triangles

  87. $H_1(X) = \ker(\partial_1) / \operatorname{im}(\partial_2)$

  88. $H_i(X) = \ker(\partial_i) / \operatorname{im}(\partial_{i+1})$
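
For a small complex the quotient can be measured with plain linear algebra. The sketch below makes one simplifying assumption: it works over the rationals rather than the integers, so it only reports the rank of $H_1$ (the first Betti number) and ignores torsion. The helper names are illustrative.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def boundary_matrix(faces, simplices):
    """Matrix of the boundary map: one row per face, one column per simplex."""
    index = {f: i for i, f in enumerate(faces)}
    matrix = [[0] * len(simplices) for _ in faces]
    for j, s in enumerate(simplices):
        for l in range(len(s)):
            matrix[index[s[:l] + s[l + 1:]]][j] = (-1) ** l
    return matrix

def betti_1(vertices, edges, triangles):
    """dim ker(d_1) - rank(d_2): the number of independent unfilled loops."""
    rank_d1 = rank(boundary_matrix(vertices, edges))
    rank_d2 = rank(boundary_matrix(edges, triangles)) if triangles else 0
    return (len(edges) - rank_d1) - rank_d2

vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
hollow = betti_1(vertices, edges, [])           # triangle outline: one loop
filled = betti_1(vertices, edges, [(0, 1, 2)])  # filled triangle: no loop
```

The hollow triangle has one cycle that is not a boundary. Adding the 2-simplex makes that cycle a boundary and the count drops to zero.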

  89. What does $\partial_0$ do? What should $\partial_0$ do?

  90. Topology from Data

  91. Čech Complexes

  92. None
  93. None
  94. None
  95. None
  96. Theorem 1 (Nerve theorem). Let $\mathcal{U} = \{U_i\}_{i \in I}$ be a cover of a topological space $X$. If, for all $\sigma \subset I$, $\bigcap_{i \in \sigma} U_i$ is either contractible or empty, then $\mathcal{N}(\mathcal{U})$ is homotopically equivalent to $X$.
  97. Extracting the Čech complex preserves most topological information

  98. Vietoris-Rips Complexes

  99. None
  100. None
  101. None
  102. None
  103. Vietoris-Rips complexes are very similar to Čech complexes, but are entirely determined by the 1-skeleton
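
"Determined by the 1-skeleton" means a simplex is included exactly when all of its edges are. The sketch below brute-forces this for small inputs. It assumes the convention that two points are joined when their distance is at most `radius` (some texts use twice the radius), and the function name is illustrative.

```python
from itertools import combinations
import math

def vietoris_rips(points, radius, max_dim=2):
    """Simplices (as vertex-index tuples) whose vertices are pairwise within `radius`."""
    n = len(points)
    # The 1-skeleton: every pair of points that is close enough.
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if math.dist(points[i], points[j]) <= radius
    }
    simplices = [(i,) for i in range(n)]
    for size in range(2, max_dim + 2):
        for candidate in combinations(range(n), size):
            # A simplex is included exactly when all of its edges are.
            if all(pair in close for pair in combinations(candidate, 2)):
                simplices.append(candidate)
    return simplices

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
small = vietoris_rips(square, radius=1.0)  # four sides only: a hollow square
large = vietoris_rips(square, radius=1.5)  # diagonals appear, triangles fill in
```

Enumerating all vertex subsets is exponential, so this only suits toy examples.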
  104. There are lots of related constructions for simplicial complexes: Delaunay complexes, Alpha complexes, Witness complexes, …
  105. In all cases: Dissimilarity matters

  106. Mapper

  107. Motivation from Morse Theory

  108. Try to understand a manifold by looking at functions defined on that manifold
  109. Look at the function in terms of level-sets

  110. None
  111. None
  112. None
  113. None
  114. Mapper

  115. None
  116. None
  117. None
  118. None
  119. None
  120. None
  121. None
  122. None
  123. None
  124. None
  125. None
  126. Applications

  127. Mapper tends to get used for clustering and exploratory analysis

  128. Identification of type 2 diabetes subgroups through topological analysis of patient similarity, Li et al., 2015

  129. Topological data analysis for discovery in preclinical spinal cord injury and traumatic brain injury, Nielson et al., 2015

  130. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival, Nicolau et al., 2011

  131. From 5 to 13: Redefining the Positions in Basketball, Alagappan, 2012
  132. None
  133. Summary

  134. Provides a novel topological view of data

  135. The choice of function matters a lot

  136. The resolution of the cover of the projection space can have a significant impact on results
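
To show where the function and the cover resolution enter, here is a deliberately small Mapper-style sketch in pure Python. It is not the KeplerMapper API. The parameter names (`n_intervals`, `overlap`, `eps`) and the use of threshold-graph components as the clustering step are assumptions made for illustration.

```python
import math
from itertools import combinations

def components(indices, points, eps):
    """Connected components of the graph joining points at distance <= eps."""
    remaining, clusters = set(indices), []
    while remaining:
        stack = [remaining.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters

def mapper(points, lens, n_intervals, overlap, eps):
    """Nodes are clusters inside overlapping lens intervals; edges join clusters sharing points."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Interval k of the cover, widened on both sides by the overlap fraction.
        a = lo + k * step - overlap * step
        b = lo + (k + 1) * step + overlap * step
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(components(members, points, eps))
    edges = {(s, t) for s, t in combinations(range(len(nodes)), 2) if nodes[s] & nodes[t]}
    return nodes, edges

# 40 points on a circle; the lens is the x-coordinate.
circle = [(math.cos(2 * math.pi * t / 40), math.sin(2 * math.pi * t / 40)) for t in range(40)]
nodes, edges = mapper(circle, lens=lambda p: p[0], n_intervals=4, overlap=0.25, eps=0.2)
```

With these settings the output graph is a single loop of six nodes, matching the circle. Changing `n_intervals`, `overlap` or the lens changes the graph, which is the sensitivity the summary slides warn about.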
  137. https://kepler-mapper.scikit-tda.org/

  138. Persistent Homology

  139. Consider a Vietoris-Rips complex…

  140. None
  141. For any given radius there is homology we can compute

  142. By fixing a radius we induce topology and forget metric information
  143. How can we get metric information back into our computations?

  144. $V_r(X) \subseteq V_{r+\epsilon}(X) \subseteq V_{r+2\epsilon}(X)$

  145. The inclusions $V_r(X) \to V_{r+\epsilon}(X) \to V_{r+2\epsilon}(X)$ induce maps on homology $H_i(V_r(X)) \to H_i(V_{r+\epsilon}(X)) \to H_i(V_{r+2\epsilon}(X))$
  146. Compute homology for varying radii!

  147. How do we describe the result?

  148. None
  149. Computing Persistent Homology

  150. First, to make things easier, let’s work over a field

  151. Boundary map $\partial_i : \mathbb{F}[X_i] \to \mathbb{F}[X_{i-1}]$

  152. Boundary maps are between vector spaces: linear algebra

  153. None
  154. Reduce to Smith normal form …

  155. None
  156. None
  157. None
  158. So simple linear algebra is enough to compute everything

  159. What about persistence?

  160. There is a partial order on simplices based on inclusion and arrival time
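
One way to realise this ordering for a Vietoris-Rips filtration is sketched below. It assumes that a simplex arrives at the largest pairwise distance among its vertices (vertices arrive at radius 0) and that ties are broken by dimension. The function name is illustrative.

```python
import math
from itertools import combinations

def filtration(points, max_dim=2):
    """Simplices of a Vietoris-Rips filtration with their arrival radius, in order."""
    entries = []
    for size in range(1, max_dim + 2):
        for simplex in combinations(range(len(points)), size):
            arrival = max(
                (math.dist(points[i], points[j]) for i, j in combinations(simplex, 2)),
                default=0.0,  # a single vertex has no pairs: it arrives at radius 0
            )
            entries.append((arrival, len(simplex) - 1, simplex))
    # Sort by arrival time, then dimension, so every face precedes its cofaces.
    entries.sort()
    return [(simplex, arrival) for arrival, _, simplex in entries]

points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
ordered = filtration(points)
```

For the 3-4-5 triangle this gives the three vertices, then the edges of length 3, 4 and 5, and finally the filled triangle, which arrives with its longest edge.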
  161. Construct a boundary matrix over all simplices in order

  162. For simplicity we’ll work in $\mathbb{F}_2$

  163. None
  164. Reduce via column additions from left to right

  165. for each column $j$: while $\exists\, j_0 < j$ such that $\mathrm{low}(j_0) = \mathrm{low}(j)$: add column $j_0$ to column $j$
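
The reduction loop can be sketched in a few lines of Python. This assumes coefficients in $\mathbb{F}_2$, so each column is stored as the set of row indices holding a 1 and "adding" two columns is a symmetric difference. The function name and the returned pair list are illustrative choices.

```python
def reduce_boundary_matrix(columns):
    """Left-to-right column reduction over F_2; each column is a set of row indices."""
    columns = [set(col) for col in columns]
    low_to_col = {}  # lowest row index -> the reduced column that owns it
    pairs = []       # (birth simplex index, death simplex index)
    for j, col in enumerate(columns):
        # While an earlier column j0 has the same lowest entry, add it to column j.
        while col and max(col) in low_to_col:
            col ^= columns[low_to_col[max(col)]]  # addition mod 2, done in place
        if col:
            low_to_col[max(col)] = j
            pairs.append((max(col), j))
    return columns, pairs

# Filtration order: vertices a, b, c, then edges ab, bc, ac, then triangle abc.
#          a      b      c      ab      bc      ac      abc
columns = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
reduced, pairs = reduce_boundary_matrix(columns)
```

Each pair `(i, j)` says that the feature created when simplex `i` arrived is destroyed when simplex `j` arrives:

- vertex b merges into a's component at edge ab
- vertex c merges at edge bc
- the loop closed by edge ac is filled by the triangle

Vertex a is never paired, which corresponds to the one component that persists forever.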
  166. None
  167. We can now read off the barcode or persistence diagram

  168. None
  169. None
  170. This can be extended to work for other fields, but this is the core idea
  171. Applications

  172. Persistent homology has seen use in a diverse range of applications
  173. Using Persistent Homology to Quantify a Diurnal Cycle in Hurricane Felix, Tymochko et al., 2019

  174. Sliding Windows and Persistence: An Application of Topological Methods to Signal Analysis, Perea and Harer, 2014

  175. Topological Eulerian Synthesis of Slow Motion Periodic Videos, Tralie and Berger, 2018

  176. Coverage in sensor networks via persistent homology, De Silva and Ghrist, 2007

  177. Topological Feature Vectors for Chatter Detection in Turning Processes, Yesilli et al., 2019; Chatter Diagnosis in Milling Using Supervised Learning and Topological Features Vector, Yesilli et al., 2019

  178. Persistent Homology: An Introduction and a New Text Representation for Natural Language Processing, Zhu, 2013
  179. Implementation

  180. Naively this algorithm is worst case $O(m^3)$

  181. The matrix is sparse and significant practical speedups can be gained through careful use of sparse matrix data structures

  182. Specializing to specific cases (e.g. only Vietoris-Rips complexes) can provide significant gains
  183. https://github.com/scikit-tda/ripser.py

  184. Clustering

  185. Find groups of points that are similar

  186. What counts as a group?

  187. What makes things similar enough to group together?

  188. None
  189. A connected component of a super-level-set of the probability density function of the underlying (and unknown) distribution from which our data samples are drawn.
  190. None
  191. None
  192. None
  193. None
  194. None
  195. How can we do such a thing in practice?

  196. HDBSCAN

  197. We need an effective density function over sample points

  198. None
  199. None
  200. None
  201. Connected components are something persistent homology could tell us about

  202. Can we weave density and connectedness together?

  203. None
  204. Mutual Reachability Distance
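
The slide only names the distance, so the sketch below uses the definition commonly given for HDBSCAN: the mutual reachability distance between a and b is the largest of core(a), core(b) and d(a, b), where core(x) is the distance from x to its k-th nearest neighbour. Whether a point counts as its own neighbour varies between implementations. Here it does not, and the function names are illustrative.

```python
import math

def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (excluding itself)."""
    cores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        cores.append(dists[k])  # dists[0] is the distance from the point to itself
    return cores

def mutual_reachability(points, k):
    """Matrix of max(core(a), core(b), d(a, b)), with zeros on the diagonal."""
    cores = core_distances(points, k)
    n = len(points)
    return [
        [0.0 if i == j else max(cores[i], cores[j], math.dist(points[i], points[j]))
         for j in range(n)]
        for i in range(n)
    ]

# Three points in a dense run and one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
mreach = mutual_reachability(points, k=2)
```

Points in sparse regions have large core distances, so they are pushed away from everything else, while distances inside dense regions are left almost unchanged.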

  205. None
  206. None
  207. None
  208. None
  209. Applications

  210. Untangling the Galaxy. I. Local Structure and Star Formation History of the Milky Way, Kounkel and Covey, 2019

  211. Unsupervised star, galaxy, QSO classification: Application of HDBSCAN, Logan and Fotopoulou, 2019

  212. Manifold learning of four-dimensional scanning transmission electron microscopy, Li et al., 2019; Machine learning for the structure–energy–property landscapes of molecular crystals, Musil et al., 2017

  213. Uncovering Large-Scale Conformational Change in Molecular Dynamics without Prior Knowledge, Melvin et al., 2016

  214. Unsupervised clustering of temporal patterns in high-dimensional neuronal ensembles using a novel dissimilarity measure, Grossberger et al., 2018; Computational analysis of laminar structure of the human cortex based on local neuron features, Štajduhar et al., 2019

  215. Parallels in the sequential organization of birdsong and human speech, Sainburg et al., 2019
  216. https://www.esri.com/arcgis-blog/products/geoanalytics-server/analytics/geoanalytics-detect-delays-public-transit/

  217. Our Shared Digital Future: Building an Inclusive, Trustworthy and Sustainable Digital Society, World Economic Forum Insight report, 2018
  218. Implementation

  219. If we have $N$ points then there are $O(N^2)$ 1-simplices and persistent homology computations will be worst case $O((N^2)^3) = O(N^6)$
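
Spelled out, the count behind this bound is as follows, assuming only simplices up to dimension 1 are needed (which suffices for connected components) and writing $m$ for the number of simplices fed to the cubic-time reduction:

```latex
m \;=\; \underbrace{N}_{\text{vertices}} + \underbrace{\binom{N}{2}}_{\text{edges}}
  \;=\; N + \frac{N(N-1)}{2} \;=\; O(N^2),
\qquad
O(m^3) \;=\; O\!\big((N^2)^3\big) \;=\; O(N^6).
```

For example, $N = 1000$ points already give $\binom{1000}{2} = 499{,}500$ edges.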
  220. Specializations described before can get it down to $O(N^4)$

  221. Can we do any better?

  222. Connected components of a weighted graph: minimum spanning trees
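
The link between components and spanning trees can be seen in a short sketch. It runs Kruskal's algorithm with a union-find structure on the complete Euclidean graph, which is a simple quadratic stand-in and not the dual-tree Boruvka method cited on the following slides. The function name is illustrative.

```python
import math
from itertools import combinations

def minimum_spanning_tree(points):
    """Kruskal's algorithm on the complete Euclidean graph, using union-find."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    tree = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components
            parent[root_i] = root_j
            tree.append((weight, i, j))
    return tree

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0), (20.0, 0.0)]
tree = minimum_spanning_tree(points)
merge_radii = [weight for weight, _, _ in tree]  # [1.0, 1.0, 4.0, 14.0]
```

Each tree edge weight is a radius at which two connected components merge. That is the same information the 0-dimensional persistence pairs carry, obtained without building a boundary matrix.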

  223. None
  224. None
  225. Spatial Indexing (Ram, Lee, Ouyang, Gray 2009)

  226. Dual Tree Boruvka for Euclidean Minimum Spanning Trees (March, Ram, Gray 2010; Curtin, March, Ram, Anderson, Gray, Isbell 2013): $O(\max\{c^6, c_p^2, c_l^2\}\, N \log(N)\, \alpha(N))$, where $c$, $c_p$ and $c_l$ are data dependent constants and $\alpha$ is the inverse Ackermann function
  227. With a little finessing this can provide the core for HDBSCAN in time $O(N \log(N)\, \alpha(N))$
  228. https://github.com/scikit-learn-contrib/hdbscan

  229. Dimension Reduction

  230. Find the “latent” features in your data

  231. None
  232. None
  233. Matrix Factorization; Neighbour Graphs

  234. Matrix Factorization: Principal Component Analysis, Non-negative Matrix Factorization, Latent Dirichlet Allocation, Word2Vec, GloVe, Generalised Low Rank Models, Linear Autoencoder

  235. Neighbour Graphs: Locally Linear Embedding, Laplacian Eigenmaps, Hessian Eigenmaps, Local Tangent Space Alignment, t-SNE, UMAP, Isomap, JSE
  236. PCA is the prototypical matrix factorization
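
To illustrate the factorization view, the sketch below finds the leading principal direction by power iteration on the covariance matrix. A production implementation would normally use an SVD routine. The function name and the toy data are assumptions. The centred data is then approximated by the outer product of a score vector and a direction vector.

```python
import math

def first_principal_component(rows, iterations=200):
    """Leading principal direction and scores via power iteration on the covariance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[c] for r in rows) / n for c in range(d)]
    centred = [[r[c] - means[c] for c in range(d)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Scores: coordinates of each centred row along the direction v.
    scores = [sum(r[c] * v[c] for c in range(d)) for r in centred]
    return v, scores

# Points lying exactly on the line y = 2x, already centred at the origin.
rows = [(float(t), 2.0 * t) for t in (-2, -1, 0, 1, 2)]
direction, scores = first_principal_component(rows)
```

For this data the rank-one product `scores[i] * direction[c]` reproduces every entry, so one latent feature explains two observed columns.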

  237. PCA on MNIST digits

  238. PCA on Fashion MNIST

  239. t-SNE is the current state of the art for neighbour graphs

  240. t-SNE on MNIST digits

  241. t-SNE on Fashion MNIST

  242. UMAP

  243. UMAP builds mathematical theory to justify the graph-based approach

  244. Theorem 1 (Nerve theorem). Let $\mathcal{U} = \{U_i\}_{i \in I}$ be

    a cover of a topological space $X$. If, for all $\sigma \subset I$, $\bigcap_{i \in \sigma} U_i$ is either contractible or empty, then $\mathcal{N}(\mathcal{U})$ is homotopically equivalent to $X$.
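
As a small illustration of the nerve construction named in the theorem, the sketch below builds the nerve of a finite cover: one vertex per cover element, and one simplex for every collection of cover elements with a non-empty common intersection. The function name `nerve`, the dict-of-sets input format, and the toy cover are assumptions made for illustration and do not come from the talk. The integers stand in for points sampled along three overlapping arcs of a circle, and the contractibility hypothesis is not checked.

```python
from itertools import combinations

def nerve(cover, max_dim=2):
    """Return the simplices of the nerve of a finite cover.

    cover: dict mapping a label to a set of points (hypothetical input format).
    A tuple of labels is a simplex when the corresponding cover elements
    have a non-empty common intersection.
    """
    labels = sorted(cover)
    simplices = []
    for size in range(1, max_dim + 2):  # a k-simplex has k + 1 vertices
        for combo in combinations(labels, size):
            common = set.intersection(*(cover[label] for label in combo))
            if common:
                simplices.append(combo)
    return simplices

# Toy cover: eight sample points around a circle, covered by three arcs.
circle_cover = {
    "A": {0, 1, 2, 3},
    "B": {3, 4, 5, 6},
    "C": {6, 7, 0},
}
circle_nerve = nerve(circle_cover)
print(circle_nerve)
```

The result has three vertices and three edges but no filled triangle, since the three arcs have no common point. That hollow triangle has the same shape, up to homotopy, as the circle being covered.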
  245. None
  246. None
  247. None
  248. If the data is uniformly distributed on the manifold then

    the cover will be “good”
  249. None
  250. Vary the notion of distance according to the density
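
One way to read this slide is that distances around each point are rescaled by a local length scale, so that dense and sparse regions look alike from the inside. The sketch below uses the distance to the k-th nearest neighbour as that scale. This is a deliberate simplification for illustration (UMAP's own normalization is more refined), and the names `local_scales` and `scaled_distance` are hypothetical.

```python
import math

def local_scales(points, k):
    """Distance from each point to its k-th nearest neighbour (excluding itself)."""
    scales = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scales.append(dists[k - 1])
    return scales

def scaled_distance(points, scales, i, j):
    """Distance from point i to point j, in units of point i's local scale.

    This is not symmetric: each point carries its own notion of distance.
    """
    return math.dist(points[i], points[j]) / scales[i]

# Toy data: a dense clump near the origin and a sparse clump far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (10.0, 10.0), (12.0, 10.0), (10.0, 12.0)]
scales = local_scales(points, k=2)
print(scaled_distance(points, scales, 0, 1))  # neighbour inside the dense clump
print(scaled_distance(points, scales, 3, 4))  # neighbour inside the sparse clump
```

After rescaling, a near neighbour sits at roughly unit distance in both clumps, even though the raw distances differ by a factor of twenty. The price is that the distance from point i to point j no longer equals the distance from j to i.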

  251. None
  252. We can do this all formally using some category theory

    sleight-of-hand
  253. Suppose we were given a low dimensional representation

  254. We can apply the same process to get a probabilistic

    graph!
  255. Now measure the distance between the graphs using cross-entropy

    and optimize
  256. $\sum_{a \in A} \mu(a) \log\left(\frac{\mu(a)}{\nu(a)}\right) + (1 - \mu(a)) \log\left(\frac{1 - \mu(a)}{1 - \nu(a)}\right)$
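
The cross-entropy between the two graphs can be evaluated directly once each graph is stored as a map from edges to membership strengths. The sketch below assumes that representation: `mu` holds the strengths from the high dimensional graph and `nu` those from the low dimensional one. The function name `fuzzy_cross_entropy`, the `eps` clamp, and the toy values are illustrative assumptions, not taken from the talk.

```python
import math

def fuzzy_cross_entropy(mu, nu, eps=1e-12):
    """Cross-entropy between two fuzzy edge sets defined over the same edges.

    mu, nu: dicts mapping an edge to a membership strength in [0, 1].
    eps clamps values away from 0 and 1 so the logarithms stay finite
    (a numerical convenience, not part of the formula).
    """
    total = 0.0
    for edge, m in mu.items():
        m = min(max(m, eps), 1.0 - eps)
        n = min(max(nu[edge], eps), 1.0 - eps)
        total += m * math.log(m / n) + (1.0 - m) * math.log((1.0 - m) / (1.0 - n))
    return total

# Toy example: three edges among points a, b, c.
high_dim = {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "c"): 0.5}
good_layout = {("a", "b"): 0.8, ("a", "c"): 0.2, ("b", "c"): 0.5}
bad_layout = {("a", "b"): 0.1, ("a", "c"): 0.9, ("b", "c"): 0.5}

print(fuzzy_cross_entropy(high_dim, high_dim))
print(fuzzy_cross_entropy(high_dim, good_layout))
print(fuzzy_cross_entropy(high_dim, bad_layout))
```

The value is zero when the two graphs agree and grows as the low dimensional strengths drift away from the high dimensional ones, which is what makes it usable as an optimization target.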
  257. We are just embedding the graph

  258. $\sum_{a \in A} \mu(a) \log\left(\frac{\mu(a)}{\nu(a)}\right) + (1 - \mu(a)) \log\left(\frac{1 - \mu(a)}{1 - \nu(a)}\right)$

    First term: get the clumps right. Second term: get the gaps right.
  259. UMAP on MNIST digits

  260. UMAP on Fashion MNIST

  261. Applications

  262. Exploring Neural Networks with Activation Atlases, Carter et al, 2019

  263. Visualizing and Measuring the Geometry of BERT, Coenen et al,

    2019
  264. The single-cell transcriptional landscape of mammalian organogenesis, Cao et al,

    2019
  265. A lineage-resolved molecular atlas of C. elegans embryogenesis at single

    cell resolution, Packer et al, 2019
  266. Dimensionality reduction for visualizing single-cell data using UMAP, Becht et

    al, 2019
  267. UMAP reveals cryptic population structure and phenotype heterogeneity in large

    genomic cohorts, Diaz-Papkovich et al, 2019
  268. OpenSyllabus Galaxy, McClure, 2019

  269. Modeling the Structure of Recent Philosophy, Noichl, 2019

  270. TimeCluster: dimension reduction applied to temporal data for visual analytics,

    Ali et al, 2019
  271. Identifying galaxies, quasars and stars with machine learning: a new

    catalogue of classifications for 111 million SDSS sources without spectra, Clark et al, 2019
  272. None
  273. Implementation

  274. Algorithm has two hard components:

    1. Find near neighbours
    2. Optimize according to the cross-entropy
  275. Near neighbours via NN-Descent
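
The idea behind NN-Descent is that a neighbour of a neighbour is likely to be a neighbour too, so a random initial graph can be improved by repeatedly checking those candidates. The sketch below is a stripped-down version of that loop in pure Python. It omits the sampling, new/old flags and other refinements that practical implementations rely on, and the function name `nn_descent` and its parameters are assumptions for illustration rather than any library's API.

```python
import math
import random

def nn_descent(points, k, n_iters=10, seed=0):
    """Approximate k-nearest-neighbour graph via a basic NN-Descent loop.

    Returns a list whose entry i is a sorted list of (distance, index) pairs.
    Requires k < len(points).
    """
    rng = random.Random(seed)
    n = len(points)
    # Start from k random neighbours for every point.
    graph = []
    for i in range(n):
        others = rng.sample([j for j in range(n) if j != i], k)
        graph.append(sorted((math.dist(points[i], points[j]), j) for j in others))

    for _ in range(n_iters):
        # Treat edges as undirected when collecting candidates.
        adjacent = [set() for _ in range(n)]
        for i in range(n):
            for _, j in graph[i]:
                adjacent[i].add(j)
                adjacent[j].add(i)
        updates = 0
        for i in range(n):
            current = {j for _, j in graph[i]}
            # Candidates are neighbours of neighbours not already in the list.
            candidates = set()
            for j in adjacent[i]:
                candidates |= adjacent[j]
            candidates -= current
            candidates.discard(i)
            for c in candidates:
                d = math.dist(points[i], points[c])
                if d < graph[i][-1][0]:
                    graph[i][-1] = (d, c)  # evict the current worst neighbour
                    graph[i].sort()
                    updates += 1
        if updates == 0:  # converged: no list changed in this pass
            break
    return graph

# Toy data: 60 random points in the unit square.
data_rng = random.Random(1)
data = [(data_rng.random(), data_rng.random()) for _ in range(60)]
knn = nn_descent(data, k=5)
print(knn[0])
```

The result is approximate: each list only ever improves, but nothing guarantees it reaches the exact nearest neighbours. The appeal is that each pass touches only a small set of candidates per point rather than all pairs.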

  276. Optimization via Stochastic Gradient Descent with negative sampling
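
A minimal sketch of this optimization follows, under simplifying assumptions. The low dimensional similarity is taken to be 1 / (1 + d^2), whereas UMAP fits a more general curve. Only the head of each edge is moved, so edges should be listed in both directions. The names `attract`, `repel` and `optimize_layout` are hypothetical. Each sampled edge gets one attractive step from the first term of the cross-entropy, and a few randomly chosen points supply repulsive steps from the second term in place of summing over all non-edges.

```python
import math
import random

def attract(yi, yj, lr):
    """Gradient step on -log(nu), with nu = 1 / (1 + d^2): pulls yi towards yj."""
    d2 = sum((a - b) ** 2 for a, b in zip(yi, yj))
    coeff = 2.0 / (1.0 + d2)
    return [a - lr * coeff * (a - b) for a, b in zip(yi, yj)]

def repel(yi, yj, lr, eps=1e-3):
    """Gradient step on -log(1 - nu): pushes yi away from yj."""
    d2 = sum((a - b) ** 2 for a, b in zip(yi, yj))
    coeff = 2.0 / ((eps + d2) * (1.0 + d2))
    # Clip each coordinate's move so near-coincident points do not explode.
    return [a + max(-4.0, min(4.0, lr * coeff * (a - b))) for a, b in zip(yi, yj)]

def optimize_layout(edges, n_points, n_epochs=200, n_negative=3, lr=0.1, seed=0):
    """SGD over edges with negative sampling.

    edges: list of (i, j, weight) with weight in (0, 1]; an edge is used
    in an epoch with probability equal to its weight.
    """
    rng = random.Random(seed)
    y = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_points)]
    for epoch in range(n_epochs):
        step = lr * (1.0 - epoch / n_epochs)  # linearly decaying learning rate
        for i, j, w in edges:
            if rng.random() > w:
                continue
            y[i] = attract(y[i], y[j], step)
            # Negative sampling: random points stand in for the non-edges.
            for _ in range(n_negative):
                neg = rng.randrange(n_points)
                if neg != i:
                    y[i] = repel(y[i], y[neg], step)
    return y

# Toy graph: two fully connected triangles with no edges between them.
clusters = [(0, 1, 2), (3, 4, 5)]
edges = [(i, j, 1.0) for group in clusters for i in group for j in group if i != j]
layout = optimize_layout(edges, n_points=6)
print(layout)
```

Negative sampling keeps the cost per epoch proportional to the number of edges instead of the number of point pairs, which matters because the neighbour graph is sparse.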

  277. Performance Comparison

    Dataset         t-SNE        UMAP         UMAP speed-up over t-SNE
    COIL20          20 seconds   7 seconds    3x
    MNIST           22 minutes   98 seconds   13x
    Fashion MNIST   15 minutes   78 seconds   11x
    GoogleNews      4.5 hours    14 minutes   19x
  278. https://github.com/lmcinnes/umap

  279. Conclusions

  280. Understanding your data is critical for any analysis

  281. Understanding data is an unsupervised learning problem

  282. Unsupervised Inter-relationships Topology

  283. Topological techniques provide powerful methods for exploratory data analysis

  284. TDA Session: Saturday 8:30-10:30 and 16:00-18:00