Slide 1

Slide 1 text

Visualizing Topic Models
Ben Mabey
@bmabey

Slide 2

Slide 2 text


Slide 3

Slide 3 text

Latent Dirichlet Allocation (LDA)

Slide 4

Slide 4 text

Latent Dirichlet Allocation (LDA)

Document-Topic Distributions

         0      1      …   k
doc a    0.25   0.14   …   0.02
doc b    0.01   0.30   …   0.09
…        …      …      …   0.31
doc D    0.13   0.07   …   0.01

Slide 5

Slide 5 text

Latent Dirichlet Allocation (LDA)

Document-Topic Distributions

         0      1      …   k
doc a    0.25   0.14   …   0.02
doc b    0.01   0.30   …   0.09
…        …      …      …   0.31
doc D    0.13   0.07   …   0.01

Term-Topic Distributions

         0       1       …   k
bird     0.002   0.01    …   0.004
coffee   0.001   0.003   …   0.009
…        …       …       …   0.031
work     0.002   0.006   …   0.021
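
How the two tables fit together can be sketched in a few lines. This is a minimal illustration, not the talk's code: the three topics and every probability below are made up, and the term-topic table is stored transposed (one dict per topic) for convenience. Under LDA, the probability of a term in a document is the topic-weighted mix of the term-topic probabilities.

```python
# Toy stand-ins for the two LDA outputs. Every number is invented for
# illustration; none of them come from the slides.
doc_topic = {                      # P(topic | doc): each row sums to 1
    "doc a": [0.7, 0.2, 0.1],
    "doc b": [0.1, 0.3, 0.6],
}
topic_term = [                     # P(term | topic): one dict per topic, sums to 1
    {"bird": 0.6, "coffee": 0.3, "work": 0.1},
    {"bird": 0.1, "coffee": 0.7, "work": 0.2},
    {"bird": 0.2, "coffee": 0.2, "work": 0.6},
]


def term_prob(doc, term):
    """P(term | doc) = sum over topics of P(topic | doc) * P(term | topic)."""
    return sum(p_topic * topic_term[k][term]
               for k, p_topic in enumerate(doc_topic[doc]))


for term in ("bird", "coffee", "work"):
    print(term, round(term_prob("doc a", term), 2))
```

Because both inputs are proper distributions, the resulting term probabilities for a document also sum to 1.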

Slide 6

Slide 6 text


Slide 7

Slide 7 text

250k+ stories
July 2007 - May 2014

Slide 8

Slide 8 text

250k+ stories
July 2007 - May 2014
POS tagging w/spaCy

Slide 9

Slide 9 text

250k+ stories
July 2007 - May 2014
POS tagging w/spaCy
Phrase detection w/Gensim

Slide 10

Slide 10 text

250k+ stories
July 2007 - May 2014
POS tagging w/spaCy
Phrase detection w/Gensim
Stopword removal & only kept nouns or phrases with nouns

Slide 11

Slide 11 text

250k+ stories
July 2007 - May 2014
POS tagging w/spaCy
Phrase detection w/Gensim
Stopword removal & only kept nouns or phrases with nouns
Fit LDA models varying the number of topics
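
The filtering step ("stopword removal & only kept nouns or phrases with nouns") might look roughly like the sketch below. It is not the talk's pipeline. It assumes tagging and phrase detection have already run, and it uses a hypothetical representation in which each unit is a list of (text, pos) pairs with Universal POS tags such as spaCy exposes via `token.pos_`. The stopword list, the choice to count PROPN as a noun, and the underscore join are all assumptions made for illustration.

```python
STOPWORDS = {"the", "a", "as", "by", "of"}   # tiny illustrative list
NOUN_TAGS = {"NOUN", "PROPN"}                # assumption: proper nouns count as nouns


def keep_noun_terms(units):
    """Filter units, where each unit is a list of (text, pos) pairs.

    A one-element unit is a single token; a longer unit is a detected phrase.
    """
    kept = []
    for unit in units:
        words = [text.lower() for text, _ in unit]
        if len(unit) == 1 and words[0] in STOPWORDS:
            continue                          # drop stopword tokens
        if any(pos in NOUN_TAGS for _, pos in unit):
            kept.append("_".join(words))      # keep nouns and phrases with a noun
    return kept


# Hand-tagged example; a real run would take the tags from the tagger.
units = [
    [("Game", "NOUN")],
    [("written", "VERB")],
    [("by", "ADP")],
    [("Angry", "ADJ"), ("Birds", "PROPN")],
    [("top", "ADJ")],
    [("iphone", "PROPN"), ("app", "NOUN")],
]
print(keep_noun_terms(units))   # ['game', 'angry_birds', 'iphone_app']
```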

Slide 12

Slide 12 text

Game written by 14 year old passes Angry Birds as the top free iphone app

Slide 13

Slide 13 text

Game written by 14 year old passes Angry Birds as the top free iphone app

Document-Topic Distribution

Topic   P(T|D)
58      0.19
38      0.14
16      0.06
…       …

Slide 14

Slide 14 text

Game written by 14 year old passes Angry Birds as the top free iphone app

Document-Topic Distribution

Topic   P(T|D)
58      0.19
38      0.14
16      0.06
…       …

Sorted Topic-Term Distributions

58          38           16
app         game         language
developer   player       code
mobile      video game   programming
user        gaming       java
app store   developer    programmer
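
Both tables on this slide are sorted views of a probability distribution. A minimal sketch of that sort follows. The P(T|D) values are the ones on the slide. The term probabilities for topic 58 are hypothetical, invented only so that the sort has something to work on.

```python
doc_topic = {58: 0.19, 38: 0.14, 16: 0.06}   # P(T|D) values shown on the slide

# Hypothetical P(term | topic 58) values, invented to make the sort concrete.
topic_58 = {"user": 0.010, "app": 0.040, "app store": 0.008,
            "developer": 0.025, "mobile": 0.015}


def top_n(dist, n):
    """Return the n highest-probability keys, most probable first."""
    return sorted(dist, key=dist.get, reverse=True)[:n]


print(top_n(doc_topic, 2))   # [58, 38]
print(top_n(topic_58, 5))    # ['app', 'developer', 'mobile', 'user', 'app store']
```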

Slide 15

Slide 15 text

Game written by 14 year old passes Angry Birds as the top free iphone app

Document-Topic Distribution

Topic         P(T|D)
mobile apps   0.19
38            0.14
16            0.06
…             …

Sorted Topic-Term Distributions

mobile apps   38           16
app           game         language
developer     player       code
mobile        video game   programming
user          gaming       java
app store     developer    programmer

Table 2

58 mobile apps   38 video games   16 programming
app              game             language
developer        player           code
application      video game       programming
user             gaming           java
app store        developer        programmer
mobile           play             programming language

Slide 16

Slide 16 text

Game written by 14 year old passes Angry Birds as the top free iphone app

Document-Topic Distribution

Topic         P(T|D)
mobile apps   0.19
video games   0.14
16            0.06
…             …

Sorted Topic-Term Distributions

mobile apps   video games   16
app           game          language
developer     player        code
mobile        video game    programming
user          gaming        java
app store     developer     programmer

Table 2

58 mobile apps   38 video games   16 programming
app              game             language
developer        player           code
application      video game       programming
user             gaming           java
app store        developer        programmer
mobile           play             programming language

Slide 17

Slide 17 text

Game written by 14 year old passes Angry Birds as the top free iphone app

Document-Topic Distribution

Topic         P(T|D)
mobile apps   0.19
video games   0.14
programming   0.06
…             …

Sorted Topic-Term Distributions

mobile apps   video games   programming
app           game          language
developer     player        code
mobile        video game    programming
user          gaming        java
app store     developer     programmer

Table 2

58 mobile apps   38 video games   16 programming
app              game             language
developer        player           code
application      video game       programming
user             gaming           java
app store        developer        programmer
mobile           play             programming language

Slide 18

Slide 18 text

Interpreting Topic Models
What is the meaning of each topic?

Slide 19

Slide 19 text

Interpreting Topic Models
What is the meaning of each topic?
How prevalent is each topic?

Slide 20

Slide 20 text

Interpreting Topic Models
What is the meaning of each topic?
How prevalent is each topic?
How do the topics relate to each other?

Slide 21

Slide 21 text

Interpreting Topic Models
What is the meaning of each topic?
How prevalent is each topic?
How do the topics relate to each other?
How do the documents relate to each other?

Slide 22

Slide 22 text

Visualizing Topic Models
https://de.dariah.eu/tatom/topic_model_visualization.html

Slide 25

Slide 25 text

Visualizing Topic Models
Please don’t…
https://dhs.stanford.edu/algorithmic-literacy/using-word-clouds-for-topic-modeling-results/

Slide 26

Slide 26 text

LDAvis
https://github.com/cpsievert/LDAvis

Slide 27

Slide 27 text

pyLDAvis
https://github.com/bmabey/pyLDAvis

Slide 29

Slide 29 text

Demo Time!

Slide 30

Slide 30 text

Distinctiveness & Saliency

measure how much information a term conveys about topics

Termite: Visualization Techniques for Assessing Textual Topic Models
Jason Chuang, Christopher D. Manning and Jeffrey Heer. 2012

Slide 31

Slide 31 text

Distinctiveness & Saliency

                   coding   tech news   video games   distinctiveness   P(w)   saliency
game               10       10          50            0.03              0.28   0.01
apple              20       40          20            -0.16             0.32   -0.05
angry birds        1        1           30            0.25              0.13   0.03
python             50       5           10            0.17              0.26   0.05
TOTAL              81       56          110

P(T|game)          0.14     0.14        0.71
P(T|apple)         0.25     0.50        0.25
P(T|angry birds)   0.03     0.03        0.94
P(T|python)        0.77     0.08        0.15
P(T)               0.33     0.23        0.45

Slide 32

Slide 32 text

Distinctiveness & Saliency

                   coding   tech news   video games   distinctiveness   P(w)   saliency
game               10       10          50            0.03              0.28   0.01
apple              20       40          20            -0.16             0.32   -0.05
angry birds        1        1           30            0.25              0.13   0.03
python             50       5           10            0.17              0.26   0.05
TOTAL              81       56          110

P(T|game)          0.14     0.14        0.71
P(T|apple)         0.25     0.50        0.25
P(T|angry birds)   0.03     0.03        0.94
P(T|python)        0.77     0.08        0.15
P(T)               0.33     0.23        0.45

distinctiveness: computes the KL divergence between the distribution of topics given a term and the marginal distribution of topics

Slide 35

Slide 35 text

Distinctiveness & Saliency

                   coding   tech news   video games   distinctiveness   P(w)   saliency
game               10       10          50            0.03              0.28   0.01
apple              20       40          20            -0.16             0.32   -0.05
angry birds        1        1           30            0.56              0.13   0.07
python             50       5           10            0.17              0.26   0.05
TOTAL              81       56          110

P(T|game)          0.14     0.14        0.71
P(T|apple)         0.25     0.50        0.25
P(T|angry birds)   0.03     0.03        0.94
P(T|python)        0.77     0.08        0.15
P(T)               0.33     0.23        0.45

distinctiveness: computes the KL divergence between the distribution of topics given a term and the marginal distribution of topics

Slide 36

Slide 36 text

Distinctiveness & Saliency

                   coding   tech news   video games   distinctiveness   P(w)   saliency
game               10       10          50            0.15              0.28   0.04
apple              20       40          20            0.18              0.32   0.06
angry birds        1        1           30            0.56              0.13   0.07
python             50       5           10            0.41              0.26   0.11
TOTAL              81       56          110

P(T|game)          0.14     0.14        0.71
P(T|apple)         0.25     0.50        0.25
P(T|angry birds)   0.03     0.03        0.94
P(T|python)        0.77     0.08        0.15
P(T)               0.33     0.23        0.45

distinctiveness: computes the KL divergence between the distribution of topics given a term and the marginal distribution of topics

Slide 37

Slide 37 text

Distinctiveness & Saliency

                   coding   tech news   video games   distinctiveness   P(w)   saliency
game               10       10          50            0.15              0.28   0.04
apple              20       40          20            0.18              0.32   0.06
angry birds        1        1           30            0.56              0.13   0.07
python             50       5           10            0.41              0.26   0.11
TOTAL              81       56          110

P(T|game)          0.14     0.14        0.71
P(T|apple)         0.25     0.50        0.25
P(T|angry birds)   0.03     0.03        0.94
P(T|python)        0.77     0.08        0.15
P(T)               0.33     0.23        0.45

distinctiveness: computes the KL divergence between the distribution of topics given a term and the marginal distribution of topics
saliency: distinctiveness weighted by the term's overall frequency
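
The two measures can be recomputed from the raw counts in the table. The sketch below is an illustration rather than the code used by LDAvis or pyLDAvis. It assumes the natural logarithm, which reproduces the rounded figures in the completed table, and takes P(w) as the term's share of all counts.

```python
import math

# Term counts per topic, copied from the table (coding, tech news, video games).
counts = {
    "game":        [10, 10, 50],
    "apple":       [20, 40, 20],
    "angry birds": [1, 1, 30],
    "python":      [50, 5, 10],
}

topic_totals = [sum(col) for col in zip(*counts.values())]   # [81, 56, 110]
grand_total = sum(topic_totals)                              # 247
p_topic = [t / grand_total for t in topic_totals]            # P(T)


def distinctiveness(term):
    """KL divergence (natural log) between P(T|term) and the marginal P(T)."""
    row = counts[term]
    n = sum(row)
    return sum((c / n) * math.log((c / n) / p)
               for c, p in zip(row, p_topic) if c > 0)


def saliency(term):
    """Distinctiveness weighted by the term's overall frequency P(w)."""
    return sum(counts[term]) / grand_total * distinctiveness(term)


for term in counts:
    print(f"{term:12s} {distinctiveness(term):.2f} {saliency(term):.2f}")
```

"angry birds" is the most distinctive term because almost all of its mass sits in one topic, yet "python" ends up more salient because it is both fairly distinctive and twice as frequent.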

Slide 40

Slide 40 text

Distinctiveness & Saliency
measure how much information a term conveys about topics…

Slide 41

Slide 41 text

Distinctiveness & Saliency
measure how much information a term conveys about topics… globally

Slide 42

Slide 42 text


Slide 43

Slide 43 text

Thank you!
Learn more at http://github.com/bmabey/pyLDAvis
Ben Mabey
@bmabey
http://nbviewer.ipython.org/github/bmabey/hacker_news_topic_modelling/