America's Next Topic Model Lightning talk (5 mins)

Presented at PyData London, 5 July 2016

Lev Konstantinovskiy

July 05, 2016

Transcript

  1. America’s Next Topic Model. Lev Konstantinovskiy, Community Manager at Gensim. @teagermylk, http://rare-technologies.com/
  2. Streaming Topic Modelling and Word2vec in Python

  3. The questions I get asked all the time:
    - Why is your hair blue?
    - How to choose the best Topic Model?
  4. Business problem solved by Topic Modelling: a bird’s eye view of internal company documents. Drill down into individual documents by topic, rather than just by keywords!
  5. From the Latent Dirichlet Allocation paper by David M. Blei: words colored according to their topic.
  6. Colouring words in Gensim
    bow_water = ['bank', 'water', 'river', 'tree']
    color_words(goodLdaModel, bow_water)
    bank river water tree
    color_words(badLdaModel, bow_water)
    bank river water tree
    River bank or financial bank?
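The idea behind word colouring can be sketched without Gensim. The snippet below is only an illustration, not the `color_words` function from the talk: the topic names, probabilities, colours and the helper `colour_words_sketch` are all hypothetical. It colours each word by the topic that maximises p(topic | document) × p(word | topic), which is one simple way to let the surrounding document disambiguate a word such as "bank".

```python
# Hypothetical per-topic word probabilities; a trained LDA model would supply these.
TOPIC_WORD_PROBS = {
    "nature": {"bank": 0.10, "water": 0.30, "river": 0.35, "tree": 0.25},
    "finance": {"bank": 0.40, "money": 0.35, "loan": 0.25},
}
# Hypothetical colour assigned to each topic.
TOPIC_COLOURS = {"nature": "blue", "finance": "red"}


def colour_words_sketch(words, doc_topic_mix, topic_word_probs, topic_colours):
    """Return (word, colour) pairs, colouring each word by its most likely topic.

    The score of a topic for a word is p(topic | document) * p(word | topic).
    Words that no topic can emit are coloured "black".
    """
    coloured = []
    for word in words:
        scores = {
            topic: doc_topic_mix.get(topic, 0.0) * probs.get(word, 0.0)
            for topic, probs in topic_word_probs.items()
        }
        best_topic = max(scores, key=scores.get)
        colour = topic_colours[best_topic] if scores[best_topic] > 0 else "black"
        coloured.append((word, colour))
    return coloured


bow_water = ["bank", "water", "river", "tree"]
# Assumed topic mix for a document that is mostly about nature.
river_doc_mix = {"nature": 0.9, "finance": 0.1}
for word, colour in colour_words_sketch(
    bow_water, river_doc_mix, TOPIC_WORD_PROBS, TOPIC_COLOURS
):
    print(word, colour)
```

With a nature-heavy document mix, "bank" takes the same colour as "river" and "water"; with a finance-heavy mix it would switch colour, which is the kind of difference the slide uses to tell a good model from a bad one.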
  7. Automated model selection. See "Reading Tea Leaves: How Humans Interpret Topic Models" by Chang, Boyd-Graber et al. Model fit vs. human opinion.
  8. Topic coherence = human opinion. Coherence is how often the topic words appear ‘together’ in the corpus. There are many ways to define ‘together’; ‘c_v’ is the best one.
    from gensim.models import CoherenceModel
    goodcm = CoherenceModel(model=goodLdaModel, texts=texts, dictionary=dictionary, coherence='c_v')
    print(goodcm.get_coherence())
    0.552164532134
    badcm = CoherenceModel(model=badLdaModel, texts=texts, dictionary=dictionary, coherence='c_v')
    print(badcm.get_coherence())
    0.5269189184
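To make "appear together" concrete, here is a deliberately simplified stand-in for a coherence measure. It is not `c_v` and not Gensim's implementation: the function `cooccurrence_coherence`, the toy corpus and the two example topics are all made up for illustration. For every pair of topic words it takes the share of documents containing both words among the documents containing at least one of them, then averages over pairs.

```python
from itertools import combinations


def cooccurrence_coherence(topic_words, texts):
    """Average pairwise document co-occurrence of the given topic words.

    For each word pair: (# documents with both) / (# documents with either).
    This is a toy measure, much simpler than c_v.
    """
    doc_sets = [set(doc) for doc in texts]
    scores = []
    for w1, w2 in combinations(topic_words, 2):
        both = sum(1 for d in doc_sets if w1 in d and w2 in d)
        either = sum(1 for d in doc_sets if w1 in d or w2 in d)
        scores.append(both / either if either else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy tokenised corpus (hypothetical).
texts = [
    ["river", "bank", "water"],
    ["water", "tree", "river"],
    ["bank", "money", "loan"],
    ["loan", "money", "interest"],
]
good_topic = ["river", "water", "tree"]  # words that share documents
bad_topic = ["river", "money", "tree"]   # words from unrelated documents

good_score = cooccurrence_coherence(good_topic, texts)
bad_score = cooccurrence_coherence(bad_topic, texts)
print(good_score, bad_score)
```

On this toy corpus the topic whose words share documents scores higher than the mixed one, which mirrors how the coherence scores on the slide rank the good model above the bad one.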
  9. Summary: how to choose your next Topic Model.
    Manually:
    - Colour words
    - pyLDAvis
    Automatically:
    - Topic coherence C_v
  10. Why is your hair blue? Trying to fit in at PyCon in Portland, Oregon.
  11. Lev Konstantinovskiy, @teagermylk. Topic coherence by our incubator student Devashish Deshpande. Word colouring by our Google Summer of Code student Bhargav Srinivasa. See you at PyCon UK Sprints! Monday 19 September.
  12. Topic Model of Harry Potter
    1. (the Muggle topic) 50% “Muggle”, 25% “Dursley”, 10% “Privet”, 5% “Mudblood”...
    2. (the Voldemort topic) 65% “Voldemort”, 12% “Death”, 10% “Horcrux”, 5% “Snake”…
    3. (the Harry topic) 42% “Harry Potter”, 15% “Scar”, 7% “Quidditch”, 7% “Gryffindor”…
  13. Topic Model of Harry Potter. Chapter 1 of Book 1 introduces the Dursley family and has Dumbledore discuss the death of Harry’s parents.
    - 40% Muggle topic
    - 30% Voldemort topic
    - 30% Harry
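The two Harry Potter slides describe a topic model as two layers: each topic is a distribution over words, and each document is a mixture of topics. A minimal sketch of how the layers combine is below. It uses the percentages from the slides, assumes the chapter's last 30% belongs to the Harry topic, and makes the simplifying assumption that any word not listed for a topic has probability 0 there. The helper name `word_probability` is hypothetical.

```python
# Top-word probabilities taken from the slide. Words not listed for a topic
# are treated as probability 0.0, which is a simplifying assumption.
TOPICS = {
    "Muggle": {"Muggle": 0.50, "Dursley": 0.25, "Privet": 0.10, "Mudblood": 0.05},
    "Voldemort": {"Voldemort": 0.65, "Death": 0.12, "Horcrux": 0.10, "Snake": 0.05},
    "Harry": {"Harry Potter": 0.42, "Scar": 0.15, "Quidditch": 0.07, "Gryffindor": 0.07},
}

# Topic mixture of Chapter 1 of Book 1, as given on the slide.
CHAPTER_1_MIX = {"Muggle": 0.40, "Voldemort": 0.30, "Harry": 0.30}


def word_probability(word, doc_mix, topics):
    """p(word | document) = sum over topics of p(topic | document) * p(word | topic)."""
    return sum(
        weight * topics[topic].get(word, 0.0) for topic, weight in doc_mix.items()
    )


for word in ["Muggle", "Voldemort", "Scar"]:
    print(word, round(word_probability(word, CHAPTER_1_MIX, TOPICS), 3))
```

Under these assumptions "Muggle" comes out as the most likely of the three words in Chapter 1, because the chapter leans on the Muggle topic and that topic puts half its weight on the word.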