America's Next Topic Model Lightning talk (5 mins)

Presented at PyData London, 5 July 2016

Lev Konstantinovskiy

July 05, 2016

Transcript

  1. The questions I get asked all the time:
    - Why is your hair blue?
    - How to choose the best Topic Model?
  2. Business Problem solved by Topic Modelling
    Bird’s eye view of internal company documents.
    Drill down into individual documents by topic, rather than just keywords!
  3. Colouring words in Gensim
    bow_water = ['bank', 'water', 'river', 'tree']
    color_words(goodLdaModel, bow_water)
    bank river water tree
    color_words(badLdaModel, bow_water)
    bank river water tree
    River bank or financial bank?
  4. Automated model selection
    See "Reading Tea Leaves: How Humans Interpret Topic Models" by Chang, Boyd-Graber et al.
    Model fit ≠ Human opinion
  5. Topic coherence = human opinion
    Coherence is how often the topic words appear ‘together’ in the corpus.
    Many ways to define ‘together’ - ‘c_v’ is the best one.
    from gensim.models import CoherenceModel
    goodcm = CoherenceModel(model=goodLdaModel, texts=texts, dictionary=dictionary, coherence='c_v')
    print(goodcm.get_coherence())
    0.552164532134
    badcm = CoherenceModel(model=badLdaModel, texts=texts, dictionary=dictionary, coherence='c_v')
    print(badcm.get_coherence())
    0.5269189184
  6. Summary: How to choose your next Topic Model
    Manually:
    - Colour words
    - pyLDAvis
    Automatically:
    - Topic coherence c_v
  7. Why is your hair blue?
    Trying to fit in at PyCon in Portland, Oregon
  8. Lev Konstantinovskiy @teagermylk
    Topic coherence by our incubator student Devashish Deshpande.
    Word colouring by our Google Summer of Code student Bhargav Srinivasa.
    See you at PyCon UK Sprints! Monday 19 September
  9. Topic Model of Harry Potter
    1. (the Muggle topic) 50% “Muggle”, 25% “Dursley”, 10% “Privet”, 5% “Mudblood”...
    2. (the Voldemort topic) 65% “Voldemort”, 12% “Death”, 10% “Horcrux”, 5% “Snake”…
    3. (the Harry topic) 42% “Harry Potter”, 15% “Scar”, 7% “Quidditch”, 7% “Gryffindor”…
  10. Topic Model of Harry Potter
    Chapter 1 of Book 1: introduces the Dursley family and has Dumbledore discuss the death of Harry’s parents.
    - 40% Muggle topic
    - 30% Voldemort topic
    - 30% Harry
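
Slide 5 describes coherence as how often the topic words appear ‘together’ in the corpus. As a rough illustration of that idea only, the sketch below scores a topic by the average pairwise Jaccard overlap of the sets of documents containing each word. This is a much cruder measure than the c_v coherence recommended in the talk, and the function name cooccurrence_coherence, the toy corpus and both word lists are hypothetical; none of them come from the slides or from Gensim.

```python
from itertools import combinations


def cooccurrence_coherence(topic_words, texts):
    """Average pairwise Jaccard overlap of the documents containing each word.

    Illustrative stand-in for 'appearing together'; NOT the c_v measure.
    """
    docs = [set(doc) for doc in texts]
    # For every topic word, collect the indices of documents that contain it.
    doc_ids = {
        word: {i for i, doc in enumerate(docs) if word in doc}
        for word in topic_words
    }
    scores = []
    for w1, w2 in combinations(topic_words, 2):
        union = doc_ids[w1] | doc_ids[w2]
        both = doc_ids[w1] & doc_ids[w2]
        # Jaccard overlap; defined as 0.0 when neither word occurs anywhere.
        scores.append(len(both) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy corpus: two 'river' documents and two 'finance' documents.
toy_texts = [
    ["river", "bank", "water", "tree"],
    ["river", "water", "tree"],
    ["bank", "money", "loan"],
    ["money", "loan", "bank", "interest"],
]

# Hypothetical top words for a coherent topic and a mixed-up one.
good_topic = ["river", "water", "tree"]
bad_topic = ["river", "money", "tree"]

good_score = cooccurrence_coherence(good_topic, toy_texts)
bad_score = cooccurrence_coherence(bad_topic, toy_texts)
print(good_score, bad_score)
```

On this toy corpus the coherent word list scores higher than the mixed one, which is the behaviour a coherence measure is meant to capture. On a real corpus, Gensim's CoherenceModel with coherence='c_v', as shown on slide 5, is the tool the talk recommends.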