pycon_delhi_lightening

Lightning talk delivered at PyCon India 2016

Devashish Deshpande

September 24, 2016

Transcript

  1. News classification with Gensim
    Devashish Deshpande
    Undergraduate student
    RaRe Technologies Incubator Program
    GitHub: dsquareindia
    Blogs: https://rare-technologies.com/blog/

  2. Gensim: Topic modeling in Python

  3. Problem of News (mis)classification

  4. Screenshots from Google Play Newsstand

  5. Topic-word coloring with LDA
    Image taken from LDA paper by David Blei

  6. What is a good LDA model?

    Come up with good topics

    Infer topic distribution
    (United topic): mourinho, red_devils, old_trafford, bad_team, ...
    (Arsenal topic): wenger, henry, invincibles, ...
    (City topic): aguero, etihad, england, premier_league
    (Chelsea topic): blues, football, roman, bridge, ...
    Football LDA model

  7. Evaluating topic models

    Manually
    – Look at the topics. See if they are interpretable.
    – Compare different topic models
    Qualitative

  8. View Slide

  9. Topic Coherence

    Quantitative

  10. Topic Coherence

    Assign a number to human interpretability!
    Comparing topic models becomes much easier
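
To make "assign a number" concrete, here is a small pure-Python sketch of one coherence measure, UMass, which scores a topic by how often its top words appear together in the same documents. The slides do not say which measure the talk used, so the choice of UMass, the helper name `umass_coherence`, and the toy documents are all assumptions. Library implementations may smooth the counts differently.

```python
import math


def umass_coherence(topic_words, documents):
    """UMass-style coherence of one topic (higher is better).

    topic_words: top words of the topic, most probable first.
    documents:   iterable of token lists.
    Assumes every topic word occurs in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    # Pair each word with every word ranked above it in the topic.
    for m in range(1, len(topic_words)):
        for l in range(m):
            co_occurrences = doc_freq(topic_words[m], topic_words[l])
            score += math.log((co_occurrences + 1) / doc_freq(topic_words[l]))
    return score


# Toy, made-up documents (assumption).
documents = [
    ["wenger", "henry", "invincibles"],
    ["wenger", "henry", "arsenal"],
    ["mourinho", "old_trafford", "red_devils"],
    ["mourinho", "red_devils", "united"],
]

good = umass_coherence(["wenger", "henry", "invincibles"], documents)
mixed = umass_coherence(["wenger", "mourinho", "invincibles"], documents)
print(good, mixed)  # the interpretable topic gets the higher score
```

Words that never share a document pull the score down, so a topic mixing two clubs scores lower than one a reader would recognise as a single club.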

  11. Topic Coherence

    Better LDA -> Better topics -> Better classification
    Topics from the topic modeling tutorial on the Lee corpus

  12. Join the community!

    Pick up issues from:
    https://github.com/RaRe-Technologies/gensim

    Come for the sprint!
