Lightning talk delivered at PyCon India 2016
News classification with Gensim
RaRe Technologies Incubator Program
Gensim: Topic modeling in Python
Problem of News (mis)classification
Screenshots from Google Play Newsstand
Topic-word coloring with LDA
Image taken from the LDA paper by David Blei
What is a good LDA model?
Come up with good topics
Infer topic distribution
(United topic): mourinho, red_devils, old_trafford, bad_team, ...
(Arsenal topic): wenger, henry, invincibles, ...
(City topic): aguero, etihad, england, premier_league
(Chelsea topic): blues, football, roman, bridge, ...
Football LDA model
Evaluating topic models
– Look at the topics. See if they are interpretable.
– Comparing different topic models
Assign a number to human interpretability!
Comparing topic models becomes much easier
Better LDA -> Better topics -> Better classification
Topics from topic modeling tutorial on Lee corpus
Join the community!
Pick up issues from:
Come for the sprint!