pycon_delhi_lightening

News classification with Gensim
Devashish Deshpande, undergraduate student
RaRe Technologies Incubator Program
GitHub: dsquareindia
Blogs: https://rare-technologies.com/blog/

Gensim: Topic modeling in Python

Problem of News (mis)classification

Screenshots from Play Newsstand

Topic-word coloring with LDA
Image taken from the LDA paper by David Blei

What is a good LDA model?
• Come up with good topics
• Infer topic distribution
Football LDA model:
(United topic): mourinho, red_devils, old_trafford, bad_team, ...
(Arsenal topic): wenger, henry, invincibles, ...
(City topic): aguero, etihad, england, premier_league
(Chelsea topic): blues, football, roman, bridge, ...

Evaluating topic models
• Manually (qualitative)
– Look at the topics. See if they are interpretable.
– Compare different topic models

Topic Coherence
• Quantitative

Topic Coherence
• Assign a number to the human interpretability!
• Comparing topic models becomes much easier
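To make "assign a number" concrete, here is a minimal pure-Python sketch of one common coherence measure, a UMass-style score based on document co-occurrence counts. It is a simplified illustration, not Gensim's implementation: the function name `umass_coherence`, the smoothing constant `eps`, the averaging over word pairs, and the toy documents are all assumptions made for this example. Gensim offers a `u_mass` option through its `CoherenceModel` class for real use.

```python
import math

def umass_coherence(topic_words, documents, eps=1.0):
    """Average log((D(w_i, w_j) + eps) / D(w_j)) over ordered word pairs.

    topic_words: top words of one topic, most probable first.
    documents:   iterable of tokenized documents.
    D(...) counts the documents that contain all of the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    scores = []
    for i in range(1, len(topic_words)):
        for j in range(i):
            w_i, w_j = topic_words[i], topic_words[j]
            d_j = doc_freq(w_j)
            if d_j == 0:
                continue  # skip words that never occur in the reference documents
            scores.append(math.log((doc_freq(w_i, w_j) + eps) / d_j))
    return sum(scores) / len(scores) if scores else 0.0

# Toy reference documents, invented for illustration.
docs = [
    ["wenger", "henry", "invincibles"],
    ["wenger", "henry", "football"],
    ["aguero", "etihad", "football"],
    ["blues", "roman", "bridge"],
]

coherent_topic = ["wenger", "henry", "invincibles"]  # words that co-occur
mixed_topic = ["wenger", "aguero", "bridge"]         # words that never co-occur

coherent_score = umass_coherence(coherent_topic, docs)
mixed_score = umass_coherence(mixed_topic, docs)
print(coherent_score, mixed_score)
```

A higher score means the topic's words tend to appear in the same documents, so two models can be compared by comparing the scores of their topics.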

Topic Coherence
• Better LDA -> Better topics -> Better classification
Topics from the topic modeling tutorial on the Lee corpus

Join the community!
• Pick up issues from: https://github.com/RaRe-Technologies/gensim
• Come for the sprint!
