Parul Sethi | Visual Analysis of Topic Models

VISUALIZING TOPIC MODELS Parul Sethi @parulsethi @parul1sethi

Too much textual data. What do we look for? What is in them? How to navigate through them?

Bird’s eye view of documents
• Trafficking 13%
• Corruption 9%
• Drugs 12%
• Wars 25%
• Election 41%

How are the labels inferred?
A topic model gives us a set of words for each topic that it finds in the collection of documents. It looks through a corpus for clusters of co-occurring words and groups them together. In a good topic model, the words in a topic make sense together, e.g. “navy, ship, captain” and “tobacco, farm, crops.”
• Trafficking 13%: slavery, exploitation, victim, human
• Corruption 9%: bribe, scam, economy, illegal
• Drugs 12%: overdose, smoke, dealer, rave
• Wars 25%: army, killings, terrorist, bombs
• Election 41%: vote, party, campaign, candidate
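
As a rough illustration of "a set of words for each topic", the sketch below picks the most probable words from a topic-word probability table. The numbers and the helper name `top_words` are hypothetical and do not come from a fitted model; a person would still read the resulting word lists and choose a label for each topic.

```python
# Toy topic-word probabilities (hypothetical numbers, not from a fitted model).
topic_word_probs = {
    0: {"navy": 0.30, "ship": 0.25, "captain": 0.20, "farm": 0.02, "crops": 0.01},
    1: {"tobacco": 0.28, "farm": 0.26, "crops": 0.22, "ship": 0.03, "navy": 0.01},
}

def top_words(word_probs, n=3):
    """Return the n most probable words of one topic, highest first."""
    ranked = sorted(word_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]

# One short word list per topic; these lists are what a human labels.
topic_labels = {topic: top_words(probs) for topic, probs in topic_word_probs.items()}
for topic, words in topic_labels.items():
    print(topic, ", ".join(words))
```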

Topic Model Visualizations
• pyLDAvis
• Topic Difference
• Topic Networks
• Topic Dendrogram
• LDA projections

pyLDAvis
• The left panel shows the topics, positioned according to their inter-topic distances
• The right panel lists the terms that are most useful for interpreting the selected topic
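
The slide does not say how "most useful" is measured. The LDAvis paper defines a relevance score, λ·log p(w|t) + (1 − λ)·log(p(w|t) / p(w)), and the sketch below applies that formula to made-up probabilities. The term table and the helper names are hypothetical.

```python
import math

def relevance(p_term_given_topic, p_term, lam):
    """Relevance of a term to a topic; lam in [0, 1] weights probability against lift."""
    lift = p_term_given_topic / p_term
    return lam * math.log(p_term_given_topic) + (1 - lam) * math.log(lift)

# Hypothetical numbers: term -> (p(term | topic), p(term) in the whole corpus).
terms = {"said": (0.10, 0.09), "ship": (0.08, 0.01), "captain": (0.05, 0.005)}

def rank_terms(terms, lam):
    """Order terms from most to least relevant for a given lam."""
    return sorted(terms, key=lambda t: relevance(*terms[t], lam), reverse=True)

by_probability = rank_terms(terms, lam=1.0)  # pure p(term | topic)
by_lift = rank_terms(terms, lam=0.0)         # pure lift, favours topic-specific terms
print(by_probability, by_lift)
```

With λ = 1 the frequent but generic word "said" ranks first; with λ = 0 the rarer, topic-specific "captain" does.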

Topic Difference
• The heatmap represents the difference between the topics of two LDA models
• Each cell annotation shows the distance value (z), the intersecting words (+++) and the differing words (---) between topics (x, y) of the two models
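
A minimal sketch of what one heatmap cell could contain, assuming the distance is the Jaccard distance between the top-word sets of two topics. The slide does not name the metric, so this choice, the word lists, and the function name `topic_diff` are illustrative assumptions.

```python
def topic_diff(words_a, words_b):
    """Return (z, intersecting words, differing words) for two topics' top words."""
    a, b = set(words_a), set(words_b)
    union = a | b
    shared = a & b
    # Jaccard distance: 0.0 for identical word sets, 1.0 for disjoint ones.
    z = 1 - len(shared) / len(union) if union else 0.0
    return z, sorted(shared), sorted(a ^ b)

# Hypothetical top words for the topics of two models.
model_a = [["vote", "party", "campaign", "candidate"],
           ["army", "killings", "terrorist", "bombs"]]
model_b = [["vote", "party", "election", "candidate"],
           ["bribe", "scam", "economy", "illegal"]]

# diff[x][y] compares topic x of model_a with topic y of model_b.
diff = [[topic_diff(ta, tb) for tb in model_b] for ta in model_a]
z, plus, minus = diff[0][0]
print(round(z, 2), "+++", plus, "---", minus)
```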

Topic networks
• The nodes represent topics
• Edges represent connections between topics, created based on their distance
• A node’s label gives the topic number and its top 10 words
• An edge’s label gives the intersecting/differing words between the two connected topics
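
One way to read "edges created based on their distance" is to connect two topics only when their distance falls below a cutoff. The sketch below does that with a Jaccard distance over top words; the cutoff `max_distance`, the word lists, and the function names are hypothetical, and the talk may use a different metric or rule.

```python
from itertools import combinations

# Hypothetical top words per topic number (shortened to 4 words for brevity).
topics = {
    0: ["overdose", "smoke", "dealer", "illegal"],
    1: ["slavery", "victim", "dealer", "illegal"],
    2: ["vote", "party", "campaign", "candidate"],
}

def jaccard_distance(a, b):
    """1 minus the share of words the two sets have in common."""
    return 1 - len(a & b) / len(a | b)

def build_edges(topics, max_distance):
    """Return (topic_i, topic_j, distance, shared words) for close topic pairs."""
    edges = []
    for (i, words_i), (j, words_j) in combinations(sorted(topics.items()), 2):
        a, b = set(words_i), set(words_j)
        d = jaccard_distance(a, b)
        if d <= max_distance:
            edges.append((i, j, round(d, 3), sorted(a & b)))
    return edges

edges = build_edges(topics, max_distance=0.8)
print(edges)
```

Raising the cutoff adds more edges; lowering it leaves only the most similar topics connected.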

Topic Dendrogram
• The leaves represent topics
• The y-axis measures the closeness of individual topics or of their clusters. The lower the y value at the point of connection, the more closely the topics/clusters are connected
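
The y value at each connection point is the distance at which two topics or clusters are merged. The sketch below computes those merge heights with single-linkage agglomerative clustering over made-up pairwise distances. The linkage method is an assumption, since the slide does not state one, and the function name is hypothetical.

```python
def single_linkage(dist):
    """dist maps frozenset({i, j}) to the distance between leaves i and j.
    Returns the merges as (cluster_a, cluster_b, height), in merge order."""
    leaves = sorted({i for pair in dist for i in pair})
    clusters = [frozenset([i]) for i in leaves]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage: cluster distance is the closest pair of members.
                h = min(dist[frozenset((i, j))]
                        for i in clusters[x] for j in clusters[y])
                if best is None or h < best[0]:
                    best = (h, x, y)
        h, x, y = best
        merges.append((sorted(clusters[x]), sorted(clusters[y]), h))
        merged = clusters[x] | clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges

# Hypothetical distances between three topics.
dist = {frozenset((0, 1)): 0.2, frozenset((0, 2)): 0.9, frozenset((1, 2)): 0.7}
merges = single_linkage(dist)
print(merges)
```

Topics 0 and 1 join at height 0.2, so they are the most closely connected; topic 2 joins that cluster only at 0.7.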

But what about documents?
• Which topics does a document belong to?
• Which documents belong to this topic?
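
Both questions can be answered from a document-topic table, sketched here with hypothetical topic weights. The cutoff `min_prob` and the function names are illustrative assumptions, not part of the talk.

```python
# Hypothetical topic weights per document (each row sums to 1).
doc_topics = {
    "doc_a": [0.70, 0.25, 0.05],
    "doc_b": [0.10, 0.15, 0.75],
    "doc_c": [0.45, 0.05, 0.50],
}

def topics_of(doc_topics, doc_id, min_prob=0.3):
    """Topics a document belongs to, as (topic, weight), strongest first."""
    hits = [(t, p) for t, p in enumerate(doc_topics[doc_id]) if p >= min_prob]
    return sorted(hits, key=lambda tp: tp[1], reverse=True)

def docs_in(doc_topics, topic, min_prob=0.3):
    """Documents that belong to a topic, strongest first."""
    hits = [(d, w[topic]) for d, w in doc_topics.items() if w[topic] >= min_prob]
    return [d for d, _ in sorted(hits, key=lambda dp: dp[1], reverse=True)]

print(topics_of(doc_topics, "doc_c"))
print(docs_in(doc_topics, 0))
```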

LDA Projections
• Each point represents a document
• Distances are based on the difference between the topic distributions of documents
• We can also discover document clusters using t-SNE
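
To make "difference between topic distributions" concrete, the sketch below computes a pairwise distance matrix with the Hellinger distance. The metric is an assumption, since the slide does not name one, and the document weights are hypothetical. A matrix like this could then be handed to a projection method such as t-SNE, which is not shown here.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

# Hypothetical topic distributions for three documents.
docs = {
    "doc_a": [0.70, 0.25, 0.05],
    "doc_b": [0.10, 0.15, 0.75],
    "doc_c": [0.65, 0.30, 0.05],
}

names = sorted(docs)
# distances[i][j] is the distance between names[i] and names[j].
distances = [[hellinger(docs[a], docs[b]) for b in names] for a in names]

for name, row in zip(names, distances):
    print(name, [round(d, 3) for d in row])
```

Documents with similar topic mixes, such as doc_a and doc_c, end up much closer together than doc_a and doc_b.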

THANK YOU!

PyData London
