๏ 2003: LDA (the same method was independently found in population genetics [Pritchard+ 200])
๏ 2003-: Extensions of LDA
๏ 2007-: Scalable algorithms
History of the topic models
For each document:
1. Randomly choose a distribution over topics.
2. For each word in the document:
   a) Randomly choose a topic from the distribution over topics chosen in step 1.
   b) Randomly choose a word from the corresponding topic.
๏ “Document” is a set of “Words”.
๏ “Document” consists of multiple “Topics”.
๏ “Topic” is a distribution over a vocabulary (all possible words).
๏ “Words” are generated by “Topics”.
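The generative process above can be sketched in a few lines of Python. This is only a minimal illustration, not a full LDA implementation: the toy vocabulary, the two hand-written topics, the document length, and the Dirichlet parameter `alpha` are all assumptions made for the example, and `generate_corpus` is a hypothetical helper name. The per-document distribution over topics is drawn from a Dirichlet distribution, as the name Latent Dirichlet Allocation suggests.

```python
import numpy as np


def generate_corpus(n_docs, doc_len, topics, alpha, rng):
    """Sample documents by following the generative process step by step.

    topics: (K, V) array; each row is a topic, i.e. a distribution over V words.
    alpha:  (K,) Dirichlet parameter for the per-document topic distribution
            (an assumed value, chosen for illustration).
    """
    n_topics, vocab_size = topics.shape
    docs, thetas = [], []
    for _ in range(n_docs):
        # Step 1: randomly choose a distribution over topics.
        theta = rng.dirichlet(alpha)
        words = []
        for _ in range(doc_len):
            # Step 2a: choose a topic from the distribution chosen in step 1.
            z = rng.choice(n_topics, p=theta)
            # Step 2b: choose a word from the corresponding topic.
            w = rng.choice(vocab_size, p=topics[z])
            words.append(int(w))
        docs.append(words)
        thetas.append(theta)
    return docs, np.array(thetas)


# Toy setup (assumed): 6 words and 2 topics with disjoint support.
vocab = ["gene", "dna", "cell", "data", "model", "learning"]
topics = np.array([
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "biology"-like topic
    [0.0, 0.0, 0.0, 0.3, 0.3, 0.4],  # a "machine learning"-like topic
])

rng = np.random.default_rng(0)
docs, thetas = generate_corpus(
    n_docs=3, doc_len=8, topics=topics, alpha=np.array([0.5, 0.5]), rng=rng
)
for theta, words in zip(thetas, docs):
    print(np.round(theta, 2), " ".join(vocab[w] for w in words))
```

Each printed row shows one document's topic proportions followed by its sampled words. Fitting LDA amounts to running this process in reverse: given only the words, infer the topics and the per-document topic proportions.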
[Figure: geometric interpretation of LDA in the word simplex. Recoverable labels: "a distribution over topics"; "Step 2a: Choose a topic from distribution"; "Step 2b: Choose a word from topic"; "Topic: in word simplex"; "sub-simplex".]
LDA: finding the optimal sub-simplex to represent documents.
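The geometric picture can be checked numerically with a small sketch. The numbers below are assumptions chosen for illustration (two topics over a three-word vocabulary), and the least-squares step is not how LDA is actually fitted; it only shows that a document's word distribution is a convex combination of the topics and therefore lies in the sub-simplex they span.

```python
import numpy as np

# Two topics over a 3-word vocabulary: two points in the word simplex (assumed values).
topics = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
])

# One document's distribution over topics (assumed values).
theta = np.array([0.25, 0.75])

# Marginal word distribution of the document: a convex combination of the topics.
doc_word_dist = theta @ topics

# It is itself a point in the word simplex: non-negative and summing to 1.
assert np.all(doc_word_dist >= 0)
assert np.isclose(doc_word_dist.sum(), 1.0)

# Because the point lies in the sub-simplex spanned by the topics, its
# coordinates inside that sub-simplex (theta) can be recovered exactly.
recovered_theta, *_ = np.linalg.lstsq(topics.T, doc_word_dist, rcond=None)
assert np.allclose(recovered_theta, theta)

print("word distribution:", doc_word_dist)
print("recovered topic proportions:", recovered_theta)
```

With K topics and V words, every document is described by a point in a (K-1)-dimensional sub-simplex instead of the full (V-1)-dimensional word simplex, which is the sense in which LDA looks for a sub-simplex that represents the documents well.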