Ensemble Topic Modelling

A short lightning talk on ensemble topic modelling with pLSA using the enstop package.

Leland McInnes

July 12, 2019

Transcript

  1. Ensemble Topic Modelling (Leland McInnes)

  2. Model a corpus of documents in terms of underlying “topics”

  3. Topic Modelling as Matrix Factorization

  4–7. (Image-only slides; no transcript text.)

  8. LDA and pLSA are probabilistic matrix factorization methods

  9. (Ensembles of) pLSA

  10. Performance?

  11. (Image-only slide; no transcript text.)

  12. Quality?

  13. (Image-only slide; no transcript text.)

  14. Instability?

  15. These are hard optimization problems

  16. Topics vary from one run to another

  17. What are the stable topics? (Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282)

  18. (Image-only slide; no transcript text.)

  19. Each cluster represents a stable topic

  20. (Image-only slide; no transcript text.)

  21. • Greater stability
      • Determines the number of topics automatically
      • Embarrassingly parallel computation

  22. Implementation

  23. sklearn API

  24. (Image-only slide; no transcript text.)

  25. https://github.com/lmcinnes/enstop

  26. pip install enstop
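
For slide 3 ("Topic Modelling as Matrix Factorization"), the usual formulation is given below. It is a general textbook statement, not taken from the slides. A corpus of n documents over a vocabulary of m words becomes a non-negative document-term count matrix X, and k topics correspond to the inner dimension of a low-rank factorization:

```latex
X \in \mathbb{R}_{\ge 0}^{n \times m}, \qquad
X \approx W H, \qquad
W \in \mathbb{R}_{\ge 0}^{n \times k}, \quad
H \in \mathbb{R}_{\ge 0}^{k \times m}, \quad
k \ll m
```

Row i of W gives document i's weights on the k topics, and row j of H describes topic j as weights over the vocabulary. Non-negative matrix factorization fits W and H directly. The probabilistic methods on the later slides (pLSA, LDA) give these rows the meaning of probability distributions.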
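
Slides 8 and 9 describe pLSA as a probabilistic matrix factorization method. The standard pLSA model and its EM updates are written out below. This is the textbook formulation, not code or notation from enstop. Here n(d, w) is the count of word w in document d, and z ranges over the k topics:

```latex
\begin{align*}
P(w \mid d) &= \sum_{z} P(z \mid d)\, P(w \mid z) \\
\mathcal{L} &= \sum_{d} \sum_{w} n(d, w) \log \sum_{z} P(z \mid d)\, P(w \mid z) \\
P(z \mid d, w) &= \frac{P(z \mid d)\, P(w \mid z)}{\sum_{z'} P(z' \mid d)\, P(w \mid z')}
  && \text{(E-step)} \\
P(w \mid z) &= \frac{\sum_{d} n(d, w)\, P(z \mid d, w)}{\sum_{w'} \sum_{d} n(d, w')\, P(z \mid d, w')}
  && \text{(M-step)} \\
P(z \mid d) &= \frac{\sum_{w} n(d, w)\, P(z \mid d, w)}{\sum_{w} n(d, w)}
  && \text{(M-step)}
\end{align*}
```

Stack P(z | d) into an n × k matrix and P(w | z) into a k × m matrix. Their product is the matrix of P(w | d), so the row-normalized document-term matrix is approximated by a product of two row-stochastic factors. LDA keeps the same factorization but places Dirichlet priors on both factors. EM only reaches a local maximum of the log-likelihood, which is one reason topics can differ between runs.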
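
Slides 14 to 16 note that topics vary from one run to another. One way to quantify this is to match each topic from one run to its nearest topic from another run. The distance used here is the Hellinger distance between the two topics' word distributions. This metric is an illustrative choice, not necessarily what the talk or enstop uses. The two small "runs" below are made-up data:

```python
import numpy as np


def hellinger_matrix(A, B):
    """Pairwise Hellinger distances between the rows of A (k1 x m) and B (k2 x m).

    Every row is assumed to be a probability distribution over the vocabulary.
    """
    # Bhattacharyya coefficient sum_i sqrt(p_i * q_i) for every pair of rows.
    bc = np.sqrt(A) @ np.sqrt(B).T
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.clip(1.0 - bc, 0.0, None))


def best_matches(A, B):
    """For each topic (row) of A: index of its nearest topic in B and the distance."""
    d = hellinger_matrix(A, B)
    idx = d.argmin(axis=1)
    return idx, d[np.arange(A.shape[0]), idx]


# Two made-up "runs" over a 4-word vocabulary.
run_a = np.array([[0.7, 0.2, 0.1, 0.0],
                  [0.0, 0.1, 0.2, 0.7]])
run_b = np.array([[0.0, 0.1, 0.2, 0.7],       # same as run_a[1], different position
                  [0.25, 0.25, 0.25, 0.25]])  # a topic with no counterpart in run_a

idx, dist = best_matches(run_a, run_b)
```

Here run_a[1] finds an exact partner at a distance of essentially 0. By contrast, run_a[0] only reaches the uniform topic, at a distance of about 0.45, which signals a topic that did not reproduce across runs.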
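
Slides 17 to 21 pool the topics from many runs, cluster them, and treat each cluster as a stable topic. The slides do not say which clustering enstop uses. This sketch therefore swaps in a simple single-linkage grouping on Hellinger distance. The `threshold` and `min_runs` parameters are hypothetical choices made for illustration, and each stable topic is taken as the renormalized mean of its group:

```python
import numpy as np


def pairwise_hellinger(P):
    """Hellinger distances between all pairs of rows of P (each row sums to 1)."""
    s = np.sqrt(P)
    return np.sqrt(np.clip(1.0 - s @ s.T, 0.0, None))


def stable_topics(runs, threshold=0.2, min_runs=None):
    """Pool the topics from several runs, group near-duplicates, keep recurring groups.

    runs      : list of (k_r x m) arrays, one per run; rows are topic distributions.
    threshold : Hellinger distance below which two topics are linked (hypothetical).
    min_runs  : a group must contain topics from at least this many distinct runs
                (hypothetical; defaults to a strict majority of the runs).
    """
    if min_runs is None:
        min_runs = len(runs) // 2 + 1
    topics = np.vstack(runs)
    run_id = np.concatenate([np.full(len(r), i) for i, r in enumerate(runs)])
    close = pairwise_hellinger(topics) < threshold

    # Union-find over the "close" graph: groups are its connected components
    # (single-linkage grouping).
    parent = list(range(len(topics)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in zip(*np.nonzero(close)):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    roots = np.array([find(i) for i in range(len(topics))])
    stable = []
    for root in np.unique(roots):
        members = roots == root
        # Keep a group only if it recurs across enough independent runs.
        if len(np.unique(run_id[members])) >= min_runs:
            centroid = topics[members].mean(axis=0)
            stable.append(centroid / centroid.sum())
    return np.array(stable)


t1 = np.array([0.6, 0.3, 0.1, 0.0, 0.0])
t2 = np.array([0.0, 0.0, 0.1, 0.3, 0.6])
runs = [
    np.array([t1, t2, np.full(5, 0.2)]),          # run 0 also has a spurious topic
    np.array([t2, t1]),                           # run 1: same topics, other order
    np.array([[0.55, 0.35, 0.1, 0.0, 0.0], t2]),  # run 2: slightly perturbed t1
]
topics = stable_topics(runs)
```

The spurious topic appears in only one of the three runs, so it is dropped, leaving two stable topics. The number of topics therefore comes out of the grouping instead of being fixed in advance. Single linkage is easy to read but can chain distinct topics together through intermediate ones; a density-based clustering is a common, more robust alternative.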
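
The third bullet on slide 21 calls the computation embarrassingly parallel: each ensemble member is fitted independently of the others. The sketch below shows that structure with a thread pool. It is not how enstop is necessarily parallelized, which the slides do not describe. `toy_fit` is a hypothetical stand-in for a single topic-model fit, and each run draws all of its randomness from its own seeded generator:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def run_ensemble(X, fit, n_runs=8, n_workers=4, base_seed=0):
    """Fit `n_runs` independent models, each with its own random seed.

    `fit(X, rng)` is any single-run fitting function (e.g. one pLSA fit) that draws
    all of its randomness from `rng`. The runs share no state, so they can be
    farmed out to a pool of workers in any order.
    """
    def one_run(seed):
        return fit(X, np.random.default_rng(seed))

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in seed order, regardless of completion order.
        return list(pool.map(one_run, range(base_seed, base_seed + n_runs)))


def toy_fit(X, rng, n_topics=3):
    """Hypothetical stand-in for one model fit: random row-normalised topics."""
    topics = rng.random((n_topics, X.shape[1]))
    return topics / topics.sum(axis=1, keepdims=True)


X = np.ones((10, 6))
results = run_ensemble(X, toy_fit, n_runs=5, n_workers=2)
again = run_ensemble(X, toy_fit, n_runs=5, n_workers=2)
```

Per-run seeds make the ensemble reproducible no matter which worker finishes first. Threads keep the sketch simple and work well when each fit spends its time in NumPy routines that release the GIL. For fits dominated by pure-Python code, a `ProcessPoolExecutor` (with the fit function defined at module level) is the usual substitute.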
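
Slide 23 says the implementation follows the sklearn API. The estimator below illustrates what that convention usually means:

- hyperparameters are stored in `__init__`;
- `fit` returns `self`;
- learned parameters live in trailing-underscore attributes such as `components_`;
- `transform` and `fit_transform` map documents to topic weights.

It is a minimal dense-matrix pLSA in NumPy that applies the standard EM updates. The class name, attribute names, and defaults are this sketch's own choices; enstop's actual classes and parameters may differ, so see https://github.com/lmcinnes/enstop for the real interface.

```python
import numpy as np


class PLSA:
    """Minimal pLSA estimator following scikit-learn conventions (sketch only).

    Expects a dense (n_docs x n_words) array of non-negative word counts.
    """

    def __init__(self, n_components=10, n_iter=100, random_state=None):
        # As in scikit-learn, __init__ only stores hyperparameters.
        self.n_components = n_components
        self.n_iter = n_iter
        self.random_state = random_state

    @staticmethod
    def _normalize(a):
        # Make every row sum to 1 (rows that are all zero stay zero).
        return a / np.maximum(a.sum(axis=1, keepdims=True), 1e-12)

    def _em(self, X, doc_topic, topic_word, update_topics):
        for _ in range(self.n_iter):
            # ratio[d, w] = n(d, w) / P(w | d) under the current parameters.
            ratio = X / np.maximum(doc_topic @ topic_word, 1e-12)
            # sum_w n(d, w) P(z | d, w), computed without materialising P(z | d, w).
            new_doc_topic = doc_topic * (ratio @ topic_word.T)
            if update_topics:
                # sum_d n(d, w) P(z | d, w), normalised over w to give P(w | z).
                topic_word = self._normalize(topic_word * (doc_topic.T @ ratio))
            doc_topic = self._normalize(new_doc_topic)
        return doc_topic, topic_word

    def fit_transform(self, X, y=None):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)
        doc_topic = self._normalize(rng.random((X.shape[0], self.n_components)))
        topic_word = self._normalize(rng.random((self.n_components, X.shape[1])))
        doc_topic, self.components_ = self._em(X, doc_topic, topic_word, True)
        return doc_topic  # P(z | d), shape (n_docs, n_components)

    def fit(self, X, y=None):
        self.fit_transform(X)
        return self

    def transform(self, X):
        # "Fold in" new documents: re-estimate P(z | d) with P(w | z) held fixed.
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)
        doc_topic = self._normalize(rng.random((X.shape[0], self.n_components)))
        doc_topic, _ = self._em(X, doc_topic, self.components_, False)
        return doc_topic


X = np.array([[5, 4, 0, 0],
              [4, 5, 1, 0],
              [0, 0, 5, 4],
              [0, 1, 4, 5]])
model = PLSA(n_components=2, n_iter=200, random_state=0).fit(X)
doc_topics = model.transform(X)
```

Because it mirrors the scikit-learn interface, an estimator like this can be dropped into code written for `sklearn.decomposition.NMF` or `LatentDirichletAllocation`. Full compatibility (`get_params`, `clone`, pipelines with parameter search) would additionally require subclassing `sklearn.base.BaseEstimator`.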