OpenTalks.AI - Alexey Burnakov, News Topic Modelling Based on Citation Detection

February 21, 2020


Transcript

  1. NEWS TOPIC MODELLING BASED ON CITATION DETECTION Alexey Burnakov, TASS

  2. ITAR-TASS www.tass.ru 115th birthday It's been a while…

  3. News Media Market: A Complex Graph

  4. Citation Detection

  5. News Specific Citation Detection Personal data Headline Editorial Cite number Cite index Citing media Date News rating Editor rating Board rating Organization rating

  6. Methods I Cosine similarity Bag of words / tf-idf Generalized linear models

  7. Methods II PageRank Random-Walk Graph Partitioning

  8. Citation Detection: results. Logistic regression output: precision = 0.89, recall = 0.87, F1 score = 0.88, MCC score = 0.88, AUC = 0.998. We did well on the training dataset

  9. PageRank: results. TOP-25 of the Russian Mass Media

  10. NLP Pipeline Raw text Tokenization Who cites TASS Which news was cited Topic modelling Customer facing

  11. Topic Modelling I Motivation: Are there big topics today? Notre-Dame de Paris is on fire :(

  12. Topic Modelling II Airbus emergency landing Motivation: Are there big topics today?

  13. Topic Modelling III Flood in the Irkutsk Region :( `Losharik` Submarine deadly accident :( Motivation: Are there big topics today?

  14. Topic Report Ex-Kyrgyz president Atambaev seizure by special forces

  15. Competition Snapshot Which agency did a good job?

  16. Daily Competition Snapshot Who is the hero of the day? Personal data Personal data Personal data Personal data Personal data

  17. Daily Competition Snapshot https://www.gazeta.ru/politics/2019/08/08_a_12564199.shtml

  18. THANK YOU!
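
Slide 6 names cosine similarity over bag-of-words / tf-idf vectors as a building block for citation detection. The slides do not show how the features were computed, so the following is only a minimal sketch under assumptions: the whitespace tokenizer, the smoothed idf formula, and the three example headlines are made up for illustration and are not taken from the talk.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace split; a production pipeline would
    # use a proper tokenizer and likely lemmatization.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf keeps weights positive even for very common terms.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
             for t, c in tf.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical texts: an agency item, an outlet that reuses it, an unrelated item.
docs = [
    "airbus makes emergency landing near moscow",
    "agency reports airbus emergency landing near moscow",
    "flood hits irkutsk region",
]
vectors = tfidf_vectors(docs)
sim_quote = cosine_similarity(vectors[0], vectors[1])
sim_other = cosine_similarity(vectors[0], vectors[2])
print(sim_quote, sim_other)
```

A similarity score like this could serve as one input feature to the generalized linear model mentioned on the same slide; that pairing is an assumption about the design, not something the slides spell out.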
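
Slides 7 and 9 mention PageRank applied to the media citation graph. As a rough illustration only, here is a power-iteration sketch in which an edge points from the citing outlet to the cited one. The damping factor 0.85, the handling of outlets that cite nobody, and the outlet names A to D are assumptions for the example, not details from the talk.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    links maps each citing outlet to the list of outlets it cites.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Outlets with no outgoing citations spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical citation graph: A, B and D cite C; D also cites A.
citations = {"A": ["C"], "B": ["C"], "D": ["C", "A"]}
ranks = pagerank(citations)
top_outlet = max(ranks, key=ranks.get)
print(top_outlet, ranks)
```

Sorting outlets by the resulting score is one way a ranking such as the TOP-25 on slide 9 could be produced.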
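
The metrics on slide 8 can be sanity-checked against their definitions. The sketch below recomputes F1 from the reported precision and recall and shows how MCC is obtained from confusion-matrix counts. The slides do not give the counts, so the counts passed to `mcc` here are purely hypothetical.

```python
import math


def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mcc(tp, fp, fn, tn):
    # Matthews correlation coefficient from confusion-matrix counts.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Precision and recall as reported on slide 8.
reported_f1 = f1_score(0.89, 0.87)
print(round(reported_f1, 2))

# Hypothetical counts, only to show the call; not data from the talk.
print(mcc(tp=5, fp=0, fn=0, tn=5))
```

The recomputed F1 rounds to the 0.88 shown on the slide, so the three reported values are mutually consistent.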