Detecting Opinionated Articles in SmartNews

News content can be roughly divided into the categories of news and opinion. News articles attempt to provide information on a current event, while opinion pieces attempt to persuade readers to adopt a particular position on that event.

Our mission is to deliver the world's quality information to the people who need it. To accomplish this, we first need to understand the content. We aim to detect opinionated articles as one part of that understanding.

TianxiangZhang

May 30, 2019

Transcript

  1. About me

    • Tianxiang Zhang ( )
    • 2014/04 ~ 2018/05: Software Engineer, Yahoo Japan Corp.
    • 2018/06 ~: Software Engineer, SmartNews Inc.
    [Org chart labels: Mobile App, News Backend, Media, Engineering, Machine Learning, …., Data Science, Product Team, US Politics, …., JP Growth; My Squad, My Team, Me]
  2. Background

    Our mission is to deliver the world's quality information to the people who need it. To accomplish this, we first need to understand the content.

    Life of a news article: Crawl, Analyze, Index, Deliver. The Analyze stage covers:
    • Structure analysis: title, site name, author, thumbnail, languages, paragraphs ...
    • ML-based image analysis: face detection, ..
    • ML-based semantic analysis: categorization, named entities, polarity, keyword extraction, topic model, tagging, …
  3. Task Definition

    • News content can be roughly divided into the categories of news and opinion.
    • News articles attempt to provide information on a current event, while opinion pieces attempt to persuade readers to adopt a particular position on that event. (https://writingcommons.org/news-or-opinion)
    Binary Classification Task
  4. Task Definition

    • Top-down: have a clear and peculiar definition, then model, then evaluation
    • Bottom-up: use existing data, then model, then evaluation, then make the definition clear
  5. Data for Opinion Detection task

    • Positive samples: 30,000 political articles from the opinion category of some big publishers.
      - For example:
        https://edition.cnn.com/opinions
        https://www.nytimes.com/section/opinion
        https://newsela.com/articles/#/category/opinion
    • Negative samples: 45,000 political articles from the first block of the SmartNews Top channel.
      - This block usually aggregates articles with high ranking scores.
  6. Before: Barrier Between Research and Production

    • Science field: accuracy-focused
    • Engineering field: scalability-focused
    • Hard to convert between the two ☹
  7. After: When SmartNews meets SageMaker

    • Science field: accuracy-focused
    • Engineering field: scalability-focused
    • Easy to convert between the two!
  8. Model

    Model for article classification in SmartNews:
    • Input: title, paragraphs, …, etc.
    • Encoder: BERT / Transformer / LSTM / CNN / DAN
    • Dense layers, one stack per classification task
    • Outputs: Classification Task1, Classification Task2, Classification Task3
    Diagram legend: Frozen, Tunable, Task specific
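The layout on slide 8 (one encoder feeding several dense stacks, one per classification task) can be sketched roughly as below. This is a minimal illustration, not the production model: `toy_encoder` is a made-up stand-in for BERT/Transformer/LSTM/CNN/DAN, the task names other than "opinion" are hypothetical, the weights are random rather than trained, and it assumes the encoder is the shared part while the dense heads are task specific.

```python
import math
import random
from typing import Callable, Dict, List

Vector = List[float]


def toy_encoder(text: str, dim: int = 8) -> Vector:
    """Hypothetical stand-in for the shared encoder.

    Averages a deterministic pseudo-embedding per token, loosely in the
    spirit of a deep averaging network. It has no learned parameters.
    """
    tokens = text.lower().split()
    total = [0.0] * dim
    for token in tokens:
        rng = random.Random(token)  # string seeds are deterministic
        for i in range(dim):
            total[i] += rng.uniform(-1.0, 1.0)
    count = max(len(tokens), 1)
    return [value / count for value in total]


class DenseHead:
    """Task-specific head: a single dense layer with a sigmoid output."""

    def __init__(self, dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.weights = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        self.bias = 0.0

    def predict(self, features: Vector) -> float:
        z = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))


class MultiTaskClassifier:
    """One shared encoder, one dense head per classification task."""

    def __init__(self, encoder: Callable[[str], Vector],
                 task_names: List[str], dim: int = 8) -> None:
        self.encoder = encoder
        self.heads: Dict[str, DenseHead] = {
            name: DenseHead(dim, seed=i) for i, name in enumerate(task_names)
        }

    def predict(self, title: str, paragraphs: List[str]) -> Dict[str, float]:
        # The input is encoded once; every head reuses the same features.
        features = self.encoder(" ".join([title] + paragraphs))
        return {name: head.predict(features) for name, head in self.heads.items()}


# "task2" and "task3" are placeholder names for the other classification tasks.
model = MultiTaskClassifier(toy_encoder, ["opinion", "task2", "task3"])
scores = model.predict("Example headline", ["First paragraph.", "Second paragraph."])
print(scores)
```

Because the encoder output is computed once per article, adding a further classification task in this sketch only adds one more head.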
  9. Evaluation: Model Validation

    • Universal Sentence Encoder (Deep Averaging Network) + Dense layers
      Precision: 83.8%, Recall: 77.8%, F-measure: 80.7%
    • Universal Sentence Encoder-Large (Transformer) + Dense layers
      Precision: 80.2%, Recall: 82.9%, F-measure: 81.6%
    • Pre-trained word embeddings (Glove6B.300d) + Bi-directional LSTM + Dense layers
      Precision: 93.1%, Recall: 88.0%, F-measure: 90.5%
    • Pre-trained BERT (bert_uncased_L-12_H-768_A-12) + Dense layers
      Precision: 95.7%, Recall: 90.8%, F-measure: 93.2%
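For reference, the F-measure figures on slide 9 are consistent with the usual harmonic mean of precision and recall (F1). The sketch below assumes that definition; the slides do not state which F-measure variant was used, and the function name is only illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1), with inputs in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs taken from slide 9.
reported = {
    "USE (DAN) + Dense": (0.838, 0.778),
    "GloVe + BiLSTM + Dense": (0.931, 0.880),
    "BERT + Dense": (0.957, 0.908),
}
for name, (p, r) in reported.items():
    print(f"{name}: F-measure = {f_measure(p, r) * 100:.1f}%")
```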
  10. Evaluation: Model Validation

    • Deep: Universal Sentence Encoder (Deep Averaging Network) + Dense layers, on title and paragraphs
    • Wide: metadata of the article, as binary features
    • Precision: 95.6% (+17.8%), Recall: 95.1% (+14.4%), F-measure: 95.3% (+11.5%)
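One common way to combine a deep text representation with wide binary features, as on slide 10, is to concatenate both and apply a final linear layer with a sigmoid. The sketch below assumes that combination; the slides do not say how the two parts are merged, and the feature meanings, weights, and function name are hypothetical.

```python
import math
from typing import Sequence


def wide_and_deep_score(deep_features: Sequence[float],
                        wide_features: Sequence[int],
                        weights: Sequence[float],
                        bias: float) -> float:
    """Concatenate deep and wide features, then apply one logistic layer."""
    combined = list(deep_features) + [float(flag) for flag in wide_features]
    if len(combined) != len(weights):
        raise ValueError("weights must match the number of combined features")
    z = sum(w * x for w, x in zip(weights, combined)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical example: two deep features from the text encoder and two
# binary metadata flags (their meaning is invented for illustration).
score = wide_and_deep_score(
    deep_features=[0.0, 0.0],
    wide_features=[1, 0],
    weights=[1.0, 1.0, 2.0, -1.0],
    bias=-2.0,
)
print(score)
```

In this sketch the wide flags act directly on the final score, so a single strong metadata signal can shift the prediction without passing through the dense layers.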
  11. Evaluation: Edge cases

    1. Pieces reporting on a subject's opinion
       Eric Holder: If Buzzfeed Report is True ‘Congress Must Begin Impeachment Proceedings’
       https://www.mediaite.com/online/eric-holder-if-buzzfeed-report-is-true-congress-must-begin-impeachment-proceedings/
    2. A review of other writers' or politicians' opinions
       2018: The Year In Ideas: A Review Of Ideas
       https://www.huffingtonpost.com/entry/2018-the-year-in-ideas-a-review-of-ideas_us_5c242fc1e4b05c88b6fd6011
    3. A relatively straightforward account of events, with some opinion/snark in the headline
       “The House Will Redo a Vote to Reopen the Government Because Republicans Weren’t Paying Attention”
       https://slate.com/news-and-politics/2019/01/house-republicans-forgot-roll-call-democrats-shutdown-bill.amp
    4. Polling data (i.e. a review of voters', citizens', or respondents' opinions)
       YOU’LL NEVER BELIEVE IT, BUT THE SHUTDOWN IS MAKING TRUMP UNPOPULAR
       https://www.vanityfair.com/news/2019/01/surprise-the-shutdown-is-making-trump-unpopular
  12. Future Work

    1. Opinion sentence detection
       By detecting opinion sentences, we can measure what share of the whole article is opinion.
    2. Syntactic dependency analysis
       To identify the subject of each opinion sentence.
    3. Author identification
       To check whether the opinion is the author's or another subject's.
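The opinion ratio mentioned under opinion sentence detection on slide 12 could be computed roughly as follows. This is a sketch under stated assumptions: the sentence splitter is a naive regex, and `keyword_is_opinion` is a hypothetical placeholder for whatever sentence-level classifier would actually be built.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


def opinion_ratio(text: str, is_opinion: Callable[[str], bool]) -> float:
    """Fraction of sentences that the given classifier flags as opinion."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    flagged = sum(1 for sentence in sentences if is_opinion(sentence))
    return flagged / len(sentences)


def keyword_is_opinion(sentence: str) -> bool:
    """Placeholder classifier: flags sentences containing modal cue words."""
    cues = {"should", "must", "ought"}
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return bool(words & cues)


article = (
    "The bill passed on Monday. Congress should act now. "
    "Turnout was high. We must do better."
)
ratio = opinion_ratio(article, keyword_is_opinion)
print(ratio)
```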
  13. Summary

    • We developed a high-accuracy model for detecting opinionated articles and deployed it to production.
    • SageMaker allows ML experts at SmartNews to produce great results while focusing only on what is essential.
    • We continue to strive to realize our mission of delivering quality information from around the world by taking full advantage of SageMaker.