"Detecting Propaganda in Fake News using Natural Language Processing" by Aroma Rodrigues

Propaganda Detection in Fake News using Natural Language Processing
Aroma Rodrigues

Speaker Bio

A BuzzFeed News analysis found that 50 of the biggest fake stories of 2018 generated roughly 22 million total shares, reactions, and comments on Facebook.

The Knight Foundation analyzed more than 10 million tweets from 700,000 Twitter accounts that had linked to more than 600 fake and conspiracy news outlets. They found that in the lead-up to the 2016 US Presidential election, more than 6.6 million tweets linked to fake news and conspiracy news publishers. The problem continued afterwards: 4 million tweets linking to fake and conspiracy news publishers were found from mid-March to mid-April 2017.

And now: "More than 80% of accounts that repeatedly spread misinformation during the 2016 election campaign are still active, and they continue to publish more than a million tweets on a typical day."

A recent Reuters Institute survey of English-language Indian internet users found that 52% of respondents got news via WhatsApp. The same proportion said they got their news from Facebook. But content shared via WhatsApp has led to murder. At least 31 people were killed in 2017 and 2018 as a result of mob attacks fuelled by rumours on WhatsApp and social media, a BBC analysis found.

Recent Examples
• YES Bank
• PMC Bank

Identifying Fake News
• Bad grammar, spelling mistakes
• No source: find the source
• A lot of praise: propaganda is often highly positive
• A lot of criticism: propaganda is often highly negative
• Keywords: search for them on Google
• Check credible mainstream agencies
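Two of these checks, a missing source and a one-sided tone, can be roughed out in code. The sketch below is an assumption, not the speaker's method: the tiny PRAISE/CRITICISM word lists, the `red_flags` helper and the 5% threshold are all hypothetical placeholders, and the test message is made up. A real system would use a proper sentiment lexicon or model.

```python
import re

# Hypothetical, deliberately tiny word lists; swap in a real sentiment lexicon.
PRAISE = {"best", "greatest", "brilliant", "historic", "congratulations", "hero"}
CRITICISM = {"worst", "traitor", "corrupt", "disgrace", "shameful", "criminal"}

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z']+")


def red_flags(text: str, tone_threshold: float = 0.05) -> list[str]:
    """Return heuristic warning labels for a message (a rough screen, not a verdict)."""
    flags = []
    # "No source": the message links to nothing that could be checked.
    if not URL_RE.search(text):
        flags.append("no_source")
    # Count loaded words, ignoring anything inside URLs.
    words = WORD_RE.findall(URL_RE.sub(" ", text.lower()))
    if words:
        praise = sum(w in PRAISE for w in words) / len(words)
        criticism = sum(w in CRITICISM for w in words) / len(words)
        # "A lot of praise / criticism": share of loaded words at or above the threshold.
        if praise >= tone_threshold:
            flags.append("highly_positive")
        if criticism >= tone_threshold:
            flags.append("highly_negative")
    return flags


print(red_flags("Congratulations! UNESCO declared our PM the best PM in the world!"))
# ['no_source', 'highly_positive']
```

A flag only says the message deserves the manual checks listed above; it does not show that the message is fake.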

Photoshop/image editing: reverse Google image search, look for differences
Articles:
• Mainstream: generally not fake news
• Spoof websites: e.g. BBCNewspoint
• Blogs: make sure they are credible, whether personal or professional
• Government agency tweets: could be fake
• Fact-checking websites: Alt News, Social Media Hoax Slayer (for mainstream)
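The source categories above can be encoded as a simple domain lookup. This is a hypothetical sketch: apart from bbc.co.uk (BBC News) and altnews.in (Alt News), the domains in `SOURCE_CATEGORIES` are placeholders, and `classify_source` is an illustrative helper, not part of any library. A usable version needs a curated, maintained list.

```python
from urllib.parse import urlparse

# Hypothetical mapping from domain to the categories on the slide.
# Only bbc.co.uk and altnews.in are real; the .example domains are placeholders.
SOURCE_CATEGORIES = {
    "bbc.co.uk": "mainstream",
    "altnews.in": "fact-checking",
    "spoof-news.example": "spoof",
    "someblog.example": "blog",
}

ADVICE = {
    "mainstream": "generally not fake news",
    "fact-checking": "use it to verify the claim",
    "spoof": "treat as unreliable",
    "blog": "check whether the author is credible",
}


def classify_source(url: str) -> tuple[str, str]:
    """Return (category, advice) for the site a URL points to."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Walk up the domain so that www.bbc.co.uk matches bbc.co.uk.
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in SOURCE_CATEGORIES:
            category = SOURCE_CATEGORIES[candidate]
            return category, ADVICE[category]
    return "unknown", "find and check the original source"


print(classify_source("https://www.bbc.co.uk/news/world-asia-india-47978944"))
# ('mainstream', 'generally not fake news')
```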

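Extracting keywords from a text
>>> from rake_nltk import Rake
>>> from nltk.corpus import stopwords
>>> r = Rake()  # defaults to NLTK's English stopwords and punctuation
>>> r.extract_keywords_from_text(text)  # text: the message being checked
>>> b = r.get_ranked_phrases()
>>> b
['pm narendra', 'best pm', 'world', 'us', 'unesco', 'modi', 'declared', 'congratulation']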
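RAKE (Rapid Automatic Keyword Extraction), which rake_nltk implements, splits the text into candidate phrases at stopwords and punctuation, scores each word as degree/frequency, and scores a phrase as the sum of its word scores. The pure-Python sketch below shows that idea; it is a simplification, not rake_nltk's code, the short `STOPWORDS` set is a hypothetical stand-in for NLTK's list, and the input sentence is made up, so its output is not the list shown on the slide.

```python
import re
from collections import defaultdict

# A small hypothetical stopword list; rake_nltk uses NLTK's full English list.
STOPWORDS = {"the", "of", "in", "by", "is", "a", "an", "and", "to", "our", "has", "been", "as"}


def rake(text: str) -> list[tuple[float, str]]:
    """Rank candidate phrases by the RAKE degree/frequency score."""
    phrases = []
    # Punctuation ends a phrase; so does any stopword.
    for chunk in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in chunk.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences in the phrase, including itself

    def score(phrase):
        return sum(degree[w] / freq[w] for w in phrase)

    unique = {" ".join(p): score(p) for p in phrases}
    return sorted(((s, p) for p, s in unique.items()), reverse=True)


print(rake("PM Narendra Modi declared the best PM in the world by UNESCO."))
# [(15.0, 'pm narendra modi declared'), (5.0, 'best pm'), (1.0, 'world'), (1.0, 'unesco')]
```

Longer phrases collect higher scores, which is why RAKE tends to rank multi-word names first.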

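Finding sources for keywords
import requests

url = 'https://newsapi.org/v2/everything'
params = {
    'q': 'pm narendra best pm world',  # keywords from the previous step
    'from': '2019-05-06',
    'sortBy': 'popularity',
    'apiKey': 'YOUR_NEWSAPI_KEY',      # placeholder: use your own key
}
# requests URL-encodes params, so spaces and '&' cannot break the query string
response = requests.get(url, params=params)
articles = response.json().get('articles', [])
# keep the articles whose title or description contains the term "best pm"
keyword_extracted = [
    a for a in articles
    if 'best pm' in ((a.get('title') or '') + ' ' + (a.get('description') or '')).lower()
]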
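The ranked RAKE phrases can be turned into the request URL directly. This sketch builds the URL without sending it; `build_query_url` is a hypothetical helper, and quoting multi-word phrases and joining them with AND is an assumption based on NewsAPI's documented `q` syntax (exact phrases in quotes, AND/OR/NOT operators), so check the current API docs before relying on it.

```python
from urllib.parse import urlencode

NEWSAPI_EVERYTHING = "https://newsapi.org/v2/everything"


def build_query_url(phrases: list[str], api_key: str, since: str, top_n: int = 3) -> str:
    """Turn ranked keyword phrases into a NewsAPI /v2/everything request URL."""
    # Quote multi-word phrases for exact matching and require all of them (assumed q syntax).
    terms = [f'"{p}"' if " " in p else p for p in phrases[:top_n]]
    params = {
        "q": " AND ".join(terms),
        "from": since,
        "sortBy": "popularity",
        "apiKey": api_key,
    }
    # urlencode escapes spaces, quotes and '&' that would otherwise break the URL.
    return NEWSAPI_EVERYTHING + "?" + urlencode(params)


print(build_query_url(["pm narendra", "best pm", "world"], "YOUR_NEWSAPI_KEY", "2019-05-06"))
```

Keeping `top_n` small avoids over-constraining the search, since every AND term must appear in a matching article.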

Results
{u'status': u'ok',
 u'articles': [{u'description': u'Rahul Gandhi has energised a struggling party and has been increasingly setting the agenda.',
   u'title': u'Can India\u2019s political prince unseat the PM?',
   u'url': u'https://www.bbc.co.uk/news/world-asia-india-47978944',
   u'author': u'https://www.facebook.com/bbcnews',
   u'publishedAt': u'2019-04-24T23:17:51Z',
   u'content': u"Image copyrightGetty ImagesImage caption\r\n Rahul Gandhi (centre) received a tumultuous welcome during his road show in Amethi\r\nIndia's main opposition leader Rahul Gandhi was all but written off after his crushing defeat in the last elections. But he has ener\u2026 [+7947 chars]",
   u'source': {u'id': u'bbc-news', u'name': u'BBC News'},
   u'urlToImage': u'https://ichef.bbci.co.uk/news/1024/branded_news/15056/production/_106520168_siblings.jpg'},
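A response like this is easier to judge once it is reduced to who published what. The sketch below is hypothetical (`summarise_articles` is an illustrative helper, not a NewsAPI client function) and works on the parsed JSON dict; the sample is a trimmed copy of the result shown above.

```python
def summarise_articles(payload: dict, term: str) -> list[dict]:
    """Pick out source, title, URL and date of articles that mention `term`."""
    if payload.get("status") != "ok":
        return []
    hits = []
    for article in payload.get("articles", []):
        # Search title, description and (truncated) content together.
        text = " ".join(
            article.get(field) or "" for field in ("title", "description", "content")
        ).lower()
        if term.lower() in text:
            hits.append({
                "source": (article.get("source") or {}).get("name"),
                "title": article.get("title"),
                "url": article.get("url"),
                "publishedAt": article.get("publishedAt"),
            })
    return hits


# A trimmed copy of the response shown above.
sample = {
    "status": "ok",
    "articles": [{
        "title": "Can India\u2019s political prince unseat the PM?",
        "description": "Rahul Gandhi has energised a struggling party and has been "
                       "increasingly setting the agenda.",
        "url": "https://www.bbc.co.uk/news/world-asia-india-47978944",
        "publishedAt": "2019-04-24T23:17:51Z",
        "source": {"id": "bbc-news", "name": "BBC News"},
    }],
}
print(summarise_articles(sample, "Rahul Gandhi"))
```

If a claim such as "best pm" returns no hits from credible outlets, that absence is itself a signal worth noting.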

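Similarity Check
import spacy

# 'en' was a spaCy 2 shortcut; newer versions need a full package name.
# A model with word vectors (e.g. en_core_web_md) gives meaningful similarities.
nlp = spacy.load('en_core_web_md')
doc1 = nlp('Hello hi there!')
doc2 = nlp('Hello hi there!')
doc3 = nlp('Hey whatsup?')
print(doc1.similarity(doc2))  # slide (spaCy 2 'en' model): 0.999999954642
print(doc2.similarity(doc3))  # slide: 0.699032527716
print(doc1.similarity(doc3))  # slide: 0.699032527716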
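By default spaCy's `Doc.similarity` is the cosine similarity of the two documents' vectors, and a document's vector defaults to the average of its token vectors. That explains the slide's numbers: doc1 and doc2 are the same text, so they score about 1.0, and they score identically against doc3. The sketch below reproduces the calculation by hand; the 3-dimensional `VECTORS` are made-up values, whereas real models use 300 dimensions.

```python
import math

# Hypothetical 3-dimensional word vectors; real models use 300 dimensions.
VECTORS = {
    "hello": [0.9, 0.1, 0.0],
    "hi": [0.8, 0.2, 0.1],
    "there": [0.1, 0.7, 0.2],
    "hey": [0.85, 0.15, 0.05],
    "whatsup": [0.3, 0.3, 0.8],
}


def doc_vector(tokens):
    """Average the vectors of known tokens, like spaCy's default Doc.vector."""
    known = [VECTORS[t] for t in tokens if t in VECTORS]
    return [sum(dim) / len(known) for dim in zip(*known)]


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


doc1 = doc_vector(["hello", "hi", "there"])
doc2 = doc_vector(["hello", "hi", "there"])
doc3 = doc_vector(["hey", "whatsup"])
print(round(cosine(doc1, doc2), 6))              # 1.0: identical texts
print(cosine(doc2, doc3) == cosine(doc1, doc3))  # True, as on the slide
```

Because averaging discards word order, two messages with the same words in a different order also score about 1.0, so similarity is a hint about shared content, not proof of copying.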

Path Model of Blame

Important Parameters
• location (a town, a country)
• labeling
• argumentation
• emotions (fear, outrage, sympathy, hatred, other, missing)
• fabrication
• politician (the name of a mentioned politician)
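These parameters can serve as an annotation record for each article. The sketch below is one possible encoding and rests on assumptions: only the emotion values come from the slide, while the boolean types for labeling, argumentation and fabrication, the `PropagandaAnnotation` class and its `active_techniques` method are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Emotion(Enum):
    """Emotion values listed on the slide."""
    FEAR = "fear"
    OUTRAGE = "outrage"
    SYMPATHY = "sympathy"
    HATRED = "hatred"
    OTHER = "other"
    MISSING = "missing"


@dataclass
class PropagandaAnnotation:
    """One article annotated with the parameters above (field types are assumptions)."""
    location: Optional[str] = None      # a town, a country
    labeling: bool = False
    argumentation: bool = False
    emotions: Emotion = Emotion.MISSING
    fabrication: bool = False
    politician: Optional[str] = None    # the name of a mentioned politician

    def active_techniques(self) -> list[str]:
        """Names of the boolean techniques marked present."""
        names = ["labeling", "argumentation", "fabrication"]
        return [n for n in names if getattr(self, n)]


a = PropagandaAnnotation(location="Syria", labeling=True,
                         emotions=Emotion.FEAR, politician="Theresa May")
print(a.active_techniques())  # ['labeling']
```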

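Pattern with POS tagger
import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

# One-time model downloads (newer NLTK versions may ask for
# 'punkt_tab' and 'averaged_perceptron_tagger_eng' instead)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

ex = '''Theresa May ordered use of military force against Syria. This is what she ordered. Two residential areas have been struck by the Uk/French/US missiles. Reports of 4 dead in one of the strikes. '''

def preprocess(sent):
    sent = word_tokenize(sent)  # split the text into tokens
    sent = pos_tag(sent)        # tag each token with a Penn Treebank POS tag
    return sent

sent = preprocess(ex)
print(sent)

# optional proper noun, optional past-tense verb, any nouns, then a proper noun
pattern = 'NP: {<NNP>?<VBD>?<NN>*<NNP>}'
cp = nltk.RegexpParser(pattern)
cs = cp.parse(sent)
print(cs)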
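Roughly speaking, a chunk grammar like this is a regular expression over the sequence of tags rather than the words. The plain-Python sketch below imitates that idea by writing each tag as `<TAG>` and running `re` over the result; it is a simplification, not NLTK's implementation, and the tags on the sentence are assigned by hand (assumed, not copied from `nltk.pos_tag` output).

```python
import re

# Hand-tagged first sentence of the example; these tags are assumed.
tagged = [("Theresa", "NNP"), ("May", "NNP"), ("ordered", "VBD"), ("use", "NN"),
          ("of", "IN"), ("military", "JJ"), ("force", "NN"), ("against", "IN"),
          ("Syria", "NNP"), (".", ".")]

# The chunk grammar <NNP>?<VBD>?<NN>*<NNP> written as a plain regex over "<TAG>" strings.
PATTERN = r"(?:<NNP>)?(?:<VBD>)?(?:<NN>)*<NNP>"


def chunk(tagged_tokens, pattern):
    """Return the word spans whose tag sequence matches `pattern`."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_tokens)
    chunks = []
    for m in re.finditer(pattern, tag_string):
        # Every tag starts with "<", so counting them maps characters back to tokens.
        start = tag_string.count("<", 0, m.start())
        end = start + m.group().count("<")
        chunks.append(" ".join(word for word, _ in tagged_tokens[start:end]))
    return chunks


print(chunk(tagged, PATTERN))  # ['Theresa May', 'Syria']
```

Note that `<NN>` does not match `<NNP>` because the closing `>` is part of the token, which is the same reason NLTK writes tags in angle brackets.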

POS Tags
• CC coordinating conjunction
• CD cardinal digit
• DT determiner
• EX existential there (like: "there is" ... think of it like "there exists")
• FW foreign word
• IN preposition/subordinating conjunction
• JJ adjective 'big'
• JJR adjective, comparative 'bigger'
• JJS adjective, superlative 'biggest'
• LS list marker 1)
• MD modal could, will
• NN noun, singular 'desk'
• NNS noun plural 'desks'

POS Tags
• NNP proper noun, singular 'Harrison'
• NNPS proper noun, plural 'Americans'
• PDT predeterminer 'all the kids'
• POS possessive ending parent's
• PRP personal pronoun I, he, she
• PRP$ possessive pronoun my, his, hers
• RB adverb very, silently
• RBR adverb, comparative better
• RBS adverb, superlative best
• RP particle give up
• TO to go 'to' the store.
• UH interjection errrrrrrrm
• VB verb, base form take

POS Tags
• VBD verb, past tense took
• VBG verb, gerund/present participle taking
• VBN verb, past participle taken
• VBP verb, singular present, non-3rd person take
• VBZ verb, 3rd person singular present takes
• WDT wh-determiner which
• WP wh-pronoun who, what
• WP$ possessive wh-pronoun whose
• WRB wh-adverb where, when

Patterns to consider
• Active voice
◦ <Individual/Community/Organization> <Causative Verb> <Event entity>
• Passive voice
◦ <Event entity> <Causative Verb> <Individual/Community/Organization>
Causative verbs are verbs that express what caused something to happen. Only the easier, basic patterns are in scope for now.
• Thresholding
◦ What percentage of sentences fitting these patterns in an article's text should make it count as propaganda?
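The two voice patterns and the threshold question can be combined into a crude article score. This is a hypothetical sketch: the `CAUSATIVE_VERBS` list (verbs treated as causative for this purpose), the use of proper-noun tags as a stand-in for Individual/Community/Organization, the hand-assigned tags, and the 0.3 threshold are all assumptions. A named-entity recognizer would be a better actor detector.

```python
# Hypothetical verb list and actor proxy; not taken from the talk.
CAUSATIVE_VERBS = {"ordered", "caused", "forced", "made", "struck", "killed"}
ACTOR_TAGS = {"NNP", "NNPS"}  # stand-in for Individual/Community/Organization


def matches_blame_pattern(tagged_sentence) -> bool:
    """True if an actor and an event sit on either side of a causative verb."""
    words = [w.lower() for w, _ in tagged_sentence]
    tags = [t for _, t in tagged_sentence]
    for i, word in enumerate(words):
        if word not in CAUSATIVE_VERBS:
            continue
        before, after = tags[:i], tags[i + 1:]
        # Active voice: <actor> <causative verb> <event>
        if any(t in ACTOR_TAGS for t in before) and any(t.startswith("NN") for t in after):
            return True
        # Passive voice: <event> <causative verb> ... <actor>
        if any(t.startswith("NN") for t in before) and any(t in ACTOR_TAGS for t in after):
            return True
    return False


def propaganda_score(sentences, threshold=0.3):
    """Share of sentences matching a pattern, and whether it reaches the threshold."""
    hits = sum(matches_blame_pattern(s) for s in sentences)
    ratio = hits / len(sentences) if sentences else 0.0
    return ratio, ratio >= threshold


# The example article from the POS-tagger slide, tagged by hand (assumed tags).
article = [
    [("Theresa", "NNP"), ("May", "NNP"), ("ordered", "VBD"), ("use", "NN"), ("of", "IN"),
     ("military", "JJ"), ("force", "NN"), ("against", "IN"), ("Syria", "NNP")],
    [("This", "DT"), ("is", "VBZ"), ("what", "WP"), ("she", "PRP"), ("ordered", "VBD")],
    [("Two", "CD"), ("residential", "JJ"), ("areas", "NNS"), ("have", "VBP"), ("been", "VBN"),
     ("struck", "VBN"), ("by", "IN"), ("the", "DT"), ("Uk/French/US", "NNP"), ("missiles", "NNS")],
    [("Reports", "NNS"), ("of", "IN"), ("4", "CD"), ("dead", "JJ"), ("in", "IN"),
     ("one", "CD"), ("of", "IN"), ("the", "DT"), ("strikes", "NNS")],
]
print(propaganda_score(article))  # (0.5, True)
```

The threshold is the open question from the slide; it would have to be tuned on labelled articles rather than fixed in advance.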

Accuracy and Data Cleaning
• WhatsApp texts
• Tweets
• Articles
• Facebook posts
• SMS
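Short social-media texts need normalising before keyword extraction or tagging. The sketch below shows one possible cleaning pass; the `clean_message` helper and its particular rules are assumptions, not the speaker's pipeline. It returns the URLs separately because they are still needed for the source check.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def clean_message(text: str) -> tuple[str, list[str]]:
    """Normalise a tweet/WhatsApp/SMS text; return (clean text, URLs found)."""
    text = unicodedata.normalize("NFKC", text)  # unify look-alike characters
    urls = URL_RE.findall(text)                 # keep links for the source check
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)            # drop @mentions
    text = HASHTAG_RE.sub(r"\1", text)          # keep the hashtag word, drop '#'
    # Drop emoji and other symbols (Unicode category "So").
    text = "".join(ch for ch in text if unicodedata.category(ch) != "So")
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # "!!!!" -> "!!"
    return re.sub(r"\s+", " ", text).strip(), urls


print(clean_message("\U0001F525 @user BREAKING!!!! #Modi declared best PM https://t.co/abc"))
# ('BREAKING!! Modi declared best PM', ['https://t.co/abc'])
```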

References
• Manipulative Propaganda Techniques - Vít Baisa, Ondřej Herman, and Aleš Horák
• Detecting Expressions of Blame or Praise in Text - Udochukwu Orizu and Yulan He
• The BECauSE Corpus 2.0: Annotating Causality and Overlapping Relations - Jesse Dunietz, Lori Levin and Jaime Carbonell
• Unsupervised Learning of Narrative Event Chains - Nathanael Chambers and Dan Jurafsky
• Samples: https://medium.com/@VasquezNnenna/different-examples-of-propaganda-in-social-media-758fc98d021d
• https://towardsdatascience.com/named-entity-recognition-with-nltk-and-spacy-8c4a7d88e7da
• https://www.kdnuggets.com/2018/08/emotion-sentiment-analysis-practitioners-guide-nlp-5.html
• https://towardsdatascience.com/natural-language-processing-event-extraction-f20d634661d3

Thank You

Pycon ZA