Towards Keeping up with Fake News in the Social Media Ecosystem (Dissertation Slides)

Towards Keeping up with Fake News in the Social Media Ecosystem
Taichi Murayama, NAIST, Social Computing Lab., D3

2 Fake news is false or misleading content presented as news and communicated in formats spanning spoken, written, printed, electronic, and digital communication.
Source: “The Anatomy of Fake News” by Nolan Higdon

3 Early Fake News?
• Great Moon Hoax in 1835, by The Sun
• German Corpse Factory in WWI, by The Times of London

4 Fake News after 2010
• Internet and social media: easy to access news, easy to communicate
• Various social problems: polarization, fake news, online slander, echo chambers

5 Fake news in 2016 US presidential election
Example of Fake News: Politics
• 25% of tweets with a news media link in the five months preceding election day were either fake or extremely biased news.
[Figure: example of fake news]

6 Relationship between Fake news and COVID-19
Example of Fake News: Health
• “Vitamin C cures COVID-19”
• “5G network spreads COVID-19”

7 How does society combat fake news?
• Fact-checking organizations
• Regulations by platforms
• Education
• Legislation

8 Targeting of my dissertation
• Event: spreading of fake news. Example post: “US researchers discover a huge cat!!!! Over 5 meters…”, followed by retweets and replies (“Very Interesting news !!!!”, “Really ?? I can’t find the information source”)
• Social activities (especially in Japan): fact check, education, legislation
• Help to connect the event and the social activities

9 Targeting of my dissertation
[Figure: the diagram from slide 8 (spreading of fake news; social activities, especially in Japan), annotated with the chapters]
• Understanding the spread of fake news (Chapter 2)
↓
• Detection of fake news (Chapter 3)
• Foundation for keeping up with fake news in Japan (Chapter 4)
Labels: Fundamental, Toward applications

10 Contents
1. Modeling the spread of fake news on Twitter
2. Fake News Detection using Temporal Features Extracted via Point Process
3. Towards countermeasures against the fake news:
 - Dataset
 - Collection system

Contents
2. Fake News Detection using Temporal Features Extracted via Point Process
3. Towards countermeasures against the fake news:
 - Dataset
 - Collection system
• Fundamental
• Focus on time series of fake news posts
• Application to Japan
• Resource construction

Understanding the spread of fake news
[Figure: posting activity of fake news tweets against time from original post [h], with curves λ(t), λ1(t), and λ2(t); 1st stage: news, 2nd stage: correction, split at t_c]

13 Research objective
Modeling the cascade of fake news on Twitter in terms of the passage of time
Background
• How fake news spreads over time is not fully understood.
• A modeling method can reveal the characteristics of each fake news story.
[Figure: posts of fake news on Twitter (fake news tweets against time from original post [h]) are modeled as the posting rate over time from the initial post; 1st stage: news, 2nd stage: correction, split at t_c]

14 Hypothesis: the cascade of fake news
Background
A fake news cascade on Twitter is comprised of two cascades:
• The first cascade has the characteristics of an ordinary news story.
• The second cascade has the characteristics of a correction, because users recognize the falsity of the news item around a correction time t_c.
Timeline of posts, first cascade (the characteristics of ordinary news):
• User A (@userA): “US researchers discover a huge cat!!!! Over 5 meters… URL”
• User B (@userB): “This is very very interesting news ! URL”

15 Hypothesis: the cascade of fake news
Background
Timeline of posts, second cascade (the characteristics of correction):
• User C (@userC): “I think this is fake news…. URL”
• User D (@userD): “Really ?? I can’t find the information source URL”

16 Hypothesis: the cascade of fake news
Proposed Model
The proposed model follows the hypothesis: a first cascade (the characteristics of ordinary news) followed by a second cascade (the characteristics of correction) around the correction time t_c.

17 Base Technique: Hawkes Process
Proposed Model
• Hawkes process: a point process model whose defining characteristic is that it “self-excites”
• Calculates the probability of the next event from past events and elapsed time
• Examples of application: earthquakes, financial transactions
[Figure: a time series of events is modeled as a time series of event probability]
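
As a rough illustration of self-excitation, the sketch below evaluates the conditional intensity of a Hawkes process with an exponential kernel. The kernel shape and the values of `mu`, `alpha`, and `beta` are assumptions chosen for the example; they do not come from the slides.

```python
import math

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of a Hawkes process with an exponential kernel.

    mu is the baseline rate; each past event adds alpha * exp(-beta * elapsed).
    All three values are illustrative assumptions, not parameters from the slides.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in event_times if t_i < t
    )
    return mu + excitation

events = [1.0, 1.5, 4.0]
print(hawkes_intensity(0.5, events))  # before any event: baseline only
print(hawkes_intensity(1.6, events))  # shortly after two events: elevated
print(hawkes_intensity(3.9, events))  # excitation has mostly decayed
```

Each past event raises the rate of future events, and that boost fades with elapsed time, which is the behavior the bullets describe.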

18 Base Model: Time-Dependent Hawkes Process [Kobayashi+, 2016]
Proposed Model
• A Hawkes process for social media that considers two characteristics of social media:
 - The freshness of information
 - User circadian rhythm (e.g., users are not active at midnight)
[Figure: the time series of posts on Twitter (fake news tweets against time from original post [h]) is modeled as the posting rate over time from the initial post]

Proposed Model
• A Hawkes process for social media that considers two characteristics of social media:
 - The freshness of information
 - User circadian rhythm (e.g., users are not active at midnight)
The probability of a post between t and t + Δt is λ(t)Δt, where
$\lambda(t) = p(t) \sum_{i:\, t_i < t} d_i\, \phi(t - t_i)$
• p(t): infection rate at time t
• d_i: number of followers of the i-th post
• t_i: time of the i-th post
• φ: memory function (how many people still remember a post)

Proposed Model
The probability of a post between t and t + Δt is λ(t)Δt, where
$\lambda(t) = p(t) \sum_{i:\, t_i < t} d_i\, \phi(t - t_i)$
• p(t): infection rate at time t
• d_i: number of followers of the i-th post
• t_i: time of the i-th post
• φ: memory function (how many people still remember a post)
The infection rate combines the circadian rhythm (sine factor) with the freshness of information (exponential factor):
$p(t) = a \left[ 1 - r \sin\left( \frac{2\pi}{T_m} (t + \theta_0) \right) \right] e^{-t/\tau}$
• a: the intensity of information
• r: the relative amplitude of the oscillation
• θ_0: phase
• τ: the characteristic time of popularity decay
• T_m: the period of the circadian oscillation
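
The two formulas can be evaluated numerically as in the sketch below. The memory kernel `memory_kernel`, its parameters `s0` and `exponent`, the 24-hour period, and the example values are assumptions made only for illustration; the slides do not give the form of φ.

```python
import math

def infection_rate(t, a, r, theta0, tau, period=24.0):
    """p(t) = a * (1 - r*sin(2*pi/period * (t + theta0))) * exp(-t/tau), t in hours."""
    oscillation = 1.0 - r * math.sin(2.0 * math.pi / period * (t + theta0))
    return a * oscillation * math.exp(-t / tau)

def memory_kernel(s, s0=0.1, exponent=1.2):
    """Hypothetical stand-in for phi: flat up to s0 hours, then power-law decay."""
    return (max(s, s0) / s0) ** (-exponent)

def intensity(t, posts, a, r, theta0, tau):
    """lambda(t) = p(t) * sum over past posts of d_i * phi(t - t_i).

    posts is a list of (t_i, d_i): post time in hours and follower count.
    """
    memory = sum(d * memory_kernel(t - t_i) for t_i, d in posts if t_i < t)
    return infection_rate(t, a, r, theta0, tau) * memory

posts = [(0.0, 1000), (0.5, 50), (2.0, 300)]  # made-up cascade
print(intensity(3.0, posts, a=0.01, r=0.3, theta0=0.0, tau=12.0))
```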

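21 Proposed Model
[Figure: posting activity against time (hour), with the cascade split at the correction time t_c]
$\lambda(t) = p_1(t)\, h_1(t) + p_2(t)\, h_2(t)$
Split the cascade at the correction time t_c: the term p_1(t) h_1(t) models the first cascade and the term p_2(t) h_2(t) models the second cascade.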

22 Proposed Model
[Figure: posting activity against time (hour), with the cascade split at the correction time t_c]
$\lambda(t) = p_1(t)\, h_1(t) + p_2(t)\, h_2(t)$
$h_1(t) = \sum_{i:\, t_i < \min(t,\, t_c)} d_i\, \phi(t - t_i)$
$h_2(t) = \sum_{i:\, t_c < t_i < t} d_i\, \phi(t - t_i)$
• h_1(t) sums over posts before the correction time t_c; h_2(t) sums over posts from t_c until t.
• Each infection rate p_1(t), p_2(t) has its own parameters:
 - a: the intensity of information
 - r: the relative amplitude of the oscillation
 - θ_0: phase
 - τ: the characteristic time of popularity decay
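
A minimal sketch of the two-stage intensity follows. It assumes the same functional form of p(t) for both stages, measured from the original post, and a hypothetical power-law memory kernel `phi`; both are simplifications for illustration and may differ from the model in the dissertation.

```python
import math

def phi(s, s0=0.1, exponent=1.2):
    """Hypothetical memory kernel: flat up to s0 hours, then power-law decay."""
    return (max(s, s0) / s0) ** (-exponent)

def p(t, a, r, theta0, tau, period=24.0):
    """Infection rate with circadian oscillation and exponential decay."""
    oscillation = 1.0 - r * math.sin(2.0 * math.pi / period * (t + theta0))
    return a * oscillation * math.exp(-t / tau)

def two_stage_intensity(t, posts, t_c, stage1, stage2):
    """lambda(t) = p1(t) h1(t) + p2(t) h2(t), split at the correction time t_c.

    posts: list of (t_i, d_i); stage1 and stage2: dicts of parameters for p().
    """
    h1 = sum(d * phi(t - t_i) for t_i, d in posts if t_i < min(t, t_c))
    h2 = sum(d * phi(t - t_i) for t_i, d in posts if t_c < t_i < t)
    return p(t, **stage1) * h1 + p(t, **stage2) * h2

news = dict(a=0.02, r=0.3, theta0=0.0, tau=12.0)        # made-up values
correction = dict(a=0.005, r=0.3, theta0=0.0, tau=12.0)  # weaker second stage
posts = [(0.0, 1000), (5.0, 200), (38.0, 400)]
print(two_stage_intensity(40.0, posts, t_c=37.0, stage1=news, stage2=correction))
```

Before t_c the second sum is empty, so the model reduces to the single-cascade form.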

23 Experimental Settings
Evaluation
• Task: predict the number of future posts about a fake news story
• Modeling period: the first half of the observation time, [0, 0.5T]
• Test period: the second half of the observation time, [0.5T, T]
[Figure: cumulative number of posts over time, divided into the modeling period and the test period]
• Evaluation metrics, computed on the number of posts per hour:
 - Mean absolute error: the smaller the value, the better
 - Median absolute error: the smaller the value, the better
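
The evaluation can be sketched as follows, assuming the metrics are the mean and median of the absolute hourly errors over the test period. The post times and the predicted counts are made up for the example.

```python
from statistics import mean, median

def hourly_counts(post_times, start, end):
    """Count posts in each one-hour bin of [start, end), times in hours."""
    n_bins = int(end - start)
    counts = [0] * n_bins
    for t in post_times:
        if start <= t < start + n_bins:
            counts[int(t - start)] += 1
    return counts

def absolute_errors(actual, predicted):
    """Absolute error per hourly bin."""
    return [abs(a - p) for a, p in zip(actual, predicted)]

T = 8.0  # observation time in hours (toy value)
post_times = [0.2, 0.9, 1.5, 4.1, 4.7, 5.3, 7.9]
actual = hourly_counts(post_times, 0.5 * T, T)  # test period [0.5T, T]
predicted = [1.5, 1.0, 0.5, 0.5]                # hypothetical model output
errors = absolute_errors(actual, predicted)
print(actual)          # [2, 1, 0, 1]
print(mean(errors))    # 0.375
print(median(errors))  # 0.5
```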

24 Dataset
Evaluation
• Recent Fake News (RFN)
 - 7 fake news stories reported by “Politifact” and “Snopes” from March to May 2019
 - Each story had over 300 posts and was still being posted about more than 36 hours after the initial post
• Fake News in Tohoku earthquake (Tohoku)
 - 19 fake news stories related to the Tohoku earthquake, reported by Japanese news media from March 12 to March 24, 2011
 - Each story had over 300 posts and was still being posted about more than 36 hours after the initial post

25 Experimental Results
Evaluation
The proposed method achieved higher accuracy than the other methods on both datasets (100% in RFN and 89% in Tohoku).
Smaller values are better.

26 Experimental Results
Evaluation
• Magenta indicates the number of posts predicted by the proposed model; black indicates the actual number of posts.
• The predictions of the proposed model are close to the actual number of posts.

27 Characteristics of estimated parameters
Discussion
• Parameter a: compared with a in the first cascade, a in the second cascade tends to be smaller (the intensity of information is weaker).
• The correction time t_c is estimated at around 40 hours from the initial post in both datasets.
$p(t) = a \left[ 1 - r \sin\left( \frac{2\pi}{T_m} (t + \theta_0) \right) \right] e^{-t/\tau}$

28 Verification of the correction time t_c
Discussion
• Check the text around the correction time t_c
 - Verify, from the text of the posts, whether the proposed model properly estimates the correction time t_c
 - Count words that indicate a hoax or a correction (e.g., “fake”, “not true”)
• Our hypothesis: the first cascade has the characteristics of ordinary news, while the second cascade has the characteristics of correction
[Figure: the blue line indicates the correction time; the black line indicates the number of fake words]
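
The word-counting check could look like the sketch below, which counts, per hour, the posts containing a hoax or correction word. Only “fake” and “not true” come from the slide; the other keywords and the sample posts are assumptions for the example.

```python
from collections import Counter

# "fake" and "not true" are the slide's examples; the rest are hypothetical.
HOAX_WORDS = ("fake", "not true", "hoax", "false rumor")

def hoax_post_counts(posts):
    """Map each hour since the original post to the number of posts with a hoax word.

    posts: iterable of (hours_since_original_post, text).
    """
    counts = Counter()
    for hours, text in posts:
        lowered = text.lower()
        if any(word in lowered for word in HOAX_WORDS):
            counts[int(hours)] += 1
    return counts

sample = [
    (1.2, "Huge cat discovered!"),
    (36.5, "I think this is FAKE news"),
    (36.9, "This is not true"),
    (40.0, "Really? A hoax?"),
]
counts = hoax_post_counts(sample)
print(counts[36], counts[40], counts[1])  # 2 1 0
```

A rise in these counts near the estimated t_c would be consistent with the two-cascade hypothesis.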

29 Verification of the correction time t_c: Word cloud
Discussion
• Example: the fake news “Turkey Donates 10 Billion Yen to Japan”
• Compare the word clouds around the correction time (t_c = 37)
 - Before t_c, words related to the news, such as “pro-Japanese”, appear.
 - After t_c, some users point out that the news is a “false rumor.”

30 Summary: Modeling the spread of fake news on Twitter
✓ We model posts of fake news on Twitter as two cascades comprised of ordinary news and correction of fake news.
✓ The proposed model achieves higher accuracy in the task of predicting the number of posts.

Contents
2. Fake News Detection using Temporal Features Extracted via Point Process
3. Towards countermeasures against the fake news:
 - Dataset
 - Collection system
Detection of fake news
[Figure: text not recoverable from the extraction; axis label: Time (hours)]

32 Research objective
Propose a fake news detection model that leverages temporal features from social media posts
Background
• Examine the effectiveness of temporal features (when users post) in detecting fake news
• Aim to achieve the highest detection performance

Background
[Figure: existing fake news detection model. From posts of fake news on Twitter, the existing model extracts who posts (user) and what is posted (linguistic), and outputs fake or real.]

Background
[Figure: proposed model. From posts of fake news on Twitter, the proposed model extracts who posts (user), what is posted (linguistic), and when users post (temporal), and outputs fake or real.]

35 Case study (Revisit Chapter 2)
Background
[Figure: number of posts against time (hours) for an example of real news and an example of fake news]

36 Fake news detection from social media posts
• Problem statement
 - Input: text, user, and temporal features extracted from social media posts
 - Output: whether the news story is fake or not (multi-label)

37 Proposed Model
[Figure: text not recoverable from the extraction; axis label: Time (hours)]

42 Convert temporal features
SEISMIC [Zhao+, 2015] (Hawkes process) converts the time series of posts on social media into a time series of the infection rate p(t).
The probability of a post between t and t + Δt is λ(t)Δt, where
$\lambda(t) = p(t) \sum_{i:\, t_i < t} d_i\, \phi(t - t_i)$
• p(t): infection rate at time t
• d_i: number of followers of the i-th post
• t_i: time of the i-th post
• φ: memory function (how many people still remember a post)
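
To make the conversion concrete, the sketch below estimates a piecewise-constant infection rate per time window as (number of posts in the window) divided by (the integral of Σ d_i φ(t − t_i) over the window). This is a simplified estimate written for illustration, not SEISMIC's actual estimator; the kernel `phi`, the window length, and the data are assumptions.

```python
def phi(s, s0=0.1, exponent=1.2):
    """Hypothetical memory kernel: flat up to s0 hours, then power-law decay."""
    return (max(s, s0) / s0) ** (-exponent)

def infectiousness_series(posts, horizon, window=1.0, step=0.01):
    """Estimate one infection-rate value per window of [0, horizon).

    posts: list of (t_i, d_i) sorted by time; the first post is the seed.
    """
    replies = posts[1:]  # the seed post is not generated by the process
    n_steps = int(round(window / step))
    series = []
    for k in range(int(horizon / window)):
        lo, hi = k * window, (k + 1) * window
        n_posts = sum(1 for t_i, _ in replies if lo <= t_i < hi)
        # Midpoint Riemann sum of the accumulated memory over the window.
        exposure = 0.0
        for j in range(n_steps):
            t = lo + (j + 0.5) * step
            exposure += step * sum(d * phi(t - t_i) for t_i, d in posts if t_i < t)
        series.append(n_posts / exposure if exposure > 0 else 0.0)
    return series

cascade = [(0.0, 100), (0.5, 10), (0.6, 10), (2.5, 5)]  # made-up cascade
series = infectiousness_series(cascade, horizon=3.0)
print(series)
```

The resulting sequence is the kind of temporal feature that could be fed to a detection model alongside text and user features.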

44 Experimental Settings
Experiment
• Dataset: each news story consists of multiple posts on social media
 - Weibo [Ma+, 2016]
 - Twitter15 [Ma+, 2017]
 - Twitter16 [Ma+, 2017]
[Table columns: no. of true news articles, no. of fake news articles, no. of unverified articles, no. of debunking articles]
• Metrics
 - Accuracy: percentage of correct predictions
 - F1-score: the harmonic mean of precision and recall
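
Both metrics can be computed directly from the predicted and true labels, as in the sketch below. The labels T, F, U, and D follow the deck's label set; the example predictions are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for_label(y_true, y_pred, label):
    """F1-score for one class: harmonic mean of its precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = ["T", "F", "U", "D", "F", "T"]
y_pred = ["T", "F", "F", "D", "F", "U"]
print(accuracy(y_true, y_pred))           # 4 of 6 correct
print(f1_for_label(y_true, y_pred, "F"))  # precision 2/3, recall 1
```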

45 Experimental Results
Experiment
Labels: T: True, F: False, U: Unverified, D: Debunking
• The proposed model performed best on most measures and datasets.
• Our model did not produce good results when classifying the unverified label.
• In the ablation study, the proposed model scores higher than the proposed model without temporal features (w/o time).

48 Summary: Fake News Detection using Temporal Features Extracted via Point Process
✓ We proposed a novel multi-modal method for fake news detection, combining text and user features and infectiousness values.
✓ The experimental results empirically showed the effectiveness of temporal features in our proposed model.

2. Fake News Detection using Temporal Features Extracted via Point Process 3. Towards countermeasures against the fake news: - Dataset - Collection system Detection of fake news

50 Research objective
Promote fact-checking and fake news research in Japanese
Background
• Most studies focus on US society and build English resources.
• We tackled two things to expand Japanese resources:
 - Japanese fake news dataset construction
 - Japanese false news collection system
[Figure: the Japanese dataset and the fake news collection system help fact checking and fake news research]

52 Background: Japanese dataset construction
Q: Can existing fake news datasets really be called “fake news” datasets?

datasets really be called “fake news” datasets? Existing fake news dataset construction scheme Fact checking organizations Identify and collect the source Media A News article Social media posts

datasets really be called “fake news” datasets? Existing fake news dataset construction scheme Fact checking organizations Identify and collect the source Media A News article Social media posts Dataset Label

datasets really be called “fake news” datasets? Existing fake news dataset construction scheme Fact checking organizations Identify and collect the source Media A News article Social media posts Dataset Label Binary Label (31/51) True or Real e.g., 5-scale Label : True / Mostly-True / Mixture / Mostly False/ False

56 Definition of fake news
There is no set definition of the phrase “Fake news.”
• Broad definition: “Fake news is false news.” [Zhou+ 2020] [Lazer+, 2018]
• Narrow definition: “Fake news is a news article that is intentionally and verifiably false.” [Alcott+ 2017] [Kai+, 2017] [Xichen+, 2020]

57 Definition of fake news
There is no set definition of the phrase “Fake news.”
• Broad definition: “Fake news is false news.” [Zhou+ 2020] [Lazer+, 2018]
• Narrow definition: “Fake news is a news article that is intentionally and verifiably false.” [Alcott+ 2017] [Kai+, 2017] [Xichen+, 2020]
⇒ The phrase “Fake news” is an ambiguous term.

58 Criticism of the term “Fake news”
“The phrase ‘fake news’ is ‘woefully inadequate’ to describe the issues at play” (Claire Wardle)
Fake news can be examined from three perspectives:
• Mis-information: false information disseminated online by people who don't have a harmful intent
• Dis-information: false information created and shared by people with harmful intent
• Mal-information: the sharing of "genuine" information with the intent to cause harm

59 Criticism of the term “Fake news”
In 2018, the British government decided that the term "fake news" would no longer be used in official documents because it is "a poorly-defined and misleading term."

datasets really be called “fake news” datasets? Existing fake news dataset construction scheme Fact checking organizations Identify and collect the source Media A News article Social media posts Dataset Label A: No. Most of them are false news datasets.

Q: Can existing fake news datasets really be called “fake news” datasets?
Existing fake news dataset construction scheme: fact-checking organizations; identify and collect the source (Media A, news article, social media posts); dataset label
A: No. Most of them are false news datasets.
• The label considers only the factual aspect of the news story.
• The intention of the disseminator, which is one of the components of fake news, is not taken into account.

62 Research objective
Background: Japanese dataset construction
Propose an annotation scheme with fine-grained labels for fake news stories:
• Factual aspect of the news story (false or not)
• User intention (dis-information)
• Harmfulness to society (mal-information), etc.
⇒ We expect to analyze each type of false news based on the labeled attributes, e.g., what kind of false news is likely to spread.

63 Our dataset construction scheme (Japanese)
Annotation scheme
• Fact-checking organization: Fact Check Initiative Japan, between July 2019 and October 2021
• Identify and collect the source: social media posts
• Annotation: 3 annotators
• Public dataset: 307 news stories, 471,446 tweets

64 7 Questions in our annotation scheme
Annotation scheme
Q1: What rating does the fact-checking site give the news?
• True / Half-True / Inaccurate / Misleading / False / Pants on Fire / Unknown Evidence / Suspended Judgement
Q2-1: Does the news disseminator know that the news is false?
1. Yes, the news disseminator definitely knows the news is false
2. Yes, the news disseminator probably knows the news is false
3. No, the news disseminator probably does not know the news is false
4. No, the news disseminator definitely does not know the news is false

65 Annotation scheme
Q2-2A: If Yes, how was the news created?
1. Disinformation, Fabricated content
2. Disinformation, Manipulated image
3. Disinformation, Manipulated text
4. Disinformation, False context
Q2-2B: If No, how does the disseminator misunderstand the news?
1. Misinformation, Trusting other sources
2. Misinformation, Inadequate understanding
3. Misinformation, Misleading

66 Annotation scheme
Q3: Who or what does the false news target?
• Free writing
Q4: Does the news flatter or denigrate the target?
1. Flattery
2. Denigration
3. Neither, No such intention
Q5: What is the purpose of the false news?
1. Satire / Parody
2. Partisan
3. Propaganda
4. No purpose / Unknown

67 Annotation scheme
Q6: To what extent is the news harmful to society? (average)
• Rating from 0 to 5
Q7: What types of harm does the news cause?
1. Harmless (e.g., Satire / Parody)
2. Confusion and anxiety about society
3. Threat to the honor of and trust in people, companies, and goods
4. Threat to the correct understanding of politics and social events
5. Health
6. National and racial prejudice
7. Conspiracy theory
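
One way to hold an annotation produced by the seven questions is a small record type, sketched below. The class and field names are hypothetical and the example values are made up; only the questions and answer options come from the slides.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Awareness(Enum):
    """Q2-1: does the disseminator know the news is false?"""
    DEFINITELY_KNOWS = 1
    PROBABLY_KNOWS = 2
    PROBABLY_DOES_NOT_KNOW = 3
    DEFINITELY_DOES_NOT_KNOW = 4

@dataclass
class Annotation:
    rating: str            # Q1: rating from the fact-checking site
    awareness: Awareness   # Q2-1
    origin: str            # Q2-2A or Q2-2B
    target: str            # Q3: free text
    stance: str            # Q4: flattery / denigration / neither
    purpose: str           # Q5
    harm_level: float      # Q6: 0 to 5
    harm_types: List[str] = field(default_factory=list)  # Q7

    def __post_init__(self):
        if not 0 <= self.harm_level <= 5:
            raise ValueError("harm_level must be between 0 and 5")

    @property
    def information_type(self):
        """Yes answers to Q2-1 lead to Q2-2A (disinformation), No to Q2-2B."""
        knows = self.awareness in (Awareness.DEFINITELY_KNOWS, Awareness.PROBABLY_KNOWS)
        return "disinformation" if knows else "misinformation"

example = Annotation(
    rating="False",
    awareness=Awareness.PROBABLY_DOES_NOT_KNOW,
    origin="Trusting other sources",
    target="a public health agency",  # made-up free-text answer
    stance="Denigration",
    purpose="No purpose / Unknown",
    harm_level=3.5,
    harm_types=["Health"],
)
print(example.information_type)  # misinformation
```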

68 Example
[Figure: annotated example; the slide text is not recoverable from the extraction]

69 Example
[Figure: annotated example; the slide text is not recoverable from the extraction]

70 Word cloud: about Q7
Statistics
[Figure: word clouds for three harm types: understanding of politics and social events; health; national and racial prejudice]

71 Bot user percentage Statistics FakeNewsNet Our dataset from [Kai+,
2019] Bot user Real user 22% 78% 93% 7%

73 Background: Japanese false news collection system
• Existing fake news collection systems, such as Hoaxy [Pik-Mai+, 2018] and NewsGuard, depend on rich resources such as fake news detection datasets and fact-checking organizations.
• It is difficult to apply these systems to Japanese society because there are few fake news resources.
⇒ We aim to utilize Guardian posts to build a Japanese fake news collection system.

74 What’s a Guardian?
Background: Japanese false news collection system
Guardians are social media users who themselves perform fact-checking interventions on posts of uncertain truth.

75 What’s a Guardian?
Background: Japanese false news collection system
Guardians are social media users who themselves perform fact-checking interventions on posts of uncertain truth.
We try to discover Guardian posts and the related fake news by searching for specific language patterns.

76 Research objective
Background: Japanese false news collection system
Propose a fake news collection system for Japanese that does not need a fake news detection dataset
• Easy to apply to other languages
• Not dependent on abundant resources
• Utilizes Guardian tweets
[Figure: text not recoverable from the extraction]

77 Fake news collection system
[Figure: text not recoverable from the extraction]

Step1: Crawling Guardian posts from Twitter
• FAKEWORDS: {デマ (false rumor), フェイク (fake), 間違い (mistake), 不正確 (inaccurate), 誤報 (false report), 虚偽 (falsehood), 事実無根 (groundless)}
• Collect about 5,000 tweets per day
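
The keyword matching in Step 1 can be sketched as a simple filter over already-collected posts. The sketch assumes the garbled keyword list on the slide decodes to the Japanese words below; the function name, the post format, and the sample posts are hypothetical, and no Twitter API call is shown.

```python
# Assumed decoding of the slide's FAKEWORDS list.
FAKEWORDS = ("デマ", "フェイク", "間違い", "不正確", "誤報", "虚偽", "事実無根")

def is_guardian_candidate(text):
    """Return True if the post contains at least one keyword from FAKEWORDS."""
    return any(word in text for word in FAKEWORDS)

collected = [
    {"id": 1, "text": "それはデマです。情報源を確認してください"},
    {"id": 2, "text": "今日は良い天気ですね"},
    {"id": 3, "text": "この記事は誤報だと思います"},
]
candidates = [post for post in collected if is_guardian_candidate(post["text"])]
print([post["id"] for post in candidates])  # [1, 3]
```

Keyword matching alone lets through many unrelated posts, which is why a noise-removal step follows.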

Step2: Noise tweet removal
• Remove irrelevant posts using a fine-tuned BERT-based model

Step3: Tweet grouping
• Group posts with similar meanings based on Word Mover’s Distance (WMD), which measures the proximity of meanings between sentences.
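
The grouping step can be sketched as greedy clustering with a distance threshold. To stay self-contained, the sketch swaps in a token-level Jaccard distance as a stand-in for WMD, which needs word embeddings; the threshold, the function names, and the example posts are assumptions.

```python
def jaccard_distance(a, b):
    """Stand-in for WMD: 1 - |shared tokens| / |all tokens|."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def group_posts(posts, distance=jaccard_distance, threshold=0.5):
    """Put each post into the first group whose first member is close enough."""
    groups = []
    for post in posts:
        for group in groups:
            if distance(post, group[0]) <= threshold:
                group.append(post)
                break
        else:
            groups.append([post])  # no close group found: start a new one
    return groups

posts = [
    "huge cat found is fake",
    "huge cat found is false",
    "vitamin c cures covid is a hoax",
]
for group in group_posts(posts):
    print(group)
```

Because `distance` is a parameter, a real WMD function over Japanese word embeddings could be passed in without changing the grouping logic.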

Step4: Ranking
• Rank the collected fake news and Guardian posts based on users' interest, represented by, e.g., the number of retweets and likes.
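
A minimal ranking sketch follows, assuming the interest score is a weighted sum of retweets and likes over the posts in a group. The weights, the field names, and the data are hypothetical; the slides do not specify the scoring formula.

```python
def interest_score(post, retweet_weight=1.0, like_weight=1.0):
    """Hypothetical interest score for one post."""
    return retweet_weight * post["retweets"] + like_weight * post["likes"]

def rank_groups(groups):
    """Sort groups of posts by total interest score, highest first."""
    return sorted(
        groups,
        key=lambda group: sum(interest_score(post) for post in group),
        reverse=True,
    )

groups = [
    [{"id": "a", "retweets": 3, "likes": 10}],
    [{"id": "b", "retweets": 120, "likes": 300}, {"id": "c", "retweets": 5, "likes": 2}],
    [{"id": "d", "retweets": 40, "likes": 15}],
]
ranked = rank_groups(groups)
print([[post["id"] for post in group] for group in ranked])  # [['b', 'c'], ['d'], ['a']]
```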

Step5: Visualization and dataset collection
• Visualize high-ranking fake news
• Collect relevant posts of the visualized fake news

83 Fake news collection system

84 Evaluation
Evaluate the posts collected between 2021/11/1 and 11/14 from two perspectives
• Q1: Do the collected tweets point out the possibility that the news story is false?
⇒ 77% of the collected tweets point out the possibility of falsity
• Q2: Are the subjects of the collected tweets truly false?
⇒ For 52% of the collected tweets, the subject was truly false

85 Summary: Towards countermeasures against the fake news
✓ We proposed a novel annotation scheme with fine-grained labels to capture various perspectives of fake news.
✓ We built the first Japanese fake news dataset.
✓ We proposed a Japanese fake news collection system using Guardian posts, without rich fake news resources.

86 Conclusion
✓ Chapter 2: We model posts of fake news on Twitter as two cascades comprised of ordinary news and correction of fake news.
✓ Chapter 3: We proposed a novel multi-modal method for fake news detection, combining text and user features and infectiousness values.
✓ Chapter 4: We proposed a novel annotation scheme with fine-grained labels to capture various perspectives of fake news. We built the first Japanese fake news dataset and fake news collection system.

87 Future Work
✓ Construction of an English fake news detection dataset following our proposed annotation scheme
✓ Construction of a larger Japanese dataset using the fake news collection system
