Fake News: Politics
25% of tweets containing a link to news media in the five months preceding election day spread either fake or extremely biased news.
Example of fake news
Social activities (especially in Japan)
• Fact check
• Education
• Legislation
• Help to connect
[Figure: an example fake news post ("US researchers discover a huge cat!!!! Over 5 meters…") spreading through retweets and replies such as "Very interesting news !!!!" and "Really ?? I can't find the information source"]
Roadmap, from fundamentals toward applications:
• Understanding the spread of fake news (Chapter 2)
• Detection of fake news (Chapter 3)
• Foundation for keeping up with fake news in Japan (Chapter 4)
2. Fake News Detection using Temporal Features Extracted via Point Process 3. Towards countermeasures against the fake news: - Dataset - Collection system Contents
• Fundamental • Focus on time series of fake news posts • Application to Japan • Resource construction
Understanding the spread of fake news
[Figure: posting activity of fake news tweets versus time from original post [h], modeled with the curves λ_1(t), λ_2(t) and λ(t); 1st stage: news, 2nd stage: correction, split at the correction time t_c]
terms of passage of time, on Twitter
• How fake news spreads over time is not fully understood.
• A modeling method can reveal the characteristics of each fake news story.
Background
[Figure: posts of fake news on Twitter (e.g., "US researchers discover a huge cat!!!! Over 5 meters…", "This news is very Interesting !!!") are modeled as the probability rate of posts versus time from the initial post [h]; the fitted curve has a 1st stage (news) and a 2nd stage (correction), split at t_c]
Hypothesis: the cascade of fake news
A fake news cascade on Twitter is composed of two cascade stages:
• The first cascade has the characteristics of an ordinary news story.
• The second cascade has the characteristics of a correction, because users recognize the falsity of the news item around a correction time t_c.
Timeline of posts:
• First cascade (the characteristics of ordinary news): User A (@userA): "US researchers discover a huge cat!!!! Over 5 meters… URL"; User B (@userB): "This is very very interesting news ! URL"
• Second cascade (the characteristics of correction): User C (@userC): "I think this is fake news…. URL"; User D (@userD): "Really ?? I can't find the information source URL"
point process models whose defining characteristic is that they "self-excite"
• Calculate the probability of the next event from past events and elapsed time
• Examples of application: earthquakes, financial transactions
Proposed Model
[Figure: a time series of events is modeled as a time series of event probability]
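The self-exciting idea can be illustrated with a minimal sketch of a Hawkes-type conditional intensity. The exponential kernel and the values of `mu`, `alpha` and `beta` are assumptions chosen for illustration only; they are not taken from the slides.

```python
import math

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity lambda(t) = mu + sum over past events of
    alpha * exp(-beta * (t - t_i)).

    mu, alpha, beta and the exponential kernel are illustrative assumptions.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )
    return mu + excitation

# Hypothetical event times (hours).
events = [1.0, 1.5, 4.0]
before_any_event = hawkes_intensity(0.5, events)   # only the baseline mu
right_after_burst = hawkes_intensity(1.6, events)  # raised by two recent events
long_after_burst = hawkes_intensity(3.9, events)   # excitation has mostly decayed
```

Each past event raises the rate of the next one, and the effect fades with elapsed time, which is the behavior the bullet "calculate the probability of the next event from past events and elapsed time" describes.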
for social media: considering the two characteristics in social media
• The concept of freshness of information
• User circadian rhythm (e.g., users are not active at midnight)
Proposed Model
The probability of a post between $t$ and $t + \Delta t$ is $\lambda(t)\,\Delta t$, with
$$\lambda(t) = p(t) \sum_{i:\, t_i < t} d_i\, \phi(t - t_i)$$
Here $p(t)$ is the infection rate and the sum expresses how many people still remember the story.
• $p(t)$: infection rate at time $t$
• $d_i$: number of followers of the $i$-th post
• $t_i$: time of the $i$-th post
• $\phi$: the memory function (how much people remember)
The infection rate combines the circadian rhythm (the sine factor) with the freshness of information (the exponential factor):
$$p(t) = a \left\{ 1 - r \sin\!\left( \frac{2\pi}{T_m} (t + \theta_0) \right) \right\} e^{-t/\tau}$$
• $a$: the intensity of information
• $r$: the relative amplitude of the oscillation
• $\theta_0$: phase
• $\tau$: the characteristic time of popularity decay
[Figure: the time series of posts on Twitter is modeled as the probability rate of posts versus time from the initial post [h]]
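As a rough illustration, the intensity λ(t) = p(t) Σ d_i φ(t − t_i), with an infection rate p(t) that has a daily oscillation and an exponential decay, could be computed as follows. This is a sketch under stated assumptions: the power-law memory kernel, its parameters `s0` and `decay_exp`, the 24-hour period, and all numeric values are hypothetical and are not taken from the slides.

```python
import math

def memory_kernel(s, s0=1.0 / 12.0, decay_exp=1.242):
    """Illustrative memory function phi(s): flat up to s0 hours, then power-law decay.
    The shape and the parameter values are assumptions for this sketch."""
    if s <= 0:
        return 0.0
    if s <= s0:
        return 1.0
    return (s / s0) ** (-decay_exp)

def infection_rate(t, a, r, theta0, tau, period=24.0):
    """p(t) = a * (1 - r * sin(2*pi/period * (t + theta0))) * exp(-t / tau)."""
    oscillation = 1.0 - r * math.sin(2.0 * math.pi / period * (t + theta0))
    return a * oscillation * math.exp(-t / tau)

def intensity(t, posts, a, r, theta0, tau):
    """lambda(t) = p(t) * sum of d_i * phi(t - t_i) over posts made before t.
    posts is a list of (t_i, d_i): post time in hours and follower count."""
    remembered = sum(d * memory_kernel(t - ti) for ti, d in posts if ti < t)
    return infection_rate(t, a, r, theta0, tau) * remembered

# Hypothetical cascade: (time in hours, followers of the poster).
posts = [(0.0, 100), (1.0, 50)]
rate_at_start = intensity(0.0, posts, a=0.5, r=0.3, theta0=0.0, tau=10.0)
rate_at_2h = intensity(2.0, posts, a=0.5, r=0.3, theta0=0.0, tau=10.0)
```

At the time of the initial post no earlier post exists, so the intensity is zero; afterwards each post contributes in proportion to its follower count and fades according to the memory function.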
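Split the cascade at the correction time $t_c$:
$$\lambda(t) = p_1(t)\, h_1(t) + p_2(t)\, h_2(t)$$
$$h_1(t) = \sum_{i:\, t_i < \min(t,\, t_c)} d_i\, \phi(t - t_i), \qquad h_2(t) = \sum_{i:\, t_c < t_i < t} d_i\, \phi(t - t_i)$$
$h_1(t)$ collects the posts made before $t_c$ (first cascade) and $h_2(t)$ the posts made after $t_c$ (second cascade). Each infection rate $p_1(t)$, $p_2(t)$ has its own parameters:
• $a$: the intensity of information
• $r$: the relative amplitude of the oscillation
• $\theta_0$: phase
• $\tau$: the characteristic time of popularity decay
[Figure: posting activity versus time (hour), modeled as the sum of the two cascades]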
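The split at the correction time can be sketched as below. This is an illustration, not the authors' implementation: the memory kernel, the 24-hour period, the use of the same functional form of p(t) for both stages, and every numeric value are assumptions made for the example.

```python
import math

def phi(s, s0=1.0 / 12.0, decay_exp=1.242):
    """Illustrative memory function (assumed shape and parameters)."""
    if s <= 0:
        return 0.0
    return 1.0 if s <= s0 else (s / s0) ** (-decay_exp)

def p(t, a, r, theta0, tau, period=24.0):
    """Infection rate with a daily oscillation and exponential decay."""
    return a * (1.0 - r * math.sin(2.0 * math.pi / period * (t + theta0))) * math.exp(-t / tau)

def two_stage_intensity(t, posts, tc, params1, params2):
    """lambda(t) = p1(t) * h1(t) + p2(t) * h2(t), split at the correction time tc."""
    h1 = sum(d * phi(t - ti) for ti, d in posts if ti < min(t, tc))  # first cascade
    h2 = sum(d * phi(t - ti) for ti, d in posts if tc < ti < t)      # second cascade
    return p(t, **params1) * h1 + p(t, **params2) * h2

# Hypothetical cascade and parameters.
posts = [(0.0, 100), (10.0, 20), (41.0, 30)]
news = {"a": 0.5, "r": 0.3, "theta0": 0.0, "tau": 10.0}
correction = {"a": 0.1, "r": 0.3, "theta0": 0.0, "tau": 10.0}

early = two_stage_intensity(5.0, posts, tc=40.0, params1=news, params2=correction)
late = two_stage_intensity(45.0, posts, tc=40.0, params1=news, params2=correction)
late_without_correction = two_stage_intensity(
    45.0, posts, tc=40.0, params1=news, params2={**correction, "a": 0.0}
)
```

Before `tc` only the first term is active, so the model reduces to a single cascade; after `tc` the posts of the second cascade add their own contribution with a separate intensity parameter.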
about fake news in the future
Evaluation
• The modeling period is the first half of the observation time, [0, 0.5T].
• The test period is the second half of the observation time, [0.5T, T].
[Figure: cumulative number of posts versus time, divided into the modeling period and the test period]
• Evaluation metrics: we evaluate the predicted number of posts per hour.
• Mean absolute error: the smaller the value, the better.
• Median absolute error: the smaller the value, the better.
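The two metrics can be sketched as follows, assuming they are the mean and the median of the absolute errors between observed and predicted hourly post counts over the test period. The counts below are made-up numbers for illustration.

```python
from statistics import mean, median

def hourly_abs_errors(observed, predicted):
    """Absolute error for each hour of the test period."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must cover the same hours")
    return [abs(o - q) for o, q in zip(observed, predicted)]

# Hypothetical hourly post counts in the test period [0.5T, T].
observed = [12, 9, 7, 4, 3]
predicted = [10.0, 9.5, 6.0, 6.0, 2.0]

errors = hourly_abs_errors(observed, predicted)
mean_abs_error = mean(errors)      # sensitive to a few large misses
median_abs_error = median(errors)  # robust to a few large misses
```

Reporting both values helps separate a model that is slightly off everywhere from one that is accurate most of the time but misses a few bursts badly.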
fake news stories reported by "Politifact" and "Snopes" from March to May 2019.
• Each story had over 300 posts, and posting continued for over 36 hours after the initial post.
• Fake News in Tohoku earthquake (Tohoku)
• We collected 19 fake news stories related to the Tohoku earthquake, reported by Japanese news media from March 12 to March 24, 2011.
• Each story had over 300 posts, and posting continued for over 36 hours after the initial post.
Evaluation
• Compared with $a$ in the first cascade, $a$ in the second cascade tends to be small (the intensity of information is weak).
• The correction time $t_c$ is estimated to be around 40 hours after the initial post in both datasets.
$$p(t) = a \left\{ 1 - r \sin\!\left( \frac{2\pi}{T_m} (t + \theta_0) \right) \right\} e^{-t/\tau}$$
the text around the correction time $t_c$
• Verify from the text of posts whether the proposed model can properly estimate the correction time $t_c$
• Count words that mean hoax or correction (e.g., fake, not true…)
• Our hypothesis: the first cascade has the characteristics of ordinary news, while the second cascade has the characteristics of correction
[Figure: the blue line indicates the correction time; the black line indicates the number of fake words]
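A simple version of the word count can be sketched as below. The term list is a hypothetical stand-in for the words "that mean hoax or correction", the sketch counts posts containing at least one such term per hour, and the example posts are invented for illustration.

```python
from collections import Counter

# Hypothetical list of hoax/correction terms; the slides give only "fake" and "not true" as examples.
HOAX_TERMS = ("fake", "not true", "hoax", "false rumor")

def hoax_posts_per_hour(posts, terms=HOAX_TERMS):
    """Count, for each whole hour since the initial post, the posts that contain a hoax term.
    posts is a list of (hours_since_initial_post, text)."""
    counts = Counter()
    for hours, text in posts:
        lowered = text.lower()
        if any(term in lowered for term in terms):
            counts[int(hours)] += 1
    return counts

posts = [
    (0.5, "US researchers discover a huge cat!!!! Over 5 meters…"),
    (1.2, "This is very very interesting news !"),
    (37.4, "I think this is fake news…."),
    (37.9, "This is NOT TRUE"),
    (40.1, "Really ?? I can't find the information source"),
]
counts = hoax_posts_per_hour(posts)
```

If the estimated correction time is meaningful, the hourly counts should rise around it; in this toy input they are concentrated in hour 37.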
• Example: fake news "Turkey Donates 10 Billion Yen to Japan"
• Compare the word clouds around the correction time ($t_c = 37$)
• Before $t_c$, words related to the news, such as "pro-Japanese", appear.
• After $t_c$, some users point out that the news is a "false rumor."
✓ Model posts of fake news on Twitter as two cascades comprised of ordinary news and correction of fake news
✓ The proposed model achieves higher accuracy in the task of predicting the number of posts.
Detection of fake news
temporal features from social media posts
• Examining the effectiveness of temporal features (when users post) in detecting fake news
• Aiming to achieve the highest detection performance
Background
[Figure: existing fake news detection models extract "who posts" (user) and "what is posted" (linguistic) features from posts of fake news on Twitter and output fake or real; the proposed model adds "when posted" (temporal) features]
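The probability of a post between $t$ and $t + \Delta t$ is $\lambda(t)\,\Delta t$, with
$$\lambda(t) = p(t) \sum_{i:\, t_i < t} d_i\, \phi(t - t_i)$$
Here $p(t)$ is the infection rate and the sum expresses how many people still remember the story.
• $p(t)$: infection rate at time $t$
• $d_i$: number of followers of the $i$-th post
• $t_i$: time of the $i$-th post
• $\phi$: the memory function (how much people remember)
SEISMIC [Zhao+, 2015] (Hawkes Process)
[Figure: the time series of posts on social media is converted into the time series of $p(t)$]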
in SNS
• Weibo [Ma+, 2016]
• Twitter15 [Ma+, 2017]
• Twitter16 [Ma+, 2017]
• Metrics
• Accuracy: percentage of correct predictions
• F1-score: the harmonic mean of precision and recall
Experiment
[Table: for each dataset, the number of true news, fake news, unverified, and debunking articles]
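For reference, the two metrics can be computed as in the sketch below. The label names and the toy predictions are illustrative assumptions; they are not results from the experiments.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    correct = sum(1 for t, q in zip(y_true, y_pred) if t == q)
    return correct / len(y_true)

def f1_for_label(y_true, y_pred, label):
    """F1 for one label: harmonic mean of precision and recall."""
    tp = sum(1 for t, q in zip(y_true, y_pred) if t == label and q == label)
    fp = sum(1 for t, q in zip(y_true, y_pred) if t != label and q == label)
    fn = sum(1 for t, q in zip(y_true, y_pred) if t == label and q != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy gold labels and predictions (hypothetical).
y_true = ["fake", "true", "fake", "unverified", "true", "fake"]
y_pred = ["fake", "true", "true", "unverified", "true", "fake"]

acc = accuracy(y_true, y_pred)
f1_fake = f1_for_label(y_true, y_pred, "fake")
```

Computing F1 per label, as here, makes it visible when one class (for example the unverified label) is handled much worse than the others even though overall accuracy looks good.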
D: Debunking Label
• The proposed model performed best on most measures and datasets.
• Our model did not produce good results for classifying the unverified label.
• In the ablation study (the proposed model without temporal features), the full proposed model scores higher than proposed (w/o time).
Point Process
✓ We proposed a novel multi-modal method for fake news detection, combining text and user features and infectiousness values.
✓ The experimental results empirically showed the effectiveness of temporal features in our proposed model.
Japanese
• Most studies focus on US society and build English resources
• Two things we have tackled for the expansion of Japanese resources
• Japanese fake news dataset construction
• Japanese false news collection system
Background
[Figure: panels "Japanese Dataset" and "Fake news collection", with the labels "Fact check", "Fake News Research", and "Help"]
datasets really be called "fake news" datasets?
[Figure: existing fake news dataset construction scheme: fact-checking organizations → identify and collect the source (Media A news article, social media posts) → dataset with label]
Label types:
• Binary label (31/51): True or Real
• e.g., 5-scale label: True / Mostly-True / Mixture / Mostly False / False
news."
• Broad definition: "Fake news is false news." [Zhou+ 2020] [Lazer+, 2018]
• Narrow definition: "Fake news is a news article that is intentionally and verifiably false." [Alcott+ 2017] [Kai+, 2017] [Xichen+, 2020]
Definition of fake news
The phrase "fake news" is ambiguous.
the issues at play" by Claire Wardle
Criticism of the term "Fake news"
Fake news can be examined from three perspectives:
• Mis-information: false information disseminated online by people who don't have a harmful intent
• Dis-information: false information created and shared by people with harmful intent
• Mal-information: the sharing of "genuine" information with the intent to cause harm
datasets really be called "fake news" datasets?
A: No. Most of them are false news datasets.
• The label considers only the factual aspect of the news story.
• The intention of the disseminator, which is one of the components of fake news, is not taken into account.
for fake news story:
• Factual aspect of the news story (false or not)
• User intention (dis-information)
• Harmfulness to society (mal-information), etc.
⇒ We expect to analyze each type of false news based on the labeled attributes, e.g., what kind of false news is likely to spread.
Background: Japanese dataset construction
organization (Fact Check Initiative Japan) from July 2019 to October 2021
[Figure: dataset construction pipeline: identify and collect the source → social media posts → annotation (3 annotators) → public dataset (307 news stories, 471,446 tweets)]
What rating does the fact-checking site assign to the news?
• True / Half-True / Inaccurate / Misleading / False / Pants on Fire / Unknown Evidence / Suspended Judgement
Q2-1: Does the news disseminator know that the news is false?
1. Yes, the news disseminator definitely knows the news is false
2. Yes, the news disseminator probably knows the news is false
3. No, the news disseminator probably does not know the news is false
4. No, the news disseminator definitely does not know the news is false
false news targeted?
• Free writing
Q4: Does the news flatter or denigrate the target?
1. Flattery
2. Denigration
3. Neither / no such intention
Q5: What is the purpose of the false news?
1. Satire / Parody
2. Partisan
3. Propaganda
4. No purpose / Unknown
harmful to society? (average)
• Rating from 0 to 5
Q7: What types of harm does the news cause?
1. Harmless (e.g., satire / parody)
2. Confusion and anxiety about society
3. Threat to the honor of and trust in people, companies, and goods
4. Threat to the correct understanding of politics and social events
5. Health
6. Prejudice against nationalities and races
7. Conspiracy theory
such as fake news detection dataset and fact-checking organizations
Background: Japanese false news collection system
• Existing systems: Hoaxy [Pik-Mai+, 2018], NewsGuard
• It is difficult to apply these systems to Japanese society because there are few fake news resources.
⇒ We aim to utilize Guardian posts to build a Japanese fake news collection system
are social media users who perform fact-checking interventions themselves on posts of uncertain truth. We try to discover Guardian posts and the related fake news by searching for specific language patterns.
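A pattern search of this kind can be sketched with a regular expression. The English patterns below are hypothetical placeholders chosen from the example tweets in these slides; the actual system targets Japanese posts and its patterns are not given here.

```python
import re

# Hypothetical language patterns that suggest a fact-checking intervention.
GUARDIAN_PATTERN = re.compile(
    r"\b(?:fake news|false rumor|hoax|not true)\b|information source",
    re.IGNORECASE,
)

def is_guardian_candidate(text):
    """Return True when the post matches one of the assumed patterns."""
    return GUARDIAN_PATTERN.search(text) is not None

posts = [
    "This is very very interesting news ! URL",
    "I think this is fake news…. URL",
    "Really ?? I can't find the information source URL",
]
candidates = [text for text in posts if is_guardian_candidate(text)]
```

Matching posts are only candidates: a post can mention "fake news" without correcting anything, so a later filtering or evaluation step is still needed.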
Japanese without Fake news detection dataset
• Easy to apply to other languages
• Not dependent on abundant resources
• Utilize Guardian tweets
Background: Japanese false news collection system
Step3: Tweet grouping
• Group posts with similar meanings based on Word Mover's Distance (WMD), which measures how close the meanings of two sentences are.
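The grouping step can be sketched as a greedy pass that puts a post into the first group whose representative is close enough. To stay self-contained, this sketch swaps in a word-overlap (Jaccard) distance as a stand-in for WMD, which would need word embeddings; the threshold and the example posts are also assumptions.

```python
def jaccard_distance(a, b):
    """Stand-in distance (NOT Word Mover's Distance): 1 - word-set overlap."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return 1.0 - len(words_a & words_b) / len(words_a | words_b)

def group_posts(posts, distance=jaccard_distance, threshold=0.5):
    """Greedy grouping: join the first group whose first post is within threshold."""
    groups = []
    for post in posts:
        for group in groups:
            if distance(post, group[0]) <= threshold:
                group.append(post)
                break
        else:
            groups.append([post])  # no close group found, start a new one
    return groups

posts = [
    "huge cat discovered by US researchers",
    "US researchers discovered a huge cat",
    "Turkey donates 10 billion yen to Japan",
]
groups = group_posts(posts)
```

Because `distance` is a parameter, a WMD implementation backed by word embeddings could be passed in without changing the grouping logic.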
Step4: Ranking
• Rank the collected fake news and Guardian posts by users' interest, represented by, for example, the number of retweets and likes.
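One simple way to rank by interest is to sort by an engagement score. The score used here (retweets plus likes, equally weighted), the field names, and the numbers are assumptions for illustration; the slides do not specify how the signals are combined.

```python
def rank_by_interest(items):
    """Sort items by an assumed interest score: retweets + likes, highest first."""
    return sorted(
        items,
        key=lambda item: item["retweets"] + item["likes"],
        reverse=True,
    )

# Hypothetical grouped results with engagement counts.
items = [
    {"id": "a", "retweets": 10, "likes": 5},
    {"id": "b", "retweets": 120, "likes": 300},
    {"id": "c", "retweets": 40, "likes": 2},
]
ranked_ids = [item["id"] for item in rank_by_interest(items)]
```

Other weightings, such as counting retweets more heavily than likes, only require changing the `key` function.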
from two perspectives
• Q1: Do the collected tweets point out the possibility that the news story is false? ⇒ 77% of the collected tweets point out the possibility of falsity
• Q2: Are the subjects of the collected tweets truly fake? ⇒ For 52% of the collected tweets, the subject was truly false
proposed a novel annotation scheme with fine-grained labels to capture various perspectives of fake news
✓ We built the first Japanese fake news dataset.
✓ We proposed a Japanese fake news collection system using Guardian posts, without rich fake news resources.
on Twitter as two cascades comprised of ordinary news and correction of fake news
✓ Chapter 3: We proposed a novel multi-modal method for fake news detection, combining text and user features and infectiousness values.
✓ Chapter 4: We proposed a novel annotation scheme with fine-grained labels to capture various perspectives of fake news. We built the first Japanese fake news dataset and fake news collection system.
Conclusion