Slide 1

Slide 1 text

Towards Keeping up with Fake News in the Social Media Ecosystem Taichi Murayama NAIST, Social Computing Lab., D3

Slide 2

Slide 2 text

2 Fake news is false or misleading content presented as news and communicated in formats spanning spoken, written, printed, electronic, and digital communication. -“The Anatomy of Fake News” by Nolan Higdon

Slide 3

Slide 3 text

3 Early Fake News? The Great Moon Hoax (1835, The Sun); the German Corpse Factory story (WWI, The Times of London).

Slide 4

Slide 4 text

4 Fake News after 2010: The Internet and social media made news easy to access and easy to share, but also brought polarization and various social problems such as fake news, online slander, and echo chambers.

Slide 5

Slide 5 text

5 Example of Fake News: Politics. Fake news in the 2016 US presidential election: 25% of tweets containing a news media link in the five months preceding election day pointed to either fake or extremely biased news.

Slide 6

Slide 6 text

6 Example of Fake News: Health. Fake news related to COVID-19: "Vitamin C cures COVID-19"; "The 5G network spreads COVID-19."

Slide 7

Slide 7 text

7 How does society combat fake news? Fact-checking organizations, regulation by platforms, education, and legislation.

Slide 8

Slide 8 text

8 Targeting of my dissertation. [Diagram: an example fake-news cascade ("US researchers discover a huge cat!!!! Over 5 meters…") spreads through retweets and replies ("Very interesting news!!!!", "Really?? I can't find the information source"). Alongside it, social activities against fake news (especially in Japan): fact-checking, education, legislation. The dissertation helps connect the spreading event with these activities.]

Slide 9

Slide 9 text

9 Targeting of my dissertation. [Diagram: the same fake-news cascade and social activities, annotated with the dissertation's structure:] Understanding the spread of fake news (Chapter 2) → Detection of fake news (Chapter 3), moving from fundamentals toward applications, plus a foundation for keeping up with fake news in Japan (Chapter 4).

Slide 10

Slide 10 text

10 Contents: 1. Modeling the spread of fake news on Twitter; 2. Fake News Detection using Temporal Features Extracted via Point Process; 3. Towards countermeasures against fake news: dataset and collection system.

Slide 11

Slide 11 text

11 Contents: 1. Modeling the spread of fake news on Twitter; 2. Fake News Detection using Temporal Features Extracted via Point Process; 3. Towards countermeasures against fake news: dataset and collection system. (Annotations: fundamental; focus on the time series of fake news posts; application to Japan; resource construction.)

Slide 12

Slide 12 text

12 Contents: 1. Modeling the spread of fake news on Twitter; 2. Fake News Detection using Temporal Features Extracted via Point Process; 3. Towards countermeasures against fake news: dataset and collection system. Understanding the spread of fake news. [Figure: fake news tweets over time from the original post (hours) are modeled as posting activity λ(t) = λ₁(t) + λ₂(t); 1st stage: news, 2nd stage: correction, split at the correction time t_c.]

Slide 13

Slide 13 text

13 Research objective: modeling the cascade of fake news on Twitter as a function of time. Background: how fake news spreads over time is not fully understood; a modeling method can reveal the characteristics of each fake news story. [Figure: posts of fake news on Twitter ("US researchers discover a huge cat!!!! Over 5 meters…" → "This news is very interesting!!!") are modeled as the probability rate of posts over time from the initial post.]

Slide 14

Slide 14 text

14 Hypothesis: the cascade of fake news. Background: a fake news cascade on Twitter comprises two stage cascades. The first cascade has the characteristics of an ordinary news story; the second has the characteristics of a correction, because users recognize the falsity of the news item around a correction time t_c. [Timeline of posts, first cascade (the characteristics of ordinary news): User A (@userA): "US researchers discover a huge cat!!!! Over 5 meters… URL"; User B (@userB): "This is very very interesting news! URL"]

Slide 15

Slide 15 text

15 Hypothesis: the cascade of fake news. Background: a fake news cascade on Twitter comprises two stage cascades. The first cascade has the characteristics of an ordinary news story; the second has the characteristics of a correction, because users recognize the falsity of the news item around a correction time t_c. [Timeline of posts, second cascade (the characteristics of correction): User C (@userC): "I think this is fake news…. URL"; User D (@userD): "Really?? I can't find the information source URL"]

Slide 16

Slide 16 text

16 Hypothesis: the cascade of fake news. Proposed Model: a fake news cascade on Twitter comprises two stage cascades. The first cascade has the characteristics of an ordinary news story; the second has the characteristics of a correction, because users recognize the falsity of the news item around a correction time t_c.

Slide 17

Slide 17 text

17 Base Technique: Hawkes Process. Proposed Model: the Hawkes process is a point process model whose defining characteristic is that it "self-excites": the probability of the next event is calculated from past events and the elapsed time. Examples of application: earthquake occurrence, financial transactions. [Figure: a time series of events is modeled as a time series of event probability.]
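As a concrete illustration, a minimal self-exciting conditional intensity can be computed directly from the definition. The base rate `mu`, excitation weight `alpha`, and decay `beta` below are illustrative values, and the exponential kernel is a common textbook choice rather than the kernel used in the dissertation:

```python
import math

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of a simple self-exciting Hawkes process:
    lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i)).
    Each past event raises the probability of the next one, and the
    excitation decays exponentially with the elapsed time."""
    return mu + alpha * sum(
        math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )

# Immediately after a burst of events the intensity is high,
# then it relaxes back toward the base rate mu.
burst = [1.0, 1.1, 1.2]
assert hawkes_intensity(1.3, burst) > hawkes_intensity(5.0, burst)
```

The self-excitation is what makes the process suitable for retweet cascades: every post momentarily raises the chance of further posts.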

Slide 18

Slide 18 text

18 Base Model: Time-Dependent Hawkes Process [Kobayashi+, 2016]. Proposed Model: a Hawkes process adapted to social media, incorporating two of its characteristics: the freshness of information, and users' circadian rhythm (e.g., users are not active at midnight). [Figure: the time series of posts on Twitter is modeled as the probability rate of posts over time from the initial post.]

Slide 19

Slide 19 text

19 Base Model: Time-Dependent Hawkes Process [Kobayashi+, 2016]. Proposed Model: a Hawkes process for social media, incorporating the freshness of information and users' circadian rhythm (e.g., users are not active at midnight). The probability of a post in [t, t + Δt) is λ(t)Δt, with

λ(t) = p(t) Σ_{i: t_i < t} d_i Φ(t − t_i)

p(t): infection rate at time t; d_i: number of followers at the i-th post; t_i: time of the i-th post; Φ: memory kernel describing how well a post is remembered after a given delay ("how many people remember").

Slide 20

Slide 20 text

20 Base Model: Time-Dependent Hawkes Process [Kobayashi+, 2016]. The infection rate combines the circadian rhythm and the freshness of information:

p(t) = a [1 − r sin(2π/T_m (t + φ₀))] e^{−t/τ}

a: the intensity of the information; r: the relative amplitude of the oscillation; φ₀: its phase; T_m: the oscillation period (≈ one day); τ: the characteristic time of popularity decay. The sinusoidal factor models the circadian rhythm; the exponential factor models the decay of freshness.
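The two factors of p(t) and the resulting intensity can be sketched as below. The exponential memory kernel `Phi` is a simplifying assumption (Kobayashi+ 2016 fit a heavier-tailed kernel), and all parameter values are illustrative:

```python
import math

def infection_rate(t, a=1.0, r=0.3, phi0=0.0, tau=30.0, period=24.0):
    """Infection rate p(t) = a * [1 - r*sin(2*pi/T_m * (t + phi0))] * exp(-t/tau):
    a circadian oscillation (period ~24 h) multiplied by exponential
    decay of the news item's freshness (timescale tau, in hours)."""
    circadian = 1.0 - r * math.sin(2.0 * math.pi / period * (t + phi0))
    return a * circadian * math.exp(-t / tau)

def intensity(t, posts, **kw):
    """lambda(t) = p(t) * sum_{i: t_i < t} d_i * Phi(t - t_i), where
    `posts` is a list of (t_i, d_i): post time and follower count.
    Phi is taken as an exponential kernel here, purely as a sketch."""
    memory = sum(d * math.exp(-(t - ti)) for ti, d in posts if ti < t)
    return infection_rate(t, **kw) * memory
```

With these defaults, the infection rate two days in is much lower than at the start, reflecting the loss of freshness.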

Slide 21

Slide 21 text

21 Proposed Model. Split the cascade at the correction time t_c:

λ(t) = p₁(t) h₁(t) + p₂(t) h₂(t)

[Figure: posting activity over time (hours), with the first-stage term p₁(t)h₁(t) and the second-stage term p₂(t)h₂(t).]

Slide 22

Slide 22 text

22 Proposed Model. Split the cascade at the correction time t_c: λ(t) = p₁(t) h₁(t) + p₂(t) h₂(t), where

h₁(t) = Σ_{i: t_i < min(t, t_c)} d_i Φ(t − t_i)   (posts up to t_c)
h₂(t) = Σ_{i: t_c ≤ t_i < t} d_i Φ(t − t_i)   (posts from t_c on)

Each stage's infection rate p₁, p₂ has its own parameters: a (the intensity of the information), r (the relative amplitude of the oscillation), φ₀ (phase), and τ (the characteristic time of popularity decay).
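The two-stage split can be sketched as follows; the exponential kernel and the stage rates passed in as plain callables are simplifying assumptions, not the fitted model:

```python
import math

def kernel(dt):
    """Shared memory kernel Phi (exponential here, as an illustration)."""
    return math.exp(-dt)

def two_stage_intensity(t, posts, tc, p1, p2):
    """lambda(t) = p1(t)*h1(t) + p2(t)*h2(t): the cascade is split at the
    correction time tc. h1 aggregates posts made before tc (the
    ordinary-news stage) and h2 posts made at or after tc (the
    correction stage); p1 and p2 are stage-specific infection rates,
    supplied here as plain functions of t."""
    h1 = sum(d * kernel(t - ti) for ti, d in posts if ti < min(t, tc))
    h2 = sum(d * kernel(t - ti) for ti, d in posts if tc <= ti < t)
    return p1(t) * h1 + p2(t) * h2
```

Setting one stage's rate to zero isolates the other stage's contribution, which is how the split can be sanity-checked.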

Slide 23

Slide 23 text

23 Experimental Settings. Evaluation task: predict the future number of posts about a fake news story. Setting: the modeling period is the front half of the observation time, [0, 0.5T]; the test period is the back half, [0.5T, T]. [Figure: cumulative number of posts over time, divided into modeling and test periods.] Evaluation metrics, computed on the number of posts per hour (smaller is better for both): Mean Absolute Error and Median Absolute Error.
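The two error metrics over hourly post counts are straightforward to compute:

```python
import statistics

def mean_abs_error(pred, actual):
    """Mean absolute error over hourly post counts (smaller is better)."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

def median_abs_error(pred, actual):
    """Median absolute error: less sensitive than the mean to a few
    badly-missed hours (smaller is better)."""
    return statistics.median(abs(p - a) for p, a in zip(pred, actual))
```

The median variant is the more robust of the two when a handful of hours dominate the error.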

Slide 24

Slide 24 text

24 Dataset. Evaluation datasets: Recent Fake News (RFN): 7 fake news stories reported by PolitiFact and Snopes from March to May 2019. Fake News in the Tohoku earthquake (Tohoku): 19 fake news stories related to the Tohoku earthquake, reported by Japanese news media, from March 12 to March 24, 2011. In both datasets, each story had over 300 posts and continued to be posted about for over 36 hours after the initial post.

Slide 25

Slide 25 text

25 Experimental Results. Evaluation: the proposed method achieved higher accuracy than the other methods on both datasets (for 100% of stories in RFN and 89% in Tohoku). [Table: smaller values are better.]

Slide 26

Slide 26 text

26 Experimental Results. Evaluation: [Figure] Magenta indicates the number of posts predicted by the proposed model; black indicates the actual number. The proposed model's predictions closely track the actual number of posts.

Slide 27

Slide 27 text

27 Characteristics of the estimated parameters. Discussion: Parameter a: compared with the first cascade, a in the second cascade tends to be smaller (the intensity of the information is weaker). The correction time t_c is estimated at around 40 hours after the initial post in both datasets. (Recall p(t) = a [1 − r sin(2π/T_m (t + φ₀))] e^{−t/τ}.)

Slide 28

Slide 28 text

28 Verification of the correction time t_c. Discussion: check the text around the correction time t_c, to verify whether the proposed model properly estimates t_c from the text of posts: count words that mean hoax or correction (e.g., "fake", "not true", …). Our hypothesis: the first cascade has the characteristics of ordinary news, while the second cascade has the characteristics of correction. [Figure: the blue line indicates the correction time; the black line indicates the number of fake-related words.]
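The word-counting check can be sketched as below; the word list is a hypothetical English stand-in for the hoax/correction vocabulary (the dissertation counts Japanese and English terms from the actual posts):

```python
from collections import Counter

# Hypothetical stand-in vocabulary; substring matching is a deliberate
# simplification of real keyword matching.
FAKE_WORDS = {"fake", "hoax", "false", "rumor", "not true", "debunked"}

def correction_signal(posts):
    """Count, per hour, how many posts contain a hoax/correction word.
    `posts` is a list of (hour, text). A jump in this count around the
    estimated correction time t_c supports the two-cascade hypothesis."""
    counts = Counter()
    for hour, text in posts:
        lowered = text.lower()
        if any(w in lowered for w in FAKE_WORDS):
            counts[hour] += 1
    return counts
```

Plotting this count against the estimated t_c is exactly the comparison shown in the figure.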

Slide 29

Slide 29 text

29 Verification of the correction time t_c: word clouds. Discussion: example fake news, "Turkey Donates 10 Billion Yen to Japan." Comparing the word clouds around the correction time (t_c = 37): before t_c, words related to the news, such as "pro-Japanese," appear; after t_c, users point out that the news is a "false rumor."

Slide 30

Slide 30 text

30 Summary: Modeling the spread of fake news on Twitter. ✓ We modeled posts of fake news on Twitter as two cascades, comprising ordinary news and the correction of the fake news. ✓ The proposed model achieves higher accuracy in the task of predicting the number of posts.

Slide 31

Slide 31 text

31 Contents: 1. Modeling the spread of fake news on Twitter; 2. Fake News Detection using Temporal Features Extracted via Point Process; 3. Towards countermeasures against fake news: dataset and collection system. Detection of fake news. [Figure: architecture of the detection model over time (hours).]

Slide 32

Slide 32 text

32 Research objective: propose a fake news detection model leveraging temporal features from social media posts. Background: examining the effectiveness of temporal features (when users post) in detecting fake news; aiming to achieve the highest detection performance.

Slide 33

Slide 33 text

33 Research objective: propose a fake news detection model leveraging temporal features from social media posts. Background: [Figure: an existing fake news detection model extracts who posts (user features) and what is posted (linguistic features) from social media posts about a story ("US researchers discover a huge cat!!!! Over 5 meters…" → "This news is very interesting!!!") and classifies it as fake or real.]

Slide 34

Slide 34 text

34 Research objective: propose a fake news detection model leveraging temporal features from social media posts. Background: [Figure: the proposed model additionally extracts when users post (temporal features), alongside who posts (user) and what is posted (linguistic), and classifies the story as fake or real.]

Slide 35

Slide 35 text

35 Case study (revisiting Chapter 2). Background: [Figure: number of posts over time (hours) for an example real news story vs. an example fake news story.]

Slide 36

Slide 36 text

36 Fake news detection from social media posts. Problem statement: Input: text, user, and temporal features extracted from social media posts. Output: a label for the news story (multi-class, e.g., true / false / unverified / debunking).

Slide 37

Slide 37 text

37 Proposed Model. [Figure: architecture of the proposed model over time (hours).]

Slide 38

Slide 38 text

38 Proposed Model.

Slide 39

Slide 39 text

39 Proposed Model.

Slide 40

Slide 40 text

40 Proposed Model.

Slide 41

Slide 41 text

41 Proposed Model.

Slide 42

Slide 42 text

42 Converting temporal features. SEISMIC [Zhao+, 2015] (a Hawkes process) converts the time series of posts on social media into a time series of the infection rate p(t). The probability of a post in [t, t + Δt) is λ(t)Δt, with

λ(t) = p(t) Σ_{i: t_i < t} d_i Φ(t − t_i)

p(t): infection rate at time t; d_i: number of followers at the i-th post; t_i: time of the i-th post; Φ: memory kernel describing how well a post is remembered ("how many people remember").
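In this spirit, converting raw post times into an infectiousness-style time series might look like the following sketch. The fixed-window scheme and the exponential kernel are assumptions for illustration, not SEISMIC's exact estimator:

```python
import math

def infectiousness_series(posts, window=1.0, horizon=24.0):
    """Turn raw post times into a fixed-length series of estimated
    infectiousness. At each window end t, divide the number of new posts
    by the 'remembered' audience sum_{t_i < t-window} d_i * exp(-(t-t_i)).
    `posts` is a list of (t_i, d_i). The resulting series is the kind of
    temporal feature vector a detector could consume."""
    series = []
    t = window
    while t <= horizon:
        new = sum(1 for ti, _ in posts if t - window <= ti < t)
        audience = sum(d * math.exp(-(t - ti))
                       for ti, d in posts if ti < t - window)
        series.append(new / audience if audience > 0 else 0.0)
        t += window
    return series
```

A high value early on followed by a fast drop, versus a slow decay, is the sort of shape difference the detector can learn from.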

Slide 43

Slide 43 text

43 Proposed Model.

Slide 44

Slide 44 text

44 Experimental Settings. Dataset: each news item consists of multiple posts on social media: Weibo [Ma+, 2016], Twitter15 [Ma+, 2017], Twitter16 [Ma+, 2017]. Metrics: Accuracy (percentage of correct predictions) and F1-score (the harmonic mean of precision and recall). [Table: number of true, fake, unverified, and debunking news articles per dataset.]

Slide 45

Slide 45 text

45 Experimental Results. Experiment (T: True, F: False, U: Unverified, D: Debunking label): the proposed model performed best on most measures and datasets; it did not produce good results for the unverified label; and in the ablation study, the full proposed model outperforms the variant without temporal features (w/o time).

Slide 46

Slide 46 text

46 Experimental Results.

Slide 47

Slide 47 text

47 Experimental Results.

Slide 48

Slide 48 text

48 Summary: Fake News Detection using Temporal Features Extracted via Point Process. ✓ We proposed a novel multi-modal method for fake news detection, combining text features, user features, and infectiousness values. ✓ The experimental results empirically showed the effectiveness of the temporal features in our proposed model.

Slide 49

Slide 49 text

49 Contents: 1. Modeling the spread of fake news on Twitter; 2. Fake News Detection using Temporal Features Extracted via Point Process; 3. Towards countermeasures against fake news: dataset and collection system.

Slide 50

Slide 50 text

50 Research objective: promote fact-checking and fake news research in Japanese. Background: most studies focus on US society and build English resources. We have tackled two things for the expansion of Japanese resources: Japanese fake news dataset construction, and a Japanese false news collection system. [Diagram: the Japanese dataset and the fake news collection system help fact-checking and fake news research.]

Slide 51

Slide 51 text

51 Research objective: promote fact-checking and fake news research in Japanese.

Slide 52

Slide 52 text

52 Background: Japanese dataset construction Q: Can existing fake news datasets really be called “fake news” datasets?

Slide 53

Slide 53 text

53 Background: Japanese dataset construction. Q: Can existing fake news datasets really be called "fake news" datasets? [Diagram: the existing construction scheme: fact-checking organizations identify and collect the source, a news article from Media A plus social media posts.]

Slide 54

Slide 54 text

54 Background: Japanese dataset construction. Q: Can existing fake news datasets really be called "fake news" datasets? [Diagram: the collected article and posts are stored in a dataset together with a label.]

Slide 55

Slide 55 text

55 Background: Japanese dataset construction. Q: Can existing fake news datasets really be called "fake news" datasets? In the existing scheme, most labels are binary (31 of 51 datasets), e.g., fake vs. real; others use finer scales, e.g., a 5-scale label: True / Mostly True / Mixture / Mostly False / False.

Slide 56

Slide 56 text

56 Definition of fake news. There is no settled definition of the phrase "fake news." Broad definition: "Fake news is false news." [Zhou+ 2020] [Lazer+, 2018]. Narrow definition: "Fake news is a news article that is intentionally and verifiably false." [Allcott+ 2017] [Kai+, 2017] [Xichen+, 2020].

Slide 57

Slide 57 text

57 Definition of fake news. There is no settled definition of the phrase "fake news." Broad definition: "Fake news is false news." [Zhou+ 2020] [Lazer+, 2018]. Narrow definition: "Fake news is a news article that is intentionally and verifiably false." [Allcott+ 2017] [Kai+, 2017] [Xichen+, 2020]. The phrase "fake news" is an ambiguous term.

Slide 58

Slide 58 text

58 Criticism of the term "fake news." "The phrase 'fake news' is 'woefully inadequate' to describe the issues at play" (Claire Wardle). The issue can instead be examined from three perspectives: mis-information: false information disseminated online by people who do not have harmful intent; dis-information: false information created and shared by people with harmful intent; mal-information: the sharing of "genuine" information with the intent to cause harm.

Slide 59

Slide 59 text

59 Criticism of the term "fake news." The British government decided in 2018 that the term "fake news" would no longer be used in official documents because it is "a poorly-defined and misleading term."

Slide 60

Slide 60 text

60 Background: Japanese dataset construction. Q: Can existing fake news datasets really be called "fake news" datasets? A: No. Most of them are false news datasets. [Diagram: the existing construction scheme, as before.]

Slide 61

Slide 61 text

61 Background: Japanese dataset construction. Q: Can existing fake news datasets really be called "fake news" datasets? A: No. Most of them are false news datasets: the label considers only the factual aspect of the news story, and the intention of the disseminator, one of the components of fake news, is not taken into account.

Slide 62

Slide 62 text

62 Research objective. Background: Japanese dataset construction. We propose an annotation scheme with fine-grained labels for fake news stories: the factual aspect of the news story (false or not), the disseminator's intention (dis-information), harmfulness to society (mal-information), etc. ⇒ We expect to analyze each type of false news based on the labeled attributes, e.g., what kind of false news is likely to spread.

Slide 63

Slide 63 text

63 Our dataset construction scheme (Japanese). Annotation scheme: from the fact-checking organization Fact Check Initiative Japan, between July 2019 and October 2021, we identify and collect the sources (social media posts); three annotators then annotate them, yielding a public dataset (307 news stories, 471,446 tweets).

Slide 64

Slide 64 text

64 Seven questions in our annotation scheme. Q1: What rating does the fact-checking site assign to the news? (True / Half-True / Inaccurate / Misleading / False / Pants on Fire / Unknown Evidence / Suspended Judgement). Q2-1: Does the news disseminator know that the news is false? 1. Yes, the disseminator definitely knows the news is false; 2. Yes, the disseminator probably knows; 3. No, the disseminator probably does not know; 4. No, the disseminator definitely does not know.

Slide 65

Slide 65 text

65 Annotation scheme. Q2-2A: If yes, how was the news created? 1. Disinformation, fabricated content; 2. Disinformation, manipulated image; 3. Disinformation, manipulated text; 4. Disinformation, false context. Q2-2B: If no, how did the disseminator misunderstand the news? 1. Misinformation, trusting other sources; 2. Misinformation, inadequate understanding; 3. Misinformation, misleading.

Slide 66

Slide 66 text

66 Annotation scheme. Q3: Who or what does the false news target? (free writing). Q4: Does the news flatter or denigrate the target? 1. Flattery; 2. Denigration; 3. Neither (no such intention). Q5: What is the purpose of the false news? 1. Satire / parody; 2. Partisan; 3. Propaganda; 4. No purpose / unknown.

Slide 67

Slide 67 text

67 Annotation scheme. Q6: To what extent is the news harmful to society? (rating from 0 to 5, averaged over annotators). Q7: What types of harm does the news cause? 1. Harmless (e.g., satire / parody); 2. Confusion and anxiety in society; 3. Threats to the honor of and trust in people, companies, and goods; 4. Threats to a correct understanding of politics and social events; 5. Health; 6. Prejudice against nationalities and races; 7. Conspiracy theory.
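A hypothetical annotated record under this seven-question scheme might look like the following; the field names and values are illustrative, not the dataset's actual schema:

```python
# Illustrative record only: the key names (q1_factuality, etc.) and the
# example story are invented for this sketch, not taken from the dataset.
record = {
    "news_id": "example-001",
    "q1_factuality": "False",               # fact-checking site's rating
    "q2_1_disseminator_knew": 2,            # probably knew it was false
    "q2_2a_disinfo_type": "False context",  # asked only when Q2-1 is yes
    "q3_target": "a public figure",         # free writing
    "q4_stance_to_target": "Denigration",
    "q5_purpose": "Partisan",
    "q6_harmfulness": 3.7,                  # 0-5, averaged over annotators
    "q7_harm_types": ["Confusion and anxiety in society"],
}
assert 0 <= record["q6_harmfulness"] <= 5
```

Storing the answers as structured fields like this is what makes the attribute-based analyses (e.g., which types of false news spread widely) straightforward.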

Slide 68

Slide 68 text

68 Example. [Figure: an example annotated news story.]

Slide 69

Slide 69 text

69 Example. [Figure: an example annotated news story.]

Slide 70

Slide 70 text

70 Statistics: word clouds for Q7. [Word clouds for the harm types: understanding of politics and social events; health; prejudice against nationalities and races.]

Slide 71

Slide 71 text

71 Statistics: bot user percentage. [Chart comparing the share of bot vs. real users in FakeNewsNet (from [Kai+, 2019]) and in our dataset; the splits shown are 22% / 78% and 7% / 93%.]

Slide 72

Slide 72 text

72 Research objective: promote fact-checking and fake news research in Japanese. Background: most studies focus on US society and build English resources. Two efforts toward expanding Japanese resources: Japanese fake news dataset construction, and a Japanese false news collection system.

Slide 73

Slide 73 text

73 Background: Japanese false news collection system. Existing fake news collection systems depend on rich resources such as fake news detection datasets and fact-checking organizations, e.g., Hoaxy [Pik-Mai+, 2018] and NewsGuard. It is difficult to apply these systems to Japanese society because there are few fake news resources. ⇒ We aim to utilize Guardian posts to build a Japanese fake news collection system.

Slide 74

Slide 74 text

74 What is a Guardian? Background: Guardians are social media users who themselves perform fact-checking interventions on posts of uncertain truth.

Slide 75

Slide 75 text

75 What is a Guardian? Background: Guardians are social media users who themselves perform fact-checking interventions on posts of uncertain truth. We try to discover Guardian posts, and the related fake news, by searching for specific language patterns.

Slide 76

Slide 76 text

76 Research objective: propose a fake news collection system for Japanese that requires no fake news detection dataset: easy to apply to other languages, not dependent on abundant resources, and utilizing Guardian tweets. [Diagram: system overview.]

Slide 77

Slide 77 text

77 Fake news collection system. [Diagram: system overview.]

Slide 78

Slide 78 text

78 Fake news collection system. Step 1: Crawl Guardian posts from Twitter using FAKEWORDS, a set of seven Japanese keywords indicating falsehood: デマ (hoax), フェイク (fake), 間違い (mistake), 不正確 (inaccurate), 誤報 (false report), 虚偽 (falsehood), 事実無根 (groundless). About 5,000 tweets are collected per day.

Slide 79

Slide 79 text

79 Fake news collection system. Step 2: Noise tweet removal: remove irrelevant posts using a fine-tuned BERT-based model.

Slide 80

Slide 80 text

80 Fake news collection system. Step 3: Tweet grouping: group posts with similar meanings based on Word Mover's Distance (WMD), which measures semantic proximity between sentences.

Slide 81

Slide 81 text

81 Fake news collection system. Step 4: Ranking: rank the collected fake news and Guardian posts by users' interest, represented by the number of retweets, likes, etc.

Slide 82

Slide 82 text

82 Fake news collection system. Step 5: Visualization and dataset collection: visualize high-ranking fake news and collect the relevant posts of the visualized fake news.
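Steps 3 and 4 above can be sketched with standard-library stand-ins: Jaccard token overlap replaces WMD for grouping, and a simple retweet-plus-like sum replaces the full interest ranking. Both substitutions are deliberate simplifications of the actual system:

```python
def group_posts(posts, threshold=0.5):
    """Step 3 sketch: assign each post to the first existing group whose
    representative shares enough tokens (Jaccard similarity >= threshold).
    This stands in for the Word Mover's Distance grouping; `posts` is a
    list of dicts with "text", "retweets", and "likes" keys."""
    groups = []
    for post in posts:
        tokens = set(post["text"].lower().split())
        for g in groups:
            rep = set(g[0]["text"].lower().split())
            union = tokens | rep
            if union and len(tokens & rep) / len(union) >= threshold:
                g.append(post)
                break
        else:  # no existing group matched: start a new one
            groups.append([post])
    return groups

def rank_groups(groups):
    """Step 4 sketch: order groups by user interest, here the total
    retweets plus likes (the real ranking combines more signals)."""
    return sorted(groups,
                  key=lambda g: sum(p["retweets"] + p["likes"] for p in g),
                  reverse=True)
```

In the real pipeline, these two stages sit between the BERT-based noise filter (Step 2) and the visualization front end (Step 5).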

Slide 83

Slide 83 text

83 Fake news collection system

Slide 84

Slide 84 text

84 Evaluation. We evaluate the posts collected between 2021/11/1 and 11/14 from two perspectives. Q1: Do the collected tweets point out the possibility that the news story is false? ⇒ 77% of the collected tweets do. Q2: Are the subjects of the collected tweets truly fake? ⇒ 52% of the collected tweets concerned truly false news.

Slide 85

Slide 85 text

85 Summary: Towards countermeasures against fake news. ✓ We proposed a novel annotation scheme with fine-grained labels to capture various perspectives of fake news. ✓ We built the first Japanese fake news dataset. ✓ We proposed a Japanese fake news collection system using Guardian posts, without rich fake news resources.

Slide 86

Slide 86 text

86 Conclusion. ✓ Chapter 2: We modeled posts of fake news on Twitter as two cascades, comprising ordinary news and the correction of the fake news. ✓ Chapter 3: We proposed a novel multi-modal method for fake news detection, combining text features, user features, and infectiousness values. ✓ Chapter 4: We proposed a novel annotation scheme with fine-grained labels to capture various perspectives of fake news, and built the first Japanese fake news dataset and fake news collection system.

Slide 87

Slide 87 text

87 Future Work. ✓ Construction of an English fake news detection dataset following our proposed annotation scheme. ✓ Construction of a larger Japanese dataset via the fake news collection system.