Towards Keeping up with Fake News in the Social Media Ecosystem (Dissertation Slides)

Public hearing (dissertation defense) slides

taichi_murayama

December 29, 2021

Transcript

  1. Towards Keeping up with Fake News in the Social Media Ecosystem

    Taichi Murayama (NAIST, Social Computing Lab., D3)
  2. Fake news is false or misleading content presented as news and communicated in formats spanning spoken, written, printed, electronic, and digital communication.

    Source: “The Anatomy of Fake News” by Nolan Higdon
  3. Early fake news?

    - Great Moon Hoax (1835), by The Sun
    - German Corpse Factory (World War I), by The Times of London
  4. Fake news after 2010

    - Internet and social media: easy access to news, easy communication
    - Resulting social problems: polarization, fake news, online slander, echo chambers
  5. Fake news in the 2016 US presidential election

    Example of fake news: politics
    - 25% of tweets containing a link to news media in the five months preceding election day were either fake or extremely biased news.
  6. Relationship between fake news and COVID-19

    Example of fake news: health
    - “Vitamin C cures COVID-19”
    - “5G networks spread COVID-19”
  7. How does society combat fake news?

    - Fact-checking organizations
    - Regulation by platforms
    - Education
    - Legislation
  8. Target of my dissertation

    - Event: the spread of fake news. Example post: “US researchers discover a huge cat!!!! Over 5 meters…”, retweeted repeatedly, with replies such as “Very interesting news!!!!” and “Really?? I can’t find the information source”.
    - Social activities (especially in Japan): fact-checking, education, legislation.
    - Goal: help to connect the two.
  9. Target of my dissertation

    - Fundamental: understanding the spread of fake news (Chapter 2), leading to detection of fake news (Chapter 3).
    - Toward applications: a foundation for keeping up with fake news in Japan (Chapter 4).
  10. Contents

    1. Modeling the spread of fake news on Twitter
    2. Fake News Detection using Temporal Features Extracted via Point Process
    3. Towards countermeasures against fake news: dataset and collection system
  11. Contents

    - Parts 1 and 2: fundamental; focus on the time series of fake news posts.
    - Part 3: application to Japan; resource construction.
  12. Contents: 1. Modeling the spread of fake news on Twitter

    Understanding the spread of fake news
    [Figure: posting activity of fake news tweets against time from the original post [h], modeled with intensities λ1(t), λ2(t), and λ(t); 1st stage: news, 2nd stage: correction, separated at the correction time tc.]
  13. Research objective (Background)

    Model the cascade of fake news on Twitter in terms of the passage of time.
    - How fake news spreads over time is not fully understood.
    - A modeling method can reveal the characteristics of each fake news story.
    [Figure: posts of fake news on Twitter (e.g., “US researchers discover a huge cat!!!! Over 5 meters…”, “This news is very interesting!!!”) are modeled as the probability of posts over time from the initial post; the fake news tweets are divided into a 1st stage (news) and a 2nd stage (correction) at the correction time tc.]
  14. Hypothesis: the cascade of fake news (Background)

    A fake news cascade on Twitter comprises two stages:
    - The first cascade has the characteristics of an ordinary news story.
    - The second cascade has the characteristics of a correction, because users recognize the falsity of the news item around a correction time tc.
    Timeline of posts, first cascade (characteristics of ordinary news):
    - User A (@userA): “US researchers discover a huge cat!!!! Over 5 meters… URL”
    - User B (@userB): “This is very very interesting news! URL”
  15. Hypothesis: the cascade of fake news (Background)

    Timeline of posts, second cascade (characteristics of correction):
    - User C (@userC): “I think this is fake news…. URL”
    - User D (@userD): “Really?? I can’t find the information source URL”
  16. Hypothesis: the cascade of fake news (Proposed Model)

    The timeline of posts is treated as a first cascade (characteristics of ordinary news) followed by a second cascade (characteristics of correction).
  17. Base technique: Hawkes process (Proposed Model)

    - Hawkes process: a point process model whose defining characteristic is that it is “self-exciting”, i.e., each event raises the probability of further events.
    - It calculates the probability of the next event from past events and the elapsed time.
    - Example applications: earthquakes, financial transactions.
    [Figure: a time series of events is modeled as a time series of event probability.]
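As a rough illustration of the self-exciting idea on this slide, here is a minimal sketch of a Hawkes intensity. The exponential kernel and the values of `mu`, `alpha`, and `beta` are assumptions chosen for illustration; they do not come from the slides.

```python
import math

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Intensity lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i)).

    mu, alpha, and beta are illustrative values, not parameters from the slides.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in event_times if t_i < t
    )
    return mu + excitation

def event_probability(t, event_times, dt=0.01, **params):
    """Approximate probability of an event in [t, t + dt] as lambda(t) * dt."""
    return hawkes_intensity(t, event_times, **params) * dt

events = [1.0, 1.2, 1.3]
before = hawkes_intensity(0.5, events)       # baseline only: no past events yet
just_after = hawkes_intensity(1.4, events)   # raised by three recent events
much_later = hawkes_intensity(10.0, events)  # excitation has mostly decayed
```

The intensity jumps right after a burst of events and relaxes back toward the baseline, which is the behavior the slide calls self-excitation.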
  18. Base model: Time-Dependent Hawkes Process [Kobayashi+, 2016] (Proposed Model)

    A Hawkes process for social media that considers two characteristics of social media:
    - the freshness of information
    - users' circadian rhythm (e.g., users are not active at midnight)
    [Figure: the time series of posts on Twitter is modeled as the probability of posts over time from the initial post.]
  19. Base model: Time-Dependent Hawkes Process [Kobayashi+, 2016] (Proposed Model)

    The probability of a post in the interval $[t, t + \Delta t]$ is $\lambda(t)\,\Delta t$, where
    $$\lambda(t) = p(t) \sum_{i:\, t_i < t} d_i\, \phi(t - t_i)$$
    - $p(t)$: infection rate at time $t$
    - $d_i$: number of followers of the $i$-th post
    - $t_i$: time of the $i$-th post
    - $\phi$: memory function (how much people still remember a post)
    The sum expresses how many people remember the past posts.
  20. Base model: Time-Dependent Hawkes Process [Kobayashi+, 2016] (Proposed Model)

    The infection rate combines the circadian rhythm and the freshness of information:
    $$p(t) = a \left\{ 1 - r \sin\!\left( \frac{2\pi}{T_m} (t + \theta_0) \right) \right\} e^{-t/\tau}$$
    - $a$: the intensity of information
    - $r$: the relative amplitude of the oscillation
    - $\theta_0$: the phase
    - $\tau$: the characteristic time of popularity decay
    - $T_m$: the period of the oscillation
    The sinusoidal factor represents the circadian rhythm; the exponential factor represents the freshness.
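The intensity on these slides can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the symbol names `a`, `r`, `theta0`, and `tau`, the 24-hour period, the power-law form of the memory function, and all numeric values are assumptions. Times are in hours.

```python
import math

def infection_rate(t, a, r, theta0, tau, period=24.0):
    """p(t) = a * (1 - r * sin(2*pi/period * (t + theta0))) * exp(-t / tau).

    The 24-hour period is an assumption standing in for the circadian rhythm.
    """
    oscillation = 1.0 - r * math.sin(2.0 * math.pi / period * (t + theta0))
    return a * oscillation * math.exp(-t / tau)

def memory_kernel(s, s0=0.1, theta=0.25):
    """Assumed power-law memory function phi(s); constant for s <= s0.

    s0 and theta are hypothetical values chosen for illustration.
    """
    return 1.0 if s <= s0 else (s / s0) ** (-(1.0 + theta))

def intensity(t, post_times, followers, a, r, theta0, tau):
    """lambda(t) = p(t) * sum over posts with t_i < t of d_i * phi(t - t_i)."""
    memory = sum(
        d_i * memory_kernel(t - t_i)
        for t_i, d_i in zip(post_times, followers)
        if t_i < t
    )
    return infection_rate(t, a, r, theta0, tau) * memory

post_times = [0.0, 0.5, 2.0]   # hours since the original post
followers = [1000, 50, 200]    # d_i for each post
params = dict(a=0.01, r=0.3, theta0=0.0, tau=12.0)
rate = intensity(3.0, post_times, followers, **params)
```

Multiplying `rate` by a short interval length gives the approximate probability of a new post in that interval, as stated on the slide.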
  21. Proposed model

    [Figure: posting activity against time (hours), modeled by splitting the cascade at the correction time $t_c$.]
    $$\lambda(t) = p_1(t)\, h_1(t) + p_2(t)\, h_2(t)$$
  22. Proposed model

    The cascade is split at the correction time $t_c$:
    $$\lambda(t) = p_1(t)\, h_1(t) + p_2(t)\, h_2(t)$$
    $$h_1(t) = \sum_{i:\, t_i < \min(t,\, t_c)} d_i\, \phi(t - t_i), \qquad h_2(t) = \sum_{i:\, t_c < t_i < t} d_i\, \phi(t - t_i)$$
    - $h_1$ collects the posts made before $t_c$; $h_2$ collects the posts made from $t_c$ until $t$.
    - Each infection rate $p_1(t)$, $p_2(t)$ has its own parameters: the intensity of information, the relative amplitude of the oscillation, the phase, and the characteristic time of popularity decay.
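The split at the correction time can be sketched as follows. This is a minimal illustration under assumptions: the memory function is a hypothetical power law, and the two infection rates are passed in as plain callables (`p1`, `p2`) rather than fitted models.

```python
def memory_kernel(s, s0=0.1, theta=0.25):
    """Assumed power-law memory function phi(s); s0 and theta are hypothetical."""
    return 1.0 if s <= s0 else (s / s0) ** (-(1.0 + theta))

def two_stage_intensity(t, post_times, followers, p1, p2, t_c):
    """lambda(t) = p1(t) * h1(t) + p2(t) * h2(t), split at the correction time t_c."""
    posts = list(zip(post_times, followers))
    # h1: posts made before min(t, t_c), i.e. the first (news) cascade
    h1 = sum(d * memory_kernel(t - ti) for ti, d in posts if ti < min(t, t_c))
    # h2: posts made after t_c and before t, i.e. the second (correction) cascade
    h2 = sum(d * memory_kernel(t - ti) for ti, d in posts if t_c < ti < t)
    return p1(t) * h1 + p2(t) * h2

post_times = [0.0, 1.0, 5.0]
followers = [10, 10, 10]
t_c = 4.0

# Before t_c only the first cascade can contribute.
early = two_stage_intensity(3.0, post_times, followers,
                            p1=lambda t: 1.0, p2=lambda t: 1.0, t_c=t_c)
# Switching p1 off isolates the second cascade (the post at t = 5.0).
correction_only = two_stage_intensity(6.0, post_times, followers,
                                      p1=lambda t: 0.0, p2=lambda t: 1.0, t_c=t_c)
```

Passing the rates as callables keeps the sketch independent of any particular parametric form for $p_1$ and $p_2$.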
  23. Experimental settings (Evaluation)

    - Task: predict the future number of posts about a fake news story.
    - Modeling period: the first half of the observation time, [0, 0.5T].
    - Test period: the second half of the observation time, [0.5T, T].
    - Evaluation metrics, computed on the number of posts per hour:
      - Mean absolute error: the smaller, the better.
      - Median absolute error: the smaller, the better.
    [Figure: cumulative number of posts over time, divided into the modeling period and the test period.]
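The evaluation setting above can be sketched as follows, assuming post times are given in hours and the model outputs one predicted count per hour. The data and the `predicted` values are made up for illustration.

```python
import statistics

def hourly_counts(post_times, start, end):
    """Count posts in consecutive one-hour bins covering [start, end)."""
    n_bins = int(end - start)
    counts = [0] * n_bins
    for t in post_times:
        if start <= t < start + n_bins:
            counts[int(t - start)] += 1
    return counts

def absolute_errors(actual, predicted):
    """Per-hour absolute difference between actual and predicted counts."""
    return [abs(a - p) for a, p in zip(actual, predicted)]

T = 8.0                                        # observation time in hours
post_times = [0.2, 0.4, 1.5, 4.1, 4.9, 5.5, 7.2]
actual = hourly_counts(post_times, 0.5 * T, T)  # test period [0.5T, T]
predicted = [1.5, 1.0, 0.5, 0.5]               # hypothetical model output
errors = absolute_errors(actual, predicted)
mean_abs_error = statistics.mean(errors)
median_abs_error = statistics.median(errors)
```

The median is less sensitive than the mean to a single hour with a large burst of posts, which is one reason to report both.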
  24. Dataset (Evaluation)

    - Recent Fake News (RFN)
      - 7 fake news stories reported by PolitiFact and Snopes from March to May 2019.
      - Each story had over 300 posts and continued to be posted about for over 36 hours after the initial post.
    - Fake News in the Tohoku earthquake (Tohoku)
      - 19 fake news stories related to the Tohoku earthquake, reported by Japanese news media from March 12 to March 24, 2011.
      - Each story had over 300 posts and continued to be posted about for over 36 hours after the initial post.
  25. Experimental results (Evaluation)

    - The proposed method achieved higher accuracy than the other methods on both datasets (100% in RFN and 89% in Tohoku).
    - Smaller values are better.
  26. Experimental results (Evaluation)

    - Magenta indicates the number of posts predicted by the proposed model; black indicates the actual number of posts.
    - The predictions of the proposed model are close to the actual number of posts.
  27. Characteristics of the estimated parameters (Discussion)

    - Parameter $a$: compared with $a$ in the first cascade, $a$ in the second cascade tends to be small (the intensity of information is weak).
    - The correction time $t_c$ is estimated at around 40 hours after the initial post in both datasets.
    $$p(t) = a \left\{ 1 - r \sin\!\left( \frac{2\pi}{T_m} (t + \theta_0) \right) \right\} e^{-t/\tau}$$
  28. Verification of the correction time $t_c$ (Discussion)

    - Check the text of posts around the correction time $t_c$.
    - Verify, from the text of posts, whether the proposed model properly estimates the correction time $t_c$.
    - Count words that indicate a hoax or a correction (e.g., “fake”, “not true”).
    - Our hypothesis: the first cascade has the characteristics of ordinary news, while the second cascade has the characteristics of correction.
    [Figure: the blue line indicates the correction time; the black line indicates the number of fake-related words.]
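The word-counting check described on this slide might look like the sketch below. The word list and the example posts are illustrative assumptions; the slide only names “fake” and “not true” as examples.

```python
from collections import Counter

# Illustrative word list; only "fake" and "not true" are named on the slide.
HOAX_WORDS = ("fake", "not true", "hoax", "false rumor")

def hoax_word_counts(posts):
    """Count hoax/correction words per hour.

    posts: iterable of (hours_since_original_post, text) pairs.
    Returns a Counter mapping the integer hour to the number of occurrences.
    """
    counts = Counter()
    for hours, text in posts:
        lowered = text.lower()
        occurrences = sum(lowered.count(word) for word in HOAX_WORDS)
        if occurrences:
            counts[int(hours)] += occurrences
    return counts

posts = [
    (1.2, "US researchers discover a huge cat!"),
    (36.5, "I think this is fake news"),
    (37.1, "Not true. This is a hoax"),
    (37.8, "Really? fake?"),
]
counts = hoax_word_counts(posts)
```

Plotting `counts` against the estimated correction time would show whether correction vocabulary starts to appear around that time.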
  29. Verification of the correction time $t_c$: word cloud (Discussion)

    - Example: the fake news story “Turkey Donates 10 Billion Yen to Japan”.
    - Compare the word clouds around the correction time ($t_c = 37$).
    - Before $t_c$, words related to the news, such as “pro-Japanese”, appear.
    - After $t_c$, some users point out that the news is a “false rumor”.
  30. Summary: Modeling the spread of fake news on Twitter

    - We model posts of fake news on Twitter as two cascades: ordinary news and the correction of fake news.
    - The proposed model achieves higher accuracy in the task of predicting the number of posts.
  31. Contents: 2. Fake News Detection using Temporal Features Extracted via Point Process

    Detection of fake news
    [Figure: detection model; axis: Time (hours)]
  32. Research objective (Background)

    Propose a fake news detection model that leverages temporal features from social media posts.
    - Examine the effectiveness of temporal features (when users post) in detecting fake news.
    - Aim to achieve the highest detection performance.
  33. Research objective (Background)

    Existing fake news detection models: from posts of fake news on Twitter, extract who posts (user features) and what is posted (linguistic features), then classify the news as fake or real.
  34. Research objective (Background)

    Proposed model: in addition to who posts (user) and what is posted (linguistic), extract when users post (temporal features), then classify the news as fake or real.
  35. Case study (revisiting Chapter 2) (Background)

    [Figure: number of posts against time (hours) for an example of real news and an example of fake news.]
  36. Fake news detection from social media posts

    Problem statement:
    - Input: text, user, and temporal features extracted from social media posts.
    - Output: whether the news story is fake or not (one of multiple labels).
  37. Proposed model

    [Figure: proposed model; axis: Time (hours)]
  42. Converting temporal features

    SEISMIC [Zhao+, 2015], a Hawkes process model, converts the time series of posts on social media into a time series of the infection rate $p(t)$.
    The probability of a post in the interval $[t, t + \Delta t]$ is $\lambda(t)\,\Delta t$, where
    $$\lambda(t) = p(t) \sum_{i:\, t_i < t} d_i\, \phi(t - t_i)$$
    - $p(t)$: infection rate at time $t$
    - $d_i$: number of followers of the $i$-th post
    - $t_i$: time of the $i$-th post
    - $\phi$: memory function (how much people still remember a post)
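To make the conversion concrete, here is a deliberately simplified stand-in, not the SEISMIC estimator itself. It divides the number of posts in each time bin by the exposure $\sum_i d_i\,\phi(t - t_i)$ evaluated at the bin midpoint. The power-law memory function, its parameters, and the binning are assumptions.

```python
def memory_kernel(s, s0=0.1, theta=0.25):
    """Assumed power-law memory function phi(s); s0 and theta are hypothetical."""
    return 1.0 if s <= s0 else (s / s0) ** (-(1.0 + theta))

def infectiousness_series(post_times, followers, horizon, bin_width=1.0):
    """Crude per-bin estimate of p(t): posts in the bin / (exposure * bin width).

    Exposure is sum_i d_i * phi(mid - t_i) over posts made before the bin
    midpoint. This is a simplification for illustration, not SEISMIC.
    """
    series = []
    n_bins = int(horizon / bin_width)
    for k in range(n_bins):
        start, end = k * bin_width, (k + 1) * bin_width
        mid = 0.5 * (start + end)
        count = sum(1 for t in post_times if start <= t < end)
        exposure = sum(
            d * memory_kernel(mid - t)
            for t, d in zip(post_times, followers)
            if t < mid
        )
        series.append(count / (exposure * bin_width) if exposure > 0 else 0.0)
    return series

post_times = [0.0, 0.3, 0.4, 2.5]   # hours since the original post
followers = [1000, 10, 20, 5]
series = infectiousness_series(post_times, followers, horizon=4.0)
```

The resulting list has one value per time bin and could be fed to a classifier alongside text and user features, which is the role the slides give to the infectiousness values.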
  43. Proposed model

    [Figure: proposed model; axis: Time (hours)]
  44. Experimental settings (Experiment)

    - Datasets: each news story consists of multiple posts on a social networking service.
      - Weibo [Ma+, 2016]
      - Twitter15 [Ma+, 2017]
      - Twitter16 [Ma+, 2017]
    - Metrics
      - Accuracy: the percentage of correct predictions.
      - F1-score: the harmonic mean of precision and recall.
    [Table: for each dataset, the numbers of true, fake, unverified, and debunking news articles.]
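The two metrics above can be computed as in this minimal sketch. The label lists are made-up examples that use the T/F/U/D label names; F1 is computed per label.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def f1_score(y_true, y_pred, label):
    """Harmonic mean of precision and recall for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up labels: T = True, F = False, U = Unverified, D = Debunking
y_true = ["T", "F", "U", "D", "F", "T"]
y_pred = ["T", "F", "F", "D", "F", "U"]
acc = accuracy(y_true, y_pred)
f1_false = f1_score(y_true, y_pred, "F")
f1_unverified = f1_score(y_true, y_pred, "U")
```

Reporting F1 per label makes it visible when one class, such as Unverified, is classified poorly even though overall accuracy looks reasonable.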
  45. Experimental results (Experiment)

    Labels: T = True, F = False, U = Unverified, D = Debunking.
    - The proposed model performed best on most measures and datasets.
    - Our model did not produce good results for classifying the Unverified label.
    - In the ablation study, the full proposed model scores higher than the proposed model without temporal features (proposed w/o time).
  48. Summary: Fake News Detection using Temporal Features Extracted via Point Process

    - We proposed a novel multi-modal method for fake news detection that combines text features, user features, and infectiousness values.
    - The experimental results showed empirically that temporal features are effective in our proposed model.
  49. Contents

    1. Modeling the spread of fake news on Twitter
    2. Fake News Detection using Temporal Features Extracted via Point Process
    3. Towards countermeasures against fake news: dataset and collection system
    Detection of fake news
  50. Research objective (Background)

    Promote fact-checking and fake news research in Japanese.
    - Most studies focus on US society and build English resources.
    - We tackled two things to expand Japanese resources:
      - Japanese fake news dataset construction
      - Japanese false news collection system
    [Figure: the Japanese dataset and the fake news collection system help fact-checking and fake news research.]
  52. Background: Japanese dataset construction

    Q: Can existing fake news datasets really be called “fake news” datasets?
  53. Background: Japanese dataset construction

    Existing scheme for constructing fake news datasets:
    - Fact-checking organizations identify and collect the source: a news article from a media outlet, or social media posts.
  54. Background: Japanese dataset construction

    - The collected sources, together with a label, form the dataset.
  55. Background: Japanese dataset construction

    Labels used in existing datasets:
    - Binary label (31/51): fake or real.
    - Multi-level label, e.g., a 5-scale label: True / Mostly True / Mixture / Mostly False / False.
  56. Definition of fake news

    There is no set definition of the phrase “fake news”.
    - Broad definition: “Fake news is false news.” [Zhou+, 2020] [Lazer+, 2018]
    - Narrow definition: “Fake news is a news article that is intentionally and verifiably false.” [Allcott+, 2017] [Kai+, 2017] [Xichen+, 2020]
  57. Definition of fake news

    The phrase “fake news” is ambiguous.
  58. Criticism of the term “fake news”

    “The phrase ‘fake news’ is ‘woefully inadequate’ to describe the issues at play” (Claire Wardle).
    Fake news can be examined from three perspectives:
    - Mis-information: false information disseminated online by people who do not have harmful intent.
    - Dis-information: false information created and shared by people with harmful intent.
    - Mal-information: the sharing of “genuine” information with the intent to cause harm.
  59. Criticism of the term “fake news”

    In 2018, the British government decided that the term “fake news” would no longer be used in official documents because it is “a poorly-defined and misleading term”.
  60. Background: Japanese dataset construction

    Q: Can existing fake news datasets really be called “fake news” datasets?
    A: No. Most of them are false news datasets.
  61. Background: Japanese dataset construction

    - The label considers only the factual aspect of the news story.
    - The intention of the disseminator, which is one of the components of fake news, is not taken into account.
  62. Research objective (Background: Japanese dataset construction)

    Propose an annotation scheme with fine-grained labels for fake news stories:
    - the factual aspect of the news story (false or not)
    - the user's intention (dis-information)
    - the harmfulness to society (mal-information), etc.
    ⇒ We expect the labeled attributes to enable analysis of each type of false news, e.g., what kind of false news is likely to spread.
  63. Our dataset construction scheme (Japanese) (Annotation scheme)

    1. Fact-checking organization: Fact Check Initiative Japan, from July 2019 to October 2021.
    2. Identify and collect the source (social media posts).
    3. Annotation by 3 annotators.
    4. Public dataset: 307 news stories, 471,446 tweets.
  64. 7 questions in our annotation scheme (Annotation scheme)

    Q1: What rating does the fact-checking site give to the news?
    - True / Half-True / Inaccurate / Misleading / False / Pants on Fire / Unknown Evidence / Suspended Judgement
    Q2-1: Does the news disseminator know that the news is false?
    1. Yes, the news disseminator definitely knows the news is false.
    2. Yes, the news disseminator probably knows the news is false.
    3. No, the news disseminator probably does not know the news is false.
    4. No, the news disseminator definitely does not know the news is false.
  65. Annotation scheme

    Q2-2A: If yes, how was the news created?
    1. Disinformation, fabricated content
    2. Disinformation, manipulated image
    3. Disinformation, manipulated text
    4. Disinformation, false context
    Q2-2B: If no, how does the disseminator misunderstand the news?
    1. Misinformation, trusting other sources
    2. Misinformation, inadequate understanding
    3. Misinformation, misleading
  66. Annotation scheme

    Q3: Who or what is the false news targeted at?
    - Free writing
    Q4: Does the news flatter or denigrate the target?
    1. Flattery
    2. Denigration
    3. Neither / no such intention
    Q5: What is the purpose of the false news?
    1. Satire / parody
    2. Partisan
    3. Propaganda
    4. No purpose / unknown
  67. Annotation scheme

    Q6: To what extent is the news harmful to society? (average)
    - Rating from 0 to 5 (chart bins: 0 to 1, 1 to 2, 2 to 3, 3 to 4, 4 to 5)
    Q7: What types of harm does the news cause?
    1. Harmless (e.g., satire / parody)
    2. Confusion and anxiety about society
    3. Threat to the honor of and trust in people, companies, and goods
    4. Threat to the correct understanding of politics and social events
    5. Health
    6. Prejudice against nationality and race
    7. Conspiracy theory
  68. Example

    [Figure: annotation example]
  69. Example

    [Figure: annotation example]
  70. Word cloud: about Q7 (Statistics)

    [Figure: word clouds for three harm types: understanding of politics and social events; health; prejudice against nationality and race.]
  71. Bot user percentage (Statistics)

    [Figure: shares of bot users and real users in FakeNewsNet (from [Kai+, 2019]) and in our dataset; values shown: 22%, 78%, 93%, 7%.]
  72. Research objective (Background)

    Promote fact-checking and fake news research in Japanese.
    - Most studies focus on US society and build English resources.
    - We tackled two things to expand Japanese resources:
      - Japanese fake news dataset construction
      - Japanese false news collection system
    [Figure: the Japanese dataset and the fake news collection system help fact-checking and fake news research.]
  73. Background: Japanese false news collection system

    - Existing fake news collection systems, such as Hoaxy [Pik-Mai+, 2018] and NewsGuard, depend on rich resources such as fake news detection datasets and fact-checking organizations.
    - It is difficult to apply these systems to Japanese society because there are few fake news resources.
    ⇒ We aim to use Guardian posts to build a Japanese fake news collection system.
  74. What is a Guardian? (Background: Japanese false news collection system)

    Guardians are social media users who themselves perform fact-checking interventions on posts of uncertain truth.
  75. What is a Guardian? (Background: Japanese false news collection system)

    We try to discover Guardian posts and the related fake news by searching for specific language patterns.
  76. Research objective (Background: Japanese false news collection system)

    Propose a fake news collection system for Japanese that does not require a fake news detection dataset:
    - easy to apply to other languages
    - not dependent on abundant resources
    - uses Guardian tweets
    [Figure: fake news collection system]
  77. Fake news collection system

    [Figure: fake news collection system]
  78. Fake news collection system

    Step 1: Crawling Guardian posts from Twitter
    - FAKEWORDS: {デマ (hoax), フェイク (fake), 間違い (mistake), 不正確 (inaccurate), 誤報 (false report), 虚偽 (falsehood), 事実無根 (groundless)}
    - Collect about 5000 tweets per day.
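The keyword matching in Step 1 can be sketched as a simple filter over already-fetched posts. The keyword list below is our reading of the slide's FAKEWORDS set (hoax, fake, mistake, inaccurate, false report, falsehood, groundless) and should be treated as an assumption; how posts are fetched from Twitter is outside this sketch.

```python
# Assumed reading of the FAKEWORDS set on the slide.
FAKEWORDS = ["デマ", "フェイク", "間違い", "不正確", "誤報", "虚偽", "事実無根"]

def is_guardian_candidate(text, fakewords=FAKEWORDS):
    """Return True if the post contains at least one fake-related keyword."""
    return any(word in text for word in fakewords)

def filter_guardian_candidates(posts):
    """Keep only the posts that mention a fake-related keyword."""
    return [post for post in posts if is_guardian_candidate(post)]

posts = [
    "それはデマです",        # "that is a hoax"
    "今日はいい天気",        # "nice weather today"
    "この記事は誤報では？",  # "isn't this article a false report?"
]
candidates = filter_guardian_candidates(posts)
```

A keyword match only marks a candidate; posts that use these words in unrelated contexts still have to be removed in a later step.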
  79. Fake news collection system

    Step 2: Noise tweet removal
    - Remove irrelevant posts using a fine-tuned BERT-based model.
  80. Fake news collection system

    Step 3: Tweet grouping
    - Group posts with similar meanings based on Word Mover’s Distance (WMD), which measures how close the meanings of sentences are.
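The grouping step can be sketched as a greedy threshold clustering with a pluggable distance function. To keep the example self-contained, it uses a Jaccard distance over whitespace tokens as a stand-in, which is not Word Mover's Distance; the threshold value and the greedy strategy are also assumptions.

```python
def jaccard_distance(a, b):
    """Stand-in distance (NOT Word Mover's Distance): 1 - token-set overlap."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)

def group_posts(posts, distance=jaccard_distance, threshold=0.6):
    """Greedily assign each post to the first group whose first post is close enough."""
    groups = []
    for post in posts:
        for group in groups:
            if distance(post, group[0]) <= threshold:
                group.append(post)
                break
        else:
            groups.append([post])  # no close group found: start a new one
    return groups

posts = [
    "the huge cat story is fake",
    "huge cat story is a hoax",
    "vitamin c cures covid is false",
]
groups = group_posts(posts)
```

Swapping in a real WMD implementation only requires passing a different `distance` callable with the same two-string signature.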
  81. Fake news collection system

    Step 4: Ranking
    - Rank the collected fake news and Guardian posts by users' interest, as represented by, e.g., the numbers of retweets and likes.
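A ranking by user interest could be as simple as the sketch below. Summing retweets and likes with equal weight is an assumption; the slide only says that interest is represented by such counts. The field names are hypothetical.

```python
def interest_score(group):
    """Assumed interest score: total retweets plus likes over a group's posts."""
    return sum(post["retweets"] + post["likes"] for post in group)

def rank_groups(groups):
    """Sort groups of posts from highest to lowest interest score."""
    return sorted(groups, key=interest_score, reverse=True)

# Hypothetical grouped posts with engagement counts
groups = [
    [{"text": "claim A is a hoax", "retweets": 3, "likes": 5}],
    [
        {"text": "claim B is false", "retweets": 40, "likes": 10},
        {"text": "claim B: groundless", "retweets": 2, "likes": 1},
    ],
]
ranked = rank_groups(groups)
```

Scoring at the group level rather than per post lets a story with many moderately popular posts outrank a single viral one, which may or may not be the desired behavior.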
  82. Fake news collection system

    Step 5: Visualization and dataset collection
    - Visualize high-ranking fake news.
    - Collect posts relevant to the visualized fake news.
  83. Fake news collection system

  84. Evaluation

    We evaluate the posts collected between 2021/11/1 and 2021/11/14 from two perspectives:
    - Q1: Do the collected tweets point out the possibility that the news story is false? ⇒ 77% of the collected tweets do.
    - Q2: Is the subject of the collected tweets truly false? ⇒ For 52% of the collected tweets, the subject was truly false.
  85. Summary: Towards countermeasures against fake news

    - We proposed a novel annotation scheme with fine-grained labels to capture various perspectives of fake news.
    - We built the first Japanese fake news dataset.
    - We proposed a Japanese fake news collection system that uses Guardian posts and does not require rich fake news resources.
  86. Conclusion

    - Chapter 2: We modeled posts of fake news on Twitter as two cascades: ordinary news and the correction of fake news.
    - Chapter 3: We proposed a novel multi-modal method for fake news detection that combines text features, user features, and infectiousness values.
    - Chapter 4: We proposed a novel annotation scheme with fine-grained labels to capture various perspectives of fake news. We built the first Japanese fake news dataset and a fake news collection system.
  87. Future work

    - Construction of an English fake news detection dataset following our proposed annotation scheme.
    - Construction of a larger Japanese dataset using the fake news collection system.