
Recommender Systems Part 3 - 2022.01.17

Y. Yamamoto
January 16, 2022


1. Programming assignments review
2. Problems on Simple Collaborative Filtering
3. Matrix Factorization
4. Challenges for recommender systems


Transcript

  1. Matrix Factorization: Beyond Simple Collaborative Filtering
     Yusuke Yamamoto (Associate Professor, Faculty of Informatics) — yusuke_yamamoto@acm.org
     Data Engineering (Recommender Systems 3), 2022.01.17
  2. Review of the programming assignment from the last lecture

  3. 3 Visit the following URL: https://recsys2021.hontolab.org/

  4. 4 Click this link to see my sample answers

  5. Problems on Simple Collaborative Filtering

  6. User-based Collaborative Filtering — predicts a target user's rating for
     an item based on the rating tendencies of similar users:

     $predict(u_a, i) = \bar{r}_{u_a} + \frac{\sum_{u \in N} sim(u_a, u)\,(r_{u,i} - \bar{r}_u)}{\sum_{u \in N} sim(u_a, u)}$

     where N is the set of users most similar to the target user u_a, r_{u,i} is
     user u's rating of item i, and \bar{r}_u is user u's average rating.

     Example: predicting Alice's score for Item5 from her two most similar users.

              Item5   sim    Average rating
     Alice      ?     1.00        4.0
     User1      3     0.85        2.4
     User2      5     0.71        3.8
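The formula maps directly to code. A minimal sketch using the numbers from the table above (the function name and data layout are mine, not from the lecture):

```python
def predict_user_based(target_avg, neighbors):
    """neighbors: (similarity, neighbor's rating on the item, neighbor's average)."""
    num = sum(sim * (r - avg) for sim, r, avg in neighbors)
    den = sum(sim for sim, _, _ in neighbors)
    return target_avg + num / den

# Slide example: Alice (average 4.0) and her neighbors User1 and User2.
print(predict_user_based(4.0, [(0.85, 3, 2.4), (0.71, 5, 3.8)]))  # -> about 4.87
```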
  7. Item-based Collaborative Filtering — predicts unknown rating scores based on
     the target user's rating tendency for similar items:

     $predict(u_a, i_t) = \frac{\sum_{i \in N} sim(i_t, i)\, r_{u_a, i}}{\sum_{i \in N} sim(i_t, i)}$

     where N is the set of items most similar to the target item i_t. Item columns
     with similar rating patterns (e.g., Item1 and Item5) serve as the neighbors N.

              Item1  Item2  Item3  Item4  Item5
     Alice      5      3      4      4      ?
     User1      3      1      2      3      3
     User2      4      3      4      3      5
     User3      3      3      1      5      4
     User4      1      5      5      2      1
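A minimal sketch of this weighted average; the similarity values below are illustrative, since the slide does not list numeric item similarities:

```python
def predict_item_based(neighbors):
    """neighbors: (similarity to the target item, target user's rating on that item)."""
    num = sum(sim * r for sim, r in neighbors)
    den = sum(sim for sim, _ in neighbors)
    return num / den

# Predicting Alice's Item5 score from two similar items she has already rated;
# the similarities 0.97 and 0.85 are made up for illustration.
print(predict_item_based([(0.97, 5), (0.85, 4)]))  # weighted average of her ratings
```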
  8. Problems on CF approaches (1/3) — Data sparsity
     Real data is quite sparse! Even on large e-commerce sites, user vectors (and
     item vectors) rarely intersect, because each user rates only a tiny fraction
     of the items. (Image reference: https://rafalab.github.io/dsbook/recommendation-systems.html)
  9. Problems on CF approaches (2/3) — The curse of dimensionality
     • In high-dimensional spaces, similarity is difficult to handle
     • Item/user vectors usually have very high dimensionality, because the rating
       matrix is very large
  10. Problems on CF approaches (3/3) — High computational cost
     • The rating matrix is used directly every time the system tries to find
       similar users/items and make predictions
     • For each user, the similar users and the suggested items are recomputed
       from the full rating matrix again and again, so CF approaches do not scale
       to most real-world scenarios
  11. Recent approaches for recommender systems — the model-based approach
     - Based on offline pre-processing: a model is pre-trained from the rating
       matrix offline
     - At run time, only the pre-trained model is used for rating prediction;
       suggested items are computed online from the model rather than from the
       raw matrix
     - Pre-trained models can be updated
  12. Memory-based approach vs. model-based approach
     Memory-based approach: user-based CF, item-based CF
     Model-based approach: matrix factorization, association rule mining,
     probabilistic models, other ML techniques
  13. Matrix Factorization

  14. Basic idea 1 (1/2) — Assume that latent factors, which cannot be observed
     directly, exist in users/items.

     Observed ratings:
              Godfather  Terminator  Money game  Titanic  Back to the future  …  X-men
     Alice        5           1           4          4             3          …    2

     Alice's latent factors:
              Horror  Sci-Fi  Humanity  Drama  …
     Alice     0.3     2.1      6.1      4.7   …

     ("I don't like horror… Sci-Fis often move me. I love humane and dramatic movies!")
  15. Basic idea 1 (2/2) — Assume that latent factors, which cannot be observed
     directly, also exist in items. (Image from Amazon.com)

     The movie's latent factors:
                Horror  Sci-Fi  Humanity  Drama  …
     Godfather    2.3     0.4      4.9      5.7   …
  16. Basic idea 2 — Assume that rating scores derive from the latent factors of
     users and items: Alice's observed ratings (Godfather 5, Terminator 1, Money
     game 4, Titanic 4, Back to the future 3, …, X-men 2) arise from the product
     of her latent factor vector (Horror 0.3, Sci-Fi 2.1, Humanity 6.1, Drama 4.7, …)
     and each movie's latent factor vector (e.g., Godfather: 2.3, 0.4, 4.9, 5.7, …).
  17. Summary of matrix factorization
     • A rating matrix R (m users × n items, with □ marking unknown scores) can be
       decomposed into latent factors of users and items: R ≈ P × Q^T, where P is
       the latent user matrix (m users × k latent factors) and Q^T is the latent
       item matrix (k latent factors × n items)
     • The dimension k of the latent factor vectors is much smaller than the
       number of users and items (k ≪ m, n)
  18. Prediction using matrix factorization — start from the original (raw)
     rating matrix R, which still contains unknown scores (□).
  19. Prediction using matrix factorization — decompose R into a latent user
     matrix P and a latent item matrix Q: R ≈ P × Q^T.
  20. Prediction using matrix factorization — once the latent user/item matrices
     are obtained, unknown scores can be predicted by multiplying the two latent
     matrices: the predicted rating matrix is R* = P × Q^T, and every cell of R*,
     including the previously unknown ones, now holds a score. The remaining
     question: how do we obtain the latent user/item matrices?
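In code, prediction is a single matrix product. A toy sketch with invented latent values (the slide's full matrices are elided, so the numbers below are illustrative):

```python
import numpy as np

# Toy latent matrices with k = 2 factors.
P = np.array([[1.3, 0.7],    # latent vector of user 1
              [0.2, 4.9]])   # latent vector of user 2
Q = np.array([[0.2, 0.9],    # latent vector of item 1
              [1.1, 3.1],    # latent vector of item 2
              [0.7, 1.0]])   # latent vector of item 3

R_star = P @ Q.T   # predicted ratings for every (user, item) pair
print(R_star)      # unknown cells are filled in by the multiplication
```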
  21. SVD for recommender systems — SVD: singular value decomposition
     - A famous linear-algebra technique for matrix decomposition: X = U Σ V^T,
       where X is an m × n matrix, U is an m × m unitary matrix, V is an n × n
       unitary matrix, and Σ is a rectangular diagonal matrix whose diagonal
       entries are called singular values
     - Often used for dimensionality reduction
     - SVD delivers essentially the same result as PCA
  22. Example of SVD — X = U Σ V^T:

     X =
     1 3 3 3 0
     2 4 2 2 4
     1 3 3 5 1
     4 5 2 3 3
     1 1 5 2 1

     U =
     -0.369 -0.325  0.282  0.343  0.749
     -0.459  0.448 -0.339 -0.584  0.364
     -0.468 -0.359  0.544 -0.459 -0.381
     -0.563  0.491  0.07   0.562 -0.348
     -0.341 -0.569 -0.71   0.118 -0.202

     Σ = diag(13.368, 4.708, 2.792, 1.586, 0.904)

     V =
     -0.325  0.341 -0.101  0.682 -0.55
     -0.562  0.345  0.273  0.153  0.684
     -0.468 -0.642 -0.577  0.125  0.141
     -0.504 -0.327  0.601 -0.324 -0.416
     -0.324  0.496 -0.47  -0.626 -0.189
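This decomposition can be checked with NumPy. Note that np.linalg.svd returns V already transposed (Vt), and the signs of the columns of U and V may differ from the slide's:

```python
import numpy as np

X = np.array([[1, 3, 3, 3, 0],
              [2, 4, 2, 2, 4],
              [1, 3, 3, 5, 1],
              [4, 5, 2, 3, 3],
              [1, 1, 5, 2, 1]], dtype=float)

U, s, Vt = np.linalg.svd(X)                  # X = U @ diag(s) @ Vt
print(np.round(s, 3))                        # should match the slide: 13.368, 4.708, ...
print(np.allclose(X, U @ np.diag(s) @ Vt))   # True: the full SVD is exact
```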
  23. Important features of SVD (1/4) — We can approximate a given matrix by
     keeping only the largest singular values and the vectors of U and V that
     correspond to them. In the example above, the two largest singular values
     are 13.368 and 4.708.
  24. Important features of SVD (2/4) — Ignore the unimportant values: the small
     singular values (2.792, 1.586, 0.904) and the corresponding columns of U and
     V contribute little to the matrix.
  25. Important features of SVD (3/4) — Keeping the two largest singular values
     gives the truncated matrices U2 (the first two columns of U),
     Σ2 = diag(13.368, 4.708), and V2 (the first two columns of V).
  26. Important features of SVD (4/4) — Multiplying the truncated matrices back
     together approximates the original matrix:

     U2 Σ2 V2^T =
     1.08 2.24 3.29 2.98 0.84
     2.72 4.17 1.52 2.41 3.04
     1.46 2.93 4.02 3.71 1.19
     3.24 5.03 2.04 3.04 3.59
     0.57 1.64 3.86 3.18 0.15
     ≈ X
  27. Apply SVD for recommender systems (1/2) — Step 1: convert the rating table
     to a matrix, regarding unknown scores as zero:

              Item1  Item2  Item3  Item4  Item5
     Alice      1      3      3      3      ?
     User1      2      4      2      2      4
     User2      1      3      3      5      1
     User3      4      5      2      3      3
     User4      1      1      5      2      1

     X =
     1 3 3 3 0
     2 4 2 2 4
     1 3 3 5 1
     4 5 2 3 3
     1 1 5 2 1
  28. Apply SVD for recommender systems (2/2)
     Step 2: run SVD: X = U Σ V^T
     Step 3: focus on the important features, keeping U2, Σ2, V2
     Step 4: multiply the three matrices: U2 Σ2 V2^T
     Step 5: check the values that were zero before running SVD — Alice's
     predicted score for Item5 is 0.84
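A minimal NumPy sketch of the five steps, with k = 2 latent dimensions as in the slides:

```python
import numpy as np

R = np.array([[1, 3, 3, 3, np.nan],   # Alice; Item5 is unknown
              [2, 4, 2, 2, 4],
              [1, 3, 3, 5, 1],
              [4, 5, 2, 3, 3],
              [1, 1, 5, 2, 1]])

X = np.nan_to_num(R)                      # Step 1: regard unknown scores as zero
U, s, Vt = np.linalg.svd(X)               # Step 2: run SVD
k = 2                                     # Step 3: keep the k largest singular values
X2 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # Step 4: multiply the three matrices
print(round(X2[0, 4], 2))                 # Step 5: read the formerly-zero cell -> 0.84
```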
  29. Problems on SVD
     Predicted values are often negative — SVD does not take the rating-score
     range into account.
     Zero replacement decreases prediction quality — SVD analyzes the relations
     between all entries in the matrix, but the meaning of "zero" is different
     from that of "unknown".
  30. Bad example of SVD-based recommendation
     (Example from: http://smrmkt.hatenablog.jp/entry/2014/08/23/211555)

              Music1  Music2  Music3  Music4
     User1      5       -       -       -
     User2      3       4       -       -
     User3      2       -       1       -
     User4      -       5       -       4
     User5      -       -       -       5

     Zero replacement turns the blanks into 0s; applying SVD and multiplying
     U2, Σ2, V2 back together yields predicted rows such as
     (3.53, 1.88, 0.16, -0.26) and (-0.26, 2.91, -0.06, 3.52): several predicted
     "ratings" are negative, which is impossible on the rating scale.
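The failure is easy to reproduce. A sketch of the same experiment; the exact values depend on the SVD implementation, but negative predictions appear either way:

```python
import numpy as np

# Zero-filled music rating matrix from the slide.
X = np.array([[5, 0, 0, 0],
              [3, 4, 0, 0],
              [2, 0, 1, 0],
              [0, 5, 0, 4],
              [0, 0, 0, 5]], dtype=float)

U, s, Vt = np.linalg.svd(X)
X2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2]   # rank-2 reconstruction
print(np.round(X2, 2))                    # some predicted "ratings" come out negative
```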
  31. Netflix Prize (2006–2010) — Netflix held an open competition to advance
     collaborative filtering algorithms and to find the best-performing one.
     (Image ref: http://blogs.itmedia.co.jp/saito/2009/09/httpjournalmyco.html)
  32. Simon Funk's Matrix Factorization (2006) — Instead of running SVD on a
     zero-filled matrix, Funk's method learns the matrices P and Q from only the
     observed values in R (m users × n items, R ≈ P × Q^T).

     Target optimization function:

     $\min_{P,Q} \sum_{(u,i) \in K} \bigl(r_{u,i} - \mathbf{p}_u \cdot \mathbf{q}_i\bigr)^2 + \lambda \bigl(\lVert \mathbf{p}_u \rVert^2 + \lVert \mathbf{q}_i \rVert^2\bigr)$

     where K is the set of (user, item) pairs whose ratings are observed, p_u is
     user u's latent vector, and q_i is item i's latent vector.
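This objective is commonly minimized with stochastic gradient descent over the observed ratings, as Funk did. A minimal sketch; the function name, initialization, and hyperparameters below are my assumptions, not the lecture's:

```python
import numpy as np

def funk_mf(R, k=2, lr=0.01, reg=0.02, n_epochs=500, seed=0):
    """Learn P (m x k) and Q (n x k) from observed entries only.
    Unknown ratings in R are marked with np.nan, not zero."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = 0.1 * rng.standard_normal((m, k))
    Q = 0.1 * rng.standard_normal((n, k))
    observed = np.argwhere(~np.isnan(R))       # the set K of rated (u, i) pairs
    for _ in range(n_epochs):
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]        # prediction error on one rating
            pu = P[u].copy()                   # keep the old user vector for Q's update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

# Alice's matrix from the earlier slides; "?" stays unknown (np.nan).
R = np.array([[1, 3, 3, 3, np.nan],
              [2, 4, 2, 2, 4],
              [1, 3, 3, 5, 1],
              [4, 5, 2, 3, 3],
              [1, 1, 5, 2, 1]])
P, Q = funk_mf(R)
print(round((P @ Q.T)[0, 4], 2))   # Alice's predicted score for Item5
```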
  33. Various approaches have been developed since
     (Ref: https://www.slideshare.net/databricks/deep-learning-for-recommender-systems-with-nick-pentreath)
     2003: Amazon's item-based CF — scalable models
     2006–2009: Netflix Prize — the rise of matrix factorization methods like
     Simon Funk's
     2010: Factorization machines — generalized matrix factorization for dealing
     with various factors
     2013: Deep learning — deep factorization machines; Content2Vec for content
     embeddings
  34. Challenges for recommender systems

  35. Remaining challenges
     Cold-start problem — How should we recommend new items? What should we
     recommend to new users?
     Serendipity — Recommending only items a user already likes cannot expand the
     user's interests; how can we recommend unexpected and surprising items?
     Providing explanations (of recommendation reasons) — Recommendation
     algorithms only decide what to recommend; how can systems persuade users to
     purchase the recommended items?
     Reviewer trust — How can we remove spam reviewers? How can we find
     high-quality reviewers?
  37. Cold-start problem — CF approaches don't work for new items/users: new
     items/users give no clues for predicting unknown scores, because CF cannot
     find neighboring users/items.

                       Item1  Item2  Item3  Item4  Item5 (new item)
     Kate (new user)     -      -      -      -      -
     User1               3      1      2      3      -
     User2               4      3      4      3      -
     User3               3      3      1      5      -
     User4               1      5      5      2      -
  38. Possible solutions for the cold-start problem
     On-the-fly preference prediction — the system asks (or forces) new users to
     rate several items in order to understand their preferences.
     Content-based recommendation — finds similar items based on content features.
     [Figure: a paper list; when a user clicks a paper to view it, textually
     similar papers (by TF-IDF, doc2vec, etc.) are recommended.]
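A rough sketch of such content-based recommendation with scikit-learn; the documents and variable names below are hypothetical stand-ins for the slide's paper list:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical paper titles standing in for the slide's paper list.
docs = ["context image generation with deep neural networks",
        "impact and detection of hoax articles and false information",
        "fake news false information and misinformation on the web"]

vectors = TfidfVectorizer().fit_transform(docs)  # text -> TF-IDF vectors
sims = cosine_similarity(vectors)                # pairwise textual similarity
clicked = 2                                      # the user opened the fake-news paper
ranking = sims[clicked].argsort()[::-1][1:]      # other papers, most similar first
print(ranking)                                   # the hoax paper (doc 1) ranks first
```

No rating history is needed: a brand-new item can be recommended as soon as its content features exist.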
  39. Reviewer trust — How can we find trustworthy reviewers (rating experts)?
     - People often give good scores to items
     - Some reviewers intentionally give extremely high or low scores to items
       (spammers)
     [Figure: mean rating vs. sequence number of rating on Amazon.com, for Books,
     Electronics, Movies & TV, and Music]
     Ref: T. Wang, D. Wang (2014) "Why Amazon's Ratings Might Mislead You: The
     Story of Herding Effects", Journal of Big Data, Vol.2, No.4, pp.196–204.
  40. Programming Work

  41. 41 Visit the following URL: https://recsys2021.hontolab.org/

  42. 42 Click this link to see assignments