
WISE2019 presentation slides

Presentation title: Highlighting Weasel Sentences for Promoting Critical Information Seeking on the Web

ymmt3lab

January 12, 2020


Transcript

  1. Highlighting Weasel Sentences for Promoting Critical Information Seeking on the Web
     Fumiaki Saito¹, Yoshiyuki Shoji², Yusuke Yamamoto¹
     1: Shizuoka University, Japan; 2: Aoyama Gakuin University, Japan
     January 19, 2020
     WISE 2019: Session S7
  2. Background: Web information is not always correct
     The number of medical Web sites authorized by medical experts: < 50%*
     * E. Sillence et al., "Trust and Mistrust of Online Health Sites", ACM CHI, pp. 663-670, 2004
  3. Examples of credibility analysis systems
     • TruthFinder*1: scores the consistency of facts describing objects
     • CowSearch*2: provides supporting information for credibility judgment
     Limitation: the analysis does not guarantee the correctness of information
     *1 Yin, X., Han, J., and Yu, P. S. (2008). Truth discovery with multiple conflicting information providers on the web. IEEE Transactions on Knowledge and Data Engineering, 20(6), 796-808.
     *2 Yamamoto, Y. and Tanaka, K. Enhancing Credibility Judgment of Web Search Results. In Proceedings of the 29th ACM SIGCHI Conference on Human Factors in Computing Systems (CHI 2011), pages 1235-1244, 2011.
  4. Possible approaches in information science
     Obtaining correct information requires:
     • (Semi-)automatic analysis of information credibility
     • Careful examination of information by users
  5. Many people are not aware of Web information credibility!
     • 57%: young Japanese people who trust Web information*1
     • 82%: people who trust information on SERPs*2
     *1 Adobe Inc., "The State of Content: Rules of Engagement", 2015
     *2 S. Nakamura et al., "Trustworthiness Analysis of Web Search Results", ECDL 2007
  6. Why don't people often pay attention to Web information credibility?
     • Trust in the search engine's ranking²
     • Wrong metrics for quality judgment (e.g., the appearance of websites³)
     • Cognitive biases¹
     1: Kahneman, D.: Thinking, Fast and Slow, Macmillan (2011)
     2: Pan, B., Hembrooke, H., Joachims, T., et al.: In Google We Trust: Users' Decisions on Rank, Position, and Relevance
     3: Fogg, B. J., Soohoo, C., Danielson, D. R., et al.: How Do Users Evaluate the Credibility of Web Sites? A Study with over 2,500 Participants
  7. Proposed system
     Automatically detects and highlights ambiguous sentences on the Web
     Ambiguous sentences (in this study): sentences that lack evidence but seem to have no problems in their claims
  8. Proposed system to highlight ambiguous sentences
     Makes users aware of information credibility and promotes careful information seeking
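
The slides do not show implementation details, but a minimal sketch of the highlighting step might look like the following. The keyword rule here is a toy stand-in for the SVM classifier described on later slides, and the function names and `<mark>`-tag rendering are assumptions, not the authors' code.

```python
import re

# Toy stand-in for the SVM-based weasel-sentence classifier described later.
WEASEL_CUES = ("it is said", "it is widely thought",
               "research has shown", "experts argue")

def looks_like_weasel(sentence: str) -> bool:
    s = sentence.lower()
    return any(cue in s for cue in WEASEL_CUES)

def highlight_weasel_sentences(text: str) -> str:
    """Wrap sentences judged to be weasel expressions in <mark> tags."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(f"<mark>{s}</mark>" if looks_like_weasel(s) else s
                    for s in sentences)

print(highlight_weasel_sentences(
    "It is said that cinnamon cures diabetes. The trial enrolled 500 patients."))
```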
  9. What are weasel expressions?
     • "A certain person concerned is ..."
     • "Research has shown ..."
     • "It is often said ..."
     • "It is widely thought ..."
     Who said that? What is the truth?
     Weasel expressions create an impression that something specific and meaningful has been said, although the claim is ambiguous and lacks evidence.
  10. Classification of weasel sentences
      We built a simple weasel sentence classifier with Wikipedia data
      Pipeline: Wikipedia → ambiguous / non-ambiguous sentences → training/test set → ML (SVM)
  11. Training/test dataset to classify weasel sentences (1/2)
      We focused on Wikipedia's editing rules: some weasel expressions are annotated with special tags in Wikipedia.
      Ex: "It is said that Company A spread this hoax. [by whom?]" (Who said it?)
  12. Weasel expressions in Wikipedia
      • "The hoax is said to have been spread by company A, a media and advertising company that wanted to show off their influence. [by whom?]"
      • "~ is an established theory, but there are some objections. [who?]"
      Image reference: https://en.wikipedia.org/wiki/Wikipedia_logo
  13. Training/test dataset to classify weasel sentences (2/2)
      • Positive examples (2,236 sentences): sentences annotated with [who?] or [by whom?] on Wikipedia
      • Negative examples (2,236 sentences): sentences without [who?] or [by whom?] tags, drawn from Wikipedia articles where weasel expressions appear
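
The slides do not detail how the tagged sentences were extracted. The following is a rough sketch under the assumption that one works from article wikitext, where the rendered [who?] / [by whom?] markers come from templates such as {{who}} and {{by whom}}; the template matching and sentence splitting are simplified here.

```python
import re

# Simplified matcher for the {{who}} / {{by whom}} templates in wikitext
# (template names are an assumption about the source markup).
WEASEL_TAG = re.compile(r"\{\{\s*(?:who|by whom)\s*(?:\|[^{}]*)?\}\}",
                        re.IGNORECASE)

def split_sentences(text: str) -> list[str]:
    # Naive splitter; a real pipeline would use a proper sentence tokenizer.
    return re.split(r"(?<=[.!?])\s+", text)

def label_sentences(wikitext: str) -> list[tuple[str, int]]:
    """Return (sentence, label) pairs: 1 = tagged as weasel, 0 = untagged."""
    pairs = []
    for sentence in split_sentences(wikitext):
        label = 1 if WEASEL_TAG.search(sentence) else 0
        # Strip the tag itself so the classifier cannot simply memorize it.
        pairs.append((WEASEL_TAG.sub("", sentence).strip(), label))
    return pairs
```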
  14. Features for classification (see the sketch below)
      • Bag-of-words: nouns, verbs, and adjectives
      • Typical ambiguous expressions listed on Wikipedia (27 phrases), e.g.:
        - "It is said that ..."
        - "An expert argues that ..." and so on
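
As a concrete illustration, a scikit-learn pipeline combining these two feature types could look like the sketch below. The phrase list is abbreviated, the part-of-speech filtering for the bag-of-words is assumed to happen in preprocessing, and this is not necessarily the authors' exact setup.

```python
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Abbreviated stand-in for the 27 typical weasel phrases from Wikipedia.
WEASEL_PHRASES = ["it is said that", "an expert argues that",
                  "it is widely thought"]

classifier = Pipeline([
    ("features", FeatureUnion([
        # Bag-of-words (restricting to nouns/verbs/adjectives would be done
        # in a POS-tagging preprocessing step, omitted here).
        ("bow", CountVectorizer()),
        # Binary indicators for the typical weasel phrases (all 4-grams).
        ("phrases", CountVectorizer(vocabulary=WEASEL_PHRASES,
                                    ngram_range=(4, 4),
                                    token_pattern=r"(?u)\b\w+\b",
                                    binary=True)),
    ])),
    ("svm", LinearSVC()),
])

# sentences, labels: the positive/negative Wikipedia sentences from above.
# classifier.fit(sentences, labels)
```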
  15. Improvements for better classification (1/2)
      Improvements in the dataset:
      • The frequency of weasel annotation varies across topics on Wikipedia, so the topic categories of the positive examples may be biased
      • We need to collect more data from editing histories on Wikipedia
  16. Improvements for better classification (2/2)
      Improvements in the classification features:
      • Because bag-of-words is a simple feature, features that consider surrounding sentences and word order are needed
  17. How does our prototype affect user behaviors?
      Q. Can our prototype encourage users to browse webpages more carefully? If so, how do user behaviors change?
  18. Hypotheses
      • H1: The proposed system extends the time spent on web information seeking.
      • H2: The proposed system increases the number of visited webpages.
      • H3: The proposed system improves users' confidence in the decisions they make through web information seeking.
      • H4: The above effects vary with users' familiarity with the search topics.
  19. User study
      Participants were asked to search webpages to answer medical questions.
      • Fixed search results: participants explored a fixed set of 100 search results
      • 4 search tasks: we prepared 4 search tasks about medical topics
      Ex: "Is cinnamon effective for diabetes? Report your answer via Web search."
  20. Participants
      Participants were recruited via crowdsourcing and randomly assigned to one of two groups by UI condition.
      • Proposed group (105 participants)
        - The system highlighted weasel sentences while participants viewed webpages
        - The authors manually decided which sentences should be highlighted as weasel sentences
        - Participants learned what the highlighted sentences meant before the task started
      • Control group (83 participants)
        - The highlighting function was disabled
  21. Focused behaviors
      Measured behaviors on the web:
      • Session time
      • SERP dwell time
      • Average dwell time per webpage
      • Pageviews (the number of visited webpages)
  22. Questionnaires
      • Pre-questionnaire (6-point scales)
        - Degree of prior knowledge about the search topic
        - Expected answer
        - Confidence in the answer
      • Post-questionnaire (6-point scales)
        - Answer to the search task
        - Confidence in the answer
  23. Analysis method
      • We used a Bayesian GLMM (generalized linear mixed model) to model user behaviors; a sketch of such a model follows below
      • Fixed effects: UI condition, topic familiarity, and the interaction between UI condition and topic familiarity
      • Random effects: user, search topic
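
The slides give only the model structure, not the concrete specification. A minimal sketch of one such model (the session-time model) using the bambi library for Bayesian GLMMs in Python, with assumed column names, might look like:

```python
import bambi as bmb
import pandas as pd

# Assumed layout: one row per (user, topic) search session, with columns
#   session_time - behavioral outcome (e.g., seconds in the search session)
#   ui           - UI condition (proposed vs. control)
#   familiarity  - self-reported topic familiarity from the pre-questionnaire
#   user, topic  - grouping factors for the random effects
df = pd.read_csv("sessions.csv")  # hypothetical file

# Fixed effects: ui, familiarity, and their interaction (ui * familiarity
# expands to both main effects plus the interaction term).
# Random effects: per-user and per-topic intercepts.
model = bmb.Model(
    "session_time ~ ui * familiarity + (1|user) + (1|topic)",
    df,
    family="gaussian",  # a Gamma or lognormal family may suit time data better
)
results = model.fit()  # MCMC sampling via PyMC
```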
  24. Session time
      The more familiar participants were with the search topic, the greater the difference in session time between UI conditions.
  25. Change of confidence
      The interaction between UI condition and topic familiarity affected the change in confidence:
      • If participants were not familiar with a topic, our prototype changed their prior confidence more strongly.
      • If participants were familiar with a topic, our prototype changed their prior confidence less strongly.
  26. Discussion
      • The user study suggests that the proposed system can enhance user engagement in careful information seeking on the web
        - The system increased pageviews and session time
      • We need to investigate how users perceive and use the highlighted sentences
        - We do not yet understand why participants using our prototype viewed more webpages and spent more time per search session
  27. Conclusion
      • Designed a system to highlight weasel sentences
        - Classification performance for weasel sentences was generally good but needs further improvement
      • Conducted a user study to examine the system's effect
        - The proposed system can enhance user engagement in critical information seeking on the web
      • Future work
        - Improving the weasel classifier
        - Investigating weasel sentences across the Web