Measuring Quality Content
Adam Hyland
August 04, 2012
Presentation to Wikimania 2012 on Article Feedback Tool statistics.
Transcript
Measuring Article Quality: Peer Review and the Article Feedback Tool
Adam Hyland (protonk @ en-wp)
Look Familiar?
Maybe This Version?
Article Feedback Tool
• Deployed in 2010
• Version 4 (the current version) ramped up in 2011
• Designed to offer an avenue for reader feedback
• High volume of reader feedback
• 6 months of public data
• 795,353 articles with 2,487,522 responses
Featured Articles (FA)
• 3,599 articles (0.09% of all articles)
• 2,267 Featured Lists (FL)
• Most rigorous peer review process on the English Wikipedia
• Very sensitive to editor preferences
• Some idiosyncrasies
Good Articles (GA)
• 15,357 articles
• Relatively rigorous peer review (yes, I know reasonable minds may disagree)
• Less idiosyncratic than FA in some ways
• Perhaps less dependent on editor preference
Data
• Article name
• Length (in bytes)
• GA/FA status (including former/not-promoted)
• Some user data
Beyond Summaries
• Reader ratings follow pageviews
• Predominantly non-editors
• Popular articles:
  • Call of Duty
  • Justin Bieber
  • Jimmy Wales (avg. rating: 1.10585)
Power Laws Everywhere!
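One way to check a power-law claim like this is to estimate the exponent by maximum likelihood. The sketch below is only an illustration: the data are synthetic, and the function names, the exponent 2.5, and the cutoff `x_min = 1.0` are all assumptions, not values from the talk.

```python
import math
import random


def sample_power_law(alpha, x_min, n, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= x_min."""
    # Inverse-transform sampling; 1 - u lies in (0, 1], so the power is defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def estimate_alpha(values, x_min):
    """Maximum-likelihood estimate of the exponent, using values >= x_min."""
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    return 1.0 + len(tail) / log_sum


rng = random.Random(42)
# Hypothetical per-article rating counts; alpha = 2.5 is an arbitrary choice.
counts = sample_power_law(alpha=2.5, x_min=1.0, n=20000, rng=rng)
alpha_hat = estimate_alpha(counts, x_min=1.0)
print(f"estimated exponent: {alpha_hat:.2f}")
```

On real rating or pageview counts the cutoff `x_min` would itself need to be chosen, and a heavy tail alone does not establish a power law.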
Classical(ish) Models
• Logistic regression model supports a relationship between rating and likelihood of FA/GA
• Linear model does, but with a twist
• Can’t escape Cambridge Endogeneity Police!
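A minimal sketch of the logistic-regression idea on this slide, in Python using only the standard library. The data are synthetic, and every name and number here (`fit_logistic`, the 1-5 rating scale centred at 3, the generating coefficients) is an assumption for illustration, not the model or data from the talk.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.1, epochs=1000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient ascent
    on the average log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Synthetic data (hypothetical): mean reader rating per article, centred at 3,
# with promotion to FA/GA made more likely at higher ratings.
random.seed(0)
ratings = [random.uniform(1.0, 5.0) for _ in range(500)]
centred = [r - 3.0 for r in ratings]
promoted = [1 if random.random() < sigmoid(-1.0 + 1.2 * c) else 0 for c in centred]

b0, b1 = fit_logistic(centred, promoted)
odds_ratio = math.exp(b1)  # multiplicative change in odds per rating point
print(f"slope: {b1:.2f}, odds ratio per rating point: {odds_ratio:.2f}")
```

A positive slope only describes an association. As the endogeneity bullet warns, it does not show that higher ratings cause promotion or the reverse.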
Data Mining
• Predicting featured status from reader ratings and minimal metadata
• Bayesian classifier able to roughly predict featured status (with a high false positive rate)
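The slide does not say which Bayesian classifier was used. As one possibility, here is a sketch of a Gaussian naive Bayes classifier in plain Python. The two features (mean rating and log10 of article length), the 10% featured share, and all distribution parameters are hypothetical, and the data are synthetic.

```python
import math
import random


def gaussian_logpdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def fit_naive_bayes(rows, labels):
    """Estimate a prior and per-feature Gaussian parameters for each class."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for j in range(len(rows[0])):
            col = [m[j] for m in members]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[cls] = (len(members) / len(rows), stats)
    return model


def predict(model, row):
    """Return the class with the highest log prior plus log likelihood."""
    def score(cls):
        prior, stats = model[cls]
        return math.log(prior) + sum(
            gaussian_logpdf(x, m, v) for x, (m, v) in zip(row, stats)
        )
    return max(model, key=score)


# Synthetic articles (hypothetical): (mean rating, log10 length in bytes).
random.seed(7)
rows, labels = [], []
for _ in range(1000):
    featured = 1 if random.random() < 0.1 else 0
    if featured:
        row = (random.gauss(4.0, 0.5), random.gauss(4.7, 0.3))
    else:
        row = (random.gauss(3.5, 0.7), random.gauss(4.0, 0.5))
    rows.append(row)
    labels.append(featured)

# Train on the first 700 articles, evaluate on the remaining 300.
model = fit_naive_bayes(rows[:700], labels[:700])
predicted = [predict(model, r) for r in rows[700:]]
actual = labels[700:]
false_pos = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
true_neg = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
false_positive_rate = false_pos / (false_pos + true_neg)
print(f"false positive rate: {false_positive_rate:.3f}")
```

The false positive rate on this toy data says nothing about the rate seen in the talk. The point is only that it is worth reporting alongside overall accuracy when featured articles are rare.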
But the system’s changing!
• AFT v4 is a multi-category quantitative measure
• AFT v5 is, roughly, YES/NO
• Is this a problem?
• Frank Harrell and the perils of dichotomization
Actual Reader Ratings
Another Look
For the skeptics
Information
• We can imagine we might not lose information in shifting to v5
• This is borne out by the classifier, to some degree
• We don’t lose a lot of power when dichotomizing individual ratings
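One rough way to see what dichotomizing does is to compare how strongly a 1-5 rating and its yes/no version each track an underlying quality score. The sketch below uses synthetic data. The latent `quality` variable, the noise model, and the threshold of 4 are assumptions for illustration, not the analysis from the talk.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def dichotomize(rating, threshold=4):
    """Collapse a 1-5 rating to 1 (yes) or 0 (no)."""
    return 1 if rating >= threshold else 0


random.seed(1)
# Hypothetical latent article quality and a noisy reader rating clipped to 1-5.
quality = [random.gauss(0.0, 1.0) for _ in range(5000)]
ratings = [min(5, max(1, round(3 + q + random.gauss(0.0, 1.0)))) for q in quality]
binary = [dichotomize(r) for r in ratings]

r_full = pearson(quality, ratings)
r_binary = pearson(quality, binary)
retained = (r_binary / r_full) ** 2  # share of explained variance kept
print(f"r (1-5 scale): {r_full:.3f}, r (yes/no): {r_binary:.3f}")
print(f"share of explained variance retained: {retained:.2f}")
```

How much is retained depends heavily on the shape of the rating distribution and on where the cut is made, so the figure from this toy setup should not be read as an estimate for AFT data.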
A Look Ahead
• Really exciting!
• Great complement to current research methods
• Long exposures can help discover reader/editor divergence
• Predictive analytics
• Need more open data
Questions?
• Of course you have questions!
• All work is or soon will be available on GitHub under a free license
• Full writeup on en-wp forthcoming