Measuring Quality Content
Adam Hyland
August 04, 2012
Presentation to Wikimania 2012 on Article Feedback Tool statistics.
Transcript
Measuring Article Quality: Peer Review and the Article Feedback Tool
Adam Hyland (protonk @ en-wp)
Look Familiar?
Maybe This Version?
Article Feedback Tool
• Deployed in 2010
• Version 4 (the current version) ramped up in 2011
• Designed to offer an avenue for reader feedback
• High volume of reader feedback
• 6 months of public data
• 795,353 articles, 2,487,522 responses
Featured Articles (FA)
• 3,599 articles (0.09% of all articles)
• 2,267 Featured Lists (FL)
• Most rigorous peer review process on the English Wikipedia
• Very sensitive to editor preferences
• Some idiosyncrasies
Good Articles (GA)
• 15,357 articles
• Relatively rigorous peer review (yes, I know reasonable minds may disagree)
• Less idiosyncratic than FA in some ways
• Perhaps less dependent on editor preference
Data
• Article name
• Length (in bytes)
• GA/FA status (including former/not-promoted)
• Some user data
Beyond Summaries
• Reader ratings follow pageviews
• Predominantly non-editors
• Popular articles:
  • Call of Duty
  • Justin Bieber
  • Jimmy Wales (avg. rating: 1.10585)
Power Laws Everywhere!
Classical(ish) Models
• Logistic regression model supports a relationship between rating and likelihood of FA/GA
• Linear model does, but with a twist
• Can’t escape Cambridge Endogeneity Police!
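As a rough illustration of the logistic regression bullet above, here is a minimal sketch that fits P(FA/GA | mean reader rating) by plain gradient descent. It is not the model from the talk: the data are synthetic and hypothetical, and the names `make_synthetic`, `fit_logistic`, and `predict_prob` are invented for this example.

```python
import math
import random

CENTER = 3.0  # midpoint of an assumed 1-5 rating scale, used to center the feature

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, lr=0.5, steps=2000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # derivative of the log loss w.r.t. the logit
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def make_synthetic(n=400, seed=0):
    # HYPOTHETICAL data, not the AFT dump: promoted (FA/GA) articles are
    # assumed to draw mean ratings from a slightly higher distribution.
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        promoted = 1 if rng.random() < 0.3 else 0
        mean = 3.8 if promoted else 3.2
        rating = min(5.0, max(1.0, rng.gauss(mean, 0.6)))
        xs.append(rating - CENTER)
        ys.append(promoted)
    return xs, ys

def predict_prob(b0, b1, rating):
    # Estimated probability that an article with this mean rating is FA/GA.
    return sigmoid(b0 + b1 * (rating - CENTER))

xs, ys = make_synthetic()
b0, b1 = fit_logistic(xs, ys)
print(f"intercept={b0:.3f} slope={b1:.3f}")
print(f"P(FA/GA | rating=4.5) = {predict_prob(b0, b1, 4.5):.3f}")
```

A positive slope only shows association in this toy data. As the endogeneity bullet warns, it says nothing about whether ratings reflect quality or whether promoted articles simply attract different readers.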
Data Mining
• Predicting featured status from reader ratings and minimal metadata
• Bayesian classifier able to roughly predict featured status (with a high false positive rate)
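The slide does not say which Bayesian classifier was used. As one possible reading, the sketch below trains a Gaussian naive Bayes model on two features (mean rating and log article length) and reports accuracy alongside the false positive rate. The data are synthetic and hypothetical, the feature choice is an assumption, and all function names are invented for this example.

```python
import math
import random

def gaussian_logpdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def fit_gnb(rows, labels):
    """Return {class: (log prior, feature means, feature variances)}."""
    model = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # small floor avoids zero variance
            for col, m in zip(columns, means)
        ]
        model[c] = (math.log(n / len(rows)), means, variances)
    return model

def predict(model, row):
    # Pick the class with the highest log posterior (up to a constant).
    def score(c):
        log_prior, means, variances = model[c]
        return log_prior + sum(
            gaussian_logpdf(x, m, v) for x, m, v in zip(row, means, variances)
        )
    return max(model, key=score)

def make_synthetic(n, seed):
    # HYPOTHETICAL data: featured articles are assumed to be rated a bit
    # higher and to be longer. None of these numbers come from the talk.
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        featured = 1 if rng.random() < 0.2 else 0
        rating = rng.gauss(4.0 if featured else 3.3, 0.5)
        log_length = rng.gauss(10.8 if featured else 9.5, 0.8)
        rows.append([rating, log_length])
        labels.append(featured)
    return rows, labels

train_rows, train_labels = make_synthetic(1000, seed=1)
test_rows, test_labels = make_synthetic(500, seed=2)
model = fit_gnb(train_rows, train_labels)
preds = [predict(model, r) for r in test_rows]

tp = sum(p == 1 and y == 1 for p, y in zip(preds, test_labels))
fp = sum(p == 1 and y == 0 for p, y in zip(preds, test_labels))
tn = sum(p == 0 and y == 0 for p, y in zip(preds, test_labels))
fn = sum(p == 0 and y == 1 for p, y in zip(preds, test_labels))

accuracy = (tp + tn) / len(test_labels)
false_positive_rate = fp / (fp + tn)
print(f"accuracy={accuracy:.3f} false_positive_rate={false_positive_rate:.3f}")
```

Reporting the false positive rate separately matters because featured articles are rare: a classifier can look accurate overall while still flagging many non-featured articles. The toy data here are cleaner than real ratings, so the sketch is not expected to reproduce the high false positive rate mentioned on the slide.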
But the system’s changing!
• AFT v4 is a multi-category quantitative measure
• AFT v5 is, roughly, YES/NO
• Is this a problem?
• Frank Harrell and the perils of dichotomization
Actual Reader Ratings
Another Look
For the skeptics
Information
• We can imagine we might not lose information in shifting to v5
• This is borne out by the classifier, to some degree
• We don’t lose a lot of power when dichotomizing individual ratings
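One way to get a feel for the dichotomization question is a small simulation. The sketch below is not the analysis from the talk. It assumes a hypothetical latent quality per article, generates 1 to 5 ratings from it, and compares how well a v4-style mean rating and a v5-style share of "yes" votes track that latent quality. Treating a rating of 4 or 5 as "yes" is an assumption made for illustration.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def simulate(n_articles=300, ratings_per_article=30, seed=0):
    rng = random.Random(seed)
    quality, mean_rating, yes_share = [], [], []
    for _ in range(n_articles):
        q = rng.gauss(0.0, 1.0)  # HYPOTHETICAL latent article quality
        ratings = [
            min(5, max(1, round(3 + q + rng.gauss(0.0, 1.0))))
            for _ in range(ratings_per_article)
        ]
        quality.append(q)
        # v4-style summary: mean of the 1-5 ratings.
        mean_rating.append(sum(ratings) / len(ratings))
        # v5-style summary: share of "yes" votes (ASSUMED cutoff: rating >= 4).
        yes_share.append(sum(r >= 4 for r in ratings) / len(ratings))
    return quality, mean_rating, yes_share

quality, mean_rating, yes_share = simulate()
corr_v4 = pearson(quality, mean_rating)
corr_v5 = pearson(quality, yes_share)
print(f"corr(quality, mean rating) = {corr_v4:.3f}")
print(f"corr(quality, yes share)   = {corr_v5:.3f}")
```

Under these assumptions both summaries track the latent quality closely once many ratings are averaged per article, which is consistent with the slide's point. With few ratings per article, or a poorly chosen cutoff, the dichotomized version would be expected to lose more.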
A Look Ahead
• Really exciting!
• Great complement to current research methods
• Long exposures can help discover reader/editor divergence
• Predictive analytics
• Need more open data
Questions?
• Of course you have questions!
• All work is or soon will be available on GitHub under a free license
• Full writeup on en-wp forthcoming