Measuring Quality Content
Adam Hyland
August 04, 2012
Presentation to Wikimania 2012 on Article Feedback Tool statistics.
Transcript
Measuring Article Quality: Peer Review and the Article Feedback Tool
Adam Hyland protonk @ en-wp
Look Familiar?
Maybe This Version?
Article Feedback Tool
• Deployed in 2010
• Version 4 (the current version) ramped up in 2011
• Designed to offer an avenue for reader feedback
• High volume of reader feedback

• 6 months of public data
• 795,353 articles; 2,487,522 responses
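As a quick sanity check on the volume figures above, the average number of responses per rated article can be worked out directly. This is only arithmetic on the two numbers quoted on the slide; the variable names are illustrative, and the mean says nothing about how unevenly responses are spread across articles.

```python
# Figures quoted on the slide.
articles = 795_353
responses = 2_487_522

# Mean number of responses per rated article.
responses_per_article = responses / articles
print(f"{responses_per_article:.2f} responses per article on average")
```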
Featured Articles (FA)
• 3,599 articles (0.09% of all articles)
• 2,267 Featured Lists (FL)
• Most rigorous peer review process on the English Wikipedia
• Very sensitive to editor preferences
• Some idiosyncrasies
Good Articles (GA)
• 15,357 articles
• Relatively rigorous peer review (yes, I know reasonable minds may disagree)
• Less idiosyncratic than FA in some ways
• Perhaps less dependent on editor preference
Data
• Article name
• Length (in bytes)
• GA/FA status (including former/not-promoted)
• Some user data
Beyond Summaries
• Reader ratings follow pageviews
• Predominantly non-editors
• Popular articles:
  • Call of Duty
  • Justin Bieber
  • Jimmy Wales (avg. rating: 1.10585)
Power Laws Everywhere!
Classical(ish) Models
• Logistic regression model supports a relationship between rating and likelihood of FA/GA
• Linear model does too, but with a twist
• Can’t escape Cambridge Endogeneity Police!
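A minimal sketch of the kind of logistic regression the slide above refers to, relating an article's mean reader rating to the probability that it is FA/GA. Everything here is an assumption for illustration: the data are synthetic rather than the AFT dump, the single-predictor model and the `fit_logistic` helper are hypothetical, and nothing in it addresses the endogeneity concern raised on the slide.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic data (an assumption, not the AFT dump): each article has a mean
# reader rating in [1, 5], and higher ratings make "featured" more likely.
ratings = [random.uniform(1.0, 5.0) for _ in range(500)]
featured = [1 if random.random() < sigmoid(1.5 * (r - 3.5)) else 0
            for r in ratings]

def fit_logistic(x, y, lr=0.1, epochs=3000):
    """Fit P(y=1) = sigmoid(b0 + b1 * x) by batch gradient ascent
    on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(b0 + b1 * xi)
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Centre the predictor so gradient ascent converges quickly.
mean_rating = sum(ratings) / len(ratings)
centered = [r - mean_rating for r in ratings]
b0, b1 = fit_logistic(centered, featured)

# exp(b1) is the multiplicative change in the odds of being featured
# for each additional rating point.
odds_ratio = math.exp(b1)
print(f"slope={b1:.2f}, odds ratio per rating point={odds_ratio:.2f}")
```

A positive slope is what "supports a relationship" would look like in such a model; on real data the sign and size of the coefficient would depend on which covariates are included.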
Data Mining
• Predicting featured status from reader ratings and minimal metadata
• Bayesian classifier able to roughly predict featured status (with a high false positive rate)
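The slide above does not say which Bayesian classifier was used. As one possible reading, here is a sketch of a Gaussian naive Bayes classifier on two hypothetical features (mean rating and log10 article length), trained and evaluated on synthetic data. The feature distributions, class sizes, and helper names are all assumptions, so the printed rates illustrate how recall, false positive rate, and precision are computed, not what the talk found.

```python
import math
import random

random.seed(1)

def make_article(is_featured):
    """Synthetic article: (mean rating, log10 length in bytes). Hypothetical."""
    if is_featured:
        return (random.gauss(4.2, 0.5), random.gauss(4.7, 0.3))
    return (random.gauss(3.6, 0.8), random.gauss(3.8, 0.5))

def sample(n_featured, n_other):
    data = [(make_article(True), 1) for _ in range(n_featured)]
    data += [(make_article(False), 0) for _ in range(n_other)]
    return data

def fit(data):
    """Estimate a class prior plus per-feature mean and variance."""
    model = {}
    for label in (0, 1):
        rows = [x for x, y in data if y == label]
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in rows]
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col)
            stats.append((mu, var))
        model[label] = (len(rows) / len(data), stats)
    return model

def log_posterior(model, x, label):
    """Unnormalised log posterior under the naive independence assumption."""
    prior, stats = model[label]
    total = math.log(prior)
    for v, (mu, var) in zip(x, stats):
        total += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
    return total

def predict(model, x):
    return 1 if log_posterior(model, x, 1) > log_posterior(model, x, 0) else 0

train = sample(200, 1800)
test = sample(200, 1800)
model = fit(train)

tp = sum(1 for x, y in test if y == 1 and predict(model, x) == 1)
fp = sum(1 for x, y in test if y == 0 and predict(model, x) == 1)
fn = sum(1 for x, y in test if y == 1 and predict(model, x) == 0)
tn = sum(1 for x, y in test if y == 0 and predict(model, x) == 0)

recall = tp / (tp + fn)
false_positive_rate = fp / (fp + tn)
precision = tp / (tp + fp) if tp + fp else 0.0
print(f"recall={recall:.2f} FPR={false_positive_rate:.2f} precision={precision:.2f}")
```

Because featured articles are rare, even a modest false positive rate can leave most predicted-featured articles unfeatured, which is why precision is reported alongside it.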
But the system’s changing!
• AFT v4 is a multi-category quantitative measure
• AFT v5 is, roughly, YES/NO
• Is this a problem?
• Frank Harrell and the perils of dichotomization
Actual Reader Ratings
Another Look
For the skeptics
Information
• We can imagine we might not lose information in shifting to v5
• This is borne out by the classifier, to some degree
• We don’t lose a lot of power when dichotomizing individual ratings
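One way to build intuition for the claim above is a toy simulation: give each article a latent quality, draw noisy 1-5 reader ratings, then compare how well the per-article mean rating and the per-article share of "positive" ratings track that quality. The rating model, the cut-off of 4, and the sample sizes are all assumptions chosen for illustration; this is not the talk's analysis and does not settle Harrell's general objection to dichotomization.

```python
import random

random.seed(2)

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def rate(quality):
    """One synthetic reader rating on a 1-5 scale, noisy around quality."""
    return min(5, max(1, round(random.gauss(quality, 1.0))))

# Hypothetical latent quality for 300 articles, 30 ratings each.
qualities = [random.uniform(2.0, 4.5) for _ in range(300)]
mean_rating, share_positive = [], []
for q in qualities:
    ratings = [rate(q) for _ in range(30)]
    mean_rating.append(sum(ratings) / len(ratings))
    # Hypothetical v5-style reduction: a rating of 4 or 5 counts as "yes".
    share_positive.append(sum(1 for r in ratings if r >= 4) / len(ratings))

r_full = pearson(qualities, mean_rating)
r_binary = pearson(qualities, share_positive)
print(f"mean rating vs quality:    r={r_full:.3f}")
print(f"share positive vs quality: r={r_binary:.3f}")
```

In this setup each individual yes/no answer carries less information than a 1-5 rating, but aggregating many of them per article recovers much of the signal. How far that carries over to real AFT data depends on how many ratings each article receives.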
A Look Ahead
• Really exciting!
• Great complement to current research methods
• Long exposures can help discover reader/editor divergence
• Predictive analytics
• Need more open data
Questions?
• Of course you have questions!
• All work is or soon will be available on GitHub under a free license
• Full writeup on en-wp forthcoming