Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher
November 09, 2009
Presented at Melbourne Hack Weekend, 2009.
Transcript
Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher, for #melhack, November 2009
...
Wikipedia: 13M articles in total; 3M+ articles in English; 240+ languages
Simple English!
{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}
stable.toolserver.org/geohack/
wiki.toolserver.org/view/GeoHack
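Coordinates in {{coord}} are plain positional parameters, so turning them into decimal degrees for a map is a small calculation. The sketch below assumes only the degrees/minutes/seconds form shown on this slide (d|m|s|N or S|d|m|s|E or W); `parse_coord` is a hypothetical helper, and other {{coord}} forms (decimal, degrees-minutes) are not handled.

```python
import re
from typing import Tuple


def parse_coord(wikitext: str) -> Tuple[float, float]:
    """Return (latitude, longitude) in signed decimal degrees.

    Assumes the positional DMS form {{coord|d|m|s|N/S|d|m|s|E/W|...}};
    other {{coord}} forms (decimal, degrees-minutes) are not handled.
    """
    match = re.search(r"\{\{coord\|([^}]*)\}\}", wikitext, re.IGNORECASE)
    if match is None:
        raise ValueError("no {{coord}} template found")
    parts = [p.strip() for p in match.group(1).split("|")]
    if len(parts) < 8:
        raise ValueError("expected d|m|s|N/S|d|m|s|E/W")
    lat_d, lat_m, lat_s, ns, lon_d, lon_m, lon_s, ew = parts[:8]

    def to_decimal(deg: str, minutes: str, seconds: str, hemisphere: str) -> float:
        value = float(deg) + float(minutes) / 60 + float(seconds) / 3600
        # South latitudes and West longitudes are negative in decimal notation
        return -value if hemisphere.upper() in ("S", "W") else value

    return to_decimal(lat_d, lat_m, lat_s, ns), to_decimal(lon_d, lon_m, lon_s, ew)


lat, lon = parse_coord(
    "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
)
print(round(lat, 4), round(lon, 4))  # prints -37.8136 144.9631
```

The trailing `type:...` and `display=...` parameters are ignored here; GeoHack is what reads them.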
{{Infobox Company
|name = Lonely Planet
|logo =
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|genre = [[Guide book|Travel guides]]
|foundation = 1972
|founder = Tony Wheeler<br /> Maureen Wheeler
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
|location =
|origins =
|key_people = Matt Goldberg <small>(Global [[CEO]])</small>
|area_served = Worldwide
|industry = [[Multi media]]
|products = Travel [[guidebook, digital applications, online travel community]]
|services =
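Infobox parameters are the most structured part of an article, but a naive `split("|")` breaks on piped links such as [[United Kingdom|British]]. Below is a minimal sketch, assuming the template body has already been cut out of the page text: it splits only on top-level pipes, tracking [[...]] and {{...}} nesting. `split_template_params` is a hypothetical helper, not a complete wikitext parser; positional (unnamed) parameters are simply skipped.

```python
from typing import Dict, List


def split_template_params(body: str) -> Dict[str, str]:
    """Split the inside of a {{...}} call into named parameters.

    Only top-level '|' characters separate parameters; pipes nested inside
    [[...]] or {{...}} (e.g. [[United Kingdom|British]]) stay in the value.
    """
    chunks: List[str] = []
    current: List[str] = []
    depth = 0
    i = 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
            continue
        if pair in ("]]", "}}"):
            depth -= 1
            current.append(pair)
            i += 2
            continue
        ch = body[i]
        if ch == "|" and depth == 0:
            chunks.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    chunks.append("".join(current))

    params: Dict[str, str] = {}
    # chunks[0] is the template name; named parameters look like "key = value"
    for chunk in chunks[1:]:
        key, sep, value = chunk.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


infobox = (
    "Infobox Company |name = Lonely Planet "
    "|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] "
    "|foundation = 1972 |location_country = [[Australia]]"
)
print(split_template_params(infobox)["foundation"])  # prints 1972
```

Values still contain raw markup ([[Australia]]); turning that into plain text is exactly the part only MediaWiki fully understands.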
Wikimedia Commons: commons.wikimedia.org
Multilingual; 5M+ files; “self-created”, public domain (PD) or from Flickr
Predominantly photographs, but also diagrams, maps and flags
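Commons runs the same MediaWiki API, so file metadata can be requested with `prop=imageinfo`. A minimal sketch that only builds the request URL, assuming the `https://commons.wikimedia.org/w/api.php` endpoint; `commons_imageinfo_url` is a hypothetical helper and "Example.jpg" is a placeholder file name.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def commons_imageinfo_url(filename: str) -> str:
    """URL asking Commons for the direct file URL, size and MIME type of one file."""
    params = {
        "action": "query",
        "titles": f"File:{filename}",   # files live in the File: namespace
        "prop": "imageinfo",
        "iiprop": "url|size|mime",      # several properties, pipe-separated
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


print(commons_imageinfo_url("Example.jpg"))
```

Check each file's licence on its description page before reusing it; Commons hosts free files, but the terms still vary.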
Wiktionary: 5M+ entries in 170+ languages; 13 languages have more than 100K entries
French is the biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html
http://en.wiktionary.org/wiki/Wiktionary:Parsing
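One common first step in parsing an entry is isolating a single language, since one page (e.g. "hack") can carry sections for several languages. The sketch below assumes the English Wiktionary convention of one level-2 heading (==English==) per language, with parts of speech at level 3 and below; other Wiktionary editions use different layouts. `language_section` is a hypothetical helper.

```python
import re
from typing import Optional

# A level-2 heading: exactly two '=' on each side, e.g. "==English=="
LEVEL2 = re.compile(r"^==([^=].*?)==\s*$", re.MULTILINE)


def language_section(entry_wikitext: str, language: str) -> Optional[str]:
    """Return the body of the ==language== section of an entry, or None if absent."""
    headings = list(LEVEL2.finditer(entry_wikitext))
    for i, heading in enumerate(headings):
        if heading.group(1).strip() == language:
            # The section runs until the next level-2 heading or the end of the page
            end = headings[i + 1].start() if i + 1 < len(headings) else len(entry_wikitext)
            return entry_wikitext[heading.end():end].strip()
    return None


entry = "==English==\n===Noun===\n'''hack'''\n\n==French==\n===Noun===\n'''hack'''\n"
print(language_section(entry, "English"))
```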
MediaWiki structure: users; logs; pages, subpages, talk pages; links, backlinks; templates; categories
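Most of these structures map directly onto API query modules: `list=backlinks`, `list=categorymembers`, and `prop=templates|categories` (users and logs have `list=allusers` and `list=logevents`). A sketch that builds the URLs, assuming the English Wikipedia endpoint; the helper names and the category name are hypothetical.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def api_url(**params) -> str:
    """Build an action=query URL; JSON output unless told otherwise."""
    params.setdefault("format", "json")
    return f"{API}?{urlencode({'action': 'query', **params})}"


def backlinks_url(title: str, limit: int = 50) -> str:
    # Pages that link *to* `title`
    return api_url(list="backlinks", bltitle=title, bllimit=limit)


def category_members_url(category: str, limit: int = 50) -> str:
    # Pages, subcategories and files filed in a category
    return api_url(list="categorymembers", cmtitle=f"Category:{category}", cmlimit=limit)


def templates_and_categories_url(title: str) -> str:
    # Templates transcluded on, and categories assigned to, one page
    return api_url(titles=title, prop="templates|categories")


print(backlinks_url("Lonely Planet"))
```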
MediaWiki markup: the only thing that completely understands it is MediaWiki :(
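Since only MediaWiki fully understands its markup, one workaround is to let MediaWiki do the rendering: `action=parse` accepts arbitrary wikitext via `text` and returns HTML. A sketch that builds (but does not send) the request, assuming the English Wikipedia endpoint; `parse_request` is a hypothetical helper and the User-Agent string is a placeholder you should replace.

```python
from urllib.parse import urlencode
from urllib.request import Request

API = "https://en.wikipedia.org/w/api.php"
USER_AGENT = "melhack-example/0.1 (contact: you@example.org)"  # placeholder, use your own


def parse_request(wikitext: str, title: str = "API") -> Request:
    """Build a POST request asking MediaWiki itself to render wikitext to HTML."""
    body = urlencode({
        "action": "parse",
        "text": wikitext,
        "title": title,    # page the text is rendered "as" (affects {{PAGENAME}} etc.)
        "prop": "text",
        "format": "json",
    }).encode("utf-8")
    # Passing data makes urllib send a POST, which suits long wikitext
    return Request(API, data=body, headers={"User-Agent": USER_AGENT})


req = parse_request("'''Lonely Planet''' is based in [[Footscray, Victoria]].")
```

Sending it with `urllib.request.urlopen(req)` should return JSON; in the default (legacy) JSON format the HTML is under `["parse"]["text"]["*"]`.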
Database dumps: XML from download.wikimedia.org OR Amazon Public Data Sets
meta.wikimedia.org/wiki/Data_dumps
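XML dumps are far too large to load whole, so they are usually streamed. A minimal sketch using the standard library's `ElementTree.iterparse`, assuming a "current revisions only" dump (one revision per page; for full-history dumps this keeps only the last text seen). The export namespace version is stripped rather than hard-coded; `iter_pages` and the sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Optional, Tuple


def iter_pages(dump: IO[bytes]) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Yield (title, wikitext) for each <page> of a MediaWiki XML dump, streaming."""
    title: Optional[str] = None
    text: Optional[str] = None
    for _event, elem in ET.iterparse(dump, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the "{http://...export-0.x/}" namespace
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield title, text
            elem.clear()  # release the finished page's children to limit memory use
            title, text = None, None


SAMPLE = (
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b"<page><title>Lonely Planet</title><revision><text>Travel guides</text></revision></page>"
    b"<page><title>Footscray, Victoria</title><revision><text>Suburb</text></revision></page>"
    b"</mediawiki>"
)

if __name__ == "__main__":
    for page_title, _ in iter_pages(io.BytesIO(SAMPLE)):
        print(page_title)
```

Real dumps are compressed; wrapping the file in `bz2.open(path, "rb")` gives `iter_pages` a byte stream without unpacking to disk.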
DBpedia: community project extracting structured data from Wikipedia and making it available
You can download the data sets or query them online
Ontology++, e.g. dbpedia.org/page/Lonely_Planet
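Querying DBpedia online means sending SPARQL to its endpoint. A sketch following the standard SPARQL protocol (a GET with a `query` parameter and a JSON results `Accept` header), assuming the public endpoint at `https://dbpedia.org/sparql` and DBpedia's `http://dbpedia.org/resource/...` URIs; `dbpedia_query_request` is a hypothetical helper and the request is built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def dbpedia_query_request(resource: str, limit: int = 20) -> Request:
    """GET request listing (property, value) pairs DBpedia holds about one resource."""
    query = (
        "SELECT ?property ?value WHERE {\n"
        f"  <http://dbpedia.org/resource/{resource}> ?property ?value .\n"
        f"}} LIMIT {limit}"
    )
    url = f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"
    # Ask for the W3C SPARQL JSON results format
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = dbpedia_query_request("Lonely_Planet")
```

Many of the returned properties come straight from the infobox parameters, which is the structured data DBpedia extracts.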
MediaWiki API: mediawiki.org/wiki/API
Endpoint: en.wikipedia.org/w/api.php
Client libraries!
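Long result lists come back in batches, which is much of what client libraries handle for you. Below is a sketch of that loop, assuming the `continue` object returned by current MediaWiki versions (the 2009 API used an older `query-continue` form): each response's `continue` values are merged into the next request until none remain. `query_all` and `http_fetch` are hypothetical helpers, and the User-Agent is a placeholder; `fetch` is injectable so the loop can be tried offline.

```python
import json
from typing import Any, Callable, Dict, Iterator
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"
USER_AGENT = "melhack-example/0.1 (contact: you@example.org)"  # placeholder, use your own

Params = Dict[str, Any]


def http_fetch(params: Params) -> Params:
    """Send one GET to the API and decode the JSON response."""
    req = Request(f"{API}?{urlencode(params)}", headers={"User-Agent": USER_AGENT})
    with urlopen(req) as resp:
        return json.load(resp)


def query_all(params: Params, fetch: Callable[[Params], Params] = http_fetch) -> Iterator[Params]:
    """Yield each batch's "query" object, following "continue" tokens to the end."""
    base = dict(params, action="query", format="json")
    cont: Params = {}
    while True:
        data = fetch({**base, **cont})
        if "query" in data:
            yield data["query"]
        if "continue" not in data:
            break
        cont = data["continue"]  # pass every continuation value back unchanged
```

For example, `query_all({"list": "backlinks", "bltitle": "Lonely Planet"})` would walk every page linking to the article, one batch at a time.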
mwclient: Python library for accessing MediaWiki APIs
Toolserver
toolserver.org: server for community-developed plugins, add-ons, extensions, stats and hacks (“tools”)
Tools often explicitly implement the editing community's implicit standards (a “community API”)
TemplateTiger: toolserver.org/~kolossos/templatetiger/
For a few dozen Wikipedia languages, and Wikimedia Commons
Lets you query templates very much like SQL
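To show what "querying templates like SQL" can look like, here is a sketch using Python's built-in `sqlite3`. The one-row-per-parameter schema is a hypothetical stand-in, not TemplateTiger's actual table layout; the Lonely Planet rows reuse the infobox values from the earlier slide, and the other companies are invented for illustration.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per (page, template, parameter, value).
# TemplateTiger's real tables may differ; this only illustrates the idea.
SCHEMA = "CREATE TABLE template_params (page TEXT, template TEXT, param TEXT, value TEXT)"

SAMPLE_ROWS = [
    # Values from the Lonely Planet infobox shown earlier
    ("Lonely Planet", "Infobox Company", "foundation", "1972"),
    ("Lonely Planet", "Infobox Company", "location_country", "[[Australia]]"),
    # Invented rows, for illustration only
    ("Hypothetical Co", "Infobox Company", "foundation", "1990"),
    ("Hypothetical Co", "Infobox Company", "location_country", "[[Australia]]"),
    ("Other Example Ltd", "Infobox Company", "foundation", "1950"),
    ("Other Example Ltd", "Infobox Company", "location_country", "[[United Kingdom]]"),
]


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO template_params VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def companies_founded_in(conn: sqlite3.Connection, country: str) -> List[Tuple[str, str]]:
    """(page, foundation) for every Infobox Company whose location_country matches."""
    sql = """
        SELECT c.page, f.value
        FROM template_params AS c
        JOIN template_params AS f
          ON f.page = c.page AND f.template = c.template AND f.param = 'foundation'
        WHERE c.template = 'Infobox Company'
          AND c.param = 'location_country'
          AND c.value = ?
        ORDER BY c.page
    """
    return conn.execute(sql, (country,)).fetchall()


print(companies_founded_in(make_demo_db(), "[[Australia]]"))
```

Storing one row per parameter keeps the table generic across thousands of templates, at the cost of a self-join for each extra parameter a query needs.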
identi.ca/pfctdayelise
[email protected]
Thanks!
Logos and screenshots may be copyright of their respective owners. Slides are otherwise © Brianna Laugher.