Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
210
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
230
Crowd funded free software
pfctdayelise
0
180
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
260
Funcargs and other fun with pytest
pfctdayelise
0
270
Zookeepr: home grown conference management software
pfctdayelise
0
180
Why "gender" should be a text field
pfctdayelise
0
250
Distributed wikis
pfctdayelise
0
190
Neurosexism
pfctdayelise
0
300
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
190
Other Decks in Technology
See All in Technology
コンテナセキュリティの最新事情 ~ 2026年版 ~
kyohmizu
5
810
登壇駆動学習のすすめ — CfPのネタの見つけ方と書くときに意識していること
bicstone
3
130
仕様書駆動AI開発の実践: Issue→Skill→PRテンプレで 再現性を作る
knishioka
2
680
[CV勉強会@関東 World Model 読み会] Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (Mousakhan+, NeurIPS 2025)
abemii
0
150
ブロックテーマでサイトをリニューアルした話 / 2026-01-31 Kansai WordPress Meetup
torounit
0
480
ClickHouseはどのように大規模データを活用したAIエージェントを全社展開しているのか
mikimatsumoto
0
270
顧客の言葉を、そのまま信じない勇気
yamatai1212
1
360
Oracle AI Database移行・アップグレード勉強会 - RAT活用編
oracle4engineer
PRO
0
110
プロポーザルに込める段取り八分
shoheimitani
1
610
モダンUIでフルサーバーレスなAIエージェントをAmplifyとCDKでサクッとデプロイしよう
minorun365
4
220
CDK対応したAWS DevOps Agentを試そう_20260201
masakiokuda
1
380
インフラエンジニア必見!Kubernetesを用いたクラウドネイティブ設計ポイント大全
daitak
1
380
Featured
See All Featured
Why Your Marketing Sucks and What You Can Do About It - Sophie Logan
marketingsoph
0
76
What the history of the web can teach us about the future of AI
inesmontani
PRO
1
430
A Modern Web Designer's Workflow
chriscoyier
698
190k
How to Think Like a Performance Engineer
csswizardry
28
2.5k
The B2B funnel & how to create a winning content strategy
katarinadahlin
PRO
1
280
How Fast Is Fast Enough? [PerfNow 2025]
tammyeverts
3
460
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.7k
SEOcharity - Dark patterns in SEO and UX: How to avoid them and build a more ethical web
sarafernandez
0
120
End of SEO as We Know It (SMX Advanced Version)
ipullrank
3
3.9k
Stewardship and Sustainability of Urban and Community Forests
pwiseman
0
110
Balancing Empowerment & Direction
lara
5
890
Organizational Design Perspectives: An Ontology of Organizational Design Elements
kimpetersen
PRO
1
460
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher