Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher
November 09, 2009
Presented at Melbourne Hack Weekend, 2009.
Transcript
Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher for #melhack, November 2009
...
Wikipedia
13M articles total
3M+ articles in English
240+ languages
Simple English!
{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}
stable.toolserver.org/geohack/
wiki.toolserver.org/view/GeoHack
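The coord template on this slide packs degrees, minutes, seconds and a hemisphere letter for latitude and then longitude. A minimal sketch of turning that form into decimal degrees follows. It assumes the eight-field degrees/minutes/seconds layout shown here; the template accepts other layouts (decimal degrees, for example) that this does not handle, and `parse_coord` is a name made up for illustration.

```python
def parse_coord(template: str) -> tuple[float, float]:
    """Convert a {{coord|d|m|s|N/S|d|m|s|E/W|...}} template to decimal degrees."""
    # Strip the surrounding braces and split on the pipe separators.
    parts = [p.strip() for p in template.strip().strip("{}").split("|")]
    if parts[0].lower() != "coord" or len(parts) < 9:
        raise ValueError("expected a degrees/minutes/seconds coord template")

    def to_decimal(deg: str, minutes: str, sec: str, hemi: str) -> float:
        value = float(deg) + float(minutes) / 60 + float(sec) / 3600
        # South and west are negative by convention.
        return -value if hemi.upper() in ("S", "W") else value

    lat = to_decimal(*parts[1:5])
    lon = to_decimal(*parts[5:9])
    return lat, lon


lat, lon = parse_coord(
    "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
)
print(round(lat, 4), round(lon, 4))  # -37.8136 144.9631
```

The trailing fields (`type:...`, `display=...`) are ignored here; they describe how the coordinate is classified and displayed rather than where it is.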
{{Infobox Company
|name = Lonely Planet
|logo =
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|genre = [[Guide book|Travel guides]]
|foundation = 1972
|founder = Tony Wheeler<br /> Maureen Wheeler
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
|location =
|origins =
|key_people = Matt Goldberg <small>(Global [[CEO]])</small>
|area_served = Worldwide
|industry = [[Multi media]]
|products = Travel [[guidebook, digital applications, online travel community]]
|services =
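Infobox parameters like these are the closest thing to structured data in an article, but pulling them out means splitting on `|` without breaking piped links such as `[[Guide book|Travel guides]]`. The following is a naive sketch that tracks bracket depth to do that. The function names are hypothetical, the sample is a shortened copy of the slide's infobox, and it ignores comments, `<nowiki>` and other markup that a real parser would need to handle.

```python
def split_top_level(body: str) -> list[str]:
    """Split template text on '|' characters that are not inside [[...]] or {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_infobox(body: str) -> dict[str, str]:
    """Return {parameter: value} for the named parameters of one template body."""
    fields = {}
    for part in split_top_level(body)[1:]:  # the first part is the template name
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


# Template body without the outer braces, shortened from the slide.
sample = (
    "Infobox Company |name = Lonely Planet "
    "|genre = [[Guide book|Travel guides]] |foundation = 1972 "
    "|location_city = [[Footscray, Victoria]]"
)
info = parse_infobox(sample)
print(info["genre"])  # [[Guide book|Travel guides]]
```

Values come back as raw wikitext, so links and inline HTML still need cleaning before the data is usable.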
Wikimedia Commons
commons.wikimedia.org
Multilingual
5M+ files
“Self-created”, PD, Flickr
Predominantly photographs, but also diagrams, maps, flags
Wiktionary
5M+ entries
170+ languages
13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties:
http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html
http://en.wiktionary.org/wiki/Wiktionary:Parsing
MediaWiki structure
Users
Logs
Pages, subpages, talk pages
Links, backlinks
Templates
Categories
MediaWiki markup
The only thing that completely understands it is MediaWiki :(
Database dumps
XML
download.wikimedia.org OR Amazon Public Data Sets
meta.wikimedia.org/wiki/Data_dumps
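Because the XML dumps are far too large to load whole, they are usually read as a stream. A minimal sketch with the standard library's `iterparse` follows. The sample document is hand-written to stand in for a dump file, its namespace URI is an assumption (the export schema version varies between dumps), and the code drops namespaces so it does not depend on the exact version.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs from a MediaWiki XML export, one page at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                # If a page carries several revisions, the last one seen wins.
                text = child.text or ""
        yield title, text
        elem.clear()  # release the finished page; real dumps are very large


# Tiny hand-written stand-in for a dump file (not real dump content).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Lonely Planet</title>
    <revision><text>{{Infobox Company |name = Lonely Planet}}</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
print(pages[0][0])  # Lonely Planet
```

For a real dump, pass an opened (and, if compressed, decompressing) file object instead of the `BytesIO` wrapper.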
DBpedia
Community project extracting structured data from Wikipedia and making it available
Can download data sets or query them online
Ontology++
e.g. dbpedia.org/page/Lonely_Planet
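Querying DBpedia online is done with SPARQL. The sketch below only builds the request URL for the properties of one resource, so it runs without network access. The endpoint address and the `format` parameter are assumptions about the current public service and are not taken from the slides; `build_sparql_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: location of the public endpoint; check the DBpedia site for the current one.
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def build_sparql_url(resource: str, limit: int = 20) -> str:
    """Build a GET URL asking for property/value pairs of one DBpedia resource."""
    query = (
        "SELECT ?property ?value WHERE { "
        f"<http://dbpedia.org/resource/{resource}> ?property ?value "
        f"}} LIMIT {limit}"
    )
    # Assumption: the endpoint accepts a 'format' parameter to select JSON results.
    params = {"query": query, "format": "application/sparql-results+json"}
    return SPARQL_ENDPOINT + "?" + urlencode(params)


url = build_sparql_url("Lonely_Planet")
print(url)
```

Fetching that URL should return the same facts that dbpedia.org/page/Lonely_Planet shows in a browser, as machine-readable results.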
MediaWiki API
mediawiki.org/wiki/API
en.wikipedia.org/w/api.php
Client libraries!
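A request to api.php is an ordinary HTTP GET with query parameters. As a rough sketch using only the standard library, the code below builds a query for the current wikitext of one page. The `https` scheme, the choice of parameters and the User-Agent string are illustrative assumptions; `fetch_json` needs network access and is defined but not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title: str) -> str:
    """Build an api.php URL requesting the current wikitext of one page as JSON."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "revisions",
        "rvprop": "content",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def fetch_json(url: str) -> dict:
    """Fetch the URL and decode the JSON response (requires network access)."""
    # Hypothetical User-Agent string; replace it with one identifying your tool.
    request = Request(url, headers={"User-Agent": "melhack-example/0.1"})
    with urlopen(request) as response:
        return json.load(response)


url = build_query_url("Lonely Planet")
print(url)
```

Opening en.wikipedia.org/w/api.php in a browser with no parameters shows the API's built-in help, which lists the other modules and properties available.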
mwclient
Python library for accessing MediaWiki APIs
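A client library hides the URL building and response decoding. The following sketch shows roughly what reading a page with mwclient looks like. It assumes mwclient is installed and network access is available; the helper name and the User-Agent string are hypothetical, and the library's interface may differ between versions.

```python
def fetch_page_summary(title: str, host: str = "en.wikipedia.org") -> dict:
    """Return the wikitext and category names of one page via mwclient."""
    # Imported here so this file still loads where mwclient is not installed.
    import mwclient

    # Hypothetical User-Agent string; replace it with one identifying your tool.
    site = mwclient.Site(host, clients_useragent="melhack-example/0.1")
    page = site.pages[title]
    return {
        "exists": page.exists,
        "text": page.text(),
        "categories": [category.name for category in page.categories()],
    }
```

Calling `fetch_page_summary("Lonely Planet")` would contact the wiki and return the raw wikitext, which still has to be parsed as markup.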
Toolserver
toolserver.org
Server for community-developed plugins, addons, extensions, stats and hacks, i.e. tools
Tools often explicitly implement implicit editing-community standards (“community API”)
TemplateTiger
toolserver.org/~kolossos/templatetiger/
For a few dozen Wikipedia languages, & Wikimedia Commons
Lets you query templates very much like SQL
identi.ca/pfctdayelise
Thanks!
Logos and screenshots may be copyright their respective owners
Slides are otherwise © Brianna Laugher