Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
200
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
210
Crowd funded free software
pfctdayelise
0
170
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
230
Funcargs and other fun with pytest
pfctdayelise
0
250
Zookeepr: home grown conference management software
pfctdayelise
0
160
Why "gender" should be a text field
pfctdayelise
0
220
Distributed wikis
pfctdayelise
0
170
Neurosexism
pfctdayelise
0
280
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
160
Other Decks in Technology
See All in Technology
AI時代を生き抜くエンジニアキャリアの築き方 (AI-Native 時代、エンジニアという道は 「最大の挑戦の場」となる) / Building an Engineering Career to Thrive in the Age of AI (In the AI-Native Era, the Path of Engineering Becomes the Ultimate Arena of Challenge)
jeongjaesoon
0
140
Codeful Serverless / 一人運用でもやり抜く力
_kensh
7
430
要件定義・デザインフェーズでもAIを活用して、コミュニケーションの密度を高める
kazukihayase
0
110
2つのフロントエンドと状態管理
mixi_engineers
PRO
3
110
Webブラウザ向け動画配信プレイヤーの 大規模リプレイスから得た知見と学び
yud0uhu
0
230
これでもう迷わない!Jetpack Composeの書き方実践ガイド
zozotech
PRO
0
860
Oracle Base Database Service 技術詳細
oracle4engineer
PRO
9
73k
KotlinConf 2025_イベントレポート
sony
1
140
TS-S205_昨年対比2倍以上の機能追加を実現するデータ基盤プロジェクトでのAI活用について
kaz3284
1
160
250905 大吉祥寺.pm 2025 前夜祭 「プログラミングに出会って20年、『今』が1番楽しい」
msykd
PRO
1
930
5分でカオスエンジニアリングを分かった気になろう
pandayumi
0
240
【実演版】カンファレンス登壇者・スタッフにこそ知ってほしいマイクの使い方 / 大吉祥寺.pm 2025
arthur1
1
850
Featured
See All Featured
The Cult of Friendly URLs
andyhume
79
6.6k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.5k
Building Flexible Design Systems
yeseniaperezcruz
328
39k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.1k
Raft: Consensus for Rubyists
vanstee
140
7.1k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
16k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
229
22k
The Art of Programming - Codeland 2020
erikaheidi
56
13k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
188
55k
KATA
mclloyd
32
14k
The Pragmatic Product Professional
lauravandoore
36
6.9k
Build The Right Thing And Hit Your Dates
maggiecrowley
37
2.9k
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher