Brianna Laugher
November 09, 2009
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Transcript
Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher for #melhack, November 2009
...
Wikipedia
13M articles total; 3M+ articles in English
240+ languages
Simple English!
{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}
stable.toolserver.org/geohack/
wiki.toolserver.org/view/GeoHack
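
To make the coordinate template above concrete, here is a minimal sketch that converts its degrees/minutes/seconds fields to decimal degrees. The function name `parse_coord` is hypothetical, and the sketch assumes the eight-field form shown on the slide; the real {{coord}} template accepts other forms that this does not handle.

```python
def parse_coord(template):
    """Convert {{coord|d|m|s|N/S|d|m|s|E/W|...}} to (lat, lon) in decimal degrees.

    Assumption: only the eight-field degrees/minutes/seconds form is supported.
    """
    fields = [f.strip() for f in template.strip().strip("{}").split("|")]
    if fields[0].lower() != "coord" or len(fields) < 9:
        raise ValueError("expected the degrees/minutes/seconds form of {{coord}}")

    def to_decimal(degrees, minutes, seconds, hemisphere):
        value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
        # South and West are negative by convention.
        return -value if hemisphere.upper() in ("S", "W") else value

    lat = to_decimal(*fields[1:5])
    lon = to_decimal(*fields[5:9])
    return lat, lon


lat, lon = parse_coord(
    "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
)
print(round(lat, 4), round(lon, 4))
```

The trailing fields (`type:...`, `display=...`) are ignored here; a tool like GeoHack uses them, but they are not needed for the numeric conversion.
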
{{Infobox Company
|name = Lonely Planet
|logo =
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|genre = [[Guide book|Travel guides]]
|foundation = 1972
|founder = Tony Wheeler<br /> Maureen Wheeler
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
|location =
|origins =
|key_people = Matt Goldberg <small>(Global [[CEO]])</small>
|area_served = Worldwide
|industry = [[Multi media]]
|products = Travel [[guidebook, digital applications, online travel community]]
|services =
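
As a rough illustration of pulling fields out of an infobox like the one above, the sketch below splits a template call on top-level pipes, so that pipes inside [[links]] stay intact. The helper names `split_top_level` and `parse_infobox` are hypothetical, and the sample string is a shortened, closed version of the slide's infobox. This is a naive approach that will fail on plenty of real wikitext.

```python
def split_top_level(text, sep="|"):
    """Split on sep, ignoring separators nested inside [[...]] or {{...}}."""
    parts, current, depth = [], [], 0
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}"):
            depth -= 1
            current.append(pair)
            i += 2
        elif text[i] == sep and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_infobox(wikitext):
    """Return (template_name, {parameter: value}) for a single template call."""
    body = wikitext.strip()
    if body.startswith("{{"):
        body = body[2:]
    if body.endswith("}}"):
        body = body[:-2]
    name, *params = split_top_level(body)
    fields = {}
    for param in params:
        key, equals, value = param.partition("=")
        if equals:  # skip positional (unnamed) parameters
            fields[key.strip()] = value.strip()
    return name.strip(), fields


# Shortened sample based on the slide; the closing braces are added here.
sample = (
    "{{Infobox Company |name = Lonely Planet "
    "|type = [[United Kingdom|British]] (subsidiary of [[BBC Worldwide]]) "
    "|foundation = 1972 |location_city = [[Footscray, Victoria]]}}"
)
name, fields = parse_infobox(sample)
print(name, fields["foundation"], fields["location_city"])
```
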
Wikimedia Commons
commons.wikimedia.org
Multilingual
5M+ files
“Self-created”, PD, Flickr
Predominantly photographs, but also diagrams, maps, flags
Wiktionary
5M+ entries
170+ languages
13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties:
http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html
http://en.wiktionary.org/wiki/Wiktionary:Parsing
MediaWiki structure
Users
Logs
Pages, subpages, talk pages
Links, backlinks
Templates
Categories
MediaWiki markup
The only thing that completely understands it is MediaWiki :(
Database dumps
XML
download.wikimedia.org OR Amazon Public Data Sets
meta.wikimedia.org/wiki/Data_dumps
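
Because the XML dumps are large, one common way to read them is to stream them rather than load the whole file. The sketch below does that with the standard library's `iterparse`. The sample document is invented for illustration (including the namespace version, which the code strips and does not depend on), and `iter_pages` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        tag = local_name(elem.tag)
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # free the finished page's children to limit memory use


# Invented two-page sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Lonely Planet</title>
    <revision>
      <text>{{Infobox Company |name = Lonely Planet}}</text>
    </revision>
  </page>
  <page>
    <title>Example page</title>
    <revision>
      <text>Some ''wikitext''.</text>
    </revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page_title, wikitext in pages:
    print(page_title, len(wikitext))
```

For a real, compressed dump, the same function should work on a file object opened with `bz2.open(path, "rb")`, assuming the dump is bzip2-compressed.
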
DBpedia
Community project extracting structured data from Wikipedia and making it available
Can download data sets or query them online
Ontology++
e.g. dbpedia.org/page/Lonely_Planet
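
Querying DBpedia online usually means sending SPARQL to its public endpoint. The sketch below only builds the request URL. The endpoint address, the `format` parameter, and the helper name `dbpedia_query_url` are assumptions not stated in the slides, so check them against the current DBpedia documentation before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the public SPARQL endpoint lives at this address.
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def dbpedia_query_url(resource, limit=10):
    """Build a URL that asks for the first few property/value pairs of a resource."""
    query = (
        "SELECT ?property ?value WHERE { "
        f"<http://dbpedia.org/resource/{resource}> ?property ?value "
        f"}} LIMIT {int(limit)}"
    )
    params = {
        "query": query,
        # Assumption: the endpoint honours this parameter for JSON results.
        "format": "application/sparql-results+json",
    }
    return f"{SPARQL_ENDPOINT}?{urlencode(params)}"


url = dbpedia_query_url("Lonely_Planet")
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) should return the same kind of data shown at dbpedia.org/page/Lonely_Planet, in machine-readable form.
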
MediaWiki API
mediawiki.org/wiki/API
en.wikipedia.org/w/api.php
Client libraries!
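
For a sense of what a raw call to api.php looks like without a client library, here is a minimal sketch that builds a query URL for a page's current wikitext. The parameters used (`action=query`, `prop=revisions`, `rvprop=content`) are standard API parameters; the helper names and the User-Agent string are made up for this example, and `fetch_json` is defined but not called, since it needs network access.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def wikitext_query_url(title):
    """Build an api.php URL requesting the latest revision content of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def fetch_json(url):
    """Fetch a URL and decode the JSON response (requires network access)."""
    # Hypothetical User-Agent; Wikimedia asks clients to identify themselves.
    request = Request(url, headers={"User-Agent": "melhack-example/0.1"})
    with urlopen(request) as response:
        return json.load(response)


url = wikitext_query_url("Lonely Planet")
print(url)
```

Calling `fetch_json(url)` should return a nested dictionary with the page content under `query` and `pages`; the exact shape depends on the API version and parameters.
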
mwclient
Python library for accessing MediaWiki APIs
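
A short sketch of how mwclient might be used to read a page and list its backlinks. It assumes the third-party `mwclient` package is installed and relies on its `Site`, `pages[...]`, `text()` and `backlinks()` interfaces; the wrapper functions here are hypothetical helpers, and details may differ between mwclient versions.

```python
from itertools import islice


def connect(host="en.wikipedia.org"):
    """Open a session with a wiki (needs the third-party mwclient package)."""
    import mwclient  # imported lazily so the helpers below work without it

    return mwclient.Site(host)


def page_wikitext(site, title):
    """Return the current wikitext of a page."""
    return site.pages[title].text()


def backlink_titles(site, title, limit=5):
    """Return the titles of up to `limit` pages that link to `title`."""
    page = site.pages[title]
    return [linking_page.name for linking_page in islice(page.backlinks(), limit)]


# Example use (requires network access and mwclient):
#   site = connect()
#   print(page_wikitext(site, "Lonely Planet")[:200])
#   print(backlink_titles(site, "Lonely Planet"))
```

Compared with building api.php URLs by hand, the client library handles continuation, login and result decoding for you.
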
Toolserver
toolserver.org
Server for community-developed plugins, addons, extensions, stats and hacks: “tools”
Tools often explicitly implement implicit editing community standards (“community API”)
TemplateTiger
toolserver.org/~kolossos/templatetiger/
For a few dozen Wikipedia languages, & Wikimedia Commons
Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks!
Logos and screenshots may be copyright their respective owners
Slides are otherwise © Brianna Laugher