Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
210
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
220
Crowd funded free software
pfctdayelise
0
180
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
250
Funcargs and other fun with pytest
pfctdayelise
0
260
Zookeepr: home grown conference management software
pfctdayelise
0
170
Why "gender" should be a text field
pfctdayelise
0
240
Distributed wikis
pfctdayelise
0
180
Neurosexism
pfctdayelise
0
300
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
180
Other Decks in Technology
See All in Technology
Eight Engineering Unit 紹介資料
sansan33
PRO
0
5.7k
Databricksによるエージェント構築
taka_aki
1
110
Dify on AWS の選択肢
ysekiy
0
130
Symfony AI in Action
el_stoffel
2
350
ページの可視領域を算出する方法について整理する
yamatai1212
0
160
Multimodal AI Driving Solutions to Societal Challenges
keio_smilab
PRO
1
110
「え?!それ今ではHTMLだけでできるの!?」驚きの進化を遂げたモダンHTML
riyaamemiya
9
4.3k
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
5
47k
知っていると得する!Movable Type 9 の新機能を徹底解説
masakah
0
180
私も懇親会は苦手でした ~苦手だからこそ懇親会を楽しむ方法~ / 20251127 Masaki Okuda
shift_evolve
PRO
4
540
Introduction to Bill One Development Engineer
sansan33
PRO
0
320
Ryzen NPUにおけるAI Engineプログラミング
anjn
0
150
Featured
See All Featured
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
359
30k
Mobile First: as difficult as doing things right
swwweet
225
10k
Navigating Team Friction
lara
191
16k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.2k
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
RailsConf 2023
tenderlove
30
1.3k
4 Signs Your Business is Dying
shpigford
186
22k
GraphQLの誤解/rethinking-graphql
sonatard
73
11k
Raft: Consensus for Rubyists
vanstee
140
7.2k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
9
1k
How Fast Is Fast Enough? [PerfNow 2025]
tammyeverts
3
370
Learning to Love Humans: Emotional Interface Design
aarron
274
41k
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher