Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher
November 09, 2009
Presented at Melbourne Hack Weekend, 2009.
Transcript
Wiki[mp]edia data sources & the MediaWiki API. Brianna Laugher for #melhack, November 2009
...
Wikipedia: 13M articles total; 3M+ articles in English; 240+ languages; Simple English!
{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}
GeoHack: stable.toolserver.org/geohack/, wiki.toolserver.org/view/GeoHack
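As a rough illustration of how the coordinates in a template like this could be pulled out, here is a minimal Python sketch. It assumes the degrees/minutes/seconds form shown on the slide; the real {{coord}} template accepts other forms (decimal degrees, degrees and minutes only) that this does not handle. The function name `parse_coord` is hypothetical.

```python
def parse_coord(template):
    """Convert {{coord|d|m|s|N/S|d|m|s|E/W|...}} to (lat, lon) in decimal degrees.

    Assumption: only the d/m/s/hemisphere form shown on the slide is supported.
    """
    text = template.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template: %r" % template)
    # Strip whitespace around each field so stray spaces do not break parsing.
    parts = [p.strip() for p in text[2:-2].split("|")]
    if parts[0].lower() != "coord" or len(parts) < 9:
        raise ValueError("not a d/m/s coord template: %r" % template)
    fields = parts[1:9]
    if fields[3] not in ("N", "S") or fields[7] not in ("E", "W"):
        raise ValueError("unsupported coord form: %r" % template)

    def to_decimal(deg, minutes, seconds, hemisphere):
        value = float(deg) + float(minutes) / 60 + float(seconds) / 3600
        # South and west are negative by convention.
        return -value if hemisphere in ("S", "W") else value

    return to_decimal(*fields[0:4]), to_decimal(*fields[4:8])
```

Trailing parameters such as `type:` and `display=` are simply ignored here.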
{{Infobox Company
|name = Lonely Planet
|logo =
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|genre = [[Guide book|Travel guides]]
|foundation = 1972
|founder = Tony Wheeler<br /> Maureen Wheeler
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
|location =
|origins =
|key_people = Matt Goldberg <small>(Global [[CEO]])</small>
|area_served = Worldwide
|industry = [[Multi media]]
|products = Travel [[guidebook, digital applications, online travel community]]
|services =
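Infobox parameters look easy to split on "|", but links such as [[United Kingdom|British]] contain pipes of their own. The sketch below is one simple way to cope: it splits only on pipes that sit outside [[...]] and {{...}}. It is not a full wikitext parser, and the function names are hypothetical.

```python
def split_top_level(text, sep="|"):
    """Split on `sep`, ignoring separators nested inside [[...]] or {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}"):
            depth = max(depth - 1, 0)
            current.append(pair)
            i += 2
        elif text[i] == sep and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template_params(body):
    """Parse the text between '{{' and '}}' into (template name, named params).

    Assumption: positional (unnamed) parameters are skipped for simplicity.
    """
    name, *raw_params = split_top_level(body)
    params = {}
    for raw in raw_params:
        key, eq, value = raw.partition("=")
        if eq:
            params[key.strip()] = value.strip()
    return name.strip(), params
```

For example, `parse_template_params("Infobox Company |name = Lonely Planet |foundation = 1972")` returns the template name and a dict of its named parameters.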
Wikimedia Commons (commons.wikimedia.org): multilingual; 5M+ files; “self-created”, PD, Flickr; predominantly photographs, but also diagrams, maps, flags
Wiktionary: 5M+ entries; 170+ languages; 13 languages with more than 100K entries; French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html
http://en.wiktionary.org/wiki/Wiktionary:Parsing
MediaWiki structure: users; logs; pages, subpages, talk pages; links, backlinks; templates; categories
MediaWiki markup: the only thing that completely understands it is MediaWiki :(
Database dumps: XML, from download.wikimedia.org or Amazon Public Data Sets; see meta.wikimedia.org/wiki/Data_dumps
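The XML dumps are large, so they are usually read as a stream rather than loaded whole. A minimal sketch using Python's standard library, assuming the usual export layout of <page> elements containing <title> and <revision>/<text>. The function names are hypothetical, and namespace prefixes are stripped so the schema version does not matter.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export.

    `source` is a filename or a binary file object. If a page carries several
    revisions, the text of the last one in document order is returned.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        # Free the memory held by this page before moving on.
        elem.clear()
```

Compressed dumps can be streamed the same way by passing an open `bz2.open(path, "rb")` file object instead of a filename.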
DBpedia: community project extracting structured data from Wikipedia and making it available. Can download data sets or query them online. Ontology++. E.g. dbpedia.org/page/Lonely_Planet
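Querying DBpedia online is typically done with SPARQL over HTTP. The sketch below builds a query URL and flattens the standard SPARQL JSON result format. The endpoint `https://dbpedia.org/sparql` and the `format` parameter are assumptions not stated on the slide, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: DBpedia's public SPARQL endpoint (not given on the slide).
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"

# Everything DBpedia states about the Lonely Planet resource, capped at 10 rows.
EXAMPLE_QUERY = """
SELECT ?property ?value WHERE {
  <http://dbpedia.org/resource/Lonely_Planet> ?property ?value
} LIMIT 10
"""


def build_sparql_url(query, endpoint=SPARQL_ENDPOINT):
    """Return a GET URL asking the endpoint for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


def rows_from_results(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(query):
    """Send the query over the network and return flattened rows."""
    with urlopen(build_sparql_url(query)) as response:
        return rows_from_results(json.load(response))
```

Calling `run_query(EXAMPLE_QUERY)` needs network access; the other two functions are pure and can be tried offline.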
MediaWiki API: mediawiki.org/wiki/API; en.wikipedia.org/w/api.php. Client libraries!
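To give a feel for the API without a client library, here is a minimal sketch that builds an `action=query` request for a page's current wikitext. The `https` scheme, the choice of parameters, and the User-Agent string are illustrative assumptions; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# The slide gives en.wikipedia.org/w/api.php; the https scheme is assumed.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, api_url=API_URL):
    """Return a URL requesting the latest revision content of `title` as JSON."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return api_url + "?" + urlencode(params)


def fetch_json(url, user_agent="melhack-example/0.1 (hypothetical contact)"):
    """Fetch a URL and decode the JSON body. Requires network access."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        return json.load(response)
```

`fetch_json(build_query_url("Lonely Planet"))` returns the decoded response; a descriptive User-Agent is good practice when calling Wikimedia sites.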
mwclient: Python library for accessing MediaWiki APIs
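A small sketch of what using mwclient can look like, assuming a reasonably recent release in which `Site.pages[...]`, `Page.text()` and `Page.categories()` are available. The helper names `connect` and `summarize_page` and the User-Agent string are hypothetical.

```python
def connect(host="en.wikipedia.org"):
    """Open a connection to a wiki. Requires the third-party mwclient package."""
    import mwclient  # imported lazily so the rest of the module works without it

    return mwclient.Site(host, clients_useragent="melhack-example/0.1 (hypothetical)")


def summarize_page(site, title):
    """Return the wikitext length and category names for one page.

    `site` is expected to behave like an mwclient Site object.
    """
    page = site.pages[title]
    return {
        "title": title,
        "length": len(page.text()),
        "categories": [category.name for category in page.categories()],
    }
```

With mwclient installed and network access, `summarize_page(connect(), "Lonely Planet")` would return a small dict describing that article.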
Toolserver (toolserver.org): server for community-developed plugins, addons, extensions, stats and hacks (“tools”). Tools often explicitly implement implicit editing-community standards (a “community API”).
TemplateTiger (toolserver.org/~kolossos/templatetiger/): for a few dozen Wikipedia languages and Wikimedia Commons. Lets you query templates much like SQL.
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their respective owners. Slides are otherwise © Brianna Laugher.