Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
220
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
230
Crowd funded free software
pfctdayelise
0
190
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
260
Funcargs and other fun with pytest
pfctdayelise
0
280
Zookeepr: home grown conference management software
pfctdayelise
0
180
Why "gender" should be a text field
pfctdayelise
0
250
Distributed wikis
pfctdayelise
0
190
Neurosexism
pfctdayelise
0
310
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
190
Other Decks in Technology
See All in Technology
OpenClawで回す組織運営
jacopen
3
590
Claude Codeの進化と各機能の活かし方
oikon48
15
6.7k
わたしがセキュアにAWSを使えるわけないじゃん、ムリムリ!(※ムリじゃなかった!?)
cmusudakeisuke
1
250
EMからVPoEを経てCTOへ:マネジメントキャリアパスにおける葛藤と成長
kakehashi
PRO
7
1k
OCI Security サービス 概要
oracle4engineer
PRO
2
13k
All About Sansan – for New Global Engineers
sansan33
PRO
1
1.4k
kintone開発のプラットフォームエンジニアの紹介
cybozuinsideout
PRO
0
830
自動テストが巻き起こした開発プロセス・チームの変化 / Impact of Automated Testing on Development Cycles and Team Dynamics
codmoninc
2
1.1k
Introduction to Bill One Development Engineer
sansan33
PRO
0
380
Oracle Cloud Infrastructure:2026年2月度サービス・アップデート
oracle4engineer
PRO
0
220
8万デプロイ
iwamot
PRO
2
150
研究開発部メンバーの働き⽅ / Sansan R&D Profile
sansan33
PRO
4
22k
Featured
See All Featured
Producing Creativity
orderedlist
PRO
348
40k
Fashionably flexible responsive web design (full day workshop)
malarkey
408
66k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
249
1.3M
GraphQLの誤解/rethinking-graphql
sonatard
75
11k
AI Search: Implications for SEO and How to Move Forward - #ShenzhenSEOConference
aleyda
1
1.1k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
254
22k
Navigating Algorithm Shifts & AI Overviews - #SMXNext
aleyda
1
1.2k
We Are The Robots
honzajavorek
0
190
YesSQL, Process and Tooling at Scale
rocio
174
15k
How to Grow Your eCommerce with AI & Automation
katarinadahlin
PRO
1
130
Odyssey Design
rkendrick25
PRO
2
540
Git: the NoSQL Database
bkeepers
PRO
432
66k
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher