Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher
November 09, 2009
Presented at Melbourne Hack Weekend, 2009.
Transcript
Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher for #melhack, November 2009
...
Wikipedia
13M articles total
3M+ articles in English
240+ languages
Simple English!
{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}
stable.toolserver.org/geohack/
wiki.toolserver.org/view/GeoHack
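The coord template above stores latitude and longitude as degrees, minutes, seconds and a hemisphere letter. A minimal sketch of turning that form into decimal degrees follows; the function name `coord_to_decimal` is made up for illustration, and it assumes only the eight-field degrees/minutes/seconds form shown on the slide (the template also accepts other forms, which this does not handle).

```python
def coord_to_decimal(template):
    """Convert {{coord|d|m|s|N/S|d|m|s|E/W|...}} to (lat, lon) in decimal degrees.

    Assumption: the template uses the degrees/minutes/seconds form with
    hemisphere letters, as in the slide. Other coord forms are not handled.
    """
    # Drop the surrounding braces, split on pipes, tolerate stray whitespace.
    fields = [f.strip() for f in template.strip().strip("{}").split("|")]
    if fields[0].lower() != "coord":
        raise ValueError("not a coord template")

    d1, m1, s1, ns, d2, m2, s2, ew = fields[1:9]
    lat = int(d1) + int(m1) / 60 + float(s1) / 3600
    lon = int(d2) + int(m2) / 60 + float(s2) / 3600

    # Southern and western hemispheres are negative in decimal notation.
    if ns.upper() == "S":
        lat = -lat
    if ew.upper() == "W":
        lon = -lon
    return lat, lon


lat, lon = coord_to_decimal(
    "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
)
print(round(lat, 4), round(lon, 4))
```

Trailing fields such as `type:` and `display=` are ignored here; a fuller tool would keep them.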
{{Infobox Company
|name = Lonely Planet
|logo =
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|genre = [[Guide book|Travel guides]]
|foundation = 1972
|founder = Tony Wheeler<br /> Maureen Wheeler
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
|location =
|origins =
|key_people = Matt Goldberg <small>(Global [[CEO]])</small>
|area_served = Worldwide
|industry = [[Multi media]]
|products = Travel [[guidebook, digital applications, online travel community]]
|services =
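Infoboxes like the one above are where much of Wikipedia's structured data lives, as `|key = value` pairs. The following is a naive sketch of pulling those pairs out of a single template. The function name `parse_template` and the shortened sample are illustrative, and it is in no way a full wikitext parser: it only tracks `[[...]]` and `{{...}}` nesting so that pipes inside links are not treated as separators.

```python
def parse_template(text):
    """Return (template_name, params) for one template call.

    Naive sketch: splits on top-level pipes only, and tolerates input that
    is cut off before the closing braces (as on the slide).
    """
    body = text.strip()
    if body.startswith("{{"):
        body = body[2:]
    if body.endswith("}}"):
        body = body[:-2]

    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):        # entering a link or nested template
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}"):      # leaving a link or nested template
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))

    params = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:                          # skip unnamed (positional) parameters
            params[key.strip()] = value.strip()
    return parts[0].strip(), params


# Shortened, truncated sample based on the slide's infobox.
SAMPLE = ("{{Infobox Company |name = Lonely Planet |logo = "
          "|genre = [[Guide book|Travel guides]] |foundation = 1972 "
          "|location_city = [[Footscray, Victoria]] |services =")

name, params = parse_template(SAMPLE)
print(name, params["foundation"], params["genre"])
```

Anything beyond simple cases (comments, `<nowiki>`, parser functions) needs a real parser or MediaWiki itself.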
Wikimedia Commons
commons.wikimedia.org
Multilingual
5M+ files
“Self-created”, PD, Flickr
Predominantly photographs, but also diagrams, maps, flags
Wiktionary
5M+ entries
170+ languages
13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties:
http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html
http://en.wiktionary.org/wiki/Wiktionary:Parsing
MediaWiki structure
Users
Logs
Pages, subpages, talk pages
Links, backlinks
Templates
Categories
MediaWiki markup
The only thing that completely understands it is MediaWiki :(
Database dumps
XML
download.wikimedia.org OR Amazon Public Data Sets
meta.wikimedia.org/wiki/Data_dumps
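The XML dumps are large, so they are usually read as a stream rather than loaded whole. Below is a minimal sketch using the standard library's `iterparse`. The helper names `local` and `iter_pages` are made up for illustration, and the tiny inline sample (including its namespace URI, which varies by export version) only stands in for a real dump file.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export.

    `source` is a filename or a binary file object. Pages are processed
    one at a time and cleared afterwards to limit memory use.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # drop the page's children once we are done with it


# Stand-in for a real dump; the namespace version here is an assumption.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Lonely Planet</title>
    <revision>
      <text>{{Infobox Company |name = Lonely Planet}}</text>
    </revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
print(pages)
```

For a compressed dump, a file object from `bz2.open(path, "rb")` can be passed in place of the `BytesIO` object, where `path` is wherever the dump was saved.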
DBpedia
Community project extracting structured data from Wikipedia and making it available
Can download data sets or query them online
Ontology++
e.g. dbpedia.org/page/Lonely_Planet
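Querying DBpedia online is done with SPARQL. The sketch below only builds a request URL and does not send it. The endpoint address, the `format` parameter, and the function names are assumptions not taken from the slides, so check them against DBpedia's current documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumption: DBpedia's public SPARQL endpoint; not named on the slide.
ENDPOINT = "https://dbpedia.org/sparql"


def describe_query(resource_name, limit=20):
    """SPARQL that lists property/value pairs for one DBpedia resource."""
    return (
        "SELECT ?property ?value WHERE { "
        f"<http://dbpedia.org/resource/{resource_name}> ?property ?value "
        f"}} LIMIT {int(limit)}"
    )


def build_url(resource_name, limit=20):
    """Return a GET URL asking the endpoint for JSON results."""
    params = {
        "query": describe_query(resource_name, limit),
        "format": "application/sparql-results+json",
    }
    return f"{ENDPOINT}?{urlencode(params)}"


print(describe_query("Lonely_Planet"))
print(build_url("Lonely_Planet"))
```

The resource name matches the last part of the dbpedia.org/page/ address, here `Lonely_Planet`.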
MediaWiki API
mediawiki.org/wiki/API
en.wikipedia.org/w/api.php
Client libraries!
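API requests are plain HTTP calls to `api.php` with query parameters. Here is a minimal sketch that builds two such URLs against the endpoint on the slide: one for a page's wikitext and one for its backlinks. The helper names are made up for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # endpoint from the slide


def api_url(**params):
    """Build an API URL; parameters are sorted so the output is stable."""
    params.setdefault("format", "json")
    return f"{API}?{urlencode(sorted(params.items()))}"


def wikitext_url(title):
    """URL for the current wikitext of a page (action=query, prop=revisions)."""
    return api_url(action="query", prop="revisions", rvprop="content",
                   titles=title)


def backlinks_url(title, limit=10):
    """URL for pages that link to `title` (action=query, list=backlinks)."""
    return api_url(action="query", list="backlinks", bltitle=title,
                   bllimit=limit)


print(wikitext_url("Lonely Planet"))
print(backlinks_url("Lonely Planet"))
```

To fetch for real, the URL could be opened with `urllib.request.urlopen` and decoded with `json.load`, ideally with a descriptive User-Agent header. A client library handles login, continuation and throttling, which this sketch does not.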
mwclient
Python library for accessing MediaWiki APIs
Toolserver
toolserver.org
Server for community-developed plugins, addons, extensions, stats and hacks (tools)
Tools often explicitly implement implicit editing community standards (“community API”)
TemplateTiger
toolserver.org/~kolossos/templatetiger/
For a few dozen Wikipedia languages, & Wikimedia Commons
Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks!
Logos and screenshots may be copyright their respective owners
Slides are otherwise © Brianna Laugher