Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher
November 09, 2009
Presented at Melbourne Hack Weekend, 2009.
Transcript
Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher for #melhack, November 2009
...
Wikipedia: 13M articles total; 3M+ articles in English; 240+ languages. Simple English!
{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}
GeoHack: stable.toolserver.org/geohack/ and wiki.toolserver.org/view/GeoHack
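The coord template above carries latitude and longitude as degrees, minutes, seconds and a hemisphere letter. A minimal sketch of turning it into decimal degrees follows; the function name `parse_coord` is hypothetical, and the sketch assumes the degrees/minutes/seconds form shown on the slide (the template also accepts other forms, which this does not handle).

```python
def parse_coord(template: str) -> tuple[float, float]:
    """Convert a {{coord|d|m|s|N/S|d|m|s|E/W|...}} template to decimal degrees."""
    # Drop the surrounding braces and split on the pipe separators.
    parts = [p.strip() for p in template.strip().strip("{}").split("|")]
    if parts[0].lower() != "coord" or len(parts) < 9:
        raise ValueError("expected a degrees/minutes/seconds coord template")

    def to_decimal(deg: str, minutes: str, sec: str, hemi: str) -> float:
        value = float(deg) + float(minutes) / 60 + float(sec) / 3600
        # South and west are negative by convention.
        return -value if hemi.upper() in ("S", "W") else value

    lat = to_decimal(*parts[1:5])
    lon = to_decimal(*parts[5:9])
    return lat, lon


lat, lon = parse_coord(
    "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
)
print(round(lat, 4), round(lon, 4))  # -37.8136 144.9631
```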
{{Infobox Company
|name = Lonely Planet
|logo =
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|genre = [[Guide book|Travel guides]]
|foundation = 1972
|founder = Tony Wheeler<br /> Maureen Wheeler
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
|location =
|origins =
|key_people = Matt Goldberg <small>(Global [[CEO]])</small>
|area_served = Worldwide
|industry = [[Multi media]]
|products = Travel [[guidebook, digital applications, online travel community]]
|services =
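Infobox parameters like the ones above can be pulled out with a small splitter, as long as pipes inside `[[...]]` links are not mistaken for parameter separators. The sketch below is a naive illustration only: `split_top_level` and `parse_template_params` are hypothetical helpers, and real wikitext has many cases (nested templates with their own parameters, comments, `<nowiki>`) that it does not attempt.

```python
def split_top_level(text: str, sep: str = "|") -> list[str]:
    """Split on sep, ignoring separators nested inside [[...]] or {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}"):
            depth -= 1
            current.append(pair)
            i += 2
        elif text[i] == sep and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template_params(body: str) -> dict[str, str]:
    """Parse 'Name |key = value |...' (a template body without outer braces)."""
    fields = split_top_level(body)
    params = {}
    for field in fields[1:]:  # fields[0] is the template name
        key, _, value = field.partition("=")
        params[key.strip()] = value.strip()
    return params


sample = (
    "Infobox Company |name = Lonely Planet "
    "|type = [[United Kingdom|British]] "
    "[[Government-owned company|government-owned]] "
    "(subsidiary of [[BBC Worldwide]]) "
    "|foundation = 1972 |location_city = [[Footscray, Victoria]]"
)
info = parse_template_params(sample)
print(info["name"], "/", info["foundation"], "/", info["location_city"])
```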
Wikimedia Commons (commons.wikimedia.org): multilingual; 5M+ files; “self-created”, PD, Flickr. Predominantly photographs, but also diagrams, maps, flags.
Wiktionary: 5M+ entries; 170+ languages; 13 languages with more than 100K entries. French is the biggest at 1.5M (English second at 1.4M).
JavaScript Wiktionary lookup plugin for third parties:
http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html
http://en.wiktionary.org/wiki/Wiktionary:Parsing
MediaWiki structure: users; logs; pages, subpages, talk pages; links, backlinks; templates; categories.
MediaWiki markup: the only thing that completely understands it is MediaWiki :(
Database dumps: XML, from download.wikimedia.org or Amazon Public Data Sets. See meta.wikimedia.org/wiki/Data_dumps.
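Because the dumps are large XML files, they are usually read as a stream rather than loaded whole. The following is a minimal sketch using the standard library's `xml.etree.ElementTree.iterparse`; `iter_pages` is a hypothetical helper, and the tiny `SAMPLE` document (including its namespace version) is an assumed stand-in for a real export file.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                # If a page holds several revisions, the last one wins.
                text = child.text or ""
        yield title, text
        elem.clear()  # release the finished page's children to limit memory use


# Assumed sample shaped like an export file; real dumps are far larger.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Lonely Planet</title>
    <revision><text>{{Infobox Company |name = Lonely Planet}}</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
print(pages)
```

For a compressed dump, the same function should work on a file object opened with `bz2.open(path, "rb")`.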
DBpedia: community project extracting structured data from Wikipedia and making it available. Can download data sets or query them online. Ontology++. E.g. dbpedia.org/page/Lonely_Planet
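Querying DBpedia online is done with SPARQL. The sketch below only builds a query URL for one resource and sends nothing over the network. The endpoint address `https://dbpedia.org/sparql` and the `format` parameter are assumptions about the public service, not something stated on the slides, and `build_sparql_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: the public DBpedia SPARQL endpoint lives at this address.
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def build_sparql_url(resource: str, limit: int = 20) -> str:
    """Build a URL asking for property/value pairs of one DBpedia resource."""
    query = (
        "SELECT ?property ?value WHERE { "
        f"<http://dbpedia.org/resource/{resource}> ?property ?value "
        f"}} LIMIT {limit}"
    )
    params = {
        "query": query,
        # Assumption: the endpoint accepts this parameter to request JSON.
        "format": "application/sparql-results+json",
    }
    return SPARQL_ENDPOINT + "?" + urlencode(params)


url = build_sparql_url("Lonely_Planet", limit=5)
print(url)
```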
MediaWiki API: documentation at mediawiki.org/wiki/API; endpoint at en.wikipedia.org/w/api.php. Client libraries!
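Without a client library, the API can be called with plain HTTP requests against `api.php`. This is a minimal sketch using only the standard library: `build_api_url` and `api_get` are hypothetical helpers, the User-Agent string is a placeholder to replace with your own, and the backlinks query is just one illustrative use of `action=query`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_api_url(**params: str) -> str:
    """Build an api.php URL, always asking for JSON output."""
    merged = {"format": "json", **params}
    return API_URL + "?" + urlencode(merged)


def api_get(**params: str) -> dict:
    """Send a GET request to the API and decode the JSON response."""
    request = Request(
        build_api_url(**params),
        # Placeholder User-Agent; use one that identifies your own tool.
        headers={"User-Agent": "melhack-example/0.1 (you@example.org)"},
    )
    with urlopen(request) as response:
        return json.load(response)


# Pages that link to "Lonely Planet" (URL only; no request is sent here).
backlinks_url = build_api_url(
    action="query", list="backlinks", bltitle="Lonely Planet", bllimit="10"
)
print(backlinks_url)
```

Calling `api_get(action="query", list="backlinks", bltitle="Lonely Planet", bllimit="10")` performs the request; the linking pages should then appear under `["query"]["backlinks"]` in the result.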
mwclient: Python library for accessing MediaWiki APIs
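A short sketch of what fetching a page's wikitext with mwclient can look like. It assumes a reasonably recent mwclient release installed separately (`pip install mwclient`); the library's interface has changed since 2009, so the calls below reflect current usage rather than what the talk demonstrated. `fetch_wikitext` and the User-Agent string are hypothetical.

```python
def fetch_wikitext(title: str, host: str = "en.wikipedia.org") -> str:
    """Return the current wikitext of a page via mwclient (needs network)."""
    # Imported lazily so this file still loads when mwclient is not installed.
    import mwclient  # third-party package, assumed installed

    site = mwclient.Site(
        host,
        # Placeholder User-Agent; use one that identifies your own tool.
        clients_useragent="melhack-example/0.1 (you@example.org)",
    )
    page = site.pages[title]
    return page.text()


if __name__ == "__main__":
    # Prints the start of the article's wikitext when run as a script.
    print(fetch_wikitext("Lonely Planet")[:200])
```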
Toolserver (toolserver.org): server for community-developed plugins, add-ons, extensions, stats and hacks, collectively "tools". Tools often explicitly implement the editing community's implicit standards (a “community API”).
TemplateTiger (toolserver.org/~kolossos/templatetiger/): available for a few dozen Wikipedia languages and Wikimedia Commons. Lets you query templates very much like SQL.
identi.ca/pfctdayelise
Thanks! Logos and screenshots may be copyright their respective owners. Slides are otherwise © Brianna Laugher.