Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher
November 09, 2009
Presented at Melbourne Hack Weekend, 2009.
Transcript
Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher, for #melhack, November 2009
...
Wikipedia
13M articles total
3M+ articles in English
240+ languages
Simple English!
{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}
stable.toolserver.org/geohack/
wiki.toolserver.org/view/GeoHack
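The {{coord}} call above is just positional parameters: degrees, minutes, seconds and hemisphere for latitude, then the same for longitude. As a minimal sketch, assuming only this full degrees/minutes/seconds form (the template also accepts decimal and shorter forms, not handled here), the hypothetical helper `parse_coord` turns it into decimal degrees:

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds and a hemisphere letter to decimal degrees."""
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # South latitudes and West longitudes are negative in decimal notation.
    return -value if hemisphere.strip().upper() in ("S", "W") else value


def parse_coord(template):
    """Return (latitude, longitude) from {{coord|d|m|s|N/S|d|m|s|E/W|...}}.

    Hypothetical helper: only the full degrees/minutes/seconds form is handled;
    trailing parameters such as type:... and display=... are ignored.
    """
    body = template.strip()
    if not (body.startswith("{{") and body.endswith("}}")):
        raise ValueError("not a template call")
    parts = [p.strip() for p in body[2:-2].split("|")]
    if parts[0].lower() != "coord" or len(parts) < 9:
        raise ValueError("expected {{coord|d|m|s|NS|d|m|s|EW|...}}")
    latitude = dms_to_decimal(*parts[1:5])
    longitude = dms_to_decimal(*parts[5:9])
    return latitude, longitude


if __name__ == "__main__":
    lat, lon = parse_coord(
        "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
    )
    print(f"{lat:.4f}, {lon:.4f}")  # prints "-37.8136, 144.9631"
```

The result is the kind of decimal pair that GeoHack-style tools link out from.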
{{Infobox Company
|name = Lonely Planet
|logo =
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|genre = [[Guide book|Travel guides]]
|foundation = 1972
|founder = Tony Wheeler<br /> Maureen Wheeler
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
|location =
|origins =
|key_people = Matt Goldberg <small>(Global [[CEO]])</small>
|area_served = Worldwide
|industry = [[Multi media]]
|products = Travel [[guidebook, digital applications, online travel community]]
|services =
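Infobox parameters like these are `|key = value` pairs, but a plain split on `|` breaks on piped links such as `[[United Kingdom|British]]`. A naive sketch, assuming only that pipes nested inside `[[...]]` or `{{...}}` should be ignored (it does not handle `{{{parameters}}}`, parser functions or `<nowiki>`), with hypothetical helper names:

```python
def split_top_level(text, sep="|"):
    """Split text on sep, ignoring separators nested inside [[...]] or {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}"):
            depth = max(depth - 1, 0)
            current.append(pair)
            i += 2
        elif text[i] == sep and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template(wikitext):
    """Return (template_name, {param: value}) for a single template call.

    Positional (unnamed) parameters are skipped; values stay raw wikitext.
    """
    body = wikitext.strip()
    if body.startswith("{{"):
        body = body[2:]
    if body.endswith("}}"):
        body = body[:-2]
    name, *fields = split_top_level(body)
    params = {}
    for field in fields:
        # Only the first "=" separates key from value.
        key, has_equals, value = field.partition("=")
        if has_equals:
            params[key.strip()] = value.strip()
    return name.strip(), params
```

Values come back as raw wikitext (links and `<br />` included), so a second cleanup pass is still needed before using them as data.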
Wikimedia Commons
commons.wikimedia.org
Multilingual
5M+ files
“Self-created”, PD (public domain), Flickr
Predominantly photographs, but also diagrams, maps, flags
Wiktionary
5M+ entries
170+ languages
13 languages with more than 100K entries
French is the biggest, at 1.5M (English second, at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html
http://en.wiktionary.org/wiki/Wiktionary:Parsing
MediaWiki structure
Users
Logs
Pages, subpages, talk pages
Links, backlinks
Templates
Categories
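Links, templates and categories all show up in a page's raw wikitext. A regex-based approximation, assuming plain `[[...]]` and `{{...}}` syntax, can list them; it will miss anything MediaWiki only resolves at render time (for example categories added by a transcluded template), and backlinks need the whole wiki, not one page. The names here are hypothetical:

```python
import re

# [[Target]] or [[Target|label]]: capture the target only.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")
# {{Name|...}} or {{Name}}: capture the template name only.
TEMPLATE_RE = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\||\}\})")


def page_structure(wikitext):
    """Return links, categories and templates visible in raw wikitext.

    File:/Image: and interlanguage links are lumped in with ordinary links.
    A leading colon ([[:Category:X]]) is a link to the category page, not
    categorisation, so it stays in "links".
    """
    links, categories = [], []
    for target in LINK_RE.findall(wikitext):
        target = target.strip()
        if target.lower().startswith("category:"):
            categories.append(target.split(":", 1)[1].strip())
        else:
            links.append(target)
    templates = [name.strip() for name in TEMPLATE_RE.findall(wikitext)]
    return {"links": links, "categories": categories, "templates": templates}
```

For live data on a single page, the MediaWiki API exposes the same relationships without any parsing.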
MediaWiki markup
The only thing that completely understands it is MediaWiki :(
Database dumps
XML: download.wikimedia.org OR Amazon Public Data Sets
meta.wikimedia.org/wiki/Data_dumps
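The XML dumps are far too large to load whole, so a streaming parser is the usual approach. A minimal sketch with the standard library's `iterparse`, assuming a one-revision-per-page dump and stripping the export namespace (its version number varies between dumps); `iter_pages` and the sample are made up for illustration:

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML dump.

    source is a filename or binary file object. Each finished <page> element
    is cleared so memory stays roughly flat on large dumps. With full-history
    dumps only the last <text> of each page is kept.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            elem.clear()
            title, text = None, None


# A tiny made-up sample in the dump's shape (real dumps carry many more fields).
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Lonely Planet</title>
    <revision>
      <text xml:space="preserve">'''Lonely Planet''' is a travel guide publisher.</text>
    </revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, wikitext in iter_pages(io.BytesIO(SAMPLE_DUMP)):
        print(page_title, len(wikitext))
```

A compressed dump can be streamed the same way by passing a `bz2.open(path, "rb")` file object instead of the in-memory sample.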
DBpedia
Community project extracting structured data from Wikipedia and making it available
Can download data sets or query them online
Ontology++
e.g. dbpedia.org/page/Lonely_Planet
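Querying DBpedia online means sending SPARQL to its endpoint. This sketch assumes the public endpoint at `https://dbpedia.org/sparql` and its `format` parameter for JSON output (check both before relying on them); the page `dbpedia.org/page/Lonely_Planet` corresponds to the resource IRI `http://dbpedia.org/resource/Lonely_Planet`. Results are read in the standard SPARQL 1.1 JSON results layout. Function names are hypothetical:

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint (assumed; confirm the current URL on dbpedia.org).
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def dbpedia_query_url(resource_name, limit=50):
    """Build a GET URL asking for (property, value) pairs of one resource.

    resource_name is the part after dbpedia.org/page/, e.g. "Lonely_Planet".
    Names with characters that are illegal in IRIs would need escaping first.
    """
    query = (
        "SELECT ?property ?value WHERE { "
        f"<http://dbpedia.org/resource/{resource_name}> ?property ?value "
        f"}} LIMIT {int(limit)}"
    )
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_SPARQL + "?" + urlencode(params)


def bindings_to_pairs(result):
    """Flatten a SPARQL 1.1 JSON result into (property, value) string pairs."""
    return [
        (row["property"]["value"], row["value"]["value"])
        for row in result["results"]["bindings"]
    ]
```

Fetching the URL (with any HTTP client) and passing the decoded JSON to `bindings_to_pairs` gives a flat list that is easy to filter by property.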
MediaWiki API
mediawiki.org/wiki/API
en.wikipedia.org/w/api.php
Client libraries!
mwclient: Python library for accessing MediaWiki APIs
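Underneath any client library, api.php is queried with plain URL parameters. A minimal standard-library sketch, assuming the default JSON output (formatversion=1), where a revision's wikitext sits under the `"*"` key; newer MediaWiki versions may also expect `rvslots=main`, which moves the text under `slots`. The helper names and User-Agent string are placeholders:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def revision_query_url(title, api_url=API_URL):
    """URL asking action=query for the latest revision's wikitext of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def fetch_json(url):
    """GET a URL and decode JSON. The User-Agent value is a placeholder."""
    request = Request(url, headers={"User-Agent": "melhack-example/0.1 (placeholder contact)"})
    with urlopen(request) as response:
        return json.load(response)


def page_text(response):
    """Extract wikitext from a formatversion=1 reply, or None if the page is missing."""
    # Pages are keyed by page ID; a missing page gets a negative ID and a "missing" flag.
    page = next(iter(response["query"]["pages"].values()))
    if "missing" in page:
        return None
    return page["revisions"][0]["*"]
```

Usage would be `page_text(fetch_json(revision_query_url("Lonely Planet")))`; mwclient wraps this kind of request handling so scripts don't have to build URLs by hand.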
Toolserver
toolserver.org
Server for community-developed plugins, add-ons, extensions, stats and hacks: “tools”
Tools often explicitly implement implicit editing community standards (a “community API”)
TemplateTiger
toolserver.org/~kolossos/templatetiger/
For a few dozen Wikipedia languages, and Wikimedia Commons
Lets you query templates very much like SQL
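To illustrate what querying templates "like SQL" means, here is a sketch using an in-memory SQLite table with one row per (page, template, parameter, value). The schema and helper names are hypothetical and are not TemplateTiger's actual tables or interface:

```python
import sqlite3


def build_template_db(rows):
    """Load (page, template, param, value) tuples into an in-memory table.

    Hypothetical schema for illustration only.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE template_params (page TEXT, template TEXT, param TEXT, value TEXT)"
    )
    db.executemany("INSERT INTO template_params VALUES (?, ?, ?, ?)", rows)
    return db


def pages_where(db, template, param, value):
    """Pages whose given template has param set to exactly value, sorted by title."""
    cursor = db.execute(
        "SELECT page FROM template_params "
        "WHERE template = ? AND param = ? AND value = ? ORDER BY page",
        (template, param, value),
    )
    return [row[0] for row in cursor]
```

Rows like these could be produced by running an infobox parser over every page in a dump; the SQL side then answers questions such as "which companies have `location_country = [[Australia]]`?".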
identi.ca/pfctdayelise
Thanks!
Logos and screenshots may be copyright their respective owners.
Slides are otherwise © Brianna Laugher.