Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
200
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
210
Crowd funded free software
pfctdayelise
0
170
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
230
Funcargs and other fun with pytest
pfctdayelise
0
250
Zookeepr: home grown conference management software
pfctdayelise
0
160
Why "gender" should be a text field
pfctdayelise
0
220
Distributed wikis
pfctdayelise
0
170
Neurosexism
pfctdayelise
0
280
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
160
Other Decks in Technology
See All in Technology
オブザーバビリティが広げる AIOps の世界 / The World of AIOps Expanded by Observability
aoto
PRO
0
340
BPaaSにおける人と協働する前提のAIエージェント-AWS登壇資料
kentarofujii
0
130
エラーとアクセシビリティ
schktjm
1
1.2k
AIのグローバルトレンド2025 #scrummikawa / global ai trend
kyonmm
PRO
1
260
大「個人開発サービス」時代に僕たちはどう生きるか
sotarok
20
9.7k
未経験者・初心者に贈る!40分でわかるAndroidアプリ開発の今と大事なポイント
operando
5
350
バッチ処理で悩むバックエンドエンジニアに捧げるAWS Glue入門
diggymo
3
190
ChatGPTとPlantUML/Mermaidによるソフトウェア設計
gowhich501
1
130
なぜテストマネージャの視点が 必要なのか? 〜 一歩先へ進むために 〜
moritamasami
0
210
AWSで始める実践Dagster入門
kitagawaz
1
580
なぜSaaSがMCPサーバーをサービス提供するのか?
sansantech
PRO
8
2.7k
AI開発ツールCreateがAnythingになったよ
tendasato
0
120
Featured
See All Featured
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
16k
Why You Should Never Use an ORM
jnunemaker
PRO
59
9.5k
Why Our Code Smells
bkeepers
PRO
339
57k
Facilitating Awesome Meetings
lara
55
6.5k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
126
53k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.9k
Art, The Web, and Tiny UX
lynnandtonic
302
21k
Building Better People: How to give real-time feedback that sticks.
wjessup
368
19k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.5k
Bash Introduction
62gerente
615
210k
Typedesign – Prime Four
hannesfritz
42
2.8k
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher