Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
170
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
190
Crowd funded free software
pfctdayelise
0
150
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
210
Funcargs and other fun with pytest
pfctdayelise
0
230
Zookeepr: home grown conference management software
pfctdayelise
0
140
Why "gender" should be a text field
pfctdayelise
0
190
Distributed wikis
pfctdayelise
0
150
Neurosexism
pfctdayelise
0
250
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
140
Other Decks in Technology
See All in Technology
RubyKaigi 2025でプロポーザルが初めて採択されるまでにやったこと
yuuu
1
130
一人QA時代が終わり、 QAチームが立ち上がった話
ma_cho29
0
150
LangGraphを使ったAIエージェント実装
iwakiyusaku
1
180
TechBullエンジニアコミュニティの取り組みについて
rvirus0817
0
580
【Oracle Cloud ウェビナー】VMware環境を短期間でクラウド化!ベネッセ様事例に学ぶ仮想環境クラウド移行のリアル
oracle4engineer
PRO
2
110
もうVPNは古い? VPNを使わずに オンプレサーバーを 管理する手法あれこれ
ebibibi
0
170
Kubernetesを手元で学ぼう! 初心者向けローカル環境のススメ
nayaaaa
PRO
2
820
パスキー導入の課題と ベストプラクティス、今後の展望
ritou
6
540
Engineering Managementのグローバルトレンド #emoasis / Engineering Management Global Trend
kyonmm
PRO
4
810
バックエンドエンジニアによるフロントエンドテスト拡充の具体的手法
kinosuke01
1
390
MLflowの現在と未来 / MLflow Present and Future
databricksjapan
1
230
LINE API Deep Dive Q1 2025: Unlocking New Possibilities
linedevth
1
130
Featured
See All Featured
Designing for Performance
lara
605
69k
How STYLIGHT went responsive
nonsquared
99
5.4k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
2.9k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
233
17k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
8
690
YesSQL, Process and Tooling at Scale
rocio
172
14k
Documentation Writing (for coders)
carmenintech
69
4.7k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
176
52k
4 Signs Your Business is Dying
shpigford
183
22k
Rebuilding a faster, lazier Slack
samanthasiow
80
8.9k
It's Worth the Effort
3n
184
28k
Code Review Best Practice
trishagee
67
18k
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher