Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
230
0
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
260
Crowd funded free software
pfctdayelise
0
200
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
290
Funcargs and other fun with pytest
pfctdayelise
0
290
Zookeepr: home grown conference management software
pfctdayelise
0
200
Why "gender" should be a text field
pfctdayelise
0
270
Distributed wikis
pfctdayelise
0
220
Neurosexism
pfctdayelise
0
330
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
210
Other Decks in Technology
See All in Technology
AI時代のコスト管理を考えよう〜明日から使える実践AWSノウハウ~
yoshimi0227
0
940
秘密度ラベル初心者が第1歩でつまづかないための「設計・運用」ポイント
seafay
PRO
1
510
IaC コードを資産へ:AWS CDK 社内ライブラリと横断展開 / aws-summit-japan-2026
gotok365
10
1.6k
スタートアップにAmazon EKSは早すぎる? マルチプロダクト戦略を加速する Platform Engineeringの実践 / Is Amazon EKS Too Soon for Startups? Practical Platform Engineering to Accelerate a Multi-Product Strategy
elmodev09
1
1.9k
CVE-2026-20833_脆弱性対応とAES 化について
jukishiya
0
110
AI 不只幫你寫 Code: 當專案從 300 暴增到 1500, 我們如何撐住 DevOps
appleboy
0
280
現場のトークンマネジメント
dak2
1
200
FPGAの開発コンペでZephyrを使ってみた
iotengineer22
0
220
時期が悪い!それでもRaspberry Piを買って遊んで活用するには / 20260627-osc26do-rpi-jikigawarui
akkiesoft
1
900
PostgreSQL 19 新機能概要 OSC Hokkaido 2026
nori_shinoda
0
260
ご挨拶「10周年を迎える共創ラボのこれまでとこれから」
iotcomjpadmin
0
150
#エンジニアBooks 30分でわかる 「技術記事を書く技術」 / engineer-books 2026-06-30
jnchito
1
130
Featured
See All Featured
Why Our Code Smells
bkeepers
PRO
340
58k
Building a A Zero-Code AI SEO Workflow
portentint
PRO
0
610
Everyday Curiosity
cassininazir
0
240
VelocityConf: Rendering Performance Case Studies
addyosmani
333
25k
How To Speak Unicorn (iThemes Webinar)
marktimemedia
1
490
Keith and Marios Guide to Fast Websites
keithpitt
413
23k
Prompt Engineering for Job Search
mfonobong
0
350
Java REST API Framework Comparison - PWX 2021
mraible
34
9.4k
Scaling GitHub
holman
464
140k
The #1 spot is gone: here's how to win anyway
tamaranovitovic
3
1.1k
The Impact of AI in SEO - AI Overviews June 2024 Edition
aleyda
5
1.1k
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
170
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher