Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher
November 09, 2009
Presented at Melbourne Hack Weekend, 2009.
Transcript
Wiki[mp]edia data sources & the MediaWiki API. Brianna Laugher for #melhack, November 2009
...
Wikipedia: 13M articles total, 3M+ articles in English, 240+ languages. Simple English!
{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}
stable.toolserver.org/geohack/
wiki.toolserver.org/view/GeoHack
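The coordinate template on this slide is structured enough to read by hand. A minimal sketch, assuming the degrees|minutes|seconds|hemisphere layout shown above (the real template accepts other layouts, which this does not handle); `parse_coord` is a hypothetical helper name, not part of any library:

```python
def parse_coord(template: str) -> tuple:
    """Convert a degrees/minutes/seconds {{coord}} call to decimal degrees."""
    # Drop the surrounding braces, then split the parameters on pipes.
    parts = [p.strip() for p in template.strip().strip("{}").split("|")]
    if parts[0].lower() != "coord":
        raise ValueError("not a coord template")
    # Assumed layout: deg|min|sec|N/S|deg|min|sec|E/W, then optional extras
    # such as type:... and display=..., which are ignored here.
    lat_d, lat_m, lat_s, lat_h, lon_d, lon_m, lon_s, lon_h = parts[1:9]
    lat = float(lat_d) + float(lat_m) / 60 + float(lat_s) / 3600
    lon = float(lon_d) + float(lon_m) / 60 + float(lon_s) / 3600
    # Southern and western hemispheres are negative in decimal notation.
    if lat_h.upper() == "S":
        lat = -lat
    if lon_h.upper() == "W":
        lon = -lon
    return lat, lon


lat, lon = parse_coord(
    "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
)
print(round(lat, 4), round(lon, 4))
```

For the slide's example this lands at about 37.81° S, 144.96° E.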
{{Infobox Company
|name = Lonely Planet
|logo =
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|genre = [[Guide book|Travel guides]]
|foundation = 1972
|founder = Tony Wheeler<br /> Maureen Wheeler
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
|location =
|origins =
|key_people = Matt Goldberg <small>(Global [[CEO]])</small>
|area_served = Worldwide
|industry = [[Multi media]]
|products = Travel [[guidebook, digital applications, online travel community]]
|services =
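An infobox like this one is a list of `|key = value` pairs, but pipes also appear inside `[[links]]`, so a plain split on `|` is not enough. A rough sketch of one way to pull out the fields by tracking bracket depth; `parse_infobox` is a hypothetical helper, and it is nowhere near a full wikitext parser:

```python
def parse_infobox(wikitext: str) -> dict:
    """Return the top-level |key = value parameters of one template call."""
    text = wikitext.strip()
    if not text.startswith("{{"):
        raise ValueError("expected text starting with '{{'")
    fields, current, depth, i = [], [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            if depth > 1:          # keep nested links/templates verbatim
                current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            if depth == 0:         # end of the outer template
                break
            current.append(pair)
            i += 2
        elif text[i] == "|" and depth == 1:
            # A pipe at the outer level separates two parameters.
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    fields.append("".join(current))
    params = {}
    for field in fields[1:]:       # fields[0] is the template name
        key, sep, value = field.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


sample = (
    "{{Infobox Company |name = Lonely Planet |foundation = 1972 "
    "|location_city = [[Footscray, Victoria]] "
    "|type = [[United Kingdom|British]] company}}"
)
info = parse_infobox(sample)
print(info["name"], info["foundation"])
```

The sample string is a shortened, closed version of the infobox above, made up for illustration. Tracking depth keeps the pipe inside `[[United Kingdom|British]]` from being treated as a field separator.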
Wikimedia Commons: commons.wikimedia.org. Multilingual, 5M+ files. “Self-created”, PD, Flickr. Predominantly photographs, but also diagrams, maps, flags
Wiktionary: 5M+ entries, 170+ languages, 13 languages > 100K entries. French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties:
http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html
http://en.wiktionary.org/wiki/Wiktionary:Parsing
MediaWiki structure: Users. Logs. Pages, subpages, talk pages. Links, backlinks. Templates. Categories
MediaWiki markup: The only thing that completely understands it is MediaWiki :(
Database dumps: XML. download.wikimedia.org OR Amazon Public Data Sets. meta.wikimedia.org/wiki/Data_dumps
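The XML dumps are far too large to load in one go, so they are usually read as a stream. A minimal sketch using the standard library's `iterparse`, run here against a tiny made-up sample rather than a real dump; the element names (`page`, `title`, `text`) and the namespace are assumptions about the export format, and `iter_pages` is a hypothetical helper:

```python
import io
import xml.etree.ElementTree as ET


def iter_pages(xml_file):
    """Yield (title, text) for each <page>, freeing memory as it goes."""
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        # Dump tags carry an XML namespace; compare on the local name only.
        if elem.tag.rsplit("}", 1)[-1] != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = child.tag.rsplit("}", 1)[-1]
            if name == "title":
                title = child.text
            elif name == "text":
                # If a page holds several revisions, the last one wins.
                text = child.text or ""
        yield title, text
        elem.clear()  # drop the finished page's children from memory


# Illustrative sample only; real dumps contain many more elements per page.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page><title>Lonely Planet</title>
    <revision><text>{{Infobox Company}}</text></revision></page>
  <page><title>Melbourne</title>
    <revision><text>{{coord|37|48|49|S}}</text></revision></page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, len(text))
```

For a real dump, the same function should work on an opened file object (for example one returned by `bz2.open` for a compressed dump) instead of the `BytesIO` sample.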
DBpedia: Community project extracting structured data from Wikipedia and making it available. Can download data sets or query them online. Ontology++. e.g. dbpedia.org/page/Lonely_Planet
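Querying DBpedia online means sending SPARQL to its public endpoint. A hedged sketch that only builds the request URL and sends nothing: the endpoint address, the `format` parameter, and the `/resource/` URI scheme are assumptions not stated on the slide, and both function names are hypothetical:

```python
import urllib.parse

# Assumed public SPARQL endpoint; not given on the slide.
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def describe_query(resource_name: str, limit: int = 20) -> str:
    """Build a SPARQL query listing property/value pairs for one resource."""
    # Assumes the page dbpedia.org/page/X describes the resource .../resource/X.
    uri = "http://dbpedia.org/resource/" + resource_name
    return (
        "SELECT ?property ?value "
        "WHERE { <%s> ?property ?value } LIMIT %d" % (uri, limit)
    )


def sparql_url(query: str) -> str:
    """Encode the query as a GET request URL asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return SPARQL_ENDPOINT + "?" + urllib.parse.urlencode(params)


query = describe_query("Lonely_Planet")
url = sparql_url(query)
print(url)
```

Fetching `url` with any HTTP client should return the properties extracted for the Lonely Planet article, if the endpoint behaves as assumed.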
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
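Without a client library, an API call is just a URL with query parameters on `api.php`. A small sketch using only the standard library, asking for backlinks to one page. The `https` scheme, the User-Agent string, and the helper names are assumptions for illustration; `fetch` is defined but not called, so nothing here touches the network:

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(**params) -> str:
    """Return an api.php URL for action=query with JSON output."""
    query = {"action": "query", "format": "json"}
    query.update(params)
    return API_URL + "?" + urllib.parse.urlencode(query)


def fetch(url: str) -> dict:
    """Perform the request and decode the JSON reply (needs network access)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "melhack-example/0.1"}  # made-up agent name
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Pages that link to "Lonely Planet", ten at a time.
backlinks_url = build_query_url(list="backlinks", bltitle="Lonely Planet", bllimit=10)
print(backlinks_url)
```

Calling `fetch(backlinks_url)` should return a dictionary with the matching pages under its `query` key. A client library wraps this same request/response cycle, plus login and result paging.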
mwclient Python library for accessing MediaWiki APIs
Toolserver: toolserver.org. Server for community-developed plugins, addons, extensions, stats and hacks (tools). Tools often explicitly implement implicit editing community standards (“community API”)
TemplateTiger: toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia Commons. Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their respective owners. Slides are otherwise © Brianna Laugher