Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
170
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
190
Crowd funded free software
pfctdayelise
0
150
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
210
Funcargs and other fun with pytest
pfctdayelise
0
230
Zookeepr: home grown conference management software
pfctdayelise
0
140
Why "gender" should be a text field
pfctdayelise
0
190
Distributed wikis
pfctdayelise
0
140
Neurosexism
pfctdayelise
0
250
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
140
Other Decks in Technology
See All in Technology
どうすると生き残れないのか/how-not-to-survive
hanhan1978
10
8.4k
あなたが人生で成功するための5つの普遍的法則 #jawsug #jawsdays2025 / 20250301 HEROZ
yoshidashingo
2
480
Aurora PostgreSQLがCloudWatch Logsに 出力するログの課金を削減してみる #jawsdays2025
non97
1
280
Amazon Bedrock Knowledge basesにLangfuse導入してみた
sonoda_mj
2
290
ライフステージの変化を乗り越える 探索型のキャリア選択
tenshoku_draft
2
360
User Story Mapping + Inclusive Team
kawaguti
PRO
3
610
エンジニアの健康管理術 / Engineer Health Management Techniques
y_sone
8
6.3k
20250307_エンジニアじゃないけどAzureはじめてみた
ponponmikankan
2
270
フォーイット_エンジニア向け会社紹介資料_Forit_Company_Profile.pdf
forit_tech
1
1.7k
Cracking the Coding Interview 6th Edition
gdplabs
14
28k
目標と時間軸 〜ベイビーステップでケイパビリティを高めよう〜
kakehashi
PRO
8
1.1k
DevinでAI AWSエンジニア製造計画 序章 〜CDKを添えて〜/devin-load-to-aws-engineer
tomoki10
0
260
Featured
See All Featured
Build The Right Thing And Hit Your Dates
maggiecrowley
34
2.5k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
29
1.1k
How STYLIGHT went responsive
nonsquared
99
5.4k
Designing Experiences People Love
moore
140
23k
Building a Modern Day E-commerce SEO Strategy
aleyda
38
7.1k
GitHub's CSS Performance
jonrohan
1030
460k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
28
9.3k
Fashionably flexible responsive web design (full day workshop)
malarkey
406
66k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
120k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
30
4.6k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher