Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
170
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
180
Crowd funded free software
pfctdayelise
0
140
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
200
Funcargs and other fun with pytest
pfctdayelise
0
220
Zookeepr: home grown conference management software
pfctdayelise
0
140
Why "gender" should be a text field
pfctdayelise
0
190
Distributed wikis
pfctdayelise
0
140
Neurosexism
pfctdayelise
0
250
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
130
Other Decks in Technology
See All in Technology
[SRE kaigi 2025] ガバメントクラウドに向けた開発と変化するSRE組織のあり方 / Development for Government Cloud and the Evolving Role of SRE Teams
kazeburo
4
1.9k
さいきょうのアーキテクチャを生み出すセンスメイキング
jgeem
0
270
プロダクト価値を引き上げる、「課題の再定義」という習慣
moeka__c
0
210
Tech Blog執筆のモチベート向上作戦
imamura_ko_0314
0
740
あなたの興味は信頼性?それとも生産性? SREとしてのキャリアに悩むみなさまに伝えたい選択肢
jacopen
6
3.1k
Tokyo RubyKaigi 12 - Scaling Ruby at GitHub
jhawthorn
2
210
信頼性を支えるテレメトリーパイプラインの構築 / Building Telemetry Pipeline with OpenTelemetry
ymotongpoo
9
5k
エラーバジェット枯渇の原因 - 偽陽性との戦い -
phaya72
1
100
プロダクト観点で考えるデータ基盤の育成戦略 / Growth Strategy of Data Analytics Platforms from a Product Perspective
yamamotoyuta
0
210
日本語プログラミングとSpring Bootアプリケーション開発 #kanjava
yusuke
2
340
ChatGPTを使ったブログ執筆と校正の実践テクニック/登壇資料(井田 献一朗)
hacobu
1
160
Microsoft Ignite 2024 最新情報!Microsoft 365 Agents SDK 概要 / Microsoft Ignite 2024 latest news Microsoft 365 Agents SDK overview
karamem0
0
190
Featured
See All Featured
Producing Creativity
orderedlist
PRO
343
39k
Product Roadmaps are Hard
iamctodd
PRO
50
11k
How STYLIGHT went responsive
nonsquared
96
5.3k
The Pragmatic Product Professional
lauravandoore
32
6.4k
Learning to Love Humans: Emotional Interface Design
aarron
274
40k
Build your cross-platform service in a week with App Engine
jlugia
229
18k
Why You Should Never Use an ORM
jnunemaker
PRO
55
9.2k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
6
220
The Illustrated Children's Guide to Kubernetes
chrisshort
48
49k
Automating Front-end Workflow
addyosmani
1367
200k
Building Adaptive Systems
keathley
39
2.4k
KATA
mclloyd
29
14k
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher