$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
210
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
220
Crowd funded free software
pfctdayelise
0
180
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
250
Funcargs and other fun with pytest
pfctdayelise
0
260
Zookeepr: home grown conference management software
pfctdayelise
0
180
Why "gender" should be a text field
pfctdayelise
0
240
Distributed wikis
pfctdayelise
0
180
Neurosexism
pfctdayelise
0
300
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
180
Other Decks in Technology
See All in Technology
AWS Bedrock AgentCoreで作る 1on1支援AIエージェント 〜Memory × Evaluationsによる実践開発〜
yusukeshimizu
6
380
ブロックテーマとこれからの WordPress サイト制作 / Toyama WordPress Meetup Vol.81
torounit
0
530
法人支出管理領域におけるソフトウェアアーキテクチャに基づいたテスト戦略の実践
ogugu9
1
210
計算機科学をRubyと歩む 〜DFA型正規表現エンジンをつくる~
ydah
3
210
【AWS re:Invent 2025速報】AIビルダー向けアップデートをまとめて解説!
minorun365
4
480
AWS Trainium3 をちょっと身近に感じたい
bigmuramura
1
130
グレートファイアウォールを自宅に建てよう
ctes091x
0
140
最近のLinux普段づかいWaylandデスクトップ元年
penguin2716
1
680
AWSを使う上で最低限知っておきたいセキュリティ研修を社内で実施した話 ~みんなでやるセキュリティ~
maimyyym
2
230
Ruby で作る大規模イベントネットワーク構築・運用支援システム TTDB
taketo1113
1
220
Microsoft Agent 365 を 30 分でなんとなく理解する
skmkzyk
1
1k
ログ管理の新たな可能性?CloudWatchの新機能をご紹介
ikumi_ono
1
580
Featured
See All Featured
Navigating Team Friction
lara
191
16k
Fireside Chat
paigeccino
41
3.7k
Reflections from 52 weeks, 52 projects
jeffersonlam
355
21k
RailsConf 2023
tenderlove
30
1.3k
StorybookのUI Testing Handbookを読んだ
zakiyama
31
6.4k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
55
3.1k
Speed Design
sergeychernyshev
33
1.4k
Raft: Consensus for Rubyists
vanstee
141
7.2k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
9.8k
The Cult of Friendly URLs
andyhume
79
6.7k
Product Roadmaps are Hard
iamctodd
PRO
55
12k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
249
1.3M
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher