Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
210
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
220
Crowd funded free software
pfctdayelise
0
180
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
250
Funcargs and other fun with pytest
pfctdayelise
0
260
Zookeepr: home grown conference management software
pfctdayelise
0
180
Why "gender" should be a text field
pfctdayelise
0
240
Distributed wikis
pfctdayelise
0
180
Neurosexism
pfctdayelise
0
300
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
180
Other Decks in Technology
See All in Technology
生成AIでテスト設計はどこまでできる? 「テスト粒度」を操るテーラリング術
shota_kusaba
0
760
年間40件以上の登壇を続けて見えた「本当の発信力」/ 20251213 Masaki Okuda
shift_evolve
PRO
1
130
GitHub Copilotを使いこなす 実例に学ぶAIコーディング活用術
74th
3
3.1k
EM歴1年10ヶ月のぼくがぶち当たった苦悩とこれからへ向けて
maaaato
0
280
文字列の並び順 / Unicode Collation
tmtms
3
580
20251209_WAKECareer_生成AIを活用した設計・開発プロセス
syobochim
7
1.6k
因果AIへの招待
sshimizu2006
0
980
Ruby で作る大規模イベントネットワーク構築・運用支援システム TTDB
taketo1113
1
300
5分で知るMicrosoft Ignite
taiponrock
PRO
0
360
コンテキスト情報を活用し個社最適化されたAI Agentを実現する4つのポイント
kworkdev
PRO
0
1.2k
【AWS re:Invent 2025速報】AIビルダー向けアップデートをまとめて解説!
minorun365
4
520
[デモです] NotebookLM で作ったスライドの例
kongmingstrap
0
140
Featured
See All Featured
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
3.2k
Speed Design
sergeychernyshev
33
1.4k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.5k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
54k
4 Signs Your Business is Dying
shpigford
186
22k
RailsConf 2023
tenderlove
30
1.3k
Git: the NoSQL Database
bkeepers
PRO
432
66k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
3.8k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
141
34k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.1k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
12
970
How to Think Like a Performance Engineer
csswizardry
28
2.4k
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher