Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher
November 09, 2009
Presented at Melbourne Hack Weekend, 2009.
Transcript
Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher for #melhack, November 2009
...
Wikipedia: 13M articles total; 3M+ articles in English; 240+ languages; Simple English!
{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}
stable.toolserver.org/geohack/
wiki.toolserver.org/view/GeoHack
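A {{coord}} call like the one above can be turned into decimal latitude and longitude with a few lines of Python. This is a minimal sketch that assumes the degrees/minutes/seconds form shown on the slide; the real template also accepts other forms (decimal degrees, degrees and minutes only), which this does not handle. The function name `parse_coord` is made up for illustration.

```python
def parse_coord(template):
    """Convert a degrees/minutes/seconds {{coord}} call to decimal (lat, lon)."""
    # Drop the surrounding braces, then split the positional parameters.
    parts = [p.strip() for p in template.strip().strip("{}").split("|")]
    if parts[0].lower() != "coord" or len(parts) < 9:
        raise ValueError("expected {{coord|d|m|s|N/S|d|m|s|E/W|...}}")

    def to_decimal(deg, minutes, seconds, hemisphere):
        value = float(deg) + float(minutes) / 60 + float(seconds) / 3600
        # South and West are negative by convention.
        return -value if hemisphere.upper() in ("S", "W") else value

    lat = to_decimal(*parts[1:5])
    lon = to_decimal(*parts[5:9])
    return lat, lon


lat, lon = parse_coord(
    "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
)
print(lat, lon)
```

The trailing parameters (type:city_region:AU-VIC, display=inline,title) are ignored here; they describe the location and how it is displayed, not the position itself.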
{{Infobox Company
|name = Lonely Planet
|logo =
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|genre = [[Guide book|Travel guides]]
|foundation = 1972
|founder = Tony Wheeler<br /> Maureen Wheeler
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
|location =
|origins =
|key_people = Matt Goldberg <small>(Global [[CEO]])</small>
|area_served = Worldwide
|industry = [[Multi media]]
|products = Travel [[guidebook, digital applications, online travel community]]
|services =
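Infobox fields like the ones above can be pulled out with a small hand-rolled splitter. The sketch below is deliberately naive: it only tracks [[...]] and {{...}} nesting so that pipes inside links are not mistaken for field separators, and it will not cope with everything real wikitext can contain. `parse_infobox` and the shortened sample are illustrative, not part of any library.

```python
def parse_infobox(wikitext):
    """Return {field: raw value} for the top-level |name = value pairs."""
    body = wikitext.strip()
    if body.startswith("{{"):
        body = body[2:]
    if body.endswith("}}"):
        body = body[:-2]

    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):      # entering a link or nested template
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}"):    # leaving a link or nested template
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))   # top-level field separator
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))

    fields = {}
    for part in parts[1:]:            # parts[0] is the template name
        name, sep, value = part.partition("=")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


sample = """{{Infobox Company
|name = Lonely Planet
|genre = [[Guide book|Travel guides]]
|foundation = 1972
}}"""
print(parse_infobox(sample))
```

Values come back as raw wikitext, so links such as [[Guide book|Travel guides]] still need their own handling.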
Wikimedia Commons (commons.wikimedia.org): multilingual; 5M+ files; “self-created”, PD, Flickr; predominantly photographs, but also diagrams, maps, flags
Wiktionary: 5M+ entries; 170+ languages; 13 languages with more than 100K entries; French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties:
http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html
http://en.wiktionary.org/wiki/Wiktionary:Parsing
MediaWiki structure: users; logs; pages, subpages, talk pages; links, backlinks; templates; categories
MediaWiki markup: the only thing that completely understands it is MediaWiki :(
Database dumps: XML, from download.wikimedia.org or Amazon Public Data Sets; see meta.wikimedia.org/wiki/Data_dumps
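Because the database dumps are large XML files, they are usually read as a stream rather than loaded whole. The sketch below uses the standard library's `iterparse` on a tiny made-up sample; the element names follow the general page/title/revision/text layout of the export format, but the namespace version and the sample content are assumptions, and a real dump carries many more elements per page.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical two-page sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Lonely Planet</title>
    <revision><text>{{Infobox Company |name = Lonely Planet}}</text></revision>
  </page>
  <page>
    <title>Melbourne</title>
    <revision><text>{{coord|37|48|49|S|144|57|47|E}}</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs one page at a time."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            elem.clear()          # release the finished page's subtree
            title, text = None, None


pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page_title, wikitext in pages:
    print(page_title, len(wikitext))
```

For a compressed dump, a file object from `bz2.open(path, "rb")` can be passed to `iter_pages` in place of the in-memory sample.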
DBpedia: community project extracting structured data from Wikipedia and making it available. Can download data sets or query them online. Ontology++. E.g. dbpedia.org/page/Lonely_Planet
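Querying DBpedia online is typically done with SPARQL over HTTP. The sketch below only builds the request URL and does not send it. The endpoint address, the `format` parameter value and the `http://dbpedia.org/resource/` URI pattern are assumptions about the public DBpedia service, not something stated on the slide, and `build_sparql_url` is an illustrative name.

```python
import urllib.parse

# Assumed public SPARQL endpoint; check the DBpedia site before relying on it.
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def build_sparql_url(resource, limit=20):
    """Build a GET URL asking for every property/value pair of one resource."""
    query = (
        "SELECT ?property ?value WHERE { <http://dbpedia.org/resource/"
        + resource
        + "> ?property ?value } LIMIT "
        + str(int(limit))
    )
    params = {
        "query": query,
        "format": "application/sparql-results+json",  # assumed result format
    }
    return SPARQL_ENDPOINT + "?" + urllib.parse.urlencode(params)


print(build_sparql_url("Lonely_Planet", limit=5))
```

The resulting URL can be fetched with any HTTP client; the response, if the assumptions hold, is a JSON document of variable bindings.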
MediaWiki API: mediawiki.org/wiki/API; en.wikipedia.org/w/api.php; client libraries!
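A request to api.php is just an HTTP GET with query parameters, so it can be tried without any client library. This is a minimal sketch, assuming the `action=query` / `prop=revisions` parameters and an https endpoint; the User-Agent string is made up, and `fetch` is defined but not called here because it needs network access. The shape of the returned JSON depends on the MediaWiki version, so it is returned unparsed.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, api_url=API_URL):
    """Build a URL asking for the current wikitext of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def fetch(url):
    """Send the request and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        # Hypothetical identifier; use one that names your own tool.
        headers={"User-Agent": "melhack-example/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


print(build_query_url("Lonely Planet"))
```

Client libraries wrap this same request/response cycle and add conveniences such as login and continuation handling.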
mwclient Python library for accessing MediaWiki APIs
Toolserver (toolserver.org): server for community-developed plugins, addons, extensions, stats and hacks, collectively "tools". Tools often explicitly implement implicit editing-community standards (a “community API”).
TemplateTiger (toolserver.org/~kolossos/templatetiger/): for a few dozen Wikipedia languages, & Wikimedia Commons. Lets you query templates very much like SQL.
identi.ca/pfctdayelise
Thanks! Logos and screenshots may be copyright their respective owners. Slides are otherwise © Brianna Laugher.