Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
160
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
180
Crowd funded free software
pfctdayelise
0
130
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
200
Funcargs and other fun with pytest
pfctdayelise
0
210
Zookeepr: home grown conference management software
pfctdayelise
0
130
Why "gender" should be a text field
pfctdayelise
0
180
Distributed wikis
pfctdayelise
0
130
Neurosexism
pfctdayelise
0
240
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
130
Other Decks in Technology
See All in Technology
新機能VPCリソースエンドポイント機能検証から得られた考察
duelist2020jp
0
230
Amazon Kendra GenAI Index 登場でどう変わる? 評価から学ぶ最適なRAG構成
naoki_0531
0
130
LINEスキマニにおけるフロントエンド開発
lycorptech_jp
PRO
0
340
プロダクト開発を加速させるためのQA文化の築き方 / How to build QA culture to accelerate product development
mii3king
1
290
OCI技術資料 : ファイル・ストレージ 概要
ocise
3
11k
LINE Developersプロダクト(LIFF/LINE Login)におけるフロントエンド開発
lycorptech_jp
PRO
0
150
DUSt3R, MASt3R, MASt3R-SfM にみる3D基盤モデル
spatial_ai_network
2
260
GitHub Copilot のテクニック集/GitHub Copilot Techniques
rayuron
39
16k
効率的な技術組織が作れる!書籍『チームトポロジー』要点まとめ
iwamot
1
110
宇宙ベンチャーにおける最近の情シス取り組みについて
axelmizu
0
120
型情報を用いたLintでコード品質を向上させる
sansantech
PRO
2
140
C++26 エラー性動作
faithandbrave
2
820
Featured
See All Featured
Fontdeck: Realign not Redesign
paulrobertlloyd
82
5.3k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
48
2.2k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
8
1.2k
Intergalactic Javascript Robots from Outer Space
tanoku
270
27k
Rails Girls Zürich Keynote
gr2m
94
13k
Gamification - CAS2011
davidbonilla
80
5.1k
How to Think Like a Performance Engineer
csswizardry
22
1.2k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
365
25k
Optimizing for Happiness
mojombo
376
70k
Reflections from 52 weeks, 52 projects
jeffersonlam
347
20k
Navigating Team Friction
lara
183
15k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
26
1.9k
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher