Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
220
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
230
Crowd funded free software
pfctdayelise
0
190
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
260
Funcargs and other fun with pytest
pfctdayelise
0
280
Zookeepr: home grown conference management software
pfctdayelise
0
180
Why "gender" should be a text field
pfctdayelise
0
250
Distributed wikis
pfctdayelise
0
190
Neurosexism
pfctdayelise
0
310
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
190
Other Decks in Technology
See All in Technology
まなび領域における生成AI活用事例
recruitengineers
PRO
2
100
AWS DevOps Agent vs SRE俺 / AWS DevOps Agent vs me, the SRE
sms_tech
2
250
「ストレッチゾーンに挑戦し続ける」ことって難しくないですか? メンバーの持続的成長を支えるEMの環境設計
sansantech
PRO
3
360
元エンジニアPdM、IDEが恋しすぎてCursorに全業務を集約したら、スライド作成まで爆速になった話
doiko123
1
410
OpenClawで回す組織運営
jacopen
3
610
男(監査)はつらいよ - Policy as CodeからAIエージェントへ
ken5scal
5
770
Eight Engineering Unit 紹介資料
sansan33
PRO
1
6.9k
新職業『オーケストレーター』誕生 — エージェント10体を同時に回すAgentOps
gunta
4
1.6k
タスク管理も1on1も、もう「管理」じゃない ― KiroとBedrock AgentCoreで変わった"判断の仕事"
yusukeshimizu
2
1.3k
バクラクのSREにおけるAgentic AIへの挑戦/Our Journey with Agentic AI
taddy_919
2
1.1k
類似画像検索モデルの開発ノウハウ
lycorptech_jp
PRO
4
990
ビズリーチにおける検索・推薦の取り組み / DEIM2026
visional_engineering_and_design
1
110
Featured
See All Featured
Neural Spatial Audio Processing for Sound Field Analysis and Control
skoyamalab
0
200
Facilitating Awesome Meetings
lara
57
6.8k
How to audit for AI Accessibility on your Front & Back End
davetheseo
0
210
Code Review Best Practice
trishagee
74
20k
Building a Scalable Design System with Sketch
lauravandoore
463
34k
What Being in a Rock Band Can Teach Us About Real World SEO
427marketing
0
190
Building Experiences: Design Systems, User Experience, and Full Site Editing
marktimemedia
0
430
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
287
14k
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
128
55k
More Than Pixels: Becoming A User Experience Designer
marktimemedia
3
340
The Limits of Empathy - UXLibs8
cassininazir
1
240
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher