Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
210
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
220
Crowd funded free software
pfctdayelise
0
180
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
250
Funcargs and other fun with pytest
pfctdayelise
0
270
Zookeepr: home grown conference management software
pfctdayelise
0
180
Why "gender" should be a text field
pfctdayelise
0
240
Distributed wikis
pfctdayelise
0
180
Neurosexism
pfctdayelise
0
300
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
180
Other Decks in Technology
See All in Technology
IAMユーザーゼロの運用は果たして可能なのか
yama3133
2
490
AWS re:Invent 2025で見たGrafana最新機能の紹介
hamadakoji
0
430
ExpoのインダストリーブースでみたAWSが見せる製造業の未来
hamadakoji
0
150
ChatGPTで論⽂は読めるのか
spatial_ai_network
11
29k
AIエージェント開発と活用を加速するワークフロー自動生成への挑戦
shibuiwilliam
4
310
大企業でもできる!ボトムアップで拡大させるプラットフォームの作り方
findy_eventslides
1
840
通勤手当申請チェックエージェント開発のリアル
whisaiyo
2
100
AI-DLCを現場にインストールしてみた:プロトタイプ開発で分かったこと・やめたこと
recruitengineers
PRO
2
160
MySQLとPostgreSQLのコレーション / Collation of MySQL and PostgreSQL
tmtms
1
1k
学習データって増やせばいいんですか?
ftakahashi
2
500
シニアソフトウェアエンジニアになるためには
kworkdev
PRO
3
190
Haskell を武器にして挑む競技プログラミング ─ 操作的思考から意味モデル思考へ
naoya
7
1.6k
Featured
See All Featured
Gemini Prompt Engineering: Practical Techniques for Tangible AI Outcomes
mfonobong
2
220
Avoiding the “Bad Training, Faster” Trap in the Age of AI
tmiket
0
32
The Illustrated Children's Guide to Kubernetes
chrisshort
51
51k
The browser strikes back
jonoalderson
0
55
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
47
7.9k
Automating Front-end Workflow
addyosmani
1371
200k
The Pragmatic Product Professional
lauravandoore
37
7.1k
Product Roadmaps are Hard
iamctodd
PRO
55
12k
Accessibility Awareness
sabderemane
0
13
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
37
6.2k
A designer walks into a library…
pauljervisheath
210
24k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
9
1k
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher