Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
190
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
200
Crowd funded free software
pfctdayelise
0
170
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
230
Funcargs and other fun with pytest
pfctdayelise
0
240
Zookeepr: home grown conference management software
pfctdayelise
0
160
Why "gender" should be a text field
pfctdayelise
0
210
Distributed wikis
pfctdayelise
0
160
Neurosexism
pfctdayelise
0
270
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
160
Other Decks in Technology
See All in Technology
TableauLangchainとは何か?
cielo1985
1
140
United™️ Airlines®️ Customer®️ USA Contact Numbers: Complete 2025 Support Guide
flyunitedguide
0
730
DatabricksにOLTPデータベース『Lakebase』がやってきた!
inoutk
0
150
Enhancing SaaS Product Reliability and Release Velocity through Optimized Testing Approach
ropqa
1
250
〜『世界中の家族のこころのインフラ』を目指して”次の10年”へ〜 SREが導いたグローバルサービスの信頼性向上戦略とその舞台裏 / Towards the Next Decade: Enhancing Global Service Reliability
kohbis
2
900
ロールが細分化された組織でSREは何をするか?
tgidgd
1
160
QuickSight SPICE の効果的な運用戦略~S3 + Athena 構成での実践ノウハウ~/quicksight-spice-s3-athena-best-practices
emiki
0
240
SREのためのeBPF活用ステップアップガイド
egmc
1
780
Rethinking Incident Response: Context-Aware AI in Practice
rrreeeyyy
1
230
LLM時代の検索
shibuiwilliam
2
620
クラウド開発の舞台裏とSRE文化の醸成 / SRE NEXT 2025 Lunch Session
kazeburo
1
410
VGGT: Visual Geometry Grounded Transformer
peisuke
1
560
Featured
See All Featured
A better future with KSS
kneath
238
17k
Rebuilding a faster, lazier Slack
samanthasiow
83
9.1k
Testing 201, or: Great Expectations
jmmastey
43
7.6k
The Pragmatic Product Professional
lauravandoore
35
6.7k
Designing for Performance
lara
610
69k
Statistics for Hackers
jakevdp
799
220k
Git: the NoSQL Database
bkeepers
PRO
430
65k
Agile that works and the tools we love
rasmusluckow
329
21k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
46
9.6k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
29
9.6k
Building Adaptive Systems
keathley
43
2.7k
Building Applications with DynamoDB
mza
95
6.5k
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher