Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher
November 09, 2009
Presented at Melbourne Hack Weekend, 2009.
Transcript
Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher for #melhack, November 2009
...
Wikipedia
13M articles total
3M+ articles in English
240+ languages
Simple English!
{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}
stable.toolserver.org/geohack/
wiki.toolserver.org/view/GeoHack
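The coord template above carries latitude and longitude as positional degree, minute, second and hemisphere fields. As a minimal sketch, assuming exactly that eight-field layout (the template also accepts other layouts, which this does not handle), the values can be turned into decimal degrees. The function names `dms_to_decimal` and `coord_to_decimal` are hypothetical helpers, not part of any library.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # South and West are negative by convention.
    return -value if hemisphere.upper() in ("S", "W") else value


def coord_to_decimal(template):
    """Parse a {{coord|D|M|S|N/S|D|M|S|E/W|...}} string into (lat, lon).

    Assumption: the first eight positional parameters are in D|M|S|hemisphere
    form for latitude, then longitude. Other coord layouts are rejected or
    will fail to convert.
    """
    text = template.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call")
    parts = [p.strip() for p in text[2:-2].split("|")]
    if parts[0].lower() != "coord" or len(parts) < 9:
        raise ValueError("expected coord with eight positional fields")
    lat = dms_to_decimal(*parts[1:5])
    lon = dms_to_decimal(*parts[5:9])
    return lat, lon


if __name__ == "__main__":
    example = "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
    print(coord_to_decimal(example))
```

Trailing named parameters such as `type:` and `display=` are simply ignored here.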
{{Infobox Company
|name = Lonely Planet
|logo =
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|genre = [[Guide book|Travel guides]]
|foundation = 1972
|founder = Tony Wheeler<br /> Maureen Wheeler
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
|location =
|origins =
|key_people = Matt Goldberg <small>(Global [[CEO]])</small>
|area_served = Worldwide
|industry = [[Multi media]]
|products = Travel [[guidebook, digital applications, online travel community]]
|services =
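Infobox parameters like the ones above are the closest thing to structured data inside an article. The following is a deliberately naive sketch of pulling them out, assuming one `|name = value` pair per line; it does not handle nested templates, multi-line values, or pipes at the start of continuation lines, and real wikitext frequently has all three. `parse_infobox_params` is a hypothetical helper.

```python
def parse_infobox_params(wikitext):
    """Return a dict of parameter name -> raw value for simple infoboxes.

    Assumption: every parameter sits on its own line starting with '|'.
    Values are returned as raw wikitext (links and HTML are not stripped).
    """
    params = {}
    for raw_line in wikitext.splitlines():
        line = raw_line.strip()
        if not line.startswith("|") or "=" not in line:
            continue  # template name, closing braces, or unsupported syntax
        # Split on the first '=' only, so '=' inside the value survives.
        name, _, value = line[1:].partition("=")
        params[name.strip()] = value.strip()
    return params


if __name__ == "__main__":
    sample = """{{Infobox Company
|name = Lonely Planet
|logo =
|foundation = 1972
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
"""
    for key, value in parse_infobox_params(sample).items():
        print(key, "->", repr(value))
```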
Wikimedia Commons
commons.wikimedia.org
Multilingual
5M+ files
“Self-created”, PD, Flickr
Predominantly photographs, but also diagrams, maps, flags
Wiktionary
5M+ entries
170+ languages
13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties:
http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html
http://en.wiktionary.org/wiki/Wiktionary:Parsing
MediaWiki structure
Users
Logs
Pages, subpages, talk pages
Links, backlinks
Templates
Categories
MediaWiki markup
The only thing that completely understands it is MediaWiki :(
Database dumps
XML
download.wikimedia.org OR Amazon Public Data Sets
meta.wikimedia.org/wiki/Data_dumps
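The XML dumps are far too large to load whole, so they are usually read as a stream. A minimal sketch with the standard library follows, assuming the usual `page` / `title` / `revision` / `text` element layout. The inline sample document and its namespace version string are illustrative stand-ins, not taken from a real dump, and `iter_pages` is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a dump file; the namespace version is an assumption.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Lonely Planet</title>
    <revision><text>Example article text.</text></revision>
  </page>
  <page>
    <title>Footscray, Victoria</title>
    <revision><text>More example text.</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a dump, one page at a time.

    `source` is a filename or a binary file object. Each <page> element is
    cleared once it has been yielded to keep memory use low.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()


if __name__ == "__main__":
    for page_title, wikitext in iter_pages(io.BytesIO(SAMPLE_DUMP)):
        print(page_title, len(wikitext))
```

For a real compressed dump, the same function can be handed a file object opened with `bz2.open(path, "rb")`.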
DBpedia
Community project extracting structured data from Wikipedia and making it available
Can download data sets or query them online
Ontology++
e.g. dbpedia.org/page/Lonely_Planet
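Querying DBpedia online is done with SPARQL. The sketch below is one possible way to ask for every property of the Lonely Planet resource, assuming the public endpoint at `https://dbpedia.org/sparql` and its `format` parameter for JSON results. The helper names are hypothetical, and `run_query` needs network access.

```python
import json
import urllib.parse
import urllib.request

# Assumption: DBpedia's public SPARQL endpoint.
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def build_query(resource_name, limit=20):
    """Build a SPARQL query listing property/value pairs for one resource."""
    return (
        "SELECT ?property ?value WHERE { "
        f"<http://dbpedia.org/resource/{resource_name}> ?property ?value "
        f"}} LIMIT {int(limit)}"
    )


def build_request_url(query):
    """Encode the query as a GET request asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return SPARQL_ENDPOINT + "?" + urllib.parse.urlencode(params)


def run_query(query, timeout=30):
    """Send the query and return a list of (property, value) string pairs."""
    with urllib.request.urlopen(build_request_url(query), timeout=timeout) as resp:
        data = json.load(resp)
    return [
        (row["property"]["value"], row["value"]["value"])
        for row in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    sparql = build_query("Lonely_Planet", limit=10)
    print(sparql)
    for prop, value in run_query(sparql):
        print(prop, "=", value)
```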
MediaWiki API
mediawiki.org/wiki/API
en.wikipedia.org/w/api.php
Client libraries!
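As a rough illustration of what a call to api.php looks like without a client library, here is a sketch that fetches the current wikitext of one page using `action=query` with `prop=revisions`. The response shape in `extract_wikitext` assumes `formatversion=2` and `rvslots=main`; the User-Agent string and helper names are hypothetical placeholders, and `fetch_wikitext` needs network access.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical identifier; replace with something that describes your tool.
USER_AGENT = "melhack-example/0.1 (hypothetical contact address)"


def build_wikitext_url(title):
    """Build a query URL for the latest revision content of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_wikitext(response):
    """Pull the wikitext out of a decoded JSON response, or None if missing."""
    page = response["query"]["pages"][0]
    if page.get("missing"):
        return None
    return page["revisions"][0]["slots"]["main"]["content"]


def fetch_wikitext(title, timeout=30):
    """Request the page over HTTP and return its wikitext."""
    request = urllib.request.Request(
        build_wikitext_url(title), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_wikitext(json.load(resp))


if __name__ == "__main__":
    text = fetch_wikitext("Lonely Planet")
    print(text[:200] if text else "page not found")
```

A client library wraps this same request and response handling, plus login, continuation and rate limiting.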
mwclient
Python library for accessing MediaWiki APIs
Toolserver
toolserver.org
Server for community-developed plugins, addons, extensions, stats and hacks: tools
Tools often explicitly implement implicit editing community standards (“community API”)
TemplateTiger
toolserver.org/~kolossos/templatetiger/
For a few dozen Wikipedia languages, & Wikimedia Commons
Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks!
Logos and screenshots may be copyright their respective owners.
Slides are otherwise © Brianna Laugher