Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher
November 09, 2009
Presented at Melbourne Hack Weekend, 2009.
Wiki[mp]edia data sources & the MediaWiki API. Brianna Laugher for #melhack, November 2009
...
Wikipedia: 13M articles total; 3M+ articles in English; 240+ languages; Simple English!
{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}
stable.toolserver.org/geohack/
wiki.toolserver.org/view/GeoHack
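A minimal sketch of how the {{coord}} call above maps to decimal latitude and longitude. It assumes only the degrees|minutes|seconds|hemisphere form shown on the slide; the template accepts other forms that this does not handle, and the function name `coord_to_decimal` is made up for illustration.

```python
def coord_to_decimal(template):
    """Convert {{coord|D|M|S|H|D|M|S|H|...}} to (lat, lon) in decimal degrees.

    Assumption: the first eight parameters are degrees, minutes, seconds and
    hemisphere for latitude, then the same for longitude. Extra parameters
    (type:, display=, ...) are ignored.
    """
    inner = template.strip()
    if inner.startswith("{{") and inner.endswith("}}"):
        inner = inner[2:-2]
    parts = [p.strip() for p in inner.split("|")]
    if parts[0].lower() != "coord" or len(parts) < 9:
        raise ValueError("expected {{coord|D|M|S|H|D|M|S|H|...}}")

    def to_decimal(deg, minutes, seconds, hemisphere):
        value = float(deg) + float(minutes) / 60 + float(seconds) / 3600
        # South and West are negative by convention.
        return -value if hemisphere.upper() in ("S", "W") else value

    lat = to_decimal(*parts[1:5])
    lon = to_decimal(*parts[5:9])
    return lat, lon


SLIDE_COORD = "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
lat, lon = coord_to_decimal(SLIDE_COORD)
print(lat, lon)
```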
{{Infobox Company
|name = Lonely Planet
|logo =
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|genre = [[Guide book|Travel guides]]
|foundation = 1972
|founder = Tony Wheeler<br /> Maureen Wheeler
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
|location =
|origins =
|key_people = Matt Goldberg <small>(Global [[CEO]])</small>
|area_served = Worldwide
|industry = [[Multi media]]
|products = Travel [[guidebook, digital applications, online travel community]]
|services =
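Infobox parameters like these can be pulled out with a small amount of string handling. The sketch below is a naive reader, not a real wikitext parser: it only tracks `[[...]]` and `{{...}}` nesting so that pipes inside links are not mistaken for parameter separators. The helper names and the shortened sample are assumptions for illustration.

```python
def split_top_level(text, sep="|"):
    """Split on `sep`, ignoring separators nested inside [[...]] or {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}"):
            depth -= 1
            current.append(pair)
            i += 2
        elif text[i] == sep and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_infobox(wikitext):
    """Return {parameter: raw value} for a single template call."""
    body = wikitext.strip()
    if body.startswith("{{"):
        body = body[2:]
    if body.endswith("}}"):
        body = body[:-2]
    fields = {}
    # The first part is the template name ("Infobox Company"); skip it.
    for part in split_top_level(body)[1:]:
        key, found, value = part.partition("=")
        if found:
            fields[key.strip()] = value.strip()
    return fields


# Shortened from the slide; values are left as raw wikitext.
SAMPLE = (
    "{{Infobox Company |name = Lonely Planet "
    "|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] "
    "|foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler }}"
)
fields = parse_infobox(SAMPLE)
print(fields["name"], fields["foundation"])
```

Values come back as raw markup (links, `<br />` and so on), so any real use would still need a cleaning step afterwards.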
Wikimedia Commons: commons.wikimedia.org; multilingual; 5M+ files; “Self-created”, PD, Flickr; predominantly photographs, but also diagrams, maps, flags
Wiktionary: 5M+ entries; 170+ languages; 13 languages > 100K entries; French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties:
http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html
http://en.wiktionary.org/wiki/Wiktionary:Parsing
MediaWiki structure: users; logs; pages, subpages, talk pages; links, backlinks; templates; categories
MediaWiki markup: the only thing that completely understands it is MediaWiki :(
Database dumps: XML; download.wikimedia.org OR Amazon Public Data Sets; meta.wikimedia.org/wiki/Data_dumps
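Because the XML dumps are large, they are usually read as a stream rather than loaded whole. The following is a rough sketch using only the Python standard library. It assumes the usual dump layout of `<page>` elements containing a `<title>` and a revision `<text>`; the tiny sample document, its namespace version and the function names are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a dump-like XML stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # free memory for pages already handled


# Hypothetical two-page stand-in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page><title>Lonely Planet</title>
    <revision><text>{{Infobox Company}}</text></revision></page>
  <page><title>Melbourne</title>
    <revision><text>{{coord|37|48|49|S|144|57|47|E}}</text></revision></page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, len(text))
```

For a real dump, `source` would be an open (and probably decompressed) file object instead of the in-memory sample.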
DBpedia: community project extracting structured data from Wikipedia and making it available; can download data sets or query them online; Ontology++; e.g. dbpedia.org/page/Lonely_Planet
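Querying DBpedia online is typically done with SPARQL. The sketch below builds a request URL for the public endpoint at dbpedia.org/sparql and, separately, shows how the JSON result might be read. The query text, the helper names and the choice of the `format` parameter are assumptions for illustration; only the URL is built when the block runs, so nothing is fetched unless `fetch_properties` is called.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def build_query_url(resource, limit=20):
    """Build a GET URL listing property/value pairs for one DBpedia resource."""
    query = (
        "SELECT ?property ?value WHERE { "
        "<http://dbpedia.org/resource/%s> ?property ?value "
        "} LIMIT %d" % (resource, int(limit))
    )
    params = {"query": query, "format": "application/sparql-results+json"}
    return SPARQL_ENDPOINT + "?" + urlencode(params)


def fetch_properties(resource, limit=20):
    """Run the query over the network and return (property, value) pairs."""
    with urlopen(build_query_url(resource, limit)) as response:
        data = json.load(response)
    return [
        (row["property"]["value"], row["value"]["value"])
        for row in data["results"]["bindings"]
    ]


url = build_query_url("Lonely_Planet", limit=5)
print(url)
# To actually run it (needs network access):
# for prop, value in fetch_properties("Lonely_Planet", limit=5):
#     print(prop, value)
```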
MediaWiki API: mediawiki.org/wiki/API; en.wikipedia.org/w/api.php; client libraries!
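As a small illustration of what a raw API request looks like, the sketch below builds a backlinks query against en.wikipedia.org/w/api.php and extracts titles from a response of the expected shape. The helper names are hypothetical and the sample response is invented (page ids and titles are placeholders, not real data); no network request is made.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def backlinks_url(title, limit=10):
    """Build a URL asking which pages link to `title`."""
    params = {
        "action": "query",
        "list": "backlinks",
        "bltitle": title,
        "bllimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def backlink_titles(response):
    """Pull page titles out of a decoded JSON response."""
    return [item["title"] for item in response.get("query", {}).get("backlinks", [])]


# Invented response, shaped like the API's JSON output for list=backlinks.
SAMPLE_RESPONSE = {
    "query": {
        "backlinks": [
            {"pageid": 1, "ns": 0, "title": "Example page A"},
            {"pageid": 2, "ns": 0, "title": "Example page B"},
        ]
    }
}

url = backlinks_url("Lonely Planet")
titles = backlink_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```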
mwclient: Python library for accessing MediaWiki APIs
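A rough sketch of what using mwclient might look like. It assumes a recent mwclient release, where a site is opened with `mwclient.Site(host)` and pages are reached through `site.pages[title]`; older releases used different attribute names. The helper names `connect` and `describe_page` are made up here, and mwclient is imported lazily so the rest of the block runs without it installed.

```python
def connect(host="en.wikipedia.org"):
    """Open a connection to a wiki (requires `pip install mwclient` and network)."""
    import mwclient  # third-party library, imported only when needed

    return mwclient.Site(host)


def describe_page(site, title):
    """Summarise one page: its name, wikitext length and category names.

    `site` is expected to behave like an mwclient Site object.
    """
    page = site.pages[title]
    return {
        "title": page.name,
        "length": len(page.text()),
        "categories": [category.name for category in page.categories()],
    }


# Example (needs network access and mwclient installed):
# print(describe_page(connect(), "Lonely Planet"))
```

Keeping the network connection separate from the page handling means the summary logic can be exercised with a stand-in object.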
Toolserver: toolserver.org; server for community-developed plugins, add-ons, extensions, stats and hacks (tools); tools often explicitly implement implicit editing-community standards (a “community API”)
TemplateTiger: toolserver.org/~kolossos/templatetiger/; for a few dozen Wikipedia languages & Wikimedia Commons; lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their respective owners. Slides are otherwise © Brianna Laugher