Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
190
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
200
Crowd funded free software
pfctdayelise
0
170
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
220
Funcargs and other fun with pytest
pfctdayelise
0
240
Zookeepr: home grown conference management software
pfctdayelise
0
150
Why "gender" should be a text field
pfctdayelise
0
210
Distributed wikis
pfctdayelise
0
160
Neurosexism
pfctdayelise
0
270
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
160
Other Decks in Technology
See All in Technology
AWS全冠したので振りかえってみる
tajimon
0
150
Web3 のリアリティ / Web3 Reality
ks91
PRO
0
100
白金鉱業Meetup_Vol.19_PoCはデモで語れ!顧客の本音とインサイトを引き出すソリューション構築
brainpadpr
2
410
上長や社内ステークホルダーに対する解像度を上げて、より良い補完関係を築く方法 / How-to-increase-resolution-and-build-better-complementary-relationships-with-your-bosses-and-internal-stakeholders
madoxten
13
7.8k
New Cache Hierarchy for Container Images and OCI Artifacts in Kubernetes Clusters using Containerd / KubeCon + CloudNativeCon Japan
pfn
PRO
0
170
菸酒生在 LINE Taiwan 的後端雙刀流
line_developers_tw
PRO
0
230
20250623 Findy Lunch LT Brown
3150
0
390
Clineを含めたAIエージェントを 大規模組織に導入し、投資対効果を考える / Introducing AI agents into your organization
i35_267
1
290
Cloud Native Scalability for Internal Developer Platforms
hhiroshell
2
470
doda開発 生成AI元年宣言!自家製AIエージェントから始める生産性改革 / doda Development Declaration of the First Year of Generated AI! Productivity Reforms Starting with Home-grown AI Agents
techtekt
0
170
技術職じゃない私がVibe Codingで感じた、AGIが身近になる未来
blueb
0
130
OAuth/OpenID Connectで実現するMCPのセキュアなアクセス管理
kuralab
2
220
Featured
See All Featured
How to Think Like a Performance Engineer
csswizardry
24
1.7k
Balancing Empowerment & Direction
lara
1
300
Into the Great Unknown - MozCon
thekraken
39
1.8k
Visualization
eitanlees
146
16k
Automating Front-end Workflow
addyosmani
1370
200k
Fantastic passwords and where to find them - at NoRuKo
philnash
51
3.3k
Unsuck your backbone
ammeep
671
58k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
123
52k
StorybookのUI Testing Handbookを読んだ
zakiyama
30
5.8k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
32
2.3k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
Stop Working from a Prison Cell
hatefulcrawdad
269
20k
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher