
Wiki[mp]edia data sources & the MediaWiki API

Presented at Melbourne Hack Weekend, 2009.

Brianna Laugher

November 09, 2009



  1. Wiki[mp]edia data sources & the MediaWiki API. Brianna Laugher for #melhack, November 2009
  2. ...

  3. Wikipedia: 13M articles total; 3M+ articles in English; 240+ languages, including Simple English!
  4. {{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}} GeoHack: stable.toolserver.org/geohack/ and wiki.toolserver.org/view/GeoHack
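The {{coord}} call above encodes Melbourne's position as degrees, minutes and seconds plus a hemisphere letter. A minimal Python sketch of turning that form into signed decimal latitude/longitude; the helper names are hypothetical, and it only handles the eight-argument d|m|s|N/S|d|m|s|E/W form, not the decimal variants the template also accepts:

```python
from typing import Tuple


def dms_to_decimal(deg: str, minutes: str, seconds: str, hemi: str) -> float:
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    value = float(deg) + float(minutes) / 60 + float(seconds) / 3600
    # South latitudes and west longitudes are negative by convention.
    return -value if hemi.strip().upper() in ("S", "W") else value


def parse_coord(wikitext: str) -> Tuple[float, float]:
    """Parse {{coord|d|m|s|N/S|d|m|s|E/W|...}} into (latitude, longitude)."""
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("expected a template call in double braces")
    # Stripping each field also absorbs stray spaces such as '57| 47'.
    parts = [p.strip() for p in text[2:-2].split("|")]
    if parts[0].lower() != "coord" or len(parts) < 9:
        raise ValueError("expected the eight-argument DMS form of coord")
    # Use the first eight positional arguments; type:... and display=... are ignored.
    lat = dms_to_decimal(*parts[1:5])
    lon = dms_to_decimal(*parts[5:9])
    return lat, lon


lat, lon = parse_coord(
    "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
)
print(f"{lat:.4f}, {lon:.4f}")  # -37.8136, 144.9631
```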

  5. {{Infobox Company
    |name = Lonely Planet
    |logo =
    |type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
    |genre = [[Guide book|Travel guides]]
    |foundation = 1972
    |founder = Tony Wheeler<br /> Maureen Wheeler
    |location_city = [[Footscray, Victoria]]
    |location_country = [[Australia]]
    |location =
    |origins =
    |key_people = Matt Goldberg <small>(Global [[CEO]])</small>
    |area_served = Worldwide
    |industry = [[Multi media]]
    |products = Travel [[guidebook, digital applications, online travel community]]
    |services =
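Naively splitting an infobox on the pipe character breaks on piped links such as [[United Kingdom|British]]. A rough sketch that splits only on top-level pipes, assuming no triple-brace parameters, comments or nowiki sections inside the call (all names are hypothetical); a dedicated parser such as mwparserfromhell is the safer choice for real pages:

```python
from typing import Dict, List


def split_top_level(body: str) -> List[str]:
    """Split on '|' characters that are not inside [[...]] or {{...}}."""
    parts: List[str] = []
    buf: List[str] = []
    depth = 0
    i = 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            buf.append(pair)
            i += 2
        elif pair in ("]]", "}}") and depth > 0:
            depth -= 1
            buf.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(buf))  # top-level separator: close this field
            buf = []
            i += 1
        else:
            buf.append(body[i])
            i += 1
    parts.append("".join(buf))
    return parts


def infobox_params(wikitext: str) -> Dict[str, str]:
    """Return the named parameters of a single template call as a dict."""
    text = wikitext.strip()
    # Tolerate a truncated call with no closing braces, like the slide above.
    if text.startswith("{{"):
        text = text[2:]
    if text.endswith("}}"):
        text = text[:-2]
    _name, *fields = split_top_level(text)
    params: Dict[str, str] = {}
    for field in fields:
        key, sep, value = field.partition("=")
        if sep:  # named parameter; positional ones are skipped in this sketch
            params[key.strip()] = value.strip()
    return params
```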
  6. Wikimedia Commons (commons.wikimedia.org): multilingual; 5M+ files; sources are “self-created”, public domain (PD) and Flickr; predominantly photographs, but also diagrams, maps, flags
  7. (image-only slide, no text)
  8. Wiktionary: 5M+ entries; 170+ languages; 13 languages with more than 100K entries; French is the biggest at 1.5M (English second at 1.4M)
  9. JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html; see also http://en.wiktionary.org/wiki/Wiktionary:Parsing
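Much of parsing Wiktionary is navigating its section layout: on the English Wiktionary each language gets a level-2 heading (==English==, ==French==), with parts of speech nested below as deeper headings. A sketch that splits an entry's raw wikitext by language, assuming that layout; the function name is hypothetical:

```python
import re
from typing import Dict

# Level-2 headings only: "==English==" matches, "===Noun===" does not.
LANG_HEADING = re.compile(r"^==\s*([^=].*?)\s*==\s*$", re.MULTILINE)


def language_sections(wikitext: str) -> Dict[str, str]:
    """Map each language heading to the wikitext beneath it."""
    sections: Dict[str, str] = {}
    matches = list(LANG_HEADING.finditer(wikitext))
    for i, m in enumerate(matches):
        start = m.end()
        # A section runs until the next level-2 heading, or the end of the entry.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        sections[m.group(1)] = wikitext[start:end].strip()
    return sections


entry = "==English==\n===Verb===\n# To talk informally.\n==French==\n===Noun===\n# cat\n"
for lang, body in language_sections(entry).items():
    print(lang, body.splitlines()[0])
```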

  10. MediaWiki structure: users; logs; pages, subpages, talk pages; links, backlinks; templates; categories
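Talk pages and subpages are naming conventions rather than separate data structures. A sketch using English namespace names: the talk page of X is Talk:X, of User:X is User talk:X, and a subpage's parent is the title before the last '/'. It assumes subpages are enabled in every namespace it recognises; real wikis configure this per namespace ($wgNamespacesWithSubpages), and the namespace list here is deliberately incomplete and hypothetical:

```python
from typing import Optional

# A few English subject namespaces; real wikis define more, and localise the names.
SUBJECT_NAMESPACES = {"User", "Wikipedia", "File", "MediaWiki", "Template", "Help", "Category", "Portal"}


def _namespace(title: str) -> Optional[str]:
    """Return a recognised namespace prefix, or None for the main (article) namespace."""
    prefix, sep, _ = title.partition(":")
    if sep and (prefix in SUBJECT_NAMESPACES or prefix == "Talk" or prefix.endswith(" talk")):
        return prefix
    return None


def talk_page(title: str) -> str:
    ns = _namespace(title)
    if ns is None:
        return "Talk:" + title  # article namespace
    if ns == "Talk" or ns.endswith(" talk"):
        return title  # already a talk page
    return f"{ns} talk:{title[len(ns) + 1:]}"


def parent_page(title: str) -> Optional[str]:
    if _namespace(title) is None:
        return None  # in the main namespace '/' is just part of a title, e.g. "AC/DC"
    head, slash, _ = title.rpartition("/")
    return head if slash else None
```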
  11. MediaWiki markup: the only thing that completely understands it is MediaWiki :(
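Since only MediaWiki fully understands its markup, one practical option is to let a MediaWiki install render it through the API's action=parse. A standard-library sketch that builds such a request against the English Wikipedia endpoint; the User-Agent value is a placeholder, and render() needs network access:

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
# Placeholder: Wikimedia asks clients to send a descriptive User-Agent with contact details.
HEADERS = {"User-Agent": "melhack-example/0.1 (replace with your contact details)"}


def parse_request(wikitext: str, api: str = API) -> urllib.request.Request:
    """Build a POST request asking MediaWiki to render wikitext to HTML."""
    body = urllib.parse.urlencode({
        "action": "parse",
        "text": wikitext,
        "contentmodel": "wikitext",
        "format": "json",
    }).encode("utf-8")
    return urllib.request.Request(api, data=body, headers=HEADERS)


def render(wikitext: str, api: str = API) -> str:
    """Send the request and return the rendered HTML."""
    with urllib.request.urlopen(parse_request(wikitext, api)) as resp:
        payload = json.load(resp)
    # With the default (version 1) JSON format the HTML sits under parse -> text -> "*".
    return payload["parse"]["text"]["*"]
```

POST is used because wikitext can easily exceed practical URL lengths.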
  12. Database: XML dumps from download.wikimedia.org OR Amazon Public Data Sets; see meta.wikimedia.org/wiki/Data_dumps
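Dumps are large, so they are usually streamed rather than loaded whole. A sketch using ElementTree.iterparse that yields (title, wikitext) per page; it matches tags by local name so it does not depend on the export schema version in the namespace URI, and it keeps only the last text element seen in each page (fine for current-revision dumps, not full-history ones). A bz2-compressed dump can be passed in as bz2.open(path). The sample XML is made up:

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple, Union


def _local(tag: str) -> str:
    """Drop the '{namespace-uri}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: Union[str, BinaryIO]) -> Iterator[Tuple[str, str]]:
    """Stream (title, wikitext) pairs from a MediaWiki XML export."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = _local(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""  # last revision seen wins
        yield title, text
        elem.clear()  # release the page's children to save memory


SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Lonely Planet</title><ns>0</ns>
    <revision><text>'''Lonely Planet''' is a travel guide publisher.</text></revision>
  </page>
  <page><title>Footscray, Victoria</title><ns>0</ns>
    <revision><text>A suburb of Melbourne.</text></revision>
  </page>
</mediawiki>"""

for title, text in iter_pages(io.BytesIO(SAMPLE)):
    print(title, "->", len(text), "characters")
```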

  13. DBpedia: community project extracting structured data from Wikipedia and making it available. You can download the data sets or query them online. Ontology++, e.g. dbpedia.org/page/Lonely_Planet
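DBpedia names each resource after its English Wikipedia article (dbpedia.org/resource/Lonely_Planet behind the page above) and offers a public SPARQL endpoint. A sketch that builds, but does not send, a query for an article's abstract; the endpoint URL and the dbpedia.org/ontology/abstract property are assumptions to check against DBpedia's current documentation, and titles with unusual characters may need extra escaping:

```python
import urllib.parse
import urllib.request

# Assumed public endpoint; verify before relying on it.
ENDPOINT = "https://dbpedia.org/sparql"


def resource_uri(wikipedia_title: str) -> str:
    """DBpedia resource URIs reuse English Wikipedia titles, spaces as underscores."""
    return "http://dbpedia.org/resource/" + wikipedia_title.replace(" ", "_")


def abstract_query(title: str, lang: str = "en") -> str:
    """SPARQL asking for the article abstract in one language."""
    return (
        "SELECT ?abstract WHERE { "
        f"<{resource_uri(title)}> <http://dbpedia.org/ontology/abstract> ?abstract . "
        f'FILTER (lang(?abstract) = "{lang}") }}'
    )


def sparql_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
```

Sending it with urllib.request.urlopen should return the W3C SPARQL JSON results format, where each row sits under results, then bindings.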
  14. MediaWiki API: mediawiki.org/wiki/API; endpoint en.wikipedia.org/w/api.php. Client libraries!
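The api.php endpoint takes plain query-string parameters. A standard-library sketch of listing a category's members with list=categorymembers, merging the "continue" values the API returns into the next request until the listing is exhausted; the User-Agent value is a placeholder and "Category:Example" is a hypothetical category name:

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Iterator

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "melhack-example/0.1 (replace with your contact details)"}


def query_url(params: Dict[str, str], api: str = API) -> str:
    """Build an action=query URL; caller-supplied params override the defaults."""
    merged = {"action": "query", "format": "json"}
    merged.update(params)
    return api + "?" + urllib.parse.urlencode(merged)


def category_members(category: str, api: str = API) -> Iterator[str]:
    """Yield titles in a category, following 'continue' until the list is exhausted."""
    params = {"list": "categorymembers", "cmtitle": category, "cmlimit": "max"}
    while True:
        req = urllib.request.Request(query_url(params, api), headers=HEADERS)
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        # The API hands back the parameters for the next batch (e.g. cmcontinue).
        params.update(data["continue"])
```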

  15. mwclient: Python library for accessing MediaWiki APIs
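With mwclient the same kinds of request become method calls, and continuation is handled by the library. A minimal sketch, assuming mwclient is installed (pip install mwclient); the function names and the User-Agent string are hypothetical, and both functions need network access:

```python
from typing import List

USER_AGENT = "melhack-example/0.1 (placeholder contact)"


def page_wikitext(title: str, host: str = "en.wikipedia.org") -> str:
    """Fetch a page's current wikitext."""
    import mwclient  # third-party; imported lazily so this module loads without it

    site = mwclient.Site(host, clients_useragent=USER_AGENT)
    return site.pages[title].text()


def category_titles(category: str, host: str = "en.wikipedia.org") -> List[str]:
    """List page titles in a category (name given without the 'Category:' prefix)."""
    import mwclient

    site = mwclient.Site(host, clients_useragent=USER_AGENT)
    return [page.name for page in site.categories[category]]
```

Importing inside the functions is a design choice for optional dependencies: the module can be imported (and tested) on machines without mwclient.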

  16. (image-only slide, no text)
  17. Toolserver (toolserver.org): server for community-developed plugins, add-ons, extensions, stats and hacks (“tools”). Tools often explicitly implement implicit editing community standards (a “community API”).
  18. TemplateTiger (toolserver.org/~kolossos/templatetiger/): for a few dozen Wikipedia languages and Wikimedia Commons, lets you query templates very much like SQL
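TemplateTiger's idea, template parameters stored as rows you can query with SQL, can be imitated locally. The schema below and the "Example Corp" rows are made up for illustration and are not TemplateTiger's actual tables; the Lonely Planet values come from the infobox on slide 5. The query self-joins the table to combine two parameters of the same template call:

```python
import sqlite3
from typing import List, Tuple

# Hypothetical one-row-per-parameter layout (NOT TemplateTiger's real schema).
ROWS: List[Tuple[str, str, str, str]] = [
    ("Lonely Planet", "Infobox Company", "foundation", "1972"),
    ("Lonely Planet", "Infobox Company", "location_country", "[[Australia]]"),
    ("Example Corp", "Infobox Company", "foundation", "1999"),  # made-up row
    ("Example Corp", "Infobox Company", "location_country", "[[Australia]]"),
]


def build_db(rows=ROWS) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE template_params (page TEXT, template TEXT, param TEXT, value TEXT)")
    conn.executemany("INSERT INTO template_params VALUES (?, ?, ?, ?)", rows)
    return conn


# Australian companies founded before 1980, according to their infoboxes.
QUERY = """
SELECT c.page
FROM template_params AS c
JOIN template_params AS f ON f.page = c.page AND f.template = c.template
WHERE c.template = 'Infobox Company'
  AND c.param = 'location_country' AND c.value = '[[Australia]]'
  AND f.param = 'foundation' AND CAST(f.value AS INTEGER) < 1980
ORDER BY c.page
"""

conn = build_db()
print([row[0] for row in conn.execute(QUERY)])  # ['Lonely Planet']
```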
  19. Thanks! identi.ca/pfctdayelise

    Logos and screenshots may be copyright of their respective owners. Slides are otherwise © Brianna Laugher.