Wiki[mp]edia data sources & the MediaWiki API

Presented at Melbourne Hack Weekend, 2009.

Brianna Laugher

November 09, 2009

Transcript

  1. Wiki[mp]edia
    data sources &
    the MediaWiki API
    Brianna Laugher
    for #melhack
    November 2009

  2. ...

  3. Wikipedia
    13M articles total
    3M+ articles in English
    240+ languages
    Simple English!

  4. {{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}
    stable.toolserver.org/geohack/
    wiki.toolserver.org/view/GeoHack
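
The coordinates in the template above are degrees, minutes and seconds. Here is a minimal Python sketch of turning them into the signed decimal degrees most mapping tools expect. It assumes the full d|m|s|N/S|d|m|s|E/W form shown on the slide. The template also accepts decimal and degrees-only forms, which this sketch does not handle. The function names are illustrative only.

```python
def dms_to_decimal(deg: float, minutes: float, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    value = deg + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    return -value if hemisphere.upper() in ("S", "W") else value


def coord_to_latlon(wikitext: str) -> tuple[float, float]:
    """Read the d|m|s|N/S|d|m|s|E/W form of {{coord}}; other forms are not handled."""
    inner = wikitext.strip().removeprefix("{{").removesuffix("}}")
    fields = [f.strip() for f in inner.split("|")][1:]  # drop the template name
    positional = [f for f in fields if "=" not in f]  # skip named params like display=
    lat = dms_to_decimal(*map(float, positional[0:3]), positional[3])
    lon = dms_to_decimal(*map(float, positional[4:7]), positional[7])
    return lat, lon


lat, lon = coord_to_latlon(
    "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
)
print(f"{lat:.5f}, {lon:.5f}")
```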

  5. {{Infobox Company
    |name = Lonely Planet
    |logo =
    |type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
    |genre = [[Guide book|Travel guides]]
    |foundation = 1972
    |founder = Tony Wheeler
    Maureen Wheeler
    |location_city = [[Footscray, Victoria]]
    |location_country = [[Australia]]
    |location =
    |origins =
    |key_people = Matt Goldberg
    (Global [[CEO]])
    |area_served = Worldwide
    |industry = [[Multi media]]
    |products = Travel [[guidebook,
    digital applications, online travel
    community]]
    |services =
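
Infobox parameters can be pulled out of raw wikitext with a small parser. This is a naive sketch, not MediaWiki's own parser. It splits on `|` only outside `[[...]]` and `{{...}}`. It treats the first `=` in each field as the name/value separator, so it can misread an unnamed value whose link contains `=`. It also ignores comments, `<nowiki>` and parser functions. `split_top_level` and `parse_template` are hypothetical helper names. The closing `}}` on the sample is added so it parses cleanly.

```python
def split_top_level(text: str, sep: str = "|") -> list[str]:
    """Split on `sep`, ignoring separators nested inside [[...]] or {{...}}."""
    parts: list[str] = []
    buf: list[str] = []
    depth = 0
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            buf.append(pair)
            i += 2
        elif pair in ("]]", "}}") and depth > 0:
            depth -= 1
            buf.append(pair)
            i += 2
        else:
            if text[i] == sep and depth == 0:
                parts.append("".join(buf))  # top-level separator: close this field
                buf = []
            else:
                buf.append(text[i])
            i += 1
    parts.append("".join(buf))
    return parts


def parse_template(wikitext: str) -> tuple[str, dict[str, str]]:
    """Return (template name, {param: value}) for a single template call."""
    body = wikitext.strip()
    if body.startswith("{{"):
        body = body[2:]
    if body.endswith("}}"):
        body = body[:-2]
    name, *fields = split_top_level(body)
    params: dict[str, str] = {}
    unnamed = 0
    for field in fields:
        key, eq, value = field.partition("=")
        if eq:
            params[key.strip()] = value.strip()
        else:
            # Unnamed parameters are numbered 1, 2, 3... in order of appearance.
            unnamed += 1
            params[str(unnamed)] = field.strip()
    return name.strip(), params


SNIPPET = """{{Infobox Company
|name = Lonely Planet
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|foundation = 1972
}}"""

name, params = parse_template(SNIPPET)
print(name, params["foundation"], params["type"])
```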

  6. Wikimedia Commons
    commons.wikimedia.org
    Multilingual
    5M+ files
    Sources: “self-created”, public domain (PD), Flickr
    Predominantly photographs,
    but also diagrams, maps, flags

  7. (no transcript text)

  8. Wiktionary
    5M+ entries
    170+ languages
    13 languages with 100K+ entries
    French biggest at 1.5M
    (English second at 1.4M)

  9. JavaScript Wiktionary lookup plugin for third parties:
    http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html
    http://en.wiktionary.org/wiki/Wiktionary:Parsing

  10. MediaWiki structure
    Users
    Logs
    Pages, subpages, talk pages
    Links, backlinks
    Templates
    Categories
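
Each of these structures is exposed through the MediaWiki API's `action=query` modules. This sketch builds the corresponding request URLs against English Wikipedia. The module and parameter names (`list=backlinks`, `list=embeddedin`, `list=categorymembers`, `list=allpages`, `list=logevents`, `list=allusers`, `prop=categories|links|templates`) come from the API documentation. `Category:Example` and the `Example/` subpage prefix are placeholders, and `api_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def api_url(**params: str) -> str:
    """Build a GET URL for an action=query request with JSON output."""
    query = {"action": "query", "format": "json", **params}
    return f"{API}?{urlencode(query)}"


# Registered users.
users = api_url(list="allusers", aulimit="10")
# Recent log entries.
logs = api_url(list="logevents", lelimit="10")
# Subpages: titles with a given prefix in the User namespace (ns 2); prefix is a placeholder.
subpages = api_url(list="allpages", apprefix="Example/", apnamespace="2")
# Pages that link *to* a page (backlinks).
backlinks = api_url(list="backlinks", bltitle="Lonely Planet", bllimit="50")
# Categories, outgoing links and templates used on one page.
page_meta = api_url(prop="categories|links|templates", titles="Lonely Planet")
# Pages that transclude a given template.
transclusions = api_url(list="embeddedin", eititle="Template:Infobox Company", eilimit="50")
# Members of a category (placeholder category name).
members = api_url(list="categorymembers", cmtitle="Category:Example", cmlimit="50")

for url in (users, logs, subpages, backlinks, page_meta, transclusions, members):
    print(url)
```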

  11. MediaWiki markup
    The only thing that completely
    understands it is MediaWiki :(
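
Since only MediaWiki fully understands its markup, one practical option is to let a MediaWiki installation render it. This sketch uses the API's `action=parse` with raw `text`. It assumes a MediaWiki version that supports `contentmodel` and `formatversion=2`; older versions return the HTML under `text["*"]` instead. The User-Agent string is a placeholder, since Wikimedia asks for one that identifies the client and a contact. `parse_request`, `html_from_response` and `render` are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"
# Placeholder; Wikimedia asks clients to send a descriptive User-Agent with contact info.
USER_AGENT = "melhack-example/0.1 (placeholder contact)"


def parse_request(wikitext: str) -> Request:
    """Build a POST request asking MediaWiki to render `wikitext` to HTML."""
    body = urlencode({
        "action": "parse",
        "text": wikitext,
        "contentmodel": "wikitext",
        "prop": "text",
        "format": "json",
        "formatversion": "2",
    }).encode("utf-8")
    return Request(API, data=body, headers={"User-Agent": USER_AGENT})


def html_from_response(payload: dict) -> str:
    """With formatversion=2 the rendered HTML is a plain string under parse.text."""
    return payload["parse"]["text"]


def render(wikitext: str) -> str:
    """Send the wikitext to the API (needs network access) and return the HTML."""
    with urlopen(parse_request(wikitext)) as resp:
        return html_from_response(json.load(resp))
```

POST is used because wikitext can be long. Calling `render("{{coord|37|48|49|S|144|57|47|E}}")` would return the same HTML Wikipedia shows for that template.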

  12. Database dumps
    XML
    download.wikimedia.org
    OR Amazon Public Data Sets
    meta.wikimedia.org/wiki/Data_dumps
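
The dumps are large XML files, so they are best read as a stream. This sketch uses Python's `xml.etree.ElementTree.iterparse`. It strips the XML namespace because the export schema version differs between dumps. The embedded `SAMPLE` is a hand-written stand-in, not real dump content. `iter_pages` and `iter_pages_from_bz2` are hypothetical names.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Optional


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix; the export schema version differs between dumps."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: BinaryIO) -> Iterator[tuple[Optional[str], str]]:
    """Yield (title, wikitext) for each <page>, without building the whole tree first."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title: Optional[str] = None
        text = ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                # In full-history dumps this ends up as the last <revision>'s text.
                text = child.text or ""
        yield title, text
        elem.clear()  # free the page's subtree (the empty <page> shell stays under the root)


def iter_pages_from_bz2(path: str) -> Iterator[tuple[Optional[str], str]]:
    """Read a bzip2-compressed dump file directly."""
    with bz2.open(path, "rb") as f:
        yield from iter_pages(f)


# A tiny hand-written sample in the dump's shape (not real dump content).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Lonely Planet</title>
    <revision><text>{{Infobox Company|name = Lonely Planet}}</text></revision>
  </page>
</mediawiki>"""

for title, text in iter_pages(io.BytesIO(SAMPLE)):
    print(title, len(text))
```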

  13. DBpedia
    Community project extracting
    structured data from Wikipedia and
    making it available
    Can download data sets or query them
    online
    Ontology++
    e.g. dbpedia.org/page/Lonely_Planet
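
Besides downloading the data sets, DBpedia can be queried online via SPARQL. This sketch builds a query URL for the resource on the slide and flattens the standard SPARQL JSON results format. The endpoint URL `https://dbpedia.org/sparql` and its `format` parameter are assumptions about the current service and may change. `sparql_url`, `rows` and `run` are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint

QUERY = """
SELECT ?property ?value WHERE {
  <http://dbpedia.org/resource/Lonely_Planet> ?property ?value .
} LIMIT 20
"""


def sparql_url(query: str) -> str:
    # Assumes the endpoint accepts a `format` parameter naming the result MIME type.
    return f"{ENDPOINT}?{urlencode({'query': query, 'format': 'application/sparql-results+json'})}"


def rows(payload: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def run(query: str) -> list[dict[str, str]]:
    """Execute the query over HTTP (needs network access)."""
    req = Request(sparql_url(query), headers={"User-Agent": "melhack-example/0.1 (placeholder)"})
    with urlopen(req) as resp:
        return rows(json.load(resp))
```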

  14. MediaWiki API
    mediawiki.org/wiki/API
    en.wikipedia.org/w/api.php
    Client libraries!
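
API list queries return results in batches, and each response says how to fetch the next one. Client libraries do this bookkeeping for you. This sketch shows what it involves, assuming the `continue` style used by newer MediaWiki releases; the 2009-era API returned `query-continue` instead. `query_all` and `http_fetch` are hypothetical helpers. `fetch` is injectable so the loop can be tested without network access.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"
USER_AGENT = "melhack-example/0.1 (placeholder contact)"  # placeholder


def http_fetch(params: dict) -> dict:
    """GET one API response as JSON (needs network access)."""
    req = Request(f"{API}?{urlencode(params)}", headers={"User-Agent": USER_AGENT})
    with urlopen(req) as resp:
        return json.load(resp)


def query_all(params: dict, fetch: Callable[[dict], dict] = http_fetch) -> Iterator[dict]:
    """Yield the `query` part of each batch, merging `continue` into the next request."""
    base = {"action": "query", "format": "json", **params}
    cont: dict = {}
    while True:
        result = fetch({**base, **cont})
        if "query" in result:
            yield result["query"]
        if "continue" not in result:
            break  # no continuation token: this was the last batch
        cont = result["continue"]
```

For example, iterating over `query_all({"list": "backlinks", "bltitle": "Lonely Planet"})` yields one dict per batch, each holding a `backlinks` list.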

  15. mwclient
    Python library for
    accessing MediaWiki
    APIs

  16. (no transcript text)

  17. Toolserver
    toolserver.org
    Server for community-developed plugins,
    add-ons, extensions, stats and hacks
    (collectively, “tools”)
    Tools often explicitly implement implicit
    editing community standards
    (a “community API”)

  18. TemplateTiger
    toolserver.org/~kolossos/templatetiger/
    Covers a few dozen Wikipedia language
    editions, plus Wikimedia Commons
    Lets you query template data
    much as you would with SQL
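
Here is a sketch of the idea behind querying templates with SQL, using SQLite. Template calls are stored one parameter per row, then pivoted back into a table. The schema, the `load` and `infobox_table` helpers and the `Example Corp` rows are hypothetical; this is not TemplateTiger's actual database layout. Only the Lonely Planet values come from the infobox shown earlier.

```python
import sqlite3

# Hypothetical schema: one row per (page, template, parameter).
SCHEMA = "CREATE TABLE template_params (page TEXT, template TEXT, param TEXT, value TEXT)"

ROWS = [
    ("Lonely Planet", "Infobox Company", "foundation", "1972"),
    ("Lonely Planet", "Infobox Company", "location_country", "[[Australia]]"),
    # Made-up rows, purely for illustration.
    ("Example Corp", "Infobox Company", "foundation", "1999"),
    ("Example Corp", "Infobox Company", "location_country", "[[New Zealand]]"),
]

# Pivot the per-parameter rows back into one row per page.
PIVOT = """
SELECT page,
       MAX(CASE WHEN param = 'foundation' THEN value END) AS founded,
       MAX(CASE WHEN param = 'location_country' THEN value END) AS country
FROM template_params
WHERE template = ?
GROUP BY page
ORDER BY page
"""


def load(rows: list[tuple[str, str, str, str]]) -> sqlite3.Connection:
    """Create an in-memory database holding the given template parameters."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO template_params VALUES (?, ?, ?, ?)", rows)
    return conn


def infobox_table(conn: sqlite3.Connection, template: str) -> list[tuple]:
    """One (page, founded, country) row per page that uses `template`."""
    return conn.execute(PIVOT, (template,)).fetchall()


conn = load(ROWS)
for row in infobox_table(conn, "Infobox Company"):
    print(row)
```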

  19. identi.ca/pfctdayelise
    Thanks!
    Logos and screenshots
    may be copyright of
    their respective owners
    Slides are otherwise
    © Brianna Laugher
