Semantic Content Repositories

Leveraging Drupal and other FOSS tools for a connected, meaningful web. Presented at FOSS.IN/2010, Bangalore.

Pratul Kalia

December 17, 2010

Transcript

  1. FOSS.IN/2010, Bengaluru, India
    Semantic Content Repositories
    Leveraging Drupal and other FOSS tools
    for a connected meaningful web
    Pratul Kalia (lut4rp)
    17th December, 2010

  2. Semantic Web
    • What is missing?
    • Data processing by machines
    • The most overused term today - Web 2.0
    • Beyond the gloss, shine and reflections...
    • Tim Berners-Lee and his Web 3.0

  3. The Boring Big Words
    • Metadata
    • Ontologies
    • RDF
    • Taxonomies
    • FOAF, DBpedia, etc.

  4. Enough talk...
    • How do you actually *do* it?
    • Not in the future...
    • Today. Here. Now.

  5. Why Drupal?
    • Data abstraction
    • nodes - first class generic objects
    • Taxonomies
    • freeform, restricted, multilevel
    • RDFa support
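
A rough sketch of what "a generic node with taxonomy terms, exposed as RDFa" could look like. The `Node` class, the `to_rdfa` helper and the choice of Dublin Core properties are assumptions made for illustration; this is not Drupal's actual data model or the markup Drupal emits.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    """A generic content item, loosely modelled on the idea of a Drupal node (hypothetical)."""
    nid: int
    title: str
    body: str
    tags: list = field(default_factory=list)  # free-form taxonomy terms


def to_rdfa(node: Node) -> str:
    """Render the node as an HTML fragment annotated with RDFa attributes."""
    # One dc:subject statement per taxonomy term.
    tag_lines = "\n".join(
        f'  <span property="dc:subject">{escape(term)}</span>' for term in node.tags
    )
    return (
        f'<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="/node/{node.nid}">\n'
        f'  <h2 property="dc:title">{escape(node.title)}</h2>\n'
        f'  <div property="dc:description">{escape(node.body)}</div>\n'
        f"{tag_lines}\n"
        f"</div>"
    )


if __name__ == "__main__":
    print(to_rdfa(Node(1, "Rice & pests", "Field notes", ["rice", "pests"])))
```

The point of the sketch is that the same markup serves people and machines: a browser shows the title and body, while an RDFa parser reads statements about `/node/1`.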

  6. Why Drupal?
    • FOSS == community
    • Contributions:
    • faceted search
    • Apache Solr integration
    • It is already happening!
    • data.gov.uk — all Drupal!

  7. Semantics on the go
    • Why?
    • Cost, portability, ease of use
    • XML-RPC, REST
    • Android
    • Example - Drupal Services
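
To make the XML-RPC option concrete, here is a minimal sketch of the request a mobile client might send to fetch one node. The method name `node.retrieve` and the node id are assumptions; the method names a given Drupal Services installation exposes depend on its version and configuration. Only Python's standard `xmlrpc.client` module is used, and nothing is sent over the network.

```python
import xmlrpc.client


def build_node_request(nid: int, method: str = "node.retrieve") -> str:
    """Serialise an XML-RPC call that asks for a single node by id.

    The default method name is an assumption, not a confirmed Services API.
    """
    return xmlrpc.client.dumps((nid,), methodname=method)


def parse_request(payload: str):
    """Decode an XML-RPC request back into (method name, parameters)."""
    params, method = xmlrpc.client.loads(payload)
    return method, params


if __name__ == "__main__":
    payload = build_node_request(42)
    print(payload)                 # the XML a client would POST
    print(parse_request(payload))  # round trip back to Python values
```

Against a live site the same call would normally go through `xmlrpc.client.ServerProxy`, pointed at whatever endpoint URL the site has configured.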

  8. Agropedia: what is it?
    • Semantically enabled agricultural knowledge (knowledge models)
    • Read/write web (CMS built using Drupal)
    • A “social networking” platform for agricultural knowledge exchange
    Started at IITK with funding from NAIP (National Agricultural Innovation Project) and ICAR (Indian Council of Agricultural Research)
    Now, 20 institutes and universities across India with 100+ agricultural scientists

  9. Agropedia: deploying
    • Offline appliance
    • Android
    • Wikireader
    • Website

  10. Agrotagger
    • What is it?
    • Taxonomies
    • Agrovoc
    • Agrotags
    • Document (pdf, ppt, odp, odt) conversion to text
    • Stop words removal and stemming of converted text
    • Intersection of list of words with Agrovoc terms...
    • a bag of Agrovoc terms generated!
    • Mapping of a bag of Agrovoc terms with Agrotags...
    • a set of Agrotags generated!
    • Probability analysis of Agrotags generated on...
    • Length of a phrase in words
    • Frequency of the words
    • Node Degree of the candidate terms
    • Occurrence based on location of the terms
    • Appearance: binary variable to check the presence of the terms
    • Extracted top “n” Agrotags as an output!
    Yikes!
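
The steps on this slide can be sketched in a few lines of Python. Everything below is a toy stand-in: the stop-word list, the plural-stripping `stem` function and the `AGROVOC_TO_AGROTAG` mapping are hypothetical, and the ranking uses term frequency only. The real Agrotagger combines the further features listed above (phrase length, node degree, location, appearance), which this sketch leaves out.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list.
STOP_WORDS = {"the", "of", "and", "in", "a", "is", "for", "to", "on", "are"}

# Hypothetical toy data: stemmed Agrovoc term -> Agrotag.
AGROVOC_TO_AGROTAG = {
    "rice": "cereals",
    "wheat": "cereals",
    "irrigation": "water management",
    "pest": "plant protection",
}


def stem(word: str) -> str:
    """Crude plural stripping, standing in for a real stemmer."""
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def extract_agrotags(text: str, n: int = 3) -> list:
    """Return the top-n Agrotags for already-converted plain text."""
    # Stop-word removal and stemming.
    words = re.findall(r"[a-z]+", text.lower())
    tokens = [stem(w) for w in words if w not in STOP_WORDS]
    # Intersection with the vocabulary: the "bag of Agrovoc terms".
    bag = [t for t in tokens if t in AGROVOC_TO_AGROTAG]
    # Map each Agrovoc term to its Agrotag and rank by frequency.
    scores = Counter(AGROVOC_TO_AGROTAG[t] for t in bag)
    return [tag for tag, _count in scores.most_common(n)]


if __name__ == "__main__":
    sample = "Rice pests damage rice. Irrigation of rice and wheat fields."
    print(extract_agrotags(sample, n=2))
```

Document conversion (pdf, ppt, odp, odt to text) is assumed to have happened before `extract_agrotags` is called.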

  11. Where is the code?
    • Efforts are on to open up the code
    • (edit: might be open before this talk actually happens)

  12. EOF
    • More questions?
