DSpace at ILRI

Presentation to the KALRO Open Data and Open Science in Agriculture workshop in Nairobi, Kenya on June 18, 2015.

http://aims.fao.org/activity/blog/forum-open-data-and-open-science-agriculture-kenya

Alan Orth

June 18, 2015

Transcript

  1. A semi-technical overview of “CGSpace”
    DSpace at ILRI
    Alan Orth
    Nairobi, Kenya - June 18, 2015
    KAINET Open Data and Open Science Workshop
  2. History of DSpace at ILRI
    • 2009: ILRI launches Mahider (“repository” in Amharic)
    • 2010: Other CGIAR centers and programs join our platform and share hard / soft costs
    • 2011: Rebranded as “CGSpace”
    • 2015: 9 CGIAR centers, ~50,000 items, ~250k hits/month
  3. How we use DSpace
    • Content people embedded in each department help capture results (presentations, papers, brochures, etc.)
    • Primary location for institutional outputs!
    • No posting PDFs on corporate website!
    • Integrate with website and blogs via RSS feeds
    • Direct ALL traffic to DSpace!
    • For data sets, videos, etc. we make a metadata-only accession with a link to e.g. YouTube
  4. DSpace hierarchies
    • Communities, sub-communities, and collections
    • Tempting to model after organization hierarchy!
    • (we did)
    • … but organization hierarchies change!
  5. Metadata
    • Standard Dublin Core is available
    • No AGROVOC
    • You can create custom controlled vocabularies in arbitrary namespaces, e.g. cg.subject.ilri
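
A field name such as cg.subject.ilri follows the DSpace pattern schema.element.qualifier, with the qualifier being optional (as in dc.title). A minimal sketch of splitting such names, where the `MetadataField` type and `parse_field` helper are hypothetical names introduced only for illustration:

```python
from typing import NamedTuple, Optional


class MetadataField(NamedTuple):
    """A metadata field name split into its parts (illustrative type)."""
    schema: str
    element: str
    qualifier: Optional[str] = None


def parse_field(name: str) -> MetadataField:
    """Split 'schema.element' or 'schema.element.qualifier' into parts."""
    parts = name.split(".")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"not a valid metadata field name: {name!r}")
    return MetadataField(*parts)


if __name__ == "__main__":
    print(parse_field("cg.subject.ilri"))
    print(parse_field("dc.title"))
```

Here "cg" is the custom namespace, "subject" the element, and "ilri" the qualifier that keeps one center's vocabulary separate from another's.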
  6. “Discovery” facets
    • Context-aware metadata summaries
    • Side effect: helps spot metadata inconsistencies!
    • … Open Access, Open access, open Access, etc.
  7. Search engine optimization (SEO)
    Help Google Scholar consume your content!
    • XML sitemaps
    • Consistent domain name, e.g. cgspace.cgiar.org
    • Persistent links for resources
    • Website speed and HTTPS both a plus
    • Sign up for Google Webmaster Tools to submit sitemap, control indexing, see stats, etc.
  8. Importance of persistent links
    • Website addresses change…
    • mahider.ilri.org -> cgspace.cgiar.org
    • But resources stay the same! http://hdl.handle.net/10568/67073
    • “Handle” service from handle.net
    • Everything under prefix 10568 is CGSpace
    • Default DSpace handle prefix is 123456789!
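
A handle is just a prefix and a suffix joined by a slash, and the persistent link is that handle appended to the resolver address. A small sketch using the values from the slide; the function names are hypothetical and this is not how DSpace itself assigns handles:

```python
HANDLE_RESOLVER = "http://hdl.handle.net"
CGSPACE_PREFIX = "10568"
DSPACE_DEFAULT_PREFIX = "123456789"


def parse_handle(url_or_handle):
    """Return (prefix, suffix) from 'prefix/suffix' or a resolver URL."""
    handle = url_or_handle
    if "hdl.handle.net/" in handle:
        handle = handle.split("hdl.handle.net/", 1)[1]
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle: {url_or_handle!r}")
    return prefix, suffix


def persistent_url(prefix, suffix):
    """Build the resolver link that stays valid when the site address changes."""
    return f"{HANDLE_RESOLVER}/{prefix}/{suffix}"


def is_cgspace(url_or_handle):
    """True if the handle sits under the CGSpace prefix."""
    return parse_handle(url_or_handle)[0] == CGSPACE_PREFIX


def uses_default_prefix(url_or_handle):
    """True if the handle still uses the DSpace default prefix."""
    return parse_handle(url_or_handle)[0] == DSPACE_DEFAULT_PREFIX


if __name__ == "__main__":
    prefix, suffix = parse_handle("http://hdl.handle.net/10568/67073")
    print(persistent_url(prefix, suffix), is_cgspace("10568/67073"))
```

A check like `uses_default_prefix` is a quick way to notice items minted before a real prefix was registered.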
  9. Getting data INTO DSpace
    • Day-to-day submission is manual, by a small army of editors
    • One-time batch uploads of items from other systems in CSV format (InMagic!)
    • OAI-PMH for metadata only
    • OAI-ORE for metadata + bitstreams (e.g. from another DSpace or Sharepoint, etc.)
    • SWORD (haven't tried)
    • REST API (DSpace 5+, haven't tried)
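
For the CSV route, one possible shape is the layout used by DSpace's batch metadata editing, as far as I understand it: an "id" column ("+" for a new item), a "collection" column holding the target collection's handle, one column per metadata field, and "||" between multiple values. Whether the slide's uploads used exactly this mechanism is an assumption. The record contents, the collection handle 10568/1, and the `to_batch_csv` helper below are all made up for illustration.

```python
import csv
import io

VALUE_SEPARATOR = "||"  # assumed separator for multiple values in one cell


def to_batch_csv(records, collection_handle, fields):
    """Render records (dicts of field -> list of values) as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["id", "collection"] + list(fields))
    for record in records:
        # "+" in the id column marks a row as a new item (assumption)
        row = ["+", collection_handle]
        for field in fields:
            row.append(VALUE_SEPARATOR.join(record.get(field, [])))
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    exported = [
        {"dc.title": ["Example report"],
         "cg.subject.ilri": ["LIVESTOCK", "MARKETS"]},
    ]
    print(to_batch_csv(exported, "10568/1", ["dc.title", "cg.subject.ilri"]))
```

Try any such file against a development instance first, since a batch import can create many items at once.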
  10. Getting data OUT OF DSpace
    • REST API for structured JSON or XML
    • OAI-PMH for metadata
    • OAI-ORE for metadata + bitstreams (PDFs, etc.)
    • RSS feeds for websites / blogs
    • XML sitemaps for search engines*
    *Google discontinued the use of OAI for discovering site content in 2008! http://googlewebmastercentral.blogspot.com/2008/04/retiring-support-for-oai-pmh-in.html
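
An OAI-PMH harvest is an ordinary HTTP GET with a "verb" parameter, and the reply is XML. The sketch below builds a ListRecords request URL and pulls titles out of a response. The endpoint path /oai/request is an assumption about a typical DSpace setup, and the embedded sample response is hand-written for illustration, not real CGSpace output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH and Dublin Core specifications
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response (illustrative only)
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title (made up)</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request; from_date (YYYY-MM-DD) limits the harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def record_titles(xml_text):
    """Return every dc:title found in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//dc:title", NS)]


if __name__ == "__main__":
    # Assumed endpoint path; check your own installation
    print(list_records_url("https://cgspace.cgiar.org/oai/request"))
    print(record_titles(SAMPLE_RESPONSE))
```

A real harvester would also follow the resumptionToken that large responses return, which this sketch leaves out.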
  11. Skills needed in your organization
    Besides content people(!)...
    • Prioritize Linux systems administration experience (Tomcat, httpd, PostgreSQL, DNS, SSH, git)
    • General: computer science background
    • Web developers a diverse bunch...
    • Java development experience doesn't hurt
  12. Extra considerations
    • Item mapping
    • Maintenance tasks (background batch jobs)
    • Backups of assetstore and PostgreSQL!
    • Altmetrics tracks social media mentions
    • Separate production / development environments
    • CGSpace server is $80/month
    • ~20GB of PDFs, ~8GB of Solr data