DSpace at ILRI

Presentation to the KALRO Open Data and Open Science in Agriculture workshop in Nairobi, Kenya on June 18, 2015.

http://aims.fao.org/activity/blog/forum-open-data-and-open-science-agriculture-kenya

Alan Orth

June 18, 2015

Transcript

  1. A semi-technical overview of “CGSpace”: DSpace at ILRI
    Alan Orth
    Nairobi, Kenya - June 18, 2015
    KAINET Open Data and Open Science Workshop
  2. History of DSpace at ILRI
    • 2009: ILRI launches Mahider (“repository” in Amharic)
    • 2010: Other CGIAR centers and programs join our platform and share hard / soft costs
    • 2011: Rebranded as “CGSpace”
    • 2015: 9 CGIAR centers, ~50,000 items, ~250k hits/month
  3. How we use DSpace
    • Content people embedded in each department help capture results (presentations, papers, brochures, etc)
    • Primary location for institutional outputs!
    • No posting PDFs on corporate website!
    • Integrate with website and blogs via RSS feeds
    • Direct ALL traffic to DSpace!
    • For data sets, videos, etc we make a metadata-only accession with a link to eg YouTube
  4. DSpace hierarchies
    • Communities, sub-communities, and collections
    • Tempting to model after organization hierarchy!
    • (we did)
    • … but organization hierarchies change!
  5. Metadata
    • Standard Dublin Core is available
    • No AGROVOC
    • You can create custom controlled vocabularies in arbitrary namespaces, eg: cg.subject.ilri
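
A field name such as cg.subject.ilri follows DSpace's schema.element.qualifier pattern. The sketch below only splits a name into those three parts; `MetadataField` and `parse_field` are illustrative names, not part of DSpace.

```python
from typing import NamedTuple, Optional


class MetadataField(NamedTuple):
    schema: str               # namespace, e.g. "dc" or "cg"
    element: str              # e.g. "subject"
    qualifier: Optional[str]  # e.g. "ilri"; None for unqualified fields


def parse_field(name: str) -> MetadataField:
    """Split 'schema.element[.qualifier]' into its parts."""
    parts = name.split(".")
    if len(parts) == 2:
        return MetadataField(parts[0], parts[1], None)
    if len(parts) == 3:
        return MetadataField(parts[0], parts[1], parts[2])
    raise ValueError(f"not a schema.element[.qualifier] name: {name!r}")


print(parse_field("cg.subject.ilri"))
print(parse_field("dc.title"))
```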
  6. “Discovery” facets
    • Context-aware metadata summaries
    • Side effect: helps spot metadata inconsistencies!
    • … Open Access, Open access, open Access, etc.
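
The same kind of inconsistency can be found outside the facet sidebar. This is a minimal sketch, assuming the metadata values have already been exported to a plain list of strings; `find_case_variants` is a hypothetical helper, not a DSpace feature.

```python
from collections import defaultdict


def find_case_variants(values):
    """Group values that differ only by letter case or spacing.

    Returns {normalized_value: [variants...]} for every value that
    appears in more than one spelling.
    """
    groups = defaultdict(set)
    for value in values:
        # Collapse runs of whitespace and ignore case when comparing.
        key = " ".join(value.split()).casefold()
        groups[key].add(value)
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}


statuses = ["Open Access", "Open access", "open Access", "Limited Access"]
print(find_case_variants(statuses))
```

Each group that comes back is a candidate for a bulk metadata correction to a single agreed spelling.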
  7. Search engine optimization (SEO)
    Help Google Scholar consume your content!
    • XML sitemaps
    • Consistent domain name, eg: cgspace.cgiar.org
    • Persistent links for resources
    • Website speed and HTTPS both a plus
    • Sign up for Google Webmaster Tools to submit sitemap, control indexing, see stats, etc
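
For readers who have not seen one, an XML sitemap is just a list of page URLs in the sitemaps.org format. The sketch below builds one by hand to show the shape of the file; `build_sitemap` is an illustrative helper, and the item URL is an assumed example on the cgspace.cgiar.org domain.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap document listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Assumed example URL, used here only for illustration.
print(build_sitemap(["https://cgspace.cgiar.org/handle/10568/67073"]))
```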
  8. Importance of persistent links
    • Website addresses change…
    • mahider.ilri.org -> cgspace.cgiar.org
    • But resources stay the same! http://hdl.handle.net/10568/67073
    • “Handle” service from handle.net
    • Everything under prefix 10568 is CGSpace
    • Default DSpace handle prefix is 123456789!
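
A handle link splits into a prefix, which identifies the repository, and a suffix, which identifies the item. The following sketch takes one apart and flags the unregistered default prefix; the function names are illustrative only.

```python
from urllib.parse import urlparse

CGSPACE_PREFIX = "10568"            # from the slide
DSPACE_DEFAULT_PREFIX = "123456789"  # from the slide


def parse_handle(url):
    """Split a handle URL into (prefix, suffix)."""
    path = urlparse(url).path.strip("/")
    prefix, sep, suffix = path.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle URL: {url!r}")
    return prefix, suffix


def uses_default_prefix(url):
    """True if the link still carries the DSpace default prefix."""
    return parse_handle(url)[0] == DSPACE_DEFAULT_PREFIX


prefix, suffix = parse_handle("http://hdl.handle.net/10568/67073")
print(prefix == CGSPACE_PREFIX, suffix)
```

A repository still issuing links under 123456789 has not registered its own prefix, so those links will not resolve through handle.net.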
  9. Getting data INTO DSpace
    • Day-to-day submission is manual, by a small army of editors
    • One-time batch uploads of items from other systems in CSV format (InMagic!)
    • OAI-PMH for metadata only
    • OAI-ORE for metadata + bitstreams (eg, from another DSpace or Sharepoint, etc)
    • SWORD (haven't tried)
    • REST API (DSpace 5+, haven't tried)
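
For the CSV route, a rough sketch of preparing a batch file is shown below. It assumes the layout used by DSpace's batch metadata editing as I understand it: an `id` column holding `+` for new items, a `collection` column holding the target collection's handle, one column per metadata field, and `||` between multiple values. Check the documentation for your DSpace version before relying on this. The collection handle 10568/1 and the sample record are made up for illustration.

```python
import csv
import io


def build_batch_csv(records, collection_handle):
    """Render records (dicts of field name -> list of values) as CSV text."""
    fields = sorted({name for record in records for name in record})
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "collection"] + fields)
    for record in records:
        # "+" is assumed to mark a new item; "||" joins repeated values.
        row = ["+", collection_handle]
        row += ["||".join(record.get(name, [])) for name in fields]
        writer.writerow(row)
    return buffer.getvalue()


sample = [{"dc.title": ["A report"], "dc.subject": ["LIVESTOCK", "DAIRY"]}]
print(build_batch_csv(sample, "10568/1"))  # hypothetical collection handle
```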
  10. Getting data OUT OF DSpace
    • REST API for structured JSON or XML
    • OAI-PMH for metadata
    • OAI-ORE for metadata + bitstreams (PDFs, etc)
    • RSS feeds for websites / blogs
    • XML sitemaps for search engines*
    *Google discontinued the use of OAI for discovering site content in 2008! http://googlewebmastercentral.blogspot.com/2008/04/retiring-support-for-oai-pmh-in.html
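
As an illustration of the OAI-PMH route, the sketch below builds a ListRecords request and pulls Dublin Core titles out of a response. The endpoint path /oai/request is an assumption (a common DSpace default, not confirmed by the slides), the sample response is hand-written for the example, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def extract_titles(xml_text):
    """Return every dc:title found inside the records of a response."""
    root = ET.fromstring(xml_text)
    titles = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        titles += [t.text for t in record.iter(f"{{{DC_NS}}}title")]
    return titles


# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

print(list_records_url("https://cgspace.cgiar.org/oai/request"))  # assumed path
print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also follow the resumptionToken that OAI-PMH returns when a result list is split across several responses.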
  11. Skills needed in your organization
    Besides content people(!)...
    • Prioritize Linux systems administration experience (Tomcat, httpd, PostgreSQL, DNS, SSH, git)
    • General: computer science background
    • Web developers a diverse bunch...
    • Java development experience doesn't hurt
  12. Extra considerations
    • Item mapping
    • Maintenance tasks (background batch jobs)
    • Backups of assetstore and PostgreSQL!
    • Altmetrics tracks social media mentions
    • Separate production / development environments
    • CGSpace server is $80/month
    • ~20GB of PDFs, ~8GB of Solr data