DSpace at ILRI

Presentation to the KALRO Open Data and Open Science in Agriculture workshop in Nairobi, Kenya on June 18, 2015.



Alan Orth

June 18, 2015


  1. A semi-technical overview of “CGSpace”
    DSpace at ILRI
    Alan Orth
    Nairobi, Kenya - June 18, 2015
    KAINET Open Data and Open Science Workshop
  2. History of DSpace at ILRI
    • 2009: ILRI launches Mahider (“repository” in Amharic)
    • 2010: Other CGIAR centers and programs join our platform and share hard / soft costs
    • 2011: Rebranded as “CGSpace”
    • 2015: 9 CGIAR centers, ~50,000 items, ~250k hits/month
  3. “CGSpace” in June, 2015

  4. How we use DSpace
    • Content people embedded in each department help capture results (presentations, papers, brochures, etc.)
    • Primary location for institutional outputs!
    • No posting PDFs on corporate website!
    • Integrate with website and blogs via RSS feeds
    • Direct ALL traffic to DSpace!
    • For data sets, videos, etc., we make a metadata-only accession with a link to, e.g., YouTube
  5. DSpace hierarchies
    • Communities, sub-communities, and collections
    • Tempting to model after organization hierarchy!
    • (we did)
    • … but organization hierarchies change!
  6. Mostly organized by output type now...

  7. Metadata
    • Standard Dublin Core is available
    • No AGROVOC
    • You can create custom controlled vocabularies in arbitrary namespaces, e.g.: cg.subject.ilri
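Field names such as cg.subject.ilri and dc.identifier.uri follow a dotted schema.element.qualifier pattern. As a minimal sketch of how such a name breaks down, assuming that dotted form (the `parse_field` helper and `MetadataField` type are hypothetical illustrations, not part of DSpace):

```python
from typing import NamedTuple, Optional


class MetadataField(NamedTuple):
    schema: str               # namespace, e.g. "dc" or a custom one like "cg"
    element: str              # e.g. "subject"
    qualifier: Optional[str]  # e.g. "ilri"; None when the field is unqualified


def parse_field(name: str) -> MetadataField:
    """Split a dotted field name such as 'cg.subject.ilri' into its parts."""
    parts = name.split(".")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"expected schema.element[.qualifier], got {name!r}")
    qualifier = parts[2] if len(parts) == 3 else None
    return MetadataField(parts[0], parts[1], qualifier)


print(parse_field("cg.subject.ilri"))
print(parse_field("dc.title"))
```

Keeping the custom vocabulary in its own schema ("cg" here) keeps it visibly separate from standard Dublin Core fields.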
  8. Custom metadata in ILRI report (not AGROVOC!)

  9. “Discovery” facets
    • Context-aware metadata summaries
    • Side effect: helps spot metadata inconsistencies!
    • … Open Access, Open access, open Access, etc.
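The same kind of inconsistency can also be spotted outside the facet sidebar. A minimal sketch, assuming the metadata values have already been exported to a plain Python list (the `find_case_variants` helper and the sample values are illustrative only, not a DSpace feature):

```python
from collections import defaultdict


def find_case_variants(values):
    """Group values that differ only in letter case or surrounding whitespace."""
    groups = defaultdict(set)
    for value in values:
        groups[value.strip().casefold()].add(value)
    # Keep only the groups that have more than one distinct spelling
    return {
        key: sorted(spellings)
        for key, spellings in groups.items()
        if len(spellings) > 1
    }


sample = ["Open Access", "Open access", "open Access", "Limited Access"]
print(find_case_variants(sample))
```

Each key in the result is the normalized form, and the list holds the spellings that an editor would need to reconcile.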
  10. Search engine optimization (SEO)
    Help Google Scholar consume your content!
    • XML sitemaps
    • Consistent domain name, e.g.: cgspace.cgiar.org
    • Persistent links for resources
    • Website speed and HTTPS both a plus
    • Sign up for Google Webmaster Tools to submit sitemap, control indexing, see stats, etc.
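For readers who have not seen one, an XML sitemap is a short list of URLs in the sitemaps.org format. The sketch below only illustrates that format with the Python standard library; it is not how DSpace builds its sitemaps, and the item URL (including the /handle/ path) is an assumed example:

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Assumed example URL on the consistent domain name
sitemap_xml = build_sitemap(["https://cgspace.cgiar.org/handle/10568/67073"])
print(sitemap_xml)
```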
  11. Sitemap view in Google Webmaster Tools

  12. Importance of persistent links
    • Website addresses change…
    • mahider.ilri.org -> cgspace.cgiar.org
    • But resources stay the same! http://hdl.handle.net/10568/67073
    • “Handle” service from handle.net
    • Everything under prefix 10568 is CGSpace
    • Default DSpace handle prefix is 123456789!
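A handle URL splits into a prefix (the repository) and a suffix (the item). A minimal sketch of that split, which can also flag an installation still using the default prefix; the `split_handle` helper is hypothetical and only covers URLs shaped like the one above:

```python
from urllib.parse import urlparse

CGSPACE_PREFIX = "10568"
DEFAULT_DSPACE_PREFIX = "123456789"  # placeholder prefix of a fresh DSpace install


def split_handle(url):
    """Return (prefix, suffix) from a URL like http://hdl.handle.net/10568/67073."""
    path = urlparse(url).path.strip("/")
    prefix, sep, suffix = path.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle URL: {url!r}")
    return prefix, suffix


prefix, suffix = split_handle("http://hdl.handle.net/10568/67073")
print(prefix == CGSPACE_PREFIX)         # item belongs to CGSpace
print(prefix == DEFAULT_DSPACE_PREFIX)  # would indicate an unconfigured prefix
```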
  13. dc.identifier.uri specifies an item’s persistent uniform resource identifier (URI)

  14. Getting data INTO DSpace
    • Day-to-day submission is manual, by a small army of editors
    • One-time batch uploads of items from other systems in CSV format (InMagic!)
    • OAI-PMH for metadata only
    • OAI-ORE for metadata + bitstreams (e.g., from another DSpace or SharePoint)
    • SWORD (haven't tried)
    • REST API (DSpace 5+, haven't tried)
  15. Getting data OUT OF DSpace
    • REST API for structured JSON or XML
    • OAI-PMH for metadata
    • OAI-ORE for metadata + bitstreams (PDFs, etc.)
    • RSS feeds for websites / blogs
    • XML sitemaps for search engines*
    *Google discontinued the use of OAI for discovering site content in 2008! http://googlewebmastercentral.blogspot.com/2008/04/retiring-support-for-oai-pmh-in.html
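An OAI-PMH harvest is an ordinary HTTP GET with a `verb` parameter. A minimal sketch of building a `ListRecords` request for Dublin Core metadata; the base URL below is a placeholder (the actual OAI endpoint path depends on the installation), and `list_records_url` is a hypothetical helper:

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    `verb`, `metadataPrefix` and `set` are standard OAI-PMH arguments;
    `base_url` must be the repository's own OAI endpoint.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, not a real repository
print(list_records_url("https://repository.example.org/oai/request"))
```

The response is XML; large result sets are paged with a resumption token, which this sketch does not handle.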
  16. CCAFS website, driven by Drupal + DSpace APIs

  17. “Latest outputs” on project blog populated via RSS, links to

  18. Open source workflow on GitHub https://github.com/ilri/DSpace

  19. Skills needed in your organization
    Besides content people(!)...
    • Prioritize Linux systems administration experience (Tomcat, httpd, PostgreSQL, DNS, SSH, git)
    • General: computer science background
    • Web developers a diverse bunch...
    • Java development experience doesn't hurt
  20. Extra considerations
    • Item mapping
    • Maintenance tasks (background batch jobs)
    • Backups of assetstore and PostgreSQL!
    • Altmetrics tracks social media mentions
    • Separate production / development environments
    • CGSpace server is $80/month
    • ~20GB of PDFs, ~8GB of Solr data
  21. Getting help
    • “DSpace Tech” mailing list
    • “dspace” tag on StackOverflow website
    • a.orth@cgiar.org