Maximizing the impact of institutional knowledge using DSpace

Presentation for an AIMS@FAO webinar:


Recorded webinar:


This presentation has a Creative Commons licence. You are free to re-use or distribute this work for non-commercial purposes, provided credit is given to ILRI.


Alan Orth

July 28, 2015


  1. Maximizing the impact of institutional knowledge using DSpace
    Alan Orth
    Nairobi, Kenya - July 28, 2015
    Webinar for AIMS@FAO
  2. Overview
    • Why we use DSpace
    • How we use DSpace
    • Organizational tips for using DSpace
    • Technical tips for DSpace deployments
  3. DSpace helps make information “F.A.I.R”
    • Free: no subscriptions, “paywalls”, etc
    • Accessible: is publicly available
    • Indexed: can be found in search engines
    • Reusable: has a permissive license
    Addresses both the moral and legal imperatives… aka the “carrot” and the “stick”.
  4. History of DSpace at ILRI
    • Before: InMagic, physical library
    • 2009: ILRI launches Mahider (“repository” in Amharic)
    • 2010: Other CGIAR research centers and programs join our platform and share hard / soft costs
    • 2011: Rebranded as “CGSpace”
    • 2015: 9 CGIAR centers, ~50,000 items, ~200k hits/month
  5. “CGSpace” in July, 2015

  6. How we use DSpace
    • Primary location for institutional outputs!
    • (No posting PDFs on corporate website!)
    • Content people embedded in each department help capture results (presentations, papers, brochures, etc)
    • Integrate with website and blogs via RSS feeds
    • (Direct ALL traffic to DSpace!)
    • For data sets, videos, etc we make a metadata-only accession with a link to eg YouTube
  7. DSpace hierarchies
    • Communities, sub-communities, and collections
    • Tempting to model after organization hierarchy!
    • (we did)
    • … but organization hierarchies change!
  8. Mostly organized by output type now...

  9. Metadata
    • Standard Dublin Core is available
    • No AGROVOC!
    • You can create custom controlled vocabularies in arbitrary namespaces, eg: cg.subject.ilri
    • Display custom fields selectively in the XMLUI item list and view pages
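As a small illustration of the namespaced field names mentioned on this slide, the sketch below splits a DSpace-style field name into its schema, element, and optional qualifier. The `MetadataField` type and `parse_field` helper are hypothetical names used only for this example; they are not part of DSpace.

```python
from typing import NamedTuple, Optional


class MetadataField(NamedTuple):
    """Hypothetical container for the parts of a field name."""
    schema: str
    element: str
    qualifier: Optional[str]


def parse_field(name: str) -> MetadataField:
    """Split 'schema.element[.qualifier]' into its parts."""
    parts = name.split(".")
    # A field name needs a schema and an element; the qualifier is optional.
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"not a schema.element[.qualifier] name: {name!r}")
    qualifier = parts[2] if len(parts) == 3 else None
    return MetadataField(parts[0], parts[1], qualifier)


custom = parse_field("cg.subject.ilri")  # custom namespace from the slide
standard = parse_field("dc.title")       # standard Dublin Core, no qualifier
print(custom)
print(standard)
```

Keeping custom fields in their own schema (here `cg`) separates them from standard Dublin Core fields such as `dc.title`.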
  10. Custom metadata displayed on ILRI item page

  11. “Discovery” facets
    • Context-aware metadata summaries
    • Great for content people and users alike
    • Side effect: helps spot metadata inconsistencies!
    • … Open Access, Open access, open Access, etc.
    • DSpace 4+, XMLUI
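The kind of inconsistency that facets reveal can also be checked offline. The following is a minimal sketch, assuming you already have a plain list of values exported from one metadata field; the `find_inconsistent_values` helper and the sample values are hypothetical.

```python
from collections import defaultdict


def find_inconsistent_values(values):
    """Group values that differ only by letter case or extra whitespace."""
    groups = defaultdict(set)
    for value in values:
        # Collapse whitespace and casefold so variant spellings share one key.
        key = " ".join(value.split()).casefold()
        groups[key].add(value)
    # Report only keys that were written in more than one way.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Hypothetical values, modelled on the example in the slide.
sample = ["Open Access", "Open access", "open Access", "Limited Access", "Open Access"]
inconsistent = find_inconsistent_values(sample)
print(inconsistent)
```

Each reported group is a candidate for clean-up to a single agreed spelling.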
  12. Search engine optimization (SEO)
    Help Google Scholar consume your content...
    1. XML sitemaps (see DSpace manual)
    2. Submit sitemap to Google Webmaster Tools to control indexing, see stats, etc.
    3. Single, consistent domain name, ie: cgspace.cgiar.org
    4. Persistent links for resources (“Handle”)
    5. Website speed and HTTPS both a plus
    6. Bing, Yahoo, and Yandex less important
  13. SEO: crawling vs consuming
    • Traditionally search engines basically “stumble” upon your content
    • Using XML sitemaps they can consume it in a structured way
    • Google discontinued the use of OAI for discovering site content in 2008!
    Drinking from the firehose!
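DSpace generates its own sitemaps (see the DSpace manual), so the sketch below is only meant to show what "structured" means here: a sitemap is a flat XML list of URLs with optional modification dates, following the sitemaps.org protocol. The `build_sitemap` helper, the item URL, and the date are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for an iterable of (url, last_modified) pairs."""
    # Serialize the sitemap namespace as the default (prefix-less) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical entry: an item page and the date it last changed.
sitemap_xml = build_sitemap([
    ("https://repository.example.org/handle/10568/67073", "2015-07-28"),
])
print(sitemap_xml)
```

A crawler that reads this file gets every item URL in one pass instead of discovering pages by following links.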
  14. Sitemap view in Google Webmaster Tools

  15. Meteoric rise in Google’s indexes

  16. Importance of persistent links
    • Website addresses change…
    • mahider.ilri.org -> cgspace.cgiar.org
    • But resources stay the same! http://hdl.handle.net/10568/67073
    • “Handle” service from handle.net
    • Everything under prefix 10568 is CGSpace
    • Default DSpace handle prefix is 123456789!
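To make the prefix and suffix structure of a handle concrete, here is a minimal sketch that splits a handle URL and flags the unregistered DSpace default prefix. The `parse_handle` and `uses_default_prefix` helpers are hypothetical names for this example.

```python
from urllib.parse import urlparse

HANDLE_RESOLVER = "hdl.handle.net"
DEFAULT_DSPACE_PREFIX = "123456789"  # placeholder prefix of a fresh DSpace install


def parse_handle(url):
    """Split a handle URL into (prefix, suffix)."""
    parsed = urlparse(url)
    if parsed.netloc != HANDLE_RESOLVER:
        raise ValueError(f"not a {HANDLE_RESOLVER} URL: {url!r}")
    prefix, _, suffix = parsed.path.lstrip("/").partition("/")
    if not prefix or not suffix:
        raise ValueError(f"handle needs both prefix and suffix: {url!r}")
    return prefix, suffix


def uses_default_prefix(url):
    """Return True if the handle still carries the unregistered default prefix."""
    prefix, _ = parse_handle(url)
    return prefix == DEFAULT_DSPACE_PREFIX


print(parse_handle("http://hdl.handle.net/10568/67073"))
print(uses_default_prefix("http://hdl.handle.net/123456789/42"))
```

The prefix identifies the repository (10568 for CGSpace) and the suffix identifies the item, which is why the link survives a change of website address.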
  17. dc.identifier.uri: persistent universal resource identifier

  18. Getting data INTO DSpace
    • Day-to-day submission is manual (by a small army of editors)
    • One-time batch uploads of items from other systems in CSV format (InMagic!)
    • OAI-PMH for metadata only
    • OAI-ORE for metadata + bitstreams (eg, from another DSpace, Sharepoint, etc)
    • SWORD (haven't tried)
    • REST API (DSpace 5+, haven't tried)
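For the one-time CSV route, a rough sketch of preparing legacy records is shown below. It assumes the layout used by DSpace's batch metadata editing (an `id` column containing `+` for new items, a `collection` column holding the target collection handle, one column per metadata field, and `||` between multiple values); this layout is an assumption to verify against the DSpace manual for your version, and it may not be the exact process used for CGSpace. The `records_to_csv` helper, the collection handle, and the sample record are hypothetical.

```python
import csv
import io


def records_to_csv(records, collection_handle, fields):
    """Serialize records (dicts of field -> list of values) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "collection"] + list(fields))
    for record in records:
        # "+" marks a new item (assumed convention, see lead-in).
        row = ["+", collection_handle]
        for field in fields:
            # Multiple values share one cell, separated by "||" (assumed).
            row.append("||".join(record.get(field, [])))
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical record exported from a legacy system.
legacy_records = [
    {
        "dc.title": ["Example report"],
        "dc.contributor.author": ["Doe, Jane", "Roe, Richard"],
    },
]
csv_text = records_to_csv(
    legacy_records, "10568/1", ["dc.title", "dc.contributor.author"]
)
print(csv_text)
```

Using the `csv` module rather than string joins ensures that values containing commas, such as author names, are quoted correctly.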
  19. Getting data OUT OF DSpace
    • REST API for structured JSON or XML
    • OAI-PMH for metadata
    • OAI-ORE for metadata + bitstreams (PDFs, etc)
    • RSS feeds for websites / blogs
    • XML sitemaps for search engines
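As an example of the OAI-PMH route, the sketch below builds a harvesting request URL. The `verb`, `metadataPrefix`, and `from` arguments come from the OAI-PMH protocol; the base URL is a placeholder, so check where your own repository exposes its OAI endpoint. The `build_oai_request` helper is a hypothetical name.

```python
from urllib.parse import urlencode

# Placeholder endpoint, not a real repository.
OAI_BASE_URL = "https://repository.example.org/oai/request"


def build_oai_request(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL, optionally limited to recent changes."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # "from" restricts the harvest to records changed on or after the date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


full_harvest = build_oai_request(OAI_BASE_URL)
incremental = build_oai_request(OAI_BASE_URL, from_date="2015-07-01")
print(full_harvest)
print(incremental)
```

Fetching either URL returns Dublin Core metadata as XML; the optional date allows incremental harvests instead of re-reading everything.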
  20. CCAFS website, powered by Drupal + DSpace APIs

  21. “Latest outputs” on ILRI homepage, via DSpace RSS

  22. “Latest outputs” on project blog, via DSpace RSS

  23. CGSpace technology stack
    • NGINX 1.8 HTTP server: TLS termination, SPDY, redirects, virtual hosts
    • Tomcat 7 servlet engine: runs DSpace, bound to localhost
    • Ubuntu 14.04 GNU/Linux OS: long-term support release, good mix of stable / new
  24. Open source workflow on GitHub: https://github.com/ilri/DSpace

  25. Skills needed in your organization
    Besides content people(!)...
    • Prioritize: Linux systems administration experience (Tomcat, httpd, PostgreSQL, DNS, SSH, git)
    • General: computer science background
    • Web developers a diverse bunch...
    • Java development experience doesn't hurt
  26. Extra considerations
    • Item mapping
    • Maintenance tasks (background batch jobs)
    • Backups of assetstore and PostgreSQL!
    • Altmetrics tracks social media mentions
    • Separate production / development environments
    • CGSpace server is $80/month
    • ~20GB of PDFs, ~8GB of Solr data
  27. Getting help
    • “DSpace Tech” mailing list
    • “dspace” tag on StackOverflow website
    • a.orth@cgiar.org
  28. This presentation has a Creative Commons licence. You are free to re-use or distribute this work for non-commercial purposes, provided credit is given to ILRI.
    better lives through livestock
    ilri.org
    Box 30709, Nairobi 00100, Kenya
    Phone +254 20 422 3000 • Fax +254 20 422 3001 • Email ilri-kenya@cgiar.org
    ILRI is a member of the CGIAR consortium
    ILRI has offices in: Central America • East Africa • South Asia • Southeast and East Asia • Southern Africa • West Africa