
Maximizing the impact of institutional knowledge using DSpace

Presentation for an AIMS@FAO webinar:

http://aims.fao.org/activity/blog/new-webinaraims-maximizing-impact-institutional-knowledge-using-dspace

Recorded webinar:

http://aims.fao.org/capacity-development/webinars/maximizing-impact-institutional-knowledge-using-dspace

This presentation has a Creative Commons licence. You are free to re-use or distribute this work for non-commercial purposes, provided credit is given to ILRI.

Alan Orth

July 28, 2015

Transcript

  1. Maximizing the impact of institutional knowledge using DSpace
    Alan Orth, Nairobi, Kenya, July 28, 2015
    Webinar for AIMS@FAO
  2. Overview
    • Why we use DSpace
    • How we use DSpace
    • Organizational tips for using DSpace
    • Technical tips for DSpace deployments
  3. DSpace helps make information “F.A.I.R.”
    • Free: no subscriptions, “paywalls”, etc.
    • Accessible: is publicly available
    • Indexed: can be found in search engines
    • Reusable: has a permissive license
    Addresses both the moral and legal imperatives… aka the “carrot” and the “stick”.
  4. History of DSpace at ILRI
    • Before: InMagic, physical library
    • 2009: ILRI launches Mahider (“repository” in Amharic)
    • 2010: Other CGIAR research centers and programs join our platform and share hard / soft costs
    • 2011: Rebranded as “CGSpace”
    • 2015: 9 CGIAR centers, ~50,000 items, ~200k hits/month
  5. How we use DSpace
    • Primary location for institutional outputs!
    • (No posting PDFs on the corporate website!)
    • Content people embedded in each department help capture results (presentations, papers, brochures, etc.)
    • Integrate with website and blogs via RSS feeds
    • (Direct ALL traffic to DSpace!)
    • For data sets, videos, etc., we make a metadata-only accession with a link to, e.g., YouTube
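Integrating a website or blog with DSpace over RSS mostly means reading the feed and listing each item's title and link. The Python sketch below assumes an RSS 2.0 feed. The sample XML is illustrative only, as is the feed URL pattern in the comment (`/feed/rss_2.0/<handle>`); check your own DSpace instance for the real feed address.

```python
import xml.etree.ElementTree as ET

# Illustrative RSS 2.0 document (hypothetical titles). A real feed would be
# fetched from the repository, e.g. from an assumed URL pattern such as
# https://cgspace.cgiar.org/feed/rss_2.0/<collection-handle>.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example collection</title>
    <item>
      <title>Livestock and livelihoods brochure</title>
      <link>http://hdl.handle.net/10568/67073</link>
    </item>
    <item>
      <title>Annual report</title>
      <link>http://hdl.handle.net/123456789/1</link>
    </item>
  </channel>
</rss>"""


def feed_items(rss_text: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


if __name__ == "__main__":
    # A blog sidebar could render these as a list of links.
    for title, link in feed_items(SAMPLE_FEED):
        print(f"{title}: {link}")
```

Because the links are Handle URLs rather than page addresses, the website keeps pointing at DSpace even if the repository's hostname changes.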
  6. DSpace hierarchies
    • Communities, sub-communities, and collections
    • Tempting to model after organization hierarchy!
    • (we did)
    • … but organization hierarchies change!
  7. Metadata
    • Standard Dublin Core is available
    • No AGROVOC!
    • You can create custom controlled vocabularies in arbitrary namespaces, e.g. cg.subject.ilri
    • Display custom fields selectively in the XMLUI item list and view pages
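DSpace names metadata fields as schema.element with an optional qualifier. That is how a custom field such as `cg.subject.ilri` can sit alongside standard Dublin Core fields like `dc.title`. The small Python sketch below splits such names into their parts. The `MetadataField` type and `parse_field` helper are hypothetical illustrations and are not DSpace APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetadataField:
    """A qualified metadata field name: schema.element[.qualifier]."""
    schema: str
    element: str
    qualifier: Optional[str] = None

    def __str__(self) -> str:
        parts = [self.schema, self.element]
        if self.qualifier:
            parts.append(self.qualifier)
        return ".".join(parts)


def parse_field(name: str) -> MetadataField:
    """Split 'schema.element[.qualifier]' into its components."""
    parts = name.split(".")
    # Exactly two or three non-empty parts are accepted.
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"not a schema.element[.qualifier] name: {name!r}")
    return MetadataField(*parts)


print(parse_field("cg.subject.ilri"))  # custom namespace "cg"
print(parse_field("dc.title"))         # standard Dublin Core, no qualifier
```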
  8. “Discovery” facets
    • Context-aware metadata summaries
    • Great for content people and users alike
    • Side effect: helps spot metadata inconsistencies!
    • … Open Access, Open access, open Access, etc.
    • DSpace 4+, XMLUI
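The inconsistencies that facets expose, such as “Open Access” versus “open Access”, can also be found in bulk. The Python sketch below groups values that differ only in case or whitespace. It assumes you already have the field values as a list of strings, for example exported to CSV. It is not a DSpace feature.

```python
from collections import defaultdict


def case_variants(values):
    """Group metadata values that differ only in case or whitespace.

    Returns a dict that maps a normalized key to the sorted list of distinct
    spellings. Only keys with more than one spelling are kept.
    """
    groups = defaultdict(set)
    for value in values:
        # Collapse runs of whitespace and fold case to build the grouping key.
        key = " ".join(value.split()).casefold()
        groups[key].add(value)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}


statuses = ["Open Access", "Open access", "open Access", "Limited Access", "Open Access"]
for key, spellings in case_variants(statuses).items():
    print(f"{key}: {spellings}")
```

Each reported group is a candidate for a batch edit that settles on one canonical spelling.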
  9. Search engine optimization (SEO)
    Help Google Scholar consume your content...
      1. XML sitemaps (see DSpace manual)
      2. Submit sitemap to Google Webmaster Tools to control indexing, see stats, etc.
      3. Single, consistent domain name, i.e. cgspace.cgiar.org
      4. Persistent links for resources (“Handle”)
      5. Website speed and HTTPS are both a plus
      6. Bing, Yahoo, and Yandex are less important
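DSpace can generate its own sitemaps (see the DSpace manual). The Python sketch below only shows what a minimal sitemap in the sitemaps.org 0.9 format looks like when built from item URLs on a single, consistent domain. The `build_sitemap` helper and the example URL are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemaps.org <urlset> document listing each URL in a <loc>."""
    # Serialize the sitemap namespace as the default xmlns instead of "ns0:".
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap = build_sitemap(["https://cgspace.cgiar.org/handle/10568/67073"])
print(sitemap)
```

A real repository would list every item and would usually split large lists across several files referenced from a sitemap index. This is one reason to rely on DSpace's built-in generator.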
  10. SEO: crawling vs consuming
    • Traditionally, search engines basically “stumble” upon your content
    • Using XML sitemaps, they can consume it in a structured way
    • Google discontinued the use of OAI for discovering site content in 2008!
    Drinking from the firehose!
  11. Importance of persistent links
    • Website addresses change…
    • mahider.ilri.org -> cgspace.cgiar.org
    • But resources stay the same! http://hdl.handle.net/10568/67073
    • “Handle” service from handle.net
    • Everything under prefix 10568 is CGSpace
    • Default DSpace handle prefix is 123456789!
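A Handle has two parts, prefix/suffix. It stays the same when the website's hostname changes, so persistent links should be built from the Handle through the global resolver. The Python sketch below uses hypothetical helper names. Rejecting the default `123456789` prefix is a design choice made here and is not DSpace behaviour. It reflects the fact that the default is a placeholder rather than a registered prefix.

```python
DEFAULT_PREFIX = "123456789"           # DSpace's out-of-the-box placeholder prefix
RESOLVER = "http://hdl.handle.net/"    # global Handle resolver


def split_handle(handle: str) -> tuple[str, str]:
    """Split a handle such as '10568/67073' into (prefix, suffix)."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    return prefix, suffix


def resolver_url(handle: str) -> str:
    """Build a persistent link that survives hostname changes."""
    prefix, suffix = split_handle(handle)
    if prefix == DEFAULT_PREFIX:
        raise ValueError("prefix 123456789 is a placeholder, not a registered prefix")
    return f"{RESOLVER}{prefix}/{suffix}"


print(resolver_url("10568/67073"))
```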
  12. Getting data INTO DSpace
    • Day-to-day submission is manual (by a small army of editors)
    • One-time batch uploads of items from other systems in CSV format (InMagic!)
    • OAI-PMH for metadata only
    • OAI-ORE for metadata + bitstreams (e.g. from another DSpace, SharePoint, etc.)
    • SWORD (haven't tried)
    • REST API (DSpace 5+, haven't tried)
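For a one-time CSV upload, legacy records first have to be mapped onto DSpace metadata columns. The Python sketch below assumes the CSV layout used by DSpace's batch metadata editing. In that layout, the `id` column is `+` for new items, the `collection` column holds the target collection's handle, each metadata field has its own column, and multiple values are joined with `||`. Check this layout against the manual for your DSpace version. The records and the collection handle are hypothetical.

```python
import csv
import io

# Hypothetical legacy records, e.g. exported from InMagic.
records = [
    {"title": "Dairy value chains in East Africa", "authors": ["Doe, J.", "Roe, R."], "year": "2012"},
    {"title": "Feed assessment tool", "authors": ["Smith, A."], "year": "2013"},
]

COLLECTION = "123456789/2"  # hypothetical target collection handle


def to_dspace_csv(rows, collection):
    """Render records as a batch-import CSV (column layout assumed, see above)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "collection", "dc.title", "dc.contributor.author", "dc.date.issued"])
    for row in rows:
        # "+" marks a new item; repeated values share one cell, joined by "||".
        writer.writerow(["+", collection, row["title"], "||".join(row["authors"]), row["year"]])
    return out.getvalue()


print(to_dspace_csv(records, COLLECTION))
```

The csv module quotes cells that contain commas, such as "Doe, J.||Roe, R.", so author names in "Last, First" form survive the round trip.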
  13. Getting data OUT OF DSpace
    • REST API for structured JSON or XML
    • OAI-PMH for metadata
    • OAI-ORE for metadata + bitstreams (PDFs, etc.)
    • RSS feeds for websites / blogs
    • XML sitemaps for search engines
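Harvesting metadata over OAI-PMH is a paging loop. The first `ListRecords` request names a `metadataPrefix`. Each following request passes only the `resumptionToken` from the previous response, and an empty token marks the last page. The Python sketch below builds the request URLs and extracts the token. The endpoint path `/oai/request` is an assumption about a typical DSpace install, and the sample response is trimmed and illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "https://cgspace.cgiar.org/oai/request"  # assumed endpoint path


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH, so follow-up
    requests must not repeat metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumptionToken from a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# Trimmed, illustrative response: records omitted, token kept.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken completeListSize="2">page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(BASE_URL))
print(list_records_url(BASE_URL, next_token(SAMPLE_PAGE)))
```

A full harvester would fetch each URL in turn, parse the records in every page, and stop when `next_token` returns `None`.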
  14. CGSpace technology stack
    - NGINX 1.8 HTTP server
      - TLS termination, SPDY, redirects, virtual hosts
    - Tomcat 7 servlet engine
      - runs DSpace, bound to localhost
    - Ubuntu 14.04 GNU/Linux OS
      - long-term support release, good mix of stable / new
  15. Skills needed in your organization
    Besides content people(!)...
    • Prioritize: Linux systems administration experience (Tomcat, httpd, PostgreSQL, DNS, SSH, git)
    • General: computer science background
    • Web developers are a diverse bunch...
    • Java development experience doesn't hurt
  16. Extra considerations
    • Item mapping
    • Maintenance tasks (background batch jobs)
    • Backups of assetstore and PostgreSQL!
    • Altmetrics tracks social media mentions
    • Separate production / development environments
    • CGSpace server is $80/month
    • ~20GB of PDFs, ~8GB of Solr data
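A DSpace backup has two parts that must be protected together: the assetstore, which holds the bitstream files, and the PostgreSQL database. The Python sketch below only builds the backup commands and runs nothing. The database name `dspace`, the assetstore path, and the destination directory are hypothetical. It also assumes `pg_dump` and `tar` are installed. Choose a dump format that matches how you plan to restore.

```python
import shlex
from datetime import date
from pathlib import PurePosixPath


def backup_commands(db_name: str, assetstore: str, dest_dir: str, day: date) -> list[list[str]]:
    """Return argv lists for a PostgreSQL dump and an assetstore tarball.

    Nothing is executed here. A scheduled job could pass each list to
    subprocess.run(argv, check=True) once the paths are adapted.
    """
    stamp = day.isoformat()
    dest = PurePosixPath(dest_dir)
    store = PurePosixPath(assetstore)
    return [
        # Custom-format dump, restorable with pg_restore.
        ["pg_dump", "--format=custom", f"--file={dest / f'dspace-{stamp}.dump'}", db_name],
        # Archive the assetstore directory relative to its parent.
        ["tar", "-czf", str(dest / f"assetstore-{stamp}.tar.gz"), "-C", str(store.parent), store.name],
    ]


for argv in backup_commands("dspace", "/home/dspace/assetstore", "/backups", date(2015, 7, 28)):
    print(shlex.join(argv))
```

Copying the resulting files off the server, and testing a restore on the development environment, matters as much as creating them.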
  17. This presentation has a Creative Commons licence. You are free to re-use or distribute this work for non-commercial purposes, provided credit is given to ILRI.
    better lives through livestock
    ilri.org
    Box 30709, Nairobi 00100, Kenya
    Phone +254 20 422 3000
    Fax +254 20 422 3001
    ILRI is a member of the CGIAR consortium
    ILRI has offices in: Central America • East Africa • South Asia • Southeast and East Asia • Southern Africa • West Africa