Reference rot in Concordia University’s Spectrum Research Repository
Presented by Kathleen Botter, Systems Librarian at Concordia University Library, at InfoNexus 2016 in Montreal, Canada. Visit http://info-nexus.org/ for more detail.
1 in 5 Articles Suffers from Reference Rot
- 3.5 million STM articles from 1997-2012 (arXiv, Elsevier, PubMed Central)
- 1.8 million articles with open web references: 7 in 10 of these articles suffer from reference rot
Klein M., Van de Sompel H., Sanderson R., Shankar H., Balakireva L., Zhou K., et al. (2014). Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLoS ONE 9(12), e115253. doi:10.1371/journal.pone.0115253
Link and Reference Rot in Legal Citations
- 3 Harvard law and policy publications (~1996-2012): 70% of links suffer reference rot
- US Supreme Court opinions (from CourtListener): 50% of links suffer reference rot
Zittrain J., Albert K., & Lessig L. (2014). Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations. Legal Information Management 14(2), 88-99. doi:10.1017/S1472669614000255
- … exhibit characteristics of reference rot?
- Do some disciplines exhibit more reference rot than others?
- Is there a relationship between the age of a thesis and reference rot?
- All PhD and Master's theses submitted as ETDs since Spring 2011, in PDF/A format
- 605 of 654 PhD theses analysed (Spring 2011 to Spring 2015)
- Excluded: 30 theses under embargo or restriction + 19 with PDF conversion problems
… to XML or TXT
3. Use a regular expression on each thesis to find links
4. Manually verify (and fix) the links
5. Use the cURL utility to get the HTTP status code for each link
Output: original URL, final (effective) URL, status code
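Steps 3 and 5 might be scripted along the following lines. This is a minimal sketch, not the study's code: the regular expression is a hypothetical, deliberately simple pattern (it will not rejoin links broken across lines, which is one reason step 4 stays manual), and the function names are illustrative. The curl options used (-s, -L, -o, and -w with the %{url_effective} and %{http_code} variables) are standard curl flags; curl must be installed, and /dev/null assumes a Unix-like system.

```python
import re
import subprocess

# Hypothetical pattern: "http://", "https://" or "www." followed by any run of
# characters that are not whitespace, angle brackets or double quotes.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"]+", re.IGNORECASE)

def find_links(text):
    """Step 3: return candidate links in order of appearance, with trailing
    sentence punctuation removed."""
    return [match.rstrip(".,;:)]") for match in URL_PATTERN.findall(text)]

def curl_command(url):
    """Step 5: follow redirects (-L), discard the body (-o /dev/null), stay
    quiet (-s) and print "<effective URL> <status code>" (-w)."""
    return ["curl", "-s", "-L", "-o", "/dev/null",
            "-w", "%{url_effective} %{http_code}", url]

def parse_curl_output(original_url, output):
    """Split curl's -w output into (original URL, final URL, status code).
    curl reports 000 when no HTTP response was received."""
    final_url, status = output.strip().rsplit(" ", 1)
    return original_url, final_url, int(status)

def check_link(url, timeout=60):
    """Run curl for one link and return the three output fields."""
    result = subprocess.run(curl_command(url), capture_output=True,
                            text=True, timeout=timeout)
    return parse_curl_output(url, result.stdout)

if __name__ == "__main__":
    sample = "See http://www.gfk.com/404/, and www.example.org/page."
    print(find_links(sample))  # ['http://www.gfk.com/404/', 'www.example.org/page']
```

Keeping the command builder and the output parser separate from the subprocess call means the parsing can be checked without network access.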
… with HTTP status code 200
- Manually visit the link
- Use the "last accessed" statement in the reference, or the thesis publication date, and look for mementos close to that date
- Spot custom 404 pages by their final (effective) URL, e.g.:
  http://www.gfkrt.com/imperia/md/content/rt-france/cp_gfk_march___de_la_bd_-_39eme___dition_de_la_fibd.pdf
  resolves to http://www.gfk.com/404/ with status code 200
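The manual check could be supported by a small script that flags likely custom 404s and looks up a memento near the reference date. A sketch under stated assumptions: the set of error-page path segments is a hypothetical heuristic, so flagged links still need a human look. The memento lookup uses the Internet Archive's Wayback Availability API (https://archive.org/wayback/available, with url and timestamp parameters), which reports the archived snapshot closest to the requested date. The network request itself is left out so the functions can be tried offline.

```python
import json
from urllib.parse import urlencode, urlparse

# Hypothetical heuristic: path segments that usually mark an error page.
ERROR_SEGMENTS = {"404", "404.html", "404.htm", "error", "not-found", "notfound"}

def looks_like_custom_404(final_url, status):
    """True when the server answered 200 but the final URL is an error-like path."""
    if status != 200:
        return False
    segments = [s for s in urlparse(final_url).path.lower().split("/") if s]
    return any(s in ERROR_SEGMENTS for s in segments)

def wayback_query_url(url, date):
    """Build an availability query; date is YYYYMMDD, taken from the reference's
    last-accessed statement or, failing that, the thesis publication date."""
    return "https://archive.org/wayback/available?" + urlencode(
        {"url": url, "timestamp": date})

def closest_memento(response_text):
    """Extract (memento URL, timestamp) from the API's JSON reply,
    or None when no snapshot is available."""
    reply = json.loads(response_text)
    closest = reply.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None

if __name__ == "__main__":
    print(looks_like_custom_404("http://www.gfk.com/404/", 200))  # True
    print(wayback_query_url("http://www.gfk.com/", "20130101"))
```

A memento found this way only narrows the search; deciding whether the live page still matches what the thesis cited remains a manual judgement.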
… can’t be converted
- Blank spaces, ~, _, and line breaks inside links
- Inconsistent linking: "http" sometimes present, sometimes not
- Ellipses (…) in long links
- Less obvious mistakes: missing ".edu", missing "//"
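Problems like these could be triaged before manual verification. The rules below are a hypothetical sketch modelled on the list above, not the procedure used in the study: they only flag links for a human to compare against the PDF and never try to repair them, since rejoining a split link or restoring a missing ".edu" is a guess. The function names are illustrative.

```python
import re
from urllib.parse import urlparse

def with_scheme(link):
    """Prepend http:// when a link has no http/https scheme at all, so links
    written with and without it are checked the same way."""
    return link if re.match(r"https?:", link, re.IGNORECASE) else "http://" + link

def review_flags(link):
    """List reasons an extracted link should be checked against the PDF by hand."""
    flags = []
    if re.search(r"\s", link):
        flags.append("whitespace or line break")
    if "~" in link or "_" in link:
        flags.append("contains ~ or _")
    if "…" in link or "..." in link:
        flags.append("ellipsis (truncated link)")
    if re.match(r"https?:(?!//)", link, re.IGNORECASE):
        flags.append("missing //")
    elif not re.match(r"https?://", link, re.IGNORECASE):
        flags.append("missing http://")
    # A host with no dot after an optional "www." suggests a dropped suffix.
    host = urlparse(with_scheme(link)).hostname or ""
    bare_host = host[4:] if host.startswith("www.") else host
    if bare_host and "." not in bare_host:
        flags.append("host has no domain suffix (e.g. missing .edu)")
    return flags

if __name__ == "__main__":
    for link in ["http://www.example.org/page", "www.concordia/library",
                 "http:www.example.org", "http://www.example.org/~user/long…"]:
        print(link, review_flags(link))
```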
Theses
- 25.6% of theses are healthy (155)
- 49.6% of theses suffer reference rot (300)
Links
- 8,046 of 10,503 links are healthy
- ~23.4% of links are afflicted by link rot
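The reported percentages can be re-derived from the counts given in the talk. This assumes the thesis percentages are taken over the 605 analysed theses and the link percentage over all 10,503 links; the talk does not state the denominators explicitly, but these reproduce the reported figures.

```python
def pct(part, whole):
    """Share of whole, as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Counts reported in the talk.
total_phd_theses = 654
excluded = 30 + 19            # embargoes/restrictions + PDF conversion problems
analysed = total_phd_theses - excluded

healthy_theses, rotten_theses = 155, 300
healthy_links, all_links = 8046, 10503

if __name__ == "__main__":
    print(analysed)                                    # 605
    print(pct(healthy_theses, analysed))               # 25.6
    print(pct(rotten_theses, analysed))                # 49.6
    print(pct(all_links - healthy_links, all_links))   # 23.4
```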
- Internet Archive’s Wayback Machine
- Perma.cc: law-specific
- Browser plugins that create mementos automatically
- Citation style for online resources
- Hiberlink solution for open web resources: <a href="http://hiberlink.org" data-versionurl="http://archive.today/CT6mt" data-versiondate="2014-08-12">
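A citation tool could emit the Hiberlink-style markup automatically. This sketch only builds the string: the attribute names come from the example above, while the function names and the link text "Hiberlink" are illustrative assumptions. As an alternative memento source it uses the Wayback Machine address form https://web.archive.org/web/<timestamp>/<url>, which redirects to the capture closest to the timestamp.

```python
from datetime import date
from html import escape

def wayback_url(url, timestamp):
    """Wayback Machine address of the capture of url closest to timestamp
    (YYYYMMDD, or the full 14-digit YYYYMMDDhhmmss form)."""
    return "https://web.archive.org/web/{}/{}".format(timestamp, url)

def robust_link(href, version_url, version_date, text):
    """Anchor in the Hiberlink style: the original link in href, a memento in
    data-versionurl and the capture date (YYYY-MM-DD) in data-versiondate."""
    checked_date = date.fromisoformat(version_date).isoformat()  # rejects bad dates
    return '<a href="{}" data-versionurl="{}" data-versiondate="{}">{}</a>'.format(
        escape(href), escape(version_url), checked_date, escape(text))

if __name__ == "__main__":
    print(robust_link("http://hiberlink.org", "http://archive.today/CT6mt",
                      "2014-08-12", "Hiberlink"))
```

Escaping the attribute values matters because archived URLs often carry query strings with "&", which must appear as "&amp;" inside HTML attributes.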