Todd Vision, University of North Carolina at Chapel Hill
http://orcid.org/0000-0002-6133-2581, @tjvision
17-Jul-2014, Science Bootcamp
You may reuse any of the original content in these slides as you wish, provided you attribute the source.
Image: nic221, CC-BY-NC-SA, http://www.flickr.com/photos/nic221/391536867/
research data
o How to achieve archiving of long-tail data associated with the scientific literature
o How to make data archives interoperate with the rest of the scholarly communications infrastructure
o How to educate yourself and others about the data landscape
o I will use Dryad (and DataONE) as models
and their supplements
o Datasets
o Software (including scripts, models, workflows)
o Presentations (slides, video)
o Grant proposals
o Reviews
o Other digital content (e.g. wikis, blogs, ontologies)
o Materials, reagents, equipment
o Lab notebooks
o etc.
Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.
PDB, ClinVar, etc.; long-tail data (e.g. much statistical data). Graphic from B. Heidorn.
“Most of the bytes are at the high end, but most of the datasets are at the low end” – Jim Gray
data from 141 articles in American Psychological Association journals. “6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” Only 27% of authors complied.
• Wicherts JM, Borsboom D, Kats J, Molenaar D (2006) The poor availability of psychological research data for reanalysis. American Psychologist 61: 726-728.
• Wicherts JM, Bakker M, Molenaar D (2011) Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results. PLoS ONE 6(11): e26828.
(2009) Keeping Research Data Safe 2
Direct
o Verification of published research
o Preserving accessibility to data
o Allowing reuse and repurposing of data
o Discoverability of data
Indirect (costs avoided)
o Redundant data collection
o Inefficient legacy data curation
o Burden of sharing-upon-request
o Opportunity cost of science not done
Near term
o Protection against personnel turnover
o Availability for review and validation
Long term
o Secure long-term stewardship
o Increased impact per publication
Private
o Increased citations
o New collaborations
o New research opportunities
o Fulfilling funding mandates
Public
o More efficient use of research dollars
o Public trust in science
o Educational opportunities
o Improved methodologies
o More informed policy
for sharing research data. Presented at ELPUB2008, Nature Precedings hdl:10101/npre.2008.1700.1
Figure: Data policies among bioscience journals (n=70); panel labels IF=3.6, IF=4.5, IF=6.0.
products of the scientific enterprise, and they should be preserved and usable for decades in the future. As a condition for publication, data supporting the results in the article should be deposited in an appropriate public archive. Authors may elect to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information. http://datadryad.org/pages/jdap
o Data packages: 5,469
o Data files: 16,436
o Journals: 329
o Authors: 19,857
o Downloads: 490,448
28-May-2014, Dryad-Dataverse Community Meeting – Cambridge, MA
A. Embargo selections of Dryad data authors for the 10,108 files in Dryad deposited from inception to September 20, 2013. Data include only datasets related to articles published in journals for which the authors had the option of selecting an embargo.
B. Long-term embargoes (>1 year) by the journal that granted them.
Data: Vision TJ, Scherle R, Mannheimer S (2013) Embargo selections of Dryad data authors. FigShare. http://doi.org/10.6084/m9.figshare.805946
Article: Roche DG, Lanfear R, Binning SA, Haff TM, et al. (2014) Troubleshooting Public Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e1001779
A (2012) Data from: Monsters are people too. Dryad Digital Repository. doi:10.5061/dryad.4rk06 Levy J, Foulsham T, Kingstone A (2012) Monsters are people too. Biology Letters 9(1): 20120850. doi:10.1098/rsbl.2012.0850 Beholder image from Dungeons & Dragons Monster Manual via Discover Magazine blog
that we need directories of them:
o http://re3data.org
o http://DataBib.org
These repositories vary along many dimensions:
o Datatype focus
o Community focus
o Allowed file sizes
o Curation policies
o Data access policies
o Funding model
research networks, universities & libraries, societies, journals
How to enable the different stakeholders to bring data into the fold of scholarly communication?