Dr. Todd Vision - Bringing Data into the Fold of Scholarly Communication

July 17, 2014 at Science Boot Camp Southeast for Librarians, Raleigh NC

Transcript

  1. Bringing data into the fold of scholarly communication

    Todd Vision, University of North Carolina at Chapel Hill
    http://orcid.org/0000-0002-6133-2581, @tjvision
    17-Jul-2014, Science Bootcamp
    You may reuse any of the original content in these slides as you wish, provided you attribute the source.
    Image: CC-BY-NC-SA nic221, http://www.flickr.com/photos/nic221/391536867/
  2. Traditional roles of a journal

    o Registration
    o Certification
    o Dissemination
    o Archiving
  3. Some aspects of scholarly communication have adapted more quickly than others to the internet

    Image: New York Public Library, CC-BY-NC-SA 2006 Elena Romera, http://www.flickr.com/photos/elenaromera/353826561/
  4. Outline

    o Challenges of availability for different kinds of research data
    o How to achieve archiving of long-tail data associated with the scientific literature
    o How to make data archives interoperate with the rest of the scholarly communications infrastructure
    o How to educate yourself and others about the data landscape
    o I will use Dryad (and DataONE) as models
  5. What are research outputs?

    o Traditional publications (articles, monographs) and their supplements
    o Datasets
    o Software (including scripts, models, workflows)
    o Presentations (slides, video)
    o Grant proposals
    o Reviews
    o Other digital content (e.g. wikis, blogs, ontologies)
    o Materials, reagents, equipment
    o Lab notebooks
    o etc…
  6. Public Participation in Scientific Research Conference

    4-5 August 2012 in Portland, Oregon, USA, prior to the Ecological Society of America meeting (6-10 Aug.): http://www.birds.cornell.edu/citscitoolkit/conference/2012
  7. Ease of access vs. importance

    Source: Publishing Research Consortium, http://publishingresearch.net (n=3824)
  8. The long tail of orphan data in “small science”

    Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.
  9. Over 1M research articles published annually

    Nature 480, 426–429 (2011) http://doi.org/10.1038/480426a
  10. What happens to all the data underlying the millions of articles published every year?
  11. [Figure: data volume vs. intellectual complexity]

    Labels: Genbank, PDB, ClinVar, etc.; long tail data (e.g. much statistical data)
    Graphic from B. Heidorn
  12. [Figure: data volume vs. intellectual complexity]

    Labels: Genbank, PDB, ClinVar, etc.; long tail data (e.g. much statistical data)
    Graphic from B. Heidorn
    “Most of the bytes are at the high end, but most of the datasets are at the low end” – Jim Gray
  13. As-needed data sharing policies

    Wicherts and colleagues requested data from 141 articles in American Psychological Association journals.
    “6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” only 27% of authors complied.

    o Wicherts JM, Borsboom D, Kats J, Molenaar D (2006) The poor availability of psychological research data for reanalysis. American Psychologist 61: 726-728.
    o Wicherts JM, Bakker M, Molenaar D (2011) Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results. PLoS ONE 6(11): e26828.
  14. Quantifying data loss

    Vines TH et al. (2013) Current Biology. doi:10.1016/j.cub.2013.11.014
  15. Outline

    o Challenges of availability for different kinds of research data
    o How to achieve archiving of long-tail data associated with the scientific literature
    o How to make data archives interoperate with the rest of the scholarly communications infrastructure
    o How to educate yourself and others about the data landscape
  16. Why encourage data archiving at the time of publication

    [Figure: the data life cycle (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze) shown against the stages of research (Ideas, Proposal writing, Research, Publication)]
  17. Benefits of archiving data

    Modified from Beagrie et al. (2009) Keeping Research Data Safe 2

    Direct
    o Verification of published research
    o Preserving accessibility to data
    o Allowing reuse and repurposing of data
    o Discoverability of data

    Indirect (costs avoided)
    o Redundant data collection
    o Inefficient legacy data curation
    o Burden of sharing-upon-request
    o Opportunity cost of science not done

    Near term
    o Protection against personnel turnover
    o Availability for review and validation

    Long term
    o Secure long-term stewardship
    o Increased impact per publication

    Private
    o Increased citations
    o New collaborations
    o New research opportunities
    o Fulfilling funding mandates

    Public
    o More efficient use of research dollars
    o Public trust in science
    o Educational opportunities
    o Improved methodologies
    o More informed policy
  18. Open data citation advantage

    Piwowar and Vision (2013) doi:10.7717/peerj.175
  19. Data policies among bioscience journals

    [Figure labels: n=70; IF=3.6, IF=4.5, IF=6.0]
    Piwowar HA, Chapman WW (2008) A review of journal policies for sharing research data. Presented at ELPUB2008. Nature Precedings hdl:10101/npre.2008.1700.1
  20. Joint Data Archiving Policy for journals

    Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future.
    As a condition for publication, data supporting the results in the article should be deposited in an appropriate public archive.
    Authors may elect to embargo access to the data for a period up to a year after publication.
    Exceptions may be granted at the discretion of the editor, especially for sensitive information.
    http://datadryad.org/pages/jdap
  21. Content and usage in Dryad

    As of mid-May:
    o Data packages: 5,469
    o Data files: 16,436
    o Journals: 329
    o Authors: 19,857
    o Downloads: 490,448
    Slide from: 28-May-2014, Dryad-Dataverse Community Meeting, Cambridge, MA
  22. Effects of JDAP since 2011

    Magee et al. (2014) Dawn of open access to phylogenetic data. arXiv:1405.6623v1
  23. Embargoes are the exception, not the rule

    A. Embargo selections of Dryad data authors for the 10,108 files in Dryad deposited from inception to September 20, 2013. Data include only datasets related to articles published in journals for which the authors had the option of selecting an embargo.
    B. Long-term embargoes (>1 year) by journal that granted them.

    Data: Vision TJ, Scherle R, Mannheimer S (2013) Embargo selections of Dryad data authors. FigShare. http://doi.org/10.6084/m9.figshare.805946
    Article: Roche DG, Lanfear R, Binning SA, Haff TM, et al. (2014) Troubleshooting Public Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e1001779
  24. It is not all tabular data I
  25. Even data on monsters

    o Levy J, Foulsham T, Kingstone A (2012) Data from: Monsters are people too. Dryad Digital Repository. doi:10.5061/dryad.4rk06
    o Levy J, Foulsham T, Kingstone A (2012) Monsters are people too. Biology Letters 9(1): 20120850. doi:10.1098/rsbl.2012.0850
    Beholder image from Dungeons & Dragons Monster Manual via Discover Magazine blog
  26. Reusability requires timely documentation

    [Figure: information content vs. time. Labels: Time of publication; Specific details; General details; Accident; Retirement or career change; Death] (Michener et al. 1997)
  27. Outline

    o Challenges of availability for different kinds of research data
    o How to achieve archiving of long-tail data associated with the scientific literature
    o How to make data archives interoperate with the rest of the scholarly communications infrastructure
    o How to educate yourself and others about the data landscape
  28. Outline

    o Challenges of availability for different kinds of research data
    o How to achieve archiving of long-tail data associated with the scientific literature
    o How to make data archives interoperate with the rest of the scholarly communications infrastructure
    o How to educate yourself and others about the data landscape
  29. Data archiving landscape

    There are so many data repositories that we need directories of them:
    o http://re3data.org
    o http://DataBib.org

    These repositories vary along many dimensions:
    o Datatype focus
    o Community focus
    o Allowed file sizes
    o Curation policies
    o Data access policies
    o Funding model
  30. How to enable the different stakeholders to bring data into the fold of scholarly communication?

    Image: CC-BY-SA Hyougushi, flickr
  31. How to enable the different stakeholders to bring data into the fold of scholarly communication?

    o Publishers
    o Educators & students
    o Researchers
    o Funders
    o Research networks
    o Universities & libraries
    o Societies
    o Journals