Research Data By and Large (W. Horstmann, LizaRDs - Research Data Management (RDM) Workshop, Zürich, Switzerland, 15.12.2014)

SUB Göttingen

July 06, 2015

Transcript

  1. [Occasion of the presentation] Research Data By and Large

    Wolfram Horstmann
  2. Brave new world of research data

  3. Research data stakeholders

    Diagram: Data; stakeholders: Governments, Research Institutions, Funders, Publishers, Researchers
  4. TODAY

    {1} RESEARCH DATA BIG AND SMALL
    {2} RDM POLICIES
    {3} UNIVERSITY EXAMPLES
    {4} JOURNALS
    {5} CONCLUSION
  5. {1} RESEARCH DATA BIG AND SMALL

  6. “Big data” is all the rage

  7. Larger parts of research use small data

    The 2011 survey by Science found that 48.3% of respondents were working with datasets smaller than 1 GB, and over half of those polled stored their data only in their laboratories.
    Science 11 February 2011: Vol. 331 no. 6018 pp. 692-693 DOI: 10.1126/science.331.6018.692
    BUT
    http://muse.jhu.edu/journals/library_trends/v057/57.2.heidorn.pdf
    Because there is only a tiny fraction of large projects and a loooooooooooooong tail of small projects.
    Can you see the curve?
  8. “Long-Tail” as in Economics

    Chris Anderson (Editor in Chief), Wired, Issue 12.10, October 2004
    http://www.wired.com/wired/images.html?issue=12.10&topic=tail&img=2
  9. “Long-Tail” as in Research Data

    P. Bryan Heidorn (LIS U Arizona) in Library Trends 57/2, Fall 2008
    http://muse.jhu.edu/journals/library_trends/v057/57.2.heidorn.pdf
    • … While great care is frequently devoted to the collection, preservation and reuse of data on very large projects, relatively little attention is given to the data that is being generated by the majority of scientists.
    • … There may only be a few scientists worldwide that would want to see a particular boutique data set but there are many thousands of these data sets.
    • … The long tail is a breeding ground for new ideas and never before attempted science.
    • … The challenge for science policy is to develop institutions and practices such as institutional repositories, which make this data useful for society.
  10. Big Data, Long-Tail Data

    • “Disks in your drawer; server in lab basement”
    • Long Tail Data exist across all disciplines
    Head                          Tail
    Homogeneous                   Heterogeneous
    Large                         Small
    Common standards              Unique standards
    Integrated                    Not integrated
    Central curation              Individual curation
    Disciplinary repositories     Institutional, general or no repositories
    Adapted from: Shedding Light on the Dark Data in the Long Tail of Science by P. Bryan Heidorn, 2008
  11. Heterogeneous!

    • A review by Cornell University of over 200 data “packages” (files related to arXiv papers) deposited into the Cornell Data Conservancy found 42 different file extensions among 1837 files across six disciplines.
      http://blogs.cornell.edu/dsps/2013/06/14/arxiv-data-conservancy-pilot/
    • The Dryad Repository, a curated, general-purpose repository that collects and provides access to data underlying scientific publications, reports a huge diversity of formats, including Excel, CSV, images, video, audio, HTML and XML, as well as “many uncommon and annoying formats”. The average size of the data packages it collects is ~50 MB.
      http://wiki.datadryad.org/wg/dryad/images/b/b7/2013MayVision.pdf
    • According to the European Commission (EC) document Research Data e-Infrastructures: Framework for Action in H2020, “diversity is likely to remain a dominant feature of research data – diversity of formats, types, vocabularies, and computational requirements – but also of the people and communities that generate and use the data.”
      http://cordis.europa.eu/fp7/ict/e-infrastructure/docs/framework-for-action-in-h2020_en.pdf
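A format census like the Cornell count above can be approximated in a few lines. The sketch below is a hypothetical Python illustration, not the method Cornell or Dryad used (the slide does not describe their tooling): it counts distinct file extensions, case-insensitively, across the files of one data package. The function names `files_under`, `extension_profile` and `summarize` are invented for this example, and a file extension is only a rough proxy for the actual data format.

```python
from collections import Counter
from pathlib import Path


def files_under(root):
    """Return every regular file below `root`, searching subdirectories too."""
    return [p for p in Path(root).rglob("*") if p.is_file()]


def extension_profile(paths):
    """Count file extensions case-insensitively.

    Only the last suffix is used (archive.tar.gz -> ".gz");
    files without an extension are counted under "".
    """
    return Counter(Path(p).suffix.lower() for p in paths)


def summarize(paths):
    """Return the number of files and the number of distinct extensions."""
    counts = extension_profile(paths)
    return {"files": sum(counts.values()), "distinct_extensions": len(counts)}


if __name__ == "__main__":
    # Hypothetical file names standing in for one data package.
    package = ["data.csv", "results.XLSX", "fig1.png", "fig2.png", "README", "run.log"]
    print(summarize(package))  # {'files': 6, 'distinct_extensions': 5}
    # For a deposit on disk (hypothetical path): summarize(files_under("path/to/package"))
```

Lower-casing merges `.CSV` and `.csv`, which is usually what a format census wants; deciding whether `.xls` and `.xlsx` count as one format would need a curator's judgement rather than string matching.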
  12. Institutional, domain or no repositories

    Science 11 February 2011: Vol. 331 no. 6018 pp. 692-693 DOI: 10.1126/science.331.6018.692
  13. Some of the challenges

    Data quality
      - appraise and present data as a scientific / institutional / societal asset
      - push standards for metadata and technology across disciplines
    Discoverability
      - increase discoverability in diverse repositories
    Incentives
      - show researchers how easy and beneficial it is to deposit data
      - ask funders and institutions about policies
    Business case
      - show problems of irreproducibility, duplicated research & innovation loss
  14. Long Tail of Research Data Interest Group

    • Accepted as an RDA Interest Group in Summer 2013
    • Over 90 members from around the world
    Objectives
    • To better understand the long tail
    • To address challenges involved in managing diverse datasets
    • To share and develop practices for managing diverse data
    • To work towards greater interoperability across repositories
    “Thanks for the slides, Kathleen!” (Kathleen Shearer, COAR Executive Director and Co-Chair of the RDA IG)
  15. {2} RDM POLICIES

  16. Project Policy: Human Genome Project 1996
  17. Funder Policy: DFG 1998

    Recommendation 7: Primary data as the basis for publications shall be securely stored for ten years in a durable form in the institution of their origin.
  18. Project Policy: Tools 2001

  19. Funder Policy: US NIH 2002

  20. Journal Policy, e.g. NAR 2003 (?)

  21. Funder Policy: CRCs 2010ff

  22. Funder Policy: EPSRC 2011

  23. Funder Policy: many… 2012

  24. Institutional Policy: e.g. Edinburgh 2011

  25. Funder Policy: EC 2012

  26. Government Policy: e.g. UK 2012

  27. International Policy, e.g. G8 2013

  28. Service Policy: Dryad

  29. Service Policy: Figshare

  30. Service Policy: Scientific Data

  31. {3} UNIVERSITY EXAMPLES

  32. Look at National Initiatives

  33. Look at Libraries’ Practice

    CASE STUDIES OF RESEARCH DATA MANAGEMENT IN UNIVERSITIES
  34. Look at University Policy

  35. Engage in Expert Conversations

  36. Refer to Existing Repositories

  37. UNIVERSITY EXAMPLE: BIELEFELD

  38. RDM Principles

  39. Central RDM Activities

    • Research Data Policy: “Principles”
    • Coordination post funded by the University
    • Focus group with leading academics
    • Colloquium Knowledge Infrastructure
    • Library in close cooperation with IT
      – Library provides curation and metadata support
      – IT Services provide servers and storage
  40. Centrally Supported RDM Activity

    • Data-Service Centre “Business and Organisational Data”
    • Infrastructure for Collaborative Research Project “From Heterogeneity to Inequality” (50 PIs, 12 years)
    • Excellence Cluster “Cognitive Interaction Robotics” (100 PIs, 10 years)
    • Library Services: metadata, DOIs, software, calendars, websites…
    • Storage and housing in IT Services
  41.

  42.

  43.

  44. RDM Resolution

  45. Institutional Repository

  46. Institutional Repository

  47. Institutional Repository

  48. UNIVERSITY EXAMPLE: OXFORD

  49. RDM Policy

  50. RDM Working Group

    • Chaired by the PVC-R (Pro-Vice-Chancellor for Research)
    • Representation of many stakeholders
      – Divisions
      – Libraries
      – IT Services
      – Research Services
    • Many external and departmental activities
    • Central services as fall-back
  51. Multi-Agency Initiative

  52. RDM Website

  53. Institutional Repository: ORA-Data

  54. UNIVERSITY EXAMPLE: GÖTTINGEN

  55. Information Infrastructure at the Göttingen Campus

    • Göttingen Campus
    • IT Services: GWDG
    • State and University Library Göttingen: SUB
    • Research Data Policy
    • Göttingen eResearch Alliance
      – Building on a strong tradition of collaboration
      – Sustainable Support at Selected Points in the Research Lifecycle
      – Consultations for Project Proposals
      – Pooling Infrastructure Specialists
      – Training, IT Support, Publication Services, Research Data and Software
  56. Building the Göttingen eResearch Alliance

    Diagram: Göttingen Campus Partners (Library, IT Services, University Collections, Faculties and Institutes, Centre for Digital Humanities, Humanities Data Centre; Philosophy, Medicine, Biology, Theology, Chemistry, Geosciences); International Partners; Collaboration; Services: Project Consultancy, Research Data Services, Staff Pooling, Training, Software Development, Publication Services; International Information Infrastructure
  57. JOURNAL EXAMPLE: NATURE

  58.

  59. JOURNAL EXAMPLE: SCIENTIFIC DATA

  60.

  61. JOURNAL EXAMPLE: F1000 RESEARCH

  62. Institutional Repositories General Terms

    Hosting
  63. SUMMARY AND CONCLUSION

  64. We Need Data Libraries

    • Post-hoc Data Library: derived from a longstanding history of librarianship
      – Strengths: service reputation, recurrent funds and a profession behind it
      – Weakness: little subject-specific expertise
    • Ad-hoc Data Library: derived from urgent needs in (research) practice
      – Strength: built on outstanding subject-specific expertise
      – Weaknesses: service is not always part of research culture; no recurrent funding
    • However, there are many hybrids
      – The physical data library is about virtual data services
      – The virtual data library will need a physical infrastructure
  65. We Need Data Libraries

    • How?
      1. apply a spirit of collaboration between Researchers, Libraries, IT Services, Institutions, Funders and Publishers
      2. jointly work on a ‘funded’ policy
      3. focus on the record of research, i.e. links between data and literature
      4. focus on the added value for the individual researcher
  66. Research Data Stakeholder Contributions

    Diagram: Research Data; stakeholders: Governments, Research Institutions, Funders, Publishers, Researchers; contributions: Funding, Infrastructure
  67. THANKS