Research Data Management + Sharing

Presentation given at Steenbock Library BioCommons, UW-Madison, as part of my workshop series on digital scholarship topics for grad students and early career researchers. November 2015.

Brianna Marshall

November 19, 2015

Transcript

  1. 2.

    about me

    Brianna Marshall
    Digital Curation Coordinator, UW Libraries
    Lead, Research Data Services
    – Education + training
    – Consultations
    – Data management plans (DMPs)
    researchdata.wisc.edu
    @UWMadRschSvcs
  2. 3.

    caveats

    •  Given limited time, I’ve chosen to mention things, even just briefly, rather than forgo them entirely
    •  Hopefully you’ll get ideas for concepts and tools to explore later
    •  Expectations and best practices are often field-specific, so it’s tough to generalize
  3. 4.

    hopes + dreams

    •  you’ll find something I talk about today useful + put it into practice, and / or you’ll be reenergized enough by the topic to find something else that works for you
    •  you’ll share your top data management tips with us
    •  you’ll tell me what you want to know more about for future workshops!
  4. 5.

    what is research data?

    “the recorded factual information commonly accepted in the scientific community as necessary to validate research findings.”

    INCLUDES: code, figures, statistics, interviews, transcripts
    EXCLUDES: preliminary analyses, drafts of papers, plans for further research, communication + peer reviews, physical samples

    -  OMB Circular, White House
  5. 6.

    time to ponder

    Can you still access your data from…
    – 10 years ago?
    – 5 years ago?
    – 1 year ago?

    Let’s talk about the data you’ve kept and lost.
  6. 7.

    time to ponder

    Can you still access your digital stuff from…
    – 10 years ago?
    – 5 years ago?
    – 1 year ago?

    Let’s talk about the digital stuff you’ve kept and lost.
  7. 12.

    federal funding requirements

    Data management plans (DMPs) are required by all federal funding agencies.

    Office of Science and Technology Policy (OSTP) memo
    –  Released spring 2013; took effect fall 2015
    –  Requires open sharing of published articles and data
    –  Publication repository is provided; data repository is not
    –  Applies to agencies with $100M+ in R&D
  8. 14.

    what’s up with researchers?

    •  Ample technology to generate data but few skills to manage it effectively
    •  Movement toward openness, impacted by OSTP and spurred by early career researcher expectations
    •  Disciplinary culture shifts toward data reuse + reproducibility
    •  Need for multi-purpose online spaces to collaborate, share, store, and archive research outputs (including data)
  9. 16.

    data management basics

    File organization
    •  Is your data organized meaningfully or jumbled together? Do you know where your data is?

    Documentation
    •  How much contextual information accompanies your data? Can you understand it? Can a stranger understand it?

    Storage & backup
    •  Where is your data stored and backed up? Could you recover from hardware failure or accidental deletion?

    Media obsolescence
    •  Do you know how the software, hardware, and file formats you use will impact your data’s readability in the future?
  10. 17.

    file naming conventions

    •  Use them any time you have related files
    •  Consistent
    •  Short yet descriptive
    •  Avoid spaces and special characters

    example
    File001.xls vs. Project_instrument_location_YYYYMMDD.xls
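A minimal Python sketch of the naming convention above. The function names `build_filename` and `follows_convention` and the sample values are my own illustrations, not part of the talk; the pattern assumed is Project_instrument_location_YYYYMMDD.ext.

```python
import re
from datetime import date

# Assumed pattern: parts of letters/digits joined by underscores,
# ending in an 8-digit date and a lowercase extension.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9]+(_[A-Za-z0-9]+)*_\d{8}\.[a-z0-9]+$")


def build_filename(project, instrument, location, when, ext):
    """Build Project_instrument_location_YYYYMMDD.ext without spaces or special characters."""
    def clean(part):
        # Drop everything except letters and digits.
        return re.sub(r"[^A-Za-z0-9]", "", part)

    parts = [clean(project), clean(instrument), clean(location), when.strftime("%Y%m%d")]
    return "_".join(parts) + "." + ext.lstrip(".").lower()


def follows_convention(filename):
    """Return True if the file name matches the assumed pattern."""
    return NAME_PATTERN.match(filename) is not None


# Hypothetical example values.
name = build_filename("Water Quality", "pH-probe", "Lake Mendota", date(2014, 10, 30), "xls")
print(name)
print(follows_convention(name), follows_convention("File001.xls"))
```

Writing the date as YYYYMMDD means that sorting the files by name also sorts them chronologically.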
  11. 18.

    directory/folder organization

    Lots of possibilities, so consider what makes sense for your project
    – File type
    – Date
    – Type of analysis

    example:
    MyDocuments\Research\Sample12.tiff vs. C:\NSFGrant01234\WaterQuality\Images\LakeMendota_20141030.tiff
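One way to keep a folder layout consistent is to create it with a script at the start of a project. This is a rough sketch, organized by file type; the function name `create_project_tree` and the subfolder names other than Images are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def create_project_tree(root, project, subfolders):
    """Create root/project/<subfolder> for each subfolder and return the project folder."""
    base = Path(root) / project
    for name in subfolders:
        # exist_ok=True makes the script safe to run more than once.
        (base / name).mkdir(parents=True, exist_ok=True)
    return base


# Demo in a throwaway temporary folder; point root at a real drive for actual use.
with tempfile.TemporaryDirectory() as tmp:
    base = create_project_tree(
        tmp,
        "NSFGrant01234/WaterQuality",
        ["Images", "RawData", "Analysis", "Docs"],  # hypothetical subfolders
    )
    print(sorted(p.name for p in base.iterdir()))
```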
  12. 19.

    retroactive organization

    •  Do a data inventory. List all the places where your data lives (both physical and digital)
    •  Make a plan for consolidating – follow the rule of 3, not the rule of 17
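For the digital half of a data inventory, a short script can list every file under the folders you name. This is a sketch under my own assumptions: the function names `inventory` and `write_inventory` and the CSV columns are illustrative, and physical media still have to be listed by hand.

```python
import csv
import os
import tempfile
from datetime import datetime
from pathlib import Path


def inventory(locations):
    """Return one record per file found under the given folders."""
    records = []
    for location in locations:
        for dirpath, _dirnames, filenames in os.walk(location):
            for name in filenames:
                path = Path(dirpath) / name
                info = path.stat()
                records.append({
                    "location": str(location),
                    "path": str(path),
                    "size_bytes": info.st_size,
                    "modified": datetime.fromtimestamp(info.st_mtime).strftime("%Y-%m-%d"),
                })
    return records


def write_inventory(records, csv_path):
    """Save the inventory as a CSV file that opens in any spreadsheet tool."""
    fields = ["location", "path", "size_bytes", "modified"]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)


# Demo on a throwaway folder; replace with your own drives and folders.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Sample12.tiff").write_bytes(b"not a real image")
    found = inventory([tmp])
    write_inventory(found, Path(tmp, "inventory.csv"))
    print(len(found), "file(s) listed")
```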
  13. 21.

    document on many levels

    Project- & folder-level
    –  Create a readme file. (Good example located here: http://hdl.handle.net/2022/17155)
    –  Document any data processing and analyses.
    –  Don’t forget written notes.

    Item-level
    –  Remember the importance of file names for conveying descriptive information.
    –  Find and adhere to disciplinary metadata standards
       •  XML
       •  Dublin Core
  14. 22.

    what’s in a good readme file?

    •  Names + contact information for people associated with the project
    •  List of files, including a description of their relationship to one another
    •  Copyright + licensing information
    •  Limitations of the data
    •  Funding sources / institutional support

    tl;dr: Any information necessary for someone with no knowledge of your research to understand and / or replicate your work.
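The readme elements listed above could be turned into a reusable plain-text template. The sketch below is only one possible layout; `build_readme`, the section headings, and every sample value are placeholders I made up for illustration.

```python
def build_readme(title, contacts, files, license_text, limitations, funding):
    """Assemble a plain-text readme covering the elements a stranger would need."""
    lines = [title, "=" * len(title), ""]
    lines += ["CONTACTS"] + [f"- {name} <{email}>" for name, email in contacts] + [""]
    lines += ["FILES"] + [f"- {name}: {description}" for name, description in files] + [""]
    lines += ["COPYRIGHT + LICENSING", license_text, ""]
    lines += ["LIMITATIONS OF THE DATA", limitations, ""]
    lines += ["FUNDING + INSTITUTIONAL SUPPORT", funding, ""]
    return "\n".join(lines)


# All values below are hypothetical placeholders.
readme = build_readme(
    title="Water quality images",
    contacts=[("Jane Researcher", "jane@example.edu")],
    files=[
        ("LakeMendota_20141030.tiff", "raw image from the sampling trip"),
        ("LakeMendota_20141030.csv", "measurements taken from that image"),
    ],
    license_text="CC0",
    limitations="Images taken in overcast conditions.",
    funding="Example grant 01234",
)
print(readme)
```

Saving the result as a readme file next to the data keeps the documentation and the files it describes in the same place.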
  15. 24.

    storage & backup

    storage = working files. The files you access regularly and change frequently. In general, losing your storage means losing current versions of the data.

    backup = regular process of copying data separate from storage. You don’t really need it until you lose data, but when you need to restore a file it will be the most important process you have in place.
  16. 25.

    rule of 3

    Keep THREE copies of your data
    –  TWO onsite
    –  ONE offsite

    Example
    –  One: Network drive
    –  Two: External hard drive
    –  Three: Cloud storage

    This ensures that your storage and backups are not all in the same place – that’s too risky!
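A rough Python sketch of the rule of 3: copy a working file to two other locations and confirm each copy matches the original by comparing SHA-256 checksums. The function names and folder names are assumptions, and the demo uses temporary folders as stand-ins for a network drive, an external drive, and a cloud-synced folder.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path):
    """Return the SHA-256 checksum of a file, read in chunks to handle large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, backup_dirs):
    """Copy source into each backup folder and confirm every copy matches the original."""
    source = Path(source)
    original = sha256(source)
    copies = []
    for folder in backup_dirs:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps the modification time
        if sha256(target) != original:
            raise IOError(f"Copy in {folder} does not match the original")
        copies.append(target)
    return copies


# Demo: working copy plus two backups gives three copies in total.
with tempfile.TemporaryDirectory() as tmp:
    working = Path(tmp, "network_drive", "LakeMendota_20141030.csv")
    working.parent.mkdir()
    working.write_text("site,ph\nA,7.9\n", encoding="utf-8")
    copies = back_up(working, [Path(tmp, "external_drive"), Path(tmp, "cloud_sync")])
    print(1 + len(copies), "copies verified")
```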
  17. 27.

    evaluating cloud services

    •  Lots of options out there – and not all are created equal
    •  Read the Terms of Service!
    •  While at UW, use your free UW Box or Google Drive accounts
  18. 29.

    media obsolescence

    •  software
    •  hardware
    •  file formats

    CC image by Flickr user wlef70
  19. 30.

    thwarting obsolescence

    •  You can’t.
    •  Today’s popular software can become obsolete through business deals, new versions, or a gradual decline in user base. (Consider WordPerfect.)
    •  Anticipate average lifespan of media to be 3-5 years. Migrate your files every few years, if not more frequently!
  20. 31.

    thwarting obsolescence

    •  Some file formats are less susceptible to obsolescence than others
       –  Open, non-proprietary formats (pick TXT over DOCX, CSV over XLSX, TIF over JPG)
       –  Wide adoption
       –  History of backward compatibility
       –  Metadata support in open format (XML)
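A small Python sketch that flags files in the formats named above and suggests the alternative from the same slide. The mapping only restates those three pairs (treating .jpeg the same as .jpg is my assumption), and the function name `suggest_open_format` is hypothetical. It does not convert anything, and a real conversion can lose formatting or formulas, so check the result.

```python
from pathlib import Path

# Pairs taken from the slide: TXT over DOCX, CSV over XLSX, TIF over JPG.
OPEN_ALTERNATIVES = {
    ".docx": ".txt",
    ".xlsx": ".csv",
    ".jpg": ".tif",
    ".jpeg": ".tif",  # assumption: same advice as for .jpg
}


def suggest_open_format(filename):
    """Return the suggested extension, or None if the format is not in the mapping."""
    suffix = Path(filename).suffix.lower()
    return OPEN_ALTERNATIVES.get(suffix)


# Hypothetical file names.
for name in ["Notes.docx", "Results.xlsx", "LakeMendota_20141030.tiff"]:
    suggestion = suggest_open_format(name)
    if suggestion:
        print(f"{name}: consider also keeping a {suggestion} copy")
    else:
        print(f"{name}: no suggestion")
```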
  21. 32.

    back to (data management) basics

    File organization
    •  Is your data organized meaningfully or jumbled together? Do you know where your data is?

    Documentation
    •  How much contextual information accompanies your data? Can you understand it? Can a stranger understand it?

    Storage & backup
    •  Where is your data stored and backed up? Could you recover from hardware failure or accidental deletion?

    Media obsolescence
    •  Do you know how the software, hardware, and file formats you use will impact your data’s readability in the future?
  22. 34.

    get credit for your data

    •  Many ways to share/publish your data
       –  Institutional + disciplinary repositories
       –  Data papers/journals
    •  If your research is federally funded, remember that you’ll now have to share your data
    •  Data is not copyrightable; best practice is to apply a Creative Commons 0 license
    •  There’s even a proven citation advantage to sharing your data*

    *Piwowar HA, Vision TJ. (2013) Data reuse and the open data citation advantage. PeerJ 1:e175 https://dx.doi.org/10.7717/peerj.175
  23. 41.
  24. 46.

    final thoughts

    •  Think about how your existing data management practices will impact your ability to access your data days/weeks/years from now.
    •  If organizing retroactively, prioritize your most important research.
    •  Managing digital stuff requires a LOT of decision making, so embrace it!
    •  Any plan is better than no plan at all. Start today. Ask for help.
  25. 47.

    my suggestion?

    Grant or not, start new projects with a data management plan compiled by project leaders.

    The plan should cover:
    •  Organization + naming
    •  Documentation + metadata
    •  Storage + sharing
    •  Any and all other pertinent details. (The more the better; it’ll save you headaches later.)

    The plan should be actively revisited and adapted as needed throughout the project.
  26. 49.
  27. 51.

    upcoming digital scholarship workshops

    DECEMBER 10
    –  An Introduction to Open Research

    AVAILABLE ONLINE
    –  Project Management + Productivity Tools
    –  Crafting Your Digital Identity

    Steenbock Library BioCommons | 4-5pm