Research Data Management + Sharing

Presentation given at UW Libraries, November 2016.

Brianna Marshall

November 16, 2016

Transcript

  1. Brianna Marshall, Digital Curation Coordinator Research Data Management + Sharing

  2. About these workshops
    Funder Public Access Requirements
    –  Wed, Oct. 12, 2:00-3:30 at Steenbock
    –  Thurs, Oct. 13, 9:30-11:00 at Memorial
    Research Data Management & Sharing
    –  Wed, Nov. 16, 2:00-3:30 at Steenbock
    –  Thurs, Nov. 17, 9:30-11:00 at Memorial
    Open Access Publishing
    –  Wed, Dec. 14, 2:00-3:30 at Steenbock
    –  Thurs, Dec. 15, 9:30-11:00 at Memorial
    Open Research & Reproducibility
    –  Wed, Feb. 15, 2:00-3:30 at Steenbock
    –  Thurs, Feb. 16, 9:30-11:00 at Memorial
    Authors’ Rights Management
    –  Wed, Mar. 15, 2:00-3:30 at Steenbock
    –  Thurs, Mar. 16, 9:30-11:00 at Memorial
    Digital Project Planning
    –  Wed, Apr. 12, 2:00-3:30 at Steenbock
    –  Thurs, Apr. 13, 9:30-11:00 at Memorial
    Library administration effort to support staff work and development around these topics
  3. how can libraries provide value in data services?
  4. CONSULTATIONS + GUIDANCE (1)
    GENERAL RDM
    •  Organization, naming, versioning
    •  Sharing + openness
    •  Data formats, metadata
    •  Security
    •  Storage + backup
    •  Etc.
    OPEN SCIENCE + REPRODUCIBILITY
    FEDERAL REQUIREMENTS
    •  Data management plans (DMPs)
    •  OSTP memo
  5. SERVICES (2)
    TEACHING + TRAINING
    •  Data information literacy
    •  Embedded assistance
    DATA CURATION TECHNOLOGY
    •  Storage
    •  Collaboration
    •  Preservation
    (Maybe in the form of a repository, maybe not.)
  6. POLICY (3)
    UNIVERSITY
    •  Governance/stewardship
    •  Security
    •  Campus OSTP response
    NATIONAL
    •  Funder policy advocacy
    •  Group/committee involvement
    INTERNATIONAL
    •  Data standards advocacy
    •  Group/committee involvement
  7. Research Data Management
    What is research data?
    “The recorded factual information commonly accepted in the scientific community as necessary to validate research findings.”
    INCLUDES: code, figures, statistics, interviews, transcripts
    EXCLUDES: preliminary analyses, drafts of papers, plans for further research, communication + peer reviews, physical samples
    - OMB Circular, White House
  8. time to ponder
    Can you still access your data from…
    –  10 years ago?
    –  5 years ago?
    –  1 year ago?
  9. data horror stories
    Image courtesy of Flickr user wolfgangfoto (CC BY-ND)
  10. https://i.imgflip.com/ntnjb.jpg you can’t find it.

  11. or you can’t understand it. http://cdn.meme.am/instances/58392702.jpg

  12. or it’s long gone. https://community.spiceworks.com/topic/813225-best-backup-recovery-memes

  13. http://mentalfloss.com/uk/entertainment/27204/how-one-line-of-text-nearly-killed-toy-story-2

  14. what’s up with researchers?
    •  Ample technology to generate data but few skills to manage it effectively
    •  Movement toward openness, impacted by OSTP and spurred by early-career researcher expectations
    •  Disciplinary culture shifts toward data reuse + reproducibility
    •  Need for multi-purpose online spaces to collaborate, share, store, and archive research outputs (including data)
  15. research is increasingly digital.
    •  Multi-institutional
    •  Grant-funded
    •  Shared infrastructure
    •  Computationally driven
    Image courtesy of #wocintechchat
  16. [ retractionwatch.com ]

  17. federal funding requirements
    Data management plans (DMPs) are required by all federal funding agencies.
    Office of Science and Technology Policy (OSTP) memo
    –  Released spring 2013; took effect fall 2015
    –  Requires open sharing of published articles and data
    –  Publication repository is provided; data repository is not
    –  Applies to agencies with $100M+ in R&D
  18. data management basics
    File organization
    •  Is your data organized meaningfully or jumbled together? Do you know where your data is?
    Documentation
    •  How much contextual information accompanies your data? Can you understand it? Can a stranger understand it?
    Storage & backup
    •  Where is your data stored and backed up? Could you recover from hardware failure or accidental deletion?
    Media obsolescence
    •  Do you know how the software, hardware, and file formats you use will impact your data’s readability in the future?
  19. file naming conventions
    •  Use them any time you have related files
    •  Consistent
    •  Short yet descriptive
    •  Avoid spaces and special characters
    example: File001.xls vs. Project_instrument_location_YYYYMMDD.xls
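The naming pattern on this slide can be sketched in a few lines of Python. This is only an illustration, not a tool from the presentation: the function name `build_filename`, the choice to replace spaces and special characters with hyphens, and the sample values are all assumptions.

```python
import re
from datetime import date


def build_filename(project: str, instrument: str, location: str,
                   when: date, extension: str) -> str:
    """Build a Project_instrument_location_YYYYMMDD.ext style file name."""
    def clean(part: str) -> str:
        # Assumption: runs of spaces or special characters become one hyphen.
        return re.sub(r"[^A-Za-z0-9]+", "-", part.strip()).strip("-")

    parts = [clean(project), clean(instrument), clean(location),
             when.strftime("%Y%m%d")]
    return "_".join(parts) + "." + extension.lstrip(".").lower()


# Hypothetical values, chosen only to show the pattern.
example_name = build_filename("WaterQuality", "Secchi disk", "Lake Mendota",
                              date(2014, 10, 30), "csv")
print(example_name)  # WaterQuality_Secchi-disk_Lake-Mendota_20141030.csv
```

Putting the date last in YYYYMMDD form means files from the same project, instrument, and location sort chronologically in an ordinary directory listing.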
  20. directory/folder organization
    Lots of possibilities, so consider what makes sense for your project
    –  File type
    –  Date
    –  Type of analysis
    example: MyDocuments\Research\Sample12.tiff vs. C:\NSFGrant01234\WaterQuality\Images\LakeMendota_20141030.tiff
  21. retroactive organization
    •  Do a data inventory. List all the places where your data lives (both physical and digital)
    •  Make a plan for consolidating – follow the rule of 3, not the rule of 17!
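For the digital half of a data inventory, a short script can produce a first draft of the list. The sketch below is one possible approach, assuming each location is a folder you can read from your computer; the function names and the CSV columns are illustrative choices, and physical media still have to be listed by hand.

```python
import csv
import os
from datetime import datetime
from pathlib import Path


def inventory(root: str) -> list[dict]:
    """List every file under root with its extension, size, and modified date."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            info = path.stat()
            rows.append({
                "path": str(path),
                "extension": path.suffix.lower(),
                "bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime).strftime("%Y-%m-%d"),
            })
    return sorted(rows, key=lambda row: row["path"])


def write_inventory(rows: list[dict], out_file: str) -> None:
    """Save the inventory as a CSV file that opens in any spreadsheet program."""
    with open(out_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["path", "extension", "bytes", "modified"])
        writer.writeheader()
        writer.writerows(rows)
```

Running `inventory()` once per location (laptop, network drive, external drive) and combining the results gives a single list to work from when planning the consolidation.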
  22. document on many levels
    Project- & folder-level
    –  Create a readme file. (Good example located here: http://hdl.handle.net/2022/17155)
    –  Document any data processing and analyses.
    –  Don’t forget written notes.
    Item-level
    –  Remember the importance of file names for conveying descriptive information.
    –  Find and adhere to disciplinary metadata standards
        •  XML
        •  Dublin Core
  23. what’s in a good readme file?
    •  Names + contact information for people associated with the project
    •  List of files, including a description of their relationship to one another
    •  Copyright + licensing information
    •  Limitations of the data
    •  Funding sources / institutional support
    tl;dr Any information necessary for someone with no knowledge of your research to understand and / or replicate your work.
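The checklist above maps naturally onto a plain-text template. The following sketch generates one; the section headings follow the slide, while the function name `build_readme` and every value in the example call are hypothetical placeholders rather than details of a real project.

```python
def build_readme(title, contacts, files, licensing, limitations, funding):
    """Return readme text with one section per item on the checklist."""
    sections = [
        ("CONTACTS", [f"{name} <{email}>" for name, email in contacts]),
        ("FILES", [f"{name}: {description}" for name, description in files]),
        ("COPYRIGHT + LICENSING", [licensing]),
        ("LIMITATIONS", limitations),
        ("FUNDING / INSTITUTIONAL SUPPORT", funding),
    ]
    lines = [title, ""]
    for heading, entries in sections:
        lines.append(heading)
        lines.extend(f"  - {entry}" for entry in entries)
        lines.append("")
    return "\n".join(lines)


# All values below are placeholders for illustration only.
readme_text = build_readme(
    title="Example project readme",
    contacts=[("Example Researcher", "researcher@example.edu")],
    files=[
        ("raw_20141030.csv", "readings as exported from the instrument"),
        ("clean_20141030.csv", "raw_20141030.csv after removing failed readings"),
    ],
    licensing="CC0",
    limitations=["Readings cover a single sampling day"],
    funding=["Example funder, grant number not yet assigned"],
)
print(readme_text)
```

Saving the result as a .txt file next to the data keeps the documentation in an open format that does not depend on any particular software.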
  24. example readme file!

  25. storage & backup
    storage = working files. The files you access regularly and change frequently. In general, losing your storage means losing current versions of the data.
    backup = regular process of copying data separate from storage. You don’t really need it until you lose data, but when you need to restore a file it will be the most important process you have in place.
  26. rule of 3
    Keep THREE copies of your data
    –  TWO onsite
    –  ONE offsite
    Example
    –  One: Network drive
    –  Two: External hard drive
    –  Three: Cloud storage
    This ensures that your storage and backup are not all in the same place – that’s too risky!
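Three copies only help if they are actually identical. One common way to check, sketched below, is to compare SHA-256 checksums of the copies. The function names and the shape of the returned report are assumptions, and the script cannot tell whether a copy is onsite or offsite; it only confirms that at least three readable copies exist and match.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute a file's SHA-256 checksum, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(copies: list[Path]) -> dict:
    """Report how many copies exist and whether their contents are identical."""
    checksums = {str(path): sha256_of(path) for path in copies if path.is_file()}
    identical = len(checksums) > 0 and len(set(checksums.values())) == 1
    return {
        "copies_found": len(checksums),
        "all_identical": identical,
        "meets_rule_of_3": identical and len(checksums) >= 3,
    }
```

Pass in the paths to the same file on the network drive, the external hard drive, and a locally synced cloud folder; a missing or mismatched copy shows up immediately in the report.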
  27. Original clipart from http://cliparts.co/clipart/2532461. Modified version made available as CC0.

  28. evaluating cloud services
    •  Lots of options out there – and not all are created equal
    •  Read the Terms of Service!
    •  While at UW, researchers can use free UW Box or Google Drive accounts
  29. http://www.doit.wisc.edu/news/collaboration-tools-google-docs-vs-box-2

  30. media obsolescence
    •  software
    •  hardware
    •  file formats
    CC image by Flickr user wlef70
  31. thwarting obsolescence
    •  You can’t.
    •  Today’s popular software can become obsolete through business deals, new versions, or a gradual decline in user base. (Consider WordPerfect.)
    •  Anticipate average lifespan of media to be 3-5 years. Migrate your files every few years, if not more frequently!
  32. thwarting obsolescence
    •  Some file formats are less susceptible to obsolescence than others
        –  Open, non-proprietary formats (pick TXT over DOCX, CSV over XLSX, TIF over JPG)
        –  Wide adoption
        –  History of backward compatibility
        –  Metadata support in open format (XML)
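The format pairs named on this slide can be turned into a quick check for files worth migrating. This is a minimal sketch: the lookup table contains only the three pairs from the slide, and the function name `suggest_open_format` is made up for the example. It suggests a target format but does not convert anything.

```python
from pathlib import Path

# Only the pairs named on the slide; extend the table for your own discipline.
OPEN_ALTERNATIVES = {
    ".docx": ".txt",
    ".xlsx": ".csv",
    ".jpg": ".tif",
}


def suggest_open_format(filename: str):
    """Return the suggested open format for a file, or None if none is listed."""
    suffix = Path(filename).suffix.lower()
    return OPEN_ALTERNATIVES.get(suffix)


for name in ["FieldNotes.DOCX", "Readings_20141030.xlsx", "Readings_20141030.csv"]:
    print(name, "->", suggest_open_format(name))
```

Files that return `None` are either already in a format without a listed alternative or need a judgment call that a lookup table cannot make.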
  33. Research Data Management
    How to work with digital files!
    •  Formats
    •  Naming
    •  Folder structure
    •  Description
    •  Collaboration
    •  Storage
    •  Sharing
    •  Publication
  34. Case Study 1: Formats (Media Obsolescence)

  35. Case Study 2: File Naming + Folder Structure

  36. Case Study 3: Storage + Collaboration
    1. Does there exist a place for X Lab to store large amounts (100 GB) of data that relates to each publication?
    Requirements:
    •  DOI
    •  Ability to store multiple files and folders
    •  Ability for people to access without paying any money or a sign-in
    •  Not connected to Professor X → longevity
    2. How do people handle/share computer code related to a publication?
  37. Data is disciplinary.

  38. http://deanbirkett.name/work/research.html

  39. “What can I do?”

  40. Start the conversation.
    All you need is…
    ✓  A pinch of boldness
    ✓  A basic understanding of RDM concepts
    ✓  Knowledge of what to promote: RDS!
  41. Work with RDS
    ✓  Ask us for suggestions about starting the conversation, including sample text
    ✓  Partner with us to teach RDM best practices to your researchers
    ✓  What are your ideas?
  42. researchdata.wisc.edu

  43. discussion points
    •  What do you think is the future of RDM and libraries?
    •  What do you want to learn about RDM? What are you most curious about?
    •  What are good strategies for engaging departments in discussion on these topics?
  44. What’s Next?
    Funder Public Access Requirements
    –  Wed, Oct. 12, 2:00-3:30 at Steenbock
    –  Thurs, Oct. 13, 9:30-11:00 at Memorial
    Research Data Management & Sharing
    –  Wed, Nov. 16, 2:00-3:30 at Steenbock
    –  Thurs, Nov. 17, 9:30-11:00 at Memorial
    Open Access Publishing
    –  Wed, Dec. 14, 2:00-3:30 at Steenbock
    –  Thurs, Dec. 15, 9:30-11:00 at Memorial
    Open Research & Reproducibility
    –  Wed, Feb. 15, 2:00-3:30 at Steenbock
    –  Thurs, Feb. 16, 9:30-11:00 at Memorial
    Authors’ Rights Management
    –  Wed, Mar. 15, 2:00-3:30 at Steenbock
    –  Thurs, Mar. 16, 9:30-11:00 at Memorial
    Digital Project Planning
    –  Wed, Apr. 12, 2:00-3:30 at Steenbock
    –  Thurs, Apr. 13, 9:30-11:00 at Memorial