Presentation given at Steenbock Library BioCommons, UW-Madison, as part of my workshop series on digital scholarship topics for grad students and early career researchers. November 2015.
- even just briefly! - rather than forgo them entirely • Hopefully you’ll get ideas for concepts and tools to explore later • Expectations and best practices are often field-specific, so it’s tough to generalize
useful + put it into practice
• you’ll share your top data management tips with us
• you’ll tell me what you want to know more about for future workshops!
• you’ll be reenergized enough by the topic to find something else that works for you and / or
in the scientific community as necessary to validate research findings.” INCLUDES: code, figures, statistics, interviews, transcripts EXCLUDES: preliminary analyses, drafts of papers, plans for further research, communication + peer reviews, physical samples - OMB Circular, White House
all federal funding agencies. Office of Science and Technology Policy (OSTP) memo – Released spring 2013; took effect fall 2015 – Requires open sharing of published articles and data – Publication repository is provided; data repository is not – Applies to agencies with $100M + in R&D
few skills to manage it effectively Movement toward openness, impacted by OSTP and spurred by early career researcher expectations Disciplinary culture shifts toward data reuse + reproducibility Need for multi-purpose online spaces to collaborate, share, store, and archive research outputs (including data)
meaningfully or jumbled together? Do you know where your data is? Documentation • How much contextual information accompanies your data? Can you understand it? Can a stranger understand it? Storage & backup • Where is your data stored and backed up? Could you recover from hardware failure or accidental deletion? Media obsolescence • Do you know how the software, hardware, and file formats you use will impact your data’s readability in the future?
related files • Consistent • Short yet descriptive • Avoid spaces and special characters example File001.xls vs. Project_instrument_location_YYYYMMDD.xls
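The naming pattern in the example above can be enforced with a small helper. This is a minimal sketch, not a standard tool: the function name `build_filename`, the rule that each part may contain only letters, digits, and hyphens, and the instrument name "Sonde" are all assumptions made for illustration.

```python
import re
from datetime import date

# Assumption for this sketch: each part may contain only letters, digits, and
# hyphens; underscores are reserved as the separator between parts.
_PART = re.compile(r"[A-Za-z0-9-]+")


def build_filename(project, instrument, location, when, extension):
    """Return Project_instrument_location_YYYYMMDD.ext, rejecting spaces and special characters."""
    parts = [project, instrument, location]
    for part in parts:
        if not _PART.fullmatch(part):
            raise ValueError(f"invalid characters in file name part: {part!r}")
    stamp = when.strftime("%Y%m%d")  # YYYYMMDD sorts chronologically
    return "_".join(parts + [stamp]) + "." + extension.lstrip(".")


# "Sonde" is a hypothetical instrument name used only for the demonstration.
print(build_filename("WaterQuality", "Sonde", "LakeMendota", date(2014, 10, 30), "xls"))
```

Writing the date as YYYYMMDD means an alphabetical sort of the folder is also a chronological sort.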
for your project – File type – Date – Type of analysis example: MyDocuments\Research\Sample12.tiff vs. C:\NSFGrant01234\WaterQuality\Images\LakeMendota_20141030.tiff
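A folder hierarchy like the one in the example above can be created up front so that every file has an obvious home. The sketch below is one possible approach; `make_project_tree` and the subfolder names "RawData" and "Analysis" are hypothetical, and the demonstration runs inside a temporary directory rather than on a real drive.

```python
import tempfile
from pathlib import Path


def make_project_tree(root, grant, project, subfolders):
    """Create root/grant/project/<subfolder> for each subfolder and return the created paths."""
    created = []
    for name in subfolders:
        folder = Path(root) / grant / project / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to rerun on an existing tree
        created.append(folder)
    return created


# Demonstration in a throwaway directory; "RawData" and "Analysis" are made-up names.
with tempfile.TemporaryDirectory() as tmp:
    for folder in make_project_tree(tmp, "NSFGrant01234", "WaterQuality", ["Images", "RawData", "Analysis"]):
        print(folder.relative_to(tmp))
```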
readme file. (Good example located here: http://hdl.handle.net/2022/17155) – Document any data processing and analyses. – Don’t forget written notes. Item-level – Remember the importance of file names for conveying descriptive information. – Find and adhere to disciplinary metadata standards • XML • Dublin Core
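To make the item-level metadata idea concrete, here is a rough sketch of a Dublin Core record written as XML with Python's standard library. The wrapper element name `metadata`, the function `dublin_core_record`, and every field value are assumptions for illustration; a repository or discipline will usually dictate the exact schema to follow.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a small XML record with one dc:<name> element per field."""
    root = ET.Element("metadata")  # hypothetical wrapper element for this sketch
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# All values below are illustrative placeholders, not real catalog data.
record = dublin_core_record({
    "title": "Lake Mendota image, 2014-10-30",
    "date": "2014-10-30",
    "format": "image/tiff",
    "identifier": "LakeMendota_20141030.tiff",
})
print(record)
```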
information for people associated with the project • List of files, including a description of their relationship to one another • Copyright + licensing information • Limitations of the data • Funding sources / institutional support tl;dr: Any information necessary for someone with no knowledge of your research to understand and / or replicate your work.
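One way to make sure none of the items above gets skipped is to generate the readme from a fixed template. The following is a minimal sketch; `render_readme`, its parameter names, and all of the sample values are hypothetical placeholders to be replaced with real project details.

```python
def render_readme(title, people, files, licensing, limitations, funding):
    """Return plain-text readme content covering the documentation items listed above."""
    lines = [title, "=" * len(title), ""]
    lines.append("People associated with the project")
    lines.extend(f"  - {person}" for person in people)
    lines.append("")
    lines.append("Files")
    # files maps each file name to a description of its role and its relationship to other files
    lines.extend(f"  - {name}: {description}" for name, description in files.items())
    lines.append("")
    lines.append(f"Copyright + licensing: {licensing}")
    lines.append(f"Limitations of the data: {limitations}")
    lines.append(f"Funding sources / institutional support: {funding}")
    return "\n".join(lines) + "\n"


# Every value here is a placeholder for illustration only.
readme = render_readme(
    title="WaterQuality",
    people=["PLACEHOLDER NAME <placeholder@example.edu>"],
    files={"LakeMendota_20141030.tiff": "image; described by the field notes of the same date"},
    licensing="CC0",
    limitations="PLACEHOLDER: known gaps or caveats",
    funding="NSFGrant01234",
)
print(readme)
```

Saving the result as a plain-text file next to the data keeps the documentation readable without special software.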
access regularly and change frequently. In general, losing your storage means losing current versions of the data. backup = regular process of copying data separate from storage. You don’t really need it until you lose data, but when you need to restore a file it will be the most important process you have in place.
TWO onsite – ONE offsite Example – One: Network drive – Two: External hard drive – Three: Cloud storage This ensures that your storage and backup are not all in the same place – that’s too risky!
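The three-location example above can be partly automated. The sketch below copies a file into several destination folders and compares checksums to confirm each copy is intact. The checksum step is an addition not mentioned in the slides, and the function names, folder names, and file contents are hypothetical; in practice the destinations would be a mounted network drive, an external drive, and a cloud-synced folder.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy source into every destination folder and verify each copy against the original."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        Path(folder).mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder))  # copy2 also preserves timestamps
        if sha256_of(copy) != expected:
            raise IOError(f"backup copy does not match original: {copy}")
        copies.append(copy)
    return copies


# Demonstration with temporary folders standing in for the three locations.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "LakeMendota_20141030.csv"
    original.write_text("placeholder data\n")
    targets = [Path(tmp) / name for name in ("network_drive", "external_drive", "cloud_sync")]
    for copy in back_up(original, targets):
        print(copy.relative_to(tmp))
```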
become obsolete through business deals, new versions, or a gradual decline in user base. (Consider WordPerfect.) • Anticipate average lifespan of media to be 3-5 years. Migrate your files every few years, if not more frequently!
obsolescence than others – Open, non-proprietary formats (pick TXT over DOCX, CSV over XLSX, TIF over JPG) – Wide adoption – History of backward compatibility – Metadata support in open format (XML)
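As a rough illustration of the format preferences above, a short script can flag files worth converting. This sketch only encodes the three pairs named on the slide; the function name `suggest_open_format` and the sample file names are assumptions, and the script suggests a target format without performing any conversion.

```python
from pathlib import Path

# Preferred alternatives taken from the slide: TXT over DOCX, CSV over XLSX, TIF over JPG.
PREFERRED = {".docx": ".txt", ".xlsx": ".csv", ".jpg": ".tif"}


def suggest_open_format(filename):
    """Return the preferred extension for a file, or None if no suggestion applies."""
    suffix = Path(filename).suffix.lower()
    return PREFERRED.get(suffix)


# Hypothetical file names for demonstration.
for name in ["Notes.docx", "Readings.XLSX", "LakeMendota_20141030.tiff"]:
    suggestion = suggest_open_format(name)
    if suggestion:
        print(f"{name}: consider saving a copy as {suggestion}")
    else:
        print(f"{name}: no suggestion")
```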
data organized meaningfully or jumbled together? Do you know where your data is? Documentation • How much contextual information accompanies your data? Can you understand it? Can a stranger understand it? Storage & backup • Where is your data stored and backed up? Could you recover from hardware failure or accidental deletion? Media obsolescence • Do you know how the software, hardware, and file formats you use will impact your data’s readability in the future?
your data! – Institutional + disciplinary repositories – Data papers/journals • If your research is federally funded, remember that you’ll now have to share your data • Data is not copyrightable; best practice is to apply a Creative Commons Zero (CC0) license • There’s even a proven citation advantage to sharing your data* *Piwowar HA, Vision TJ. (2013) Data reuse and the open data citation advantage. PeerJ 1:e175. https://dx.doi.org/10.7717/peerj.175
practices will impact your ability to access your data days/weeks/years from now. • If organizing retroactively, prioritize your most important research. • Managing digital stuff requires a LOT of decision making, so embrace it! • Any plan is better than no plan at all. Start today. Ask for help.
data management plan compiled by project leaders. The plan should cover: • Organization + naming • Documentation + metadata • Storage + sharing • Any and all other pertinent details. (The more the better; it’ll save you headaches later.) The plan should be actively revisited and adapted as needed throughout the project.