Slide 1

Slide 1 text

Research Data Management + Sharing
Brianna Marshall, Digital Curation Coordinator

Slide 2

Slide 2 text

About these workshops
• Funder Public Access Requirements
  – Wed, Oct. 12, 2:00-3:30 at Steenbock
  – Thurs, Oct. 13, 9:30-11:00 at Memorial
• Research Data Management & Sharing
  – Wed, Nov. 16, 2:00-3:30 at Steenbock
  – Thurs, Nov. 17, 9:30-11:00 at Memorial
• Open Access Publishing
  – Wed, Dec. 14, 2:00-3:30 at Steenbock
  – Thurs, Dec. 15, 9:30-11:00 at Memorial
• Open Research & Reproducibility
  – Wed, Feb. 15, 2:00-3:30 at Steenbock
  – Thurs, Feb. 16, 9:30-11:00 at Memorial
• Authors’ Rights Management
  – Wed, Mar. 15, 2:00-3:30 at Steenbock
  – Thurs, Mar. 16, 9:30-11:00 at Memorial
• Digital Project Planning
  – Wed, Apr. 12, 2:00-3:30 at Steenbock
  – Thurs, Apr. 13, 9:30-11:00 at Memorial
A library administration effort to support staff work and development around these topics

Slide 3

Slide 3 text

how can libraries provide value in data services?

Slide 4

Slide 4 text

1. CONSULTATIONS + GUIDANCE
GENERAL RDM
• Organization, naming, versioning
• Sharing + openness
• Data formats, metadata
• Security
• Storage + backup
• Etc.
OPEN SCIENCE + REPRODUCIBILITY
FEDERAL REQUIREMENTS
• Data management plans (DMPs)
• OSTP memo

Slide 5

Slide 5 text

2. SERVICES
TEACHING + TRAINING
• Data information literacy
• Embedded assistance
DATA CURATION
TECHNOLOGY
• Storage
• Collaboration
• Preservation (maybe in the form of a repository, maybe not)

Slide 6

Slide 6 text

3. POLICY
UNIVERSITY
• Governance/stewardship
• Security
• Campus OSTP response
NATIONAL
• Funder policy advocacy
• Group/committee involvement
INTERNATIONAL
• Data standards advocacy
• Group/committee involvement

Slide 7

Slide 7 text

Research Data Management
What is research data?
“The recorded factual material commonly accepted in the scientific community as necessary to validate research findings.”
INCLUDES: code, figures, statistics, interviews, transcripts
EXCLUDES: preliminary analyses, drafts of papers, plans for further research, communications + peer reviews, physical samples
– OMB Circular, White House

Slide 8

Slide 8 text

time to ponder
Can you still access your data from…
– 10 years ago?
– 5 years ago?
– 1 year ago?

Slide 9

Slide 9 text

data horror stories
Image courtesy of Flickr user wolfgangfoto (CC BY-ND)

Slide 10

Slide 10 text

you can’t find it.
Image: https://i.imgflip.com/ntnjb.jpg

Slide 11

Slide 11 text

or you can’t understand it.
Image: http://cdn.meme.am/instances/58392702.jpg

Slide 12

Slide 12 text

or it’s long gone.
Image: https://community.spiceworks.com/topic/813225-best-backup-recovery-memes

Slide 13

Slide 13 text

http://mentalfloss.com/uk/entertainment/27204/how-one-line-of-text-nearly-killed-toy-story-2

Slide 14

Slide 14 text

what’s up with researchers?
• Ample technology to generate data but few skills to manage it effectively
• Movement toward openness, driven by the OSTP memo and spurred by early-career researchers’ expectations
• Disciplinary culture shifts toward data reuse + reproducibility
• Need for multi-purpose online spaces to collaborate, share, store, and archive research outputs (including data)

Slide 15

Slide 15 text

research is increasingly digital.
• Multi-institutional
• Grant-funded
• Shared infrastructure
• Computationally driven
Image courtesy of #wocintechchat

Slide 16

Slide 16 text

[ retractionwatch.com ]

Slide 17

Slide 17 text

federal funding requirements
Data management plans (DMPs) are required by most major federal funding agencies.
Office of Science and Technology Policy (OSTP) memo
– Released February 2013; took effect fall 2015
– Requires open sharing of published articles and data
– Publication repository is provided; data repository is not
– Applies to agencies with more than $100M in annual R&D expenditures

Slide 18

Slide 18 text

data management basics
File organization
• Is your data organized meaningfully or jumbled together? Do you know where your data is?
Documentation
• How much contextual information accompanies your data? Can you understand it? Can a stranger understand it?
Storage & backup
• Where is your data stored and backed up? Could you recover from hardware failure or accidental deletion?
Media obsolescence
• Do you know how the software, hardware, and file formats you use will affect your data’s readability in the future?

Slide 19

Slide 19 text

file naming conventions
• Use them any time you have related files
• Consistent
• Short yet descriptive
• Avoid spaces and special characters
example: File001.xls vs. Project_instrument_location_YYYYMMDD.xls
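A naming convention is easiest to keep consistent when a script generates and checks the names instead of someone typing them by hand. The following is a minimal Python sketch that assumes the Project_instrument_location_YYYYMMDD pattern from this slide and allows only letters and digits within each field. The function names and the "sonde" instrument in the usage lines are hypothetical.

```python
import re
from datetime import date, datetime

# Assumed rule: each field uses only letters and digits (no spaces or special
# characters); fields are joined by underscores and the date is YYYYMMDD.
FIELD = re.compile(r"[A-Za-z0-9]+")
NAME_PATTERN = re.compile(
    r"(?P<project>[A-Za-z0-9]+)_(?P<instrument>[A-Za-z0-9]+)_"
    r"(?P<location>[A-Za-z0-9]+)_(?P<date>\d{8})\.(?P<ext>[A-Za-z0-9]+)"
)


def make_filename(project: str, instrument: str, location: str,
                  day: date, ext: str) -> str:
    """Build a name of the form Project_instrument_location_YYYYMMDD.ext."""
    for part in (project, instrument, location, ext):
        if not FIELD.fullmatch(part):
            raise ValueError(f"use only letters and digits: {part!r}")
    return f"{project}_{instrument}_{location}_{day:%Y%m%d}.{ext}"


def follows_convention(filename: str) -> bool:
    """Return True if filename matches the pattern and its date is a real date."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None:
        return False
    try:
        datetime.strptime(match["date"], "%Y%m%d")
    except ValueError:
        return False
    return True


# Usage (hypothetical values):
# make_filename("WaterQuality", "sonde", "LakeMendota", date(2014, 10, 30), "xls")
#   -> 'WaterQuality_sonde_LakeMendota_20141030.xls'
# follows_convention("File001.xls")  -> False
```

Running the checker over an existing folder is a quick way to find the files that still need renaming.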

Slide 20

Slide 20 text

directory/folder organization
Lots of possibilities, so consider what makes sense for your project:
– File type
– Date
– Type of analysis
example: MyDocuments\Research\Sample12.tiff vs. C:\NSFGrant01234\WaterQuality\Images\LakeMendota_20141030.tiff
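After you choose a layout, building paths in code keeps each file where the layout says it belongs. The following is a minimal Python sketch that assumes a hypothetical grant / topic / file-type hierarchy like the second example on this slide. `PureWindowsPath` is used only so the output matches the Windows-style example; `Path` works the same way on any operating system.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path, PurePath, PureWindowsPath


def data_path(root: PurePath, grant: str, topic: str, file_type: str,
              site: str, day: date, ext: str) -> PurePath:
    """Return root/grant/topic/file_type/site_YYYYMMDD.ext (hypothetical layout)."""
    return root / grant / topic / file_type / f"{site}_{day:%Y%m%d}.{ext}"


def create_project_folders(root: Path, grant: str, topic: str,
                           file_types: list[str]) -> list[Path]:
    """Create one folder per file type; safe to rerun (existing folders are kept)."""
    folders = []
    for file_type in file_types:
        folder = root / grant / topic / file_type
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders


# Usage: rebuilds the example path from this slide.
# path = data_path(PureWindowsPath("C:/"), "NSFGrant01234", "WaterQuality",
#                  "Images", "LakeMendota", date(2014, 10, 30), "tiff")
# str(path) gives C:\NSFGrant01234\WaterQuality\Images\LakeMendota_20141030.tiff
```

Creating the folder skeleton at the start of a project makes it more likely that everyone on the team files data the same way.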

Slide 21

Slide 21 text

retroactive organization
• Do a data inventory. List all the places where your data lives (both physical and digital).
• Make a plan for consolidating: follow the rule of 3, not the rule of 17!
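For the digital side of a data inventory, a short script can list every file under a folder with its size and last-modified date. This makes scattered or duplicate copies easier to spot. The following is a minimal Python sketch that assumes you run it once per location (laptop, network drive, external drive) and compare the resulting CSV files. The function names and output columns are illustrative, not a standard.

```python
from __future__ import annotations

import csv
from datetime import datetime
from pathlib import Path

COLUMNS = ["path", "bytes", "modified"]


def inventory(root: Path) -> list[dict]:
    """List every file under root with its size and last-modified date."""
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            info = path.stat()
            rows.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime).strftime("%Y-%m-%d"),
            })
    return rows


def write_inventory(rows: list[dict], out_file: Path) -> None:
    """Save the inventory as CSV, an open format any spreadsheet can read."""
    with out_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


# Usage (hypothetical paths):
# write_inventory(inventory(Path("E:/LabData")), Path("inventory_externaldrive.csv"))
```

Physical materials (notebooks, printouts, old media) still need to be listed by hand.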

Slide 22

Slide 22 text

document on many levels
Project- & folder-level
– Create a readme file. (Good example located here: http://hdl.handle.net/2022/17155)
– Document any data processing and analyses.
– Don’t forget written notes.
Item-level
– Remember the importance of file names for conveying descriptive information.
– Find and adhere to disciplinary metadata standards
  • XML (a format many metadata standards are encoded in)
  • Dublin Core
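Dublin Core defines fifteen general-purpose elements, such as title, creator, date, format, and rights, and records are often written as XML. The following is a minimal Python sketch that uses the standard library's `xml.etree.ElementTree` and assumes the Dublin Core element namespace `http://purl.org/dc/elements/1.1/`. The `metadata` wrapper element and the record values are illustrative only. A disciplinary standard or repository may require a different schema.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize elements with the usual "dc:" prefix

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {"title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights"}


def dublin_core_record(fields: dict[str, str]) -> str:
    """Build a simple XML record from a mapping of Dublin Core element names to values."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Usage (hypothetical values):
# dublin_core_record({"title": "Lake Mendota water quality images",
#                     "creator": "A. Researcher", "date": "2014-10-30",
#                     "format": "image/tiff"})
```

Rejecting unknown element names catches typos such as "author" (Dublin Core uses "creator") before a record is saved.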

Slide 23

Slide 23 text

what’s in a good readme file?
• Names + contact information for people associated with the project
• List of files, including a description of their relationship to one another
• Copyright + licensing information
• Limitations of the data
• Funding sources / institutional support
tl;dr Any information necessary for someone with no knowledge of your research to understand and/or replicate your work.
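One way to make sure no item on this checklist is skipped is to fill in a structured template and render it as a plain-text readme, which stays readable without special software. The following is a minimal Python sketch. The field names mirror the checklist on this slide. The class and function names, and the project details in the usage lines, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReadmeInfo:
    title: str
    people: list[str]        # names + contact information
    files: dict[str, str]    # file name -> description and relationship to other files
    license: str             # copyright + licensing information
    limitations: str         # known gaps, caveats, quality issues
    funding: list[str] = field(default_factory=list)  # funders / institutional support


def render_readme(info: ReadmeInfo) -> str:
    """Render the checklist as a plain-text readme, one section per item."""
    lines = [info.title, "=" * len(info.title), ""]
    lines += ["People and contact information:"]
    lines += [f"  - {person}" for person in info.people] + [""]
    lines += ["Files:"]
    lines += [f"  - {name}: {desc}" for name, desc in info.files.items()] + [""]
    lines += ["Copyright and license:", f"  {info.license}", ""]
    lines += ["Limitations of the data:", f"  {info.limitations}", ""]
    lines += ["Funding / institutional support:"]
    # An empty funding list is stated explicitly rather than silently omitted.
    lines += [f"  - {source}" for source in info.funding] or ["  - (none listed)"]
    return "\n".join(lines) + "\n"


# Usage (all details are hypothetical):
# readme = render_readme(ReadmeInfo(
#     title="Example water quality dataset",
#     people=["A. Researcher <a.researcher@example.edu>"],
#     files={"raw/readings.csv": "raw sensor output",
#            "clean/readings.csv": "raw/readings.csv with flagged values removed"},
#     license="CC BY 4.0",
#     limitations="Readings missing for January.",
# ))
# Then save `readme` as README.txt next to the data.
```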

Slide 24

Slide 24 text

example readme file!

Slide 25

Slide 25 text

storage & backup
storage = working files. The files you access regularly and change frequently. In general, losing your storage means losing current versions of the data.
backup = a regular process of copying data to a location separate from storage. You don’t really need it until you lose data, but when you need to restore a file it will be the most important process you have in place.

Slide 26

Slide 26 text

rule of 3
Keep THREE copies of your data
– TWO onsite
– ONE offsite
Example
– One: Network drive
– Two: External hard drive
– Three: Cloud storage
This ensures that your storage and backup are not all in the same place. That’s too risky!
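Three copies only help if they are actually identical. A common way to check is to compare a cryptographic checksum of each copy against the working file. The following is a minimal Python sketch that uses SHA-256 from the standard library's `hashlib`. The function names are hypothetical, and the paths in the usage lines are stand-ins for the network drive, external hard drive, and cloud-synced folder above.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in 1 MB chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(working: Path, copies: list[Path]) -> dict[str, bool]:
    """Report, per copy, whether it exists and has the same SHA-256 as the working file."""
    expected = sha256_of(working)
    return {str(copy): copy.is_file() and sha256_of(copy) == expected
            for copy in copies}


# Usage (hypothetical locations for one file):
# verify_copies(Path("N:/Lab/readings.csv"),
#               [Path("E:/Backup/readings.csv"),
#                Path("~/Box/readings.csv").expanduser()])
# Any False value means a copy is missing or out of date.
```

The same check is useful after migrating files to new media, because it confirms that nothing changed in transit.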

Slide 27

Slide 27 text

Original clipart from http://cliparts.co/clipart/2532461. Modified version made available as CC0.

Slide 28

Slide 28 text

evaluating cloud services
• Lots of options out there, and not all are created equal
• Read the Terms of Service!
• While at UW, researchers can use free UW Box or Google Drive accounts

Slide 29

Slide 29 text

http://www.doit.wisc.edu/news/collaboration-tools-google-docs-vs-box-2

Slide 30

Slide 30 text

media obsolescence
• software
• hardware
• file formats
CC image by Flickr user wlef70

Slide 31

Slide 31 text

thwarting obsolescence
• You can’t.
• Today’s popular software can become obsolete through business deals, new versions, or a gradual decline in user base. (Consider WordPerfect.)
• Expect the average lifespan of storage media to be 3-5 years. Migrate your files every few years, if not more frequently!

Slide 32

Slide 32 text

thwarting obsolescence
• Some file formats are less susceptible to obsolescence than others
– Open, non-proprietary formats (pick TXT over DOCX, CSV over XLSX, TIF over JPG)
– Wide adoption
– History of backward compatibility
– Metadata support in an open format (XML)

Slide 33

Slide 33 text

Research Data Management
How to work with digital files!
• Formats
• Naming
• Folder structure
• Description
• Collaboration
• Storage
• Sharing
• Publication

Slide 34

Slide 34 text

Case Study 1: Formats (Media Obsolescence)

Slide 35

Slide 35 text

Case Study 2: File Naming + Folder Structure

Slide 36

Slide 36 text

Case Study 3: Storage + Collaboration
1. Is there a place where X Lab can store large amounts of data (100 GB) related to each publication? Requirements:
• DOI
• Ability to store multiple files and folders
• Access without paying or signing in
• Not connected to Professor X → longevity
2. How do people handle/share computer code related to a publication?

Slide 37

Slide 37 text

Data is disciplinary.

Slide 38

Slide 38 text

http://deanbirkett.name/work/research.html

Slide 39

Slide 39 text

“What can I do?”

Slide 40

Slide 40 text

Start the conversation.
All you need is…
✓ A pinch of boldness
✓ A basic understanding of RDM concepts
✓ Knowledge of what to promote: RDS!

Slide 41

Slide 41 text

Work with RDS
✓ Ask us for suggestions about starting the conversation, including sample text
✓ Partner with us to teach RDM best practices to your researchers
✓ What are your ideas?

Slide 42

Slide 42 text

researchdata.wisc.edu

Slide 43

Slide 43 text

discussion points
• What do you think is the future of RDM and libraries?
• What do you want to learn about RDM? What are you most curious about?
• What are good strategies for engaging departments in discussion on these topics?

Slide 44

Slide 44 text

What’s Next?
• Funder Public Access Requirements
  – Wed, Oct. 12, 2:00-3:30 at Steenbock
  – Thurs, Oct. 13, 9:30-11:00 at Memorial
• Research Data Management & Sharing
  – Wed, Nov. 16, 2:00-3:30 at Steenbock
  – Thurs, Nov. 17, 9:30-11:00 at Memorial
• Open Access Publishing
  – Wed, Dec. 14, 2:00-3:30 at Steenbock
  – Thurs, Dec. 15, 9:30-11:00 at Memorial
• Open Research & Reproducibility
  – Wed, Feb. 15, 2:00-3:30 at Steenbock
  – Thurs, Feb. 16, 9:30-11:00 at Memorial
• Authors’ Rights Management
  – Wed, Mar. 15, 2:00-3:30 at Steenbock
  – Thurs, Mar. 16, 9:30-11:00 at Memorial
• Digital Project Planning
  – Wed, Apr. 12, 2:00-3:30 at Steenbock
  – Thurs, Apr. 13, 9:30-11:00 at Memorial