Data management planning for Doctoral Training centres

Jez Cope
September 10, 2013

Transcript

  1. Outline

    1 Introduction
      • Doctoral Training Centres
      • Research data management
      • Data Management Planning
    2 Training approach
      • Interactive exercise
      • Data Management Planning
      • Feedback
      • Resources
    3 Conclusions
  2. Research360 project

    Jisc-funded 18-month project to develop technical and human infrastructure for research data management at the University of Bath, as an exemplar research-intensive university.
    http://blogs.bath.ac.uk/research360/
  3. What is a DTC/CDT?

    Doctoral Training Centre (DTC)
    • Funded by a Research Council
    • Train PhD students in cohorts
    • Strategic research area
    • Highly collaborative
    • Integrated (“1+3”) PhD course
      • Year 1: small research projects & intensive training
      • Year 2–4: main PhD research & additional training
  4. What is a DTC/CDT?

    Doctoral Training Centre (DTC): Centre for Sustainable Chemical Technologies
    • Funded by a Research Council: EPSRC
    • Train PhD students in cohorts: 12–18/year
    • Strategic research area: Sustainable chemical technologies
    • Highly collaborative: 18 commercial partners
    • Integrated (“1+3”) PhD course
      • Year 1: small research projects & intensive training
      • Year 2–4: main PhD research & additional training
    http://www.bath.ac.uk/csct/dtc/
  5. Research data management

    • Security
    • Integrity
    • Collaboration
    • Curation
    • Archival
    • Sharing
    • Resilience
    • …
  6. Data management planning

    Principal investigators
    • Demonstrate integrity
    • Provide value for money
    • Establish policies & procedures
    • Comply with funder, publisher & institution
    • …

    Research students
    • Work efficiently & effectively
    • Prevent lost work
    • Maintain privacy (personal & commercial)
    • Manage data day-to-day
    • …
  7. Simplified research lifecycle

    • Create
    • Use
    • Document
    • Archive & preserve
    • Publish & re-use
  8. Workshop structure

    1 Icebreaker: current perceptions clicker exercise
    2 General principles and tips, e.g.:
      • Use and work from secure network storage where possible
      • Scan paper records (e.g. notebooks) regularly
      • …
    3 Data Management Planning exercise
    4 Wrap-up: repeat clicker exercise
  9. Workshop slides

    Managing Your Research Data
    Cathy Pink & Jez Cope (UKOLN/Library) (Chemistry DTC)

    After this workshop you will be able to:
    • understand what research data are and to whom they belong
    • appreciate that the management and storing of research data are responsibilities of those who generate the information
    • learn about research data management strategies and tools
    • determine how much data needs to be managed
    • gauge for how long research data need to be maintained

    Image © Jorge Cham. Cham, J. (2003). The Four Stages of Data Loss. Retrieved June 28, 2013, from http://www.phdcomics.com/comics/archive.php?comicid=382
    http://opus.bath.ac.uk/32296/
  10. Clicker exercise

    • Several general statements about research data
    • For each statement in turn:
      • Vote (anonymously) via handset
      • View results graph
      • Discuss responses

    Example statement: “I am satisfied that my data is safe”
    • Strongly agree
    • Agree
    • Neither agree nor disagree
    • Disagree
    • Strongly disagree
  11. Data Management Planning

    • Intention: consider all aspects of RDM
    • Flag up gaps in knowledge
    • Signpost places to find further information
    • Simple template (Word document) to produce a DMP
    • Designed around priorities of front-line research
    • Sample answers for guidance
  12. DMP template for PGRs

    Research360. This template is licensed under a Creative Commons Attribution 3.0 Unported License.

    Data Management Plan: Postgraduate Research Project

    Overview
    Researcher:
    Project title:
    Project duration:
    Project context: My research is about…

    Defining your data

    Where does your data come from?
    [The text in grey gives examples of possible answers; use or replace it as needed]
    • I record interviews with my subjects using a digital audio recorder, then transcribe them into text.
    • I test my catalyst under a number of conditions, then submit samples of the products to analysis facilities.
    • I generate data using model code that I’ve written, then process it in various ways to produce visualisations.
    • I take high-resolution digital photographs of artefacts recovered in the field, and sometimes send samples off for analysis.
    • I combine existing data from a number of sources [give examples…] and reanalyse them to derive new conclusions.

    How often do you get new data?
    • All of my data will come from a single 3-month field trip in my second year.
    • I expect to run two or three experiments each week through my second year and much of my third year – about 100 in total.

    How much data do you generate?
    • Each experiment produces about 50MB of data, so over the course of my PhD I expect this to add up to about 5GB.

    What format is your data in?
    • Audio recordings are stored as MP3; transcripts are stored in Word documents.
    • Experimental observations are recorded in a paper notebook, while recordings from instruments are stored in the proprietary format of the instrument.

    Looking after your data

    What different versions of each data file do you create?
    • As I survey new cohorts, data is appended to the dataset and saved as a new file.
    • There is only ever one version of each data file: new experiments create new data, which is stored in a new set of files.
    • Each time I run a new version of my model, intermediate files are written over, but the final results are saved as a new file.

    What additional information is required to understand each data file?
    • I keep additional notes about interviews and participants in a Word document with the audio recordings and transcripts.
    • Abbreviations used for column headings are kept in a separate text document.
    • The content of digital photographs is recorded in the file name.

    Where do you store your data?
    • My primary copy is on the university X: drive, and I copy files to my laptop to work on while away from the office.

    How do you structure and name your folders and files?
    • I use the structure <thesis chapter>/<date>-<experiment number>.
    • A folder for each project phase, and within those a folder for each interview.
    • Each filename starts with the date on which the data was collected.

    How is your data backed up?
    • Data stored on the university research storage system is backed up by BUCS. I make sure I copy the latest versions of my working files there each day.
    • I regularly scan my paper notebook and store digital copies on the X: drive.

    How will you test whether you can restore from your backups?
    • Weekly check that files on the X: drive are still usable.

    Sharing your data

    Who owns the data you generate?
    • According to my studentship agreement, the University owns all data I create.
    • As a self-funded student, I own all intellectual property from my project.

    Who else has a right to see or use this data?
    • Others in my research group and my supervisor’s industrial partners.
    • Only my supervisor needs access.

    Who else should reasonably have access?
    • I would like my work to be useful to policy makers in government.

    What should/shouldn’t be shared and why?
    • All my data is covered by a confidentiality agreement and cannot be shared.
    • Some of my data identifies individual patients and must be anonymised before sharing.

    http://opus.bath.ac.uk/30772/
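The sample answers in the template on data volume and folder naming can be made concrete with a short sketch. It uses the template's example figures (about 100 experiments at roughly 50MB each) and the example structure <thesis chapter>/<date>-<experiment number>. The function names, the ISO date format, the zero-padded experiment number and the folder name "chapter-3" are illustrative assumptions, not part of the template.

```python
from datetime import date
from pathlib import PurePosixPath


def estimate_total_mb(experiments: int, mb_per_experiment: float) -> float:
    """Rough total data volume in MB for a number of similar experiments."""
    return experiments * mb_per_experiment


def experiment_folder(chapter: str, collected: date, experiment_number: int) -> PurePosixPath:
    """Build '<thesis chapter>/<date>-<experiment number>' as a relative path.

    Assumption: ISO dates (YYYY-MM-DD) and three-digit experiment numbers,
    so that folders sort chronologically when listed by name.
    """
    return PurePosixPath(chapter) / f"{collected.isoformat()}-{experiment_number:03d}"


# Figures from the template's sample answers: ~100 experiments, ~50 MB each.
total_mb = estimate_total_mb(experiments=100, mb_per_experiment=50)
print(f"Estimated total: {total_mb / 1000:.0f} GB")  # Estimated total: 5 GB

# Hypothetical chapter name, date and experiment number.
print(experiment_folder("chapter-3", date(2013, 9, 10), 7))  # chapter-3/2013-09-10-007
```

Putting the date first in ISO form is one way to satisfy the sample answer "Each filename starts with the date on which the data was collected" while keeping an alphabetical listing in date order.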
  13. DMP guidance for PGRs

    Completing a Data Management Plan: Postgraduate Research Students

    Introduction
    This document is intended to help you complete a data management plan, and gives you some extra things to consider that aren’t mentioned in the template. You might not know the answers to everything; if there’s something you’re not sure about, make a note on the plan to find out. You may wish to discuss some or all of it with your supervisor.
    Type as much (or as little) as you feel you need to into each box: it will expand to accommodate what you write. The text in grey on the template gives examples of possible answers; use or replace it as needed.
    Data Management Plans, Data Sharing Statements and other similarly-named documents are now required by most research funders. Increasing numbers of universities require them as well. Those required by funders often go into more detail about staff and resources than this template for PGRs, but this template has been specifically designed to help you plan the aspects of data management relevant to a doctoral research project.

    Completing each section

    Overview
    This section is for administrative purposes, to make it clear which project the plan is about. “Project context” need only be two or three sentences summarising your project’s aims.

    Defining your data
    Almost all research builds on sources of information to develop and justify conclusions, whether this information is newly created or gathered from existing sources; this is your research data. This section helps you to think about what ‘data’ means for your research.
    The answers you enter in this section will help you identify what resources, such as storage, you will need to manage your data. They will also help the University to understand and plan for growth in demand.

    Where does your data come from?
    Is it gathered from experiments? From literature? From existing data on the web? Or from somewhere else? What instruments do you use? How about observations or photos?

    How often do you get new data?
    Continuously over a long period or from separate one-off events such as experiments or interviews? How many experiments per week? How will this change over time?

    How much data do you generate?
    Try to give this in kB/MB/GB. Start with how much you have so far, and try to estimate how this will grow for the rest of the project, based on your answer to the previous question. If you keep data in a non-digital format, such as a lab notebook, consider how many notebooks you might need.
    Tip: You can find out the size of an existing file or folder by right-clicking on it in Windows Explorer or Mac Finder and selecting Properties… (Windows) or Get info… (Mac).

    What data formats do you use?
    What software is required to access and analyse the data? Is it free/open, and if not are alternatives available? How would you access your data if the university no longer had a license for the software you currently use? What type of data does each format hold?

    Looking after your data
    The answers you enter in this section will help you identify how to keep your data safe, and will also make it easier for other researchers to make use of your data after the end of your project, if appropriate.

    What different versions of each data file do you create?
    In this context “versions” of a file doesn’t simply refer to multiple copies (such as you might make when backing up), but updated copies when the contents of a file are changed. Do you update or add data to existing files, or do you create new files when you add new data? How do you indicate this in the filename? Do you create additional files during analysis? If so, how do filenames from different stages of your analysis relate to each other?

    What additional information is required to understand each data file?
    What would you need to know to reproduce the data? If someone else in your lab or a reader of your papers wanted to replicate your analysis, what would they need to know? If you have used abbreviations or codes in your data, how will others know what they mean? This type of detail is particularly important to record because it is often glossed over in published outputs, where the general method and conclusions are more important than the fine detail.
    Once you’ve decided what information should be recorded, you should go ahead and record it in a “readme” file (or similar) that you store with your data. You could think about setting up a template to make this quicker for new data.

    Where do you store your data?
    If you have more than one copy of your data (say on a laptop and desktop computer) you should decide, early on, which is the primary copy, as this …

    http://opus.bath.ac.uk/36009/
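For folders containing many files, the size check described in the guidance tip can also be scripted. The following is a minimal sketch using only the Python standard library. The function names, the decimal units (1 kB = 1000 bytes) and the demo file name are assumptions made for illustration; none of them come from the guidance document.

```python
import tempfile
from pathlib import Path


def folder_size_bytes(folder: Path) -> int:
    """Sum the sizes of all regular files below `folder`, including subfolders."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def human_readable(num_bytes: int) -> str:
    """Format a byte count in B/kB/MB/GB using decimal (1000-based) units."""
    size = float(num_bytes)
    for unit in ("B", "kB", "MB", "GB"):
        # Stop at the first unit that fits, or at GB as the largest unit.
        if size < 1000 or unit == "GB":
            return f"{size:.1f} {unit}"
        size /= 1000


# Demo on a throwaway folder holding one hypothetical 1500-byte data file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "run1.csv").write_bytes(b"x" * 1500)
    print(human_readable(folder_size_bytes(Path(tmp))))  # 1.5 kB
```

To measure a real project, pass the path of the data folder to `folder_size_bytes` instead of the temporary directory. Note that operating systems may report slightly different figures, for example when they use 1024-based units or count disk blocks rather than file contents.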
  14. Overall strategy for DTC students

    • Provide training opportunities (University-wide) for those who need them:
      • Face-to-face workshops
      • E-learning module
      • RDM web pages
    • Encourage training uptake during year 1
    • Require data management plan during year 2
    • Provide additional support and signposting to resources on-demand
    • Possibly part of transfer process
    • Follow up in later years to assess plan execution
  15. Feedback

    Workshops
    • Overwhelmingly positive
    • Good balance between lecture and interaction
    • Always calls for more concrete tips
    • Worse when we got too “interactive”

    DMP template (without workshop)
    • High relevance for most students
    • Right level of detail in guidance
    • Did flag up gaps in knowledge
  16. Resources

    • University of Bath resources (CC licensed):
      • Workshop slides: http://opus.bath.ac.uk/32296/
      • DMP template for PGRs: http://opus.bath.ac.uk/30772/
      • DMP guidance for PGRs: http://opus.bath.ac.uk/36009/
      • Research Data Management web pages: http://www.bath.ac.uk/research/data
    • Other useful stuff:
      • Research data MANTRA: http://datalib.edina.ac.uk/mantra/
      • PollEverywhere web, SMS & smartphone voting: http://www.polleverywhere.com/
  17. Summary

    • PhD students need dedicated training in RDM & DMP
    • DTCs present an opportunity to provide additional support
    • Our materials are available (CC licensed) from: http://blogs.bath.ac.uk/research360/
  18. Acknowledgements

    • Research360 project team:
      • Cathy Pink, Institutional Data Scientist (co-developed training materials)
      • Matthew Davidson, Dept. of Chemistry
      • Liz Lyon, UKOLN
      • Kara Jones, Library
      • Katy Jordan, Library
      • John Howell, BUCS
      • Roger Jardine, BUCS
      • Maria Wells, Vice-Chancellor’s Office
      • Katy McKen, RDSO