
Using data management plans as a research tool for improving data services in academic libraries


An invited talk at Caltech Libraries

Amanda Lea Whitmire

February 10, 2017


Transcript

  1. USING DATA MANAGEMENT PLANS as a RESEARCH TOOL for IMPROVING DATA SERVICES in ACADEMIC LIBRARIES

    Amanda Whitmire
    Head Librarian & Bibliographer, Harold A. Miller Library
    Assistant to the Director, Hopkins Marine Station
    California Institute of Technology | 10 February 2017
  2. This project was made possible in part by the Institute of Museum and Library Services, grant number LG-07-13-0328.

    DART Team:
    Amanda Whitmire | @AWhitTwit | Stanford University Libraries
    Jake Carlson | @jrcarlso | University of Michigan Library
    Patricia M. Hswe | @pmhswe | Andrew W. Mellon Foundation
    Susan Wells Parham | Georgia Institute of Technology Library
    Brian Westra | @bdwestra | University of Oregon Libraries
    DART Project | @DMPResearch | Project site: https://osf.io/kh2y6/
  3. Tiers of research data support services:

    The basics: DMP review, workshops, website
    Mid-level: dedicated “research services”, metadata support, facilitate deposit in DRs, consults
    High level: infrastructure, data curation
    From: Reznik-Zellen, Rebecca C.; Adamick, Jessica; and McGinty, Stephen. (2012). “Tiers of Research Data Support Services.” Journal of eScience Librarianship 1(1): Article 5. http://dx.doi.org/10.7191/jeslib.2012.1002
  4. NSF Directorates and Divisions

    BIO Biological Sciences: DBI Biological Infrastructure; DEB Environmental Biology; EF Emerging Frontiers Office; IOS Integrative Organismal Systems; MCB Molecular & Cellular Biosciences
    ENG Engineering: CBET Chemical, Bioengineering, Environmental, & Transport Systems; CMMI Civil, Mechanical & Manufacturing Innovation; ECCS Electrical, Communications & Cyber Systems; EEC Engineering Education & Centers; EFRI Emerging Frontiers in Research & Innovation; IIP Industrial Innovation & Partnerships
    CISE Computer & Information Science & Engineering: ACI Advanced Cyberinfrastructure; CCF Computing & Communication Foundations; CNS Computer & Network Systems; IIS Information & Intelligent Systems
    GEO Geosciences: AGS Atmospheric & Geospace Sciences; EAR Earth Sciences; OCE Ocean Sciences; PLR Polar Programs
    EHR Education & Human Resources: DGE Division of Graduate Education; DRL Research on Learning in Formal & Informal Settings; DUE Undergraduate Education; HRD Human Resources Development
    MPS Mathematical & Physical Sciences: AST Astronomical Sciences; CHE Chemistry; DMR Materials Research; DMS Mathematical Sciences; PHY Physics
    SBE Social, Behavioral & Economic Sciences: BCS Behavioral & Cognitive Sciences; SES Social & Economic Sciences; SMA SBE Office of Multidisciplinary Activities
  5. Metadata guidance by source (Source | Guidance text):

    NSF guidelines | The standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
    BIO | Describe the data that will be collected, and the data and metadata formats and standards used.
    CISE | The DMP should cover the following, as appropriate for the project: ...other types of information that would be maintained and shared regarding data, e.g. the means by which it was generated, detailed analytical and procedural information required to reproduce experimental results, and other metadata
    ENG | Data formats and dissemination. The DMP should describe the specific data formats, media, and dissemination approaches that will be used to make data available to others, including any metadata
    GEO AGS | Data Format: Describe the format in which the data or products are stored (e.g. hardcopy logs and/or instrument outputs, ASCII, XML files, HDF5, CDF, etc.)
  6. Measures of IRR

    1. Percentage agreement | not for ordinal data; overestimates agreement
    2. Cronbach’s alpha | works for 2 raters only
    3. Cohen’s kappa | used for nominal data; works for 2 raters only
    4. Fleiss’s kappa | for nominal variables
    5. Intra-class correlation (ICC) | perfect!
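To make the chance-correction point concrete, here is a minimal Python sketch (the function name and the toy ratings are invented for illustration, not the DART data) that computes raw percentage agreement and Cohen's kappa for two raters over nominal labels:

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters rating the same items (nominal labels)."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n        # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # chance agreement: probability both raters pick the same label at random
    p_e = sum(c1[lab] * c2[lab] for lab in set(r1) | set(r2)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

rater1 = ["yes", "yes", "no", "yes", "no", "yes", "yes", "yes"]
rater2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes"]
pct_agreement = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
print(pct_agreement)                           # 0.875
print(round(cohens_kappa(rater1, rater2), 3))  # 0.714
```

Raw agreement looks high (87.5%), but kappa drops once the agreement expected by chance from the two raters' skewed label distributions is removed — the overestimation the slide warns about.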
  7. Intra-class correlation (ICC)

    ICC = Variance due to rated subjects (DMPs) / (Variance due to DMPs + Variance due to raters + Residual Variance)

    There are 6 variations of ICC – one must choose carefully based on study design.
    Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin. 1979; 86(2):420–428.
    McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychological Methods. 1996; 1(1):30–46.
  8. Intra-class correlation (ICC), computed in R with the icc() function from the irr package:

    library(irr)
    ICC_results <- icc(ratingsData, model = "twoway", type = "agreement", unit = "single")

    "twoway" | vs. one-way; raters are random & DMPs are random
    "agreement" | vs. consistency; looking for absolute agreement between raters
    "single" | vs. average; single ratings are used, not averages of ratings
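For readers who want the computation itself rather than the R call, here is a minimal from-scratch Python sketch of ICC(2,1) — two-way random effects, absolute agreement, single rating, the variant matching the model/type/unit choices above. It assumes a complete ratings matrix with no missing values; the function name is invented, and the example matrix is the classic 6-subject × 4-rater dataset from Shrout & Fleiss (1979), not the DART ratings:

```python
def icc_a1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single rating.

    x is a complete n-subjects x k-raters matrix (list of lists).
    """
    n, k = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n * k)
    row_means = [sum(row) / k for row in x]                       # per subject
    col_means = [sum(row[j] for row in x) / n for j in range(k)]  # per rater
    ms_r = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)   # subjects
    ms_c = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)   # raters
    ss_total = sum((v - grand) ** 2 for row in x for v in row)
    ms_e = (ss_total - (n - 1) * ms_r - (k - 1) * ms_c) / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# Shrout & Fleiss (1979) example: 6 subjects rated by 4 raters
ratings = [[9, 2, 5, 8],
           [6, 1, 3, 2],
           [8, 4, 6, 8],
           [7, 1, 2, 6],
           [10, 5, 6, 9],
           [6, 2, 4, 7]]
print(round(icc_a1(ratings), 2))  # 0.29, the published ICC(2,1) for this data
```

The denominator mirrors the formula on the previous slide: subject variance over subject + rater + residual variance, with the rater and residual terms scaled for a single-rating, absolute-agreement design.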
  9. Inter-rater reliability: round 1

    [Histograms of per-criterion ICC values; bar counts not reproduced]
    Mean = 0.731 | Median = 0.759 | Standard Deviation = 0.146
    Mean = 0.487 | Median = 0.464 | Standard Deviation = 0.112
    Scale: 0–0.39 = poor | 0.40–0.59 = fair | 0.60–0.74 = good | 0.75–1 = excellent
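The interpretive scale on the slide is a simple threshold mapping; as a small illustrative sketch (function name invented), applied to the two round-1 means:

```python
def icc_quality(icc_value):
    """Map an ICC value onto the poor/fair/good/excellent scale from the slide."""
    if icc_value < 0.40:
        return "poor"
    if icc_value < 0.60:
        return "fair"
    if icc_value < 0.75:
        return "good"
    return "excellent"

print(icc_quality(0.731))  # good
print(icc_quality(0.487))  # fair
```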
  10. Inter-rater reliability: round 2

    [Histograms of per-criterion ICC values; bar counts not reproduced]
    Mean = 0.731 | Median = 0.759 | Standard Deviation = 0.146
    Mean = 0.487 | Median = 0.464 | Standard Deviation = 0.112
    Scale: 0–0.39 = poor | 0.40–0.59 = fair | 0.60–0.74 = good | 0.75–1 = excellent
  11. https://osf.io/kh2y6/

    Find the rubric. See the survey we used to collect assessment data. Look at our DMP assessment data.
  12. DMP assessment rubric (excerpt). Performance levels: Complete / detailed; Addressed issue, but incomplete; Did not address issue.

    General assessment criteria:
    Criterion: Describes what types of data will be captured, created or collected (Directorates: All)
    - Complete / detailed: Clearly defines data type(s), e.g. text, spreadsheets, images, 3D models, software, audio files, video files, reports, surveys, patient records, samples, final or intermediate numerical results from theoretical calculations, etc. Also defines data as: observational, experimental, simulation, model output or assimilation.
    - Addressed issue, but incomplete: Some details about data types are included, but the DMP is missing details or wouldn’t be well understood by someone outside of the project.
    - Did not address issue: No details included; fails to adequately describe data types.

    Directorate- or division-specific assessment criteria:
    Criterion: Describes how data will be collected, captured, or created (whether new observations, results from models, reuse of other data, etc.) (Directorates: GEO AGS, GEO EAR SGP, MPS AST)
    - Complete / detailed: Clearly defines how data will be captured or created, including methods, instruments, software, or infrastructure where relevant.
    - Addressed issue, but incomplete: Missing some details regarding how some of the data will be produced; makes assumptions about reviewer knowledge of methods or practices.
    - Did not address issue: Does not clearly address how data will be captured or created.

    Criterion: Identifies how much data (volume) will be produced (Directorates: GEO EAR SGP, GEO AGS)
    - Complete / detailed: Amount of expected data (MB, GB, TB, etc.) is clearly specified.
    - Addressed issue, but incomplete: Amount of expected data (GB, TB, etc.) is vaguely specified.
    - Did not address issue: Amount of expected data (GB, TB, etc.) is NOT specified.
  13. DMP performance level ratings for the criterion: “Describes what types of data will be captured, created or collected.”

    Directorate | Complete/detailed (%) | Addressed issue, but incomplete (%) | Did not address (%)
    All  | 68.4 | 22.2 | 9.5
    BIO  | 89.7 | 11.5 | 5.8
    CISE | 53.0 | 31.8 | 15.2
    ENG  | 73.6 | 19.8 | 6.6
    GEO  | 63.9 | 28.9 | 7.2
    MPS  | 61.2 | 23.5 | 15.3
    SBE  | 82.0 | 12.0 | 6.0
  14. Methods of sharing research data as described in NSF data

    management plans. Numbers are percentages (shaded by color according to the scale). Susan Wells Parham et al., ‘Using Data Management Plans to Explore Variability in Research Data Management Practices across Domains’, International Journal of Digital Curation 11, no. 1 (10 May 2016): 53–67, doi:10.2218/ijdc.v11i1.423.
  15. BIO vs. all DMPs: traits of intent to share data

    Described data completely: BIO 83% | all DMPs 68%
    Described data formats completely: BIO 54% | all DMPs 48%
    Identified metadata standards completely: BIO 38% | all DMPs 19%
  16. BIO: distribution of modes of sharing

    Did not specify: 0% | Institutional repository: 6% | Journal / supplement: 27% | Data center / repository: 75% | Other: 27% | Book: 2% | Personal website: 13% | On request: 23% | ETD: 0% | Conference / proceedings: 8% | Not planning to share data: 0%
    BIO DMPs rated highest for sharing via data centers and repositories. Only 34% of DMPs overall said data would be shared this way.
  17. What’s going on with BIO? Infrastructure!

    Repositories mentioned (N): GenBank (14), Dryad (12), SRA (11), iDigBio (3), KNB (3), MorphBank (3), NCBI (3), TreeBASE (2)