Data Citation - From Principles to Implementation

Martin Fenner
October 20, 2015

Presentation given together with Sarah Callaghan at the COPDESS workshop in Oxford on October 20, 2015.

Transcript

  1. Data Citation
    From Principles to Implementation

  2. Sarah Callaghan
    Centre for Environmental Data Analysis
    http://orcid.org/0000-0002-0517-1031
    Martin Fenner
    DataCite Technical Director
    http://orcid.org/0000-0003-1419-2405

  3. Joint Declaration of Data
    Citation Principles
    https://www.force11.org/datacitation

  4. 1. Importance
    Data should be considered legitimate, citable products of
    research. Data citations should be accorded the same
    importance in the scholarly record as citations of other
    research objects, such as publications.

  5. 1. Importance - Examples

  6. 1. Importance - Examples
    http://publications.agu.org/author-resource-center/text-requirements/#refs

  7. 1. Importance - Issues!
    ● Reference limits:
    ○ From Nature’s Manuscript formatting guide (http://www.nature.com/nature/authors/gta/#a5.4)
    “The maximum number of references, strictly enforced, is 50 for Articles and 30 for Letters.
    Only one publication can be listed for each number.”
    ● Information about data citation not getting into the generic author guidelines
    for important publishers
    ● Still need the culture change that data is as important as articles

  8. 2. Credit and Attribution
    Data citations should facilitate giving scholarly credit and
    normative and legal attribution to all contributors to the data,
    recognizing that a single style or mechanism of attribution
    may not be applicable to all data.

  9. Fun Fact
    Credit comes before evidence in the Joint Declaration of
    Data Citation Principles.

  10. Does CC0 require others who use my work to give me
    attribution?
    No, and that's a big difference between CC0 and our licenses. Unlike our licenses,
    there are no conditions contained in CC0. Just like anything in the public domain,
    it will be possible for others to use or adapt it however they wish without
    attribution. However, this does not mean that you cannot request attribution in
    accordance with community or professional norms and standards.
    https://wiki.creativecommons.org/wiki/CC0_FAQ

  11. http://www.phdcomics.com/comics/archive.php?comicid=562

  12. http://search.labs.datacite.org/orcid

  13. http://casrai.org/CRediT
    http://www.gigasciencejournal.com/content/3/1/18/about#open-badges
    Project CRediT

  14. 3. Evidence
    In scholarly literature, whenever and wherever a claim relies
    upon data, the corresponding data should be cited.
    http://xkcd.com/882/

  15. 3. Evidence - “Data behind the Graph”

  16. 3. Evidence - Issues
    ● Granularity
    ○ Is it appropriate to assign a citation to just the subset of a larger dataset that underlies a
    particular graph?
    ■ Results in lots of citations to lots of ever-so-slightly-different things
    ○ “Cite the book, not the paragraph” == “Cite the dataset, not the cell number” ??
    ○ How do we generate citations automatically?
    ● Common sense required here
    ○ Communities need to develop their own guidance for what is “common sense”

  17. 4. Unique Identification
    A data citation should include a persistent method for
    identification that is machine actionable, globally unique, and
    widely used by a community.

  18. Figure 3. Multiple Alignment of Ten Conserved Motifs in
    the RAG1 Core Proteins and Transib TPases
    The motifs are underlined and numbered from 1 to 10. Starting positions of the
    motifs immediately follow the corresponding protein names. Distances between
    the motifs are indicated in numbers of aa residues. Black circles denote conserved
    residues that form the RAG1/Transib catalytic DDE triad. The RAG1 proteins are
    as follows: RAG1_XL (GenBank GI no. 2501723, Xenopus laevis, frog),
    RAG1_HS (4557841, Homo sapiens, human), RAG1_GG (131826, Gallus gallus,
    chicken), RAG1_CL (1470117, Carcharhinus leucas, bull shark), RAG1_FR
    (4426834, Fugu rubripes, fugu fish).
    http://doi.org/10.1371/journal.pbio.0030181
    not machine actionable, not globally unique

  19. Antibodies.
    The antibodies used in this study included the following: rabbit polyclonal
    antibodies to GABA-A receptor α2 (catalog #600-401-D45 RRID:AB_11182018;
    Rockland Immunochemicals), α5 (catalog #AB9678 RRID:AB_570435; Millipore),
    β3 (catalog #ab4046 RRID:AB_2109564; Abcam), γ2 (extracellular epitope,
    catalog #224 003 RRID:AB_2263066; Synaptic Systems), and AMPA receptor
    GluA1 (catalog #AB1504 RRID:AB_2113602; Millipore; and extracellular epitope,
    catalog #PC246-100UG RRID:AB_564636; Millipore) …
    http://doi.org/10.1523/JNEUROSCI.4415-13.2014
    not machine actionable without context, not globally unique

  20. 17. Yim KM, Ng HW, Chan CK, Yip G, Lau FL. Sibutramine-induced
    acute myocardial infarction in a young lady. Clin Toxicol (Phila).
    2008; 46(9):877-879.
    18. Waszkiewicz N, Zalewska-Szajda B, Szajda SD, Simonienko K,
    Zalewska A, Szulc A et al. Sibutramine-induced mania as the first
    manifestation of bipolar disorder. BMC Psychiatry. 2012; 12:43.
    19. Yet Another DataTables Column Filter. https://github.com/vedmack/yadcf
    http://doi.org/10.1186/s13321-015-0077-3
    not persistent

  21. Recommendation
    Use a persistent identifier expressed as a URI,
    e.g. http://doi.org/10.1186/s13321-015-0077-3.
    Always include basic metadata, e.g. authors, title, publication
    date and publication venue.
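
A minimal sketch of this recommendation in Python. The function and field names (`doi_to_uri`, `missing_metadata`, `REQUIRED_FIELDS`) are illustrative assumptions, not part of any DataCite or publisher tooling, and the DOI pattern is a simplified heuristic rather than a full validator.

```python
import re
from urllib.parse import quote

# Simplified DOI pattern (an assumption for this sketch, not a full validator).
DOI_PATTERN = re.compile(r"(10\.\d{4,9}/\S+)")

# "Basic metadata" as named on the slide; the key names are hypothetical.
REQUIRED_FIELDS = ("authors", "title", "publication_date", "venue")


def doi_to_uri(raw: str, resolver: str = "http://doi.org/") -> str:
    """Find a DOI in free text or a URL and express it as a resolvable URI."""
    match = DOI_PATTERN.search(raw.strip())
    if match is None:
        raise ValueError(f"no DOI found in {raw!r}")
    doi = match.group(1).rstrip(".,;")  # drop trailing sentence punctuation
    return resolver + quote(doi, safe="/")


def missing_metadata(reference: dict) -> list:
    """Return the basic metadata fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not reference.get(field)]


# A mistyped link still yields a clean, machine-actionable URI.
print(doi_to_uri("http.//doi.org/10.1186/s13321-015-0077-3"))

# Reference 19 from the previous slide carries only a title and a URL.
print(missing_metadata({
    "title": "Yet Another DataTables Column Filter",
    "identifier": "https://github.com/vedmack/yadcf",
}))
```

The resolver prefix is a parameter so that the same helper can emit whichever DOI URL form a publisher prefers.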

  22. 5. Access
    Data citations should facilitate access to the data themselves
    and to such associated metadata, documentation, code, and
    other materials, as are necessary for both humans and
    machines to make informed use of the referenced data.

  23. 5. Access
    http://xkcd.com/1592/

  24. 5. Access - the citation string
    DataCite’s suggested formats for the citation string are:
    ● Creator (PublicationYear): Title. Publisher. Identifier
    ● Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier
    Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from
    ODP Site 127‐797. Geological Institute, University of Tokyo.
    http://dx.doi.org/10.1594/PANGAEA.726855
    The clickable link that facilitates access to the data itself.
    Enough information so if the link doesn’t work, a web search might be able to find
    the resource.
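
The two suggested formats can be produced from structured metadata. The sketch below is one possible way to do that; the `DatasetMetadata` class and `citation_string` function are hypothetical names for illustration, and treating Version and ResourceType as independently optional is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetMetadata:
    """Fields named in the suggested citation formats (hypothetical class)."""
    creators: List[str]
    publication_year: int
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None
    resource_type: Optional[str] = None


def citation_string(meta: DatasetMetadata) -> str:
    """Build 'Creator (PublicationYear): Title. [Version.] Publisher. [ResourceType.] Identifier'."""
    parts = [f"{'; '.join(meta.creators)} ({meta.publication_year}): {meta.title}."]
    if meta.version:
        parts.append(f"{meta.version}.")
    parts.append(f"{meta.publisher}.")
    if meta.resource_type:
        parts.append(f"{meta.resource_type}.")
    parts.append(meta.identifier)  # the clickable link comes last
    return " ".join(parts)


example = DatasetMetadata(
    creators=["Irino, T", "Tada, R"],
    publication_year=2009,
    title="Chemical and mineral compositions of sediments from ODP Site 127-797",
    publisher="Geological Institute, University of Tokyo",
    identifier="http://dx.doi.org/10.1594/PANGAEA.726855",
)
print(citation_string(example))
```

Run on the example record, this reproduces the citation shown on the slide, with the identifier as the final element.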

  25. 5. Access - the role of landing pages

  26. 6. Persistence
    Unique identifiers, and metadata describing the data, and its
    disposition, should persist -- even beyond the lifespan of the
    data they describe.

  27. Fun Fact
    The Joint Declaration of Data Citation Principles doesn’t
    use a persistent method for identification.

  28. Metadata for data that have been cited
    should persist.
    Not all research data and their metadata
    can or should persist.
    Metadata for most data that have been
    published should persist.

  29. 7. Specificity and Verifiability
    Data citations should facilitate identification of, access to, and
    verification of the specific data that support a claim. Citations
    or citation metadata should include information about
    provenance and fixity sufficient to facilitate verifying that the
    specific timeslice, version and/or granular portion of data
    retrieved subsequently is the same as was originally cited.

  30. 7. Specificity and Verifiability - “Frozen” Data
    We can meet the requirements of Principle 7 if the
    data (and corresponding metadata) is “frozen”
    - i.e.: complete and finalised, not going to be
    modified or updated
    - also known as “fixity”
    Data isn’t that simple!
    - e.g.: long running data collections
    Proper version control can help here
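
For frozen data, one common way to check fixity is to record a checksum alongside the citation metadata and recompute it on retrieval. The sketch below assumes SHA-256 and uses hypothetical function names; the slides do not prescribe a particular algorithm.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: str, recorded_checksum: str) -> bool:
    """Return True if the retrieved file matches the checksum recorded at citation time."""
    return sha256_of_file(path) == recorded_checksum.strip().lower()
```

A repository could store the checksum in the citation metadata when the dataset is frozen; any later modification of the file makes `verify_fixity` return `False`.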

  31. 7. Specificity and Verifiability - Dynamic Data
    ● Special cases:
    ○ Timeslicing
    ○ Append-only datasets
    ● RDA Working Group on Data Citation (https://rd-alliance.org/groups/data-citation-wg.html)
    ○ Recommendations to enable data citation of evolving data (https://rd-alliance.org/system/files/documents/RDA-DC-Recommendations_150924.pdf)
    ○ Instead of static data exports or textual descriptions of data subsets, support a
    dynamic, query-centric view of data sets.
    ○ Proposed solution enables precise identification of the very subset and version of data used,
    supporting reproducibility of processes, sharing and reuse of data.
    ○ The set of recommendations is undergoing evaluation in a series of pilots in different domains.
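
A toy sketch of how a subset of an append-only dataset might be cited by query and timestamp instead of by export. This is only loosely inspired by the approach summarized above and is not the RDA specification; the record layout, the function names, and the use of a SHA-256 result checksum are all assumptions made for illustration.

```python
import hashlib
import json
from typing import List, Tuple

# Hypothetical record layout: (logical timestamp when added, station id, value).
Record = Tuple[int, str, float]


def run_query(records: List[Record], station: str, as_of: int) -> List[Record]:
    """Select one station's records as the dataset looked at time `as_of`."""
    return [r for r in records if r[1] == station and r[0] <= as_of]


def result_checksum(rows: List[Record]) -> str:
    """Checksum of a result set, independent of row order."""
    payload = json.dumps(sorted(rows)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


dataset: List[Record] = [(1, "A", 2.0), (2, "B", 3.5), (3, "A", 4.1)]

# At citation time, store the query parameters, the timestamp and the checksum.
cited = {"station": "A", "as_of": 3}
cited["checksum"] = result_checksum(run_query(dataset, **cited))

# The dataset keeps growing after the citation was made.
dataset.append((4, "A", 5.0))

# Re-running the stored query with the stored timestamp returns the same subset.
again = run_query(dataset, cited["station"], cited["as_of"])
print(result_checksum(again) == cited["checksum"])
```

Because records are only ever appended with increasing timestamps, the stored query and timestamp are enough to reconstruct the cited subset, and the checksum lets a reader verify it.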

  32. 8. Interoperability and
    Flexibility
    Data citation methods should be sufficiently flexible to
    accommodate the variant practices among communities, but
    should not differ so much that they compromise
    interoperability of data citation practices across communities.

  33. Not so happy with this principle (but
    might be the wording).

  34. 8. Interoperability and
    Flexibility (modified)
    Data citation methods should follow users’ expectations. They
    should not be different from citation methods for journal
    articles or other scholarly content, unless there is a very
    compelling reason to do so. Data citation methods should be
    generic rather than specific to a particular community.

  35. Conclusions?
    Principles are great, but we need to implement
    them.
    The devil is in the details - there will be no “one
    size fits all” solution.
    “Common sense” solutions will work, but will vary
    across communities - important to collaborate to
    keep things moving in (roughly) the same
    directions.
    Don’t let the perfect be the enemy of the good!
