Connecting data and literature

Ian Mulvany
October 11, 2013

Slides presented at the PLOS ALM13 workshop in San Francisco on 2013-10-11

Transcript

  1. Connecting data and literature
    The idea is simple, the practice is a little bit messy
    @ianmulvany
    Head of Technology, eLife
    not
    :(
    Tuesday, 15 October 13

  2. What is data?
    Why share it?
    How should we share it?
    The easy bit
    Patterns and examples
    Where can you put your data?
    There's a lot of data
    The hard bit
    The messy bit
    The thinking bit
    Why is it all so complicated?
    What can we do about it?

  3. What is data?
    Any artifact that can support the argument, AKA “Anything”
    Overwhelmingly transformed into digital
    The digital increasingly encapsulates reasoning

  4. Why share it?
    First instance of reuse doubles the utility of the data
    Makes your work more reproducible
    Even if it's not fully reusable or reproducible, makes it more plausible that you actually did something

  5. Amsterdam manifesto
    1. Data should be considered citable products of research.
    2. Such data should be held in persistent public repositories.
    3. If a publication is based on data not included with the article, those data should be cited in the publication.
    4. A data citation in a publication should resemble a bibliographic citation and be located in the publication’s reference list.
    5. Such a data citation should include a unique persistent identifier (a DataCite DOI recommended, or other persistent identifiers already in use within the community).
    6. The identifier should resolve to a page that either provides direct access to the data or information concerning its accessibility. Ideally, that landing page should be machine-actionable to promote interoperability of the data.
    7. If the data are available in different versions, the identifier should provide a method to access the previous or related versions.
    8. Data citation should facilitate attribution of credit to all contributors.
    Cite the data already already!
    In the reference list!!
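
Points 4 and 5 of the manifesto could be sketched roughly as follows: a data citation laid out like a bibliographic reference and ending in a persistent identifier. This is only an illustration. The function name, the field order, and every value passed in (authors, title, repository, and the `10.1234/example.5678` DOI) are hypothetical and are not taken from the talk.

```python
def format_data_citation(creators, year, title, repository, doi):
    """Return a reference-list style citation string for a dataset.

    The layout (creators, year, title, repository, DOI link) is an
    assumption chosen for illustration, not a prescribed standard.
    """
    authors = ", ".join(creators)
    return f"{authors} ({year}). {title}. {repository}. https://doi.org/{doi}"


# Hypothetical values, for illustration only.
citation = format_data_citation(
    creators=["Author A", "Author B"],
    year=2013,
    title="Example dataset title",
    repository="Example Repository",
    doi="10.1234/example.5678",
)
print(citation)
```

Because the identifier is written as a resolvable link, the same string works for a human reader and for a script that follows the DOI to the landing page.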

  6. Perkin Elmer Opera LX high-throughput microscope system
    1 TB / week, ~ 200,000 images
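
As a rough worked example of what those figures imply, the arithmetic below assumes decimal units (1 TB = 10^12 bytes) and an instrument running all 52 weeks of the year. Both assumptions are mine, not the slide's.

```python
TB = 10**12  # decimal terabyte, in bytes (assumption)
MB = 10**6   # decimal megabyte, in bytes (assumption)

weekly_bytes = 1 * TB       # from the slide: 1 TB / week
weekly_images = 200_000     # from the slide: ~200,000 images

# Average size of one image, in MB.
avg_image_mb = weekly_bytes / weekly_images / MB

# Yearly volume if the microscope ran every week (assumption).
yearly_tb = 52 * weekly_bytes / TB

print(f"about {avg_image_mb:.0f} MB per image, about {yearly_tb:.0f} TB per year")
```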

  8. via http://www.mkomo.com/cost-per-gigabyte

  9. [Chart. Axis values: MB, GB, TB, PB, EB, ZB; 1993, 2005, 2025, 2038, 2050, 2063]
    Labelled points: This Presentation; Figshare/Dryad deposit limit; Data associated with imaging paper; Gurdon lab, annual data; EBI; CERN yearly; Facebook total circa 2012

  10. Patterns
    - don't connect the data to the paper at all
    - refer obliquely to the data set in the body of the publication
    - link to the data set in the body of the publication via URI/identifier
    - dump the data into supp info
    - deposit in DataCite, and hope there is a link to the paper
    - link to the paper from the dataset
    - create a specific section of the paper tagged about data
    - cite the data in the reference list
    - enhance the metadata of the paper in CrossMark pointing to the data
    - create a micro publication
    - create a meta-paper about the data
    Rating scale: Ugh, Meh, Yeah!

  17. Supplementary file hosted by Dryad

  20. screenshot - journal of Neuroscience

  27. Additional files

    Major dataset
    The following datasets were generated:

    id="" l:rel="related" content-type="generated-dataset"
    id="dataro1" document-id="Dataset ID and/or url"
    document-type="data" document-id-type="dataset" source-id=""
    source-id-type="hwp" link-type="related" hwp:id="related-object-1"
    hwp:rev-id="xref-related-object-1-1 xref-related-object-1-2 xref-related-object-1-3 xref-related-object-1-4 xref-related-object-1-5 xref-related-object-1-6"

    Chen K, Johnston J
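
The attributes on this slide look like those of a JATS-style `related-object` element. The element name is an assumption inferred from `hwp:id="related-object-1"`, and the `name`/`surname`/`given-names` children are likewise assumed from JATS naming rather than taken from the slide. A minimal sketch of building such a fragment, leaving out the platform-specific `hwp:` and `l:` attributes, might look like this:

```python
import xml.etree.ElementTree as ET


def build_related_object(attributes, authors):
    """Build a related-object element with one <name> child per author.

    Element and child names are assumptions based on JATS conventions.
    """
    obj = ET.Element("related-object", attributes)
    for surname, given_names in authors:
        name = ET.SubElement(obj, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given_names
    return obj


# Attribute values and author names as they appear on the slide.
related = build_related_object(
    {
        "id": "dataro1",
        "content-type": "generated-dataset",
        "document-id": "Dataset ID and/or url",
        "document-id-type": "dataset",
        "document-type": "data",
        "link-type": "related",
    },
    [("Chen", "K"), ("Johnston", "J")],
)
xml_text = ET.tostring(related, encoding="unicode")
print(xml_text)
```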


  30. Is it a Bird? Is it a Plane?
    In reference list, tagged as:
    publication-type="other"    http://europepmc.org/articles/PMC3646594
    publication-type="thesis"   http://europepmc.org/articles/PMC3722494
    publication-type="webpage"  http://europepmc.org/articles/PMC3626513
    publication-type="journal"  http://europepmc.org/articles/PMC3661987
    Journals: Placenta, Gigascience, Frontiers in Physiology, Optical Express
    with thanks to @jomacyntyre
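
One way to see how inconsistently references are tagged is to tally the `publication-type` attribute across a reference list. The sketch below assumes JATS-style `element-citation` elements. The embedded XML is a made-up sample, not content from any of the articles linked on the slide.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical reference list, for illustration only.
REF_LIST = """
<ref-list>
  <ref id="r1"><element-citation publication-type="journal"/></ref>
  <ref id="r2"><element-citation publication-type="other"/></ref>
  <ref id="r3"><element-citation publication-type="webpage"/></ref>
  <ref id="r4"><element-citation publication-type="other"/></ref>
</ref-list>
"""


def count_publication_types(xml_text):
    """Count how often each publication-type value appears in a ref-list."""
    root = ET.fromstring(xml_text)
    return Counter(
        citation.get("publication-type", "missing")
        for citation in root.iter("element-citation")
    )


counts = count_publication_types(REF_LIST)
print(counts.most_common())
```

With no dedicated value for datasets, a data citation can end up in any of these buckets, which is why a tally like this cannot reliably pick data citations out.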

  36. Mixing nano, micro, entities, topics
    via Tim Clarke - Harvard

  39. @up_jors
    Journal of Open Research Software

  41. Tim Clarke’s requirements for a meta paper
    Only inherently reusable data is published
    Normalize identifiers
    Reverse normal “ratio” of text:data
    Amsterdam data citation principles
    All data is searchable w/ or w/o the paper
    Global metadata catalog in stable archive
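
"Normalize identifiers" can be illustrated with DOIs, which turn up as bare strings, `doi:` prefixed strings, and resolver URLs. The sketch below is one possible normalization, not a method described in the talk. Lower-casing relies on DOIs being case-insensitive, and the function names and regular expression are my own.

```python
import re

# Accepts a bare DOI, a "doi:" prefix, or a (dx.)doi.org resolver URL.
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)


def normalize_doi(identifier):
    """Return the bare, lower-cased DOI, or raise ValueError if none is found."""
    match = DOI_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a recognisable DOI: {identifier!r}")
    return match.group(1).lower()


def doi_to_url(identifier):
    """Return a resolver URL for any accepted DOI form."""
    return "https://doi.org/" + normalize_doi(identifier)


# The two DOIs that appear later in this deck.
print(normalize_doi("http://dx.doi.org/10.7554/eLife.00861"))
print(doi_to_url("10.5281/zenodo.6960"))
```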

  42. Where can you put the data, and what can you find out?
    Columns: Data repo | Metrics | API
    Data repos: EBI/PDG/BGI ..., Figshare/projects, Imeji, Dryad, Datacite affiliated store, Zenodo, Lab archives, Dataverse, Github, Amazon, Lab cluster
    Cell values, in extraction order (row alignment not recoverable): REST Basic HTTP Auth, OAI; RDF; OAI-ORE/PMH RDF; Many; REST Oauth; Yes; REST Basic HTTP Auth, OAI; REST Oauth; REST keys based; ?; views/downloads/shared; web metrics; views/downloads; Altmetric; web metrics; No; pull requests forks following; usage cost; No No; OAI-ORE/PMH
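
Several of the repositories on this slide are listed with OAI-PMH access. A harvesting request in that protocol is an ordinary HTTP query with a `verb` parameter, so building one can be sketched as below. The base URL is a hypothetical placeholder, not the endpoint of any repository named above, and the helper name is mine.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    'oai_dc' (unqualified Dublin Core) is the metadata format every
    OAI-PMH repository is required to support.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    if from_date is not None:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, for illustration only.
url = build_list_records_url(
    "https://repository.example.org/oai",
    from_date="2013-10-01",
)
print(url)
```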

  43. Data repo | Metrics | API
    Data repos: EBI/PDG/BGI ..., Figshare/projects, Imeji, Dryad, Datacite affiliated store, Zenodo, Lab archives, Dataverse, Github, Amazon, Lab cluster
    Marks, in extraction order (row alignment not recoverable): !✓, ?, ✓✓, !✓, ?
    I nearly forgot about DataUp/DataOne

  44. How did we get here?
    STM software stack is slow to evolve - a LOT of technical debt, can take 6 months to roll out a minor feature
    Data has not traditionally been considered a first class entity
    There’s a librarian/startup/publisher/researcher gap about how to think about the world

  45. A New Hope
    Tools / Initiatives:
    Git/Github
    Vagrant/Packer/Docker
    rOpenSci
    ODIN
    Thomson Reuters Data Index
    ISA-tab
    Adoption of CC0 license

  46. http://dx.doi.org/10.7554/eLife.00861

  47. 10.5281/zenodo.6960

  48. help with examples
    https://github.com/elifesciences/publisher-xml-fragment-examples

  49. Thank you to
    Ruth Wilson - Scientific Data
    Lars Holm Nielsen - Zenodo
    Jason Swedlow - Glencoe Software
    Earl Beutler - Lab Archives
    Tim Clarke - Harvard
    Bastien Saquet - Imeji
    Friederike Kleinfercher - Imeji
    Jo McEntyre - EBI & EuroPMC
    Geoff Bilder - Crossref
    Georgina Gurnhill - Digital Science