Connecting data and literature

Ian Mulvany
October 11, 2013

Slides presented at the PLOS ALM13 workshop in San Francisco on 2013-10-11


  1. Connecting data and literature

    The idea is simple, the practice is a little bit messy
    not :(
    @ianmulvany, Head of Technology, eLife
    Tuesday, 15 October 13
  2. What is data? Why share it? How should we share it?

    - Patterns and examples
    - Where can you put your data?
    - There's a lot of data
    - Why is it all so complicated?
    - What can we do about it?

    The easy bit / The hard bit / The messy bit / The thinking bit
  3. What is data?

    - Any artifact that can support the argument, AKA “Anything”
    - Overwhelmingly transformed into digital
    - The digital increasingly encapsulates reasoning
  4. Why share it?

    - First instance of reuse doubles the utility of the data
    - Makes your work more reproducible
    - Even if it's not fully reusable or reproducible, makes it more plausible that you actually did something
  5. Amsterdam manifesto

    1. Data should be considered citable products of research.
    2. Such data should be held in persistent public repositories.
    3. If a publication is based on data not included with the article, those data should be cited in the publication.
    4. A data citation in a publication should resemble a bibliographic citation and be located in the publication’s reference list.
    5. Such a data citation should include a unique persistent identifier (a DataCite DOI recommended, or other persistent identifiers already in use within the community).
    6. The identifier should resolve to a page that either provides direct access to the data or information concerning its accessibility. Ideally, that landing page should be machine-actionable to promote interoperability of the data.
    7. If the data are available in different versions, the identifier should provide a method to access the previous or related versions.
    8. Data citation should facilitate attribution of credit to all contributors.

    Cite the data already already! In the reference list!!
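Principles 4, 5 and 7 above can be illustrated with a minimal sketch that renders a data citation in a bibliographic style with a resolvable DOI. The `DataCitation` fields, the output layout, and the example DOI `10.1234/example` are all assumptions made for illustration; they are not taken from the slides or from any particular citation style guide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Hypothetical record holding the parts of a data citation."""
    creators: List[str]
    year: int
    title: str
    repository: str
    doi: str                       # persistent identifier (principle 5)
    version: Optional[str] = None  # optional version label (principle 7)


def format_citation(c: DataCitation) -> str:
    """Render the citation so it resembles a bibliographic reference (principle 4)."""
    authors = ", ".join(c.creators)
    version = f" Version {c.version}." if c.version else ""
    # https://doi.org/ is the public DOI resolver; the DOI itself is a placeholder.
    return f"{authors} ({c.year}). {c.title}.{version} {c.repository}. https://doi.org/{c.doi}"


example = DataCitation(
    creators=["Chen K", "Johnston J"],
    year=2013,
    title="Example dataset",
    repository="Example Repository",
    doi="10.1234/example",  # hypothetical DOI, for illustration only
)
print(format_citation(example))
```

A string built this way can sit in the reference list next to article citations, which is the placement the manifesto asks for.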
  6. Perkin Elmer Opera LX high-throughput microscope system

    1 TB / week, ~ 200,000 images
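To put the figures on this slide in perspective, here is a small back-of-the-envelope calculation. It assumes decimal units (1 TB = 10^12 bytes), a constant acquisition rate, and a 52-week year; none of these assumptions come from the slide.

```python
TB = 10**12  # decimal terabyte, an assumption
MB = 10**6   # decimal megabyte, an assumption


def average_image_size_mb(weekly_bytes: int, weekly_images: int) -> float:
    """Average size of one image in MB, given a weekly volume and image count."""
    return weekly_bytes / weekly_images / MB


def yearly_volume_tb(weekly_bytes: int, weeks_per_year: int = 52) -> float:
    """Volume produced in a year at a constant weekly rate, in TB."""
    return weekly_bytes * weeks_per_year / TB


weekly_bytes = 1 * TB    # "1 TB / week" from the slide
weekly_images = 200_000  # "~ 200,000 images" from the slide

print(average_image_size_mb(weekly_bytes, weekly_images))  # 5.0
print(yearly_volume_tb(weekly_bytes))                      # 52.0
```

Under these assumptions a single instrument produces roughly 5 MB per image and about 52 TB per year.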
  7. [Chart: data volumes on a scale of MB, GB, TB, PB, EB, ZB]

    - Labelled points: This Presentation; Figshare/Dryad deposit limit; Data associated with imaging paper; Gurdon lab, annual data; CERN yearly; EBI; Facebook total circa 2012
    - Year axis: 1993, 2005, 2025, 2038, 2050, 2063
  8. Patterns

    - don't connect the data to the paper at all
    - refer obliquely to the data set in the body of the publication
    - link to the data set in the body of the publication via URI/identifier
    - dump the data into supp info
    - deposit in DataCite, and hope there is a link to the paper
    - link to the paper from the dataset
    - create a specific section of the paper tagged about data
    - cite the data in the reference list
    - enhance the metadata of the paper in CrossMark pointing to the data
    - create a micro publication
    - create a meta-paper about the data

    Ugh / Meh / Yeah!
  22. <sec sec-type="supplementary-material" hwp:id="sec-31">

          <title hwp:id="title-49">Additional files</title>
          <sec sec-type="datasets" hwp:id="sec-32">
            <title hwp:id="title-50">Major dataset</title>
            <p hwp:id="p-115">The following datasets were generated:</p>
            <p hwp:id="p-116">
              <related-object l:ref="Dataset ID and/or url" hwp:source-id="" l:rel="related"
                  content-type="generated-dataset" id="dataro1"
                  document-id="Dataset ID and/or url" document-type="data"
                  document-id-type="dataset" source-id="" source-id-type="hwp"
                  link-type="related" hwp:id="related-object-1"
                  hwp:rev-id="xref-related-object-1-1 xref-related-object-1-2 xref-related-object-1-3 xref-related-object-1-4 xref-related-object-1-5 xref-related-object-1-6">
                <name name-style="western" hwp:sortable="Chen K">
                  <surname>Chen</surname>
                  <given-names>K</given-names>
                </name>
                <x xml:space="preserve">,</x>
                <name name-style="western" hwp:sortable="Johnston J">
                  <surname>Johnston</surname>
                  <given-names>J</given-names>
                </name>
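A dedicated, tagged data section like the excerpt above is useful because software can find the datasets without reading the prose. The sketch below shows one way that could work with Python's standard library. The XML it parses is a simplified, closed-off version of the excerpt: the `hwp:` and `l:` attributes are dropped (their namespace declarations are not shown on the slide) and the closing tags are an assumption, since the slide cuts off mid-element.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the slide's excerpt (assumption: namespaced
# attributes removed, elements closed so the fragment is well-formed).
SIMPLIFIED_SECTION = """
<sec sec-type="datasets">
  <title>Major dataset</title>
  <p>The following datasets were generated:</p>
  <p>
    <related-object content-type="generated-dataset" id="dataro1"
                    document-id="Dataset ID and/or url"
                    document-type="data" document-id-type="dataset">
      <name name-style="western"><surname>Chen</surname><given-names>K</given-names></name>
      <name name-style="western"><surname>Johnston</surname><given-names>J</given-names></name>
    </related-object>
  </p>
</sec>
"""


def generated_datasets(xml_text):
    """Return id, document-id and author names for each generated dataset."""
    root = ET.fromstring(xml_text)
    results = []
    for obj in root.iter("related-object"):
        if obj.get("content-type") != "generated-dataset":
            continue
        authors = [
            f"{name.findtext('surname')} {name.findtext('given-names')}"
            for name in obj.findall("name")
        ]
        results.append(
            {"id": obj.get("id"), "document_id": obj.get("document-id"), "authors": authors}
        )
    return results


print(generated_datasets(SIMPLIFIED_SECTION))
```

The `document-id` value is the template placeholder from the slide; in a published article it would presumably hold the dataset identifier or URL.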
  25. Is it a Bird? Is it a Plane?

    In reference list (Journal / Tagged as):

    - publication-type="other": http://europepmc.org/articles/PMC3646594
    - publication-type="thesis": http://europepmc.org/articles/PMC3722494
    - publication-type="webpage": http://europepmc.org/articles/PMC3626513
    - publication-type="journal": http://europepmc.org/articles/PMC3661987

    Journals: Placenta, Gigascience, Frontiers in Physiology, Optical Express

    with thanks to @jomacyntyre
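The inconsistency on this slide (data citations tagged as "other", "thesis", "webpage" or "journal") is easy to surface by tallying the `publication-type` attribute across a reference list. The sketch below does that for a made-up reference list. The sample XML is hypothetical and only assumes JATS-style `element-citation` elements; real articles may use `mixed-citation` instead, and the four articles linked on the slide were not inspected for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical reference list, for illustration only.
SAMPLE_REF_LIST = """
<ref-list>
  <ref id="r1"><element-citation publication-type="journal"><source>A journal</source></element-citation></ref>
  <ref id="r2"><element-citation publication-type="other"><source>A data repository</source></element-citation></ref>
  <ref id="r3"><element-citation publication-type="webpage"><source>Another data repository</source></element-citation></ref>
  <ref id="r4"><element-citation publication-type="journal"><source>Another journal</source></element-citation></ref>
</ref-list>
"""


def count_publication_types(xml_text):
    """Count how often each publication-type value appears in the citations."""
    root = ET.fromstring(xml_text)
    return Counter(
        citation.get("publication-type", "missing")
        for citation in root.iter("element-citation")
    )


print(count_publication_types(SAMPLE_REF_LIST))
```

A tally like this cannot tell which entries are data citations, which is the point of the slide: without an agreed tag, a dataset hides among the other reference types.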
  30. Mixing nano, micro, entities, topics

    via Tim Clarke - Harvard
  33. Tim Clarke’s requirements for a meta paper

    - Only inherently reusable data is published
    - Normalize identifiers
    - Reverse normal “ratio” of text:data
    - Amsterdam data citation principles
    - All data is searchable w/ or w/o the paper
    - Global metadata catalog in stable archive
  34. Where can you put the data, and what can you find out?

    [Table with columns Data repo, Metrics, API; row alignment lost in the transcript]

    - Data repos: EBI/PDG/BGI ..., Figshare/projects, Imeji, Dryad, Datacite affiliated store, Zenodo, Lab archives, Dataverse, Github, Amazon, Lab cluster
    - Cell values, in transcript order: RDF; OAI-ORE/PMH; RDF; Many; REST Oauth; Yes; REST Basic HTTP Auth, OAI; REST Oauth; REST keys based; ?; views/downloads/shared; web metrics; views/downloads; Altmetric; web metrics; No; pull requests; forks; following; usage cost; No; No; OAI-ORE/PMH
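Several of the API entries on this slide mention OAI-PMH, which exposes repository metadata over plain HTTP GET requests. As a rough sketch of what "what can you find out?" looks like in practice, the function below builds a `ListRecords` request URL. The verb and the `metadataPrefix` and `set` parameters come from the OAI-PMH protocol; the base URL and set name are hypothetical, and no repository from the slide is being described.

```python
from urllib.parse import urlencode


def oai_pmh_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access is performed)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional selective harvesting by set
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, for illustration only.
print(oai_pmh_list_records_url("https://repository.example.org/oai"))
print(oai_pmh_list_records_url("https://repository.example.org/oai", set_spec="datasets"))
```

Fetching such a URL from a real endpoint would return an XML document of metadata records; that step is left out so the sketch stays self-contained.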
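  35. Data repo / Metrics / API

    [Table of marks per data repo for the Metrics and API columns; row alignment lost in the transcript]

    - Data repos: EBI/PDG/BGI ..., Figshare/projects, Imeji, Dryad, Datacite affiliated store, Zenodo, Lab archives, Dataverse, Github, Amazon, Lab cluster
    - Marks, in transcript order: !✓ ✓ ? ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓✓ ✕ ✕ ✕ !✓ ✓ ⁉
    - I nearly forgot about DataUp/DataOne: ✓ ✓ ?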
  36. How did we get here?

    - STM software stack is slow to evolve - a LOT of technical debt, can take 6 months to roll out a minor feature
    - Data has not traditionally been considered a first class entity
    - There’s a librarian/startup/publisher/researcher gap about how to think about the world
  37. A New Hope

    - Git/Github
    - Vagrant/Packer/Docker
    - rOpenSci
    - ODIN
    - Thomson Reuters Data Index
    - ISA-tab
    - Adoption of CC0 license

    Tools / Initiatives
  38. Thank you to

    - Ruth Wilson - Scientific Data
    - Lars Holm Nielsen - Zenodo
    - Jason Swedlow - Glencoe Software
    - Earl Beutler - Lab Archives
    - Tim Clarke - Harvard
    - Basiten Saquet - Imeji
    - Friederike Kleinfercher - Imeji
    - Jo McEntyre - EBI & EuroPMC
    - Geoff Bilder - Crossref
    - Georgina Gurnhill - Digital Science