Connecting data and literature: the idea is simple, the practice is a little bit messy
@ianmulvany, Head of Technology, eLife (not :( )
Tuesday, 15 October 13
Outline
- What is data?
- Why share it?
- How should we share it?
- The easy bit
- Patterns and examples
- Where can you put your data?
- There's a lot of data
- The hard bit
- The messy bit
- The thinking bit
- Why is it all so complicated?
- What can we do about it?
What is data?
- Any artifact that can support the argument, a.k.a. "anything"
- Overwhelmingly transformed into digital form
- The digital increasingly encapsulates reasoning
Why share it?
- The first instance of reuse doubles the utility of the data
- It makes your work more reproducible
- Even if the data is not fully reusable or reproducible, sharing it makes it more plausible that you actually did something
Amsterdam manifesto
1. Data should be considered citable products of research.
2. Such data should be held in persistent public repositories.
3. If a publication is based on data not included with the article, those data should be cited in the publication.
4. A data citation in a publication should resemble a bibliographic citation and be located in the publication's reference list.
5. Such a data citation should include a unique persistent identifier (a DataCite DOI recommended, or other persistent identifiers already in use within the community).
6. The identifier should resolve to a page that either provides direct access to the data or information concerning its accessibility. Ideally, that landing page should be machine-actionable to promote interoperability of the data.
7. If the data are available in different versions, the identifier should provide a method to access the previous or related versions.
8. Data citation should facilitate attribution of credit to all contributors.
Cite the data already! In the reference list!!
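Principles 4 to 6 describe what a data citation should carry: a reference-list entry with a persistent identifier that resolves to a landing page. The sketch below makes that concrete under stated assumptions: `DataCitation` and `format_data_citation` are hypothetical names (not part of any library), the metadata set is deliberately reduced, and the entry loosely follows the form DataCite recommends, Creator (PublicationYear): Title. Publisher. Identifier. The dataset, repository name, and DOI are invented.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Reduced metadata for citing a dataset (field set is an assumption)."""
    creators: list[str]
    year: int
    title: str
    publisher: str  # the repository that holds the data
    doi: str        # bare DOI, e.g. "10.1234/example.data" (hypothetical)


def doi_url(doi: str) -> str:
    """Return the https://doi.org/ resolver link for a bare DOI."""
    return f"https://doi.org/{doi}"


def format_data_citation(c: DataCitation) -> str:
    """Render a reference-list entry: Creator (Year): Title. Publisher. Identifier."""
    authors = "; ".join(c.creators)
    return f"{authors} ({c.year}): {c.title}. {c.publisher}. {doi_url(c.doi)}"


if __name__ == "__main__":
    # Hypothetical dataset; this DOI does not point at real data.
    example = DataCitation(
        creators=["Smith, J", "Jones, K"],
        year=2013,
        title="Raw imaging data for an example study",
        publisher="Example Data Repository",
        doi="10.1234/example.data",
    )
    print(format_data_citation(example))
```

Run as a script, it prints a single reference-list line ending in a `https://doi.org/` link, which is the part that both readers and machines can follow to the landing page.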
[Figure: data volumes on a scale from MB through GB, TB, PB, and EB to ZB, against a time axis labelled 1993, 2005, 2025, 2038, 2050, 2063. Marked sizes: this presentation; the Figshare/Dryad deposit limit; data associated with an imaging paper; Gurdon lab annual data; CERN yearly data; EBI; Facebook's total circa 2012.]
Patterns, from Ugh through Meh to Yeah!
- don't connect the data to the paper at all
- refer obliquely to the data set in the body of the publication
- link to the data set in the body of the publication via a URI/identifier
- dump the data into supp info
- deposit with DataCite, and hope there is a link to the paper
- link to the paper from the data set
- create a specific section of the paper tagged as being about data
- cite the data in the reference list
- enhance the metadata of the paper in CrossMark, pointing to the data
- create a micro publication
- create a meta-paper about the data
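The practical difference between referring obliquely to a data set and linking to it via an identifier is that software can find the identifier. A minimal sketch, assuming DOIs are the identifiers in use: `extract_dois` is a hypothetical helper that scans body text with a widely used DOI regular expression. That pattern is a heuristic that matches most modern DOIs, not the full DOI syntax.

```python
import re

# Widely used DOI heuristic: "10." + 4-9 digit registrant code + "/" + suffix.
# It covers most modern DOIs but not every legacy form.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_dois(text: str) -> list[str]:
    """Return DOIs found in free text, trimming trailing sentence punctuation."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        doi = match.group(0).rstrip(".,;:")
        # Drop closing parentheses that belong to the surrounding prose,
        # e.g. "(doi:10.1234/abc)." -> "10.1234/abc".
        while doi.endswith(")") and doi.count(")") > doi.count("("):
            doi = doi[:-1].rstrip(".,;:")
        found.append(doi)
    return found


if __name__ == "__main__":
    # An identifier link versus an oblique mention (both sentences invented).
    print(extract_dois("Images are deposited (doi:10.1234/example.abc12)."))
    print(extract_dois("Data are available from the authors on request."))
```

The second call finds nothing: an oblique reference gives a machine nothing to link, which is why it sits at the "Ugh" end of the list.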
Is it a Bird? Is it a Plane?
Data citations in reference lists are tagged as:
- publication-type="other": http://europepmc.org/articles/PMC3646594
- publication-type="thesis": http://europepmc.org/articles/PMC3722494
- publication-type="webpage": http://europepmc.org/articles/PMC3626513
- publication-type="journal": http://europepmc.org/articles/PMC3661987
Journals: Placenta, GigaScience, Frontiers in Physiology, Optics Express
With thanks to @jomacyntyre
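This kind of inconsistency is easy to measure once reference lists are available as JATS XML. A minimal sketch, assuming each citation sits in an `element-citation` or `mixed-citation` element under `<ref>`; the sample reference list is invented to mirror the four tags above, and `citation_types` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented reference list: four data citations, each tagged with a different type.
SAMPLE_REF_LIST = """
<ref-list>
  <ref id="r1"><element-citation publication-type="other"><source>Dataset A</source></element-citation></ref>
  <ref id="r2"><element-citation publication-type="thesis"><source>Dataset B</source></element-citation></ref>
  <ref id="r3"><mixed-citation publication-type="webpage">Dataset C</mixed-citation></ref>
  <ref id="r4"><element-citation publication-type="journal"><source>Dataset D</source></element-citation></ref>
</ref-list>
"""


def citation_types(ref_list_xml: str) -> Counter:
    """Count publication-type values on citations in a JATS-style <ref-list>."""
    root = ET.fromstring(ref_list_xml.strip())
    counts: Counter = Counter()
    for ref in root.iter("ref"):
        for tag in ("element-citation", "mixed-citation"):
            for citation in ref.findall(tag):
                # Citations with no publication-type attribute are counted too.
                counts[citation.get("publication-type", "(missing)")] += 1
    return counts


if __name__ == "__main__":
    for pub_type, n in citation_types(SAMPLE_REF_LIST).most_common():
        print(pub_type, n)
```

Four data citations producing four different types is exactly the problem: anything that tries to count or link data citations by tag will miss most of them.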
Tim Clarke's requirements for a meta paper
- Only inherently reusable data is published
- Normalize identifiers
- Reverse the normal ratio of text to data
- Follow the Amsterdam data citation principles
- All data is searchable with or without the paper
- A global metadata catalog is kept in a stable archive
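"Normalize identifiers" matters because the same DOI turns up as a bare DOI, with a `doi:` prefix, or as a `dx.doi.org` or `doi.org` URL, and in mixed case. A sketch for DOIs only, assuming those spellings are the ones you meet; `normalize_doi` is a hypothetical helper, and percent-encoded URL forms are not handled. Lowercasing is safe because DOI names are case-insensitive.

```python
import re

# Prefixes under which the same DOI is commonly written.
_DOI_PREFIXES = re.compile(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Reduce common DOI spellings to one canonical, lowercase bare DOI.

    Raises ValueError when the input does not look like a DOI at all.
    """
    doi = _DOI_PREFIXES.sub("", raw.strip())
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {raw!r}")
    # DOI names are case-insensitive, so lowercase gives a stable comparison key.
    return doi.lower()


if __name__ == "__main__":
    spellings = [
        "doi:10.1234/ABC",
        "http://dx.doi.org/10.1234/abc",
        "https://doi.org/10.1234/Abc",
    ]
    # All three spellings collapse to a single key.
    print({normalize_doi(s) for s in spellings})
```

With one canonical key per data set, a global metadata catalog can deduplicate citations that were written differently in different papers.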
Where can you put the data, and what can you find out?
Stores compared (data repo, API, metrics): EBI/PDG/BGI ..., Figshare/projects, Imeji, Dryad, DataCite-affiliated store, Zenodo, lab archives, Dataverse, GitHub, Amazon, lab cluster
- APIs vary from store to store (some offer many): REST with OAuth, basic HTTP auth, or key-based access; OAI-ORE/PMH; RDF; for some stores there is none, or it is unclear (?)
- Metrics: views/downloads/shares, web metrics, Altmetric, pull requests/forks/following (GitHub), usage cost (Amazon); some stores report none
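Several of these stores expose OAI-PMH, whose harvesting loop is simple: request `ListRecords`, then keep following the `resumptionToken` until it comes back empty. A sketch of those two pieces with no network calls; the endpoint URL is hypothetical, `list_records_url` and `next_resumption_token` are illustrative names, and `oai_dc` is the Dublin Core format every OAI-PMH repository must support.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str,
                     resumption_token: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL for an OAI-PMH endpoint.

    The first request names a metadata format; later pages send only the
    resumptionToken, which OAI-PMH treats as an exclusive argument.
    """
    if resumption_token is None:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    else:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    return f"{base_url}?{urlencode(params)}"


def next_resumption_token(response_xml: str) -> Optional[str]:
    """Return the token for the next page, or None when the list is complete.

    An absent or empty resumptionToken element means there are no more pages.
    """
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()
```

A harvester alternates the two: fetch `list_records_url(base)`, read the records, then fetch `list_records_url(base, token)` for as long as `next_resumption_token` returns a value.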
How did we get here?
- The STM software stack is slow to evolve: there is a LOT of technical debt, and it can take 6 months to roll out a minor feature
- Data has not traditionally been considered a first-class entity
- There's a librarian/startup/publisher/researcher gap in how to think about the world
A New Hope
Tools and initiatives:
- Git/GitHub
- Vagrant/Packer/Docker
- rOpenSci
- ODIN
- Thomson Reuters Data Citation Index
- ISA-Tab
- Adoption of the CC0 license