Slide 1

Slide 1 text

Making Data Count
Martin Fenner, DataCite Technical Director
http://orcid.org/0000-0003-1419-2405

Slide 2

Slide 2 text

Making Data Count
U.S. National Science Foundation grant: http://www.nsf.gov/awardsearch/showAward?AWD_ID=1448821
September 2014 – February 2016
Project page: http://mdc.lagotto.io

Slide 3

Slide 3 text

Goals
• What metrics for research data do researchers and data managers want?
• Do data repositories make these metrics available?
• Build services to collect these metrics for all datasets in the DataONE repository network

Slide 4

Slide 4 text

MDC Team
Peter Slaughter, Dave Vieglais, Matt Jones, Stephen Abrams, John Kratz, Patricia Cruse, Carly Strasser, Jennifer Lin, Kristen Ratan, John Chodacki, Martin Fenner
Project Partners
• California Digital Library (CDL)
• DataONE
• Public Library of Science (PLOS)

Slide 5

Slide 5 text

https://www.dataone.org/

Slide 6

Slide 6 text

The Value of Research Data
Metrics for datasets from a cultural and technical point of view
http://repository.jisc.ac.uk/6205/1/Value_of_Research_Data.pdf

Slide 7

Slide 7 text

Survey

Slide 8

Slide 8 text

How interested would you be to know each of the following about the impact of your data?
http://doi.org/10.1038/sdata.2015.39
http://www.dx.doi.org/10.5060/D8H59D

Slide 9

Slide 9 text

What metrics/statistics does your repository currently track and expose?
http://doi.org/10.1038/sdata.2015.39
http://www.dx.doi.org/10.5060/D8H59D

Slide 10

Slide 10 text

http://doi.org/10.1371/journal.pone.0117619

Slide 11

Slide 11 text

Tool Building

Slide 12

Slide 12 text

Rewrote the open source Lagotto application to handle research outputs beyond journal articles (API, admin frontend, relations).
https://dlm.datacite.org/works/doi/10.5061/dryad.kh886

Slide 13

Slide 13 text

Wrote an import pipeline that regularly imports new DataONE datasets and handles persistent identifiers beyond DOIs, including URLs.
http://dlm.datacite.org/status
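Handling identifiers beyond DOIs means the pipeline first has to recognise what kind of identifier each DataONE record carries. The following is a minimal sketch, under stated assumptions, of normalising an identifier into a DOI or a URL before lookup. The function name `classify_identifier`, the list of DOI resolver prefixes, and the fallback category `other` are hypothetical and do not come from the actual pipeline.

```python
import re
from urllib.parse import urlparse

# Simplified DOI shape: "10.", a numeric registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Hypothetical list of prefixes under which a DOI is often written.
DOI_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/",
    "doi:",
)

def classify_identifier(raw):
    """Hypothetical helper: return (kind, normalized) with kind 'doi', 'url' or 'other'."""
    value = raw.strip()
    lowered = value.lower()
    # Strip a resolver URL or "doi:" label so DOIs compare equal however they were written.
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    if DOI_PATTERN.match(value):
        # DOIs are case-insensitive, so store them in lower case.
        return "doi", value.lower()
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url", value
    return "other", value
```

For example, `classify_identifier("https://doi.org/10.5061/dryad.kh886")` yields `("doi", "10.5061/dryad.kh886")`, while a plain landing-page URL is kept as a `url`. A real pipeline would need more identifier schemes than this.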

Slide 14

Slide 14 text

Wrote new sources for data metrics, including DataONE usage stats and data citations found in open access content.
https://dlm.datacite.org/sources/europe_pmc_fulltext

Slide 15

Slide 15 text

Started collecting data from more than 20 sources, including Mendeley, Facebook and Wikipedia.
https://dlm.datacite.org/works?source_id=mendeley

Slide 16

Slide 16 text

Citations

Slide 17

Slide 17 text

Metadata of dataset

Slide 18

Slide 18 text

Metadata of articles
• References are part of the metadata deposited with CrossRef
• The Cited-by service aggregates these citations for CrossRef DOIs
• Work is underway to include CrossRef DOI <-> DataCite DOI links
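A cited-by service essentially inverts reference lists. Each article deposits its outgoing references, and the service turns them into incoming citations per DOI. The sketch below shows that inversion over an in-memory dictionary. The input layout and the name `build_cited_by` are assumptions for illustration, and the code does not call any CrossRef API.

```python
from collections import defaultdict

def build_cited_by(references):
    """Hypothetical helper: invert {citing_doi: [cited_doi, ...]} into {cited_doi: sorted citing DOIs}."""
    cited_by = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            # DOIs are case-insensitive, so normalize before aggregating.
            cited_by[cited.lower()].add(citing.lower())
    return {doi: sorted(citers) for doi, citers in cited_by.items()}
```

In this model, a reference to a DataCite DOI such as 10.5061/dryad.kh886 simply becomes another key in the result, and its value lists the articles that cite that dataset.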

Slide 19

Slide 19 text

Full-text search
https://dlm.datacite.org/works/doi.org/10.5061/dryad.f1cb2
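Finding data citations in open access full text comes down to scanning article text for dataset identifiers. The following is a minimal sketch that assumes plain-text input and filters on the DOI prefix 10.5061, which is the prefix of the Dryad DOI on this slide. The regular expression is a simplification. It will miss DOIs that are split across lines or wrapped in markup, and it trims trailing punctuation that a DOI could legitimately end with. The function name `find_dataset_dois` is hypothetical.

```python
import re

# Simplified DOI matcher: "10.", a registrant code, "/", then non-space characters.
DOI_IN_TEXT = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")

def find_dataset_dois(text, prefixes=("10.5061",)):
    """Hypothetical helper: unique DOIs with an allowed prefix, in order of first appearance."""
    found = []
    for match in DOI_IN_TEXT.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the DOI.
        doi = match.group(0).rstrip(".,;:)]").lower()
        if doi.split("/", 1)[0] in prefixes and doi not in found:
            found.append(doi)
    return found
```

If you pass a longer `prefixes` tuple, the same scan picks up other repositories' DOIs as well as article DOIs. Deciding which hits are datasets then depends on knowing which prefixes belong to data repositories.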

Slide 20

Slide 20 text

Second Order Events
https://dlm.datacite.org/sources/pmceurope

Slide 21

Slide 21 text

Second Order Events
http://doi.org/10.1038/ncomms9212
"… For instance, although there are estimated 18,000 butterfly species, there are currently only 6 butterfly genome sequences[7–11]…"
Citations 7–11 are all for journal articles, not datasets.
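One reading of second-order events, consistent with the butterfly example, is that a dataset gets credit indirectly when works cite the article associated with it rather than the dataset itself. Under that assumption, second-order citations can be found with a two-step lookup over relations. The relation names `is_supplement_to` and `cites` and the function `second_order_citations` are hypothetical. The slides do not say how Lagotto stores these links.

```python
def second_order_citations(dataset, relations):
    """
    Hypothetical helper. relations is a list of (subject, relation, object) triples.
    Returns works that cite an article the dataset supplements,
    excluding works that already cite the dataset directly.
    """
    # Step 1: articles the dataset is attached to (assumed relation name).
    articles = {o for s, r, o in relations if s == dataset and r == "is_supplement_to"}
    # Works that cite the dataset itself are first-order, not second-order.
    direct = {s for s, r, o in relations if r == "cites" and o == dataset}
    # Step 2: works that cite any of those articles.
    indirect = {s for s, r, o in relations if r == "cites" and o in articles}
    return sorted(indirect - direct)
```

Excluding direct citations keeps the two counts separate, so a work that cites both the article and the dataset is not counted twice.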

Slide 22

Slide 22 text

Downloads

Slide 23

Slide 23 text

Usage Stats
• Aggregate DataONE usage log files from DataONE member nodes
• Parse logs, applying COUNTER rules:
  • observe double-click intervals
  • exclude blacklisted user agents
• Two versions of usage stats:
  • COUNTER-compliant
  • partially compliant (includes some machine traffic)
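The double-click rule and the user-agent exclusion can be sketched as a single pass over time-ordered log entries. This sketch assumes each entry carries a timestamp, a user key (such as an IP address or session), an item identifier and a user agent. The 30-second window and the short blacklist are placeholders. The window differs between COUNTER releases, and real robot lists are much longer. Passing a shorter blacklist is only one hypothetical way to model the partially compliant version.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogEntry:
    timestamp: float  # seconds since epoch
    user: str         # e.g. IP address or session id (assumed key)
    item: str         # dataset identifier
    user_agent: str

# Placeholder blacklist; a real deployment would use a maintained robot list.
DEFAULT_BLACKLIST = ("bot", "crawler", "spider")

def filter_usage(entries, window=30.0, blacklist=DEFAULT_BLACKLIST):
    """Hypothetical helper: return the entries that still count after filtering."""
    kept = []
    last_index = {}  # (user, item) -> index in kept of the last counted click
    for entry in sorted(entries, key=lambda e: e.timestamp):
        agent = entry.user_agent.lower()
        if any(fragment in agent for fragment in blacklist):
            continue  # excluded user agent
        key = (entry.user, entry.item)
        prev = last_index.get(key)
        if prev is not None and entry.timestamp - kept[prev].timestamp < window:
            # Double-click: drop the earlier click and keep this later one.
            kept[prev] = None
        kept.append(entry)
        last_index[key] = len(kept) - 1
    return [e for e in kept if e is not None]
```

With these assumptions, two requests for the same dataset from the same user five seconds apart count once, and a request from a user agent containing "bot" is dropped. Calling `filter_usage(entries, blacklist=())` keeps that machine traffic.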

Slide 24

Slide 24 text

Usage Stats
Average percentage not filtered out:
• Since 2005: COUNTER 63.57%, partial 63.59%
• 2015: COUNTER 44.88%, partial 47.05%
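The figures can be compared under one assumption: that each one is the share of logged usage events that survive filtering. The slide does not define the denominator or how the average is taken. On that reading, the two versions barely differ over the whole period, and the partially compliant version keeps about two percentage points more in 2015:

```latex
p = 100 \cdot \frac{N_{\text{counted}}}{N_{\text{logged}}}, \qquad
\Delta_{\text{since 2005}} = 63.59 - 63.57 = 0.02, \qquad
\Delta_{2015} = 47.05 - 44.88 = 2.17 \quad \text{(percentage points)}
```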

Slide 25

Slide 25 text

Next Steps
• Analyze usage statistics in more detail
• Analyze second-order citations
• Analyze the influence of persistent identifiers
• Do a similar project for scientific software
• Turn the research project into a service

Slide 26

Slide 26 text

Scientific Software
https://ls.datacite.org/works?source_id=github
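Applying the same approach to scientific software means treating a code repository as a work and collecting metrics for it. The following is a minimal sketch that assumes the work is identified by its GitHub URL and that stars and forks are the metrics of interest. The slides do not list which metrics the github source actually collects. It uses the public GitHub REST endpoint `GET /repos/{owner}/{repo}`, whose response includes `stargazers_count` and `forks_count`. The helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlparse

def parse_github_url(url):
    """Hypothetical helper: split https://github.com/owner/repo into (owner, repo)."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.netloc.lower() not in ("github.com", "www.github.com") or len(parts) < 2:
        raise ValueError(f"not a GitHub repository URL: {url}")
    repo = parts[1][:-4] if parts[1].endswith(".git") else parts[1]
    return parts[0], repo

def metrics_from_repo_json(repo):
    """Hypothetical helper: pick the assumed metrics out of a /repos/{owner}/{repo} response."""
    return {"stars": repo["stargazers_count"], "forks": repo["forks_count"]}

def fetch_repo_metrics(url):
    """Query the GitHub REST API (needs network access; unauthenticated calls are rate-limited)."""
    owner, repo = parse_github_url(url)
    request = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return metrics_from_repo_json(json.load(response))
```

The parsing and the metric extraction are kept apart from the network call. That way the URL handling and the choice of metrics can be checked without contacting GitHub.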

Slide 27

Slide 27 text

Integration into Search
https://search.labs.datacite.org/?q=landsat&publicationYear=2016

Slide 28

Slide 28 text

Build Services Infrastructure
https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html

Slide 29

Slide 29 text

http://blog.crossref.org/2015/09/det-poised-for-launch.html