Making Data Count

Martin Fenner
January 29, 2016

Presentation given at CWTS Leiden on January 29, 2016.

  1. Goals
    • What metrics for research data do researchers and data managers want?
    • Do data repositories make these metrics available?
    • Build services to collect these metrics for all datasets in the DataONE repository network
  2. MDC Team
    Peter Slaughter, Dave Vieglais, Matt Jones, Stephen Abrams, John Kratz, Patricia Cruse, Carly Strasser, Jennifer Lin, Kristen Ratan, John Chodacki, Martin Fenner
    Project Partners: California Digital Library (CDL), DataONE, Public Library of Science (PLOS)
  3. The Value of Research Data: Metrics for datasets from a cultural and technical point of view
    http://repository.jisc.ac.uk/6205/1/Value_of_Research_Data.pdf
  4. How interested would you be to know each of the following about the impact of your data?
    http://doi.org/10.1038/sdata.2015.39
    http://www.dx.doi.org/10.5060/D8H59D
  5. Rewrote the Lagotto open source application to handle research outputs beyond journal articles (API, admin frontend, relations)
    https://dlm.datacite.org/works/doi/10.5061/dryad.kh886
  6. Wrote an import pipeline to regularly import new DataONE datasets. It handles persistent identifiers beyond DOIs, including URLs.
    http://dlm.datacite.org/status
  7. Wrote new sources for data metrics, including DataONE usage stats and data citations found in open access content
    https://dlm.datacite.org/sources/europe_pmc_fulltext
  8. Started collecting data from more than 20 sources, including Mendeley, Facebook and Wikipedia
    https://dlm.datacite.org/works?source_id=mendeley
  9. Metadata of articles
    • References are part of the metadata deposited to CrossRef
    • The Cited-by service aggregates these citations for CrossRef DOIs
    • Work is underway to include CrossRef DOI <-> DataCite DOI links
  10. Second Order Events
    http://doi.org/10.1038/ncomms9212
    "… For instance, although there are estimated 18,000 butterfly species, there are currently only 6 butterfly genome sequences [7, 8, 9, 10, 11] …"
    Citations 7-11 are all for journal articles, not datasets.
  11. Usage Stats
    • aggregate DataONE usage log files from DataONE member nodes
    • parse logs, applying COUNTER rules: observe double-click intervals, exclude blacklisted user agents
    • two versions of usage stats: COUNTER-compliant, and partially compliant (includes some machines)
  12. Usage Stats
    Average % not filtered
    • since 2005: COUNTER 63.57%, Partial 63.59%
    • 2015: COUNTER 44.88%, Partial 47.05%
  13. Next Steps
    • Analyze usage statistics in more detail
    • Analyze second order citations
    • Analyze influence of persistent identifier
    • Do similar project with scientific software
    • Turn research project into service
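
The log filtering described in slide 11 (observe double-click intervals, exclude blacklisted user agents) could be sketched roughly as follows. This is an illustrative sketch only, not the project's actual code: the `LogEntry` fields, the 30-second window, the blacklist patterns, and the choice to keep the first request of a burst are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogEntry:
    """One request from a usage log (hypothetical field names)."""
    timestamp: datetime
    client: str       # e.g. IP address or session id
    user_agent: str
    dataset_id: str


# Hypothetical blacklist patterns, for illustration only.
BLACKLIST = tuple(
    re.compile(p, re.IGNORECASE) for p in (r"bot", r"crawler", r"spider")
)


def is_blacklisted(user_agent, blacklist=BLACKLIST):
    """Return True if the user agent matches any blacklist pattern."""
    return any(p.search(user_agent) for p in blacklist)


def filter_entries(entries, window=timedelta(seconds=30), blacklist=BLACKLIST):
    """Drop blacklisted user agents and repeated requests ("double clicks").

    A request is treated as a double click when the same client requested
    the same dataset no more than `window` earlier. The window length is
    an assumption here, not a value taken from the slides.
    """
    last_seen = {}
    kept = []
    for entry in sorted(entries, key=lambda e: e.timestamp):
        if is_blacklisted(entry.user_agent, blacklist):
            continue
        key = (entry.client, entry.dataset_id)
        previous = last_seen.get(key)
        last_seen[key] = entry.timestamp
        if previous is not None and entry.timestamp - previous <= window:
            continue  # repeated request inside the double-click window
        kept.append(entry)
    return kept
```

Under these assumptions, a "partially compliant" variant like the one on slide 11 might be approximated by passing a shorter `blacklist`, so that some machine traffic is retained.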