Making Data Count

Martin Fenner
January 29, 2016

Presentation given at CWTS Leiden on January 29, 2016.

Transcript

  1. Making Data Count
    Martin Fenner
    DataCite Technical Director
    http://orcid.org/0000-0003-1419-2405

  2. U.S. National Science Foundation Grant
    http://www.nsf.gov/awardsearch/showAward?AWD_ID=1448821

    September 2014 – February 2016
    Project Page
    http://mdc.lagotto.io
    Making Data Count

  3. Goals
    What metrics for research data do researchers and data managers want?
    Do data repositories make these metrics available?
    Build services to collect these metrics for all datasets in the DataONE repository network

  4. MDC Team
    Peter Slaughter
    Dave Vieglais
    Matt Jones
    Stephen Abrams
    John Kratz
    Patricia Cruse
    Carly Strasser
    Jennifer Lin
    Kristen Ratan
    John Chodacki
    Martin Fenner
    Project Partners

    California Digital Library (CDL)
    DataONE
    Public Library of Science (PLOS)

  5. https://www.dataone.org/

  6. THE VALUE OF RESEARCH DATA
    Metrics for datasets from a cultural and technical point of view
    http://repository.jisc.ac.uk/6205/1/Value_of_Research_Data.pdf

  7. Survey

  8. How interested would you be to know each of the following about the impact of your data?
    http://doi.org/10.1038/sdata.2015.39
    http://www.dx.doi.org/10.5060/D8H59D

  9. What metrics/statistics does your repository currently track and expose?
    http://doi.org/10.1038/sdata.2015.39
    http://www.dx.doi.org/10.5060/D8H59D

  10. http://doi.org/10.1371/journal.pone.0117619

  11. Tool Building

  12. Rewrote Lagotto open source application to handle research outputs beyond journal articles (API, admin frontend, relations)
    https://dlm.datacite.org/works/doi/10.5061/dryad.kh886

  13. Wrote import pipeline to regularly import new DataONE datasets.
    Handles persistent identifiers beyond DOIs, including URLs.
    http://dlm.datacite.org/status
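
A minimal sketch of how an import step might tell DOIs apart from plain URLs and other identifiers. This is not the Lagotto implementation; the function name, the prefix list and the simplified DOI pattern are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Simplified DOI heuristic (assumption): "10." + registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Resolver prefixes that are stripped before matching (assumed list).
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def classify_identifier(raw):
    """Return (kind, normalized), where kind is 'doi', 'url' or 'other'."""
    value = raw.strip()
    lowered = value.lower()

    # A DOI may arrive wrapped in a resolver URL or a "doi:" scheme.
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            candidate = value[len(prefix):]
            if DOI_PATTERN.match(candidate):
                # DOIs are case-insensitive, so lowercase for comparison.
                return ("doi", candidate.lower())

    if DOI_PATTERN.match(value):
        return ("doi", value.lower())

    # Anything else with an http(s) scheme and a host is kept as a URL.
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ("url", value)

    return ("other", value)
```

With this sketch, `classify_identifier("http://doi.org/10.5061/dryad.kh886")` is treated as a DOI, while `classify_identifier("https://www.dataone.org/")` is kept as a URL.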

  14. Wrote new sources for data metrics, including DataONE usage stats and data citations found in open access content
    https://dlm.datacite.org/sources/europe_pmc_fulltext

  15. Started collecting data from more than 20 sources, including Mendeley, Facebook and Wikipedia
    https://dlm.datacite.org/works?source_id=mendeley

  16. Citations

  17. Metadata of dataset

  18. Metadata of articles
    References are part of the metadata deposited to Crossref
    Cited-by service aggregates these citations for Crossref DOIs
    Work is underway to include Crossref DOI <-> DataCite DOI links

  19. Fulltext search
    https://dlm.datacite.org/works/doi.org/10.5061/dryad.f1cb2

  20. Second Order Events
    https://dlm.datacite.org/sources/pmceurope

  21. http://doi.org/10.1038/ncomms9212
    "… For instance, although there are estimated 18,000 butterfly species, there are currently only 6 butterfly genome sequences [7, 8, 9, 10, 11] …"
    Citations 7-11 are all for journal articles, not datasets.
    Second Order Events
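
One way to read the slide above is that a second order event is a citation of an article that describes a dataset, rather than a citation of the dataset itself. The sketch below follows that reading; the function name, both mappings and all identifiers are hypothetical and not taken from Lagotto.

```python
def second_order_events(dataset, described_by, cited_by):
    """Return works that cite an article describing `dataset`
    without citing the dataset directly.

    described_by: maps a dataset ID to the articles that describe it.
    cited_by:     maps a work ID to the works that cite it.
    Both mappings are hypothetical inputs for this sketch.
    """
    # First-order events: works citing the dataset itself.
    direct = set(cited_by.get(dataset, ()))

    # Works citing any article that describes the dataset.
    indirect = set()
    for article in described_by.get(dataset, ()):
        indirect.update(cited_by.get(article, ()))

    # Keep only the works that reach the dataset through an article.
    return indirect - direct - {dataset}


# Hypothetical example data.
described_by = {"dataset:1": ["article:A"]}
cited_by = {
    "article:A": ["article:B", "article:C"],
    "dataset:1": ["article:C"],
}
events = second_order_events("dataset:1", described_by, cited_by)
```

In this example only `article:B` counts as a second order event, because `article:C` also cites the dataset directly.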

  22. Downloads

  23. Usage Stats
    aggregate DataONE usage log files from DataONE member nodes
    parse logs, applying COUNTER rules
    • observe double-click intervals
    • exclude blacklisted user agents
    two versions of usage stats
    • COUNTER-compliant
    • partially compliant (include some machines)
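
The filtering steps above can be sketched roughly as follows. This is an illustration, not the project's log processor: the 30-second double-click window, the blacklist tokens and the event field names are all assumptions.

```python
from datetime import datetime, timedelta

# Assumed double-click window; the applicable COUNTER rules define the real value.
DOUBLE_CLICK_WINDOW = timedelta(seconds=30)

# Hypothetical blacklist tokens; a real list would come from the COUNTER robots list.
BLACKLISTED_AGENTS = ("bot", "crawler", "spider")


def count_usage(events, blacklist=BLACKLISTED_AGENTS, window=DOUBLE_CLICK_WINDOW):
    """Count usage events per dataset after filtering.

    Each event is a dict with the (assumed) keys
    'time' (datetime), 'ip', 'user_agent' and 'dataset'.
    """
    last_seen = {}  # (ip, user_agent, dataset) -> time of the previous request
    counts = {}     # dataset -> number of counted events

    for event in sorted(events, key=lambda e: e["time"]):
        agent = event["user_agent"].lower()

        # Exclude blacklisted user agents.
        if any(token in agent for token in blacklist):
            continue

        key = (event["ip"], event["user_agent"], event["dataset"])
        previous = last_seen.get(key)
        last_seen[key] = event["time"]

        # Observe the double-click interval: drop repeats inside the window.
        if previous is not None and event["time"] - previous <= window:
            continue

        counts[event["dataset"]] = counts.get(event["dataset"], 0) + 1

    return counts
```

Passing a shorter `blacklist` is one way to model the partially compliant variant, which lets some machine traffic through.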

  24. Usage Stats
    Average % of not filtered

    Period      COUNTER   Partial
    since 2005  63.57%    63.59%
    2015        44.88%    47.05%

  25. Next Steps
    Analyze usage statistics in more detail
    Analyze second order citations
    Analyze influence of persistent identifier
    Do similar project with scientific software
    Turn research project into service

  26. Scientific Software
    https://ls.datacite.org/works?source_id=github

  27. https://search.labs.datacite.org/?q=landsat&publicationYear=2016
    Integration into Search

  28. https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
    Build Services
    Infrastructure

  29. http://blog.crossref.org/2015/09/det-poised-for-launch.html
