Open Metrics

Martin Fenner
September 09, 2014

Presentation given at OA Days in Cologne, Germany.

Transcript

  1. Martin Fenner
    Technical Lead Article-Level Metrics
    Public Library of Science
    Open Metrics
    http://www.flickr.com/photos/batega/2056949264/

  2. A piece of data or content is open if anyone
    is free to use, reuse, and redistribute it —
    subject only, at most, to the requirement to
    attribute and/or share-alike.
    http://opendefinition.org/

  3. It has become more important where we
    publish than what we publish.

    The tools we use for impact assessment are
    slow, limited and proprietary.

  4. We need to move beyond Open Access
    for all research outputs.

    Open Metrics are needed for the timely,
    comprehensive and reproducible
    evaluation of the impact of research.

  5. Open up (and standardize)
    the metadata
    Funding Information
    Institutions
    Authors
    Subject Area Classification
    ?

  6. Open up reference lists (and
    make them machine-readable)
    [Screenshot of an article page showing its Outcomes, Acknowledgments, Author Contributions, and References sections (11 references)]
    http://dx.doi.org/10.1371/journal.pone.0063184

  7. Open up links to data and
    other resources
    http://dx.doi.org/10.1371/journal.pone.0096617

  8. Open up what we measure
    RESEARCH ARTICLE
    VIEWED: PLOS HTML, PLOS PDF, PLOS XML, PMC HTML, PMC PDF
    SAVED: CiteULike, Mendeley
    DISCUSSED: NatureBlogs, ScienceSeeker, ResearchBlogging, PLOS Comments, Wikipedia, Twitter, Facebook
    RECOMMENDED: F1000 Prime
    CITED: CrossRef, PMC, Web of Science, Scopus
    Increasing Engagement
    http://dx.doi.org/10.3789/isqv25no2.2013.04

  9. Open up article landing pages
    yes: 87 %
    no: 13 %
    Data from 9,969 random CrossRef DOIs from 2011 and 2012, collected June 22, 2014.
    Facebook Debugger: https://developers.facebook.com/tools/debug
    DOIs: http://dx.doi.org/10.6084/M9.FIGSHARE.821209 (2011) and http://dx.doi.org/10.6084/M9.FIGSHARE.821213 (2012)
    Facebook can’t resolve all
    DOIs to a canonical URL

    Cookies
    Circular redirects
    Permissions
    Canonical URL mismatch
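
One of the failure modes named on this slide, a canonical URL mismatch, can be checked mechanically. The following is a minimal sketch, not the method used to produce the figures on this slide: it assumes the DOI has already been resolved to a landing-page URL and that the page HTML is at hand, and it compares that URL with the page's `<link rel="canonical">` element. The names `CanonicalLinkParser`, `normalize`, and `check_landing_page` are hypothetical, and the normalization rule (case-insensitive scheme and host, trailing slash ignored) is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalLinkParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonical = attr["href"].strip()


def normalize(url):
    """Assumed comparison rule: lowercase scheme and host, drop trailing slash."""
    parts = urlsplit(url)
    return (
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.rstrip("/"),
        parts.query,
    )


def check_landing_page(resolved_url, html):
    """Return 'match', 'mismatch', or 'missing' for a landing page.

    resolved_url: the URL the DOI redirected to (obtained elsewhere).
    html: the landing-page markup as a string.
    """
    parser = CanonicalLinkParser()
    parser.feed(html)
    if parser.canonical is None:
        return "missing"
    if normalize(parser.canonical) == normalize(resolved_url):
        return "match"
    return "mismatch"
```

The check is kept free of network calls so that it can be run over stored pages; fetching the page and following the DOI redirect (where cookies, circular redirects, and permissions come into play) would have to be handled separately.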

  10. Open up what is measured
    Articles, Books, Datasets, Software

  11. Open up the time window
    Days since publication
    August 27, 2014
    http://alm.plos.org/articles/info:doi/10.1371/journal.pone.0105948

  12. http://blog.zooniverse.org/
    Open up authorship

  13. http://article-level-metrics.plos.org/plos-alm-data/
    Open up the metrics data

  14. https://github.com/articlemetrics/alm-report
    Open up the software to collect
    and analyze the metrics data

  15. https://github.com/articlemetrics/alm-report
    Open up the service to collect
    and analyze the metrics data
    CrossRef Labs has started a pilot project to collect
    metrics for all CrossRef DOIs issued since January 2011.

    A DOI Event Tracker (DET) working group with CrossRef
    members was formed in May 2014 to support this pilot.

    The service is available at http://det.labs.crossref.org
    and uses the alm software.

  16. Open up data quality checks

  17. Open up the discussion
    Any metric we use
    should have good
    reliability (consistency)
    and validity.
    http://en.wikipedia.org/wiki/Validity_(statistics)#mediaviewer/File:Reliability_and_validity.svg

  18. This presentation is made available under a
    CC-BY 4.0 license.
    http://creativecommons.org/licenses/by/4.0/