Academia's Biggest Blind Spot

Modern scientific research depends on software, yet the work of research software engineers isn't generally recognized by the academy. In this talk I'll discuss why our failure to recognize the importance of research software is a serious threat to our long-term success, and offer some potential tactics for elevating the role of software in academia.

Arfon Smith

May 19, 2016

Transcript

  1. ACADEMIA’S
    BIGGEST
    BLIND SPOT
    Arfon Smith (@arfon)
    Creative Commons Attribution 4.0 International License.

  2. Software
    datA?

  3. Software is important

  4. Software is everywhere

  6. Software turns a theoretical model into
    quantitative predictions; software controls an
    experiment; and software extracts from raw data
    evidence supporting or rejecting a theory.
    Gaël Varoquaux

  7. Software is becoming increasingly important

  11. Deep intellectual contributions
    (are) now encoded in software.
    Victoria Stodden

  12. Complex
    things
    numbers
    science!
    data
    science!
    science!

  13. Reproducibility
    Data intensive

  14. http://www.nature.com/news/2011/111005/full/478026a.html

  15. Complex
    things
    Unpacking the box
    Transparency
    Reproducibility
    Credit
    Trust

  16. Complex
    things
    Few incentives for the individual
    Transparency
    Reproducibility
    Credit
    Trust

  17. Software can be a
    competitive advantage

  20. Human Genome project
    15 years
    3 billion bases
    $3 billion

  23. Human Genome project
    15 years
    3 billion bases
    $3 billion

  24. Human Genome project
    15 years
    3 billion bases
    $3 billion
    5000 Perl modules

  29. Change was the
    only constant

  30. Continual change
    Flexible
    Web-based
    Service-oriented
    1000s of projects

  32. Galaxy Zoo
    1 million galaxies
    300,000 people
    two postdocs & a student

  35. T01: Is the galaxy simply smooth and rounded, with no sign of a disk?
        A1: Smooth, A2: Features or disk, A3: Star or artifact
    T07: How rounded is it?
        A1: Completely round, A2: In between, A3: Cigar shaped
    T12: Does the galaxy have a mostly clumpy appearance?
        A1: Yes, A2: No
    T16: How many clumps are there?
        A1: 1, A2: 2, A3: 3, A4: 4, A5: More than 4, A6: Can't tell
    T15: Do the clumps appear in a straight line, a chain, a cluster, or a spiral pattern?
        A1: Straight line, A2: Chain, A3: Cluster, A4: Spiral
    T13: Is there one clump which is clearly brighter than the others?
        A1: Yes, A2: No
    T14: Is the brightest clump central to the galaxy?
        A1: Yes, A2: No
    T17: Does the galaxy appear symmetrical?
        A1: Yes, A2: No
    T02: Could this be a disk viewed edge-on?
        A1: Yes, A2: No
    T09: Does the galaxy have a bulge at its center? If so, what shape?
        A1: Rounded, A2: Boxy, A3: No bulge
    T03: Is there a sign of a bar feature through the center of the galaxy?
        A1: Bar, A2: No bar
    T04: Is there any sign of a spiral arm pattern?
        A1: Spiral, A2: No spiral
    T10: How tightly wound do the spiral arms appear?
        A1: Tight, A2: Medium, A3: Loose
    T11: How many spiral arms are there?
        A1: 1, A2: 2, A3: 3, A4: 4, A5: More than 4, A6: Can't tell

  36. T13: Is there one clump which is clearly brighter than the others?
        A1: Yes, A2: No
    T14: Is the brightest clump central to the galaxy?
        A1: Yes, A2: No
    T17: Does the galaxy appear symmetrical?
        A1: Yes, A2: No
    T18: Do the clumps appear to be embedded within a larger object?
        A1: Yes, A2: No
    T09: Does the galaxy have a bulge at its center? If so, what shape?
        A1: Rounded, A2: Boxy, A3: No bulge
    T04: Is there any sign of a spiral arm pattern?
        A1: Spiral, A2: No spiral
    T10: How tightly wound do the spiral arms appear?
        A1: Tight, A2: Medium, A3: Loose
    T11: How many spiral arms are there?
        A1: 1, A2: 2, A3: 3, A4: 4, A5: More than 4, A6: Can't tell
    T05: How prominent is the central bulge, compared with the rest of the galaxy?
        A1: No bulge, A2: Just noticeable, A3: Obvious, A4: Dominant
    T06: Is there anything odd?
        A1: Yes, A2: No
    T08: Is the odd feature a ring, or is the galaxy disturbed or irregular?
        A1: Ring, A2: Lens or arc, A3: Disturbed, A4: Irregular, A5: Other, A6: Merger, A7: Dust lane
    End
    Legend: 1st Tier Question, 2nd Tier Question, 3rd Tier Question, 4th Tier Question, 5th Tier Question

  40. Zooniverse
    Scalable/Fast
    Flexible domain model
    Highly available
    Cheap

  41. Software became our
    competitive advantage

  42. Both were special cases

  43. 1000s of staff
    100s of engineers
    (with some hope
    of a career)

  44. staffed by engineers
    who’d given up on
    a research career

  45. Software is becoming increasingly important

  46. What is required for these
    to not be special cases?

  47. How do we enable people
    who want to write software?

  48. the skills required to be a
    successful scientific researcher
    are increasingly indistinguishable
    from the skills required to be
    successful in industry.
    Jake VanderPlas

  49. Software currently isn’t a
    creditable research output

  51. We must find a way to
    legitimize software as a
    form of scholarship.
    Phil Bourne, Director for Data Science, NIH

  52. Software currently isn’t a
    creditable research output
    why?

  53. Challenge                            Technical  Cultural
      How do you cite software?            ✔          ❌
      Software citations aren’t allowed    ❌          ✔
      Software isn’t citable               ✔          ❌
      Software citations aren’t indexed    ✔          ✔
      Software isn’t peer reviewed         ❌          ✔
      Software can’t cite other software   ✔          ❌

  54. Possible solutions

  55. Three categories of solutions
    Self-service (individual)
    Self-service (groups)
    External (ecosystem)

  56. Solution #1
    Write Software papers

  57. We now have something to cite!
    Write Software papers

  58. BUT: It’s a ton of work…
    Write Software papers

  59. Not all journals accept them…
    Write Software papers

  60. And what about authorship?
    Write Software papers

  61. Many papers = citation dilution
    Write Software papers

  62. Going beyond the software
    paper…

  63. Solution #2
    Make it possible to cite software

  64. Why do we cite?
    To show we’ve done our research
    To credit the work of others
    To enrich the scholarly record
    To avoid plagiarism

  65. Why do we cite?
    To show we’ve done our research
    To credit the work of others
    To enrich the scholarly record
    To avoid plagiarism

  66. Solution #2
    Make it possible to cite software
    Smith et al., My Awesome Codes, v1.0.0,
    http://github.com/arfon/awesome-codes
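
The citation on this slide has four parts: authors, title, version, and URL. As a rough illustration of how such a string could be assembled from structured metadata, here is a minimal Python sketch. The class name `SoftwareCitation` and its method `as_text` are hypothetical, made up for this example; only the field values come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareCitation:
    """Hypothetical container for the four fields shown on the slide."""

    authors: str
    title: str
    version: str
    url: str

    def as_text(self) -> str:
        # Join the fields in the order the slide uses:
        # authors, title, version (prefixed with "v"), URL.
        return f"{self.authors}, {self.title}, v{self.version}, {self.url}"


citation = SoftwareCitation(
    authors="Smith et al.",
    title="My Awesome Codes",
    version="1.0.0",
    url="http://github.com/arfon/awesome-codes",
)
print(citation.as_text())
```

Keeping the version as its own field means a citation can point at the exact release that was used, rather than at whatever the repository happens to contain later.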

  67. Solution #2
    Make it possible to cite software
    http://github.com/arfon/awesome-codes

  68. Cite software natively!

    Make it possible to cite software

  69. BUT: Many journals don’t allow it

    Make it possible to cite software

  70. And have citation limits…

    Make it possible to cite software

  71. And indexers don’t count them

    Make it possible to cite software

  72. Doesn’t help people’s careers

    Make it possible to cite software

  74. https://www.force11.org/software-citation-principles

  75. http://www.newslocker.com/en-uk/news/uk_news/family-struggle-to-force-giant-sofa-into-tiny-car/

  76. Already an incredible
    ecosystem around software

  77. What does (open source)
    software do especially well?

  78. Authorship

  79. Authorship isn’t static

  85. Metrics & Verification

  90. Dependencies
    Citations?

  94. What if we tried to (copy) use
    some of these ideas?

  95. What if we built a better
    academic credit model?

  96. Solution #3
    Go beyond the h-index
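
For reference, the h-index is the largest number h such that an author has h papers with at least h citations each. The short Python sketch below computes it from a list of per-paper citation counts. The function name `h_index` and the sample counts are illustrative only and do not come from the talk.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most to least cited.
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Made-up citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))
```

The only input is a list of papers, so software that was never written up as a paper cannot contribute to the number at all.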

  97. Solution #3
    Go beyond the paper-centric h-index

  98. Really hard. Probably.

    Go beyond the h-index

  99. Richer citations
    papers
    Datasets
    software
    Cite (Anything) EVERYTHING

  101. Smith et al
    WMAP astropy
    Planck
    Jones et al
    Smith et al scikit-learn
    0.3 0.1 0.1 0.1 0.2 0.2
    Transitive Credit
    Dan Katz

  102. Smith et al
    WMAP astropy
    Planck
    Jones et al
    Smith et al scikit-learn
    0.3 0.1 0.1 0.1 0.2 0.2
    whyte et al numpy scipy
    0.5 0.25 0.25
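
These two slides show credit maps: each product splits a total credit of 1 among its contributors, and a contributor that is itself a product passes its share on through its own map. A minimal Python sketch of that propagation follows. Which weight belongs to which name is not recoverable from the transcript, so the assignments below, the root name `paper`, and the choice of `astropy` as the owner of the second map are all assumptions made for illustration. The function `transitive_credit` is likewise a hypothetical name, not code from the talk.

```python
def transitive_credit(credit_maps, product):
    """Return the credit each direct or indirect contributor receives from `product`.

    `credit_maps` maps a product name to {contributor: weight}. A contributor
    that has its own entry in `credit_maps` is treated as a product and passes
    its share on to its own contributors.
    """
    received = {}

    def walk(node, share, path):
        for contributor, weight in credit_maps.get(node, {}).items():
            if contributor in path:
                raise ValueError(f"cycle in credit maps at {contributor!r}")
            passed = share * weight
            received[contributor] = received.get(contributor, 0.0) + passed
            walk(contributor, passed, path | {contributor})

    walk(product, 1.0, frozenset({product}))
    return received


# ASSUMED assignment of the slide's weights to names (illustrative only).
credit_maps = {
    "paper": {
        "Smith et al": 0.3,
        "Jones et al": 0.1,
        "WMAP": 0.1,
        "Planck": 0.1,
        "astropy": 0.2,
        "scikit-learn": 0.2,
    },
    "astropy": {"whyte et al": 0.5, "numpy": 0.25, "scipy": 0.25},
}

credit = transitive_credit(credit_maps, "paper")
for name, share in sorted(credit.items()):
    print(f"{name}: {share:.3f}")
```

Under these assumed weights, the authors of a library two steps removed from the paper still receive a share, which is the point of making credit transitive.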

  103. Hard because we have no control
    Go beyond the paper-centric h-index

  104. Solution #4
    Elevating the role of software

  105. Individual actions
    When reviewing papers, ask authors
    to include their software

  106. Individual actions
    Include non-paper contributions
    when assessing candidates

  107. Individual actions
    Take a course
    (or send your student on one)

  108. Individual actions
    Share your software
    on GitHub :-)

  109. All the fixes
    Write Software papers
    Make it possible to cite software
    Go beyond the h-index
    Elevating the role of software

  110. Thanks
    Arfon Smith (@arfon)
