Academia's Biggest Blind Spot

Modern scientific research depends on software, and yet the work of research software engineers isn't generally recognized by the academy. In this talk I'll discuss why our inability to recognize the importance of research software is a serious threat to our long-term success and offer some potential tactics for elevating the role of software in academia.

Arfon Smith

May 19, 2016

Transcript

  1. ACADEMIA’S BIGGEST BLIND SPOT Arfon Smith (@arfon) Creative Commons Attribution 4.0 International License.
  2. Software datA?

  3. Software is important

  4. Software is everywhere

  5. None
  6. Software turns a theoretical model into quantitative predictions; software controls an experiment; and software extracts from raw data evidence supporting or rejecting a theory. Gaël Varoquaux
  7. Software is becoming increasingly important

  8. None
  9. None
  10. None
  11. Deep intellectual contributions are now encoded in software. Victoria Stodden

  12. Complex things numbers science! data science! science!

  13. Reproducibility Data intensive

  14. http://www.nature.com/news/2011/111005/full/478026a.html

  15. Complex things Unpacking the box Transparency Reproducibility Credit Trust

  16. Complex things Few incentives for the individual Transparency Reproducibility Credit Trust
  17. Software can be a competitive advantage

  18. None
  19. None
  20. Human Genome Project: 15 years, 3 billion bases, $3 billion

  21. None
  22. None
  23. Human Genome Project: 15 years, 3 billion bases, $3 billion

  24. Human Genome Project: 15 years, 3 billion bases, $3 billion, 5000 Perl modules
  25. None
  26. None
  27. None
  28. None
  29. Change was the only constant

  30. Continual change Flexible Web-based Service-oriented 1000s of projects

  31. None
  32. Galaxy Zoo: 1 million galaxies, 300,000 people, two postdocs & a student
  33. None
  34. None
  35. Galaxy Zoo classification questions and answer options

    T01: Is the galaxy simply smooth and rounded, with no sign of a disk? A1: Smooth; A2: Features or disk; A3: Star or artifact
    T07: How rounded is it? A1: Completely round; A2: In between; A3: Cigar shaped
    T12: Does the galaxy have a mostly clumpy appearance? A1: Yes; A2: No
    T16: How many clumps are there? A1: 1; A2: 2; A3: 3; A4: 4; A5: More than 4; A6: Can't tell
    T15: Do the clumps appear in a straight line, a chain, a cluster, or a spiral pattern? A1: Straight line; A2: Chain; A3: Cluster; A4: Spiral
    T13: Is there one clump which is clearly brighter than the others? A1: Yes; A2: No
    T14: Is the brightest clump central to the galaxy? A1: Yes; A2: No
    T17: Does the galaxy appear symmetrical? A1: Yes; A2: No
    T02: Could this be a disk viewed edge-on? A1: Yes; A2: No
    T09: Does the galaxy have a bulge at its center? If so, what shape? A1: Rounded; A2: Boxy; A3: No bulge
    T03: Is there a sign of a bar feature through the center of the galaxy? A1: Bar; A2: No bar
    T04: Is there any sign of a spiral arm pattern? A1: Spiral; A2: No spiral
    T10: How tightly wound do the spiral arms appear? A1: Tight; A2: Medium; A3: Loose
    T11: How many spiral arms are there? A1: 1; A2: 2; A3: 3; A4: 4; A5: More than 4; A6: Can't tell
  36. Galaxy Zoo classification questions and answer options (continued)

    T13: Is there one clump which is clearly brighter than the others? A1: Yes; A2: No
    T14: Is the brightest clump central to the galaxy? A1: Yes; A2: No
    T17: Does the galaxy appear symmetrical? A1: Yes; A2: No
    T18: Do the clumps appear to be embedded within a larger object? A1: Yes; A2: No
    T09: Does the galaxy have a bulge at its center? If so, what shape? A1: Rounded; A2: Boxy; A3: No bulge
    T04: Is there any sign of a spiral arm pattern? A1: Spiral; A2: No spiral
    T10: How tightly wound do the spiral arms appear? A1: Tight; A2: Medium; A3: Loose
    T11: How many spiral arms are there? A1: 1; A2: 2; A3: 3; A4: 4; A5: More than 4; A6: Can't tell
    T05: How prominent is the central bulge, compared with the rest of the galaxy? A1: No bulge; A2: Just noticeable; A3: Obvious; A4: Dominant
    T06: Is there anything odd? A1: Yes; A2: No
    T08: Is the odd feature a ring, or is the galaxy disturbed or irregular? A1: Ring; A2: Lens or arc; A3: Disturbed; A4: Irregular; A5: Other; A6: Merger; A7: Dust lane
    Legend: End; 1st Tier Question; 2nd Tier Question; 3rd Tier Question; 4th Tier Question; 5th Tier Question
  37. None
  38. None
  39. None
  40. Zooniverse Scalable/Fast Flexible domain model Highly available Cheap

  41. Software became our competitive advantage

  42. Both were special cases

  43. 1000s of staff, 100s of engineers (with some hope of a career)
  44. staffed by engineers who’d given up on a research career

  45. Software is becoming increasingly important

  46. What is required for these to not be special cases?

  47. How do we enable people who want to write software?

  48. The skills required to be a successful scientific researcher are increasingly indistinguishable from the skills required to be successful in industry. Jake VanderPlas
  49. Software currently isn’t a creditable research output

  50. None
  51. We must find a way to legitimize software as a form of scholarship. Phil Bourne, Director for Data Science, NIH
  52. Software currently isn’t a creditable research output. Why?

  53. Challenge                             | Technical | Cultural

    How do you cite software?             | ✔         | ❌
    Software citations aren’t allowed     | ❌         | ✔
    Software isn’t citable                | ✔         | ❌
    Software citations aren’t indexed     | ✔         | ✔
    Software isn’t peer reviewed          | ❌         | ✔
    Software can’t cite other software    | ✔         | ❌
  54. Possible solutions

  55. Three categories of solutions: self-service (individual), self-service (groups), external (ecosystem)

  56. Solution #1 Write Software papers

  57. We now have something to cite! Write Software papers

  58. BUT: It’s a ton of work… Write Software papers

  59. Not all journals accept them… Write Software papers

  60. And what about authorship? Write Software papers

  61. Many papers = citation dilution Write Software papers ☹

  62. Going beyond the software paper…

  63. Solution #2 Make it possible to cite software

  64. Why do we cite? To show we’ve done our research. To credit the work of others. To enrich the scholarly record. To avoid plagiarism.
  65. Why do we cite? To show we’ve done our research. To credit the work of others. To enrich the scholarly record. To avoid plagiarism.
  66. Solution #2 Make it possible to cite software: Smith et al., My Awesome Codes, v1.0.0, http://github.com/arfon/awesome-codes
  67. Solution #2 Make it possible to cite software http://github.com/arfon/awesome-codes

  68. Cite software natively! Make it possible to cite software

  69. BUT: Many journals don’t allow it Make it possible to cite software
  70. And have citation limits… Make it possible to cite software

  71. And indexers don’t count them Make it possible to cite software
  72. Doesn’t help people’s careers ☹ Make it possible to cite software
  73. None
  74. https://www.force11.org/software-citation-principles

  75. http://www.newslocker.com/en-uk/news/uk_news/family-struggle-to-force-giant-sofa-into-tiny-car/

  76. Already an incredible ecosystem around software

  77. What does (open source) software do especially well?

  78. Authorship

  79. Authorship isn’t static

  80. None
  81. None
  82. None
  83. None
  84. None
  85. Metrics & Verification

  86. None
  87. None
  88. None
  89. None
  90. Dependencies Citations?

  91. None
  92. None
  93. None
  94. What if we tried to use some of these ideas?
  95. What if we built a better academic credit model?

  96. Solution #3 Go beyond the h-index

  97. Solution #3 Go beyond the paper-centric h-index

  98. Really hard. Probably. ☹ Go beyond the h-index

  99. Richer citations papers Datasets software Cite Anything EVERYTHING

  100. None
  101. Transitive Credit (Dan Katz). Smith et al; WMAP, astropy, Planck, Jones et al, Smith et al, scikit-learn; credit weights 0.3, 0.1, 0.1, 0.1, 0.2, 0.2
  102. Smith et al; WMAP, astropy, Planck, Jones et al, Smith et al, scikit-learn; credit weights 0.3, 0.1, 0.1, 0.1, 0.2, 0.2; second level: Whyte et al, numpy, scipy; credit weights 0.5, 0.25, 0.25
  103. Hard because we have no control Go beyond the paper-centric h-index
  104. Solution #4 Elevating the role of software

  105. Individual actions: Ask authors to include software when reviewing papers
  106. Individual actions: Include non-paper contributions when assessing candidates

  107. Individual actions: Take a course (or send your student on one)
  108. Individual actions: Share your software on GitHub :-)

  109. All the fixes: Write software papers. Make it possible to cite software. Go beyond the h-index. Elevating the role of software.
  110. Thanks Arfon Smith (@arfon)
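
The transitive credit idea on slides 101 and 102 can be illustrated with a small calculation. What follows is only a rough sketch under stated assumptions, not Dan Katz's formal model and not anything shown in the talk: the slides list names and weights, but which weight belongs to which name is not recoverable from the transcript, so the pairing below is a guess. The root name `this paper`, the `CREDIT_MAPS` layout, the choice to hang the second-level weights off `astropy`, and the function `transitive_credit` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical credit maps. Each product lists the fraction of its credit
# that it passes on to each contributor or dependency. The weights echo the
# numbers on the slides, but their assignment to names is an assumption.
CREDIT_MAPS = {
    "this paper": {
        "Smith et al": 0.3,
        "WMAP": 0.1,
        "astropy": 0.1,
        "Planck": 0.1,
        "Jones et al": 0.2,
        "scikit-learn": 0.2,
    },
    # Assumed second level: astropy passes its share on to its own
    # contributors and dependencies.
    "astropy": {"Whyte et al": 0.5, "numpy": 0.25, "scipy": 0.25},
}


def transitive_credit(product, credit_maps, weight=1.0, totals=None):
    """Push `weight` units of credit from `product` down through its credit map.

    Products without a credit map of their own keep whatever reaches them.
    The maps are assumed to be acyclic; a cycle would recurse forever.
    """
    if totals is None:
        totals = defaultdict(float)
    shares = credit_maps.get(product)
    if not shares:
        # Leaf: nothing further to pass on, so the credit stops here.
        totals[product] += weight
        return totals
    for contributor, fraction in shares.items():
        transitive_credit(contributor, credit_maps, weight * fraction, totals)
    return totals


totals = transitive_credit("this paper", CREDIT_MAPS)
for name, credit in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{name:12s} {credit:.3f}")
```

Under these assumed weights, numpy receives 0.1 × 0.25 of the paper's credit even though the paper never cites it directly, which is the point of making credit transitive.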