GitHub for Science

Arfon Smith
December 08, 2014

My talk from DotAstronomy in Chicago. Secondary title: 'Three ideas we should steal from Open Source'

Transcript

  1. GitHub for Science Arfon Smith @arfon Creative Commons Attribution 3.0

    Unported License
  2. Three ideas we should steal from Open Source Arfon Smith

    @arfon Creative Commons Attribution 3.0 Unported License
  3. !

  4. What is a GitHub?

  5. None
  6. GitHub

  7. None
  8. [Chart: Users, 2007 to 2014; y-axis 0 to 9,000,000]
  9. [Chart: Repositories, 2007 to 2014; y-axis up to 20,000,000]
  10. Guess the language!

  11. [Chart: language not yet revealed, 2008 to 2014; y-axis up to 7000]
  12. [Chart: Fortran, 2008 to 2014; y-axis up to 7000]
  13. [Chart: LaTeX, 2008 to 2014; y-axis up to 60000]
  14. [Chart: IDL, 2009 to 2014; y-axis up to 2000]
  15. [Chart: C++, 2008 to 2014; y-axis 0 to 700,000]
  16. [Chart: Python, 2008 to 2014; y-axis up to 1,400,000]
  17. Why build a GitHub?

  18. 1. Made writing code a social experience

  19. None
  20. None
  21. None
  22. None
  23. 2. Changed the collaborative model of open source

  24. None
  25. ‘May I have access to your codes please?’

  26. None
  27. From 653314448c7c6f6ec2f93de346896895f786773f Mon Sep 17 00:00:00 2001
    From: Arfon Smith <arfon@github.com>
    Date: Mon, 13 Oct 2014 16:37:46 -0500
    Subject: [PATCH] Bust that cache
    ---
     lib/linguist/repository.rb | 14 ++++++++++++--
     test/test_repository.rb    | 12 ++++++++++++
     2 files changed, 24 insertions(+), 2 deletions(-)
    diff --git a/lib/linguist/repository.rb b/lib/linguist/repository.rb
    index 1f9e09c..9998ee6 100644
    --- a/lib/linguist/repository.rb
    +++ b/lib/linguist/repository.rb
    @@ -1,6 +1,6 @@
     require 'linguist/lazy_blob'
     require 'rugged'
    -
    +require 'pry'
     module Linguist
       # A Repository is an abstraction of a Grit::Repo or a basic file
       # system tree. It holds a list of paths pointing to Blobish objects.
    @@ -128,13 +128,23 @@ def current_tree
         protected
         def compute_stats(old_commit_oid, cache = nil)
    -      file_map = cache ? cache.dup : {}
           old_tree = old_commit_oid && Rugged::Commit.lookup(repository,
  28. GitHub delivered on a theoretical promise of open source

  29. Open source collaborations Open Source: the right to modify

  30. Open source collaborations Open Source: the right to modify, not

    the right to contribute.
  31. "

  32. Open source collaborations Forking a project was done as a

    last resort
  33. Open source collaborations GitHub made forking the norm

  34. None
  35. None
  36. 1. Open Collaborations

  37. Open source collaborations Open Source vs Open Collaborations

  38. Open source collaborations Open Source: the right to modify

  39. Open source collaborations Open Collaborations: a highly collaborative development process that is receptive to contributions of code, documentation, discussion, etc. from anyone who shows competent interest.
  40. Open source collaborations Open Collaborations: a highly collaborative development process that is receptive to contributions of code, documentation, discussion, etc. from anyone who shows competent interest. THIS
  41. How do 4000 people work together?

  42. The pull request

  43. None
  44. None
  45. None
  46. None
  47. None
  48. None
  49. None
  50. discuss improve Code first, permission later

  51. Exposed process

  52. Every time this happens the community learns

  53. Academia makes the same promise

  54. None
  55. None
  56. Explain what you did

  57. So that others can repeat

  58. Everybody learns

  59. None
  60. Open = Public? (doesn’t have to mean this)

  61. Open (within your team, department or institution)

  62. Electronic & Available

  63. Asynchronous, exposed process

  64. Asynchronous, exposed process

  65. Asynchronous, exposed process

  66. Lock-free

  67. Open, low friction collaborations

  68. Merged pull requests

  69. None
  70. None
  71. None
  72. 2. Culture of Reuse

  73. A story from my life (~10 years ago)

  74. http://amandabauer.blogspot.com/

  75. None
  76. > cat bad_pix_mask.txt
    130 130   1 2048
    189 189 258  258
    480 562 378  378
    493 521 390  397
    851 851 247  274
    319 319 304  580
    493 511 610  636
    188 188 228  228
  77. 2 days work; 3 observing runs/week; 52 weeks in year; 15 year detector lifetime; 2*3*52*15 = 4680 days (13 years)
  78. A second story from my life (~6 months ago)

  79. None
  80. None
  81. None
  82. None
  83. None
  84. None
  85. Software composed of many components

  86. Your software is the thing that is different

  87. Open Source: Ubiquitous culture of reuse

  88. 3. Verification

  89. None
  90. None
  91. None
  92. None
  93. None
  94. None
  95. None
  96. None
  97. None
  98. None
  99. None
  100. Robots doing work

  101. “open source is… reproducible by necessity” Fernando Perez http://blog.fperez.org/2013/11/an-ambitious-experiment-in-data-science.html

  102. Software is an unforgiving medium

  103. “open source is… reproducible by necessity” Fernando Perez http://blog.fperez.org/2013/11/an-ambitious-experiment-in-data-science.html

  104. GitHub for Science

  105. A product?

  106. Collaborative model?

  107. Credit system?

  108. Discoverability?

  109. Reproducibility?

  110. GitHub

  111. * Other vendors of version control software are available *

  112. None
  113. None
  114. None
  115. Barriers are cultural, not technical

  116. Reproducibility Data intensive

  117. If we want to crib what Open Source does

  118. What’s required to make this behaviour the norm?

  119. Credit

  120. “Academic environments of today do not reward tool builders” Ed Lazowska, OSTP event http://lazowska.cs.washington.edu/MS/MS.OSTP.pdf
  121. None
  122. None
  123. None
  124. None
  125. None
  126. None
  127. “publishing a paper about code is basically just advertising” David Donoho http://www.stanford.edu/~vcs/Video.html
  128. None
  129. How to derive meaningful metrics from open contributions?

  130. None
  131. Trust

  132. None
  133. None
  134. None
  135. None
  136. None
  137. Discoverability

  138. None
  139. Starts with you

  140. Be more exact

  141. Try out some tools

  142. Focus most effort where your peers will most easily recognise

    value
  143. Share software, data and methods (and cite them too!)

  144. Where do communities form?

  145. Around a shared challenge?

  146. Around shared data?

  147. None
  148. 10 ? n Level 1 (continual) Level 2 (periodic)

  149. LSST is a project that is inherently open

  150. Supernovae; Weak lensing; Active Galactic Nuclei; Solar System; Galaxies; Transients/variable stars; Large-scale structure; Stars, Milky Way; Strong lensing; Informatics and Statistics; Dark Energy (DESC)
  151. None
  152. Your software should be the thing that is different

  153. Your software should be the thing that is different (science too!)
  154. Barriers are cultural, not technical

  155. Next time you review, ask for methods, code and data

  156. Share (and license) your work

  157. Take a course (and send your students on one)

  158. Try versioning your work

  159. Open source has solved much of what academia needs

  160. The challenge is to adapt and evolve the academy in

    this new collaborative age
  161. Thanks. arfon@github.com @arfon
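
The estimate on slide 77 can be reproduced with a short sketch. It uses only the four figures shown on the slide and assumes, as the slide's multiplication implies, that the two days of work are repeated for every observing run over the detector's lifetime. The function and parameter names are illustrative and do not come from the talk.

```python
# Reproduces the back-of-the-envelope estimate on slide 77.
# All names here are hypothetical; only the four numbers come from the slide.

def duplicated_effort_days(days_per_run=2, runs_per_week=3,
                           weeks_per_year=52, lifetime_years=15):
    """Total person-days spent if the same work is redone for every observing run."""
    return days_per_run * runs_per_week * weeks_per_year * lifetime_years


total_days = duplicated_effort_days()
total_years = total_days / 365  # convert person-days to calendar years of effort

print(f"{total_days} days, roughly {total_years:.0f} years")
```

Changing any one argument (for example, a shorter detector lifetime) shows how quickly the cost of not sharing a reusable result scales.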