emcee-odi

My talk at the "Open Digital Infrastructure in Astrophysics" meeting.

Dan Foreman-Mackey

June 04, 2019

Transcript

  1. emcee Dan Foreman-Mackey CCA@Flatiron // dfm.io // @exoplaneteer // github.com/dfm

  2. A modular ecosystem for probabilistic data analysis, including emcee
  3. Slides can be found at: speakerdeck.com/dfm

  4. None
  5. I have a GitHub problem.

  6. None
  7. None
  8. None
  9. 1 Context

  10. p(physics | data)

  11. Markov Chain Monte Carlo
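
Read as Bayes' theorem, the shorthand on the context slide can be written out as below. This is a sketch under an assumption: the symbols θ (the physical parameters) and D (the data) do not appear on the slides and are introduced here only for illustration.

```latex
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}
\quad\Longrightarrow\quad
p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)
```

MCMC draws samples from the left-hand side using only the product on the right, because the normalization p(D) does not depend on θ.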

  12. [Plot: number of astronomy papers with "MCMC" in the text; data from ui.adsabs.harvard.edu]

  13. [Plot: number of astronomy papers with "MCMC" in the text; data from ui.adsabs.harvard.edu. Annotation: DFM starts grad school]
  14. In 2010: Everyone wrote their own MCMC sampler.

  15. In 2010: So that's what I did too.

  16. None
  17. None
  18. None
  19. ui.adsabs.harvard.edu/abs/2013PASP..125..306F/metrics

  20. scholar.google.com

  21. The algorithm is nearly trivial.

  22. Algorithm 3: The parallel stretch move update step

        for i ∈ {0, 1} do
          for k = 1, ..., K/2 do
            // This loop can now be done in parallel for all k
            Draw a walker X_j at random from the complementary ensemble S^(∼i)(t)
            X_k ← S_k^(i)
            z ← Z ∼ g(z), Equation (10)
            Y ← X_j + z [X_k(t) − X_j]
            q ← z^(n−1) p(Y) / p(X_k(t))
            r ← R ∼ [0, 1]
            if r ≤ q, Equation (9) then
              X_k(t + 1/2) ← Y
            else
              X_k(t + 1/2) ← X_k(t)
            end if
          end for
          t ← t + 1/2
        end for

    ... acceptance fraction a_f. This is the fraction of proposed steps that are accepted. There appears to be no agreement on the optimal acceptance rate but it is clear that both extrema are unacceptable. If a_f ∼ 0, then nearly all proposed steps are rejected, so the chain ...

    DFM+ (2013)
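
To make the "nearly trivial" claim concrete, here is a minimal pure-Python sketch of the stretch move update shown on the slide. It is an illustration, not emcee's implementation. The function names are hypothetical. The acceptance ratio is computed in log space, which is a choice made here. The loop over k runs serially rather than in parallel. The sketch also assumes that Equation (10), which the slide only references, is g(z) ∝ 1/√z on [1/a, a] with a = 2.

```python
import math
import random


def sample_stretch_factor(rng, a=2.0):
    """Draw z from g(z) proportional to 1/sqrt(z) on [1/a, a] (inverse CDF)."""
    return ((a - 1.0) * rng.random() + 1.0) ** 2 / a


def stretch_move_step(walkers, log_prob, rng, a=2.0):
    """Update all walkers in place once; return the number of accepted proposals."""
    n_dim = len(walkers[0])
    half = len(walkers) // 2
    halves = (range(0, half), range(half, len(walkers)))
    n_accepted = 0
    for i in (0, 1):
        active, complementary = halves[i], halves[1 - i]
        # The complementary half is only read during this half-step, so the
        # loop over k could be distributed across processes (serial here).
        for k in active:
            x_k = walkers[k]
            x_j = walkers[rng.choice(complementary)]
            z = sample_stretch_factor(rng, a)
            y = [xj + z * (xk - xj) for xk, xj in zip(x_k, x_j)]
            # log of q = z^(n-1) p(Y) / p(X_k), evaluated in log space.
            log_q = (n_dim - 1) * math.log(z) + log_prob(y) - log_prob(x_k)
            # 1 - random() lies in (0, 1], so the logarithm is always defined.
            if math.log(1.0 - rng.random()) <= log_q:
                walkers[k] = y
                n_accepted += 1
    return n_accepted


def log_standard_normal(x):
    """Unnormalized log density of an isotropic unit Gaussian (toy target)."""
    return -0.5 * sum(v * v for v in x)


def demo(n_walkers=20, n_steps=2000, n_burn=500, seed=42):
    """Sample the toy target; return (acceptance fraction, mean, variance) of x[0]."""
    rng = random.Random(seed)
    walkers = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_walkers)]
    accepted, samples = 0, []
    for step in range(n_steps):
        accepted += stretch_move_step(walkers, log_standard_normal, rng)
        if step >= n_burn:
            samples.extend(w[0] for w in walkers)
    acceptance_fraction = accepted / (n_steps * n_walkers)
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return acceptance_fraction, mean, variance


if __name__ == "__main__":
    print(demo())
```

The first value returned by `demo()` is the acceptance fraction a_f discussed in the excerpt: accepted proposals divided by all proposals.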
  23. So why is it so popular?

  24. circa 2013

  25. circa 2013

  26. None
  27. 2 Lessons Learned

  28. 1 Releasing your code can be good for your career. * Prior results do not guarantee a similar outcome.

  29. 2 Writing docs and tutorials is not a waste of time.
  30. I use the documentation that I've written every day.

  31. None
  32. Teaching is a good way to learn.

  33. 3 The extra email load isn't so bad.

  34. I have been part of about 1700 email threads with the word "emcee".

  35. That's only about 4.5 emails per week.
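
As a quick consistency check, the two figures quoted on these slides imply a time span, which the short calculation below works out. The only number not taken from the slides is the average number of weeks per year, an assumption introduced here.

```python
threads = 1700               # "about 1700 email threads" (from the slide)
rate_per_week = 4.5          # "about 4.5 emails per week" (from the slide)
weeks_per_year = 365.25 / 7  # assumption: average calendar year

implied_weeks = threads / rate_per_week
implied_years = implied_weeks / weeks_per_year
print(f"{implied_weeks:.0f} weeks, or about {implied_years:.1f} years")
```

The two numbers are therefore consistent with a period of a little over seven years.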

  36. 4 Beware of feature creep. * Especially that first big pull request.
  37. You will have to maintain the feature that you merge.

  38. 5 Keep it modular.

  39. It's easier to write code that does one thing well.

  40. Package managers exist.

  41. 3 Ideas for a Successful Scientific Software Package

  42. 1 You should be the target audience.

  43. 2 Libraries, not scripts.

  44. 3 Tutorials, not (just) API docs.

  45. 4 Integrate with the ecosystem.

  46. For example: fitting transiting exoplanet observations.

  47. emcee, george, transit, corner.py (GitHub repositories; user: dfm)

  48. emcee, celerite, transit, corner.py (GitHub repositories; user: dfm)

  49. emcee, celerite, starry, corner.py (GitHub repositories; user: dfm, except rodluger/starry by Rodrigo Luger)

  50. pymc3, celerite, starry, corner.py (GitHub repositories; user: dfm, except rodluger/starry by Rodrigo Luger and pymc-devs/pymc3)
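
Swapping one component for another, as on the slides above, is easiest when each piece does one job behind a small interface. The sketch below shows that idea with toy stand-ins written for this example. None of these functions come from emcee, george, celerite, transit, or starry, and all names are hypothetical.

```python
import math


def box_transit_model(params, times):
    """Toy light-curve model: flat flux with a box-shaped dip (illustrative only)."""
    depth, t0, duration = params
    return [1.0 - depth if abs(t - t0) < 0.5 * duration else 1.0 for t in times]


def white_noise_log_likelihood(residuals, sigma):
    """Toy noise model: independent Gaussian errors with known sigma."""
    norm = -0.5 * len(residuals) * math.log(2.0 * math.pi * sigma ** 2)
    return norm - 0.5 * sum(r * r for r in residuals) / sigma ** 2


def make_log_prob(model, noise_log_likelihood, times, flux, sigma):
    """Glue a model and a noise model into the single callable a sampler needs."""
    def log_prob(params):
        predicted = model(params, times)
        residuals = [f - p for f, p in zip(flux, predicted)]
        return noise_log_likelihood(residuals, sigma)
    return log_prob


if __name__ == "__main__":
    times = [0.1 * i for i in range(100)]
    truth = (0.01, 5.0, 1.0)
    flux = box_transit_model(truth, times)
    log_prob = make_log_prob(
        box_transit_model, white_noise_log_likelihood, times, flux, 0.001
    )
    print(log_prob(truth) > log_prob((0.02, 5.0, 1.0)))
```

Because `make_log_prob` only assumes two callables, either one can be replaced without touching the other. The resulting `log_prob` is the kind of function a sampler accepts, for example as the log-probability argument of `emcee.EnsembleSampler`.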
  51. 4 Open Questions * A non-exhaustive list

  52. 1 How do you build and maintain a sustainable developer community?

  53. 2 How do you balance community building and technical debt?

  54. 3 How do we give credit to developers of large projects?
  55. None
  56. AstroPy is a much more successful open source project by all metrics.

  57. AstroPy is a much more successful open source project by all metrics. Except citation count.
  58. [Plot: cumulative citations by year, 2013 to 2019, for astropy; data from ui.adsabs.harvard.edu]

  59. [Plot: cumulative citations by year, 2013 to 2019, for astropy and emcee; data from ui.adsabs.harvard.edu]
  60. Why?

  61. What should we do?

  62. 5 The Future

  63. Will people still be using emcee in 10 years?

  64. I hope not!

  65. None
  66. None
  67. None
  68. None
  69. None
  70. These all have strengths and weaknesses.

  71. But these can have a steep learning curve.

  72. None
  73. None
  74. I plan on continuing to build tools in this ecosystem.

  75. I want to learn how to continue to maintain this software and build a sustainable community.
  76. 6 Take Homes

  77. Open source is good for business.

  78. Tutorials are crucial.

  79. Build libraries, not scripts.

  80. Thanks! Dan Foreman-Mackey CCA@Flatiron // dfm.io // @exoplaneteer // github.com/dfm