emcee-odi

My talk at the "Open Digital Infrastructure in Astrophysics" meeting.

Dan Foreman-Mackey

June 04, 2019

Transcript

  1. emcee
    Dan Foreman-Mackey
    CCA@Flatiron // dfm.io // @exoplaneteer // github.com/dfm

  2. A modular ecosystem for
    probabilistic data analysis
    including emcee
    Dan Foreman-Mackey
    CCA@Flatiron // dfm.io // @exoplaneteer // github.com/dfm

  3. Slides can be found at:
    speakerdeck.com/dfm

  4. (no text)

  5. I have a GitHub problem.

  6. (no text)

  7. (no text)

  8. (no text)

  9. 1
    Context

  10. p(physics | data)

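Slide 10 compresses the goal of the analysis into one expression. By Bayes' theorem, the posterior is proportional to the likelihood times the prior. This is why an MCMC sampler only needs the target density up to its normalizing constant:

```latex
p(\text{physics} \mid \text{data})
  = \frac{p(\text{data} \mid \text{physics})\, p(\text{physics})}{p(\text{data})}
  \propto p(\text{data} \mid \text{physics})\, p(\text{physics})
```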

  11. Markov Chain Monte Carlo

  12. Number of astronomy papers with "MCMC" in the text
    (data from: ui.adsabs.harvard.edu)

  13. Number of astronomy papers with "MCMC" in the text,
    annotated "DFM starts grad school"
    (data from: ui.adsabs.harvard.edu)

  14. In 2010:
    Everyone wrote their own
    MCMC sampler.

  15. In 2010:
    So that's what I did too.

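For context, a common choice for a home-grown sampler was some variant of random-walk Metropolis. The sketch below is a minimal, illustrative version for a one-dimensional target that uses only the Python standard library. The function name `metropolis`, the standard-normal target and the step size are assumptions made for illustration. This is not code from the talk.

```python
import math
import random


def metropolis(log_prob, x0, n_steps, step_size, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_prob: returns the log of the (unnormalized) target density.
    Returns the chain as a list and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    x, lp = x0, log_prob(x0)
    chain, n_accept = [], 0
    for _ in range(n_steps):
        # Symmetric Gaussian proposal centred on the current position.
        y = x + step_size * rng.gauss(0.0, 1.0)
        lp_y = log_prob(y)
        # Accept with probability min(1, p(y)/p(x)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) <= lp_y - lp:
            x, lp = y, lp_y
            n_accept += 1
        chain.append(x)
    return chain, n_accept / n_steps


# Example: a standard normal target, log density known up to a constant.
chain, acceptance = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000, step_size=1.0)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
```

The proposal scale (`step_size` here) has to be tuned by hand for each problem. In more dimensions, a full proposal covariance has to be tuned. That tuning burden is part of what an ensemble method like the stretch move is meant to reduce.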

  16. (no text)

  17. (no text)

  18. (no text)

  19. ui.adsabs.harvard.edu/abs/2013PASP..125..306F/metrics

    View Slide

  20. scholar.google.com

    View Slide

  21. The algorithm is
    nearly trivial.

  22. Algorithm 3: The parallel stretch move update step
    1: for i ∈ {0, 1} do
    2:   for k = 1, ..., K/2 do
    3:     // This loop can now be done in parallel for all k
    4:     Draw a walker X_j at random from the complementary ensemble S^(∼i)(t)
    5:     X_k ← S^(i)_k
    6:     z ← Z ∼ g(z), Equation (10)
    7:     Y ← X_j + z [X_k(t) − X_j]
    8:     q ← z^(n−1) p(Y) / p(X_k(t))
    9:     r ← R ∼ [0, 1]
    10:    if r ≤ q, Equation (9) then
    11:      X_k(t + 1/2) ← Y
    12:    else
    13:      X_k(t + 1/2) ← X_k(t)
    14:    end if
    15:  end for
    16:  t ← t + 1/2
    17: end for
    … acceptance fraction a_f. This is the fraction of proposed steps that are accepted. There
    appears to be no agreement on the optimal acceptance rate but it is clear that both extrema
    are unacceptable. If a_f ∼ 0, then nearly all proposed steps are rejected, so the chain …
    DFM+ (2013)

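Algorithm 3 maps onto a few vectorized NumPy operations. The sketch below is an illustrative reimplementation, not emcee's source code. It rests on four assumptions:

- a vectorized `log_prob` that accepts a (walkers, dimensions) array;
- an even number of walkers K;
- the stretch distribution g(z) ∝ 1/√z on [1/a, a] with a = 2, as used in the emcee paper, sampled by inverting its CDF;
- the hypothetical function names `stretch_move_step` and `run_ensemble`.

```python
import numpy as np


def stretch_move_step(walkers, lp, log_prob, rng, a=2.0):
    """One parallel stretch-move update of the ensemble (t -> t + 1).

    walkers: (K, n) array of positions, K even; lp: log p(X_k) for each walker.
    log_prob: vectorized function mapping an (m, n) array to m log densities.
    Returns new positions, their log densities, and which walkers accepted.
    """
    K, n = walkers.shape
    half = K // 2
    walkers, lp = walkers.copy(), lp.copy()
    accepted = np.zeros(K, dtype=bool)
    for i in (0, 1):
        # Indices of the half S^(i) updated now and of its complement S^(~i).
        active = np.arange(half) + i * half
        other = np.arange(half) + (1 - i) * half
        X_k = walkers[active]
        # Line 4: one random partner X_j from the complementary half per walker.
        X_j = walkers[rng.choice(other, size=half)]
        # Line 6: z ~ g(z) ∝ 1/sqrt(z) on [1/a, a], drawn by inverting the CDF.
        z = ((a - 1.0) * rng.random(half) + 1.0) ** 2 / a
        # Line 7: propose along the line through X_j and X_k.
        Y = X_j + z[:, None] * (X_k - X_j)
        lp_Y = log_prob(Y)
        # Line 8 in log space: log q = (n - 1) log z + log p(Y) - log p(X_k).
        log_q = (n - 1) * np.log(z) + lp_Y - lp[active]
        # Lines 9-13: accept if r <= q; r = 1 - U[0, 1) is uniform on (0, 1].
        acc = np.log(1.0 - rng.random(half)) <= log_q
        # Writing back in place means the i = 1 pass sees S^(0)(t + 1/2).
        walkers[active[acc]] = Y[acc]
        lp[active[acc]] = lp_Y[acc]
        accepted[active] = acc
    return walkers, lp, accepted


def run_ensemble(log_prob, p0, n_steps, seed=0, a=2.0):
    """Run the sampler; return the chain (n_steps, K, n) and per-walker acceptance fractions."""
    rng = np.random.default_rng(seed)
    walkers = np.array(p0, dtype=float)
    lp = log_prob(walkers)
    chain = np.empty((n_steps,) + walkers.shape)
    n_accepted = np.zeros(len(walkers))
    for t in range(n_steps):
        walkers, lp, acc = stretch_move_step(walkers, lp, log_prob, rng, a)
        chain[t] = walkers
        n_accepted += acc
    return chain, n_accepted / n_steps


# Example target: an axis-aligned Gaussian whose two widths differ by 10x.
scales = np.array([1.0, 10.0])


def log_prob(x):
    return -0.5 * np.sum((x / scales) ** 2, axis=1)


p0 = np.random.default_rng(1).normal(size=(32, 2))
chain, acceptance_fraction = run_ensemble(log_prob, p0, n_steps=2000)
samples = chain[500:].reshape(-1, 2)  # drop burn-in, flatten over walkers
```

The per-walker `acceptance_fraction` returned here is the a_f from the excerpt above. emcee's `EnsembleSampler` reports the same diagnostic as `sampler.acceptance_fraction`. Because the stretch move is affine invariant, the factor-of-10 difference in widths should need no extra tuning. A random-walk proposal with a single step size would need that tuning.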

  23. So why is it so popular?

  24. circa 2013

    View Slide

  25. circa 2013

  26. (no text)

  27. 2
    Lessons Learned

    View Slide

  28. 1
    Releasing your code can
    be good for your career.
    * Prior results do not guarantee a similar outcome.

  29. 2
    Writing docs and tutorials
    is not a waste of time.

    View Slide

  30. I use the documentation
    that I've written every day.

  31. (no text)

  32. Teaching is a
    good way to learn.

  33. 3
    The extra email load isn't
    so bad.

  34. I have been part of about
    1700 email threads with
    the word "emcee".

    View Slide

  35. That's only about
    4.5 emails per week.

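Taken together, slides 34 and 35 imply a time span. Dividing the thread count by the weekly rate gives roughly seven years. This span is derived here from the two numbers on the slides and is not stated in the talk. Like the slides, it treats threads and emails as interchangeable:

```latex
\frac{1700\ \text{threads}}{4.5\ \text{threads/week}} \approx 378\ \text{weeks} \approx 7.2\ \text{years}
```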

  36. 4
    Beware of feature creep.
    * Especially that first big pull request.

  37. You will have to maintain
    the feature that you merge.

    View Slide

  38. 5
    Keep it modular.

  39. It's easier to write code
    that does one thing well.

    View Slide

  40. Package managers exist.

    View Slide

  41. 3
    Ideas for a Successful
    Scientific Software Package

    View Slide

  42. 1
    You should be the target
    audience.

    View Slide

  43. 2
    Libraries, not scripts.

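One reading of "libraries, not scripts" has two parts. Put reusable pieces in importable functions with documented inputs and outputs. Keep one-off driver code behind a `__main__` guard so that importing the module has no side effects. The module name `toy_model.py` and the function `gaussian_log_likelihood` below are hypothetical illustrations.

```python
"""toy_model.py: a hypothetical importable module (illustrative only)."""
import math


def gaussian_log_likelihood(y, model, sigma):
    """Log-likelihood of data y given model predictions and per-point errors sigma."""
    if not (len(y) == len(model) == len(sigma)):
        raise ValueError("y, model and sigma must have the same length")
    return -0.5 * sum(
        ((yi - mi) / si) ** 2 + math.log(2.0 * math.pi * si**2)
        for yi, mi, si in zip(y, model, sigma)
    )


def main():
    # Script-style driver code: runs only when the file is executed directly.
    print(gaussian_log_likelihood([1.0, 2.0], [1.0, 2.0], [1.0, 1.0]))


if __name__ == "__main__":
    main()
```

Another analysis can then `import toy_model` and reuse `gaussian_log_likelihood` without copying code or triggering the driver.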

  44. 3
    Tutorials, not (just) API docs.

  45. 4
    Integrate with
    the ecosystem.

  46. For example:
    fitting transiting exoplanet
    observations.

    View Slide

  47. emcee, george, transit, corner.py
    (GitHub repositories; user: dfm)

  48. emcee, celerite, transit, corner.py
    (GitHub repositories; user: dfm)

  49. emcee, celerite, starry, corner.py
    (GitHub repositories; user: dfm, except rodluger/starry by Rodrigo Luger)

  50. pymc3, celerite, starry, corner.py
    (GitHub repositories; user: dfm, except rodluger/starry by Rodrigo Luger
    and pymc-devs/pymc3)

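Slides 46 to 50 show a transit-fitting stack in which individual pieces get swapped out: the Gaussian process library, the transit model and the sampler. One way to get that property in code is to have every component talk through a plain function interface, so the sampler only ever sees a log-probability callable. The sketch below assumes NumPy. `box_transit` is a hypothetical, deliberately crude stand-in for a real transit model; it is not the API of `transit` or `starry`. `make_log_prob` is likewise hypothetical.

```python
import numpy as np


def box_transit(t, t0, duration, depth):
    """Hypothetical stand-in for a transit model: unit flux with a box-shaped dip."""
    in_transit = np.abs(t - t0) < 0.5 * duration
    return 1.0 - depth * in_transit


def make_log_prob(model, t, flux, flux_err, bounds):
    """Combine a model, a Gaussian likelihood and a uniform prior into one callable.

    model: any function model(t, *params) returning predicted flux.
    bounds: one (low, high) pair per parameter for the uniform prior.
    """

    def log_prob(params):
        # Uniform prior: zero probability (log = -inf) outside the bounds.
        for p, (low, high) in zip(params, bounds):
            if not low <= p <= high:
                return -np.inf
        # Gaussian likelihood with known per-point errors, up to a constant.
        resid = (flux - model(t, *params)) / flux_err
        return -0.5 * np.sum(resid**2)

    return log_prob


# Synthetic, noise-free light curve (illustrative values only).
t = np.linspace(-1.0, 1.0, 201)
flux = box_transit(t, 0.0, 0.2, 0.01)
flux_err = np.full_like(t, 1e-3)
log_prob = make_log_prob(
    box_transit, t, flux, flux_err, bounds=[(-0.5, 0.5), (0.01, 1.0), (0.0, 0.1)]
)
```

Replacing `box_transit` with any other function that has the same `model(t, *params)` signature leaves the other pieces untouched. The same holds for handing `log_prob` to a different sampler. Slides 47 to 50 show that kind of interchangeability at the package level.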

  51. 4
    Open Questions
    * A non-exhaustive list

    View Slide

  52. 1
    How do you build and
    maintain a sustainable
    developer community?

    View Slide

  53. 2
    How do you balance
    community building and
    technical debt?

    View Slide

  54. 3
    How do we give credit to
    developers of large projects?

  55. (no text)

  56. AstroPy is a much more
    successful open source
    project by all metrics.

    View Slide

  57. AstroPy is a much more
    successful open source
    project by all metrics.
    Except citation count.

    View Slide

  58. [Plot: cumulative citations (0 to 3000) vs. year (2013 to 2019) for astropy;
    data from: ui.adsabs.harvard.edu]

  59. [Plot: cumulative citations (0 to 3000) vs. year (2013 to 2019) for astropy and emcee;
    data from: ui.adsabs.harvard.edu]

  60. Why?

    View Slide

  61. What should we do?

    View Slide

  62. 5
    The Future

    View Slide

  63. Will people still be using
    emcee in 10 years?

  64. I hope not!

  65. (no text)

  66. (no text)

  67. (no text)

  68. (no text)

  69. (no text)

  70. These all have strengths
    and weaknesses.

    View Slide

  71. But these can have a steep
    learning curve.

  72. (no text)

  73. (no text)

  74. I plan on continuing to build
    tools in this ecosystem.

  75. I want to learn how to
    continue to maintain this
    software and build a
    sustainable community.

  76. 6
    Take Homes

    View Slide

  77. Open source is good for
    business.

    View Slide

  78. Tutorials are crucial.

    View Slide

  79. Build libraries, not scripts.

  80. Thanks!
    Dan Foreman-Mackey
    CCA@Flatiron // dfm.io // @exoplaneteer // github.com/dfm

    View Slide