Dan Foreman-Mackey
June 04, 2019

# emcee-odi

My talk at the "Open Digital Infrastructure in Astrophysics" meeting.

## Transcript

1. emcee
Dan Foreman-Mackey
CCA@Flatiron // dfm.io // @exoplaneteer // github.com/dfm

2. A modular ecosystem for
probabilistic data analysis
including emcee
Dan Foreman-Mackey
CCA@Flatiron // dfm.io // @exoplaneteer // github.com/dfm

3. Slides can be found at:
speakerdeck.com/dfm

4. I have a GitHub problem.

5. 1
Context

6. p(physics | data)
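Slide 6 is shorthand for a posterior distribution. Assuming the usual Bayesian reading, it can be expanded as below; the symbols $\theta$ (physical parameters) and $D$ (data) are introduced here for illustration and do not appear on the slide.

$$
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} \propto p(D \mid \theta)\, p(\theta)
$$

Sampling methods only need the right-hand side up to the constant $p(D)$, which is why an unnormalized probability is enough in practice.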

7. Markov Chain Monte Carlo

8. [Figure: number of astronomy papers with "MCMC" in the text]

9. [Figure: number of astronomy papers with "MCMC" in the text; annotation: "DFM starts"]

10. In 2010:
Everyone wrote their own
MCMC sampler.

11. In 2010:
So that's what I did too.

14. The algorithm is
nearly trivial.

15. Algorithm 3: The parallel stretch move update step
1: for $i \in \{0, 1\}$ do
2: for $k = 1, \ldots, K/2$ do
3: // This loop can now be done in parallel for all k
4: Draw a walker $X_j$ at random from the complementary ensemble $S^{(\sim i)}(t)$
5: $X_k \leftarrow S^{(i)}_k$
6: $z \leftarrow Z \sim g(z)$, Equation (10)
7: $Y \leftarrow X_j + z\,[X_k(t) - X_j]$
8: $q \leftarrow z^{n-1}\, p(Y) / p(X_k(t))$
9: $r \leftarrow R \sim [0, 1]$
10: if $r \le q$, Equation (9) then
11: $X_k(t + \tfrac{1}{2}) \leftarrow Y$
12: else
13: $X_k(t + \tfrac{1}{2}) \leftarrow X_k(t)$
14: end if
15: end for
16: $t \leftarrow t + \tfrac{1}{2}$
17: end for

acceptance fraction $a_f$. This is the fraction of proposed steps that are accepted. There appears to be no agreement on the optimal acceptance rate but it is clear that both extrema are unacceptable. If $a_f \sim 0$, then nearly all proposed steps are rejected, so the chain

DFM+ (2013)
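The update step in Algorithm 3 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not emcee's implementation: the function names `sample_z`, `stretch_move_step`, and `run_chain` are hypothetical, the proposal density is assumed to be the commonly used $g(z) \propto 1/\sqrt{z}$ on $[1/a, a]$ with $a = 2$ (Equation (10) itself is not reproduced in the transcript), and the acceptance test is done in log space for numerical stability.

```python
import math
import random


def sample_z(rng, a=2.0):
    # Assumed proposal: g(z) proportional to 1/sqrt(z) on [1/a, a],
    # drawn by inverting its cumulative distribution.
    u = rng.random()
    return ((a - 1.0) * u + 1.0) ** 2 / a


def stretch_move_step(walkers, log_prob, rng, a=2.0):
    """Update both half-ensembles once (steps 1-17 of Algorithm 3).

    walkers: list of K positions (K even), each a list of n floats.
    log_prob: callable returning log p(x) for a position x.
    Returns (new_walkers, number_of_accepted_proposals).
    """
    K = len(walkers)
    n = len(walkers[0])
    half = K // 2
    walkers = [list(w) for w in walkers]  # leave the caller's list untouched
    accepted = 0
    for i in (0, 1):
        active = range(i * half, (i + 1) * half)
        # The complementary half-ensemble is not modified inside the k loop,
        # which is what makes that loop parallelizable.
        comp = [walkers[j] for j in range((1 - i) * half, (2 - i) * half)]
        for k in active:
            xj = rng.choice(comp)
            xk = walkers[k]
            z = sample_z(rng, a)
            y = [cj + z * (ck - cj) for cj, ck in zip(xj, xk)]
            # log q = (n - 1) log z + log p(Y) - log p(X_k)
            log_q = (n - 1) * math.log(z) + log_prob(y) - log_prob(xk)
            r = 1.0 - rng.random()  # in (0, 1], so log(r) is always finite
            if math.log(r) <= log_q:
                walkers[k] = y
                accepted += 1
    return walkers, accepted


def run_chain(log_prob, walkers, n_steps, seed=0, a=2.0):
    """Run n_steps updates; return the chain and the acceptance fraction."""
    rng = random.Random(seed)
    K = len(walkers)
    chain, accepted = [], 0
    for _ in range(n_steps):
        walkers, n_acc = stretch_move_step(walkers, log_prob, rng, a)
        accepted += n_acc
        chain.append(walkers)
    return chain, accepted / (n_steps * K)
```

`run_chain` also returns the acceptance fraction $a_f$ discussed in the excerpt, computed as accepted proposals divided by total proposals. The serial `for k` loop is written for clarity; a real implementation would vectorize or parallelize it.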

16. So why is it so popular?

17. circa 2013

18. circa 2013

19. 2
Lessons Learned

20. 1
be good for your career.
* Prior results do not guarantee a similar outcome.

21. 2
Writing docs and tutorials
is not a waste of time.

22. I use the documentation
that I've written every day.

23. Teaching is a
good way to learn.

24. 3

25. I have been part of about
the word "emcee".

26. 4.5 emails per week.

27. 4
Beware of feature creep.
* Especially that first big pull request.

28. You will have to maintain
the feature that you merge.

29. 5
Keep it modular.

30. It's easier to write code
that does one thing well.

31. Package managers exist.

32. 3
Ideas for a Successful
Scientific Software Package

33. 1
You should be the target
audience.

34. 2
Libraries, not scripts.

35. 3
Tutorials, not (just) API docs.

36. 4
Integrate with
the ecosystem.

37. For example:
fitting transiting exoplanet
observations.

38. emcee
george transit
corner.py
GitHub repositories; user: dfm

39. emcee
celerite transit
corner.py
GitHub repositories; user: dfm

40. emcee
celerite starry
corner.py
Except rodluger/starry by Rodrigo Luger
GitHub repositories; user: dfm

41. pymc3
celerite starry
corner.py
And pymc-devs/pymc3
Except rodluger/starry by Rodrigo Luger
GitHub repositories; user: dfm

42. 4
Open Questions
* A non-exhaustive list

43. 1
How do you build and
maintain a sustainable
developer community?

44. 2
How do you balance
community building and
technical debt?

45. 3
How do we give credit to
developers of large projects?

46. AstroPy is a much more
successful open source
project by all metrics.

47. AstroPy is a much more
successful open source
project by all metrics.
Except citation count.

48. [Figure: cumulative citations by year, 2013 to 2019; series: astropy]
49. [Figure: cumulative citations by year, 2013 to 2019; series: astropy, emcee]
50. Why?

51. What should we do?

52. 5
The Future

53. Will people still be using
emcee in 10 years?

54. I hope not!

55. These all have strengths
and weaknesses.

56. But these can have a steep
learning curve.

57. I plan on continuing to build
tools in this ecosystem.

58. I want to learn how to
continue to maintain this
software and build a
sustainable community.

59. 6
Take Homes

60. Open source is good for