I discuss the lessons learned from a few years as a core developer of an open source project.
PyMC3 lessons learned
Peadar Coyle - PyMC3 committer, Blogger and
Why OSS matters - it democratises innovation
New release - PyMC3 3.7
PyMC3 3.7 is released! Highlights:
- Python 3 only
- @arviz_devs for plotting
- Data class for handling changing data between
inference and posterior predictive
- Big under the hood improvements, especially to prior
predictive sampling and shape handling
Funding models for OSS are broken
Github sponsors - reasons for optimism
Isn’t everything a machine learning problem?
Lots of problems are small data or heterogeneous data problems.
Traditional ML models such as XGBoost or Random Forests DON’T
incorporate domain expertise or work well with small data.
What are the applications? How do I make money?
Basically anywhere you need to understand uncertainty, handle domain-specific
knowledge or handle small heterogeneous data.
Marketing is a good use case, as are A/B testing, survey data, pricing modelling
and many kinds of risk modelling.
What all of these problems have in common is that uncertainty
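To make the A/B testing use case concrete, here is a minimal sketch using only the Python standard library. It assumes a Beta(1, 1) prior on each conversion rate and estimates the probability that variant B beats variant A. The function name `prob_b_beats_a` and the visitor counts are made up for illustration; this is not PyMC3 code.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Conjugate update: the posterior is Beta(1 + successes, 1 + failures).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical small-data experiment: 40 visitors per arm.
p = prob_b_beats_a(conv_a=4, n_a=40, conv_b=9, n_b=40)
print(f"P(B beats A) ~ {p:.2f}")
```

The output is a probability rather than a bare point estimate, which is the kind of statement about uncertainty that a decision maker can act on even with very little data.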
What is a PPL?
A PPL is a Probabilistic Programming Language that treats random variables as first-class citizens.
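As a rough illustration of what "first-class" means, here is a toy sketch in plain Python. The `Normal` class and `joint_logp` function are invented for this example and are not the PyMC3 or Stan API; the point is only that a random variable is an object you can sample from, score values against, and combine into a model.

```python
import math
import random

class Normal:
    """A toy random variable: it can draw samples and score observed values."""

    def __init__(self, mu, sigma):
        self.mu = mu
        self.sigma = sigma

    def sample(self, rng):
        return rng.gauss(self.mu, self.sigma)

    def logp(self, x):
        # Log density of a normal distribution at x.
        z = (x - self.mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma) - 0.5 * math.log(2 * math.pi)

def joint_logp(mu_value, data):
    """Log joint density of a tiny model: mu ~ Normal(0, 10), data ~ Normal(mu, 1)."""
    prior = Normal(0.0, 10.0)
    likelihood = Normal(mu_value, 1.0)  # parameterised by the value of another variable
    return prior.logp(mu_value) + sum(likelihood.logp(x) for x in data)

data = [0.8, 1.1, 1.3, 0.9]
print(joint_logp(1.0, data) > joint_logp(5.0, data))
```

A real PPL builds this joint log density for you from the model declaration and hands it to an inference algorithm.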
Who uses Stan?
Who uses PyMC3?
Community is important: invest in it
Community is extremely important. GSoC/ Leadership
Negative: our gender split isn’t great - it’s a problem for
all of OSS. Would love to know some solutions to this.
● Great work by the likes of Ravin Kumar/Austin
Rochford on the tooling side. CI/CD helps in research
● Good test cases and Docker work helped a lot too.
● Value docs. Make it really easy to contribute to this
● Evangelism at conferences helped improve adoption
Importance of research
● We publish (occasionally) and regularly read the
● We do a journal club.
● We have also allowed ‘low risk’ merges to PyMC3. Lower
the bar for researchers
Meet in person
● Useful to meet in
person. It helps
I had a use case like this with a client. The model's run time went from
20 hours to 3 minutes
● Reducing the number of iterations. NUTS is a very powerful tool.
● Vectorization caused a lot of the speed improvements.
What’s coming next?
We just had a PyMC4 summit in Montreal. Keep a lookout for our updates.
Great support by the TensorFlow Probability team at Google.
Thank you - You may want to check out
Your job is to inform better decisions
You might think that your job is to understand the truth about reality or whatever.
All science is about making better decisions.
If your inference is wrong - then your decisions will be wrong.
What does this mean practically?
To handle large scale problems or ‘big data’ problems in a Bayesian Inference framework - we
need to use Hamiltonian samplers.
Hamiltonian samplers work well under certain conditions. These conditions are often swept
under the carpet.
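One such condition is that the step size must suit the geometry of the posterior. The sketch below is a hedged illustration, not PyMC3's implementation: it runs the leapfrog integrator that Hamiltonian samplers are built on, using a one-dimensional standard normal as the target, and compares the energy error for a small and a large step size. All function names are chosen for this example.

```python
def leapfrog(q, p, grad_neg_logp, step_size, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    p = p - 0.5 * step_size * grad_neg_logp(q)    # initial half step for momentum
    for i in range(n_steps):
        q = q + step_size * p                     # full step for position
        if i < n_steps - 1:
            p = p - step_size * grad_neg_logp(q)  # full step for momentum
    p = p - 0.5 * step_size * grad_neg_logp(q)    # final half step for momentum
    return q, p

def hamiltonian(q, p):
    # Standard normal target: potential q^2 / 2 plus kinetic energy p^2 / 2.
    return 0.5 * q * q + 0.5 * p * p

def energy_error(step_size, n_steps=50, q0=1.0, p0=0.5):
    # For a standard normal target the gradient of -log p(q) is just q.
    q, p = leapfrog(q0, p0, lambda q: q, step_size, n_steps)
    return abs(hamiltonian(q, p) - hamiltonian(q0, p0))

small = energy_error(0.1)  # stable: the energy is almost conserved
large = energy_error(2.5)  # unstable for this target: the energy blows up
print(small, large)
```

When the energy error explodes like this, samplers typically flag the trajectory as a divergence, which is a warning about the inference rather than noise to be ignored.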
What about regulation?
Increasingly models will be deployed in regulated industries - and in a post-GDPR world
interpretability will matter more. If you work with healthcare, finance or insurance data, you
should add Bayesian Statistics to your toolkit.
We’ll discuss how to debug Bayesian models, using modern techniques such as NUTS. This is
PyMC3-specific but the techniques apply to Rainier, Stan and BUGS.
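One common first check when debugging a sampler, whatever the library, is whether independent chains agree with each other. The sketch below computes the classic (non-split) Gelman-Rubin R-hat statistic in plain Python. It is an illustration under stated assumptions: the "chains" are independent normal draws standing in for real MCMC output, and libraries such as ArviZ report more refined variants of this statistic.

```python
import random
from statistics import mean, variance

def r_hat(chains):
    """Classic (non-split) Gelman-Rubin statistic for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])  # W: average within-chain variance
    between_over_n = variance(chain_means)        # B / n: variance of the chain means
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5

rng = random.Random(1)
# Four stand-in chains that all target the same distribution.
good = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
# The same chains, except that one is stuck in a different region.
bad = [list(c) for c in good]
bad[3] = [x + 3.0 for x in bad[3]]

print(r_hat(good), r_hat(bad))
```

Values close to 1 suggest the chains agree; values well above 1 mean at least one chain is exploring a different region and the results should not be trusted yet.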