Lessons learned PyMC3

I discuss the lessons learned from a few years as a core developer of an open source project.



May 29, 2019


  1. 1.

    PyMC3 lessons learned

    Peadar Coyle - PyMC3 committer, blogger and data scientist. PyData Montreal. @springcoil, www.probabilisticprogrammingprimer.com
  2. 3.

    New release - PyMC3 3.7

    PyMC3 3.7 is released! Highlights:

    - Python 3 only
    - @arviz_devs for plotting
    - Data class for handling changing data between inference and posterior predictive
    - Big under-the-hood improvements, especially to prior predictive sampling and shape handling
  3. 7.

    Isn’t everything a machine learning problem?

    Lots of problems are small-data or heterogeneous-data problems. Traditional ML models such as XGBoost or Random Forests DON’T incorporate domain expertise or work well with small data.
  4. 8.

    What are the applications? How do I make money?

    Basically anywhere you need to understand uncertainty, incorporate domain-specific knowledge, or handle small, heterogeneous data. Good use cases include marketing, A/B testing, survey data, pricing modelling, and many kinds of risk modelling. What all of these problems have in common is that uncertainty quantification matters.
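As a hedged illustration of why uncertainty quantification matters in A/B testing, here is a minimal sketch using a conjugate Beta-Binomial model and only the Python standard library rather than PyMC3. The conversion counts, the uniform Beta(1, 1) prior, and the function name `prob_b_beats_a` are assumptions made for illustration; they are not figures from the talk.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # With a Beta(1, 1) prior and binomial data, the posterior of a
        # conversion rate is Beta(1 + successes, 1 + failures).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical small-sample experiment: 12/100 vs 18/100 conversions.
p = prob_b_beats_a(conv_a=12, n_a=100, conv_b=18, n_b=100)
print(f"P(B beats A) = {p:.2f}")
```

The output is a probability rather than a yes/no verdict, which is the kind of statement a decision-maker can weigh against the cost of being wrong.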
  5. 9.

    What is a PPL?

    A PPL is a Probabilistic Programming Language: a language that treats random variables as first-class citizens.
  6. 12.
  7. 13.

    Community is important: invest in it

    Community is extremely important. GSoC / Leadership. Negative: our gender split isn’t great - it’s a problem for all of OSS. Would love to know some solutions to this.
  8. 14.

    Tooling

    • Great work by the likes of Ravin Kumar and Austin Rochford on the tooling side. CI/CD helps in research software too.
    • Good test cases and Docker work helped a lot too.
  9. 15.

    Docs

    • Value docs. Make it really easy to contribute to them too.
    • Evangelism at conferences helped improve adoption.
  10. 16.

    Importance of research

    • We publish (occasionally) and regularly read the literature.
    • We do a journal club.
    • We have also allowed ‘low risk’ merges to PyMC3. Lower the bar for researchers.
  11. 17.
  12. 18.

    Meet in person

    • Useful to meet in person. It helps build up relationships.
  13. 19.

    Profile

    https://discourse.pymc.io/t/multiple-linear-regression/3139
    https://docs.pymc.io/notebooks/profiling.html

    I had a use case like this with a client. The model went from running for 20 hours to 3 minutes.

    • Reducing the number of iterations. NUTS is a very powerful tool.
    • Vectorization caused a lot of the speed improvements.
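The vectorization point can be sketched outside PyMC3 with plain NumPy. This is an illustration under stated assumptions, not the client model from the talk: the problem size, the variable names, and the two helper functions are hypothetical. The same idea carries over to a PyMC3 model, where writing the linear predictor as one dot product (for example with `pm.math.dot`) avoids building the graph one term at a time.

```python
import time

import numpy as np

rng = np.random.default_rng(0)
n_obs, n_features = 5_000, 20  # hypothetical problem size
X = rng.normal(size=(n_obs, n_features))
beta = rng.normal(size=n_features)


def linear_predictor_loop(X, beta):
    """Build mu one observation and one coefficient at a time."""
    mu = np.zeros(X.shape[0])
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            mu[i] += X[i, j] * beta[j]
    return mu


def linear_predictor_vectorized(X, beta):
    """Compute the same quantity as a single matrix-vector product."""
    return X @ beta


start = time.perf_counter()
mu_loop = linear_predictor_loop(X, beta)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
mu_vec = linear_predictor_vectorized(X, beta)
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, vectorized: {vec_seconds:.4f}s")
```

Timings depend on the machine, so none are quoted here; the point is that both functions return the same values while the vectorized form does the work in one call.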
  14. 20.

    What’s coming next?

    We just had a PyMC4 summit in Montreal. Keep a lookout for our updates. Great support from the TensorFlow Probability team at Google. https://github.com/pymc-devs/pymc4
  15. 22.

    Your job is to inform better decisions

    You might think that your job is to understand the truth about reality or whatever. All science is about making better decisions. If your inference is wrong, then your decisions will be wrong.
  16. 24.

    What does this mean practically?

    To handle large-scale or ‘big data’ problems in a Bayesian inference framework, we need to use Hamiltonian samplers. Hamiltonian samplers work well under certain conditions. These conditions are often swept under the carpet.
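To make "Hamiltonian sampler" concrete, here is a minimal sketch of plain Hamiltonian Monte Carlo with a leapfrog integrator, written with the standard library only. It is not NUTS and not PyMC3's implementation: NUTS tunes the step size and trajectory length adaptively, whereas the fixed `step_size` and `n_leapfrog` below, the one-dimensional standard normal target, and all function names are assumptions chosen for illustration. One condition is visible directly in the code: the sampler needs the gradient of the log density, so the target must be differentiable in its continuous parameters.

```python
import math
import random


def neg_log_density(q):
    """Potential energy U(q) for a standard normal target, up to a constant."""
    return 0.5 * q * q


def grad_neg_log_density(q):
    """Gradient of U(q); HMC requires a differentiable target."""
    return q


def hmc_step(q, rng, step_size=0.2, n_leapfrog=20):
    """One HMC transition: resample momentum, simulate, accept or reject."""
    p = rng.gauss(0.0, 1.0)
    current_h = neg_log_density(q) + 0.5 * p * p

    # Leapfrog integration: half momentum step, alternating full steps,
    # then a final half momentum step.
    q_new, p_new = q, p
    p_new -= 0.5 * step_size * grad_neg_log_density(q_new)
    for step in range(n_leapfrog):
        q_new += step_size * p_new
        if step != n_leapfrog - 1:
            p_new -= step_size * grad_neg_log_density(q_new)
    p_new -= 0.5 * step_size * grad_neg_log_density(q_new)

    proposed_h = neg_log_density(q_new) + 0.5 * p_new * p_new
    # Metropolis correction for the integrator's discretisation error.
    if rng.random() < math.exp(min(0.0, current_h - proposed_h)):
        return q_new
    return q


def sample(n_samples=5000, seed=0):
    """Run a single chain starting at zero and return all draws."""
    rng = random.Random(seed)
    q, draws = 0.0, []
    for _ in range(n_samples):
        q = hmc_step(q, rng)
        draws.append(q)
    return draws


draws = sample()
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"mean = {mean:.2f}, variance = {var:.2f}")
```

For this easy target the draws should have mean near 0 and variance near 1. On harder posteriors a poorly chosen step size or strong curvature makes the integrator inaccurate, which is where the conditions mentioned above start to bite.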
  17. 25.

    What about regulation?

    Increasingly, models will be deployed in regulated industries, and in a post-GDPR world interpretability will matter more. If you work with healthcare, finance, or insurance data, you should add Bayesian statistics to your toolkit. We’ll discuss how to debug Bayesian models using modern techniques such as NUTS. This is PyMC3-specific, but the techniques apply to Rainier, Stan and BUGS.
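One common first step when debugging a Bayesian model is checking whether independent chains agree. The sketch below computes the classic (non-split) Gelman-Rubin statistic with the standard library. It is a simplified illustration: the simulated chains and the function name `r_hat` are assumptions, and libraries such as ArviZ offer more robust versions (for example `arviz.rhat`) that should be preferred in practice.

```python
import random
import statistics


def r_hat(chains):
    """Classic (non-split) Gelman-Rubin statistic for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average within-chain variance. B: n times the variance of chain means.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(0)
# Hypothetical draws: four chains that agree, then the same set with one
# chain replaced by draws centred somewhere else.
good = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
bad = good[:3] + [[rng.gauss(3.0, 1.0) for _ in range(1000)]]

print(f"agreeing chains: {r_hat(good):.3f}")
print(f"disagreeing chains: {r_hat(bad):.3f}")
```

Values close to 1 suggest the chains are exploring the same distribution; values well above 1 indicate that at least one chain disagrees and the model or sampler settings need attention.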