Lessons learned PyMC3

PyMC3 lessons learned
Peadar Coyle: PyMC3 committer, blogger and data scientist
PyData Montreal, @springcoil, www.probabilisticprogrammingprimer.com

Why OSS matters - it democratises innovation @patio11

New release: PyMC3 3.7
PyMC3 3.7 is released! Highlights:
- Python 3 only
- ArviZ (@arviz_devs) for plotting
- Data class for handling data that changes between inference and posterior predictive sampling
- Big under-the-hood improvements, especially to prior predictive sampling and shape handling

Funding models for OSS are broken
https://www.fordfoundation.org/about/library/reports-and-studies/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure/

GitHub Sponsors: reasons for optimism
https://github.com/sponsors

Our metrics

Isn’t everything a machine learning problem?
Lots of problems are small-data or heterogeneous-data problems. Traditional ML models such as XGBoost or random forests DON’T incorporate domain expertise, and they don’t work well with small data.
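To make the small-data point concrete, here is a minimal sketch in plain Python rather than PyMC3. It shows how a prior carries domain knowledge through a conjugate Beta-Binomial update. The data (3 successes in 4 trials) and the informative prior Beta(2, 18), which has mean 0.1 and stands in for "rates are usually around 10%", are hypothetical.

```python
# Conjugate Beta-Binomial update: a Beta(a, b) prior plus k successes in n trials
# gives a Beta(a + k, b + n - k) posterior.
def beta_binomial_posterior(a, b, successes, trials):
    return a + successes, b + trials - successes

def beta_mean(a, b):
    return a / (a + b)

successes, trials = 3, 4  # hypothetical tiny data set

# The maximum-likelihood estimate uses the data alone.
mle = successes / trials

# Flat prior Beta(1, 1) versus an informative prior Beta(2, 18) (mean 0.1),
# the latter standing in for hypothetical domain expertise.
flat = beta_binomial_posterior(1, 1, successes, trials)
informed = beta_binomial_posterior(2, 18, successes, trials)

print(f"MLE:                 {mle:.3f}")
print(f"Flat-prior mean:     {beta_mean(*flat):.3f}")
print(f"Informed-prior mean: {beta_mean(*informed):.3f}")
```

With only four observations, the informative prior pulls the estimate from 0.75 down to about 0.21. As more data arrive, the likelihood dominates and the prior's influence fades.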

What are the applications? How do I make money?
Basically anywhere you need to understand uncertainty, handle domain-specific knowledge or handle small, heterogeneous data. Good use cases include marketing, A/B testing, survey data, pricing models and many kinds of risk modelling. What all of these problems have in common is that uncertainty quantification matters.
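As an example of uncertainty quantification in A/B testing, here is a minimal sketch in plain Python (not PyMC3). It draws from the conjugate Beta posteriors of two conversion rates and summarizes the difference between them. The conversion counts are hypothetical and the priors are assumed flat, Beta(1, 1).

```python
import random

def ab_test(conv_a, n_a, conv_b, n_b, draws=20_000, seed=42):
    """Beta-Binomial A/B test with flat Beta(1, 1) priors, summarized by Monte Carlo.

    Returns P(rate_B > rate_A) and a 95% credible interval for rate_B - rate_A.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(draws):
        # One posterior draw of each conversion rate (conjugate Beta posterior).
        p_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        p_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        diffs.append(p_b - p_a)
    diffs.sort()
    prob_b_better = sum(d > 0 for d in diffs) / draws
    interval = (diffs[int(0.025 * draws)], diffs[int(0.975 * draws)])
    return prob_b_better, interval

# Hypothetical data: 40/500 conversions for variant A, 55/500 for variant B.
prob, (lo, hi) = ab_test(40, 500, 55, 500)
print(f"P(B > A) = {prob:.3f}; 95% interval for B - A: [{lo:.3f}, {hi:.3f}]")
```

A single p-value cannot give you both of these things. This approach gives the probability that B is better and a range of plausible effect sizes, which is usually what a marketing or pricing decision needs.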

What is a PPL?
A PPL is a Probabilistic Programming Language that treats random variables as first-class citizens.
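To illustrate what "first-class" means, here is a deliberately toy sketch in plain Python. A random variable is an ordinary value that you can store, combine with arithmetic and sample. The `RandomVariable` class and the `normal` and `as_rv` helpers are hypothetical. They do not reflect how PyMC3 represents variables internally.

```python
import random
import statistics

class RandomVariable:
    """Toy random variable: a value you can store, pass around, combine and sample."""

    def __init__(self, sampler):
        self._sampler = sampler  # callable: random.Random -> one draw

    def sample(self, rng):
        return self._sampler(rng)

    def __add__(self, other):
        other = as_rv(other)
        # The sum of two random variables is itself a random variable.
        return RandomVariable(lambda rng: self.sample(rng) + other.sample(rng))

    def __mul__(self, other):
        other = as_rv(other)
        return RandomVariable(lambda rng: self.sample(rng) * other.sample(rng))

    # Addition and multiplication commute, so the reflected versions can be reused.
    __radd__ = __add__
    __rmul__ = __mul__

def as_rv(value):
    # Wrap plain numbers as constant random variables.
    return value if isinstance(value, RandomVariable) else RandomVariable(lambda rng: value)

def normal(mu, sigma):
    return RandomVariable(lambda rng: rng.gauss(mu, sigma))

x = normal(0.0, 1.0)
y = normal(5.0, 2.0)
z = 2 * x + y + 1  # analytically, z ~ Normal(6, sqrt(8))

rng = random.Random(0)
draws = [z.sample(rng) for _ in range(50_000)]
print(f"mean {statistics.fmean(draws):.2f}, sd {statistics.stdev(draws):.2f}")
```

This toy draws a fresh value every time a variable appears, so `x + x` would behave like the sum of two independent copies. A real PPL tracks such dependencies in a model graph and can also condition on observed data, which this sketch does not attempt.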

Who uses Stan?

Who uses PyMC3?

Box loop

Community is important: invest in it
Community is extremely important.
• GSoC / leadership
• Negative: our gender split isn’t great. It’s a problem for all of OSS, and we would love to know some solutions to this.

Tooling
• Great work on the tooling side by the likes of Ravin Kumar and Austin Rochford. CI/CD helps in research software too.
• Good test cases and Docker work also helped a lot.

Docs
• Value docs, and make it really easy to contribute to them too.
• Evangelism at conferences helped improve adoption.

Importance of research
• We publish (occasionally) and regularly read the literature.
• We run a journal club.
• We have also allowed ‘low-risk’ merges to PyMC3 to lower the bar for researchers.

Meet in person
• It’s useful to meet in person: it helps build relationships.

Profile
https://discourse.pymc.io/t/multiple-linear-regression/3139
https://docs.pymc.io/notebooks/profiling.html
I had a use case like this with a client: the model's run time went from 20 hours to 3 minutes.
• Reducing the number of iterations. NUTS is a very powerful tool.
• Vectorization caused a lot of the speed improvements.
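The workflow on this slide is to profile first and then remove per-observation Python work from the hot loop. Here is a sketch of that workflow using the standard library's cProfile, assuming a hypothetical Gaussian log-likelihood that a sampler evaluates many times. This is neither the client model nor PyMC3's own profiler, which is covered in the linked notebook. The sufficient-statistics rewrite is only a stand-in for the vectorization lesson: do the per-observation work once, outside the inner loop.

```python
import cProfile
import io
import math
import pstats
import random

rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(2_000)]  # hypothetical observations

def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)

def loglik_loop(mu, sigma=1.0):
    # One Python function call per observation, repeated on every evaluation.
    return sum(normal_logpdf(y, mu, sigma) for y in data)

# Precompute sufficient statistics once, outside the sampler's inner loop.
N = len(data)
SUM_Y = math.fsum(data)
SUM_Y2 = math.fsum(y * y for y in data)

def loglik_fast(mu, sigma=1.0):
    # Same quantity in O(1) per call: sum (y - mu)^2 = SUM_Y2 - 2*mu*SUM_Y + N*mu^2.
    sq = SUM_Y2 - 2 * mu * SUM_Y + N * mu**2
    return -0.5 * N * math.log(2 * math.pi * sigma**2) - sq / (2 * sigma**2)

def run(loglik, evals=100):
    # Mimic a sampler evaluating the log-likelihood at many parameter values.
    return [loglik(1.5 + 0.001 * i) for i in range(evals)]

profiler = cProfile.Profile()
profiler.enable()
slow_results = run(loglik_loop)
fast_results = run(loglik_fast)
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

The profile report attributes most of the time to `loglik_loop` and `normal_logpdf`, which tells you where to focus the rewrite.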

What’s coming next?
We just had a PyMC4 summit in Montreal. Keep a lookout for our updates. We’ve had great support from the TensorFlow Probability team at Google.
https://github.com/pymc-devs/pymc4

Thank you! You may want to check out www.probabilisticprogrammingprimer.com

Your job is to inform better decisions
You might think that your job is to understand the truth about reality, or whatever. But all science is about making better decisions: if your inference is wrong, then your decisions will be wrong.

More applications

What does this mean practically?
To handle large-scale or ‘big data’ problems in a Bayesian inference framework, we need to use Hamiltonian samplers. Hamiltonian samplers work well under certain conditions (for example, a differentiable log-density over continuous parameters and a well-tuned step size), and these conditions are often swept under the carpet.
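To show what "under certain conditions" looks like, here is a minimal sketch of plain Hamiltonian Monte Carlo in Python. It assumes a 1-D standard normal target, a fixed step size and a fixed number of leapfrog steps. This is not NUTS and not PyMC3's implementation, and the parameter values are illustrative. With a sensible step size almost every proposal is accepted. With a step size above the leapfrog stability limit for this target (2.0), the simulated energy explodes and almost every proposal is rejected.

```python
import math
import random
import statistics

def hmc(logp, grad_logp, x0, n_samples=5000, step_size=0.2, n_steps=20, seed=0):
    """Plain HMC for a 1-D target: fixed step size and fixed number of leapfrog steps."""
    rng = random.Random(seed)
    x, samples, accepted = x0, [], 0
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)  # fresh momentum each iteration
        x_new, p = x, p0
        # Leapfrog integration: half momentum step, alternating full steps, final half step.
        p += 0.5 * step_size * grad_logp(x_new)
        for i in range(n_steps):
            x_new += step_size * p
            if i < n_steps - 1:
                p += step_size * grad_logp(x_new)
        p += 0.5 * step_size * grad_logp(x_new)
        # Metropolis correction for integration error, with H(x, p) = -logp(x) + p^2 / 2.
        h_old = -logp(x) + 0.5 * p0 * p0
        h_new = -logp(x_new) + 0.5 * p * p
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples

def logp(x):
    return -0.5 * x * x  # standard normal, up to a constant

def grad_logp(x):
    return -x

good, good_rate = hmc(logp, grad_logp, x0=0.0)
bad, bad_rate = hmc(logp, grad_logp, x0=0.0, step_size=2.1)  # unstable step size

print(f"step 0.2: acceptance {good_rate:.2f}, "
      f"mean {statistics.fmean(good):.2f}, var {statistics.variance(good):.2f}")
print(f"step 2.1: acceptance {bad_rate:.2f}")
```

NUTS, PyMC3's default sampler for continuous models, goes further than this sketch. It tunes the step size during warm-up and chooses the trajectory length automatically. However, it still relies on a smooth, differentiable log-density.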

What about regulation?
Increasingly, models will be deployed in regulated industries, and in a post-GDPR world interpretability will matter more. If you work with healthcare, finance or insurance data, you should add Bayesian statistics to your toolkit. We’ll discuss how to debug Bayesian models that use modern techniques such as NUTS. This is PyMC3-specific, but the techniques apply to Rainier, Stan and BUGS.
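A standard check when debugging MCMC output, NUTS included, is to compare several independent chains. Here is a minimal sketch of the classic split-R-hat statistic in plain Python. The chains are simulated with `random.gauss` rather than produced by PyMC3, and the "stuck chain" is a hypothetical failure. In practice a library such as ArviZ computes this, and its current default is a rank-normalized refinement.

```python
import random
import statistics

def split_rhat(chains):
    """Classic split-R-hat (Gelman-Rubin) for equal-length chains of scalar draws."""
    halves = []
    for chain in chains:
        half = len(chain) // 2
        # Splitting each chain in two also catches drift within a single chain.
        halves += [chain[:half], chain[half:2 * half]]
    n = len(halves[0])
    means = [statistics.fmean(h) for h in halves]
    w = statistics.fmean([statistics.variance(h) for h in halves])  # within-chain variance
    b = n * statistics.variance(means)                              # between-chain variance
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5

rng = random.Random(7)
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# Hypothetical failure: one chain stuck in a different region of the posterior.
stuck = mixed[:3] + [[rng.gauss(3, 1) for _ in range(1000)]]

print(f"mixed chains: R-hat = {split_rhat(mixed):.3f}")
print(f"stuck chain:  R-hat = {split_rhat(stuck):.3f}")
```

Values close to 1 suggest the chains agree. Current guidance flags values above about 1.01, while older guidance used 1.1. R-hat is only one check: with NUTS you would also look at divergent transitions and effective sample size.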
