Lessons learned PyMC3

I discuss the lessons learned from a few years as a core developer of an open source project.

springcoil

May 29, 2019

Transcript

  1. PyMC3 lessons learned
    Peadar Coyle - PyMC3 committer, Blogger and
    Data Scientist
    PyData Montreal
    @springcoil
    www.probabilisticprogrammingprimer.com

  2. Why OSS matters - it democratises innovation
    @patio11

  3. New release - PyMC3 3.7
    PyMC3 3.7 is released! Highlights:
    - Python 3 only
    - @arviz_devs for plotting
    - Data class for handling changing data between
    inference and posterior predictive
    - Big under the hood improvements, especially to prior
    predictive sampling and shape handling

  4. Funding models for OSS are broken
    https://www.fordfoundation.org/about/library/reports-and-studies/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure/

  5. GitHub Sponsors - reasons for optimism
    https://github.com/sponsors

  6. Isn’t everything a machine learning problem?
    Lots of problems are small data or heterogeneous data problems.
    Traditional ML models such as XGBoost or Random Forests DON’T
    incorporate domain expertise or work well with small data.

  7. What are the applications? How do I make money?
    Basically anywhere you need to understand uncertainty, handle
    domain-specific knowledge or handle small heterogeneous data.
    Marketing is a good use case, as are A/B testing, survey data, pricing
    modelling and many use cases in risk modelling.
    What all of these problems have in common is that uncertainty
    quantification matters.
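As a rough illustration of uncertainty quantification in the A/B testing case, here is a minimal sketch using only the Python standard library rather than PyMC3. The function name `prob_b_beats_a`, the conversion counts and the uniform Beta(1, 1) priors are all assumptions made up for this example; they are not from the talk.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A).

    Assumes independent Beta(1, 1) priors on each conversion rate, so each
    posterior is Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # One joint draw from the two independent Beta posteriors.
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical data: 12/100 conversions for variant A, 18/100 for variant B.
p = prob_b_beats_a(conv_a=12, n_a=100, conv_b=18, n_b=100)
print(f"P(B beats A) = {p:.2f}")
```

The output is a probability rather than a single yes/no answer, which is the kind of quantity a decision about a small-sample test can be weighed against.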

  8. What is a PPL?
    A PPL is a Probabilistic Programming Language: one that treats random variables as first-class citizens.

  9. Who uses Stan?

  10. Who uses PyMC3?

  11. Community is important: invest in it
    Community is extremely important. GSoC / leadership.
    Negative: our gender split isn’t great. It’s a problem for
    all of OSS. Would love to know some solutions to this.

  12. Tooling
    ● Great work by the likes of Ravin Kumar and Austin
    Rochford on the tooling side. CI/CD helps in research
    software too.
    ● Good test cases and Docker work helped a lot too.

  13. Docs
    ● Value docs. Make it really easy to contribute to them
    too.
    ● Evangelism at conferences helped improve adoption.

  14. Importance of research
    ● We publish (occasionally) and regularly read the
    literature.
    ● We do a journal club.
    ● We have also allowed ‘low risk’ merges to PyMC3. Lower
    the bar for researchers.

  15. Meet in person
    ● Useful to meet in person. It helps build up relationships.

  16. Profile
    https://discourse.pymc.io/t/multiple-linear-regression/3139
    https://docs.pymc.io/notebooks/profiling.html
    I had a use case like this with a client. The model went from a 20-hour
    run to 3 minutes.
    ● Reducing the number of iterations helped. NUTS is a very powerful tool.
    ● Vectorization caused a lot of the speed improvements.
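To make the vectorization point concrete, here is a sketch of what it usually means for a multiple linear regression like the one in the linked forum thread. The symbols ($n$ observations, $p$ predictors, intercept $\alpha$, coefficients $\beta_j$, design matrix $X$) are assumptions introduced for illustration; the client model itself is not shown in the talk.

```latex
% Loop form: one scalar term per observation i and predictor j
\mu_i = \alpha + \sum_{j=1}^{p} \beta_j x_{ij}, \qquad i = 1, \dots, n

% Vectorized form: the same model as a single matrix-vector product
\boldsymbol{\mu} = \alpha \mathbf{1}_n + X \boldsymbol{\beta},
\qquad X \in \mathbb{R}^{n \times p},\ \boldsymbol{\beta} \in \mathbb{R}^{p}
```

In model code this typically corresponds to declaring one vector-valued coefficient variable and a single dot product, instead of building many scalar variables in a Python loop.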

  17. What’s coming next?
    We just had a PyMC4 summit in Montreal. Keep a lookout for our updates.
    Great support from the TensorFlow Probability team at Google.
    https://github.com/pymc-devs/pymc4

  18. Thank you - You may want to check out
    www.probabilisticprogrammingprimer.com

  19. Your job is to inform better decisions
    You might think that your job is to understand the truth about reality or whatever.
    All science is about making better decisions.
    If your inference is wrong - then your decisions will be wrong.

  20. More applications

  21. What does this mean practically?
    To handle large scale problems or ‘big data’ problems in a Bayesian Inference framework - we
    need to use Hamiltonian samplers.
    Hamiltonian samplers work well under certain conditions. These conditions are often swept
    under the carpet.
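For readers who have not seen one, the following is a minimal sketch of a single-parameter Hamiltonian Monte Carlo sampler in plain Python. It is not PyMC3's NUTS implementation. The target (a standard normal), the step size, the number of leapfrog steps and all function names are assumptions chosen for illustration. One condition is visible directly in the code: the sampler needs the gradient of the negative log density, so the target has to be differentiable in its parameters.

```python
import math
import random


def neg_log_density(q):
    # Standard normal target, up to an additive constant.
    return 0.5 * q * q


def grad_neg_log_density(q):
    # Gradient of the function above; HMC cannot run without it.
    return q


def hmc_step(q, rng, step_size=0.15, n_leapfrog=10):
    """One HMC transition: returns (new_position, accepted)."""
    p = rng.gauss(0.0, 1.0)  # resample the momentum
    q_new, p_new = q, p

    # Leapfrog integration of the Hamiltonian dynamics.
    p_new -= 0.5 * step_size * grad_neg_log_density(q_new)
    for i in range(n_leapfrog):
        q_new += step_size * p_new
        if i < n_leapfrog - 1:
            p_new -= step_size * grad_neg_log_density(q_new)
    p_new -= 0.5 * step_size * grad_neg_log_density(q_new)

    # Metropolis correction for the integrator's energy error.
    h_old = neg_log_density(q) + 0.5 * p * p
    h_new = neg_log_density(q_new) + 0.5 * p_new * p_new
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new, True
    return q, False


def run_hmc(n_draws, seed=0):
    """Run the chain from q = 0 and return (draws, acceptance_rate)."""
    rng = random.Random(seed)
    q, draws, accepted = 0.0, [], 0
    for _ in range(n_draws):
        q, ok = hmc_step(q, rng)
        draws.append(q)
        accepted += ok
    return draws, accepted / n_draws


draws, rate = run_hmc(2000)
mean = sum(draws) / len(draws)
print(f"mean = {mean:.2f}, acceptance rate = {rate:.2f}")
```

The step size and trajectory length are hand-picked here. Much of the tuning that NUTS automates is about choosing these well, and a poor choice is one way the conditions stop holding.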

  22. What about regulation?
    Increasingly, models will be deployed in regulated industries, and in a post-GDPR world
    interpretability will matter more. If you work with healthcare, finance or insurance data you
    should add Bayesian statistics to your toolkit.
    We’ll discuss how to debug Bayesian models using modern techniques such as NUTS. This is
    PyMC3 specific, but the techniques apply to Rainier, Stan and BUGS.
