EuroSciPy talk on Probabilistic programming

springcoil
August 22, 2015

Slides from the talk I will give at EuroSciPy

Transcript

  1. Probabilistic Programming
    A brief introduction to Probabilistic Programming and Python
    Late August 2015
    [email protected]
    All opinions my own

  2. Who am I?
    I work as a Data Scientist for a large Telecommunications Company
    Masters in Mathematics
    Interned at Amazon
    Was a consultant for a while
    Occasional contributor to Pandas and other projects
    Co-organizer of the Data Science Meetup in Luxembourg
    Member of Royal Statistical Society and NumFOCUS
    @springcoil

  3. What is Probabilistic Programming
    Basically using random variables instead of variables
    Allows you to create a generative story rather than a black box
    A different tool to Machine Learning
    A different paradigm to frequentist statistics
    Forces you to be explicit about your 'subjective' assumptions
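
A minimal sketch of the "random variables instead of variables" idea, using only the Python standard library. The Gamma prior parameters, the Gaussian noise and the name `simulate_team` are illustrative assumptions, not taken from the talk.

```python
import random

def simulate_team(rng, n_matches=5, prior_shape=2.0, prior_scale=10.0):
    """Generative story: draw a latent strength, then noisy match scores."""
    # The team's 'strength' is a random variable drawn from a prior,
    # not a fixed number chosen by hand.
    strength = rng.gammavariate(prior_shape, prior_scale)
    # Observed scores scatter around the latent strength.
    # Gaussian noise clipped at zero is an assumption made for illustration.
    scores = [max(0.0, rng.gauss(strength, 5.0)) for _ in range(n_matches)]
    return strength, scores

rng = random.Random(0)  # fixed seed so the story is reproducible
strength, scores = simulate_team(rng)
```

Running the function many times gives a whole distribution of plausible seasons, which is the "generative story" in code form.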

  4. Source: Olivier Grisel

  5. Source: Olivier Grisel

  6. Bayesian Statistics
    I studied Mathematics and encountered Bayesian methods in textbooks
    This is a hard area to do with pen and paper, and most integrals can't be
    solved in exact form
    Thankfully, Monte Carlo simulation was invented
    These simulations are used to approximate your posterior distribution
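
A small sketch of why simulation helps when an integral has no elementary closed form. The integrand exp(-x^2) on [0, 1] and the sample size are assumptions chosen for illustration; the reference value comes from the error function in Python's `math` module.

```python
import math
import random

def mc_integral(f, n_samples, rng):
    """Estimate the integral of f over [0, 1] as the mean of f at uniform draws."""
    return sum(f(rng.random()) for _ in range(n_samples)) / n_samples

rng = random.Random(42)
estimate = mc_integral(lambda x: math.exp(-x * x), 100_000, rng)

# Reference value for comparison: integral of exp(-x^2) on [0, 1]
# equals sqrt(pi) / 2 * erf(1).
reference = math.sqrt(math.pi) / 2.0 * math.erf(1.0)
```

The estimate gets closer to the reference as the number of samples grows, with an error that shrinks roughly like one over the square root of the sample count.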

  7. Some terminology

  8. Attribution: Quantopian blog

  9. How do you pick your prior?
    This is a bit of an art
    You generally base the prior on experience
    As you add more data this matters less and less
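
A hedged illustration of the prior mattering less as data accumulates, using the Beta-Binomial conjugate update, where the posterior mean is (a + successes) / (a + b + trials). The two priors, the 30% success rate and the helper names are assumptions picked for the example.

```python
def posterior_mean(a, b, successes, trials):
    """Posterior mean of a Beta(a, b) prior after binomial data."""
    return (a + successes) / (a + b + trials)

def prior_gap(trials, rate=0.3):
    """Difference between two analysts' posterior means on the same data."""
    successes = round(rate * trials)
    flat = posterior_mean(1, 1, successes, trials)          # uniform prior
    opinionated = posterior_mean(10, 2, successes, trials)  # prior mean near 0.83
    return abs(opinionated - flat)

# Disagreement between the two priors at increasing sample sizes.
gaps = {n: prior_gap(n) for n in (10, 100, 1000)}
```

With 10 observations the two posteriors disagree substantially; with 1000 the disagreement is below one percentage point.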

  10. Huh, but isn't Probabilistic Programming just Stan and BUGS?

  11. No, in Python you have PyMC3
    A complete rewrite of PyMC2 now in 'Beta' status
    Based upon Theano
    Computational techniques for handling gradients
    Automatic Differentiation and GPU speedup
    Theano is also used in deep learning!
    Currently there is a project to port 'BMH' from PyMC2 to PyMC3
    I gave a thorough tutorial on this - see my GitHub
    Key authors: John Salvatier, Thomas Wiecki, Chris Fonnesbeck

  12. Case study: Rugby Analytics
    I wanted to do a model of the Six Nations last year.
    I wanted to build an understandable model to predict the winner
    Key Info: Inferring the 'strength' of each team.
    We only have scoring data, which is noisy, hence Bayesian statistics

  13. What did I do?
    1. I picked Gamma as a prior for all teams
    2. I used a Hierarchical Model because I wanted home advantage to be
    stronger for stronger teams
    3. From this I was able to create a novel model based only on historical
    results and scoring intensity
    4. I sampled from the posterior using MCMC
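
To make the Gamma prior and MCMC steps concrete, here is a toy sketch and not the rugby model from the talk: a single team's scoring rate with a Gamma prior and Poisson-distributed scores, sampled with a hand-written random-walk Metropolis step instead of PyMC3. The scores, prior parameters and function names are all hypothetical.

```python
import math
import random

def log_posterior(rate, scores, shape=2.0, prior_rate=1.0):
    """Unnormalised log posterior: Gamma(shape, prior_rate) prior, Poisson likelihood."""
    if rate <= 0:
        return -math.inf  # the scoring rate must be positive
    log_prior = (shape - 1.0) * math.log(rate) - prior_rate * rate
    # Poisson log likelihood, dropping the log(y!) terms that do not depend on rate.
    log_lik = sum(y * math.log(rate) - rate for y in scores)
    return log_prior + log_lik

def metropolis(scores, n_steps=20000, step=0.5, seed=1):
    """Random-walk Metropolis sampler for the scoring rate."""
    rng = random.Random(seed)
    current = 1.0
    current_lp = log_posterior(current, scores)
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, scores)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples[n_steps // 10:]  # discard the first 10% as burn-in

scores = [3, 1, 4, 2, 5, 3, 2, 4]  # hypothetical per-match scores
samples = metropolis(scores)
mcmc_mean = sum(samples) / len(samples)

# Conjugacy gives the exact posterior Gamma(2 + 24, 1 + 8), so the mean is 26 / 9.
exact_mean = (2.0 + sum(scores)) / (1.0 + len(scores))
```

This toy case has a closed-form answer, which makes it a useful check on the sampler; a hierarchical model like the one in the talk has no such shortcut, which is where MCMC earns its keep.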

  14. Run the model

  15. What actually happened
    The model incorrectly predicted that England would come out on top.
    Ireland actually won on points difference, by 6 points.
    It really came down to the wire!
    "Prediction is difficult, especially about the future"
    One of the problems is what we call 'over-shrinkage'. You can delve
    into the results to see what the errors are; my model was within the
    errors.

  16. Lessons learned
    I can build an explainable model using PyMC2 and PyMC3
    Generative stories help you build up interest with your colleagues
    Communication is the 'last mile' problem of Data Science
    PyMC3 is cool: please use it and please contribute

  17. Wanna learn more?
    BMH
    Jake VanderPlas
    PyMC3
    [email protected]
