
EuroSciPy talk on Probabilistic programming

August 22, 2015


Slides from the talk I will give at EuroSciPy





  1. Probabilistic Programming

    A brief introduction to Probabilistic Programming and Python. Late August 2015. [email protected]. All opinions my own.
  2. Who am I?

    - I work as a Data Scientist for a large Telecommunications Company
    - Masters in Mathematics
    - Interned at Amazon
    - Was a consultant for a while
    - Occasional contributor to Pandas and other projects
    - Co-organizer of the Data Science Meetup in Luxembourg
    - Member of Royal Statistical Society and NumFOCUS
    - @springcoil
  3. What is Probabilistic Programming?

    - Basically, using random variables instead of variables
    - Allows you to create a generative story rather than a black box
    - A different tool to Machine Learning
    - A different paradigm to frequentist statistics
    - Forces you to be explicit about your 'subjective' assumptions
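As a rough illustration of "random variables instead of variables", here is a minimal sketch using only the Python standard library. The Beta(2, 8) prior, the trial count and the function name `generative_story` are assumptions made for this example; they are not taken from the talk.

```python
import random


def generative_story(rng, n_trials=100):
    """One run of the story: draw an unknown rate, then draw data given that rate."""
    # A plain variable would be `rate = 0.2`; here the rate is itself random.
    rate = rng.betavariate(2, 8)  # assumed prior, mean 2 / (2 + 8) = 0.2
    # Given the rate, each trial succeeds independently with that probability.
    successes = sum(rng.random() < rate for _ in range(n_trials))
    return rate, successes


rng = random.Random(0)
draws = [generative_story(rng) for _ in range(5000)]
mean_rate = sum(rate for rate, _ in draws) / len(draws)
mean_successes = sum(successes for _, successes in draws) / len(draws)
print(f"average rate: {mean_rate:.3f}, average successes: {mean_successes:.1f}")
```

Running the story forwards like this only simulates data from the assumptions. Inference goes the other way: given observed successes, work out which rates are plausible.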
  4. Source: Olivier Grisel

  5. Source: Olivier Grisel

  6. Bayesian Statistics

    - I studied Mathematics, and encountered Bayesians in textbooks
    - This is a hard area to do by pen and paper, and most integrals can't be solved in exact form
    - Thankfully, Monte Carlo simulations were invented
    - These simulations are used to approximate the posterior distribution
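To make the Monte Carlo idea concrete, here is a minimal sketch that estimates an integral with no elementary antiderivative by averaging random samples. The integrand and the helper name `mc_integral` are chosen for illustration only; the reference value uses the error function from the standard library.

```python
import math
import random


def mc_integral(f, n_samples, rng):
    """Estimate the integral of f over [0, 1] as the mean of f at uniform random points."""
    return sum(f(rng.random()) for _ in range(n_samples)) / n_samples


rng = random.Random(42)
# exp(-x^2) has no elementary antiderivative, so pen and paper gets stuck here.
estimate = mc_integral(lambda x: math.exp(-x * x), 100_000, rng)
# Reference value expressed through the error function: (sqrt(pi) / 2) * erf(1).
reference = math.sqrt(math.pi) / 2 * math.erf(1.0)
print(f"Monte Carlo: {estimate:.4f}, reference: {reference:.4f}")
```

The error of such an estimate shrinks roughly with the square root of the number of samples, which is why simulation is practical where exact integration is not.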
  7. None
  8. Some terminology

  9. Attribution: Quantopian blog

  10. How do you pick your prior?

    - This is a bit of an art
    - You generally base the prior on experience
    - As you add more data this matters less and less
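The claim that the prior matters less as data accumulates can be checked with a small conjugate example. This is a sketch under assumed numbers: two hypothetical Beta priors (one optimistic, one pessimistic) are updated with binomial data in which exactly half the trials succeed.

```python
def posterior_mean(prior_a, prior_b, successes, failures):
    """Posterior mean of a Beta(prior_a, prior_b) prior after binomial data (conjugate update)."""
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)


# Two hypothetical analysts with opposite prior beliefs about a success rate.
priors = {"optimistic": (8, 2), "pessimistic": (2, 8)}

gaps = []
for n in (10, 100, 10_000):
    successes = n // 2  # hypothetical data: exactly half the trials succeed
    means = {
        name: posterior_mean(a, b, successes, n - successes)
        for name, (a, b) in priors.items()
    }
    gaps.append(means["optimistic"] - means["pessimistic"])
    print(n, {name: round(m, 4) for name, m in means.items()})
```

With 10 observations the two posterior means are 0.65 and 0.35; with 10,000 they differ by less than 0.001.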
  11. None
  12. Huh, but isn't Probabilistic Programming just Stan and BUGS?
  13. No, in Python you have PyMC3

    - A complete rewrite of PyMC2, now in 'Beta' status
    - Based upon Theano
    - Computational techniques for handling gradients: Automatic Differentiation and GPU speedup
    - Theano is also used in deep learning!
    - Currently there is a project to port 'BMH' from PyMC2 to PyMC3
    - I gave a thorough tutorial on this - my github
    - Key authors: John Salvatier, Thomas Wiecki, Chris Fonnesbeck
  14. Case study: Rugby Analytics

    - I wanted to do a model of the Six Nations last year
    - I wanted to build an understandable model to predict the winner
    - Key Info: Inferring the 'strength' of each team
    - We only have scoring data, which is noisy, hence Bayesian Stats
  15. What did I do?

    1. I picked Gamma as a prior for all teams
    2. I used a Hierarchical Model because I wanted home advantage to be stronger for stronger teams
    3. From this I was able to create a novel model based only on historical results and scoring intensity
    4. I sampled from the posterior using MCMC
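The talk's model is hierarchical and was run with PyMC. As a much smaller stand-in, the sketch below shows the MCMC step for a single team only: a Gamma prior on a scoring rate, a Poisson likelihood for points per match, and a hand-written random-walk Metropolis sampler. The scores, the prior parameters and the function names are all hypothetical, and this is not the sampler or the model used in the talk.

```python
import math
import random


def log_posterior(rate, scores, shape=2.0, prior_rate=0.1):
    """Unnormalised log posterior: Gamma(shape, prior_rate) prior times Poisson likelihood."""
    if rate <= 0:
        return -math.inf  # the scoring rate must be positive
    log_prior = (shape - 1) * math.log(rate) - prior_rate * rate
    # Poisson log likelihood, dropping the log(s!) term that does not depend on rate.
    log_lik = sum(s * math.log(rate) - rate for s in scores)
    return log_prior + log_lik


def metropolis(scores, n_steps=20_000, step=2.0, seed=1):
    """Random-walk Metropolis sampler for the scoring rate of one team."""
    rng = random.Random(seed)
    current = sum(scores) / len(scores)  # start at the sample mean
    current_lp = log_posterior(current, scores)
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, scores)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples[n_steps // 10:]  # discard the first 10% as burn-in


scores = [23, 16, 30, 19, 26]  # hypothetical points scored in five matches
samples = metropolis(scores)
mcmc_mean = sum(samples) / len(samples)
# Gamma-Poisson is conjugate, so the exact posterior mean is available as a check.
exact_mean = (2.0 + sum(scores)) / (0.1 + len(scores))
print(f"MCMC mean: {mcmc_mean:.2f}, exact mean: {exact_mean:.2f}")
```

The conjugate check is only possible because this toy model is so simple. Once team strengths share a common prior and home advantage enters the model, no closed form is available and sampling becomes the practical route.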
  16. None
  17. None
  18. None
  19. Run the model

  20. None
  21. None
  22. What actually happened

    - The model incorrectly predicted that England would come out on top
    - Ireland actually won on points difference, by 6 points
    - It really came down to the wire!
    - "Prediction is difficult, especially about the future"
    - One of the problems is what we call 'over-shrinkage'. You can delve into the results to see what the errors are; my model was within the errors.
  23. Lessons learned

    - I can build an explainable model using PyMC2 and PyMC3
    - Generative stories help you build up interest with your colleagues
    - Communication is the 'last mile' problem of Data Science
    - PyMC3 is cool, please use it and please contribute
  24. Wanna learn more?

    - BMH
    - Jake VanDerPlas
    - PyMC3
    - [email protected]
  25. None