EuroSciPy talk on Probabilistic programming

springcoil
August 22, 2015

Slides from the talk I will give at EuroSciPy


Transcript

  1. Who am I?

    I work as a Data Scientist for a large telecommunications company. Masters in Mathematics. Interned at Amazon. Was a consultant for a while. Occasional contributor to Pandas and other projects. Co-organizer of the Data Science Meetup in Luxembourg. Member of the Royal Statistical Society and NumFOCUS. @springcoil
  2. What is Probabilistic Programming?

    Basically using random variables instead of variables. Allows you to create a generative story rather than a black box. A different tool to Machine Learning. A different paradigm to frequentist statistics. Forces you to be explicit about your 'subjective' assumptions.
  3. Bayesian Statistics

    I studied Mathematics, and encountered Bayesians in textbooks. This is a hard area to do by pen and paper, and most integrals can't be solved in exact form. Thankfully, Monte Carlo simulations were invented. These simulations are used to approximate your posterior distribution.
  4. How do you pick your prior?

    This is a bit of an art. You generally base the prior on experience. As you add more data, the prior matters less and less.
  5. Huh, but isn't Probabilistic Programming just Stan and BUGS?
  6. No, in Python you have PyMC3

    A complete rewrite of PyMC2, now in 'Beta' status. Based upon Theano: computational techniques for handling gradients, automatic differentiation and GPU speedup. Theano is also used in deep learning! Currently there is a project to port 'BMH' from PyMC2 to PyMC3. I gave a thorough tutorial on this (see my github). Key authors: John Salvatier, Thomas Wiecki, Chris Fonnesbeck.
  7. Case study: Rugby Analytics

    I wanted to do a model of the Six Nations last year. I wanted to build an understandable model to predict the winner. Key info: inferring the 'strength' of each team. We only have scoring data, which is noisy, hence Bayesian stats.
  8. What did I do?

    1. I picked Gamma as a prior for all teams.
    2. I used a Hierarchical Model because I wanted home advantage to be stronger for stronger teams.
    3. From this I was able to create a novel model based only on historical results and scoring intensity.
    4. I simulated the posterior distribution using MCMC.
  9. What actually happened

    The model incorrectly predicted that England would come out on top. Ireland actually won on points difference, by 6 points. It really came down to the wire! "Prediction is difficult, especially about the future." One of the problems is what we call 'over-shrinkage'. You can delve into the results to see what the errors are; my model was within the errors.
  10. Lessons learned

    I can build an explainable model using PyMC2 and PyMC3. Generative stories help you build up interest with your colleagues. Communication is the 'last mile' problem of Data Science. PyMC3 is cool: please use it and please contribute.
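
Slide 2 describes probabilistic programming as using random variables instead of variables in order to tell a generative story. A minimal sketch of that idea, using only the Python standard library rather than PyMC3, might look like the following. The Gamma shape and scale, the Poisson scoring assumption, and the names `sample_poisson` and `simulate_match` are illustrative assumptions and are not taken from the talk.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) sample with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_match(rng):
    """Tell the generative story once: strengths first, then the score."""
    # Unknown quantities are random variables drawn from explicit priors.
    # Shape 2 and scale 10 are assumed values chosen for illustration.
    home_strength = rng.gammavariate(2.0, 10.0)
    away_strength = rng.gammavariate(2.0, 10.0)
    # Observed scores are generated from the unknown strengths.
    home_points = sample_poisson(home_strength, rng)
    away_points = sample_poisson(away_strength, rng)
    return home_points, away_points


rng = random.Random(2015)
matches = [simulate_match(rng) for _ in range(5000)]
mean_home = sum(home for home, _ in matches) / len(matches)
print(f"average simulated home score: {mean_home:.1f}")
```

Running the story forward like this only generates fake data; inference is the reverse step, where observed scores are used to learn about the strengths.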
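
Slide 4 says that the choice of prior matters less and less as data accumulates. One way to see this, assuming a simple Beta prior on a win probability with a binomial likelihood (a stand-in chosen for its closed-form posterior, not the model from the talk), is to compare two analysts who start from opposite beliefs. The priors and the 60% win rate below are hypothetical.

```python
def posterior_mean(prior_a, prior_b, wins, games):
    """Posterior mean of a win probability under a Beta(prior_a, prior_b) prior."""
    return (prior_a + wins) / (prior_a + prior_b + games)


# Two analysts with different (hypothetical) prior beliefs about one team.
optimist = (8.0, 2.0)  # prior mean 0.8
sceptic = (2.0, 8.0)   # prior mean 0.2

gaps = []
for games in (0, 10, 100, 1000):
    wins = round(0.6 * games)  # assume the team wins 60% of its games
    gap = posterior_mean(*optimist, wins, games) - posterior_mean(*sceptic, wins, games)
    gaps.append(gap)
    print(f"{games:5d} games: posterior means differ by {gap:.3f}")
```

In this setup the gap between the two posterior means is 6 / (10 + games), so it shrinks towards zero as more games are observed.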
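
Slides 3 and 8 rely on Monte Carlo simulation (MCMC) because the integrals involved usually cannot be solved exactly. The sketch below is a hand-written random-walk Metropolis sampler for a single team's scoring rate with a Gamma prior and a Poisson likelihood. It is a deliberately simplified, non-hierarchical stand-in for the model in the talk, it does not use PyMC3, and the scores and prior parameters are made up for illustration.

```python
import math
import random

# Hypothetical points scored by one team in five matches (not real data).
scores = [16, 26, 19, 25, 40]
PRIOR_SHAPE, PRIOR_RATE = 2.0, 0.1  # assumed Gamma prior on the scoring rate


def log_posterior(rate):
    """Unnormalised log posterior: Gamma prior plus Poisson likelihood."""
    if rate <= 0:
        return -math.inf
    log_prior = (PRIOR_SHAPE - 1) * math.log(rate) - PRIOR_RATE * rate
    log_likelihood = sum(s * math.log(rate) - rate for s in scores)
    return log_prior + log_likelihood


def metropolis(n_samples, step=3.0, seed=0):
    """Random-walk Metropolis sampler over the scoring rate."""
    rng = random.Random(seed)
    current = 20.0
    current_lp = log_posterior(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


draws = metropolis(20000)[2000:]  # drop the first 2000 draws as burn-in
mcmc_mean = sum(draws) / len(draws)
# Conjugacy gives the exact posterior mean here, which lets us check the sampler.
exact_mean = (PRIOR_SHAPE + sum(scores)) / (PRIOR_RATE + len(scores))
print(f"MCMC mean {mcmc_mean:.2f} vs exact {exact_mean:.2f}")
```

The Gamma-Poisson pair was chosen because its posterior is known in closed form, so the sampler can be checked against the exact answer. In a hierarchical model like the one on slide 8 no such closed form is generally available, which is where a library such as PyMC3 earns its keep.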