Slide 1

Slide 1 text

Probabilistic Programming
A brief introduction to Probabilistic Programming and Python
Late August 2015
[email protected]
All opinions my own

Slide 2

Slide 2 text

Who am I?
I work as a Data Scientist for a large telecommunications company
Masters in Mathematics
Interned at Amazon
Was a consultant for a while
Occasional contributor to Pandas and other projects
Co-organizer of the Data Science Meetup in Luxembourg
Member of the Royal Statistical Society and NumFOCUS
@springcoil

Slide 3

Slide 3 text

What is Probabilistic Programming?
Basically, using random variables instead of ordinary variables
Allows you to create a generative story rather than a black box
A different tool from Machine Learning
A different paradigm from frequentist statistics
Forces you to be explicit about your 'subjective' assumptions
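To make "random variables instead of variables" and "generative story" a little more concrete, here is a minimal sketch using only the Python standard library. The coin-flip story, the Beta(2, 2) prior and the function name `generative_story` are assumptions chosen purely for illustration; they are not taken from the talk.

```python
import random


def generative_story(n_flips, rng):
    """Forward-simulate a tiny generative story for a biased coin."""
    # Unknown quantity: the bias is a random variable, not a fixed number.
    # Beta(2, 2) is an assumed prior, used here only as an example.
    bias = rng.betavariate(2, 2)
    # Observed quantity: every flip is drawn given that bias.
    flips = [1 if rng.random() < bias else 0 for _ in range(n_flips)]
    return bias, flips


rng = random.Random(42)
bias, flips = generative_story(10, rng)
print(f"sampled bias = {bias:.2f}, heads = {sum(flips)} of {len(flips)}")
```

Running the story forwards produces fake data; inference is the same story run backwards, from observed flips to a distribution over the bias.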

Slide 4

Slide 4 text

Source: Olivier Grisel

Slide 5

Slide 5 text

Source: Olivier Grisel

Slide 6

Slide 6 text

Bayesian Statistics
I studied Mathematics and encountered Bayesian methods in textbooks
This is a hard area to do by pen and paper, and most integrals can't be solved in closed form
Thankfully, Monte Carlo simulation was invented
These simulations are used to approximate your posterior distribution
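As a rough illustration of how simulation can stand in for an integral that is awkward by hand, here is a minimal random-walk Metropolis sampler for the bias of a coin. The data (7 heads, 3 tails), the flat prior, the step size and the burn-in length are all assumptions made for this sketch. The example is deliberately one where the exact answer is known, so the estimate can be checked against it.

```python
import math
import random


def log_posterior(p, heads, tails):
    """Unnormalised log posterior of a coin's bias p under a flat prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero probability outside (0, 1)
    return heads * math.log(p) + tails * math.log(1.0 - p)


def metropolis(heads, tails, n_samples, step=0.1, seed=0):
    """Random-walk Metropolis: returns correlated draws from the posterior."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, heads, tails)
    samples = []
    for _ in range(n_samples):
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        samples.append(p)
    return samples


heads, tails = 7, 3  # made-up data
draws = metropolis(heads, tails, n_samples=20000)[2000:]  # drop burn-in
estimate = sum(draws) / len(draws)
exact = (heads + 1) / (heads + tails + 2)  # mean of Beta(heads + 1, tails + 1)
print(f"MCMC estimate: {estimate:.3f}, exact posterior mean: {exact:.3f}")
```

Libraries such as PyMC3 use far more efficient samplers than this, but the accept/reject idea is the same.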

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Some terminology

Slide 9

Slide 9 text

Attribution: Quantopian blog

Slide 10

Slide 10 text

How do you pick your prior?
This is a bit of an art
You generally base the prior on experience
As you add more data, this matters less and less
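The claim that the prior matters less as data accumulates can be checked with a conjugate Beta-Binomial update, where the posterior mean has a closed form. The two priors and the 30% success rate below are assumptions picked only to make the effect visible.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a Beta(prior_a, prior_b) prior after binomial data."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


# Two analysts with very different (assumed) prior beliefs.
priors = {"flat": (1, 1), "optimistic": (20, 5)}

gaps = []
for trials in (10, 100, 1000, 10000):
    successes = 3 * trials // 10  # data with a 30% success rate
    means = {
        name: posterior_mean(a, b, successes, trials)
        for name, (a, b) in priors.items()
    }
    gap = abs(means["flat"] - means["optimistic"])
    gaps.append(gap)
    print(f"n={trials:>5}: flat={means['flat']:.3f}, "
          f"optimistic={means['optimistic']:.3f}, gap={gap:.4f}")
```

The gap between the two posterior means shrinks with every tenfold increase in data: both analysts end up close to 0.3 regardless of where they started.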

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

Huh, but isn't Probabilistic Programming just Stan and BUGS?

Slide 13

Slide 13 text

No, in Python you have PyMC3
A complete rewrite of PyMC2, now in 'Beta' status
Based upon Theano
Computational techniques for handling gradients: automatic differentiation and GPU speedup
Theano is also used in deep learning!
Currently there is a project to port 'BMH' from PyMC2 to PyMC3
I gave a thorough tutorial on this - my github
Key authors: John Salvatier, Thomas Wiecki, Chris Fonnesbeck

Slide 14

Slide 14 text

Case study: Rugby Analytics
I wanted to do a model of the Six Nations last year
I wanted to build an understandable model to predict the winner
Key info: inferring the 'strength' of each team
We only have scoring data, which is noisy, hence Bayesian stats

Slide 15

Slide 15 text

What did I do?
1. I picked Gamma as a prior for all teams
2. I used a Hierarchical Model because I wanted home advantage to be stronger for stronger teams
3. From this I was able to create a novel model based only on historical results and scoring intensity
4. I approximated the posterior distribution using MCMC
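The steps above can be sketched as a forward simulation. This is not the model from the talk: the Poisson scoring, the hyperprior, the shape parameter and the multiplicative `home_boost` are all assumptions chosen to show how Gamma-distributed team strengths and a shared league-level parameter might fit together. It only generates fake matches; it does no inference.

```python
import math
import random

TEAMS = ["England", "France", "Ireland", "Italy", "Scotland", "Wales"]


def draw_poisson(rate, rng):
    """Knuth's algorithm; adequate for the moderate rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_match(home, away, strength, home_boost, rng):
    """Draw one (home_points, away_points) pair from the assumed story."""
    # Multiplicative boost: a stronger home team gains more points in
    # absolute terms than a weaker one.
    home_rate = strength[home] * home_boost
    away_rate = strength[away]
    return draw_poisson(home_rate, rng), draw_poisson(away_rate, rng)


rng = random.Random(2015)

# League level (assumed hyperprior): typical points per team per match.
league_mean = rng.gammavariate(20.0, 1.0)  # shape 20, scale 1

# Team level: each strength is Gamma-distributed around the league mean.
shape = 10.0  # assumed; larger values pull teams closer together
strength = {
    team: rng.gammavariate(shape, league_mean / shape) for team in TEAMS
}

home_boost = 1.2  # assumed fixed here; a real model would infer it
home_points, away_points = simulate_match(
    "Ireland", "England", strength, home_boost, rng
)
print(f"Ireland {home_points} - {away_points} England (simulated)")
```

In a probabilistic programming library the same story is written once, and MCMC is used to go from observed scores back to the team strengths.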

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

Run the model

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

What actually happened
The model incorrectly predicted that England would come out on top
Ireland actually won on points difference, by 6 points. It really came down to the wire!
"Prediction is difficult, especially about the future"
One of the problems is what we call 'over-shrinkage'
You can delve into the results to see what the errors are; the actual result was within my model's error bounds

Slide 23

Slide 23 text

Lessons learned
I can build an explainable model using PyMC2 and PyMC3
Generative stories help you build up interest with your colleagues
Communication is the 'last mile' problem of Data Science
PyMC3 is cool, please use it and please contribute

Slide 24

Slide 24 text

Wanna learn more?
BMH
Jake VanderPlas
PyMC3
[email protected]

Slide 25

Slide 25 text

No content