springcoil
August 22, 2015

# EuroSciPy talk on Probabilistic programming

Slides from the talk I will give at EuroSciPy.

## Transcript

1. Probabilistic Programming
A brief introduction to Probabilistic Programming and Python
Late August 2015
All opinions my own

2. Who am I?
I work as a Data Scientist for a large telecommunications company
Master's degree in Mathematics
Interned at Amazon
Was a consultant for a while
Occasional contributor to Pandas and other projects
Co-organizer of the Data Science Meetup in Luxembourg
Member of the Royal Statistical Society and NumFOCUS
@springcoil

3. What is Probabilistic Programming?
Basically, using random variables instead of ordinary variables
Allows you to create a generative story rather than a black box
A different tool from machine learning
A different paradigm from frequentist statistics
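To make the "generative story" idea concrete, here is a minimal sketch in plain Python (standard library only, no PyMC3). It assumes a made-up coin-flipping story: the coin's probability of heads is itself a random variable drawn from a Beta(2, 2) prior, and the flips are then drawn given that probability. The function name `simulate_coin_story` and the Beta(2, 2) choice are illustrative assumptions, not taken from the talk.

```python
import random


def simulate_coin_story(n_flips, seed=None):
    """Hypothetical generative story: draw an unknown bias, then flips given that bias."""
    rng = random.Random(seed)
    # The bias is a random variable, not a fixed number: draw it from a Beta(2, 2) prior.
    bias = rng.betavariate(2, 2)
    # Each flip is a Bernoulli(bias) random variable: 1 for heads, 0 for tails.
    flips = [1 if rng.random() < bias else 0 for _ in range(n_flips)]
    return bias, flips


bias, flips = simulate_coin_story(20, seed=42)
print(f"drawn bias = {bias:.3f}, heads = {sum(flips)} of {len(flips)}")
```

Probabilistic programming runs this story in reverse: given the observed flips, it infers which values of the bias are plausible.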

4. Source: Olivier Grisel

5. Source: Olivier Grisel

6. Bayesian Statistics
I studied Mathematics and encountered Bayesian methods in textbooks
Bayesian inference is hard to do with pen and paper: most of the integrals involved can't be solved in closed form
Thankfully, Monte Carlo simulation methods were invented
These simulations are used to approximate the posterior distribution
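As a rough illustration of how Monte Carlo simulation sidesteps the hard integrals, here is a minimal random-walk Metropolis sampler (one kind of MCMC) in plain Python. It assumes a made-up coin example: a Beta(2, 2) prior on the probability of heads and 7 heads in 10 flips. Because this prior is conjugate, the exact posterior is Beta(9, 5) with mean 9/14 ≈ 0.643, so the sampler can be checked against it. The function names and tuning values (step size, number of samples, burn-in) are illustrative assumptions; PyMC3 ships more sophisticated samplers.

```python
import math
import random


def log_posterior(p, heads, tails, a=2.0, b=2.0):
    """Unnormalised log posterior: Beta(a, b) prior times binomial likelihood."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero density outside the support
    return (a - 1 + heads) * math.log(p) + (b - 1 + tails) * math.log(1 - p)


def metropolis(heads, tails, n_samples=20000, step=0.1, burn_in=2000, seed=0):
    """Random-walk Metropolis: propose a nearby p, accept with probability min(1, ratio)."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, heads, tails)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tails)
        # Compare in log space; the normalising integral cancels in the ratio.
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            p, current = proposal, candidate
        if i >= burn_in:
            samples.append(p)
    return samples


samples = metropolis(heads=7, tails=3)
print(f"MCMC posterior mean = {sum(samples) / len(samples):.3f} (exact 9/14 = {9 / 14:.3f})")
```

The key design point is that only the ratio of posterior densities is ever needed, so the intractable normalising integral never has to be computed.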

7. Some terminology
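For reference, the standard terminology comes from Bayes' theorem, writing $\theta$ for the unknown parameters (for example, a team's strength) and $y$ for the observed data:

$$
\underbrace{p(\theta \mid y)}_{\text{posterior}}
= \frac{\overbrace{p(y \mid \theta)}^{\text{likelihood}}\;\overbrace{p(\theta)}^{\text{prior}}}
       {\underbrace{\int p(y \mid \theta')\,p(\theta')\,d\theta'}_{\text{evidence}}}
$$

The evidence integral in the denominator is the part that rarely has a closed form. MCMC methods draw samples from the posterior using only the numerator, because the denominator is a constant that does not depend on $\theta$.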

9. How do you pick your prior?
This is a bit of an art
You generally base the prior on experience
As you add more data, the choice of prior matters less and less
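A quick way to see this is the conjugate Beta-binomial case, where the posterior mean has a closed form: with a Beta(a, b) prior and k heads in n flips, the posterior mean is (a + k) / (a + b + n). The sketch below compares a flat Beta(1, 1) prior with a strongly opinionated Beta(20, 2) prior on made-up data in which 30% of flips are heads; both priors and the data are illustrative assumptions.

```python
def posterior_mean(a, b, heads, n):
    """Posterior mean of a Beta(a, b) prior updated with `heads` successes in `n` trials."""
    return (a + heads) / (a + b + n)


for n in (10, 100, 1000, 10000):
    heads = 3 * n // 10  # made-up data: 30% heads
    flat = posterior_mean(1, 1, heads, n)
    opinionated = posterior_mean(20, 2, heads, n)
    print(f"n={n:>5}: flat={flat:.3f}, opinionated={opinionated:.3f}, gap={opinionated - flat:.3f}")
```

The gap between the two posterior means shrinks steadily as n grows, because the prior's pseudo-counts (a and b) are swamped by the observed counts.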

10. Huh, but isn't Probabilistic Programming just Stan and BUGS?

11. No, in Python you have PyMC3
A complete rewrite of PyMC2, now in beta
Built on Theano
Automatic differentiation and GPU speedup
Theano is also used in deep learning!
Currently there is a project to port 'BMH' from PyMC2 to PyMC3
I gave a thorough tutorial on this: see my GitHub
Key authors: John Salvatier, Thomas Wiecki, Chris Fonnesbeck

12. Case study: Rugby Analytics
I wanted to do a model of the Six Nations last year.
I wanted to build an understandable model to predict the winner
Key idea: infer the 'strength' of each team.
We only have scoring data, which is noisy; hence Bayesian statistics

13. What did I do?
1. I picked a Gamma prior for all teams
2. I used a hierarchical model because I wanted home advantage to be stronger for stronger teams
3. From this, I was able to create a novel model based only on historical results and scoring intensity
4. I sampled from the posterior distribution using MCMC
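The generative story behind a model like this can be simulated without any MCMC. The sketch below assumes a common, simplified setup for match scores, not necessarily the exact model from the talk (it omits the hierarchical home-advantage effect): each side's score is Poisson-distributed, with a log-intensity built from a shared intercept, a home-advantage term, the team's attack strength and the opponent's defence strength. The team names come from the Six Nations, but every numeric value, constant and function name here is a hypothetical placeholder; fitting the real model means inferring these strengths from historical results.

```python
import math
import random

# Hypothetical parameters for illustration only; the real model infers these from data.
HOME_ADVANTAGE = 0.2
INTERCEPT = 3.0  # log of a typical score, roughly e**3, i.e. about 20 points
ATTACK = {"England": 0.15, "Ireland": 0.10, "Wales": 0.05, "Italy": -0.25}
DEFENCE = {"England": 0.10, "Ireland": 0.15, "Wales": 0.05, "Italy": -0.20}


def poisson(lam, rng):
    """Knuth's algorithm: count uniform draws until their product drops below exp(-lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_match(home, away, rng):
    """Draw one match score from the Poisson scoring-intensity story."""
    home_rate = math.exp(INTERCEPT + HOME_ADVANTAGE + ATTACK[home] - DEFENCE[away])
    away_rate = math.exp(INTERCEPT + ATTACK[away] - DEFENCE[home])
    return poisson(home_rate, rng), poisson(away_rate, rng)


rng = random.Random(2015)
results = [simulate_match("Ireland", "England", rng) for _ in range(10000)]
home_wins = sum(h > a for h, a in results)
print(f"Ireland (home) beat England in {home_wins / len(results):.1%} of simulated matches")
```

Real rugby scores are built from 3-, 5- and 7-point events, so a Poisson count is only a rough approximation of scoring intensity.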

14. Run the model

15. What actually happened
The model incorrectly predicted that England would come out on top.
Ireland actually won on points difference, by 6 points.
It really came down to the wire!
"Prediction is difficult, especially about the future"
One of the problems is what we call 'over-shrinkage'. You can delve into the results to see what the errors are; the actual outcome was within my model's margin of error.
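Shrinkage is what a hierarchical model does on purpose: each team's estimate is pulled towards the overall average, and more strongly when the spread of true team strengths is assumed to be small. Over-shrinkage is when that pull is too strong and genuinely different teams end up looking alike. Here is a minimal sketch, assuming the textbook normal-normal case with known variances (a simplification, not the rugby model itself); the function name and all numbers are hypothetical.

```python
def partial_pooling(team_mean, n_games, sigma, grand_mean, tau):
    """Precision-weighted average of a team's own mean and the grand mean.

    sigma: noise standard deviation of a single game.
    tau: assumed spread (standard deviation) of true team strengths.
    """
    data_precision = n_games / sigma**2
    prior_precision = 1 / tau**2
    return (data_precision * team_mean + prior_precision * grand_mean) / (data_precision + prior_precision)


# Hypothetical numbers: a team averaging +12 points per game over 5 games, overall average 0.
for tau in (10.0, 3.0, 1.0):
    estimate = partial_pooling(team_mean=12.0, n_games=5, sigma=15.0, grand_mean=0.0, tau=tau)
    print(f"tau={tau:>4}: shrunk estimate = {estimate:.2f}")
```

As tau gets smaller, the team's estimate collapses towards the overall average, which is how a strong team can be underrated.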

16. Lessons learned
I can build an explainable model using PyMC2 and PyMC3