- Data Scientist for a large telecommunications company
- Master's in Mathematics
- Interned at Amazon
- Was a consultant for a while
- Occasional contributor to Pandas and other projects
- Co-organizer of the Data Science Meetup in Luxembourg
- Member of the Royal Statistical Society and NumFOCUS
- @springcoil
- Random variables instead of variables
- Allows you to create a generative story rather than a black box
- A different tool from machine learning
- A different paradigm from frequentist statistics
- Forces you to be explicit about your 'subjective' assumptions
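To make "random variables instead of variables" concrete, here is a minimal sketch that is not taken from the talk: a hypothetical 7 successes in 10 trials, with the unknown success probability given an explicit Beta(1, 1) prior. The prior is the 'subjective' assumption stated out loud, and the result is a whole distribution of plausible values rather than a single number. It uses only the standard library (`random.betavariate`) and the conjugate Beta-Binomial update.

```python
import random
import statistics

def beta_binomial_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Return (a, b) of the Beta posterior for a Binomial success probability.

    Beta(prior_a, prior_b) is the explicit prior assumption;
    Beta(1, 1) is uniform over [0, 1].
    """
    return prior_a + successes, prior_b + (trials - successes)

def sample_posterior(a, b, n_draws=10_000, seed=0):
    """Draw samples of the unknown probability: it is treated as a random variable."""
    rng = random.Random(seed)
    return [rng.betavariate(a, b) for _ in range(n_draws)]

# Hypothetical data: 7 successes in 10 trials.
a, b = beta_binomial_posterior(7, 10)
draws = sample_posterior(a, b)
point_estimate = 7 / 10          # frequentist maximum-likelihood estimate
posterior_mean = a / (a + b)     # exact posterior mean, 8 / 12
print(point_estimate, round(posterior_mean, 3), round(statistics.mean(draws), 2))
```

The draws can be summarised however you like (mean, intervals, probability that the rate exceeds 0.5), which is what the generative, non-black-box view buys you.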
textbooks Bayesians
This is a hard area to do with pen and paper: most of the integrals involved can't be solved in closed form.
Thankfully, Monte Carlo simulation methods were invented.
These simulations draw samples that approximate the posterior distribution (the likelihood combined with the prior).
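A minimal sketch of the Monte Carlo idea, assuming a single Poisson scoring rate with a Gamma prior and hypothetical per-match scores (not the talk's data). It uses random-walk Metropolis, the simplest MCMC algorithm, rather than the NUTS sampler PyMC3 uses by default; the step size, burn-in, and prior parameters are arbitrary choices. Because the Gamma prior is conjugate to the Poisson likelihood, the exact posterior mean is known, so the sampled estimate can be checked against it.

```python
import math
import random

def log_posterior(lam, counts, shape=2.0, rate=0.5):
    """Unnormalised log posterior for a Poisson rate `lam` with a Gamma(shape, rate) prior."""
    if lam <= 0:
        return -math.inf
    log_prior = (shape - 1) * math.log(lam) - rate * lam
    log_lik = sum(y * math.log(lam) - lam for y in counts)  # drops the constant -log(y!)
    return log_prior + log_lik

def metropolis(counts, n_samples=20_000, step=0.5, start=1.0, seed=42):
    """Random-walk Metropolis: returns (correlated) samples from the posterior."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, counts)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, counts)
        # Accept with probability min(1, p(proposal) / p(current)).
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

counts = [3, 5, 4, 6, 2]  # hypothetical scores per match
samples = metropolis(counts)[2_000:]  # discard burn-in
mc_mean = sum(samples) / len(samples)
exact_mean = (2.0 + sum(counts)) / (0.5 + len(counts))  # conjugate Gamma posterior mean
print(round(mc_mean, 2), round(exact_mean, 2))
```

Only the unnormalised posterior is ever evaluated: the hard normalising integral cancels in the acceptance ratio, which is exactly why these methods sidestep the pen-and-paper problem.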
have PyMC3
- A complete rewrite of PyMC2, now in 'Beta' status
- Based upon Theano
  - Computational techniques for handling gradients
  - Automatic differentiation and GPU speedup
  - Theano is also used in deep learning!
- Currently there is a project to port 'BMH' from PyMC2 to PyMC3
- I gave a thorough tutorial on this (see my GitHub)
- Key authors: John Salvatier, Thomas Wiecki, Chris Fonnesbeck
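Gradients matter because PyMC3's default sampler for continuous models (NUTS) follows the gradient of the log-posterior. The sketch below illustrates automatic differentiation with forward-mode dual numbers. This is a swapped-in technique for illustration only: Theano itself builds a symbolic graph and differentiates it in reverse mode. The `Dual` class, `dlog`, and `grad` are hypothetical helpers, and the log-likelihood reuses a hypothetical total of 20 points over 5 matches.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """Dual number value + deriv*eps with eps**2 == 0 (forward-mode AD)."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return Dual(other) - self

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule carried along with the value.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def dlog(x):
    """Natural log on dual numbers: d/dx log(x) = 1/x."""
    return Dual(math.log(x.value), x.deriv / x.value)

def grad(f, x):
    """Exact derivative of scalar f at x, obtained by seeding deriv=1."""
    return f(Dual(x, 1.0)).deriv

def log_lik(lam):
    """Unnormalised Poisson log-likelihood: total score 20 over 5 matches."""
    return 20 * dlog(lam) - 5 * lam

print(grad(log_lik, 4.0))  # analytic: 20/4 - 5 = 0.0
```

No derivative was written by hand and no finite differences were used: the chain and product rules are applied mechanically as the function runs.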
to do a model of the Six Nations last year. I wanted to build an understandable model to predict the winner.
Key info: inferring the 'strength' of each team. We only have scoring data, which is noisy; hence Bayesian statistics.
picked Gamma as a prior for all teams.
2. I used a hierarchical model because I wanted home advantage to be stronger for stronger teams.
3. From this I was able to create a novel model based only on historical results and scoring intensity.
4. I sampled from the posterior distribution using MCMC.
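The steps above can be read as a generative story and sketched as a forward simulation. This is an illustration under stated assumptions, not the author's model: scores are drawn as Poisson counts, and `base_rate`, `home_advantage`, and the Gamma shape/scale values are hypothetical. Making home advantage multiplicative is one simple way to give stronger teams a larger absolute home boost. Fitting the real model reverses this process, with MCMC inferring the strengths from observed scores.

```python
import numpy as np

TEAMS = ["England", "France", "Ireland", "Italy", "Scotland", "Wales"]

def simulate_match(rng, strength, home, away, base_rate=20.0, home_advantage=1.1):
    """Draw one (home_score, away_score) pair from the generative story.

    Hypothetical choices: scores are Poisson, and home advantage is a
    multiplicative factor, so it adds more expected points to stronger teams.
    """
    i, j = TEAMS.index(home), TEAMS.index(away)
    home_rate = base_rate * home_advantage * strength[i] / strength[j]
    away_rate = base_rate * strength[j] / strength[i]
    return int(rng.poisson(home_rate)), int(rng.poisson(away_rate))

rng = np.random.default_rng(2015)
# Gamma prior on every team's strength (illustrative shape and scale).
strength = rng.gamma(shape=10.0, scale=0.1, size=len(TEAMS))
print(simulate_match(rng, strength, "Ireland", "England"))
```

Running the simulation many times from prior draws is also a cheap sanity check that the prior produces plausible rugby scores before any data is used.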
that England would come out on top. Ireland actually won, on points difference, by a margin of 6 points. It really came down to the wire!
"Prediction is difficult, especially about the future."
One of the problems is what we call 'over-shrinkage'. You can delve into the results to see what the errors are; the actual outcome was within my model's errors.
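Shrinkage is easiest to see in the simplest hierarchical case. This sketch assumes a normal-normal model with known noise `sigma` and known between-team spread `tau`, which is not the Six Nations model; the team and league averages are hypothetical. Each team's estimate is a precision-weighted average of its own mean and the overall mean, so if `tau` is underestimated, standout teams get pulled too far toward the middle, which is one way over-shrinkage arises.

```python
def partial_pooling_mean(group_mean, n_obs, sigma, grand_mean, tau):
    """Posterior mean of one group's parameter in a normal-normal hierarchical model.

    sigma: observation noise s.d. (assumed known).
    tau: between-group s.d. (assumed known).
    """
    data_precision = n_obs / sigma**2
    prior_precision = 1 / tau**2
    weight = data_precision / (data_precision + prior_precision)
    # Precision-weighted average of the group's own mean and the grand mean.
    return weight * group_mean + (1 - weight) * grand_mean

# Hypothetical: a team averaging 30 points over 5 matches, league average 20.
for tau in (10.0, 3.0, 1.0):
    print(tau, round(partial_pooling_mean(30.0, 5, 10.0, 20.0, tau), 2))
```

As `tau` falls, the estimate slides from close to 30 toward 20: the model increasingly believes all teams are alike.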
using PyMC2 and PyMC3
- Generative stories help you build up interest with your colleagues
- Communication is the 'last mile' problem of data science
- PyMC3 is cool: please use it, and please contribute