∼ 5 years old
Calculates probabilities for football games
www.footballradar.com
Lasse Bøhling (Football Radar) A review of modelling football PyData meetup May 5, 2015 2 / 19
all started)
A review of a simple model
Example of Python solving the simple model
Is Juventus going to beat Real Madrid tonight?

About me:
2004 - 2010: MSc in physics
2010 - 2013: PhD in physics
2013 - now: Working at Football Radar
information (Goals, Temperature, Diffusion Coefficient, ...) from microscopic information (interaction potential, players, pitch quality, ...)
to show that goals could be fitted to both a Poisson distribution and a negative binomial
Reep and Pollard 1968: Showed that consecutive passes fit a negative binomial quite well
Maher 1982: Finds the Poisson distribution to be superior to the negative binomial. First one to suggest a model
Dixon and Coles 1996: Introduces time dependence of team strengths, building on Maher's work
∼ 1990 - present: Lots of papers
See e.g. The Numbers Game by Chris Anderson and David Sally for a historical review
theory
”Chance does dominate the game” [1]
[1] Reep and Pollard 1968, Skill and Chance in Association Football, Journal of the Royal Statistical Society. Series A (General) 1968, pages 581 - 585
the game”
Maher's paper is to football modelling what Rahman’s paper [2] is to Molecular Dynamics
[2] A. Rahman (1964). “Correlations in the Motion of Atoms in Liquid Argon”. Physical Review 136: A405-A411.
football game consists of two independent games. One for the home goals and one for the away goals:
$$\Pr(X_{i,j} = x,\ Y_{i,j} = y) = \frac{\lambda^{x} e^{-\lambda}}{x!} \cdot \frac{\mu^{y} e^{-\mu}}{y!}$$
$$\lambda = \alpha_i \cdot \beta_j, \qquad \mu = \alpha_j \cdot \beta_i$$
Each team gets an attack strength α and a defence weakness β.
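As a minimal sketch of the formula above, the probability of a scoreline is the product of two Poisson probabilities. The function names and the strengths used in the example call are made up for illustration and are not taken from the talk.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of observing exactly k goals when `rate` goals are expected."""
    return rate ** k * exp(-rate) / factorial(k)


def scoreline_probability(x, y, alpha_home, beta_home, alpha_away, beta_away):
    """Pr(home scores x, away scores y) under the independent Poisson model."""
    lam = alpha_home * beta_away  # expected home goals: home attack * away defence weakness
    mu = alpha_away * beta_home   # expected away goals: away attack * home defence weakness
    return poisson_pmf(x, lam) * poisson_pmf(y, mu)


# Illustrative strengths only (hypothetical values).
p_00 = scoreline_probability(0, 0, alpha_home=1.2, beta_home=0.9,
                             alpha_away=1.0, beta_away=1.1)
```

Because the two goal counts are treated as independent, the joint probability factorises and each half can be evaluated on its own.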
independent and identically distributed (iid), the likelihood function is:
$$L = \prod_k \frac{\lambda_k^{x_k} e^{-\lambda_k}}{x_k!} \cdot \frac{\mu_k^{y_k} e^{-\mu_k}}{y_k!}$$
and so the log likelihood is:
$$\log(L) = \sum_k \left[ -\lambda_k + x_k \log(\lambda_k) - \log(x_k!) - \mu_k + y_k \log(\mu_k) - \log(y_k!) \right]$$
$x_k$ is home goals for game k
$y_k$ is away goals for game k
$\lambda_k = \alpha_i \beta_j$ is expected home goals in game k with team i at home
$\mu_k = \alpha_j \beta_i$ is expected away goals in game k with team j playing away
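The log likelihood above translates almost term by term into code. The sketch below assumes games are given as `(home, away, home_goals, away_goals)` tuples and strengths as dictionaries keyed by team name. Both are illustrative choices, not the representation used in the talk.

```python
from math import lgamma, log


def log_likelihood(games, alpha, beta):
    """Sum of Poisson log probabilities over all games.

    games: iterable of (home, away, home_goals, away_goals) tuples (assumed layout)
    alpha, beta: dicts mapping team name -> attack strength / defence weakness
    """
    total = 0.0
    for home, away, x, y in games:
        lam = alpha[home] * beta[away]  # expected home goals in this game
        mu = alpha[away] * beta[home]   # expected away goals in this game
        # lgamma(n + 1) equals log(n!) and avoids computing large factorials.
        total += -lam + x * log(lam) - lgamma(x + 1)
        total += -mu + y * log(mu) - lgamma(y + 1)
    return total
```

Working with the log likelihood rather than the likelihood itself keeps the value in a numerically comfortable range when hundreds of games are multiplied together.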
years Serie A, where Juventus is playing, and La Liga, where Real Madrid is playing. So far, 689 games have been played in those two leagues. Each game is an instance of this class:
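The class itself is not shown here. As a hypothetical stand-in, a game record could be sketched as a small dataclass. The class name, the field names and the example result are assumptions for illustration, not the original code or real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    """One played match (hypothetical layout; field names are assumptions)."""
    home_team: str
    away_team: str
    home_goals: int
    away_goals: int

    def as_tuple(self):
        """Return (home, away, home_goals, away_goals) for use in fitting code."""
        return (self.home_team, self.away_team, self.home_goals, self.away_goals)


# Made-up example result, for illustration only.
example = Game(home_team="Team A", away_team="Team B", home_goals=2, away_goals=1)
```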
minimize
Maher's model can be solved with:
Normalise attack and defence strength to have means = 1:
$$1 = \frac{1}{N} \sum_i \alpha_i = \frac{1}{N} \sum_i \beta_i$$
model results, we can predict the outcome of Juventus vs Real Madrid:

Team         α      β
Juventus     1.601  0.466
Real Madrid  2.440  0.852

Expected scoreline:
Juventus 1.601*0.852 - 2.440*0.466 Real Madrid
Juventus 1.364 - 1.137 Real Madrid
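The arithmetic behind the expected scoreline can be checked in a few lines. The α and β values are the ones from the table above. The variable names are illustrative.

```python
# Fitted strengths as reported in the table above.
alpha = {"Juventus": 1.601, "Real Madrid": 2.440}
beta = {"Juventus": 0.466, "Real Madrid": 0.852}

# Juventus at home: home attack * away defence weakness, and vice versa.
expected_home = alpha["Juventus"] * beta["Real Madrid"]
expected_away = alpha["Real Madrid"] * beta["Juventus"]

print(f"Juventus {expected_home:.3f} - {expected_away:.3f} Real Madrid")
```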
for a given scoreline (x, y) is:
$$P(x, y) = \frac{1.36^{x} e^{-1.36}}{x!} \cdot \frac{1.14^{y} e^{-1.14}}{y!}$$
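Summing P(x, y) over a grid of scorelines gives the probabilities of a home win, a draw and an away win. The sketch below uses the unrounded expected goals 1.364 and 1.137. The cut-off of 15 goals per team and the function names are assumptions made for this example. The probability mass beyond the cut-off is negligible at these rates.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of exactly k goals when `rate` goals are expected."""
    return rate ** k * exp(-rate) / factorial(k)


def outcome_probabilities(lam, mu, max_goals=15):
    """Return (home win, draw, away win) by summing scoreline probabilities."""
    home_win = draw = away_win = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = poisson_pmf(x, lam) * poisson_pmf(y, mu)
            if x > y:
                home_win += p
            elif x == y:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


p_juventus, p_draw, p_real = outcome_probabilities(lam=1.364, mu=1.137)
print(f"Juventus win {p_juventus:.3f}, draw {p_draw:.3f}, Real Madrid win {p_real:.3f}")
```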
a simplified model. Factors to consider:
Home advantage
Team strength changes over time (momentum)
Is Ronaldo fit?
Juventus has already won Serie A, more focus?
Fatigue. Some players may have tired legs. It’s near the end of the season
Whatever you can think of that influences the result!

Thank you!