PyData meetup talk, May 5th, 2015

Short introduction to modelling football using python

Lasse Bøhling

May 05, 2015

Transcript

  1. A review of modelling football

    Lasse Bøhling (Football Radar), PyData meetup, May 5, 2015
  2. Introduction: company information

    - ∼ 150 employees
    - ∼ 20 software engineers
    - ∼ 5 years old
    - Calculates probabilities for football games
    - www.footballradar.com
  3. Introduction: outline of talk

    Outline:
    - Some history (how it all started)
    - A review of a simple model
    - Example of Python solving the simple model
    - Is Juventus going to beat Real Madrid tonight?

    About me:
    - 2004 - 2010: MSc in physics
    - 2010 - 2013: PhD in physics
    - 2013 - now: working at Football Radar
    - lasse.bohling@footballradar.com
  4. Similarities to Molecular Dynamics

    Same same, but different: in both fields we extract macroscopic information (goals, temperature, diffusion coefficient, ...) from microscopic information (interaction potential, players, pitch quality, ...).
  5. Modelling football history: rough historic timeline

    - Moroney 1951: the first to show that goals could be fitted by both a Poisson distribution and a negative binomial distribution.
    - Reep and Pollard 1968: showed that the number of consecutive passes fits a negative binomial quite well.
    - Maher 1982: finds the Poisson distribution to be superior to the negative binomial; the first to suggest a model.
    - Dixon and Coles 1996: introduce time dependence of team strengths, building on Maher's work.
    - ∼ 1990 - present: lots of papers.

    See e.g. The Numbers Game by Chris Anderson and David Sally for a historic review.
  6. Modelling football history: highlights

    Reep [1], inventor of the long ball theory: "Chance does dominate the game"

    [1] Reep and Pollard (1968). "Skill and Chance in Association Football". Journal of the Royal Statistical Society, Series A (General), pages 581 - 585.
  7. Modelling football history

    Maher 1982: "skill rather than chance dominates the game"

    Maher's paper is to football modelling what Rahman's paper [2] is to Molecular Dynamics.

    [2] A. Rahman (1964). "Correlations in the Motion of Atoms in Liquid Argon". Physical Review 136: A405-A411.
  8. Modelling football history

    M. J. Maher [3] investigates a Poisson distribution instead of the negative binomial proposed by Reep et al. For k goals:

    $$\mathrm{Pois}(\lambda) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad \mathrm{NB}(r, p) = \binom{k + r - 1}{k} p^k (1 - p)^r$$

    [Figure: two panels showing probability against goals (0 to 9), titled "Poisson with mean 2.7" and "Negative Binomial: NB(2.7, 0.5)".]

    [3] M. J. Maher (1982). "Modeling association football scores". Statistica Neerlandica 36, no. 3, pages 109 - 118.
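The two distributions on this slide can be tabulated with a few lines of Python. This is a minimal sketch using only the standard library; the function names `poisson_pmf` and `neg_binomial_pmf` are illustrative and not from the talk. The negative binomial follows the slide's convention, with the binomial coefficient evaluated through log-gamma so that a non-integer r (2.7 here) is allowed.

```python
import math


def poisson_pmf(k, lam):
    """P(K = k) for a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def neg_binomial_pmf(k, r, p):
    """P(K = k) in the slide's convention: C(k + r - 1, k) * p**k * (1 - p)**r.

    The generalised binomial coefficient is computed with log-gamma,
    so r does not have to be an integer.
    """
    log_coeff = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coeff + k * math.log(p) + r * math.log(1.0 - p))


if __name__ == "__main__":
    # Same parameters as the two panels: Poisson mean 2.7 and NB(2.7, 0.5).
    for goals in range(10):
        print(goals,
              round(poisson_pmf(goals, 2.7), 3),
              round(neg_binomial_pmf(goals, 2.7, 0.5), 3))
```

With p = 0.5 the negative binomial mean r p / (1 - p) equals r, so both panels describe distributions with mean 2.7; the negative binomial simply has the heavier tail.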
  9. Review of simple model: Maher (1982)

    Maher suggests that a football game consists of two independent games, one for the home goals and one for the away goals:

    $$\Pr(X_{i,j} = x, Y_{i,j} = y) = \frac{\lambda^x e^{-\lambda}}{x!} \cdot \frac{\mu^y e^{-\mu}}{y!}, \qquad \lambda = \alpha_i \beta_j, \quad \mu = \alpha_j \beta_i$$

    Each team gets an attack strength α and a defence weakness β.
  10. Review of simple model: Maher (1982)

    Assuming the goals are independent and identically distributed (iid), the likelihood function is:

    $$L = \prod_k \frac{\lambda_k^{x_k} e^{-\lambda_k}}{x_k!} \cdot \frac{\mu_k^{y_k} e^{-\mu_k}}{y_k!}$$

    and so the log likelihood is:

    $$\log(L) = \sum_k \left[ -\lambda_k + x_k \log(\lambda_k) - \log(x_k!) - \mu_k + y_k \log(\mu_k) - \log(y_k!) \right]$$

    where
    - x_k is the number of home goals in game k
    - y_k is the number of away goals in game k
    - λ_k = α_i β_j is the expected number of home goals in game k, with team i at home
    - µ_k = α_j β_i is the expected number of away goals in game k, with team j playing away
  11. Solving Maher's model with Python

    We use data from this year's Serie A, where Juventus play, and La Liga, where Real Madrid play. So far, 689 games have been played in those two leagues. Each game is an instance of this class: [code listing not captured in the transcript]
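The class shown on the slide is not part of the transcript. As a purely hypothetical stand-in, a game record needs little more than the two team names and the two goal counts. The class name `Game`, its field names and the two example results below are assumptions for illustration, not the speaker's code or real match data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    """One played match (hypothetical layout, not the class from the talk)."""
    home_team: str
    away_team: str
    home_goals: int
    away_goals: int


# Made-up results, only to show how the records would be used.
games = [
    Game("Team A", "Team B", 2, 1),
    Game("Team B", "Team A", 0, 0),
]

# Every team that appears in the data needs one attack and one defence parameter.
teams = sorted({g.home_team for g in games} | {g.away_team for g in games})
print(teams)
```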
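  12. Solving Maher's model with Python

    The optimiser class consists of two small functions: [code listing not captured in the transcript]

    $$\log L(\alpha_1, ..., \alpha_N, \beta_1, ..., \beta_N) = \sum_{k=1} \left[ -\lambda_k + x_k \log(\lambda_k) - \mu_k + y_k \log(\mu_k) \right]$$

    The terms log(x_k!) and log(y_k!) are left out because they do not depend on α or β and therefore do not affect the location of the maximum.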
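The two functions themselves are not in the transcript. One plausible split, sketched here under that assumption, is a helper that returns the expected goals for one game and a function that sums the log likelihood over all games. The names `expected_goals` and `log_likelihood`, the dictionary layout of the parameters and the example results are all illustrative.

```python
import math


def expected_goals(alpha, beta, home, away):
    """Return (lambda_k, mu_k) for a game with `home` at home and `away` away."""
    lam = alpha[home] * beta[away]   # expected home goals
    mu = alpha[away] * beta[home]    # expected away goals
    return lam, mu


def log_likelihood(alpha, beta, games):
    """Log likelihood without the constant log(x!) and log(y!) terms."""
    total = 0.0
    for home, away, x, y in games:
        lam, mu = expected_goals(alpha, beta, home, away)
        total += -lam + x * math.log(lam) - mu + y * math.log(mu)
    return total


# Made-up results: (home team, away team, home goals, away goals).
games = [("A", "B", 2, 1), ("B", "A", 0, 3)]
alpha = {"A": 1.0, "B": 1.0}  # attack strengths
beta = {"A": 1.0, "B": 1.0}   # defence weaknesses
print(log_likelihood(alpha, beta, games))
```

With all strengths equal to 1, every expected goal count is 1 and every logarithm vanishes, so each game contributes exactly -2 to the sum.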
  13. Solving Maher's model with Python

    Using scipy:

      from scipy.optimize import minimize

    Maher's model can be solved with: [code listing not captured in the transcript]

    Normalise the attack and defence strengths to have mean 1:

    $$1 = \frac{1}{N} \sum_i \alpha_i = \frac{1}{N} \sum_i \beta_i$$
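The call used in the talk is not in the transcript, so the following is only a sketch of how `scipy.optimize.minimize` could be applied to the log likelihood above. The made-up results, the log-parameterisation (which keeps every α and β positive) and the choice of BFGS are assumptions. Because λ = α_i β_j is unchanged when all α are multiplied by a constant and all β divided by it, the sketch rescales afterwards so that the attack strengths have mean 1; fixing the mean of β to 1 as well would in general need an extra overall scale factor, which is not attempted here.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up results: (home index, away index, home goals, away goals).
games = np.array([
    (0, 1, 2, 1), (1, 0, 0, 1), (0, 2, 3, 0),
    (2, 0, 1, 2), (1, 2, 1, 1), (2, 1, 0, 2),
])
n_teams = 3
home, away, x, y = games.T


def neg_log_likelihood(params):
    """Negative log likelihood; params holds log(alpha) followed by log(beta)."""
    alpha = np.exp(params[:n_teams])
    beta = np.exp(params[n_teams:])
    lam = alpha[home] * beta[away]   # expected home goals per game
    mu = alpha[away] * beta[home]    # expected away goals per game
    return -np.sum(-lam + x * np.log(lam) - mu + y * np.log(mu))


# Start from alpha = beta = 1 for every team.
result = minimize(neg_log_likelihood, np.zeros(2 * n_teams), method="BFGS")
alpha = np.exp(result.x[:n_teams])
beta = np.exp(result.x[n_teams:])

# Rescale so that mean(alpha) = 1; the products alpha_i * beta_j are unchanged.
scale = alpha.mean()
alpha, beta = alpha / scale, beta * scale

print("attack strengths: ", np.round(alpha, 3))
print("defence weaknesses:", np.round(beta, 3))
```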
  14. Solving Maher's model with Python

    La Liga results
  15. Solving Maher's model with Python

    Serie A results
  16. Who is winning tonight: Juventus or Real Madrid?

    Using the model results, we can predict the outcome of Juventus vs Real Madrid:

    Team          α      β
    Juventus      1.601  0.466
    Real Madrid   2.440  0.852

    Expected scoreline:

    Juventus 1.601 * 0.852 - 2.440 * 0.466 Real Madrid
    Juventus 1.364 - 1.137 Real Madrid
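The arithmetic on this slide is the rule λ = α_home · β_away and µ = α_away · β_home applied to the fitted values. A small sketch, with an illustrative `ratings` dictionary and function name, reproduces it:

```python
# Fitted values from the slide: team -> (attack strength, defence weakness).
ratings = {
    "Juventus": (1.601, 0.466),
    "Real Madrid": (2.440, 0.852),
}


def expected_goals(home, away, ratings):
    """Expected goals (home, away) under Maher's model."""
    alpha_home, beta_home = ratings[home]
    alpha_away, beta_away = ratings[away]
    return alpha_home * beta_away, alpha_away * beta_home


lam, mu = expected_goals("Juventus", "Real Madrid", ratings)
print(f"Juventus {lam:.3f} - {mu:.3f} Real Madrid")
# prints "Juventus 1.364 - 1.137 Real Madrid"
```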
  17. Who is winning tonight: Juventus or Real Madrid?
  18. Who is winning tonight: Juventus or Real Madrid?

    The probability for a given scoreline (x, y), with x goals for Juventus and y goals for Real Madrid, is:

    $$P(x, y) = \frac{1.364^x e^{-1.364}}{x!} \cdot \frac{1.137^y e^{-1.137}}{y!}$$
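Summing this scoreline probability over all (x, y) with x > y, x = y and x < y gives the chances of a home win, a draw and an away win. The sketch below assumes the expected goals 1.364 and 1.137 from the expected scoreline and truncates the sum at 15 goals per team, which is an arbitrary cut-off chosen because the remaining tail is negligible for means this small. The function names are illustrative.

```python
import math


def poisson_pmf(k, mean):
    """P(K = k) for a Poisson distribution with the given mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def outcome_probabilities(lam, mu, max_goals=15):
    """Return (home win, draw, away win) for independent Poisson goal counts."""
    home_win = draw = away_win = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = poisson_pmf(x, lam) * poisson_pmf(y, mu)
            if x > y:
                home_win += p
            elif x == y:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


home_win, draw, away_win = outcome_probabilities(1.364, 1.137)
print(f"Juventus win {home_win:.3f}, draw {draw:.3f}, Real Madrid win {away_win:.3f}")
```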
  19. Who is winning tonight: Juventus or Real Madrid?

    This is a simplified model. Factors that still need to be considered:
    - Home advantage
    - Team strength changing over time (momentum)
    - Is Ronaldo fit?
    - Juventus have already won Serie A: more focus?
    - Fatigue: some players may have tired legs, as it is near the end of the season
    - Whatever you can think of that influences the result!

    Thank you!