
Introduction to data analysis using Python

September 03, 2017


PyCon APAC 2017 presentation



Slides by anonaka



  1. Why horse racing data?
    • Over 30 years of official data.
    • Clean data; no need for scraping.
    • Some chance of making money?
  2. Speed
    • The faster horse wins the race.
    • Distance and time
    • Regression analysis (linear model)
  3. • Horses run at about 60 km/h.
    • I want to compare the speed of horse A running 1 km with that of horse B running 2 km.
  4. Hypothesis
    • Horses must get tired when they run a long distance.
    • Regression analysis with a quadratic model.
  5. Expert advice
    • Every racecourse has a different shape, length of straight, corner radius, etc.
    • It is not valid to pool the data from different racecourses.
  6. It is almost meaningless to compare speeds when the racecourse or distance differs.
  7. Lessons learned
    • Fatigue is not a significant factor in horse racing.
    • Knowing the target domain is very important.
  8. ROI

  9. Favorite rank   Win rate (%)
     1               32.69
     2               18.85
     3               13.22
     4                9.41
     5                7.08
     6                5.45
     7                3.93
     8                2.86
     9                2.12
     10               1.45
  10. In North America • Public betting favorites win approximately 33 percent of all

    races and finish second 53 percent of the time. Second choices win approximately 21 percent of all races and finish second 42 percent of the time. So the top two choices win 54 percent of the races and finish second 74 percent of the time. You might even want to consider the fact that third choices win approximately 14 percent of all races run over the course of a year. http://www.predictem.com/horse/profit.php
  11. Strategy
    • We humans sometimes put too much emphasis on certain factors and ignore others.
    • That is where data analysis can make a difference.
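Slide 3 asks how to compare horse A over 1 km with horse B over 2 km. A simple first step is to convert each run to an average speed in km/h. The sketch below assumes distances in metres and times in seconds. The two runs are hypothetical. Slides 5 and 6 warn that this comparison is only meaningful on the same course and distance.

```python
def average_speed_kmh(distance_m: float, time_s: float) -> float:
    """Average speed in km/h for a run of distance_m metres taking time_s seconds."""
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    # m/s -> km/h: multiply by 3600 s/h and divide by 1000 m/km, i.e. multiply by 3.6
    return distance_m / time_s * 3.6


# Hypothetical runs: horse A covers 1 km in 60 s, horse B covers 2 km in 125 s.
speed_a = average_speed_kmh(1000, 60.0)
speed_b = average_speed_kmh(2000, 125.0)
print(f"A: {speed_a:.1f} km/h, B: {speed_b:.1f} km/h")
```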
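Slide 4 tests the fatigue hypothesis with a quadratic model. Below is a minimal, standard-library-only sketch of a least-squares polynomial fit using the normal equations; in practice a library such as NumPy would normally be used. The data are synthetic, generated from an assumed curve `t = 1 + 58*d + 2*d**2` with d in km. A positive d² coefficient means times grow faster than linearly with distance, which is what fatigue would predict. Slide 7 reports that on the real data fatigue turned out not to be significant.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.
    Returns [c0, c1, ..., c_degree] for c0 + c1*x + ... + c_degree*x**degree."""
    n = degree + 1
    # Normal equations (X^T X) c = X^T y, where X[i][j] = xs[i] ** j
    a = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - s) / a[row][row]
    return coeffs


def rss(coeffs, xs, ys):
    """Residual sum of squares of a fitted polynomial."""
    return sum((y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))


# Synthetic, noise-free data (NOT real results); distances in km, times in s
distances_km = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.4, 3.0, 3.6]
times_s = [1 + 58 * d + 2 * d ** 2 for d in distances_km]

linear = fit_polynomial(distances_km, times_s, 1)
quadratic = fit_polynomial(distances_km, times_s, 2)
print("quadratic coefficients:", [round(c, 3) for c in quadratic])
print("RSS linear:", rss(linear, distances_km, times_s))
print("RSS quadratic:", rss(quadratic, distances_km, times_s))
```

A lower residual alone does not establish significance. A proper test would compare the models with an F-test or look at the standard error of the quadratic coefficient.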
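Slides 5 and 6 conclude that speeds should only be compared within the same racecourse and distance. One way to enforce that is to group records by (racecourse, distance) before summarizing. The sketch below does this with the standard library. The course names ("Course A", "Course B") and the records are hypothetical placeholders.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (racecourse, distance in metres, time in seconds)
races = [
    ("Course A", 1600, 96.0),
    ("Course A", 1600, 95.0),
    ("Course B", 1600, 97.5),
    ("Course A", 2000, 121.0),
    ("Course B", 2000, 123.0),
    ("Course B", 2000, 122.0),
]


def mean_speed_by_course_and_distance(records):
    """Mean speed (km/h) per (racecourse, distance) pair, so speeds are
    only compared within the same course and distance."""
    groups = defaultdict(list)
    for course, distance_m, time_s in records:
        groups[(course, distance_m)].append(distance_m / time_s * 3.6)
    return {key: mean(speeds) for key, speeds in groups.items()}


result = mean_speed_by_course_and_distance(races)
for (course, distance), speed in sorted(result.items()):
    print(f"{course} {distance} m: {speed:.2f} km/h")
```

With pandas, the same grouping is usually written as `df.groupby(["course", "distance"])`, assuming columns with those names.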
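Slide 8 reports results as ROI. The sketch below uses the usual definition, (total payout - total stake) / total stake. The bets are hypothetical: four 100-yen bets, one of which paid back 350 yen.

```python
def roi(stakes, payouts):
    """Return on investment: (total payout - total stake) / total stake."""
    total_stake = sum(stakes)
    if total_stake == 0:
        raise ValueError("total stake must be non-zero")
    return (sum(payouts) - total_stake) / total_stake


# Hypothetical bets in yen (not the author's actual results)
stakes = [100, 100, 100, 100]
payouts = [0, 350, 0, 0]
print(f"ROI: {roi(stakes, payouts):.1%}")
```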
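The win rates on slide 9 can be accumulated to see how often one of the top-k favorites wins. The sketch below copies the values from that slide, in percent. The top two favorites together win about 51.5% of races in that table, somewhat below the 54% quoted for North America on slide 10.

```python
from itertools import accumulate

# Win rate (%) by favorite rank, copied from slide 9
win_rate_by_rank = {1: 32.69, 2: 18.85, 3: 13.22, 4: 9.41, 5: 7.08,
                    6: 5.45, 7: 3.93, 8: 2.86, 9: 2.12, 10: 1.45}

ranks = sorted(win_rate_by_rank)
# cumulative[k] = percentage of races won by one of the top-k favorites
cumulative = dict(zip(ranks, accumulate(win_rate_by_rank[r] for r in ranks)))

for k in (1, 2, 3, 10):
    print(f"top {k}: {cumulative[k]:.2f}%")
```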
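A high win rate for favorites (slide 10) does not imply profit; what matters is the payout. The sketch below works under simple assumptions: decimal odds that include the returned stake, a known win probability p, and no dead heats. Under those assumptions, the expected ROI of a win bet is p * odds - 1, so the break-even odds are 1 / p. The odds value 2.5 is hypothetical.

```python
def break_even_decimal_odds(win_probability: float) -> float:
    """Decimal odds (total return per unit stake) at which a win bet breaks even."""
    if not 0 < win_probability <= 1:
        raise ValueError("win_probability must be in (0, 1]")
    return 1 / win_probability


def expected_roi(win_probability: float, decimal_odds: float) -> float:
    """Expected return on investment of a win bet: p * odds - 1."""
    return win_probability * decimal_odds - 1


p_favorite = 0.33  # "approximately 33 percent" from slide 10
print(f"break-even odds: {break_even_decimal_odds(p_favorite):.2f}")
print(f"expected ROI at odds 2.5: {expected_roi(p_favorite, 2.5):+.1%}")
```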

[Chart: Profit (Yen) by Year]