Introduction to the data analysis using python

anonaka
September 03, 2017

PyCon APAC 2017 presentation

Transcript

  1. Introduction to the data analysis using python Akira Nonaka XOXZO

  2. Be interactive

  3. Who am I? • XOXZO Evangelist • A Flying Python Programmer
  4. About XOXZO • Provides SMS and telephony APIs • No office • Everybody works remotely
  5. None
  6. Ten people Six countries Nine cities

  7. None
  8. Outline 1. Tools 2. Domain knowledge 3. Value of data analysis
  9. Tools

  10. Tools • Python • numpy • pandas • matplotlib • Jupyter notebook
  11. None
  12. None
  13. None
  14. Domain knowledge

  15. Why horse racing data? • Over 30 years of official data. • Clean data. No need for scraping. • Some chance of making money?
  16. Let’s take a look at running speed

  17. Speed • The faster horse wins the race • Distance and time • Regression analysis (linear model)
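
The linear model on slide 17 can be sketched with `numpy.polyfit`. This is only an illustration, not the speaker's code: the distance and time values are made up, and time is assumed to be modelled as a function of distance.

```python
import numpy as np

# Hypothetical (distance in metres, winning time in seconds) pairs,
# invented for illustration; they are NOT taken from the race data.
distance_m = np.array([1000, 1200, 1400, 1600, 1800, 2000, 2400], dtype=float)
time_s = np.array([58.0, 70.0, 82.5, 95.0, 108.5, 121.0, 146.5])

# Least-squares fit of: time = slope * distance + intercept.
slope, intercept = np.polyfit(distance_m, time_s, deg=1)

# slope is seconds per metre, so 1 / slope is metres per second;
# multiplying by 3.6 converts metres per second to km/h.
speed_kmh = 3.6 / slope
print(f"seconds per metre: {slope:.4f}, implied speed: {speed_kmh:.1f} km/h")
```

With a purely linear model the slope gives one average speed for every distance, which is what makes the later question about linearity worth asking.
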
  18. None
  19. • Horses run at about 60 km/h. • I want to compare the speed of horse A running 1 km with that of horse B running 2 km.
  20. Is the relationship linear?

  21. Hypothesis • Horses must get tired when they run a long distance. • Regression analysis with a quadratic model.
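
A quadratic fit and a check on the sign of the quadratic coefficient might look like the sketch below. The helper name `convexity` and the data are hypothetical, and the choice of distance as the x variable is an assumption, since the slides do not say which variable is on which axis. The labels follow the wording used on the slides: a negative coefficient is "convex upward", a positive one "convex downward".

```python
import numpy as np

def convexity(x, y):
    """Fit y = a*x**2 + b*x + c and label the curve by the sign of a.

    Hypothetical helper for illustration only.
    """
    a, b, c = np.polyfit(x, y, deg=2)
    shape = "convex downward" if a > 0 else "convex upward"
    return a, shape

# Hypothetical distances in km (not real race data).
distance_km = np.array([1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.4])

# Synthetic times built with a known positive quadratic term (+1.5),
# so the fit should recover a positive coefficient.
time_s = 57.0 * distance_km + 1.5 * distance_km**2

a, shape = convexity(distance_km, time_s)
print(f"quadratic coefficient: {a:.3f} ({shape})")

# Same data with the sign of the quadratic term flipped.
a_neg, shape_neg = convexity(distance_km, 57.0 * distance_km - 1.5 * distance_km**2)
print(f"quadratic coefficient: {a_neg:.3f} ({shape_neg})")
```

Running the same helper per year or per racecourse gives the kind of sign tally that the later slides compare.
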
  22. Distance Time

  23. quadratic coefficient is negative, convex upwards

  24. Check other years

  25. None
  26. None
  27. Expert advice • Every racecourse has a different shape, straight length, corner radius, etc. • It is not right to compare data from various racecourses together.
  28. Analysis by racecourse • Starting with Tokyo Racecourse

  29. quadratic coefficient is positive, convex downwards

  30. None
  31. None
  32. Convex shape       2014   2015
      Convex downward    11     14
      Convex upward      8      5
  33. Tokyo Racecourse

  34. Kyoto Racecourse

  35. Hanshin Racecourse

  36. Nakayama Racecourse

  37. It is almost meaningless to compare speeds if the racecourse or distance is different
  38. Lessons learned • Fatigue is not a significant factor in horse racing. • Knowing the target domain is very important.
  39. Value of data analysis

  40. ROI

  41. JRA payback (Japanese horse racing system)

  42. Human predictions are reasonably accurate

  43. Win Fav   Win Rate (%)
      1         32.69
      2         18.85
      3         13.22
      4         9.41
      5         7.08
      6         5.45
      7         3.93
      8         2.86
      9         2.12
      10        1.45
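
One way to relate these win rates to the ROI theme is the break-even payout: a bet that wins with probability p returns its stake on average only if it pays at least 1/p per unit staked. The sketch below applies that arithmetic to the figures on slide 43. The function name `break_even_odds` is hypothetical, and the win rates are assumed to be percentages.

```python
# Win rate by favourite rank, copied from slide 43 (assumed to be percentages).
win_rate_pct = {1: 32.69, 2: 18.85, 3: 13.22, 4: 9.41, 5: 7.08,
                6: 5.45, 7: 3.93, 8: 2.86, 9: 2.12, 10: 1.45}

def break_even_odds(rate_pct):
    """Payout per unit staked at which the expected return equals the stake.

    Hypothetical helper: a win probability p needs a payout of 1 / p.
    """
    return 100.0 / rate_pct

for rank, rate in win_rate_pct.items():
    print(f"favourite {rank:2d}: win rate {rate:5.2f}%  "
          f"break-even payout {break_even_odds(rate):6.2f}x")
```

Any rank whose average actual payout sits below its break-even figure loses money in the long run, however often it wins.
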
  44. In North America • Public betting favorites win approximately 33 percent of all races and finish second 53 percent of the time. Second choices win approximately 21 percent of all races and finish second 42 percent of the time. So the top two choices win 54 percent of the races and finish second 74 percent of the time. You might even want to consider the fact that third choices win approximately 14 percent of all races run over the course of a year. http://www.predictem.com/horse/profit.php
  45. None
  46. Strategy • We humans sometimes put too much emphasis on certain factors and ignore others. • That is where data analysis can make a difference.
  47. Strategy X

  48. Accumulated Payback • Sequence of tickets bought from Jan. 1 to Dec. 31
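
One possible reading of "accumulated payback" is a running total over the tickets in purchase order. The sketch below shows that bookkeeping with `itertools.accumulate`. The ticket list is entirely hypothetical (stakes and payouts are made-up yen amounts), and it says nothing about how strategy X picks tickets.

```python
from itertools import accumulate

# Hypothetical tickets in purchase order: (stake in yen, payout in yen).
# A payout of 0 means the ticket lost.
tickets = [(100, 0), (100, 0), (100, 450), (100, 0), (100, 1200), (100, 0)]

# Running totals of money staked and money paid back.
stakes = list(accumulate(stake for stake, _ in tickets))
payouts = list(accumulate(payout for _, payout in tickets))

# Net position after each ticket, and the overall payback rate.
net = [paid - staked for staked, paid in zip(stakes, payouts)]
payback_rate = payouts[-1] / stakes[-1]

print("net position after each ticket:", net)
print(f"payback rate: {payback_rate:.2f}")
```

Plotting `net` against the ticket index would give a curve over the year, with long flat or falling stretches between the occasional winning tickets.
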
  49. Historical Record Payback

  50. None
  51. None
  52. [Table: Year, Profit (Yen)]
  53. Lessons learned • Find undervalued horses. • The win rate of strategy X is 8.8%.
  54. Next Goal • Use machine learning to tune the parameters of strategy X
  55. None
  56. None