Introduction to the data analysis using python

anonaka
September 03, 2017

PyCon APAC 2017 presentation

Transcript

  1. Introduction to the data
    analysis using python
    Akira Nonaka
    XOXZO

  2. Be interactive

  3. Who am I?
    • XOXZO Evangelist
    • A Flying Python Programmer

  4. About XOXZO
    • Provides SMS and telephony APIs
    • No office
    • Everybody works remotely

  6. Ten people
    Six countries
    Nine cities

  8. Outline
    1. Tools
    2. Domain knowledge
    3. Value of data analysis

  9. Tools

  10. Tools
    • Python
    • numpy
    • pandas
    • matplotlib
    • Jupyter notebook

  14. Domain knowledge

  15. Why horse racing data?
    • Over 30 years of official data.
    • Clean data. No need for scraping.
    • Some chance of making money?

  16. Let’s take a look at running speed

  17. Speed
    • The faster horse wins the race
    • Distance and Time
    • Regression analysis (linear model)
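
A minimal sketch of the linear model, assuming each race record reduces to a distance in metres and a finishing time in seconds. The numbers are made up for illustration and are not JRA data; `numpy.polyfit` with `deg=1` returns the least-squares slope and intercept.

```python
import numpy as np

# Hypothetical race records: distance in metres and finishing time in seconds.
# These values are made up for illustration; they are not JRA data.
distance = np.array([1200, 1400, 1600, 1800, 2000, 2400], dtype=float)
time_s = np.array([70.0, 82.0, 94.5, 107.0, 120.0, 146.0])

# Least-squares line: time_s ≈ slope * distance + intercept.
slope, intercept = np.polyfit(distance, time_s, deg=1)

# slope is seconds per metre, so 3.6 / slope is an average speed in km/h.
speed_kmh = 3.6 / slope
print(f"slope = {slope:.4f} s/m, intercept = {intercept:.2f} s, speed ≈ {speed_kmh:.1f} km/h")
```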

  19. • Horses run at about 60 km/h.
    • I want to compare the speed of horse A
    running 1 km with that of horse B running
    2 km.
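
At 60 km/h a horse covers about 16.7 m per second, so 1 km takes roughly 60 s. The obvious way to compare runs over different distances is average speed; a minimal sketch, where the helper name and both finishing times are hypothetical:

```python
def average_speed_kmh(distance_m: float, time_s: float) -> float:
    """Average speed in km/h for distance_m metres covered in time_s seconds."""
    return distance_m / time_s * 3.6  # m/s -> km/h

# Hypothetical finishing times: horse A over 1 km, horse B over 2 km.
speed_a = average_speed_kmh(1000, 60.0)   # about 60 km/h
speed_b = average_speed_kmh(2000, 124.0)  # about 58.1 km/h
```

Comparing `speed_a` and `speed_b` directly is only fair if speed does not depend on distance, which is what the regression on the following slides checks.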

  20. Is the relationship linear?

  21. Hypothesis
    • They must get tired when they run long distances.
    • Regression analysis with a quadratic model.

  22. [Plot with axes Distance and Time]

  23. quadratic coefficient is negative
    convex upwards
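
The quadratic model can be sketched with `numpy.polyfit(..., deg=2)`; the sign of the leading coefficient decides the shape. Under the fatigue hypothesis, time should grow faster than linearly with distance, which means a positive coefficient (convex downwards). The function name and the synthetic data below are assumptions for illustration.

```python
import numpy as np

def quadratic_convexity(distance_m, time_s):
    """Fit time_s = a*d**2 + b*d + c (d in km) and classify the curve by the sign of a."""
    d_km = np.asarray(distance_m, dtype=float) / 1000.0  # km keeps the fit well conditioned
    a, b, c = np.polyfit(d_km, np.asarray(time_s, dtype=float), deg=2)
    if a > 0:
        shape = "convex downwards"  # time grows faster than linearly with distance
    elif a < 0:
        shape = "convex upwards"    # time grows slower than linearly with distance
    else:
        shape = "linear"
    return a, shape

# Synthetic times generated from a known curve (a = 3), not real race data.
distances = [1200, 1400, 1600, 1800, 2000, 2400]
times = [3 * (d / 1000) ** 2 + 55 * (d / 1000) + 2 for d in distances]
a, shape = quadratic_convexity(distances, times)
print(round(a, 3), shape)  # 3.0 convex downwards
```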

  24. Check other years

  27. Expert advice
    • Every racecourse has a different shape, length of
    straight, corner radius, etc.
    • It is not valid to pool data from different
    racecourses and compare them directly.

  28. Analysis by racecourse
    Starting with Tokyo Racecourse
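
Fitting each racecourse separately is a natural fit for pandas `groupby`. A sketch assuming a hypothetical table with `course`, `distance_m` and `time_s` columns; the two courses and their coefficients are synthetic.

```python
import numpy as np
import pandas as pd

def quadratic_coef_by_course(races: pd.DataFrame) -> pd.Series:
    """Quadratic coefficient of time vs distance (km), fitted separately per racecourse.

    Assumes hypothetical columns 'course', 'distance_m' and 'time_s', with at
    least three distinct distances per course.
    """
    def fit(group: pd.DataFrame) -> float:
        d_km = group["distance_m"].to_numpy(dtype=float) / 1000.0
        t = group["time_s"].to_numpy(dtype=float)
        return float(np.polyfit(d_km, t, deg=2)[0])  # leading (quadratic) coefficient

    return races.groupby("course")[["distance_m", "time_s"]].apply(fit)

# Synthetic data: each course gets a made-up quadratic coefficient.
rows = []
for course, a in [("Tokyo", 2.0), ("Kyoto", -1.5)]:
    for d in (1200, 1600, 2000, 2400):
        km = d / 1000
        rows.append({"course": course, "distance_m": d, "time_s": a * km**2 + 58 * km + 1})

coefs = quadratic_coef_by_course(pd.DataFrame(rows))
```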

  29. quadratic coefficient is positive
    convex downwards

  32. Convex shape
                     2014  2015
    Convex downward    11    14
    Convex upward       8     5
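
A table like this one can be built by classifying each fitted quadratic coefficient by sign and cross-tabulating it against the year. What each counted fit corresponds to is not stated on the slide; the sketch assumes a hypothetical DataFrame with one row per fit and columns `year` and `quad_coef`, filled with made-up values.

```python
import numpy as np
import pandas as pd

def convexity_table(fits: pd.DataFrame) -> pd.DataFrame:
    """Count fits by convex shape (rows) and year (columns).

    A coefficient <= 0 is counted as convex upward.
    """
    shape = np.where(fits["quad_coef"] > 0, "Convex downward", "Convex upward")
    return pd.crosstab(pd.Series(shape, index=fits.index, name="Convex shape"), fits["year"])

fits = pd.DataFrame({
    "year": [2014, 2014, 2014, 2015, 2015],
    "quad_coef": [0.8, -0.3, 1.2, 0.5, 0.4],  # illustrative values only
})
table = convexity_table(fits)
```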

  33. Tokyo Racecourse

  34. Kyoto Racecourse

  35. Hanshin Racecourse

  36. Nakayama Racecourse

  37. It is almost meaningless to compare speeds
    when the racecourse or distance is
    different.

  38. Lessons learned
    • Fatigue is not a significant factor in
    horse racing.
    • Knowing the target domain is very
    important.

  39. Value of data analysis

  40. ROI
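
One common way to measure the return on betting is the payback rate: total payout divided by total stake, where a value above 1.0 (100%) means a profit. A minimal sketch with made-up tickets; the helper name and the 100-yen stake per ticket are assumptions for illustration.

```python
def payback_rate(stakes, payouts):
    """Total payout divided by total stake; a result above 1.0 means a profit."""
    total_stake = sum(stakes)
    if total_stake == 0:
        raise ValueError("nothing was staked")
    return sum(payouts) / total_stake

# Made-up example: ten 100-yen tickets, one of which paid out 850 yen.
stakes = [100] * 10
payouts = [0] * 9 + [850]
rate = payback_rate(stakes, payouts)  # 0.85, i.e. 85%
roi = rate - 1.0                      # about -0.15, i.e. a 15% loss
```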

  41. JRA
    Payback
    Japanese horse racing system

  42. Human predictions are
    reasonably accurate

  43. Favorite rank  Win rate (%)
      1              32.69
      2              18.85
      3              13.22
      4              9.41
      5              7.08
      6              5.45
      7              3.93
      8              2.86
      9              2.12
      10             1.45
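
Win rates like these come from grouping every starter by its betting-favorite rank and taking the share that finished first. A sketch assuming hypothetical columns `fav_rank` (1 = most-backed) and `finish` (1 = winner); the helper name is hypothetical and the eight rows are made up.

```python
import pandas as pd

def win_rate_by_favorite(entries: pd.DataFrame) -> pd.Series:
    """Percentage of starters that won, grouped by betting-favorite rank."""
    won = entries["finish"].eq(1)  # True where the horse finished first
    return won.groupby(entries["fav_rank"]).mean() * 100

# Four made-up two-horse races.
entries = pd.DataFrame({
    "race_id":  [1, 1, 2, 2, 3, 3, 4, 4],
    "fav_rank": [1, 2, 1, 2, 1, 2, 1, 2],
    "finish":   [1, 2, 2, 1, 1, 2, 1, 2],
})
rates = win_rate_by_favorite(entries)  # rank 1: 75.0, rank 2: 25.0
```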

  44. In North America
    Public betting favorites win approximately 33 percent of all
    races and finish second 53 percent of the time. Second choices
    win approximately 21 percent of all races and finish second 42
    percent of the time. So the top two choices win 54 percent of
    the races and finish second 74 percent of the time. You might
    even want to consider the fact that third choices win
    approximately 14 percent of all races run over the course of a
    year.
    http://www.predictem.com/horse/profit.php

  46. Strategy
    • We humans sometimes put too much emphasis
    on certain factors and ignore others.
    • That is where data analysis can make a
    difference.

  47. Strategy X

  48. Accumulated Payback
    Sequence of tickets bought from Jan. 1 to Dec. 31
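
Assuming the chart shows running profit (payouts minus stakes) after each ticket in purchase order, the curve is a cumulative sum. A sketch with five made-up 100-yen tickets:

```python
import numpy as np

# Made-up tickets in purchase order: stake and payout in yen.
stakes = np.array([100, 100, 100, 100, 100])
payouts = np.array([0, 0, 1250, 0, 0])

# Running profit after each ticket.
accumulated = np.cumsum(payouts - stakes)  # [-100, -200, 950, 850, 750]
```

Passing `accumulated` to `matplotlib.pyplot.plot` would draw it against the ticket sequence.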

  49. Historical Record
    Payback

  52. [Chart with axes Year and Profit (Yen)]
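
Yearly profit is the same payout-minus-stake difference summed per calendar year. A sketch assuming a hypothetical ticket table with `date`, `stake` and `payout` columns; the four tickets are made up and are not the talk's results.

```python
import pandas as pd

# Made-up tickets; not the results shown in the talk.
tickets = pd.DataFrame({
    "date": pd.to_datetime(["2014-03-01", "2014-07-12", "2015-05-03", "2015-11-22"]),
    "stake": [100, 100, 100, 100],
    "payout": [0, 1250, 0, 0],
})

# Profit per ticket, summed by calendar year of purchase.
profit = tickets["payout"] - tickets["stake"]
profit_by_year = profit.groupby(tickets["date"].dt.year).sum()  # 2014: 1050, 2015: -200
```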

  53. Lessons learned
    • Find undervalued horses.
    • The win rate of strategy X is 8.8%.
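
An 8.8% win rate only pays off if the winning tickets pay well. Assuming an equal stake on every ticket, the expected return per yen staked is the win rate times the average payout multiple on winners, so breaking even requires that multiple to exceed 1 / 0.088, about 11.4. The helper name is hypothetical:

```python
def break_even_multiple(win_rate: float) -> float:
    """Average payout per unit stake that winning tickets need for break-even."""
    if not 0 < win_rate <= 1:
        raise ValueError("win_rate must be in (0, 1]")
    # Expected return per unit stake = win_rate * multiple; set it equal to 1.
    return 1.0 / win_rate

multiple = break_even_multiple(0.088)  # about 11.36
```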

  54. Next Goal
    • Use machine learning to tune the parameters of
    strategy X.
