anonaka
September 03, 2017

# Introduction to the data analysis using python

PyCon APAC 2017 presentation

## Transcript

1. Introduction to the data
analysis using python
Akira Nonaka
XOXZO

2. Be interactive

3. Who am I?
• XOXZO Evangelist
• A Flying Python Programmer

4. • XOXZO provides SMS and telephony APIs
• No office
• Everybody works remotely

5. Ten people
Six countries
Nine cities

6. Outline
1. Tools
2. Domain knowledge
3. Value of data analysis

7. Tools

8. Tools
• Python
• numpy
• pandas
• matplotlib
• Jupyter notebook

9. Domain knowledge

10. Why horse racing data?
• Over 30 years of official data.
• Clean data. No need for scraping.
• Some chance of making money?

11. Let’s take a look at running speed

12. Speed
• The faster horse wins the race
• Distance and Time
• Regression analysis (linear model)
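
A minimal sketch of the linear model named above: finish time is fitted against race distance with `numpy.polyfit`, and the inverse of the slope gives an average speed. The distances and times are made-up illustrative values, not official race data.

```python
import numpy as np

# Illustrative (made-up) values, not real race results.
distance_m = np.array([1000, 1200, 1400, 1600, 1800, 2000, 2400], dtype=float)
time_s = np.array([58.0, 70.5, 83.0, 96.0, 109.5, 122.5, 148.0])

# Least-squares fit of time = slope * distance + intercept.
slope, intercept = np.polyfit(distance_m, time_s, deg=1)

# The slope is in seconds per metre, so 1 / slope is metres per second.
speed_kmh = (1.0 / slope) * 3.6
print(f"time = {slope:.4f} * distance + {intercept:.2f}")
print(f"implied average speed: {speed_kmh:.1f} km/h")
```

With a fit like this, two horses that ran different distances can be compared by how far each finish time lies above or below the fitted line.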

13. • Horses run at about 60 km/h.
• I want to compare the speed of horse A, which runs 1 km, with that of horse B, which runs 2 km.

14. Is the relationship linear?

15. Hypothesis
• Horses must get tired when they run a long distance.
• Regression analysis with a quadratic model.
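
One way to test this hypothesis is to fit a quadratic and look at the sign of the leading coefficient. The sketch below uses synthetic data in which time deliberately grows slightly faster than linearly with distance; the numbers are assumptions for illustration only. The wording "convex downward" for a positive coefficient follows the terminology used in the slides.

```python
import numpy as np

# Synthetic data: a small positive quadratic term mimics fatigue.
distance_m = np.array([1000, 1200, 1400, 1600, 1800, 2000, 2400], dtype=float)
time_s = 0.060 * distance_m + 2.0e-6 * distance_m**2

# Least-squares fit of time = a * distance^2 + b * distance + c.
a, b, c = np.polyfit(distance_m, time_s, deg=2)

# a > 0 means time grows faster than linearly with distance,
# which is what the fatigue hypothesis predicts.
shape = "convex downward" if a > 0 else "convex upward"
print(f"a = {a:.3e} -> {shape}")
```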

16. Plot: Distance vs. Time

17. Convex upwards

18. Check other years

19. • Every racecourse has a different shape, straight length, corner radius, etc.
• It is not right to pool data from different racecourses.

20. Analysis by racecourse
Starting with Tokyo Racecourse
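
A per-racecourse analysis can be sketched by grouping the data with pandas and fitting the quadratic model separately for each group. The column names (`racecourse`, `distance_m`, `time_s`), the placeholder course names, and all values below are hypothetical; the slides do not show the actual schema.

```python
import numpy as np
import pandas as pd

# Hypothetical schema and made-up values; real data would have far more rows.
races = pd.DataFrame({
    "racecourse": ["course_A"] * 4 + ["course_B"] * 4,
    "distance_m": [1400, 1600, 2000, 2400, 1200, 1600, 2000, 2400],
    "time_s": [83.0, 95.5, 121.5, 148.5, 70.0, 95.5, 120.5, 145.0],
})

def quadratic_coefficient(group: pd.DataFrame) -> float:
    """Return a from the fit time = a*d^2 + b*d + c for one racecourse."""
    a, _b, _c = np.polyfit(group["distance_m"], group["time_s"], deg=2)
    return float(a)

# Fit each racecourse on its own instead of pooling all courses together.
coeffs = {
    course: quadratic_coefficient(group)
    for course, group in races.groupby("racecourse")
}

for course, a in coeffs.items():
    shape = "convex downward" if a > 0 else "convex upward"
    print(f"{course}: a = {a:.3e} ({shape})")
```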

21. Quadratic coefficient is positive
Convex downwards

22. Convex shape

| | 2014 | 2015 |
|---|---|---|
| Convex downward | 11 | 14 |
| Convex upward | 8 | 5 |

23. Tokyo Racecourse

24. Kyoto Racecourse

25. Hanshin Racecourse

26. Nakayama Racecourse

27. It is almost meaningless to compare the
speed if the racecourse/distance is
different

28. Lessons learned
• Fatigue is not a significant factor in horse racing.
• Knowing the target domain is very important.

29. Value of data analysis

30. ROI

31. Japanese horse racing system
Diagram labels: JRA, Pay back

32. Human predictions are
reasonably accurate

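33. Win rate by favorite rank

| Win Fav (rank) | Win Rate (%) |
|---|---|
| 1 | 32.69 |
| 2 | 18.85 |
| 3 | 13.22 |
| 4 | 9.41 |
| 5 | 7.08 |
| 6 | 5.45 |
| 7 | 3.93 |
| 8 | 2.86 |
| 9 | 2.12 |
| 10 | 1.45 |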
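
Win rates by favorite rank can be computed with a pandas group-by. The sketch below assumes a hypothetical table with one row per runner, holding its betting-popularity rank (1 = favorite) and its finishing position; the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical schema, made-up results: 4 races with 3 runners each.
runners = pd.DataFrame({
    "race_id":       [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "favorite_rank": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    "finish_pos":    [1, 2, 3, 2, 1, 3, 1, 3, 2, 3, 2, 1],
})

# A runner wins when it finishes first; the mean of that flag
# within each favorite rank is the win rate for that rank.
win_rate_pct = (
    runners.assign(won=runners["finish_pos"].eq(1))
    .groupby("favorite_rank")["won"]
    .mean()
    .mul(100)
)
print(win_rate_pct)
```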

34. In North America
Public betting favorites win approximately 33 percent of all
races and finish second 53 percent of the time. Second choices
win approximately 21 percent of all races and finish second 42
percent of the time. So the top two choices win 54 percent of
the races and finish second 74 percent of the time. You might
even want to consider the fact that third choices win
approximately 14 percent of all races run over the course of a
year.
http://www.predictem.com/horse/profit.php

35. Strategy
• We humans sometimes put too much emphasis
on certain factors and ignore others.
• That is where data analysis can make a
difference.

36. Strategy X

37. Plot: Accumulated payback over the sequence of tickets bought from Jan. 1 to Dec. 31
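
An accumulated curve of this kind can be built with a cumulative sum over the tickets in purchase order. The slides do not state how the payback figure is defined, so the sketch below shows both the running payout and the running profit, assuming a flat 100-yen stake per ticket and made-up payouts.

```python
import numpy as np

# Assumptions: flat 100-yen stake; payouts are made up (0 = losing ticket).
stake_yen = 100
payout_yen = np.array([0, 0, 0, 1250, 0, 0, 0, 0, 0, 980, 0, 0])

# Running totals in purchase order.
accumulated_payback = np.cumsum(payout_yen)
accumulated_profit = np.cumsum(payout_yen - stake_yen)

# Payback rate = total paid out / total staked.
payback_rate = payout_yen.sum() / (stake_yen * len(payout_yen))
print(accumulated_profit)
print(f"payback rate: {payback_rate:.2%}")
```

A curve like this typically falls steadily during losing runs and jumps at each winning ticket, which is why a low win rate can still end the year in profit.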

38. Historical Record
Payback

39. Plot: Profit (Yen) by Year

40. Lessons learned
• Find undervalued horses.
• Win rate of strategy X is 8.8%.
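
To see what an 8.8% win rate implies, the sketch below works out the average winning odds needed to break even. It assumes flat stakes on win bets and decimal odds (the payout per unit staked, stake included); the slides do not state how strategy X sizes its bets, so this is only an illustration.

```python
win_rate = 0.088  # from the slide: strategy X wins 8.8% of the time

# Expected payout per unit staked = win_rate * average winning odds,
# so break-even requires average winning odds of 1 / win_rate.
break_even_odds = 1.0 / win_rate

def expected_return(avg_win_odds: float, p_win: float = win_rate) -> float:
    """Expected payout per unit staked, assuming flat stakes on win bets."""
    return p_win * avg_win_odds

print(f"break-even average odds: {break_even_odds:.2f}")
```

Under these assumptions the strategy is profitable only if its winners pay, on average, more than about 11.4 times the stake, which is consistent with the goal of finding undervalued horses.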

41. Next Goal
• Use machine learning to tune parameters of
strategy X