Slide 1

Slide 1 text

Introduction to Data Analysis Using Python, Akira Nonaka (XOXZO)

Slide 2

Slide 2 text

Be interactive

Slide 3

Slide 3 text

Who am I? • XOXZO Evangelist • A Flying Python Programmer

Slide 4

Slide 4 text

About XOXZO • Provides SMS and telephony APIs • No office • Everybody works remotely

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

• Ten people • Six countries • Nine cities

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Outline 1. Tools 2. Domain knowledge 3. Value of data analysis

Slide 9

Slide 9 text

Tools

Slide 10

Slide 10 text

Tools • Python • numpy • pandas • matplotlib • Jupyter notebook

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

Domain knowledge

Slide 15

Slide 15 text

Why horse racing data? • Over 30 years of official data. • Clean data; no scraping needed. • Some chance of making money?

Slide 16

Slide 16 text

Let’s take a look at running speed

Slide 17

Slide 17 text

Speed • The faster horse wins the race • Distance and time • Regression analysis (linear model)
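A minimal sketch of the linear model, assuming race time is regressed on distance with NumPy's least-squares `np.polyfit`. The distances and times are made-up illustrative values, not real race records.

```python
import numpy as np

# Hypothetical sample (illustrative only): distances in metres, winning times in seconds.
distance_m = np.array([1200, 1400, 1600, 1800, 2000, 2400])
time_s = np.array([71.2, 82.9, 95.4, 108.1, 120.6, 146.0])

# Degree-1 least-squares fit: time = slope * distance + intercept.
slope, intercept = np.polyfit(distance_m, time_s, deg=1)

# The slope is seconds per metre; 3.6 / slope converts its inverse to km/h.
speed_kmh = 3.6 / slope
print(f"time ~ {slope:.4f} s/m * distance + {intercept:.2f} s")
print(f"implied average speed ~ {speed_kmh:.1f} km/h")
```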

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

• Horses run at about 60 km/h. • I want to compare the speed of horse A, which runs 1 km, with that of horse B, which runs 2 km.
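One simple way to put a 1 km run and a 2 km run on the same scale is average speed, distance divided by time. The sketch below uses hypothetical times; note that it silently assumes a constant pace, which is exactly what the following slides question.

```python
def average_speed_kmh(distance_m: float, time_s: float) -> float:
    """Average speed in km/h from a distance in metres and a time in seconds."""
    return distance_m / time_s * 3.6

# Hypothetical times: horse A covers 1 km, horse B covers 2 km.
speed_a = average_speed_kmh(1000, 60.0)   # 1 km in 60 s
speed_b = average_speed_kmh(2000, 122.0)  # 2 km in 122 s
print(f"A: {speed_a:.1f} km/h, B: {speed_b:.1f} km/h")
```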

Slide 20

Slide 20 text

Is the relationship linear?

Slide 21

Slide 21 text

Hypothesis • Horses must get tired over long distances. • Regression analysis with a quadratic model.
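A sketch of the quadratic model, again assuming `np.polyfit` (now with `deg=2`). The data is synthetic and built so that each extra metre costs slightly more time, which is what the fatigue hypothesis would predict; the helper names are hypothetical.

```python
import numpy as np

def fit_quadratic(distance_m, time_s):
    """Fit time = a*d**2 + b*d + c and return (a, b, c), highest power first."""
    a, b, c = np.polyfit(distance_m, time_s, deg=2)
    return a, b, c

def curve_shape(a: float) -> str:
    """Describe the fitted curve by the sign of its quadratic coefficient."""
    if a > 0:
        return "convex downward"
    if a < 0:
        return "convex upward"
    return "linear"

# Synthetic data where the marginal time per metre grows with distance.
distance_m = np.array([1200, 1600, 2000, 2400, 3000, 3200])
time_s = 0.058 * distance_m + 2e-6 * distance_m**2

a, b, c = fit_quadratic(distance_m, time_s)
print(a, curve_shape(a))
```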

Slide 22

Slide 22 text

Chart axes: Distance, Time

Slide 23

Slide 23 text

Quadratic coefficient is negative: convex upwards
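Why the sign matters, assuming the model is time t as a quadratic function of distance x with coefficients a, b, c:

```latex
t(x) = a x^2 + b x + c, \qquad
\frac{dt}{dx} = 2 a x + b, \qquad
\frac{d^2 t}{dx^2} = 2 a
```

The first derivative is the marginal time per extra metre. Fatigue would make it grow with distance (a > 0, convex downwards). With a < 0 it shrinks as x grows, so this fit points the opposite way from the fatigue hypothesis.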

Slide 24

Slide 24 text

Check other years

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

Expert advice • Every racecourse has a different shape, length of the straight, corner radius, etc. • It is not valid to pool data from different racecourses.

Slide 28

Slide 28 text

Analysis by racecourse • Starting with Tokyo Racecourse
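Fitting one curve per racecourse instead of pooling everything could look like the pandas sketch below. The table and its column names (`racecourse`, `distance_m`, `time_s`) are assumptions for illustration, not the schema of the official data.

```python
import numpy as np
import pandas as pd

# Hypothetical race table; values and column names are illustrative only.
races = pd.DataFrame({
    "racecourse": ["Tokyo"] * 4 + ["Kyoto"] * 4,
    "distance_m": [1400, 1600, 2000, 2400, 1200, 1600, 2000, 3000],
    "time_s": [82.0, 94.5, 120.3, 145.1, 70.1, 95.2, 121.0, 184.5],
})

def quadratic_coefficient(group: pd.DataFrame) -> float:
    """Leading coefficient a of time = a*d**2 + b*d + c for one racecourse."""
    return np.polyfit(group["distance_m"], group["time_s"], deg=2)[0]

# One fit per racecourse; the result is a Series indexed by racecourse name.
coef_by_course = (
    races.groupby("racecourse")[["distance_m", "time_s"]]
    .apply(quadratic_coefficient)
)
print(coef_by_course)
```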

Slide 29

Slide 29 text

Quadratic coefficient is positive: convex downwards

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

Convex shape       2014   2015
Convex downward      11     14
Convex upward         8      5
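A tally like this can be produced with `pd.crosstab` once each fitted curve has a year and a quadratic coefficient. The slide does not say what each counted curve corresponds to, so the small `fits` table below is purely hypothetical.

```python
import pandas as pd

# Hypothetical fit results: one row per fitted curve.
fits = pd.DataFrame({
    "year": [2014, 2014, 2014, 2015, 2015, 2015],
    "quad_coef": [3e-6, -1e-6, 2e-6, 4e-6, 1e-6, 5e-6],
})

# Positive coefficient -> convex downward; otherwise (including exactly 0) upward.
fits["shape"] = fits["quad_coef"].map(
    lambda a: "Convex downward" if a > 0 else "Convex upward"
)

# Rows: shape, columns: year, cells: number of fitted curves.
table = pd.crosstab(fits["shape"], fits["year"])
print(table)
```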

Slide 33

Slide 33 text

Tokyo Racecourse

Slide 34

Slide 34 text

Kyoto Racecourse

Slide 35

Slide 35 text

Hanshin Racecourse

Slide 36

Slide 36 text

Nakayama Racecourse

Slide 37

Slide 37 text

Comparing speeds is almost meaningless when the racecourse or distance differs.
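One way to respect this is to compare times only inside groups that share both racecourse and distance. A sketch with hypothetical horses and times, using pandas' grouped `rank`:

```python
import pandas as pd

# Hypothetical results; only rows sharing racecourse AND distance are compared.
runs = pd.DataFrame({
    "horse": ["A", "B", "C", "D"],
    "racecourse": ["Tokyo", "Tokyo", "Tokyo", "Kyoto"],
    "distance_m": [1600, 1600, 2000, 1600],
    "time_s": [94.8, 95.3, 120.9, 95.0],
})

# Rank 1.0 = fastest within its own (racecourse, distance) group.
runs["rank_in_group"] = (
    runs.groupby(["racecourse", "distance_m"])["time_s"].rank(method="min")
)
print(runs)
```

Horse D is not ranked against A and B even though its 1600 m time is similar, because it ran at a different racecourse.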

Slide 38

Slide 38 text

Lessons learned • Fatigue is not a significant factor in horse racing. • Knowing the target domain is very important.

Slide 39

Slide 39 text

Value of data analysis

Slide 40

Slide 40 text

ROI

Slide 41

Slide 41 text

JRA payback: the Japanese horse racing system
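Japanese racing betting is pari-mutuel: the operator keeps a share of the pool and the remainder is split among winning tickets. A simplified sketch of how that fixes the odds; the slide does not give the payback rate, so `payback_rate=0.8` is an assumed placeholder, and real payouts also involve rounding rules not modelled here.

```python
def parimutuel_odds(pool_total: float, stake_on_winner: float,
                    payback_rate: float = 0.8) -> float:
    """Decimal odds (payout per 1 unit staked) in a simplified pari-mutuel pool.

    payback_rate is an ASSUMED share of the pool returned to bettors.
    """
    return pool_total * payback_rate / stake_on_winner

# Hypothetical win pool: 1,000,000 yen in total, 250,000 yen on the winner.
odds = parimutuel_odds(1_000_000, 250_000)
print(odds)  # 3.2 -> a 100-yen ticket returns 320 yen
```

Because only the payback share of the pool is returned, bettors as a group get back that fraction of what they stake, so betting without an edge loses money on average.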

Slide 42

Slide 42 text

Human predictions are reasonably accurate

Slide 43

Slide 43 text

Favorite rank   Win rate (%)
 1              32.69
 2              18.85
 3              13.22
 4               9.41
 5               7.08
 6               5.45
 7               3.93
 8               2.86
 9               2.12
10               1.45
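A table like this is a grouped mean: the win rate for each favorite rank is the average of a 0/1 "won" column. A sketch on a tiny hypothetical results table (column names are assumptions):

```python
import pandas as pd

# Hypothetical results: one row per runner, with its favorite rank
# (1 = most heavily backed) and whether it won (1) or not (0).
runners = pd.DataFrame({
    "race_id":  [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "fav_rank": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "won":      [1, 0, 0, 0, 1, 0, 1, 0, 0],
})

# Mean of the 0/1 column per rank, expressed as a percentage.
win_rate = runners.groupby("fav_rank")["won"].mean().mul(100).round(2)
print(win_rate)
```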

Slide 44

Slide 44 text

In North America: Public betting favorites win approximately 33 percent of all races and finish second 53 percent of the time. Second choices win approximately 21 percent of all races and finish second 42 percent of the time. So the top two choices win 54 percent of the races and finish second 74 percent of the time. You might even want to consider the fact that third choices win approximately 14 percent of all races run over the course of a year. http://www.predictem.com/horse/profit.php

Slide 45

Slide 45 text

No content

Slide 46

Slide 46 text

Strategy • We humans sometimes put too much emphasis on certain factors and ignore others. • That is where data analysis can make a difference.
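The details of strategy X are not given, but the general idea of exploiting mispriced horses can be stated as expected value: a bet is worth making when your own estimate of the win probability times the decimal odds exceeds 1. In this sketch `win_prob` is assumed to come from your own model, and the function names are hypothetical.

```python
def expected_return(win_prob: float, decimal_odds: float) -> float:
    """Expected payout per 1 unit staked on a win bet."""
    return win_prob * decimal_odds

def is_undervalued(win_prob: float, decimal_odds: float) -> bool:
    """True if the odds pay more than the estimated win probability warrants."""
    return expected_return(win_prob, decimal_odds) > 1.0

# Hypothetical estimates: a 10% chance at odds of 12.0 beats the market;
# a 30% chance at odds of 2.5 does not.
print(is_undervalued(0.10, 12.0), is_undervalued(0.30, 2.5))
```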

Slide 47

Slide 47 text

Strategy X

Slide 48

Slide 48 text

Accumulated payback • Sequence of tickets bought from Jan. 1 to Dec. 31
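The slide does not define "accumulated payback" precisely, so this sketch tracks two running quantities over a hypothetical ticket log in purchase order: cumulative profit and the cumulative payback rate (total returned divided by total staked), both computed with `np.cumsum`.

```python
import numpy as np

# Hypothetical ticket log in purchase order (Jan. 1 -> Dec. 31), in yen.
stakes = np.array([100, 100, 100, 100, 100])
payouts = np.array([0, 0, 1250, 0, 0])  # 0 for a losing ticket

# Running profit after each ticket.
accumulated_profit = np.cumsum(payouts - stakes)
# Running payback rate: total returned so far / total staked so far.
accumulated_payback = np.cumsum(payouts) / np.cumsum(stakes)

print(accumulated_profit)
print(accumulated_payback)
```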

Slide 49

Slide 49 text

Historical Record, Payback

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

No content

Slide 52

Slide 52 text

Year   Profit (Yen)

Slide 53

Slide 53 text

Lessons learned • Find undervalued horses. • The win rate of strategy X is 8.8%.
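An 8.8% win rate can still be profitable if the winners pay enough. Assuming equal stakes on every ticket, the average decimal odds of the winning tickets must exceed 1 / 0.088 for the strategy to break even:

```python
def break_even_odds(win_rate: float) -> float:
    """Average decimal odds winning tickets must pay to break even (equal stakes)."""
    return 1.0 / win_rate

print(round(break_even_odds(0.088), 2))  # 11.36
```

So with strategy X's hit rate, winners need to average better than roughly 11.4 times the stake.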

Slide 54

Slide 54 text

Next goal • Use machine learning to tune the parameters of strategy X

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

No content