Slide 1

Slide 1 text

F3S Workshop: Try R, a Statistical Data Analysis Tool
Takahiro Sumiya, Information Media Center, Hiroshima University

Slide 2

Slide 2 text

‣ R is a language and environment for statistical computing and graphics
https://www.r-project.org/about.html

Slide 3

Slide 3 text

[Diagram: What you want to do → Algorithm → Program → Do it! Labels: language, environment. The program box shows a truncated C fragment: int main(int argc,char ... int i=0; char c; while(i==0){ c=getchar();]

Slide 4

Slide 4 text

Agenda
1. Preparation (Installing RStudio)
2. Invoking commands and finding the results
3. Importing data from Excel
4. Summarizing data
5. Visualizing data shape
6. Try a few statistical methods

Slide 5

Slide 5 text

1. Preparation

Slide 6

Slide 6 text

Preparing R
‣ R
✓ The R engine
✓ Includes a minimal environment
‣ RStudio
✓ A more convenient environment
✓ For your daily use
✓ Requires the R engine

Slide 7

Slide 7 text

Installing R
https://www.r-project.org/about.html
‣ Select a Japanese mirror site on the CRAN page
‣ Download R for (Mac) OS X → R-3.5.1.pkg
‣ Download R for Windows → Install R for the first time → Download R 3.5.1 for Windows

Slide 8

Slide 8 text

Installing RStudio
https://www.rstudio.com
The installers are listed at the bottom:
‣ RStudio 1.1.463 - Windows Vista/7/8/10
‣ RStudio 1.1.463 - Mac OS X 10.6+ (64-bit)

Slide 9

Slide 9 text

Launch RStudio and check the version
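The version can also be checked from the R console. A minimal sketch using two base R facilities, `R.version.string` and `getRversion()`; the comparison against 3.5.1 is only an illustration based on the version named in the installation slides.

```r
# Print the full version string of the running R engine
print(R.version.string)

# getRversion() returns a version object that can be compared with a string
v <- getRversion()
is_new_enough <- v >= "3.5.1"   # TRUE if R is at least the version from the install slide
print(is_new_enough)
```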

Slide 10

Slide 10 text

2. Invoking commands and finding the results

Slide 11

Slide 11 text

Input here

Slide 12

Slide 12 text

R is a calculator
‣ Try entering mathematical expressions
> 1+1
> 100/3
> 100/3*3
‣ You can use parentheses
> (3+5)*7
> 3+5*7 # check the result
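The last two expressions differ only in their parentheses. A small sketch of why the results differ; the variable names are made up for illustration.

```r
# Multiplication and division bind tighter than addition
no_parens   <- 3 + 5 * 7      # 5 * 7 is evaluated first
with_parens <- (3 + 5) * 7    # parentheses force the addition first

# Operators of equal precedence are evaluated left to right
left_to_right <- 100 / 3 * 3  # evaluated as (100 / 3) * 3

print(c(no_parens, with_parens, left_to_right))
```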

Slide 13

Slide 13 text

Save R commands to a File
‣ You can bundle several files into a "Project"
✓ Command file (script file)
✓ Data file
✓ Visualization
‣ File → New Project → New Directory → New Project

Slide 14

Slide 14 text

Save R commands to a File
‣ File → New File → R Script
‣ Type a command and press Ctrl+Enter to run it
‣ Results appear in the console

Slide 15

Slide 15 text

R is a high-level calculator
‣ Power
> 2^16
‣ Functions
> sin(pi/2) # sin(π/2)
> exp(1) # e^1
> factorial(10) # 10!
> choose(5,2) # 5C2
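A short sketch that stores the results of these calls and checks `choose()` against the factorial formula n! / (k! (n - k)!). The variable names are illustrative only.

```r
power <- 2^16            # 2 to the 16th power
sine  <- sin(pi / 2)     # pi is a built-in constant
e     <- exp(1)          # Euler's number
fact  <- factorial(10)   # 10!
comb  <- choose(5, 2)    # number of ways to pick 2 items out of 5

# choose(n, k) agrees with n! / (k! * (n - k)!)
by_formula <- factorial(5) / (factorial(2) * factorial(3))

print(c(power, sine, e, fact, comb, by_formula))
```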

Slide 16

Slide 16 text

Try a graph
> plot(c(5,5,4,3,3,4,1,1,1))
> x=c(5,5,4,3,3,4,1,1,1) # variable definition
> plot(x) # simple!
> plot(x,type="b") # What is this data?
> plot(x,type="b",ylim=c(6,1))
> yr=2010:2018
> plot(yr,x,type="b",ylim=c(6,1))
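Passing `ylim=c(6,1)` with the larger value first flips the y axis, so smaller values are drawn higher up. A minimal sketch, assuming the values from the first `plot()` call on the slide; `pdf(NULL)` and `dev.off()` are added only so the code runs without a screen and can be dropped in RStudio. The axis labels are illustrative.

```r
x  <- c(5, 5, 4, 3, 3, 4, 1, 1, 1)   # values from the slide's first plot() call
yr <- 2010:2018                      # one year per value

pdf(NULL)                            # null graphics device: nothing is written to disk
plot(yr, x, type = "b", ylim = c(6, 1),
     xlab = "year", ylab = "x")      # type = "b" draws both points and lines
limits <- par("usr")[3:4]            # y-axis extent as (bottom, top)
dev.off()

print(limits)                        # bottom is larger than top because the axis is flipped
```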

Slide 17

Slide 17 text

In R, a variable (object) is a vector
> x # Just type the name of the variable
> x+10 # Check the result
> x+c(10,100)
> x[1] # The first value of the vector
> x[c(1,3,5)] # the 1st, 3rd, 5th values
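The `x+c(10,100)` line relies on recycling: the shorter vector is repeated until it matches the longer one. A small sketch, assuming `x` holds the nine values from the graph slide; `suppressWarnings()` is used here only to keep the output quiet.

```r
x <- c(5, 5, 4, 3, 3, 4, 1, 1, 1)

shifted <- x + 10                  # 10 is added to every element

# c(10, 100) is repeated as 10, 100, 10, 100, ...
# Because 9 is not a multiple of 2, R normally prints a warning here.
recycled <- suppressWarnings(x + c(10, 100))

first  <- x[1]                     # indices start at 1, not 0
picked <- x[c(1, 3, 5)]            # index with a vector of positions

print(recycled)
```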

Slide 18

Slide 18 text

Close and re-open it
‣ Save the script file
‣ Quit RStudio
‣ Check the script file in the folder
‣ Re-open it by double-clicking the project file

Slide 19

Slide 19 text

3. Importing data from Excel

Slide 20

Slide 20 text

Import Excel data via CSV
‣ Download "carp-e.xlsx" from Bb9
‣ Open it with Excel and check it
‣ Save as "CSV (UTF-8)" → carp.csv
‣ On Windows, save as "CSV"

Slide 21

Slide 21 text

"Data frame"
> read.csv("carp.csv") # just print
> c=read.csv("carp.csv") # save in "c"
> c # print the content
‣ A "data frame" binds several named vectors together
> c$height # vector with name "height"
> c$height[3] # third value of the vector
> mean(c$height) # average of "height"
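The same operations can be tried without the CSV file. A minimal sketch with a hand-built data frame; the name `players` and all values are hypothetical stand-ins for the contents of carp.csv.

```r
# Hypothetical values; the real data comes from carp.csv
players <- data.frame(
  name   = c("A", "B", "C"),
  height = c(178, 185, 172),   # assumed to be in cm
  weight = c(80, 92, 70)       # assumed to be in kg
)

h     <- players$height        # one column, as a plain vector
third <- players$height[3]     # third value of that vector
avg   <- mean(players$height)  # average of the column

print(avg)
```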

Slide 22

Slide 22 text

"Data frame" (cont.)
BMI = Weight(kg) / Height(m)^2
> c$BMI=c$weight/(c$height/100)^2 # Calculate Body Mass Index
> c[c$BMI>30,] # Players whose BMI is larger than 30
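A worked sketch of the BMI column and the row filter, again on hypothetical data (the name `players` and its values are not from carp.csv). Height is assumed to be in centimetres, hence the division by 100.

```r
# Hypothetical players; height in cm, weight in kg
players <- data.frame(
  name   = c("A", "B", "C"),
  height = c(180, 175, 170),
  weight = c(81, 98, 60)
)

# BMI = weight (kg) / height (m)^2
players$BMI <- players$weight / (players$height / 100)^2

# The condition picks rows; the empty slot after the comma keeps all columns
heavy <- players[players$BMI > 30, ]

print(heavy)
```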

Slide 23

Slide 23 text

Advanced 1: You can read an Excel file directly
> install.packages("readxl")
> library(readxl)
> carp=read_excel("carp.xlsx",1) # first sheet
> dragons=read_excel("carp.xlsx",2) # second sheet

Slide 24

Slide 24 text

4. Summarizing data

Slide 25

Slide 25 text

First, try summary()
✓ summary(c)
‣ Numerical data
✓ Min. (minimum value), 1st Qu. (the 1st quartile), Median, Mean, 3rd Qu. (the 3rd quartile), Max. (maximum value)
‣ Categorical data
✓ Frequency
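A small sketch of both kinds of summary on hypothetical vectors. One caveat: from R 4.0 on, `read.csv()` keeps text columns as character strings by default, so a text column may need `factor()` (or `stringsAsFactors = TRUE`) before `summary()` reports frequencies.

```r
# Hypothetical numeric data
height <- c(170, 175, 180, 185, 190)
s_num  <- summary(height)       # Min., 1st Qu., Median, Mean, 3rd Qu., Max.

# Hypothetical categorical data, stored as a factor
position <- factor(c("pitcher", "pitcher", "catcher", "infielder", "pitcher"))
s_cat    <- summary(position)   # counts per category

print(s_num)
print(s_cat)
```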

Slide 26

Slide 26 text

Summary of data groups
> # summary of players whose BMI is larger than 26
> summary(c[c$BMI>26,])
> # summary of height by position
> tapply(c$height,c$position,summary)
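`tapply()` splits the first vector by the groups in the second and applies the function to each group. A minimal sketch on hypothetical data, using `mean` instead of `summary` to keep the result to one number per group.

```r
# Hypothetical players
players <- data.frame(
  position = c("pitcher", "pitcher", "catcher", "catcher"),
  height   = c(186, 182, 176, 180)
)

# One mean height per position
by_pos <- tapply(players$height, players$position, mean)

print(by_pos)
```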

Slide 27

Slide 27 text

5. Visualizing data shape

Slide 28

Slide 28 text

On a Mac, you need to set a font to use Japanese text in graphics
> # Specify Hiragino Kaku Gothic W3 as the font
> par(family = "HiraKakuProN-W3")

Slide 29

Slide 29 text

Histogram, scatter plot, box-and-whisker plot
> hist(c$height) # histogram
> barplot(table(c$position))
> plot(c$height,c$weight) # scatter plot
> plot(c) # scatterplot matrix
> plot(c[,c(4,5,6,9)])
> boxplot(c$height) # box-and-whisker plot
> boxplot(c$height[c$position=="pitcher"],
    c$height[c$position!="pitcher"],
    names=c("pitcher","other"),ylab="height")
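The two-group box plot can also be written with R's formula interface, which is an alternative to the slide's form rather than a replacement. A sketch on hypothetical data; `players`, `group`, and the values are made up, and `pdf(NULL)` is there only so the code runs without a screen.

```r
# Hypothetical players
players <- data.frame(
  position = c("pitcher", "pitcher", "pitcher", "catcher", "infielder", "outfielder"),
  height   = c(186, 182, 190, 176, 178, 181)
)

# Label each player as "pitcher" or "other"
players$group <- factor(ifelse(players$position == "pitcher", "pitcher", "other"))

pdf(NULL)                                   # null device: draw without opening a window
b <- boxplot(height ~ group, data = players,
             ylab = "height")               # one box per level of group
dev.off()

print(b$names)                              # box labels, in factor-level order
print(b$n)                                  # number of observations per box
```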

Slide 30

Slide 30 text

Advanced 2: A package for psychology
> install.packages("psych")
> library(psych)
> pairs.panels(c[,c(4,5,6,9)])

Slide 31

Slide 31 text

How to draw? Google it! (1)

Slide 32

Slide 32 text

How to draw? Google it! (2)

Slide 33

Slide 33 text

6. Try a few statistical methods

Slide 34

Slide 34 text

Linear regression
> plot(c$weight ~ c$height)
> result=lm(c$weight ~ c$height)
> abline(result)
> summary(result)
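A sketch of the same regression on hypothetical data, written with the `data=` argument of `lm()`. This form is an alternative to the slide's `c$weight ~ c$height`; it also lets `predict()` accept new heights. All names and values below are illustrative.

```r
# Hypothetical players; height in cm, weight in kg
players <- data.frame(
  height = c(170, 175, 180, 185, 190),
  weight = c(68, 74, 80, 83, 90)
)

# Fit weight = intercept + slope * height
fit <- lm(weight ~ height, data = players)

slope     <- unname(coef(fit)["height"])        # kg per additional cm
intercept <- unname(coef(fit)["(Intercept)"])

# Predicted weight for a new, hypothetical height of 182 cm
predicted <- unname(predict(fit, newdata = data.frame(height = 182)))

print(c(intercept, slope, predicted))
```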

Slide 35

Slide 35 text

Among baseball players, is the proportion of left-handers significantly large?
Assuming the usual proportion is 0.1, try the binomial test.
> summary(c$throwing)
> binom.test(13,13+56,0.1)
> cp=c[c$position=="pitcher",]
> summary(cp$throwing)
> binom.test(11,11+23,0.1)
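`binom.test()` returns an object whose parts can be read individually. A sketch using the counts from the slide (13 left-handers out of 13+56 players). The one-sided variant with `alternative = "greater"` is an addition, not part of the slide: it asks specifically whether the proportion is larger than 0.1.

```r
lefty <- 13          # counts taken from the slide
total <- 13 + 56

# Two-sided test, as on the slide
res_two <- binom.test(lefty, total, p = 0.1)

# One-sided test: is the true proportion greater than 0.1?
res_greater <- binom.test(lefty, total, p = 0.1, alternative = "greater")

observed <- unname(res_two$estimate)   # observed proportion of left-handers

print(observed)
print(res_two$p.value)
print(res_greater$p.value)
```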

Slide 36

Slide 36 text

The advantages of R
‣ Designed for statistical computing
‣ Beautiful graphics
‣ Operations are recorded as a script (= text), so they can be replayed easily
‣ New statistical methods continue to be implemented in R
‣ It's free! Open source!