Try-R-en

Takahiro Sumiya

March 01, 2019

Transcript

  1. F3S Workshop: Try R, a Statistical Data Analysis Tool

    Takahiro Sumiya, Information Media Center, Hiroshima University
  2. ‣ R is a language and environment for statistical computing and graphics

    https://www.r-project.org/about.html
  3. What you want to do → Algorithm → Program → Do it!

    [Diagram labels: language, environment. Example program fragment, truncated on the slide:]
    int main(int argc,char
    int i=0;
    char c;
    while(i==0){
    c=getchar();
  4. Agenda

    1. Preparation (installing RStudio)
    2. Running commands and checking the results
    3. Importing data from Excel
    4. Summarizing data
    5. Visualizing the shape of the data
    6. Trying a few statistical methods
  5. Preparing R

    ‣ R
      ✓ The R engine
      ✓ Includes a minimal environment
    ‣ RStudio
      ✓ A more convenient environment
      ✓ For your daily use
      ✓ Requires the R engine
  6. Installing R

    https://www.r-project.org/about.html
    Select a mirror site in Japan on the CRAN page
    Download R for (Mac) OS X → R-3.5.1.pkg
    Download R for Windows → Install R for the first time → Download R 3.5.1 for Windows
  7. Installing RStudio

    https://www.rstudio.com (at the bottom)
    RStudio 1.1.463 - Windows Vista/7/8/10
    RStudio 1.1.463 - Mac OS X 10.6+ (64-bit)
  8. R is a calculator

    ‣ Try entering a mathematical expression
    > 1+1
    > 100/3
    > 100/3*3
    ‣ You can use parentheses
    > (3+5)*7
    > 3+5*7 # check the result
  9. Save R commands to a file

    ‣ You can group several files in a "Project"
      ✓ Command file (script file)
      ✓ Data file
      ✓ Visualization
    ‣ File → New Project → New Directory → New Project
  10. R is a high-level calculator

    ‣ Power
    > 2^16
    ‣ Functions
    > sin(pi/2) # $\sin(\pi/2)$
    > exp(1) # $e^{1}$
    > factorial(10) # $10!$
    > choose(5,2) # ${}_{5}C_{2}$
  11. Try a graph

    > plot(c(5,5,4,3,3,4,1,1,1))
    > x=c(5,5,4,3,3,4,1,1,1) # variable definition
    > plot(x) # simple!
    > plot(x,type="b") # What is this data?
    > plot(x,type="b",ylim=c(6,1))
    > yr=2010:2018
    > plot(yr,x,type="b",ylim=c(6,1))
  12. In R, a variable (object) is a vector

    > x # Just type the name of the variable
    > x+10 # Check the result
    > x+c(10,100)
    > x[1] # The first value of the vector
    > x[c(1,3,5)] # The 1st, 3rd, and 5th values
  13. Close and re-open it

    ‣ Save the script file
    ‣ Quit RStudio
    ‣ Check the script file in the folder
    ‣ Re-open it by double-clicking the project file
  14. Import Excel data via CSV

    ‣ Download "carp-e.xlsx" from Bb9
    ‣ Open it with Excel and check it
    ‣ Save as "CSV (UTF-8)" → carp.csv
    ‣ On Windows, save as "CSV"
  15. "Data frame"

    > read.csv("carp.csv") # just print
    > c=read.csv("carp.csv") # save in "c"
    > c # print the content
    ‣ A "data frame" binds several named vectors together
    > c$height # vector named "height"
    > c$height[3] # third value of the vector
    > mean(c$height) # average of "height"
  16. "Data frame" (cont.)

    > c$BMI=c$weight/(c$height/100)^2 # Calculate Body Mass Index
    > c[c$BMI>30,] # Players whose BMI is larger than 30
    $\mathrm{BMI} = \dfrac{\text{Weight (kg)}}{\text{Height (m)}^{2}}$
  17. Advanced 1: You can read an Excel file directly

    ‣ install.packages("readxl")
    ‣ library(readxl)
    ‣ carp=read_excel("carp.xlsx",1)
    ‣ dragons=read_excel("carp.xlsx",2)
  18. First, try summary()

    ✓ summary(c)
    ‣ Numerical data
      ✓ Min (minimum value), 1st Qu. (first quartile), Median, Mean, 3rd Qu. (third quartile), Max (maximum value)
    ‣ Categorical data
      ✓ Frequency
  19. Summary of a data group

    > # summary of players whose BMI is larger than 26
    > summary(c[c$BMI>26,])
    > # summary of height by position
    > tapply(c$height,c$position,summary)
  20. On a Mac, you need to set the font to use Japanese in graphics

    > # Specify Hiragino Kaku Gothic W3 as the font
    > par(family = "HiraKakuProN-W3")
  21. Histogram, scatter plot, box-and-whisker plot

    > hist(c$height) # histogram
    > barplot(table(c$position))
    > plot(c$height,c$weight) # scatter plot
    > plot(c) # scatterplot matrix
    > plot(c[,c(4,5,6,9)])
    > boxplot(c$height) # box-and-whisker plot
    > boxplot(c$height[c$position=="pitcher"], c$height[c$position!="pitcher"], names=c("pitcher","other"),ylab="height")
  22. Among baseball players, is the proportion of left-handers significantly large?

    Assume the usual proportion is 0.1 and try the binomial test.
    > summary(c$throwing)
    > binom.test(13,13+56,0.1)
    > cp=c[c$position=="pitcher",]
    > summary(cp$throwing)
    > binom.test(11,11+23,0.1)
  23. The advantages of R

    ‣ Designed for statistical computation
    ‣ Beautiful graphics
    ‣ Operations are recorded as a script (= text), so they can be replayed easily
    ‣ New statistical methods keep being implemented in R
    ‣ It's free! Open source!
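
The data frame steps in slides 15 to 22 (computing BMI, filtering rows, summarizing by group, and running a binomial test) can be tried without the workshop file. The following is a minimal sketch under stated assumptions: the data frame `players` and every value in it are made up for illustration and only borrow the column names used in the slides (`position`, `throwing`, `height`, `weight`); it is not the content of carp.csv.

```r
# Hypothetical stand-in for carp.csv: the column names follow the slides,
# but all values below are invented for illustration only.
players <- data.frame(
  position = c("pitcher", "pitcher", "catcher", "infielder", "outfielder", "pitcher"),
  throwing = c("left", "right", "right", "right", "left", "right"),
  height   = c(183, 178, 175, 171, 180, 190),  # cm
  weight   = c(85, 80, 90, 72, 95, 88)         # kg
)

# BMI = weight (kg) / height (m)^2; height is stored in cm, so divide by 100
players$BMI <- players$weight / (players$height / 100)^2

# Keep only the rows whose BMI exceeds 26 (logical row indexing)
heavy <- players[players$BMI > 26, ]
print(heavy)

# Mean height for each position
mean_height <- tapply(players$height, players$position, mean)
print(mean_height)

# Binomial test: does the share of left-handers differ from an assumed 0.1?
n_left <- sum(players$throwing == "left")
result <- binom.test(n_left, nrow(players), p = 0.1)
print(result$p.value)
```

The sketch names the data frame `players` rather than `c`, so the built-in `c()` function keeps its usual meaning when reading the code. With six invented rows the test result means nothing statistically; the point is only the shape of the calls.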