
Getting started with R

Corey Chivers
February 01, 2013


Transcript

  1. http://zerotorhero.wordpress.com/

  2. Why R?

    • It's free: • “as in free beer” • “as in free speech” • Use it for any purpose. • Give copies to your friends and neighbours. • Improve it and release improvements publicly.
  3. [Diagram: Data, Tables, Graphs, Statistics, Understanding; Sigmaplot, Excel, SAS]

  4. [Diagram: Data, Tables, Graphs, Statistics, Understanding]

  5. Why is R so hard to learn?

    • R is command-driven. • R will not tell you what to do, or guide you through the steps of an analysis or method. • R will do all the calculations for you, and it will do exactly what you tell it (not necessarily what you want). • R has the flexibility and power to do exactly what you want, exactly how you want it done.
  6. Learning objectives

    • Open R (or RStudio) for the first time • Navigate the R (or RStudio) interface • Enter commands • Input and output • Common functions • Control structures • Use technical terms for R concepts • Get help
  7. Challenges

    • Throughout the workshop, you will be presented with a series of challenges. • Collaborate with your neighbour when the going gets tough!
  8. Challenge 1: Open RStudio.

  9. The Console

  10. The R Console

    Text in the R console typically looks like this, with input (commands) typed after the > prompt and output (results) printed below it: > input (commands) [1] output. I will represent these as:
  11. R is a calculator

    • 1 + 1 gives [1] 2 • 2 * 2 gives [1] 4 • 2 ^ 3 gives [1] 8 • 10 - 1 gives [1] 9 • 8 / 2 gives [1] 4 • sqrt(9) gives [1] 3 • Expressions are evaluated, and the result is returned (sometimes invisibly).
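Here is a minimal sketch of the calculator examples and of what "returned invisibly" means. An assignment returns its value without printing it. `withVisible()` is a base R function that reports whether a value would be auto-printed. The name `x` is only for illustration.

```r
# Arithmetic expressions are evaluated and the result is printed
1 + 1      # [1] 2
2 ^ 3      # [1] 8
sqrt(9)    # [1] 3

# An assignment also returns its value, but invisibly: nothing is printed
x <- 5
# Wrapping the assignment in parentheses makes the value print
(x <- 5)   # [1] 5
# withVisible() shows whether a value would be auto-printed
withVisible(x <- 5)$visible   # [1] FALSE
withVisible(1 + 1)$visible    # [1] TRUE
```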
  12. Challenge

    • Use R to answer the following skill-testing question: 2 + 16 × 24 − 56 / (2 + 1) − 457 (in R, multiplication is written *).
  13. R command-line tip

    • Use the ▲▼ arrow keys to recall previous commands. • This lets you scroll through your command history.
  14. Objects

    • You can store values (objects) in symbolic variables (names) using an assignment operator: <- assigns the value on the right to the name on the left. • Variable names can include: • letters a-z A-Z • numbers 0-9 • periods . • underscores _ • Variable names should begin with a letter. • Examples: A <- 10 • B <- 10*10 • A_log <- log(A) • B.seq <- 1:B
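The assignments above can be run as-is. This sketch adds one assumption-free extra: `make.names()`, a base R function that shows how R would repair a name breaking the naming rules. The name `"2nd_try"` is a made-up example.

```r
A <- 10           # store the value 10 under the name A
B <- 10 * 10      # the right-hand side is evaluated first, so B is 100
A_log <- log(A)   # log() is the natural logarithm by default
B.seq <- 1:B      # the integers from 1 to 100

# make.names() shows how R would repair a name that breaks the rules,
# such as one that starts with a number
make.names("2nd_try")   # [1] "X2nd_try"
```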
  15. Challenge

    • Put your answer to the skill-testing question into an object with a name of your choice.
  16. Retrieve values

    • When a variable name is evaluated, it returns the stored value: • A gives [1] 10 • B gives [1] 100 • A_log gives [1] 2.302585 • x gives [1] 3 • B.seq gives [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 [22] 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 [43] 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 [64] 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 [85] 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
  17. Vectors

    • The most basic kind of object in R is a vector. • Think of a vector as a list of related values (data). • A single value is simply an atomic vector with a length of 1: 2 gives [1] 2 • 1:10 gives [1] 1 2 3 4 5 6 7 8 9 10, where [1] is the index (the item number) of the first value on the line, followed by the values (result).
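This small check confirms that a single value really is a vector of length 1. It uses only base R functions. The names `x` and `y` are illustrative.

```r
x <- 2
length(x)      # [1] 1: a single value is a vector of length 1
is.atomic(x)   # [1] TRUE
y <- 1:10
length(y)      # [1] 10
y[1]           # [1] 1: the first element
```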
  18. Vectors

    • You can make a vector using the c() function: my_fav_nums <- c(1, 4, 10, 444, 42) • Vectors can be used in a plot: plot(1:5, my_fav_nums) • You can access an element of a vector by its index: my_fav_nums[3] gives [1] 10
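The sketch below shows a little more of how indexing works on the same vector. Two details go beyond the slide: indexes start at 1, and several indexes can be given at once. The negative-index line is an extra illustration.

```r
my_fav_nums <- c(1, 4, 10, 444, 42)
my_fav_nums[3]        # 10: indexes start at 1, not 0
my_fav_nums[c(1, 5)]  # 1 42: several indexes give several elements
my_fav_nums[-1]       # 4 10 444 42: a negative index drops that element
length(my_fav_nums)   # 5
```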
  19. Vectors

    • Vectors can be used in calculations. • Arithmetic operations are executed on each item: my_fav_nums + 20 • my_fav_nums / 2 + 1 • sqrt(my_fav_nums) • Summary functions combine all the items into one value: mean(my_fav_nums) • sum(my_fav_nums)
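This sketch shows the results of the slide's commands. Element-wise operations return one value per item. Summaries such as `sum()` and `mean()` reduce the vector to a single value.

```r
my_fav_nums <- c(1, 4, 10, 444, 42)
# Element-wise: each operation is applied to every item
my_fav_nums + 20      # 21 24 30 464 62
my_fav_nums / 2 + 1   # 1.5 3.0 6.0 223.0 22.0
# Summaries: each function reduces the whole vector to one value
sum(my_fav_nums)      # 501
mean(my_fav_nums)     # 100.2
```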
  20. R command-line tip

    • Use the Tab key to auto-complete. • This helps you avoid spelling errors and speeds up entering commands.
  21. Use variables in calculations

    • A + 5 gives [1] 15 • B / A gives [1] 10 • Weight <- c(60, 72, 57, 90, 95, 72) • Height <- c(1.7, 1.8, 1.6, 1.9, 1.7, 1.9) • BMI <- Weight / Height^2 • BMI gives [1] 20.76125 22.22222 22.26562 24.93075 32.87197 19.94460 • plot(Height, Weight)
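The BMI calculation pairs the two vectors item by item. The units (kilograms and metres) are an assumption, since the slide does not state them. Note that `^` is applied before `/`, so no extra parentheses are needed.

```r
Weight <- c(60, 72, 57, 90, 95, 72)        # assumed to be kg
Height <- c(1.7, 1.8, 1.6, 1.9, 1.7, 1.9)  # assumed to be metres
# Vectors of equal length are combined item by item:
# BMI[1] is Weight[1] / Height[1]^2, BMI[2] is Weight[2] / Height[2]^2, ...
BMI <- Weight / Height^2
round(BMI, 2)   # 20.76 22.22 22.27 24.93 32.87 19.94
```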
  22. Challenge

    • What is the sum of the squares of all the integers from 1 to 100? Hint: remember that counting from x to y can be done with x:y.
  23. Some other data types

    • character (string): text in single ' or double " quotes, e.g. 'hello world' or "1.23" • logical: TRUE or FALSE • Converting from one type to another is called "coercion". • Try: class(1.23) • class('hello') • class("1.23") • class(FALSE) • typeof(1.23) • typeof(1:10) • as.character(c(1, 2, NA, 3))
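These are the slide's commands with their results as comments. The last line, `as.numeric()`, is an added illustration of coercing text back to a number.

```r
class(1.23)      # "numeric"
class("1.23")    # "character": the quotes make it text, not a number
class(FALSE)     # "logical"
typeof(1.23)     # "double"
typeof(1:10)     # "integer"
# Coercion: converting from one type to another
as.character(c(1, 2, NA, 3))   # "1" "2" NA "3": NA stays missing
as.numeric("1.23")             # 1.23
```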
  24. Built-in variables

    • Some words and letters already have values in R and should never be used as variable names: • pi gives [1] 3.141593 • letters gives [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" [14] "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z" • LETTERS gives [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" [14] "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
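This sketch shows why reusing a built-in name is risky. R quietly lets your variable hide the built-in value until you remove it with `rm()`.

```r
pi              # 3.141593
letters[1:3]    # "a" "b" "c"
pi <- 3         # R allows this, but your pi now hides the built-in one
pi              # 3
rm(pi)          # remove your copy...
pi              # ...and the built-in value is visible again: 3.141593
```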
  25. Reserved words

    • Some words already have special meaning in the R language (keywords) and should never be used as variable names: • NA: "Not Available" (unknown or missing data) • NaN: "Not a Number" (undefined numeric values) • NULL: a special object (missing objects) • Inf: infinity • TRUE and FALSE: logical values • Other names to avoid: T (short for TRUE) and F (short for FALSE), and the R functions c, q, t, C, D, I, diff, df and pt. These are ordinary names rather than reserved words, so R lets you overwrite them, but doing so causes confusing bugs.
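This is a small sketch of the special values, and of the difference between a reserved word (TRUE) and an ordinary name (T). `tryCatch()` is used here only to show that the assignment to TRUE fails. It records "error" instead of stopping the script.

```r
1 / 0                 # Inf
0 / 0                 # NaN: undefined numeric value
is.na(c(1, NA, 3))    # FALSE TRUE FALSE
length(NULL)          # 0: NULL is the empty object

# T is an ordinary variable, so it can be overwritten (a common bug) ...
T <- 0
isTRUE(T)             # FALSE
rm(T)                 # restores the built-in T (TRUE)
# ... whereas assigning to a reserved word such as TRUE is an error
res <- tryCatch(eval(parse(text = "TRUE <- 0")), error = function(e) "error")
res                   # "error"
```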
  26. Housekeeping

    • ls(): list all variables you have created • rm(x): remove the variable ‘x’ from memory • rm(list = ls()): remove all variables from memory (clear memory)
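The three housekeeping commands can be tried in sequence. `exists()`, a base R function, confirms that a variable is gone. The names `x` and `y` are illustrative.

```r
x <- 1
y <- 2
ls()               # includes "x" and "y"
rm(x)
exists("x")        # FALSE: x has been removed
rm(list = ls())    # remove every (visible) variable
exists("y")        # FALSE: y is gone too
```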
  27. Comparisons

    • Comparing two values results in a logical value: TRUE or FALSE. • == "equal to" (note the two equals signs; not to be confused with a single equals sign, which is used to assign values) • != "not equal to" • > "greater than" • < "less than" • >= "greater than or equal to" • <= "less than or equal to"
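Here are a few comparisons in practice. The last two lines are an added caution. Decimal numbers are stored approximately, so `all.equal()` (base R) is the safer way to test near-equality.

```r
3 == 3               # TRUE
3 != 4               # TRUE
5 > 7                # FALSE
c(1, 5, 10) >= 5     # FALSE TRUE TRUE: comparisons work on every item
# Decimal numbers are stored approximately, so == can surprise you
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: compare with a small tolerance
```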
  28. Challenge

    • Is 3.1415929 greater than, less than, or equal to π?
  29. Errors

    • When R is given a command it does not understand, or cannot execute, it prints an error to the console. • Fail <- 1 + "2" gives Error in 1 + "2" : non-numeric argument to binary operator • Because that assignment failed, evaluating Fail then gives Error: object 'Fail' not found
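This sketch uses base R's `tryCatch()` to capture the error message instead of stopping. It also shows that coercing the text first avoids the error. `tryCatch()` goes beyond the slide and is shown only for illustration.

```r
# tryCatch() runs an expression and hands any error to a handler function
msg <- tryCatch(1 + "2", error = function(e) conditionMessage(e))
msg                  # "non-numeric argument to binary operator"
# Coercing the text to a number first avoids the error
1 + as.numeric("2")  # 3
```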
  30. Warnings

    • If a command does not work exactly as R (or its developers) think is "ideal", it may produce a warning instead of an error, and the command still runs. • Use the warnings() command to review them. • oops <- log(-1) gives Warning message: In log(-1) : NaNs produced
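Unlike an error, a warning still leaves a result behind. This sketch silences the warning with `suppressWarnings()` and then captures its text with `tryCatch()`. Both are base R functions used here for illustration.

```r
# A warning does not stop the command: the result is still assigned
oops <- suppressWarnings(log(-1))   # silence the warning for this demo
oops           # NaN
is.nan(oops)   # TRUE
# tryCatch() can also capture the warning text
msg <- tryCatch(log(-1), warning = function(w) conditionMessage(w))
msg            # "NaNs produced"
```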
  31. Functions

    • A function takes in arguments and returns a value. • To use (call) a function, the command must be structured properly, following the "grammar rules" of the R language (syntax). • In log(8, base = 2): log is the function name; the parentheses follow the name with no space in between; 8 is argument 1; a comma separates the arguments; base = 2 is argument 2.
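The same call can be written in several equivalent ways. Arguments are matched by name or by position. The first argument of `log()` is named `x`.

```r
log(8, base = 2)        # 3, because 2^3 = 8
log(8, 2)               # 3: arguments can also be matched by position
log(base = 2, x = 8)    # 3: named arguments can come in any order
log(8)                  # 2.079442: without base, log() is the natural log
```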
  36. Arguments

    • Arguments are the values passed to a function when it is called. • Arguments are the values and instructions the function needs to do its job. • Example: x <- 1:10 • y <- sin(x) • plot(x, y, type = 'l'), where x, y and type = 'l' are the arguments to plot().
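This sketch shows how changing an argument changes what a function does. `seq()` is used because it prints compactly. The plot call is left as a comment, since it needs a graphics device.

```r
x <- 1:10
y <- sin(x)
# The same function does different things depending on its arguments
seq(1, 10)            # 1 2 3 4 5 6 7 8 9 10
seq(1, 10, by = 3)    # 1 4 7 10
length(y)             # 10: sin() was applied to every element of x
# plot(x, y, type = "l") draws a line; the default type = "p" draws points
```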
  37. Common & useful functions

    sqrt log exp min max sum mean sd var summary plot par paste format head length str names typeof class attributes library ls rm setwd getwd file.choose c seq rep tapply aggregate merge cbind rbind unique help ? help.search ?? help.start
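This is a quick taste of a few functions from the list, with their results as comments. The input values are arbitrary examples.

```r
seq(0, 1, by = 0.25)          # 0.00 0.25 0.50 0.75 1.00
rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"
paste("plant", 1:3)           # "plant 1" "plant 2" "plant 3"
unique(c(3, 1, 3, 2, 1))      # 3 1 2
head(letters, 3)              # "a" "b" "c"
```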
  38. How do I use a new function? What does it do? What arguments will it take?

    • Use ?function! For example: ?seq
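Besides the help page, base R's `args()` gives a quick summary of a function's arguments and their default values. This complements `?` rather than replacing it.

```r
# args() prints a function's arguments and their default values
args(log)    # function (x, base = exp(1))
args(sort)   # function (x, decreasing = FALSE, ...)
# The argument names, as a character vector
names(formals(args(log)))   # "x" "base"
```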
  39. Anatomy of a help page

    • function name • package • long name (title) • arguments • details on how the function works

  40. More of the help page

    • value returned • publications that describe the function (relevant theory and concepts) • examples: copy and paste them into the console to see the function in action, then try to modify the example code to do what you want. • You can also use example(seq) in the console to run all the example code in this section.
  41. Challenge

    1) Create an unsorted vector of your favourite numbers. 2) Find out how to sort it using ?sort. 3) Sort your vector in forward and reverse order. 4) Put your sorted vectors into new objects.
  42. Help: books

  43. Help: web sites

    • R web site: r-project.org. Start here, especially the Documentation section. • r-bloggers.com • stackoverflow.com • Google search ...
  44. Object types

    An object is a way of packaging information in R. • vector: a combination of values, all of the same type • list: a combination of different types of values (or even other objects) • data frame: a collection of vectors of the same length (the number of rows); columns = “variables”, rows = “cases”
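This is a minimal sketch of the three object types. The data frame's column names and values are hypothetical, made up for illustration.

```r
v <- c(1, 2, 3)               # vector: every element has the same type
c(1, "a")                     # mixing types coerces everything: "1" "a"
l <- list(1, "a", TRUE, v)    # list: elements can have different types
# Data frame: a set of equal-length vectors (hypothetical example data)
df <- data.frame(
  height = c(1.7, 1.8, 1.6),
  name   = c("Ann", "Bob", "Cy")
)
nrow(df)   # 3 rows ("cases")
ncol(df)   # 2 columns ("variables")
```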
  45. Working with a data frame

    • data(CO2): load a built-in data file • head(CO2): peek at the first few rows • str(CO2): structure of the object • names(CO2): names of items in the object • attributes(CO2): attributes of the object • summary(CO2): summary statistics • plot(CO2): plot of all variable combinations
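These quick checks confirm the shape of the CO2 data set before you work with it. CO2 ships with R's built-in datasets package.

```r
data(CO2)     # CO2 comes with R's built-in datasets package
dim(CO2)      # 84 5: 84 rows and 5 columns
names(CO2)    # "Plant" "Type" "Treatment" "conc" "uptake"
inherits(CO2, "data.frame")   # TRUE
```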
  46. Indexing

    • You can refer to parts of a data frame object by their index or by name (if they have one). • CO2[1:6, 3]: the object name, then [rows (dimension 1), columns (dimension 2)] • CO2$Treatment: the object name, the $ operator, then the column name
  47. Indexing

    • names(CO2): available names • CO2$Treatment: the "Treatment" column • CO2[, 3]: all rows, column 3 • CO2[3, ]: row 3, all columns • CO2[1:6, ]: rows 1-6, all columns • CO2[c(1, 2, 3, 4, 5, 6), 3]: rows 1-6, column 3 • CO2$Treatment[1:6]: elements 1-6 of Treatment • CO2[CO2$conc > 100, ]: rows where conc > 100 • CO2[CO2$Treatment == "chilled", ]: rows where Treatment == "chilled" • CO2[sample(nrow(CO2), 10), ]: 10 random rows
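This sketch runs several of the indexing commands and checks how many rows each filter keeps. `set.seed()` is added so the random sample is reproducible. The seed value 1 is arbitrary.

```r
data(CO2)
CO2[1:6, 3]                   # rows 1-6 of column 3 (Treatment)
chilled <- CO2[CO2$Treatment == "chilled", ]   # keep rows where the test is TRUE
nrow(chilled)                 # 42: half of the 84 rows
high <- CO2[CO2$conc > 100, ]
nrow(high)                    # 72: only the rows with conc = 95 are excluded
set.seed(1)                   # makes the random sample reproducible
CO2[sample(nrow(CO2), 10), ]  # 10 random rows
```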
  48. Challenge

    1) What is the mean uptake of all plants in the non-chilled treatment? 2) What is the variance in uptake for plant ‘Mc3’?
  49. Control Structures

    • R is a full-fledged programming language: it can do loops and conditionals. • Conditional: i <- runif(1, 0, 100); if (i <= 50) { print('i is pretty small') } • Loop: for (i in 1:100) { print(i) }
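Here is a minimal sketch of both structures from the slide. The `else` branch and its message are additions for illustration. So is the running total, which replaces printing 100 lines. The loop variable is named `k` so that it does not overwrite `i`.

```r
set.seed(42)                # makes the random number reproducible
i <- runif(1, 0, 100)       # one random number between 0 and 100
if (i <= 50) {
  msg <- "i is pretty small"
} else {
  msg <- "i is pretty big"  # runs only when the condition is FALSE
}
print(msg)

# A for loop runs its body once for each element of 1:100
total <- 0
for (k in 1:100) {
  total <- total + k
}
total                       # 5050
```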
  50. Challenge

    1) Print out all the numbers from 1 to 100, but if the number is a multiple of 3, print ‘fizz’ instead. 2) Extend your code so that for multiples of 5, it prints ‘buzz’. Hint: to get the remainder of integer division, use %%. Example: 9 %% 5 gives [1] 4.
  51. Installing packages

    • In addition to all of the base functions in R, you can install additional packages for specialized statistics and plotting. • At the time of this workshop (February 2013), the CRAN package repository featured 4276 available packages. • http://cran.r-project.org/web/packages/
  52. Installing packages

    • install.packages('ggplot2') downloads and installs the package (needed only once). • The library() function then loads the package, making its functions accessible: library(ggplot2)
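This sketch shows what `library()` changes, without needing a download. It uses splines, a package bundled with every R installation, in place of ggplot2. For a CRAN package you would run `install.packages()` first. `requireNamespace()` reports whether a package is installed without raising an error.

```r
# splines is bundled with every R installation, so it only needs loading;
# a CRAN package such as ggplot2 needs install.packages("ggplot2") first
library(splines)
"package:splines" %in% search()   # TRUE: the package is now attached
exists("bs")                      # TRUE: its functions are accessible
# requireNamespace() checks whether a package is installed, without erroring
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
```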
  53. R is a show-off

    • demo(graphics): some plots and graphs that can be made using R • demo(image): images and other graphics made using R • demo(lm.glm): a demonstration of linear modelling and GLMs • demo(): a list of available demos
  54. Let’s play “Command-R”

    • 2 players • Start with x <- 0 • Take turns using the variable ‘x’ as an argument in a function or expression. • Assign the result to the same variable ‘x’. • How long can you keep the chain going without getting errors?
  55. “Command-R”

    • Challenges: • Change the object type of x into a vector of multiple items, or into a data frame. • Use x in a graph or plot. • Example moves: x <- x + 1 • x <- x * (x + 10) • x <- exp(x) • x <- 1:x • x <- seq(from = x, to = 100, by = 2) • x <- rnorm(x) • x <- x[1:3] • x <- x[2] • x <- data.frame(foo = rnorm(length(x)), x)
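Here is one possible error-free chain that meets the first challenge, starting from x <- 0. It is only a sketch. The odd-number filter with `%%` is an added move, not one from the slide.

```r
x <- 0
x <- x + 1                     # 1
x <- x * (x + 10)              # 11
x <- 1:x                       # 1 2 ... 11: now a vector of multiple items
x <- x[x %% 2 == 1]            # keep the odd values: 1 3 5 7 9 11
x <- data.frame(foo = rnorm(length(x)), x)  # now a data frame with 6 rows
```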