R for Pirates

Mandi Walls
November 21, 2011

My talk on some very basic R stuff for ESCConf in Boston, October 2011

Transcript

  1. whoami
    • stats misfit
    • R tinkerer
    • large-farm runner
    • not a professional statistician :D
    Monday, October 24, 2011
  2. What is R
    • Scripting language for stats work
    • Inspired by the earlier S (for statistics) developed at AT&T
    • FOSS
    • Syntax descends from the Algol family, so it looks somewhat like C/C++
  3. What Does R Do?
    • Manipulate data
    • Complex Modeling and Computation
    • Graphics and Visualization
  4. But Other Math Stuff!
    • Mathematica
    • Minitab
    • MAPLE
    • Excel (yes. shutup h8rs. ask your CFOs what they use)
    • R provides sophisticated statistical and modeling capabilities, and is extendible through your own code
  5. Fire!
    • R console on Mac
    • Interactive interpreter for your R needs
    • Can also run from the command line: R
  6. R Basics
    • R considers all elements to be vectors
    • A single number is a one-element vector
    • Use <- for assignment
    • Use c() to concatenate values into a vector
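These basics can be tried directly at the console. A minimal sketch; the variable names are made up for illustration:

```r
# A single number is already a vector of length one
x <- 8
length(x)
is.vector(x)

# c() concatenates values (or whole vectors) into one vector
ports <- c(80, 443)
ports <- c(ports, 8080)
length(ports)
```

Note that c() never modifies a vector in place; it builds a new one, which is why the result is assigned back to ports.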
  7. Functions
    • Looks familiar!
    • Let’s see one!
    • “evencount” counts the number of even ints in a vector
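The code shown on this slide did not survive in the transcript. One way such a function might look, offered as a sketch rather than the original:

```r
# Count the even integers in a vector
evencount <- function(x) {
  k <- 0
  for (n in x) {
    if (n %% 2 == 0) k <- k + 1   # %% is the modulo operator
  }
  k   # the last evaluated expression is the return value
}

evencount(c(1, 2, 4, 7, 10))
```

Because arithmetic in R is vectorized, the same count could also be written as sum(x %% 2 == 0); the loop version is closer to what a C programmer would expect to see.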
  8. Datatypes
    • Vectors, the important ones
    • Scalars are really single-element vectors
    • Character strings
    • Matrices, rectangular arrays of numbers
    • Lists
    • Tables, useful for data transitions and temp work
  9. Vectors
    • R’s most-used data structure
    • All elements in a vector must have the same mode or data type
    • To add values to a vector, you concatenate into it with the c() function
    • Many mathematical functions can be applied to a whole vector; vectors can also be traversed like arrays
    • Index starts at 1, not 0!
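A short sketch of these vector properties, using made-up load values:

```r
v <- c(3.79, 3.11, 2.94)
v <- c(v, 4.81)          # "adding" means concatenating into a new vector

v[1]                     # first element: indexing starts at 1
v * 2                    # arithmetic applies to every element
mean(v)

# Mixing types coerces everything to a single mode
mixed <- c(1, "two", 3)
mode(mixed)

# Traversing like an array
for (i in seq_along(v)) {
  cat(i, v[i], "\n")
}
```

The coercion in mixed is what "same mode" means in practice: the numbers become character strings rather than raising an error.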
  10. Scalars
    • One-element vectors
        > x <- 8
        > x[1]
        [1] 8
    • also climb your rigging ©Disney.
  11. Character Strings
    • Single-element vectors with mode character
        > y <- "abc"
        > length(y)
        [1] 1
        > mode(y)
        [1] "character"
    • Can do normal string things, like
        > t <- paste("yo","dawg")
        > t
        [1] "yo dawg"
        > u <- strsplit(t,"")
        > u
        [[1]]
        [1] "y" "o" " " "d" "a" "w" "g"
  12. Matrices
    • Two-dimensional array
        > m <- rbind(c(1,4),c(2,2))
        > m
             [,1] [,2]
        [1,]    1    4
        [2,]    2    2
        > m[1,2]
        [1] 4
        > m[1,]
        [1] 1 4
  13. Lists
    • Contain elements of different types
    • Have a particular syntax
        > x <- list(u=2, v="abc")
        > x
        $u
        [1] 2
        $v
        [1] "abc"
        > x$u
        [1] 2
  14. Data Frames
    • Matrices are limited to a single type for all elements
    • A data frame can contain different types of data, and can be read in from a file or created in realtime
        > df <- data.frame(list(kids=c("Olivia","Madison"),ages=c(10,8)))
        > df
             kids ages
        1  Olivia   10
        2 Madison    8
        > df$ages
        [1] 10 8
  15. Putting R to Work
    • Read in a log file:
        access <- read.table("access.log", header=FALSE)
        > head(access)
                    V1 V2 V3                    V4     V5                          V6  V7   V8
        1 192.168.1.10  -  - [23/Oct/2011:07:03:33 -0500]  GET /menu/menu.js HTTP/1.1 401  401
        2 192.168.1.10  -  - [23/Oct/2011:07:03:33 -0500]  GET /menu/menu.js HTTP/1.1 200 1970
        3 192.168.1.10  -  - [23/Oct/2011:07:03:33 -0500] GET /menu/menu.css HTTP/1.1 200 2258
  16. Fun with Plots
    • This plot series is going to make use of the “return codes” from the access log
    • We’ll do a series of plots that gradually get more sophisticated
    • This is a basic histogram of the data; it’s not much fun
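The plotting code for this slide is not in the transcript. A rough sketch of how the return codes might be pulled out and binned, assuming the status code lands in column V7 as in the head(access) output, and assuming the request field is quoted in the raw log (as in the common log format) so that read.table keeps it as one column. The sample lines are copied from the log slide:

```r
log_lines <- c(
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.js HTTP/1.1" 401 401',
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.js HTTP/1.1" 200 1970',
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.css HTTP/1.1" 200 2258'
)

# text= stands in for the "access.log" file name used in the talk
access <- read.table(text = log_lines, header = FALSE)

codes <- access$V7
table(codes)                      # how many requests per return code

h <- hist(codes, plot = FALSE)    # compute the bins without drawing
h$counts
# At the console, hist(codes) or plot(h) draws the histogram itself
```

For a handful of distinct codes, barplot(table(codes)) is often easier to read than a histogram, since return codes are really categories rather than a continuous measurement.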
  17. Writing Graphical Output to Files
    • Set up the output target by calling a graphics device function:
        • pdf(), png(), jpeg(), etc.
        • jpeg("/var/www/images/returncodes-date.jpg")
    • Call the plot function you have chosen, then call dev.off()
    • Can be used in batch mode to create graphics from your data
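The open, draw, close sequence can be sketched as follows. This example swaps in pdf() and a temporary file so it runs without extra system libraries or a web server directory; jpeg() and png() follow the same pattern where the R build supports them. The data and output path are stand-ins, not from the talk:

```r
codes <- c(401, 200, 200)             # stand-in return codes

out <- tempfile(fileext = ".pdf")     # hypothetical output path
pdf(out)                              # 1. open the file device
hist(codes, main = "Return Codes")    # 2. draw; output goes to the file
dev.off()                             # 3. close the device, which finishes the file

file.exists(out)
```

Forgetting dev.off() is the usual mistake: the file stays incomplete until the device is closed.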
  18. Shopping is Hard, Let’s Do Math
    • Read in some load averages (one-min)
        loadavg<-read.table("load_avg.txt")
        head(loadavg)
            V1
        1 3.79
        2 3.11
        3 2.94
        4 4.81
  19. Summary Stats
    • Summarize the data with one function call
    • Gives the min, max, mean, median, and quartiles
        summary(loadavg)
               V1
         Min.   :0.760
         1st Qu.:1.390
         Median :1.970
         Mean   :2.302
         3rd Qu.:3.080
         Max.   :5.070
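summary() also works on a plain vector, and the individual statistics can be picked out by name. A small sketch using only the four values that head(loadavg) displayed, so the numbers will not match the full-file summary:

```r
loadavg <- c(3.79, 3.11, 2.94, 4.81)   # just the rows shown by head()

s <- summary(loadavg)
s
s[["Median"]]             # pull out one statistic by name

quantile(loadavg, 0.9)    # other percentiles, beyond the quartiles
```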
  20. Same Thing, 3 Datacenters
        > cpu<-read.table("cpu")
        > head(cpu)
            V1  V2
        1 3.78 smq
        2 2.57 smq
        3 3.69 smq
        4 0.86 smq
    • Looks like there are outliers. That could spell trouble! You found them with R awesomeness. Hooray!
        boxplot(cpu[,1] ~ cpu[,2],
            xlab="Load Average at Time t, by Datacenter",
            ylab="One-Minute Load Average",
            main="Box Plot of One-Minute Load Average, FEs",
            col=topo.colors(3))
  21. Running R in Your Workflow
    • The little bit of boxplotting we did earlier, in a script:
        [mandi@mandi ~]$ cat sample.R
        #!/usr/bin/env Rscript
        cpu<-read.table("cpu")
        jpeg("./sample.jpg")
        boxplot(cpu[,1] ~ cpu[,2],
            xlab="Load Average at Time t, by Datacenter",
            ylab="One-Minute Load Average",
            main="Box Plot of One-Minute Load Average, FEs",
            col=heat.colors(3))
        dev.off()
        [mandi@mandi ~]$ Rscript sample.R > /dev/null
        [mandi@mandi ~]$ ls -l sample.jpg
        -rw-rw-r-- 1 mandi staff 20137 Oct 24 20:44 sample.jpg
  22. What Else?
    • R can read data input from a variety of files with regular formats
    • R can also fetch data from the internet using the url() function
    • R has a number of functions available for dealing with reading data, creating data frames or other structures, and converting string text into numerical data modes
    • Extended packages provide support for structured data formats like JSON.
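A brief sketch of reading regular text into a data frame and converting strings to numbers. The column names and the URL in the comment are placeholders invented for this example:

```r
# text= stands in for a file name here; read.csv("loads.csv") works the same way
csv_text <- "host,load\nweb1,3.79\nweb2,3.11"
loads <- read.csv(text = csv_text)
str(loads)

# Converting character strings into a numeric mode
nums <- as.numeric(c("3.79", "3.11"))
mode(nums)

# Reading from the internet goes through a connection opened by url().
# The address below is a placeholder, so the line is left commented out:
# remote <- read.csv(url("http://example.com/load.csv"))
```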
  23. References
    • http://www.slideshare.net/dataspora/an-interactive-introduction-to-r-programming-language-for-statistics
    • http://www.harding.edu/fmccown/R/
    • The Art of R Programming, Norman Matloff, Copyright 2011 No Starch Press
    • Statistical Analysis with R, John M. Quick, Copyright 2011 Packt Publishing