Web Service Direction and R

2nd Pepabo Tech Conference #pbtech

Hiroka Zaitsu

July 04, 2015

Transcript

  1. Contents

    > About directors
    > Features of R
    > R libraries
    > Summary
    Note: today's talk does not go into statistical methods in much detail.
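  2. Features of R

    > A language for statistics and data analysis
    > Data manipulation specialized for analysis
    > Flexible types and structures / missing values, non-numeric values, etc.
    > A rich set of built-in functions
    > Visualization through plotting functions
    > Active package development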
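  3. RStudio

    > An integrated development environment for R
    > Basic features such as project management, a code editor, and a file browser
    > Displays the plots you draw
    > Version control with Git or Subversion
    > Write reports with RMarkdown
    > Build applications with shiny (covered later)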
  4. iris

        > iris
           Sepal.Length Sepal.Width Petal.Length Petal.Width Species
        1           5.1         3.5          1.4         0.2  setosa
        2           4.9         3.0          1.4         0.2  setosa
        3           4.7         3.2          1.3         0.2  setosa
        4           4.6         3.1          1.5         0.2  setosa
        5           5.0         3.6          1.4         0.2  setosa
        6           5.4         3.9          1.7         0.4  setosa
        7           4.6         3.4          1.4         0.3  setosa
        8           5.0         3.4          1.5         0.2  setosa
        9           4.4         2.9          1.4         0.2  setosa
        10          4.9         3.1          1.5         0.1  setosa
        11          5.4         3.7          1.5         0.2  setosa
        12          4.8         3.4          1.6         0.2  setosa
        13          4.8         3.0          1.4         0.1  setosa
        14          4.3         3.0          1.1         0.1  setosa
        15          5.8         4.0          1.2         0.2  setosa
        …

    > Sepal and petal length and width for 50 specimens each of 3 iris species
  5. Flexible types and structures

        > str(iris)
        'data.frame': 150 obs. of 5 variables:
         $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
         $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
         $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
         $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
         $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

    > Vectors / lists / data frames …
    > Nesting is possible: "a list inside a list", "a data frame inside a list"
    > Numeric / character / logical / factor types … representations that match the structure of real data
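The nesting described on this slide can be sketched as follows. The `survey` object and all of its fields are hypothetical, made up only to illustrate the structures.

```r
# A vector holds values of a single type.
lengths_cm <- c(5.1, 4.9, 4.7)

# A list can mix types and nest other lists or data frames.
# All names and values below are hypothetical.
survey <- list(
  title   = "pet survey",                   # character
  valid   = TRUE,                           # logical
  meta    = list(year = 2015, waves = 1:3), # a list inside a list
  answers = data.frame(                     # a data frame inside a list
    pet = factor(c("dog", "cat", "dog")),   # factor
    age = c(3, 5, 2)                        # numeric
  )
)

str(survey)                 # prints the nested structure
class(survey$meta)          # "list"
class(survey$answers)       # "data.frame"
levels(survey$answers$pet)  # "cat" "dog"
```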
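  6. Missing values and non-numbers

        > enq1
              q1   q2
        1     犬 オス
        2     猫 メス
        3 無回答 オス
        > str(enq1)
        'data.frame': 3 obs. of 2 variables:
         $ q1: Factor w/ 3 levels "犬","猫","無回答": 1 2 3
         $ q2: Factor w/ 2 levels "オス","メス": 1 2 1

    > Use case: user surveys (犬 = dog, 猫 = cat, 無回答 = no answer, オス = male, メス = female)
    > "No answer" is not actually one of the answers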
  7. Missing values and non-numbers

        > enq1
            q1   q2
        1   犬 オス
        2   猫 メス
        3 <NA> オス
        > str(enq1)
        'data.frame': 3 obs. of 2 variables:
         $ q1: Factor w/ 2 levels "犬","猫": 1 2 NA
         $ q2: Factor w/ 2 levels "オス","メス": 1 2 1

    > NA (Not Available)
    > There are others too, such as "0 divided by 0 = NaN (Not a Number)"
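One way to get from a placeholder answer to a real missing value is sketched below. The data frame is a hypothetical English-labelled stand-in for `enq1`, not the speaker's data.

```r
# Hypothetical stand-in for enq1, with English labels.
enq <- data.frame(
  q1 = c("dog", "cat", "no answer"),
  q2 = c("male", "female", "male"),
  stringsAsFactors = TRUE
)

# Recode the placeholder as a missing value, then drop the unused level.
enq$q1[enq$q1 == "no answer"] <- NA
enq$q1 <- droplevels(enq$q1)

is.na(enq$q1)                   # FALSE FALSE  TRUE
nlevels(enq$q1)                 # 2
table(enq$q1, useNA = "ifany")  # NA is tallied separately from the levels

# NaN is a separate marker for undefined numeric results.
is.nan(0 / 0)                   # TRUE
is.na(0 / 0)                    # TRUE: NaN also counts as missing
is.nan(NA)                      # FALSE: NA is not NaN
```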
  8. Built-in functions

        > summary(iris) # basic summary statistics for iris
          Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
         Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
         1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
         Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
         Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
         3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
         Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
        > var(iris[iris$Species == "setosa",]$Sepal.Length) # variance of setosa sepal length
        [1] 0.124249
        > sd(iris[iris$Species == "setosa",]$Sepal.Length) # standard deviation of setosa sepal length
        [1] 0.3524897

    > Summarizing data
    > All kinds of statistical processing
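The same statistics can be computed for every species at once with base R's grouping helpers. This is a small sketch, not something shown on the slide; the variable names are illustrative.

```r
# Variance of sepal length for each species in one call.
var_by_species <- tapply(iris$Sepal.Length, iris$Species, var)
var_by_species

# Mean of every numeric column, grouped by species.
means_by_species <- aggregate(. ~ Species, data = iris, FUN = mean)
means_by_species
```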
  9. Built-in functions

        > setosa <- iris[iris$Species == "setosa",]$Sepal.Width
        > virginica <- iris[iris$Species == "virginica",]$Sepal.Width
        > summary(setosa)
           Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
          2.300   3.200   3.400   3.428   3.675   4.400
        > summary(virginica)
           Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
          2.200   2.800   3.000   2.974   3.175   3.800

    > Difference between the mean sepal widths of setosa and virginica
    > Verifying a hunch like "this one looks bigger" (use case: A/B tests, etc.)
  10. Built-in functions

        > t.test(setosa, virginica, var.equal = TRUE) # test for a difference in means

                Two Sample t-test

        data:  setosa and virginica
        t = 6.4503, df = 98, p-value = 4.246e-09
        alternative hypothesis: true difference in means is not equal to 0
        95 percent confidence interval:
         0.3143257 0.5936743
        sample estimates:
        mean of x mean of y
            3.428     2.974

    Note: the test's assumptions need to be checked first.
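The slide does not say which checks the speaker had in mind. A common choice, assumed here, is a Shapiro-Wilk test for normality and an F test for equal variances, with Welch's t-test as the fallback.

```r
setosa    <- iris[iris$Species == "setosa", ]$Sepal.Width
virginica <- iris[iris$Species == "virginica", ]$Sepal.Width

# Normality of each group (Shapiro-Wilk test).
sw_setosa    <- shapiro.test(setosa)
sw_virginica <- shapiro.test(virginica)

# Equality of variances (F test), which var.equal = TRUE assumes.
vt <- var.test(setosa, virginica)
vt$p.value

# If equal variances look doubtful, Welch's t-test drops that assumption.
# It is what t.test() runs by default (var.equal = FALSE).
welch <- t.test(setosa, virginica)
welch$p.value
```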
  11. Connecting to the service DB with RMySQL

        > library(RMySQL)
        > dbConnector <- dbConnect(dbDriver("MySQL"), dbname = "hoge" …)
        > query <- "SELECT * FROM table"
        > result <- dbGetQuery(dbConnector, query)
        > dbDisconnect(dbConnector)

    > Results come back as a data frame
    > Feed them straight into statistical analysis
    > Or run the query every day as a batch job
  12. RGoogleAnalytics

        > library(RGoogleAnalytics)
        # Define the query
        > query$Init(start.date = "2015-01-01",
                     end.date = "2015-06-30",
                     dimensions = "ga:date",
                     metrics = "ga:visitors,ga:pageviews",
                     sort = "ga:date",
                     segment = "gaid::quux",
                     max.results = 10000,
                     table.id = "ga:hoge",
                     access_token = query$authorize())
        # Fetch the data
        > df.ga <- ga$GetReportData(query)

    > Monitor the data by setting thresholds with your own logic
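The slide does not show the threshold logic itself. As one possible sketch, the rule below flags days that sit far from the median, measured in median absolute deviations. The data frame is a made-up stand-in for `df.ga`; its column names, its values, and the 3 x MAD rule are all assumptions.

```r
# Hypothetical stand-in for df.ga; column names and values are made up.
df <- data.frame(
  date      = as.Date("2015-06-01") + 0:9,
  pageviews = c(1000, 1040, 980, 1010, 995, 1020, 400, 1005, 990, 1015)
)

# Assumed rule: flag days outside median +/- 3 * MAD.
# Median and MAD are used because an outlier barely shifts them.
center <- median(df$pageviews)
spread <- mad(df$pageviews)
df$alert <- abs(df$pageviews - center) > 3 * spread

df[df$alert, ]  # rows that crossed the threshold
```

Run as a daily batch, a non-empty result could trigger a notification.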
  13. Fetching public data with rvest

        library(rvest)

        url <- "http://www.tripadvisor.com/Hotel_Review-g37209-d1762915-Reviews-JW_Marriott_Indianapolis-Indianapolis_Indiana.html"

        # Select every review node on the page
        reviews <- url %>%
          read_html() %>%
          html_nodes("#REVIEWS .innerBubble")

        # Star rating, parsed from alt text such as "4 of 5 stars"
        rating <- reviews %>%
          html_node(".rating .rating_s_fill") %>%
          html_attr("alt") %>%
          gsub(" of 5 stars", "", .) %>%
          as.integer()

        # Review body text
        review <- reviews %>%
          html_node(".entry .partial_entry") %>%
          html_text()

        # The demo's quote column is omitted: it is not extracted above
        data.frame(rating, review, stringsAsFactors = FALSE) %>% View()

    https://github.com/hadley/rvest/blob/master/demo/tripadvisor.R
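  14. Building applications with Shiny

    > Turn a framework discovered through data analysis into a dynamic application
    > No HTML / CSS / JS required
    > Shiny-Server can also serve it as a web application
    > Users need no knowledge of R or statistics, so it works well for team use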
  15. Cluster analysis of iris

        selectedData <- iris[, c("Sepal.Length", "Sepal.Width")]
        str(selectedData)
        head(selectedData, 10)

        clusters <- kmeans(selectedData, 3)
        str(clusters)

        par(mar = c(5.1, 4.1, 0, 1))      # set the margins
        plot(selectedData,                # plot the selected data
             col = clusters$cluster,      # assign a color to each cluster
             pch = 20,                    # plot filled circles
             cex = 3)                     # set the symbol magnification
        points(clusters$centers,
               pch = 4, cex = 4, lwd = 4) # 4x line width
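`kmeans()` starts from randomly chosen centers, so cluster labels can differ between runs. The sketch below fixes the seed and compares the clusters with the known species. The seed value and `nstart = 20` are arbitrary choices made for illustration, not settings from the talk.

```r
set.seed(42)  # arbitrary seed so the random starting centers are repeatable
selectedData <- iris[, c("Sepal.Length", "Sepal.Width")]

# nstart = 20 tries 20 random starts and keeps the best solution.
clusters <- kmeans(selectedData, centers = 3, nstart = 20)

# Cross-tabulate cluster labels against the known species.
table(cluster = clusters$cluster, species = iris$Species)

clusters$tot.withinss  # total within-cluster sum of squares (lower is tighter)
```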
  16. Turning it into an application

        # ui.R
        shinyUI(pageWithSidebar(
          headerPanel('Iris k-means clustering'),
          sidebarPanel(
            selectInput('xcol', 'X Variable', names(iris)),
            selectInput('ycol', 'Y Variable', names(iris),
                        selected = names(iris)[[2]]),
            numericInput('clusters', 'Cluster count', 3,
                         min = 1, max = 9)
          ),
          mainPanel(
            plotOutput('plot1')
          )
        ))
  17. Turning it into an application

        # server.R
        shinyServer(function(input, output, session) {

          # Combine the selected variables into a new data frame
          selectedData <- reactive({
            iris[, c(input$xcol, input$ycol)]
          })

          # Re-run k-means whenever the data or the cluster count changes
          clusters <- reactive({
            kmeans(selectedData(), input$clusters)
          })

          output$plot1 <- renderPlot({
            par(mar = c(5.1, 4.1, 0, 1))
            plot(selectedData(),
                 col = clusters()$cluster,
                 pch = 20, cex = 3)
            points(clusters()$centers, pch = 4, cex = 4, lwd = 4)
          })
        })