
Web Service Direction and R


2nd Pepabo Tech Conference #pbtech


Hiroka Zaitsu

July 04, 2015

Transcript

  1. Web Service Direction and R
    GMO Pepabo, Inc. / Hiroka Zaitsu / 2015.07.04 / 2nd Pepabo Tech Conference

  2. About me
    > Hiroka Zaitsu (@zaimy611)
    > Joined the company in May 2012
    > Lolipop! -> minne
    > Director
    > Social research and statistics
  3. Contents
    > About directors
    > Features of R
    > R libraries
    > Summary
    Note: today's talk does not go far into statistical methods
  4. About directors

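  5. What directors do
    > Manage the service's overall financial and non-financial metrics
    > Direct the service site
    > Plan campaigns and similar projects
    > Manage web advertising
    > PR- and sales-like work, etc.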
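  6. Common director problems
    > The data we handle is scattered across many places
    > We want to use the large amount of data lying dormant
    > We want more material to base decisions on
    > We want data analysis that can be repeated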

  7. None
  8. Features of R

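  9. Features of R
    > A language for statistics and data analysis
    > Data manipulation specialized for analysis
    > Flexible types and structures / missing values, non-numeric values, etc.
    > A rich set of built-in functions
    > Visualization through plotting functions
    > Active package development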
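  10. RStudio
    > An integrated development environment for R
    > Basic features such as project management, a code editor, and a file browser
    > Displays plotted images
    > Version control with Git or Subversion
    > Write reports with RMarkdown
    > Build applications with shiny (covered later)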
  11. iris
        > iris
           Sepal.Length Sepal.Width Petal.Length Petal.Width Species
        1           5.1         3.5          1.4         0.2  setosa
        2           4.9         3.0          1.4         0.2  setosa
        3           4.7         3.2          1.3         0.2  setosa
        4           4.6         3.1          1.5         0.2  setosa
        5           5.0         3.6          1.4         0.2  setosa
        6           5.4         3.9          1.7         0.4  setosa
        7           4.6         3.4          1.4         0.3  setosa
        8           5.0         3.4          1.5         0.2  setosa
        9           4.4         2.9          1.4         0.2  setosa
        10          4.9         3.1          1.5         0.1  setosa
        11          5.4         3.7          1.5         0.2  setosa
        12          4.8         3.4          1.6         0.2  setosa
        13          4.8         3.0          1.4         0.1  setosa
        14          4.3         3.0          1.1         0.1  setosa
        15          5.8         4.0          1.2         0.2  setosa
        …
    > Sepal and petal length and width for 50 individuals from each of three iris species
  12. Flexible types and structures
        > str(iris)
        'data.frame': 150 obs. of 5 variables:
         $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
         $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
         $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
         $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
         $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
    > Vectors / lists / data frames ...
    > Nesting is possible: "a list inside a list", "a data frame inside a list"
    > Numeric / character / logical / factor types ... representations that match the actual structure of the data
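
The nesting mentioned on this slide can be sketched roughly as follows. The `survey` object and its field names are hypothetical and only serve as illustration; `iris` is the built-in dataset used throughout the talk.

```r
# Hypothetical object: a list holding a string, a nested list,
# and a data frame side by side.
survey <- list(
  title = "user survey",                 # character vector of length 1
  meta  = list(year = 2015, wave = 1L),  # a list inside a list
  data  = head(iris, 3)                  # a data frame inside a list
)

str(survey, max.level = 1)  # overview of the top-level elements
survey$meta$year            # reach into the nested list
survey$data$Species         # factor column of the nested data frame
sapply(iris, class)         # type of each column in iris
```

`str()` with `max.level = 1` keeps the overview short when a structure is deeply nested.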
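  13. Missing values and non-numbers
        > enq1
              q1   q2
        1     犬 オス
        2     猫 メス
        3 無回答 オス
        > str(enq1)
        'data.frame': 3 obs. of 2 variables:
         $ q1: Factor w/ 3 levels "犬","猫","無回答": 1 2 3
         $ q2: Factor w/ 2 levels "オス","メス": 1 2 1
    > Example use: user surveys (犬 = dog, 猫 = cat, 無回答 = no answer, オス = male, メス = female)
    > "No answer" is not actually one of the answers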
  14. Missing values and non-numbers
        > enq1
            q1   q2
        1   犬 オス
        2   猫 メス
        3 <NA> オス
        > str(enq1)
        'data.frame': 3 obs. of 2 variables:
         $ q1: Factor w/ 2 levels "犬","猫": 1 2 NA
         $ q2: Factor w/ 2 levels "オス","メス": 1 2 1
    > NA (Not Available)
    > There are others as well, such as "0 divided by 0 = NaN (not a number)"
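
One way to get from a "no answer" label to a real missing value is sketched below. The English labels are stand-ins for the survey values on the slide, and the recoding steps are an assumption about how the conversion could be done, not necessarily how the speaker did it.

```r
# Stand-in for the survey data, with "no answer" still stored as a level
enq1 <- data.frame(
  q1 = c("dog", "cat", "no answer"),
  q2 = c("male", "female", "male"),
  stringsAsFactors = TRUE
)

# Recode the "no answer" label as NA, then drop the now unused level
enq1$q1[enq1$q1 == "no answer"] <- NA
enq1$q1 <- droplevels(enq1$q1)

is.na(enq1$q1)                   # which answers are missing
table(enq1$q1, useNA = "ifany")  # counts, with NA reported separately
is.nan(0 / 0)                    # 0 divided by 0 is NaN, distinct from NA
```

Once the value is `NA`, functions such as `table()` and `mean()` can be told explicitly how to treat it instead of counting it as a third category.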
  15. Built-in functions
        > summary(iris) # basic statistics of iris
          Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
         Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
         1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
         Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
         Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
         3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
         Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
        > var(iris[iris$Species == "setosa",]$Sepal.Length) # variance of setosa sepal length
        [1] 0.124249
        > sd(iris[iris$Species == "setosa",]$Sepal.Length) # standard deviation of setosa sepal length
        [1] 0.3524897
    > Summarizing data
    > All kinds of statistical processing
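
The per-species statistics on this slide can also be computed for all species in one call. This is a minimal sketch using base R's `aggregate()` and `tapply()`; the variable names `sepal_var`, `sepal_sd`, and `species_means` are illustrative only.

```r
# Variance of sepal length for every species at once
sepal_var <- aggregate(Sepal.Length ~ Species, data = iris, FUN = var)
print(sepal_var)

# Standard deviation per species, returned as a named vector
sepal_sd <- tapply(iris$Sepal.Length, iris$Species, sd)
print(sepal_sd)

# Mean of all four numeric columns per species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```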
  16. Built-in functions
        > setosa <- iris[iris$Species == "setosa",]$Sepal.Width
        > virginica <- iris[iris$Species == "virginica",]$Sepal.Width
        > summary(setosa)
           Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
          2.300   3.200   3.400   3.428   3.675   4.400
        > summary(virginica)
           Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
          2.200   2.800   3.000   2.974   3.175   3.800
    > Difference between the mean sepal width of setosa and virginica
    > Verifying a hunch like "this one looks bigger"; example use: A/B tests and the like
  17. Built-in functions
        > t.test(setosa, virginica, var.equal = TRUE) # test for a difference in means

        	Two Sample t-test

        data:  setosa and virginica
        t = 6.4503, df = 98, p-value = 4.246e-09
        alternative hypothesis: true difference in means is not equal to 0
        95 percent confidence interval:
         0.3143257 0.5936743
        sample estimates:
        mean of x mean of y
            3.428     2.974
    Note: the test's assumptions need to be checked first
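
The slide does not say which checks it has in mind for the assumptions. As one common possibility, normality and equality of variances can be examined with base R's `shapiro.test()` and `var.test()`, as sketched below. The choice of these particular tests is an assumption, not something stated in the talk.

```r
setosa    <- iris[iris$Species == "setosa", ]$Sepal.Width
virginica <- iris[iris$Species == "virginica", ]$Sepal.Width

# Normality of each group (Shapiro-Wilk test)
norm_setosa    <- shapiro.test(setosa)
norm_virginica <- shapiro.test(virginica)

# Equality of the two variances (F test)
eq_var <- var.test(setosa, virginica)

# If equal variances are doubtful, Welch's t-test drops that assumption
welch <- t.test(setosa, virginica, var.equal = FALSE)

# Collect the p-values for a quick look
p_values <- c(
  shapiro_setosa    = norm_setosa$p.value,
  shapiro_virginica = norm_virginica$p.value,
  f_test            = eq_var$p.value,
  welch_t           = welch$p.value
)
print(p_values)
```

Note that `t.test()` uses Welch's version by default; the slide's call sets `var.equal = TRUE` explicitly to get the classic two-sample test.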
  18. Plotting
        > hist(iris$Sepal.Length)
        > plot(iris)

  19. R libraries

  20. Libraries I actually use
    > RMySQL
    > RGoogleAnalytics
    > rvest ... crawlers in R
    > shiny ... web applications in R
    etc…
  21. Connecting to the service DB with RMySQL
        > library(RMySQL)
        > dbConnector <- dbConnect(dbDriver("MySQL"), dbname = "hoge" …)
        > query <- "SELECT * FROM table"
        > result <- dbGetQuery(dbConnector, query)
        > dbDisconnect(dbConnector)
    > Results come back as a data frame
    > Feed them straight into statistical analysis
    > Or run the query every day as a batch job
  22. RGoogleAnalytics
        > library(RGoogleAnalytics)
        # Define the query (the `query` and `ga` objects are created beforehand; not shown on the slide)
        > query$Init(start.date = "2015-01-01",
                     end.date = "2015-06-30",
                     dimensions = "ga:date",
                     metrics = "ga:visitors,ga:pageviews",
                     sort = "ga:date",
                     segment = "gaid::quux",
                     max.results = 10000,
                     table.id = "ga:hoge",
                     access_token = query$authorize())
        # Fetch the data
        > df.ga <- ga$GetReportData(query)
    > Set thresholds with specific logic and monitor against them
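
The talk does not show the monitoring logic itself. As a rough sketch of the idea, the example below flags the latest day when it falls more than three standard deviations from the mean of the preceding days. The data values, the `pageviews` column name, and the three-sigma rule are all assumptions made for illustration.

```r
# Hypothetical daily pageviews standing in for df.ga (values are made up)
df.ga <- data.frame(
  date      = as.Date("2015-06-01") + 0:9,
  pageviews = c(1020, 980, 1005, 995, 1010, 990, 1000, 1015, 985, 400)
)

# Assumed rule: the baseline is every day except the latest one
baseline <- head(df.ga$pageviews, -1)
latest   <- tail(df.ga$pageviews, 1)

# Thresholds at mean +/- 3 standard deviations of the baseline
lower <- mean(baseline) - 3 * sd(baseline)
upper <- mean(baseline) + 3 * sd(baseline)

# TRUE when the latest value lies outside the expected range
alert <- latest < lower || latest > upper
if (alert) message("pageviews outside expected range: ", latest)
```

Run as a daily batch, a check like this could send a notification instead of calling `message()`.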
  23. Fetching public data with rvest
        library(rvest)

        url <- "http://www.tripadvisor.com/Hotel_Review-g37209-d1762915-Reviews-JW_Marriott_Indianapolis-Indianapolis_Indiana.html"

        # One node per review
        reviews <- url %>%
          read_html() %>%
          html_nodes("#REVIEWS .innerBubble")

        # Review title; defined in the linked demo but missing from the slide
        quote <- reviews %>%
          html_node(".quote span") %>%
          html_text()

        # Star rating, parsed from the image alt text
        rating <- reviews %>%
          html_node(".rating .rating_s_fill") %>%
          html_attr("alt") %>%
          gsub(" of 5 stars", "", .) %>%
          as.integer()

        # Review body
        review <- reviews %>%
          html_node(".entry .partial_entry") %>%
          html_text()

        data.frame(quote, rating, review, stringsAsFactors = FALSE) %>% View()
    https://github.com/hadley/rvest/blob/master/demo/tripadvisor.R
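  24. Building applications with Shiny
    > Turn a framework found through data analysis into a dynamic application
    > No HTML / CSS / JS required
    > Can also be served as a web application with Shiny-Server;
      users need no knowledge of R or statistics, so it is handy for team use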
  25. Building applications with Shiny
        /shiny-app
        ├ ui.R
        └ server.R
    > This is the entire basic structure

  26. Cluster analysis of iris
        selectedData <- iris[, c("Sepal.Length", "Sepal.Width")]
        str(selectedData)
        head(selectedData, 10)

        clusters <- kmeans(selectedData, 3)
        str(clusters)

        par(mar = c(5.1, 4.1, 0, 1))    # set the margins
        plot(selectedData,              # plot the selected data
             col = clusters$cluster,    # assign a color to each cluster
             pch = 20,                  # plot with filled circles
             cex = 3)                   # set the symbol magnification
        points(clusters$centers, pch = 4, cex = 4, lwd = 4)  # line width x4
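
Because `kmeans()` starts from randomly chosen centers, two runs of the code above can label or even split the clusters differently. In the spirit of the talk's "repeatable analysis", the sketch below fixes the random seed; the seed value 42 and `nstart = 25` are arbitrary choices for illustration, not settings from the slides.

```r
selectedData <- iris[, c("Sepal.Length", "Sepal.Width")]

# Fix the seed so that reruns give the same clustering
set.seed(42)
clusters_a <- kmeans(selectedData, centers = 3, nstart = 25)

# Same seed, same call: the result should match exactly
set.seed(42)
clusters_b <- kmeans(selectedData, centers = 3, nstart = 25)

identical(clusters_a$cluster, clusters_b$cluster)

# Compare cluster assignments with the known species
table(clusters_a$cluster, iris$Species)
```

`nstart` tries several random starts and keeps the best one, which makes the result less sensitive to an unlucky initialization.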
  27. Turning it into an application
        # ui.R
        shinyUI(pageWithSidebar(
          headerPanel('Iris k-means clustering'),
          sidebarPanel(
            selectInput('xcol', 'X Variable', names(iris)),
            selectInput('ycol', 'Y Variable', names(iris),
                        selected = names(iris)[[2]]),
            numericInput('clusters', 'Cluster count', 3,
                         min = 1, max = 9)
          ),
          mainPanel(
            plotOutput('plot1')
          )
        ))
  28. Turning it into an application
        # server.R
        shinyServer(function(input, output, session) {

          # Combine the selected variables into a new data frame
          selectedData <- reactive({
            iris[, c(input$xcol, input$ycol)]
          })

          clusters <- reactive({
            kmeans(selectedData(), input$clusters)
          })

          output$plot1 <- renderPlot({
            par(mar = c(5.1, 4.1, 0, 1))
            plot(selectedData(),
                 col = clusters()$cluster,
                 pch = 20, cex = 3)
            points(clusters()$centers, pch = 4, cex = 4, lwd = 4)
          })

        })
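  29. Benefits of turning it into an application
    > Parameters can be varied interactively
    > Handy for exploratory data analysis
    > Also good for building small tools using R alone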

  30. Things I made recently

  31. Things I made recently

  32. Summary

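  33. Common director problems
    > The data we handle is scattered across many places
    > We want to use the large amount of data lying dormant
    > We want more material to base decisions on
    > We want data analysis that can be repeated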

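  34. Toward solving the common director problems
    > R connects easily to many kinds of interfaces
    > Statistical approaches add material for decisions
    > Plots and apps make results easy to understand
    > Putting the analysis in code makes it repeatable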

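  35. Looking ahead
    > R connects easily to many kinds of interfaces
    > Statistical approaches add (even more) material for decisions
    > Plots and apps make results easy to understand
    > Putting the analysis in code makes it repeatable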
  36. Thank you