Slide 1

Slide 1 text

Web Service Direction and R. GMO Pepabo, Inc., 財津大夏. 2015.07.04, 2nd Pepabo Tech Conference

Slide 2

Slide 2 text

About me > 財津大夏 (@zaimy611) > Joined the company in May 2012 > Lolipop! -> minne > Director > Social research and statistics

Slide 3

Slide 3 text

Contents > About directors > Features of R > R libraries > Summary. Note: today's talk barely touches on statistical methods themselves.

Slide 4

Slide 4 text

About directors

Slide 5

Slide 5 text

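What directors do > Manage the service's overall financial / non-financial metrics > Direct the service site > Plan campaigns and similar initiatives > Manage web advertising > PR- and sales-like work, etc.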

Slide 6

Slide 6 text

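Common director pain points > The data we work with is scattered across many places > We want to use the large amounts of data lying dormant > We want more material to base decisions on > We want data analysis that can be repeated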

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Features of R

Slide 9

Slide 9 text

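Features of R > A language for statistics / data analysis > Data manipulation specialized for analysis > Flexible types and structures / missing values, non-numeric values, etc. > A rich set of built-in functions > Visualization through plotting functions > Active package development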

Slide 10

Slide 10 text

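RStudio > An integrated development environment for R > Basic features such as project management / code editor / file browser > Displays the images you plot > Version control with Git or Subversion > Write reports with RMarkdown > Build applications with shiny (covered later)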

Slide 11

Slide 11 text

iris
> iris
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
11          5.4         3.7          1.5         0.2  setosa
12          4.8         3.4          1.6         0.2  setosa
13          4.8         3.0          1.4         0.1  setosa
14          4.3         3.0          1.1         0.1  setosa
15          5.8         4.0          1.2         0.2  setosa
…
> Sepal and petal length and width for 50 specimens each of 3 iris species

Slide 12

Slide 12 text

Flexible types and structures
> str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> Vectors / lists / data frames ...
> Nesting is possible: "a list inside a list", "a data frame inside a list"
> Numeric / character / logical / factor types ... representations that match the structure of real data
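The nesting described on this slide can be sketched with the built-in iris data. The object name `report` and its fields are made up for illustration and do not come from the talk.

```r
# Hypothetical nested structure: a list holding a vector, another list,
# and several data frames (all names are illustrative only).
report <- list(
  title   = "iris overview",                       # character vector of length 1
  flags   = c(cleaned = TRUE, published = FALSE),  # named logical vector
  meta    = list(source = "datasets::iris",        # a list inside a list
                 n_rows = nrow(iris)),
  by_kind = split(iris, iris$Species)              # a list of data frames
)

str(report, max.level = 2)            # inspect the nested layout
report$meta$n_rows                    # reach into the inner list
head(report$by_kind$setosa, 3)        # a data frame stored inside the list
class(report$by_kind$setosa$Species)  # the factor type survives the nesting
```

`split()` is used here only because it is a compact way to get a list of data frames; any list whose elements are data frames behaves the same way.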

Slide 13

Slide 13 text

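Missing values and non-numeric values
> enq1
      q1   q2
1     犬 オス
2     猫 メス
3 無回答 オス
> str(enq1)
'data.frame':	3 obs. of  2 variables:
 $ q1: Factor w/ 3 levels "犬","猫","無回答": 1 2 3
 $ q2: Factor w/ 2 levels "オス","メス": 1 2 1
(犬 = dog, 猫 = cat, 無回答 = no answer; オス = male, メス = female)
> Example use: user surveys
> "No answer" is not actually one of the answers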

Slide 14

Slide 14 text

Missing values and non-numeric values
> enq1
    q1   q2
1   犬 オス
2   猫 メス
3 <NA> オス
> str(enq1)
'data.frame':	3 obs. of  2 variables:
 $ q1: Factor w/ 2 levels "犬","猫": 1 2 NA
 $ q2: Factor w/ 2 levels "オス","メス": 1 2 1
> NA (Not Available)
> There are others too, such as "0 divided by 0 = NaN (Not a Number)"
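A minimal sketch of how NA and NaN behave in everyday calculations. The vectors `q1` and `age` are hypothetical survey data invented for this example.

```r
# Hypothetical survey answers; the third respondent left both items blank.
q1  <- factor(c("dog", "cat", NA))
age <- c(31, 27, NA)

is.na(q1)                    # FALSE FALSE TRUE
table(q1)                    # NA is not counted as a level
table(q1, useNA = "ifany")   # ask explicitly to count the missing answer
mean(age)                    # NA: missingness propagates through the mean
mean(age, na.rm = TRUE)      # 29: drop missing values before averaging

0 / 0                        # NaN (not a number)
is.nan(0 / 0)                # TRUE
is.na(0 / 0)                 # TRUE: NaN is also treated as missing
is.nan(NA)                   # FALSE: NA is not NaN
```

Storing a blank as NA rather than as a "no answer" level keeps it out of counts and averages unless you ask for it.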

Slide 15

Slide 15 text

Built-in functions
> summary(iris) # basic summary statistics for iris
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
> var(iris[iris$Species == "setosa",]$Sepal.Length) # variance of sepal length for setosa
[1] 0.124249
> sd(iris[iris$Species == "setosa",]$Sepal.Length) # standard deviation of sepal length for setosa
[1] 0.3524897
> Summarizing data
> All kinds of statistical processing
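The slide computes the variance and standard deviation for one species at a time. As a possible extension (not shown in the talk), base R can produce the same statistics for every species in one call:

```r
# Variance and standard deviation of sepal length for every species at once.
sepal_var <- tapply(iris$Sepal.Length, iris$Species, var)
sepal_sd  <- tapply(iris$Sepal.Length, iris$Species, sd)
sepal_var
sepal_sd

# The same idea with aggregate(), which returns a data frame
# that is convenient for further processing.
sepal_stats <- aggregate(Sepal.Length ~ Species, data = iris, FUN = sd)
sepal_stats
```

The setosa entries should match the values printed on the slide.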

Slide 16

Slide 16 text

Built-in functions
> setosa <- iris[iris$Species == "setosa",]$Sepal.Width
> virginica <- iris[iris$Species == "virginica",]$Sepal.Width
> summary(setosa)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.300   3.200   3.400   3.428   3.675   4.400
> summary(virginica)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.200   2.800   3.000   2.974   3.175   3.800
> Difference in mean sepal width between the setosa and virginica species
> Verifying a hunch like "this one looks wider"; example use: A/B tests and the like

Slide 17

Slide 17 text

Built-in functions
> t.test(setosa, virginica, var.equal = TRUE) # test for a difference in means
	Two Sample t-test
data:  setosa and virginica
t = 6.4503, df = 98, p-value = 4.246e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.3143257 0.5936743
sample estimates:
mean of x mean of y
    3.428     2.974
Note: the test's preconditions need to be checked first
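The slide notes that the preconditions of the t-test have to be checked but does not show how. One possible way to do that in base R is sketched below. The choice of tests and the 5% cut-off are assumptions for illustration, not the speaker's procedure.

```r
setosa    <- iris[iris$Species == "setosa", ]$Sepal.Width
virginica <- iris[iris$Species == "virginica", ]$Sepal.Width

# Normality of each sample (Shapiro-Wilk test).
shapiro.test(setosa)
shapiro.test(virginica)

# Equality of variances between the two samples (F test).
f <- var.test(setosa, virginica)
f$p.value

# Assumed decision rule, for illustration only: fall back to Welch's t-test
# (var.equal = FALSE) when the F test rejects equal variances at the 5% level.
equal_var <- f$p.value >= 0.05
result <- t.test(setosa, virginica, var.equal = equal_var)
result
```

Welch's test is also what `t.test()` runs by default, so leaving out `var.equal` is the conservative choice when the variances are in doubt.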

Slide 18

Slide 18 text

Plotting functions
> hist(iris$Sepal.Length)
> plot(iris)
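When a plot needs to go into a report rather than onto the screen, the same calls can be wrapped in a graphics device. The following is a small sketch. The file name and the titles are made up for the example.

```r
# Write the histogram to a PDF in the session's temporary directory.
out <- file.path(tempdir(), "sepal_length_hist.pdf")  # hypothetical file name

pdf(out, width = 6, height = 4)  # open the PDF device (size in inches)
hist(iris$Sepal.Length,
     main = "Sepal length",
     xlab = "Length (cm)")
dev.off()                        # close the device so the file is written

file.exists(out)
```

Because the plot is produced by code, rerunning the script regenerates the same figure from fresh data.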

Slide 19

Slide 19 text

R libraries

Slide 20

Slide 20 text

Libraries I actually use > RMySQL > RGoogleAnalytics > rvest ... crawlers in R > shiny ... web applications in R, etc.

Slide 21

Slide 21 text

Connecting to the service DB with RMySQL
> library(RMySQL)
> dbConnector <- dbConnect(dbDriver("MySQL"), dbname = "hoge" …)
> query <- "SELECT * FROM table"
> result <- dbGetQuery(dbConnector, query)
> dbDisconnect(dbConnector)
> Results come back as a data frame
> Feed them straight into statistical analysis
> Or run the query every day as a batch job

Slide 22

Slide 22 text

RGoogleAnalytics
> library(RGoogleAnalytics)
# Define the query
> query$Init(start.date = "2015-01-01",
             end.date = "2015-06-30",
             dimensions = "ga:date",
             metrics = "ga:visitors,ga:pageviews",
             sort = "ga:date",
             segment = "gaid::quux",
             max.results = 10000,
             table.id = "ga:hoge",
             access_token = query$authorize())
# Fetch the data
> df.ga <- ga$GetReportData(query)
> Set thresholds with specific logic and monitor them
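The slide does not say which logic is used for the thresholds. The sketch below shows one simple possibility: flag a day whose page views sit far from the mean of the other days. The data, the column names, and the rule with its factor `k = 3` are all assumptions made for illustration.

```r
# Made-up daily page views standing in for the data frame fetched above.
df.ga <- data.frame(
  date      = as.Date("2015-06-01") + 0:9,
  pageviews = c(1000, 1040, 980, 1010, 995, 1025, 990, 1005, 1500, 1015)
)

# Assumed rule: flag a day when it deviates from the mean of the other
# days by more than k standard deviations of those other days.
flag_outliers <- function(x, k = 3) {
  vapply(seq_along(x), function(i) {
    rest <- x[-i]
    abs(x[i] - mean(rest)) > k * sd(rest)
  }, logical(1))
}

df.ga$alert <- flag_outliers(df.ga$pageviews)
df.ga[df.ga$alert, ]  # rows that would trigger a notification
```

Leaving the day under test out of the mean and standard deviation keeps a single spike from masking itself.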

Slide 23

Slide 23 text

Fetching public data with rvest
library(rvest)
url <- "http://www.tripadvisor.com/Hotel_Review-g37209-d1762915-Reviews-JW_Marriott_Indianapolis-Indianapolis_Indiana.html"
reviews <- url %>%
  read_html() %>%
  html_nodes("#REVIEWS .innerBubble")
# `quote` is used below but was not defined on the slide;
# this definition follows the linked demo script
quote <- reviews %>%
  html_node(".quote span") %>%
  html_text()
rating <- reviews %>%
  html_node(".rating .rating_s_fill") %>%
  html_attr("alt") %>%
  gsub(" of 5 stars", "", .) %>%
  as.integer()
review <- reviews %>%
  html_node(".entry .partial_entry") %>%
  html_text()
data.frame(quote, rating, review, stringsAsFactors = FALSE) %>% View()
https://github.com/hadley/rvest/blob/master/demo/tripadvisor.R

Slide 24

Slide 24 text

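Building applications with Shiny > Turn a framework discovered through data analysis into a dynamic application > No HTML / CSS / JS required > Can also be served as a web application with Shiny Server; users need no knowledge of R or statistics, so it works well for team use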

Slide 25

Slide 25 text

Building applications with Shiny
/shiny-app
├ ui.R
└ server.R
> This is the entire basic layout

Slide 26

Slide 26 text

Cluster analysis of iris
selectedData <- iris[,c("Sepal.Length", "Sepal.Width")]
str(selectedData)
head(selectedData, 10)
clusters <- kmeans(selectedData, 3)
str(clusters)
par(mar = c(5.1, 4.1, 0, 1))   # set the margins
plot(selectedData,             # plot the selected data
     col = clusters$cluster,   # assign a color to each cluster
     pch = 20,                 # plot with filled circles
     cex = 3)                  # set the symbol magnification
points(clusters$centers,
       pch = 4,
       cex = 4,
       lwd = 4)                # make the line width 4 times the default
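`kmeans()` chooses its starting centers at random, so the cluster numbering, and sometimes the clusters themselves, can differ between runs. A minimal sketch of making the analysis repeatable follows. The seed value 42 and `nstart = 10` are arbitrary choices for this example and are not taken from the slides.

```r
selectedData <- iris[, c("Sepal.Length", "Sepal.Width")]

# Fix the random seed before each run so the random starts are identical.
set.seed(42)
clusters_a <- kmeans(selectedData, centers = 3, nstart = 10)

set.seed(42)
clusters_b <- kmeans(selectedData, centers = 3, nstart = 10)

identical(clusters_a$cluster, clusters_b$cluster)  # TRUE: same seed, same result
table(clusters_a$cluster, iris$Species)            # compare clusters with species
```

`nstart` runs several random starts and keeps the best one, which makes the result less sensitive to an unlucky initialization.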

Slide 27

Slide 27 text

Turning it into an application
# ui.R
shinyUI(pageWithSidebar(
  headerPanel('Iris k-means clustering'),
  sidebarPanel(
    selectInput('xcol', 'X Variable', names(iris)),
    selectInput('ycol', 'Y Variable', names(iris),
                selected = names(iris)[[2]]),
    numericInput('clusters', 'Cluster count', 3,
                 min = 1, max = 9)
  ),
  mainPanel(
    plotOutput('plot1')
  )
))

Slide 28

Slide 28 text

Turning it into an application
# server.R
shinyServer(function(input, output, session) {
  # Combine the selected variables into a new data frame
  selectedData <- reactive({
    iris[, c(input$xcol, input$ycol)]
  })
  clusters <- reactive({
    kmeans(selectedData(), input$clusters)
  })
  output$plot1 <- renderPlot({
    par(mar = c(5.1, 4.1, 0, 1))
    plot(selectedData(),
         col = clusters()$cluster,
         pch = 20, cex = 3)
    points(clusters()$centers, pch = 4, cex = 4, lwd = 4)
  })
})

Slide 29

Slide 29 text

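Benefits of turning it into an application > Parameters can be varied interactively > Handy for exploratory data analysis > Also good for building small tools using nothing but R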

Slide 30

Slide 30 text

Things I built recently

Slide 31

Slide 31 text

Things I built recently

Slide 32

Slide 32 text

Summary

Slide 33

Slide 33 text

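Common director pain points > The data we work with is scattered across many places > We want to use the large amounts of data lying dormant > We want more material to base decisions on > We want data analysis that can be repeated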

Slide 34

Slide 34 text

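Solving the director pain points > R connects easily to many interfaces > Statistical approaches add material to base decisions on > Plots and apps make results easy to understand > Writing the analysis as code makes it repeatable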

Slide 35

Slide 35 text

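Looking ahead > R connects easily to many interfaces > Statistical approaches add (even more) material to base decisions on > Plots and apps make results easy to understand > Writing the analysis as code makes it repeatable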

Slide 36

Slide 36 text

Thank you very much