
Web Service Direction and R (WebサービスのディレクションとR)

2nd Pepabo Tech Conference #pbtech

Hiroka Zaitsu

July 04, 2015

Transcript

  1. Web Service Direction and R
    GMO Pepabo, Inc.
    Hiroka Zaitsu
    2015.07.04 2nd Pepabo Tech Conference

  2. About me
    > Hiroka Zaitsu (@zaimy611)
    > Joined the company in May 2012
    > Lolipop! -> minne
    > Director
    > Social research and statistics

  3. Contents
    > About directors
    > Features of R
    > R libraries
    > Summary
    Note: today's talk does not go into statistical methods in much depth

  4. About directors

  5. What directors do
    > Manage the service's overall financial and non-financial metrics
    > Direct the service site
    > Plan campaigns and similar initiatives
    > Manage web advertising
    > PR- and sales-like work, etc.

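  6. Common director problems
    > The data we work with is scattered across many places
    > We want to use the large amounts of data lying dormant
    > We want more evidence to base decisions on
    > We want data analysis that can be repeated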

  7. [Image-only slide; no text transcribed]

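  8. Features of R

  9. Features of R
    > A language for statistics and data analysis
    > Data manipulation specialized for analysis
    > Flexible types and structures / missing values, non-numeric values, etc.
    > A rich set of built-in functions
    > Visualization through plotting functions
    > Active package development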

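  10. RStudio
    > An integrated development environment for R
    > Basic features such as project management, a code editor, and a file browser
    > Displays the plots you generate
    > Version control with Git or Subversion
    > Write reports with RMarkdown
    > Build applications with shiny (covered later)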

  11. iris
    > iris
       Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1           5.1         3.5          1.4         0.2  setosa
    2           4.9         3.0          1.4         0.2  setosa
    3           4.7         3.2          1.3         0.2  setosa
    4           4.6         3.1          1.5         0.2  setosa
    5           5.0         3.6          1.4         0.2  setosa
    6           5.4         3.9          1.7         0.4  setosa
    7           4.6         3.4          1.4         0.3  setosa
    8           5.0         3.4          1.5         0.2  setosa
    9           4.4         2.9          1.4         0.2  setosa
    10          4.9         3.1          1.5         0.1  setosa
    11          5.4         3.7          1.5         0.2  setosa
    12          4.8         3.4          1.6         0.2  setosa
    13          4.8         3.0          1.4         0.1  setosa
    14          4.3         3.0          1.1         0.1  setosa
    15          5.8         4.0          1.2         0.2  setosa

    > Sepal and petal length and width for 50 individuals from each of 3 iris species

  12. Flexible types and structures
    > str(iris)
    'data.frame': 150 obs. of 5 variables:
    $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
    $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
    $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
    $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
    $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

    > Vectors / lists / data frames ...
    > Nesting is possible: "a list inside a list", "a data frame inside a list"
    > Numeric / character / logical / factor ... types that match the structure of real data

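The nesting described on this slide can be sketched as follows. This is only an illustration: the `service` object and all of its field names are hypothetical, not taken from the talk.

```r
# Hypothetical nested structure: a list holding a character vector,
# another list, and a data frame (all names are illustrative).
service <- list(
  name    = "example-service",
  tags    = c("handmade", "market"),
  owner   = list(team = "direction", members = 3L),
  metrics = head(iris[, c("Sepal.Length", "Species")], 3)
)

str(service, max.level = 1)      # overview of the top-level elements
service$owner$team               # element of a list inside a list
service$metrics$Sepal.Length     # column of a data frame inside a list
class(service$metrics$Species)   # the factor type survives nesting
```

Each level is reached with the same `$` accessor, so a nested result can be explored step by step with `str()`.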

  13. Missing values and non-numeric values
    > enq1
             q1     q2
    1       dog   male
    2       cat female
    3 no answer   male
    > str(enq1)
    'data.frame': 3 obs. of 2 variables:
    $ q1: Factor w/ 3 levels "dog","cat","no answer": 1 2 3
    $ q2: Factor w/ 2 levels "male","female": 1 2 1

    > Use case: user surveys
    > "No answer" is not actually an answer

  14. Missing values and non-numeric values
    > enq1
        q1     q2
    1  dog   male
    2  cat female
    3 <NA>   male
    > str(enq1)
    'data.frame': 3 obs. of 2 variables:
    $ q1: Factor w/ 2 levels "dog","cat": 1 2 NA
    $ q2: Factor w/ 2 levels "male","female": 1 2 1

    > NA (Not Available)
    > There are others as well, such as "0 divided by 0 = NaN (not a number)"

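A minimal sketch of how NA and NaN behave in base R. The survey values are made up for illustration, and recoding "no answer" to NA is just one possible way to reach the state shown on the slide.

```r
# Hypothetical survey answers; "no answer" is recoded to NA
answers <- c("dog", "cat", "no answer")
answers[answers == "no answer"] <- NA
q1 <- factor(answers)

levels(q1)                   # NA is not one of the levels
is.na(q1)                    # marks the missing response
table(q1, useNA = "ifany")   # counts, including the missing one

x <- c(1, NA, 3)
mean(x)                      # NA propagates through calculations
mean(x, na.rm = TRUE)        # drop missing values explicitly

0 / 0                        # NaN
is.nan(0 / 0)                # TRUE
is.na(0 / 0)                 # TRUE as well: is.na() also catches NaN
```

Because NA propagates by default, a forgotten missing value shows up in the result rather than being silently counted as a real answer.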

  15. Built-in functions
    > summary(iris) # basic summary statistics for iris
      Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
     Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
     1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
     Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
     Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
     3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
     Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
    > var(iris[iris$Species == "setosa",]$Sepal.Length) # variance of sepal length for setosa
    [1] 0.124249
    > sd(iris[iris$Species == "setosa",]$Sepal.Length) # standard deviation of sepal length for setosa
    [1] 0.3524897

    > Summarizing data
    > Various kinds of statistical processing

  16. Built-in functions
    > setosa <- iris[iris$Species == "setosa",]$Sepal.Width
    > virginica <- iris[iris$Species == "virginica",]$Sepal.Width
    > summary(setosa)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      2.300   3.200   3.400   3.428   3.675   4.400
    > summary(virginica)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      2.200   2.800   3.000   2.974   3.175   3.800

    > Difference in mean sepal width between setosa and virginica
    > Checking a hunch like "this one seems larger"; use case: A/B tests, etc.

  17. Built-in functions
    > t.test(setosa, virginica, var.equal = TRUE) # test for a difference in means
        Two Sample t-test
    data:  setosa and virginica
    t = 6.4503, df = 98, p-value = 4.246e-09
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     0.3143257 0.5936743
    sample estimates:
    mean of x mean of y
        3.428     2.974

    Note: the test's assumptions need to be checked first

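The slide notes that the assumptions must be checked but does not say how. One common approach, sketched here as an assumption rather than the speaker's method, is a Shapiro-Wilk test for normality and an F test for equal variances, with Welch's t-test as a fallback that does not need equal variances.

```r
setosa    <- iris[iris$Species == "setosa", ]$Sepal.Width
virginica <- iris[iris$Species == "virginica", ]$Sepal.Width

# Normality of each group (Shapiro-Wilk test)
shapiro.test(setosa)
shapiro.test(virginica)

# Equality of variances between the two groups (F test)
vt <- var.test(setosa, virginica)
vt$p.value

# Welch's t-test (the default, var.equal = FALSE) does not assume
# equal variances, so it is a safer choice when the F test is in doubt
welch <- t.test(setosa, virginica)
welch$p.value
```

A small p-value from `shapiro.test()` or `var.test()` suggests that the corresponding assumption of the pooled-variance t-test may not hold.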

  18. Plotting
    > hist(iris$Sepal.Length)
    > plot(iris)

  19. R libraries

  20. Libraries I actually use
    > RMySQL
    > RGoogleAnalytics
    > rvest ... a crawler in R
    > shiny ... web applications in R, etc.

  21. Connecting to the service DB with RMySQL
    > library(RMySQL)
    > dbConnector <- dbConnect(dbDriver("MySQL"), dbname = "hoge" …)
    > query <- "SELECT * FROM table"
    > result <- dbGetQuery(dbConnector, query)
    > dbDisconnect(dbConnector)

    > Results come back as a data frame
    > Feed them straight into statistical analysis
    > Or run the query every day as a batch job

  22. RGoogleAnalytics
    > library(RGoogleAnalytics)
    # Define the query
    > query$Init(start.date = "2015-01-01",
                 end.date = "2015-06-30",
                 dimensions = "ga:date",
                 metrics = "ga:visitors,ga:pageviews",
                 sort = "ga:date",
                 segment = "gaid::quux",
                 max.results = 10000,
                 table.id = "ga:hoge",
                 access_token = query$authorize())
    # Fetch the data
    > df.ga <- ga$GetReportData(query)

    > Monitor the data against thresholds set by specific logic

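The talk does not show the monitoring logic itself. As a rough sketch, a threshold check could look like the following, where the data frame, the injected drop, and the "mean minus 3 standard deviations" rule are all assumptions made for illustration.

```r
# Hypothetical daily pageviews standing in for the fetched report data
set.seed(1)
df <- data.frame(
  date      = seq(as.Date("2015-06-01"), by = "day", length.out = 30),
  pageviews = round(rnorm(30, mean = 1000, sd = 50))
)
df$pageviews[30] <- 500   # inject an obviously low day

# Baseline from the first 29 days; threshold = mean - 3 * sd (assumed rule)
baseline  <- df$pageviews[1:29]
threshold <- mean(baseline) - 3 * sd(baseline)

# Rows that fall below the threshold are candidates for an alert
alerts <- df[df$pageviews < threshold, ]
alerts
```

Run as a daily batch, a non-empty `alerts` data frame could then trigger a notification.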

  23. Getting public data with rvest
    library(rvest)
    url <- "http://www.tripadvisor.com/Hotel_Review-g37209-d1762915-Reviews-JW_Marriott_Indianapolis-Indianapolis_Indiana.html"
    reviews <- url %>%
      read_html() %>%
      html_nodes("#REVIEWS .innerBubble")
    # Review title (selector assumed from the linked demo)
    quote <- reviews %>%
      html_node(".quote span") %>%
      html_text()
    rating <- reviews %>%
      html_node(".rating .rating_s_fill") %>%
      html_attr("alt") %>%
      gsub(" of 5 stars", "", .) %>%
      as.integer()
    review <- reviews %>%
      html_node(".entry .partial_entry") %>%
      html_text()
    data.frame(quote, rating, review, stringsAsFactors = FALSE) %>% View()

    https://github.com/hadley/rvest/blob/master/demo/tripadvisor.R

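  24. Building applications with Shiny
    > Turn a framework found through data analysis into a dynamic application
    > No HTML / CSS / JS required
    > Can also be served as a web application with Shiny-Server;
      users need no knowledge of R or statistics, so it works well for team use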

  25. Building applications with Shiny
    └ /shiny-app
      ├ ui.R
      └ server.R

    > This is the entire basic structure

  26. Cluster analysis of iris
    selectedData <- iris[, c("Sepal.Length", "Sepal.Width")]
    str(selectedData)
    head(selectedData, 10)
    clusters <- kmeans(selectedData, 3)
    str(clusters)
    par(mar = c(5.1, 4.1, 0, 1))   # set the margins
    plot(selectedData,             # plot the selected data
         col = clusters$cluster,   # assign a color to each cluster
         pch = 20,                 # plot with filled circles
         cex = 3)                  # set the symbol magnification
    points(clusters$centers,
           pch = 4,
           cex = 4,
           lwd = 4)                # make the line width 4x

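`kmeans()` starts from randomly chosen centers, so repeated runs of the code above can label the clusters differently. The sketch below is an addition, not part of the talk: it fixes the seed, uses several random starts, and compares the clusters with the known species. The seed value and `nstart = 10` are arbitrary choices.

```r
selectedData <- iris[, c("Sepal.Length", "Sepal.Width")]

set.seed(42)   # fix the random starting centers so the run is repeatable
clusters <- kmeans(selectedData, centers = 3, nstart = 10)

clusters$size                 # number of observations in each cluster
table(cluster = clusters$cluster,
      species = iris$Species) # how the clusters line up with the species
```

The cross-tabulation gives a quick sense of how well two sepal measurements alone separate the three species.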

  27. Turning it into an application
    # ui.R
    shinyUI(pageWithSidebar(
      headerPanel('Iris k-means clustering'),
      sidebarPanel(
        selectInput('xcol', 'X Variable', names(iris)),
        selectInput('ycol', 'Y Variable', names(iris),
                    selected = names(iris)[[2]]),
        numericInput('clusters', 'Cluster count', 3,
                     min = 1, max = 9)
      ),
      mainPanel(
        plotOutput('plot1')
      )
    ))

  28. Turning it into an application
    # server.R
    shinyServer(function(input, output, session) {
      # Combine the selected variables into a new data frame
      selectedData <- reactive({
        iris[, c(input$xcol, input$ycol)]
      })
      clusters <- reactive({
        kmeans(selectedData(), input$clusters)
      })
      output$plot1 <- renderPlot({
        par(mar = c(5.1, 4.1, 0, 1))
        plot(selectedData(),
             col = clusters()$cluster,
             pch = 20, cex = 3)
        points(clusters()$centers, pch = 4, cex = 4, lwd = 4)
      })
    })

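  29. Benefits of turning it into an application
    > Parameters can be varied interactively
    > Handy for exploratory data analysis
    > Also good for building small tools with nothing but R

  30. Things I made recently

  31. Things I made recently

  32. Summary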

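  33. Common director problems
    > The data we work with is scattered across many places
    > We want to use the large amounts of data lying dormant
    > We want more evidence to base decisions on
    > We want data analysis that can be repeated

  34. Toward solving the common director problems
    > Connects easily to many kinds of interfaces
    > Statistical approaches add evidence for decisions
    > Plots and apps make results easier to understand
    > Putting the analysis in code makes it repeatable

  35. Looking ahead
    > Connects easily to many kinds of interfaces
    > Statistical approaches add (even more) evidence for decisions
    > Plots and apps make results easier to understand
    > Putting the analysis in code makes it repeatable

  36. Thank you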