Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Rによるデータ可視化と地図表現

 Rによるデータ可視化と地図表現

2022年8月28日に実施された統計思考院 オンラインワークショップ「探索的ビッグデータ解析と再現可能研究」https://sites.google.com/view/ws-ebda-rr-2022/ の発表資料です。
スライドの中のRコードは https://github.com/uribo/220828ism_ws にあげています。

Uryu Shinya

August 28, 2022
Tweet

More Decks by Uryu Shinya

Other Decks in Programming

Transcript

  1. 2022-08-28@ 統 計 思 考 院 探 索 的 解

    析 再 現 可 能 研 究 ⽠ ⽣ 真 也 徳 島 ⼤ 学 デ ザ イ ン 型 AI 教 育 研 究 セ ン タ
  2. 前置 7 {ggplot 2 } GitHub {magrittr} %>% R 4

    . 1 . 0 |> び R 4 . 2 . 1 Speaker Deck 01 # σʔλૢ࡞ͷͨΊͷύοέʔδ 02 library(dplyr, warn.conflicts = FALSE) 03 penguins_xy < - 04 palmerpenguins : : penguins | > 05 select(flipper_length_mm, bill_length_mm, species) | > 06 f i lter(!is.na(flipper_length_mm))
  3. 前置 8 
 01 packages < - c( 02 "tidyverse",

    "sf", "zipangu", "tabularmaps", "geofacet", 03 "palmerpenguins", "datasauRus", "gt", "gapminder", "statebins", 04 "ggtext", "ggrepel", "gghighlight", "patchwork", 05 "rnaturalearth", "ggokabeito") 06 install.packages(setdiff(packages, rownames(installed.packages()))) 07 08 ropensci_pkgs < - c("rnaturalearthhires") 09 install.packages(setdiff(ropensci_pkgs, rownames(installed.packages())), 10 repos = "https: / / ropensci.r - universe.dev") 11 12 wilkelab_pkgs < - c("gridtext") 13 install.packages(setdiff(wilkelab_pkgs, rownames(installed.packages())), 14 repos = "https: / / wilkelab.r - universe.dev") 15 16 uris_pkgs < - c("ssdse") 17 install.packages(setdiff(uris_pkgs, rownames(installed.packages())), 18 repos = "https: / / uribo.r - universe.dev")
  4. ࣮ࡏ ৘ใ : : #609527 : : 29.3cm : 1

    実在 情報、 変換 過程 情報量 損失 ⽣ 写像・ ・可視化 σʔλ   mapping #020202 10
  5. 法則 基 情報 理解 16 ۙ઀ ྨࣅ ғ͍ࠐΈ ด࠯ ࿈ଓੑ

    ઀ଓ 視覚的情報 基 、要素間 関係性 推論
  6. ggplot2 利⽤可能 01 # install.packages("ggplot2") 02 # ggplot2ύοέʔδͷಡΈࠐΈ 03 #

    ൃදࢿྉͰ͸2022೥8݄࣌఺ͰͷCRAN࠷৽όʔδϣϯͰ͋Δ3.3.6Λར༻͠·͢ 04 library(ggplot2) 05 06 # tidyverseύοέʔδʹ಺แ͞Ε͍ͯΔͨΊɺͪ͜ΒΛಡΈࠐΜͰ΋OKͰ͢ 07 # install.packages("tidyverse") 08 # library(tidyverse) 25
  7. ggplot2 第⼀歩 fl ipper_length_mm bill_length_mm species 181 39.1 Adelie 186

    39.5 Adelie 195 40.3 Adelie … … … 198 50.2 Chinstrap 27 01 # σʔλૢ࡞ͷͨΊͷύοέʔδ 02 library(dplyr, warn.conflicts = FALSE) 03 penguins_xy < - 04 palmerpenguins : : penguins | > 05 select(flipper_length_mm, bill_length_mm, species) | > 06 f i lter(!is.na(flipper_length_mm))
  8. 01 ggplot(data = penguins_xy) 02 aes(x = flipper_length_mm, 03 y

    = bill_length_mm) ggplot2 第⼀歩 デ x y 
 + + 
 29 data penguins_xy 
 x fl ipper_length_mm 
 y bill_length_mm
  9. 審美的 (aesthetic) 要素 aes(άϥϑͷதͷ໾ׂ = ม਺) x, y color, fi

    ll size shape, linetype alpha group 32 linewidth* linewidth v 3 . 4 . 0
  10. 審美的要素 指定⽅法 34 01 # 1. aes()͸ϨΠϠͱͯ͠ggplotΦϒδΣΫτʹ௥Ճͯ͠΋ྑ͍ 02 ggplot(data =

    penguins_xy) + 03 aes(x = flipper_length_mm, 04 y = bill_length_mm) 05 06 # 2. ggplot(mapping = aes(...))ͱͯ͠༩͑ͯ΋ྑ͍ 07 ggplot(data = penguins_xy, 08 mapping = aes(x = flipper_length_mm, 09 y = bill_length_mm)) 10 11 # 3. ggplot(data = ,mapping = aes(x = ,y = ))͸҉໧తʹҾ਺Λলུͯ͠هड़Ͱ͖Δ 12 ggplot(penguins_xy, 13 aes(flipper_length_mm, bill_length_mm))
  11. 審美的要素 指定⽅法 35 共通 内 04 color = species, 05

    group = species aes(color = species) aes(group = species) (species) 
 01 ggplot(data = penguins_xy) +
  12. ⼈⼝ 可視化 01 dplyr : : glimpse(df_ssdse_b) 02 #> Rows:

    564 03 #> Columns: 7 04 #> $ prefecture <chr> "๺ւಓ", "๺ւಓ", "๺ւಓ", "๺ւಓ", … 05 #> $ year <int> 2019, 2018, 2017, 2016, 2015, 2014, 2013, … 06 #> $ population <dbl> 5250000, 5286000, 5320000, 5352000, 5381733, … 07 #> $ birth_male <dbl> 15988, 16681, 17503, 17888, 18838, 19010, … 08 #> $ birth_female <dbl> 15032, 15961, 16537, 17237, 17857, 18048, … 09 #> $ spending <dbl> 294682, 281054, 286698, 287325, 272124, … 10 #> $ food_expenses <dbl> 72912, 69044, 69640, 69445, 65912, 65450, … 11 12 df_ssdse_b2019 < - 13 df_ssdse_b | > 14 f i lter(year = = 2019) 符 符 36 統計 提供 教育⽤標準 (SSDSE-B) 加⼯ IUUQTXXXOTUBDHPKQVTFMJUFSBDZTTETF
  13. ⼈⼝ 可視化 37 01 df_ssdse_b2019 | > 02 ggplot() +

    03 aes(prefecture, population) + 04 geom_bar(stat = "identity") x
  14. ⼈⼝ 可視化 39 02 # ͓͓ΑͦͷҢ౓ͷॱ൪ʹ഑ஔ͢Δ 03 mutate(prefecture = 04

    forcats : : fct_rev( 05 forcats : : fct_inorder( 06 prefecture))) | > 01 df_ssdse_b2019
  15. 統計処理 44 stat = "identity" ? geom_bar #> function (mapping

    = NULL, data = NULL, stat = "count", position = "stack", #> . . . , width = NULL, na.rm = FALSE, orientation = NA, show.legend = NA, #> inherit.aes = TRUE) geom_bar(stat = "count") 01 df_ssdse_b2019 | > 02 f i lter_shikoku() | > 03 ggplot() + 04 aes(prefecture, population) + 05 geom_bar(stat = "identity")
  16. 統計処理 : stat_*() 上 変形・集計 ⾏ 45 stat = "count"

    ... aes(x = ) y 01 tibble : : tibble( 02 group = c(rep("a", 4), rep("b", 2), "c") 03 ) | > 04 ggplot() + 05 aes(x = group) + 06 geom_bar(stat = "count") 01 df_ssdse_b2019 | > 02 f i lter_shikoku() | > 03 ggplot() + 04 aes(prefecture, population) + 05 geom_bar(stat = "identity") stat = "identity" ... aes(y = ) stat_count() group
  17. 統計処理 : stat_*() geom_*(stat = ) or stat_*() 利⽤ 46

    01 p < - 02 df_ssdse_b2019 | > 03 ggplot() + 04 aes(x = spending) p + geom_bar(stat = "bin", bins = 10) p + geom_histogram(bins = 10) p + stat_bin(bins = 10)
  18. : scale_<aesthetic>_*() 48 01 df_ssdse_b | > 02 f i

    lter_shikoku() | > 03 ggplot() + 04 aes(year, population, 05 group = prefecture, 06 color = prefecture) + 07 geom_line()
  19. : scale_<aesthetic>_*() 49 01 df_ssdse_b 08 scale_x_continuous( 09 breaks =

    seq.int(2008, 2019, by = 2)) + 10 scale_y_log10() + 11 scale_color_viridis_d() x y
  20. 座標系 50 値 配置 決 2 ggplot 2 x y

    x y 
 ) x 3 
 y 10 (0,10, 20,...)
  21. 座標系: coord_*() 51 p < - ggplot(data = penguins_xy) +

    aes(x = flipper_length_mm, y = bill_length_mm) + geom_point() p + coord_flip() x y x y p + coord_f i xed(ratio = 1)
  22. 52 任意 変数 基 分割、 形式 表⽰ 'SBODJT"8BMLFS 1VCMJDEPNBJO WJB8JLJNFEJB$PNNPOT

    IUUQTVQMPBEXJLJNFEJBPSHXJLJQFEJBDPNNPOTG1FSTPOT@8JUI@(BJOGVM@0DDVQBUJPOT@BOE@"UUFOEJOH@4DIPPM@JO@KQH 9 (1874)
  23. : facet_*() 53 01 p < - 02 df_ssdse_b |

    > 03 f i lter_shikoku() | > 04 select(year, prefecture, starts_with("birth")) | > 05 tidyr : : pivot_longer(cols = starts_with("birth"), 06 names_to = "gender", 07 values_to = "value", 08 names_pref i x = "birth_") | > 09 ggplot() + 10 aes(year, value) + 11 geom_line(aes(group = prefecture, color = prefecture)) + 12 scale_x_continuous(breaks = seq.int(2008, 2019, by = 2)) + 13 scale_color_manual(values = custom_pals[1 : 4]) 14 15 p$data
  24. : facet_*() 54 p_facet < - p + facet_wrap(~ gender,

    scales = "free_y", ncol = 2) p + facet_grid(rows = vars(gender), scales = "free_y")
  25. : theme(), theme_*() 56 01 p_facet + 02 theme_classic() +

    03 theme( 04 strip.background = element_rect(f i ll = "gray"), 05 strip.text = element_text(color = "white", 06 face = "bold"), 07 legend.position = "top", 08 axis.title = element_text(color = "#28a87d"), 09 axis.text = element_text(color = "#28a87d"), 10 axis.ticks = element_line(color = "#28a87d")) 細 部位 theme()、element_*() 制御 element_blank() element_rect() element_line() element_text()
  26. {ggtext} ⽂字 装飾 57 01 library(palmerpenguins) 02 p_penguins_scatter < -

    03 penguins | > 04 ggplot() + 05 aes(flipper_length_mm, bill_length_mm) + 06 geom_point(aes(color = species), 07 alpha = 0.5, 08 show.legend = FALSE) + 09 geom_smooth(method = "lm", 10 aes(group = species), 11 se = FALSE, 12 color = "#0E0E0E") + 13 scale_color_manual(values = custom_pals) + 14 theme_light() + 15 labs(title = "ϖϯΪϯͷମͷେ͖͞ͷؔ܎", 16 x = "ཌྷͷ௕͞(mm)", 17 y = "ͪ͘͹͠ͷ௕͞(mm)")
  27. {ggtext} ⽂字 装飾 58 01 library(ggtext) 03 labs(title = "

    * * ϖϯΪϯͷମͷେ͖͞ͷؔ܎ * * ", 04 x = "*F l ipper length * (mm)", 05 y = "*Bill length * (mm)") + 06 theme( 07 plot.title = ggtext : : element_markdown( 08 color = custom_pals[4], 09 f i ll = "gray80" 10 ), 11 axis.title.x = element_markdown(), 12 axis.title.y = element_markdown()) 02 p_penguins_scatter + Markdown
  28. 対 注釈 {ggrepel} 追加 60 01 library(ggrepel) 02 p_gapminder <

    - 03 gapminder : : gapminder | > 04 f i lter(year = = 2007, 05 continent = = "Americas") | > 06 ggplot() + 07 aes(gdpPercap, lifeExp) + 08 geom_point()
  29. 対 注釈 {ggrepel} 追加 61 01 p_gapminder + 02 ggrepel

    : : geom_text_repel( 03ɹ aes(label = country))
  30. 01 ne_jpn 02 #> Simple feature collection with 47 features

    and 3 f i elds 03 #> Geometry type: MULTIPOLYGON 04 #> Dimension: XY 05 #> Bounding box: xmin: 122.9382 ymin: 24.2121 xmax: 153.9856 ymax: 45.52041 06 #> CRS : + proj=longlat + datum=WGS84 + no_defs + ellps=WGS84 + towgs84=0,0,0 07 #> First 10 features: 08 #> iso_3166_2 prefecture region geometry 09 #> 1 01 ๺ւಓ Hokkaido MULTIPOLYGON (((143.8965 44 . . . 10 #> 2 02 ੨৿ݝ Tohoku MULTIPOLYGON (((139.9438 40 . . . 11 #> 3 03 ؠखݝ Tohoku MULTIPOLYGON (((141.681 40.... 12 #> 4 04 ٶ৓ݝ Tohoku MULTIPOLYGON (((141.6403 38 . . . 13 #> 5 05 ळాݝ Tohoku MULTIPOLYGON (((139.8809 39 . . . 地図 作 67 地理空間情報 拡張可能 {sf} 便利 sf sfg(geometry) sfc (column) sf 3 Natural Earth {rnaturalearth}
  31. 地図 作 69 01 ne_jpn | > 02 ggplot() +

    03 aes(f i ll = prefecture) + 04 geom_sf(show.legend = FALSE)
  32. 地理空間情報 facet 71 01 sf_jpn_population | > 02 f i

    lter(prefecture %in% c("ಙౡݝ", "߳઒ݝ", "Ѫඤݝ", "ߴ஌ݝ")) | > 03 ggplot() + 04 aes(f i ll = population) + 05 geom_sf(color = "transparent") + 06 guides(f i ll = guide_colorbar(title = "૯ਓޱ")) + 07 scico : : scale_f i ll_scico(palette = "imola", 08 labels = zipangu : : label_kansuji(), 09 breaks = c(700000, 1000000, 1400000)) + 10 labs(title = "࢛ࠃ4ݝͷਓޱ", 11 subtitle = "2008೥͔Β2019೥ͷਪҠ", 12 caption = "Source: ౷ܭηϯλʔ ڭҭ༻ඪ४σʔληοτ\nSSDSE-ݝผਪҠʢSSDSE-Bʣ") + 13 theme_void() + 14 theme(legend.position = "top", 15 legend.key.width = unit(3.0, "line")) + 16 facet_wrap(~ year, nrow = 2)
  33. 地図投影法 変換 75 01 # Natural Earth͔ΒશٿϙϦΰϯΛऔಘ 02 ne_world <

    - 03 rnaturalearth : : ne_countries(scale = 10, 04 returnclass = "sf") 05 06 p < - 07 ne_world | > 08 ggplot() + 09 geom_sf() sf
  34. 地図投影法 変換 76 1 . coord_sf() 2. CRS 01 #

    ϞϧϫΠσਤ๏ʹΑΔੈք஍ਤͷඳը 02 p + 03 coord_sf(crs = " + proj=moll") 01 # st_transform()ʹΑΔ࠲ඪࢀরܥͷมߋ 02 ne_world_moll < - 03 sf : : st_transform(ne_world, crs = " + proj=moll") 04 05 ggplot(data = ne_world_moll) + 06 geom_sf()
  35. 地理空間 属性 紐付 77 prefecture geometry Hokkaido POLYGON(…) Aomori-ken POLYGON(…)

    … … Okinawa-ken POLYGON(…) prefecture population ๺ւಓ 5.281 ੨৿ݝ 1.249 … … ԭೄݝ 1.457 distinct geometry Hokkaido POLYGON(…) Honsyu POLYGON(…) … … Kyusyu POLYGON(…) prefecture geometry ๺ւಓ POLYGON(…) ੨৿ݝ POLYGON(…) … … ԭೄݝ POLYGON(…) × × ✓ + NG NG 
 = prefecture population geometry ๺ւಓ 5.281 POLYGON(…) ੨৿ݝ 1.249 POLYGON(…) … … … ԭೄݝ 1.457 POLYGON(…)
  36. 必要 応 名寄 集計単位 変更 78 01 ne_jpn_region < -

    02 ne_jpn | > 03 group_by(region) | > 04 summarise() 01 ne_jpn_region | > 02 left_join(df_region_pops_2019, 03 by = "region") | > 04 ggplot() + 05 aes(f i ll = population) + 06 geom_sf(color = "transparent") geometry
  37. 地図 使 表現 79 IUUQTXXXTUPQDPWJEKQ 空間的 位置関係 ⼤ 保持 形式

    表現 ΧϥϜ஍ਤ ※⾒ ⼈ 対 、空間的 配置 対 理解 暗黙的 求 {statebins}
  38. 01 library(tabularmaps) 02 jpn77 | > 03 select(jis_code, 04 prefecture

    = prefecture_kanji, 05 x, 06 y) | > 07 left_join(df_ssdse_b2019, 08 by = "prefecture") | > 09 tabularmap(x, 10 y, 11 group = jis_code, 12 label = prefecture, 13 f i ll = population, 14 size = 3) + 15 scale_f i ll_viridis_c() + 16 theme_tabularmap() {tabularmaps} 80
  39. {geofacet} 81 01 library(geofacet) 02 jp_prefs_grid1 < - 03 jp_prefs_grid1

    | > 04 left_join( 05 zipangu : : jpnprefs | > 06 select(jis_code, prefecture = prefecture_kanji), 07 by = c("code_pref_jis" = "jis_code")) 01 p < - 02 df_ssdse_b | > 03 ggplot() + 04 aes(year, population) + 05 geom_line() + 06 theme_gray(base_size = 6) 07 08 p + 09 facet_geo(~ prefecture, 10 grid = "jp_prefs_grid1", 11 scales = "free_y")
  40. 多 図・ ⾒ TidyTuesday 85 # 30 DayMapChallenge > A

    weekly social data project in R. IUUQTHJUIVCDPNSGPSEBUBTDJFODFUJEZUVFTEBZ from Data to Vis Twitter #tidytuesday R Python D 3 .js THE R GRAPH GALLERY IUUQTSHSBQIHBMMFSZDPN IUUQTXXXEBUBUPWJ[DPN awesome-ggplot2 IUUQTHJUIVCDPNFSJLHBIOFSBXFTPNFHHQMPU ggplot 2 ggplot2 extensions IUUQTFYUTHHQMPUUJEZWFSTFPSH ggplot 2 #rtistry
  41. 参考⽂献・URL 87 Kieran Healy, 2019. Data Visualization: A practical introduction.

    Princeton University Press (⽠⽣真也、江⼝哲史、三村喬⽣ 訳, 2021. 分析 可視化⼊⾨. 講談社 ) Claus Wilke, 2019. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. OʼReilly (⼩林儀匡、瀬⼾⼭雅⼈訳, 2022. 基礎: 明確 、魅⼒的 、説得⼒ ⾒ ⽅・伝 ⽅. ・ ) Michael Friendly and Howard Wainer, 2021. A History of Data Visualization and Graphic Communication. Harvard University Press (飯嶋貴⼦訳, 2021. 視覚化 ⼈類史: 発明 時間 空間 可視化 . ⻘⼟社) Jonathan Schwabish, 2021. Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks Fundamentals of Data Visualization. Columbia University Press Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen, (2022). ggplot2: elegant graphics for data analysis. Springer Cédric Scherer 2022. Graphic Design with ggplot2: How to Create Engaging and Complex Visualizations in R IUUQTDMBVTXJMLFDPNEBUBWJ[ IUUQTGSJFOEMZHJUIVCJP)JTU%BUB7JT IUUQTHHQMPUCPPLPSH IUUQTTPDWJ[DP IUUQTSTUVEJPDPOGHJUIVCJPHHQMPUHSBQIJDEFTJHO