Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Rによる大規模データの処理

Uryu Shinya
December 21, 2022

 Rによる大規模データの処理

2022年12月17日に開催された「R研究集会2022」の発表資料です。

当日の投影内容から、メモリの消費量の算出方法に誤りがあったため一部内容を変更しています。
リポジトリ: https://github.com/uribo/talk_221217_rjpusers

Uryu Shinya

December 21, 2022
Tweet

More Decks by Uryu Shinya

Other Decks in Programming

Transcript

  1. ӝੜਅ໵ʢಙౡେֶσβΠϯܕ"*ڭҭݚڀηϯλʔʣ
    ʹΑΔେن໛σʔλͷॲཧ
    ౷ܭ਺ཧݚڀॴڞಉར༻ݚڀूձʮσʔλղੳ؀ڥ3ͷ੔උͱར༻ʯ
    V@SJCP

    IUUQTHJUIVCDPNVSJCPUBML@@SKQVTFST



    View Slide

  2. Ұԡ͠ϙΠϯτ
    େن໛σʔληοτͷޮ཰తͳॲཧͱసૹͷͨΊʹઃܭ
    "QBDIF
    "SSPX
    \BSSPX^
    ύοέʔδ
    ΠϯϝϞϦͷΧϥϜܕσʔλΛऔΓѻ͏ʢγϦΞϥΠζෆཁʣ
    ଟݴޠπʔϧϘοΫεͱͯ͠ɺ͞·͟·ͳϓϩάϥϛϯάݴޠͰ࣮૷
    $ 3 1ZUIPO 3VTU +VMJB %VDL%#ͳͲ
    EQMZSόοΫΤϯυΛఏڙɻύΠϓԋࢉࢠʹΑΔॲཧΛద༻Մೳ
    QBSRVFUʹରԠɻύʔςΟγϣχϯάʹΑΔ؅ཧ
    MVCSJEBUFɺSMBOHɺTUSJOHSɺUJEZTFMFDUͳͲͷؔ਺ʹ΋ରԠ
    ະରԠͳؔ਺Λ\EVDLEC^ύοέʔδ͕Χόʔ
    ଎͍ খ͍͞
    ಡΈࠐΈɾूܭՃ޻ॲཧ͕ ϑΝΠϧɾϝϞϦαΠζ͕
    ޮ཰త
    DTWϑΝΠϧ͔ΒͷEQMZS΍EBUBUBCMFͰͷॲཧͱൺֱͯ͠ɺ͔ͳΓߴ଎
    σʔλͷૢ࡞ɾ؅ཧ͕

    View Slide

  3. ࣗݾ঺հ
    ӝੜਅ໵
    ͏ΓΎ͏͠Μ΍
    ஍ཧۭؒσʔλղੳ
    ೥݄ʙಙౡ
    ౷ܭɺσʔλαΠΤϯε
    Ծ
    $ Ҋ
    Χόʔɿϓϩηε $
    ࢖༻ $(ɿετοΫϑΥτͷ 3' ը૾ʢ ԁʣ
    σʔλՄࢹԽ 3ݴޠ
    aདྷ೥݄ग़൛

    View Slide

  4. େ͖͍σʔλͷॲཧɺͲ͏͍ͯ͠·͔͢ʁ
    ͜͜Ͱ͸ʮେ͖͍σʔλʯΛʮݱ࣮తʹѻ͑ͦ͏ͳ෺ʯͱ͠·͢
    ୯ҰͷςΩετɾόΠφϦϑΝΠϧ
    ҰͭͷϑΝΠϧ͕࠷େͰ਺(#ɺશମͰ(#Ҏ಺ఔ౓
    σʔλج൫ʢ%#ɺ)BEPPQɺ"NB[PO"UIFOBɺ#JH2VFSZʣ౳Λར༻͠ͳ͍
    BXLͳͲͰલॲཧΛ͠ͳ͍ఔ౓ͷେ͖͞
    ϩʔΧϧɾΫϥ΢υͰॲཧʜ
    ͳΔ΂҆͘Ձɾ௿ੑೳͷ؀ڥͰࡁ·͍ͤͨ

    ฒྻԽ
    ϝϞϦΛͨ͘͞ΜੵΜͩίϯϐϡʔλʔͰॲཧͤ͞Δ

    View Slide

  5. େ͖͍σʔλͷॲཧɺͲ͏͍ͯ͠·͔͢ʁ
    3ɺ34UVEJP͋Δ͋Δ
    ॲཧͷաఔͰϝϞϦΛ࢖͍੾Δ
    34UVEJP͕രൃ͢Δ

    View Slide

  6. "QBDIF"SSPX

    View Slide

  7. "QBDIF"SSPX
    ɹɹɹɹɹɹͷσʔλ෼ੳπʔϧͷϓϥοτϑΥʔϜ
    ڞ௨ͷ࣮૷ΛݴޠΛ௒͑ͯڞ༗Ͱ͖Δ
    ϝϞϦ಺ͷσʔλΛޮ཰తʹߏ଄Խͯ͠ॲཧʢγϦΞϥΠζෆཁɺߴ଎ʣ
    σʔλަ׵࣌ͷੑೳվળʹΑΔߴ଎ͳॲཧΛ࣮ݱ

    "QBDIF"SSPXܗࣜʜBSSPX
    QBSRVFU
    ྻࢦ޲ͷσʔλʢΧϥϜܕσʔλʣΛѻ͏͜ͱʹಛԽ
    ෳ਺ݴޠରԠ
    ΠϯϝϞϦͰͷॲཧΛ࣮ݱ

    View Slide

  8. ΧϥϜܕσʔλσʔλ෼ੳʹಛԽ
    IUUQTBSSPXBQBDIFPSHPWFSWJFX
    ৯೑ྨ
    ྶ௕ྨ
    ྶ௕ྨ






    Ϩοαʔύϯμ
    νϯύϯδʔ
    Ϛϯτώώ
    ৯೑ྨ
    ௗྨ
    ϥΠΦϯ
    ϑϯϘϧτϖϯΪϯ




    ߦ୯ҐͰσʔλΛอ࣋ɾऔΓग़͠
    $47΍.Z42-ͳͲͷσʔλϕʔε
    ߦࢦ޲
    ಛఆྻͷ஋΁ͷૢ࡞
    ˠશ෦ͷߦɾྻ΁ͷΞΫηεΛ൐͏
    ྻ୯ҐͰσʔλΛอ࣋ɾऔΓग़͠
    ྻࢦ޲ ৯೑ྨ
    ྶ௕ྨ
    ྶ௕ྨ






    Ϩοαʔύϯμ
    νϯύϯδʔ
    Ϛϯτώώ
    ৯೑ྨ
    ௗྨ
    ϥΠΦϯ
    ϑϯϘϧτϖϯΪϯ




    ˠࢦఆ͞ΕͨྻͷΈಡΈࠐΈ
    େྔͷߦʹର͢Δू໿ॲཧ͕ޮ཰త

    View Slide

  9. \BSSPX^ύοέʔδ

    View Slide

  10. \BSSPX^ύοέʔδͷಛ௃
    "SSPX$ϥΠϒϥϦ΁ͷΠϯλʔϑΣʔεΛఏڙ
    "SSPX"SSPX͕ѻ͏BSSPX QBSRVFUܗࣜͷσʔλ
    \EVDLEC^ύοέʔδɺ1ZUIPOͱͷ࿈ܞ
    ʢϝϞϦʹ৐Γ੾Βͳ͍ن໛ͷʣෳ਺ͷϑΝΠϧΛಡΈࠐΈɺॲཧ
    \EQMZS^Λ͸͡Ίͱͯ͠\MVCSJEBUF^ɺ\TUSJOHS^ɺ\UJEZTFMFDU^ͳͲͷؔ਺ʹ΋ରԠ
    ˠ\EQMZS^ʹΑΔύΠϓॲཧ΍3ͷؔ਺Ͱେن໛σʔλͷૢ࡞͕࣮ݱ
    σʔλͷίϐʔΛ͢ΔաఔͰ΋
    Ϋϥ΢υ؀ڥΛؚΊͨɺ
    \BSSPX^ͰରԠ͠ͳ͍ॲཧΛ࣮ߦ͢ΔͨΊɺผͷύοέʔδͱݴޠͰσʔλΛڞ༗
    ෳ਺ϑΝΠϧ΁ͷߴ଎ͳಡΈॻ͖
    ΠϯϝϞϦͰॲཧ

    View Slide

  11. \BSSPX^Λ࢖ͬͨॲཧͷྲྀΕ
    σʔλͷಡΈࠐΈ QBSRVFU
    GFBUIFS
    UYU
    DTW
    KTPO
    UTW
    📄
    σʔλϑϨʔϜͱͯ͠ฦΓ஋ΛಘΔ
    \EQMZS^ͳͲͷؔ਺ʹΑΔॲཧ
    d <- read_parquet("data/zoo.parquet.parquet",


    as_data_frame = FALSE)
    result <- d |>


    select(name, taxon, body_length_cm) |>


    filter(taxon %in% c("霊長類", "齧歯類", "鳥類")) |>


    group_by(taxon) |>


    summarise(body_length_mean = mean(body_length_cm))
    result |>


    collect()
    ू໿͞ΕͨσʔλΛ෼ੳɾՄࢹԽ
    σʔλΛϝϞϦ্ʹಡΈࠐΉ
    ஗ԆධՁ

    View Slide

  12. \EVDLEC^ύοέʔδ
    \BSSPX^ଆͰ\EVDLEC^ͱͷ࿈ܞͷͨΊʹto_duckdb() Λఏڙ
    ಈ࡞͢ΔίϯϐϡʔλʔͷϝϞϦ্Ͱಈ࡞͢ΔܰྔσʔλϕʔεγεςϜ
    Φϥϯμͷࠃཱݚڀॴͷϝϯόʔ͕த৺ʹͳͬͯ։ൃ
    $Ͱॻ͔Εͨ"1*Λଟݴޠʹϥοϓͯ͠ఏڙ
    \EVDLEC^͸ͦͷ͏ͪͷ3ݴޠ޲͚ϥούʔύοέʔδ
    ˠ\BSSPX^ଆͰରԠ͠ͳ͍Ұ෦ͷؔ਺Λ࣮ߦͰ͖Δ
    \%#*^ʹΑͬͯσʔλϑϨʔϜΛૢ࡞

    View Slide

  13. \BSSPX^͕ରԠ͠ͳ͍ؔ਺ʁ
    list_compute_functions()
    "SSPXRVFSZFOHJOF

    લड़ͷͱ͓Γɺ\BSSPX^Ͱ͸"SSPX$΁ͷΠϯλʔϑΣΠεΛఏڙ
    \EQMZS^ͷؔ਺΋ར༻Մೳ
    ?acero
    ˠ͢΂ͯͷ3ͷؔ਺͕\BSSPX^Ͱ࣮ߦՄೳͳΘ͚Ͱ͸ͳ͍
    d <-


    read_parquet(here::here("data/typeB/36_tokushima/year=2022/month=10/part-0.parquet"),


    as_data_frame = FALSE)
    ࣍ͷεϥΠυͰྫΛࣔͨ͢ΊʹBSSPXΦϒδΣΫτΛ༻ҙ
    EBUBGSBNFʹม׵ͯ͠ॲཧPSΤϥʔ

    View Slide

  14. EBUBGSBNFʹม׵ͯ͠ॲཧ
    d |>


    select(datetime) |>


    mutate(is_jholiday = zipangu::is_jholiday(datetime))


    #> Warning: Expression


    #> zipangu::is_jholiday(datetime) not supported in Arrow;


    #> pulling data into R


    #> datetime is_jholiday


    #> 1 2022-10-01 FALSE


    #> 2 2022-10-01 FALSE


    #> 3 2022-10-01 FALSE


    #> 4 2022-10-01 FALSE


    #> 5 2022-10-01 FALSE


    #> 6 2022-10-01 FALSE
    େ͖͍σʔλͷ··ͩͱॲཧʹ͕͔͔࣌ؒΔ͔΋

    View Slide

  15. ΤϥʔͰॲཧΛఀࢭ
    ˠ\EVDLEC^΁ͷม׵ͰରԠ
    d |>


    group_by(location_no) |>


    slice_min(order_by = traffic, n = 1) |>


    collect()


    #> Error: Slicing grouped data not supported in Arrow
    d |>


    group_by(location_no) |>


    to_duckdb() |>


    slice_min(order_by = traffic, n = 1) |>


    collect()


    #> # A tibble: 25,237 × 10


    #> # Groups: location_no [274]


    #> datetime source_…¹ locat…² locat…³ meshc…⁴ link_…⁵ link_no traffic


    #>


    #> 1 2022-10-29 17:50:00 3028 54 西ノ丸… 513404 2 662 2


    #> 2 2022-10-29 19:50:00 3028 54 西ノ丸… 513404 2 662 2

    View Slide

  16. \BSSPX^ύοέʔδͰͷ
    1BSRVFUϑΝΠϧͷѻ͍

    View Slide

  17. "QBDIF1BSRVFU
    େن໛σʔλΛѻ͏ͨΊʹ޿͘࢖ΘΕ͍ͯΔඪ४తͳܗࣜΛఏڙ
    ྻࢦ޲σʔλΛѹॖ͞ΕͨόΠφϦܗࣜͰอଘɺ$47ͳͲͷඇѹॖςΩετܗࣜ
    ͱൺֱͯ͠ϑΝΠϧαΠζ͕খ͘͞ͳΔ
    IUUQTQBSRVFUBQBDIFPSH
    ޮ཰తͳྻσʔλදݱͷར఺Λར༻Ͱ͖ΔΑ͏ʹઃܭ
    ݩ͸)BEPPQΤίγεςϜͷதͰػೳ͢ΔΑ͏ʹ։ൃ

    View Slide

  18. QBSRVSUͷಡΈࠐΈ୯ҰϑΝΠϧ
    "844ɺ(PPHMF$MPVE4UPSBHF (4$
    ͔ΒͷಡΈࠐΈ΋0,
    read_parquet("data/typeB/36_tokushima/year=2022/month=10/part-0.parquet")
    σʔλϑϨʔϜͱͯ͠ಡΈࠐΈʢσϑΥϧτʣ
    "SSPX5BCMFͱͯ͠ಡΈࠐΈˠΠϯϝϞϦͰॲཧՄೳ
    read_parquet("s3://[access_key:secret_key@]bucket/path[?region=]")
    read_parquet("gs://anonymous@bucket/path")
    IUUQTBSSPXBQBDIFPSHEPDTSBSUJDMFTGTIUNM
    read_parquet("data/typeB/36_tokushima/year=2022/month=10/part-0.parquet",


    as_data_frame = FALSE,


    # tidyselectでの列選択


    col_select = c(datetime, ends_with("no")))

    View Slide

  19. QBSRVSUͷಡΈࠐΈෳ਺ϑΝΠϧ
    ڞ௨ͷྻ഑ྻΛ΋ͭෳ਺ͷQBSRVFUϑΝΠϧΛಡΈࠐΉʢ͜ͷஈ֊Ͱ͸εΩʔϚͷΈʣ
    open_dataset(here::here("data/typeB/13_tokyo/year=2021/"))


    #> FileSystemDataset with 12 Parquet files


    #> datetime: timestamp[us, tz=Asia/Tokyo]


    #> source_code: string


    #> location_no: int32


    #> location_name: string


    #> meshcode10km: string


    #> link_type: int32


    #> link_no: int32


    #> traffic: int32


    #> to_link_end_10m: string


    #> link_ver: int32


    #> month: int32


    #>


    #> See $metadata for additional Schema metadata
    ݄ຖʹ෼͔ΕͨݸͷϑΝΠϧ
    DTWͰ΋0,
    open_dataset(..., format = "csv")

    View Slide

  20. ύʔςΟγϣχϯά
    Ұͭͷେ͖ͳσʔληοτΛෳ਺ͷࡉ͔ͳϑΝΠϧʹ෼ׂ͢Δઓུ
    )JWFܗࣜʜLFZWBMVF
    ඞཁͳσʔλʹૉૣ͘ΞΫηεɺॲཧͷෛ୲͕ܰݮ
    ෼ׂΛత֬ʹߦ͏͜ͱ͕ॏཁ ྫ͑͹ʜ೥ɺ݄ʢʣɺ౎ಓ෎ݝʢʣ
    d |>


    write_dataset(path = "data",


    partitioning = c("year", "month"))
    σʔληοτΛ ೥ɺ݄ʹ෼͚ͯอଘ
    EBUBZFBSNPOUIQBSUQBSRVFU
    NPOUIQBSUQBSRVFU
    NPOUIQBSUQBSRVFU
    ʜ
    EBUBZFBSNPOUIQBSUQBSRVFU
    NPOUIQBSUQBSRVFU
    NPOUIQBSUQBSRVFU

    View Slide

  21. εΩʔϚʢܕͷࢦఆʣ
    ྻࢦ޲ͷσʔλܗࣜʜྻ͝ͱʹܾ·ͬͨܕʢ·ͱ·ͬͨ഑ஔʣ͕͋Δ
    ๛෋ͳσʔλλΠϓͰྻ഑ྻͷ஋Λఆٛ͢Δ
    Ϗοτ੔਺ʢJOUʣͱϏοτ੔਺ JOU
    ͸۠ผ͞ΕΔ
    open_dataset("data",


    schema = schema(


    datetime = timestamp(unit = "ms",


    timezone = "Asia/Tokyo"),


    source_code = utf8(),


    location_no = int32(),


    …))


    #> FileSystemDataset with 12 Parquet files


    #> datetime: timestamp[ms, tz=Asia/Tokyo]


    #> source_code: string


    #> location_no: int32


    #> location_name: string


    #> ...
    IUUQTBSSPXBQBDIFPSHEPDTSSFGFSFODFEBUBUZQFIUNM

    View Slide

  22. \BSSPX^ύοέʔδʹΑΔ
    େن໛σʔλͷॲཧ

    View Slide

  23. Ұൠಓ࿏ͷஅ໘ަ௨ྔ৘ใσʔλ
    ެӹࡒஂ๏ਓ೔ຊಓ࿏ަ௨৘ใηϯλʔ͕Φʔϓϯσʔλͱͯ͠ެ։
    ֤౎ಓ෎ݝܯ࡯͕ं྆ײ஌ثͳͲͷܭଌػثͰऩूͨ͠அ໘ަ௨ྔʹؔ͢Δɹɹ
    ৘ใΛܯ࡯ிʹ͓͍ͯͱΓ·ͱΊͨ΋ͷ
    ΫϦΤΠςΟϒίϞϯζදࣔࠃࡍ $$#:

    IUUQTXXXKBSUJDPSKQ
    ˞શࠃ͢΂ͯͷ࣍ϝογϡΛɹ
    ໢ཏ͢ΔΘ͚Ͱ͸ͳ͍

    View Slide

  24. Ұൠಓ࿏ͷஅ໘ަ௨ྔ৘ใσʔλ
    [JQϑΝΠϧల։ޙͷDTWΛEBUBUBCMFͱͯ͠ಡΈࠐΉؔ਺Λ༻ҙ
    IUUQTHJUIVCDPNVSJCPKBSUJDS
    remotes::install_github("uribo/jarticr")


    dplyr::glimpse(


    jarticr::read_jartic_traffic("data-raw/typeB_tokushima_2022_10/徳島県警_202210.csv")


    )


    #> Rows: 2,384,773


    #> Columns: 10


    #> $ datetime 2022-10-01, 2022-10-01, 2022-10-01, 2022-10-01, 2022-…


    #> $ source_code "3028", "3028", "3028", "3028", "3028", "3028", "3028"…


    #> $ location_no 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…


    #> $ location_name "徳島本町→県庁前", "県庁前→徳島本町", "新助任橋北詰→徳…


    #> $ meshcode10km "513404", "513404", "513404", "513404", "513404", "513…


    #> $ link_type 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …


    #> $ link_no 710, 705, 681, 676, 679, 678, 11, 375, 6, 5, 6, 5, 842…


    #> $ traffic 21, 24, 25, 27, 25, 28, 30, 16, 42, 42, 27, 25, 16, 26…


    #> $ to_link_end_10m "6", "6", "11", "27", "21", "29", "26", "8", "7", "92"…


    #> $ link_ver 202100, 202100, 202100, 202100, 202100, 202100, 202100…

    View Slide

  25. ݕূσʔλ αΠζͷҟͳΔσʔλΛछ༻ҙ
    -BSHF
    4NBMM
    શࠃ
    .FEJVN
    )VHF શࠃ ೥ؒ
    ೥ؒ
    ౦ژ౎ ೥ؒ
    ౦ژ౎ ϲ݄ؒ
    ߦ਺ ྻ਺
    σʔλαΠζ
    QBSRVFU
    ஍Ҭ ظؒ
    ϑΝΠϧαΠζ
    DTW

    ԯສ
    (# .#
    (# (#


    ԯສ (#
    ԯສ (#
    (#
    (#
    -BSHFɺ)VHFͷDTWϑΝΠϧαΠζ͸[JQѹॖ࣌ͷେ͖͞
    ʢల։ޙ͸͞Βʹେ͖͍ʣ

    View Slide

  26. ݕূ؀ڥ
    "QQMF..BD
    3
    ϝϞϦ(#ʢݱߦͷ.#1Ͱ͸࠷େʣ
    library(arrow) # 10.0.1


    library(readr) # 2.1.3


    library(data.table) # 1.14.6


    library(dplyr) # 1.0.10


    library(dtplyr) # 1.2.2


    library(duckdb) # 0.6.1
    ͍ͣΕͷύοέʔδ΋$3"/࠷৽൛

    View Slide

  27. ݕূͭͷॲཧΛൺֱ
    άϧʔϓԽͱूܭ
    ಡΈࠐΈ
    ݁߹
    DTWϑΝΠϧͷಡΈࠐΈ଎౓
    readr::read_csv() data.table::fread()
    arrow::read_csv_arrow()
    \BSSPX^
    \EQMZS^
    \EBUBUBCMF^
    \EVDLEC^ʜBSSPXΦϒδΣΫτ͔Βม׵
    \EUQMZS^ʜEBUBUBCMFΦϒδΣΫτΛEQMZSͷؔ਺Ͱॲཧ
    όοΫΤϯυͰ\EBUBUBCMF^͕ػೳ

    View Slide

  28. ݁ՌϑΝΠϧͷಡΈࠐΈ
    4NBMM
    EBUBUBCMF
    🏅

    View Slide

  29. ݁ՌϑΝΠϧͷಡΈࠐΈ
    .FEJVN
    EBUBUBCMF
    🏅

    View Slide

  30. ݁ՌϑΝΠϧͷಡΈࠐΈ
    -BSHF )VHFͷDTW༻ҙͰ͖ͣ
    ಉن໛ͷQBSRVSUϑΝΠϧΛ ͰಡΈࠐΉʜ0,
    arrow::open_dataset()
    # 全国1年分(2021年)


    traffic_large <-


    open_dataset(here::here("data/typeB/"),


    schema = jartic_typeB_schema) |>


    filter(year == 2021)


    dim(traffic_large)


    #> [1] 3923767686 12


    lobstr::obj_size(traffic_large)


    #> 487.10 kB
    # 全国4年分


    traffic_huge <-


    open_dataset(here::here("data/typeB/"),


    schema = jartic_typeB_schema)


    dim(traffic_huge)


    #> [1] 15677276121 12


    lobstr::obj_size(traffic_huge)


    #> 261.59 kB
    -BSHF )VHF
    ॲཧͷ಺༰ʹΑͬͯ͸݁ՌΛಘΒΕΔঢ়ଶ

    View Slide

  31. ݁ՌάϧʔϓԽͱूܭ
    BSSPX
    🏅
    4NBMM

    View Slide

  32. ݁ՌάϧʔϓԽͱूܭ
    BSSPX
    🏅
    .FEJVN

    View Slide

  33. ݁ՌάϧʔϓԽͱूܭ
    छͷσʔλαΠζͷ݁ՌΛൺֱ
    σʔλαΠζʹԠ࣮ͯ͡ߦ࣌ؒ΋૿Ճ

    View Slide

  34. ݁Ռ݁߹
    EQMZS
    🏅
    4NBMM

    View Slide

  35. ݁Ռ݁߹
    BSSPX
    🏅
    .FEJVN

    View Slide

  36. ݁Ռ݁߹
    -BSHF )VHFͱ΋ʹࣦഊ
    fi
    MUFS΍EJTUJODUΛͤͣʹ݁߹ͨͨ͠Ίʁ
    tictoc::tic();


    traffic_large |>


    distinct(location_no, meshcode10km) |>


    left_join(arrow_mesh10km,


    by = c("meshcode10km" = "meshcode10km")) |>


    collect();


    tictoc::toc()


    #> 16.746 sec elapsed
    tictoc::tic();


    traffic_huge |>


    distinct(location_no, meshcode10km) |>


    left_join(arrow_mesh10km,


    by = c("meshcode10km" = "meshcode10km")) |>


    collect();


    tictoc::toc()


    #> 67.593 sec elapsed
    EJTUJODUΛڬΊ͹0,
    -BSHF
    )VHF

    View Slide

  37. ॴݟ
    ύʔςΟγϣχϯάͷج४ͷܾΊํ͕ॏཁ
    \BSSPX^Ͱେن໛ͳσʔλΛॲཧͰ͖Δʢগͳ͘ͱ΋ΠϯϝϞϦͷঢ়ଶʹʣ
    େن໛ͳϑΝΠϧ͸QBSRVFUͰอଘ͓ͯ͘͠ͷ͕ྑͦ͞͏
    ͲΜͳͱ͖ʢαΠζɾॲཧ಺༰ʣͰ΋\BSSPX^͕ϕετͳબ୒ࢶͰ͸ͳ͍
    αϙʔτ͠ͳ͍ؔ਺ˠ\EVDLEC^΁౉͢
    \EBUBUBCMF^ɺ\EUQMZS^΋ؤு͍ͬͯΔ

    View Slide

  38. ँࣙ
    ͜ͷ৔ΛआΓͯײँਃ্͛͠·͢
    ɻ
    ͜ͷൃද͸(JU)VC4QPOTPSTͰࢧԉͯͩ͘͠͞ΔํʑͷఏڙͰ͓ૹΓ͠·ͨ͠ɻ
    katsurakob
    siero5335
    ito4303
    niszet
    Nishina-NIES
    teram
    onagi
    takehikoihayashi
    ytknzw
    IUUQTHJUIVCDPNTQPOTPSTVSJCP

    View Slide

  39. ࢀߟจݙɾ63-
    ͦΖͦΖ3Ϣʔβʔ΋"QBDIF"SSPXͰ1BSRVFUΛ࢖ͬͯΈ·ͤΜ͔ʁ

    3GPS%BUB4DJFODF OEFEJUJPO


    "QBDIF"SSPXϑΥʔϚοτ͸ͳͥ଎͍ͷ͔

    ʲ3ʳ"QBDIF"SSPXͱEVDLECΛࢼͯ͠ΈΔ2JJUB

    "QBDIF"SSPXَ͸͑͑ʂ͜ͷ··$47શ෦1BSRVFUʹม׵͍ͯ͜͠͏ͥʂ

    "QBDIF"SSPX3$PPLCPPL

    -BSHFS5IBO.FNPSZ%BUB8PSL
    fl
    PXTXJUI"QBDIF"SSPX

    IUUQTOPUDIBJOFEIBUFOBCMPHDPNFOUSZ
    IUUQTSETIBEMFZO[BSSPXIUNM
    IUUQTTMJEFSBCCJUTIPDLFSPSHBVUIPSTLPVECUFDITIPXDBTFPOMJOF
    IUUQTFJUTVQJHJUIVCJPUPLZPSTMJEFUPLZPS@
    IUUQTBSSPXVTFSOFUMJGZBQQ
    IUUQTBSSPXBQBDIFPSHDPPLCPPLSJOEFYIUNM
    IUUQTRJJUBDPNFJUTVQJJUFNTDFFCGCFFEF

    View Slide