Upgrade to Pro — share decks privately, control downloads, hide ads and more …

テキストデータの連結

papiron
April 22, 2017

 テキストデータの連結

joinコマンドを中心とした、キーを元に2つのテキストデータを連結する処理。実際に業務で必要に迫られた事例も。イベントレポートブログはこちら。
http://papiro.hatenablog.jp/entry/2017/04/24/002533

papiron

April 22, 2017
Tweet

More Decks by papiron

Other Decks in Programming

Transcript

  1. ςΩετσʔλͷ࿈݁
    USP༑ͷձ @papiron

    View Slide

  2. RDBͷςʔϒϧ࿈݁
    • ෳ਺ͷςʔϒϧΛ࿈݁
    • SQLͷJOINͰ
    • PKͱFK
    • ಺෦݁߹ɺ֎෦݁߹ɺ׬શ݁߹

    View Slide

  3. ςΩετϑΝΠϧͷ࿈݁
    • ෳ਺ͷςΩετϑΝΠϧΛ࿈݁
    • joinίϚϯυͳͲͰ
    • ಛఆͷΧϥϜΛΩʔ
    • ಺෦݁߹ɺ֎෦݁߹ɺ׬શ݁߹

    View Slide

  4. ࿈݁ʹؔ͢ΔίϚϯυ
    • joinίϚϯυ
    • pasteίϚϯυ
    • TukubaiͷjoinܥɺloopjͳͲ
    • ͦͷଞิॿతͳίϚϯυ
    • commͳͲ

    View Slide

  5. ͓୊
    • ͓୊1
    • ϚελͱτϥϯβΫγϣϯͷ࿈݁
    • ಺෦݁߹
    • ͓୊2
    • ิ׬Λ൐͏σʔλͷ࿈݁
    • ֎෦݁߹
    ͓୊2͸࣮ࡍͷۀ຿Ͱ
    ඞཁʹഭΒΕͨࣄྫ

    View Slide

  6. ͓୊1
    01 suzuki
    02 saitou
    03 yamada
    04 isono
    05 toudou
    0001 05 20170101 600
    0002 04 20170101 800
    0003 01 20170101 500
    0004 02 20170102 900
    0005 03 20170102 800
    0006 02 20170102 100
    0007 02 20170102 900
    0008 05 20170102 600
    0009 03 20170103 400
    0010 01 20170103 200
    0011 03 20170103 900
    0012 03 20170103 900
    0013 01 20170103 500
    0014 03 20170103 600
    0015 04 20170103 900
    0016 05 20170104 700
    0017 04 20170104 300
    0018 02 20170104 700
    0019 05 20170104 400
    0020 01 20170104 700
    MASTER
    id,୲౰ऀ
    TRAN
    id,୲౰ऀid,೔෇,ൢചֹۚ

    View Slide

  7. ͓୊1.1
    $ sort -s -k2,2 TRAN | join -1 1 -2 2 MASTER -
    01 suzuki 0003 20170101 500
    01 suzuki 0010 20170103 200
    01 suzuki 0013 20170103 500
    01 suzuki 0020 20170104 700
    02 saitou 0004 20170102 900
    02 saitou 0006 20170102 100
    02 saitou 0007 20170102 900
    02 saitou 0018 20170104 700
    03 yamada 0005 20170102 800
    03 yamada 0009 20170103 400
    03 yamada 0011 20170103 900
    03 yamada 0012 20170103 900
    03 yamada 0014 20170103 600
    04 isono 0002 20170101 800
    04 isono 0015 20170103 900
    04 isono 0017 20170104 300
    05 toudou 0001 20170101 600
    05 toudou 0008 20170102 600
    05 toudou 0016 20170104 700
    05 toudou 0019 20170104 400
    جຊతͳ݁߹ʢΩʔɺ1൪໨ͷϑΝΠϧɺ2൪໨ͷϑΝΠϧͷॱ൪ͰฒͿʣ
    ࿈݁͢Δσʔλ͸
    ΩʔͰฒ΂ସ͑Λ
    ͢Δඞཁ͋Γ

    View Slide

  8. ͓୊1.1a
    $ sort -s -k2,2 TRAN | join1 key=2 MASTER -
    0003 01 suzuki 20170101 500
    0010 01 suzuki 20170103 200
    0013 01 suzuki 20170103 500
    0020 01 suzuki 20170104 700
    0004 02 saitou 20170102 900
    0006 02 saitou 20170102 100
    0007 02 saitou 20170102 900
    0018 02 saitou 20170104 700
    0005 03 yamada 20170102 800
    0009 03 yamada 20170103 400
    0011 03 yamada 20170103 900
    0012 03 yamada 20170103 900
    0014 03 yamada 20170103 600
    0002 04 isono 20170101 800
    0015 04 isono 20170103 900
    0017 04 isono 20170104 300
    0001 05 toudou 20170101 600
    0008 05 toudou 20170102 600
    0016 05 toudou 20170104 700
    0019 05 toudou 20170104 400
    Tukubaiͷjoin1Λ࢖͏ʢϚελ͕ࢦఆͨ͠Ωʔͷ࣍ͷΧϥϜ΁݁߹ʣ
    ࿈݁͢Δσʔλ͸
    ΩʔͰฒ΂ସ͑Λ
    ͢Δඞཁ͋Γ

    View Slide

  9. ͓୊1.1b
    $ cjoin1 key=2 MASTER TRAN
    0001 05 toudou 20170101 600
    0002 04 isono 20170101 800
    0003 01 suzuki 20170101 500
    0004 02 saitou 20170102 900
    0005 03 yamada 20170102 800
    0006 02 saitou 20170102 100
    0007 02 saitou 20170102 900
    0008 05 toudou 20170102 600
    0009 03 yamada 20170103 400
    0010 01 suzuki 20170103 200
    0011 03 yamada 20170103 900
    0012 03 yamada 20170103 900
    0013 01 suzuki 20170103 500
    0014 03 yamada 20170103 600
    0015 04 isono 20170103 900
    0016 05 toudou 20170104 700
    0017 04 isono 20170104 300
    0018 02 saitou 20170104 700
    0019 05 toudou 20170104 400
    0020 01 suzuki 20170104 700
    Tukubaiͷcjoin1Λ࢖͏ʢϚελ͕ࢦఆͨ͠Ωʔͷ࣍ͷΧϥϜ΁݁߹ʣ
    ࿈݁͢Δσʔλ͸
    ΩʔͰฒ΂ସ͑Λ
    ͢Δඞཁ͕ͳ͍

    View Slide

  10. ͓୊1.2
    $ sort -s -k2,2 TRAN | join -1 1 -2 2 -o 2.1,0,1.2,2.3,2.4 MASTER -
    0003 01 suzuki 20170101 500
    0010 01 suzuki 20170103 200
    0013 01 suzuki 20170103 500
    0020 01 suzuki 20170104 700
    0004 02 saitou 20170102 900
    0006 02 saitou 20170102 100
    0007 02 saitou 20170102 900
    0018 02 saitou 20170104 700
    0005 03 yamada 20170102 800
    0009 03 yamada 20170103 400
    0011 03 yamada 20170103 900
    0012 03 yamada 20170103 900
    0014 03 yamada 20170103 600
    0002 04 isono 20170101 800
    0015 04 isono 20170103 900
    0017 04 isono 20170104 300
    0001 05 toudou 20170101 600
    0008 05 toudou 20170102 600
    0016 05 toudou 20170104 700
    0019 05 toudou 20170104 400
    ΧϥϜͷฒͼॱΛࢦఆͯ݁͠߹ʢTRANͷ୲౰ऀidͷՕॴʹ୲౰ऀΛೖΕΔʣ
    -oΦϓγϣϯͰ
    ΩʔͷΧϥϜ͸0ͱࢦఆ
    Tukubaiͷjoin1΍cjoin1ͱ
    ಉ͡ฒͼΛࢦఆ

    View Slide

  11. ͓୊1.3
    $ awk '{a[$2" "$3]+=$4}END{for(v in a)print v,a[v] }' TRAN
    | sort -s -k1,1 | join -1 1 -2 1 MASTER -
    01 suzuki 20170101 500
    01 suzuki 20170103 700
    01 suzuki 20170104 700
    02 saitou 20170102 1900
    02 saitou 20170104 700
    03 yamada 20170102 800
    03 yamada 20170103 2800
    04 isono 20170101 800
    04 isono 20170103 900
    04 isono 20170104 300
    05 toudou 20170101 600
    05 toudou 20170102 600
    05 toudou 20170104 1100
    ୲౰ऀͱച্೔ຖʹूܭʢawkͷ࿈૝഑ྻΛར༻ʣ
    awkͷ࿈૝഑ྻ͸
    forϧʔϓͰग़ྗ͢Δࡍ
    ฒ΂ସ͕͑ඞཁ

    View Slide

  12. ͓୊1.3a
    $ sort -s -k2,2 TRAN | sm2 2 3 4 4 | join1 key=1 MASTER
    01 suzuki 20170101 500
    01 suzuki 20170103 700
    01 suzuki 20170104 700
    02 saitou 20170102 1900
    02 saitou 20170104 700
    03 yamada 20170102 800
    03 yamada 20170103 2800
    04 isono 20170101 800
    04 isono 20170103 900
    04 isono 20170104 300
    05 toudou 20170101 600
    05 toudou 20170102 600
    05 toudou 20170104 1100
    ୲౰ऀͱച্೔ຖʹूܭʢTukubaiͷsm2ͱjoin1Λར༻ʣ
    ूܭॲཧ͕
    sm2ίϚϯυͰ
    εοΩϦॻ͚Δ

    View Slide

  13. ͓୊2 01 xxxx.0c4d.0c44
    02 xxxx.0d46.f3c3
    03 xxxx.0d17.7313
    04 xxxx.0d81.33a8
    05 xxxx.0d17.9658
    06 xxxx.0c4d.09dc
    07 xxxx.0a69.b7b1
    08 xxxx.0d81.1dac
    09 xxxx.0fff.d828
    10 xxxx.0d17.7477
    11 xxxx.0d46.33c3
    12 xxxx.0148.3c58
    13 xxxx.0d46.f3c6
    14 xxxx.0d81.1c9c
    15 xxxx.04f8.c8e5
    16 xxxx.0d46.7eb6
    17 xxxx.0d42.5cc4
    18 xxxx.108b.cc1c
    19 xxxx.0fff.ce94
    20 xxxx.02b6.4c24
    xxxx.17d0.2c07 1914
    xxxx.0d81.33a8 2353
    xxxx.0d81.1c9c 2275
    xxxx.0d46.33c3 2500
    xxxx.0fff.d828 3766
    xxxx.13a2.8858 3235
    xxxx.13aa.091e 3119
    xxxx.0d81.1dac 3382
    xxxx.0c4d.0c44 3618
    xxxx.0d46.7eb6 3443
    xxxx.0d46.f3c3 3431
    xxxx.1301.bac4 1607
    xxxx.0d46.f3c6 3063
    xxxx.04f8.c8e5 3120
    xxxx.0d55.9b9f 3119
    DEVICELIST
    id,MACΞυϨε
    MEASUREMENT
    MACΞυϨε,ܭଌ஋
    MACΞυϨε͸ޓ͍ʹ଍Γͳ͍΋ͷ͕͋Δ

    View Slide

  14. ͓୊2.1
    $ comm <(awk '{print $2}' DEVICELIST | sort) <(awk '{print $1}' MEASUREMENT | sort)
    xxxx.0148.3c58
    xxxx.02b6.4c24
    xxxx.04f8.c8e5
    xxxx.0a69.b7b1
    xxxx.0c4d.09dc
    xxxx.0c4d.0c44
    xxxx.0d17.7313
    xxxx.0d17.7477
    xxxx.0d17.9658
    xxxx.0d42.5cc4
    xxxx.0d46.33c3
    xxxx.0d46.7eb6
    xxxx.0d46.f3c3
    xxxx.0d46.f3c6
    xxxx.0d55.9b9f
    xxxx.0d81.1c9c
    xxxx.0d81.1dac
    xxxx.0d81.33a8
    xxxx.0fff.ce94
    xxxx.0fff.d828
    xxxx.108b.cc1c
    xxxx.1301.bac4
    xxxx.13a2.8858
    xxxx.13aa.091e
    xxxx.17d0.2c07
    commίϚϯυͰ֬ೝ͢Δ࣌͸ฒ΂ସ͕͑ඞཁ
    ୈҰΧϥϜ
    ɹ1ͭ໨ͷϑΝΠϧͷΈʹଘࡏ͢Δߦ
    ୈೋΧϥϜ
    ɹ2ͭ໨ͷϑΝΠϧͷΈʹଘࡏ͢Δߦ
    ୈࡾΧϥϜ
    ɹ2ͭͷϑΝΠϧʹڞ௨ͯ͠ଘࡏ͢Δߦ
    ޓ͍ʹڞ௨͢ΔΩʔͱ଍Γͳ͍Ωʔͷ֬ೝ

    View Slide

  15. ͓୊2.2
    $ join -a 1 -1 2 <(sort -s -k2,2 DEVICELIST) <(sort -k1,1 MEASUREMENT) | sort
    -k2,2 | awk '{print $1,$3}'
    xxxx.0c4d.0c44 3618
    xxxx.0d46.f3c3 3431
    xxxx.0d17.7313
    xxxx.0d81.33a8 2353
    xxxx.0d17.9658
    xxxx.0c4d.09dc
    xxxx.0a69.b7b1
    xxxx.0d81.1dac 3382
    xxxx.0fff.d828 3766
    xxxx.0d17.7477
    xxxx.0d46.33c3 2500
    xxxx.0148.3c58
    xxxx.0d46.f3c6 3063
    xxxx.0d81.1c9c 2275
    xxxx.04f8.c8e5 3120
    xxxx.0d46.7eb6 3443
    xxxx.0d42.5cc4
    xxxx.108b.cc1c
    xxxx.0fff.ce94
    xxxx.02b6.4c24
    LEFT JOIN૬౰ͷ֎෦݁߹ʢjoinίϚϯυͷ-aΦϓγϣϯΛར༻ʣ
    Mac΍FreeBSDͷํ͸gjoinίϚϯυͰ
    DEVICELISTͷid൪߸Λ࢖ͬͯ
    ݩͷॱ൪ʹฒ΂ସ͍͑ͯΔ
    ࠷ޙʹid൪߸Λ࡟আ

    View Slide

  16. ͓୊2.2a
    $ join2 key=1 <(sort -k1,1 MEASUREMENT) <(sort -s -k2,2 DEVICELIST | self 2 1)
    | sort -k3,3 | delf 3
    xxxx.0c4d.0c44 3618
    xxxx.0d46.f3c3 3431
    xxxx.0d17.7313 _
    xxxx.0d81.33a8 2353
    xxxx.0d17.9658 _
    xxxx.0c4d.09dc _
    xxxx.0a69.b7b1 _
    xxxx.0d81.1dac 3382
    xxxx.0fff.d828 3766
    xxxx.0d17.7477 _
    xxxx.0d46.33c3 2500
    xxxx.0148.3c58 _
    xxxx.0d46.f3c6 3063
    xxxx.0d81.1c9c 2275
    xxxx.04f8.c8e5 3120
    xxxx.0d46.7eb6 3443
    xxxx.0d42.5cc4 _
    xxxx.108b.cc1c _
    xxxx.0fff.ce94 _
    xxxx.02b6.4c24 _
    LEFT JOIN૬౰ͷ֎෦݁߹ʢTukubaiͷjoin2ίϚϯυΛར༻ʣ
    TukubaiͰ͸LEFT JOIN૬౰ͷίϚϯυ͸ແ͍ɻ
    ͔͠͠σʔλΛೖΕସ͑ͨΓ͢Ε͹࣮ݱՄೳɻ
    id൪߸ͷ࡟আʹ͸delfίϚϯυΛར༻ɻ

    View Slide

  17. ͓୊2.3
    $ join -a 1 -a 2 -1 2 -2 1 -o 0,1.1,2.2 -e @ <(sort -k2,2 DEVICELIST)
    <(sort -k1,1 MEASUREMENT)
    xxxx.0148.3c58 12 @
    xxxx.02b6.4c24 20 @
    xxxx.04f8.c8e5 15 3120
    xxxx.0a69.b7b1 07 @
    xxxx.0c4d.09dc 06 @
    xxxx.0c4d.0c44 01 3618
    xxxx.0d17.7313 03 @
    xxxx.0d17.7477 10 @
    xxxx.0d17.9658 05 @
    xxxx.0d42.5cc4 17 @
    xxxx.0d46.33c3 11 2500
    xxxx.0d46.7eb6 16 3443
    xxxx.0d46.f3c3 02 3431
    xxxx.0d46.f3c6 13 3063
    xxxx.0d55.9b9f @ 3119
    xxxx.0d81.1c9c 14 2275
    xxxx.0d81.1dac 08 3382
    xxxx.0d81.33a8 04 2353
    xxxx.0fff.ce94 19 @
    xxxx.0fff.d828 09 3766
    xxxx.108b.cc1c 18 @
    xxxx.1301.bac4 @ 1607
    xxxx.13a2.8858 @ 3235
    xxxx.13aa.091e @ 3119
    xxxx.17d0.2c07 @ 1914
    ׬શ݁߹ʢjoinίϚϯυͷ-aΦϓγϣϯΛ2ճར༻ʣ
    -eΦϓγϣϯͰ
    ิ׬จࣈྻΛࢦఆՄೳ

    View Slide

  18. ͓୊2.3a
    $ loopj -d@ num=1 <(sort -s -k2,2 DEVICELIST | self 2 1) <(sort -k1,1
    MEASUREMENT)
    xxxx.0148.3c58 12 @
    xxxx.02b6.4c24 20 @
    xxxx.04f8.c8e5 15 3120
    xxxx.0a69.b7b1 07 @
    xxxx.0c4d.09dc 06 @
    xxxx.0c4d.0c44 01 3618
    xxxx.0d17.7313 03 @
    xxxx.0d17.7477 10 @
    xxxx.0d17.9658 05 @
    xxxx.0d42.5cc4 17 @
    xxxx.0d46.33c3 11 2500
    xxxx.0d46.7eb6 16 3443
    xxxx.0d46.f3c3 02 3431
    xxxx.0d46.f3c6 13 3063
    xxxx.0d55.9b9f @ 3119
    xxxx.0d81.1c9c 14 2275
    xxxx.0d81.1dac 08 3382
    xxxx.0d81.33a8 04 2353
    xxxx.0fff.ce94 19 @
    xxxx.0fff.d828 09 3766
    xxxx.108b.cc1c 18 @
    xxxx.1301.bac4 @ 1607
    xxxx.13a2.8858 @ 3235
    xxxx.13aa.091e @ 3119
    xxxx.17d0.2c07 @ 1914
    ׬શ݁߹ʢTukubaiͷloopjίϚϯυΛར༻ʣ
    -dΦϓγϣϯͰ
    ิ׬จࣈྻΛࢦఆՄೳ

    View Slide