FPGA-Based Acceleration of Depth Estimation from Video Input via HW/SW Co-Design (ARC 2022/10)

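Slides presented at the System Architecture research meeting (ARC) of the Information Processing Society of Japan (IPSJ) on 2022/10/11.
We studied how to accelerate, on an FPGA, DeepVideoMVS, a machine-learning-based video processing application for the depth estimation task.
Using the open-source high-level synthesis tool NNgen and exploiting the respective strengths of HW and SW, we achieved a 60.2× speedup over a CPU-only implementation.
・Program and abstracts: http://sigarc.ipsj.or.jp/mtg/fy2022/arc242/
・Paper (Copyright ©2022 by IPSJ): https://projects.n-hassy.info/paper/ARC2022-10.pdf
・Profile: https://n-hassy.info/ja/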

Nobuho Hashimoto

October 11, 2022

Transcript

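  1. FPGA-Based Acceleration of Depth Estimation from Video Input
    via HW/SW Co-Design
    Department of Computer Science,
    Graduate School of Information Science and Technology, The University of Tokyo
    Nobuho Hashimoto, Shinya Takamaeda
    2022/10/11
    System Architecture research meeting (ARC)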

  2. Contents
    1. Background
    2. Proposed Method
    3. Experiments and Results
    4. Conclusion
    2022/10/17 1

  3. Contents
    1. Background
        1. Depth Estimation
        2. DNN-Based Depth Estimation Methods
        3. DeepVideoMVS
        4. Approach
    2. Proposed Method
    3. Experiments and Results
    4. Conclusion

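  4. Depth Estimation
    Estimate the distance between the camera and the objects in view
    ❖ Wide range of applications, such as navigation for autonomous driving and AR
    ❖ A composite and complex workload made up of video-specific processing and DNNs
    [Figure: Flow of depth estimation. Two or more images → stereo matching (estimate the disparity) → triangulation (compute the distance from the camera to each point) → depth map]
    Disparity: the difference in position between each point in one image and the corresponding point in the other image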
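
The triangulation step in the flow above can be sketched with the standard relation for a rectified stereo pair, Z = f * B / d. The slide does not state this formula, so the rectified-camera setup and the focal length and baseline values below are assumptions made only for illustration.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulate depth for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Hypothetical camera parameters, chosen only for illustration.
FOCAL_PX = 500.0    # focal length in pixels
BASELINE_M = 0.1    # distance between the two camera centers in meters

disparities = [50.0, 25.0, 10.0]  # per-point disparities in pixels
depths = [depth_from_disparity(d, FOCAL_PX, BASELINE_M) for d in disparities]
print(depths)  # a larger disparity means a closer point
```

Depth is inversely proportional to disparity, which is why small matching errors on distant points translate into large depth errors.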

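  5. 1.2. DNN-Based Depth Estimation Methods
    Finding corresponding points between two images with high accuracy from similarity alone is difficult,
    so many DNN-based methods exist
    ❖ DeepV2D [Teed et al., 2020], DeepVideoMVS [Duzceker et al., 2020],
    HITNet [Tankovich et al., 2021], Open4D [Bansal et al., 2020],
    NeRF [Mildenhall et al., 2020], NSFF [Li et al., 2021]
    This work uses DeepVideoMVS
    ❖ Takes as input a video captured with a single monocular camera, together with the camera poses
    ❖ Can run inference without retraining for each scene
    ❖ Likely to be able to run at near-real-time speed
    in low-power embedded environments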

  6. 1.3. DeepVideoMVS
    [Figure: Block diagram of DeepVideoMVS. Inputs: Input Frame and Input Pose (a 4 × 4 matrix, e.g. rows (0.3, -0.4, 0.9, -2.0), (-0.0, -0.9, -0.4, 0.1), (1.0, 0.1, -0.3, 4.7), (0, 0, 0, 1)). Blocks: pre-process, Feature Extractor (FE), Feature Shrinker (FS), Keyframe Buffer (KB, accessed via KB.get and KB.add), Cost Volume Fusion (CVF), Cost Volume Encoder (CVE), ConvLSTM (CL, with cell state, hidden state, and hidden state correction), Cost Volume Decoder (CVD), post-process. Output: Output Depth Map]

  7. 1.3. DeepVideoMVS
    [Figure: Block diagram of DeepVideoMVS, with the following annotations]
    Using an RNN makes time-series information available
    Feature extraction
    Adjustment of the cost volume (the degree of correspondence between points)

  8. 1.3. DeepVideoMVS
    [Figure: Block diagram of DeepVideoMVS, with the ConvLSTM labeled and the following annotations]
    Reuse previously seen frames whose poses are close to the current one
    Apply grid sampling (described later) to transform the viewpoint

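  9. 1.4. Approach
    ❖ Accelerating every operation in HW is not necessarily the right answer
        ➢ Research on hardware accelerators specialized for DNNs is active
        ➢ On the other hand, accelerating complex workloads that combine
          video-specific processing with DNNs is difficult
    ❖ Recent SoC FPGAs also carry reasonably fast CPUs
        ➢ Make good use of this property
    [Figure: Conceptual diagram of an SoC FPGA. PL (HW part), CPU (SW part), memory (DRAM), interconnect (AXI bus)]
    The PL (Programmable Logic) and the CPU can process in parallel and exchange data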

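  10. Contents
    1. Background
    2. Proposed Method
        1. Overview of the Proposed Method
        2. HW/SW Co-Design
        3. Hardware Design
        4. Software Design
        5. HW/SW Scheduling
            1. Communication Mechanism between HW and SW
            2. Task-Level Parallelization
    3. Experiments and Results
    4. Conclusion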

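  11. 2.1. Overview of the Proposed Method
    The accelerator is designed in the following steps
    ❖ HW/SW co-design
        ➢ Decide which operations should be implemented in SW, considering the respective characteristics of HW and SW
    ❖ Hardware design
        ➢ Design custom circuits for the operations suited to HW on the PL of the FPGA, using the high-level synthesis tool NNgen
          [https://github.com/NNgen/nngen]
    ❖ Software design
        ➢ Design optimized programs on the CPU for the operations suited to SW
    ❖ HW/SW scheduling
        ➢ Run the PL and the CPU cooperatively in parallel so that
          the execution latencies of the HW and SW implementations hide each other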

  12. 2.2. HW/SW Co-Design
    Decide which operations to implement in SW based on the count, nature, and memory access pattern of each operation
    → Hide their latency by executing them in parallel with HW
    [Figure: Block diagram of DeepVideoMVS (reprinted)]
    Table: Number of times each operation is executed in each part (columns: Process)
    Operation                FE   FS  CVF  CVE   CL  CVD
    Conv (1, 1)              33    5    0    0    0    0
    Conv (3, 1)               6    4    0    9    1   14
    Conv (3, 2)               2    0    0    3    0    0
    Conv (5, 1)               7    0    0    3    0    5
    Conv (5, 2)               3    0    0    1    0    0
    Activation (ReLU)        34    0    0   16    0   14
    Activation (sigmoid)      0    0    0    0    3    5
    Activation (ELU)          0    0    0    0    2    0
    Addition                 10    4  128    0    1    0
    Multiplication            0    0   64    0    3    0
    Concatenation             0    0    0    4    1    5
    Slice                     0    0    0    0    4    0
    Layer Normalization       0    0    0    0    2    9
    Upsampling (nearest)      0    4    0    0    0    0
    Upsampling (bilinear)     0    0    0    0    0    9
    Grid Sampling             0    0  128    0    0    0
    Operations implemented in SW (highlighted on the slide): Layer Normalization, Upsampling (bilinear), Grid Sampling

  13. 2.2. HW/SW Co-Design
    [Table: same operation-count table as on slide 12]
    Conv
    • Large number of computations
    • Many HW acceleration techniques have been studied
    • Memory access is also fairly regular
    → Implemented in HW

  14. 2.2. HW/SW Co-Design
    [Table: same operation-count table as on slide 12]
    From Activation through Slice
    • The number of computations is not that large, and the computations themselves are simple
    • They can be computed element by element, so the memory access order can be arranged freely
    → Implemented in HW

  15. 2.2. HW/SW Co-Design
    [Table: same operation-count table as on slide 12]
    Layer Normalization
    • The mean and standard deviation of each layer are computed before normalizing,
      so every element is accessed twice
    • The standard deviation requires a square-root operation
    → Implemented in SW
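
The two accesses per element and the square root mentioned above can be seen in a minimal layer-normalization sketch. It assumes a flat list of values, omits the learned scale and shift, and uses a hypothetical `eps` for numerical stability; it is not the authors' implementation.

```python
import math


def layer_norm(values, eps=1e-5):
    """Normalize a flat list of values to zero mean and unit variance."""
    n = len(values)
    total = 0.0
    total_sq = 0.0
    for v in values:              # first access: accumulate statistics
        total += v
        total_sq += v * v
    mean = total / n
    var = max(total_sq / n - mean * mean, 0.0)
    inv_std = 1.0 / math.sqrt(var + eps)   # the square root noted on the slide
    return [(v - mean) * inv_std for v in values]  # second access: normalize


normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)
```

The second loop cannot start until the first has seen every element, which is the dependency that makes this operation awkward to stream through a fixed hardware pipeline.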

  16. 2.2. HW/SW Co-Design
    [Table: same operation-count table as on slide 12]
    Upsampling
    • Memory access is fairly regular
    • For bilinear, floating-point arithmetic is advantageous
      for computing the interpolation
    → nearest is implemented in HW, bilinear in SW
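
The difference between the two upsampling modes can be sketched as follows. Nearest-neighbor needs only integer index arithmetic, while linear interpolation (shown along one axis, the building block of bilinear) needs fractional weights. The coordinate mapping that aligns the end points is an assumption; frameworks offer several conventions.

```python
def upsample_nearest(img, scale):
    """Integer-only upsampling: each output pixel copies one source pixel."""
    h, w = len(img), len(img[0])
    return [[img[y // scale][x // scale] for x in range(w * scale)]
            for y in range(h * scale)]


def upsample_linear_row(row, out_len):
    """Linear interpolation along one axis (out_len must be at least 2)."""
    n = len(row)
    out = []
    for i in range(out_len):
        pos = i * (n - 1) / (out_len - 1)  # fractional source coordinate
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo                    # fractional weight
        out.append(row[lo] * (1.0 - frac) + row[hi] * frac)
    return out


nearest = upsample_nearest([[1, 2], [3, 4]], 2)
linear = upsample_linear_row([0.0, 10.0], 3)
print(nearest)
print(linear)
```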

  17. 2.2. HW/SW Co-Design
    [Table: same operation-count table as on slide 12]
    Grid Sampling
    • Bilinear interpolation involves fractional arithmetic
    • Memory access is random
    → Implemented in SW
    It has the largest latency among the operations implemented in SW
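
A minimal sketch of grid sampling with bilinear interpolation shows both properties listed above: the sampling positions are arbitrary, so the reads are data-dependent, and each output mixes four neighbors with fractional weights. Coordinates are assumed to be in pixel units with zero padding outside the image, which is one convention among several and is not taken from the slides.

```python
import math


def grid_sample_bilinear(img, coords):
    """Sample img (H x W list of lists) at fractional (x, y) positions."""
    h, w = len(img), len(img[0])

    def pixel(x, y):
        # Zero padding for positions outside the image.
        return img[y][x] if 0 <= x < w and 0 <= y < h else 0.0

    out = []
    for x, y in coords:                    # positions arrive in arbitrary order
        x0, y0 = math.floor(x), math.floor(y)
        fx, fy = x - x0, y - y0            # fractional weights
        top = pixel(x0, y0) * (1 - fx) + pixel(x0 + 1, y0) * fx
        bottom = pixel(x0, y0 + 1) * (1 - fx) + pixel(x0 + 1, y0 + 1) * fx
        out.append(top * (1 - fy) + bottom * fy)
    return out


image = [[0, 10], [20, 30]]
samples = grid_sample_bilinear(image, [(0.5, 0.5), (1.0, 0.0), (0.0, 1.0)])
print(samples)
```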

  18. 2.3. Hardware Design
    Design custom circuits for the operations suited to HW
    on the PL of the FPGA
    ❖ Fuse BN into Conv to reduce the number of operations
    ❖ Quantize the trained parameters with
      PTQ (Post-Training Quantization)
    ❖ Approximate exponential operations with LUTs
    ❖ Design the overall architecture
    ❖ Apply data-level parallelization
    [Figure: Overall architecture of the HW accelerator, including the dedicated arithmetic pipelines. Labels: DRAM, AXI Bus, DMA Controller, BRAMs (data), BRAMs (params); Conv (1, 1), Conv (3, 1), Conv (3, 2), Conv (5, 1), Conv (5, 2); ReLU, sigmoid, ELU, upsampling; add, mul, lshift, rshift, clip; concat, slice, extern; skip connection, cell state, hidden state, ConvLSTM; groupings: encoder part, decoder part, encoder/decoder part, every part]
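
Fusing BN into the preceding Conv, the first item above, can be sketched for inference with fixed BN statistics. The example uses a 1 × 1 convolution at a single pixel so that it stays short; the function names and all numeric values are hypothetical and not taken from the authors' code.

```python
import math


def fold_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BN parameters into conv weights and biases."""
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - m) * scale + bt)
    return folded_w, folded_b


def conv1x1(x, weights, bias):
    """1 x 1 convolution at one pixel: x holds the input-channel values."""
    return [sum(w * xi for w, xi in zip(w_c, x)) + b_c
            for w_c, b_c in zip(weights, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    return [(yi - m) / math.sqrt(v + eps) * g + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


# Hypothetical parameters for two output channels and two input channels.
W = [[1.0, -2.0], [0.5, 0.25]]
B = [0.1, -0.3]
GAMMA, BETA = [1.5, 0.8], [0.2, -0.1]
MEAN, VAR = [0.4, -0.2], [2.0, 0.5]
x = [3.0, -1.0]

reference = batch_norm(conv1x1(x, W, B), GAMMA, BETA, MEAN, VAR)
fw, fb = fold_bn(W, B, GAMMA, BETA, MEAN, VAR)
fused = conv1x1(x, fw, fb)
print(reference, fused)
```

After folding, the BN layer disappears from the inference graph, so the hardware only has to implement the convolution.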

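  19. 2.4. Software Design
    Design optimized programs on the CPU
    for the operations suited to SW
    ❖ Optimize memory access patterns
      to raise the cache hit rate
    ❖ Embed variables whose values are fixed in advance
    ❖ Quantization
    ❖ Multithreaded parallelization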
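
The slides do not give the quantization scheme, so the following is only one common possibility: fixed-point values with a power-of-two scale, where a product is rescaled with a right shift and then clipped. The 8-bit width and the 6 fractional bits are hypothetical choices.

```python
def clip(value, bits=8):
    """Saturate an integer to the signed range of the given bit width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def quantize(x, frac_bits, bits=8):
    """Map a float to a signed fixed-point integer with frac_bits fraction bits."""
    return clip(round(x * (1 << frac_bits)), bits)


def dequantize(q, frac_bits):
    return q / (1 << frac_bits)


def fixed_mul(a_q, b_q, frac_bits, bits=8):
    """Multiply two fixed-point values and return to the same format."""
    product = a_q * b_q                    # carries 2 * frac_bits fraction bits
    return clip(product >> frac_bits, bits)  # right shift, then saturate


FRAC_BITS = 6  # hypothetical format
a_q = quantize(0.5, FRAC_BITS)
b_q = quantize(0.25, FRAC_BITS)
prod_q = fixed_mul(a_q, b_q, FRAC_BITS)
print(a_q, b_q, prod_q, dequantize(prod_q, FRAC_BITS))
```

With a power-of-two scale the rescaling step is a shift rather than a division, which keeps the arithmetic in integers on both the CPU and the PL.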

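  20. 2.5. HW/SW Scheduling
    Running the PL and the CPU cooperatively in parallel and hiding latency
    requires the following two things
    ❖ A communication mechanism between HW and SW for notifying
      the completion of each task and for exchanging data
    ❖ Task-level parallelization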

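  21. 2.5.1. Communication Mechanism between HW and SW
    Uses CMA (Contiguous Memory Allocator) and an interrupt handling mechanism
    CMA
    ❖ A mechanism that reserves contiguous regions of physical memory
    ❖ Lets SW, which can use a virtual memory space, and HW, which can only use the physical memory space,
      share memory regions
    [Figure: Interrupt handling mechanism between HW and SW]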

  22. 2.5.2. Task-Level Parallelization
    Increase parallelism and hide execution latency as much as possible
    93% of the latency of the CVF processing, which includes grid sampling, can be hidden
    ❖ Grid sampling has no data dependency on the preceding process (FS)
    [Figure: Pipeline chart of the proposed method. Rows: SW (CPU) and HW (PL); horizontal axis: time. Inputs: frame, pose; output: depth map. Task labels: pre-process, FE + FS, CVF (preparation), KB.get, CVF, CVE, CL, layer normalization, correction, CVD, upsampling (bilinear), post-process, KB.add]
    [Figure: Block diagram of DeepVideoMVS (reprinted)]
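
The overlap described above can be sketched with two independent tasks issued at the same time and joined before the step that needs both results. The functions below are stand-ins with made-up bodies and sleep-based latencies; on the real system the HW side would be started on the PL and its completion signalled by an interrupt.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_fe_fs_on_pl(frame):
    """Stand-in for the HW side (FE + FS); the sleep models PL latency."""
    time.sleep(0.05)
    return [2 * v for v in frame]


def prepare_cvf_on_cpu(keyframes):
    """Stand-in for the SW side (CVF preparation, e.g. grid sampling)."""
    time.sleep(0.05)
    return [sum(k) for k in keyframes]


def process_frame(frame, keyframes):
    # The two tasks share no data, so they can be issued at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        hw_future = pool.submit(run_fe_fs_on_pl, frame)
        sw_future = pool.submit(prepare_cvf_on_cpu, keyframes)
        features = hw_future.result()
        warped = sw_future.result()
    # The fusion step needs both results, so it starts only after the join.
    return [f + w for f, w in zip(features, warped)]


start = time.perf_counter()
fused = process_frame([1, 2, 3], [[1, 1], [2, 2], [3, 3]])
elapsed = time.perf_counter() - start
print(fused, f"{elapsed:.3f} s")
```

When the two tasks overlap, the elapsed time approaches the longer of the two latencies rather than their sum, which is the sense in which one side's latency is hidden.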

  23. Contents
    1. Background
    2. Proposed Method
    3. Experiments and Results
        1. Evaluation Method
        2. Execution Time and HW Resources
        3. Accuracy
    4. Conclusion

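  24. 3.1. Evaluation Method
    Implement the proposed method on an FPGA and compare it with execution on the CPU only
    Input image size      96 × 64
    Model                 Model pre-trained on TUM RGB-D [Sturm et al., 2012]
    FPGA                  Xilinx ZCU104 board
    HW implementation     Written in Python, synthesized with the high-level synthesis tool NNgen;
                          bitstream generated with Vivado 2021.2
    SW implementation     Compiled ahead of time with Cython v0.29
    Execution             PYNQ v2.6
    Evaluation dataset    7-Scenes [Shotton et al., 2013]
    Baseline              Compiled with g++ 7.3.0 with the -O3 option
    [Photo: ZCU104 board]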

  25. 3.2. Execution Time and HW Resources
    The clock frequency is 187.512 MHz
    60.2× speedup compared with execution on the CPU only
    HW resources are used to the fullest
    Table: Comparison of execution time
    Platform              median [s]   std [s]   frequency [MHz]
    CPU-only                  16.744     0.049   N/A
    CPU-only (w/ PTQ)         13.248     0.035   N/A
    PL + CPU (ours)            0.278     0.118   187.52
    Table: HW resource utilization
    Name      Used   Available   Utilization [%]
    Slice    28256       28800   98.1
    LUT     176377      230400   76.6
    FF      143072      460800   31.0
    DSP        128        1728   7.41
    BRAM       309         312   99.0
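
The headline figures follow directly from the two tables. The short script below recomputes the speedups from the median latencies and the utilization percentages from the used and available counts; the variable names are illustrative.

```python
# Median latency per frame in seconds, copied from the execution-time table.
median_s = {
    "CPU-only": 16.744,
    "CPU-only (w/ PTQ)": 13.248,
    "PL + CPU (ours)": 0.278,
}

# (used, available) counts, copied from the resource-utilization table.
resources = {
    "Slice": (28256, 28800),
    "LUT": (176377, 230400),
    "FF": (143072, 460800),
    "DSP": (128, 1728),
    "BRAM": (309, 312),
}

ours = median_s["PL + CPU (ours)"]
speedups = {name: t / ours for name, t in median_s.items()}
utilization = {name: 100.0 * used / avail
               for name, (used, avail) in resources.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x relative to PL + CPU")
for name, pct in utilization.items():
    print(f"{name}: {pct:.2f}%")
```

The ratio for the CPU-only row reproduces the reported 60.2× speedup, and the Slice and BRAM rows show that the design nearly exhausts those two resources.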

  26. 3.3. Accuracy
    There is no accuracy degradation large enough to be distinguished visually
    Accuracy drops slightly, but in most cases the degradation is less than 10%
    [Figure panels: (a) Input, (b) Ground truth, (c) Output of C++ impl, (d) Output of C++ impl w/ PTQ, (e) Output of the proposed accelerator]
    Results for scene fire-seq-01, frame number 000139.
    The MSE against the ground truth is (c) 0.091, (d) 0.073, (e) 0.089, (f) 0.084, respectively.
    Results for scene redkitchen-seq-07, frame number 000268.
    The MSE against the ground truth is (c) 0.808, (d) 0.880, (e) 1.099, (f) 1.050, respectively.
    [Chart: Per-scene comparison of the MSE against the ground truth]

  27. Contents
    1. Background
    2. Proposed Method
    3. Experiments and Results
    4. Conclusion

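  28. 4. Conclusion
    ❖ Accelerated a complex depth estimation task that combines
      video-specific processing with DNNs
    ❖ Proposed an FPGA-based accelerator for DeepVideoMVS
      through HW/SW co-design
    ❖ Implemented the proposed method on a ZCU104 board and confirmed that it runs
      60.2× faster than a software-only implementation
      while keeping accuracy degradation small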
