
An FPGA-Based Acceleration Method for Depth Estimation from Video Input via HW/SW Co-Design (ARC 2022/10)


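Slides presented at the System Architecture (ARC) research meeting of the Information Processing Society of Japan (IPSJ) on 2022/10/11.
We studied an FPGA-based acceleration method for DeepVideoMVS, a machine-learning-based video processing application for the depth estimation task.
Using the open-source high-level synthesis tool NNgen and exploiting the respective strengths of HW and SW, we achieved a 60.2x speedup over a CPU-only implementation.
・Program and abstracts: http://sigarc.ipsj.or.jp/mtg/fy2022/arc242/
・Paper (Copyright ©2022 by IPSJ): https://projects.n-hassy.info/paper/ARC2022-10.pdf
・Profile: https://n-hassy.info/ja/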

Nobuho Hashimoto

October 11, 2022

Transcript

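  1. An FPGA-Based Acceleration Method for Depth Estimation from Video Input via HW/SW Co-Design
    Nobuho Hashimoto, Shinya Takamaeda
    Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo
    2022/10/11, System Architecture (ARC) research meeting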
  2. Outline
    1. Background
    2. Proposed method
    3. Experiments and results
    4. Summary
    2022/10/17
  3. Outline
    1. Background
      1. Depth estimation
      2. DNN-based depth estimation methods
      3. DeepVideoMVS
      4. Approach
    2. Proposed method
    3. Experiments and results
    4. Summary
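  4. Depth estimation
    Estimate the distance between the camera and the objects in view.
    ❖ Wide range of applications, such as navigation for autonomous driving and AR
    ❖ A composite and complex workload made up of video-specific processing and DNNs
    [Figure: Depth estimation flow. Two or more images → stereo matching (estimate the disparity) → triangulation (compute the distance from the camera to each point) → depth map]
    Disparity: the difference in position between each point in one image and the corresponding point in the other image.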
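The triangulation step in the flow above can be written out for the simplest case. This is a sketch that assumes a rectified two-camera pair and a pinhole camera model; the symbols f (focal length in pixels), B (baseline between the two camera centers), d (disparity in pixels), and Z (depth) are notation introduced here and do not appear on the slides.

```latex
Z = \frac{f \, B}{d}
\qquad
\text{e.g. } f = 500\ \text{px},\ B = 0.1\ \text{m},\ d = 10\ \text{px}
\ \Rightarrow\
Z = \frac{500 \times 0.1}{10} = 5\ \text{m}
```

Because depth is inversely proportional to disparity, a small matching error on a distant point (small d) turns into a large depth error, which is one reason accurate correspondence is hard to obtain from similarity alone.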
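  5. 1.2. DNN-based depth estimation methods
    Finding corresponding points between two images with high accuracy from similarity alone is difficult, so many DNN-based methods exist.
    ❖ DeepV2D [Teed et al., 2020], DeepVideoMVS [Duzceker et al., 2020], HITNet [Tankovich et al., 2021], Open4D [Bansal et al., 2020], NeRF [Mildenhall et al., 2020], NSFF [Li et al., 2021]
    This work uses DeepVideoMVS.
    ❖ Takes video captured by a single monocular camera, together with the camera poses, as input
    ❖ Can run inference without retraining for each scene
    ❖ Likely to run at near-real-time speed in low-power embedded environments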
  6. 1.3. DeepVideoMVS
    [Figure: Architecture of DeepVideoMVS. Inputs: input frame, input pose (shown as a 4 × 4 matrix). Blocks: pre-process, Feature Extractor (FE), Feature Shrinker (FS), Keyframe Buffer (KB, with KB.get and KB.add), Cost Volume Fusion (CVF), Cost Volume Encoder (CVE), Conv LSTM (CL, with cell state, hidden state, and hidden state correction), Cost Volume Decoder (CVD), post-process. Output: depth map.]
  7. 1.3. DeepVideoMVS
    [Figure: Architecture of DeepVideoMVS (same diagram as slide 6), with these annotations:]
    ❖ Feature extraction
    ❖ Adjustment of the cost volume (the degree of correspondence between points)
    ❖ Using an RNN makes time-series information available
  8. 1.3. DeepVideoMVS
    [Figure: Architecture of DeepVideoMVS (same diagram as slide 6), with the ConvLSTM highlighted and these annotations:]
    ❖ Reuse previously seen frames whose poses are close to the current one
    ❖ Apply grid sampling (described later) to transform the viewpoint
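  9. 1.4. Approach
    ❖ Accelerating every operation in HW is not necessarily the right answer
      ➢ Research on hardware accelerators specialized for DNNs is active
      ➢ However, accelerating complex workloads that combine video-specific processing with DNNs is difficult
    ❖ Recent SoC FPGAs also carry reasonably fast CPUs
      ➢ Make good use of this property
    [Figure: Conceptual diagram of an SoC FPGA. The PL (Programmable Logic, the HW part) and the CPU (the SW part) are connected to memory (DRAM) through an interconnect (AXI bus). The PL and the CPU can run in parallel and exchange data.]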
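  10. Outline
    1. Background
    2. Proposed method
      1. Overview of the proposed method
      2. HW/SW co-design
      3. Hardware design
      4. Software design
      5. HW/SW scheduling
        1. HW/SW communication mechanism
        2. Task-level parallelization
    3. Experiments and results
    4. Summary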
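  11. 2.1. Overview of the proposed method
    The accelerator is designed in the following steps.
    ❖ HW/SW co-design
      ➢ Decide which operations should be implemented in SW, considering the respective characteristics of HW and SW
    ❖ Hardware design
      ➢ Design custom circuits for the operations suited to HW in the PL on the FPGA, using the high-level synthesis tool NNgen [https://github.com/NNgen/nngen]
    ❖ Software design
      ➢ Design optimized programs for the operations suited to SW on the CPU
    ❖ HW/SW scheduling
      ➢ Run the PL and the CPU cooperatively in parallel so that the HW and SW implementations hide each other's execution latency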
  12. 2.2. HW/SW co-design
    Decide which operations to implement in SW by considering each operation's count, nature, and memory access pattern
    → Hide latency by executing them in parallel with HW
    [Figure: Architecture of DeepVideoMVS (repeated from slide 6)]
    Table: Number of times each operation is executed in each part (* = operation implemented in SW)
    Operation               FE   FS  CVF  CVE   CL  CVD
    Conv (1, 1)             33    5    0    0    0    0
    Conv (3, 1)              6    4    0    9    1   14
    Conv (3, 2)              2    0    0    3    0    0
    Conv (5, 1)              7    0    0    3    0    5
    Conv (5, 2)              3    0    0    1    0    0
    Activation (ReLU)       34    0    0   16    0   14
    Activation (sigmoid)     0    0    0    0    3    5
    Activation (ELU)         0    0    0    0    2    0
    Addition                10    4  128    0    1    0
    Multiplication           0    0   64    0    3    0
    Concatenation            0    0    0    4    1    5
    Slice                    0    0    0    0    4    0
    Layer Normalization *    0    0    0    0    2    9
    Upsampling (nearest)     0    4    0    0    0    0
    Upsampling (bilinear) *  0    0    0    0    0    9
    Grid Sampling *          0    0  128    0    0    0
  13. 2.2. HW/SW co-design
    [Repeats the design criteria and the operation-count table from slide 12]
    Conv
    • High operation count
    • Many HW acceleration techniques have been studied
    • Memory access is also fairly regular
    → Implemented in HW
  14. 2.2. HW/SW co-design
    [Repeats the design criteria and the operation-count table from slide 12]
    Activation through Slice
    • Operation counts are not particularly high, and the computations themselves are simple
    • They can be computed element by element, so the memory access order can be arranged freely
    → Implemented in HW
  15. 2.2. HW/SW co-design
    [Repeats the design criteria and the operation-count table from slide 12]
    Layer Normalization
    • The mean and standard deviation of each layer are computed first and normalization is applied afterwards, so every element is accessed twice
    • The standard deviation requires a square root operation
    → Implemented in SW
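The two properties listed for layer normalization can be seen in a minimal sketch. This is an illustration in plain Python, not the authors' implementation; the function name, the `eps` value, and the omission of the learned scale and shift parameters are assumptions made here.

```python
import math

def layer_norm(values, eps=1e-5):
    """Normalize a flat list of activations to zero mean and unit variance."""
    n = len(values)
    # Pass 1: read every element once to accumulate the sum and sum of squares.
    total = 0.0
    total_sq = 0.0
    for v in values:
        total += v
        total_sq += v * v
    mean = total / n
    var = max(total_sq / n - mean * mean, 0.0)
    # The standard deviation needs a square root.
    std = math.sqrt(var + eps)
    # Pass 2: read every element a second time to apply the normalization.
    return [(v - mean) / std for v in values]

normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
```

The second pass cannot start until the first one has finished, so the whole tensor has to be held or re-read, and the square root is awkward in fixed-point logic.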
  16. 2.2. HW/SW co-design
    [Repeats the design criteria and the operation-count table from slide 12]
    Upsampling
    • Memory access is fairly regular
    • For bilinear, floating-point arithmetic is preferable for computing the interpolation
    → nearest is implemented in HW, bilinear in SW
  17. 2.2. HW/SW co-design
    [Repeats the design criteria and the operation-count table from slide 12]
    Grid Sampling
    • Bilinear interpolation involves fractional arithmetic
    • Memory access is random
    → Implemented in SW
    Among the operations implemented in SW, this one has the largest latency.
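A small sketch helps show why grid sampling has both properties. It assumes coordinates given directly in pixel units and zero padding outside the image; DeepVideoMVS itself may use a different coordinate convention, and the function and variable names are hypothetical.

```python
import math

def grid_sample_bilinear(image, coords):
    """Sample `image` (a list of rows) at fractional (x, y) pixel coordinates.

    Neighbors that fall outside the image are treated as 0 (zero padding).
    """
    height, width = len(image), len(image[0])

    def pixel(x, y):
        if 0 <= x < width and 0 <= y < height:
            return image[y][x]
        return 0.0

    samples = []
    for x, y in coords:
        x0, y0 = math.floor(x), math.floor(y)
        # The fractional parts become the interpolation weights.
        fx, fy = x - x0, y - y0
        top = (1 - fx) * pixel(x0, y0) + fx * pixel(x0 + 1, y0)
        bottom = (1 - fx) * pixel(x0, y0 + 1) + fx * pixel(x0 + 1, y0 + 1)
        samples.append((1 - fy) * top + fy * bottom)
    return samples

image = [[0.0, 1.0],
         [2.0, 3.0]]
# The sampling positions come from the data (here, hand-picked points),
# so consecutive reads can land anywhere in the image.
sampled = grid_sample_bilinear(image, [(0.5, 0.5), (1.0, 0.0), (0.0, 0.25)])
```

Each output needs four reads whose addresses depend on the input coordinates, which defeats the regular streaming access that the HW side relies on.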
  18. 2.3. Hardware design
    Design custom circuits for the operations suited to HW in the PL on the FPGA.
    ❖ Fuse BN with Conv to reduce the number of operations
    ❖ Quantize the trained parameters with PTQ (Post-Training Quantization)
    ❖ Approximate exponential operations using a lookup table (LUT)
    ❖ Design the overall architecture
    ❖ Apply data-level parallelization
    [Figure: Overall architecture of the HW accelerator, including dedicated arithmetic pipelines. Components shown: DRAM, AXI bus, DMA controller, BRAMs (data), BRAMs (params), Conv (1, 1), Conv (3, 1), Conv (3, 2), Conv (5, 1), Conv (5, 2), ReLU, sigmoid, ELU, upsampling, concat, slice, extern, and add / mul / lshift / rshift / clip stages, plus a ConvLSTM path with cell state and hidden state and a skip connection. Legend: encoder part, decoder part, encoder/decoder part, every part.]
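The first bullet, fusing BN with Conv, relies on batch normalization being a fixed per-channel affine transform at inference time. The following is a minimal sketch of that folding in plain Python; the function names, the flat weight layout, and the `eps` value are assumptions for illustration, and it does not show how NNgen or the authors' tooling performs the step.

```python
import math

def fold_batchnorm(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BN parameters into the preceding conv's weights and biases.

    weights[c] is the flat list of kernel weights for output channel c.
    """
    folded_w, folded_b = [], []
    for c in range(len(weights)):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        folded_w.append([w * scale for w in weights[c]])
        folded_b.append((biases[c] - mean[c]) * scale + beta[c])
    return folded_w, folded_b

def conv_at_one_position(patch, weights, biases):
    """Convolution output at one spatial position: one dot product per channel."""
    return [sum(w * x for w, x in zip(weights[c], patch)) + biases[c]
            for c in range(len(weights))]

def batchnorm(values, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization applied per channel."""
    return [(v - mean[c]) / math.sqrt(var[c] + eps) * gamma[c] + beta[c]
            for c, v in enumerate(values)]

# Toy numbers, chosen only for illustration.
patch = [1.0, -2.0, 0.5]
weights = [[0.2, 0.1, -0.4], [1.0, 0.0, 0.5]]
biases = [0.1, -0.3]
gamma, beta = [1.5, 0.8], [0.0, 0.2]
mean, var = [0.05, -0.1], [0.9, 1.2]

reference = batchnorm(conv_at_one_position(patch, weights, biases),
                      gamma, beta, mean, var)
fused_w, fused_b = fold_batchnorm(weights, biases, gamma, beta, mean, var)
fused = conv_at_one_position(patch, fused_w, fused_b)
```

After folding, the separate normalization pass disappears: the convolution with the rescaled weights and biases produces the same values as convolution followed by BN.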
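  19. 2.4. Software design
    Design optimized programs for the operations suited to SW on the CPU.
    ❖ Optimize memory access patterns to raise the cache hit rate
    ❖ Embed variables whose values are fixed in advance
    ❖ Quantization
    ❖ Multithreaded parallelization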
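Quantization appears in both the hardware and software designs, but the slides do not spell out the scheme. The sketch below assumes symmetric signed 8-bit values with power-of-two scales, which would match the rshift and clip stages named in the accelerator figure on slide 18; the function names and bit widths are hypothetical.

```python
def quantize(values, frac_bits, bits=8):
    """Map floats to signed integers with scale 2**-frac_bits, clipped to `bits`."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return [max(lo, min(hi, round(v * (1 << frac_bits)))) for v in values]

def requantize(acc, shift, bits=8):
    """Right-shift a wide accumulator and clip it back into `bits`."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, acc >> shift))

# Weights use 6 fractional bits and activations use 5 (assumed values).
q_weights = quantize([0.5, -0.25], frac_bits=6)      # [32, -16]
q_inputs = quantize([1.0, 0.75], frac_bits=5)        # [32, 24]

# Integer multiply-accumulate; the accumulator carries 6 + 5 = 11 fractional bits.
acc = sum(w * x for w, x in zip(q_weights, q_inputs))

# Shift right by 6 to return to 5 fractional bits, then clip.
q_out = requantize(acc, shift=6)
approx = q_out / (1 << 5)
```

With power-of-two scales, rescaling between layers is a shift rather than a multiplication, which is cheap on both the PL and the CPU. In this toy case the result equals the floating-point dot product, 0.5 × 1.0 − 0.25 × 0.75 = 0.3125, exactly.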
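  20. 2.5. HW/SW scheduling
    Running the PL and the CPU cooperatively in parallel and hiding latency requires the following two things.
    ❖ Designing an HW/SW communication mechanism for completion notifications and data exchange
    ❖ Applying task-level parallelization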
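  21. 2.5.1. HW/SW communication mechanism
    Uses CMA (Contiguous Memory Allocator) and an interrupt mechanism.
    CMA
    ❖ A mechanism for reserving contiguous physical memory regions
    ❖ Lets SW, which can work with virtual address space, share memory regions with HW, which can only work with physical address space
    [Figure: HW and SW connected through the interrupt mechanism]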
  22. 2.5.2. Task-level parallelization
    Increase the degree of parallelism and hide as much execution latency as possible.
    93% of the latency of the CVF processing, which includes grid sampling, can be hidden.
    ❖ Grid sampling has no data dependency on the preceding step (FS)
    [Figure: Pipeline chart of the proposed method. Lanes: SW (CPU) and HW (PL), with time on the horizontal axis. Labels: frame, pose, pre-process, FE + FS, CVF (preparation), KB.get, CVF, CVE, hidden state correction, CL, layer normalization, CVD, upsampling (bilinear), KB.add, post-process, depth map.]
    [Figure: Architecture of DeepVideoMVS (repeated from slide 6)]
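The scheduling idea, starting the SW-side CVF preparation while the HW side is still busy with FE + FS, can be sketched with two concurrent tasks. Here Python threads stand in for the PL and the CPU, and every function is a hypothetical placeholder doing toy arithmetic, not the real processing.

```python
from concurrent.futures import ThreadPoolExecutor

def run_hw_feature_extraction(frame):
    """Stand-in for the FE + FS work done on the PL (hypothetical)."""
    return [2 * x for x in frame]

def run_sw_cvf_preparation(pose, keyframes):
    """Stand-in for CVF preparation (grid sampling) on the CPU (hypothetical).

    It needs only the pose and the stored keyframes, not the FS output.
    """
    return [[k + pose for k in keyframe] for keyframe in keyframes]

def fuse(features, warped_keyframes):
    """Stand-in for the rest of CVF, which needs both results."""
    return [sum(f * w for f, w in zip(features, keyframe))
            for keyframe in warped_keyframes]

def process_frame(frame, pose, keyframes):
    with ThreadPoolExecutor(max_workers=2) as pool:
        hw_future = pool.submit(run_hw_feature_extraction, frame)
        sw_future = pool.submit(run_sw_cvf_preparation, pose, keyframes)
        # Both tasks are in flight here; wait only where the data dependency is.
        return fuse(hw_future.result(), sw_future.result())

cost = process_frame([1, 2, 3], 1, [[0, 0, 0], [1, 1, 1]])
```

Because the two submitted tasks share no data, whichever finishes first simply waits at the join, and the shorter one is hidden behind the longer one.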
  23. Outline
    1. Background
    2. Proposed method
    3. Experiments and results
      1. Evaluation method
      2. Execution time and HW resources
      3. Accuracy
    4. Summary
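  24. 3.1. Evaluation method
    Implement the proposed method on an FPGA and compare it with CPU-only execution.
    Input image size         96 × 64
    Model                    Pretrained on TUM RGB-D [Sturm et al., 2012]
    FPGA                     Xilinx ZCU104 board
    HW implementation        Written in Python, synthesized with NNgen (high-level synthesis), bitstream generated with Vivado 2021.2
    SW implementation        Precompiled with Cython v0.29
    Execution                PYNQ v2.6
    Evaluation dataset       7-Scenes [Shotton et al., 2013]
    Baseline implementation  Compiled with g++ 7.3.0 using the -O3 option
    [Photo: ZCU104 board]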
  25. 3.2. Execution time and HW resources
    The clock frequency is 187.512 MHz.
    60.2x speedup compared with CPU-only execution.
    HW resources are used to the fullest.
    Table: Execution time comparison
    Platform             median [s]  std [s]  frequency [MHz]
    CPU-only                 16.744    0.049  N/A
    CPU-only (w/ PTQ)        13.248    0.035  N/A
    PL + CPU (ours)           0.278    0.118  187.52
    Table: HW resource usage
    Name     Used   Available  Utilization [%]
    Slice    28256      28800             98.1
    LUT     176377     230400             76.6
    FF      143072     460800             31.0
    DSP        128       1728             7.41
    BRAM       309        312             99.0
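The headline numbers can be re-derived from the two tables. The snippet below only repeats the arithmetic on the reported values; the variable names are introduced here, and the speedup over the PTQ-only CPU run is derived rather than stated on the slide.

```python
# Median execution times in seconds, taken from the execution time table.
cpu_only_s = 16.744
cpu_only_ptq_s = 13.248
pl_plus_cpu_s = 0.278

speedup_vs_cpu = cpu_only_s / pl_plus_cpu_s          # reported as 60.2x
speedup_vs_cpu_ptq = cpu_only_ptq_s / pl_plus_cpu_s  # derived here, not on the slide

# (used, available) pairs taken from the HW resource table.
resources = {
    "Slice": (28256, 28800),
    "LUT": (176377, 230400),
    "FF": (143072, 460800),
    "DSP": (128, 1728),
    "BRAM": (309, 312),
}
utilization = {name: 100 * used / available
               for name, (used, available) in resources.items()}

print(f"speedup vs CPU-only: {speedup_vs_cpu:.1f}x")
print(f"speedup vs CPU-only (w/ PTQ): {speedup_vs_cpu_ptq:.1f}x")
for name, percent in utilization.items():
    print(f"{name}: {percent:.1f}%")
```

Slices and BRAM sit at about 98% and 99%, while DSP usage stays near 7%, so on-chip memory and logic are the binding resources in this design rather than multipliers.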
  26. 3.3. Accuracy
    There is no accuracy degradation large enough to be distinguished by eye.
    Accuracy drops slightly, but in most cases the degradation is below 10%.
    [Figure panels: (a) Input, (b) Ground truth, (c) Output of C++ impl, (d) Output of C++ impl w/ PTQ, (e) Output of the proposed accelerator]
    Results for scene fire-seq-01, frame number 000139. The MSE against the ground truth is (c) 0.091, (d) 0.073, (e) 0.089, (f) 0.084.
    Results for scene redkitchen-seq-07, frame number 000268. The MSE against the ground truth is (c) 0.808, (d) 0.880, (e) 1.099, (f) 1.050.
    [Figure: Per-scene comparison of MSE against the ground truth]
  27. Outline
    1. Background
    2. Proposed method
    3. Experiments and results
    4. Summary
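  28. 4. Summary
    ❖ Accelerated a complex depth estimation task that combines video-specific processing with DNNs
    ❖ Proposed an FPGA-based accelerator for DeepVideoMVS through HW/SW co-design
    ❖ Implemented the proposed method on a ZCU104 board and confirmed a 60.2x speedup over the software-only implementation while keeping accuracy degradation small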