Slide 1

On 3D Reconstruction: Learning a Multi-View Stereo Machine
NIPS2017 paper reading group @ Cookpad
Unless otherwise noted, material is quoted from https://arxiv.org/pdf/1708.05375.pdf

Slide 2

Learning a Multi-View Stereo Machine
▸ Authors
• Abhishek Kar, Christian Häne, Jitendra Malik (UC Berkeley)
▸ Overview
• Learns dense 3D reconstruction via Multi-View Stereo (MVS) end-to-end with deep learning
• Answers the question of whether MVS can be "learned"

Slide 3

Background
▸ What is Multi-View Stereo?
1. Feature point extraction
2. Matching
3. 3D reconstruction
4. Error removal
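For contrast with the learned approach, the four classical steps can be sketched for the simplest case, a rectified stereo pair. This is a textbook toy, not the paper's method: features are assumed to arrive as hypothetical `(x, y, descriptor)` tuples with pixel coordinates measured from the principal point, matching is a nearest-neighbour search on the same image row with a ratio test, depth follows from Z = f·B/d, and error removal only discards matches with non-positive disparity.

```python
import math

# Hypothetical feature format: (x, y, descriptor), with x and y in pixels measured
# from the principal point of a rectified stereo pair.

def match_rectified(left, right, max_ratio=0.8):
    """Step 2: nearest-descriptor matching restricted to the same image row."""
    matches = []
    for i, (_, yl, dl) in enumerate(left):
        cands = sorted(
            (math.dist(dl, dr), j) for j, (_, yr, dr) in enumerate(right) if yr == yl
        )
        if not cands:
            continue
        # Ratio test: reject ambiguous matches whose runner-up is almost as close.
        if len(cands) == 1 or cands[0][0] < max_ratio * cands[1][0]:
            matches.append((i, cands[0][1]))
    return matches


def triangulate(left, right, matches, focal, baseline):
    """Steps 3 and 4: depth from disparity, dropping geometrically impossible matches."""
    points = []
    for i, j in matches:
        xl, y, _ = left[i]
        xr = right[j][0]
        disparity = xl - xr
        if disparity <= 0:  # error removal: no valid depth for this match
            continue
        z = focal * baseline / disparity
        points.append((xl * z / focal, y * z / focal, z))
    return points
```

With focal = 100 px and baseline = 0.5, a disparity of 4 px gives Z = 100 × 0.5 / 4 = 12.5. Each of these hand-designed stages is what the following slides propose to replace with a learned component.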

Slide 4

Background
▸ What is Multi-View Stereo?
1. Feature point extraction
2. Matching
3. 3D reconstruction
4. Error removal
==> Deep learning looks like it could solve all of these

Slide 5

Background
▸ What is Multi-View Stereo?
1. Feature point extraction ← a CNN can handle this
2. Matching
3. 3D reconstruction
4. Error removal

Slide 6

Background
▸ What is Multi-View Stereo?
1. Feature point extraction
2. Matching ← CNNs and RNNs can handle this
3. 3D reconstruction
4. Error removal

Slide 7

Background
▸ What is Multi-View Stereo?
1. Feature point extraction
2. Matching
3. 3D reconstruction ← deconvolution can handle this
4. Error removal

Slide 8

Background
▸ What is Multi-View Stereo?
1. Feature point extraction
2. Matching
3. 3D reconstruction
4. Error removal ← an encoder-decoder can handle this

Slide 9

3D reconstruction with deep learning
▸ 3D-R2N2 (ECCV2016)
• Encodes multiple images and matches them with an LSTM
http://3d-r2n2.stanford.edu

Slide 10

3D reconstruction with deep learning
▸ 3D Shape Reconstruction by Modeling 2.5D Sketch (NIPS2017)
• Generates 2.5D sketches from real images, then learns 3D shape estimation from those sketches end-to-end
https://arxiv.org/pdf/1711.03129.pdf

Slide 11

Agenda
▸ Overview
▸ Method
▸ Experiments
▸ Summary

Slide 12

Overview
http://bair.berkeley.edu/blog/2017/09/05/unified-3d/

Slide 13

Overview
Learnt Stereo Machines

Slide 14

Method
▸ Image Encoder
• Encoder-decoder (U-Net) layer design
• Produces the 2D feature maps used for matching
• …-dimensional 2D feature map…
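The point of the encoder-decoder design is that the feature map used for matching comes back at full image resolution. A shape-only sketch, assuming a hypothetical three-level U-Net with stride-2 downsampling and the channel widths shown; the slide gives no layer sizes, so none of these numbers are the paper's.

```python
def unet_shapes(height, width, channels):
    """Trace (H, W, C) through a hypothetical 2D U-Net.

    channels: encoder width per level, e.g. [32, 64, 128]. Every level after the
    first halves H and W; the decoder mirrors the encoder and concatenates skips.
    """
    encoder = []
    h, w = height, width
    for level, c in enumerate(channels):
        if level > 0:
            if h % 2 or w % 2:
                raise ValueError("size must stay divisible by 2 at every level")
            h, w = h // 2, w // 2
        encoder.append((h, w, c))

    decoder = []
    for skip_h, skip_w, skip_c in reversed(encoder[:-1]):
        # Upsample to the skip resolution and project to skip_c channels, then
        # concatenate the skip connection: channels double before the next conv.
        decoder.append((skip_h, skip_w, skip_c + skip_c))
    output = (height, width, channels[0])  # full-resolution feature map
    return encoder, decoder, output
```

For a 224×224 input and widths [32, 64, 128], the encoder goes 224 → 112 → 56 and the decoder returns to 224×224, so in this configuration every input pixel gets its own feature vector for the dense matching downstream.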

Slide 15

Method
▸ Unprojection
▸ The 2D feature maps are projections of the underlying 3D feature map
▸ Back-project them onto a 3D grid
http://bair.berkeley.edu/blog/2017/09/05/unified-3d/
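The unprojection step can be sketched as: project each voxel centre into the image using the camera pose, then sample the 2D feature map at that location. A minimal sketch, assuming a zero-skew pinhole camera with intrinsics K, the convention x_cam = R·x_world + t, bilinear sampling (a common differentiable choice; treat it as an assumption about the paper), and zero features for voxels behind the camera or outside the image. All function names are hypothetical.

```python
import math

def project(point, K, R, t):
    """Pinhole projection, assuming x_cam = R @ x_world + t; None if behind camera."""
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return u, v


def bilinear_sample(fmap, u, v):
    """fmap[v][u] is a feature vector; returns zeros outside the image."""
    H, W, C = len(fmap), len(fmap[0]), len(fmap[0][0])
    if not (0.0 <= u <= W - 1 and 0.0 <= v <= H - 1):
        return [0.0] * C
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, W - 1), min(v0 + 1, H - 1)
    a, b = u - u0, v - v0  # fractional offsets used as interpolation weights
    return [
        (1 - a) * (1 - b) * fmap[v0][u0][c] + a * (1 - b) * fmap[v0][u1][c]
        + (1 - a) * b * fmap[v1][u0][c] + a * b * fmap[v1][u1][c]
        for c in range(C)
    ]


def unproject(fmap, K, R, t, voxel_centers):
    """Give every voxel the 2D feature found where its centre projects."""
    C = len(fmap[0][0])
    feats = []
    for p in voxel_centers:
        uv = project(p, K, R, t)
        feats.append([0.0] * C if uv is None else bilinear_sample(fmap, *uv))
    return feats
```

Every voxel on the same camera ray, e.g. (0, 0, 5) and (0, 0, 10) with an identity pose, receives the same feature; one view cannot tell depths apart, and resolving that is left to the fusion of several views.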

Slide 19

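Method
▸ Recurrent Grid Fusion
• Matches the 3D feature grids with a Gated Recurrent Unit (GRU)
• 3D convolutions are used inside the GRU so that it can take the 3D grids as input
• This stage performs the matching computation of MVS
• During training, the input order of the images is randomly shuffled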
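The fusion step can be pictured as running a GRU over the sequence of unprojected grids, one view at a time. In this sketch every 3D convolution of a convolutional GRU is collapsed to a 1×1×1 kernel, i.e. a hypothetical scalar weight shared by all voxels, so each voxel is updated independently; the real model's 3D kernels also mix neighbouring voxels. The update h ← (1 − z)·h + z·h̃ is one of the two common gate conventions, and `shuffle=True` mimics the randomised view order used during training.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(h, x, p):
    """One GRU update applied independently at every voxel.

    h, x: flat lists with one scalar per voxel (hidden grid, unprojected features).
    p: hypothetical scalar weights; a 1x1x1 stand-in for 3D convolutions.
    """
    new_h = []
    for hv, xv in zip(h, x):
        z = sigmoid(p["wz"] * xv + p["uz"] * hv + p["bz"])  # update gate
        r = sigmoid(p["wr"] * xv + p["ur"] * hv + p["br"])  # reset gate
        cand = math.tanh(p["wh"] * xv + p["uh"] * (r * hv) + p["bh"])
        new_h.append((1.0 - z) * hv + z * cand)
    return new_h


def fuse_views(grids, p, shuffle=False, rng=None):
    """Fold a non-empty list of per-view grids into one hidden grid."""
    order = list(range(len(grids)))
    if shuffle:  # training-time augmentation: do not depend on view order
        (rng or random).shuffle(order)
    h = [0.0] * len(grids[0])
    for i in order:
        h = gru_step(h, grids[i], p)
    return h
```

A GRU is not permutation-invariant: fusing the same two views in opposite orders generally yields different grids. Randomising the order during training is one way to keep the model from depending on it.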

Slide 20

Method
▸ 3D Grid Reasoning
• The 3D grid produced by the GRU alone was noisy
• Encoding and decoding it with a 3D U-Net filters out the noise
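The slide says only that a 3D U-Net filters the noisy grid. Its basic building block is a 3D convolution, sketched here naively in pure Python (single channel, zero padding, cross-correlation as in most deep-learning libraries). The fixed 3×3×3 averaging kernel used in the note below is a hand-picked illustration, not a learned filter: the real network learns its kernels and adds downsampling, upsampling and skip connections on top.

```python
def conv3d_same(grid, kernel):
    """Naive single-channel 3D cross-correlation with zero padding.

    grid: nested lists indexed [d][y][x]; kernel: odd-sized cube [k][k][k].
    The output has the same shape as the input ("same" padding).
    """
    D, H, W = len(grid), len(grid[0]), len(grid[0][0])
    k = len(kernel)
    r = k // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(D)]
    for d in range(D):
        for y in range(H):
            for x in range(W):
                acc = 0.0
                for i in range(k):
                    for j in range(k):
                        for l in range(k):
                            dd, yy, xx = d + i - r, y + j - r, x + l - r
                            # Voxels outside the grid count as zero.
                            if 0 <= dd < D and 0 <= yy < H and 0 <= xx < W:
                                acc += kernel[i][j][l] * grid[dd][yy][xx]
                out[d][y][x] = acc
    return out
```

On a 7×7×7 grid holding a solid 3×3×3 block plus one isolated occupied voxel in a far corner, a 3×3×3 kernel of 1/27 leaves the block's centre at 1.0 while the isolated voxel drops to 1/27 ≈ 0.037, so thresholding at 0.5 removes the speck.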

Slide 21

Method
▸ Differentiable Projection
• Depth reconstruction: L1 loss (to preserve high-frequency information)
• Voxel reconstruction: per-voxel cross-entropy loss
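Both losses on the slide are standard. A minimal sketch over flat lists, assuming depth maps are compared pixel by pixel and the voxel head outputs an occupancy probability per voxel (a sigmoid output is an assumption); the ε clamp only keeps the logarithm finite.

```python
import math

def l1_depth_loss(pred, target):
    """Mean absolute error between predicted and ground-truth depth values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def voxel_bce_loss(prob, occ, eps=1e-7):
    """Mean per-voxel binary cross-entropy.

    prob: predicted occupancy probabilities; occ: ground-truth labels in {0, 1}.
    """
    total = 0.0
    for p, y in zip(prob, occ):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(prob)
```

The L1 loss grows linearly with the error, so a large error at a depth discontinuity is penalised less than under a squared loss; this is the usual argument for why L1 blurs edges less, in line with the slide's "high-frequency information" remark.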

Slide 22

Experiments
▸ Dataset
• Uses ShapeNet data
• A public dataset of 3D CAD models
https://shapenet.cs.stanford.edu/shrec17/

Slide 23

Experiments
• Input images
▸ 224x224x3 renderings of ShapeNet 3D models
▸ 4 images per viewpoint
▸ Camera poses
• Outputs
▸ Depth: 224x224x3
▸ Voxel: 32x32x32

Slide 24

Experiments
▸ Results
Compared with 3D-R2N2, finer details can be reconstructed

Slide 25

Experiments
▸ Results
Compared with 3D-R2N2, reconstruction works with fewer views
Performance improves as the number of views increases

Slide 26

Experiments
▸ Results
Windows, which stereo matching fails to reconstruct, can also be reconstructed

Slide 27

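Experiments
▸ Results
Compared with stereo matching, reconstruction works even with fewer views
According to the authors, the CNN's ability to use context surpasses conventional stereo matching
3D reconstruction obtained by combining multiple depth-map estimates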
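The figure reconstructs 3D shape by combining several predicted depth maps. One straightforward way to do that is to back-project each depth map into world coordinates with its camera and take the union of the points. This is a minimal sketch: the pinhole intrinsics K, the convention x_cam = R·x_world + t, and depth 0 meaning "invalid" are assumptions, and the slide does not say how the depth maps were actually combined.

```python
def depth_to_world(depth, K, R, t):
    """Back-project a depth map (nested list [v][u], 0 = invalid) to world points.

    Assumes x_cam = R @ x_world + t, hence x_world = R^T (x_cam - t).
    """
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            xc = ((u - cx) * z / fx, (v - cy) * z / fy, z)  # camera coordinates
            d = [xc[i] - t[i] for i in range(3)]
            # Multiply by R transposed: world_i = sum_j R[j][i] * d[j].
            points.append(tuple(sum(R[j][i] * d[j] for j in range(3)) for i in range(3)))
    return points


def fuse_depth_maps(views):
    """views: list of (depth, K, R, t); returns the union of back-projected points."""
    cloud = []
    for depth, K, R, t in views:
        cloud.extend(depth_to_world(depth, K, R, t))
    return cloud
```

Two cameras that observe the same surface point from different positions map it to the same world coordinate, so the merged cloud covers regions that no single depth map sees on its own.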

Slide 28

Summary
▸ Proposed Learnt Stereo Machines
▸ Depth maps and voxel grids can be estimated from input images taken from multiple viewpoints
▸ Open issue
• The output voxel grid is small (32x32x32)
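The slide names the 32×32×32 output as the main limitation. One plausible reason, not stated on the slide, is that a dense grid grows cubically with resolution. The arithmetic, assuming float32 values and a hypothetical channel count:

```python
def grid_cost(resolution, channels=1, bytes_per_value=4):
    """Voxel count and memory in MiB of one dense cubic grid (float32 by default)."""
    voxels = resolution ** 3
    mib = voxels * channels * bytes_per_value / 2**20
    return voxels, mib
```

Doubling the resolution multiplies the voxel count by 8: 32³ = 32,768, 64³ = 262,144, 128³ = 2,097,152. With a hypothetical 64-channel float32 feature grid, a single 128³ grid already takes 512 MiB, before counting the intermediate activations kept for backpropagation.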