Slide 1

Deep-learning topics that caught my interest at NIPS 2017
20180306, Yohei Kikuta (@yohei_kikuta)
Ref: https://nips.cc/Conferences/2017

Slide 2

Contents
• Deep learning at NIPS 2017
• Progress in understanding the generalization performance of deep learning
• Progress in understanding the convergence of GANs
• New directions for deep learning
• Summary

Slide 3

Deep Learning at NIPS 2017

Slide 4

Number of deep-learning-related presentations (rough counts)
• Tutorials: 3 of 9
• Invited talks: 2 of 7
• Orals: 8 of 41
• Posters: about 200 of 679
• Workshops: 5 of 53 (with "Deep Learning" in the title)

Slide 5

Deep-learning-related presentations: Tutorials
• Tutorials: 3 of 9
• Deep Learning: Practice and Trends
• Deep Probabilistic Modeling with Gaussian Processes
• Geometric Deep Learning on Graphs and Manifolds

Slide 6

Deep-learning-related presentations: Invited talks
• Invited talks: 2 of 7
• Deep Learning for Robotics
• On Bayesian Deep Learning and Deep Bayesian Learning

Slide 7

Deep-learning-related presentations: Orals
• Orals: 8 of 41
• TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
• Train longer, generalize better: closing the generalization gap in large batch training of neural networks
• End-to-End Differentiable Proving
• Gradient descent GAN optimization is locally stable

Slide 8

Deep-learning-related presentations: Orals (Cont'd)
• Orals: 8 of 41
• Imagination-Augmented Agents for Deep Reinforcement Learning
• Masked Autoregressive Flow for Density Estimation
• Deep Sets
• From Bayesian Sparsity to Gated Recurrent Nets

Slide 9

Deep-learning-related presentations: Posters
• Posters: about 200 of 679
• Too many to enumerate!
• Note that the counts reflect my own judgment and biases
• See the NIPS page for details
NIPS page: https://nips.cc/Conferences/2017/Schedule?type=Poster

Slide 10

Deep-learning-related presentations: Workshops
• Workshops: 5 of 53 (with "Deep Learning" in the title)
• Deep Learning for Physical Sciences
• Deep Learning at Supercomputer Scale
• Deep Learning: Bridging Theory and Practice
• Bayesian Deep Learning
• Interpreting, Explaining and Visualizing Deep Learning - Now what?

Slide 11

Impressions on the direction of deep learning research
The momentum of deep learning keeps growing
• Trends are also diversifying
• Understanding generalization, the rise of GANs, new modeling directions
• Large-scale distributed training, Bayesian methods, robotics, ...
• This deck introduces several papers related to the first of the bullets above

Slide 12

Progress in Understanding the Generalization Performance of Deep Learning

Slide 13

What is generalization performance?
• A model's performance on data not used during training
• With the rise of deep learning, generalization is attracting renewed attention
• Models that look like "nonsense", with (number of parameters) >> (number of data points)
• Yet they generalize remarkably well, and we want to clarify why
• This topic is not limited to deep learning

Slide 14

Papers that caught my interest (NIPS 2017)
• Train longer, generalize better: closing the generalization gap in large batch training of neural networks
  Studies the relation among generalization, learning rate, and batch size; proposes GBN
• Exploring Generalization in Deep Learning
  Examines generalization measures; proposes a PAC-Bayesian-based sharpness
• Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks
  Interprets SGD as a Langevin equation and analyzes the temperature dynamics

Slide 15

Related papers (outside NIPS 2017)
• A Bayesian Perspective on Generalization and Stochastic Gradient Descent
  Derives relations among the "noise scale", the learning rate, and the batch size
• Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
  Trains with very large minibatches without losing accuracy
• Understanding deep learning requires rethinking generalization
  Argues that a new framework is needed to understand generalization in deep learning (and even in linear models!)

Slide 16

What follows
Although they are not NIPS papers, the following two are close in content and allow a more systematic understanding, so they are explained below
• A Bayesian Perspective on Generalization and Stochastic Gradient Descent
• Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
On generalization measures, I previously gave this talk
this talk: https://speakerdeck.com/diracdiego/some-understanding-of-generalization-in-deep-learing

Slide 17

Bayesian evidence
As a simple case, consider a one-dimensional parameter and its posterior probability
Here x is the input, y the label, and M the model
The likelihood is exponentiated and expressed via the cross entropy
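
The formulas on this slide were images; a sketch in the notation of the cited paper (https://arxiv.org/abs/1710.06451), with \omega the parameter:

P(\omega \mid \{y\}, \{x\}; M) = \frac{P(\{y\} \mid \omega, \{x\}; M)\, P(\omega; M)}{P(\{y\} \mid \{x\}; M)}

P(\{y\} \mid \omega, \{x\}; M) = \prod_i P(y_i \mid \omega, x_i; M) = e^{-H(\omega; M)}, \qquad H(\omega; M) = -\sum_i \ln P(y_i \mid \omega, x_i; M)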

Slide 18

Bayesian evidence (Cont'd)
With a Gaussian prior, the prediction for an unknown label is given by the expression below
Model comparison compares the first factor on the right-hand side (the ratio of priors is set to 1)
This "evidence" plays the leading role in the discussion
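
A sketch of the corresponding formulas, again following https://arxiv.org/abs/1710.06451 (y^* is the unknown label):

P(y^* \mid x^*, \{y\}, \{x\}; M) = \int d\omega\, P(y^* \mid \omega, x^*; M)\, P(\omega \mid \{y\}, \{x\}; M)

\frac{P(M_1 \mid \{y\}, \{x\})}{P(M_2 \mid \{y\}, \{x\})} = \frac{P(\{y\} \mid \{x\}; M_1)}{P(\{y\} \mid \{x\}; M_2)} \cdot \frac{P(M_1)}{P(M_2)}, \qquad P(\{y\} \mid \{x\}; M) = \int d\omega\, P(\{y\} \mid \omega, \{x\}; M)\, P(\omega; M)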

Slide 19

Bayesian evidence (Cont'd)
Expanding to second order around the solution, the evidence is expressed by the loss at the solution and log(curvature / regularization coefficient)
Generalizing the parameter to p dimensions, the determinant of the Hessian brings in its eigenvalues
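
A sketch under the paper's conventions, where C(\omega) = H(\omega) + \lambda\omega^2/2 is the regularized loss, \omega_0 its minimum, and \lambda the regularization coefficient of the Gaussian prior:

P(\{y\} \mid \{x\}; M) \approx \exp\!\left\{ -C(\omega_0) - \tfrac{1}{2} \ln\!\big( C''(\omega_0)/\lambda \big) \right\}

and, for a p-dimensional parameter with \lambda_i the eigenvalues of the Hessian \nabla\nabla C(\omega_0),

P(\{y\} \mid \{x\}; M) \approx \exp\!\left\{ -C(\omega_0) - \tfrac{1}{2} \sum_{i=1}^{p} \ln(\lambda_i / \lambda) \right\}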

Slide 20

Bayesian evidence (Cont'd)
The comparison target is the null model (n: number of classes)
Introducing further the log evidence ratio E, the result below follows
This result does not depend on the parametrization of the model and supports broad minima (small λ_i) generalizing better than sharp minima
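
A sketch of the null model and the log evidence ratio as I read the cited paper (N the number of training examples; negative E favors the trained model over the null model):

P(\{y\} \mid \{x\}; \mathrm{NULL}) = (1/n)^N = e^{-N \ln n}

E(\omega_0) = C(\omega_0) + \tfrac{1}{2} \sum_i \ln(\lambda_i / \lambda) - N \ln n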

Slide 21

Experiment: Bayesian evidence with logistic regression
Logistic regression discriminating MNIST {0,1}: 800 train, 10000 test
Left: random labels; right: correct labels
With labels that carry meaningful information, the log evidence ratio drops below 0
Ref: https://arxiv.org/abs/1710.06451

Slide 22

Experiment: generalization gap with a NN
Classification with 800 hidden units + ReLU: 1000 train, the rest test
SGD w/ momentum 0.9, learning rate 1.0
Generalization performance differs with batch size (the generalization gap)
Ref: https://arxiv.org/abs/1710.06451

Slide 23

The "noise scale" in SGD
To work in an analytic framework, view SGD as a stochastic differential equation
Noting that the difference between the full batch and a minibatch was what mattered, write the parameter update due to the gradient in the following form, with the quantities defined as below
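
A sketch of that update in the paper's notation (N training points, batch size B, learning rate \varepsilon; C the full-batch cost, \hat{C} its minibatch estimate):

\Delta\omega = -\frac{\varepsilon}{N} \frac{d\hat{C}}{d\omega} = -\frac{\varepsilon}{N} \left( \frac{dC}{d\omega} + \alpha(\omega) \right), \qquad \alpha(\omega) \equiv \frac{d\hat{C}}{d\omega} - \frac{dC}{d\omega}

where \alpha is the gradient noise coming from sampling the minibatch.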

Slide 24

The "noise scale" in SGD (Cont'd)
The expectation values can be written as follows
Using these, the statistics of the parameter update follow
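
A sketch of those expectations as I read the paper, with F(\omega) characterizing the covariance of per-example gradients:

\langle \alpha(\omega) \rangle = 0, \qquad \langle \alpha(\omega)^2 \rangle = N^2 \left( \frac{1}{B} - \frac{1}{N} \right) F(\omega)

so that \langle \Delta\omega \rangle = -\frac{\varepsilon}{N} \frac{dC}{d\omega}, and the variance of the update scales as \varepsilon^2 (1/B - 1/N) F(\omega).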

Slide 25

The "noise scale" in SGD (Cont'd)
Compare with the continuous-time stochastic differential equation (t treated as a continuous variable), the so-called overdamped Langevin equation
Here η is a noise term with vanishing mean and a covariance controlled by g
This g is the quantity that sets the size of the fluctuations of the dynamics
Take the continuum approximation of the discrete parameter-update equation and compare
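
A sketch of that Langevin equation in the paper's notation:

\frac{d\omega}{dt} = -\frac{dC}{d\omega} + \eta(t), \qquad \langle \eta(t) \rangle = 0, \qquad \langle \eta(t)\, \eta(t') \rangle = g\, F(\omega)\, \delta(t - t')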

Slide 26

The "noise scale" in SGD (Cont'd)
Assuming the ratio ε/N is sufficiently small, identify the two descriptions as follows
Squaring both sides and taking expectations yields the relation below
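
Matching the update variance to the Langevin noise gives the paper's noise scale:

g = \varepsilon \left( \frac{N}{B} - 1 \right) \approx \frac{\varepsilon N}{B} \qquad (B \ll N)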

Slide 27

Experiment: relation between the "noise scale" and generalization
The "noise scale" controls how well the solution generalizes
An appropriate g leads to well-generalizing solutions (if it is too large, the continuum approximation breaks down)
Ref: https://arxiv.org/abs/1710.06451

Slide 28

Adding momentum
When momentum is included, the following Langevin equation is analyzed instead
After somewhat technical calculations, the result below is obtained
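
A sketch of the result with momentum parameter m, as given in the paper: the noise scale is rescaled to

g = \frac{\varepsilon}{1 - m} \left( \frac{N}{B} - 1 \right) \approx \frac{\varepsilon N}{B (1 - m)}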

Slide 29

Experiment: ImageNet training with a very large batch size and learning rate
The relation holds even for large-scale training (warmup is also used early in training)
Ref: https://arxiv.org/abs/1706.02677

Slide 30

Summary on the generalization performance of deep learning
• As an optimizer, SGD looks good both theoretically and experimentally
• Various related papers are appearing and understanding is advancing
  The probability of converging to a saddle point is zero (ref)
  However, escaping a neighborhood of a saddle point can take exponential time (ref)
• Understanding of why deep learning works so well should also keep advancing
  Recently, for example, this paper was a hot topic
Ref: https://arxiv.org/abs/1602.04915, https://arxiv.org/abs/1705.10412, https://arxiv.org/abs/1802.04474

Slide 31

Progress in Understanding the Convergence of GANs

Slide 32

What is a GAN?
Train a generator G of the data distribution and a discriminator D in competition
The images below were generated from random noise with a trained G
High-quality images can now be generated, but stable convergence of the training is difficult
Ref: https://arxiv.org/abs/1406.2661, https://arxiv.org/abs/1710.10196

Slide 33

Papers that caught my interest
Game-theoretic analyses, analyses of the gradient flow near equilibria, and so on
As regularizers, gradient penalties via double backprop play a central role
• Gradient descent GAN optimization is locally stable
• The Numerics of GANs
• Improved Training of Wasserstein GANs
• Stabilizing Training of Generative Adversarial Networks through Regularization

Slide 34

Papers that caught my interest (Cont'd)
• GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
  Gives the generator and discriminator independent learning rates; convergence to a Nash equilibrium
• Approximation and Convergence Properties of Generative Adversarial Learning
  Treats various GANs in a more abstract framework, showing convergence properties and the relative strength of the methods

Slide 35

What follows
Mainly introduces the paper below (some figures are quoted from other papers)
Other papers make essentially quite similar arguments, but beware that the notation varies considerably
• Gradient descent GAN optimization is locally stable
On some of the other papers I previously gave this talk
this talk: https://speakerdeck.com/diracdiego/20180127-nips-paper-reading

Slide 36

The current status of GAN convergence
• Whether an equilibrium (in the sense of a Nash equilibrium) exists is nontrivial
  Note also that equivalence to a zero-sum game holds only in special cases
• Whether training converges, even granting that an equilibrium exists, is nontrivial
• In practice, training is unstable (sensitive to hyperparameters)
Here the existence of an equilibrium is assumed, and convergence in its neighborhood is shown

Slide 37

Formulation of GAN training
The objective function is given below
Here f is concave (the original GAN corresponds to a particular choice of f)
Note that the discriminator is real-valued
To allow an analytic treatment, the parameter updates are expressed in continuous time
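
A sketch of the objective and the continuous-time updates in the notation of the cited paper (https://arxiv.org/abs/1706.04156); the original GAN is recovered with f(x) = -\log(1 + e^{-x}):

V(\theta_D, \theta_G) = \mathbb{E}_{x \sim p_{\mathrm{data}}} \left[ f(D_{\theta_D}(x)) \right] + \mathbb{E}_{z \sim p_z} \left[ f(-D_{\theta_D}(G_{\theta_G}(z))) \right]

\dot{\theta}_D = \nabla_{\theta_D} V, \qquad \dot{\theta}_G = -\nabla_{\theta_G} V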

Slide 38

Formulation of GAN training (Cont'd)
Note that from here on the objective function is treated in the following form
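
The rewritten form was an image in the original; presumably (my reading, not confirmed by the slide text) it is the objective expressed directly in terms of the generator's distribution p_{\theta_G}:

V(\theta_D, \theta_G) = \mathbb{E}_{x \sim p_{\mathrm{data}}} \left[ f(D_{\theta_D}(x)) \right] + \mathbb{E}_{x \sim p_{\theta_G}} \left[ f(-D_{\theta_D}(x)) \right]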

Slide 39

Why is it hard to show the stability of GAN training?
At the equilibrium, since this is a min-max game, we would be happy if the objective were convex-concave
But the objective is concave-concave!
For a linear model, this follows from the concavity of f: the objective is concave with respect to every parameter
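
A minimal sketch of why, assuming a linear discriminator D(x) = \theta_D^{\top} x and a linear generator G(z) = \theta_G z (my illustrative choice): each term is f composed with a function that is linear in each parameter block separately, so concavity of f makes V concave in \theta_D for fixed \theta_G, and concave in \theta_G for fixed \theta_D:

V(\theta_D, \theta_G) = \mathbb{E}_{x \sim p_{\mathrm{data}}} \left[ f(\theta_D^{\top} x) \right] + \mathbb{E}_{z \sim p_z} \left[ f(-\theta_D^{\top} \theta_G z) \right]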

Slide 40

Why is it hard to show the stability of GAN training? (Cont'd)
The problem arises not only for linear models but also for polynomial ones, and for the WGAN (f(x) = x)
Ref: https://www.cs.cmu.edu/~vaishnan/nips17_oral.pdf

Slide 41

An approach to showing the stability of GAN training
• The object of analysis is the nonlinear dynamical system defined by the updates
• Control theory is useful for discussing the stability of the gradient flow
• A global analysis is hard, so restrict to a neighborhood of the equilibrium and linearize, as sketched below
  Hartman-Grobman theorem: the system can be linearized near a hyperbolic fixed point
• The system is stable if the real parts of the eigenvalues of the Jacobian at the equilibrium are negative
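
A sketch of the standard linear stability criterion being invoked:

\dot{\theta} = v(\theta), \qquad \dot{\delta\theta} \approx J\, \delta\theta, \quad J = \left. \frac{\partial v}{\partial \theta} \right|_{\theta^*}

The equilibrium \theta^* is locally (exponentially) stable if \operatorname{Re}\lambda_i(J) < 0 for all eigenvalues \lambda_i.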

Slide 42

An approach to showing the stability of GAN training (Cont'd)
Consider a one-dimensional system as the simplest case
The Jacobian and its eigenvalue can be read off directly, and the real part is negative
This dynamical system converges to the origin and is stable (unchanged under perturbations)
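
The slide's concrete system was an image; as a stand-in, a minimal one-dimensional example of my own with the claimed behavior:

\dot{x} = -x \ \Rightarrow\ J = -1, \quad \lambda = -1 < 0, \qquad x(t) = x(0)\, e^{-t} \to 0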

Slide 43

GAN training stability near the equilibrium
The main object of the analysis is the Jacobian of the system at the equilibrium
It suffices to show that the real parts of its eigenvalues are negative
The (1,1) block is the second derivative with respect to the discriminator parameters and is negative definite by concavity
From here on, assumptions are introduced to guarantee the negativity of the real parts of the eigenvalues
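
A sketch of that Jacobian as I read the paper (the updates are ascent for \theta_D and descent for \theta_G):

J = \begin{pmatrix} \nabla^2_{\theta_D} V & \nabla_{\theta_G} \nabla_{\theta_D} V \\ -\left( \nabla_{\theta_G} \nabla_{\theta_D} V \right)^{\top} & -\nabla^2_{\theta_G} V \end{pmatrix} \Bigg|_{(\theta_D^*,\, \theta_G^*)}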

Slide 44

GAN training stability near the equilibrium (Cont'd)
• Assumption 1: … and …
• Assumption 2: … and …
• Assumption 3: … is locally constant in the discriminator space, and … in the generator space
• Assumption 4: there exists … s.t. …

Slide 45

GAN training stability near the equilibrium (Cont'd)
A straightforward calculation shows the following
Here, the quantities involved are defined as below

Slide 46

GAN training stability near the equilibrium (Cont'd)
For this matrix, if the relevant block is negative definite and the off-diagonal block has full column rank, the negativity of the real parts of the eigenvalues can be shown
For the proof, see the original paper (it proceeds by organizing the eigenvalue equation)
This establishes the stability of GAN training around the equilibrium (more precisely, exponential convergence)

Slide 47

Devising a regularizer to improve stability
The key point was that the real parts of the Jacobian eigenvalues are negative
What enhances this is double backprop applied to the generator
With this regularization, the Jacobian changes as follows
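
A sketch of the regularized generator update as I read the paper, with \eta the regularization coefficient:

\dot{\theta}_G = -\nabla_{\theta_G} \left( V + \eta \left\| \nabla_{\theta_D} V \right\|^2 \right)

The penalty \|\nabla_{\theta_D} V\|^2 requires differentiating through the discriminator gradient, hence "double backprop".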

Slide 48

Devising a regularizer to improve stability (Cont'd)
Since the (2,2) block becomes negative definite, stability improves
One can also show that the earlier argument is not broken as long as the parameter η is small
A similar discussion is given in The Numerics of GANs
The gradient regularizer introduced here appears, in different contexts, in a variety of papers

Slide 49

Experiment: convergence of the solution with the regularizer
Ref: https://arxiv.org/abs/1706.04156

Slide 50

Experiment: mode collapse
Ref: https://arxiv.org/abs/1706.04156

Slide 51

Experiment: stability during training
It looks as though more than just "local" stability is obtained
Ref: https://arxiv.org/abs/1705.10461

Slide 52

Summary on the convergence of GANs
• Analyses around the equilibrium are now carried out in detail
  Theoretically guaranteeing convergence requires correspondingly strong assumptions
• Many papers are appearing, but many of them share closely related ideas
• Global analyses with geometric methods may be next
  For example, new directions for evaluating GANs are also being proposed
• There seems to be much that can be done at the boundary with physics and mathematics

Slide 53

New Directions for Deep Learning

Slide 54

Papers that caught my interest
New developments
• Dynamic Routing Between Capsules
  Proposes a vector generalization of neurons and a mechanism to obtain the relations among them
• Deep Sets
  Constructs models that take sets as input, independent of the order of the elements
• Bayesian GAN
  Treats GANs in a Bayesian way, making various training tricks unnecessary

Slide 55

Papers that caught my interest (Cont'd)
Developments from a geometric viewpoint
• Sobolev Training for Neural Networks
  Formulates training that also uses the derivative values of each layer; applicable to distillation etc.
• Principles of Riemannian Geometry in Neural Networks
  Formulates NNs with Riemannian geometry; expresses backprop as the action of a right Lie group
• Riemannian approach to batch normalization
  Formulates BN in the framework of Riemannian geometry

Slide 56

Papers that caught my interest (Cont'd)
Further variety
• Attention Is All You Need
  Achieves good results with attention alone, without recurrent structure
• Deep Hyperspherical Learning
  Formulates CNN convolutions as operations on a hypersphere; good convergence
• GibbsNet: Iterative Adversarial Inference for Deep Graphical Models
  Modeling of the joint probability p(x, z)

Slide 57

What follows
Two papers I personally found interesting are introduced
• Dynamic Routing Between Capsules (CapsNet)
• Deep Sets
CapsNet tries to improve pooling into something better
Deep Sets builds models that can take sets as input

Slide 58

CapsNet: the model (for MNIST)
To better capture the relations among features in the data, neurons are vectorized
In the figure below, the 8 of PrimaryCaps and the 16 of DigitCaps are the capsule dimensions
Ref: https://arxiv.org/abs/1710.09829

Slide 59

CapsNet: capsule input and output
Output (squashes small vectors)
Input (the u are the outputs of the previous layer)
Here the W are weights learned by backprop, and the c_ij are capsule couplings determined by some separate procedure
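
The formulas, as given in the cited paper (v_j the output of capsule j, s_j its total input, u_i the outputs of the previous layer):

v_j = \frac{\| s_j \|^2}{1 + \| s_j \|^2} \, \frac{s_j}{\| s_j \|}, \qquad s_j = \sum_i c_{ij}\, \hat{u}_{j|i}, \qquad \hat{u}_{j|i} = W_{ij} u_i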

Slide 60

CapsNet: the routing algorithm
The capsule couplings c_ij are computed dynamically from the alignment between inputs and outputs
Ref: https://arxiv.org/abs/1710.09829
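
A minimal NumPy sketch of routing-by-agreement as described in the paper; the shapes and names (u_hat, num_iterations, etc.) are my own choices:

import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # v = (|s|^2 / (1 + |s|^2)) * (s / |s|)
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def routing(u_hat, num_iterations=3):
    # u_hat: predictions u^_{j|i} from lower capsules, shape (num_lower, num_upper, dim_upper)
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))  # routing logits b_ij
    for _ in range(num_iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # c_i = softmax(b_i) over upper capsules
        s = np.einsum('ij,ijk->jk', c, u_hat)                 # s_j = sum_i c_ij u^_{j|i}
        v = squash(s)                                         # v_j = squash(s_j)
        b = b + np.einsum('ijk,jk->ij', u_hat, v)             # b_ij += u^_{j|i} . v_j (agreement)
    return v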

Slide 61

CapsNet: in the end, what did capsules try to do?
• The underlying concern is the loss of information in pooling
• Extend from routing by the maximum value to something that accounts for the relations among entities
• The vector extension expresses the alignment among entities via orientation information
• The attempt is thereby to obtain equivariance rather than invariance
The routing algorithm is just one particular algorithm
Capsules themselves were introduced in 2011 in this paper
this paper: http://www.cs.toronto.edu/~fritz/absps/transauto6.pdf

Slide 62

CapsNet: the objective function
The margin loss for class k is defined below
Here v_k is the capsule output, and T_k is 1 if the input belongs to class k
The rest are parameters
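
The loss, as given in the cited paper, with the paper's parameter values m^+ = 0.9, m^- = 0.1, \lambda = 0.5:

L_k = T_k \max(0,\, m^+ - \| v_k \|)^2 + \lambda\, (1 - T_k)\, \max(0,\, \| v_k \| - m^-)^2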

Slide 63

CapsNet: reconstruction as regularization
The MSE of reconstructing the original image from DigitCaps is used as a regularizer
※ When I asked the authors, it is more of a structure for visualization
Ref: https://arxiv.org/abs/1710.09829

Slide 64

CapsNet: reconstruction experiments
Ref: https://arxiv.org/abs/1710.09829

Slide 65

CapsNet: classification experiments
MultiMNIST overlays two MNIST images and solves the task as multi-label classification
The baseline is a CNN with comparable computational cost (about 4x the parameters)
Ref: https://arxiv.org/abs/1710.09829

Slide 66

CapsNet: multi-digit reconstruction experiment
R: reconstruction, L: label, P: prediction
Left: good examples; middle: examples where R is forced to use a different digit; right: examples where the prediction is wrong

Slide 67

Deep Sets: the model
Modeling based on the notions of permutation invariance and permutation equivariance
I wrote my impressions here; interesting, but the paper's exposition is somewhat unkind
Ref: https://www.facebook.com/nipsfoundation/videos/1555553784535855/, here: https://github.com/yoheikikuta/paper-reading/issues/6

Slide 68

Deep Sets: the model (Cont'd)
How to construct a model that preserves permutation invariance of the input data
The key is that the objective is required to take the form below
Ref: https://arxiv.org/abs/1703.06114
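
The required form, as given in the cited paper (X the input set; \varphi and \rho suitable transformations):

f(X) = \rho \left( \sum_{x \in X} \varphi(x) \right)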

Slide 69

Deep Sets: the model (Cont'd)
What does it mean that the objective is required to take this form?
If one naively tries to build a quantity that does not depend on order: addition
That is in fact correct, and transformations ρ and φ that do not break the property are allowed
The interesting question is whether this can be built with deep-learning structures
→ The answer is yes, taking ρ and φ to be functions that approximate arbitrary polynomials
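
A one-line illustration of the reasoning (my paraphrase of the paper's argument): a permutation-invariant polynomial of M inputs can be written as a function of their power sums, which is exactly the form above with \varphi(x) = (x, x^2, \ldots, x^M):

f(x_1, \ldots, x_M) = \rho \left( \sum_{m=1}^{M} \varphi(x_m) \right), \qquad \varphi(x) = (x,\, x^2,\, \ldots,\, x^M)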

Slide 70

Deep Sets: the model (Cont'd)
How to construct the permutation-equivariant NN layers of the model
The elements of the input x are the permutation targets, shown here for the case where the feature dimension is 1 (extendable)
Ref: https://arxiv.org/abs/1703.06114
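
A sketch of the equivariant layer from the cited paper (\lambda, \gamma scalar parameters, \mathbf{1} the all-ones vector; the paper also uses a variant with maxpool in place of the sum):

f_{\Theta}(x) = \sigma(\Theta x), \qquad \Theta = \lambda I + \gamma \mathbf{1}\mathbf{1}^{\top}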

Slide 71

Deep Sets: experiments
What is interesting is that new types of experiments become possible
• Computing the sum of the input data
• Estimating the redshift of galaxy clusters from photometric information
• Measuring entropy and similar quantities for data generated from Gaussians
• Point-cloud classification, set expansion, set anomaly detection, ...

Slide 72

Deep Sets: experiment outputting the sum of MNIST digits
The input is MNIST digits (left) or images (right), and the sum is computed
Permutation invariant in the sense of the form above
Ref: https://arxiv.org/abs/1703.06114

Slide 73

Deep Sets: experiment estimating the redshift of galaxy clusters
A galaxy cluster contains multiple galaxies, each with photometric information (17 dimensions)
Treating each cluster as a set, the redshift is estimated (cosmologically important)
※ At the oral they quoted 0.019, so it was presumably improved afterwards
Ref: https://arxiv.org/abs/1703.06114

Slide 74

Deep Sets: set anomaly detection experiment
Experiment on CelebA; 75% accuracy on test (6.3% without the permutation-equivariant layer)
Ref: https://arxiv.org/abs/1703.06114

Slide 75

Summary on new directions for deep learning
• New modeling approaches are being explored from diverse perspectives
  Extensions of the NN itself
  Developments incorporating geometric viewpoints
  Fusion with other fields, such as Bayesian methods
• Hoping for revolutionary inventions on the scale of the GAN

Slide 76

Summary

Slide 77

Summary
• The momentum of deep learning shows no sign of stopping
• Introduced work on generalization performance, GAN convergence, and new modeling directions
• The topics are getting deeper and broader, and are genuinely interesting
• Fundamental topics such as generalization are attracting renewed attention
• The overlap with other fields is growing, and new facets keep emerging