20180306_NIPS2017_DeepLearning

yoppe
March 06, 2018

Topics related to deep learning that I found interesting while attending NIPS2017 ( https://nips.cc/Conferences/2017 ).

Transcript

  1. Deep learning topics that interested me at NIPS2017. 2018-03-06, Yohei Kikuta (@yohei_kikuta). Ref: https://nips.cc/Conferences/2017

  2. Outline
     • Deep learning at NIPS2017
     • Progress in understanding the generalization performance of deep learning
     • Progress in understanding the convergence of GANs
     • New directions for deep learning
     • Summary
  3. Deep learning at NIPS2017

  4. Number of deep-learning-related presentations (counts are approximate)
     • Tutorials: 3 of 9
     • Invited talks: 2 of 7
     • Orals: 8 of 41
     • Posters: roughly 200 of 679
     • Workshops: 5 of 53 (with "Deep Learning" in the title)
  5. Deep-learning-related presentations: Tutorials
     • Tutorials: 3 of 9
     • Deep Learning: Practice and Trends
     • Deep Probabilistic Modeling with Gaussian Processes
     • Geometric Deep Learning on Graphs and Manifolds
  6. Deep-learning-related presentations: Invited talks
     • Invited talks: 2 of 7
     • Deep Learning for Robotics
     • On Bayesian Deep Learning and Deep Bayesian Learning
  7. Deep-learning-related presentations: Orals
     • Orals: 8 of 41
     • TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
     • Train longer, generalize better: closing the generalization gap in large batch training of neural networks
     • End-to-End Differentiable Proving
     • Gradient descent GAN optimization is locally stable
  8. Deep-learning-related presentations: Orals (Cont'd)
     • Orals: 8 of 41
     • Imagination-Augmented Agents for Deep Reinforcement Learning
     • Masked Autoregressive Flow for Density Estimation
     • Deep Sets
     • From Bayesian Sparsity to Gated Recurrent Nets
  9. Deep-learning-related presentations: Posters
     • Posters: roughly 200 of 679
     • Too many to list them all!
     • Note that the counts reflect my own judgment and biases
     • For details, please check the NIPS page
     NIPS page: https://nips.cc/Conferences/2017/Schedule?type=Poster
  10. Deep-learning-related presentations: Workshops
     • Workshops: 5 of 53 (with "Deep Learning" in the title)
     • Deep Learning for Physical Sciences
     • Deep Learning at Supercomputer Scale
     • Deep Learning: Bridging Theory and Practice
     • Bayesian Deep Learning
     • Interpreting, Explaining and Visualizing Deep Learning - Now what?
  11. Impressions on the direction of deep learning research
     The momentum of deep learning keeps growing
     • Trends are also diversifying
     • Understanding generalization, the rise of GANs, new modeling directions
     • Large-scale distributed training, Bayesian methods, robotics, ...
     • This deck introduces several papers related to the first group above
  12. Progress in understanding the generalization performance of deep learning

  13. What is generalization performance?
     • A model's performance on data not used during training
     • With the rise of deep learning, generalization is attracting renewed attention
     • Models that look "nonsensical" by classical standards, with (number of parameters) >> (number of data points)
     • Yet they generalize remarkably well, and we want to clarify why
     • This topic is not limited to deep learning
  14. Papers that interested me (NIPS2017)
     • Train longer, generalize better: closing the generalization gap in large batch training of neural networks
       Studies the relation among generalization, learning rate, and batch size; proposes GBN
     • Exploring Generalization in Deep Learning
       Examines generalization measures; proposes a PAC-Bayesian-based sharpness
     • Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks
       Interprets SGD as a Langevin equation and analyzes the temperature dynamics
  15. Related papers (outside NIPS2017)
     • A Bayesian Perspective on Generalization and Stochastic Gradient Descent
       Derives the relation between the "noise scale" and the learning rate and batch size
     • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
       Achieves training with very large minibatches without losing accuracy
     • Understanding deep learning requires rethinking generalization
       Argues that a new framework is needed to understand generalization in deep learning (and even in linear models!)
  16. What follows
     Although they are not NIPS papers, the following two papers cover closely related content and allow a more systematic understanding, so I explain them:
     • A Bayesian Perspective on Generalization and Stochastic Gradient Descent
     • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
     On generalization measures, I previously gave this talk:
     this talk: https://speakerdeck.com/diracdiego/some-understanding-of-generalization-in-deep-learing
  17. Bayesian evidence
     As the simplest case, consider a one-dimensional parameter and its posterior.
     Here x is the input, y the label, and M the model.
     The likelihood is exponentiated and expressed via the cross entropy.
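     A sketch of the setup, reconstructed from the referenced paper (arXiv:1710.06451), since the equations on the slide are figures:

        P(\omega \mid y, x; M) = \frac{P(y \mid \omega, x; M)\, P(\omega; M)}{P(y \mid x; M)},
        \qquad
        P(y \mid \omega, x; M) = \prod_i P(y_i \mid \omega, x_i; M) = e^{-H(\omega; M)},

     where H(\omega; M) = -\sum_i \ln P(y_i \mid \omega, x_i; M) is the cross entropy.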
  18. Bayesian evidence (Cont'd)
     With a Gaussian prior, the prediction for an unknown label is obtained by marginalizing over the posterior.
     Model comparison compares the first factor on the right-hand side (taking the prior ratio to be 1).
     This evidence is the main player in the discussion.
  19. Bayesian evidence (Cont'd)
     Expanding the parameter to second order around the solution, the evidence is expressed by the loss at the solution plus log(curvature / regularization coefficient).
     Generalizing the parameter to p dimensions, the determinant of the Hessian gives the eigenvalues.
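     A hedged reconstruction of the Laplace approximation, following arXiv:1710.06451, with C(\omega) = H(\omega) + \lambda\omega^2/2 the regularized loss, \omega_0 its minimum, and \lambda the prior precision:

        P(y \mid x; M) = \sqrt{\tfrac{\lambda}{2\pi}} \int d\omega\, e^{-C(\omega; M)}
        \approx \exp\!\Big\{ -C(\omega_0) - \tfrac{1}{2}\ln\big(C''(\omega_0)/\lambda\big) \Big\},

     and for p parameters, with \{\lambda_i\} the eigenvalues of the Hessian \nabla\nabla C(\omega_0),

        P(y \mid x; M) \approx \exp\!\Big\{ -C(\omega_0) - \tfrac{1}{2} \sum_{i=1}^{p} \ln(\lambda_i/\lambda) \Big\}.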
  20. Bayesian evidence (Cont'd)
     The comparison target is the null model (n: number of classes).
     Further introducing the log evidence ratio relative to the null model,
     this result, independent of the model's parametrization, supports the claim that broad minima (small eigenvalues) generalize better than sharp minima.
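     A rough sketch of the comparison, as I recall it from the paper: the null model assigns probability 1/n to each of the N labels, so its log evidence is -N \ln n, and the log evidence ratio in favor of the null model is roughly

        E(\omega_0) = C(\omega_0) + \tfrac{1}{2}\sum_{i=1}^{p} \ln(\lambda_i/\lambda) - N \ln n,

     which depends on the curvature only through the "Occam factor" and not on the parametrization.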
  21. Experiment: Bayesian evidence with logistic regression
     Logistic regression on MNIST {0,1} classification: 800 train, 10000 test.
     Left: random labels; right: correct labels.
     With labels that carry meaningful information, the log evidence ratio drops below 0.
     Ref: https://arxiv.org/abs/1710.06451
  22. Experiment: generalization gap with NNs
     Classification with 800 hidden units + ReLU: 1000 train, the rest used for test.
     SGD w/ momentum 0.9 and learning rate 1.0.
     The batch size produces a difference in generalization performance (the generalization gap).
     Ref: https://arxiv.org/abs/1710.06451
  23. The "noise scale" in SGD
     To argue within an analytic framework, SGD is viewed as a stochastic differential equation.
     Noting that the difference between the full-batch and minibatch gradients was the important ingredient, the gradient parameter update is written as the full-batch term plus a noise term, as below.
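     A hedged reconstruction of the discrete update in the paper's notation (C is the cost summed over all N examples, \hat{C} its minibatch estimate over B examples, \epsilon the learning rate):

        \Delta\omega = -\frac{\epsilon}{N}\frac{dC}{d\omega}
                       - \frac{\epsilon}{N}\left(\frac{d\hat{C}}{d\omega} - \frac{dC}{d\omega}\right),

     where the second term has zero mean and a covariance that scales like (N/B - 1).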
  24. The "noise scale" in SGD (Cont'd)
     The expectation values can be written down as on the slide; using them, the mean and covariance of the gradient noise follow.

  25. The "noise scale" in SGD (Cont'd)
     Compare with a continuous-time stochastic differential equation (t treated as a continuous variable), the so-called overdamped Langevin equation.
     Here the noise has zero mean and delta-correlated covariance.
     This g is the quantity that governs the fluctuations of the dynamics.
     Take the continuous approximation of the discrete update equation and compare the two.
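     A sketch of the Langevin equation being compared against, following the paper:

        \frac{d\omega}{dt} = -\frac{dC}{d\omega} + \eta(t),
        \qquad
        \langle \eta(t) \rangle = 0,
        \qquad
        \langle \eta(t)\,\eta(t') \rangle = g\, F(\omega)\, \delta(t - t'),

     where F(\omega) captures the covariance structure of the per-example gradients and g sets the noise scale.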
  26. The "noise scale" in SGD (Cont'd)
     Treating the relevant ratio as sufficiently small, the discrete and continuous descriptions are identified.
     Squaring both sides and taking expectations yields the relation below between g, the learning rate, and the batch size.
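     The resulting relation, as I recall it from arXiv:1710.06451:

        g = \epsilon\left(\frac{N}{B} - 1\right) \approx \frac{\epsilon N}{B} \quad (B \ll N),

     so the noise scale, and hence similar generalization behavior, is preserved when the learning rate \epsilon is scaled linearly with the batch size B.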

  27. Experiment: relation between the "noise scale" and generalization
     The "noise scale" governs how well the solution generalizes.
     An appropriate g leads to a well-generalizing solution (if g is too large, the continuous approximation breaks down).
     Ref: https://arxiv.org/abs/1710.06451
  28. Adding momentum
     When momentum is included, one analyzes the corresponding Langevin equation.
     A somewhat technical calculation shows the result below.
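     The momentum-corrected noise scale, as I recall it from the paper (m is the momentum coefficient):

        g = \frac{\epsilon}{1 - m}\left(\frac{N}{B} - 1\right) \approx \frac{\epsilon N}{B(1 - m)},

     which reduces to the momentum-free result at m = 0 and motivates scaling rules that keep \epsilon N / (B(1-m)) fixed.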

  29. Experiment: ImageNet training with very large batches and learning rates
     The relation holds even for large-scale training (combined with a learning-rate warmup early in training).
     Ref: https://arxiv.org/abs/1706.02677
  30. Summary on the generalization performance of deep learning
     • As an optimizer, SGD looks good both theoretically and experimentally
     • Various related papers are coming out and understanding is advancing
       The probability of converging to a saddle point is zero [ref]
       However, escaping a neighborhood of a saddle point can require exponential time [ref]
     • Understanding of why deep learning works so well should also keep advancing
       Recently, for instance, this paper was a hot topic
     Ref: https://arxiv.org/abs/1602.04915, https://arxiv.org/abs/1705.10412, https://arxiv.org/abs/1802.04474
  31. Progress in understanding the convergence of GANs

  32. What is a GAN?
     A generative model G of the data distribution and a discriminator D are trained in competition.
     Below are images generated from random noise with a trained G.
     High-quality images have become attainable, but stable convergence of training remains difficult.
     Ref: https://arxiv.org/abs/1406.2661, https://arxiv.org/abs/1710.10196
  33. Papers that interested me
     Game-theoretic analyses, analyses of the gradient flow near equilibria, and so on.
     As regularizers, gradient penalty terms play a prominent role (double backprop).
     • Gradient descent GAN optimization is locally stable
     • The Numerics of GANs
     • Improved Training of Wasserstein GANs
     • Stabilizing Training of Generative Adversarial Networks through Regularization
  34. Papers that interested me (Cont'd)
     • GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
       Gives the generator and discriminator independent learning rates; converges to a local Nash equilibrium
     • Approximation and Convergence Properties of Generative Adversarial Learning
       Treats various GANs in a more abstract framework and shows convergence properties and the relative strength of the methods
  35. What follows
     I mainly introduce the paper below (quoting some figures from other papers).
     Other papers make essentially very similar arguments, but beware that the notation varies considerably between them.
     • Gradient descent GAN optimization is locally stable
     On several of the other papers, I previously gave this talk:
     this talk: https://speakerdeck.com/diracdiego/20180127-nips-paper-reading
  36. Current status of GAN convergence
     • Whether an equilibrium (in the sense of a Nash equilibrium) exists is nontrivial
       Note also that equivalence to a zero-sum game holds only in particular cases
     • Even granting that an equilibrium exists, whether training converges to it is nontrivial
     • In practice, training is unstable (sensitive to hyperparameters)
     Here the existence of an equilibrium is assumed, and convergence in its neighborhood is shown.
  37. Formulating GAN training
     The objective function is given below, where f is concave (the original GAN corresponds to a particular f).
     Note the range of the discriminator's output (it is an unconstrained real value).
     To argue analytically, the parameter updates are expressed in continuous time.
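     A hedged reconstruction of the objective and dynamics from arXiv:1706.04156:

        V(\theta_D, \theta_G)
          = \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[f(D_{\theta_D}(x))\big]
          + \mathbb{E}_{z \sim p_z}\big[f(-D_{\theta_D}(G_{\theta_G}(z)))\big],

     with f concave; the original GAN corresponds to f(x) = -\log(1 + e^{-x}). The continuous-time updates are the gradient flow

        \dot{\theta}_D = \nabla_{\theta_D} V, \qquad \dot{\theta}_G = -\nabla_{\theta_G} V.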
  38. Formulating GAN training (Cont'd)
     Note that from here on the objective function is handled in the form below.

  39. Why is it hard to show the stability of GAN training?
     Since it is a min-max game, we would be happy if the objective were convex-concave at the equilibrium.
     But the objective is concave-concave! For a linear model,
     the concavity of f makes it concave in every parameter.
  40. Why is it hard to show the stability of GAN training? (Cont'd)
     This arises not only for linear models but also for polynomial ones and for WGAN.
     Ref: https://www.cs.cmu.edu/~vaishnan/nips17_oral.pdf

  41. Approach to showing the stability of GAN training
     • The object of analysis is a nonlinear dynamical system
     • Control theory is useful for arguing the stability of the gradient flow
     • Global analysis is hard, so restrict to a neighborhood of the equilibrium and linearize
       Hartman-Grobman theorem: the system can be linearized near a hyperbolic fixed point
     • If the real parts of the Jacobian eigenvalues at the equilibrium are negative, the system is stable
  42. Approach to showing the stability of GAN training (Cont'd)
     As the simplest case, consider a one-dimensional system.
     Its Jacobian has an eigenvalue with negative real part,
     so the dynamical system converges to the origin and is stable (invariant under perturbations).
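     A minimal concrete instance (the system shown on the slide is a figure, so this stand-in is assumed): for \dot{x} = -\alpha x with \alpha > 0,

        J = \frac{\partial \dot{x}}{\partial x} = -\alpha < 0,

     the single eigenvalue -\alpha has negative real part, and x(t) = x(0)\, e^{-\alpha t} \to 0, i.e. the origin is exponentially stable.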

  43. Stability of GAN training near the equilibrium
     The main object of analysis is the Jacobian of the gradient flow.
     It suffices to show that the real parts of its eigenvalues are negative.
     The (1,1) block is negative definite by the concavity of f.
     From here, assumptions are introduced to guarantee the negativity of the eigenvalues' real parts.
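     A sketch of the Jacobian's block structure at the equilibrium, as I recall it from the paper (signs follow the flow \dot{\theta}_D = \nabla_{\theta_D} V, \dot{\theta}_G = -\nabla_{\theta_G} V):

        J = \begin{pmatrix}
              \nabla^2_{\theta_D} V & \nabla_{\theta_G}\nabla_{\theta_D} V \\
              -(\nabla_{\theta_G}\nabla_{\theta_D} V)^{\top} & -\nabla^2_{\theta_G} V
            \end{pmatrix},

     where, under the paper's assumptions, the (1,1) block is negative definite and the (2,2) block vanishes at the equilibrium, which is why a condition on the off-diagonal block is needed.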
  44. Stability of GAN training near the equilibrium (Cont'd)
     • Assumption 1: ... and ...
     • Assumption 2: ... and ...
     • Assumption 3: ... is locally constant over the discriminator space, and ... is locally constant over G
     • Assumption 4: there exist ... s.t. ...
  45. Stability of GAN training near the equilibrium (Cont'd)
     A straightforward calculation shows the result below, where ... is defined as follows.

  46. Stability of GAN training near the equilibrium (Cont'd)
     If the diagonal block is negative definite and the off-diagonal block has full column rank, the negativity of the eigenvalues' real parts can be shown (note the vanishing block).
     For the proof, see the original paper (it is shown by rearranging the eigenvalue equation).
     This establishes the stability of GANs around the equilibrium (more precisely, exponential convergence).
  47. Devising a regularizer to improve stability
     The key point was that the real parts of the Jacobian eigenvalues be negative.
     Double backprop on the generator is what enhances this.
     With this regularization, the Jacobian changes as below.
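     A hedged sketch of the regularized generator update, as I recall it from arXiv:1706.04156 (\eta is the regularization strength):

        \dot{\theta}_G = -\nabla_{\theta_G}\Big( V(\theta_D, \theta_G)
                          + \eta\, \big\| \nabla_{\theta_D} V(\theta_D, \theta_G) \big\|^2 \Big),

     i.e. the generator is additionally pushed to shrink the discriminator's gradient, which contributes a negative definite term to the (2,2) block of the Jacobian.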
  48. Devising a regularizer to improve stability (Cont'd)
     Since the (2,2) block becomes negative definite, stability improves.
     One can also show that the preceding arguments are not broken as long as the parameter is small.
     A similar argument is made in The Numerics of GANs.
     The gradient penalty introduced here appears, in different contexts, in various papers.
  49. Experiment: convergence of the solution with the regularizer
     Ref: https://arxiv.org/abs/1706.04156

  50. Experiment: mode collapse
     Ref: https://arxiv.org/abs/1706.04156

  51. Experiment: stability during training
     The stability obtained even looks like more than merely local stability.
     Ref: https://arxiv.org/abs/1705.10461

  52. Summary on the convergence of GANs
     • Detailed arguments, such as those around equilibria, have become available
       Theoretically guaranteeing convergence requires fairly strong assumptions
     • Many papers are coming out, but many of them share closely related ideas
     • Global arguments via geometric methods may also start to appear
       For example, as a new direction for GAN evaluation, see this paper
     • There seems to be a lot to do at the boundary with physics and mathematics
  53. New directions for deep learning

  54. Papers that interested me
     New developments
     • Dynamic Routing Between Capsules
       Proposes a vector generalization of neurons and a mechanism for obtaining their relationships
     • Deep Sets
       Builds models that take sets, independent of element order, as input
     • Bayesian GAN
       Treating GANs in a Bayesian way removes the need for various training tricks
  55. Papers that interested me (Cont'd)
     Developments from a geometric viewpoint
     • Sobolev Training for Neural Networks
       Formulates training to also use the derivative values of each layer; useful for distillation and the like
     • Principles of Riemannian Geometry in Neural Networks
       Formulates networks in Riemannian geometry and expresses backprop as a right Lie group action
     • Riemannian approach to batch normalization
       Formulates BN in the framework of Riemannian geometry
  56. Papers that interested me (Cont'd)
     Further variety
     • Attention Is All You Need
       Achieves strong results with attention alone, without recurrent structure
     • Deep Hyperspherical Learning
       Formulates CNN convolutions as operations on the sphere; good convergence
     • GibbsNet: Iterative Adversarial Inference for Deep Graphical Models
       Models the joint probability
  57. What follows
     I introduce the following two papers, which I personally found interesting:
     • Dynamic Routing Between Capsules (CapsNet)
     • Deep Sets
     CapsNet tries to improve pooling into something better.
     Deep Sets builds a model that can take sets as inputs.
  58. CapsNet: model (for MNIST)
     To better capture the relationships among features in the data, neurons are vectorized.
     In the figure below, the 8 of PrimaryCaps and the 16 of DigitCaps are the capsule dimensions.
     Ref: https://arxiv.org/abs/1710.09829
  59. CapsNet: capsule inputs and outputs
     Output (squash small vectors); input (u is the output of the previous layer); see below.
     Here W are weights learned by backprop, and c are capsule-to-capsule couplings determined by some separate procedure.
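     The capsule equations, reconstructed from arXiv:1710.09829 (the slide shows them as images):

        v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2}\,\frac{s_j}{\|s_j\|}
        \qquad \text{(output: squash small vectors)}

        s_j = \sum_i c_{ij}\, \hat{u}_{j|i}, \qquad \hat{u}_{j|i} = W_{ij} u_i
        \qquad \text{(input: coupled predictions)}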
  60. CapsNet: routing algorithm
     The capsule couplings c are computed dynamically from the alignment between inputs and outputs.
     Ref: https://arxiv.org/abs/1710.09829
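     A minimal NumPy sketch of the routing-by-agreement procedure described in the paper; the iteration count r=3 and the softmax over output capsules follow the paper, while the variable names and shapes are my own choices for illustration:

        import numpy as np

        def squash(s, axis=-1, eps=1e-8):
            # v = (|s|^2 / (1 + |s|^2)) * s / |s| : shrinks short vectors toward zero
            sq = np.sum(s ** 2, axis=axis, keepdims=True)
            return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

        def routing(u_hat, r=3):
            # u_hat[i, j]: prediction vector W_ij u_i from input capsule i to output capsule j
            num_in, num_out, dim_out = u_hat.shape
            b = np.zeros((num_in, num_out))                           # routing logits
            for _ in range(r):
                c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over output capsules
                s = (c[:, :, None] * u_hat).sum(axis=0)               # weighted sum of predictions
                v = squash(s)                                         # output capsule vectors
                b = b + (u_hat * v[None, :, :]).sum(axis=-1)          # update by input-output agreement
            return v

        v = routing(np.random.randn(1152, 10, 16))  # PrimaryCaps -> DigitCaps shapes from the paper
        print(v.shape)                              # (10, 16)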

  61. CapsNet: what did capsules aim to do, after all?
     • The underlying concern is the loss of information in pooling
     • Extends max-based routing to routing that accounts for the relationships between entities
     • The vector extension lets orientation information express the alignment between entities
     • This is an attempt to obtain equivariance rather than invariance
     The routing algorithm is just one particular algorithm.
     Capsules themselves were introduced in 2011 in this paper:
     this paper: http://www.cs.toronto.edu/~fritz/absps/transauto6.pdf
  62. CapsNet: objective function
     The margin loss for class k is defined below.
     Here v_k is the capsule output, and T_k takes the value 1 if the input belongs to class k.
     The rest are parameters.
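     The margin loss, reconstructed from the paper (the parameter values below are those reported there):

        L_k = T_k \max(0,\, m^+ - \|v_k\|)^2 + \lambda (1 - T_k) \max(0,\, \|v_k\| - m^-)^2,

     with m^+ = 0.9, m^- = 0.1, and \lambda = 0.5.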
  63. CapsNet: reconstruction as regularization
     The MSE of reconstructing the original image from DigitCaps is used as a regularizer.
     Note: when I asked the authors, they said it is more of a structure for visualization.
     Ref: https://arxiv.org/abs/1710.09829
  64. CapsNet: reconstruction experiments
     Ref: https://arxiv.org/abs/1710.09829

  65. CapsNet: classification experiments
     MultiMNIST combines two MNIST images and solves a multi-label task.
     The baseline is a CNN with comparable computational cost (about 4x the parameters).
     Ref: https://arxiv.org/abs/1710.09829
  66. CapsNet: multi-digit reconstruction experiments
     R: reconstruction, L: label, P: prediction
     Left: good examples; middle: examples where R deliberately uses a different digit; right: examples where the prediction is wrong.
  67. Deep Sets: model
     Modeling based on the notions of permutation invariance and permutation equivariance.
     I wrote my impressions here; it is interesting, but the paper's exposition is a bit unfriendly.
     Ref: https://www.facebook.com/nipsfoundation/videos/1555553784535855/
     here: https://github.com/yoheikikuta/paper-reading/issues/6
  68. Deep Sets: model (Cont'd)
     How to build a model that preserves permutation invariance over the input data.
     The key is that constructing the objective requires the structure shown below.
     Ref: https://arxiv.org/abs/1703.06114
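     The structure in question, as stated in arXiv:1703.06114: a function on sets is permutation invariant iff (under suitable conditions) it admits the sum decomposition

        f(X) = \rho\left( \sum_{x \in X} \phi(x) \right),

     where \phi embeds each element and \rho acts on the pooled representation.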
  69. Deep Sets: model (Cont'd)
     What does it mean that this structure is required to construct the objective?
     If you want a quantity independent of the order of the elements, the naive choice is addition.
     That is in fact correct, and transformations are allowed as long as they do not break the property.
     The interesting question is whether this can be built with deep learning structures.
     → The answer is yes, taking the transformations to be functions that approximate arbitrary polynomials.
  70. Deep Sets: model (Cont'd)
     How to build a permutation-equivariant NN model.
     The elements of the input x are the objects being permuted; here the feature dimension is 1 (this can be extended).
     Ref: https://arxiv.org/abs/1703.06114
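     A minimal NumPy sketch of the paper's equivariant layer f_\Theta(x) = \sigma((\lambda I + \gamma 1 1^\top) x), stacked into an invariant model via sum pooling; the parameters lam, gam, and the readout w are hypothetical values for illustration:

        import numpy as np

        rng = np.random.default_rng(0)

        def equivariant_layer(x, lam, gam):
            # relu(lam * I x + gam * 1 1^T x): commutes with permutations of the rows of x
            return np.maximum(0.0, lam * x + gam * x.sum(axis=0, keepdims=True))

        def invariant_model(x, lam, gam, w):
            h = equivariant_layer(x, lam, gam)  # per-element embedding (phi)
            return h.sum(axis=0) @ w            # sum pooling then readout (rho)

        x = rng.normal(size=(5, 3))             # a set of 5 elements with 3 features
        w = rng.normal(size=3)
        out1 = invariant_model(x, 1.0, 0.5, w)
        out2 = invariant_model(x[rng.permutation(5)], 1.0, 0.5, w)
        assert np.isclose(out1, out2)           # the order of the set does not matter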
  71. Deep Sets: experiments
     What is interesting is that new types of experiments become possible:
     • Computing the sum of the input data
     • Estimating galaxy redshift from photometric information
     • Measuring entropy and similar quantities on data generated from Gaussians
     • Point cloud classification, set expansion, set anomaly detection, ...
  72. Deep Sets: experiment outputting the sum of MNIST digits
     The input is MNIST digits as text (left) or as images (right), and the sum is computed.
     The task is permutation invariant.
     Ref: https://arxiv.org/abs/1703.06114
  73. Deep Sets: experiment estimating redshift from galaxy clusters
     A galaxy cluster contains multiple galaxies, each with photometric information (17 dimensions).
     Treating each cluster as a set, the redshift is estimated (cosmologically important).
     Note: at the oral presentation the figure quoted was 0.019, so it was presumably improved later.
     Ref: https://arxiv.org/abs/1703.06114
  74. Deep Sets: set anomaly detection experiment
     Experiments on CelebA; 75% accuracy on test (6.3% without the permutation-equivariant layers).
     Ref: https://arxiv.org/abs/1703.06114
  75. Summary on new directions for deep learning
     • New kinds of modeling are being explored from diverse viewpoints
       Extensions of NNs
       Developments incorporating geometric viewpoints
       Fusion with other fields, such as Bayesian methods
     • Hoping for a groundbreaking invention on the scale of GANs
  76. Summary

  77. Summary
     • The momentum of deep learning shows no sign of stopping
     • Introduced work on generalization, GAN convergence, and new directions for models
     • The topics are getting deeper and broader, and are truly fascinating
     • Fundamental topics such as generalization are receiving renewed attention
     • The overlap with other fields keeps growing, and new facets keep emerging