Deep-learning-related presentations: Tutorials
• Tutorials: 3 of 9 in total
• Deep Learning: Practice and Trends
• Deep Probabilistic Modeling with Gaussian Processes
• Geometric Deep Learning on Graphs and Manifolds
Deep-learning-related presentations: Invited talks
• Invited talks: 2 of 7 in total
• Deep Learning for Robotics
• On Bayesian Deep Learning and Deep Bayesian Learning
Deep-learning-related presentations: Orals
• Orals: 8 of 41 in total
• TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
• Train longer, generalize better: closing the generalization gap in large batch training of neural networks
• End-to-End Differentiable Proving
• Gradient descent GAN optimization is locally stable
Deep-learning-related presentations: Orals (Cont'd)
• Orals: 8 of 41 in total
• Imagination-Augmented Agents for Deep Reinforcement Learning
• Masked Autoregressive Flow for Density Estimation
• Deep Sets
• From Bayesian Sparsity to Gated Recurrent Nets
Deep-learning-related presentations: Workshops
• Workshops: 5 of 53 in total (counting those with "Deep Learning" in the title)
• Deep Learning for Physical Sciences
• Deep Learning at Supercomputer Scale
• Deep Learning: Bridging Theory and Practice
• Bayesian Deep Learning
• Interpreting, Explaining and Visualizing Deep Learning - Now what?
Papers of interest (NIPS 2017)
• Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Investigates the relationship between generalization performance, training, and batch size; proposes GBN (Ghost Batch Normalization; a sketch follows this list)
• Exploring Generalization in Deep Learning
Examines generalization measures and proposes a PAC-Bayesian-based notion of sharpness
• Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks
Interprets SGD as a Langevin equation and analyzes the dynamics of the temperature
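A minimal PyTorch-style sketch of the Ghost Batch Normalization idea from the first paper above: statistics are computed over small virtual ("ghost") batches rather than over the full minibatch. The class name, the ghost batch size of 32, and the training/eval split are my own illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn as nn

class GhostBatchNorm(nn.Module):
    """Batch norm whose statistics come from small virtual ("ghost") batches."""

    def __init__(self, num_features: int, ghost_batch_size: int = 32):
        super().__init__()
        self.ghost_batch_size = ghost_batch_size
        self.bn = nn.BatchNorm1d(num_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Normalize each ghost batch independently, sharing the affine parameters.
            chunks = x.split(self.ghost_batch_size, dim=0)
            return torch.cat([self.bn(c) for c in chunks], dim=0)
        # At evaluation time, fall back to the accumulated running statistics.
        return self.bn(x)

# Example: a large minibatch of 256 samples normalized in ghost batches of 32.
gbn = GhostBatchNorm(num_features=128)
out = gbn(torch.randn(256, 128))
```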
Related papers (other than NIPS 2017)
• A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Derives the relationship between the "noise scale" and the training batch size (see the relation after this list)
• Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Achieves training with large minibatches without losing accuracy
• Understanding deep learning requires rethinking generalization
Argues that a new framework is needed to understand generalization in deep learning (and not only there: even in linear models!)
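For reference, here is how I summarize the core relation of the first paper and the scaling rule of the second; the symbols ε (learning rate), N (training-set size), B (batch size), and k are my notation, not taken from these slides.

```latex
% SGD noise scale (Smith & Le):
g \;=\; \epsilon\left(\frac{N}{B} - 1\right) \;\approx\; \frac{\epsilon N}{B}
\qquad (B \ll N)
```

Keeping g roughly constant motivates the linear scaling rule used in the large-minibatch ImageNet paper: when the batch size is multiplied by k, multiply the learning rate by k as well.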
What follows
These are not NIPS papers, but they cover closely related content and allow a more systematic understanding, so I will explain the following two papers:
• A Bayesian Perspective on Generalization and Stochastic Gradient Descent
• Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
I have previously given a talk on generalization measures:
https://speakerdeck.com/diracdiego/some-understanding-of-generalization-in-deep-learing
Papers of interest
Game-theoretic analyses, analyses of the gradient flow near equilibria, and so on
Derivative-based regularization terms play an active role as regularizers (double backprop; a concrete example follows this list)
• Gradient descent GAN optimization is locally stable
• The Numerics of GANs
• Improved Training of Wasserstein GANs
• Stabilizing Training of Generative Adversarial Networks through Regularization
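As one concrete instance of such a derivative-based regularizer, here is the gradient penalty from Improved Training of Wasserstein GANs, written in its usual form (p_data, p_g, the interpolates x̂, and the coefficient λ follow the paper's standard presentation rather than anything in these slides):

```latex
L_D \;=\; \mathbb{E}_{\tilde{x} \sim p_g}\!\left[ D(\tilde{x}) \right]
      \;-\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[ D(x) \right]
      \;+\; \lambda \, \mathbb{E}_{\hat{x}}\!\left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right]
```

where x̂ is sampled on straight lines between real and generated samples.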
Papers of interest (Cont'd)
• GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
Gives the generator and discriminator independent learning rates; converges to a local Nash equilibrium
• Approximation and Convergence Properties of Generative Adversarial Learning
Treats the various GANs in a more abstract framework and shows convergence properties and the relative strength of the different methods
What follows
I will mainly walk through the paper below (borrowing some figures from other papers).
Other papers make essentially quite similar arguments, but note that their notation differs considerably.
• Gradient descent GAN optimization is locally stable
I have previously given a talk on several of the other papers:
https://speakerdeck.com/diracdiego/20180127-nips-paper-reading
The current state of GAN convergence
• Whether an equilibrium (in the sense of a Nash equilibrium) exists is non-trivial
note that equivalence to a zero-sum game holds only in specific cases
• Even if an equilibrium exists, it is non-trivial whether training converges to it
• In practice as well, training is unstable (sensitive to hyperparameters)
Here the existence of an equilibrium is assumed, and convergence in its neighborhood is shown
Formulation of GAN training
The objective function is the following:
where the function involved is concave (as is the case for the original GAN)
note the range of values the discriminator takes here
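For concreteness, this is my reconstruction of the objective used in Gradient descent GAN optimization is locally stable; the symbols f, D_θD, G_θG and the concrete logistic form of f are my notation, not copied from these slides:

```latex
V(\theta_D, \theta_G)
  \;=\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[ f\!\left( D_{\theta_D}(x) \right) \right]
  \;+\; \mathbb{E}_{z \sim p_z}\!\left[ f\!\left( -D_{\theta_D}\!\left( G_{\theta_G}(z) \right) \right) \right]
```

with f concave (f(x) = -log(1 + e^{-x}) recovers the original GAN) and a real-valued discriminator, the sigmoid being absorbed into f.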
To allow an analytic treatment, the parameter update equations are expressed in continuous time
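A minimal sketch of what the continuous-time view amounts to, assuming simultaneous gradient ascent for the discriminator and descent for the generator (my notation):

```latex
\dot{\theta}_D \;=\; \nabla_{\theta_D} V(\theta_D, \theta_G),
\qquad
\dot{\theta}_G \;=\; -\nabla_{\theta_G} V(\theta_D, \theta_G)
```

Local stability then follows the standard linearization criterion for ODEs: if the Jacobian of this vector field at the equilibrium has eigenvalues with strictly negative real part, the equilibrium is locally asymptotically stable.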
Formulation of GAN training (Cont'd)
Note that from here on the objective function is handled in a rewritten form
Why is it hard to show that GAN training is stable?
Since this is a min-max game, we would like the objective to be convex-concave around the equilibrium.
However, the objective is in fact concave-concave: even for linear models, the concavity noted above makes it concave with respect to each set of parameters.
Why is it hard to show that GAN training is stable? (Cont'd)
This arises not only in the linear case but also with polynomial (WGAN) discriminators
Ref: https://www.cs.cmu.edu/~vaishnan/nips17_oral.pdf
Summary on GAN convergence
• Discussions around the equilibrium are now being carried out in much more detail
theoretically guaranteeing convergence still requires fairly strong assumptions
• Many papers are appearing, but a lot of them share closely related ideas
• Global analyses based on geometric methods seem likely to appear
for example, things like this as a new direction for evaluating GANs
• There seems to be a lot that can be done at the boundary with the natural sciences
New directions for deep learning
Papers of interest
New developments
• Dynamic Routing Between Capsules
Proposes a vector generalization of the neuron and a mechanism for obtaining the relationships between them
• Deep Sets
Constructs a model that can take sets as input, independent of the order of the elements
• Bayesian GAN
By treating the GAN in a Bayesian manner, various training tricks become unnecessary
Papers of interest (Cont'd)
Developments from a geometric point of view
• Sobolev Training for Neural Networks
Formulates training so that derivatives at each point are also used; applicable to distillation and the like
• Principles of Riemannian Geometry in Neural Networks
Formulates networks in terms of Riemannian geometry and expresses backprop as the action of a right Lie group
• Riemannian approach to batch normalization
Formulates BN within the framework of Riemannian geometry
Papers of interest (Cont'd)
Even more variety
• Attention Is All You Need
Achieves good results with attention alone, without recurrent structure (see the formula after this list)
• Deep Hyperspherical Learning
Formulates CNN convolutions as operations on a hypersphere; good convergence
• GibbsNet: Iterative Adversarial Inference for Deep Graphical Models
Models the joint distribution
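For reference, the scaled dot-product attention at the core of Attention Is All You Need, in its standard form (Q, K, V are the query/key/value matrices and d_k the key dimension; these symbols follow the paper, not these slides):

```latex
\mathrm{Attention}(Q, K, V) \;=\; \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V
```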
What follows
As papers I personally found interesting, I will introduce the following two:
• Dynamic Routing Between Capsules (CapsNet)
• Deep Sets
CapsNet is an attempt to replace pooling with something better
Deep Sets is about building a model that can take sets as input (a minimal sketch follows below)
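A minimal PyTorch-style sketch of the permutation-invariant form f(X) = ρ(Σ_{x∈X} φ(x)) that Deep Sets builds on; the layer sizes and the class name are illustrative choices of mine, not taken from the paper.

```python
import torch
import torch.nn as nn

class DeepSetSketch(nn.Module):
    """Permutation-invariant model: f(X) = rho(sum over x in X of phi(x))."""

    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, set_size, in_dim); summing over the set dimension
        # makes the output invariant to permutations of the set elements.
        return self.rho(self.phi(x).sum(dim=1))

# Example: shuffling the set elements does not change the output.
model = DeepSetSketch(in_dim=3, hidden=64, out_dim=1)
x = torch.randn(2, 5, 3)
perm = torch.randperm(5)
assert torch.allclose(model(x), model(x[:, perm]), atol=1e-5)
```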