
Deep Learning 6.3-6.4

If you download the PDF, the links are clickable.


Kento Nozawa

May 23, 2017


Transcript

  1. DEEP LEARNING 6.3: Hidden Units 6.4: Architecture Design nzw

  2. 6.3 Hidden Units

  3. 6.3 Hidden Units
     h = g(z) = g(W^T x + b)
  4. Hidden-layer activation function g
     • Find a better activation function and you have a paper
     • Chosen by looking at validation error and the like
     • g is applied element-wise
     • Exception: softmax
     h_j = g(z_j) = g(Σ_i w_{j,i} × x_i + b_j)
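The per-unit formula above can be sketched in a few lines of NumPy. This is a minimal illustration; the layer sizes and names below are my own, not from the deck:

```python
import numpy as np

def hidden_layer(x, W, b, g=np.tanh):
    # h = g(W^T x + b), with g applied element-wise to each unit's z_j.
    z = W.T @ x + b
    return g(z)

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))  # 3 inputs -> 4 hidden units (illustrative sizes)
b = np.zeros(4)
x = rng.standard_normal(3)
h = hidden_layer(x, W, b)
print(h.shape)  # (4,)
```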
  5. Derivatives of activation functions
     • Gradient-descent updates require a differentiable function
     • Exception: g(x) = relu(x) at x = 0
     • Not a problem in practice
       • Inputs essentially never hit the non-differentiable point exactly
       • Implementations simply return one of the one-sided derivative values
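The one-sided-derivative convention mentioned on the slide can be made concrete as follows (a sketch; here the left derivative 0 is returned at x = 0, though returning 1 is equally common):

```python
import numpy as np

def relu_grad(x):
    # relu is not differentiable at x == 0; implementations just pick
    # one of the one-sided derivatives. This one returns 0 there.
    return (x > 0).astype(float)

x = np.array([-1.0, 0.0, 2.0])
print(relu_grad(x))  # [0. 0. 1.]
```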
  6. 6.3.1 Rectified Linear Units (ReLU)
     • The default choice when in doubt
     • Merits
       • The derivative is a constant: 0 or 1
       • Nearly linear, so easy to optimize
     • Demerits
       • No gradient for negative inputs
     • Extended variants: ELU, PReLU, Leaky ReLU
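The variants listed above differ only in how they treat negative inputs. A minimal sketch (the alpha defaults are common choices, not prescribed by the deck):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.01):
    # A small slope alpha keeps a nonzero gradient for x < 0.
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Smooth for x < 0, saturating at -alpha.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))        # [0. 0. 3.]
print(leaky_relu(x))  # [-0.02  0.    3.  ]
```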
  7. Maxout units
     • Learn the activation function itself: approximates any convex function
     • k parameters per unit
     • With k=2, d=1, m=1, w=[1, 0], b=[0, 0] it reduces to ReLU
     • The original paper recommends combining it with Dropout
       • With ReLU & Dropout it is ambiguous which of the two produced a zero
     • Mitigates catastrophic forgetting (from [Ian Goodfellow, et al., 2015])
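The slide's special case can be checked directly: a maxout unit takes the max over k affine functions of its input, and with w=[1, 0], b=[0, 0] that max is exactly relu. A small sketch (shapes are my own):

```python
import numpy as np

def maxout(x, W, b):
    # One maxout unit: the maximum over k affine pieces of x.
    # W has shape (k, d), b has shape (k,).
    return np.max(W @ x + b)

# The slide's case: k=2, d=1, w=[1, 0], b=[0, 0] -> ReLU.
W = np.array([[1.0], [0.0]])
b = np.array([0.0, 0.0])
for v in (-3.0, 0.0, 2.0):
    print(maxout(np.array([v]), W, b))  # max(v, 0), i.e. relu(v)
```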
  8. Catastrophic forgetting
     Consider a single DNN model X:
     1. Train it on task A
     2. Starting from task A's weights, train it on a different task B
     3. It no longer performs well on task A
     • The weights learned for A have been forgotten
     • Prevented by lowering the learning rate of important weights
     • J. Kirkpatrick et al., Overcoming Catastrophic Forgetting in Neural Networks. PNAS, 2017.
  9. 6.3.2 Logistic Sigmoid and Hyperbolic Tangent
     • Awkward functions for gradient-based learning
     • Sensitive only to inputs near 0
     • Saturate everywhere else: the gradient easily becomes 0
     • Between the two, tanh is preferable to sigmoid
       • Reason: its slope is 1 near 0
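The slope and saturation claims above can be verified numerically (a sketch; the probe points 0 and 5 are my own choice):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

# Near 0: tanh has slope 1, sigmoid only 0.25.
print(tanh_grad(0.0), sigmoid_grad(0.0))  # 1.0 0.25
# Away from 0 both saturate: the gradients become tiny.
print(tanh_grad(5.0), sigmoid_grad(5.0))
```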
  10. 6.3.3 Other Hidden Units
     • cosine as the activation function of MLPs classifying MNIST
       • When I tried it, it performed about as well as relu
       • Stacking around 11 layers made the gradients vanish
       • With a CNN on CIFAR10 it loses to ReLU
       • Notebooks: MNIST, CNN, CNN&BN
     • Why doesn't it appear in papers?
       • It would have to beat ReLU and its refinements
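For concreteness, "cosine as the activation" just means applying cos element-wise in place of relu or tanh. A minimal sketch of one such hidden layer (the MNIST-sized shapes are illustrative; this is not the speaker's notebook code):

```python
import numpy as np

def cos_layer(x, W, b):
    # cos applied element-wise as the hidden activation.
    return np.cos(W.T @ x + b)

rng = np.random.default_rng(0)
W = rng.standard_normal((784, 128))  # e.g. a flattened MNIST image -> 128 units
b = np.zeros(128)
h = cos_layer(rng.standard_normal(784), W, b)
print(h.min() >= -1.0 and h.max() <= 1.0)  # True: outputs stay in [-1, 1]
```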
  11. 6.4 Architecture Design

  12. Architecture
     • Depth (number of layers)
     • Number of units
     • Connections between units
     • Convolution / Pooling / Skip-connection (Residual Block)
     • Google's AutoML: automatic selection of architectures and activation functions
       • Barret Zoph and Quoc Le. Neural Architecture Search with Reinforcement Learning. In Proc. ICLR, 2017.
         • Reinforcement learning; number of GPUs: 800
       • Esteban Real et al., Large-Scale Evolution of Image Classifiers. In Proc. ICML, 2017.
         • Evolutionary algorithm; number of models: 1000
  13. 6.4.1 Universal Approximation Properties and Depth
     • Universal approximation theorem [1989]
       • An NN with a squashing activation function and at least one hidden layer
       • can approximate any continuous function on a closed and bounded subset of real space
       • Measure theory comes up, so the book omits the details
     • "Can approximate" ≠ "can learn"
       1. No guarantee that optimization finds the right parameters
       2. The learning algorithm may choose the wrong function (e.g. overfitting)
     • May require an enormous number of units
       • A deeper network can get by with fewer parameters: see Fig. 6.5 — 6.7.
  14. Related to the Universal Approximation Theorem
     • A recent paper on generalization:
       • Chiyuan Zhang et al., Understanding Deep Learning Requires Rethinking Generalization. In Proc. ICLR, 2017. Best Paper.
       • Even with shuffled training labels, the loss still goes down properly