
Gradient Descent in Go - Theory and Practice - / gradient-descent-in-golang


Math Study Session for Programmers @ Fukuoka #5
http://maths4pg-fuk.connpass.com/event/34164/

monochromegane

August 05, 2016


Transcript

  1. Guesswork? That never terminates, so instead the computed slope is used: repeatedly
     increase (or decrease) x based on it.

     x := x - (d/dx) f(x)

     If the sign of the derivative is negative, increase x; if the sign of the derivative is
     positive, decrease x.
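     As a quick illustration (not from the slides): for f(x) = x^2 the derivative is f'(x) = 2x.
     At x = 3 the derivative is +6, so x is decreased; at x = -3 it is -6, so x is increased.
     Either way x moves toward the minimum at x = 0.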
  2. Learning rate: the learning rate η adjusts how strongly x is updated.

     x := x - η (d/dx) f(x)

     If η is too large, x moves a long way each step and may fail to converge or may even
     diverge. If η is too small, x moves only a little and the number of iterations can grow.
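     A minimal, self-contained sketch of this one-dimensional update in Go (the target
     f(x) = x^2 and all settings are illustrative, not taken from the deck):

     package main

     import "fmt"

     func main() {
         // Minimize f(x) = x^2, whose derivative is f'(x) = 2x.
         df := func(x float64) float64 { return 2 * x }

         x := 3.0   // starting point
         eta := 0.1 // learning rate; values above 1.0 make this example diverge
         for i := 0; i < 50; i++ {
             x = x - eta*df(x) // x := x - η f'(x)
         }
         fmt.Println(x) // close to 0, the minimizer of f
     }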
  3. Objective function: defines the error between the model and the data it is trained on.
     The parameters to be estimated are written θ.

     E(θ) = (1/2) Σ_{i=1..n} (y_i - fθ(x_i))^2

     (y_i - fθ(x_i)) is the difference (error) between a training value y_i and the prediction
     produced by the model with the parameters θ at that point; E(θ) is the sum of squared
     errors over all training data.
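     A small worked example (numbers are illustrative, not from the deck): with two training
     points (1, 2) and (2, 3) and predictions fθ(1) = 1.5 and fθ(2) = 3.5,
     E(θ) = (1/2)((2 - 1.5)^2 + (3 - 3.5)^2) = (1/2)(0.25 + 0.25) = 0.25.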
  4. Polynomial regression: objective function

     E(θ) = (1/2) Σ_{i=1..n} (y_i - fθ(x_i))^2

     With fθ(x) = θ0 + θ1 x + θ2 x^2 + θ3 x^3 as the model, each model parameter θ0 ... θ3 is
     updated using the derivative obtained by partially differentiating the objective function
     with respect to that parameter.

  5. Polynomial regression: the resulting parameter update equations (partial derivatives with
     respect to θ0, θ1, θ2 and θ3, respectively; a derivation is sketched below):

     θ0 := θ0 - η Σ_{i=1..n} (fθ(x_i) - y_i)
     θ1 := θ1 - η Σ_{i=1..n} (fθ(x_i) - y_i) x_i
     θ2 := θ2 - η Σ_{i=1..n} (fθ(x_i) - y_i) x_i^2
     θ3 := θ3 - η Σ_{i=1..n} (fθ(x_i) - y_i) x_i^3
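     Where these come from (a standard derivation, not spelled out on the slides): by the chain
     rule,

     ∂E/∂θ_j = Σ_{i=1..n} (fθ(x_i) - y_i) ∂fθ(x_i)/∂θ_j = Σ_{i=1..n} (fθ(x_i) - y_i) x_i^j

     since ∂fθ(x)/∂θ_j = x^j for this polynomial model. Plugging j = 0, 1, 2, 3 into
     θ_j := θ_j - η ∂E/∂θ_j gives the four update equations above.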
  6. Polynomial regression with steepest descent (Golang)

     // fθ(x) model
     func PredictionFunction(x float64, thetas []float64) float64 {
         result := 0.0
         for i, theta := range thetas {
             result += theta * math.Pow(x, float64(i))
         }
         return result
     }

     // E(θ) objective function
     func ObjectiveFunction(trainings DataSet, thetas []float64) float64 {
         result := 0.0
         for _, training := range trainings {
             result += math.Pow((training.Y - PredictionFunction(training.X, thetas)), 2)
         }
         return result / 2.0
     }
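     The DataSet type is not shown in the deck. A minimal definition that the functions above
     would compile against (an assumption; the names may differ in the actual repository):

     // Data is one training example; DataSet is the whole training set.
     type Data struct {
         X, Y float64
     }

     type DataSet []Data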
  7. Polynomial regression with steepest descent (Golang)

     // gradient for one parameter
     func gradient(dataset DataSet, thetas []float64, index int, batchSize int) float64 {
         result := 0.0
         for _, data := range dataset[0:batchSize] {
             result += ((PredictionFunction(data.X, thetas) - data.Y) * math.Pow(data.X, float64(index)))
         }
         return result
     }

     θ0 := θ0 - η Σ_{i=1..n} (fθ(x_i) - y_i)
     θ1 := θ1 - η Σ_{i=1..n} (fθ(x_i) - y_i) x_i
     θ2 := θ2 - η Σ_{i=1..n} (fθ(x_i) - y_i) x_i^2
     θ3 := θ3 - η Σ_{i=1..n} (fθ(x_i) - y_i) x_i^3
  8. Polynomial regression with steepest descent (Golang)

     // learning (update parameters)
     for i := 0; i < opt.Epoch; i++ {
         // update parameter by gradient descent
         org_thetas := make([]float64, cap(thetas))
         copy(org_thetas, thetas)
         shuffled := dataset.Shuffle()
         for j, _ := range thetas {
             // compute gradient
             gradient := gradient(shuffled, org_thetas, j, batchSize)
             // update parameter
             thetas[j] = org_thetas[j] - (opt.LearingRate * gradient)
         }
     }
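     dataset.Shuffle() is also not shown in the deck. A plausible sketch (an assumption) that
     returns a shuffled copy using math/rand, so the original order is left untouched:

     // Shuffle returns a copy of the data set in random order.
     func (d DataSet) Shuffle() DataSet {
         shuffled := make(DataSet, len(d))
         for i, j := range rand.Perm(len(d)) {
             shuffled[i] = d[j]
         }
         return shuffled
     }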
  9. Polynomial regression with SGD: parameter update equations. Instead of the full sum of
     squared errors, the parameters are updated from a randomly chosen training example.

     θ0 := θ0 - η Σ_{i=1..1} (fθ(x_i) - y_i)
     θ1 := θ1 - η Σ_{i=1..1} (fθ(x_i) - y_i) x_i
     θ2 := θ2 - η Σ_{i=1..1} (fθ(x_i) - y_i) x_i^2
     θ3 := θ3 - η Σ_{i=1..1} (fθ(x_i) - y_i) x_i^3

     The sum runs from i = 1 to 1, i.e. it has a single term. Rather than always learning from
     the same fixed first training example, the data is shuffled every iteration and the element
     at the head of the shuffled set is used for the parameter update.
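     In terms of the loop on slide 8, this amounts to calling the same gradient function with a
     batch size of one (a sketch reusing the deck's identifiers):

     batchSize := 1 // SGD: one randomly chosen example per update
     shuffled := dataset.Shuffle()
     for j, _ := range thetas {
         g := gradient(shuffled, org_thetas, j, batchSize)
         thetas[j] = org_thetas[j] - (opt.LearingRate * g)
     }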
  10. mini-batch SGD: choose a batch size B with 1 ≤ B ≤ len(training set) and update the
      parameters on B examples at a time, aiming for the best of both steepest descent and
      stochastic gradient descent. Stochastic gradient descent can be seen as the special case
      B = 1.

      θ0 := θ0 - η Σ_{i=1..B} (fθ(x_i) - y_i)
      θ1 := θ1 - η Σ_{i=1..B} (fθ(x_i) - y_i) x_i
      θ2 := θ2 - η Σ_{i=1..B} (fθ(x_i) - y_i) x_i^2
      θ3 := θ3 - η Σ_{i=1..B} (fθ(x_i) - y_i) x_i^3

      The sum runs from i = 1 to the mini-batch size B. Each iteration, the data is shuffled
      and the first B examples of the shuffled set are used for the parameter update.
  11. Momentum: incorporating momentum (inertia) into the parameter update speeds up
      convergence.

      (Figure: SGD without momentum vs. SGD with momentum; arrows show the gradient and the
      accumulated momentum.)

      v_k = α v_{k-1} + η ∇E(θ_{k-1})
      θ_k = θ_{k-1} - v_k

      (α is the momentum coefficient, opt.Momentum in the code; η is the learning rate.)
      The gradient moves made so far are accumulated as inertia: moves in the same direction
      build the inertia up, while moves that change direction reduce it.

      Momentum and Learning Rate Adaptation:
      https://www.willamette.edu/~gorr/classes/cs449/momrate.html
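      To see why this helps (an illustrative calculation, not from the slides): if the gradient
      stays roughly constant at g, the velocity approaches the fixed point v = α v + η g, that
      is, v → η g / (1 - α). With α = 0.9 the accumulated step is up to ten times the plain SGD
      step in directions where the gradient is consistent, while components that keep changing
      sign largely cancel out.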
  12. Optimization with Momentum (Golang)

      for j, _ := range thetas {
          // compute gradient
          gradient := gradient(shuffled, org_thetas, j, batchSize)
          // Use momentum if momentum option is passed
          velocities[j] = opt.Momentum*velocities[j] - (opt.LearingRate * gradient)
          // update parameter
          thetas[j] = org_thetas[j] + velocities[j]
      }

      v_k = α v_{k-1} + η ∇E(θ_{k-1})
      θ_k = θ_{k-1} - v_k

      (The code stores the velocity with the opposite sign, so the parameter update becomes an
      addition; the two forms are equivalent.)
  13. AdaGrad: one technique for automatically adjusting the learning rate while the model is
      trained.

      G_k = G_{k-1} + (∇E(θ_{k-1}))^2
      θ_k = θ_{k-1} - (η / √(G_k + ε)) ∇E(θ_{k-1})

      The initial learning rate η divided by the accumulated gradient magnitude is used as the
      learning rate.

      Advantage: the learning rate is adjusted per parameter. Parameters that have changed
      little keep taking large steps, while parameters that have already changed a lot take
      smaller and smaller steps.

      Drawback: because the accumulated gradients sit in the denominator, the learning rate
      becomes extremely small as training progresses, so the initial learning rate should be
      set on the large side.
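      A quick feel for how fast the step shrinks (illustrative numbers, not from the slides):
      with η = 1, ε ≈ 0 and a gradient of 1 at every step, G grows by 1 per step, so the
      effective learning rate η / √G is 1, 0.71, 0.58, 0.5, ... after steps 1 to 4 and roughly
      0.1 after 100 steps, which is why the deck recommends starting with a larger η.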
  14. Optimization with AdaGrad (Golang)

      for j, _ := range thetas {
          ~~~~
          // optimize by AdaGrad
          gradients[j] += math.Pow(gradient, 2)
          learningRate := opt.LearingRate / (math.Sqrt(gradients[j] + opt.Epsilon))
          update = -(learningRate * gradient)
          ~~~~
      }

      G_k = G_{k-1} + (∇E(θ_{k-1}))^2
      θ_k = θ_{k-1} - (η / √(G_k + ε)) ∇E(θ_{k-1})
  15. AdaDelta: another technique for automatically adjusting the learning rate while the model
      is trained.

      E[g^2]_t = ρ E[g^2]_{t-1} + (1 - ρ) g_t^2
      Δθ_t = -( √(E[Δθ^2]_{t-1} + ε) / √(E[g^2]_t + ε) ) g_t
      E[Δθ^2]_t = ρ E[Δθ^2]_{t-1} + (1 - ρ) Δθ_t^2
      θ_{t+1} = θ_t + Δθ_t

      The gradients are accumulated as a decaying average (first line); the learning rate is
      derived from the recent gradients and recent parameter updates to obtain the new update
      Δθ_t (second line); the parameter updates themselves are accumulated as a decaying
      average using the value just computed (third line); and finally the parameter is updated
      (last line). ρ is the decay rate (opt.DecayRate in the code).
  16. Optimization with AdaDelta (Golang)

      for j, _ := range thetas {
          ~~~~
          // optimize by AdaDelta
          gradients[j] = (opt.DecayRate * gradients[j]) + (1.0-opt.DecayRate)*math.Pow(gradient, 2)
          update = -(math.Sqrt(updates[j]+opt.Epsilon) / math.Sqrt(gradients[j]+opt.Epsilon)) * gradient
          updates[j] = (opt.DecayRate * updates[j]) + (1.0-opt.DecayRate)*math.Pow(update, 2)
          ~~~~
      }

      E[g^2]_t = ρ E[g^2]_{t-1} + (1 - ρ) g_t^2
      Δθ_t = -( √(E[Δθ^2]_{t-1} + ε) / √(E[g^2]_t + ε) ) g_t
      E[Δθ^2]_t = ρ E[Δθ^2]_{t-1} + (1 - ρ) Δθ_t^2
      θ_{t+1} = θ_t + Δθ_t
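      The ~~~~ markers in the last two snippets stand for surrounding code the deck omits. A
      minimal sketch of that bookkeeping (an assumption about code not shown in the deck):

      // one accumulator per parameter, zero-initialized before the epoch loop
      gradients := make([]float64, len(thetas)) // E[g^2] (or G in the AdaGrad case)
      updates := make([]float64, len(thetas))   // E[Δθ^2] (AdaDelta only)
      var update float64

      // ... inside the per-parameter loop, after update has been computed:
      thetas[j] = org_thetas[j] + update // θ_{t+1} = θ_t + Δθ_t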
  17. Code: a sample implementation of gradient descent in Go is available at
      https://github.com/monochromegane/gradient_descent

      Usage:

      $ go run cmd/gradient_descent/main.go \
          -eta 0.075 \
          -m 3 \
          -epoch 40000 \
          -algorithm sgd \
          -momentum 0.9