
Gradient Descent in Go - Theory and Practice - / gradient-descent-in-golang

Math Study Group for Programmers @ Fukuoka #5 (プログラマのための数学勉強会@福岡 #5)
http://maths4pg-fuk.connpass.com/event/34164/

monochromegane

August 05, 2016

Transcript

  1. Yusuke Miyake / GMO Pepabo, Inc.
    Math Study Group for Programmers @ Fukuoka
    Gradient Descent in Go
    Theory and Practice

  2. Principal Engineer
    Yusuke Miyake @monochromegane
    minne division
    http://blog.monochromegane.com

  3. Contents
    - What is gradient descent?
    - Steepest descent
    - Stochastic gradient descent
    - Optimizing gradient descent
    - Summary

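  4. What is gradient descent?

  5. What is gradient descent?
    - One of the methods used in machine learning to train a model.
    - It updates the parameters inside the model so that the error between the model and the training data becomes minimal.
    - The parameters are updated by repeatedly differentiating a function that defines the error and moving toward its minimum.

  6. I see...??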

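  7. For example, suppose the error is defined by the function
    $f(x) = (x - 1)^2$
    Finding the value of x that minimizes it is treated as finding the point where the error is minimal.

  8. In other words, keep differentiating and look for the point where the slope becomes 0.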

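  9. By guesswork?
    That would never finish, so instead x is repeatedly increased (or decreased) based on the slope that was computed:
    $x := x - \frac{d}{dx} f(x)$
    If the derivative is negative, increase x; if the derivative is positive, decrease x.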

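  10. Learning rate
    The learning rate $\eta$ controls how far x moves in each update:
    $x := x - \eta \frac{d}{dx} f(x)$
    If it is too large, x moves too far per step and the iteration may fail to converge, or may diverge.
    If it is too small, x moves only a little per step and more iterations may be needed.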
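The update rule with a learning rate can be tried directly on the earlier example $f(x) = (x - 1)^2$, whose derivative is $2(x - 1)$. The following is a minimal sketch; the function names `derivative` and `descend`, the starting point and the step counts are illustrative assumptions, not part of the deck.

```go
package main

import "fmt"

// derivative returns f'(x) for the example f(x) = (x - 1)^2.
func derivative(x float64) float64 {
	return 2 * (x - 1)
}

// descend applies x := x - eta * f'(x) for the given number of steps.
// The name and signature are illustrative, not taken from the deck.
func descend(x0, eta float64, steps int) float64 {
	x := x0
	for i := 0; i < steps; i++ {
		x -= eta * derivative(x)
	}
	return x
}

func main() {
	// A moderate learning rate approaches the minimum at x = 1.
	fmt.Println("eta=0.1:", descend(5, 0.1, 200))
	// A learning rate that is too large overshoots further on every step.
	fmt.Println("eta=1.1:", descend(5, 1.1, 20))
}
```

For this function each step multiplies the distance to the minimum by $(1 - 2\eta)$, so any $\eta$ above 1 makes the distance grow instead of shrink.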

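  11. Objective function
    - Defines the error between the model and the training data.
    Let $\theta$ denote the parameters to be found:
    $E(\theta) = \frac{1}{2} \sum_{i=1}^{n} (y_i - f_\theta(x_i))^2$
    $y_i - f_\theta(x_i)$: the difference (error) between the training value $y_i$ and the prediction computed by the model with the parameters $\theta$ at a given point in training.
    $\sum_{i=1}^{n} (\cdot)^2$: the sum of squared errors over all training data.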

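  12. Objective function
    - All that remains is to differentiate the objective function (the function that defines the error) with respect to the parameters and drive the error toward its minimum.
    → Steepest descent

  13. Steepest descent
    - gradient descent, GD -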

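  14. Polynomial regression
    Training set
    Data generated from a sine function, with random noise of a fixed standard deviation added.
    Model
    Predictions use the following polynomial:
    $f_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$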

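  15. Polynomial regression
    Objective function
    $E(\theta) = \frac{1}{2} \sum_{i=1}^{n} (y_i - f_\theta(x_i))^2$
    $f_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$
    The parameters are updated using the partial derivatives of the objective function $E(\theta)$ with respect to each parameter $\theta_j$ of the model $f_\theta(x)$.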

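  16. Polynomial regression
    Objective function
    $E(\theta) = \frac{1}{2} \sum_{i=1}^{n} (y_i - f_\theta(x_i))^2$
    $f_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$
    The parameters are updated using the partial derivatives of the objective function $E(\theta)$ with respect to each parameter $\theta_j$ of the model $f_\theta(x)$.
    Parameter update rules
    $\theta_0 := \theta_0 - \eta \sum_{i=1}^{n} (f_\theta(x_i) - y_i)$  (partial derivative with respect to $\theta_0$)
    $\theta_1 := \theta_1 - \eta \sum_{i=1}^{n} (f_\theta(x_i) - y_i) x_i$  (partial derivative with respect to $\theta_1$)
    $\theta_2 := \theta_2 - \eta \sum_{i=1}^{n} (f_\theta(x_i) - y_i) x_i^2$  (partial derivative with respect to $\theta_2$)
    $\theta_3 := \theta_3 - \eta \sum_{i=1}^{n} (f_\theta(x_i) - y_i) x_i^3$  (partial derivative with respect to $\theta_3$)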
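The four update rules share one pattern, which follows from the chain rule applied to $E(\theta)$. A short derivation for a general index $j$, using only the symbols already defined on the slides:

```latex
\begin{aligned}
\frac{\partial E}{\partial \theta_j}
  &= \frac{\partial}{\partial \theta_j}\,\frac{1}{2}\sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)^2 \\
  &= \sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)\cdot\Bigl(-\frac{\partial f_\theta(x_i)}{\partial \theta_j}\Bigr) \\
  &= \sum_{i=1}^{n} \bigl(f_\theta(x_i) - y_i\bigr)\, x_i^{\,j},
  \qquad \text{since } \frac{\partial f_\theta(x)}{\partial \theta_j} = x^{j}.
\end{aligned}
```

The factor $\frac{1}{2}$ in the objective cancels the 2 produced by differentiating the square, and substituting the result into $\theta_j := \theta_j - \eta \, \partial E / \partial \theta_j$ gives the rule for each of $j = 0, \dots, 3$.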

  17. Polynomial regression by steepest descent (Golang)
    // fθ(x): the model
    func PredictionFunction(x float64, thetas []float64) float64 {
        result := 0.0
        for i, theta := range thetas {
            result += theta * math.Pow(x, float64(i))
        }
        return result
    }

    // E(θ): the objective function
    func ObjectiveFunction(trainings DataSet, thetas []float64) float64 {
        result := 0.0
        for _, training := range trainings {
            result += math.Pow((training.Y - PredictionFunction(training.X, thetas)), 2)
        }
        return result / 2.0
    }

  18. Polynomial regression by steepest descent (Golang)
    // gradient for each parameter (index selects which theta)
    func gradient(dataset DataSet, thetas []float64, index int, batchSize int) float64 {
        result := 0.0
        for _, data := range dataset[0:batchSize] {
            result += (PredictionFunction(data.X, thetas) - data.Y) * math.Pow(data.X, float64(index))
        }
        return result
    }
    $\theta_0 := \theta_0 - \eta \sum_{i=1}^{n} (f_\theta(x_i) - y_i)$
    $\theta_1 := \theta_1 - \eta \sum_{i=1}^{n} (f_\theta(x_i) - y_i) x_i$
    $\theta_2 := \theta_2 - \eta \sum_{i=1}^{n} (f_\theta(x_i) - y_i) x_i^2$
    $\theta_3 := \theta_3 - \eta \sum_{i=1}^{n} (f_\theta(x_i) - y_i) x_i^3$

  19. Polynomial regression by steepest descent (Golang)
    // learning (update parameters)
    for i := 0; i < opt.Epoch; i++ {
        // update parameter by gradient descent
        // keep a snapshot so every parameter is updated from the same values
        org_thetas := make([]float64, len(thetas))
        copy(org_thetas, thetas)
        shuffled := dataset.Shuffle()
        for j := range thetas {
            // compute gradient
            g := gradient(shuffled, org_thetas, j, batchSize)
            // update parameter
            thetas[j] = org_thetas[j] - (opt.LearingRate * g)
        }
    }
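The pieces from the last three slides can be combined into one small program. The following is a self-contained sketch rather than the deck's actual implementation: the `Point` and `DataSet` definitions, the helper `makeSine`, the sampling interval [0, 1], the noise level 0.1, the seed, and the values of `eta` and `epochs` are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// Point and DataSet are assumed shapes; the deck uses training.X and
// training.Y but does not show the type definitions.
type Point struct{ X, Y float64 }
type DataSet []Point

// predict evaluates f_theta(x) = theta_0 + theta_1*x + theta_2*x^2 + ...
func predict(x float64, thetas []float64) float64 {
	result := 0.0
	for i, theta := range thetas {
		result += theta * math.Pow(x, float64(i))
	}
	return result
}

// objective is E(theta): half the sum of squared errors.
func objective(ds DataSet, thetas []float64) float64 {
	sum := 0.0
	for _, p := range ds {
		diff := p.Y - predict(p.X, thetas)
		sum += diff * diff
	}
	return sum / 2
}

// makeSine samples sin(2*pi*x) on [0, 1] and adds Gaussian noise.
// The interval, sigma and seed are illustrative assumptions.
func makeSine(n int, sigma float64, seed int64) DataSet {
	rng := rand.New(rand.NewSource(seed))
	ds := make(DataSet, n)
	for i := range ds {
		x := float64(i) / float64(n-1)
		ds[i] = Point{X: x, Y: math.Sin(2*math.Pi*x) + sigma*rng.NormFloat64()}
	}
	return ds
}

// train runs full-batch gradient descent and returns the fitted parameters.
func train(ds DataSet, degree, epochs int, eta float64) []float64 {
	thetas := make([]float64, degree+1)
	for e := 0; e < epochs; e++ {
		// Compute every partial derivative from the same snapshot of thetas.
		grads := make([]float64, len(thetas))
		for _, p := range ds {
			residual := predict(p.X, thetas) - p.Y
			for j := range grads {
				grads[j] += residual * math.Pow(p.X, float64(j))
			}
		}
		for j := range thetas {
			thetas[j] -= eta * grads[j]
		}
	}
	return thetas
}

func main() {
	ds := makeSine(20, 0.1, 1)
	before := objective(ds, make([]float64, 4))
	thetas := train(ds, 3, 5000, 0.01)
	fmt.Printf("E before: %.4f, E after: %.4f\n", before, objective(ds, thetas))
	fmt.Println("thetas:", thetas)
}
```

Accumulating all gradients before touching `thetas` plays the same role as the `org_thetas` copy on the slide: no parameter sees a partially updated model within one epoch.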

  20. Polynomial regression by steepest descent

  21. Stochastic gradient descent
    - stochastic gradient descent, SGD -

  22. Problems with steepest descent

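  23. Problems with steepest descent
    - Every parameter update needs a sum over the entire training set to compute the error.
    - → When the training set is very large, the amount of computation becomes enormous.
    $E(\theta) = \frac{1}{2} \sum_{i=1}^{n} (y_i - f_\theta(x_i))^2$
    - Because the entire training set is used, every update reliably moves down the gradient.
    - → It is likely to get trapped in a local minimum.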

  24. Stochastic gradient descent
    - stochastic gradient descent, SGD -

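  25. Polynomial regression by SGD
    Parameter update rules
    Instead of the full sum of squared errors, the parameters are updated using a randomly selected data point.
    $\theta_0 := \theta_0 - \eta \sum_{i=1}^{1} (f_\theta(x_i) - y_i)$
    $\theta_1 := \theta_1 - \eta \sum_{i=1}^{1} (f_\theta(x_i) - y_i) x_i$
    $\theta_2 := \theta_2 - \eta \sum_{i=1}^{1} (f_\theta(x_i) - y_i) x_i^2$
    $\theta_3 := \theta_3 - \eta \sum_{i=1}^{1} (f_\theta(x_i) - y_i) x_i^3$
    i runs from 1 to 1, so the sum has a single term.
    Training does not keep using a fixed first element of the training set; the set is shuffled every time and the first element after shuffling is used for the update.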
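The deck's code calls `dataset.Shuffle()` but the transcript does not show its body. One plausible way to write it is sketched below; the `Point` and `DataSet` definitions and the copy-then-shuffle behavior are assumptions, not the author's implementation.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Point and DataSet are assumed shapes, not shown in the deck.
type Point struct{ X, Y float64 }
type DataSet []Point

// Shuffle returns a shuffled copy and leaves the receiver untouched, so the
// first batchSize elements of the result form a fresh random sample.
// This is a guess at the behavior of the deck's dataset.Shuffle().
func (d DataSet) Shuffle() DataSet {
	shuffled := make(DataSet, len(d))
	copy(shuffled, d)
	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	return shuffled
}

func main() {
	ds := DataSet{{0, 0}, {1, 1}, {2, 4}, {3, 9}}
	for epoch := 0; epoch < 3; epoch++ {
		sample := ds.Shuffle()[0] // SGD: batchSize = 1
		fmt.Println("epoch", epoch, "uses", sample)
	}
}
```

Returning a copy keeps the original ordering available to the caller, at the cost of one allocation per epoch; shuffling in place would avoid that allocation.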

  26. Polynomial regression by stochastic gradient descent (Golang)
    // GD: batchSize=len(dataset), SGD: batchSize=1
    batchSize := len(dataset)
    if opt.Algorithm == "sgd" {
        if opt.BatchSize == -1 {
            batchSize = 1
        }
    }

  27. Stochastic gradient descent: convergence by learning rate

  28. Mini-batch gradient descent
    - mini-batch gradient descent, mini-batch SGD -

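  29. mini-batch SGD
    A batch size B with $1 \le B \le$ len(training set) is fixed for the parameter updates, aiming to get the best of both steepest descent and stochastic gradient descent.
    Stochastic gradient descent can be seen as the special case B = 1.
    $\theta_0 := \theta_0 - \eta \sum_{i=1}^{B} (f_\theta(x_i) - y_i)$
    $\theta_1 := \theta_1 - \eta \sum_{i=1}^{B} (f_\theta(x_i) - y_i) x_i$
    $\theta_2 := \theta_2 - \eta \sum_{i=1}^{B} (f_\theta(x_i) - y_i) x_i^2$
    $\theta_3 := \theta_3 - \eta \sum_{i=1}^{B} (f_\theta(x_i) - y_i) x_i^3$
    i runs from 1 to the mini-batch size.
    The set is shuffled every time, and the data from the first element up to the mini-batch size is used for the update.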
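Since steepest descent, SGD and mini-batch SGD differ only in how many shuffled samples feed one update, the choice can be isolated in a single function. The helper below is hypothetical: `resolveBatchSize`, the treatment of any algorithm name other than "sgd", and the clamping rule are assumptions made for this sketch, not the deck's option handling.

```go
package main

import "fmt"

// resolveBatchSize picks how many shuffled samples feed one update.
// Hypothetical helper: the handling of names other than "sgd" and the
// clamping rule are assumptions, not the deck's option handling.
func resolveBatchSize(algorithm string, requested, n int) int {
	if algorithm != "sgd" {
		return n // steepest descent: the whole training set
	}
	if requested < 1 {
		return 1 // plain SGD: a single sample
	}
	if requested > n {
		return n // a mini-batch cannot exceed the training set
	}
	return requested // mini-batch SGD with 1 <= B <= n
}

func main() {
	fmt.Println(resolveBatchSize("gd", -1, 100))  // 100
	fmt.Println(resolveBatchSize("sgd", -1, 100)) // 1
	fmt.Println(resolveBatchSize("sgd", 10, 100)) // 10
}
```

With the batch size resolved this way, the same `dataset[0:batchSize]` slice in the gradient function covers all three variants.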

  30. Optimizing gradient descent
    - optimization -

  31. Speeding up the convergence of gradient descent

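  32. Momentum
    Convergence is sped up by bringing the idea of momentum (inertia) into the parameter update.
    [Figure: SGD without momentum / SGD with momentum; arrows labeled "gradient" and "accumulated momentum"]
    $v_k = \gamma v_{k-1} + \eta \nabla E(\theta)$
    $\theta_k = \theta_{k-1} - v_k$
    Here $\gamma$ is the momentum coefficient.
    The movement along the gradients so far is accumulated as inertia: if the movement keeps the same direction the inertia grows, and if the movement changes direction the inertia shrinks.
    Source: Momentum and Learning Rate Adaptation
    https://www.willamette.edu/~gorr/classes/cs449/momrate.html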

  33. Optimization with Momentum (Golang)
    for j := range thetas {
        // compute gradient
        g := gradient(shuffled, org_thetas, j, batchSize)
        // Use momentum if momentum option is passed.
        // velocities[j] stores the negated v_k, so it is added to the parameter below.
        velocities[j] = opt.Momentum*velocities[j] - (opt.LearingRate * g)
        // update parameter
        thetas[j] = org_thetas[j] + velocities[j]
    }
    $v_k = \gamma v_{k-1} + \eta \nabla E(\theta)$
    $\theta_k = \theta_{k-1} - v_k$
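The effect of the momentum term can be observed on the one-dimensional example $f(x) = (x - 1)^2$ from the start of the talk. The sketch below is an illustration under assumed settings (`eta` = 0.01, momentum 0.9, 300 steps, start at x = 5), and `descendWithMomentum` is a hypothetical name.

```go
package main

import "fmt"

// grad is the derivative of the toy objective f(x) = (x - 1)^2.
func grad(x float64) float64 { return 2 * (x - 1) }

// descendWithMomentum applies v_k = gamma*v_{k-1} + eta*grad(x) and
// x_k = x_{k-1} - v_k. With gamma = 0 it reduces to plain gradient descent.
func descendWithMomentum(x0, eta, gamma float64, steps int) float64 {
	x, v := x0, 0.0
	for i := 0; i < steps; i++ {
		v = gamma*v + eta*grad(x)
		x -= v
	}
	return x
}

func main() {
	fmt.Println("plain:   ", descendWithMomentum(5, 0.01, 0.0, 300))
	fmt.Println("momentum:", descendWithMomentum(5, 0.01, 0.9, 300))
}
```

With these settings the momentum run should end much closer to the minimum at x = 1 than the plain run after the same number of steps, because consecutive gradients point the same way for most of the descent and the velocity builds up.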

  34. Optimization with Momentum: convergence by learning rate

  35. Adjusting the learning rate automatically

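  36. AdaGrad
    One of the methods that adjust the learning rate automatically while a model is being trained.
    $G_k = G_{k-1} + (\nabla E(\theta_{k-1}))^2$
    $\theta_k = \theta_{k-1} - \frac{\eta}{\sqrt{G_{k-1} + \epsilon}} \nabla E(\theta_{k-1})$
    The initial learning rate $\eta$ divided by the accumulated magnitude of the gradients is used as the learning rate.
    Advantages
    - The learning rate is adjusted separately for each parameter.
    - Parameters that have changed little are updated in large steps, and parameters that have changed a lot are updated little by little.
    Disadvantages
    - Because the accumulated gradient is the denominator, the learning rate becomes very small as training proceeds.
    → Set the initial learning rate on the large side.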

  37. Optimization with AdaGrad (Golang)
    for j := range thetas {
        // ...
        // optimize by AdaGrad
        gradients[j] += math.Pow(gradient, 2)
        learningRate := opt.LearingRate / (math.Sqrt(gradients[j] + opt.Epsilon))
        update = -(learningRate * gradient)
        // ...
    }
    $G_k = G_{k-1} + (\nabla E(\theta_{k-1}))^2$
    $\theta_k = \theta_{k-1} - \frac{\eta}{\sqrt{G_{k-1} + \epsilon}} \nabla E(\theta_{k-1})$
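To watch the learning rate shrink as gradients accumulate, AdaGrad can be run on the one-dimensional example $f(x) = (x - 1)^2$. This is a sketch under assumed settings (`eta` = 1.0, `epsilon` = 1e-8, 500 steps, start at x = 5); `adaGrad` is a hypothetical name. Like the Go excerpt on the slide, it adds the current squared gradient to the accumulator before dividing.

```go
package main

import (
	"fmt"
	"math"
)

// grad is the derivative of the toy objective f(x) = (x - 1)^2.
func grad(x float64) float64 { return 2 * (x - 1) }

// adaGrad minimizes f(x) = (x - 1)^2 and returns the final x together with
// the effective learning rate used at every step.
func adaGrad(x0, eta, epsilon float64, steps int) (float64, []float64) {
	x, accumulated := x0, 0.0
	rates := make([]float64, 0, steps)
	for i := 0; i < steps; i++ {
		g := grad(x)
		accumulated += g * g // running sum of squared gradients
		rate := eta / math.Sqrt(accumulated+epsilon)
		x -= rate * g
		rates = append(rates, rate)
	}
	return x, rates
}

func main() {
	x, rates := adaGrad(5, 1.0, 1e-8, 500)
	fmt.Println("x:", x)
	fmt.Println("first rate:", rates[0], "last rate:", rates[len(rates)-1])
}
```

Because the accumulator never decreases, the effective rate can only stay level or fall, which is the disadvantage noted on the AdaGrad slide.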

  38. Optimization with AdaGrad: convergence by learning rate

  39. Adjusting the learning rate automatically

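  40. AdaDelta
    One of the methods that adjust the learning rate automatically while a model is being trained.
    Avoids the monotonic decrease of the learning rate
    Rather than simply summing the gradients, it takes a decaying average of them, so the learning rate is computed from the most recent gradients.
    No initial learning rate has to be set
    The initial learning rate is replaced by a decaying average of the parameter updates.
    $E[g^2]_t = \rho E[g^2]_{t-1} + (1 - \rho) g_t^2$
    $\Delta\theta_t = -\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} g_t$
    $E[\Delta\theta^2]_t = \rho E[\Delta\theta^2]_{t-1} + (1 - \rho) \Delta\theta_t^2$
    $\theta_{t+1} = \theta_t + \Delta\theta_t$
    Here $\rho$ is the decay rate.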

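  41. AdaDelta
    One of the methods that adjust the learning rate automatically while a model is being trained.
    $E[g^2]_t = \rho E[g^2]_{t-1} + (1 - \rho) g_t^2$  (accumulate a decaying average of the gradients)
    $\Delta\theta_t = -\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} g_t$  (derive a learning rate from the recent gradients and parameter updates, and obtain the new parameter update)
    $E[\Delta\theta^2]_t = \rho E[\Delta\theta^2]_{t-1} + (1 - \rho) \Delta\theta_t^2$  (accumulate a decaying average of the parameter updates, using the value just computed)
    $\theta_{t+1} = \theta_t + \Delta\theta_t$  (update the parameter)
    Here $\rho$ is the decay rate.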

  42. Optimization with AdaDelta (Golang)
    for j := range thetas {
        // ...
        // optimize by AdaDelta
        gradients[j] = (opt.DecayRate * gradients[j]) + (1.0-opt.DecayRate)*math.Pow(gradient, 2)
        update = -(math.Sqrt(updates[j]+opt.Epsilon) / math.Sqrt(gradients[j]+opt.Epsilon)) * gradient
        updates[j] = (opt.DecayRate * updates[j]) + (1.0-opt.DecayRate)*math.Pow(update, 2)
        // ...
    }
    $E[g^2]_t = \rho E[g^2]_{t-1} + (1 - \rho) g_t^2$
    $\Delta\theta_t = -\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} g_t$
    $E[\Delta\theta^2]_t = \rho E[\Delta\theta^2]_{t-1} + (1 - \rho) \Delta\theta_t^2$
    $\theta_{t+1} = \theta_t + \Delta\theta_t$
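The AdaDelta equations can also be exercised on the one-dimensional example $f(x) = (x - 1)^2$. The sketch below uses assumed settings (decay rate 0.95, `epsilon` = 1e-6, start at x = 5), and `adaDelta` is a hypothetical name; it is meant to show the order of the three state updates, not to reproduce the deck's convergence plots.

```go
package main

import (
	"fmt"
	"math"
)

// grad is the derivative of the toy objective f(x) = (x - 1)^2.
func grad(x float64) float64 { return 2 * (x - 1) }

// adaDelta minimizes f(x) = (x - 1)^2 with the AdaDelta update.
// rho is the decay rate and epsilon keeps both square roots away from zero.
func adaDelta(x0, rho, epsilon float64, steps int) float64 {
	x := x0
	avgSqGrad, avgSqUpdate := 0.0, 0.0 // E[g^2] and E[dtheta^2]
	for i := 0; i < steps; i++ {
		g := grad(x)
		avgSqGrad = rho*avgSqGrad + (1-rho)*g*g
		// The previous E[dtheta^2] takes the place of a learning rate.
		update := -math.Sqrt(avgSqUpdate+epsilon) / math.Sqrt(avgSqGrad+epsilon) * g
		avgSqUpdate = rho*avgSqUpdate + (1-rho)*update*update
		x += update
	}
	return x
}

func main() {
	for _, steps := range []int{10, 100, 1000} {
		fmt.Println(steps, "steps:", adaDelta(5, 0.95, 1e-6, steps))
	}
}
```

No learning rate appears anywhere in the function. With a small `epsilon` the first updates are tiny, since `avgSqUpdate` starts at zero and the step size is initially on the order of the square root of `epsilon`.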

  43. Optimization with AdaDelta: convergence by decay rate

  44. Learning rate over time for AdaGrad and AdaDelta

  45. Comparison of convergence across the gradient descent variants and optimizations

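  46. Summary
    - In machine learning, a model is trained by minimizing the error with gradient descent.
    - There are many variants of gradient descent and of its optimizations; for now, choosing one that suits the training set still takes trial and error in selecting the algorithm and tuning the hyperparameters.
    - The newest method is not always the best...
    - Hyperparameters do not go away...
    - Implementing it yourself deepens your understanding!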

  47. Code
    - A sample implementation of gradient descent in Go is available at:
    - https://github.com/monochromegane/gradient_descent
    Usage
    $ go run cmd/gradient_descent/main.go \
    -eta 0.075 \
    -m 3 \
    -epoch 40000 \
    -algorithm sgd \
    -momentum 0.9

  48. Why not come and work at Pepabo?
    Check the latest hiring information → @pb_recruit