monochromegane
August 05, 2016

# Gradient Descent in Go: Theory and Practice / gradient-descent-in-golang

Math Study Group for Programmers @ Fukuoka #5 (プログラマのための数学勉強会@福岡 #5)
http://maths4pg-fuk.connpass.com/event/34164/


## Transcript

1. Yusuke Miyake (GMO Pepabo, Inc.)
Math Study Group for Programmers @ Fukuoka
Gradient Descent in Go
Theory and Practice

2. Principal Engineer
Yusuke Miyake @monochromegane
minne Business Division
http://blog.monochromegane.com

3. Contents
- What is gradient descent?
- Steepest descent
- Stochastic gradient descent
- Optimizing gradient descent
- Summary

4. What is gradient descent?

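5. What is gradient descent?
- One of the methods used to train a model in machine learning.
- It updates the parameters inside the model so that the error between the model and the training data becomes as small as possible.
- The update is done by repeatedly differentiating the function that defines the error and moving the parameters toward its minimum.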

6. Makes sense??

7. For example, suppose the function that defines the error is
$$f(x) = (x - 1)^2$$
If we find the value of x that minimizes it, we have found where the error is smallest.

8. In other words, we keep differentiating and look for the point where the slope becomes 0.

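9. Just guess?
Guessing would never end, so instead we repeatedly increase (or decrease) x based on the computed slope:
$$x := x - \frac{d}{dx} f(x)$$
If the derivative is negative, x increases; if the derivative is positive, x decreases.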

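10. Learning rate
The learning rate η controls how much x changes in each update:
$$x := x - \eta \frac{d}{dx} f(x)$$
If η is too large, x moves too far in each step, and the iteration may fail to converge or may even diverge.
If η is too small, x moves only a little in each step, and more iterations may be needed.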

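11. Objective function
- Defines the error between the model and the training data.
Let θ denote the parameters we want to find:
$$E(\theta) = \frac{1}{2} \sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)^2$$
Each term is the difference (error) between the training target y_i and the prediction computed by the model with the parameters θ at that point; E(θ) is (half) the sum of the squared errors over all n training examples.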

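12. Objective function
- All that remains is to differentiate the objective function, i.e. the function that defines the error, with respect to the parameters and drive the error toward its minimum.
→ Steepest descent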

13. Steepest descent

14. A concrete example

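15. Polynomial regression
Training set: data generated from a sine function, with random noise of a given standard deviation added.
Model: predictions are made with the cubic polynomial
$$f_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$$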

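16. Polynomial regression
Take the partial derivatives of the objective function
$$E(\theta) = \frac{1}{2} \sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)^2$$
with respect to θ, the parameters of the model
$$f_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3,$$
and use the resulting derivatives to update the parameters.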
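For reference, the derivative used in the update follows from the chain rule. The factor 1/2 in E(θ) cancels the 2 from differentiating the square, and because the model is linear in each parameter, ∂f_θ(x)/∂θ_j = x^j:

$$
\frac{\partial E}{\partial \theta_j}
= \sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)\left(-\frac{\partial f_\theta(x_i)}{\partial \theta_j}\right)
= \sum_{i=1}^{n} \bigl(f_\theta(x_i) - y_i\bigr)\, x_i^{\,j},
\qquad j = 0, 1, 2, 3
$$

Each parameter is then updated as θ_j := θ_j − η ∂E/∂θ_j.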

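17. Polynomial regression
Parameter update rules, using the partial derivative of E(θ) with respect to θ_0, θ_1, θ_2 and θ_3, respectively:
$$
\begin{aligned}
\theta_0 &:= \theta_0 - \eta \sum_{i=1}^{n} \bigl(f_\theta(x_i) - y_i\bigr) \\
\theta_1 &:= \theta_1 - \eta \sum_{i=1}^{n} \bigl(f_\theta(x_i) - y_i\bigr)\, x_i \\
\theta_2 &:= \theta_2 - \eta \sum_{i=1}^{n} \bigl(f_\theta(x_i) - y_i\bigr)\, x_i^2 \\
\theta_3 &:= \theta_3 - \eta \sum_{i=1}^{n} \bigl(f_\theta(x_i) - y_i\bigr)\, x_i^3
\end{aligned}
$$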

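18. Polynomial regression with steepest descent (Golang)
```go
package regression // hypothetical package name

import "math"

// Data and DataSet are not shown on the slides; these definitions are
// assumed from the training.X / training.Y usage below.
type Data struct{ X, Y float64 }
type DataSet []Data

// fθ(x) model: θ_0 + θ_1 x + θ_2 x^2 + ... for the given thetas
func PredictionFunction(x float64, thetas []float64) float64 {
	result := 0.0
	for i, theta := range thetas {
		result += theta * math.Pow(x, float64(i))
	}
	return result
}

// E(θ) objective function: half the sum of squared errors over the training set
func ObjectiveFunction(trainings DataSet, thetas []float64) float64 {
	result := 0.0
	for _, training := range trainings {
		result += math.Pow((training.Y - PredictionFunction(training.X, thetas)), 2)
	}
	return result / 2.0
}
```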

19. Polynomial regression with steepest descent (Golang)
```go
// gradient for each parameter: the gradient for θ_index, summed over the
// first batchSize examples of dataset (batchSize = len(dataset) for steepest descent)
func gradient(dataset DataSet, thetas []float64, index int, batchSize int) float64 {
	result := 0.0
	for _, data := range dataset[0:batchSize] {
		result += (PredictionFunction(data.X, thetas) - data.Y) * math.Pow(data.X, float64(index))
	}
	return result
}
```
This is the sum in the update rule $\theta_j := \theta_j - \eta \sum_{i=1}^{n} \bigl(f_\theta(x_i) - y_i\bigr)\, x_i^{j}$, with `index` playing the role of j.

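20. Polynomial regression with steepest descent (Golang)
```go
// learning (update parameters)
for i := 0; i < opt.Epoch; i++ {
	// copy the current parameters so that every θ_j in this epoch
	// is updated from the same old values
	org_thetas := make([]float64, len(thetas))
	copy(org_thetas, thetas)
	shuffled := dataset.Shuffle()
	for j := range thetas {
		// gradient for θ_j over the first batchSize examples (batchSize is set as on slide 27)
		grad := gradient(shuffled, org_thetas, j, batchSize)
		// update parameter by gradient descent
		thetas[j] = org_thetas[j] - (opt.LearingRate * grad)
	}
}
```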

21. Polynomial regression with steepest descent

22. Stochastic gradient descent (SGD)

23. Problems with steepest descent

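24. Problems with steepest descent
- Every parameter update needs the error summed over the entire training set:
$$E(\theta) = \frac{1}{2} \sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)^2$$
- → When the training set is very large, the amount of computation becomes enormous.
- Because the whole training set is used, every step deterministically moves down the gradient.
- → It is likely to get stuck in a local minimum.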

25. Stochastic gradient descent (SGD)

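26. Polynomial regression with SGD
Parameter update rules: instead of the sum of squared errors, the parameters are updated using a randomly selected example.
$$
\begin{aligned}
\theta_0 &:= \theta_0 - \eta \sum_{i=1}^{1} \bigl(f_\theta(x_i) - y_i\bigr) \\
\theta_1 &:= \theta_1 - \eta \sum_{i=1}^{1} \bigl(f_\theta(x_i) - y_i\bigr)\, x_i \\
\theta_2 &:= \theta_2 - \eta \sum_{i=1}^{1} \bigl(f_\theta(x_i) - y_i\bigr)\, x_i^2 \\
\theta_3 &:= \theta_3 - \eta \sum_{i=1}^{1} \bigl(f_\theta(x_i) - y_i\bigr)\, x_i^3
\end{aligned}
$$
The sum runs from i = 1 to 1, i.e. over a single example. Rather than always training on the fixed first example of the training set, the set is shuffled every time and the example that ends up first is used for the update.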

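27. Polynomial regression with SGD (Golang)
```go
// GD: batchSize=len(dataset), SGD: batchSize=1
batchSize := len(dataset)
if opt.Algorithm == "sgd" {
	// opt.BatchSize == -1 appears to mean that no mini-batch size was given
	if opt.BatchSize == -1 {
		batchSize = 1
	}
}
```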

28. Stochastic gradient descent: convergence for different learning rates

29. Mini-batch gradient descent (mini-batch SGD)

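30. Mini-batch SGD
Choose a batch size B with 1 ≤ B ≤ len(training set) and use it for each parameter update, aiming to get the best of both steepest descent and SGD.
SGD can be seen as the special case B = 1 (and steepest descent as B = len(training set)).
The sum runs from i = 1 to the mini-batch size B: each time, the training set is shuffled and the examples from the start up to the mini-batch size are used for the update.
$$
\begin{aligned}
\theta_0 &:= \theta_0 - \eta \sum_{i=1}^{B} \bigl(f_\theta(x_i) - y_i\bigr) \\
\theta_1 &:= \theta_1 - \eta \sum_{i=1}^{B} \bigl(f_\theta(x_i) - y_i\bigr)\, x_i \\
\theta_2 &:= \theta_2 - \eta \sum_{i=1}^{B} \bigl(f_\theta(x_i) - y_i\bigr)\, x_i^2 \\
\theta_3 &:= \theta_3 - \eta \sum_{i=1}^{B} \bigl(f_\theta(x_i) - y_i\bigr)\, x_i^3
\end{aligned}
$$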

31. Optimizing gradient descent (optimization)

32. Momentum

33. Speeding up the convergence of gradient descent

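34. Momentum
Convergence is sped up by bringing the idea of momentum (inertia) into the parameter update.
(Figure: SGD without momentum vs. SGD with momentum; arrows labeled "gradient" and "accumulated momentum".)
$$
\begin{aligned}
v_k &= \gamma\, v_{k-1} + \eta\, \nabla E(\theta_{k-1}) \\
\theta_k &= \theta_{k-1} - v_k
\end{aligned}
$$
where γ is the momentum coefficient. The gradient steps taken so far are accumulated as inertia: moves in the same direction increase the inertia, and moves that change direction reduce it.
Reference: Momentum and Learning Rate Adaptation, https://www.willamette.edu/~gorr/classes/cs449/momrate.html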

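35. Optimization with momentum (Golang)
```go
for j := range thetas {
	// gradient for θ_j, computed as in the steepest-descent loop
	grad := gradient(shuffled, org_thetas, j, batchSize)
	// Use momentum if momentum option is passed
	velocities[j] = opt.Momentum*velocities[j] - (opt.LearingRate * grad)
	// update parameter
	thetas[j] = org_thetas[j] + velocities[j]
}
```
The code stores the velocity with the opposite sign to the formula on slide 34 ($v_k = \gamma v_{k-1} - \eta \nabla E(\theta_{k-1})$, $\theta_k = \theta_{k-1} + v_k$, with γ = `opt.Momentum`); the resulting update is the same.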

36. Optimization with momentum: convergence for different learning rates

37. AdaGrad

38. Adjusting the learning rate automatically

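39. AdaGrad
One of the methods that adjust the learning rate automatically while training a model.
$$
\begin{aligned}
G_k &= G_{k-1} + \bigl(\nabla E(\theta_{k-1})\bigr)^2 \\
\theta_k &= \theta_{k-1} - \frac{\eta}{\sqrt{G_k + \epsilon}}\, \nabla E(\theta_{k-1})
\end{aligned}
$$
The learning rate actually used is the initial learning rate η divided by the square root of the accumulated squared gradients, computed separately for each parameter.
Advantages
- The learning rate is adjusted for each parameter individually.
- Parameters whose gradients have changed little take large steps, and parameters whose gradients have changed a lot take small steps.
Disadvantages
- Because the accumulated gradients form the denominator, the learning rate becomes very small as training progresses.
- → Set a relatively large initial learning rate.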

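40. Optimization with AdaGrad (Golang)
```go
for j := range thetas {
	// ...
	// per-parameter learning rate η / sqrt(G_j + ε); gradients[j] presumably
	// holds the accumulated squared gradients G_j for θ_j
	learningRate := opt.LearingRate / (math.Sqrt(gradients[j] + opt.Epsilon))
	// ...
}
```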

41. Optimization with AdaGrad: convergence for different learning rates

42. AdaDelta

43. Adjusting the learning rate automatically

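44. AdaDelta
One of the methods that adjust the learning rate automatically while training a model.
It avoids the monotonic decrease of the learning rate: instead of simply summing the gradients, it takes a decaying average of them, so the learning rate is computed from recent gradients.
No initial learning rate needs to be set: it is replaced by a decaying average of the parameter updates.
$$
\begin{aligned}
E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1 - \rho)\, g_t^2 \\
\Delta\theta_t &= -\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\, g_t \\
E[\Delta\theta^2]_t &= \rho\, E[\Delta\theta^2]_{t-1} + (1 - \rho)\, \Delta\theta_t^2 \\
\theta_{t+1} &= \theta_t + \Delta\theta_t
\end{aligned}
$$
Here ρ is the decay rate and g_t is the gradient ∇E(θ_t).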

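45. AdaDelta
One of the methods that adjust the learning rate automatically while training a model.
- Accumulate a decaying average of the squared gradients: $E[g^2]_t = \rho\, E[g^2]_{t-1} + (1 - \rho)\, g_t^2$
- Compute the learning rate from the recent gradients and parameter updates, and obtain the new parameter update: $\Delta\theta_t = -\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\, g_t$
- Use the value just computed to accumulate a decaying average of the squared parameter updates: $E[\Delta\theta^2]_t = \rho\, E[\Delta\theta^2]_{t-1} + (1 - \rho)\, \Delta\theta_t^2$
- Update the parameters: $\theta_{t+1} = \theta_t + \Delta\theta_t$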

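46. Optimization with AdaDelta (Golang)
```go
for j := range thetas {
	// ~~~~ (omitted on the slide)
	// ...opt.DecayRate)*math.Pow(update, 2)
	// The visible tail above is presumably part of accumulating the decaying
	// average of squared updates, E[Δθ²]_t = ρE[Δθ²]_{t-1} + (1-ρ)Δθ_t²,
	// with ρ = opt.DecayRate.
	// ~~~~ (omitted on the slide)
}
```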

47. Optimization with AdaDelta: convergence for different decay rates

48. Learning rate over time: AdaGrad vs. AdaDelta

49. Comparison

50. Comparison of convergence across the gradient descent variants and optimizers

51. Summary

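52. Summary
- In machine learning, a model is trained by minimizing the error with gradient descent.
- There are many gradient descent variants and optimizers; picking the one that suits a given training set currently takes trial and error, such as choosing the algorithm and tuning hyperparameters.
- The newest method is not always the best...
- Hyperparameters never go away...
- Implementing it yourself deepens your understanding, so give it a try!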

53. Code

54. Code