Support Vector Machines

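Slides on support vector machines, prepared for a seminar presentation.
They draw mainly on the following books:
Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O'Reilly Media, 2017
C. M. Bishop, パターン認識と機械学習 下 ベイズ理論による統計的予測 (Pattern Recognition and Machine Learning, Vol. 2; Japanese translation by Hiroshi Motoda et al.), Maruzen Publishing, 2012
Shotaro Akaho, カーネル多変量解析 非線形データ解析の新しい展開 (Kernel Multivariate Analysis: New Developments in Nonlinear Data Analysis), Iwanami Shoten, 2008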

Takahiro Kawashima

June 06, 2019

Transcript

  1. Support Vector Machines. Takahiro Kawashima, June 6, 2019. The University of Electro-Communications, Shouno Laboratory, M1

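  2. Contents

    1. An intuitive understanding of SVMs
       • What an SVM is
       • Various extensions
    2. The mathematics of SVMs
       • Formulation of the hard-margin SVM
       • The Lagrange dual problem
       • The soft-margin SVM and the hinge function
       • Kernel SVMs
       • SVMs for regression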
  3. An Intuitive Understanding of SVMs

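  4. SVM: Support Vector Machine

    Linear SVM: a supervised learning method that performs two-class classification by maximizing the margin.
    Various extensions:
    • Soft-margin SVM: classification problems that are not linearly separable
    • Kernel SVM: nonlinear, complex decision boundaries
    • Support vector regression: application to regression problems
    • Multiclass SVM: multiclass classification problems (not covered here)
    Attractive properties include ease of optimization, robustness, and sparsity.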
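  5. Geometric Interpretation of the Hard-Margin SVM

    Setting: a linearly separable two-class classification problem.
    [Figure: maximizing the margin; labels: supporting hyperplanes, separating hyperplane, support vectors]
    Only the support vectors determine the separating hyperplane:
    • Sparsity
    • Robustness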
  6. A Caveat for SVMs

    SVMs are sensitive to the scale of the features, so standardize the data before fitting.
    [Figure: two panels, "Unscaled" and "Scaled"; axes x0 and x1]
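The standardization step mentioned on this slide can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the helper name `standardize` and the feature values are made up for the example and do not come from the slides. In practice a library scaler would normally be used.

```python
import math

def standardize(column):
    """Rescale a list of numbers to zero mean and unit (population) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    var = sum((v - mean) ** 2 for v in column) / n
    std = math.sqrt(var)
    return [(v - mean) / std for v in column]

# Hypothetical features on very different scales (illustrative values only).
x0 = [1.0, 5.0, 3.0, 5.0]
x1 = [50.0, 20.0, 80.0, 60.0]

z0 = standardize(x0)
z1 = standardize(x1)
```

After this step both features have comparable spread, so neither dominates the norm of w purely because of its units.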
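  7. The Trouble with the Hard-Margin SVM

    Left: not linearly separable. Right: linearly separable, but is this boundary really what we want?
    [Figure: two panels of Petal width against Petal length, each marking an "Outlier"; the left panel is annotated "Impossible!"]
    Relax the linear-separability condition to obtain a more flexible classifier: the soft-margin SVM.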
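  8. Geometric Interpretation of the Soft-Margin SVM

    Tolerate a small number of misclassified training points, so that data that are not linearly separable can also be handled.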

  9. Hyperparameter Dependence of the Soft-Margin SVM

    The soft-margin SVM penalizes data that cross the separating hyperplane; a regularization parameter C controls the (relative) strength of that penalty.
    [Figure: Iris-Virginica vs. Iris-Versicolor, Petal width against Petal length; panels C=1 and C=100]
    A larger C strengthens the penalty term, making the classifier more sensitive to data that cross the separating hyperplane.
  10. The Trouble with the Linear SVM

    How should the separating hyperplane be drawn here?
    [Figure: scatter plot titled "R^2 data", both axes from -4 to 4]
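  11. The Trouble with the Linear SVM

    Idea: map the features into a higher-dimensional space and separate them there with a linear SVM.
    In this example the map $(x, y) \mapsto (x^2, y^2, xy)$ makes the data linearly separable.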
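  12. Kernel SVM

    In general we do not know which higher-dimensional map is appropriate, so the kernel trick is used to send the data into a high-dimensional feature space and classify there.
    [Figure: kernel SVM decision boundary; axes x1 and x2; panel title "=0.1, C=0.001"]
    The mathematics is postponed for now: with the "magic" of kernel methods, nonlinear relationships can be captured.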
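  13. SVR: Support Vector Regression

    Can SVMs also be used for regression problems? Yes: support vector regression.
    [Figure: two regression panels, y against x1; panel titles "=1.5" and "=0.5"]
    Only samples at least $\varepsilon$ away from the regression function $\hat{y}$ contribute to the loss (the red points in the figure).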
  14. SVR for Nonlinear Regression

    As in classification, kernelization makes the model nonlinear.
    [Figure: quadratic regression with SVR, y against x1; panel titles "degree=2, C=100, =0.1" and "degree=2, C=0.01, =0.1"]
  15. The Mathematics of SVMs

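  16. Formulation of the Hard-Margin SVM

    We return to linearly separable data.
    Binary labels: $y \in \{-1, 1\}$ (the Hands-On book uses $\{0, 1\}$).
    Discriminant function: $f(x) = w^\top x + b$, a linear classifier with parameters $w, b$ whose decision boundary is $f(x) = 0$.
    For reference, the general equation of a plane in $\mathbb{R}^3$ is $ax + by + cz + d = 0$.
    Goal: formulate margin maximization and reduce it to an optimization problem.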
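  17. High-School Math Review: Distance between a Point and a Plane

    The distance between a point $A(x_0, y_0, z_0)$ in $\mathbb{R}^3$ and the plane $P: ax + by + cz + d = 0$ is
    $$\frac{|ax_0 + by_0 + cz_0 + d|}{\sqrt{a^2 + b^2 + c^2}} \tag{1}$$
    Let $B(X, Y, Z)$ be the foot of the perpendicular from $A$ to $P$. Since $\overrightarrow{BA}$ is parallel to the normal $(a, b, c)^\top$ of $P$, it can be written as $t$ times the normal:
    $$(x_0 - X,\; y_0 - Y,\; z_0 - Z)^\top = t\,(a, b, c)^\top \tag{2}$$
    Taking the inner product of both sides with $(a, b, c)^\top$ and using $aX + bY + cZ = -d$ (because $B$ lies on $P$) gives $t = (ax_0 + by_0 + cz_0 + d)/(a^2 + b^2 + c^2)$; evaluating $\|t\,(a, b, c)^\top\|_2$ then yields (1).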
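  18. Distance between the Separating Hyperplane and the Data

    The distance $d_i$ between the separating hyperplane and $x_i$ is
    $$d_i = \frac{|w^\top x_i + b|}{\|w\|_2}$$
    Because $y_i \in \{-1, 1\}$ carries the sign of $f(x_i)$ for correctly classified data, this becomes
    $$d_i = \frac{y_i (w^\top x_i + b)}{\|w\|_2} \tag{3}$$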
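  19. Formulating Margin Maximization

    [Figure: maximizing the margin]
    1. Pick the $x_i$ with the smallest $d_i$.
    2. Maximize that $d_i$ with respect to $w, b$.
    Written as a formula:
    $$\max_{w,b} \Big\{ \min_i \, [d_i] \Big\} \tag{4}$$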
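  20. Simplifying the Margin-Maximization Problem

    Rewriting (4):
    $$\max_{w,b} \Big\{ \min_i [d_i] \Big\} = \max_{w,b} \Big\{ \min_i \Big[ \frac{y_i (w^\top x_i + b)}{\|w\|_2} \Big] \Big\} = \max_{w,b} \Big\{ \frac{1}{\|w\|_2} \min_i \big[ y_i (w^\top x_i + b) \big] \Big\} \tag{5}$$
    For any constant $c > 0$,
    $$d_i = \frac{y_i (w^\top x_i + b)}{\|w\|_2} = \frac{c\, y_i (w^\top x_i + b)}{c\, \|w\|_2} = \frac{y_i (c\, w^\top x_i + c\, b)}{\|c\, w\|_2}$$
    so multiplying $w, b$ by a positive constant leaves the distance between the hyperplane and the data unchanged. We may therefore rescale $w, b$ so that $y_i (w^\top x_i + b) = 1$ holds for the support vectors.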
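The scale invariance used on this slide can be checked numerically. The sketch below evaluates the distance of equation (3) before and after rescaling the parameters; the values of `w`, `b`, `x`, `y` and the factor `c` are arbitrary illustrative choices, not data from the slides.

```python
import math

def distance(w, b, x, y):
    """Signed distance y (w^T x + b) / ||w||_2 from equation (3)."""
    f = sum(wj * xj for wj, xj in zip(w, x)) + b
    return y * f / math.sqrt(sum(wj * wj for wj in w))

# Illustrative parameters and a single labelled point (assumed values).
w, b = [3.0, 4.0], -5.0
x, y = [3.0, 4.0], 1

c = 0.05  # any positive constant
d_original = distance(w, b, x, y)
d_scaled = distance([c * wj for wj in w], c * b, x, y)

# Rescale (w, b) so that y (w^T x + b) = 1 for this point.
s = 1.0 / (y * (sum(wj * xj for wj, xj in zip(w, x)) + b))
functional_margin = y * (sum(s * wj * xj for wj, xj in zip(w, x)) + s * b)
```

The distance is the same for both parameter settings, while the quantity $y (w^\top x + b)$ can be set to 1 by choosing the scale.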
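  21. Simplifying the Margin-Maximization Problem (continued)

    With this scaling, $\min_i \{ y_i (w^\top x_i + b) \} = 1$, so (5) becomes
    $$\max_{w,b} \Big\{ \frac{1}{\|w\|_2} \min_i \big[ y_i (w^\top x_i + b) \big] \Big\} = \max_{w,b} \frac{1}{\|w\|_2} \;\Leftrightarrow\; \min_{w,b} \frac{1}{2} \|w\|_2^2 \tag{6}$$
    to be solved under the inequality constraints that come from the scaling,
    $$y_i (w^\top x_i + b) \ge 1 \quad (i = 1, \dots, n) \tag{7}$$
    Converting to the Lagrange dual problem via the KKT conditions simplifies this further.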
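  22. Karush-Kuhn-Tucker Conditions

    $$\text{Minimize } f(x) \quad \text{subject to } g_k(x) \le 0 \quad (k = 1, \dots, K) \tag{8}$$
    Here $f, g_k$ are differentiable convex functions. For the Lagrange function
    $$L(x, \lambda) = f(x) + \sum_{k=1}^{K} \lambda_k g_k(x) \tag{9}$$
    the KKT conditions are
    $$\begin{cases} \nabla L(x^*, \lambda^*) = \nabla f(x^*) + \sum_{k=1}^{K} \lambda_k^* \nabla g_k(x^*) = 0 \\ \lambda_k^* \ge 0, \quad g_k(x^*) \le 0, \quad \lambda_k^* g_k(x^*) = 0 \quad (k = 1, \dots, K) \end{cases}$$
    The existence of $x^*, \lambda^*$ satisfying them is equivalent to $x^*$ being an optimal solution of the original problem (footnote 1). In particular, $\lambda_k^* g_k(x^*) = 0$ is called the complementarity (complementary slackness) condition.
    Footnote 1: Strictly speaking, a further assumption on the constraints is required (a constraint qualification); see [5] for a detailed discussion.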
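  23. Proof Concerning the KKT Conditions

    We prove only one direction: the KKT conditions imply that $x^*$ minimizes $f(x)$ under the constraints. Fix $\lambda^*$ and define
    $$h(x) := f(x) + \sum_{k=1}^{K} \lambda_k^* g_k(x) \tag{10}$$
    $h$ is a nonnegative combination of convex functions and hence convex, so it attains its minimum at the $x^*$ satisfying $\nabla h(x^*) = 0$. Therefore, for any $x$ satisfying the inequality constraints $g_k(x) \le 0$,
    $$f(x^*) + \sum_{k=1}^{K} \lambda_k^* g_k(x^*) \le f(x) + \sum_{k=1}^{K} \lambda_k^* g_k(x) \tag{11}$$
    • The sum on the left-hand side is 0 by the complementarity condition.
    • The sum on the right-hand side is at most 0 because $\lambda_k^* \ge 0$ and $g_k(x) \le 0$.
    Hence $f(x^*) \le f(x)$ always holds, and $x^*$ is optimal.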
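As a worked illustration of the KKT conditions, consider a one-dimensional toy problem chosen here purely for illustration (it does not appear in the slides): minimize $f(x) = x^2$ subject to $g(x) = 1 - x \le 0$.

```latex
\begin{aligned}
L(x, \lambda) &= x^2 + \lambda (1 - x), \\
\nabla L(x^*, \lambda^*) &= 2x^* - \lambda^* = 0, \\
\lambda^* (1 - x^*) &= 0, \qquad \lambda^* \ge 0, \qquad 1 - x^* \le 0.
\end{aligned}
% Case \lambda^* = 0: stationarity gives x^* = 0, which violates 1 - x^* \le 0.
% Case x^* = 1: stationarity gives \lambda^* = 2 \ge 0, and every condition holds.
% Hence x^* = 1 with f(x^*) = 1, the smallest value of x^2 on the feasible set x \ge 1.
```

The constraint is active at the optimum and its multiplier is strictly positive, which is the same situation the support vectors are in for the SVM problem.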
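  24. Conversion to the Lagrange Dual Problem

    The margin-maximization problem could be written as follows (note the sign of the constraint):
    $$\text{Minimize } \frac{1}{2} \|w\|_2^2 \quad \text{subject to } -\big( y_i (w^\top x_i + b) - 1 \big) \le 0 \quad (i = 1, \dots, n) \tag{12}$$
    Its Lagrange function $L(w, b, \lambda)$ is
    $$L(w, b, \lambda) = \frac{1}{2} \|w\|_2^2 - \sum_{i=1}^{n} \lambda_i \big( y_i (w^\top x_i + b) - 1 \big) \tag{13}$$
    Take its partial derivatives with respect to $w$ and $b$ and set them to zero.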
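  25. Conversion to the Lagrange Dual Problem (continued)

    $$L(w, b, \lambda) = \frac{1}{2} \|w\|_2^2 - \sum_{i=1}^{n} \lambda_i \big( y_i (w^\top x_i + b) - 1 \big)$$
    The partial derivative with respect to $w$ is
    $$\frac{\partial L}{\partial w}(w, b, \lambda) = w - \sum_{i=1}^{n} \lambda_i y_i x_i = 0 \quad \therefore \; w = \sum_{i=1}^{n} \lambda_i y_i x_i \tag{14}$$
    The partial derivative with respect to $b$ is
    $$\frac{\partial L}{\partial b}(w, b, \lambda) = -\sum_{i=1}^{n} \lambda_i y_i = 0 \quad \therefore \; \sum_{i=1}^{n} \lambda_i y_i = 0 \tag{15}$$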
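  26. Conversion to the Lagrange Dual Problem (continued)

    Substituting (14) and (15) into the Lagrange function (13):
    $$\begin{aligned}
    &\frac{1}{2} \|w\|_2^2 - \sum_i \lambda_i \big( y_i (w^\top x_i + b) - 1 \big) \\
    &= \frac{1}{2} \Big( \sum_i \lambda_i y_i x_i^\top \Big) \Big( \sum_j \lambda_j y_j x_j \Big) - \sum_i \lambda_i \Big( y_i \Big( \sum_j \lambda_j y_j x_j^\top x_i + b \Big) - 1 \Big) \\
    &= \frac{1}{2} \sum_i \sum_j \lambda_i \lambda_j y_i y_j x_i^\top x_j + \sum_i \lambda_i - \sum_i \lambda_i y_i \Big( x_i^\top \sum_j \lambda_j y_j x_j + b \Big) \\
    &= \frac{1}{2} \sum_i \sum_j \lambda_i \lambda_j y_i y_j x_i^\top x_j + \sum_i \lambda_i - \sum_i \sum_j \lambda_i \lambda_j y_i y_j x_i^\top x_j - b \sum_i \lambda_i y_i \\
    &= -\frac{1}{2} \sum_i \sum_j \lambda_i \lambda_j y_i y_j x_i^\top x_j + \sum_i \lambda_i
    \end{aligned} \tag{16}$$
    The Lagrange function $L(w, b, \lambda)$ has become a function of $\lambda$ alone.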
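  27. Summary of the Hard-Margin SVM

    The Lagrange dual problem of the hard-margin SVM: the margin-maximization problem for linearly separable training data reduces to the optimization problem
    $$\begin{aligned}
    \text{Maximize } & -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j y_i y_j x_i^\top x_j + \sum_{i=1}^{n} \lambda_i \\
    \text{subject to } & \lambda_i \ge 0 \quad (i = 1, \dots, n), \qquad \sum_{i=1}^{n} \lambda_i y_i = 0
    \end{aligned} \tag{17}$$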
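To make the dual problem concrete, the sketch below evaluates the objective of (17) on a two-point toy dataset and recovers $w$ through (14). The dataset and the multipliers `lam = [0.5, 0.5]` are assumptions for illustration: for this dataset the constraint forces $\lambda_1 = \lambda_2 = \lambda$, the objective becomes $-2\lambda^2 + 2\lambda$, and its maximizer $\lambda = 1/2$ was worked out by hand. The helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_objective(lam, X, y):
    """Objective of the hard-margin dual problem (17)."""
    n = len(X)
    quad = sum(lam[i] * lam[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return -0.5 * quad + sum(lam)

def recover_w(lam, X, y):
    """w = sum_i lam_i y_i x_i, equation (14)."""
    dim = len(X[0])
    return [sum(lam[i] * y[i] * X[i][k] for i in range(len(X))) for k in range(dim)]

# Toy data (assumed): one point per class, placed symmetrically about the origin.
X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1, -1]
lam = [0.5, 0.5]                 # hand-derived maximizer for this toy problem

value = dual_objective(lam, X, y)
w = recover_w(lam, X, y)
b = y[0] - dot(w, X[0])          # from y_0 (w^T x_0 + b) = 1 with y_0 = +1 or -1
primal_value = 0.5 * dot(w, w)   # objective of the primal problem (6)
```

In this example the dual and primal objective values coincide, and both points are support vectors with $y_i (w^\top x_i + b) = 1$.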
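  28. Data That Are Not Linearly Separable

    So far linear separability has been assumed, but in general data are not linearly separable.
    The separability constraint $y_i (w^\top x_i + b) \ge 1$ must somehow be relaxed:
    penalize misclassification while keeping the margin as large as possible.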
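  29. Formulation of the Soft-Margin SVM

    Define a suitable loss function $r(f(x_i), y_i)$ that assigns a penalty according to the position $f(x_i)$ of a data point relative to the separating hyperplane and its label $y_i$.
    Enlarging the margin was written as $\min_{w,b} \frac{1}{2} \|w\|_2^2$, so the soft-margin SVM adds the penalty and solves
    $$\min_{w,b} \Big\{ \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{n} r(f(x_i), y_i) \Big\} \tag{18}$$
    $C$ is a hyperparameter that controls the strength of the penalty.
    How should the loss function $r$ be chosen?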
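  30. Designing the Loss Function: the Problem with the Sign Function

    The first loss function that comes to mind uses the sign function:
    $$r_{\mathrm{misclass}}(f(x), y) = \frac{1 - \mathrm{sgn}[y f(x)]}{2} \tag{19}$$
    It outputs 0 for a correctly classified point and 1 otherwise (the misclassification loss).
    [Figure: "misclass loss function", r_misclass(f(x), y) against y f(x)]
    It is non-convex, which makes optimization difficult.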
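  31. Designing the Loss Function: Hinge Loss

    As a substitute, use the hinge function (footnote 2), which is convex:
    $$r_{\mathrm{hinge}}(f(x), y) = \max\{0,\; 1 - y f(x)\} \tag{20}$$
    [Figure: "hinge loss function", r(f(x), y) against y f(x); curves: misclass, hinge]
    • Only data located in the shaded region contribute (sparsity).
    • The function grows more slowly than squared error (robustness).
    Footnote 2: A method called $\nu$-SVM, which also optimizes the threshold of the hinge function, exists as well.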
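Equations (19) and (20) translate directly into code. The following sketch, with hypothetical function names, evaluates both losses so that their behaviour on either side of the margin can be compared.

```python
def misclass_loss(f_x, y):
    """Equation (19): (1 - sgn[y f(x)]) / 2."""
    m = y * f_x
    sgn = (m > 0) - (m < 0)      # sign of the margin as -1, 0, or 1
    return (1 - sgn) / 2

def hinge_loss(f_x, y):
    """Equation (20): max{0, 1 - y f(x)}."""
    return max(0.0, 1.0 - y * f_x)

# Three cases for a positive-class point (y = +1), with illustrative f(x) values.
confident = (misclass_loss(2.0, 1), hinge_loss(2.0, 1))      # outside the margin
inside = (misclass_loss(0.5, 1), hinge_loss(0.5, 1))         # correct, but inside the margin
wrong = (misclass_loss(-1.0, 1), hinge_loss(-1.0, 1))        # misclassified
```

The hinge loss is zero only when $y f(x) \ge 1$, so a correctly classified point inside the margin still contributes, and for a misclassified point the hinge value is at least the misclassification loss.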
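  32. Minimizing the Hinge Loss

    $$r_{\mathrm{hinge}}(f(x), y) = \max\{0,\; 1 - y f(x)\} = \max\{0,\; 1 - y (w^\top x + b)\}$$
    With the hinge function, the minimization problem over the parameters $w, b$ is
    $$\min_{w,b} \Big\{ \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{n} r_{\mathrm{hinge}}(f(x_i), y_i) \Big\} \tag{21}$$
    The expression inside the min is a sum of convex functions and so remains convex.
    However, $r_{\mathrm{hinge}}$ joins two linear pieces and requires a case analysis, so a more tractable form would be welcome.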
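  33. Introducing Slack Variables

    Define slack variables $\xi_i$ $(i = 1, \dots, n)$ by
    $$\xi_i := r_{\mathrm{hinge}}(f(x_i), y_i) \tag{22}$$
    [Figure: "hinge loss function", r_hinge(f(x), y) against y f(x); the shaded region is the range satisfying the constraints]
    $\xi_i$ is the smallest value satisfying the inequalities
    $$\begin{cases} \xi_i \ge 0 \\ \xi_i \ge 1 - y_i f(x_i) \end{cases}$$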
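  34. Rewriting the Objective with Slack Variables

    The original objective
    $$\min_{w,b} \Big\{ \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{n} r_{\mathrm{hinge}}(f(x_i), y_i) \Big\}$$
    can be rewritten, using the slack variables $\xi = (\xi_1, \dots, \xi_n)^\top$, as an optimization problem under inequality constraints:
    $$\begin{aligned}
    \text{Minimize } & \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{n} \xi_i \\
    \text{subject to } & \xi_i \ge 0, \qquad \xi_i \ge 1 - y_i f(x_i)
    \end{aligned} \tag{23}$$
    The Lagrange dual problem then reduces the number of parameters to optimize.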
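The link between the slack variables and the hinge loss can be sketched as follows: for fixed $w, b$, the smallest feasible $\xi_i$ in (23) is exactly the hinge loss, so plugging it in gives the value of objective (21). The data, parameters, and helper names below are illustrative assumptions.

```python
def f(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def smallest_slack(w, b, x, y):
    """Smallest xi with xi >= 0 and xi >= 1 - y f(x): the hinge loss."""
    return max(0.0, 1.0 - y * f(w, b, x))

def soft_margin_objective(w, b, X, y, C):
    """Value of (23) when every slack variable takes its smallest feasible value."""
    slacks = [smallest_slack(w, b, xi, yi) for xi, yi in zip(X, y)]
    return 0.5 * sum(wj * wj for wj in w) + C * sum(slacks), slacks

# Illustrative data: the last point sits on the wrong side of the boundary.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [0.5, 0.0]]
y = [1, 1, -1, -1]
w, b, C = [1.0, 0.0], 0.0, 1.0

objective, slacks = soft_margin_objective(w, b, X, y, C)
```

Points on or outside the margin get zero slack, the point inside the margin gets a slack below 1, and the misclassified point gets a slack above 1.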
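  35. The Lagrange Dual Problem of the Soft-Margin SVM

    With multipliers $\lambda = (\lambda_1, \dots, \lambda_n)^\top$ and $\mu = (\mu_1, \dots, \mu_n)^\top$, the Lagrange function is (note the signs)
    $$L(w, b, \xi, \lambda, \mu) = \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \lambda_i \xi_i - \sum_{i=1}^{n} \mu_i \big( \xi_i - 1 + y_i (w^\top x_i + b) \big) \tag{24}$$
    Solving $\partial L / \partial w = 0$, $\partial L / \partial b = 0$, $\partial L / \partial \xi_i = 0$ and simplifying gives
    $$\begin{aligned}
    \text{Maximize } & -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mu_i \mu_j y_i y_j x_i^\top x_j + \sum_{i=1}^{n} \mu_i \\
    \text{subject to } & 0 \le \mu_i \le C \quad (i = 1, \dots, n), \qquad \sum_{i=1}^{n} \mu_i y_i = 0
    \end{aligned} \tag{25}$$
    an optimization problem in $\mu$ alone (the derivation is in the appendix).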
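  36. Summary of the Soft-Margin SVM

    The Lagrange dual problem of the soft-margin SVM:
    $$\begin{aligned}
    \text{Maximize } & -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mu_i \mu_j y_i y_j x_i^\top x_j + \sum_{i=1}^{n} \mu_i \\
    \text{subject to } & 0 \le \mu_i \le C \quad (i = 1, \dots, n), \qquad \sum_{i=1}^{n} \mu_i y_i = 0
    \end{aligned} \tag{26}$$
    In fact, the only change from the hard-margin SVM is that $0 \le \lambda_i$ has become $0 \le \mu_i \le C$, so taking $C \to \infty$ recovers the hard-margin SVM.
    Since $C$ was the parameter controlling the strength of the loss $r$, this is intuitively natural.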
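  37. Limits of the Linear SVM

    A linear decision boundary is often inappropriate.
    [Figure: scatter plot titled "R^2 data", both axes from -4 to 4]
    Can the linear SVM be extended naturally so that it captures nonlinear relationships?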
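  38. An Idea for Nonlinear Decision Boundaries

    If the input $x$ is mapped well into a higher-dimensional space, a suitable linear decision boundary may exist in the target space.
    [Figure: left panel "unseparable 1D data" along x1; right panel "separable 2D mapped data", x2 against x1]
    The map $\mathbb{R} \ni x \mapsto (x, x^2)^\top \in \mathbb{R}^2$ makes the data linearly separable.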
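The effect of the map $x \mapsto (x, x^2)^\top$ can be sketched on a small made-up dataset in which the outer points belong to one class and the inner points to the other. The data values and the separating hyperplane $x_2 = 4$ (that is, `w = (0, 1)`, `b = -4`) are assumptions for illustration, not read off the figure.

```python
def phi(x):
    """Feature map R -> R^2 from the slide: x -> (x, x^2)."""
    return (x, x * x)

def separable_1d(xs, ys):
    """A linear classifier on the line is a threshold, so the labels, read in
    sorted order of x, may switch at most once."""
    labels = [label for _, label in sorted(zip(xs, ys))]
    switches = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return switches <= 1

# Made-up 1D data: outer points are class +1, inner points are class -1.
xs = [-4.0, -3.0, -1.0, 0.0, 1.0, 3.0, 4.0]
ys = [1 if abs(x) > 2 else -1 for x in xs]

# Candidate hyperplane in the mapped space: x2 - 4 = 0.
w, b = (0.0, 1.0), -4.0
margins = [y * (w[0] * phi(x)[0] + w[1] * phi(x)[1] + b) for x, y in zip(xs, ys)]

separable_before = separable_1d(xs, ys)
separable_after = all(m > 0 for m in margins)
```

No single threshold on the line separates the classes, but after the mapping every point lies strictly on its own side of the horizontal line $x_2 = 4$.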
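  39. Classification in a High-Dimensional Space

    In general it is unclear which map into a higher-dimensional space (the feature map) should be used, so for now simply write it as $\phi(x)$.
    The objective of the linear SVM was
    $$-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mu_i \mu_j y_i y_j x_i^\top x_j + \sum_{i=1}^{n} \mu_i$$
    so for classification in the space reached through $\phi$ (the feature space) it becomes
    $$-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mu_i \mu_j y_i y_j \phi(x_i)^\top \phi(x_j) + \sum_{i=1}^{n} \mu_i$$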
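  40. High-School Math Review: the Inner Product

    Inner product in Euclidean space: if $|a| \ne 0$ and $|b| \ne 0$, then
    $$a \cdot b = 0 \;\Leftrightarrow\; a \perp b$$
    The inner product can be viewed as a measure of similarity between vectors.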
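  41. The Kernel Trick

    So is the procedure the following?
    1. Choose a feature map $\phi$ for the data.
    2. Evaluate the inner products between data points in feature space.
    3. Draw the SVM decision boundary.
    SVM optimization needs only the inner products; the functional form of the feature map $\phi$ itself is never used explicitly. Hence, without specifying $\phi$, it suffices to define its inner product
    $$k(x, x') := \phi(x)^\top \phi(x')$$
    (the kernel function). This is called the kernel trick.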
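  42. The Lagrange Dual Problem of the Kernel SVM

    The kernel SVM procedure is therefore:
    1. Choose a kernel function $k(x, x') = \phi(x)^\top \phi(x')$.
    2. Solve the following maximization problem subject to $0 \le \mu_i \le C$ (the detailed argument is in the appendix):
    $$-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mu_i \mu_j y_i y_j k(x_i, x_j) + \sum_{i=1}^{n} \mu_i$$
    The discriminant hyperplane in feature space is (footnote 3)
    $$f(x_i) = w^\top \phi(x_i) \tag{27}$$
    so the primal problem had as many parameters as the feature space has dimensions; the dual reduces this to the number of data points $n$.
    Footnote 3: The bias term is ignored for simplicity.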
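The identity $k(x, x') = \phi(x)^\top \phi(x')$ can be verified numerically for one concrete pair. The sketch below assumes the degree-2 homogeneous polynomial kernel $k(x, x') = (x^\top x')^2$ on $\mathbb{R}^2$ and the explicit map $\phi(x) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2)$; the $\sqrt{2}$ factor is needed for exact agreement with this kernel and is an assumption of the example. The input vectors are arbitrary.

```python
import math

def phi(x):
    """Explicit feature map R^2 -> R^3 (assumed for this example)."""
    return [x[0] ** 2, x[1] ** 2, math.sqrt(2.0) * x[0] * x[1]]

def k(x, z):
    """Kernel evaluated in the input space: (x^T z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

x, z = [1.0, 2.0], [3.0, -1.0]

explicit = sum(a * b for a, b in zip(phi(x), phi(z)))  # inner product in feature space
implicit = k(x, z)                                     # same quantity without computing phi
```

Both routes give the same number, but the kernel route never constructs the three-dimensional feature vectors, which is what makes very high-dimensional feature spaces affordable.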
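  43. Choosing the Kernel Function

    So which kernel function $k(x, x')$ should be used? In most cases one uses a kernel whose properties are already well understood.
    Examples of kernel functions:
    • Gaussian kernel: $k(x, x') = \exp(-\beta \|x - x'\|_2^2)$ $(\beta > 0)$
    • Polynomial kernel: $k(x, x') = (c + x^\top x')^d$ $(c \ge 0,\; d \in \mathbb{N})$
    The Gaussian kernel corresponds to an infinite-dimensional feature map (!), the polynomial kernel to a feature map onto polynomial terms of degree up to $d$.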
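The two kernels on this slide can be written as plain functions. This is a minimal sketch with hypothetical function names; the default parameter values are arbitrary choices for illustration.

```python
import math

def gauss_kernel(x, z, beta=1.0):
    """Gaussian kernel exp(-beta * ||x - z||_2^2), beta > 0."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-beta * sq_dist)

def poly_kernel(x, z, c=1.0, d=2):
    """Polynomial kernel (c + x^T z)^d, c >= 0, d a natural number."""
    return (c + sum(a * b for a, b in zip(x, z))) ** d

def gram_matrix(kernel, X):
    """Matrix of kernel values between all pairs of data points."""
    return [[kernel(xi, xj) for xj in X] for xi in X]

X = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]   # illustrative inputs
K = gram_matrix(gauss_kernel, X)
```

The Gaussian kernel equals 1 for identical inputs and decays with distance, so it acts as a similarity score; the Gram matrix it produces is symmetric.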
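  44. Choosing the Kernel Function

    Example of the Gaussian kernel ($x, x' \in \mathbb{R}$; the horizontal axis is $x - x'$).
    [Figure: "Gauss kernel", similarity against x - x'; curves for beta = 1.0, beta = 0.3, beta = 0.2]
    The decision boundary of a kernel SVM depends strongly on the hyperparameters.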
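  45. Summary of the Kernel SVM

    The Lagrange dual problem of the kernel SVM:
    $$\begin{aligned}
    \text{Maximize } & -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mu_i \mu_j y_i y_j k(x_i, x_j) + \sum_{i=1}^{n} \mu_i \\
    \text{subject to } & 0 \le \mu_i \le C \quad (i = 1, \dots, n), \qquad \sum_{i=1}^{n} \mu_i y_i = 0
    \end{aligned} \tag{28}$$
    The decision boundary depends strongly on the choice of the kernel function $k(x, x')$ and its hyperparameters.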
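  46. Extension to Regression Problems

    Instead of squared error, use the $\varepsilon$-insensitive function $r_\varepsilon(z)$ as the loss for regression.
    [Figure: "epsilon-insensitive function", r(z) against z; curves: insensitive, square]
    $$r_\varepsilon(z) = \begin{cases} -z - \varepsilon & (z < -\varepsilon) \\ 0 & (-\varepsilon \le z < \varepsilon) \\ z - \varepsilon & (\varepsilon \le z) \end{cases}$$
    Regression with the $\varepsilon$-insensitive function is support vector regression (footnote 4).
    As with the soft-margin SVM, slack variables are introduced for the optimization.
    Footnote 4: As with the soft-margin SVM, a method called $\nu$-SVR, which also optimizes $\varepsilon$, exists as well.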
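The piecewise definition of $r_\varepsilon$ collapses to a single expression, $\max\{0, |z| - \varepsilon\}$, which the sketch below uses. The function name and the sample residuals are illustrative assumptions.

```python
def eps_insensitive(z, eps):
    """epsilon-insensitive loss: 0 inside the tube |z| <= eps, linear outside."""
    return max(0.0, abs(z) - eps)

eps = 0.5
residuals = [-2.0, -0.5, 0.0, 0.3, 2.0]           # illustrative values of y - f(x)
losses = [eps_insensitive(z, eps) for z in residuals]
squared = [z * z for z in residuals]               # squared error, for comparison
```

Residuals inside the tube cost nothing, and large residuals are penalized linearly rather than quadratically, which is where the sparsity and robustness of support vector regression come from.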
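  47. Slack Variables for Support Vector Regression

    Define slack variables $\xi_i$ $(i = 1, \dots, n)$ by
    $$\xi_i := r_\varepsilon(y_i - f(x_i)) \tag{29}$$
    [Figure: "epsilon-insensitive function", r_eps(y - f(x)) against y - f(x); the shaded region is the range satisfying the constraints]
    $\xi_i$ is the smallest value satisfying the inequalities
    $$\begin{cases} \xi_i \ge 0 \\ \xi_i \ge y_i - f(x_i) - \varepsilon \\ \xi_i \ge -(y_i - f(x_i)) - \varepsilon \end{cases}$$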
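  48. The Primal Problem of Support Vector Regression

    Fitting curves with support vector regression requires kernelization.
    Kernel methods are in general so expressive that they overfit easily, so a quadratic regularizer is applied.
    The minimization problem becomes
    $$\min_{w, \xi} \; \sum_{i=1}^{n} \xi_i + \frac{\lambda}{2} \|w\|_2^2 \;\Leftrightarrow\; \min_{w, \xi} \; C \sum_{i=1}^{n} \xi_i + \frac{1}{2} \|w\|_2^2 \tag{30}$$
    solved under the slack-variable constraints on $\xi_i$ given above, with $C = 1/\lambda$.
    This has almost the same form as the kernel SVM; only the constraints differ.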
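  49. The Lagrange Dual Problem of Support Vector Regression

    Converting to the Lagrange dual problem and simplifying, the task is to maximize
    $$-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\gamma_i^+ - \gamma_i^-)(\gamma_j^+ - \gamma_j^-) K_{ij} - \sum_{i=1}^{n} \gamma_i^- (y_i + \varepsilon) - \sum_{i=1}^{n} \gamma_i^+ (-y_i + \varepsilon) \tag{31}$$
    subject to the constraint
    $$0 \le \gamma_i^+ + \gamma_i^- \le C \tag{32}$$
    (footnote 5).
    Footnote 5: The derivation requires the Representer theorem and is given in the appendix.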
  50. The Degree of Support Vector Regression

    When the maximum degree of the regression function should be fixed, the polynomial kernel is useful.
    [Figure: "Polynomial Kernelized SVR"; curves: poly2, poly3]
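  51. References I

    [1] Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O'Reilly Media, 2017
    [2] C. M. Bishop, パターン認識と機械学習 下 ベイズ理論による統計的予測 (Pattern Recognition and Machine Learning, Vol. 2; Japanese translation by Hiroshi Motoda et al.), Maruzen Publishing, 2012
    [3] Shotaro Akaho, カーネル多変量解析 非線形データ解析の新しい展開 (Kernel Multivariate Analysis: New Developments in Nonlinear Data Analysis), Iwanami Shoten, 2008
    [4] Kenji Fukumizu, カーネル法入門 -正定値カーネルによるデータ解析- (Introduction to Kernel Methods: Data Analysis with Positive Definite Kernels), Asakura Shoten, 2010
    [5] Yoshihiro Kanno and Takashi Tsuchiya, 最適化と変分法 (Optimization and Variational Methods), Maruzen Publishing, 2014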
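  52. Derivation of the Lagrange Dual Problem of the Soft-Margin SVM (1)

    The Lagrange function $L(w, b, \xi, \lambda, \mu)$ is
    $$L(w, b, \xi, \lambda, \mu) = \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \lambda_i \xi_i - \sum_{i=1}^{n} \mu_i \big( \xi_i - 1 + y_i (w^\top x_i + b) \big) \tag{A.1}$$
    The KKT conditions are
    $$\begin{cases} \mu_i,\; \lambda_i,\; \xi_i \ge 0 \\ \xi_i - 1 + y_i (w^\top x_i + b) \ge 0 \\ \mu_i \big( \xi_i - 1 + y_i (w^\top x_i + b) \big) = 0 \end{cases} \tag{A.2}$$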
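  53. Derivation of the Lagrange Dual Problem of the Soft-Margin SVM (2)

    Take partial derivatives of the Lagrange function with respect to $w$, $b$, $\xi_i$:
    $$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{n} \mu_i y_i x_i = 0 \quad \therefore \; w = \sum_{i=1}^{n} \mu_i y_i x_i \tag{A.3}$$
    $$\frac{\partial L}{\partial b} = -\sum_{i=1}^{n} \mu_i y_i = 0 \tag{A.4}$$
    $$\frac{\partial L}{\partial \xi_i} = C - \lambda_i - \mu_i = 0 \tag{A.5}$$
    Substitute all of these into the Lagrange function (A.1).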
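  54. Derivation of the Lagrange Dual Problem of the Soft-Margin SVM (3)

    Using $\mu_i = C - \lambda_i$ from (A.5) in the $\xi_i$ terms,
    $$\begin{aligned}
    L &= \frac{1}{2} \Big( \sum_{i=1}^{n} \mu_i y_i x_i^\top \Big) \Big( \sum_{j=1}^{n} \mu_j y_j x_j \Big) + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \lambda_i \xi_i - \sum_{i=1}^{n} (C - \lambda_i) \xi_i - \sum_{i=1}^{n} \mu_i \Big( -1 + y_i \Big( \sum_{j=1}^{n} \mu_j y_j x_j^\top x_i + b \Big) \Big) \\
    &= \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mu_i \mu_j y_i y_j x_i^\top x_j + \sum_{i=1}^{n} \mu_i - \sum_{i=1}^{n} \mu_i y_i \Big( \sum_{j=1}^{n} \mu_j y_j x_j^\top x_i + b \Big)
    \end{aligned}$$
    Since $\sum_i \mu_i y_i = 0$ by (A.4),
    $$L = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mu_i \mu_j y_i y_j x_i^\top x_j + \sum_{i=1}^{n} \mu_i \tag{A.6}$$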
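  55. Derivation of the Lagrange Dual Problem of the Soft-Margin SVM (4)

    From (A.2), $\mu_i, \lambda_i \ge 0$; from (A.5), $\lambda_i + \mu_i = C$.
    To satisfy both, $\mu_i$ and $\lambda_i$ must each be at most $C$.
    Adding the condition (A.4) as well, we obtain
    $$\begin{aligned}
    \text{Maximize } & -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mu_i \mu_j y_i y_j x_i^\top x_j + \sum_{i=1}^{n} \mu_i \\
    \text{subject to } & 0 \le \mu_i \le C \quad (i = 1, \dots, n), \qquad \sum_{i=1}^{n} \mu_i y_i = 0
    \end{aligned} \tag{A.7}$$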
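  56. Formulation of the Kernel SVM

    The primal problem for optimizing the parameters of the kernel SVM is
    $$\min_{w} \Big\{ \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{n} r_{\mathrm{hinge}}(f(x_i), y_i) \Big\} \tag{A.8}$$
    $$f(x_i) = w^\top \phi(x_i) \tag{A.9}$$
    As for the soft-margin SVM, its Lagrange function with slack variables $\xi_i$ is
    $$L(w, \xi, \lambda, \mu) = \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \lambda_i \xi_i - \sum_{i=1}^{n} \mu_i \big( \xi_i - 1 + y_i\, w^\top \phi(x_i) \big) \tag{A.10}$$
    The Representer theorem is used to reduce the high-dimensional $w$ to $n$ dimensions.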
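  57. The Representer Theorem

    Representer theorem: in an optimization problem whose loss function has parameter $w$, depends on $w$ only through the inner products $w^\top \phi(x_i)$, and is regularized by $\|w\|_2^2$, the optimal solution $\hat{w}$ can be written with $\alpha = (\alpha_1, \dots, \alpha_n)^\top$ in the form
    $$\hat{w} = \sum_{i=1}^{n} \alpha_i \phi(x_i) \tag{A.11}$$
    Here $d$ is the dimension of the feature space and $n$ is the number of data points.
    In particular, when $n < d$, the number of parameters to optimize drops from $d$ to $n$.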
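  58. The Representer Theorem (Proof)

    Write a linear combination of the feature vectors of the data $(x_1, \dots, x_n)^\top$ as
    $$w_0 = \sum_{i=1}^{n} \alpha_i \phi(x_i) \tag{A.12}$$
    Any $w \in \mathbb{R}^d$ can be written as
    $$w = w_0 + \xi \tag{A.13}$$
    with such a $w_0$ and a vector $\xi \in \mathbb{R}^d$ orthogonal to every $\phi(x_i)$.
    By the orthogonality $\xi^\top \phi(x_j) = 0$, the inner product of $w$ with $\phi(x_j)$ is
    $$w^\top \phi(x_j) = w_0^\top \phi(x_j) \tag{A.14}$$
    which does not depend on $\xi$.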
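  59. The Representer Theorem (Proof, continued)

    Decomposing the regularization term $\|w\|_2^2$ into $w_0$ and $\xi$ gives
    $$\|w\|_2^2 = \|w_0\|_2^2 + 2 w_0^\top \xi + \|\xi\|_2^2 = \|w_0\|_2^2 + \|\xi\|_2^2 \tag{A.15}$$
    which is smallest when $\xi = 0$.
    Hence the $\hat{w}$ minimizing the sum of the loss and the regularization term is
    $$\hat{w} = w_0 = \sum_{i=1}^{n} \alpha_i \phi(x_i) \tag{A.16}$$
    a linear combination of the data mapped into feature space.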
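The two facts used in the proof, (A.14) and (A.15), can be checked on a tiny example. The feature vectors, coefficients, and orthogonal component below are made-up values with $n = 2$ data points in a $d = 3$ dimensional feature space.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Assumed feature vectors phi(x_1), phi(x_2) in R^3 and coefficients alpha.
phis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
alpha = [2.0, -1.0]

# w0 lies in the span of the feature vectors, as in (A.12).
w0 = [sum(alpha[i] * phis[i][k] for i in range(len(phis))) for k in range(3)]

# xi is orthogonal to every feature vector; w = w0 + xi as in (A.13).
xi = [0.0, 0.0, 3.0]
w = [a + b for a, b in zip(w0, xi)]

same_outputs = all(dot(w, p) == dot(w0, p) for p in phis)   # (A.14)
norm_gap = dot(w, w) - dot(w0, w0)                          # (A.15): equals ||xi||^2
```

The orthogonal component changes none of the model outputs on the data but strictly increases the regularization term, so an optimal solution never contains one.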
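  60. Rewriting the Lagrange Function with the Representer Theorem

    Following the Representer theorem, set $w = \sum_{i=1}^{n} \alpha_i \phi(x_i)$. The Lagrange function of the kernel SVM can then be written as
    $$\begin{aligned}
    &\frac{1}{2} \alpha^\top K \alpha + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \lambda_i \xi_i - \sum_{i=1}^{n} \mu_i \Big( \xi_i - 1 + y_i \Big( \sum_{j=1}^{n} \alpha_j \phi(x_j)^\top \phi(x_i) \Big) \Big) \\
    &= \frac{1}{2} \alpha^\top K \alpha + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \lambda_i \xi_i - \sum_{i=1}^{n} \mu_i \Big( \xi_i - 1 + y_i \Big( \sum_{j=1}^{n} \alpha_j K_{ij} \Big) \Big)
    \end{aligned}$$
    where $K$ is the Gram matrix
    $$K = \begin{pmatrix} k(x_1, x_1) & k(x_2, x_1) & \cdots & k(x_n, x_1) \\ k(x_1, x_2) & k(x_2, x_2) & \cdots & k(x_n, x_2) \\ \vdots & \vdots & \ddots & \vdots \\ k(x_1, x_n) & k(x_2, x_n) & \cdots & k(x_n, x_n) \end{pmatrix}$$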
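  61. The Lagrange Dual Problem of the Kernel SVM

    Simplifying the Lagrange function with the KKT conditions as before, the Lagrange dual problem of the kernel SVM is
    $$\begin{aligned}
    \text{Maximize } & -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mu_i \mu_j y_i y_j \phi(x_i)^\top \phi(x_j) + \sum_{i=1}^{n} \mu_i \\
    \text{subject to } & 0 \le \mu_i \le C \quad (i = 1, \dots, n), \qquad \sum_{i=1}^{n} \mu_i y_i = 0
    \end{aligned} \tag{A.17}$$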
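  62. Derivation of the Lagrange Dual Problem of Support Vector Regression (1)

    The primal problem of support vector regression is to solve
    $$\min_{w, \xi} \; C \sum_{i=1}^{n} \xi_i + \frac{1}{2} \|w\|_2^2 \tag{A.18}$$
    under the constraints
    $$\begin{cases} \xi_i \ge 0 \\ \xi_i \ge y_i - f(x_i) - \varepsilon \\ \xi_i \ge -(y_i - f(x_i)) - \varepsilon \end{cases} \tag{A.19}$$
    By the Representer theorem and the Gram matrix $K$, the objective and $f(x)$ become
    $$\min_{\alpha, \xi} \; C \sum_{i=1}^{n} \xi_i + \frac{1}{2} \alpha^\top K \alpha, \qquad f(x) = \sum_{i=1}^{n} \alpha_i k(x_i, x) \tag{A.20}$$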
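  63. Derivation of the Lagrange Dual Problem of Support Vector Regression (2)

    Introducing the multipliers $\beta, \gamma^+, \gamma^-$, the corresponding Lagrange function is
    $$\begin{aligned}
    L(\xi, \alpha, \beta, \gamma^+, \gamma^-) = \; & C \sum_{i=1}^{n} \xi_i + \frac{1}{2} \alpha^\top K \alpha - \sum_{i=1}^{n} \beta_i \xi_i \\
    & - \sum_{i=1}^{n} \gamma_i^+ \Big( \xi_i + \varepsilon - y_i + \sum_{j=1}^{n} \alpha_j K_{ij} \Big) - \sum_{i=1}^{n} \gamma_i^- \Big( \xi_i + \varepsilon + y_i - \sum_{j=1}^{n} \alpha_j K_{ij} \Big)
    \end{aligned} \tag{A.21}$$
    Setting the partial derivative with respect to $\xi_i$ to zero gives
    $$\frac{\partial L}{\partial \xi_i} = C - \beta_i - \gamma_i^+ - \gamma_i^- = 0 \tag{A.22}$$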
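  64. Derivation of the Lagrange Dual Problem of Support Vector Regression (3)

    Setting the partial derivative with respect to $\alpha_i$ to zero as well gives
    $$\frac{\partial L}{\partial \alpha_i} = \sum_{j=1}^{n} \alpha_j K_{ij} - \sum_{j=1}^{n} \gamma_j^+ K_{ij} + \sum_{j=1}^{n} \gamma_j^- K_{ij} = \sum_{j=1}^{n} (\alpha_j - \gamma_j^+ + \gamma_j^-) K_{ij} = 0 \tag{A.23}$$
    Collecting these for $i = 1, \dots, n$:
    $$\hat{\gamma}^\top K = 0, \qquad \hat{\gamma} = (\hat{\gamma}_1, \dots, \hat{\gamma}_n)^\top, \qquad \hat{\gamma}_j = \alpha_j - \gamma_j^+ + \gamma_j^- \tag{A.24}$$
    Multiplying both sides of $\hat{\gamma}^\top K = 0$ by $K^{-1}$ (footnote 6) and reading off each component again yields
    $$\alpha_i = \gamma_i^+ - \gamma_i^- \tag{A.25}$$
    Footnote 6: In general, a Gram matrix defined through inner products of feature vectors is positive definite in the kernel-methods sense (positive semidefinite); the step above additionally assumes that $K$ is invertible.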
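  65. Derivation of the Lagrange Dual Problem of Support Vector Regression (4)

    Substituting (A.22) and (A.25) into the Lagrange function finally gives
    $$\max_{\gamma^+, \gamma^-} \; -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\gamma_i^+ - \gamma_i^-)(\gamma_j^+ - \gamma_j^-) K_{ij} - \sum_{i=1}^{n} \gamma_i^- (y_i + \varepsilon) - \sum_{i=1}^{n} \gamma_i^+ (-y_i + \varepsilon) \tag{A.26}$$
    In addition, the KKT conditions $\beta_i, \gamma_i^+, \gamma_i^- \ge 0$ together with (A.22) give the constraint
    $$0 \le \gamma_i^+ + \gamma_i^- \le C \tag{A.27}$$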