
Why Is Target Encoding Effective?

Shuhei Goda
November 30, 2019

Transcript

  1. ©2019 Wantedly, Inc. Self-Introduction
     • Shuhei Goda (合田 周平)
     • Wantedly, Inc. (since Sep 2019)
     • Recommendation Team https://www.wantedly.com/projects/375150
     • Kaggle Master
     • On Twitter under the name hakubishin: @jy_msc
     We are hiring!
  2. About Talk
     • Why is Target Encoding effective?
     • It is one of the standard techniques on Kaggle.
     • In some cases Target Encoding works better than Label Encoding.
     • There are few resources that explain why Target Encoding gives good results.
     • This talk presents my own interpretation of why Target Encoding is effective.
  3. Sample data used
     • The explanation uses the following data:
       • The target variable y is continuous.
       • The explanatory variable x is a categorical variable with 4 levels, x = {A, B, C, D}.
       • E[y|x=A]=60, E[y|x=B]=20, E[y|x=C]=50, E[y|x=D]=10
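The sample data on this slide can be sketched as follows. The slide only specifies the four levels and their conditional means; the number of samples per category (20), the Gaussian noise with standard deviation 5, and the seed are assumptions made for illustration.

```python
import random

# Conditional means E[y | x] taken from the slide.
CONDITIONAL_MEANS = {"A": 60.0, "B": 20.0, "C": 50.0, "D": 10.0}


def make_sample_data(n_per_category=20, noise_sd=5.0, seed=0):
    """Return a list of (x, y) pairs with y = E[y | x] + Gaussian noise.

    n_per_category, noise_sd and seed are illustrative assumptions,
    not values given in the slides.
    """
    rng = random.Random(seed)
    data = []
    for category, mean in CONDITIONAL_MEANS.items():
        for _ in range(n_per_category):
            data.append((category, rng.gauss(mean, noise_sd)))
    return data


def category_means(data):
    """Compute the empirical mean of y for each category."""
    sums, counts = {}, {}
    for x, y in data:
        sums[x] = sums.get(x, 0.0) + y
        counts[x] = counts.get(x, 0) + 1
    return {x: sums[x] / counts[x] for x in sums}


data = make_sample_data()
print(category_means(data))  # empirical means, close to 60, 20, 50, 10
```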
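  4. GBDT review
     Dataset: $D = \{(x_i, y_i)\}_{i=1}^{n} \quad (x_i \in \mathbb{R}^m,\ y_i \in \mathbb{R})$
     Additive model: $\hat{y}_i = \sum_{m=1}^{M} f_m(x_i) = \sum_{m=1}^{M} w_m(x_i)$
     Loss function: $L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{m=1}^{M} \Omega(f_m), \quad \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$
     Here $w_m(x)$ is the weight of the leaf of the m-th tree that $x$ falls into, $T$ is the number of leaves in a tree, and $M$ is the number of trees.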
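  5. GBDT review
     Loss function when fitting the m-th tree (second-order approximation, dropping the term that does not depend on $f_m$):
     $L^{(m)} = \sum_{i=1}^{n} l(y_i, \hat{y}_i^{(m-1)} + f_m(x_i)) + \Omega(f_m)$
     $\simeq \sum_{i=1}^{n} \left[ g_i f_m(x_i) + \frac{1}{2} h_i f_m(x_i)^2 \right] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$
     $= \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T$
     $I_j$ is the set of data points assigned to the j-th leaf.
     $g_i, h_i$ are the first and second derivatives of the loss with respect to the prediction made by the first m−1 trees.
     gradient: $g_i = \frac{\partial l(y_i, \hat{y}_i^{(m-1)})}{\partial \hat{y}_i^{(m-1)}}$, hessian: $h_i = \frac{\partial^2 l(y_i, \hat{y}_i^{(m-1)})}{(\partial \hat{y}_i^{(m-1)})^2}$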
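  6. GBDT review
     Once the samples are assigned to leaves, the optimal weight of leaf j is
     $w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$
     and the loss at that weight is
     $L^{(m)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left( \sum_{i \in I_j} g_i \right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$
     The best split is searched for node by node, by looking at how much each split reduces the loss.
     gain: $L_{bef} - (L_{af,left} + L_{af,right})$
     [Figure: a node {A, B, C, D} with loss $L_{bef}$ is split into two child nodes with losses $L_{af,left}$ and $L_{af,right}$.]
     The larger the gain (the difference in loss before and after the split), the better the split.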
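  7. GBDT review: when the loss function is MSE
     Loss function: $l(y_i, \hat{y}_i) = \frac{1}{2} (y_i - \hat{y}_i)^2$
     gradient: $g_i = \frac{\partial l(y_i, \hat{y}_i^{(m-1)})}{\partial \hat{y}_i^{(m-1)}} = \hat{y}_i^{(m-1)} - y_i$
     hessian: $h_i = \frac{\partial^2 l(y_i, \hat{y}_i^{(m-1)})}{(\partial \hat{y}_i^{(m-1)})^2} = 1$
     Hence, with $\lambda = 0$, the weight of leaf j is the mean residual of the samples assigned to leaf j:
     $w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i} = -\frac{\sum_{i \in I_j} (\hat{y}_i^{(m-1)} - y_i)}{\sum_{i \in I_j} 1} = \frac{\sum_{i \in I_j} (y_i - \hat{y}_i^{(m-1)})}{|I_j|}$
     The numerator is the sum of the residuals (true value − prediction as of tree m−1); the denominator is the number of samples.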
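The closed-form leaf weight, leaf loss, and split gain from the review slides can be checked numerically. The sketch below is plain Python, not any library's implementation; the function names and the toy values are assumptions for illustration.

```python
def leaf_weight(grads, hessians, lam=0.0):
    """Optimal leaf weight w* = -sum(g) / (sum(h) + lambda)."""
    return -sum(grads) / (sum(hessians) + lam)


def leaf_loss(grads, hessians, lam=0.0, gamma=0.0):
    """Loss of one leaf at its optimal weight: -1/2 * G^2 / (H + lambda) + gamma."""
    return -0.5 * sum(grads) ** 2 / (sum(hessians) + lam) + gamma


def split_gain(g_left, h_left, g_right, h_right, lam=0.0, gamma=0.0):
    """Loss before the split minus the summed loss of the two children."""
    before = leaf_loss(g_left + g_right, h_left + h_right, lam, gamma)
    after = (leaf_loss(g_left, h_left, lam, gamma)
             + leaf_loss(g_right, h_right, lam, gamma))
    return before - after


def mse_grad_hess(y_true, y_pred):
    """Gradient (pred - true) and Hessian (1) of l = 1/2 * (y - y_hat)^2."""
    grads = [p - t for t, p in zip(y_true, y_pred)]
    hessians = [1.0] * len(y_true)
    return grads, hessians


# Toy values (assumed): three samples in one leaf, previous prediction 0.
y_true = [60.0, 20.0, 10.0]
y_pred = [0.0, 0.0, 0.0]
g, h = mse_grad_hess(y_true, y_pred)

mean_residual = sum(t - p for t, p in zip(y_true, y_pred)) / len(y_true)
print(leaf_weight(g, h), mean_residual)  # 30.0 30.0 (they coincide when lambda = 0)

# Gain of separating the first sample from the other two.
print(split_gain(g[:1], h[:1], g[1:], h[1:]))  # 675.0
```

With a nonzero `lam`, the weight shrinks toward zero and no longer equals the plain mean residual.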
  8. GBDT settings
     • Consider a simple model.
       • loss_func = 'MSE'
       • eta = 1 → step size
       • iteration = 1 → consider only the first tree
       • tree_method = 'exact' → brute-force exhaustive search
       • base_score = 0 → the initial prediction starts at 0
       • lambda = 0
       • gamma = 0
  9. Using Label Encoding (depth=1)
     Root {A, B, C, D}: L1 = −48797. Candidate splits, each with optimal weights w*_left and w*_right:
     • {A} | {B, C, D}: L2,left = −35522, L2,right = −21391, L2 = −56913
     • {A, B} | {C, D}: L2,left = −31832, L2,right = −17951, L2 = −49783
     • {A, B, C} | {D}: L2,left = −56097, L2,right = −996, L2 = −57093
  10. Using Label Encoding (depth=1)
     Same candidates as slide 9. The split {A, B, C} | {D} (L2 = −57093) is marked "Splitting here looks best".
  11. Using Label Encoding (depth=2)
     Tree so far: {A, B, C, D} (L1 = −48797) → {A, B, C} (L2,left = −56097) | {D} (L2,right = −996).
     Node {A, B, C}: L2 = −56097. Candidate splits, each with optimal weights w*_left and w*_right:
     • {A} | {B, C}: L3,left = −35522, L3,right = −24589, L3 = −60111
     • {A, B} | {C}: L3,left = −31832, L3,right = −24937, L3 = −56769
  12. Using Label Encoding (depth=2)
     Same candidates as slide 11. The split {A} | {B, C} (L3 = −60111) is marked "Splitting here looks best".
  13. Using Label Encoding (depth=3)
     Tree so far: {A, B, C, D} (L1 = −48797) → {A, B, C} (L2,left = −56097) | {D} (L2,right = −996); {A, B, C} → {A} (L3,left = −35522) | {B, C} (L3,right = −24589).
     Node {B, C}: L3 = −24589. Only one candidate split remains, with optimal weights w*_left and w*_right:
     • {B} | {C}: L4,left = −4076, L4,right = −24937, L4 = −29013
  14. Using Label Encoding (depth=3)
     Same candidate as slide 13. The split {B} | {C} (L4 = −29013) is marked "Splitting here looks best".
  15. Using Label Encoding (depth=3)
     Final tree:
     • {A, B, C, D} (L1 = −48797) → {A, B, C} (L2,left = −56097) | {D} (L2,right = −996)
     • {A, B, C} → {A} (L3,left = −35522) | {B, C} (L3,right = −24589)
     • {B, C} → {B} (L4,left = −4076) | {C} (L4,right = −24937)
     Splitting finished.
  16. Using Target Encoding (depth=1)
     Root {A, B, C, D}: L1 = −48797. The categories are now ordered by target mean (D, B, C, A). Candidate splits, each with optimal weights w*_left and w*_right:
     • {D} | {B, C, A}: L2,left = −996, L2,right = −56097, L2 = −57093
     • {D, B} | {C, A}: L2,left = −4551, L2,right = −59992, L2 = −64543
     • {D, B, C} | {A}: L2,left = −21391, L2,right = −35522, L2 = −56913
  17. Using Target Encoding (depth=1)
     Same candidates as slide 16. The split {D, B} | {C, A} (L2 = −64543) is marked "Splitting here looks best".
  18. Using Target Encoding (depth=2)
     Tree so far: {A, B, C, D} (L1 = −48797) → {B, D} (L2,left = −4551) | {A, C} (L2,right = −59992).
     Each child has a single candidate split, with optimal weights w*_left and w*_right:
     • Node {B, D} (L2,left = −4551): {D} | {B}: L3,left = −996, L3,right = −4076, L3 = −5072
     • Node {A, C} (L2,right = −59992): {C} | {A}: L′3,left = −24937, L′3,right = −35522, L3 = −60459
  19. Using Target Encoding (depth=2)
     Same candidates as slide 18. Both splits are marked "Splitting here looks best".
  20. Using Target Encoding (depth=2)
     Final tree:
     • {A, B, C, D} (L1 = −48797) → {B, D} (L2,left = −4551) | {A, C} (L2,right = −59992)
     • {B, D} → {D} (L3,left = −996) | {B} (L3,right = −4076)
     • {A, C} → {C} (L′3,left = −24937) | {A} (L′3,right = −35522)
     Splitting finished.
  21. Comparison of Label Encoding and Target Encoding
     Tree structure built with Label Encoding (depth 3):
     • {A, B, C, D} (L1 = −48797) → {A, B, C} (L2,left = −56097) | {D} (L2,right = −996)
     • {A, B, C} → {A} (L3,left = −35522) | {B, C} (L3,right = −24589)
     • {B, C} → {B} (L4,left = −4076) | {C} (L4,right = −24937)
     Tree structure built with Target Encoding (depth 2):
     • {A, B, C, D} (L1 = −48797) → {B, D} (L2,left = −4551) | {A, C} (L2,right = −59992)
     • {B, D} → {D} (L3,left = −996) | {B} (L3,right = −4076)
     • {A, C} → {C} (L′3,left = −24937) | {A} (L′3,right = −35522)
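The depth difference between the two trees can be reproduced with a small greedy split search. This is a sketch under stated assumptions, not the author's code: it uses 20 noise-free samples per category (an assumed count), MSE loss, base_score = 0, lambda = 0 and gamma = 0, so the losses only approximate the slide values (for example −49000 instead of −48797). The helper names are hypothetical.

```python
# Conditional means from the sample data; the sample count is an assumption.
MEANS = {"A": 60.0, "B": 20.0, "C": 50.0, "D": 10.0}
N_PER_CATEGORY = 20


def leaf_loss(categories):
    """Leaf loss -1/2 * G^2 / H for the first tree (MSE, lambda = 0).

    With base_score = 0, g_i = -y_i and h_i = 1, so G = -sum(y) and H = n.
    """
    n = N_PER_CATEGORY * len(categories)
    total_y = sum(MEANS[c] * N_PER_CATEGORY for c in categories)
    return -0.5 * total_y ** 2 / n


def best_split(ordered):
    """Return (loss, left, right) of the best threshold split.

    A tree can only cut an encoded feature at a threshold, so the
    candidates are the contiguous cuts of the encoded order.
    """
    candidates = []
    for k in range(1, len(ordered)):
        left, right = ordered[:k], ordered[k:]
        candidates.append((leaf_loss(left) + leaf_loss(right), left, right))
    return min(candidates, key=lambda c: c[0])


def depth_to_isolate(ordered):
    """Depth of the greedy tree that puts every category in its own leaf."""
    if len(ordered) == 1:
        return 0
    _, left, right = best_split(ordered)
    return 1 + max(depth_to_isolate(left), depth_to_isolate(right))


label_order = ["A", "B", "C", "D"]           # order induced by Label Encoding
target_order = sorted(MEANS, key=MEANS.get)  # D, B, C, A: order of target means

print(leaf_loss(label_order))          # -49000.0 (root loss)
print(best_split(target_order)[0])     # -65000.0 ({D, B} | {C, A})
print(depth_to_isolate(label_order))   # 3
print(depth_to_isolate(target_order))  # 2
```

With noise-free data the first label-encoded split has two tied candidates ({A} | {B, C, D} and {A, B, C} | {D}); either choice leads to a tree of depth 3.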
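  22. Won't the model sort things out by itself if we increase the depth / number of iterations?
     • Information that is known to be clearly useful is better passed to the model explicitly.
     • Label Encoding may still get there, but the model tends to become complex. The more levels the variable has, the less realistic that becomes.
     • In my experience, anything known to clearly help is better handled before training.
     • The same argument applies to interactions between numerical features.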
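  23. Summary
     • Target Encoding makes the model simpler.
     • With MSE loss, in the early iterations, sorting the categories by the size of their residuals enables efficient splits.
     • The more levels the variable has, the larger the effect of Target Encoding.
     • Doing the equivalent of Target Encoding with Label Encoding requires a certain depth, which becomes less realistic as the number of levels grows.
     • The model may be able to do this implicitly without Target Encoding, but anything known to clearly help is better handled before the data goes into the model.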
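As a minimal sketch of the two encodings compared in this deck, the snippet below maps each category to the mean of the target (Target Encoding) and to an integer in alphabetical order (Label Encoding). The toy rows and the function names are hypothetical; the rows are chosen so that the category means match the sample data (60, 20, 50, 10).

```python
from collections import defaultdict


def fit_target_encoding(categories, targets):
    """Map each category to the mean of the target over its rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}


def fit_label_encoding(categories):
    """Map each category to an integer in sorted (alphabetical) order."""
    return {c: i for i, c in enumerate(sorted(set(categories)))}


# Hypothetical toy rows, two per category.
x = ["A", "A", "B", "B", "C", "C", "D", "D"]
y = [62.0, 58.0, 18.0, 22.0, 51.0, 49.0, 9.0, 11.0]

target_map = fit_target_encoding(x, y)
label_map = fit_label_encoding(x)

print(target_map)  # {'A': 60.0, 'B': 20.0, 'C': 50.0, 'D': 10.0}
print(sorted(label_map, key=label_map.get))    # ['A', 'B', 'C', 'D']
print(sorted(target_map, key=target_map.get))  # ['D', 'B', 'C', 'A']
```

The encoded order is what a threshold split sees: Label Encoding keeps the arbitrary order A, B, C, D, while Target Encoding places categories with similar target means next to each other.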