Why Is Target Encoding Effective? (Target Encoding はなぜ有効なのか)
Shuhei Goda
November 30, 2019
Presented at the Data Analysis Competition LT Meetup (分析コンペLT会)
https://kaggle-friends.connpass.com/event/154881/
Transcript
©2019 Wantedly, Inc.
Why Is Target Encoding Effective?
Data Analysis Competition LT Meetup, Nov 30, 2019
Shuhei Goda (@jy_msc)

Self-Introduction
• Shuhei Goda (合田 周平)
• Wantedly, Inc. (since Sep 2019), Recommendation Team
• Kaggle Master
• On Twitter under the name hakubishin: @jy_msc
• We are hiring! https://www.wantedly.com/projects/375150

About This Talk
• Why is Target Encoding effective?
• It is one of the standard techniques on Kaggle.
• In some cases Target Encoding works better than Label Encoding.
• Few materials explain why Target Encoding produces good results.
• This talk presents my own interpretation of why it is effective.
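
What Is Target Encoding?
• A technique that converts a categorical variable into numeric values using the target variable.
• Each level of the categorical variable is replaced by the expected value of the target at that level.
• In general, the more levels the variable has, the larger the expected benefit.
For the caveats and implementation details of Target Encoding, please refer to the Kaggle book (Kaggle本).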
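As a minimal sketch of the replacement described above, the following computes the per-level mean of the target and substitutes it for each category. The function name and the toy values are hypothetical, and the sketch uses plain in-sample means; it deliberately omits the leakage precautions (such as out-of-fold computation) that the slide defers to the Kaggle book.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Replace each category with the mean target observed for that category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    level_means = {c: sums[c] / counts[c] for c in sums}
    return [level_means[c] for c in categories], level_means


# Hypothetical toy data
x = ["A", "B", "A", "C", "B", "C"]
y = [62.0, 18.0, 58.0, 50.0, 22.0, 50.0]

encoded, level_means = target_encode(x, y)
print(level_means)  # {'A': 60.0, 'B': 20.0, 'C': 50.0}
print(encoded)      # [60.0, 20.0, 60.0, 50.0, 20.0, 50.0]
```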

Why Is It Effective?
• It has the effect of simplifying the model.
  • From here on, GBDT is used as the example.

Sample Data
• The explanation uses the following data:
  • Target variable y: continuous
  • Explanatory variable x: a categorical variable with 4 levels, x ∈ {A, B, C, D}
    • E[y|x=A]=60, E[y|x=B]=20, E[y|x=C]=50, E[y|x=D]=10
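Data of this shape could be generated along the following lines. Only the four conditional means come from the slide; the number of samples per level, the Gaussian noise, its standard deviation, and the seed are all assumptions made for illustration.

```python
import random

LEVEL_MEANS = {"A": 60, "B": 20, "C": 50, "D": 10}  # E[y|x] from the slide


def make_sample_data(n_per_level=20, noise_sd=5.0, seed=0):
    """Draw y ~ Normal(E[y|x], noise_sd) for every level (all arguments are assumptions)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for level, mean in LEVEL_MEANS.items():
        for _ in range(n_per_level):
            xs.append(level)
            ys.append(rng.gauss(mean, noise_sd))
    return xs, ys


xs, ys = make_sample_data()
sample_means = {}
for level in LEVEL_MEANS:
    values = [y for x, y in zip(xs, ys) if x == level]
    sample_means[level] = sum(values) / len(values)
    print(level, round(sample_means[level], 1))
```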
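
GBDT Review
Dataset: $D = \{(x_i, y_i)\}_{i=1}^{n}$, with $x_i \in \mathbb{R}^m$ and $y_i \in \mathbb{R}$
Additive model: $\hat{y}_i = \sum_{m=1}^{M} f_m(x_i) = \sum_{m=1}^{M} w_m(x_i)$
Loss function: $L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{m=1}^{M} \Omega(f_m)$, where $\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$
Here $w_m(x)$ is the leaf weight of the m-th tree, $T$ is the number of leaves in a tree, and $M$ is the number of trees.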
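
GBDT Review
Loss function when adding the m-th tree:
$$L^{(m)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(m-1)} + f_m(x_i)\right) + \Omega(f_m)$$
$$\simeq \sum_{i=1}^{n} \left[ g_i f_m(x_i) + \frac{1}{2} h_i f_m(x_i)^2 \right] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$$
$$= \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T$$
The second line is a second-order Taylor expansion with the constant term $l(y_i, \hat{y}_i^{(m-1)})$ dropped.
$I_j$: the set of data points assigned to the j-th leaf
$g_i, h_i$: the first and second derivatives of the loss with respect to the prediction of the first m-1 trees
gradient: $g_i = \dfrac{\partial l(y_i, \hat{y}_i^{(m-1)})}{\partial \hat{y}_i^{(m-1)}}$, hessian: $h_i = \dfrac{\partial^2 l(y_i, \hat{y}_i^{(m-1)})}{(\partial \hat{y}_i^{(m-1)})^2}$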
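
GBDT Review
Once the samples have been assigned to leaves, the optimal weight of leaf j is
$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$
and the loss at that point is
$$L^{(m)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left( \sum_{i \in I_j} g_i \right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
The best split is searched for node by node, by looking at how much the loss decreases when the samples are split.
gain: $L_{bef} - (L_{af,left} + L_{af,right})$
The larger the gain (the difference in loss before and after the split), the better the split.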
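The optimal leaf weight, the leaf loss, and the gain of a candidate split can be sketched as below. The function names and the toy gradients are hypothetical; `gamma` is charged once per leaf, so a split that turns one leaf into two pays one extra `gamma`.

```python
def leaf_weight(g, h, lam=0.0):
    """Optimal weight w* = -sum(g) / (sum(h) + lambda) for one leaf."""
    return -sum(g) / (sum(h) + lam)


def leaf_loss(g, h, lam=0.0, gamma=0.0):
    """Loss contribution of one leaf at its optimal weight."""
    return -0.5 * sum(g) ** 2 / (sum(h) + lam) + gamma


def split_gain(g, h, k, lam=0.0, gamma=0.0):
    """Gain of splitting a leaf into samples [0:k] and [k:]: L_bef - (L_left + L_right)."""
    before = leaf_loss(g, h, lam, gamma)
    after = leaf_loss(g[:k], h[:k], lam, gamma) + leaf_loss(g[k:], h[k:], lam, gamma)
    return before - after


# Hypothetical gradients and hessians for four samples, already sorted by feature value
g = [-6.0, -4.0, -1.0, 1.0]
h = [1.0, 1.0, 1.0, 1.0]

print(leaf_weight(g, h))  # 2.5
print(leaf_loss(g, h))    # -12.5
best_k = max(range(1, len(g)), key=lambda k: split_gain(g, h, k))
print(best_k, split_gain(g, h, best_k))  # 2 12.5
```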
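
GBDT Review: When the Loss Function Is MSE
Loss function: $l(y_i, \hat{y}_i) = \frac{1}{2} (y_i - \hat{y}_i)^2$
gradient: $g_i = \dfrac{\partial l(y_i, \hat{y}_i^{(m-1)})}{\partial \hat{y}_i^{(m-1)}} = \hat{y}_i^{(m-1)} - y_i$, hessian: $h_i = \dfrac{\partial^2 l(y_i, \hat{y}_i^{(m-1)})}{(\partial \hat{y}_i^{(m-1)})^2} = 1$
Hence, taking $\lambda = 0$, the weight of leaf j is the mean residual of the samples assigned to leaf j:
$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i} = -\frac{\sum_{i \in I_j} (\hat{y}_i^{(m-1)} - y_i)}{\sum_{i \in I_j} 1} = \frac{\sum_{i \in I_j} (y_i - \hat{y}_i^{(m-1)})}{|I_j|}$$
The numerator is the sum of the residuals (true value minus the prediction after m-1 trees) and the denominator is the number of samples in the leaf.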
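A quick numeric check of this identity, as a sketch with hypothetical values: with the squared-error loss and no regularization, the optimal leaf weight computed from gradients and hessians coincides with the mean residual of the leaf.

```python
def mse_grad_hess(y_true, y_pred):
    """Gradient and hessian of l = 0.5 * (y - y_hat)^2 with respect to y_hat."""
    grad = [p - t for t, p in zip(y_true, y_pred)]
    hess = [1.0 for _ in y_true]
    return grad, hess


# Hypothetical leaf: four samples, previous prediction fixed at 0
y_true = [58.0, 62.0, 19.0, 21.0]
y_pred = [0.0, 0.0, 0.0, 0.0]

grad, hess = mse_grad_hess(y_true, y_pred)
w_star = -sum(grad) / sum(hess)  # optimal weight with lambda = 0
mean_residual = sum(t - p for t, p in zip(y_true, y_pred)) / len(y_true)
print(w_star, mean_residual)  # 40.0 40.0
```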

GBDT Settings
• Consider a simple model:
  • loss_func = 'MSE'
  • eta = 1 → step size
  • iteration = 1 → consider only the first tree
  • tree_method = 'exact' → plain exhaustive search
  • base_score = 0 → predictions start from 0
  • lambda = 0
  • gamma = 0
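Several of these setting names match XGBoost parameters. The slide does not name the library, so the mapping below is an assumption: it takes the loss to be the squared error used in the preceding derivation, maps `loss_func` to XGBoost's `objective`, and maps `iteration` to the `num_boost_round` argument of `xgboost.train`.

```python
# Assumed XGBoost equivalents of the slide's settings (the library choice is a guess).
params = {
    "objective": "reg:squarederror",  # squared-error loss: gradient = pred - label, hessian = 1
    "eta": 1.0,                       # step size
    "tree_method": "exact",           # exhaustive split search
    "base_score": 0.0,                # initial prediction
    "lambda": 0.0,                    # L2 regularization on leaf weights
    "gamma": 0.0,                     # per-leaf complexity penalty
}
NUM_BOOST_ROUND = 1                   # only the first tree is considered

# With xgboost installed and a DMatrix `dtrain` prepared, training would look like:
#   booster = xgboost.train(params, dtrain, num_boost_round=NUM_BOOST_ROUND)
print(params, NUM_BOOST_ROUND)
```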
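
Using Label Encoding
• Label-encode the categorical variable in alphabetical order.
• Sort the samples by feature value.
[Figure: samples sorted by the label-encoded feature, annotated "a sad-looking graph…"]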
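The two steps above can be sketched as follows, with hypothetical toy values. After sorting by the label code, the target jumps up and down from one level to the next, because alphabetical order carries no information about the target.

```python
def label_encode(categories):
    """Map each level to an integer code following alphabetical order."""
    mapping = {level: code for code, level in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


# Hypothetical toy data
xs = ["C", "A", "D", "B", "A", "D"]
ys = [50.0, 60.0, 10.0, 20.0, 61.0, 9.0]

codes, mapping = label_encode(xs)
order = sorted(range(len(xs)), key=lambda i: codes[i])  # stable sort by feature value
sorted_pairs = [(xs[i], ys[i]) for i in order]

print(mapping)       # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
print(sorted_pairs)  # [('A', 60.0), ('A', 61.0), ('B', 20.0), ('C', 50.0), ('D', 10.0), ('D', 9.0)]
```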

Using Label Encoding (depth=1)
Loss before splitting: L1 = -48797
Candidate splits, with the levels sorted as A, B, C, D:

Split        L2,left   L2,right   L2
A | B C D    -35522    -21391     -56913
A B | C D    -31832    -17951     -49783
A B C | D    -56097    -996       -57093   ← splitting here looks best
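The choice among the three candidates follows from the gain formula. The sketch below plugs in the losses reported on the slide; which pair of losses belongs to which split is inferred here from the level means (the single-level leaf D, with E[y|x=D]=10, has the smallest loss in magnitude), so treat the labels as an assumption.

```python
L_BEFORE = -48797  # loss before splitting (L1 on the slide)

# (L2,left, L2,right) per candidate; the split labels are inferred, not given by the slide
candidates = {
    "A | B C D": (-35522, -21391),
    "A B | C D": (-31832, -17951),
    "A B C | D": (-56097, -996),
}

gains = {name: L_BEFORE - (left + right) for name, (left, right) in candidates.items()}
best = max(gains, key=gains.get)

print(gains)  # {'A | B C D': 8116, 'A B | C D': 986, 'A B C | D': 8296}
print(best)   # A B C | D
```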

Using Label Encoding (depth=2)
Node being split: {A, B, C}, with L2,left = -56097 (the other node, {D}, has L2,right = -996)

Split      L3,left   L3,right   L3
A | B C    -35522    -24589     -60111   ← splitting here looks best
A B | C    -31832    -24937     -56769

Using Label Encoding (depth=3)
Node being split: {B, C}, with L3,right = -24589

Split    L4,left   L4,right   L4
B | C    -4076     -24937     -29013   ← splitting here looks best

Splitting is finished. The leaves are {D} (L2,right = -996), {A} (L3,left = -35522), {B} (L4,left = -4076), and {C} (L4,right = -24937).

Using Target Encoding
• Target-encode the categorical variable.
• Sort the samples by feature value.
[Figure: samples sorted by the target-encoded feature]

Using Target Encoding (depth=1)
Loss before splitting: L1 = -48797
Candidate splits, with the levels sorted by target mean as D, B, C, A:

Split        L2,left   L2,right   L2
D | B C A    -996      -56097     -57093
D B | C A    -4551     -59992     -64543   ← splitting here looks best
D B C | A    -21391    -35522     -56913

Using Target Encoding (depth=2)
Both nodes from depth 1 are split:

Node                           Split    left      right     total
{D, B} (L2,left = -4551)       D | B    -996      -4076     L3 = -5072     ← splitting here looks best
{C, A} (L2,right = -59992)     C | A    -24937    -35522    L′3 = -60459   ← splitting here looks best

Splitting is finished. The leaves are {D} (L3,left = -996), {B} (L3,right = -4076), {C} (L′3,left = -24937), and {A} (L′3,right = -35522).

Comparison of Label Encoding and Target Encoding

Tree built with Label Encoding (depth 3):
  root, L1 = -48797
    {A, B, C}, L2,left = -56097
      {A}, L3,left = -35522
      {B, C}, L3,right = -24589
        {B}, L4,left = -4076
        {C}, L4,right = -24937
    {D}, L2,right = -996

Tree built with Target Encoding (depth 2):
  root, L1 = -48797
    {D, B}, L2,left = -4551
      {D}, L3,left = -996
      {B}, L3,right = -4076
    {C, A}, L2,right = -59992
      {C}, L′3,left = -24937
      {A}, L′3,right = -35522
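Summing the leaf losses at each depth makes the comparison concrete. The sketch below uses only the loss values reported on the slides; grouping them by depth is my reading of the two trees.

```python
# Leaf losses present at each depth, taken from the values on the slides
label_by_depth = [
    [-48797],                       # depth 0: root
    [-56097, -996],                 # depth 1: {A,B,C}, {D}
    [-35522, -24589, -996],         # depth 2: {A}, {B,C}, {D}
    [-35522, -4076, -24937, -996],  # depth 3: {A}, {B}, {C}, {D}
]
target_by_depth = [
    [-48797],                       # depth 0: root
    [-4551, -59992],                # depth 1: {D,B}, {C,A}
    [-996, -4076, -24937, -35522],  # depth 2: {D}, {B}, {C}, {A}
]

label_totals = [sum(leaves) for leaves in label_by_depth]
target_totals = [sum(leaves) for leaves in target_by_depth]

print(label_totals)   # [-48797, -57093, -61107, -65531]
print(target_totals)  # [-48797, -64543, -65531]
```

Both trees end at the same total loss, since they have the same four leaves, but the Target Encoding tree gets there one level earlier.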

(Admittedly this was a rather contrived example, but) doesn't Target Encoding look a little more efficient?
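
What Does Target Encoding Do for Us?
• It makes the tree structure simpler.
• With an MSE loss, in the early iterations, it has the effect of placing levels whose residuals (gradients) are of similar magnitude closer to each other.
  → Each group of samples produced by a split has relatively similar residuals, so learning is efficient.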
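This effect can be reproduced with a small greedy tree builder. The sketch below assumes noise-free data (20 samples per level, each exactly equal to its level mean), an MSE loss, a base score of 0, and no regularization; the function names are hypothetical. On noise-free data the Label Encoding root split has an exact tie, so the tree shape may differ from the one on the slides, but the depth comes out the same.

```python
from fractions import Fraction

LEVEL_MEANS = {"A": 60, "B": 20, "C": 50, "D": 10}  # from the sample-data slide
N_PER_LEVEL = 20                                    # assumption


def leaf_loss(ys):
    """MSE, base_score = 0, lambda = 0: g_i = -y_i and h_i = 1, so L = -(sum y)^2 / (2n)."""
    return -Fraction(sum(ys) ** 2, 2 * len(ys))


def tree_depth(levels):
    """Depth needed to isolate every level; `levels` holds one target list per level,
    ordered by the encoded feature value. Splits are chosen greedily by loss."""
    if len(levels) == 1:
        return 0

    def loss_after(j):
        left = [y for ys in levels[:j] for y in ys]
        right = [y for ys in levels[j:] for y in ys]
        return leaf_loss(left) + leaf_loss(right)

    best_j = min(range(1, len(levels)), key=loss_after)
    return 1 + max(tree_depth(levels[:best_j]), tree_depth(levels[best_j:]))


def samples(order):
    return [[LEVEL_MEANS[level]] * N_PER_LEVEL for level in order]


label_order = sorted(LEVEL_MEANS)                       # A, B, C, D
target_order = sorted(LEVEL_MEANS, key=LEVEL_MEANS.get) # D, B, C, A

label_depth = tree_depth(samples(label_order))
target_depth = tree_depth(samples(target_order))
print(label_depth, target_depth)  # 3 2
```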
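
As the Number of Levels Grows
• The benefit of Target Encoding is easier to notice as the number of levels grows.
• Sorting the categories by residual magnitude beforehand makes the splits more efficient.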
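
Depth Needed to Fully Separate All Levels
• Target Encoding needs a shallower depth, so the tree structure is simpler.
• Below are the tree structures obtained when splitting a categorical variable with 100 levels.
[Figure: tree structures for Label Encoding and Target Encoding]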
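A rough way to repeat this experiment is sketched below. The setup is an assumption, not the author's: 100 levels with target means 1 to 100, equal sample counts per level, an MSE loss with base score 0, and label codes assigned in a random order unrelated to the target. With levels sorted by mean, each greedy split lands in the middle, giving depth 7; the depth for the shuffled order is printed rather than asserted, and it can never be less than 7 because a binary tree with 100 leaves needs at least that depth.

```python
import random
from fractions import Fraction


def depth_to_isolate(values):
    """Depth of a greedy exact-split tree that isolates every level.
    `values` holds one target mean per level (equal sample counts assumed),
    ordered by the encoded feature value."""
    k = len(values)
    if k == 1:
        return 0
    total = sum(values)
    best_j, best_score, left_sum = None, None, 0
    for j in range(1, k):
        left_sum += values[j - 1]
        right_sum = total - left_sum
        # Maximizing sum((sum y)^2 / n) over the two children minimizes the MSE loss
        score = Fraction(left_sum ** 2, j) + Fraction(right_sum ** 2, k - j)
        if best_score is None or score > best_score:
            best_j, best_score = j, score
    return 1 + max(depth_to_isolate(values[:best_j]), depth_to_isolate(values[best_j:]))


level_means = list(range(1, 101))   # hypothetical: 100 levels with means 1..100
rng = random.Random(0)
label_order = level_means[:]
rng.shuffle(label_order)            # label codes unrelated to the target
target_order = sorted(level_means)  # target encoding orders levels by their mean

target_depth = depth_to_isolate(target_order)
label_depth = depth_to_isolate(label_order)
print(target_depth)  # 7
print(label_depth)
```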

Loss Reduction at Each Depth
• Target Encoding reduces the loss more efficiently.
• The more levels there are, the larger the gap from Label Encoding becomes.
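
Won't the Model Sort It Out If We Increase the Depth or the Number of Iterations?
• Information that is known to be clearly useful is better passed to the model explicitly.
• Label Encoding may get there somehow, but the model tends to become complex. The more levels there are, the less realistic that is.
• From experience, things that are clearly known to work are better handled before training.
• The same goes for feature interactions.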
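
Summary
• Target Encoding makes the model simpler.
• With an MSE loss, in the early iterations, sorting the levels by residual magnitude enables efficient splits.
• The more levels there are, the larger the benefit of Target Encoding.
• Doing the equivalent of Target Encoding with Label Encoding requires a certain tree depth, which becomes less realistic as the number of levels grows.
• The model may be able to do this implicitly without Target Encoding, but things that are clearly known to help are better handled before they are fed to the model.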