Sansan×atmaCup#6 Solution

Sansan atmaCup #6

Self-Introduction
- Tom Yanabe (T0m)
- Keio University Graduate School, Information and Computer Science course
- Internships at DeNA and PKSHA Technology
- @ryuhenjubunomac
- Kaggle Expert: silver ×2 (TReNDS, Walmart Accuracy)
- Hobbies: surfing, camping

About Competition: Public/Private, Fighting Spirit Award (敢闘賞), student category; placings shown: 6th, 1st, 2nd

Solution

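Feature Engineering / Target Feature
- Industry share of the exchange-partner companies
  - Which industries a company received cards from and gave cards to (figure on the right)
  - Industry share that a given industry exchanges with
- Industry share of the partners' partners
  - Which industries the exchange partners themselves exchange with (figure on the right)
  - Industry share of the companies that a given industry's partners exchange with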
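The partner-industry share above can be sketched as follows. This is a minimal illustration, not the author's code: the `(giver, receiver)` tuple format, the `industry` dict, and the function name `partner_industry_ratio` are hypothetical, and partners without a known label are simply skipped.

```python
from collections import Counter, defaultdict


def partner_industry_ratio(exchanges, industry, direction="received"):
    """Share of each industry among a company's exchange partners.

    exchanges: list of (giver, receiver) tuples (hypothetical input format).
    industry:  dict company id -> industry label, known labels only.
    direction: "received" looks at partners we received cards from,
               "given" looks at partners we gave cards to.
    """
    counts = defaultdict(Counter)
    for giver, receiver in exchanges:
        if direction == "received":
            me, partner = receiver, giver
        else:
            me, partner = giver, receiver
        if partner in industry:  # partners without a known label are skipped
            counts[me][industry[partner]] += 1
    ratios = {}
    for corp, c in counts.items():
        total = sum(c.values())
        ratios[corp] = {label: n / total for label, n in c.items()}
    return ratios
```

Computing the "received" and "given" directions separately keeps the distinction the slide draws between where cards came from and where they went.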

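Feature Engineering / Target Feature
- Building these features separately per fold loses too much information, so they were built while carefully removing each source of leakage …
[Figure: partners ranked 1, 2, 3, 4, … (corporation A, corporation A, corporation E, corporation B, corporation S); from here on, k is increased repeatedly; example with k=3]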
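One leak source for the partners'-partners feature is easy to see: a path A → B → A brings company A's own label back into A's feature. The slide does not list exactly which leak sources were removed, so the sketch below shows only this one, under the assumption that exchange direction is ignored; the function name and input format are hypothetical.

```python
from collections import Counter, defaultdict


def two_hop_industry_ratio(exchanges, industry):
    """Industry share among partners-of-partners, skipping paths that return to
    the company itself so that its own label never leaks into its feature.
    Exchange direction is ignored here (a simplification for this sketch)."""
    neighbors = defaultdict(set)
    for a, b in exchanges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    ratios = {}
    for corp in neighbors:
        c = Counter()
        for mid in neighbors[corp]:
            for far in neighbors[mid]:
                if far == corp:  # A -> B -> A would expose A's own label
                    continue
                if far in industry:
                    c[industry[far]] += 1
        total = sum(c.values())
        if total:
            ratios[corp] = {label: n / total for label, n in c.items()}
    return ratios
```

Without the `far == corp` check, company A in the test data below would see its own label "IT" and get an IT share of 2/3 instead of 1/2.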

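Feature Engineering / Other
- Industry share of companies registered at the same time at the exchange partner
- Time information (timing)
  - Exchange volume at the time of year when each industry exchanges the most
  - Wanted to incorporate events and similar information as well
- Exchange volume with the n representative companies that exchange the most in each industry
Roughly … features in total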
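A possible reading of the timing and representative-company features, sketched under stated assumptions: records are hypothetical `(corp, partner, month)` tuples at month granularity, an industry's "peak" is the month its labeled companies exchange most, and "representative" companies are the n most active labeled companies per industry. None of these names or choices come from the original slides.

```python
from collections import Counter, defaultdict


def peak_month_features(records, industry):
    """records: list of (corp, partner, month) tuples (hypothetical format).
    For each industry, find the month in which its labeled companies exchange
    the most, then count every company's exchanges in that month."""
    per_industry = defaultdict(Counter)
    for corp, _, month in records:
        if corp in industry:
            per_industry[industry[corp]][month] += 1
    peak = {ind: c.most_common(1)[0][0] for ind, c in per_industry.items()}
    feats = defaultdict(dict)
    for corp, _, month in records:
        for ind, m in peak.items():
            feats[corp].setdefault(f"peak_{ind}", 0)
            if month == m:
                feats[corp][f"peak_{ind}"] += 1
    return peak, dict(feats)


def representative_exchange_counts(records, industry, n=2):
    """Count each company's exchanges with the n most active labeled companies
    of every industry (n is a hypothetical default)."""
    activity = Counter(corp for corp, _, _ in records if corp in industry)
    reps = defaultdict(list)
    for corp, _ in activity.most_common():
        ind = industry[corp]
        if len(reps[ind]) < n:
            reps[ind].append(corp)
    feats = defaultdict(Counter)
    for corp, partner, _ in records:
        for ind, members in reps.items():
            if partner in members:
                feats[corp][f"rep_{ind}"] += 1
    return dict(reps), {k: dict(v) for k, v in feats.items()}
```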

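Model: CatBoost + MLP
- CatBoost and MLP (a GNN was tried but ultimately not used)
- Features are rebuilt from the first-stage predictions
[Figure: two-stage pipeline, CatBoost and MLP → feature generation → CatBoost and MLP; scores shown: LB 0.8197, LB 0.8219, LB 0.8233, LB 0.8159; LB 9th, LB 8th]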
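One way to rebuild features from first-stage predictions is to replace neighbor labels with the neighbors' predicted class probabilities and average them. This is a sketch under assumptions: the `neighbors` and `stage1_proba` dicts are hypothetical, and for training nodes the probabilities are assumed to be out-of-fold, as is usual in stacking, so the second stage never sees predictions made with a node's own label.

```python
def neighbor_prediction_mean(neighbors, stage1_proba):
    """Average the stage-1 class-probability vectors of each node's neighbors.

    neighbors:    dict node -> list of neighbor nodes (hypothetical format).
    stage1_proba: dict node -> list of class probabilities; assumed to be
                  out-of-fold predictions for training nodes.
    Nodes whose neighbors have no predictions are left out.
    """
    feats = {}
    for node, nbrs in neighbors.items():
        vecs = [stage1_proba[u] for u in nbrs if u in stage1_proba]
        if not vecs:
            continue
        k = len(vecs[0])
        feats[node] = [sum(v[i] for v in vecs) / len(vecs) for i in range(k)]
    return feats
```

The resulting vectors would then be appended to the original features before training the second-stage CatBoost and MLP.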

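Model / Feature Importance
[Figure: feature importance grouped into partner-of-partner company features, partner company features, other features, and miscellaneous]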

Model / MLP Architecture

What I tried

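Heterogeneous Graph
- Focus on neighbors of neighbors → heterogeneous graph
- Implemented "Heterogeneous Graph Attention Network" [Paper link: https://arxiv.org/abs/1903.07293] (WWW2019)
  - Uses nodes that are connected through some intermediate node
  - Each such connection pattern is called a meta-path; the model weights the meta-paths hierarchically with attention and finally aggregates them
  - Here the following meta-paths were defined (from: received a card, to: gave a card):
    1. from - from
    2. from - to
    3. to - from
    4. to - to
[Figure labels: intermediate node, target node, connected node]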
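The four meta-paths above can be turned into neighbor sets as sketched below, followed by the softmax-weighted combination that HAN's semantic-level attention performs. Assumptions: edges are hypothetical `(giver, receiver)` tuples, paths that return to the target node are dropped (a choice made for this sketch), and the per-path attention scores are passed in rather than learned.

```python
import math
from collections import defaultdict


def metapath_neighbors(exchanges):
    """exchanges: list of (giver, receiver) tuples (hypothetical format).
    Returns {"from-from": {node: set}, "from-to": ..., "to-from": ..., "to-to": ...}."""
    received_from = defaultdict(set)  # "from": who gave this node a card
    gave_to = defaultdict(set)        # "to": whom this node gave a card
    for giver, receiver in exchanges:
        received_from[receiver].add(giver)
        gave_to[giver].add(receiver)
    step = {"from": received_from, "to": gave_to}
    nodes = set(received_from) | set(gave_to)
    paths = {}
    for first in ("from", "to"):
        for second in ("from", "to"):
            name = f"{first}-{second}"
            paths[name] = {}
            for v in nodes:
                reached = set()
                for mid in step[first].get(v, ()):
                    reached |= step[second].get(mid, set())
                reached.discard(v)  # drop paths that come back to v (sketch choice)
                paths[name][v] = reached
    return paths


def semantic_attention(path_embeddings, path_scores):
    """Softmax over per-meta-path scores, then a weighted sum of the
    per-meta-path node embeddings (scores are given, not learned, here)."""
    m = max(path_scores.values())
    exp = {p: math.exp(s - m) for p, s in path_scores.items()}
    z = sum(exp.values())
    dim = len(next(iter(path_embeddings.values())))
    return [sum(exp[p] / z * path_embeddings[p][i] for p in path_embeddings)
            for i in range(dim)]
```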

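GNN Using Neighbor-Node Labels
- Label information of the surrounding nodes matters
- Goal: incorporate label information into node classification without leakage
  - During the self-loop computation, remove the node's own label information where appropriate (code modified)
  - Build a sub-graph for each node and train in batches
- A small trick (adopted): split the nodes into test / valid / train_train / train_valid
  - Only train_train nodes carry label information; every other node gets the pseudo-label Unknown in the label part
  - Compute the loss on train_valid and backpropagate
  - Compute val_loss on valid for early stopping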
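The train_train / train_valid trick can be sketched as below. Assumptions: `labels` holds only the training-split labels (so valid and test nodes fall back to Unknown automatically), classes are integers, the 80/20 split and seed are hypothetical defaults, and mean aggregation stands in for whatever aggregator the modified GNN uses. The node's own vector is excluded from aggregation, mirroring the removal of the self label in the self-loop term.

```python
import random


def build_label_inputs(nodes, labels, num_classes, train_frac=0.8, seed=0):
    """Split labeled nodes into train_train / train_valid and build label-input
    vectors: one-hot for train_train nodes, an extra 'Unknown' slot for all others.
    labels must contain only training-split labels (valid/test become Unknown)."""
    labeled = sorted(n for n in nodes if n in labels)
    rng = random.Random(seed)
    rng.shuffle(labeled)
    cut = int(len(labeled) * train_frac)
    train_train, train_valid = set(labeled[:cut]), set(labeled[cut:])
    unknown = num_classes  # index of the pseudo-label "Unknown"
    inputs = {}
    for n in nodes:
        vec = [0.0] * (num_classes + 1)
        vec[labels[n] if n in train_train else unknown] = 1.0
        inputs[n] = vec
    return train_train, train_valid, inputs


def aggregate_neighbor_labels(node, neighbors, inputs):
    """Mean of the neighbors' label vectors, never including the node itself."""
    nbrs = [u for u in neighbors.get(node, []) if u != node]
    dim = len(next(iter(inputs.values())))
    if not nbrs:
        return [0.0] * dim
    return [sum(inputs[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
```

The loss would then be computed only on train_valid nodes, whose own labels are hidden behind Unknown, so the model cannot simply read its target from the input.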

GNN Using Neighbor-Node Labels (supplement): Inductive Representation Learning on Large Graphs, Figure 1 [2]


After Competition

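GNN Using w2v Embeddings
- corp_id embeddings with w2v
  - For each corp_id, list its partner companies and train w2v on these lists to obtain embeddings
  - Based on the solutions of the top finishers
[Figure: corpA to corpE, each turned into a "sentence" of partners (e.g. corpE, corpK, corpZ, …; corpD, corpS, corpX, …; corpN, corpK, corpE, …; corpL, corpM, corpV, …); corpA_ver1: 30, 50, 100, 300, 600 dim; corpA_ver2: mean over the top 50 business partners]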
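A sketch of the data preparation around w2v: one "sentence" of partners per corp_id, plus a "ver2"-style vector averaging the embeddings of the top-k partners. Ordering partners by exchange count is an assumption (the slide only says "top 50"), and the function names are hypothetical. The word2vec training itself is left to a library; with gensim 4, for example, `Word2Vec(sentences=list(sentences.values()), vector_size=100, min_count=1)` would produce per-corp_id vectors.

```python
from collections import Counter, defaultdict


def partner_sentences(exchanges):
    """One 'sentence' per corp_id: its exchange partners, most frequent first
    (ordering by exchange count is an assumption of this sketch)."""
    counts = defaultdict(Counter)
    for a, b in exchanges:
        counts[a][b] += 1
        counts[b][a] += 1
    return {corp: [p for p, _ in c.most_common()] for corp, c in counts.items()}


def top_k_partner_mean(corp, sentences, embeddings, k=50):
    """'ver2'-style vector: mean of the embeddings of the top-k partners."""
    top = [p for p in sentences.get(corp, [])[:k] if p in embeddings]
    if not top:
        return None
    dim = len(embeddings[top[0]])
    return [sum(embeddings[p][i] for p in top) / len(top) for i in range(dim)]
```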

CustomGNN
[Figure: CustomGNN architecture. Inputs: multi-hop neighbor features, w2v features, and a label embedding (label info, illustrated with example label-distribution vectors such as [0.3, 0.5, 0.3, 0.5, 0.8]), with Dropout=0.7 on the inputs. Layers: Linear → BatchNorm → Dropout(0.3) → LeakyReLU, then CustomSAGE → Dropout(0.3) → LeakyReLU → Linear → BatchNorm → Dropout(0.3) → LeakyReLU]
- dgl customized to pass the label info
- Built with reference to the 4th-place solution
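For reference, a single GraphSAGE-style layer with mean aggregation, h_v' = LeakyReLU(W [h_v || mean of h_u over neighbors u]), can be written in plain Python as below. This is the standard SAGE update rather than the author's CustomSAGE; the per-layer normalization from the GraphSAGE paper is omitted, and the node features could hypothetically be the concatenation of w2v features and a label vector.

```python
def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sage_mean_layer(h, neighbors, W, slope=0.01):
    """One GraphSAGE-style layer with mean aggregation.

    h:         dict node -> feature list (all the same length d).
    neighbors: dict node -> list of neighbor nodes; the node itself is excluded.
    W:         weight matrix with 2*d columns, applied to [h_v || mean(h_u)].
    """
    out = {}
    for v, hv in h.items():
        nbrs = [u for u in neighbors.get(v, []) if u != v]
        if nbrs:
            agg = [sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(len(hv))]
        else:
            agg = [0.0] * len(hv)
        out[v] = [leaky_relu(z, slope) for z in matvec(W, hv + agg)]
    return out
```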

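Experiments
- Model 1: neighbor-label GNN (CustomGraphSAGE)
- Model 2: ensemble (CatBoost, MLP, GNN)
- Compare Private LB scores, assuming the best Public LB model is the one selected
[Figure: timeline with the dates 2020/10/19, 10/31, 11/04, 11/16, marking competition start, competition end, start of follow-up experiments, and the solution presentation]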

Late Submission
[Chart: Private LB of Model 1 and Model 2 against the 1st to 6th place scores. Values shown: 0.8241, 0.8292, 0.8350, 0.8365, 0.8404, 0.8405, 0.8422, 0.8310; SAGE 0.8365; CustomSAGE 0.8393]

Impression

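- A good opportunity to get to grips with dgl
- I would like to implement other label-incorporating GNN models too
  - This time it was GraphSAGE
  - GraphAttention turned into a rabbit hole
  - What should be done with two or more layers?
- My regret is that even the follow-up experiments did not get me to the top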

Thank you for your attention.

Sansan DSOC
