Sansan×atmaCup#6 Solution

by Sansan DSOC

Slide 1

Slide 1 text

Sansan atmaCup #6

Slide 2

Slide 2 text

Self-Introduction
- Tom Yanabe (T0m)
- Keio University Graduate School, Information Engineering (master's program)
- Interned at DeNA and PKSHA Technology
- @ryuhenjubunomac
- Kaggle Expert
  - silver×2: TReNDS / Walmart Accuracy
- Hobbies
  - Surfing, camping

Slide 3

Slide 3 text

About Competition
Public/Private, Fighting Spirit Award, Student division
6th, 1st, 2nd

Slide 4

Slide 4 text

Solution

Slide 5

Slide 5 text

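Feature Engineering / Target Feature
- Industry ratio of the exchange-partner companies
  - Which industries the cards were received from or given to
  - (Figure at right) Ratio of industries that a given industry exchanges with
- Industry ratio of the partners' own exchange partners
  - Which industries the exchange partners exchange with
  - (Figure at right) Ratio of industries exchanged by the companies that a given industry exchanges with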
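The two ratio features above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the toy `edges` and `industry` data and all function names are hypothetical, and the graph is treated as undirected for brevity, whereas the slide distinguishes received from given cards.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (giver, receiver) card exchanges and known industry labels.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
industry = {"A": "IT", "B": "Finance", "C": "IT", "D": "Retail"}

def build_neighbors(edges):
    """Map each company to the set of companies it exchanged cards with."""
    nbrs = defaultdict(set)
    for src, dst in edges:
        nbrs[src].add(dst)
        nbrs[dst].add(src)
    return nbrs

def industry_ratio(companies, industry):
    """Share of each industry among the given companies (unlabeled ones are skipped)."""
    counts = Counter(industry[c] for c in companies if c in industry)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}

def one_hop_ratio(corp, nbrs, industry):
    """Industry ratio of the direct exchange partners."""
    return industry_ratio(nbrs[corp], industry)

def two_hop_ratio(corp, nbrs, industry):
    """Industry ratio of the partners' partners, excluding the company itself."""
    second = [n2 for n1 in nbrs[corp] for n2 in nbrs[n1] if n2 != corp]
    return industry_ratio(second, industry)

nbrs = build_neighbors(edges)
print(one_hop_ratio("A", nbrs, industry))
print(two_hop_ratio("A", nbrs, industry))
```

Excluding the company itself from the two-hop list is one simple way to keep its own label out of its features.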

Slide 6

Slide 6 text

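Feature Engineering / Target Feature
- Building the features per fold loses too much information, so they are built carefully while removing sources of leakage
(Figure: ranked list for corporation A, ranks 1, 2, 3, 4, …: corporation A, corporation E, corporation B, corporation S; example with k=3. From here, k is increased aggressively.)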

Slide 7

Slide 7 text

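Feature Engineering / Other
- Ratio of industries registered at the same time at the exchange partner
- Time information (timing)
  - Exchange volume at the time of year when each industry exchanges the most
  - I had wanted to incorporate events and the like
- Exchange volume with the n representative companies that have the largest exchange volume in each industry
About … features in total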

Slide 8

Slide 8 text

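Model
- CatBoost and MLP (a GNN was tried but ultimately not adopted)
- Features are re-created using the first-stage predictions
(Figure: two feature-generation steps, each feeding CatBoost and MLP. LB: 0.8197, LB: 0.8219, LB: 0.8233, LB: 0.8159; LB 9th, LB 8th)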
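The slide only states that features are re-created from the first-stage predictions. One plausible reading, sketched below under that assumption, is to reuse the neighbor industry-ratio features but let unlabeled partners contribute their predicted class probabilities. All names and values here are hypothetical.

```python
def soft_label(corp, labels, stage1_proba, classes):
    """One-hot vector for labeled companies, stage-1 probabilities otherwise."""
    if corp in labels:
        return [1.0 if c == labels[corp] else 0.0 for c in classes]
    return stage1_proba[corp]

def neighbor_ratio_with_predictions(corp, nbrs, labels, stage1_proba, classes):
    """Mean (soft) industry distribution over a company's exchange partners."""
    vecs = [soft_label(n, labels, stage1_proba, classes) for n in nbrs[corp]]
    if not vecs:
        return [0.0] * len(classes)
    return [sum(col) / len(vecs) for col in zip(*vecs)]

classes = ["IT", "Finance"]
labels = {"A": "IT"}              # train company with a known label
stage1_proba = {"B": [0.2, 0.8]}  # assumed first-stage prediction for a test company
nbrs = {"C": ["A", "B"]}          # company C exchanged cards with A and B

features = neighbor_ratio_with_predictions("C", nbrs, labels, stage1_proba, classes)
print(features)
```

The second-stage CatBoost and MLP would then be trained on features of this kind in place of the first-stage ones.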

Slide 9

Slide 9 text

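Model / Feature Importance
(Figure: feature importance, grouped into features of the partners' partner companies, features of the partner companies, Other features, and the rest)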

Slide 10

Slide 10 text

Model / MLP Architecture

Slide 11

Slide 11 text

What I tried

Slide 12

Slide 12 text

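Heterogeneous Graph
- Focus on the neighbors of neighboring Nodes → Heterogeneous Graph
- Implemented "Heterogeneous Graph Attention Network"
  - [Paper link: https://arxiv.org/abs/1903.07293] (WWW2019)
- Use Nodes that are connected via some intermediate Node
- The way they are connected is called a Meta-Path; multiple Meta-Paths are weighted hierarchically with Attention and finally aggregated
- This time the following were defined as Meta-Paths (from: received, to: gave)
  1. from - from
  2. from - to
  3. to - from
  4. to - to
(Figure: intermediate Node, target Node, connected Node)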
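As a rough sketch of the four Meta-Paths listed above, the neighbor sets reached through an intermediate Node can be enumerated as follows. This covers only the Meta-Path neighbor lookup, not the attention layers of the paper, and the toy `edges` and the reading of "from" (received) and "to" (gave) as directed relations are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical (giver, receiver) card exchanges.
edges = [("A", "B"), ("C", "B"), ("B", "D"), ("C", "D")]

to_nbrs = defaultdict(set)    # to_nbrs[x]: companies x gave cards to
from_nbrs = defaultdict(set)  # from_nbrs[x]: companies x received cards from
for giver, receiver in edges:
    to_nbrs[giver].add(receiver)
    from_nbrs[receiver].add(giver)

RELATIONS = {"from": from_nbrs, "to": to_nbrs}
META_PATHS = [("from", "from"), ("from", "to"), ("to", "from"), ("to", "to")]

def meta_path_neighbors(node, path):
    """Nodes reached from `node` by following the two relations in `path`."""
    first, second = (RELATIONS[r] for r in path)
    reached = set()
    for via in first.get(node, ()):         # intermediate Node
        reached |= second.get(via, set())   # connected Nodes
    reached.discard(node)                   # drop the target Node itself
    return reached

for path in META_PATHS:
    print(path, sorted(meta_path_neighbors("D", path)))
```

Each Meta-Path yields its own neighbor set per target Node, which is what a Meta-Path-based model would aggregate over.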

Slide 13

Slide 13 text

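GNN Using Neighbor Node Labels
- Label information of surrounding Nodes is important
- Goal: incorporate label information into Node Classification without leakage
  - Erase a Node's own label information as needed during the self-loop computation (code modification)
  - Build a sub-graph for each Node and run them in batches
  - A small trick (adopted)
(Figure: split into test, valid, train_train, train_valid)
- train_train Nodes carry label information
- For all other Nodes, the label part is an "Unknown" pseudo-label
- The loss is computed on train_valid and backpropagated
- val_loss is computed on valid and used for early_stopping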
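The adopted split can be sketched as below: only train_train Nodes expose their true label as an input, every other Node gets an "Unknown" pseudo-label, and the loss would be taken on train_valid. The `UNKNOWN` encoding, the 50/50 split, and the function names are assumptions for illustration; the GNN itself is omitted.

```python
import random

UNKNOWN = -1  # hypothetical id for the "Unknown" pseudo-label

def split_train(train_ids, valid_frac=0.5, seed=0):
    """Split labeled train Nodes into train_train (label visible) and train_valid (loss)."""
    ids = sorted(train_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * (1 - valid_frac))
    return set(ids[:cut]), set(ids[cut:])

def build_label_inputs(all_ids, labels, train_train):
    """Label input per Node: the true label for train_train Nodes, UNKNOWN otherwise."""
    return {n: (labels[n] if n in train_train else UNKNOWN) for n in all_ids}

labels = {"t1": 0, "t2": 1, "t3": 2, "t4": 0, "v1": 1}  # toy labels; "x1" is a test Node
train_ids = ["t1", "t2", "t3", "t4"]
all_ids = train_ids + ["v1", "x1"]

train_train, train_valid = split_train(train_ids)
label_inputs = build_label_inputs(all_ids, labels, train_train)
print(label_inputs)
# The loss would be computed only on train_valid Nodes, whose own labels are hidden
# from the input; "v1" (valid) would be used for val_loss and early stopping.
```

Because a Node contributing to the loss never sees its own label as input, the label feature cannot leak the target.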

Slide 14

Slide 14 text

GNN Using Neighbor Node Labels (supplement)
Inductive Representation Learning on Large Graphs, Figure 1 [2]

Slide 15

Slide 15 text

GNN Using Neighbor Node Labels (supplement)
Inductive Representation Learning on Large Graphs, Figure 1

Slide 16

Slide 16 text

GNN Using Neighbor Node Labels (supplement)
Inductive Representation Learning on Large Graphs, Figure 1

Slide 17

Slide 17 text

After Competition

Slide 18

Slide 18 text

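GNN Using w2v Embeddings
- Embedding of corp_id using w2v
  - For each corp_id, list its partner companies and take the embeddings obtained with w2v
  - See the Solutions of the top finishers
(Figure: sentences, one per corp_id)
  corpA: corpE, corpK, corpZ, …
  corpB: corpD, corpS, corpX, …
  corpC: corpN, corpK, corpE, …
  corpD: corpL, corpM, corpV, …
  …
- corpA_ver1: 30, 50, 100, 300, 600 dim
- corpA_ver2: mean over the top 50 partner companies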
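A minimal sketch of the data preparation described above: one "sentence" of partner ids per corp_id, plus a ver2-style vector averaged over the top partners. The Word2Vec training step itself is left out (any Word2Vec implementation could be fed these sentences), the embedding table `emb` is a made-up stand-in for its output, and ranking the "top" partners by exchange count is an assumption.

```python
from collections import Counter, defaultdict

def build_sentences(edges):
    """One 'sentence' per corp_id: the list of its exchange partners."""
    partners = defaultdict(list)
    for a, b in edges:
        partners[a].append(b)
        partners[b].append(a)
    return dict(partners)

def mean_partner_embedding(corp, sentences, emb, top_n=50):
    """ver2-style vector: mean embedding of the top_n most frequent partners."""
    counts = Counter(sentences.get(corp, []))
    top = [p for p, _ in counts.most_common(top_n) if p in emb]
    if not top:
        return None
    dim = len(emb[top[0]])
    return [sum(emb[p][i] for p in top) / len(top) for i in range(dim)]

# Hypothetical exchanges and a hypothetical 2-dim embedding table.
edges = [("corpA", "corpE"), ("corpA", "corpK"), ("corpA", "corpE")]
emb = {"corpE": [1.0, 0.0], "corpK": [0.0, 1.0]}

sentences = build_sentences(edges)
print(sentences["corpA"])
print(mean_partner_embedding("corpA", sentences, emb))
```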

Slide 19

Slide 19 text

CustomGNN
(Figure: architecture. Inputs: …-order and …-order neighbor features, w2v features, label info. Components shown: Dropout=0.7; Label embedding; Linear, BatchNorm, Dropout(0.3), LeakyReLU; …-order neighbor features; CustomSAGE, Dropout(0.3), LeakyReLU; Linear, BatchNorm, Dropout(0.3), LeakyReLU; Dropout=0.7.)
(Figure: label info vectors [0.3, 0.5, 0.3, 0.5, 0.8], [0.4, 0.4, 0.3, 0.2, 0.2], [0.1, 0.2, 0.01, 0.4, 0.3] and [0.3, 0.5, 0.3, 0.0, 0.0], [0.4, 0.4, 0.3, 0.2, 0.2], [0.1, 0.2, 0.01, 0.4, 0.3])
- dgl was customized
- Built with reference to the 4th-place solution

Slide 20

Slide 20 text

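Experiments
- Neighbor-label GNN (CustomGraphSAGE)
- Ensemble (Cat, MLP, GNN)
- Private LB scores are compared, assuming the model with the best Public LB is selected
(Figure: timeline with Model 1 and Model 2; milestones: competition start, competition end, Solution presentation, start of follow-up experiments; dates shown: 10/31, 2020/10/19, 11/16, 11/04)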

Slide 21

Slide 21 text

Late Submission
(Figure: scores for 1st, 2nd, 3rd, 4th, 5th, 6th, Model 1, and Model 2; values shown: 0.8241, 0.8292, 0.8350, 0.8365, 0.8404, 0.8405, 0.8422, 0.8310; SAGE 0.8365, CustomSAGE 0.8393)

Slide 22

Slide 22 text

Impression

Slide 23

Slide 23 text

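Impressions
- A good opportunity to get to grips with dgl
- I would like to implement other models as label-incorporating GNNs as well
  - This time it was GraphSAGE
  - I got bogged down with GraphAttention
  - What about using two or more layers?
- My one regret is that I fell short even after the follow-up experiments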

Slide 24

Slide 24 text

Thank you for your attention.