Sansan×atmaCup#6 Solution

Sansan DSOC
November 16, 2020

■Event
Sansan×atmaCup #6 solution presentation
https://sansan.connpass.com/event/193901/

■Talk overview
Title: Sansan×atmaCup#6 Solution
Speaker: Tom Yanabe (柳辺 十武)

Sansan DSOC
▼Website
https://sansan-dsoc.com/
▼Twitter
https://twitter.com/SansanDSOC

Transcript

  1. Sansan atmaCup #6

  2. Self-Introduction
     - Tom Yanabe (柳辺 十武), T0m
     - Keio University Graduate School, Information Engineering major
     - Interned at DeNA and PKSHA Technology
     - @ryuhenjubunomac
     - Kaggle Expert
     - Silver ×2: TReNDS / Walmart Accuracy
     - Hobbies: surfing, camping
  3. About Competition
     - Result labels: Public/Private, Fighting Spirit Award (敢闘賞), student category (学生枠)
     - Placements shown: 6th, 1st, 2nd

  4. Solution

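  5. Feature Engineering / Target Feature
     - Industry ratio of the exchange-partner companies
       - Which industries a company received from or handed to (figure on the right)
       - The ratio of industries that a given industry exchanges with
     - Industry ratio of the partners' own partner companies
       - Which industries the exchange partners themselves exchange with (figure on the right)
       - For the companies that a given industry exchanges with, the ratio of industries those companies exchange with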
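
The partner industry ratio described on this slide could be sketched roughly as follows. This is a minimal illustration, not the speaker's code: the function name `partner_industry_ratio`, the edge format `(from_corp, to_corp)`, and the toy data are assumptions, and both exchange directions are merged here for brevity even though the slide distinguishes received from handed over.

```python
from collections import Counter, defaultdict


def partner_industry_ratio(edges, industry):
    """Return, per company, the share of each industry among its labeled partners.

    edges:    iterable of (from_corp, to_corp) pairs (hypothetical format)
    industry: dict corp -> industry label; unlabeled companies are absent
    """
    counts = defaultdict(Counter)
    for src, dst in edges:
        # Count the partner's industry from both ends of the exchange.
        if dst in industry:
            counts[src][industry[dst]] += 1
        if src in industry:
            counts[dst][industry[src]] += 1
    ratios = {}
    for corp, counter in counts.items():
        total = sum(counter.values())
        ratios[corp] = {label: n / total for label, n in counter.items()}
    return ratios


# Toy data (assumed): company A has no label, its partners do.
edges = [("A", "B"), ("A", "C"), ("D", "A"), ("B", "C")]
industry = {"B": "IT", "C": "finance", "D": "IT"}
ratios = partner_industry_ratio(edges, industry)
```

Splitting `counts` into separate "received from" and "handed to" counters would give the directional variants mentioned on the slide.
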
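  6. Feature Engineering / Target Feature
     - Building these features fold by fold loses too much information, so
     - they are built while carefully removing the sources of leakage …
     - Diagram: rank 1 2 3 4 …; corporation A, corporation A, corporation E, corporation B, corporation S; example with k=3; from here k is increased aggressively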
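
The slide does not say which leak sources were removed. One plausible source, assumed here purely for illustration, is that a company is always a partner of its own partners, so a partner-of-partner aggregate would otherwise include the company's own label. The sketch below skips that return path; the name `two_hop_industry_ratio` and the adjacency-dict format are hypothetical.

```python
from collections import Counter


def two_hop_industry_ratio(neighbors, industry, target):
    """Industry ratio over partners-of-partners, excluding the target itself.

    neighbors: dict corp -> list of partner corps (hypothetical format)
    industry:  dict corp -> label for labeled companies only
    """
    counts = Counter()
    for partner in neighbors.get(target, ()):
        for second in neighbors.get(partner, ()):
            if second == target:
                # Skipping the path back to the target keeps its own label out.
                continue
            label = industry.get(second)
            if label is not None:
                counts[label] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Toy graph (assumed): A's own label "IT" must not appear in its feature.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B"],
    "E": ["C"],
}
industry = {"A": "IT", "D": "finance", "E": "finance"}
feature_a = two_hop_industry_ratio(neighbors, industry, "A")
```
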
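  7. Feature Engineering / Other
     - Ratio of industries registered at the same time at the exchange partner
     - Time information (timing)
       - For each industry, the exchange volume at the time of year when that industry exchanges the most
       - I wanted to incorporate events and the like
     - Exchange volume with the n representative companies that have the largest exchange volume in each industry
     - Combined, approximately … features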
  8. Model
     - CatBoost and MLP (a GNN was tried but ultimately not adopted)
     - Features are re-created using the first-stage predictions
     - Diagram labels: Catboost, MLP, feature generation (twice); LB: 0.8197, LB: 0.8219, LB: 0.8233, LB: 0.8159; LB 9th, LB 8th
  9. Model / Feature Importance
     - Groups shown: features of the partners' partner companies, features of the exchange companies, Other features, everything else

  10. Model / MLP Architecture

  11. What I tried

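  12. Heterogeneous Graph
      - Focus on the neighbors of neighboring nodes → Heterogeneous Graph
      - Implemented "Heterogeneous Graph Attention Network" [Paper link: https://arxiv.org/abs/1903.07293] (WWW2019)
      - Uses nodes that are connected via some intermediate node
      - The way two nodes are connected is called a Meta-Path; multiple Meta-Paths are weighted hierarchically with attention and finally aggregated
      - Meta-Paths defined this time (from: received, to: handed over):
        1. from - from
        2. from - to
        3. to - from
        4. to - to
      - Figure labels: intermediate node, target node, connected node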
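
The four meta-paths above can be illustrated by enumerating which nodes each one reaches. This is a sketch under assumptions, not the talk's implementation: edges are taken to be `(giver, receiver)` pairs, "from" is read as "the node received from", "to" as "the node handed to", and the target node itself is dropped from the result. The attention weighting of the Heterogeneous Graph Attention Network is not shown.

```python
from collections import defaultdict

# The four meta-paths listed on the slide.
META_PATHS = [("from", "from"), ("from", "to"), ("to", "from"), ("to", "to")]


def build_relations(edges):
    """Split directed (giver, receiver) edges into 'to' and 'from' relations."""
    rel = {"from": defaultdict(set), "to": defaultdict(set)}
    for giver, receiver in edges:
        rel["to"][giver].add(receiver)    # giver handed over to receiver
        rel["from"][receiver].add(giver)  # receiver received from giver
    return rel


def meta_path_neighbors(rel, node, path):
    """Nodes reachable from `node` through one intermediate node along `path`."""
    first, second = path
    reached = set()
    for mid in rel[first].get(node, ()):
        reached |= rel[second].get(mid, set())
    reached.discard(node)  # assumption: the target is not its own neighbor
    return reached


# Toy edges (assumed).
edges = [("A", "B"), ("C", "B"), ("B", "D"), ("A", "E")]
rel = build_relations(edges)
neighbors_of_a = {path: meta_path_neighbors(rel, "A", path) for path in META_PATHS}
```
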
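  13. GNN using neighbor-node labels
      - The label information of surrounding nodes is important
      - Goal: incorporate label information into node classification without leakage
      - When computing the self-loop, the node's own label information is erased as needed (code modification)
      - A sub-graph is built for each node and processed in batches
      - A small trick (adopted), with the splits test / valid / train_train / train_valid:
        - train_train nodes carry label information
        - for all other nodes, the label part is an "Unknown" pseudo-label
        - the loss is computed on train_valid and backpropagated
        - val_loss is computed on valid and used for early_stopping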
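
The split described on this slide might be prepared along the following lines. This is only a sketch: the `UNKNOWN` marker value, the 50/50 split ratio, the function name `make_label_inputs`, and the toy node ids are assumptions, and the GNN itself (dgl, loss, early stopping) is left out.

```python
import random

UNKNOWN = -1  # hypothetical id for the "Unknown" pseudo-label


def make_label_inputs(train_nodes, labels, other_nodes, train_train_frac=0.5, seed=0):
    """Split labeled nodes into train_train / train_valid and build label inputs.

    Only train_train nodes expose their true label as an input feature;
    train_valid nodes and all other nodes (valid, test) get UNKNOWN.
    """
    rng = random.Random(seed)
    shuffled = list(train_nodes)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_train_frac)
    train_train, train_valid = shuffled[:cut], shuffled[cut:]

    label_input = {node: labels[node] for node in train_train}
    for node in list(train_valid) + list(other_nodes):
        label_input[node] = UNKNOWN
    return train_train, train_valid, label_input


# Toy data (assumed): nodes 0-5 are labeled training nodes, 6-7 are valid/test.
labels = {0: 2, 1: 0, 2: 1, 3: 2, 4: 0, 5: 1}
train_train, train_valid, label_input = make_label_inputs(
    train_nodes=labels.keys(), labels=labels, other_nodes=[6, 7]
)
```

The loss would then be computed only on the `train_valid` nodes, whose own labels never appear in `label_input`.
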
  14. GNN using neighbor-node labels (supplement): Inductive Representation Learning on Large Graphs, Figure1 [2]

  15. GNN using neighbor-node labels (supplement): Inductive Representation Learning on Large Graphs, Figure1

  16. GNN using neighbor-node labels (supplement): Inductive Representation Learning on Large Graphs, Figure1

  17. After Competition

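  18. GNN using w2v embeddings
      - corp_id embeddings via w2v
      - For each corp_id, the partner companies are listed as a sentence and the embedding is obtained with w2v
      - See the solutions of the top finishers
      - Diagram: corpA, corpB, corpC, corpD → sentences (corpE, corpK, corpZ, …; corpD, corpS, corpX, …; corpN, corpK, corpE, …; corpL, corpM, corpV, …; …)
      - corpA_ver1: 30, 50, 100, 300, 600 dim
      - corpA_ver2: average over the top 50 partner companies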
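
The two ingredients on this slide, per-company "sentences" of partner ids and the averaged partner embedding (ver2), can be sketched without any external library. The w2v training step itself is omitted; the embedding table `emb`, the helper names, and the assumption that partners arrive already ranked are all hypothetical.

```python
def build_sentences(edges):
    """For each corp_id, list its partner corp_ids as one 'sentence' for w2v."""
    sentences = {}
    for a, b in edges:
        sentences.setdefault(a, []).append(b)
        sentences.setdefault(b, []).append(a)
    return sentences


def mean_partner_embedding(ranked_partners, emb, top_n=50):
    """Average the embeddings of the first `top_n` partners that have a vector.

    ranked_partners: partner ids, assumed to be sorted by importance already
    emb:             dict corp_id -> list of floats (e.g. vectors from w2v)
    """
    vectors = [emb[p] for p in ranked_partners[:top_n] if p in emb]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy data (assumed): tiny 2-dimensional embeddings.
edges = [("A", "B"), ("A", "C"), ("B", "C")]
sentences = build_sentences(edges)
emb = {"B": [1.0, 0.0], "C": [0.0, 1.0]}
corp_a_ver2 = mean_partner_embedding(sentences["A"], emb)
```

The values of `sentences` are what a word2vec trainer would consume as its corpus, one list of tokens per company.
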
  19. CustomGNN
      - Inputs: …-order neighbor features, w2v features (Dropout=0.7), label embedding, label info
      - Blocks shown: Linear, BatchNorm, Dropout (0.3), LeakyReLU; CustomSAGE, Dropout (0.3), LeakyReLU; Linear, BatchNorm, Dropout (0.3), LeakyReLU
      - Example vectors: [0.3, 0.5, 0.3, 0.5, 0.8], [0.4, 0.4, 0.3, 0.2, 0.2], [0.1, 0.2, 0.01, 0.4, 0.3]; [0.3, 0.5, 0.3, 0.0, 0.0], [0.4, 0.4, 0.3, 0.2, 0.2], [0.1, 0.2, 0.01, 0.4, 0.3]
      - dgl was customized
      - Based on the 4th-place finisher's solution, with thanks
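  20. Experiments
      - Neighbor-label GNN (CustomGraphSAGE)
      - Ensemble (Cat, MLP, GNN)
      - Private LB scores are compared assuming the model with the best Public LB is selected
      - Model 1, Model 2
      - Timeline labels: competition start, competition end, Solution presentation, start of follow-up experiments; dates shown: 10/31, 2020/10/19, 11/16, 11/04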
  21. Late Submission
      - Chart labels: 1st, 2nd, 3rd, 4th, 5th, 6th, Model 1, Model 2
      - Chart values: 0.8241, 0.8292, 0.8350, 0.8365, 0.8404, 0.8405, 0.8422, 0.8310
      - SAGE: 0.8365; CustomSAGE: 0.8393
  22. Impression

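  23. Impressions
      - A good opportunity to get to grips with dgl
      - I would like to implement GNNs that incorporate label information with other models as well
        - This time it was GraphSAGE
        - With GraphAttention I got bogged down
        - What about using two or more layers?
      - My one regret is that even the follow-up experiments did not get there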
  24. Thank you for your attention