Sansan×atmaCup#6 Solution

Sansan DSOC
November 16, 2020

■ Event
Sansan×atmaCup #6 solution presentation
https://sansan.connpass.com/event/193901/

■ Talk overview
Title: Sansan×atmaCup#6 Solution
Speaker: 柳辺 十武 (Tom Yanabe)

▼Sansan R&D Twitter
https://twitter.com/SansanRandD

Transcript

  1. Self-Introduction
    - 柳辺 十武 (Tom Yanabe), T0m
    - Keio University Graduate School, Information Engineering course
    - Interned at DeNA and PKSHA Technology
    - @ryuhenjubunomac
    - Kaggle Expert: silver ×2 (TReNDS / Walmart Accuracy)
    - Hobbies: surfing, camping
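  2. Feature Engineering / Target Feature
    - Industry ratio of exchange-partner companies
      - Which industries a company received cards from or gave cards to
      - (Figure at right) The ratio of industries that a given industry exchanges with
    - Industry ratio of the exchange partners' own exchange partners
      - Which industries a company's exchange partners exchange with
      - (Figure at right) The industry ratio exchanged by the companies that a given industry exchanges with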
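As a rough illustration of the first feature above (the industry ratio of a company's exchange partners), here is a minimal sketch. The edge format `(giver, receiver)`, the `industry` dictionary, and the function name are assumptions made for this example, not the speaker's actual code. Partners without a known label (for example test nodes) are skipped.

```python
from collections import Counter, defaultdict


def partner_industry_ratio(edges, industry, direction="from"):
    """Share of each industry among a company's exchange partners.

    edges: iterable of (giver, receiver) pairs (hypothetical format).
    industry: dict company -> industry label; unlabeled companies are absent.
    direction: "from" counts partners the company received cards from,
               "to" counts partners the company gave cards to.
    """
    counts = defaultdict(Counter)
    for giver, receiver in edges:
        if direction == "from":
            target, partner = receiver, giver
        else:
            target, partner = giver, receiver
        # Only partners with a known label contribute to the ratio.
        if partner in industry:
            counts[target][industry[partner]] += 1

    ratios = {}
    for target, counter in counts.items():
        total = sum(counter.values())
        ratios[target] = {ind: n / total for ind, n in counter.items()}
    return ratios


# Toy data: B received cards from A, C and D; A received one from B.
edges = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "A")]
industry = {"A": "IT", "C": "IT", "D": "Finance"}  # B is unlabeled
ratios = partner_industry_ratio(edges, industry, direction="from")
```

Applying the same function to the partners' ratios (averaging them per company) would give the second-order variant described on the slide; that step is left out here to keep the sketch short.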
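  3. Feature Engineering / Target Feature
    - Building these features per fold loses too much information
    - So they were built while carefully removing sources of leakage …
    - [Figure: partners by rank 1, 2, 3, 4, … (corporation A, corporation A, corporation E, corporation B, corporation S); example with k=3, and from there k is increased as far as possible]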
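  4. Feature Engineering / Other
    - Industry ratio of cards registered at the same time at the exchange partner
    - Time information (timing)
      - For each industry, the exchange volume at the time of year when that industry exchanges most
      - I had wanted to incorporate events and similar information
    - Exchange volume with the n representative companies that exchange most in each industry
    - In total, roughly [count lost in extraction] features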
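  5. Model
    - CatBoost and MLP (GNN was tried but not adopted in the end)
    - Features were re-created using the first-stage predictions
    - [Figure: CatBoost and MLP, each followed by a feature-generation step; scores shown: LB: 0.8197, LB: 0.8219, LB: 0.8233, LB: 0.8159; LB 9th place, LB 8th place]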
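  6. Heterogeneous Graph
    - Focus on the neighbors of neighbors → Heterogeneous Graph
    - Implemented "Heterogeneous Graph Attention Network"
      [Paper link: https://arxiv.org/abs/1903.07293] (WWW2019)
    - Uses nodes that are connected to the target through some intermediate node
    - The way nodes are connected is called a Meta-Path; multiple Meta-Paths are weighted hierarchically with attention and finally aggregated
    - The following Meta-Paths were defined here (from: received, to: gave)
      1. from - from
      2. from - to
      3. to - from
      4. to - to
    - [Figure: target node, intermediate node, connected node]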
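The four Meta-Paths listed above can be illustrated by collecting, for one target node, the nodes reachable in two hops through an intermediate node. This is only a sketch of the neighbor enumeration, not of the attention model itself. The edge format `(giver, receiver)`, the function names, and the reading of each path as "first hop relation, then second hop relation" are assumptions made for this example.

```python
from collections import defaultdict

# The four meta-paths from the slide, as (first hop, second hop).
META_PATHS = [("from", "from"), ("from", "to"), ("to", "from"), ("to", "to")]


def build_neighbors(edges):
    """Index one-hop neighbors by relation type."""
    received_from = defaultdict(set)  # "from": who gave a card to this node
    gave_to = defaultdict(set)        # "to": who this node gave a card to
    for giver, receiver in edges:
        gave_to[giver].add(receiver)
        received_from[receiver].add(giver)
    return {"from": received_from, "to": gave_to}


def meta_path_neighbors(neighbors, node, path):
    """Nodes reached from `node` via an intermediate node along `path`."""
    first, second = path
    reached = set()
    for via in neighbors[first].get(node, ()):
        reached |= neighbors[second].get(via, set())
    reached.discard(node)  # the target itself is not its own neighbor
    return reached


# Toy graph: A -> B, D -> B, B -> C, B -> E
edges = [("A", "B"), ("B", "C"), ("D", "B"), ("B", "E")]
nb = build_neighbors(edges)
result = {path: meta_path_neighbors(nb, "C", path) for path in META_PATHS}
```

In this toy graph, node C received a card only from B, so its "from - from" neighbors are the companies B received from (A and D) and its "from - to" neighbor is the other company B gave to (E).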
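  7. GNN using neighbor node labels
    - The label information of surrounding nodes is important
    - Goal: incorporate label information into node classification without leakage
    - Erase a node's own label information where needed during the self-loop computation (code modification)
    - Build a sub-graph for each node and run them in batches
    - A small trick (adopted)
      - Split: test / valid / train_train / train_valid
      - train_train nodes carry label information
      - For all other nodes, the label slot holds an "Unknown" pseudo label
      - Compute the loss on train_valid and run backward
      - Compute val_loss on valid and use it for early_stopping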
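The adopted trick can be sketched as a function that decides which nodes expose their label as an input feature and which nodes are used for the loss. The split fraction, the `UNKNOWN` id, and all names here are assumptions for illustration; the slide does not state how train was divided or whether the split was redrawn during training.

```python
import random

UNKNOWN = -1  # hypothetical id standing in for the "Unknown" pseudo label


def split_label_roles(train_nodes, valid_nodes, test_nodes, labels,
                      label_input_frac=0.5, seed=0):
    """Assign label-input roles so the loss is never computed on a visible label.

    Returns (label_input, train_train, train_valid):
      - train_train nodes expose their true label as an input feature
      - every other node (train_valid, valid, test) gets UNKNOWN as input
      - the training loss is meant to be computed on train_valid only
    """
    rng = random.Random(seed)
    shuffled = list(train_nodes)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * label_input_frac)
    train_train, train_valid = shuffled[:cut], shuffled[cut:]

    all_nodes = [*train_nodes, *valid_nodes, *test_nodes]
    label_input = {node: UNKNOWN for node in all_nodes}
    for node in train_train:
        label_input[node] = labels[node]
    return label_input, train_train, train_valid


# Toy example: six training nodes, one validation node, one test node.
labels = {name: i % 3 for i, name in enumerate("abcdefg")}
label_input, train_train, train_valid = split_label_roles(
    list("abcdef"), ["g"], ["h"], labels
)
```

Because train_valid nodes never see their own label as input, the model has to predict them from neighbor labels and other features, which is the same situation it faces on valid and test nodes.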
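  8. GNN using w2v Embedding
    - corp_id embeddings via w2v
    - For each corp_id, list its partner companies and use the embedding obtained with w2v
    - See the solutions of the top finishers
    - [Figure: sentences per company: corpA → corpE, corpK, corpZ, …; corpB → corpD, corpS, corpX, …; corpC → corpN, corpK, corpE, …; corpD → corpL, corpM, corpV, …; …]
    - corpA_ver1: 30, 50, 100, 300, 600 dim
    - corpA_ver2: mean over the top 50 business-partner companies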
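The sentence construction above can be sketched in plain Python. Each company gets one "sentence" listing its partners, which a word2vec implementation could then be trained on; the training step is not shown. Ordering partners by exchange count and reading "ver2" as the mean of the top partners' vectors are assumptions made for this example, as are the function names and the toy vectors.

```python
from collections import Counter, defaultdict


def build_sentences(edges):
    """One partner list ("sentence") per company, most frequent partner first."""
    partners = defaultdict(Counter)
    for a, b in edges:
        partners[a][b] += 1
        partners[b][a] += 1
    # Sort by descending count, then by name so the order is deterministic.
    return {
        corp: [p for p, _ in sorted(cnt.items(), key=lambda kv: (-kv[1], kv[0]))]
        for corp, cnt in partners.items()
    }


def mean_partner_vector(sentence, vectors, top_k=50):
    """Average the vectors of the first top_k partners that have a vector."""
    chosen = [vectors[p] for p in sentence[:top_k] if p in vectors]
    if not chosen:
        return None
    dim = len(chosen[0])
    return [sum(v[i] for v in chosen) / len(chosen) for i in range(dim)]


# Toy data: A exchanged twice with B, once with C and once with D.
edges = [("A", "B"), ("A", "B"), ("A", "C"), ("D", "A")]
sentences = build_sentences(edges)

# Hypothetical 2-dimensional embeddings standing in for trained w2v vectors.
vectors = {"B": [1.0, 0.0], "C": [0.0, 1.0]}
corp_a_ver2 = mean_partner_vector(sentences["A"], vectors, top_k=2)
```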
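  9. CustomGNN
    - Customized dgl
    - Built with reference to the 4th-place finisher's solution
    - [Figure: network diagram. Inputs: neighbor features of several orders (the order digits are lost in extraction), w2v features, label info. Components shown: Label embedding (Dropout=0.7), Linear, BatchNorm, Dropout (0.3), LeakyReLU, CustomSAGE, Dropout (0.3), LeakyReLU, Linear, BatchNorm, Dropout (0.3), LeakyReLU]
    - [Figure: example label info vectors [0.3, 0.5, 0.3, 0.5, 0.8], [0.4, 0.4, 0.3, 0.2, 0.2], [0.1, 0.2, 0.01, 0.4, 0.3] and, with Dropout=0.7, [0.3, 0.5, 0.3, 0.0, 0.0], [0.4, 0.4, 0.3, 0.2, 0.2], [0.1, 0.2, 0.01, 0.4, 0.3]]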
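  10. Experiments
    - Neighbor-label GNN (CustomGraphSAGE)
    - Ensemble (Cat, MLP, GNN)
    - Private LB comparison, assuming the model with the best Public LB is selected
    - Model 1, Model 2
    - [Figure: timeline with the events competition start, competition end, follow-up experiments start, and solution presentation; dates shown: 2020/10/19, 10/31, 11/04, 11/16]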
  11. Late Submission
    - [Figure: score chart with entries 1st, 2nd, 3rd, 4th, 5th, 6th, Model 1, Model 2; values shown: 0.8241, 0.8292, 0.8350, 0.8365, 0.8404, 0.8405, 0.8422, 0.8310]
    - SAGE: 0.8365
    - CustomSAGE: 0.8393