Kaggle Google Quest Q&A Labeling - 23rd place solution

Shuhei Goda
February 28, 2020


Transcript

  1. 1.

    ©2020 Wantedly, Inc. 23rd place solution: Kaggle Google Quest Q&A Labeling retrospective

    Feb 28, 2020 - Shuhei Goda - @jy_msc
  2. 2.

    Team - The Hand

    Shuhei Goda (@jy_msc): Visit Engineering Team at Wantedly
    Naomichi Agata (@agatan_): People Engineering Team at Wantedly
  3. 3.

    Model Pipeline

    Models: Bert-base uncased, Bert-base uncased, LightGBM
    Text data
    ・question_title ・question_body ・answer
    Pre-Process (Q and A)
    ・question_title ・question_body ・answer
    Pre-Process (only Q)
    ・question_title ・question_body
    Pre-Process settings
    ・html escape ・head+tail truncation
    Bert-base uncased settings
    ・3-fold GroupKFold ・BCE + margin ranking loss ・3 epochs
    LightGBM settings
    ・max_depth=1 ・lr=0.1
    LightGBM meta features
    ・text length ・stackexchange
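The grouped 3-fold split named in the settings can be illustrated as follows. This is a minimal standard-library sketch in the spirit of scikit-learn's `GroupKFold`, not the team's code; the slides do not say which column was used as the group key, so the question IDs in the usage example are hypothetical.

```python
from collections import Counter


def group_kfold(groups, n_splits=3):
    """Return (train_idx, val_idx) pairs where no group spans both sides."""
    counts = Counter(groups)
    fold_sizes = [0] * n_splits
    group_to_fold = {}
    # Largest groups first; each goes to the currently smallest fold.
    for group, count in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        fold = fold_sizes.index(min(fold_sizes))
        group_to_fold[group] = fold
        fold_sizes[fold] += count
    folds = []
    for fold in range(n_splits):
        val = [i for i, g in enumerate(groups) if group_to_fold[g] == fold]
        train = [i for i, g in enumerate(groups) if group_to_fold[g] != fold]
        folds.append((train, val))
    return folds


# Hypothetical group key: one ID per distinct question.
question_ids = ["q1", "q1", "q2", "q3", "q3", "q4"]
folds = group_kfold(question_ids, n_splits=3)
```

Grouping matters when the same question appears with several answers: a plain K-fold split would put copies of one question on both sides of the split and make validation scores look better than they are.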
  4. 5.

    Pre-Process

    ・Concatenating and trimming the text data
      ・[CLS] + question_title + [SEP] + question_body + [SEP] + answer
    ・If question_body and answer exceed the specified length, take equal-sized portions from both ends (head+tail truncation)
    Reference: https://arxiv.org/abs/…
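The input layout and the head+tail truncation described on this slide can be sketched as below. It is a minimal illustration on pre-tokenized lists, not the team's code; the function names and the length limits (`max_body`, `max_answer`) are assumptions chosen for the example.

```python
def head_tail_truncate(tokens, max_len):
    """Keep the first and last parts of a token list and drop the middle."""
    if len(tokens) <= max_len:
        return list(tokens)
    head = max_len // 2
    tail = max_len - head
    # Slice from an explicit start index so that tail == 0 yields an empty list.
    return tokens[:head] + tokens[len(tokens) - tail:]


def build_input(title, body, answer, max_body=8, max_answer=8):
    """Assemble [CLS] + title + [SEP] + body + [SEP] + answer."""
    body = head_tail_truncate(body, max_body)
    answer = head_tail_truncate(answer, max_answer)
    return ["[CLS]"] + title + ["[SEP]"] + body + ["[SEP]"] + answer


# Hypothetical tokens: the body is cut from 6 to 4 tokens, keeping both ends.
example = build_input(["t"], list("abcdef"), ["x"], max_body=4, max_answer=4)
```

Keeping both ends preserves the opening of a post as well as its closing lines, which plain head-only truncation would discard.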
  5. 8.

    Loss function

    ・BCE + margin ranking loss (1 : 1)
      ・Split the mini-batch into two and compute the margin ranking loss

    BCE + margin ranking loss: Public 0.45979, Private 0.41440
    BCE: Public 0.44006, Private 0.40668
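One way to read the combined loss is sketched below for a single target column, in plain Python. The margin ranking term uses the standard formula max(0, -y * (x1 - x2) + margin); how the pairs were formed, the margin value, the treatment of ties, and whether the term was applied to probabilities or logits are not given in the slides and are assumptions here.

```python
import math


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; targets may be soft labels in [0, 1]."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(probs)


def margin_ranking(x1, x2, y, margin=0.0):
    """Mean of max(0, -y * (x1 - x2) + margin) over all pairs."""
    return sum(max(0.0, -yi * (a - b) + margin) for a, b, yi in zip(x1, x2, y)) / len(y)


def combined_loss(probs, targets, margin=0.0):
    """BCE plus margin ranking loss with equal (1 : 1) weights."""
    half = len(probs) // 2  # with an odd batch, the last sample is left out of the pairs
    p1, p2 = probs[:half], probs[half:2 * half]
    t1, t2 = targets[:half], targets[half:2 * half]
    # Assumption: y = +1 if the first target is larger, -1 if smaller, 0 for ties.
    y = [(a > b) - (a < b) for a, b in zip(t1, t2)]
    return bce(probs, targets) + margin_ranking(p1, p2, y, margin)


# A wrongly ordered pair is penalized by the ranking term; a correct one is not.
wrong_order = combined_loss([0.1, 0.9], [1.0, 0.0])
right_order = combined_loss([0.9, 0.1], [1.0, 0.0])
```

The ranking term looks only at the order of predictions within each pair, which is closer to the rank-based competition metric than BCE alone.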
  6. 9.

    Training

    ・Question Model
      ・Solve the question-related tasks using only the question text
      ・Because the input is Q only, less of Q is truncated (more information from Q is kept)

    Q model + Q and A model: Public 0.45979, Private 0.41440
    Q and A model × 2 (seed average): Public 0.44298, Private 0.40613
  7. 10.

    Post-Process

    ・LightGBM
      ・max_depth=1, lr=0.1
      ・meta features
        ・text length (question, answer)
        ・meta data from stackexchange (Score, View, FavoriteCount, …)

    LightGBM: Public 0.45979, Private 0.41440
    Simple binning without meta features: Public 0.45282, Private 0.41387
  8. 11.

    Why did we use LightGBM? 1. Simple binning method

    ・Noticed that discretizing the predictions improves Spearman's correlation
    ・Binned the predictions with a bin size set in advance for each target
    ・With the bin sizes fixed, took a weighted average of the Bert outputs from each epoch (the weights were optimized)
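The effect described on this slide can be reproduced on a toy example. The sketch below computes Spearman's correlation with average ranks for ties and bins predictions onto an evenly spaced grid; the data, the grid-rounding scheme, and the function names are assumptions for illustration, not the team's binning code.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the rank vectors; NaN if either side is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def bin_predictions(preds, n_bins):
    """Round each prediction to the nearest multiple of 1 / n_bins."""
    return [round(p * n_bins) / n_bins for p in preds]


# Toy data: the target takes few distinct values, the predictions are continuous.
targets = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
preds = [0.10, 0.22, 0.18, 0.80, 0.71, 0.93]
raw_score = spearman(targets, preds)
binned_score = spearman(targets, bin_predictions(preds, 2))
```

Here `binned_score` is higher than `raw_score`: the targets contain many ties, and binning removes the arbitrary ordering of predictions that belong to the same target level. Bins that are too coarse can collapse a column to a single value, in which case the correlation is undefined.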
  9. 12.

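    Why did we use LightGBM? 2. Optimize bin size and weights

    ・We then wanted to use optimal bin sizes as well
    ・Jointly optimizing the bin sizes and the weights did not work well
    ・The optimal bin size depends on the shape of the prediction distribution. The average of the per-fold optimal bin sizes and the prediction distribution after weighted averaging both deviate from the optimum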
  10. 13.

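    Why did we use LightGBM? 3. LightGBM

    ・We wanted to optimize the bin sizes and the weights jointly
    ・We also wanted to use meta features
    ・GBDT splits the data and assigns an optimal value to each resulting region
      → With shallow trees, it might give a discretization similar to binning
    [Figure: max_depth=2 vs. max_depth=8]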
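The intuition that shallow boosted trees behave like learned binning can be shown with a hand-rolled stand-in. The sketch below boosts depth-1 regression stumps on a single feature (a raw prediction) with squared error; it is not LightGBM and not the team's code, and the data and function names are assumptions. Only `lr=0.1` and the depth of 1 come from the slides.

```python
def fit_stump(x, residual):
    """Best single-threshold split by squared error: (threshold, left, right)."""
    best = None
    xs = sorted(set(x))  # needs at least two distinct feature values
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [r for v, r in zip(x, residual) if v <= thr]
        right = [r for v, r in zip(x, residual) if v > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    return best[1:]


def boost_stumps(x, y, n_rounds=3, lr=0.1):
    """Gradient boosting with depth-1 trees fitted to the residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residual = [t - p for t, p in zip(y, pred)]
        thr, lv, rv = fit_stump(x, residual)
        stumps.append((thr, lv, rv))
        pred = [p + lr * (lv if v <= thr else rv) for p, v in zip(pred, x)]
    return base, stumps


def predict(base, stumps, x, lr=0.1):
    """Sum the stump outputs; lr must match the value used in boost_stumps."""
    return [base + lr * sum(lv if v <= thr else rv for thr, lv, rv in stumps) for v in x]


# Toy data: continuous raw predictions as the only feature, coarse targets.
raw = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
target = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
base, stumps = boost_stumps(raw, target, n_rounds=3)
binned = predict(base, stumps, raw)
```

With a single feature, each stump adds at most one threshold, so `n_rounds` stumps yield at most `n_rounds + 1` distinct output values. The bin edges are the learned thresholds and the bin values are fitted to the target, which matches the discretization idea on the slide. With extra meta features, each stump is free to split on a different column.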