Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Kaggle Google Quest Q&A Labeling - 23th place s...
Search
Shuhei Goda
February 28, 2020
Technology
4
4.2k
Kaggle Google Quest Q&A Labeling - 23th place solution
Shuhei Goda
February 28, 2020
Tweet
Share
More Decks by Shuhei Goda
See All by Shuhei Goda
Turing × atmaCup #18 - 1st Place Solution
hakubishin3
0
1.2k
ジョブマッチングサービスにおける相互推薦システムの応用事例と課題
hakubishin3
3
1.1k
とある事業会社にとっての Kaggler の魅力
hakubishin3
9
3.1k
課題の解像度が荒かったことで意図した改善ができなかった話
hakubishin3
3
1.1k
Wantedly におけるマッチング体験を最大化させるための推薦システム
hakubishin3
4
1.3k
Recommendation Industry Talks #1 Opening
hakubishin3
1
450
会社訪問アプリ「Wantedly Visit」での シゴトに関する興味選択機能と推薦改善
hakubishin3
0
730
論文紹介: Improving Implicit Feedback-Based Recommendation through Multi-Behavior Alignment(Xin Xin et al., 2023)
hakubishin3
0
700
Feedback Prize - English Language Learning における擬似ラベルの品質向上の取り組み
hakubishin3
1
1.1k
Other Decks in Technology
See All in Technology
マネージャー版 "提案のレベル" を上げる
konifar
22
15k
S3はフラットである –AWS公式SDKにも存在した、 署名付きURLにおけるパストラバーサル脆弱性– / JAWS DAYS 2026
flatt_security
0
1.7k
Evolution of Claude Code & How to use features
oikon48
1
580
最強のAIエージェントを諦めたら品質が上がった話 / how quality improved after giving up on the strongest AI agent
kt2mikan
0
150
楽しく学ぼう!ネットワーク入門
shotashiratori
3
2.8k
マルチプレーンGPUネットワークを実現するシャッフルアーキテクチャの整理と考察
markunet
2
230
Google系サービスで文字起こしから勝手にカレンダーを埋めるエージェントを作った話
risatube
0
130
JAWS FESTA 2025でリリースしたほぼリアルタイム文字起こし/翻訳機能の構成について
naoki8408
1
300
AWS DevOps Agent vs SRE俺 / AWS DevOps Agent vs me, the SRE
sms_tech
3
540
OpenClawで回す組織運営
jacopen
3
690
「ストレッチゾーンに挑戦し続ける」ことって難しくないですか? メンバーの持続的成長を支えるEMの環境設計
sansantech
PRO
3
640
DevOpsエージェントで実現する!! AWS Well-Architected(W-A) を実現するシステム設計 / 20260307 Masaki Okuda
shift_evolve
PRO
3
550
Featured
See All Featured
Claude Code のすすめ
schroneko
67
220k
VelocityConf: Rendering Performance Case Studies
addyosmani
333
24k
How to build an LLM SEO readiness audit: a practical framework
nmsamuel
1
670
Fashionably flexible responsive web design (full day workshop)
malarkey
408
66k
Visualization
eitanlees
150
17k
Hiding What from Whom? A Critical Review of the History of Programming languages for Music
tomoyanonymous
2
540
Documentation Writing (for coders)
carmenintech
77
5.3k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
11
860
Making Projects Easy
brettharned
120
6.6k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
49
9.9k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.6k
Applied NLP in the Age of Generative AI
inesmontani
PRO
4
2.2k
Transcript
©2020 Wantedly, Inc. 23th place solution Kaggle Google Quest Q&A
Labeling লձ Feb 28, 2020 - Shuhei Goda - @jy_msc
©2020 Wantedly, Inc. Team - The Hand Shuhei Goda @jy_msc
Visit Engineering Team at Wantedly Naomichi Agata @agatan_ People Engineering Team at Wantedly
©2020 Wantedly, Inc. Model Pipeline #FSUCBTF VODBTFE -JHIU(#. #FSUCBTF VODBTFE
Settings ɾ3fold with GroupKFold ɾBCE + margin ranking loss ɾ3epoch Settings ɾmax_depth=1 ɾlr=0.1 Meta features ɾtext length ɾstackexchange Text data ɾquestion_title ɾquestion_body ɾanswer 1SF1SPDFTT 2BOE" 1SF1SPDFTT POMZ2 ɾquestion_title ɾquestion_body ɾquestion_title ɾquestion_body ɾanswer Settings ɾhtml escape ɾhead+tail truncation
©2020 Wantedly, Inc. ɾHTMLจࣈྻͷΞϯΤεέʔϓ Pre-Process IUUQTXXXLBHHMFDPNDHPPHMFRVFTUDIBMMFOHFEJTDVTTJPO
©2020 Wantedly, Inc. ɾςΩετσʔλͷ݁߹ͱτϦϛϯά ɹɾ[CLS] + question_title + [SEP] +
question_body + [SEP] + answer ɾquestion_body ͱ answer ͕ࢦఆͷ͞Λ͑ͨ߹, ͔྆ΒಉαΠζΛτϦϛϯά Pre-Process IUUQTBSYJWPSHBCT
©2020 Wantedly, Inc. ɾBert-base (uncased) ɹɾޙΖ4ͭͷӅΕͷग़ྗΛ༻ https://arxiv.org/abs/1905.05583 ɹɾQAؒͷSEP tokenͷग़ྗΛ༻ Model
Architecture
©2020 Wantedly, Inc. ɾLabel weight ɹɾ؆୯ͦ͏ͳλεΫweightΛখ͘͞, ෆۉߧͰͦ͠͏ͳλεΫweightΛେ͖͘ ɹɾgpyoptͰweightͷ୳ࡧΛࢼͨ͠Έ͕ͨ, Լهͷ୯७ͳΓํ͕࠷ྑ͔ͬͨ Loss
function Label weight ͋Γ Public: 0.45979, Private: 0.41440 Label weight ͳ͠ Public: 0.43455, Private: 0.40602
©2020 Wantedly, Inc. ɾBCE + margin ranking loss (1 :
1) ɹɾϛχόονΛ2ͭʹׂͯ͠ margin ranking loss Λܭࢉ Loss function BCE + margin ranking loss Public: 0.45979, Private: 0.41440 BCE Public: 0.44006, Private: 0.40668
©2020 Wantedly, Inc. ɾQuestion Model ɹɾQ༻ͷλεΫΛQuestion text͚ͩΛͬͯղ͘ ɹɾΠϯϓοτQ͚ͩͰ͍͍ͷͰ, Qͷtruncationͷྔ͕ݮΔ (Qͷใྔ͕૿͑Δ)
Training Q model + Q and A model Public: 0.45979, Private: 0.41440 Q and A model × 2 (seed average) Public: 0.44298, Private: 0.40613
©2020 Wantedly, Inc. ɾLightGBM ɹɾmax_depth=1, lr=0.1 ɹɾmeta features ɹɹɾtext length
(question, answer) ɹɹɾmeta data from stackexchange (Score, View, FavoriteCount, …) Post-Process LightGBM Public: 0.45979, Private: 0.41440 Simple binning without meta features Public: 0.45282, Private: 0.41387
©2020 Wantedly, Inc. Why we used LightGBM? 1. Simple binning
method ɹɾ༧ଌΛࢄԽ͢Δ͜ͱͰ Spearman’s correlation ͕ྑ͘ͳΔ͜ͱʹؾͮ͘ ɹɾtarget͝ͱʹϏϯαΠζΛࣄલʹઃఆͯ͠Ϗϯೋϯά ɹɾϏϯαΠζݻఆʹ্ͨ͠ͰBertͷ֤epochͷग़ྗΛweighted average (weight࠷దԽ)
©2020 Wantedly, Inc. Why we used LightGBM? 2. Optimize bin-size
and weights ɹɾϏϯαΠζ࠷దͳΛ͍ͨ͘ͳͬͨ ɹɾϏϯαΠζͱweightsͷಉ࣌࠷దԽ্͕ͨ͠ख͍͔͘ͳ͍ ɹɾ࠷దͳϏϯαΠζ༧ଌͷܗʹΑܾͬͯ·Δ. ֤foldͷ࠷దͳϏϯαΠζͷฏۉͱ weighted averageޙͷ༧ଌ࠷దͳͷ͔Βဃ͢Δ
©2020 Wantedly, Inc. Why we used LightGBM? 3. LightGBM ɹɾϏϯαΠζͱweightsͷಉ࣌࠷దԽ͍ͨ͠
ɹɾmeta features͍͍ͨ ɹɾGBDTσʔλΛׂׂͯ͠ޙͷྖҬʹ࠷దͳΛׂΓͯΔख๏ ɹɹˠ ઙ͍߹Ϗϯχϯάͱಉ༷ͷࢄԽ͕Ͱ͖ΔΜ͡Όͳ͍͔ max_depth=2 max_depth=8
©2020 Wantedly, Inc. 4. LightGBM (parameter tuning) ɹɾࢄԽ͢Δ΄Ͳscore͕ྑ͘ͳΔͷͰ, ߏΛۃྗγϯϓϧʹ͍ͨ͠ ɹɾtrainσʔλΛׂͯ͠࠷దͳύϥϝʔλΛݟ͚ͭΔ
ɹɾmax_depthΛҰ൪খ͘͞, lrΛۃྗେ͖ͨ͘͠ํ͕score͕ྑ͘ͳͬͨ Why we used LightGBM?
©2020 Wantedly, Inc. ɾsample weightͷઃఆ ɾhostͷ୯ޠΛΠϯϓοτͷઌ಄ྻʹஔ͘ ɾnew tokenͷՃ ɾBert-base casedΛ͏
ɾtexͷίʔυϒϩοΫΛྗٕͰফڈ Didn’t work for us
©2020 Wantedly, Inc. Discussion: https://www.kaggle.com/c/google-quest-challenge/discussion/129904#742302 Kernel: https://www.kaggle.com/shuheigoda/23th-place-solusion Links
©2020 Wantedly, Inc. https://www.wantedly.com/projects/375150 We are hiring !