Shuhei Goda
February 28, 2020
Kaggle Google Quest Q&A Labeling - 23rd place solution
Transcript
©2020 Wantedly, Inc.
23rd place solution
Kaggle Google Quest Q&A Labeling retrospective
Feb 28, 2020 - Shuhei Goda - @jy_msc
Team - The Hand
・Shuhei Goda (@jy_msc), Visit Engineering Team at Wantedly
・Naomichi Agata (@agatan_), People Engineering Team at Wantedly
Model Pipeline
Text data
・question_title
・question_body
・answer
Pre-process (settings: HTML unescape, head+tail truncation)
・Q and A: question_title, question_body, answer
・Only Q: question_title, question_body
Bert-base uncased (one model per pre-processed input)
・3-fold with GroupKFold
・BCE + margin ranking loss
・3 epochs
LightGBM
・Settings: max_depth=1, lr=0.1
・Meta features: text length, stackexchange
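The pipeline lists 3-fold GroupKFold. The following is a minimal pure-Python sketch of what group-wise folding guarantees. The grouping key (one id per question) is an assumption, since the slide does not say which key was used, and `group_k_fold` is a hypothetical helper written for illustration, not scikit-learn's `GroupKFold`.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def group_k_fold(
    groups: Sequence[Hashable], n_splits: int = 3
) -> List[Tuple[List[int], List[int]]]:
    """Return (train_idx, valid_idx) pairs in which no group is split across the two."""
    members: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, group in enumerate(groups):
        members[group].append(idx)
    if len(members) < n_splits:
        raise ValueError("need at least as many groups as folds")

    # Greedy balancing: largest groups first, each into the currently smallest fold.
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    for group in sorted(members, key=lambda g: len(members[g]), reverse=True):
        smallest = min(range(n_splits), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[group])

    splits = []
    for f in range(n_splits):
        valid = sorted(folds[f])
        train = sorted(i for o in range(n_splits) if o != f for i in folds[o])
        splits.append((train, valid))
    return splits


# Hypothetical group keys: rows 0-1 share one question, rows 3-5 share another.
groups = ["q1", "q1", "q2", "q3", "q3", "q3", "q4"]
splits = group_k_fold(groups, n_splits=3)
for train_idx, valid_idx in splits:
    print("train", train_idx, "valid", valid_idx)
```

Keeping all rows of one question in the same fold avoids scoring the model on a question it already saw during training with a different answer.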
Pre-Process
・Unescape HTML strings
https://www.kaggle.com/c/google-quest-challenge/discussion/…
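As a small illustration of the unescaping step, the sketch below uses Python's standard `html.unescape`. The slide does not say which function the team used, so this choice and the `unescape_text` wrapper name are assumptions.

```python
import html


def unescape_text(text: str) -> str:
    """Turn HTML entities such as &lt; or &amp; back into plain characters."""
    return html.unescape(text)


# Made-up sample text, not taken from the competition data.
sample = "if x &lt; 3 &amp;&amp; y &gt; 1: print(&quot;ok&quot;)"
print(unescape_text(sample))
```

Unescaping before tokenization keeps entities like `&lt;` from being split into several meaningless word pieces.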
Pre-Process
・Concatenate and trim the text data
  ・[CLS] + question_title + [SEP] + question_body + [SEP] + answer
  ・If question_body and answer exceed the specified length, trim the same amount from both
https://arxiv.org/abs/…
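The concatenation and trimming rule can be sketched as follows. This is an illustration under stated assumptions, not the team's code: tokens are plain Python strings, `max_len=512` and `head_ratio=0.5` are hypothetical values, the title is assumed short enough to fit, and combining the equal trimming with the head+tail truncation named in the pipeline settings is my reading of the slides.

```python
from typing import List


def head_tail(tokens: List[str], keep: int, head_ratio: float = 0.5) -> List[str]:
    """Keep `keep` tokens: a head part and a tail part, dropping the middle."""
    if len(tokens) <= keep:
        return list(tokens)
    n_head = int(keep * head_ratio)
    n_tail = keep - n_head
    return tokens[:n_head] + (tokens[-n_tail:] if n_tail > 0 else [])


def build_input(
    title: List[str], body: List[str], answer: List[str], max_len: int = 512
) -> List[str]:
    """[CLS] + title + [SEP] + body + [SEP] + answer, trimmed to max_len."""
    budget = max_len - 3 - len(title)  # 3 special tokens: [CLS] and two [SEP]
    overflow = len(body) + len(answer) - budget
    if overflow > 0:
        # Trim the same amount from both; clamp if one side is too short.
        cut_body = min(len(body), overflow // 2)
        cut_answer = min(len(answer), overflow - cut_body)
        cut_body = min(len(body), overflow - cut_answer)
        body = head_tail(body, len(body) - cut_body)
        answer = head_tail(answer, len(answer) - cut_answer)
    return ["[CLS]"] + title + ["[SEP]"] + body + ["[SEP]"] + answer


title = ["t"] * 10
body = [f"b{i}" for i in range(400)]
answer = [f"a{i}" for i in range(300)]
tokens = build_input(title, body, answer)
print(len(tokens))
```

Dropping the middle rather than the end keeps both the opening and the closing of each text, which is the idea behind head+tail truncation.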
Model Architecture
・Bert-base (uncased)
  ・Use the outputs of the last 4 hidden layers https://arxiv.org/abs/1905.05583
  ・Use the output of the [SEP] token between Q and A
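To make the pooling idea concrete, here is a shape-only sketch in plain Python, with nested lists standing in for tensors. Combining the four layers by concatenation is an assumption (the slide only says their outputs are used), and `pool_sep_last4` is a hypothetical helper name.

```python
from typing import List, Sequence

Vector = List[float]


def pool_sep_last4(hidden_states: Sequence[Sequence[Vector]], sep_index: int) -> Vector:
    """Concatenate the vectors at `sep_index` taken from the last four layers."""
    if len(hidden_states) < 4:
        raise ValueError("need at least four layers")
    pooled: Vector = []
    for layer in hidden_states[-4:]:
        pooled.extend(layer[sep_index])
    return pooled


tokens = ["[CLS]", "title", "[SEP]", "body", "[SEP]", "answer"]
sep_positions = [i for i, tok in enumerate(tokens) if tok == "[SEP]"]
qa_sep = sep_positions[1]  # the [SEP] that separates the question from the answer

# Toy hidden states: 6 layers, hidden size 2; each vector encodes (layer, position).
hidden_states = [
    [[float(layer), float(pos)] for pos in range(len(tokens))] for layer in range(6)
]
features = pool_sep_last4(hidden_states, qa_sep)
print(features)
```

With a real model the same indexing would be applied to the per-layer hidden-state tensors, and the pooled vector would feed the classification head.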
Loss function
・Label weight
  ・Smaller weights for tasks that look easy, larger weights for imbalanced tasks that look hard
  ・Tried searching for the weights with gpyopt, but the simple approach shown on the slide worked best
Results
・With label weight: Public 0.45979, Private 0.41440
・Without label weight: Public 0.43455, Private 0.40602
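A per-label weighted BCE can be sketched like this. The weight values and the normalization by the sum of weights are assumptions chosen for illustration; the actual weights were shown on the slide and are not in the transcript.

```python
import math
from typing import Sequence


def bce(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction, with clipping for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def weighted_bce(
    preds: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
    label_weights: Sequence[float],
) -> float:
    """Mean BCE over samples and labels, each label scaled by its weight."""
    total = 0.0
    for row_p, row_y in zip(preds, targets):
        for p, y, w in zip(row_p, row_y, label_weights):
            total += w * bce(p, y)
    return total / (len(preds) * sum(label_weights))


# Two labels: the first is predicted well, the second poorly.
preds = [[0.9, 0.5]]
targets = [[1.0, 1.0]]
plain = weighted_bce(preds, targets, [1.0, 1.0])
upweighted = weighted_bce(preds, targets, [1.0, 3.0])  # hypothetical weights
print(plain, upweighted)
```

Raising the weight of the second label makes its error dominate the loss, which is how hard or imbalanced targets can be emphasized during training.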
Loss function
・BCE + margin ranking loss (1 : 1)
  ・Split each mini-batch into two halves and compute the margin ranking loss between them
Results
・BCE + margin ranking loss: Public 0.45979, Private 0.41440
・BCE only: Public 0.44006, Private 0.40668
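The combined loss can be sketched for a single target column as below. The pairing rule (the i-th sample of the first half against the i-th of the second half), skipping tied targets, and `margin=0.1` are assumptions; the slide only states that the mini-batch is split in two and that the two terms are mixed 1 : 1.

```python
import math
from typing import Sequence


def bce(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def margin_ranking_loss(
    preds: Sequence[float], targets: Sequence[float], margin: float = 0.1
) -> float:
    """Pair the two halves of the batch and penalize wrongly ordered pairs."""
    half = len(preds) // 2
    losses = []
    for i in range(half):
        p1, p2 = preds[i], preds[half + i]
        t1, t2 = targets[i], targets[half + i]
        if t1 == t2:
            continue  # assumption: tied targets carry no ranking signal
        y = 1.0 if t1 > t2 else -1.0
        losses.append(max(0.0, -y * (p1 - p2) + margin))
    return sum(losses) / len(losses) if losses else 0.0


def combined_loss(preds: Sequence[float], targets: Sequence[float]) -> float:
    """BCE and margin ranking loss summed with equal (1 : 1) weight."""
    bce_term = sum(bce(p, t) for p, t in zip(preds, targets)) / len(preds)
    return bce_term + margin_ranking_loss(preds, targets)


targets = [1.0, 0.0, 0.0, 1.0]
well_ordered = [0.9, 0.2, 0.1, 0.8]
badly_ordered = [0.1, 0.8, 0.9, 0.2]
print(combined_loss(well_ordered, targets), combined_loss(badly_ordered, targets))
```

Because the competition metric is Spearman's correlation, a term that rewards correct ordering of predictions is a natural complement to BCE.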
Training
・Question model
  ・Solve the question-targeted tasks using only the question text
  ・Since the input is only Q, less of Q is truncated (more Q information is kept)
Results
・Q model + Q and A model: Public 0.45979, Private 0.41440
・Q and A model × 2 (seed average): Public 0.44298, Private 0.40613
Post-Process
・LightGBM
  ・max_depth=1, lr=0.1
  ・meta features
    ・text length (question, answer)
    ・meta data from stackexchange (Score, View, FavoriteCount, …)
Results
・LightGBM: Public 0.45979, Private 0.41440
・Simple binning without meta features: Public 0.45282, Private 0.41387
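Why did we use LightGBM?
1. Simple binning method
  ・We noticed that discretizing the predictions improves Spearman's correlation
  ・Set a bin size for each target in advance and bin the predictions
  ・With the bin sizes fixed, take a weighted average of the Bert outputs from each epoch (the weights are optimized)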
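The effect of binning on Spearman's correlation can be reproduced on toy data. The sketch below implements Spearman's correlation with average ranks for ties and a simple rounding-based `discretize`; the rounding rule, the bin count, and the data are all assumptions, not the team's binning method.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of the ranks; NaN if either side is constant."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


def discretize(preds: Sequence[float], n_bins: int) -> List[float]:
    """Snap each prediction to the nearest multiple of 1 / n_bins."""
    return [round(p * n_bins) / n_bins for p in preds]


# Toy data: the target takes only two values, as many targets in this task do.
targets = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
preds = [0.10, 0.30, 0.20, 0.80, 0.60, 0.70]
raw = spearman(preds, targets)
binned = spearman(discretize(preds, n_bins=1), targets)
print(raw, binned)
```

When the target has few distinct values, continuous predictions create rank differences inside each tied group that the target does not have; binning removes them. Binning too coarsely can collapse a column to a single value, which leaves the correlation undefined.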
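Why did we use LightGBM?
2. Optimize bin size and weights
  ・We wanted to use optimal bin sizes as well
  ・Jointly optimizing the bin sizes and the weights did not work well
  ・The optimal bin size depends on the shape of the predictions, so the average of the per-fold optimal bin sizes deviates from what is optimal for the predictions after weighted averaging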
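Why did we use LightGBM?
3. LightGBM
  ・We wanted to optimize the bin sizes and the weights at the same time
  ・We also wanted to use meta features
  ・GBDT splits the data and assigns an optimal value to each resulting region
    → With shallow trees, it should be able to discretize much like binning
[Figure panels: max_depth=2, max_depth=8]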
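Why shallow boosted trees act like binning can be seen with a hand-written booster of depth-1 regression stumps. This is a stand-in for illustration, not LightGBM itself: `fit_stump` and `boost_stumps` are hypothetical helpers, the single input feature plays the role of a Bert prediction, and the data and round count are made up.

```python
from typing import List, Sequence, Tuple


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def fit_stump(x: Sequence[float], residual: Sequence[float]) -> Tuple[float, float, float]:
    """Best single split `x <= thr` by squared error; returns (thr, left, right)."""
    candidates = sorted(set(x))[:-1]
    if not candidates:
        raise ValueError("need at least two distinct feature values")
    best = None
    for thr in candidates:
        left = [r for xi, r in zip(x, residual) if xi <= thr]
        right = [r for xi, r in zip(x, residual) if xi > thr]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1], best[2], best[3]


def boost_stumps(
    x: Sequence[float], y: Sequence[float], n_rounds: int = 3, lr: float = 0.1
) -> List[float]:
    """Gradient boosting on squared error with depth-1 trees; returns train predictions."""
    pred = [mean(y)] * len(x)
    for _ in range(n_rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        thr, left_value, right_value = fit_stump(x, residual)
        pred = [
            p + lr * (left_value if xi <= thr else right_value)
            for xi, p in zip(x, pred)
        ]
    return pred


# Made-up data: x mimics a continuous Bert prediction, y a discrete target.
x = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
y = [0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0]
pred = boost_stumps(x, y, n_rounds=3, lr=0.1)
print(sorted(set(pred)))
```

With one feature, the sum of K stumps is piecewise constant with at most K + 1 distinct values, so ten continuous inputs collapse onto a handful of output levels. Deeper trees create many more regions and therefore a finer, less bin-like output.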
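Why did we use LightGBM?
4. LightGBM (parameter tuning)
  ・The more discretized the predictions, the better the score, so we kept the model structure as simple as possible
  ・Split the train data to find the best parameters
  ・The smallest max_depth and the largest feasible lr gave the best score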
Didn't work for us
・Setting sample weights
・Putting the host words at the start of the input
・Adding new tokens
・Using Bert-base cased
・Brute-force removal of TeX code blocks
Links
・Discussion: https://www.kaggle.com/c/google-quest-challenge/discussion/129904#742302
・Kernel: https://www.kaggle.com/shuheigoda/23th-place-solusion
We are hiring!
https://www.wantedly.com/projects/375150