What I Learned from the CommonLit Competition
nogawanogawa
October 12, 2023
Transcript
©2023 Wantedly, Inc.
What I Learned from the CommonLit Competition
Improving Transformer Performance with a Custom Header
みんなのPython勉強会 #98, Oct. 12, 2023 - @nogawanogawa

What is the CommonLit competition?
CommonLit - Evaluate Student Summaries

CommonLit - Evaluate Student Summaries
A competition held on Kaggle (a data science competition platform)
• Period: 7/13 to 10/12 (3 months; it ended this morning!)
• Participants: 2,106 teams
• My final rank was 25th
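
CommonLit - Evaluate Student Summaries
• The theme is summaries of texts written by students from the third grade of elementary school through the third year of high school
• The quality of each student's summary is scored from two viewpoints: what it says (content) and its vocabulary and grammar (wording)
◦ Whoever builds the machine learning model whose scores are closest to the actual human scores wins
[Figure: students summarize a text; a teacher scores each summary on wording and content; a machine learning model predicts the same two scores. Participants were building this model, aiming to get as close as possible to the human scores.]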
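The slide does not say how "closest to the human scores" is measured. As a minimal sketch, assume the two scores are compared with a mean column-wise RMSE (one RMSE for content, one for wording, then averaged). The function name `mcrmse` and the sample numbers are illustrative, not taken from the deck.

```python
import math


def mcrmse(y_true, y_pred):
    """Mean column-wise RMSE over rows of [content, wording] scores."""
    n_cols = len(y_true[0])
    rmses = []
    for c in range(n_cols):
        # Squared error of one target column across all summaries.
        sq_err = [(t[c] - p[c]) ** 2 for t, p in zip(y_true, y_pred)]
        rmses.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(rmses) / n_cols


# Made-up scores: each row is one summary, columns are [content, wording].
human = [[0.5, 0.0], [-1.0, 1.0], [2.0, -0.5]]
model = [[0.5, 0.5], [-1.0, 0.5], [2.0, 0.0]]
score = mcrmse(human, model)  # content is exact, wording is off by 0.5 everywhere
```

A lower value means the model's scores sit closer to the human ones on both viewpoints at once.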

Transformer and Custom Header
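
The basic Transformer-based solution
• Transformers are commonly used in competitions that deal with natural-language text
• They are currently the mainstream neural network architecture
• The structure is as shown in the figure on the right
◦ ChatGPT also belongs to this family under the hood
[Figure: Transformer architecture]
• Actually using one is surprisingly easy
◦ With the HuggingFace Transformers library, on the order of ten lines of code is enough to get going
◦ People around the world publish the weights of their pre-trained models, so you just download and use them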
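
Transformers
[Code screenshot annotations: select the kind of model you want to use; download the pre-trained model. Not hard!]
• Even if you add code to fine-tune the model for the specific problem you want to solve, it is only on the order of ten more lines
• If you only want to use a model, a few lines are enough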
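The "select a model, download it, use it" flow above can be sketched as follows. This is an illustration under assumptions, not the code from the slide: the helper names are hypothetical, and the two-output regression head (content, wording) is my reading of the task. The calls themselves (`AutoTokenizer.from_pretrained`, `AutoModelForSequenceClassification.from_pretrained`) are standard HuggingFace Transformers entry points.

```python
def load_pretrained_regressor(model_name: str, num_targets: int = 2):
    """Download a pre-trained checkpoint and attach a regression head."""
    # Imported inside the function so the libraries are only needed when it is called.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=num_targets,     # one output per score (content, wording)
        problem_type="regression",  # use a regression loss when labels are passed
    )
    return tokenizer, model


def predict_scores(tokenizer, model, text: str):
    """Run one text through the model and return its raw outputs as a list."""
    import torch

    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits[0].tolist()
```

Calling `load_pretrained_regressor("bert-base-uncased")` would download that checkpoint on first use; the checkpoint name is only an example. The newly attached head is randomly initialized, so its outputs are not meaningful until the model has been fine-tuned.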
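
Approaches to improving accuracy
• Preprocessing
◦ Data is often not very clean (it contains a lot of noise), so remove the noise
• Changing the pre-trained model
◦ There are many kinds of pre-trained models available (Bert, Roberta, Deberta, etc…)
• Augmenting the training data
◦ Models trained on more data, and more diverse data, tend to be stronger
• Ensembling
◦ A majority vote across multiple machine learning models
• Modifying the Transformer structure itself
◦ Custom Header (Pooler) (this is what I want to talk about today!)
• etc…
Transformers is easy to use and gives you access to highly accurate models. However, in competitions such as Kaggle, using it is only the starting line; climbing the rankings takes further ingenuity from there.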
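
Improving accuracy with a Custom Header (Pooler)
• By default, a Transformer makes its prediction from the final-layer output of the first input token (the CLS token)
◦ Only the first token's output is used; the outputs of the other tokens are discarded
→ Intuitively, this feels very wasteful (using the outputs for the whole text seems likely to give more expressive power)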
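
Improving accuracy with a Custom Header (Pooler)
• Custom Header (Pooler): pool over the outputs for the whole text and feed the result into the final layer
• Pooling examples: mean, max, concatenating the outputs of the last 4 layers, etc…
Reference: https://www.kaggle.com/code/rhtsingh/utilizing-transformer-representations-efficiently
It looks difficult, but it can be written in a few lines
[Code screenshot: example of average pooling]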
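To make the contrast between the default CLS-only head and average pooling concrete, here is a framework-free sketch on plain Python lists. It is not the slide's code: `hidden_states` stands for one sequence's final-layer token vectors and `attention_mask` marks real tokens with 1 and padding with 0, both assumed shapes. With tensors, the same idea is usually a masked sum divided by the number of unmasked tokens.

```python
def cls_pooling(hidden_states):
    """Default behaviour: keep only the first token's vector."""
    return list(hidden_states[0])


def average_pooling(hidden_states, attention_mask):
    """Average the final-layer vectors of all real (non-padding) tokens."""
    kept = [vec for vec, m in zip(hidden_states, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# One sequence: 4 token positions x 2 hidden dimensions; the last position is padding.
hidden = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

first = cls_pooling(hidden)            # uses token 0 only
pooled = average_pooling(hidden, mask)  # uses tokens 0 to 2, ignores padding
```

The pooled vector then replaces the CLS vector as the input to the final prediction layer. Excluding padding matters: averaging over padded positions would pull the result toward whatever the model emits for padding.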
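
Improving accuracy with an original Custom Header
• The method that worked best for me this time
• Apply average pooling only to the part of the input that contains the summary text
• Interpretation
◦ The goal here is to evaluate the quality of the students' summaries
◦ Using only the part being evaluated reduces the noise passed to the final layer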
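One way to read this idea is sketched below. The slide does not show how the summary positions are located, so this assumes the model input is a concatenation such as [CLS] prompt [SEP] summary and that the summary's token span is known. Both helper names are hypothetical.

```python
def span_mask(seq_len, start, end):
    """1 for token positions in [start, end), 0 elsewhere."""
    return [1 if start <= i < end else 0 for i in range(seq_len)]


def segment_average_pooling(hidden_states, segment_mask):
    """Average only the token vectors selected by segment_mask."""
    kept = [vec for vec, m in zip(hidden_states, segment_mask) if m == 1]
    if not kept:
        raise ValueError("segment_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Toy final-layer outputs for: [CLS], prompt, [SEP], summary, summary
hidden = [[0.0, 0.0], [10.0, 10.0], [0.0, 0.0], [2.0, 4.0], [4.0, 8.0]]
summary_mask = span_mask(len(hidden), start=3, end=5)
pooled = segment_average_pooling(hidden, summary_mask)
```

The prompt tokens still take part in attention, so the summary vectors can carry information about the prompt; only the final averaging is restricted to the text being scored.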
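
Summary: What I Learned from the CommonLit Competition
• Transformer models can be used without much effort and perform well. But there is a lot of depth to them.
• With Transformers, you can use high-performance pre-trained models built by others
◦ Just using them is not that hard
• However, in recent NLP competitions such as those on Kaggle, using a Transformer is only the starting line
◦ As a data scientist, you want to go one step further
• One approach to further improving Transformer performance: Custom Header (Pooler)
◦ Adjusting the final pooling so that the Transformer can solve the task more easily sometimes works well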