NLP2025 Participation Report
Yano
April 11, 2025
These are the slides used for a lightning talk at this NLP retrospective event (https://moneyforward.connpass.com/event/344276/).
Transcript
NLP Participation Report
April 11
Nagoya University, D1 (first-year doctoral student)
Chihiro Yano
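Self-introduction: Chihiro Yano
• Background: Nagoya University (master's program) → PKSHA (machine learning engineer) → Nagoya University (doctoral program)
• Interests: semantics, embedding representations
• Lately, text embeddings are the fun part
• Please try our retrieval-specialized model: pkshatech/GLuCoSE-base-ja-v2
• X: yano_c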
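What is a text embedding model?
• A model that encodes a natural-language sentence or passage into a representation a computer can work with (usually a vector)
• Measuring the similarity between vectors gives a measure of the similarity between the original texts
[Figure: expected behavior on a QA task. The query "What is the highest mountain in Japan?" and two candidate texts are each passed through the model. "Where will the next NLP be held?": low similarity. "Mt. Fuji is Japan's highest peak and a free-standing mountain. It straddles Yamanashi and Shizuoka prefectures.": high similarity.]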
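The expected QA behavior comes down to comparing vectors. The following is a minimal sketch, assuming hand-written 3-dimensional toy vectors in place of real model outputs; the vectors and variable names are hypothetical and were not produced by any model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (assumption); a real model outputs far more dimensions.
query = [0.9, 0.1, 0.0]           # "What is the highest mountain in Japan?"
fuji_passage = [0.8, 0.2, 0.1]    # passage about Mt. Fuji
venue_question = [0.1, 0.1, 0.9]  # "Where will the next NLP be held?"

sim_relevant = cosine_similarity(query, fuji_passage)
sim_unrelated = cosine_similarity(query, venue_question)
print(sim_relevant > sim_unrelated)
```

With a real embedding model, the toy lists would be replaced by the model's output vectors; the comparison step stays the same.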
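This year's NLP
• The "embedding representations" theme disappeared from the program 😢
• Does that mean presentations related to embeddings are decreasing?
• Presentations with "embedding" in the title: 17/499 → 26/777 😊 (as a proportion, almost constant)
• Presentations with "text" or "sentence" + "embedding" in the title: 6 → 6 😊
• Presentations with "text embedding" in the title: 0 → 5
Note: these statistics are based purely on the surface wording of titles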
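The "almost constant as a proportion" remark can be checked with a few lines of arithmetic. This sketch uses only the counts given on the slide; the variable names are made up for illustration.

```python
# Counts as given on the slide: (titles containing "embedding", total presentations)
previous = (17, 499)
current = (26, 777)

def share(count, total):
    """Fraction of presentations whose title contains the keyword."""
    return count / total

previous_share = share(*previous)
current_share = share(*current)
print(f"{previous_share:.1%} -> {current_share:.1%}")  # prints "3.4% -> 3.3%"
```

The absolute count grows from 17 to 26, but the share of all presentations stays at roughly 3%.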
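Presentations with "embedding" in the title
• Modeling the temporal transition of sets of embeddings with Gaussian processes
• Analysis of the within-language and cross-language consistency of independent components of embedding representations
• Granularity analysis of verb meaning and co-occurrence relations using embedding vectors
• Domain adaptation of embedding representations based on k-nearest-neighbor examples and its application to retrieval
• Measuring the intrinsic dimension of embedding representations
• Estimating honkadori (allusive variation) based on embeddings of waka poems
• Ruri: a general-purpose text embedding model specialized for Japanese
• Acquiring static word vectors that are effective for sentence embeddings
• Style control in speech synthesis using LLM embeddings of dialogue history
• Training-free conditional text embeddings
• Differences in the redundancy of prompt-based text embeddings across tasks
• Creating speaker embeddings using back-translation, toward translating dialogue in novels
• At what granularity can the axes from independent component analysis of word embeddings be interpreted?
• Search query embeddings for query understanding based on user behavior logs
• Detecting Japanese sarcasm on X (formerly Twitter) using data augmentation based on text embedding representations
• Verifying the effect of applying predictive control to text reconstruction from text embeddings
• From literary criticism to large language models: an attempt at interpreting literary texts by recombining word embeddings
• In-store customer behavior simulation using LLM embeddings and transition probability prediction
• Cross-layer analysis of word embedding representations in pre-trained multilingual models using independent component analysis
• A study of the influence of orthographic variation on sentence embedding models
• Examining document-image text embeddings from Large Vision Language Models
• A proposed topic analysis method combining document embeddings and clustering
• Making LLM pre-training more efficient and improving its properties: reuse by fixing the parameters of the embedding and output layers
• Improving extraction accuracy on long texts in embedding-model-based unsupervised keyphrase extraction
• The "collapse" problem and the influence of time embeddings in text generation with diffusion models
• Building an embedding model that predicts the final stage of a conversation, for adaptive dialogue systems
The targets are varied too 👀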
• Turned the abstracts into a wordcloud
• Embeddings of words and of text both seem common
• Research where analysis is the main focus also seems common
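A few introductions
• Papers that train models
  • Ruri: a general-purpose text embedding model specialized for Japanese [Tsukagoshi et al.]
  • Building a distributed representation model for Japanese using instructions and multiple tasks [Katsumata et al.]
  • Search query embeddings for query understanding based on user behavior logs
• Papers that analyze embeddings
  • Cross-layer analysis of word embedding representations in pre-trained multilingual models using independent component analysis
  • Differences in the redundancy of prompt-based text embeddings across tasks [Tsukagoshi et al.]
Note: this selection is very subjective. Comments such as "there was also this interesting paper!" are welcome.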
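Paper introduction (model building): Ruri: a general-purpose text embedding model specialized for Japanese
• Development and release of Ruri, a general-purpose Japanese text embedding model
  • Multiple model sizes (small, base, large)
  • The intermediate models Ruri-PT and Ruri-Reranker are released as well
• Preparation of training datasets
  • Created synthetic datasets
  • Organized multiple public datasets into a single format
• Pretty much everything is in here, which is much appreciated
[Diagram (conceptual): Japanese BERT, Ruri-PT, Ruri-Reranker, Ruri; stages labeled contrastive pre-training, fine-tuning, distillation]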
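Paper introduction (model building): Building a distributed representation model for Japanese using instructions and multiple tasks
• Analyzes how training on data from multiple tasks and languages affects results on JMTEB (a comprehensive benchmark for text embedding models)
• Training on mixed Japanese and English data gives higher performance than training on Japanese only
• Which training tasks help depends on the evaluation task
• The models trained on the mixed Japanese-English data are released: retrieva-jp/amber-base, retrieva-jp/amber-large
[Figure quoted from the paper: tasks removed from training versus the change in performance on each evaluation task. Example: training on NLI raises STS performance ⤴ and lowers clustering performance ⤵]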
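Paper introduction (model building): Search query embeddings for query understanding based on user behavior logs
• Compared with the natural sentences that ordinary text embeddings target, search queries are short and lack context (the slide's example is a question about next week's weather in a city)
• Proposes UBIQUE, a method that uses user behavior logs to extract query pairs with similar intent and uses them for training
  • Click log: queries for which the same URL in the search results was clicked
  • Session log: queries entered within a fixed time window in the same session
  • → Queries with the same intent are extracted regardless of how they are written
• Builds a model that is especially robust to variation in notation
[Figure quoted from the paper]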
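The click-log idea can be sketched as grouping queries by the URL that was clicked. This is only an illustration of that grouping step, not the UBIQUE implementation; the log format, the example rows, and the function name are assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical click log (assumption): one (query, clicked URL) row per click.
click_log = [
    ("nagoya univ", "https://example.com/nagoya-u"),
    ("名古屋大学", "https://example.com/nagoya-u"),
    ("meidai", "https://example.com/nagoya-u"),
    ("tokyo weather", "https://example.com/weather/tokyo"),
]

def pairs_from_click_log(log):
    """Pair up distinct queries that led to a click on the same URL."""
    queries_by_url = defaultdict(set)
    for query, url in log:
        queries_by_url[url].add(query)
    pairs = set()
    for queries in queries_by_url.values():
        # sorted() makes the order inside each pair deterministic
        for a, b in combinations(sorted(queries), 2):
            pairs.add((a, b))
    return pairs

positive_pairs = pairs_from_click_log(click_log)
```

Pairs collected this way can differ completely in surface form while sharing a clicked URL, which matches the slide's point about intent being captured regardless of notation. A session-log variant would group by session and time window instead of by URL.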
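Paper introduction (analysis): Differences in the redundancy of prompt-based text embeddings across tasks
• Prompt-based text embeddings: embeddings created by attaching a task-specific instruction
• Shows that their redundancy depends on the task
  • Classification tasks: no performance degradation even when 4096 dimensions are reduced to 16
  • Retrieval tasks: little degradation down to around 512 dimensions; below that, degradation is pronounced
[Figures quoted from the paper]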
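The slide does not say how the dimensions were reduced. The sketch below assumes the simplest option, keeping the first k coordinates and re-normalizing, and applies it to hypothetical 8-dimensional toy vectors. It shows the mechanics only and does not reproduce the reported results.

```python
import math

def truncate_and_normalize(vec, k):
    """Keep the first k coordinates and rescale the result to unit length."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def dot(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical toy embeddings (assumption); the slide's embeddings have 4096 dimensions.
doc_a = [0.5, 0.4, 0.3, 0.2, 0.1, 0.1, 0.0, 0.1]
doc_b = [0.4, 0.5, 0.2, 0.3, 0.0, 0.1, 0.1, 0.0]

for k in (8, 4, 2):
    sim = dot(truncate_and_normalize(doc_a, k), truncate_and_normalize(doc_b, k))
    print(k, round(sim, 3))
```

In an evaluation like the one on the slide, this similarity would feed a classification or retrieval metric at each k, and the metric would be compared against the full-dimensional baseline.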
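Paper introduction (analysis): Cross-layer analysis of word embedding representations in pre-trained multilingual models using independent component analysis
• Decomposes the features that a multilingual model holds at each layer into axes using independent component analysis (ICA)
• Shows that the closer a layer is to the output, the more the axes are organized by meaning
  • Layer 1: axes follow the writing system
  • Layers 7 to 12: axes follow meaning
[Figure quoted from the paper]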
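Participation-report-style content
• I ended up doing nothing but introducing papers, so...
• What I felt on my third time attending NLP
  • The hot topics in the domestic research community become clear, which is educational
    • For example, I was completely out of touch (like Urashima Taro) when it came to model analysis
  • I want to stop looking only at posters
    • I often headed to the poster hall by default, and regretted after getting home that there had been interesting oral presentations too
  • My learning broadens, and seeing more familiar faces makes it fun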
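In closing
• Please tell me your recommended NLP papers 🥺
• I will be in Tokyo for a while, so feel free to invite me to anything!
• (Please also tell me about interesting-looking internships and the like)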