NLP2025 Attendance Report
Yano
April 11, 2025
These are the slides I used for a lightning talk (LT) at this NLP retrospective event (https://moneyforward.connpass.com/event/344276/).
Transcript
NLP Attendance Report
April 11
Nagoya University, [...] lab, D1, Chihiro Yano
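Self-introduction
Chihiro Yano
• Background: Nagoya University, [...] lab (master's program) → PKSHA (machine learning engineer) → Nagoya University, [...] lab (doctoral program)
• Interests: semantics, embedding representations
• Recently, text embeddings have been fun
• Please try our retrieval-specialized model: pkshatech/GLuCoSE-base-ja-v2
X: yano_c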
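What is a text embedding model?
• It encodes a natural-language sentence or passage into a representation that a computer can process (generally a vector)
• Measuring the similarity between the vectors gives a measure of the similarity between the inputs
Figure (expected behavior in a QA task): each text is passed through the model. For the question "What is the highest mountain in Japan?", the passage "Mt. Fuji is the highest peak in Japan and a free-standing mountain. It straddles Yamanashi and Shizuoka prefectures." should get a high similarity, while a question about the next NLP meeting should get a low similarity.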
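The similarity comparison on this slide can be sketched with cosine similarity, one common way to compare embedding vectors. This is only a minimal illustration. The three-dimensional vectors below are invented for the example and are not outputs of any real model, and the slide does not say which similarity measure is used.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, made up for illustration only.
query = [0.9, 0.1, 0.2]              # "What is the highest mountain in Japan?"
fuji_passage = [0.8, 0.2, 0.3]       # passage describing Mt. Fuji
next_nlp_question = [0.1, 0.9, 0.1]  # question about the next NLP meeting

sim_fuji = cosine_similarity(query, fuji_passage)
sim_next_nlp = cosine_similarity(query, next_nlp_question)

# In the expected QA behavior, the relevant passage scores higher.
print(sim_fuji > sim_next_nlp)
```

In practice the vectors would come from an embedding model rather than being written by hand.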
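This year's NLP
• The "embedding representations" theme has disappeared from the program 😢
• Are presentations related to embeddings decreasing?
• Presentations with "embedding" (埋め込み) in the title: 17/499 → 26/777 😊 (almost constant as a proportion)
• Presentations with "text" or "sentence" plus "embedding" in the title: 6 → 6 😊
• Presentations with "text embedding" in the title: 0 → 5
Note: these statistics are based only on the presentation listing.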
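The "almost constant as a proportion" remark can be checked with a short calculation. The counts are the ones on the slide. The function name is just a label chosen for this sketch.

```python
def title_share(matching, total):
    """Percentage of presentations whose title contains the keyword."""
    return 100.0 * matching / total


earlier_share = title_share(17, 499)  # earlier count on the slide
latest_share = title_share(26, 777)   # later count on the slide

print(f"{earlier_share:.2f}% -> {latest_share:.2f}%")  # 3.41% -> 3.35%
```

The absolute number of such presentations rose, but the share stayed at roughly 3.4%.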
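Presentations with "embedding" in the title
• Modeling the temporal transition of embedding sets with Gaussian processes
• Analysis of the within-language and cross-language consistency of the independent components of embedding representations
• Granularity analysis of verb meaning and co-occurrence relations using embedding vectors
• Domain adaptation of embedding representations based on k-nearest-neighbor examples and its application to retrieval
• Measuring the intrinsic dimension of embedding representations
• Estimating honkadori (allusive variation) based on waka embeddings
• Ruri: a general-purpose text embedding model specialized for Japanese
• Acquiring static word vectors that are effective for sentence embeddings
• Style control for speech synthesis using LLM embeddings of dialogue history
• Training-free conditional text embeddings
• Task-dependent differences in the redundancy of prompt-based text embeddings
• Creating speaker embeddings with back-translation, toward translating conversational text in novels
• At what granularity can the axes from independent component analysis of word embeddings be interpreted?
• Search query embeddings for query understanding based on user behavior logs
• Detecting sarcasm in Japanese on X (formerly Twitter) using data augmentation based on text embedding representations
• Verifying the effect of applying predictive control to text reconstruction from text embeddings
• From literary criticism to large language models: an attempt to interpret literary texts by recombining word embeddings
• Simulating customer behavior in physical stores using LLM embeddings and transition probability prediction
• Analysis of word embedding representations across the layers of pretrained multilingual models using independent component analysis
• A study of the effect of orthographic variation on sentence embedding models
• Examining the document-image text embeddings of Large Vision Language Models
• A proposed topic analysis method combining document embeddings and clustering
• Making LLM pretraining more efficient and improving its properties: reuse by freezing the parameters of the embedding and output layers
• Improving extraction accuracy on long texts in embedding-model-based unsupervised keyphrase extraction
• The "collapse" problem in text generation with diffusion models and the influence of time embeddings
• Building an embedding model that predicts the final stage of a conversation for adaptive dialogue systems
The targets vary widely 👀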
Presentation trends
• I turned the abstracts into a word cloud
• Embeddings of words and of text both seem common
• Studies whose main focus is analysis also seem common
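A few highlights
• Papers that train models
  • Ruri: a general-purpose text embedding model specialized for Japanese [Tsukagoshi et al.]
  • Building a distributed representation model for Japanese using instructions and multiple tasks [Katsumata et al.]
  • Search query embeddings for query understanding based on user behavior logs [... et al.]
• Papers that analyze embeddings
  • Analysis of word embedding representations across the layers of pretrained multilingual models using independent component analysis [... et al.]
  • Task-dependent differences in the redundancy of prompt-based text embeddings [Tsukagoshi et al.]
Note: this selection is very subjective. Comments such as "there was also this interesting paper!" are welcome!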
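Paper introduction: model building
Ruri: a general-purpose text embedding model specialized for Japanese
• Development and release of Ruri, a general-purpose Japanese text embedding model
  • Multiple model sizes (small, base, large)
  • The intermediate models Ruri-PT and Ruri-Reranker are also released
• Preparation of training datasets
  • Created artificial datasets
  • Prepared multiple public datasets in a single unified format
• Nearly everything is available here, which is much appreciated
Figure (rough picture): Japanese BERT, Ruri-PT, Ruri-Reranker, Ruri; stages labeled contrastive pre-training, fine-tuning, distillation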
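For readers unfamiliar with the "contrastive pre-training" stage named in the figure, the following is a minimal sketch of a generic in-batch contrastive (InfoNCE-style) loss. It is not claimed to be the objective used for Ruri. The function name, the temperature value, and the toy similarity matrices are all assumptions made for this illustration.

```python
import math


def info_nce_loss(sim_matrix, temperature=0.05):
    """Mean in-batch contrastive loss.

    sim_matrix[i][j] is the similarity between query i and passage j.
    The positive passage for query i is assumed to be at index i.
    """
    losses = []
    for i, row in enumerate(sim_matrix):
        logits = [s / temperature for s in row]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy similarity matrices, invented for illustration.
well_aligned = [[0.9, 0.1], [0.2, 0.8]]   # positives score highest
misaligned = [[0.1, 0.9], [0.8, 0.2]]     # negatives score highest

print(info_nce_loss(well_aligned) < info_nce_loss(misaligned))
```

The loss is small when each query is most similar to its own positive and large when a negative in the batch scores higher.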
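Paper introduction: model building
Building a distributed representation model for Japanese using instructions and multiple tasks
• Analyzes how training on data from multiple tasks and languages affects results on JMTEB (a comprehensive evaluation benchmark for text embedding models)
• Training on mixed Japanese and English data gives higher performance than training on Japanese alone
• Which training tasks are effective depends on the evaluation task
  • Example: training on NLI raises STS performance ⤴ and lowers clustering performance ⤵
• The models built by training on the mixed Japanese-English data are released
  • retrieva-jp/amber-base, retrieva-jp/amber-large
Figure (quoted from the paper): task removed from training vs. change in performance on the evaluation tasks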
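Paper introduction: model building
Search query embeddings for query understanding based on user behavior logs
• Search queries are shorter and lack context compared with the natural-language sentences that ordinary text embeddings target (example: "What is next week's weather in [...]saki?")
• Proposes UBIQUE, a method that uses user behavior logs to extract query pairs with similar intent and uses them for training
  • Click log: queries that led to a click on the same URL in the search results
  • Session log: queries entered within a fixed time window in the same session
  • → Queries with the same intent are extracted regardless of surface form
• Builds a model that is especially robust to variation in surface form
Figure quoted from the paper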
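The click-log idea above can be sketched as follows: queries that led to a click on the same URL are grouped, and every pair within a group becomes a candidate training pair. This is a simplified illustration, not the UBIQUE implementation. The function name, the log format, and the example queries and URLs are hypothetical. A session-log variant would additionally need session IDs and timestamps.

```python
from collections import defaultdict
from itertools import combinations


def pairs_from_click_log(click_log):
    """Return sorted pairs of distinct queries that clicked the same URL.

    click_log is an iterable of (query, clicked_url) tuples.
    """
    queries_by_url = defaultdict(set)
    for query, url in click_log:
        queries_by_url[url].add(query)

    pairs = set()
    for queries in queries_by_url.values():
        # Sorting makes each pair's internal order deterministic.
        for first, second in combinations(sorted(queries), 2):
            pairs.add((first, second))
    return sorted(pairs)


# Hypothetical log entries, invented for illustration.
click_log = [
    ("mt fuji height", "https://example.com/fuji"),
    ("how tall is fujisan", "https://example.com/fuji"),
    ("富士山 高さ", "https://example.com/fuji"),
    ("tokyo weather", "https://example.com/weather"),
]

pairs = pairs_from_click_log(click_log)
for pair in pairs:
    print(pair)
```

Because the grouping key is the clicked URL rather than the query text, differently written queries with the same intent end up paired, which is the property the slide highlights.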
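Paper introduction: analysis
Task-dependent differences in the redundancy of prompt-based text embeddings
• Prompt-based text embeddings: embeddings are produced by attaching a task-specific instruction
• Shows that the embeddings have task-dependent redundancy
  • Classification task: reducing 4096 dimensions down to 16 dimensions causes no performance degradation
  • Retrieval task: little degradation down to about 512 dimensions; below that, degradation is pronounced
Figures quoted from the paper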
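Paper introduction: analysis
Analysis of word embedding representations across the layers of pretrained multilingual models using independent component analysis
• Uses independent component analysis (ICA) to decompose the features that a multilingual model holds at each layer into axes
• Shows that the closer a layer is to the output, the more the axes are organized by meaning
  • Layer 1: axes follow the writing system
  • Layers 7 to 12: axes follow meaning
Figure quoted from the paper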
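Content more like an attendance report
• I ended up only introducing papers, so...
• What I felt at my third NLP
  • The hot topics in the domestic research community become clear, which is educational
    • For example, I was completely out of touch (an Urashima Taro) on model analysis
  • I want to stop looking only at posters
    • I often went to the poster hall by default, then regretted after getting home that there had also been interesting-looking oral presentations
  • As my own learning broadens, I also see more familiar faces, which is fun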
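Closing
• Please tell me your recommended NLP papers 🥺
• I will be in Tokyo for a while, so please invite me to anything!
• (Please also tell me about interesting-looking internships and the like)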