Current status of R&D of Natural Language Processing technology at LINE

Toshinori Sato (@overlast), LINE Corporation
Sponsor session material for the 13th 最先端NLP勉強会 (SNLP, a study group on state-of-the-art NLP)
https://sites.google.com/view/snlp-jp/home/2021

LINE Developers

September 16, 2021

Transcript

  1. Toshinori Sato (@overlast) • Senior Software Engineer / Manager • Natural Language Processing • Information Retrieval • LINE CLOVA • Japanese NLU system • HyperCLOVA • Japanese Corpus / Evaluation • OSS • Main Contributor of NEologd project • mecab-ipadic-NEologd
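  2. Overview and few-shot example: definition summarization
     • Overview: Summarize the definition.
     • Definition: Hideaki Anno (庵野秀明, あんの ひであき; born [date]) is a Japanese animator, film director, and businessman. President and representative director of Khara. Creative management supervisor at Project Studio Q, Inc. Director of Dehogallery Inc. Chairman of the NPO Anime Tokusatsu Archive Centre. Born in Ube, Yamaguchi Prefecture. Graduated from Yamaguchi Prefectural Ube High School. Removed from the register of the Department of Visual Planning (now the Department of Visual Studies), Faculty of Arts, Osaka University of Arts. Blood type A. His wife is manga artist Moyoco Anno.
     • Summary: Hideaki Anno is a Japanese otaku.
     • Definition: Shigesato Itoi (糸井重里, いとい しげさと; born [date]) is a Japanese copywriter, essayist, TV personality, and lyricist. President and representative director of Hobonichi Co., Ltd. Representative director of Ape Inc. Outside director of Fields Corporation. His wife is actress Kanako Higuchi. His dog is Buiko, a Jack Russell Terrier; chairman of the Japan Monopoly Association. Blood type A. Height: [number] cm.
     • Summary: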
  3. Overview and few-shot example: definition summarization (WWW)
     • Same prompt as slide 2, with its parts labeled in order: overview, fewshot_, input
  4. Overview and few-shot example: definition summarization
     • Same prompt as slide 2, with the final summary now filled in and labeled as the output
     • Summary: Shigesato Itoi is a Japanese copywriter. (output)
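The prompt layout on slides 2 to 4 (an overview line, one worked definition/summary pair, then a final definition whose summary is left open) can be sketched as plain string assembly. This is a minimal sketch, not HyperCLOVA's actual interface: the `Shot` class, the `build_prompt` function, and the English labels `Overview:`, `Definition:`, and `Summary:` are all assumptions made for illustration, and the definitions are shortened.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    """One worked example: a definition and its reference summary (hypothetical type)."""
    definition: str
    summary: str


def build_prompt(overview: str, shots: List[Shot], query: str) -> str:
    """Concatenate the overview, the few-shot examples, and the open input."""
    lines = [f"Overview: {overview}"]
    for shot in shots:
        lines.append(f"Definition: {shot.definition}")
        lines.append(f"Summary: {shot.summary}")
    # The final entry has no summary, so the model is left to continue the text.
    lines.append(f"Definition: {query}")
    lines.append("Summary:")
    return "\n".join(lines)


shots = [
    Shot(
        definition="Hideaki Anno is a Japanese animator, film director, and businessman.",
        summary="Hideaki Anno is a Japanese otaku.",
    )
]
prompt = build_prompt(
    overview="Summarize the definition.",
    shots=shots,
    query="Shigesato Itoi is a Japanese copywriter, essayist, TV personality, and lyricist.",
)
print(prompt)
```

Ending the prompt on an empty `Summary:` label is what turns summarization into a text-continuation task; the single example fixes both the output format and its one-sentence length.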
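  5. Overview and few-shot example: generating haiku(?)
     • Generate a haiku from a commentary.
     • IN: A verse about the sound of a frog jumping into an old pond. Although it is a simple verse expressing the sound of a frog jumping into a pond, it vividly conveys the surrounding stillness, the desolate old pond, and the burst of life as the frog jumps in, which gives it great charm. The season word of this verse is "frog" (蛙), which denotes spring.
     • OUT: 古池や蛙飛こむ水のおと (The old pond; a frog jumps in, the sound of water)
     • IN: A verse about cicadas singing at Risshaku-ji, a temple in Yamagata Prefecture. No definitive interpretation of this verse is known, so it is a slightly mysterious haiku, and there is a certain romance in deciphering it. The intent behind the seemingly contradictory expressions "stillness" (閑さ) and "the cicadas' voices" (蝉の声) is especially thought-provoking.
     • OUT: 閑さや岩にしみ入蝉の声 (Stillness; seeping into the rocks, the cicadas' voices)
     • IN: A verse about someone about to bring up a very ambitious system development project in an afternoon meeting on a hot day at the end of the month. Although it is a simple verse, it expresses the feelings of a salaried engineer who, despite various concerns, hopes to open up the future of artificial intelligence technology by giving this technology shape.
     • OUT: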
  6. Overview and few-shot example: generating haiku(?)
     • Same prompt as slide 5, with its parts labeled in order: overview, fewshot_, fewshot_, input
  7. Overview and few-shot example: generating haiku(?)
     • Same prompt as slide 5, with the final output now filled in
     • OUT: 月の熱い砂に埋めたるわが魂 (roughly: my soul, buried in the month's hot sand)
     • ← A little editing afterwards improves it, e.g. 熱き月砂に埋めたるわが魂
  8. Subjective evaluation of the 39B model

     |  | Natural response | Following a topic | Providing a topic or asking a question | Achievement of goals |
     |---|---|---|---|---|
     | Understanding of primary and secondary vocabulary | 97.8% | 98.4% | 0.2% | 83.5% |
     | Transition between two topics | 90.9% | 95.3% | 2.2% | 90.7% |
     | Responding to users' positive feelings about a topic | 90.7% | 95.1% | 3.3% | 89.9% |
     | Responding to users' negative feelings about a topic | 87.2% | 93.0% | 3.5% | 50.5% |
     | Regular chit-chat | 92.4% | 93.5% | 8.6% | --- |
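  9. What is astonishing about HyperCLOVA (a large-scale general-purpose language model)
     It can achieve performance that sets it apart from lightweight large-scale general-purpose language models
     • Method: a large number of parameters (computers) + a large amount of data + a large amount of computation
     • Required cost = billions of yen + hundreds of engineers + 1 month
     Tasks where performance that sets it apart is likely to be achievable
     • The task makes use of a language model
     • Data contained in the corpus constitutes the answer
     • The supplied context is a clear definition of the task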
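  10. Examples of the various challenges for those operating HyperCLOVA
     • Building the model and running inference with it are themselves the biggest challenges
     • Building a Web corpus
     • Removing duplicate data
     • Achieving accountability for each entry that is used
     • Handling deletion requests per URL/ID
     • Implementing AI ethics
     • Filtering according to input, output, and application, with the reason stated explicitly
     • When personal information needs to be anonymized
     • Anonymizing according to the corpus and data source
     • Anonymizing according to the output situation
     • Tuning and using more compact models
     • Handling new topics that emerged after the model was built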
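Two of the corpus tasks listed on slide 10, removing duplicate data and honoring deletion requests per URL, can be illustrated with a small filter. This is only a sketch under stated assumptions and not LINE's pipeline: the `Entry` record, the function names, and the example URLs are hypothetical, and the deduplication is exact-match on whitespace- and case-normalized text, whereas a production Web corpus would likely also need near-duplicate detection.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Entry:
    """A crawled document with its source URL (hypothetical record layout)."""
    url: str
    text: str


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial variants hash identically."""
    return " ".join(text.split()).lower()


def dedupe(entries: Iterable[Entry]) -> List[Entry]:
    """Keep the first entry for each distinct normalized text."""
    seen: Set[str] = set()
    kept: List[Entry] = []
    for entry in entries:
        digest = hashlib.sha256(normalize(entry.text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(entry)
    return kept


def apply_deletion_requests(entries: Iterable[Entry], blocked_urls: Set[str]) -> List[Entry]:
    """Drop every entry whose URL appears in a deletion request."""
    return [entry for entry in entries if entry.url not in blocked_urls]


corpus = [
    Entry("https://example.com/a", "HyperCLOVA is a large language model."),
    Entry("https://example.com/b", "hyperclova  is a large   language model."),
    Entry("https://example.com/c", "A different document."),
]
deduped = dedupe(corpus)
cleaned = apply_deletion_requests(deduped, {"https://example.com/c"})
print([entry.url for entry in cleaned])
```

Keeping the URL on every entry is what makes per-URL deletion possible later; storing only the text would lose the link between a request and the data it covers.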
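  11. Various language resources and models also need to be built
     • Tagged corpora
     • Keyword and synonym lists
     • Crawled data for each Web site
     • Re-crawling and updating Common Crawling Data
     • Knowledge bases
     • Various dialogue scenarios
     • OCR of printed books
     • Evaluation data for automatic evaluation
     • Improving, finetuning, and evaluating small-scale general-purpose language models
     • Prompting, prompt tuning, and parameter tuning of large-scale general-purpose language models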
  12. Concrete example 2: a part-time job search service using LINE calls
     • NLU => Type of requests
     • Datetime String Extraction => Data
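  13. Summary of the NLP R&D that can be done at LINE
     - LINE's B2B business: for example, LINE AiCall and CLOVA OCR
     - Introducing component technologies into LINE's services
     - Building and evaluating LINE's large-scale general-purpose language model specialized for Japanese
     - Developing the component technologies needed to operate large-scale general-purpose language models
     - Creating various language resources
     - Writing papers / participating in shared tasks / releasing and contributing to OSS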