Improving Speech Recognition Performance with LLMs

kakao
November 01, 2024

#Kanana #MultiModal #LLM #SpeechRecognition #STT

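This talk introduces how we used an LLM to improve the performance of an existing E2E-based speech recognizer.
In general, applying an LLM-based decoder to an E2E speech recognizer raises accuracy but makes decoding markedly slower.
We share a method that transfers the language ability of the LLM decoder to the E2E recognizer's own decoder, greatly improving the recognition rate while keeping the computational cost unchanged.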

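Speakers: jessie.e, heize.v
I'm Jessie from the Multimodal LLM Application team at Kanana Alpha.
I research multimodal LLMs that handle inputs and outputs across modalities such as audio and language.

I'm Heize from the Multimodal LLM Application team at Kanana Alpha.
I research practical AI that listens and understands like a person.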

Transcript

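  1. Conventional speech recognition: Acoustic Model, Lexicon, Language Model, Decoding. Example output: "Tell me today's weather."

    Acoustic Model p(X|S), Lexicon p(S|W), Language Model p(W): the task is to find the string that the input speech is most likely to consist of.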
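The decoding objective on this slide can be written out as below. This is the standard textbook factorization, given as a sketch rather than the speakers' own derivation. The slide only shows the three factors, so reading X as the acoustic features, S as the subword or phoneme sequence, and W as the word sequence is an assumption, as is the conditional-independence step p(X|S,W) ≈ p(X|S).

```latex
\begin{align*}
\hat{W} &= \arg\max_{W} \, p(W \mid X) \\
        &= \arg\max_{W} \, p(X \mid W)\, p(W) \\
        &\approx \arg\max_{W} \sum_{S} p(X \mid S)\, p(S \mid W)\, p(W)
\end{align*}
```

The second line drops p(X), which does not depend on W. The third marginalizes over the intermediate sequence S.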
  2. End-to-end speech recognition: End-to-end Model. Example output: "Tell me today's weather."
  3. End-to-end speech recognition models: CTC, AED, RNN-T

    CTC: Encoder, Softmax. AED: Encoder, Decoder, Softmax. RNN-T: Encoder, Prediction Network, Joint Network, Softmax.
    R. Prabhavalkar et al., "End-to-End Speech Recognition: A Survey," IEEE/ACM Trans. on Audio, Speech, and Lang. Proc.
  4. End-to-end speech recognition models: joint CTC-attention

    S. Kim, T. Hori, and S. Watanabe, "Joint CTC-attention based end-to-end speech recognition using multi-task learning," in Proc. ICASSP.
    [Figure: a shared Encoder over the inputs x_1 ... x_T produces hidden states h_1 ... h_L, which feed a CTC branch and an attention Decoder.]
    $L_{\mathrm{MTL}} = \lambda L_{\mathrm{CTC}} + (1 - \lambda) L_{\mathrm{Attention}}$, $0 \le \lambda \le 1$
  5. Speech recognition models using an LLM: Audio Encoder, Transformer Decoder

    J. Wu et al., "On decoder-only architecture for speech-to-text and large language model integration," in Proc. ASRU.
    Y. Fathullah et al., "Prompting large language models with speech recognition abilities," in Proc. ICASSP.
  6. Speech recognition models using an LLM: Audio Encoder, stacking, Large Language Model

    J. Wu et al., "On decoder-only architecture for speech-to-text and large language model integration," in Proc. ASRU.
    Y. Fathullah et al., "Prompting large language models with speech recognition abilities," in Proc. ICASSP.
  7. Knowledge distillation by type of knowledge: response-based knowledge distillation

    Teacher Logits, Student Logits, Distillation Loss
    J. Gou, B. Yu, S. J. Maybank, and D. Tao, "Knowledge distillation: A survey," International Journal of Computer Vision.
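Response-based distillation compares the teacher's and student's output logits. The sketch below shows one common formulation, a KL divergence between temperature-softened distributions. The slide does not say which loss or temperature the speakers used, so the function names, the KL choice, and `temperature=2.0` are all assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def response_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature value is a hypothetical default, not taken from the talk.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Illustrative logits for a three-token vocabulary.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
loss = response_kd_loss(teacher, student)
```

The loss is zero when the two logit vectors agree and grows as the student's distribution drifts from the teacher's. Some formulations also scale the result by the squared temperature, which is omitted here.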
  8. Knowledge distillation by type of knowledge: feature-based knowledge distillation

    Teacher Model: a stack of layers up to Layer N, then Logits. Student Model: a stack of layers up to Layer M, then Logits. Distillation Loss.
    J. Gou, B. Yu, S. J. Maybank, and D. Tao, "Knowledge distillation: A survey," International Journal of Computer Vision.
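Feature-based distillation matches intermediate layer outputs rather than final logits. A minimal sketch, assuming a mean-squared-error loss and teacher and student features of equal dimension (in practice a projection is often needed when they differ). The function names and the `mapping` argument are hypothetical and not taken from the talk.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def feature_kd_loss(teacher_feats, student_feats, mapping):
    """Average MSE between each student layer and its paired teacher layer.

    mapping[i] is the (hypothetical) teacher layer index paired with
    student layer i.
    """
    if len(mapping) != len(student_feats):
        raise ValueError("need exactly one teacher index per student layer")
    losses = [mse(student_feats[i], teacher_feats[t]) for i, t in enumerate(mapping)]
    return sum(losses) / len(losses)


# Illustrative features: four teacher layers, two student layers.
teacher_feats = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
student_feats = [[1.0, 1.0], [3.0, 2.0]]
loss = feature_kd_loss(teacher_feats, student_feats, mapping=[1, 3])
```

Here the first student layer matches teacher layer 1 exactly and the second differs from teacher layer 3 in one component, so the averaged loss is 0.25.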
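  9. Knowledge distillation by training scheme

    Offline distillation (Teacher Model, Student Model)
    - The student model is trained using a pre-trained teacher model.
    - The teacher model is frozen; only the student model is trained.
    - The two models cannot interact.

    Online distillation (Teacher Model, Student Model)
    - The teacher model and the student model are trained simultaneously.
    - The teacher model passes knowledge to the student model while processing new data.
    - Requires more compute and makes training less stable.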
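  10. Knowledge distillation model using an LLM

    Training loss: $L = \alpha L_{\mathrm{CTC}} + (1 - \alpha) L_{\mathrm{Attention}} + \beta L_{\mathrm{LLM}} + \gamma L_{\mathrm{KD}}$
    Components: Audio Encoder, Transformer Decoder, LLM Decoder. Losses: CTC loss, Cross entropy loss (two instances), Knowledge distillation loss. One part of the diagram is marked "Training only".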
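The training loss on this slide is a weighted sum of four terms. The sketch below only combines already-computed scalar losses using the slide's weights α, β, and γ. The function name, the default weights, and the range check on α (borrowed from the 0 ≤ λ ≤ 1 constraint on the joint CTC-attention slide) are assumptions, not values reported in the talk.

```python
def total_loss(l_ctc, l_attention, l_llm, l_kd, alpha=0.3, beta=1.0, gamma=1.0):
    """Combine the four loss terms from the slide.

    L = alpha * L_CTC + (1 - alpha) * L_Attention + beta * L_LLM + gamma * L_KD

    The default weights are hypothetical placeholders.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return alpha * l_ctc + (1.0 - alpha) * l_attention + beta * l_llm + gamma * l_kd


# Illustrative scalar losses for a single batch.
loss = total_loss(2.0, 1.0, 0.5, 0.25, alpha=0.5, beta=1.0, gamma=2.0)
```

With β = γ = 0 the expression reduces to the joint CTC-attention loss, so the LLM and distillation terms act as additions on top of the E2E objective.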
  11. Experimental setup (Multilingual / Korean)

    DB: FLEURS, […] languages / KsponSpeech
    Audio Encoder: Wav2vec-BERT / Whisper-large-v[…]
    LLM Decoder: LLaMA […]B / In-house Korean LLM […]B
    Transformer Decoder: […] Transformer layers, […] attention heads, […] hidden units
    Training method: Low-Rank Adaptation (LoRA)
  12. Improvement in character error rate (FLEURS)

    [Figure: CER by FLEURS language group (we, ee, cmn, ssa, sa, sea, cjk). Series: Baseline, LLM decoder-based model, Ours.]
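CER, the metric on this slide's axis, is conventionally the character-level edit distance divided by the reference length. The sketch below follows that usual definition. Whether the speakers strip spaces or normalize text before scoring is not stated, so the space removal here is an assumption, and the strings are only an illustrative pair.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate; spaces are removed first (an assumed convention)."""
    ref = reference.replace(" ", "")
    hyp = hypothesis.replace(" ", "")
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


score = cer("TWO POOLS", "TWO POLES")
```

The two strings differ in two of eight characters, so the score is 0.25.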
  13. CER comparison across layer mapping strategies (FLEURS)

    [Figure: CER by FLEURS language group (we, ee, cmn, ssa, sa, sea, cjk). Series: Uniform layer strategy, Upper layer strategy, Lower layer strategy.]
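The slide names three layer mapping strategies without defining them. One plausible reading, sketched below, is that each of the student decoder's layers is paired with a teacher layer chosen at evenly spaced depths (uniform), from the top of the teacher stack (upper), or from the bottom (lower). These definitions, the function name, and the layer counts are assumptions for illustration, not the speakers' stated method.

```python
def layer_mapping(num_teacher, num_student, strategy="uniform"):
    """Return, for each student layer, a 0-based teacher layer index.

    All three strategies are assumed interpretations of the slide's labels.
    """
    if not 0 < num_student <= num_teacher:
        raise ValueError("need 0 < num_student <= num_teacher")
    if strategy == "uniform":
        # Evenly spaced teacher layers, always ending at the top layer.
        return [((i + 1) * num_teacher) // num_student - 1 for i in range(num_student)]
    if strategy == "upper":
        # The last num_student teacher layers.
        return list(range(num_teacher - num_student, num_teacher))
    if strategy == "lower":
        # The first num_student teacher layers.
        return list(range(num_student))
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical sizes: a 12-layer teacher and a 4-layer student.
uniform = layer_mapping(12, 4, "uniform")
upper = layer_mapping(12, 4, "upper")
lower = layer_mapping(12, 4, "lower")
```

For these sizes the three strategies select teacher layers [2, 5, 8, 11], [8, 9, 10, 11], and [0, 1, 2, 3] respectively.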
  14. Comparison of actual recognition results by model

    Reference: SEGREGATION AND RECOMBINATION SHUFFLE VARIATION BACK AND FORTH BETWEEN THE TWO POOLS WITH EACH GENERATION
    Baseline: CIGREGATION HEALTH RECOMBINATION SHUFFLE VARIATION BACK AND FOR BETWEEN THE TWO POLES WITH EACH GENERATION
    LLM decoder-based model: SEGREGATION AND RECOMBINATION SHUFFLE VARIATION BACK AND FORTH BETWEEN THE TWO POLES WITH EACH GENERATION
    Ours: SEGREGATION AND RECOMBINATION SHUFFLE VARIATION BACK AND FORTH BETWEEN THE TWO POLES WITH EACH GENERATION
  15. Inference speed improvement (FLEURS)

    [Figure: relative RTF by FLEURS language group (we, ee, cmn, ssa, sa, sea, cjk). Series: Baseline, LLM decoder-based model, Ours.]
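RTF (real-time factor) is conventionally the processing time divided by the audio duration, so lower is faster. The slide plots a "relative RTF" without defining it. The sketch below assumes it means each model's RTF divided by the baseline's RTF. Function names and timings are hypothetical.

```python
def rtf(processing_seconds, audio_seconds):
    """Real-time factor: time spent decoding per second of audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def relative_rtf(model_rtf, baseline_rtf):
    """RTF normalized by the baseline (an assumed reading of the slide)."""
    if baseline_rtf <= 0:
        raise ValueError("baseline RTF must be positive")
    return model_rtf / baseline_rtf


# Hypothetical timings on a 4-second utterance.
baseline = rtf(1.0, 4.0)
slower_model = rtf(2.0, 4.0)
ratio = relative_rtf(slower_model, baseline)
```

Under this reading, a value of 1.0 means the same speed as the baseline and a value of 2.0 means decoding takes twice as long.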
  16. Q&A