
[ASJ_22nd_summer_seminar] Speech Recognition and Speech Synthesis +∞ with Sequence-to-Sequence Models: Implementation Is All We Need

Takuma OKAMOTO
August 24, 2021

Transcript

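  1. Speech Recognition and Speech Synthesis +∞ with Sequence-to-Sequence Models: Implementation Is All We Need
    岡本拓磨 (Takuma Okamoto), National Institute of Information and Communications Technology (NICT)
    August 24, 2021, 22nd Summer Seminar of ASJ
    Note: slide colors indicate the theme: speech recognition, speech synthesis, sound field recording, sound field control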
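  2. Today's talk
    - Self-introduction
    - What are speech recognition, speech synthesis, and machine translation?
    - What makes them difficult?
    - Solving them with attention-based sequence-to-sequence models
    - Applications and possibilities (∞) of sequence-to-sequence models
    - Open problems of sequence-to-sequence models
    - "Attention is all you need" → implementation is all we need
    - Summary
    - Announcement: presentations at next month's research meeting
    - Appendix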
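  3. Self-introduction: 岡本拓磨 (Takuma Okamoto)
    Research theme: acoustic signal processing, especially sound field recording and control (microphone and loudspeaker array signal processing)
    - Tohoku University, Advanced Acoustic Information Systems (master's, Ph.D., postdoc)
    - Ultra-realistic communication project at NICT
    - To present: own KAKENHI-funded research at NICT
    Speech processing
    - Spoken dialogue and language identification at NICT
    - To present: speech synthesis and speech waveform generation using neural networks
    Hobbies: drinking parties (none because of COVID-19, so drinking at home instead), jogging
    Profile pages
    - NICT recruitment site for research and research-engineering positions (www.nict.go.jp)
    - Journal of the Acoustical Society of Japan, mini special issue on career paths: "If you chase two hares, how many do you catch?"
    Other: term representative of the steering committee of the ASJ Students and Young Researchers Forum
    Research life wearing several hats at once → great fun (but busy)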
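  4. What are speech recognition, speech synthesis, and machine translation?
    Example: VoiceTra, the speech translation app provided by NICT
    - Speech recognition: converts anyone's speech into text
    - Machine translation: converts the input text into another language
    - Text-to-speech synthesis: converts the input text into a speech signal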

  5. Neural speech synthesis demo: the synthesized voice spoke in my place in an international conference video

  6. Noise level limited sub-modeling for diffusion probabilistic vocoders
    Takuma Okamoto1, Tomoki Toda2,1, Yoshinori Shiga1* and Hisashi Kawai1
    1National Institute of Information and Communications Technology (NICT), Japan
    2Nagoya University, Japan
    *Y. Shiga is currently with the Tokyo Denki University, Japan
    WaveGrad + DiffWave
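  7. What makes it difficult?
    One difficulty: the input and output lengths are completely different.
    An actual speech synthesis example:
    - Text: あらゆる現実を、すべて自分のほうへねじ曲げたのだ ("He bent every reality toward himself"), counted in characters plus punctuation
    - Phoneme sequence: arayurugeNjitsuopausubetejibuNnohooenejimagetanoda
    - Acoustic features (mel-spectrogram): counted in frames, at a frame shift given in ms
    - Speech waveform: counted in samples, at a sampling frequency given in kHz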
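The length mismatch described on this slide can be made concrete with a little arithmetic. The following is a minimal sketch only: the sampling rate (24 kHz), frame shift (12.5 ms), and utterance duration (3 s) are assumed values chosen for illustration, not figures from the talk, and the phoneme segmentation of the example sentence is likewise a hand-made assumption.

```python
# Illustrative length comparison for one utterance in text-to-speech.
# All numeric settings below are assumptions, not values from the talk.

SAMPLING_RATE_HZ = 24_000   # assumed waveform sampling rate
FRAME_SHIFT_MS = 12.5       # assumed mel-spectrogram frame shift
DURATION_S = 3.0            # assumed utterance duration

# Hand-segmented phonemes for the example sentence (assumed segmentation).
PHONEMES = (
    "a r a y u r u g e N j i ts u o pau s u b e t e j i b u N n o "
    "h o o e n e j i m a g e t a n o d a"
).split()


def sequence_lengths(duration_s, sampling_rate_hz, frame_shift_ms):
    """Return (number of frames, number of samples) for one utterance."""
    n_samples = int(round(duration_s * sampling_rate_hz))
    samples_per_frame = sampling_rate_hz * frame_shift_ms / 1000.0
    n_frames = int(n_samples // samples_per_frame)
    return n_frames, n_samples


n_frames, n_samples = sequence_lengths(DURATION_S, SAMPLING_RATE_HZ, FRAME_SHIFT_MS)
print(f"phonemes: {len(PHONEMES)}")
print(f"frames:   {n_frames}")
print(f"samples:  {n_samples}")
```

Under these assumptions each stage is longer than the previous one by roughly one to two orders of magnitude, which is the gap a sequence-to-sequence model and a waveform generator have to bridge.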
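  8. Solving it with attention-based sequence-to-sequence models
    Attention-based sequence-to-sequence neural network models originated in machine translation.
    - Input length M and output length N differ → a matrix operation can convert between them: [A × M] × [M × N] = [A × N], using a conversion matrix with M rows and N columns
    - End-to-end training: the positional correspondence (alignment) between input and output is also acquired automatically through learning
    - Attention matrix: which part of the input to attend to
    - The encoder and decoder are given attention of their own as well: self-attention
    - Figure labels: text (phoneme sequence), M, encoder, N, decoder
    ↓ Used not only for translation, recognition, and synthesis but in many other fields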
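The length-changing product [A × M] × [M × N] = [A × N] can be sketched in a few lines. This is an illustrative sketch, not the model from the talk: the scaled dot-product score is one common choice (the slide does not say which score function is used), the vectors are random stand-ins for trained encoder and decoder states, and all function names are hypothetical.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_matrix(encoder_states, decoder_queries):
    """Return an M x N matrix of weights; each column sums to 1 over the M inputs.

    encoder_states: M vectors of dimension A (one per input position)
    decoder_queries: N vectors of dimension A (one per output position)
    """
    dim = len(encoder_states[0])
    columns = []
    for query in decoder_queries:
        scores = [
            sum(h * q for h, q in zip(state, query)) / math.sqrt(dim)
            for state in encoder_states
        ]
        columns.append(softmax(scores))
    # columns[n][m] is transposed into weights[m][n]
    return [list(row) for row in zip(*columns)]


def apply_attention(encoder_states, weights):
    """Compute [A x M] x [M x N] = [A x N], returned as N context vectors of size A."""
    n_in, n_out = len(weights), len(weights[0])
    dim = len(encoder_states[0])
    return [
        [sum(encoder_states[m][a] * weights[m][n] for m in range(n_in))
         for a in range(dim)]
        for n in range(n_out)
    ]


random.seed(0)
A, M, N = 4, 5, 8  # feature size, input length, output length (arbitrary)
encoder_states = [[random.gauss(0, 1) for _ in range(A)] for _ in range(M)]
decoder_queries = [[random.gauss(0, 1) for _ in range(A)] for _ in range(N)]

weights = attention_matrix(encoder_states, decoder_queries)
contexts = apply_attention(encoder_states, weights)
print(len(weights), len(weights[0]))    # M rows, N columns
print(len(contexts), len(contexts[0]))  # N context vectors, each of size A
```

The input has M positions and the result has N, so the attention matrix is what changes the sequence length; training decides which input positions each output position draws on.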
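  9. Application examples of sequence-to-sequence models
    Characteristics of sequence-to-sequence models
    - Usable for many tasks as long as paired input/output data exist (e.g., text ↔ speech)
    - The same network serves translation, recognition, and synthesis, so the barrier for beginners is low (see appendix)
    Applications are diverse even within speech alone (and the models are widely adopted in image recognition and other areas)
    - Speaker diarization: recorded speech → who spoke when
    - Multi-speaker speech recognition: recorded speech → who said what
    - End-to-end speech translation: Japanese speech → translated English speech
    - Speech enhancement: noisy speech → clean speech
    - Source separation: mixed speech → separated speech
    - Acoustic event detection: recorded waveform → event labels (e.g., the sound of cutting vegetables with a kitchen knife)
    - Voice conversion: speaker A's speech → speaker B's speech
    Searching for a field name (e.g., "speech enhancement") plus "attention" turns up many examples.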
  10. Multi-spot sound reproduction system using a linear loudspeaker array (ICASSP; J. Acoust. Soc. Am.)
    [Figure: Japanese area (こんにちは), English area (Hello), and Chinese area (你好), obtained by superposing bright zones (listening areas, where the sound is audible) and dark zones (quiet areas, where it is not)]
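  11. Multilingual multi-spot simultaneous interpretation system
    A fusion of neural speech translation and multi-spot sound reproduction.
    It was to be shown at the NICT Koganei open house (science talk) and the ASTREC demo exhibition, but this was canceled because of COVID-19.
    [Figure: こんにちは (speech) → multilingual neural speech recognition → こんにちは (text) → multilingual neural machine translation → こんにちは / 你好 / Hello (text) → multilingual neural speech synthesis → こんにちは / 你好 / Hello (speech) → multi-spot reproduction → areas where only Japanese, only English, or only Chinese is audible]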
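  12. Applications like this are possible too: the applications are ∞
    End-to-end multilingual multi-spot simultaneous interpretation system
    [Figure: multilingual neural speech recognition → multilingual neural machine translation → multilingual neural speech synthesis → multi-spot reproduction, labeled "sequence-to-sequence neural network", feeding areas where only Japanese, only English, or only Chinese is audible]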
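  13. When attention estimation fails…
    - In speech recognition and speech synthesis, the weights of the attention matrix need to be diagonal
    - An unfortunate example in which attention estimation failed
    - Unusable in a real service → a research problem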
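One way to quantify how "diagonal" an attention matrix is can be sketched as follows. The penalty below is borrowed from the guided-attention loss used in some text-to-speech systems; it is not described in the talk, and the `sharpness` value, the function names, and the two toy alignments are assumptions made for illustration.

```python
import math


def guided_attention_penalty(attention, sharpness=0.2):
    """Mean off-diagonal mass of an M x N attention matrix (lower = more diagonal).

    attention[m][n] is the weight on input position m for output position n.
    """
    n_in, n_out = len(attention), len(attention[0])
    total = 0.0
    for m in range(n_in):
        for n in range(n_out):
            distance = m / n_in - n / n_out
            penalty = 1.0 - math.exp(-(distance ** 2) / (2.0 * sharpness ** 2))
            total += attention[m][n] * penalty
    return total / (n_in * n_out)


def diagonal_alignment(n_in, n_out):
    """Ideal case: output step n attends to the proportionally matching input."""
    matrix = [[0.0] * n_out for _ in range(n_in)]
    for n in range(n_out):
        matrix[min(n_in - 1, n * n_in // n_out)][n] = 1.0
    return matrix


def stuck_alignment(n_in, n_out):
    """Failure case: every output step attends to the first input only."""
    matrix = [[0.0] * n_out for _ in range(n_in)]
    for n in range(n_out):
        matrix[0][n] = 1.0
    return matrix


good = guided_attention_penalty(diagonal_alignment(10, 40))
bad = guided_attention_penalty(stuck_alignment(10, 40))
print(f"diagonal: {good:.4f}  stuck: {bad:.4f}")
```

A roughly diagonal alignment scores close to zero, while an alignment stuck on one input position scores much higher; a penalty of this kind can be monitored during training or added to the loss to discourage the failure shown on the slide.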

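  14. "Attention is all you need" → implementation is all we need
    Public implementations (mainly Python) and corpora that accelerate research
    - Source code published on GitHub
    - Public corpora (speech synthesis): LJSpeech (English), VCTK (English, multi-speaker), LibriTTS (English, multi-speaker), JSUT (Japanese), JVS (Japanese, multi-speaker), …
    ESPnet: end-to-end speech processing toolkit
    - A speech processing toolkit for sequence-to-sequence (end-to-end) models; it is in English, but many Japanese researchers take part, including the organizers
    - The same functions are used for speech recognition, speech synthesis, speech translation, speech enhancement, voice conversion, and spoken language understanding → the applications are ∞
    Not sure how it works? Try implementing it!
    - Title of Google's sequence-to-sequence (Transformer) paper: "Attention is all you need" ↓ implementation is all we need
    - The only way to understand is to implement (understand it with your body)
    - Your own implementation runs → it can recognize or synthesize → deeply moving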
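  15. The sacred treasures
    Captions: sequence-to-sequence models are covered in one chapter; sequence-to-sequence models are covered in two chapters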

  16. For neural network basics, start here. "Deep", hence the deep-sea fish.

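  17. Summary
    What is a sequence-to-sequence model?
    - It converts an input into an output of a different sequence length: speech → text, text → speech, …
    - It learns the matrix that changes the sequence length
    - It learns which part of the input to attend to in order to obtain the desired output
    - The applications are ∞
    Implementation is all we need
    - To understand, the only way is to implement and learn it with your body
    - The environment for implementing is already sufficient: GitHub, corpora (and there should be more to come)
    - Speech synthesis: it was moving when a model I had actually built spoke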
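  18. Announcement: presentations at next month's research meeting
    Fast, multi-speaker neural speech waveform generation models
    - Matsubara (Kobe Univ. master's student, NICT trainee), Okamoto, Takashima (Kobe Univ.), Takiguchi (Kobe Univ.), Toda (Nagoya Univ.), Kawai, "Investigation of LPCNet features in the HiFi-GAN vocoder"
    Language acquisition agent (neural speech waveform generation model), special session
    - Tanaka (Tokyo Tech graduate), Okamoto, Shinozaki (Tokyo Tech), "Speech production mechanism and pronunciation adaptation using WaveGrad for a spoken language acquisition system"
    Fast, high-quality Japanese neural speech synthesis model running on CPU only
    - Okamoto, Toda (Nagoya Univ.), Kawai, "Implementation of a CPU-based real-time Japanese neural text-to-speech system using forced-alignment Parallel Tacotron and HiFi-GAN"
    Multi-zone sound field control (own KAKENHI theme)
    - Okamoto, "Multiple sound field control based on sound field separation and simultaneous interior and exterior control"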
  19. Thank you for your !!
    岡本拓磨 (Takuma OKAMOTO) @ NICT
    e-mail:
    HP: https://www.okamotocamera.com
    Twitter: @okamotocamerea
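  20. Appendix: speech recognition and speech synthesis, past and present
    What makes speech recognition and speech synthesis difficult?
    - Common: input and output lengths are completely different → acoustic features: hundreds of frames; text: tens of characters
    - Recognition: the target is the speech of all humankind (enormous diversity), and speaking rates vary widely
    - Synthesis: input and output lengths differ even more → a speech waveform reaches tens of thousands of samples in mere seconds
    Conventional speech recognition and synthesis: a high hurdle (much specialist knowledge required)
    - Before any conversion, various modules must each be trained and then chained together
    - Common: learn the positional correspondence between acoustic features and text (alignment)
    - Recognition: acoustic model, pronunciation dictionary, language model, decoder
    - Synthesis: phoneme duration model, acoustic model, waveform generation model (vocoder)
    The arrival of sequence-to-sequence models: a low hurdle (the neural network covers the specialist knowledge)
    - A single neural network can perform the whole conversion at once
    - Recognition: acoustic features → [sequence-to-sequence model] → word sequence (text)
    - Synthesis: text (phoneme sequence) → [sequence-to-sequence model] → acoustic features → [waveform generation model] → speech waveform
    - Synthesis: text (phoneme sequence) → [sequence-to-sequence model + waveform generation model] → speech waveform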