[ASJ_22nd_summer_seminar] Speech Recognition, Speech Synthesis, and +∞ with Sequence-to-Sequence Conversion -実装 (Implementation) is all we need-
Takuma OKAMOTO
August 24, 2021
Transcript
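Speech Recognition, Speech Synthesis, and +∞ with Sequence-to-Sequence Conversion: 実装 (implementation) is all we need
Takuma Okamoto
National Institute of Information and Communications Technology (NICT)
* Slide colors by topic: speech recognition, speech synthesis, sound field recording, sound field control
24th Aug. 2021, 22nd Summer Seminar of ASJ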
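Today's presentation
Self-introduction
What are speech recognition, speech synthesis, and machine translation?
What makes them difficult?
Realization with attention-based sequence-to-sequence models
Application examples and possibilities (∞) of sequence-to-sequence models
Open problems of sequence-to-sequence models
From "Attention is all you need" to "実装 (implementation) is all we need"
Summary
Announcement: presentations at next month's research meeting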
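Self-introduction
Takuma Okamoto
Research themes
Acoustic signal processing, especially sound field recording and control (microphone and loudspeaker array signal processing)
[…] to […]: Advanced Acoustic Information Systems, Tohoku University (master's, doctorate, postdoc)
[…] to […]: ultra-realistic communication project at NICT
[…] to present: my own KAKENHI grant project at NICT
Speech processing
[…] to […]: spoken dialogue and language identification at NICT
[…] to present: speech synthesis and speech waveform generation using neural networks
Hobbies: drinking parties (none at all because of COVID-19, so now drinking at home) and jogging ([…] km per month)
Profile pages
NICT recruitment site for research and research-engineering positions (www.nict.go.jp): […]
Journal of the Acoustical Society of Japan, career-path mini special issue, "If you chase two hares, how many do you catch?": https://doi.org/[…]
Other: […] to […]: representative of the […] term of the ASJ Students and Young Researchers Forum steering committee
A research life wearing two hats: a lot of fun, but busy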
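What are speech recognition, speech synthesis, and machine translation?
Example: VoiceTra, the speech translation app provided by NICT
Speech recognition: converts anyone's speech into text
Machine translation: converts the input text into another language
Text-to-speech synthesis: converts the input text into a speech signal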
Neural speech synthesis demo
The synthesized voice gave the talk in my place in an international conference video.
Noise level limited sub-modeling for diffusion probabilistic vocoders
Takuma Okamoto (1), Tomoki Toda (2, 1), Yoshinori Shiga (1, *), and Hisashi Kawai (1)
(1) National Institute of Information and Communications Technology (NICT), Japan
(2) Nagoya University, Japan
(*) Y. Shiga is currently with the Tokyo Denki University, Japan
WaveGrad + DiffWave
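What makes it difficult?
One of the difficulties: the input and output lengths are completely different.
Example from an actual synthesis run
Text: […] characters (plus punctuation); phoneme sequence: […]
Acoustic features (mel-spectrogram): […] frames (frame shift: […] ms)
Speech waveform (sampling frequency: […] kHz): […] samples
Sentence: あらゆる現実を、すべて自分のほうへねじ曲げたのだ ("He twisted every reality toward himself.")
Phonemes: arayurugeNjitsuo pau subetejibuNnohooenejimagetanoda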
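The length mismatch between text, acoustic frames, and waveform samples can be made concrete with a little arithmetic. The sketch below is only an illustration: the 24 kHz sampling frequency, the 12.5 ms frame shift, and the 4.0 s duration are assumed values, not figures taken from the slide, and `sequence_lengths` is a hypothetical helper.

```python
def sequence_lengths(text, duration_s, sampling_rate_hz=24000, frame_shift_ms=12.5):
    """Return (characters, frames, samples) for one utterance.

    sampling_rate_hz and frame_shift_ms are illustrative assumptions.
    """
    n_chars = len(text)
    n_samples = int(round(duration_s * sampling_rate_hz))
    # Number of waveform samples covered by one frame shift (hop size).
    hop = int(round(sampling_rate_hz * frame_shift_ms / 1000))
    # One frame per hop, plus one, assuming centred and padded frames.
    n_frames = n_samples // hop + 1
    return n_chars, n_frames, n_samples


text = "あらゆる現実を、すべて自分のほうへねじ曲げたのだ。"
# The 4.0 s duration is an assumption made for this example.
chars, frames, samples = sequence_lengths(text, duration_s=4.0)
print(f"text: {chars} characters")
print(f"mel-spectrogram: {frames} frames")
print(f"waveform: {samples} samples")
```

Under these assumptions each step of the pipeline grows by one to two orders of magnitude, which is why a model must learn how positions in the short sequence map onto positions in the long one.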
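Realization with attention-based sequence-to-sequence models
Attention-based sequence-to-sequence neural network models originated in machine translation.
The input length M and the output length N differ → a matrix operation can convert between them: [A × M] × [M × N] = [A × N], using a conversion matrix with M rows and N columns.
The positional correspondence between input and output (the alignment) is acquired automatically through end-to-end training.
Figure: text (phoneme sequence) of length M → encoder → attention matrix → decoder → output of length N
・Attention matrix: which part of the input to attend to
・Attention can also be given to the encoder and the decoder themselves: self-attention
↓
Used not only for translation, recognition, and synthesis but in many other fields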
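The idea that an attention matrix converts a length-M input into a length-N output can be sketched in a few lines. This is a minimal, unscaled dot-product attention written with plain Python lists, with no learned projections; it is not the Transformer implementation. The function names and toy vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Dot-product attention.

    queries: N vectors (decoder side); keys/values: M vectors (encoder side).
    Returns N context vectors and the N x M attention matrix.
    """
    weights, context = [], []
    dim = len(values[0])
    for q in queries:
        # Similarity of this output step to every input step.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        row = softmax(scores)  # where in the input to attend
        weights.append(row)
        # Weighted sum of input vectors gives one output-side vector.
        context.append([sum(row[m] * values[m][d] for m in range(len(values)))
                        for d in range(dim)])
    return context, weights


# Toy example: M = 3 input steps, N = 5 output steps, feature size A = 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_queries = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0], [2.0, 2.0]]
context, weights = attention(decoder_queries, encoder_states, encoder_states)
print(len(weights), "x", len(weights[0]), "attention matrix")
print(len(context), "context vectors of size", len(context[0]))
```

Each row of `weights` sums to one, so every output step is a weighted average of the input steps. In a trained model the queries and keys come from learned layers, and that is how the alignment is acquired from data.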
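Characteristics of sequence-to-sequence models
With paired input and output data, they can be used for many things (example: text ←→ speech).
Translation, recognition, and synthesis can all use the same network, so the barrier to entry for beginners is low.
Even within speech alone, the applications are varied.
They are also widely adopted in image recognition.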
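Application examples of sequence-to-sequence models
Speaker diarization: recorded speech → who spoke when
Multi-speaker speech recognition: recorded speech → who said what
End-to-end speech translation: Japanese speech → → translated English speech
Speech enhancement: noisy speech → clean speech
Source separation: mixed speech → separated speech
Acoustic event detection: recorded waveform → event labels (the sound of chopping vegetables with a kitchen knife, etc.)
Voice conversion: speech of speaker A → → speech of speaker B
Searching for a field name plus "attention" (example: "speech enhancement attention") turns up many results.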
Multi-spot sound reproduction system using a linear loudspeaker array (ICASSP; J. Acoust. Soc. Am.)
Figure: Japanese area (こんにちは), English area (Hello), Chinese area (你好). Bright zone (listening area): the area where the sound can be heard. Dark zone (quiet area): the area where the sound cannot be heard. The zones are combined by superposition.
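Fusion of neural speech translation technology and multi-spot reproduction technology
Multilingual multi-spot-reproduction simultaneous interpretation system
It was to be shown at the NICT Koganei open house (science talk) in […] and at the ASTREC demo exhibition, but these were cancelled because of COVID-19.
Figure: an area where only Japanese is heard (こんにちは), an area where only English is heard (Hello), and an area where only Chinese is heard (你好)
Pipeline: こんにちは (speech) → multilingual neural speech recognition → こんにちは (text) → multilingual neural machine translation → こんにちは / 你好 / Hello (text) → multilingual neural speech synthesis → こんにちは / 你好 / Hello (speech) → multi-spot reproduction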
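Could an application like this work too?
End-to-end multilingual multi-spot-reproduction simultaneous interpretation system
The range of applications is ∞.
Figure: the same three listening areas and the same recognition, translation, synthesis, and multi-spot reproduction pipeline as before, labeled as a sequence-to-sequence neural network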
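When attention estimation fails...
In speech recognition and speech synthesis, the weights of the attention matrix need to lie along the diagonal.
Figure: an unfortunate example in which attention estimation failed
Such a model cannot be used in a real service → a research problem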
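One simple way to check whether an attention matrix is roughly diagonal is to measure how much attention weight falls near the diagonal. The sketch below is an assumption-laden illustration: `diagonal_mass`, its `band` parameter, and the two toy matrices are hypothetical and are not taken from the talk or from any toolkit.

```python
def diagonal_mass(weights, band=0.2):
    """Average attention weight lying near the diagonal of an N x M matrix.

    Positions are normalised to [0, 1] on both axes; `band` is the allowed
    distance from the diagonal (an illustrative choice).
    """
    n_out, n_in = len(weights), len(weights[0])
    total = 0.0
    for n, row in enumerate(weights):
        out_pos = n / max(n_out - 1, 1)
        for m, w in enumerate(row):
            in_pos = m / max(n_in - 1, 1)
            if abs(out_pos - in_pos) <= band:
                total += w
    return total / n_out


# Toy attention matrices: 4 output steps attending over 3 input steps.
monotonic = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]  # follows the diagonal
stuck = [[0, 0, 1]] * 4                                   # always attends to the last input
print(diagonal_mass(monotonic))
print(diagonal_mass(stuck))
```

A score close to 1 suggests a monotonic alignment, and a low score flags the kind of failure shown on the slide. Any threshold for rejecting an utterance would have to be tuned on real data.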
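Publicly available implementations (mainly Python) and corpora accelerate research.
Source code is published on GitHub.
Public corpora (speech synthesis): LJSpeech (English), VCTK (English, multi-speaker), LibriTTS (English, multi-speaker), JSUT (Japanese), JVS (Japanese, multi-speaker), ...
ESPnet: end-to-end speech processing toolkit
A speech processing toolkit for sequence-to-sequence (end-to-end) models. It is in English, but many Japanese researchers, including the organizers, take part.
The same functions are used for speech recognition, speech synthesis, speech translation, speech enhancement, voice conversion, and spoken language understanding → the range of applications is ∞.
Not sure you understand it? Try implementing it.
The title of Google's sequence-to-sequence (Transformer) paper: "Attention is all you need"
↓
実装 (implementation) is all we need
The only way to understand it is to implement it: understand it with your body.
Your own implementation runs → it can recognize or synthesize → deeply moving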
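The three sacred treasures (essential references)
Sequence-to-sequence models: Chapter […]
Sequence-to-sequence models: Chapters […] and […]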
For the basics of neural networks, see this one.
It is "Deep", hence the deep-sea fish.
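Summary
What is a sequence-to-sequence model?
It can convert an input into an output with a different sequence length: speech → text, text → speech, ...
It learns a matrix that changes the sequence length.
It learns which part of the input to attend to in order to obtain the desired output.
The range of applications is ∞.
実装 (implementation) is all we need
To understand it, learn it with your body: implementing it is the only way.
The environment for implementing is already in place (GitHub, corpora) and should keep growing.
Speech synthesis: it is moving when a model you actually built speaks.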
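Announcement: presentations at next month's research meeting
High-[…] reproduction, multi-speaker neural speech waveform generation model
[…]: Matsubara (Kobe University M2, NICT intern), Okamoto, Takashima (Kobe University), Takiguchi (Kobe University), Toda (Nagoya University), Kawai, "A study of LPCNet features in the HiFi-GAN vocoder"
Neural speech waveform generation model for a language acquisition agent (special session)
[…]: Tanaka (Tokyo Tech graduate), Okamoto, Shinozaki (Tokyo Tech), "Speech production mechanism and pronunciation adaptation using WaveGrad for a spoken language acquisition system"
Fast, high-quality Japanese neural speech synthesis model running on CPU only
[…]: Okamoto, Toda (Nagoya University), Kawai, "Implementation of a CPU-based real-time Japanese neural text-to-speech system using a forced-alignment version of Parallel Tacotron and HiFi-GAN"
Multi-zone sound field control (my own KAKENHI theme)
[…]: Okamoto, "Multiple sound field control based on […] and simultaneous interior and exterior control"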
Thank you for your !!
Takuma OKAMOTO @ NICT
e-mail:
HP: https://www.okamotocamera.com
Twitter: @okamotocamerea
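Speech recognition and speech synthesis: past and present
What makes speech recognition and speech synthesis difficult?
Common to both: the input and output lengths are completely different → acoustic features: hundreds of frames; text: tens of characters
Recognition: the target is the speech of all humankind (enormous diversity), and it varies widely
Synthesis: the input and output lengths differ even more → speech waveform ([…] kHz): tens of thousands of samples in just […] second
Speech recognition and speech synthesis until now: a high hurdle
A great deal of expert knowledge is required.
Various modules must each be trained and then chained together before conversion is possible.
Common to both: learning the positional correspondence between acoustic features and text (alignment)
Recognition: acoustic model, pronunciation dictionary, language model, decoder
Synthesis: phoneme duration model, acoustic model, waveform generation model (vocoder)
The arrival of sequence-to-sequence models: a low hurdle
The neural network covers the expert knowledge.
A single neural network can perform the whole conversion at once.
Recognition: acoustic features → [sequence-to-sequence model] → word sequence (text)
Synthesis: text (phoneme sequence) → [sequence-to-sequence model] → acoustic features → [waveform generation model] → speech waveform
Synthesis (single model): text (phoneme sequence) → [sequence-to-sequence model + waveform generation model] → speech waveform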