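[ASJ_22nd_summer_seminar] Speech Recognition, Speech Synthesis, and +∞ with Sequence-to-Sequence Models: Implementation is all we need (系列変換でできる音声認識・音声合成+∞ -実装 is all we need-)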
Takuma OKAMOTO
August 24, 2021
Transcript
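Speech Recognition, Speech Synthesis, and +∞ with Sequence-to-Sequence Models: Implementation is all we need
Takuma Okamoto (岡本拓磨), National Institute of Information and Communications Technology (NICT)
* Slide colors by theme: speech recognition / speech synthesis / sound field recording / sound field control
24th Aug., 22nd Summer Seminar of ASJ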
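Today's talk
- Self-introduction
- Speech recognition, speech synthesis and machine translation: what they are and what makes them difficult
- Realization with attention-based sequence-to-sequence models
- Applications and potential (∞) of sequence-to-sequence models
- Challenges of sequence-to-sequence models
- "Attention is all you need" → "Implementation is all we need"
- Summary
- Advertisement: our presentations at next month's research meeting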
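Self-introduction: Takuma Okamoto (岡本拓磨)
Research themes: acoustic signal processing, especially sound field recording and control; microphone and loudspeaker array signal processing
… to …: Tohoku University, Advanced Acoustic Information Systems (Master's, PhD, postdoc)
… to …: NICT project on realistic sensation (臨場感) @NICT
… to present: my own KAKENHI grant @NICT (sound field processing)
… to …: spoken dialogue and language identification @NICT
… to present: speech synthesis and speech waveform generation using neural networks
Hobbies: drinking parties (none at all because of COVID, so now drinking at home) and jogging (… km a month)
Profile page: NICT researcher and research engineer recruitment site
Journal of the Acoustical Society of Japan, career-path mini special issue, "How many hares does one who chases two hares catch?": https://doi.org/…
Other: … to …: ASJ Student and Young Researchers Forum, steering committee, representative for term …
A research life wearing several hats → extremely enjoyable, but …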
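Speech recognition, speech synthesis and machine translation: what are they?
Taking NICT's speech translation app VoiceTra as an example:
- Speech recognition: converts anyone's speech into text
- Machine translation: converts the input text into another language
- Text-to-speech synthesis: converts the input text into a speech signal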
Neural speech synthesis demo: it presented my international conference video on my behalf
Noise level limited sub-modeling for diffusion probabilistic vocoders
Takuma Okamoto¹, Tomoki Toda²,¹, Yoshinori Shiga¹* and Hisashi Kawai¹
¹National Institute of Information and Communications Technology (NICT), Japan; ²Nagoya University, Japan
*Y. Shiga is currently with the Tokyo Denki University, Japan
WaveGrad + DiffWave
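What makes it difficult?
One of the difficulties: the input and output lengths are completely different.
Example from an actual speech synthesis run:
- Text: … characters (+ … punctuation marks): "あらゆる現実を、すべて自分のほうへねじ曲げたのだ" ("He bent every reality, all of it, toward himself")
- Phoneme sequence: … phonemes (arayuru geNjitsu o pau subete jibuN no hoo e nejimageta noda)
- Acoustic features (mel-spectrogram): … frames (frame shift … ms)
- Speech waveform (sampling frequency … kHz): … samples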
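The slide's counts were lost in the transcript, but the mismatch follows from simple arithmetic. A mel-spectrogram has roughly duration ÷ frame shift frames, and a waveform has duration × sampling frequency samples. The sketch below uses hypothetical values: a 3 s utterance of 24 characters, a 12.5 ms frame shift and 24 kHz sampling. These are assumptions, not the settings behind the slide.

```python
import math

def sequence_lengths(duration_s: float, n_chars: int,
                     frame_shift_ms: float = 12.5, fs_hz: int = 24000) -> dict:
    """Rough sequence lengths of one utterance at each representation level.

    frame_shift_ms and fs_hz are hypothetical defaults, not the talk's settings.
    """
    n_frames = math.ceil(duration_s * 1000.0 / frame_shift_ms)  # mel-spectrogram frames
    n_samples = round(duration_s * fs_hz)                       # waveform samples
    return {"text": n_chars, "frames": n_frames, "samples": n_samples}

lengths = sequence_lengths(3.0, 24)
print(lengths)  # {'text': 24, 'frames': 240, 'samples': 72000}
```

Even with these modest settings, each character corresponds to about 10 frames and 3,000 samples. A model therefore has to learn where each output step comes from in a much shorter input.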
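Realization with attention-based sequence-to-sequence models
Attention-based sequence-to-sequence neural network models grew out of machine translation.
- The input length M and output length N differ → the conversion can still be done by a matrix operation: [A×M] × [M×N] = [A×N], where the M×N matrix (M rows, N columns) is the conversion matrix.
- The positional correspondence (alignment) between input and output is acquired automatically through training: end-to-end learning.
Figure: text / phoneme sequence (length M) → encoder → attention → decoder → output (length N)
- Attention matrix: which parts of the input to attend to
- The encoder and the decoder are also given their own attention: self-attention
↓
Used not only for translation, recognition and synthesis, but in many other areas.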
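The shape bookkeeping [A×M] × [M×N] = [A×N] can be checked with a few lines of NumPy. This minimal sketch makes three assumptions:
- A is the encoder's feature dimension.
- The alignment scores are random placeholders for what a trained model would compute.
- Each column is normalized with a softmax over the M inputs. This is one common choice, not necessarily the one used in the talk.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
A, M, N = 8, 5, 12                      # feature dim, input length (phonemes), output length (frames)
enc = rng.standard_normal((A, M))       # encoder output: one A-dim vector per input step
scores = rng.standard_normal((M, N))    # placeholder alignment scores (learned in a real model)
W = softmax(scores, axis=0)             # M x N conversion matrix; each column sums to 1
context = enc @ W                       # [A x M] x [M x N] -> [A x N]
print(context.shape)                    # (8, 12)
```

Each output step receives a weighted average of the encoder vectors. That is how N output steps can be produced from M inputs without fixing the alignment by hand.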
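Features of sequence-to-sequence models
- Usable for all sorts of tasks as long as paired input/output data exist (e.g., text ↔ speech)
- The same network works for translation, recognition and synthesis, so the barrier to entry for beginners is low (see references)
- Many applications even within speech alone
- Also widely adopted in image recognition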
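Applications of sequence-to-sequence models (task: input → output)
- Speaker diarization: recording → who spoke when
- Multi-speaker speech recognition: recording → who said what
- End-to-end speech translation: Japanese speech → English translated speech
- Speech enhancement: noisy speech → clean speech
- Source separation: mixture → separated sources
- Acoustic event detection: recorded waveform → event labels (e.g., the sound of a kitchen knife cutting vegetables)
- Voice conversion: speaker A's speech → speaker B's speech
For example, searching for "speech enhancement attention" turns up many papers.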
Multi-spot reproduction system using a linear loudspeaker array (ICASSP …, J. Acoust. Soc. Am. …)
Figure: "こんにちは" is heard in the Japanese area, "Hello" in the English area and "您好" in the Chinese area. Each language has a bright zone (listening area, where the speech is audible) and dark zones (quiet areas, where it is not), and the sound fields for the three languages are superposed.
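The bright/dark-zone idea can be illustrated at a single frequency. This sketch swaps in acoustic contrast control, which maximizes bright-zone energy relative to dark-zone energy, using free-field point sources. It is not the method of the cited ICASSP/JASA papers. The 16-element line array with 4 cm spacing, the 2 kHz frequency and the zone layout are all hypothetical. The per-language filters are then superposed into one set of loudspeaker driving signals.

```python
import numpy as np

C = 343.0  # speed of sound in air [m/s]

def greens(src: np.ndarray, pts: np.ndarray, k: float) -> np.ndarray:
    """Free-field transfer functions exp(-jkr) / (4*pi*r), shape (points, sources)."""
    r = np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / (4 * np.pi * r)

def zone_grid(cx: float, cy: float, half: float = 0.05, n: int = 3) -> np.ndarray:
    """n x n control points in a square zone centred at (cx, cy) on the plane z = 0."""
    xs = np.linspace(cx - half, cx + half, n)
    ys = np.linspace(cy - half, cy + half, n)
    return np.array([[x, y, 0.0] for x in xs for y in ys])

def acc_filter(src, zones, bright, k, reg=1e-6):
    """Acoustic contrast control: weights maximizing bright-zone / dark-zone mean energy."""
    gb = greens(src, zones[bright], k)
    gd = np.vstack([greens(src, z, k) for i, z in enumerate(zones) if i != bright])
    rb = gb.conj().T @ gb / len(gb)
    rd = gd.conj().T @ gd / len(gd) + reg * np.eye(len(src))   # small regularization
    vals, vecs = np.linalg.eig(np.linalg.solve(rd, rb))        # generalized eigenproblem
    w = vecs[:, np.argmax(vals.real)]                          # largest contrast
    return w / np.linalg.norm(w)

def contrast_db(src, zones, w, bright, k):
    """Mean |p|^2 in the bright zone relative to the mean over the dark zones, in dB."""
    energy = [np.mean(np.abs(greens(src, z, k) @ w) ** 2) for z in zones]
    dark = np.mean([e for i, e in enumerate(energy) if i != bright])
    return 10 * np.log10(energy[bright] / dark)

k = 2 * np.pi * 2000.0 / C                                         # 2 kHz (hypothetical)
src = np.array([[x, 0.0, 0.0] for x in (np.arange(16) - 7.5) * 0.04])
zones = [zone_grid(-0.5, 1.0), zone_grid(0.0, 1.0), zone_grid(0.5, 1.0)]  # JA, EN, ZH areas
filters = [acc_filter(src, zones, i, k) for i in range(len(zones))]

# Superposition: each language's spectrum at 2 kHz goes through its own zone filter
spectra = np.array([1.0, 0.5j, -0.8])                              # placeholder values
drive = sum(s * w for s, w in zip(spectra, filters))               # one signal per loudspeaker
for i in range(len(zones)):
    print(f"zone {i}: contrast {contrast_db(src, zones, filters[i], i, k):.1f} dB")
```

A broadband system would repeat this per frequency bin (or design time-domain filters). Superposition works because the sound field is linear in the driving signals.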
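Combining neural speech translation with multi-spot reproduction: a multilingual multi-spot simultaneous interpretation system
It was to be shown at the NICT Koganei Open House science talk and the ASTREC demo exhibition in …, but both were cancelled because of COVID.
Figure: areas where only Japanese, only English, or only Chinese is heard. Input speech "こんにちは" → multilingual neural speech recognition → text "こんにちは" → multilingual neural machine translation → texts "こんにちは", "Hello", "您好" → multilingual neural speech synthesis → speech "こんにちは", "Hello", "您好" → multi-spot reproduction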
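The cascade in the figure can be written as plain function composition: recognize once, then translate and synthesize once per listening area. In this sketch the components (`asr`, `mt`, `tts`), the toy dictionary and the area names are hypothetical stand-ins, not NICT's system. The per-area waveforms would then go to the multi-spot reproduction stage.

```python
from typing import Callable, Dict, List

Audio = List[float]
ASR = Callable[[Audio, str], str]        # (speech, language) -> text
MT = Callable[[str, str, str], str]      # (text, source lang, target lang) -> text
TTS = Callable[[str, str], Audio]        # (text, language) -> speech

def interpret(speech: Audio, src_lang: str, area_langs: Dict[str, str],
              asr: ASR, mt: MT, tts: TTS) -> Dict[str, Audio]:
    """Cascaded interpretation: one recognition pass, then translation + synthesis per area."""
    text = asr(speech, src_lang)
    outputs: Dict[str, Audio] = {}
    for area, lang in area_langs.items():
        translated = text if lang == src_lang else mt(text, src_lang, lang)
        outputs[area] = tts(translated, lang)
    return outputs  # one waveform per listening area, to be fed to multi-spot reproduction

# Toy stand-ins (hypothetical) so the example runs end to end
dictionary = {("ja", "en"): {"こんにちは": "Hello"}, ("ja", "zh"): {"こんにちは": "您好"}}
fake_asr: ASR = lambda speech, lang: "こんにちは"
fake_mt: MT = lambda text, s, t: dictionary[(s, t)][text]
fake_tts: TTS = lambda text, lang: [float(ord(ch)) for ch in text]  # placeholder "waveform"

areas = {"japanese_area": "ja", "english_area": "en", "chinese_area": "zh"}
out = interpret([0.0] * 160, "ja", areas, fake_asr, fake_mt, fake_tts)
```

Each arrow in the figure is a separately trained model here. The end-to-end variant on the next slide would replace the three calls with a single sequence-to-sequence network.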
A possible application: an end-to-end multilingual multi-spot simultaneous interpretation system (applications: ∞)
Figure: the same areas and pipeline as above, with the speech recognition, machine translation and speech synthesis stages replaced by a single sequence-to-sequence neural network feeding multi-spot reproduction.
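Speech recognition and speech synthesis: the weights of the attention matrix need to be (close to) diagonal.
An unfortunate example where attention estimation failed: when attention estimation fails, the model cannot be used in real services → a research challenge.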
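A simple way to flag the failure mode on this slide is to measure how much attention weight lies near the diagonal. This sketch assumes the M×N layout used above: rows are the M input steps and columns are the N output steps. The band width of 0.2, and any threshold you apply to the score, are hypothetical choices rather than a method from the talk.

```python
import numpy as np

def diagonal_mass(att: np.ndarray, band: float = 0.2) -> float:
    """Share of attention weight within `band` of the normalized diagonal.

    att has shape (M, N): rows are input steps, columns are output steps.
    """
    m, n = att.shape
    in_pos = np.arange(m)[:, None] / max(m - 1, 1)    # input position in [0, 1]
    out_pos = np.arange(n)[None, :] / max(n - 1, 1)   # output position in [0, 1]
    near = np.abs(in_pos - out_pos) <= band
    return float(att[near].sum() / att.sum())

def is_monotonic(att: np.ndarray) -> bool:
    """True if the most-attended input position never moves backwards."""
    peaks = att.argmax(axis=0)                        # best input step for each output step
    return bool(np.all(np.diff(peaks) >= 0))

M, N = 5, 10
diag = np.zeros((M, N))
for j in range(N):
    diag[round(j * (M - 1) / (N - 1)), j] = 1.0       # ideal, near-diagonal alignment
uniform = np.full((M, N), 1.0 / M)                    # attention that never focuses
print(f"{diagonal_mass(diag):.2f} {diagonal_mass(uniform):.2f}")  # 1.00 0.32
```

A score like this can be logged during training or checked at inference time to catch skipped or repeated words before they reach a user.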
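Public implementations and corpora that accelerate research (mainly Python)
- Source code released on GitHub
- Public corpora for speech synthesis: LJSpeech (English), VCTK (English, multi-speaker), LibriTTS (English, multi-speaker), JSUT (Japanese), JVS (Japanese, multi-speaker), …
ESPnet: end-to-end speech processing toolkit
- A speech processing toolkit for sequence-to-sequence (end-to-end) models. The documentation is in English only, but many Japanese researchers take part, including the organizers.
- The same functions are used for speech recognition, speech synthesis, speech translation, speech enhancement, voice conversion and spoken language understanding → applications: ∞
Don't understand it well? Try implementing it!
The title of Google's sequence-to-sequence (Transformer) paper: "Attention is all you need"
↓
Implementation is all we need
- To understand it, there is nothing but implementing it: understand it with your body
- Your own implementation runs → it can recognize or it can synthesize → overwhelming satisfaction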
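In the spirit of "implementation is all we need", this is the core operation of the Transformer paper's attention: scaled dot-product attention, softmax(QKᵀ/√d_k)V, applied here as single-head self-attention. It is a minimal NumPy sketch with stated simplifications: one head, no masking, and random projection matrices standing in for trained weights. Full toolkits wrap the same computation in multi-head layers with masking.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V with q: (T_q, d_k), k: (T_k, d_k), v: (T_k, d_v)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)      # each query's weights over the keys sum to 1
    return weights @ v, weights

def self_attention(x, wq, wk, wv):
    """Self-attention: queries, keys and values are all projections of the same sequence."""
    return scaled_dot_product_attention(x @ wq, x @ wk, x @ wv)

rng = np.random.default_rng(0)
T, d_model, d_k = 6, 16, 8                  # sequence length and dimensions (arbitrary)
x = rng.standard_normal((T, d_model))
wq, wk, wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))  # untrained weights
out, weights = self_attention(x, wq, wk, wv)
print(out.shape, weights.shape)             # (6, 8) (6, 6)
```

The T×T weight matrix here is the self-attention counterpart of the M×N attention matrix discussed earlier. For encoder-decoder attention, the queries would come from the decoder and the keys and values from the encoder.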
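The three sacred treasures (essential books): sequence-to-sequence models, Chapter …; sequence-to-sequence models, Chapters … and …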
For the basics of neural networks, see this one (it's "Deep", hence the deep-sea fish).
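Summary
Sequence-to-sequence models
- Can convert an input into an output of a different sequence length: speech → text, text → speech, …
- Learn a matrix that converts one sequence into another, i.e., learn where in the input to attend to obtain the desired output
- Applications: ∞
Implementation is all we need
- To understand it, the only way is to implement it and learn it with your body
- The environment for implementation is already there (GitHub, corpora) and should keep growing
- Speech synthesis: I was moved when a model I had built myself actually spoke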
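Advertisement: our presentations at next month's research meeting
- A high-[…] multi-speaker neural speech waveform generation model
  …: Matsubara (Kobe Univ., M…; NICT trainee), Okamoto, Takashima (Kobe Univ.), Takiguchi (Kobe Univ.), Toda (Nagoya Univ.), Kawai, "An investigation of LPCNet features for the HiFi-GAN vocoder"
- Language acquisition agent: neural speech waveform generation model (special session)
  Tanaka (Tohoku Univ. graduate), Okamoto, … (Tohoku Univ.), "A speech production mechanism and pronunciation adaptation using WaveGrad for a spoken language acquisition system"
- A fast, high-quality Japanese neural speech synthesis model running only on CPUs
  …: Okamoto, Toda (Nagoya Univ.), Kawai, "Implementation of a CPU-based real-time Japanese neural text-to-speech system using forced-alignment Parallel Tacotron and HiFi-GAN"
- Multi-region sound field control (my own KAKENHI theme)
  …: Okamoto, "Multi-[…] sound field control based on simultaneous control of interior and exterior regions"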
Thank you for your attention!!
岡本拓磨 (Takuma OKAMOTO) @NICT
HP: https://www.okamotocamera.com
Twitter: @okamotocamerea
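Speech recognition and speech synthesis: what makes them difficult?
- Common: the input and output lengths are completely different → acoustic features: hundreds of frames; text: tens of characters
- Recognition: the target is the speech of all humankind (enormous diversity, all over the place)
- Synthesis: the input and output lengths differ even more → a waveform at … kHz has …0,000 samples for just … seconds

Speech recognition and speech synthesis: past and present
Until now: high hurdles
- A great deal of specialist knowledge is required
- Many modules have to be trained separately and chained together before conversion is possible
  Common: learn the positional relationship (alignment) between acoustic features and text
  Recognition: acoustic model, pronunciation dictionary, language model, decoder
  Synthesis: phoneme duration model, acoustic model, waveform generation model (vocoder)
Arrival of sequence-to-sequence models: low hurdles
- The neural network covers the specialist knowledge
- A single neural network can do the whole conversion in one step
  Recognition: acoustic features → [sequence-to-sequence model] → word sequence (text)
  Synthesis: text (phoneme sequence) → [sequence-to-sequence model] → acoustic features → [waveform generation model] → speech waveform
  or text (phoneme sequence) → [sequence-to-sequence model + waveform generation model] → speech waveform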