TTS Skins: Speaker Conversion via ASR
peisuke
November 20, 2020
Technology
Presentation material for the Interspeech 2020 speech paper reading group
Transcript
TTS Skins: Speaker Conversion via ASR
Authors: A. Polyak, L. Wolf, Y. Taigman
Presenter: @peisuke
• 2016 ABEJA 2016
• Twitter: @peisuke
• GitHub: https://github.com/peisuke
• Qiita: https://qiita.com/peisuke
• SlideShare: https://www.slideshare.net/FujimotoKeisuke
• TTS Skins: Speaker Conversion via ASR
• ASR WaveNet
• ASR
• Text-to-Speech 100
• Text-to-Speech
• TTS
• ASR F0
• Jasper: An End-to-End Convolutional Neural Acoustic Model
• https://github.com/NVIDIA/OpenSeq2Seq
• 1D Conv-BN-ReLU
• Skip-Connection
• Pre-trained
• WaveNet
• condition
• https://github.com/NVIDIA/nv-WaveNet
• F0
• Look up table PyTorch Embedding
• F0
• fine tuning
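The speaker look-up table mentioned above can be sketched with PyTorch's `torch.nn.Embedding`. This is only an illustration, not the authors' code: the sizes `NUM_SPEAKERS`, `SPEAKER_DIM` and `FEATURE_DIM`, the helper `build_condition`, and the choice to concatenate the speaker vector with per-frame ASR features are all assumptions made for the example.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for this sketch.
NUM_SPEAKERS = 4    # number of target speakers in the table
SPEAKER_DIM = 8     # size of each learned speaker vector
FEATURE_DIM = 16    # size of the per-frame ASR features

# The look-up table: one trainable vector per speaker id.
speaker_table = nn.Embedding(NUM_SPEAKERS, SPEAKER_DIM)


def build_condition(asr_features: torch.Tensor, speaker_id: int) -> torch.Tensor:
    """Attach the speaker vector to every frame (assumed conditioning scheme).

    asr_features: tensor of shape (frames, FEATURE_DIM)
    returns:      tensor of shape (frames, FEATURE_DIM + SPEAKER_DIM)
    """
    frames = asr_features.shape[0]
    # Look up the speaker vector; result has shape (1, SPEAKER_DIM).
    speaker_vec = speaker_table(torch.tensor([speaker_id]))
    # Repeat the same vector for every frame, then concatenate per frame.
    tiled = speaker_vec.expand(frames, -1)
    return torch.cat([asr_features, tiled], dim=1)


condition = build_condition(torch.randn(50, FEATURE_DIM), speaker_id=2)
print(condition.shape)  # torch.Size([50, 24])
```

Because the table is an ordinary trainable parameter, changing the target voice only means passing a different `speaker_id`.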
• LibriTTS, VCTK
• Many-to-many seen unseen
• TTS
• MOS
• Mel cepstral distortion (MCD)
• Speaker classification
• WaveNet AutoEncoder
• PPG
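For reference, the Mel cepstral distortion metric listed above is commonly computed as below. This is a minimal sketch using the widely used definition (in dB, averaged over frames); it assumes the two sequences are already time-aligned and that the 0th (energy) coefficient has been dropped. Whether the paper uses exactly this variant is not stated in the slides, and the function name `mel_cepstral_distortion` is hypothetical.

```python
import math


def mel_cepstral_distortion(reference, converted):
    """Mean MCD in dB between two time-aligned mel-cepstrum sequences.

    Each argument is a list of frames; each frame is a list of
    mel-cepstral coefficients (0th coefficient assumed excluded).
    """
    if not reference or len(reference) != len(converted):
        raise ValueError("sequences must be non-empty and time-aligned")

    # Constant factor of the common definition: (10 / ln 10) * sqrt(2).
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref_frame, conv_frame in zip(reference, converted):
        if len(ref_frame) != len(conv_frame):
            raise ValueError("frames must have the same number of coefficients")
        squared_error = sum((r - c) ** 2 for r, c in zip(ref_frame, conv_frame))
        total += scale * math.sqrt(squared_error)
    return total / len(reference)


# Usage: identical sequences have zero distortion; lower values are better.
frames_a = [[0.5, -0.1, 0.3], [0.4, 0.0, 0.2]]
frames_b = [[0.6, -0.2, 0.3], [0.4, 0.1, 0.1]]
print(mel_cepstral_distortion(frames_a, frames_a))  # 0.0
print(mel_cepstral_distortion(frames_a, frames_b))
```

In practice the alignment step (for example dynamic time warping between the converted and the target utterance) happens before this computation and is left out of the sketch.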
Seen
• Seen-to-seen
• A B
• Identification F0

              LibriTTS                         VCTK
Method        MOS        MCD  Identification   MOS        MCD        Identification
Full method   3.78±0.83  -    96.12            4.08±0.75  8.76±1.72  98.97
w/o F0        3.61±0.83  -    96.96            3.59±0.96  8.99±1.5   96.89
AE baseline   2.89±0.88  -    29.19            3.46±1.07  9.45±1.63  69.26
PPG           2.82±0.91  -    94.01            2.67±0.93  9.19±1.50  98.77
PPG2          2.87±1.00  -    95.77            3.03±1.06  9.18±1.52  96.24
Unseen
• Unseen-to-seen
• A B

              LibriTTS                         VCTK
Method        MOS        MCD  Identification   MOS        MCD        Identification
Full method   3.70±0.80  -    97.10            4.05±0.74  8.94±1.53  98.33
w/o F0        3.67±0.82  -    97.15            3.62±0.99  9.25±1.62  95.69
AE baseline   3.02±0.89  -    32.55            3.83±0.91  9.65±1.51  66.20
PPG           2.79±0.93  -    94.05            2.89±0.93  9.45±1.45  97.45
PPG2          2.71±0.93  -    95.43            3.19±1.04  9.79±1.86  97.25
TTS

              LibriTTS                                VCTK
Method        MOS        MCD         Identification   MOS        MCD         Identification
Original TTS  4.25±0.77  10.12±1.27  -                4.37±0.80  14.52±2.40  -
Full method   3.67±0.81  8.13±0.95   96.06            4.17±0.88  12.68±2.17  99.25
w/o F0        3.47±0.76  8.43±0.97   96.66            3.75±1.07  13.06±2.26  96.36
AE baseline   3.02±0.84  9.38±1.09   60.26            3.85±1.05  13.81±2.29  75.56
PPG           2.91±0.94  8.52±0.93   96.63            3.50±0.83  12.45±1.92  98.36
PPG2          2.85±0.87  8.76±1.06   95.08            3.66±1.03  12.57±2.10  97.62
• The Voice Conversion Challenge 2018
• 1 81 4-5
• Hub Spoke

         Hub                     Spoke
Method   MOS        Similarity   MOS        Similarity
Ours     3.84±0.85  2.87±1.14    4.00±0.55  3.14±0.97
N10      3.92±0.75  2.83±1.20    3.98±0.52  3.13±0.97
N17      3.27±0.95  2.77±1.17    3.40±0.88  3.05±0.96
• TTS
• ASR F0 Conditional WaveNet