wavenet
soymsk
April 27, 2017
Technology
Transcript
Wavenet 2017/04/27 @soymsk
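Wavenet
• A speech synthesis algorithm published by DeepMind in 2016.
• Achieved high synthesis quality in the field of Text to Speech (TTS).
• No implementation has been released and the paper contains few equations, so many details of how it actually works remain unclear.
[Diagram labels: Concatenate Text to Speech, parametric TTS; parametric TTS + PixelRNN, PixelCNN; Wavenet]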
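Conventional methods
• Concatenate Text to Speech
  • Stores a large number of short speech fragments in a database and joins them together.
  • Because it only splices existing recordings, it handles emphasis and changes of voice quality poorly. The joins in the synthesized speech also tend to sound unnatural.
• parametric TTS
  • Synthesizes speech with a generative model.
  • The content of the utterance and the characteristics of the speaker can be controlled as inputs to the model.
  • However, the result can hardly be called natural speech.
[Figure: conventional methods]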
Wavenet
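Wavenet
• Wavenet predicts the probability distribution of the next audio sample from the past input data.
• Notation: t is the time step, x is the input audio.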
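The slide's equation did not survive in the transcript. Assuming it is the factorisation given in the Wavenet paper (arXiv:1609.03499), the prediction described above can be written as a product of conditionals. T, the total number of samples, is a symbol introduced here for illustration:

```latex
p(\mathbf{x}) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```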
Input audio data
• Audio data format
  • Quantization: 16 bit
  • Sampling frequency: 44.1 kHz (music CD)
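Wavenet output data format
• Audio signals are generally quantized to 16 bits, so used as-is the output would need 65,536 1-of-N output nodes.
• The input is therefore transformed and compressed with μ-law, a compression format commonly used for audio, down to 256 nodes.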
[Figure labels: Wavenet, μ-law decoding; time axis from 0 to t-1; output at time t: 1 of 256]
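As a rough illustration of the μ-law step, the sketch below compresses a sample in [-1, 1] to one of 256 classes and expands it back. It assumes the standard companding formula sign(x) · ln(1 + μ|x|) / ln(1 + μ) with μ = 255, as in the Wavenet paper. The function names and the rounding scheme are choices made for this example, not taken from the slides.

```python
import math

MU = 255  # gives 256 classes, matching the "1 of 256" output


def mu_law_encode(x: float, mu: int = MU) -> int:
    """Map a sample in [-1, 1] to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clamp out-of-range samples
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift [-1, 1] to [0, mu] and round to the nearest class index.
    return int((compressed + 1.0) / 2.0 * mu + 0.5)


def mu_law_decode(index: int, mu: int = MU) -> float:
    """Map a class index in [0, mu] back to a sample in [-1, 1]."""
    compressed = 2.0 * index / mu - 1.0
    expanded = math.expm1(abs(compressed) * math.log1p(mu)) / mu
    return math.copysign(expanded, compressed)
```

Silence (0.0) lands on class 128, and the classes are packed more densely around small amplitudes, which is why 256 levels are enough in practice.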
Dilated causal convolution
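Dilated causal convolution
• The time-series audio data is learned with convolutions rather than an RNN.
• With a convolution filter length of 2, four layers of ordinary convolution see only 5 inputs (receptive field = 5).
• Handling 44.1 kHz (music CD) input requires 44,100 input nodes for just one second of audio.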
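Dilated causal convolution
• A dilated causal convolution feeds its inputs to the next layer while skipping N samples.
• The dilation is doubled each time the network gets one layer deeper.
• Dilation enlarges the receptive field of each output node.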
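The skipping described above can be sketched in plain Python. This is a minimal illustration under assumptions, not the network itself: filter length 2, a single channel, zero padding on the left, and fixed weights of 1 so the reach of each output is easy to see. `dilated_causal_conv` and `dilated_stack` are names made up for this example.

```python
def dilated_causal_conv(x, weights, dilation):
    """1-D causal convolution with filter length 2.

    y[t] = weights[0] * x[t - dilation] + weights[1] * x[t]
    Samples before the start of the sequence are treated as zero,
    so an output never depends on future inputs.
    """
    w_past, w_now = weights
    out = []
    for t in range(len(x)):
        past = x[t - dilation] if t - dilation >= 0 else 0.0
        out.append(w_past * past + w_now * x[t])
    return out


def dilated_stack(x, num_layers):
    """Apply layers with dilation 1, 2, 4, ... (doubling per layer)."""
    h = list(x)
    for layer in range(num_layers):
        h = dilated_causal_conv(h, (1.0, 1.0), dilation=2 ** layer)
    return h
```

Feeding a single impulse at position 0 through three layers gives non-zero outputs at positions 0 to 7 only, which shows that each output sees 8 consecutive inputs after three layers.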
Dilated causal convolution
• 44,100 inputs can be covered by 16 layers of dilated causal convolution.
• Wavenet stacks layers up to a maximum dilation of 512 (1 block) and then stacks several such blocks.
• ResidualNet is used so that the deep network can be trained.
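The layer counts quoted in these slides can be checked with a short calculation. The sketch assumes filter length 2 and that each layer adds (filter length - 1) × dilation samples to the receptive field. The helper names are hypothetical.

```python
def receptive_field(dilations, filter_length=2):
    """Input samples seen by one output of a stack of causal convolutions."""
    return 1 + sum((filter_length - 1) * d for d in dilations)


def layers_needed(num_inputs, filter_length=2):
    """Fewest layers, with dilation doubling per layer, that cover num_inputs."""
    layers = 0
    while receptive_field([2 ** i for i in range(layers)], filter_length) < num_inputs:
        layers += 1
    return layers


print(receptive_field([1, 1, 1, 1]))                  # 5: four ordinary layers
print(receptive_field([2 ** i for i in range(10)]))   # 1024: one block, dilation 1 to 512
print(layers_needed(44100))                           # 16
```

With doubling dilations the receptive field grows as 2 to the power of the layer count, so 15 layers reach 32,768 samples and 16 layers reach 65,536, enough for one second at 44.1 kHz.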
• http://musyoku.github.io/images/post/2016-09-17/dilated_conv.gif
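Differences between RNN and Wavenet training
• An RNN must be fed the data in time order during training, so training is slow.
• Like a CNN, Wavenet does not need to process the input step by step in time. The whole input is fed to the network at once, so training is fast.
• The samples do not have to be learned in time order.
[Figure labels: Wavenet, RNN]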
Structure of Wavenet
[Figure labels: filter, gate; x: input, k: layer]
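The labels filter, gate, x and k match the gated activation unit in the Wavenet paper (arXiv:1609.03499). Assuming the slide showed that equation, it reads as follows. The symbols come from the paper, not from the transcript: * is convolution, ⊙ is element-wise multiplication, σ is the sigmoid, and W_{f,k} and W_{g,k} are the learned filter and gate kernels of layer k.

```latex
\mathbf{z} = \tanh\left(W_{f,k} * \mathbf{x}\right) \odot \sigma\left(W_{g,k} * \mathbf{x}\right)
```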
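Conditional Wavenet
• As in Conditional Pixel CNN, introducing an arbitrary parameter h into Wavenet lets that parameter steer Wavenet.
• Global conditions: Wavenet learns the characteristics of the speaker.
  • The parameter h reproduces the speaker's characteristics across the whole utterance.
  • ex: characteristics of speakers with different native languages
  • A term that acts at every time step.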
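Conditional Wavenet
• Local conditions: Wavenet learns the characteristics of the words.
  • A term that acts at individual time steps.
  • Linguistic features of the utterance can be supplied as parameters.
  • ex: letters that are not pronounced depending on how words connect, and so on?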
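For reference, the Wavenet paper (arXiv:1609.03499) writes the two kinds of conditioning as extra terms inside the gated activation. The equations below are taken from the paper, on the assumption that the slides refer to them. V_{f,k} and V_{g,k} are learned projections, h is the conditioning input, and y = f(h) is h upsampled to the audio rate. Global conditioning adds the same term at every time step:

```latex
\mathbf{z} = \tanh\left(W_{f,k} * \mathbf{x} + V_{f,k}^{\top} \mathbf{h}\right)
  \odot \sigma\left(W_{g,k} * \mathbf{x} + V_{g,k}^{\top} \mathbf{h}\right)
```

Local conditioning adds a term that varies with time:

```latex
\mathbf{z} = \tanh\left(W_{f,k} * \mathbf{x} + V_{f,k} * \mathbf{y}\right)
  \odot \sigma\left(W_{g,k} * \mathbf{x} + V_{g,k} * \mathbf{y}\right),
  \qquad \mathbf{y} = f(\mathbf{h})
```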
Demo of generated results
https://deepmind.com/blog/wavenet-generative-model-raw-audio/
Experimental results
• Trained on Google's TTS datasets.
• Achieved higher synthesis quality than the conventional methods.
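Summary
• Wavenet brought CNN techniques to speech synthesis and achieved high synthesis quality.
• Dilated convolution showed that CNNs can potentially be applied to time-series data in the way RNNs are.
• The range of applications is broad: not only speech but also music synthesis and more.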
References
• https://arxiv.org/abs/1609.03499
  • Original paper (PDF)
• https://deepmind.com/blog/wavenet-generative-model-raw-audio/
  • Demo results, etc.
• http://musyoku.github.io/2016/09/18/wavenet-a-generative-model-for-raw-audio/
  • Chainer implementation; the dilation part is easy to follow
• https://www.slideshare.net/DeepLearningJP2016/dlwavenet-a-generative-model-for-raw-audio