soymsk
April 27, 2017
Technology
wavenet
Transcript
Wavenet 2017/04/27 @soymsk
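Wavenet
• A speech synthesis algorithm announced by DeepMind in 2016.
• Achieved high speech synthesis accuracy in the Text to Speech (TTS) field.
• No implementation has been released and the paper contains few formulas, so many details of how it actually works remain unclear.
• Diagram labels: Concatenative Text to Speech, parametric TTS, PixelRNN, PixelCNN, combined (+) into Wavenet.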
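Conventional methods
• Concatenative Text to Speech
  • Stores a large number of short speech fragments in a database and joins them together.
  • Because it only splices existing data, it handles emphasis and changes of voice quality poorly, and the joins in the synthesized speech tend to sound unnatural.
• parametric TTS
  • Synthesizes speech with a generative model.
  • The content of the utterance and the characteristics of the speaker can be controlled as inputs to the model.
  • However, the result can hardly be called natural speech.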
Conventional methods (figure)
Wavenet
Wavenet
• Wavenet predicts the probability of the next audio sample from the past input data.
• t: time step, x: input audio
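The formula on this slide did not survive in the transcript. Assuming it matches the factorization given in the original paper (arXiv:1609.03499), the prediction can be written as follows, where T (the number of samples in the waveform) is a symbol introduced here for illustration:

```latex
p(\mathbf{x}) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```

Each factor is the distribution over the sample at time t given all earlier samples.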
Input audio data
• Audio data format
  • Quantization: 16 bit
  • Sampling frequency: 44.1 kHz (music CD)
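Wavenet output data format
• Audio signals are generally quantized to 16 bits, so used as-is the output would need 65,536 nodes for a 1-of-N encoding.
• The input is therefore transformed and compressed.
• μ-law, a compression format commonly used for audio, reduces this to 256 nodes.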
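A minimal sketch of the μ-law step, assuming the standard companding formula f(x) = sign(x) · ln(1 + μ|x|) / ln(1 + μ) with μ = 255 and input scaled to [-1, 1]. The function names and the rounding scheme are illustrative choices, not taken from the slides:

```python
import numpy as np

MU = 255  # gives 256 output classes (0..255)

def mu_law_encode(x, mu=MU):
    """Map samples in [-1, 1] to integer classes 0..mu using mu-law companding."""
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Shift from [-1, 1] to [0, mu] and round to the nearest class.
    return np.floor((compressed + 1.0) / 2.0 * mu + 0.5).astype(np.int64)

def mu_law_decode(classes, mu=MU):
    """Approximate inverse of mu_law_encode: classes 0..mu back to [-1, 1]."""
    compressed = 2.0 * np.asarray(classes, dtype=np.float64) / mu - 1.0
    return np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(mu)) / mu

# 16-bit PCM samples scaled to [-1, 1) before encoding.
pcm16 = np.array([-32768, -1000, 0, 1000, 32767], dtype=np.int16)
samples = pcm16 / 32768.0
classes = mu_law_encode(samples)
restored = mu_law_decode(classes)
```

The logarithmic curve spends more of the 256 classes on quiet samples, which is why the round trip stays close to the original for small amplitudes.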
Figure: Wavenet followed by μ-law decoding; inputs from time 0 to t-1; output at time t: 1 of 256
Dilated causal convolution
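Dilated causal convolution
• The time-series audio data is learned with convolutions rather than with an RNN.
• With a convolution filter size of 2, four layers can see only 5 inputs (ordinary convolution).
• For 44.1 kHz (music CD) input, one second of audio alone needs 44,100 input nodes.
• receptive field = 5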
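The receptive field arithmetic above can be checked with a short sketch. The function name is hypothetical; the formula assumes each ordinary (dilation 1) layer with filter size k adds k - 1 inputs to the receptive field:

```python
def receptive_field_plain(num_layers, filter_size=2):
    """Receptive field of stacked ordinary (dilation 1) causal convolutions."""
    return num_layers * (filter_size - 1) + 1

four_layers = receptive_field_plain(4)  # the case shown on the slide

# Layers needed to cover one second at 44.1 kHz without dilation.
layers_for_one_second = 44100 - 1
one_second = receptive_field_plain(layers_for_one_second)
```

Under this assumption the receptive field grows only linearly with depth, so covering one second of audio would take tens of thousands of layers.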
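Dilated causal convolution
• A dilated causal convolution feeds the next layer with inputs taken N steps apart.
• The dilation is doubled each time the network gets one layer deeper.
• Dilation enlarges the receptive field of the output nodes.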
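As a rough illustration of the idea, here is a single-channel dilated causal convolution in NumPy. The function name, the zero padding before the first sample, and the all-ones weights are assumptions made for this sketch, not details from the slides:

```python
import numpy as np

def dilated_causal_conv(x, weights, dilation):
    """y[t] = sum_i weights[i] * x[t - (K - 1 - i) * dilation], zeros before t = 0."""
    x = np.asarray(x, dtype=np.float64)
    k = len(weights)
    pad = (k - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])  # pad on the left only: causal
    y = np.zeros_like(x)
    for i, w in enumerate(weights):
        start = i * dilation
        y += w * padded[start:start + len(x)]
    return y

# Push a single impulse through four layers with dilations 1, 2, 4, 8.
impulse = np.zeros(20)
impulse[0] = 1.0
out = impulse
for d in (1, 2, 4, 8):
    out = dilated_causal_conv(out, [1.0, 1.0], d)
```

With filter size 2, the impulse at t = 0 influences exactly the first 16 outputs, so four layers with doubling dilation reach a receptive field of 16 instead of 5.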
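Dilated causal convolution
• 16 layers of dilated causal convolution are enough to see all 44,100 inputs.
• Wavenet stacks layers up to a maximum dilation of 512 (one block) and then stacks several such blocks.
• ResidualNet (residual connections) is used so that the deep stack can be trained.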
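The layer counts above can be sanity-checked with a small calculation. It assumes filter size 2 and dilations that double from 1; the helper name and the choice of three blocks are hypothetical, since the slides do not state how many blocks are stacked:

```python
def receptive_field(dilations, filter_size=2):
    """Receptive field of stacked dilated causal convolutions."""
    return (filter_size - 1) * sum(dilations) + 1

sixteen_layers = [2 ** i for i in range(16)]  # dilations 1, 2, ..., 32768
fifteen_layers = sixteen_layers[:15]
block = [2 ** i for i in range(10)]           # dilations 1, 2, ..., 512

rf_16 = receptive_field(sixteen_layers)
rf_15 = receptive_field(fifteen_layers)
rf_block = receptive_field(block)
rf_three_blocks = receptive_field(block * 3)  # three blocks is an illustrative choice
```

Under these assumptions 15 layers fall short of 44,100 samples and 16 layers exceed it, which matches the slide. A single block up to dilation 512 covers 1,024 samples, and each extra block adds 1,023 more.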
• http://musyoku.github.io/images/post/2016-09-17/dilated_conv.gif
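Differences between RNN and Wavenet training
• During training an RNN must be fed the data in time order, which takes time.
• Like a CNN, Wavenet does not need to process the input sequentially; the whole input is fed to the network at once, so training is fast.
• The individual samples do not have to be learned in time order.
• Figure labels: Wavenet, RNN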
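The contrast can be sketched with two toy functions. Both are hypothetical scalar models written for illustration only: the convolution is evaluated for every time step in one vectorized expression, while the RNN needs a loop because each step depends on the previous hidden state.

```python
import numpy as np

def causal_conv_all_steps(x, w_prev, w_curr):
    """Filter size 2 causal convolution, computed for every time step at once."""
    prev = np.concatenate([[0.0], x[:-1]])  # x[t-1], with zero before the start
    return w_prev * prev + w_curr * x

def rnn_all_steps(x, w_in, w_rec):
    """Scalar RNN: step t cannot start until step t-1 has produced its state."""
    h = 0.0
    states = []
    for x_t in x:
        h = np.tanh(w_in * x_t + w_rec * h)
        states.append(h)
    return np.array(states)

x = np.array([0.1, -0.2, 0.3, 0.4])
conv_out = causal_conv_all_steps(x, 0.5, 1.0)
rnn_out = rnn_all_steps(x, 1.0, 0.5)
```

Because the full waveform is known during training, the convolutional form can produce the outputs for all time steps in a single pass.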
Structure of Wavenet
• filter, gate
• x: input, k: layer
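The formula itself is missing from the transcript. Assuming the slide showed the gated activation unit from the original paper (arXiv:1609.03499), which uses the same labels (filter f, gate g, input x, layer k), it would read:

```latex
\mathbf{z} = \tanh\left(W_{f,k} * \mathbf{x}\right) \odot \sigma\left(W_{g,k} * \mathbf{x}\right)
```

Here * is a convolution, \odot is element-wise multiplication, \sigma is the sigmoid function, and W_{f,k}, W_{g,k} are the learned filter and gate weights of layer k.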
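Conditional Wavenet
• As in Conditional PixelCNN, an arbitrary parameter h is introduced so that Wavenet can be controlled through that parameter.
• Global conditions: Wavenet learns the characteristics of the speaker.
  • The parameter h reproduces the speaker's characteristics across the whole utterance.
  • ex: characteristics of speakers whose languages differ
  • A term that acts at every time step.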
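Conditional Wavenet
• Local conditions: Wavenet learns the characteristics of the words.
  • A term that acts at individual time steps.
  • Linguistic features of the utterance can be supplied as parameters.
  • ex: letters that are not pronounced depending on how the words connect, etc.?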
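The difference between the two kinds of condition can be sketched by adding a projected h inside a gated tanh/sigmoid unit. This is a simplified illustration, not the author's or DeepMind's implementation: random arrays stand in for the convolution outputs, the sizes T, C, H and the projection matrices V_f, V_g are hypothetical, and the local condition is applied as a plain per-step linear projection.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gated_unit(filter_act, gate_act, cond_filter=0.0, cond_gate=0.0):
    """tanh(filter + condition) * sigmoid(gate + condition), element-wise."""
    return np.tanh(filter_act + cond_filter) * sigmoid(gate_act + cond_gate)

rng = np.random.default_rng(0)
T, C, H = 8, 4, 3                     # time steps, channels, size of h (hypothetical)
filter_act = rng.normal(size=(T, C))  # stand-in for the filter convolution output
gate_act = rng.normal(size=(T, C))    # stand-in for the gate convolution output
V_f = rng.normal(size=(H, C))         # projects h into the filter branch
V_g = rng.normal(size=(H, C))         # projects h into the gate branch

# Global condition: one h per utterance, broadcast over all T time steps.
h_global = rng.normal(size=H)
z_global = gated_unit(filter_act, gate_act, h_global @ V_f, h_global @ V_g)

# Local condition: a different h at each time step.
h_local = rng.normal(size=(T, H))
z_local = gated_unit(filter_act, gate_act, h_local @ V_f, h_local @ V_g)
```

The global term contributes the same offset at every time step (for example a speaker identity), while the local term changes from step to step (for example linguistic features aligned to the audio).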
Demo of generated results
• https://deepmind.com/blog/wavenet-generative-model-raw-audio/
Experimental results
• Trained on Google's TTS datasets.
• Achieved higher accuracy than the conventional methods.
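Summary
• Wavenet brought CNN techniques into the field of speech synthesis and achieved high synthesis accuracy.
• Dilated convolution showed that CNNs may be applicable to time-series data in the way RNNs are.
• The range of applications is broad: not only speech but also music synthesis and more.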
References
• https://arxiv.org/abs/1609.03499
  • Original paper (PDF)
• https://deepmind.com/blog/wavenet-generative-model-raw-audio/
  • Demo results, etc.
• http://musyoku.github.io/2016/09/18/wavenet-a-generative-model-for-raw-audio/
  • Chainer implementation; the dilation part is easy to follow
• https://www.slideshare.net/DeepLearningJP2016/dlwavenet-a-generative-model-for-raw-audio