wavenet
soymsk
April 27, 2017
Transcript
Wavenet 2017/04/27 @soymsk
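Wavenet
• A speech synthesis algorithm published by DeepMind in 2016
• Achieved high synthesis accuracy in the field of Text to Speech (TTS)
• The implementation has not been released and the paper contains few equations, so many details of how it actually works remain unclear
[Diagram: Concatenative Text to Speech; parametric TTS; parametric TTS + PixelRNN / PixelCNN → Wavenet]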
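Conventional methods
• Concatenative Text to Speech
  • Stores a large number of short speech clips in a database and stitches them together
  • Because it only joins existing data, it handles emphasis and changes of voice poorly, and the joins in the synthesized speech tend to sound unnatural
• Parametric TTS
  • Synthesizes speech with a generative model
  • The content of the utterance and the characteristics of the speaker can be controlled as inputs to the model
  • However, the result can hardly be called natural speech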
Conventional methods
Wavenet
Wavenet
• Wavenet predicts the probability of the next audio sample from the past input data
t: time step, x: input audio
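The equation on this slide did not survive extraction. As a reference point, the WaveNet paper factorizes the joint probability of a waveform as follows, using the same symbols as the legend above; T, the number of samples, is notation added here.

```latex
p(\mathbf{x}) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```

Each factor is the distribution over the sample at time step t given every earlier sample, which is what the network outputs.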
Input audio data
• Audio data format
• Quantization: 16 bit
• Sampling frequency: 44.1 kHz (music CD)
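Wavenet output data format
• Audio signals are generally quantized to 16 bits, which as-is would require 65,536 output nodes in a 1-of-N encoding
• The input is therefore transformed and compressed
• μ-law, a common compression format for audio, reduces this to 256 nodes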
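The μ-law step can be sketched as below. This is a minimal illustration, not the deck author's code: the function names, the choice μ = 255, and the rounding scheme are assumptions, and the formula is the standard μ-law companding curve f(x) = sign(x) ln(1 + μ|x|) / ln(1 + μ).

```python
import numpy as np

MU = 255  # assumed value; gives the 256 classes mentioned on the slide


def mu_law_encode(x, mu=MU):
    """Map float samples in [-1, 1] to integer class indices 0..mu."""
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Shift [-1, 1] to [0, mu] and round to the nearest class index.
    return np.floor((compressed + 1.0) / 2.0 * mu + 0.5).astype(np.int64)


def mu_law_decode(classes, mu=MU):
    """Invert mu_law_encode, up to quantization error."""
    compressed = 2.0 * np.asarray(classes, dtype=np.float64) / mu - 1.0
    return np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(mu)) / mu


# 16-bit samples are first scaled to [-1, 1], then compressed to 256 classes.
samples_16bit = np.array([-32768, -1000, 0, 1000, 32767], dtype=np.int16)
x = samples_16bit / 32768.0
classes = mu_law_encode(x)
restored = mu_law_decode(classes)
```

The logarithmic curve spends more of the 256 classes on quiet samples, so small amplitudes keep finer resolution than they would under uniform 8-bit quantization.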
[Figure: the Wavenet output at time t is 1 of 256 classes, which μ-law decoding converts back to a sample]
Dilated causal convolution
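Dilated causal convolution
• Training on time-series audio data uses convolution rather than an RNN
• With a filter length of 2, ordinary convolution sees only 5 inputs after 4 layers: receptive field = 5
• Handling 44.1 kHz (music CD) input requires 44,100 input nodes for just 1 second of audio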
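Dilated causal convolution
• A dilated causal convolution feeds the next layer with inputs taken at intervals, skipping N samples at a time
• The dilation is doubled each time the network gets one layer deeper
• Dilation enlarges the receptive field of the output nodes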
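A dilated causal convolution with filter length 2 can be sketched in NumPy as follows. The function name, the all-ones weights, and the impulse test signal are illustrative assumptions; a real Wavenet layer has learned weights and many channels.

```python
import numpy as np


def dilated_causal_conv(x, w, dilation):
    """Compute y[t] = w[0] * x[t - dilation] + w[1] * x[t], with zeros before t = 0."""
    x = np.asarray(x, dtype=np.float64)
    # Left-pad with zeros so that no output ever depends on a future sample.
    padded = np.concatenate([np.zeros(dilation), x])
    return w[0] * padded[: len(x)] + w[1] * padded[dilation:]


# Push a unit impulse through four layers whose dilation doubles each time.
signal = np.zeros(32)
signal[0] = 1.0
out = signal
for d in (1, 2, 4, 8):
    out = dilated_causal_conv(out, w=(1.0, 1.0), dilation=d)

# The impulse spreads over exactly the receptive field of the stack.
receptive_field = int(np.count_nonzero(out))
```

With dilations 1, 2, 4, 8 the impulse reaches 16 output positions, whereas four undilated layers of the same filter length would reach only 5.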
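Dilated causal convolution
• 44,100 inputs can be covered by 16 layers of dilated causal convolution
• Wavenet stacks layers up to a maximum dilation of 512 (one block), then stacks multiple such blocks
• Residual connections (ResidualNet) are used so that the deep stack of layers can be trained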
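The receptive-field figures quoted on these slides can be checked with a few lines of arithmetic. The helper names are hypothetical, and the formulas assume stride 1 and one convolution per layer.

```python
def receptive_field_plain(num_layers, filter_len=2):
    """Receptive field of stacked ordinary (undilated) causal convolutions."""
    return num_layers * (filter_len - 1) + 1


def receptive_field_dilated(dilations, filter_len=2):
    """Receptive field of stacked dilated causal convolutions."""
    return sum((filter_len - 1) * d for d in dilations) + 1


plain_4_layers = receptive_field_plain(4)
doubling_16_layers = receptive_field_dilated([2 ** i for i in range(16)])
one_block = receptive_field_dilated([2 ** i for i in range(10)])  # dilations 1..512
```

Four plain layers give 5, matching the earlier slide; sixteen doubling layers give 65,536, which exceeds the 44,100 samples in one second of CD audio; and one block with dilations 1 through 512 gives 1,024 samples.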
• http://musyoku.github.io/images/post/2016-09-17/dilated_conv.gif
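Differences in training between RNN and Wavenet
• An RNN must be fed data in time order during training, so training takes time
• Wavenet, like a CNN, does not need to process the input sequentially; the whole input is fed to the network at once, so training is fast
• The samples do not need to be learned in time order
[Figure: Wavenet vs. RNN]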
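The reason every time step can be trained at once is that each target is simply the input shifted by one sample. A toy sketch, with a made-up sequence of class indices:

```python
import numpy as np

# Hypothetical toy sequence of mu-law class indices (0..255).
classes = np.array([128, 131, 127, 120, 125, 133])

# To predict sample t, a causal network may only look at samples before t,
# so the targets are the inputs shifted by one step.
inputs = classes[:-1]
targets = classes[1:]

# Every (history, next sample) pair exists before training starts, so a
# causal convolution can score all time steps in a single forward pass.
pairs = [(inputs[: t + 1].tolist(), int(targets[t])) for t in range(len(targets))]
```

An RNN sees the same pairs but has to step through them in order, because each hidden state depends on the previous one.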
Wavenet structure
[Figure labels: filter, gate; x: input, k: layer]
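The labels above match the gated activation unit in the WaveNet paper, which is presumably what the lost figure showed. In the paper's notation, where f and g mark the filter and gate weights of layer k, * is convolution, ⊙ is element-wise multiplication, and σ is the sigmoid:

```latex
\mathbf{z} = \tanh\left(W_{f,k} * \mathbf{x}\right) \odot \sigma\left(W_{g,k} * \mathbf{x}\right)
```

The sigmoid branch outputs values between 0 and 1 and so acts as a gate on the tanh branch.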
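Conditional Wavenet
• As in Conditional PixelCNN, introducing an arbitrary parameter h into Wavenet makes it possible to steer Wavenet through that parameter
• Global conditions: Wavenet learns the characteristics of the speaker
  The parameter h reproduces the speaker's characteristics across the whole utterance
  ex: characteristics of speakers with different native languages
  A term that acts on every time step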
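Conditional Wavenet
• Local conditions: Wavenet learns the characteristics of the words being spoken
  A term that acts on individual time steps
  Linguistic features of the utterance can be supplied as parameters
  ex: letters that go unpronounced depending on how words connect, etc.?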
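For reference, the WaveNet paper adds both kinds of conditioning inside the gated activation. The symbols V and y follow the paper and do not appear on the slides. With a global condition h, the same term is added at every time step:

```latex
\mathbf{z} = \tanh\left(W_{f,k} * \mathbf{x} + V_{f,k}^{\top}\mathbf{h}\right) \odot \sigma\left(W_{g,k} * \mathbf{x} + V_{g,k}^{\top}\mathbf{h}\right)
```

With a local condition, h is first upsampled to a time series y = f(h) at the audio rate, and the added term varies per time step:

```latex
\mathbf{z} = \tanh\left(W_{f,k} * \mathbf{x} + V_{f,k} * \mathbf{y}\right) \odot \sigma\left(W_{g,k} * \mathbf{x} + V_{g,k} * \mathbf{y}\right)
```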
Demo of generated results
https://deepmind.com/blog/wavenet-generative-model-raw-audio/
Experimental results
• Trained on Google's TTS datasets
• Achieved higher accuracy than conventional methods
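Summary
• Wavenet brought CNN techniques to the field of speech synthesis and achieved high synthesis accuracy
• Dilated convolution showed that CNNs can potentially be applied to time-series data, as RNNs are
• The range of applications is broad: not only speech but also music synthesis and more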
References
• https://arxiv.org/abs/1609.03499
  • Original paper (PDF)
• https://deepmind.com/blog/wavenet-generative-model-raw-audio/
  • Demo results, etc.
• http://musyoku.github.io/2016/09/18/wavenet-a-generative-model-for-raw-audio/
  • Chainer implementation; the dilation part is easy to follow
• https://www.slideshare.net/DeepLearningJP2016/dlwavenet-a-generative-model-for-raw-audio