wavenet
soymsk
April 27, 2017
Technology
Transcript
Wavenet 2017/04/27 @soymsk
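WaveNet
• A speech synthesis algorithm published by DeepMind in 2016
• Achieved high speech synthesis quality in text-to-speech (TTS)
• The implementation has not been released and the paper contains few equations, so many details of how it actually works remain unclear
[Diagram relating WaveNet to concatenative TTS, parametric TTS, PixelRNN, and PixelCNN]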
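Conventional methods
• Concatenative text-to-speech
  • Stores a large number of short speech fragments in a database and concatenates them
  • Because it only stitches existing recordings together, it handles emphasis and changes of voice timbre poorly; the joins in the synthesized audio also tend to sound unnatural
• Parametric TTS
  • Synthesizes speech with a generative model
  • The utterance content and the speaker's characteristics can be controlled as inputs to the model
  • However, the result can hardly be called natural speech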
Conventional methods
Wavenet
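WaveNet
• WaveNet predicts the probability of the next audio sample from the past input samples:
  $p(\mathbf{x}) = \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1})$
  t: time step, x: input audio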
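The factorization above can be sketched in a few lines. This is a minimal illustration under two assumptions: a hypothetical `toy_conditional` model stands in for a trained WaveNet, and the alphabet has 4 values instead of 256 classes. The log-likelihood is the sum of per-step log conditionals. Generation must proceed one sample at a time, because each new sample becomes input for the next.

```python
import math
import random

NUM_CLASSES = 4  # toy alphabet size (WaveNet uses 256 μ-law classes)

def toy_conditional(history):
    """Hypothetical stand-in for WaveNet: p(x_t | x_1..x_{t-1}).

    Favours repeating the previous sample; uniform when there is no history.
    """
    if not history:
        return [1.0 / NUM_CLASSES] * NUM_CLASSES
    probs = [0.1] * NUM_CLASSES
    probs[history[-1]] = 1.0 - 0.1 * (NUM_CLASSES - 1)
    return probs

def log_likelihood(x, conditional):
    """log p(x) = sum_t log p(x_t | x_1, ..., x_{t-1})."""
    return sum(math.log(conditional(x[:t])[x[t]]) for t in range(len(x)))

def sample(length, conditional, rng):
    """Autoregressive generation: each drawn sample is fed back as input."""
    x = []
    for _ in range(length):
        probs = conditional(x)
        x.append(rng.choices(range(NUM_CLASSES), weights=probs)[0])
    return x

print(log_likelihood([0, 0, 1], toy_conditional))
print(sample(10, toy_conditional, random.Random(0)))
```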
Input audio data
• Audio data format
  • Quantization: 16 bit
  • Sampling frequency: 44.1 kHz (audio CD)
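WaveNet output data format
• Audio signals are typically quantized to 16 bits, so predicting them directly would require 65,536 one-of-N output nodes
• The input is therefore transformed and compressed as follows
• μ-law, a compression format commonly used for audio, reduces the output to 256 nodes:
  $f(x_t) = \operatorname{sign}(x_t)\,\frac{\ln(1 + \mu |x_t|)}{\ln(1 + \mu)}$, with $-1 < x_t < 1$ and $\mu = 255$
[Diagram: WaveNet output → μ-law decoding; the output at time t is one of 256 classes]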
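Below is a minimal sketch of μ-law companding and 8-bit quantization in pure Python. It assumes samples have already been scaled to [-1, 1]. The helper names (`mu_law_encode`, `mu_law_decode`, `from_int16`) are illustrative, and the rounding scheme is an assumption. The point is that each sample becomes one of 256 classes instead of one of 2^16 = 65,536.

```python
import math

MU = 255  # μ = 255 gives 256 quantization levels (classes 0..255)

def mu_law_encode(x, mu=MU):
    """Compress a sample in [-1, 1] and quantize it to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip into the valid range
    f = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)  # f in [-1, 1]
    return int(round((f + 1) / 2 * mu))

def mu_law_decode(q, mu=MU):
    """Map a class index back to an approximate sample in [-1, 1]."""
    f = 2 * q / mu - 1  # undo the quantization step
    return math.copysign(((1 + mu) ** abs(f) - 1) / mu, f)  # invert the companding

def from_int16(s):
    """Scale a 16-bit PCM integer (-32768..32767) to [-1, 1)."""
    return s / 32768.0

q = mu_law_encode(from_int16(12000))  # one of 256 classes
x_hat = mu_law_decode(q)              # approximate reconstruction
```

The logarithm gives quiet samples finer resolution than loud ones, which is what makes an 8-bit (256-class) output workable for speech.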
Dilated causal convolution
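Dilated causal convolution
• The time-series audio data is learned with convolutions instead of an RNN
• With a filter size of 2, an ordinary (undilated) stack of 4 convolution layers sees only 5 inputs, as shown below
• For 44.1 kHz (audio CD) input, a single second of audio already requires 44,100 input nodes
[Figure: ordinary causal convolution, receptive field = 5]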
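The receptive-field count in the figure follows from a simple rule. Each causal layer with filter size k and dilation d lets an output see (k − 1)·d more past samples. The short sketch below applies this rule; the function name is ours, not from the slides.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of causal convolutions.

    Each layer with filter size k and dilation d extends the window
    by (k - 1) * d past samples.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Ordinary causal convolution from the slide: filter size 2, 4 layers.
print(receptive_field(2, [1, 1, 1, 1]))  # 5

# Without dilation each layer adds one sample, so covering one second
# of 44.1 kHz audio would take 44,099 stacked layers.
print(receptive_field(2, [1] * 44099))  # 44100
```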
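Dilated causal convolution
• A dilated causal convolution feeds the next layer with inputs taken N steps apart, skipping the values in between
• The dilation is doubled with each deeper layer
• Dilation enlarges the receptive field of the output nodes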
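Here is a single-channel sketch of a dilated causal convolution. It assumes fixed weights (1, 1) instead of learned filters so the effect is easy to see. Feeding an impulse through layers with dilations 1, 2, 4, 8 shows that it reaches 16 consecutive outputs. In other words, each output sees 16 inputs with only four layers, whereas an ordinary filter-size-2 stack would need 15 layers for the same window.

```python
def dilated_causal_conv(x, weights, dilation):
    """Single-channel dilated causal convolution without bias.

    y[t] = sum_i weights[i] * x[t - i * dilation]; indices before 0 are
    treated as zero padding, so y[t] never depends on future samples.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            j = t - i * dilation
            if j >= 0:
                acc += w * x[j]
        y.append(acc)
    return y

def dilated_stack(x, dilations, weights=(1.0, 1.0)):
    """Apply one dilated causal convolution per dilation, in order."""
    for d in dilations:
        x = dilated_causal_conv(x, weights, d)
    return x

impulse = [1.0] + [0.0] * 19
out = dilated_stack(impulse, [1, 2, 4, 8])
print(sum(1 for v in out if v != 0.0))  # 16
```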
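Dilated causal convolution
• 44,100 inputs can be covered by 16 layers of dilated causal convolution (with filter size 2 and doubling dilations, 16 layers give a receptive field of 2^16 = 65,536 samples, while 15 layers give only 32,768)
• WaveNet stacks layers up to a maximum dilation of 512 (one block) and then stacks several such blocks
• Residual connections (ResNet) are used so that the deeper network can still be trained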
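The layer count on this slide can be checked with a short loop. The loop assumes filter size 2 and dilations that double starting from 1. The second helper repeats a 1, 2, ..., 512 block. The block counts passed to it are arbitrary examples, since the slide does not state how many blocks WaveNet uses.

```python
def layers_to_cover(num_samples, kernel_size=2):
    """Smallest number of layers, with dilations 1, 2, 4, ..., whose
    stacked receptive field reaches num_samples."""
    layers, field, dilation = 0, 1, 1
    while field < num_samples:
        field += (kernel_size - 1) * dilation
        dilation *= 2
        layers += 1
    return layers, field

def stacked_blocks_field(num_blocks, max_dilation=512, kernel_size=2):
    """Receptive field when a block of dilations 1..max_dilation is repeated."""
    per_block, d = 0, 1
    while d <= max_dilation:
        per_block += (kernel_size - 1) * d
        d *= 2
    return 1 + num_blocks * per_block

print(layers_to_cover(44100))   # (16, 65536)
print(stacked_blocks_field(1))  # 1024
print(stacked_blocks_field(3))  # 3070
```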
• Figure source: http://musyoku.github.io/images/post/2016-09-17/dilated_conv.gif
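Training differences between RNN and WaveNet
• An RNN must be fed the data in time order during training, which makes training slow
• WaveNet, like a CNN, does not need to process the input sequentially: the whole sequence is fed to the network at once, so training is fast
• The samples do not have to be learned in time order
[Diagram comparing WaveNet and RNN]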
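The following sketch shows why training parallelizes. It assumes a toy single-channel convolution and a toy linear RNN, both hypothetical. During training the whole ground-truth waveform is known, so the convolution output at each time step depends only on input samples and can be computed in any order (on a GPU, all at once). The RNN state h_t, by contrast, cannot be computed before h_{t-1}. This applies to training only: generating new audio still runs one sample at a time, because each generated sample is fed back as input.

```python
import random

def causal_conv_at(x, t, weights, dilation):
    """Output of one causal conv layer at time t; reads only x[0..t]."""
    return sum(w * x[t - i * dilation]
               for i, w in enumerate(weights) if t - i * dilation >= 0)

def rnn_states(x, a=0.5, b=1.0):
    """Toy linear RNN h_t = a * h_{t-1} + b * x_t; each step needs the previous one."""
    h, states = 0.0, []
    for v in x:
        h = a * h + b * v
        states.append(h)
    return states

x = [0.1, -0.4, 0.3, 0.8, -0.2, 0.5]
weights, dilation = [0.6, 0.4], 2

# The full input is known, so conv outputs can be computed in shuffled order.
order = list(range(len(x)))
random.Random(0).shuffle(order)
shuffled = {t: causal_conv_at(x, t, weights, dilation) for t in order}
in_order = [causal_conv_at(x, t, weights, dilation) for t in range(len(x))]
print([shuffled[t] for t in range(len(x))] == in_order)  # True
```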
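WaveNet architecture
[Diagram of a layer with filter and gate branches]
$\mathbf{z} = \tanh(W_{f,k} * \mathbf{x}) \odot \sigma(W_{g,k} * \mathbf{x})$
f: filter, g: gate, x: input, k: layer index, *: convolution, ⊙: element-wise product, σ: sigmoid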
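Below is a single-channel sketch of one layer built around the gated activation unit. It assumes filter size 2 and scalar weights in place of the learned multi-channel filters and 1x1 convolutions; the parameter names are our own. The residual output feeds the next layer. The skip outputs of all layers are summed, following the residual and skip connections described in the WaveNet paper.

```python
import math

def causal_conv(x, w, dilation):
    """Filter-size-2 dilated causal conv: y[t] = w[0]*x[t] + w[1]*x[t - dilation]."""
    return [w[0] * x[t] + (w[1] * x[t - dilation] if t >= dilation else 0.0)
            for t in range(len(x))]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_residual_layer(x, w_filter, w_gate, w_res, w_skip, dilation):
    """z = tanh(W_f * x) ⊙ σ(W_g * x); returns (residual output, skip output).

    w_res and w_skip are scalars standing in for the 1x1 convolutions.
    """
    f = causal_conv(x, w_filter, dilation)  # filter branch
    g = causal_conv(x, w_gate, dilation)    # gate branch
    z = [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]
    skip = [w_skip * v for v in z]
    residual = [xi + w_res * v for xi, v in zip(x, z)]
    return residual, skip

# Usage: one block with dilations 1, 2, 4, 8, summing the skip outputs.
signal = [0.0, 0.5, -0.3, 0.8, 0.1, -0.6, 0.2, 0.4]
skip_sum = [0.0] * len(signal)
for d in (1, 2, 4, 8):
    signal, s = gated_residual_layer(signal, (0.5, 0.5), (0.2, -0.1), 0.5, 1.0, d)
    skip_sum = [a + b for a, b in zip(skip_sum, s)]
```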
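Conditional WaveNet
• As in Conditional PixelCNN, introducing an additional parameter h into WaveNet makes its output controllable through that parameter
• Global conditions: WaveNet learns the speaker's characteristics
  • The parameter h reproduces the speaker's characteristics across the entire utterance
  • e.g., the characteristics of speakers whose native languages differ
  • The conditioning term acts at every time step:
    $\mathbf{z} = \tanh(W_{f,k} * \mathbf{x} + V_{f,k}^{T}\mathbf{h}) \odot \sigma(W_{g,k} * \mathbf{x} + V_{g,k}^{T}\mathbf{h})$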
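Here is a sketch of global conditioning on top of the same gated unit. It assumes a hypothetical one-hot speaker vector h and a few scalar weights per speaker. Because h is a single vector for the whole utterance, V^T h reduces to one bias that is added at every time step.

```python
import math

def causal_conv(x, w, dilation):
    """Filter-size-2 dilated causal conv: y[t] = w[0]*x[t] + w[1]*x[t - dilation]."""
    return [w[0] * x[t] + (w[1] * x[t - dilation] if t >= dilation else 0.0)
            for t in range(len(x))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def globally_conditioned_gate(x, h, w_f, w_g, v_f, v_g, dilation=1):
    """z = tanh(W_f * x + V_f^T h) ⊙ σ(W_g * x + V_g^T h)."""
    bias_f = dot(v_f, h)  # V_f^T h: the same value at every time step
    bias_g = dot(v_g, h)  # V_g^T h
    f = causal_conv(x, w_f, dilation)
    g = causal_conv(x, w_g, dilation)
    # tanh(a) * sigmoid(b) written as tanh(a) / (1 + exp(-b))
    return [math.tanh(a + bias_f) / (1.0 + math.exp(-(b + bias_g)))
            for a, b in zip(f, g)]

x = [0.2, -0.1, 0.4, 0.0]
speaker_a = [1.0, 0.0]  # hypothetical one-hot speaker IDs
speaker_b = [0.0, 1.0]
z_a = globally_conditioned_gate(x, speaker_a, (0.5, 0.5), (0.3, 0.3), (0.0, 0.8), (0.0, -0.4))
z_b = globally_conditioned_gate(x, speaker_b, (0.5, 0.5), (0.3, 0.3), (0.0, 0.8), (0.0, -0.4))
```

The same audio input yields different activations for the two speakers, because only the bias derived from h changes.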
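Conditional WaveNet
• Local conditions: WaveNet learns linguistic features
  • The conditioning term acts at each individual time step
  • Linguistic features of the utterance can be given as parameters
  • e.g., letters that are not pronounced because of how words connect?
  • $\mathbf{z} = \tanh(W_{f,k} * \mathbf{x} + V_{f,k} * \mathbf{y}) \odot \sigma(W_{g,k} * \mathbf{x} + V_{g,k} * \mathbf{y})$, where $\mathbf{y} = f(\mathbf{h})$ is h upsampled to the audio sampling rate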
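The following is a sketch of local conditioning. It assumes one hypothetical linguistic feature per frame and a fixed number of audio samples per frame. The WaveNet paper upsamples h to the audio rate with a learned transposed convolution; this sketch swaps in simple repetition to keep it short. It then adds a per-time-step term V * y to both the filter and the gate.

```python
import math

def causal_conv(x, w, dilation):
    """Filter-size-2 dilated causal conv: y[t] = w[0]*x[t] + w[1]*x[t - dilation]."""
    return [w[0] * x[t] + (w[1] * x[t - dilation] if t >= dilation else 0.0)
            for t in range(len(x))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def upsample_repeat(h_frames, factor):
    """Simplified stand-in for y = f(h): repeat each feature frame `factor` times."""
    return [frame for frame in h_frames for _ in range(factor)]

def locally_conditioned_gate(x, h_frames, factor, w_f, w_g, v_f, v_g, dilation=1):
    """z = tanh(W_f * x + V_f * y) ⊙ σ(W_g * x + V_g * y), with y = upsampled h."""
    y = upsample_repeat(h_frames, factor)
    if len(y) != len(x):
        raise ValueError("upsampled features must match the audio length")
    f = causal_conv(x, w_f, dilation)
    g = causal_conv(x, w_g, dilation)
    # With a 1x1 convolution, V * y is a dot product at each time step.
    return [math.tanh(a + dot(v_f, yt)) / (1.0 + math.exp(-(b + dot(v_g, yt))))
            for a, b, yt in zip(f, g, y)]

h_frames = [[0.5], [-0.5]]  # hypothetical features, one per frame
audio = [0.1, 0.2, -0.1, 0.0, 0.3, -0.2, 0.1, 0.05]
z = locally_conditioned_gate(audio, h_frames, 4, (0.4, 0.4), (0.2, 0.2), [1.0], [0.5])
```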
Generated samples (demo): https://deepmind.com/blog/wavenet-generative-model-raw-audio/
Experimental results
• Trained on Google's TTS datasets
• Achieved higher quality than conventional methods
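Summary
• WaveNet brought CNN techniques into speech synthesis and achieved high synthesis quality
• Dilated convolutions showed that CNNs can potentially be applied to time-series data, as RNNs are
• The range of applications is broad: not only speech but also, for example, music synthesis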
References
• https://arxiv.org/abs/1609.03499
  • Original paper (PDF)
• https://deepmind.com/blog/wavenet-generative-model-raw-audio/
  • Demo results and more
• http://musyoku.github.io/2016/09/18/wavenet-a-generative-model-for-raw-audio/
  • Chainer implementation; the explanation of dilation is easy to follow
• https://www.slideshare.net/DeepLearningJP2016/dlwavenet-a-generative-model-for-raw-audio