Music Generation with Deep Learning
masa-ita
October 13, 2018
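As the final assignment for DL4US, I tried generating music with deep learning.
I tried an LSTM-based prediction model, a VAE, and a GAN.
Slides from a talk at the Python Machine Learning Study Group in Niigata (Python機械学習勉強会 in 新潟), 2018-10-13.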
Transcript
Python Machine Learning Study Group in Niigata (Python機械学習勉強会 in 新潟), Restart #2. Making Music with Keras. 2018/10/13, Itagaki (masa-ita)
I took part in DL4US
❖ DL4US is an online deep learning course, open to the general public, run by the Yutaka Matsuo Laboratory at the University of Tokyo.
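A seven-week curriculum
❖ Lesson1: Handwritten character recognition: neural networks, Keras, optimization methods, countermeasures against overfitting
❖ Lesson2: Image recognition with convolutional neural networks: CNN, data augmentation, Batch Normalization, Skip Connection
❖ Lesson3: Prediction on sequence data: RNN basics, LSTM, BPTT, Clipping, shortcuts, gates
❖ Lesson4: Neural translation models: language models, Seq2Seq, Attention mechanism
❖ Lesson5: Generating captions from images: caption generation, transfer learning, beam search
❖ Lesson6: Image generation with neural networks: deep generative models, VAE, GAN
❖ Lesson7: An AI that beats games with neural networks: DQN, OpenAI Gym, Double DQN, Dueling Network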
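iLect
❖ A virtual environment with GPU access, provided for the online course
❖ Course materials are provided in JupyterLab
❖ Assignments are run in a contest format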
Final assignment [report]
❖ "You may do anything you like, as long as it is about deep learning."
Let's try music generation!
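Strategy
❖ Try out a variety of generative models that I rarely get to use in practice
❖ A prediction model based on an RNN (LSTM)
❖ Generation with a VAE (Variational Auto Encoder)
❖ Generation with a GAN (Generative Adversarial Network)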
LSTM
❖ Long Short Term Memory
❖ The leading architecture for handling sequential data
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Prediction model with an LSTM
❖ Given the past sequence, the model outputs the probability of what comes next.
https://towardsdatascience.com/lstm-by-example-using-tensorflow-feb0c1968537
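To make the "probability of what comes next" idea concrete, here is a minimal sketch of how such a model could be used to generate a melody one note at a time. It is not taken from the talk: `predict_fn` is a hypothetical stand-in for a trained model's softmax output, and notes are assumed to be integer indices into a vocabulary.

```python
import random


def sample_next(probabilities, rng=random):
    """Draw one note index according to the predicted probabilities."""
    indices = range(len(probabilities))
    return rng.choices(indices, weights=probabilities, k=1)[0]


def generate(seed, predict_fn, n_steps, rng=random):
    """Extend `seed` by n_steps notes, feeding each sampled note back in.

    predict_fn is a hypothetical callable: it takes the current window of
    note indices and returns one probability per vocabulary entry.
    """
    window = list(seed)
    generated = []
    for _ in range(n_steps):
        probabilities = predict_fn(window)
        note = sample_next(probabilities, rng)
        generated.append(note)
        window = window[1:] + [note]  # slide the window forward by one note
    return generated
```

Sampling from the distribution, rather than always taking the most likely note, is one common choice; taking the argmax instead would make the output deterministic.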
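VAE
❖ An Auto Encoder learns a latent space with fewer dimensions than the input, trained so that the output reproduces the input.
❖ A Variational Auto Encoder assumes that the latent space is a multivariate normal space and learns its mean and variance.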
http://mlexplained.com/2017/12/28/an-intuitive-explanation-of-variational-autoencoders-vaes-part-1/
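As an illustration of what "learning the mean and variance" makes possible, the sketch below draws a latent vector from a diagonal normal distribution given a mean and a log-variance per dimension. This is a plain-Python sketch under the assumption that the encoder outputs a log-variance (as the later Keras code's `z_log_var` suggests); the function name is made up for this example.

```python
import math
import random


def sample_latent(z_mean, z_log_var, rng=random):
    """Sample z = mean + std * epsilon with epsilon ~ N(0, 1), per dimension."""
    z = []
    for mean, log_var in zip(z_mean, z_log_var):
        std = math.exp(0.5 * log_var)  # log-variance -> standard deviation
        epsilon = rng.gauss(0.0, 1.0)
        z.append(mean + std * epsilon)
    return z
```

Writing the sample as a deterministic function of the mean, the variance, and an external noise term is what lets gradients flow back into the encoder during training.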
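The latent space produced by a VAE
❖ The latent space produced by a VAE is expected to have "meaningful coordinate axes".
❖ The example shown is for MNIST handwritten digits, but for face photos, axes such as "emotion", "glasses", "facial hair", and "gender" have been found.
https://tiao.io/post/tutorial-on-variational-autoencoders-with-a-concise-keras-implementation/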
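GAN
❖ In a GAN (Generative Adversarial Network), a "forger" (Generator) and an "appraiser" (Discriminator) learn by competing with and sharpening each other.
❖ As a result, the forger is expected to turn random noise into "authentic-looking works" that the appraiser cannot see through.
https://skymind.ai/wiki/generative-adversarial-network-gan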
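LSTM model
❖ A model with three stacked LSTM layers
❖ Dropout is added to curb overfitting, but...

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

# sequence_length and n_vocab are determined by the preprocessed MIDI data
model = Sequential()
model.add(LSTM(512, input_shape=(sequence_length, n_vocab), return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(512))
model.add(Dense(256))
model.add(Dropout(0.3))
# One softmax probability per vocabulary entry for the next note
model.add(Dense(n_vocab, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])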
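The model above expects inputs of shape `(sequence_length, n_vocab)` and a softmax target over `n_vocab` classes. The slides do not show the preprocessing, so the following is only a sketch of one plausible way to build such training pairs: a sliding window with stride 1 over a list of integer note indices, one-hot encoded. The stride and the helper names are assumptions.

```python
def one_hot(index, n_vocab):
    """Return a list of length n_vocab with a 1.0 at position `index`."""
    vec = [0.0] * n_vocab
    vec[index] = 1.0
    return vec


def make_training_pairs(notes, sequence_length, n_vocab):
    """Cut (input window, next note) pairs from `notes` with a sliding window."""
    inputs, targets = [], []
    for start in range(len(notes) - sequence_length):
        window = notes[start:start + sequence_length]
        next_note = notes[start + sequence_length]
        inputs.append([one_hot(n, n_vocab) for n in window])  # (sequence_length, n_vocab)
        targets.append(one_hot(next_note, n_vocab))           # (n_vocab,)
    return inputs, targets
```

Converting the two lists to arrays (for example with `numpy.array`) would give tensors of shape `(n_samples, sequence_length, n_vocab)` and `(n_samples, n_vocab)`, matching the `input_shape` and the `categorical_crossentropy` loss used above.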
VAE model

from keras import backend as K
from keras.layers import Input, LSTM, Dense, Lambda, RepeatVector, TimeDistributed
from keras.models import Model

# Encoder
x = Input(shape=(max_length, n_vocab))
h = LSTM(lstm_dim, return_sequences=False, name='lstm_1')(x)
z_mean = Dense(latent_dim)(h)     # mean μ of the latent variable
z_log_var = Dense(latent_dim)(h)  # log of the variance of the latent variable
encoder = Model(inputs=x, outputs=[z_mean, z_log_var])

def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0., stddev=1.0)
    # z_log_var is a log-variance, so the standard deviation is exp(0.5 * z_log_var)
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])

# Decoder
decoder_input = Input(shape=(latent_dim,))
repeated_context = RepeatVector(max_length)(decoder_input)
h_decoded = LSTM(lstm_dim, return_sequences=True)(repeated_context)
decoder_output = TimeDistributed(Dense(n_vocab, activation='softmax'))(h_decoded)
decoder = Model(inputs=decoder_input, outputs=decoder_output)
x_decoded = decoder(z)
VAE loss function
❖ The VAE loss combines a "reconstruction error", which measures the difference between input and output, with a "regularization term" that constrains the parameters of the latent space.

from keras import backend as K
from keras import metrics
from keras.layers import Layer

class CustomVariationalLayer(Layer):  # inherits from the Layer class
    def __init__(self, **kwargs):
        self.is_placeholder = True
        super(CustomVariationalLayer, self).__init__(**kwargs)

    def vae_loss(self, x, x_decoded):
        x = K.flatten(x)
        x_decoded = K.flatten(x_decoded)
        xent_loss = max_length * metrics.binary_crossentropy(x, x_decoded)  # reconstruction error
        kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)  # regularization term
        return K.mean(xent_loss + kl_loss)

    def call(self, inputs):
        x = inputs[0]
        x_decoded = inputs[1]
        loss = self.vae_loss(x, x_decoded)
        self.add_loss(loss, inputs=inputs)  # uses add_loss from the Layer class
        return x  # the output is not actually used

y = CustomVariationalLayer()([x, x_decoded])
vae = Model(x, y)  # x is the input and y the output; the output itself is irrelevant
# The loss added in CustomVariationalLayer is used, so loss is None here
vae.compile(optimizer='rmsprop', loss=None)
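To see the two terms of this loss in isolation, here is a plain-Python sketch of the same arithmetic for a single example. It mirrors the expressions in `vae_loss` above (binary cross-entropy scaled by `max_length`, plus the KL term against a standard normal), but the function names and the clipping constant `eps` are assumptions made for this illustration.

```python
import math


def kl_divergence(z_mean, z_log_var):
    """KL(N(mean, exp(log_var)) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(
        1.0 + log_var - mean ** 2 - math.exp(log_var)
        for mean, log_var in zip(z_mean, z_log_var)
    )


def reconstruction_error(x, x_decoded, eps=1e-7):
    """Mean binary cross-entropy between targets x and predictions x_decoded."""
    total = 0.0
    for target, pred in zip(x, x_decoded):
        pred = min(max(pred, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred)
    return total / len(x)


def vae_loss(x, x_decoded, z_mean, z_log_var, max_length):
    """Reconstruction error scaled by max_length, plus the KL regularizer."""
    return max_length * reconstruction_error(x, x_decoded) + kl_divergence(z_mean, z_log_var)
```

The KL term is zero exactly when the encoder outputs mean 0 and log-variance 0, which is what pulls the latent codes toward a standard normal distribution.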
GAN model
❖ GAN training is a process of training G and D alternately

from keras.layers import Input, LSTM, Dense, TimeDistributed
from keras.models import Model
from keras.optimizers import Adam

# Generator
generator_input = Input(shape=(max_length, latent_dim,))
x = LSTM(lstm_dim, return_sequences=True)(generator_input)
generator_output = TimeDistributed(Dense(n_vocab, activation='softmax'))(x)
generator = Model(generator_input, generator_output)

# Discriminator
discriminator_input = Input(shape=(max_length, n_vocab))
x = LSTM(lstm_dim)(discriminator_input)
dense_output = Dense(256, activation='relu')(x)
discriminator_output = Dense(2, activation='softmax')(dense_output)
discriminator = Model(discriminator_input, discriminator_output)
discriminator.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.1))

# GAN: the generator and discriminator stacked end to end
gan_input = Input(shape=(max_length, latent_dim))
x = generator(gan_input)
gan_output = discriminator(x)
model = Model(gan_input, gan_output)
# opt is an optimizer defined elsewhere (not shown on the slide)
model.compile(loss='binary_crossentropy', optimizer=opt)
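The alternating schedule mentioned above can be sketched as a small loop. The training loop itself is not shown in the slides, so this is an assumption about its general shape: the two step functions are hypothetical callables that each perform one update and return a loss.

```python
def train_gan(n_iterations, train_discriminator_step, train_generator_step):
    """Alternate one discriminator update and one generator update per iteration.

    Both arguments are hypothetical callables returning a loss value.
    """
    history = []
    for iteration in range(n_iterations):
        # 1) Update D on real fragments and on fragments produced by G.
        d_loss = train_discriminator_step()
        # 2) Update G through the stacked model so that D labels its output as real.
        g_loss = train_generator_step()
        history.append((iteration, d_loss, g_loss))
    return history
```

In Keras, such callables would typically wrap `train_on_batch` calls on `discriminator` and on the stacked `model`. A common practice, which the slide code does not show, is to hold the discriminator's weights fixed while the stacked model is trained.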
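Experiments
❖ Data: MIDI files of Bach's Two-Part Inventions from midiworld.com
❖ The LSTM and GAN were trained on fragments cut from the training pieces; the VAE was trained on the training pieces themselves
❖ The LSTM generated melodies close in feel to the training pieces
❖ The VAE generated melodies close in feel to the original when given a training piece as input, but produced highly random melodies when other latent-space values were specified
❖ GAN training did not go well: it occasionally generated strongly patterned melodies, but most of the output was the same note repeated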
https://github.com/masa-ita/keras-music-generators https://soundcloud.com/itagakim
Announcements
❖ python/django meetup in Niigata
❖ October 24 (Wed) 19:00-21:00 @ Prototype Cafe
❖ https://pyml-niigata.connpass.com/event/104872/
❖ Open Source Conference 2018 Niigata
❖ November 10 (Sat) 11:00-17:30 @ Hon-Port (ほんぽーと)
❖ https://www.ospn.jp/osc2018-niigata/
The Python Machine Learning Study Group in Niigata exchanges information on Slack. An invitation link will be sent later through the connpass group.