Music Generation with Deep Learning

masa-ita
October 13, 2018


As the final assignment for DL4US, I attempted music generation with deep learning.
I tried an LSTM-based prediction model, a VAE, and a GAN.
Slides from my talk at the Python Machine Learning Study Group in Niigata, 2018-10-13.


Transcript

  1. Python Machine Learning Study Group in Niigata, Restart #2 — Making Music with Keras — 2018/10/13, Masatoshi Itagaki

  2. I joined DL4US — DL4US is an online deep learning course by the Matsuo Lab at the University of Tokyo, open to the general public

  3. A seven-week curriculum
    ❖ Lesson 1: Handwritten character recognition — neural networks, Keras, optimization methods, countermeasures against overfitting
    ❖ Lesson 2: Image recognition with convolutional neural networks — CNN, data augmentation, Batch Normalization, Skip Connection
    ❖ Lesson 3: Prediction on sequential data — RNN basics, LSTM, BPTT, clipping, shortcuts, gates
    ❖ Lesson 4: Neural translation models — language models, Seq2Seq, attention mechanism
    ❖ Lesson 5: Caption generation from images — caption generation, transfer learning, beam search
    ❖ Lesson 6: Image generation with neural networks — deep generative models, VAE, GAN
    ❖ Lesson 7: AI that plays games with neural networks — DQN, OpenAI Gym, Double DQN, Dueling Network
  4. iLect — a GPU-enabled virtual environment provided for the online course; course materials delivered via JupyterLab; assignments run in contest format

  5. Final assignment [report] ❖ "Anything goes, as long as it relates to deep learning"

  6. Let's try music generation!

  7. Strategy
    ❖ Try out various generative models that rarely come up in my day-to-day work
    ❖ A prediction model using an RNN (LSTM)
    ❖ Generation with a VAE (Variational Autoencoder)
    ❖ Generation with a GAN (Generative Adversarial Network)
  8. LSTM ❖ Long Short-Term Memory ❖ A representative architecture for handling sequential data http://colah.github.io/posts/2015-08-Understanding-LSTMs/

  9. A prediction model with LSTM ❖ From a past sequence, output the probability of what comes next. https://towardsdatascience.com/lstm-by-example-using-tensorflow-feb0c1968537
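The "probability of what comes next" is typically a softmax over the vocabulary of notes. A minimal pure-Python sketch (the scores below are illustrative, not from the talk's model):

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution over the next note."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# e.g. scores for four candidate notes
probs = softmax([2.0, 1.0, 0.1, -1.0])
```

The note with the highest score gets the highest probability, but sampling from the whole distribution keeps the generated melody from being deterministic.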

  10. VAE ❖ An autoencoder learns a latent space of reduced dimensionality by training the network so that its output reproduces its input. ❖ A Variational Autoencoder assumes the latent space follows a multivariate normal distribution, and learns its mean and variance.

    http://mlexplained.com/2017/12/28/an-intuitive-explanation-of-variational-autoencoders-vaes-part-1/
  11. The latent space produced by a VAE ❖ The latent space learned by a VAE is expected to have "meaningful coordinate axes." ❖ The example above uses MNIST handwritten digits, but for face photos, axes such as "emotion," "glasses," "beard," and "gender" have been found. https://tiao.io/post/tutorial-on-variational-autoencoders-with-a-concise-keras-implementation/
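One common way to exploit such meaningful axes is to interpolate between two points in the latent space and decode each intermediate vector into a sample. A minimal sketch of the interpolation step (decoding itself would use the trained decoder):

```python
def interpolate(z_a, z_b, n_points):
    """Linear interpolation between two latent vectors z_a and z_b,
    returning n_points vectors from z_a to z_b inclusive."""
    steps = []
    for i in range(n_points):
        t = i / (n_points - 1)
        steps.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return steps
```

Feeding each interpolated vector to the decoder yields a smooth morph between the two decoded outputs, which is one way to probe what the axes encode.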

  12. GAN ❖ In a GAN (Generative Adversarial Network), a "forger" (Generator) and an "appraiser" (Discriminator) train by competing against each other. ❖ As a result, the forger is expected to turn random noise into "authentic-looking works" that the appraiser cannot detect as fakes. https://skymind.ai/wiki/generative-adversarial-network-gan

  13. The LSTM model
    ❖ A model with three stacked LSTM layers
    ❖ Dropout is included to curb overfitting, but...

    model = Sequential()
    model.add(LSTM(512, input_shape=(sequence_length, n_vocab), return_sequences=True))
    model.add(Dropout(0.3))
    model.add(LSTM(512, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(LSTM(512))
    model.add(Dense(256))
    model.add(Dropout(0.3))
    model.add(Dense(n_vocab, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
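Once such a model is trained, melody generation is an autoregressive loop: predict a distribution over the next note, sample from it, append the sample, and repeat. A framework-free sketch of that loop, where `predict_proba` is a stand-in for calling the trained model on the current sequence:

```python
import random

def generate_sequence(seed, predict_proba, n_steps):
    """Autoregressive generation: repeatedly ask the model for a probability
    distribution over the next token, sample one, and feed it back in."""
    sequence = list(seed)
    for _ in range(n_steps):
        probs = predict_proba(sequence)  # distribution over n_vocab tokens
        next_token = random.choices(range(len(probs)), weights=probs)[0]
        sequence.append(next_token)
    return sequence
```

Sampling (rather than always taking the argmax) is what gives different melodies from the same seed.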
  14. The VAE model

    # Encoder
    x = Input(shape=(max_length, n_vocab))
    h = LSTM(lstm_dim, return_sequences=False, name='lstm_1')(x)
    z_mean = Dense(latent_dim)(h)     # mean (mu) of the latent variable
    z_log_var = Dense(latent_dim)(h)  # log of the latent variable's variance

    encoder = Model(inputs=x, outputs=[z_mean, z_log_var])

    def sampling(args):
        z_mean, z_log_var = args
        epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0., stddev=1.0)
        # std dev is exp(z_log_var / 2), since z_log_var is the log variance
        return z_mean + K.exp(z_log_var / 2) * epsilon

    z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])

    # Decoder
    decoder_input = Input(shape=(latent_dim,))
    repeated_context = RepeatVector(max_length)(decoder_input)
    h_decoded = LSTM(lstm_dim, return_sequences=True)(repeated_context)
    decoder_output = TimeDistributed(Dense(n_vocab, activation='softmax'))(h_decoded)
    decoder = Model(inputs=decoder_input, outputs=decoder_output)
    x_decoded = decoder(z)
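The sampling function above implements the reparameterization trick: instead of sampling z directly from N(mu, sigma^2), which is not differentiable, it samples epsilon from N(0, 1) and shifts and scales it, so gradients can flow through mu and log-variance. In plain Python, per latent dimension:

```python
import math
import random

def reparameterize(z_mean, z_log_var, epsilon=None):
    """z = mu + sigma * epsilon, with sigma = exp(log_var / 2).
    All randomness is isolated in epsilon, a standard-normal sample."""
    if epsilon is None:
        epsilon = [random.gauss(0.0, 1.0) for _ in z_mean]
    return [m + math.exp(lv / 2) * e
            for m, lv, e in zip(z_mean, z_log_var, epsilon)]
```

With log_var = 0 (unit variance) the sample is simply mu + epsilon, which makes the shift-and-scale structure easy to check.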
  15. The VAE loss function
    ❖ The VAE loss combines a "reconstruction error," measuring the difference between input and output, with a "regularization term" that constrains the parameters of the latent space.

    class CustomVariationalLayer(Layer):  # subclass of the Keras Layer class
        def __init__(self, **kwargs):
            self.is_placeholder = True
            super(CustomVariationalLayer, self).__init__(**kwargs)

        def vae_loss(self, x, x_decoded):
            x = K.flatten(x)
            x_decoded = K.flatten(x_decoded)
            xent_loss = max_length * metrics.binary_crossentropy(x, x_decoded)  # reconstruction error
            kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)  # regularization term
            return K.mean(xent_loss + kl_loss)

        def call(self, inputs):
            x = inputs[0]
            x_decoded = inputs[1]
            loss = self.vae_loss(x, x_decoded)
            self.add_loss(loss, inputs=inputs)  # register the loss via Layer's add_loss
            return x  # the layer's output itself is not actually used

    y = CustomVariationalLayer()([x, x_decoded])
    vae = Model(x, y)  # input x, output y; the output value is irrelevant
    vae.compile(optimizer='rmsprop', loss=None)  # loss=None: the loss added in CustomVariationalLayer is used instead
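The `kl_loss` line is the closed-form KL divergence between the encoder's Gaussian $\mathcal{N}(\mu, \sigma^2)$ and the standard normal prior $\mathcal{N}(0, I)$, with `z_log_var` $= \log \sigma^2$:

$$
D_{\mathrm{KL}}\bigl(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\bigr)
= -\tfrac{1}{2} \sum_{j} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
$$

Term by term this matches the code: `1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)`, summed over the latent dimensions and scaled by $-\tfrac{1}{2}$.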
  16. The GAN model
    ❖ GAN training is a process of alternately training G and D

    # Generator
    generator_input = Input(shape=(max_length, latent_dim,))
    x = LSTM(lstm_dim, return_sequences=True)(generator_input)
    generator_output = TimeDistributed(Dense(n_vocab, activation='softmax'))(x)
    generator = Model(generator_input, generator_output)

    # Discriminator
    discriminator_input = Input(shape=(max_length, n_vocab))
    x = LSTM(lstm_dim)(discriminator_input)
    dense_output = Dense(256, activation='relu')(x)
    discriminator_output = Dense(2, activation='softmax')(dense_output)
    discriminator = Model(discriminator_input, discriminator_output)
    opt = Adam(lr=0.1)
    discriminator.compile(loss='binary_crossentropy', optimizer=opt)

    # Combined GAN: freeze the discriminator while training the generator
    discriminator.trainable = False
    gan_input = Input(shape=(max_length, latent_dim))
    x = generator(gan_input)
    gan_output = discriminator(x)
    model = Model(gan_input, gan_output)
    model.compile(loss='binary_crossentropy', optimizer=opt)
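The alternating G/D training process can be sketched framework-free; the stubs below stand in for `train_on_batch` calls on the Keras models, and all names are illustrative:

```python
def gan_training_step(real_batch, noise_batch,
                      generate, train_discriminator, train_generator):
    """One alternating GAN step:
    1) train D on real samples (label 1) vs. generated samples (label 0);
    2) train G through the combined model, labeling its fakes as 1 so the
       generator is pushed to fool the discriminator."""
    fakes = [generate(z) for z in noise_batch]
    d_loss = train_discriminator(real_batch + fakes,
                                 [1] * len(real_batch) + [0] * len(fakes))
    g_loss = train_generator(noise_batch, [1] * len(noise_batch))
    return d_loss, g_loss
```

The label flip in step 2 is the core of the adversarial setup: the discriminator's weights are frozen there, so the gradient only pushes the generator toward outputs the discriminator classifies as real.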
  17. Experiments
    ❖ Data: MIDI files of Bach's two-part Inventions from midiworld.com
    ❖ For the LSTM and GAN, fragments cut from the training pieces were used; for the VAE, the training pieces themselves were used
    ❖ The LSTM generated melodies close in nuance to the training pieces
    ❖ With the VAE, feeding a training piece as input produced melodies close in nuance to the original, but specifying other points in the latent space produced highly random melodies
    ❖ The GAN did not train well: it occasionally produced strongly patterned melodies, but these mostly collapsed into repetitions of the same note
  18. https://github.com/masa-ita/keras-music-generators https://soundcloud.com/itagakim

  19. Announcements
    ❖ python/django meetup in Niigata
    ❖ October 24 (Wed) 19:00-21:00 @ Prototype Cafe
    ❖ https://pyml-niigata.connpass.com/event/104872/
    ❖ Open Source Conference 2018 Niigata
    ❖ November 10 (Sat) 11:00-17:30 @ Honpoto (Niigata City Central Library)
    ❖ https://www.ospn.jp/osc2018-niigata/
  20. The Python Machine Learning Study Group in Niigata uses Slack for information exchange. An invitation link will be sent later via the connpass group.