masa-ita
October 13, 2018

# Music Generation with Deep Learning

As the final project for DL4US, I attempted music generation with deep learning.
I tried an LSTM-based prediction model, a VAE, and a GAN.
Slides from my talk at the Python Machine Learning Meetup in Niigata, 2018-10-13.


## Transcript

3. ### A seven-week curriculum
❖ Lesson 1: Handwritten character recognition -- neural networks, Keras, optimization methods, measures against overfitting
❖ Lesson 2: Image recognition with convolutional neural networks -- CNNs, data augmentation, Batch Normalization, skip connections
❖ Lesson 3: Prediction on sequence data -- RNN basics, LSTM, BPTT, clipping, shortcuts, gates
❖ Lesson 4: Neural machine translation -- language models, Seq2Seq, the attention mechanism
❖ Lesson 5: Caption generation from images -- caption generation, transfer learning, beam search
❖ Lesson 6: Image generation with neural networks -- deep generative models, VAE, GAN
❖ Lesson 7: AI that plays games with neural networks -- DQN, OpenAI Gym, Double DQN, Dueling Network

10. ### VAE
❖ An Auto Encoder learns a reduced-dimensional latent space such that the output reproduces the input.
❖ A Variational Auto Encoder assumes the latent space follows a multivariate normal distribution and learns its mean and variance.

http://mlexplained.com/2017/12/28/an-intuitive-explanation-of-variational-autoencoders-vaes-part-1/
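In symbols (the standard VAE formulation; notation not from the slides), the encoder outputs the parameters of an approximate posterior, and sampling is rewritten so gradients can flow through it:

```latex
q_\phi(z \mid x) = \mathcal{N}\!\bigl(z;\ \mu_\phi(x),\ \operatorname{diag}\,\sigma_\phi^2(x)\bigr),
\qquad
z = \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon,\quad \epsilon \sim \mathcal{N}(0, I)
```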

14. ### VAE model

```python
from keras.layers import Input, LSTM, Dense, Lambda, RepeatVector, TimeDistributed
from keras.models import Model
from keras import backend as K

# Encoder
x = Input(shape=(max_length, n_vocab))
h = LSTM(lstm_dim, return_sequences=False, name='lstm_1')(x)
z_mean = Dense(latent_dim)(h)     # mean of the latent variable (mu)
z_log_var = Dense(latent_dim)(h)  # log variance of the latent variable (log sigma^2)
encoder = Model(inputs=x, outputs=[z_mean, z_log_var])

def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0., stddev=1.0)
    # sigma = exp(0.5 * log sigma^2)
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])

# Decoder
decoder_input = Input(shape=(latent_dim,))
repeated_context = RepeatVector(max_length)(decoder_input)
h_decoded = LSTM(lstm_dim, return_sequences=True)(repeated_context)
decoder_output = TimeDistributed(Dense(n_vocab, activation='softmax'))(h_decoded)
decoder = Model(inputs=decoder_input, outputs=decoder_output)
x_decoded = decoder(z)
```
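As a sanity check, the sampling step can be mirrored in plain NumPy (an illustrative sketch, not from the slides; the batch size and latent dimension are arbitrary stand-ins):

```python
import numpy as np

batch_size, latent_dim = 4, 8
rng = np.random.default_rng(0)

# Stand-ins for the encoder outputs.
z_mean = rng.normal(size=(batch_size, latent_dim))
z_log_var = rng.normal(size=(batch_size, latent_dim))

def sampling(z_mean, z_log_var):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    epsilon = rng.normal(size=z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * epsilon

z = sampling(z_mean, z_log_var)
print(z.shape)  # (4, 8)
```

Because the randomness enters only through `epsilon`, the sampled `z` remains a differentiable function of `z_mean` and `z_log_var`.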
15. ### VAE loss function
❖ The VAE loss combines the "reconstruction error", which measures the difference between input and output, with a "regularization term" that constrains the parameters of the latent space.

```python
from keras.layers import Layer
from keras import metrics

class CustomVariationalLayer(Layer):  # subclass of Keras Layer
    def __init__(self, **kwargs):
        self.is_placeholder = True
        super(CustomVariationalLayer, self).__init__(**kwargs)

    def vae_loss(self, x, x_decoded):
        x = K.flatten(x)
        x_decoded = K.flatten(x_decoded)
        xent_loss = max_length * metrics.binary_crossentropy(x, x_decoded)  # reconstruction error
        kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)  # regularization term
        return K.mean(xent_loss + kl_loss)

    def call(self, inputs):
        x = inputs[0]
        x_decoded = inputs[1]
        loss = self.vae_loss(x, x_decoded)
        self.add_loss(loss, inputs=inputs)  # use Layer's add_loss
        return x  # the return value itself is not used

y = CustomVariationalLayer()([x, x_decoded])
vae = Model(x, y)  # input x, output y; the output is effectively irrelevant
vae.compile(optimizer='rmsprop', loss=None)  # loss=None: the loss added in CustomVariationalLayer is used instead
```
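The regularization term in `vae_loss` is the closed-form KL divergence between the approximate posterior N(mu, sigma^2) and the standard normal prior. A quick NumPy check (illustrative only; the values of mu and sigma are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.7, 1.3
z_log_var = np.log(sigma ** 2)

# Per-dimension closed-form term, as used in vae_loss:
kl_closed = -0.5 * (1 + z_log_var - mu**2 - np.exp(z_log_var))

# Monte Carlo estimate of KL(N(mu, sigma^2) || N(0, 1)):
z = rng.normal(mu, sigma, size=1_000_000)
log_q = -0.5 * (np.log(2 * np.pi * sigma**2) + (z - mu)**2 / sigma**2)
log_p = -0.5 * (np.log(2 * np.pi) + z**2)
kl_mc = (log_q - log_p).mean()

print(round(kl_closed, 3), round(kl_mc, 3))  # the two estimates agree closely
```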
16. ### GAN model
❖ GAN training alternately updates the generator G and the discriminator D.

```python
from keras.layers import Input, LSTM, Dense, TimeDistributed
from keras.models import Model
from keras.optimizers import Adam

# Generator
generator_input = Input(shape=(max_length, latent_dim,))
x = LSTM(lstm_dim, return_sequences=True)(generator_input)
generator_output = TimeDistributed(Dense(n_vocab, activation='softmax'))(x)
generator = Model(generator_input, generator_output)

# Discriminator
discriminator_input = Input(shape=(max_length, n_vocab))
x = LSTM(lstm_dim)(discriminator_input)
dense_output = Dense(256, activation='relu')(x)
discriminator_output = Dense(2, activation='softmax')(dense_output)
discriminator = Model(discriminator_input, discriminator_output)
discriminator.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.1))

# Combined model: freeze D so that only G is updated through it
discriminator.trainable = False
gan_input = Input(shape=(max_length, latent_dim))
x = generator(gan_input)
gan_output = discriminator(x)
model = Model(gan_input, gan_output)
model.compile(loss='binary_crossentropy', optimizer=opt)  # opt: optimizer defined elsewhere
```
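The alternating G/D updates approximate the standard GAN minimax objective (standard formulation, not from the slides): D is trained to tell real sequences from generated ones, while G is trained to fool D.

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_\mathrm{data}}\!\left[\log D(x)\right]
+ \mathbb{E}_{z \sim p(z)}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```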
17. ### Experiments
❖ Data: MIDI files of Bach's two-part inventions from midiworld.com
❖ For the LSTM and the GAN, fragments cut from the training pieces were used; for the VAE, the training pieces themselves were used
❖ The LSTM generated melodies close in nuance to the training pieces
❖ With the VAE, feeding a training piece as input produced melodies close in nuance to the original, but picking other points in the latent space produced strongly random melodies
❖ The GAN failed to train well; it occasionally generated strongly patterned melodies, but they mostly degenerated into repetitions of the same note

19. ### Announcements
❖ python/django meetup in Niigata
❖ Wednesday, October 24, 19:00-21:00 @ Prototype Cafe
❖ https://pyml-niigata.connpass.com/event/104872/
❖ Open Source Conference 2018 Niigata
❖ Saturday, November 10, 11:00-17:30 @ Honpoto
❖ https://www.ospn.jp/osc2018-niigata/