Slide 1

Slide 1 text

Character-level LSTM Recurrent Neural Networks for language and music modeling
Dimitris Spathis
Semester project – 05/2016
Analysis & description of multimedia data
Prof.: Anastasios Tefas

Slide 2

Slide 2 text

Introduction

Progress in deep neural network research in recent years, combined with the use of GPUs and access to large training data, has led to new state-of-the-art results on many hard image and sequence problems. This work focuses on Long Short-Term Memory (LSTM) Recurrent Neural Networks for language and music modeling.

Slide 3

Slide 3 text

“we cut our transcription errors by 49%” SPEECH RECOGNITION

Slide 4

Slide 4 text

“the LSTM did not have difficulty on long sentences” MACHINE TRANSLATION

Slide 5

Slide 5 text

char-RNN GENERATIVE MODELS
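A char-RNN treats text as a plain character stream: each character is one-hot encoded, the network outputs a probability distribution over the next character, and repeatedly sampling from that distribution generates new text. Below is a minimal sampling-loop sketch; next_char_probs, a callable mapping the current character index to a next-character distribution, is a hypothetical stand-in for a trained model.

import numpy as np

def sample_text(next_char_probs, char_to_ix, ix_to_char, seed, length):
    # next_char_probs: assumed callable, index -> probability vector
    # over the vocabulary for the next character
    ix = char_to_ix[seed]
    chars = [seed]
    for _ in range(length):
        p = next_char_probs(ix)
        ix = np.random.choice(len(p), p=p)  # sample, don't take the argmax
        chars.append(ix_to_char[ix])
    return "".join(chars)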

Slide 6

Slide 6 text

Recurrent Neural Networks contain loops

Slide 7

Slide 7 text

The problem of long-term dependencies
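For a simple RNN with recurrence $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$, backpropagation through time multiplies one Jacobian per step, so the influence of step $k$ on step $t$ involves

\frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \operatorname{diag}(1 - h_i^2)\, W_{hh}

a product that tends to vanish or explode as $t - k$ grows (Bengio et al., 1994), which is why plain RNNs struggle to learn long-term dependencies.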

Slide 8

Slide 8 text

A simple RNN

import numpy as np

class RNN:
    def step(self, x):
        # update the hidden state
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # compute the output vector
        y = np.dot(self.W_hy, self.h)
        return y
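A minimal usage sketch, not from the slides: the class above leaves the weights and hidden state to be set up by the caller, so a hidden size of 100 and a vocabulary size of 65 are assumed here purely for illustration.

import numpy as np

hidden_size, vocab_size = 100, 65  # assumed sizes, for illustration only

rnn = RNN()
rnn.W_hh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden -> hidden
rnn.W_xh = np.random.randn(hidden_size, vocab_size) * 0.01   # input -> hidden
rnn.W_hy = np.random.randn(vocab_size, hidden_size) * 0.01   # hidden -> output
rnn.h = np.zeros(hidden_size)                                # initial hidden state

x = np.zeros(vocab_size)  # one-hot encoding of the current character
x[0] = 1.0
y = rnn.step(x)           # unnormalized scores for the next character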

Slide 9

Slide 9 text

The operations inside an RNN

Slide 10

Slide 10 text

RNN vs. LSTM

Slide 11

Slide 11 text

Cell state & gates

Slide 12

Slide 12 text

Forget gate layer
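In the standard LSTM formulation, the forget gate looks at the previous hidden state and the current input and outputs a value in (0, 1) for each component of the cell state, deciding how much of it to keep:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)

where $\sigma$ is the logistic sigmoid and $[h_{t-1}, x_t]$ denotes concatenation.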

Slide 13

Slide 13 text

Input gate layer
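The input gate layer has two parts in the standard formulation: a sigmoid layer that decides which components of the cell state to update, and a tanh layer that proposes candidate values for them:

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)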

Slide 14

Slide 14 text

Update cell state
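The old cell state is then scaled by the forget gate and the gated candidate is added; this mostly additive update is what lets gradients flow over long spans:

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t

where $\odot$ is element-wise multiplication.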

Slide 15

Slide 15 text

Filtered output
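Finally, the hidden state is a filtered view of the cell state: an output gate decides which components to expose, and the cell state is squashed through tanh:

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)
h_t = o_t \odot \tanh(C_t)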

Slide 16

Slide 16 text

“none of the variants can improve upon the standard LSTM architecture” LSTM VARIANTS ANALYSIS

Slide 17

Slide 17 text

Experiments

Slide 18

Slide 18 text

Libraries & frameworks

ENVIRONMENT: Python (python.org), TensorFlow (tensorflow.org), Docker (docker.com)
LIBRARIES: NumPy (import numpy)

Slide 19

Slide 19 text

Datasets for training

DATA                    SIZE      TRAINING TIME   TRAINING LOSS
English folk music      14 MB     27 hrs          0.6
English folk music      12 MB     92 hrs          0.85
Shakespeare             4.4 MB    2 hrs           1.2
Game of Thrones books   10.3 MB   8 hrs           1.1
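Assuming the usual char-rnn convention, train_loss is the average cross-entropy of the model's next-character predictions, in nats per character:

\mathcal{L} = -\frac{1}{T} \sum_{t=1}^{T} \log p(c_t \mid c_1, \ldots, c_{t-1})

so lower is better, and the folk models (0.6 and 0.85) are noticeably more confident per character than the Shakespeare and Game of Thrones models (1.2 and 1.1), consistent with the more rigid structure of ABC notation.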

Slide 20

Slide 20 text

Training output

8600/33992 (epoch 2), train_loss = 1.209, time/batch = 2.264
8601/33992 (epoch 2), train_loss = 1.189, time/batch = 2.205
8602/33992 (epoch 2), train_loss = 1.198, time/batch = 2.482
8603/33992 (epoch 2), train_loss = 1.276, time/batch = 2.410
8604/33992 (epoch 2), train_loss = 1.213, time/batch = 2.367
8605/33992 (epoch 2), train_loss = 1.193, time/batch = 2.264
8606/33992 (epoch 2), train_loss = 1.218, time/batch = 2.291
8607/33992 (epoch 2), train_loss = 1.208, time/batch = 2.323
8608/33992 (epoch 2), train_loss = 1.195, time/batch = 2.336
8609/33992 (epoch 2), train_loss = 1.156, time/batch = 2.378
8610/33992 (epoch 2), train_loss = 1.236, time/batch = 2.468
8611/33992 (epoch 2), train_loss = 1.193, time/batch = 2.214
8612/33992 (epoch 2), train_loss = 1.222, time/batch = 2.368
8613/33992 (epoch 2), train_loss = 1.241, time/batch = 2.595
8614/33992 (epoch 2), train_loss = 1.208, time/batch = 2.730
8615/33992 (epoch 2), train_loss = 1.188, time/batch = 2.571

Slide 21

Slide 21 text

Symbolic music representation — ABC NOTATION

T:Milo mou kokkino
M:7/8
L:1/8
K:C
P:A
y("C"C/D/E)E ED DC | ("F"D/E/F)F FE ED | ("C"C/D/E)E ED DC | "G"CB,A, G,4 |
("C"[C/E/][D/F/][EG])[EG] [EG][DF] [DF][CE] | ("F"[D/F/][E/G/][FA])[FA] [FA][EG] [EG][DF] |
("C"[C/E/][D/F/][EG])[EG] [EG][DF] [DF][CE] | [C2E2][DF] ("G"[E/G/][D/F/][C/E/][D/F/] "C"[C2E2]) ||
P:B
|: "C"C3 C2 C2 | E2F G2 GF | "F"A3 A2 A2 |1 "C"PG2F E>F ED :|2 "C"PG2F E4 |
|: "C"G3 AG FE | "F"PF2E D2 CD | "G"E3 FE DC |1 "C"E2D CDEF :|2 "C"E2D C4 |

Slide 22

Slide 22 text

Symbolic music representation — ABC NOTATION (same ABC text as Slide 21)

Slide 23

Slide 23 text

Folk sampling 1 — Normalized dataset

Slide 24

Slide 24 text

Folk sampling 2 — Normalized dataset

Slide 25

Slide 25 text

Folk sampling INFINITE LOOP — Normalized dataset

Slide 26

Slide 26 text

Folk sampling 3 — Raw dataset

Slide 27

Slide 27 text

Jukedeck – Cambridge University spinoff — $3.8M Funding

Slide 28

Slide 28 text

Shakespeare sampling 1

him to merquess, see I have night, Whom hast! I do nerless could all found, the head, Bro'llo, to have done of himself and be death I'll come, come up that make him; Thou lives of a mansead oft help!

BENVOLIO: Go, knong myself.

JOHN OF GAUNT: All, 'twere need of that wime?

ISABELLA: O, that met's gabe it of there; Wite that cold from him, eaching a hands; For little dineming to ten accused!

CAPULET: Here'st you must stuff'd friends, and mine enember; found! What yields him battle kings; No o

Slide 29

Slide 29 text

Shakespeare sampling 2

WARWICK: And reveist is you have like by head's haste.

CLIFFORD: Good of you reportion, to do; speak guide; He not to between Wide be sense my bird. Malale. Who York, and dowry all; Your bejurny is two men.

KING RICHARD III: Why, what upon me, I store, made to countenarch! Though not England, by yourself? I darrant thou idst them advice.

Nurse: Come to dropp-none, Margaret, we'll know. Sir I have brief, my all me, and of this: and heinoul hit you; yes well had in my budian's

Slide 30

Slide 30 text

Game of Thrones sampling 1 — Infinite Loop

the sound of the castle was a short of the castle and the steps of the castle was a short of the castle and the steps of the castle was a short of the castle and the steps of the castle was a short of the castle

Slide 31

Slide 31 text

Game of Thrones sampling 2

It might use no time her and the king was close no sister in your sister, and better than the song of some golden days have a children of the man who ever like the man and for a king king in the armory of the same of Robert Eddard I know I wanted up in wine. The Wall remembered it to get before the red man grown cloak and king to see you to read me some of the horse in the day he began her with the jars were more than she knew for one and you like the gods be done in the starter t

Slide 32

Slide 32 text

Game of Thrones sampling 3

winter is very for his hand. A bath was gone to one can part with Jon had seemed to the day the steps Martell Ser Hyle had been for the world like the Seven Greyjoy said of the Great Selmy and her brother of a woman I promise of men and part of Old Tower Tower and the black hand and not some down on his hand and a man king for part of some for Maester Men, children so hard to be a pair and blood and cloak me her brother made him for a woman in one the horse down his dark and warm and made the red han

Slide 33

Slide 33 text

Game of Thrones sampling 3

who will sit at the iron throne? I fear with a golden men and the end of your House say no queen has you with me and blood king me in the courage.I want to it to prove Bran Snow.s words and said the steps in the fire. A few grey had been the courage was when when Father said no lord in the part of the steps he could want to be just some so she had a chain red and cloak and the steps and said it was the th

Slide 34

Slide 34 text

Further work

Slide 35

Slide 35 text

“long-range dependencies such as line lengths, quotes and brackets.” UNDERSTANDING ACTIVATIONS

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

“automatically learned to fix its gaze on salient objects while generating the corresponding words.” CNN + RNN = ATTENTION

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

References

Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157-166.

Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012). Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. arXiv preprint arXiv:1206.6392.

Eck, D., & Schmidhuber, J. (2002). Finding temporal structure in music: Blues improvisation with LSTM recurrent networks. In Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing (pp. 747-756). IEEE.

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.

Sturm, B. L., Santos, J., & Korshunova, I. (2015). Folk music style modelling by recurrent neural networks with long short term memory units. In 16th International Society for Music Information Retrieval Conference.