Slide 1

Slide 1 text

Let the AI do the Talk: Adventures with Natural Language Generation
@MarcoBonzanini
London Python meet-up // September 2018

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

• Sept 2016: Intro to NLP
• Sept 2017: Intro to Word Embeddings
• Sept 2018: Intro to NLG
• Sept 2019: ???

Slide 4

Slide 4 text

NATURAL LANGUAGE GENERATION

Slide 5

Slide 5 text

Natural Language Processing 5

Slide 6

Slide 6 text

Natural Language Understanding
Natural Language Generation
Natural Language Processing

Slide 7

Slide 7 text

Natural Language Generation 7

Slide 8

Slide 8 text

Natural Language Generation: the task of generating Natural Language from a machine representation

Slide 9

Slide 9 text

Applications of NLG 9

Slide 10

Slide 10 text

Applications of NLG: Summary Generation

Slide 11

Slide 11 text

Applications of NLG: Weather Report Generation

Slide 12

Slide 12 text

Applications of NLG: Automatic Journalism

Slide 13

Slide 13 text

Applications of NLG: Virtual Assistants / Chatbots

Slide 14

Slide 14 text

LANGUAGE MODELLING

Slide 15

Slide 15 text

Language Model 15

Slide 16

Slide 16 text

Language Model: a model that gives you the probability of a sequence of words

Slide 17

Slide 17 text

Language Model: P(I'm going home) > P(Home I'm going)

Slide 18

Slide 18 text

Language Model: P(I'm going home) > P(I'm going house)

Slide 19

Slide 19 text

Infinite Monkey Theorem https://en.wikipedia.org/wiki/Infinite_monkey_theorem 19

Slide 20

Slide 20 text

Infinite Monkey Theorem

from random import choice
from string import printable

def monkey_hits_keyboard(n):
    output = [choice(printable) for _ in range(n)]
    print("The monkey typed:")
    print(''.join(output))

Slide 21

Slide 21 text

Infinite Monkey Theorem

>>> monkey_hits_keyboard(30)
The monkey typed:
% a9AK^YKx OkVG)u3.cQ,31("!ac%
>>> monkey_hits_keyboard(30)
The monkey typed:
fWE,ou)cxmV2IZ l}jSV'XxQ**9'|

Slide 22

Slide 22 text

n-grams 22

Slide 23

Slide 23 text

n-grams: a sequence of N items from a given sample of text

Slide 24

Slide 24 text

n-grams

>>> from nltk import ngrams
>>> list(ngrams("pizza", 3))

Slide 25

Slide 25 text

n-grams

>>> from nltk import ngrams
>>> list(ngrams("pizza", 3))
[('p', 'i', 'z'), ('i', 'z', 'z'), ('z', 'z', 'a')]

Slide 26

Slide 26 text

n-grams

>>> from nltk import ngrams
>>> list(ngrams("pizza", 3))
[('p', 'i', 'z'), ('i', 'z', 'z'), ('z', 'z', 'a')]

character-based trigrams

Slide 27

Slide 27 text

n-grams

>>> s = "The quick brown fox".split()
>>> list(ngrams(s, 2))

Slide 28

Slide 28 text

n-grams

>>> s = "The quick brown fox".split()
>>> list(ngrams(s, 2))
[('The', 'quick'), ('quick', 'brown'), ('brown', 'fox')]

Slide 29

Slide 29 text

n-grams

>>> s = "The quick brown fox".split()
>>> list(ngrams(s, 2))
[('The', 'quick'), ('quick', 'brown'), ('brown', 'fox')]

word-based bigrams

Slide 30

Slide 30 text

From n-grams to Language Model 30

Slide 31

Slide 31 text

From n-grams to Language Model
• Given a large dataset of text
• Find all the n-grams
• Compute probabilities, e.g. count bigrams:
  P(w2 | w1) = count(w1, w2) / count(w1)
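To make those bullets concrete, here is a minimal sketch (not from the slides) of counting bigrams and turning the counts into probabilities, using NLTK's ngrams on a toy corpus; bigram_probabilities and sequence_probability are hypothetical helper names.

from collections import Counter
from nltk import ngrams

def bigram_probabilities(tokens):
    # P(w2 | w1) = count(w1, w2) / count(w1)
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(ngrams(tokens, 2))
    return {(w1, w2): count / unigram_counts[w1]
            for (w1, w2), count in bigram_counts.items()}

def sequence_probability(probs, sentence):
    # Approximate P(w1 ... wn) as the product of P(wi | wi-1), no smoothing
    words = sentence.split()
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= probs.get((w1, w2), 0.0)  # unseen bigram -> probability 0
    return p

# Toy corpus; a real language model needs a much larger dataset
tokens = "the quick brown fox jumps over the lazy dog".split()
probs = bigram_probabilities(tokens)

With this toy corpus, probs[('the', 'quick')] comes out as 0.5: "the" occurs twice and is followed by "quick" once. sequence_probability is the same idea as the P(I'm going home) > P(Home I'm going) comparison earlier.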

Slide 32

Slide 32 text

Example: Predictive Text in Mobile 32

Slide 33

Slide 33 text

Example: Predictive Text in Mobile 33

Slide 34

Slide 34 text

Example: Predictive Text in Mobile (pick the most likely next word)

Slide 35

Slide 35 text

Example: Predictive Text in Mobile
Marco is …

Slide 36

Slide 36 text

Example: Predictive Text in Mobile
Marco is a good time to get the latest flash player is required for video playback is unavailable right now because this video is not sure if you have a great day.
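The run-on sentence above is typical of greedy prediction: the keyboard keeps appending the single most likely next word. A rough sketch of that behaviour, reusing the hypothetical probs dictionary from the bigram example a few slides back:

def most_likely_continuation(probs, seed, length=20):
    # Greedy generation: always append the single most probable next word
    words = seed.split()
    for _ in range(length):
        candidates = {w2: p for (w1, w2), p in probs.items() if w1 == words[-1]}
        if not candidates:
            break  # no bigram starts with the current last word
        words.append(max(candidates, key=candidates.get))
    return ' '.join(words)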

Slide 37

Slide 37 text

Limitations of LM so far 37

Slide 38

Slide 38 text

Limitations of LM so far
• P(word | full history) is too expensive
• P(word | previous few words) is feasible
• … local context only! Lack of global context

Slide 39

Slide 39 text

QUICK INTRO TO NEURAL NETWORKS

Slide 40

Slide 40 text

Neural Networks 40

Slide 41

Slide 41 text

Neural Networks
[Diagram: a small feed-forward network with inputs x1, x2, hidden nodes, and output y1]

Slide 42

Slide 42 text

Neural Networks
[Diagram: input layer (x1, x2), hidden layer(s) (h1, h2, h3), output layer (y1)]

Slide 43

Slide 43 text

Neurone Example 43

Slide 44

Slide 44 text

Neurone Example
[Diagram: inputs x1 and x2, with weights w1 and w2, feeding a single neurone]

Slide 45

Slide 45 text

Neurone Example
[Diagram: inputs x1 and x2, with weights w1 and w2, feeding a single neurone that computes F(w1x1 + w2x2)]
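As a minimal sketch of that diagram, the neurone computes a weighted sum of its inputs and passes it through an activation function F; the sigmoid below is an example choice, not something the slide specifies.

import math

def neurone(x1, x2, w1, w2):
    # Weighted sum of the inputs, squashed by the activation function F
    z = w1 * x1 + w2 * x2
    return 1 / (1 + math.exp(-z))  # F = sigmoid, chosen here as an example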

Slide 46

Slide 46 text

Training the Network 46

Slide 47

Slide 47 text

Training the Network
• Random weight init
• Run input through the network
• Compute error (loss function)
• Use error to adjust weights (gradient descent + back-propagation)
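To illustrate those four steps on the single neurone sketched earlier, here is a toy training step with a squared-error loss and hand-rolled gradients; a hypothetical illustration only, since real networks delegate all of this to a framework.

import random

w1, w2 = random.random(), random.random()   # random weight init

def train_step(x1, x2, target, w1, w2, lr=0.1):
    y = neurone(x1, x2, w1, w2)   # run input through the network
    error = y - target            # compute error (derivative of squared loss)
    grad = error * y * (1 - y)    # back-propagate through the sigmoid
    w1 -= lr * grad * x1          # adjust weights (gradient descent)
    w2 -= lr * grad * x2
    return w1, w2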

Slide 48

Slide 48 text

More on Training 48

Slide 49

Slide 49 text

More on Training
• Batch size
• Iterations and Epochs
• e.g. with 1,000 data points and batch size = 100, we need 10 iterations to complete 1 epoch

Slide 50

Slide 50 text

RECURRENT NEURAL NETWORKS

Slide 51

Slide 51 text

Limitation of FFNN 51

Slide 52

Slide 52 text

Limitation of FFNN: input and output are of fixed size

Slide 53

Slide 53 text

Recurrent Neural Networks 53

Slide 54

Slide 54 text

Recurrent Neural Networks 54 http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Slide 55

Slide 55 text

Recurrent Neural Networks 55 http://colah.github.io/posts/2015-08-Understanding-LSTMs/
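Chris Olah's diagrams at that link show the recurrence unrolled over time. As a rough numpy sketch of a single vanilla recurrent step (W_x, W_h and b are hypothetical weight matrices and bias, not from the slides):

import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # The new hidden state depends on the current input AND the previous state
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)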

Slide 56

Slide 56 text

Limitation of RNN 56

Slide 57

Slide 57 text

Limitation of RNN: the "vanishing gradient" problem means the network cannot "remember" what happened long ago

Slide 58

Slide 58 text

Long Short Term Memory 58

Slide 59

Slide 59 text

Long Short Term Memory 59 http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Slide 60

Slide 60 text

60 https://en.wikipedia.org/wiki/Long_short-term_memory

Slide 61

Slide 61 text

A BIT OF PRACTICE

Slide 62

Slide 62 text

Deep Learning in Python 62

Slide 63

Slide 63 text

Deep Learning in Python
• Some NN support in scikit-learn
• Many low-level frameworks: Theano, PyTorch, TensorFlow
• … Keras!
• Probably more

Slide 64

Slide 64 text

Keras 64

Slide 65

Slide 65 text

Keras
• Simple, high-level API
• Uses TensorFlow, Theano or CNTK as backend
• Runs seamlessly on GPU
• Easier to start with

Slide 66

Slide 66 text

LSTM Example 66

Slide 67

Slide 67 text

LSTM Example: define the network

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars), activation='softmax'))

Slide 68

Slide 68 text

LSTM Example: configure the network

from keras.optimizers import RMSprop

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

Slide 69

Slide 69 text

LSTM Example: train the network

model.fit(x, y, batch_size=128, epochs=60, callbacks=[print_callback])
model.save('char_model.h5')

Slide 70

Slide 70 text

LSTM Example: generate text

for i in range(output_size):
    ...
    preds = model.predict(x_pred, verbose=0)[0]
    next_index = sample(preds, diversity)
    next_char = indices_char[next_index]
    generated += next_char

Slide 71

Slide 71 text

LSTM Example: seed text

for i in range(output_size):
    ...
    preds = model.predict(x_pred, verbose=0)[0]
    next_index = sample(preds, diversity)
    next_char = indices_char[next_index]
    generated += next_char
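The sample() helper used in the loop is not defined on the slides; one common implementation, along the lines of the temperature sampling in the standard Keras character-level text generation example, looks roughly like this (diversity is passed in as the temperature):

import numpy as np

def sample(preds, temperature=1.0):
    # Re-shape the predicted distribution by the temperature, then draw one index
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

Low temperatures make the sampling close to greedy argmax, while higher temperatures produce more diverse (and more error-prone) text.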

Slide 72

Slide 72 text

Sample Output 72

Slide 73

Slide 73 text

Sample Output (after 1 epoch; seed text followed by generated output):
are the glories it included. Now am I lrA to r ,d?ot praki ynhh kpHu ndst -h ahh umk,hrfheleuloluprffuamdaedospe aeooasak sh frxpaphrNumlpAryoaho (…)

Slide 74

Slide 74 text

Sample Output (after ~5 epochs):
I go from thee: Bear me forthwitht wh, t che f uf ld,hhorfAs c c ff.h scfylhle, rigrya p s lee rmoy, tofhryg dd?ofr hl t y ftrhoodfe- r Py (…)

Slide 75

Slide 75 text

Sample Output (after 20+ epochs):
a wild-goose flies, Unclaim'd of any manwecddeelc uavekeMw gh whacelcwiiaeh xcacwiDac w fioarw ewoc h feicucra h,h, :ewh utiqitilweWy ha.h pc'hr, lagfh eIwislw ofiridete w laecheefb .ics,aicpaweteh fiw?egp t? (…)

Slide 76

Slide 76 text

Tuning

Slide 77

Slide 77 text

Tuning
• More layers?
• More hidden nodes? Or fewer?
• More data?
• A combination?

Slide 78

Slide 78 text

Tuning (after 1 epoch):
Wyr feirm hat. meancucd kreukk? , foremee shiciarplle. My, Bnyivlaunef sough bus: Wad vomietlhas nteos thun. lore orain, Ty thee I Boe, I rue. niat

Slide 79

Slide 79 text

Tuning (much later), from http://karpathy.github.io/2015/05/21/rnn-effectiveness/:
Second Lord: They would be ruled after this chamber, and my fair nues begun out of the fact, to be conveyed, Whose noble souls I'll have the heart of the wars.
Clown: Come, sir, I will make did behold your worship.

Slide 80

Slide 80 text

FINAL REMARKS

Slide 81

Slide 81 text

A Couple of Tips

Slide 82

Slide 82 text

A Couple of Tips
• You'll need a GPU
• Develop locally on a very small dataset, then run on the cloud on real data
• At least 1M characters in input, at least 20 epochs for training
• model.save() !!!

Slide 83

Slide 83 text

Summary
• Natural Language Generation is fun
• Simple models vs. Neural Networks
• Keras makes your life easier
• A lot of trial-and-error!

Slide 84

Slide 84 text

THANK YOU @MarcoBonzanini speakerdeck.com/marcobonzanini GitHub.com/bonzanini marcobonzanini.com

Slide 85

Slide 85 text

Readings & Credits
• Brandon Rohrer on "Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)": https://www.youtube.com/watch?v=WCUNPb-5EYI
• Chris Olah on "Understanding LSTM Networks": http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Andrej Karpathy on "The Unreasonable Effectiveness of Recurrent Neural Networks": http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Pics:
• Weather forecast icon: https://commons.wikimedia.org/wiki/File:Newspaper_weather_forecast_-_today_and_tomorrow.svg
• Stack of papers icon: https://commons.wikimedia.org/wiki/File:Stack_of_papers_tied.svg
• Document icon: https://commons.wikimedia.org/wiki/File:Document_icon_(the_Noun_Project_27904).svg
• News icon: https://commons.wikimedia.org/wiki/File:PICOL_icon_News.svg
• Cortana icon: https://upload.wikimedia.org/wikipedia/commons/thumb/8/89/Microsoft_Cortana_light.svg/1024px-Microsoft_Cortana_light.svg.png
• Siri icon: https://commons.wikimedia.org/wiki/File:Siri_icon.svg
• Google assistant icon: https://commons.wikimedia.org/wiki/File:Google_mic.svg