Slide 1

Slide 1 text

Let the AI do the Talk: Adventures with Natural Language Generation @MarcoBonzanini #PyConX

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

PyData London Conference 12-14 July 2019 @PyDataLondon

Slide 4

Slide 4 text

NATURAL LANGUAGE GENERATION

Slide 5

Slide 5 text

Natural Language Processing

Slide 6

Slide 6 text

Natural Language Understanding
Natural Language Generation
Natural Language Processing

Slide 7

Slide 7 text

Natural Language Generation

Slide 8

Slide 8 text

Natural Language Generation: the task of generating Natural Language from a machine representation

Slide 9

Slide 9 text

Applications of NLG

Slide 10

Slide 10 text

Applications of NLG: Summary Generation

Slide 11

Slide 11 text

Applications of NLG: Weather Report Generation

Slide 12

Slide 12 text

Applications of NLG: Automatic Journalism

Slide 13

Slide 13 text

Applications of NLG: Virtual Assistants / Chatbots

Slide 14

Slide 14 text

LANGUAGE MODELLING

Slide 15

Slide 15 text

Language Model

Slide 16

Slide 16 text

Language Model: a model that gives you the probability of a sequence of words

Slide 17

Slide 17 text

Language Model: P(I’m going home) > P(Home I’m going)

Slide 18

Slide 18 text

Language Model: P(I’m going home) > P(I’m going house)

Slide 19

Slide 19 text

Infinite Monkey Theorem: https://en.wikipedia.org/wiki/Infinite_monkey_theorem

Slide 20

Slide 20 text

Infinite Monkey Theorem

from random import choice
from string import printable

def monkey_hits_keyboard(n):
    output = [choice(printable) for _ in range(n)]
    print("The monkey typed:")
    print(''.join(output))

Slide 21

Slide 21 text

Infinite Monkey Theorem

>>> monkey_hits_keyboard(30)
The monkey typed:
% a9AK^YKx OkVG)u3.cQ,31("!ac%
>>> monkey_hits_keyboard(30)
The monkey typed:
fWE,ou)cxmV2IZ l}jSV'XxQ**9'|

Slide 22

Slide 22 text

n-grams

Slide 23

Slide 23 text

n-grams: a sequence of N items from a given sample of text

Slide 24

Slide 24 text

n-grams

>>> from nltk import ngrams
>>> list(ngrams("pizza", 3))

Slide 25

Slide 25 text

n-grams

>>> from nltk import ngrams
>>> list(ngrams("pizza", 3))
[('p', 'i', 'z'), ('i', 'z', 'z'), ('z', 'z', 'a')]

Slide 26

Slide 26 text

n-grams

>>> from nltk import ngrams
>>> list(ngrams("pizza", 3))
[('p', 'i', 'z'), ('i', 'z', 'z'), ('z', 'z', 'a')]

character-based trigrams

Slide 27

Slide 27 text

n-grams

>>> s = "The quick brown fox".split()
>>> list(ngrams(s, 2))

Slide 28

Slide 28 text

n-grams

>>> s = "The quick brown fox".split()
>>> list(ngrams(s, 2))
[('The', 'quick'), ('quick', 'brown'), ('brown', 'fox')]

Slide 29

Slide 29 text

n-grams

>>> s = "The quick brown fox".split()
>>> list(ngrams(s, 2))
[('The', 'quick'), ('quick', 'brown'), ('brown', 'fox')]

word-based bigrams

Slide 30

Slide 30 text

From n-grams to Language Model

Slide 31

Slide 31 text

From n-grams to Language Model
• Given a large dataset of text
• Find all the n-grams
• Compute probabilities, e.g. count bigrams (see the sketch below)
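To make the counting step concrete, here is a minimal sketch (my own illustration, not from the slides; the toy corpus and variable names are invented) that builds word-bigram probabilities with NLTK and collections.Counter:

from collections import Counter, defaultdict
from nltk import ngrams

corpus = "the quick brown fox jumps over the lazy dog".split()

# Count bigrams and unigrams over the corpus
bigram_counts = Counter(ngrams(corpus, 2))
unigram_counts = Counter(corpus)

# P(next | previous) = count(previous, next) / count(previous)
prob = defaultdict(dict)
for (prev, nxt), count in bigram_counts.items():
    prob[prev][nxt] = count / unigram_counts[prev]

print(prob["the"])  # {'quick': 0.5, 'lazy': 0.5}

In a real model you would use a much larger corpus and typically add smoothing for unseen n-grams.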

Slide 32

Slide 32 text

Example: Predictive Text in Mobile

Slide 33

Slide 33 text

Example: Predictive Text in Mobile

Slide 34

Slide 34 text

Example: Predictive Text in Mobile (most likely next word)

Slide 35

Slide 35 text

Example: Predictive Text in Mobile

Marco is …

Slide 36

Slide 36 text

Example: Predictive Text in Mobile

Marco is a good time to get the latest flash player is required for video playback is unavailable right now because this video is not sure if you have a great day.
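The repetitive, meandering sentence above is typical of greedy next-word prediction. A small sketch (again my own, reusing the prob table from the bigram sketch earlier) of generating text by always taking the most likely next word:

def generate_greedy(prob, seed, length=10):
    # Repeatedly append the single most probable continuation of the last word
    words = [seed]
    for _ in range(length):
        candidates = prob.get(words[-1])
        if not candidates:
            break
        words.append(max(candidates, key=candidates.get))
    return ' '.join(words)

print(generate_greedy(prob, "the"))

Because only the previous word is taken into account, the output quickly falls into locally plausible but globally incoherent loops, much like the phone-keyboard example.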

Slide 37

Slide 37 text

Limitations of LM so far

Slide 38

Slide 38 text

Limitations of LM so far
• P(word | full history) is too expensive
• P(word | previous few words) is feasible
• … Local context only! Lack of global context

Slide 39

Slide 39 text

QUICK INTRO TO NEURAL NETWORKS

Slide 40

Slide 40 text

Neural Networks

Slide 41

Slide 41 text

Neural Networks: a feed-forward network diagram with input layer (x1, x2), hidden layer(s) (h1, h2, h3) and output layer (y1)

Slide 42

Slide 42 text

Neurone Example

Slide 43

Slide 43 text

Neurone Example: inputs x1, x2 with weights w1, w2; output = ?

Slide 44

Slide 44 text

Neurone Example: inputs x1, x2 with weights w1, w2; output = F(w1x1 + w2x2)
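A minimal sketch of that computation; treating F as a sigmoid is my assumption, since the slide leaves the activation function unspecified:

import math

def neurone(x1, x2, w1, w2):
    # Weighted sum of the inputs, squashed by the activation function F
    z = w1 * x1 + w2 * x2
    return 1 / (1 + math.exp(-z))  # sigmoid chosen as an example of F

print(neurone(x1=0.5, x2=1.0, w1=0.8, w2=-0.3))  # ~0.525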

Slide 45

Slide 45 text

Training the Network

Slide 46

Slide 46 text

Training the Network
• Random weight init
• Run input through the network
• Compute error (loss function)
• Use error to adjust weights (gradient descent + back-propagation; see the toy sketch below)
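As a toy illustration of those four steps (my own sketch, not from the talk): one weight, one data point, squared-error loss, and plain gradient descent:

# Learn w so that w * x approximates the target y
w = 0.1           # (pseudo-)random weight init
x, y = 2.0, 6.0   # single training example; the ideal w is 3
lr = 0.05         # learning rate

for step in range(50):
    pred = w * x            # run the input through the "network"
    error = pred - y        # loss = error ** 2
    grad = 2 * error * x    # d(loss)/dw, i.e. back-propagation via the chain rule
    w -= lr * grad          # gradient descent update

print(round(w, 3))  # converges towards 3.0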

Slide 47

Slide 47 text

More on Training

Slide 48

Slide 48 text

More on Training
• Batch size
• Iterations and Epochs
• e.g. 1,000 data points: with batch size = 100 we need 10 iterations to complete 1 epoch (see the sketch below)
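The iteration arithmetic from the example, spelled out (variable names are just for illustration):

import math

n_samples = 1_000
batch_size = 100
iterations_per_epoch = math.ceil(n_samples / batch_size)
print(iterations_per_epoch)  # 10 iterations to complete 1 epoch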

Slide 49

Slide 49 text

RECURRENT NEURAL NETWORKS

Slide 50

Slide 50 text

Limitation of FFNN

Slide 51

Slide 51 text

Limitation of FFNN: input and output of fixed size

Slide 52

Slide 52 text

Recurrent Neural Networks

Slide 53

Slide 53 text

Recurrent Neural Networks (diagram from http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

Slide 54

Slide 54 text

Recurrent Neural Networks (diagram from http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

Slide 55

Slide 55 text

Limitation of RNN

Slide 56

Slide 56 text

Limitation of RNN: the “vanishing gradient” problem, i.e. it cannot “remember” what happened long ago

Slide 57

Slide 57 text

Long Short-Term Memory

Slide 58

Slide 58 text

Long Short-Term Memory (diagram from http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

Slide 59

Slide 59 text

https://en.wikipedia.org/wiki/Long_short-term_memory

Slide 60

Slide 60 text

A BIT OF PRACTICE

Slide 61

Slide 61 text

Deep Learning in Python

Slide 62

Slide 62 text

Deep Learning in Python
• Some NN support in scikit-learn
• Many low-level frameworks: Theano, PyTorch, TensorFlow
• … Keras!
• Probably more

Slide 63

Slide 63 text

Keras

Slide 64

Slide 64 text

Keras
• Simple, high-level API
• Uses TensorFlow, Theano or CNTK as backend
• Runs seamlessly on GPU
• Easier to start with

Slide 65

Slide 65 text

LSTM Example

Slide 66

Slide 66 text

LSTM Example: Define the network

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars), activation='softmax'))

Slide 67

Slide 67 text

LSTM Example: Configure the network

from keras.optimizers import RMSprop

optimizer = RMSprop(lr=0.01)
model.compile(
    loss='categorical_crossentropy',
    optimizer=optimizer
)

Slide 68

Slide 68 text

LSTM Example: Train the network

model.fit(x, y, batch_size=128, epochs=60, callbacks=[print_callback])
model.save('char_model.h5')
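The snippets above rely on x, y, chars, maxlen and indices_char that the slides never define. Here is a minimal sketch of the data preparation, along the lines of the standard Keras character-level text-generation example (the file name, step size and variable names are assumptions); print_callback would typically be a keras.callbacks.LambdaCallback that prints sample output after each epoch:

import numpy as np

text = open('corpus.txt').read()   # training text; the path is hypothetical
chars = sorted(set(text))
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}

maxlen, step = 40, 3
sentences, next_chars = [], []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i:i + maxlen])   # input: a window of maxlen characters
    next_chars.append(text[i + maxlen])    # target: the character that follows

# One-hot encode inputs and targets
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1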

Slide 69

Slide 69 text

LSTM Example: Generate text

for i in range(output_size):
    ...
    preds = model.predict(x_pred, verbose=0)[0]
    next_index = sample(preds, diversity)
    next_char = indices_char[next_index]
    generated += next_char
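The loop uses a sample() helper and a diversity (temperature) value that the slides don't show. The usual implementation from the Keras character-level example looks like this; I'm assuming the talk used something equivalent:

import numpy as np

def sample(preds, temperature=1.0):
    # Rescale the softmax output by the temperature and draw one character index
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds + 1e-8) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

Lower temperatures make the output more conservative; higher values make it more surprising (and more error-prone).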

Slide 70

Slide 70 text

LSTM Example: Seed text (same loop as above; x_pred is built from the seed text / current context window)

for i in range(output_size):
    ...
    preds = model.predict(x_pred, verbose=0)[0]
    next_index = sample(preds, diversity)
    next_char = indices_char[next_index]
    generated += next_char

Slide 71

Slide 71 text

Sample Output

Slide 72

Slide 72 text

Sample Output are the glories it included. Now am I lrA to r ,d?ot praki ynhh kpHu ndst -h ahh umk,hrfheleuloluprffuamdaedospe aeooasak sh frxpaphrNumlpAryoaho (…) 72 Seed text After 1 epoch

Slide 73

Slide 73 text

Sample Output I go from thee: Bear me forthwitht wh, t che f uf ld,hhorfAs c c ff.h scfylhle, rigrya p s lee rmoy, tofhryg dd?ofr hl t y ftrhoodfe- r Py (…) 73 After ~5 epochs

Slide 74

Slide 74 text

Sample Output a wild-goose flies, Unclaim'd of any manwecddeelc uavekeMw gh whacelcwiiaeh xcacwiDac w fioarw ewoc h feicucra h,h, :ewh utiqitilweWy ha.h pc'hr, lagfh eIwislw ofiridete w laecheefb .ics,aicpaweteh fiw?egp t? (…) 74 After 20+ epochs

Slide 75

Slide 75 text

Tuning

Slide 76

Slide 76 text

Tuning
• More layers?
• More hidden nodes? Or fewer?
• More data?
• A combination?

Slide 77

Slide 77 text

Tuning (after 1 epoch)

Wyr feirm hat. meancucd kreukk? , foremee shiciarplle. My, Bnyivlaunef sough bus: Wad vomietlhas nteos thun. lore orain, Ty thee I Boe, I rue. niat

Slide 78

Slide 78 text

Tuning (much later)

to Dover, where inshipp'd Commit them to plean me than stand and the woul came the wife marn to the groat pery me Which that the senvose in the sen in the poor The death is and the calperits the should

Slide 79

Slide 79 text

FINAL REMARKS

Slide 80

Slide 80 text

A Couple of Tips

Slide 81

Slide 81 text

A Couple of Tips
• You’ll need a GPU
• Develop locally on a very small dataset, then run on the cloud on real data
• At least 1M characters in input, at least 20 epochs for training
• model.save() !!! (see the sketch below)

Slide 82

Slide 82 text

Summary
• Natural Language Generation is fun
• Simple models vs. Neural Networks
• Keras makes your life easier
• A lot of trial-and-error!

Slide 83

Slide 83 text

THANK YOU @MarcoBonzanini speakerdeck.com/marcobonzanini

Slide 84

Slide 84 text

Readings & Credits

• Brandon Rohrer on "Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)": https://www.youtube.com/watch?v=WCUNPb-5EYI
• Chris Olah on "Understanding LSTM Networks": http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Andrej Karpathy on "The Unreasonable Effectiveness of Recurrent Neural Networks": http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Pics:
• Weather forecast icon: https://commons.wikimedia.org/wiki/File:Newspaper_weather_forecast_-_today_and_tomorrow.svg
• Stack of papers icon: https://commons.wikimedia.org/wiki/File:Stack_of_papers_tied.svg
• Document icon: https://commons.wikimedia.org/wiki/File:Document_icon_(the_Noun_Project_27904).svg
• News icon: https://commons.wikimedia.org/wiki/File:PICOL_icon_News.svg
• Cortana icon: https://upload.wikimedia.org/wikipedia/commons/thumb/8/89/Microsoft_Cortana_light.svg/1024px-Microsoft_Cortana_light.svg.png
• Siri icon: https://commons.wikimedia.org/wiki/File:Siri_icon.svg
• Google assistant icon: https://commons.wikimedia.org/wiki/File:Google_mic.svg