Infinite Monkey Theorem
from random import choice
from string import printable

def monkey_hits_keyboard(n):
    # printable includes whitespace, so the output may span several lines
    output = [choice(printable) for _ in range(n)]
    print("The monkey typed:")
    print(''.join(output))
>>> monkey_hits_keyboard(30)
The monkey typed:
%
a9AK^YKx OkVG)u3.cQ,31("!ac%
>>> monkey_hits_keyboard(30)
The monkey typed:
fWE,ou)cxmV2IZ l}jSV'XxQ**9'|
n-grams
A contiguous sequence of N items from a given sample of text
>>> from nltk import ngrams
>>> list(ngrams("pizza", 3))
[('p', 'i', 'z'), ('i', 'z', 'z'), ('z', 'z', 'a')]
character-based trigrams
>>> s = "The quick brown fox".split()
>>> list(ngrams(s, 2))
[('The', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
word-based bigrams
From n-grams to Language Model
• Given a large dataset of text
• Find all the n-grams
• Compute probabilities, e.g. for bigrams: P(w2 | w1) = count(w1, w2) / count(w1) (see the sketch below)
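Putting the three bullets together, here is a minimal sketch of the counting step (my own illustration, not from the slides), assuming a whitespace-tokenized corpus and reusing nltk's ngrams from above:

    from collections import Counter
    from nltk import ngrams

    def bigram_probabilities(tokens):
        # Estimate P(w2 | w1) as count(w1, w2) / count(w1)
        bigram_counts = Counter(ngrams(tokens, 2))
        unigram_counts = Counter(tokens)
        return {
            (w1, w2): count / unigram_counts[w1]
            for (w1, w2), count in bigram_counts.items()
        }

    tokens = "the quick brown fox jumps over the lazy dog".split()
    probs = bigram_probabilities(tokens)
    print(probs[('the', 'quick')])  # 0.5: 'the' occurs twice, once followed by 'quick'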
Example: Predictive Text in Mobile
most likely next word
Marco is …
Marco is a good time to
get the latest flash player
is required for video
playback is unavailable
right now because this
video is not sure if you
have a great day.
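The suggestion bar is simply picking the highest-probability continuation at each step. A minimal sketch of that choice (the function and the probability table are my own made-up illustration, in the style of bigram_probabilities() above):

    def most_likely_next_word(word, probs):
        # Keep only bigrams starting with `word`, pick the most probable one
        candidates = {w2: p for (w1, w2), p in probs.items() if w1 == word}
        return max(candidates, key=candidates.get) if candidates else None

    # Hypothetical probabilities, as would be estimated from a large corpus
    probs = {('Marco', 'is'): 0.9, ('is', 'a'): 0.6, ('is', 'not'): 0.4}
    print(most_likely_next_word('is', probs))  # 'a'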
Limitations of LM so far
• P(word | full history) is too expensive
• P(word | previous few words) is feasible
• But: local context only! No global context
Training the Network
• Random weight init
• Run input through the network
• Compute the error (loss function)
• Use the error to adjust the weights (gradient descent + back-propagation); see the sketch below
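As a toy illustration of that loop (mine, not from the slides): a single linear "neuron" trained by plain gradient descent to learn y = 2x.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: the network should learn y = 2x
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 4.0, 6.0])

    w = rng.normal()                    # random weight init
    for _ in range(100):
        y_pred = w * x                  # run input through the network
        error = y_pred - y
        loss = (error ** 2).mean()      # loss function (mean squared error)
        grad = 2 * (error * x).mean()   # back-propagation (here: one derivative)
        w -= 0.01 * grad                # gradient descent step

    print(round(w, 3))                  # close to 2.0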
More on Training
• Batch size
• Iterations and Epochs
• e.g. with 1,000 data points and batch size = 100, we need 10 iterations to complete 1 epoch (see below)
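The arithmetic from the example, spelled out (using ceil so a final, smaller batch is also counted):

    import math

    data_points = 1000
    batch_size = 100

    # One epoch = one full pass over the data, one batch per iteration
    iterations_per_epoch = math.ceil(data_points / batch_size)
    print(iterations_per_epoch)  # 10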
RECURRENT NEURAL NETWORKS
Limitation of FFNN
Input and output of fixed size
Deep Learning in Python
• Some NN support in scikit-learn
• Many low-level frameworks: Theano, PyTorch, TensorFlow
• … Keras!
• Probably more
Keras
• Simple, high-level API
• Uses TensorFlow, Theano or CNTK as backend
• Runs seamlessly on GPU
• Easier to start with
LSTM Example
from keras.models import Sequential
from keras.layers import Dense, LSTM

# maxlen and chars come from the data preparation step (see below)
model = Sequential()
model.add(
    LSTM(
        128,
        input_shape=(maxlen, len(chars))
    )
)
model.add(Dense(len(chars), activation='softmax'))
Define the network
from keras.optimizers import RMSprop

optimizer = RMSprop(lr=0.01)
model.compile(
    loss='categorical_crossentropy',
    optimizer=optimizer
)
Configure the network
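The slides use x and y in the next step without showing how they are built. A sketch of the usual preparation, following the standard Keras character-level text-generation recipe (the corpus filename is hypothetical); this also defines the char_indices / indices_char lookups used during generation:

    import numpy as np

    text = open('corpus.txt').read()      # hypothetical input file
    chars = sorted(set(text))
    char_indices = {c: i for i, c in enumerate(chars)}
    indices_char = {i: c for c, i in char_indices.items()}
    maxlen, step = 40, 3

    # Cut the text into overlapping windows of maxlen characters,
    # each labelled with the character that follows it
    sentences, next_chars = [], []
    for i in range(0, len(text) - maxlen, step):
        sentences.append(text[i:i + maxlen])
        next_chars.append(text[i + maxlen])

    # One-hot encode: x has shape (samples, maxlen, len(chars))
    x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
    y = np.zeros((len(sentences), len(chars)), dtype=bool)
    for i, sentence in enumerate(sentences):
        for t, char in enumerate(sentence):
            x[i, t, char_indices[char]] = True
        y[i, char_indices[next_chars[i]]] = True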
model.fit(x, y,
          batch_size=128,
          epochs=60,
          callbacks=[print_callback])   # e.g. a callback printing sample text

model.save('char_model.h5')
Train the network
for i in range(output_size):
    ...                                           # x_pred: one-hot window of the
                                                  # current text, initialised with
                                                  # the seed text
    preds = model.predict(x_pred, verbose=0)[0]   # probability per character
    next_index = sample(preds, diversity)         # draw one index
    next_char = indices_char[next_index]
    generated += next_char
Generate text
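sample() is not defined on the slides; a common implementation is temperature sampling, as in the standard Keras text-generation example (reproduced here as an assumption about what the talk used):

    import numpy as np

    def sample(preds, temperature=1.0):
        # Reweight the predicted distribution: low temperature -> greedier,
        # high temperature -> more random; then draw one index from it
        preds = np.asarray(preds).astype('float64')
        preds = np.log(preds) / temperature
        exp_preds = np.exp(preds)
        preds = exp_preds / np.sum(exp_preds)
        probas = np.random.multinomial(1, preds, 1)
        return np.argmax(probas)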
Sample Output
are the glories it included.
Now am I lrA to r ,d?ot praki ynhh
kpHu ndst -h ahh
umk,hrfheleuloluprffuamdaedospe
aeooasak sh frxpaphrNumlpAryoaho (…)
76
Seed text, then output after 1 epoch
I go from thee:
Bear me forthwitht wh, t
che f uf ld,hhorfAs c c ff.h
scfylhle, rigrya p s lee
rmoy, tofhryg dd?ofr hl t y
ftrhoodfe- r Py (…)
After ~5 epochs
a wild-goose flies,
Unclaim'd of any manwecddeelc uavekeMw
gh whacelcwiiaeh xcacwiDac w
fioarw ewoc h feicucra
h,h, :ewh utiqitilweWy ha.h pc'hr,
lagfh
eIwislw ofiridete w
laecheefb .ics,aicpaweteh fiw?egp t? (…)
After 20+ epochs
Tuning
• More layers?
• More hidden nodes, or fewer?
• More data?
• A combination?
Wyr feirm hat. meancucd kreukk?
, foremee shiciarplle. My,
Bnyivlaunef sough bus:
Wad vomietlhas nteos thun. lore
orain, Ty thee I Boe,
I rue. niat
After 1 epoch
to Dover, where inshipp'd
Commit them to plean me than stand and the
woul came the wife marn to the groat pery me
Which that the senvose in the sen in the poor
The death is and the calperits the should
Much later
FINAL REMARKS
A Couple of Tips
• You’ll need a GPU
• Develop locally on a very small dataset, then run on the cloud on real data
• At least 1M characters in input, at least 20 epochs for training
• model.save() !!! (see below)
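Saving means you can reload the trained model later without repeating hours of training; a minimal usage sketch:

    from keras.models import load_model

    # Reload a previously saved model and keep generating text
    model = load_model('char_model.h5')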
Summary
• Natural Language Generation is fun
• Simple models vs. Neural Networks
• Keras makes your life easier
• A lot of trial-and-error!
THANK YOU
@MarcoBonzanini
speakerdeck.com/marcobonzanini
GitHub.com/bonzanini
marcobonzanini.com
Readings & Credits
• Brandon Rohrer on "Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)":
https://www.youtube.com/watch?v=WCUNPb-5EYI
• Chris Olah on Understanding LSTM Networks:
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Andrej Karpathy on "The Unreasonable Effectiveness of Recurrent Neural Networks":
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Pics:
• Weather forecast icon: https://commons.wikimedia.org/wiki/File:Newspaper_weather_forecast_-_today_and_tomorrow.svg
• Stack of papers icon: https://commons.wikimedia.org/wiki/File:Stack_of_papers_tied.svg
• Document icon: https://commons.wikimedia.org/wiki/File:Document_icon_(the_Noun_Project_27904).svg
• News icon: https://commons.wikimedia.org/wiki/File:PICOL_icon_News.svg
• Cortana icon: https://upload.wikimedia.org/wikipedia/commons/thumb/8/89/Microsoft_Cortana_light.svg/1024px-Microsoft_Cortana_light.svg.png
• Siri icon: https://commons.wikimedia.org/wiki/File:Siri_icon.svg
• Google assistant icon: https://commons.wikimedia.org/wiki/File:Google_mic.svg