
An Introduction to Natural Language Generation in Python

Marco Bonzanini
September 27, 2018

Presented at the London Python meet-up, September 2018:
https://www.meetup.com/LondonPython/events/254408773/

Title:
Let the AI Do the Talk: Adventures with Natural Language Generation

Abstract:
Recent advances in Artificial Intelligence have shown how computers can compete with humans in a variety of mundane tasks, but what happens when creativity is required?

This talk introduces the concept of Natural Language Generation, the task of automatically generating text: for example, articles on a particular topic, poems that follow a particular style, or speech transcripts that express some attitude. Specifically, we'll discuss the case for Recurrent Neural Networks, a family of algorithms that can be trained on sequential data, and how they improve on traditional language models.

The talk is aimed at beginners: we'll focus more on the intuitions behind the algorithms and their practical implications, and less on the mathematical details. Practical examples with Python will showcase Keras, a library to quickly prototype deep learning architectures.


Transcript

  1. Let the AI do the Talk
    Adventures with Natural Language Generation
    @MarcoBonzanini
    London Python meet-up // September 2018


  2. • Sept 2016: Intro to NLP
    • Sept 2017: Intro to Word Embeddings
    • Sept 2018: Intro to NLG
    • Sept 2019: ???


  3. NATURAL LANGUAGE
    GENERATION


  4. Natural Language Processing
    5


  5. Natural Language Processing
    • Natural Language Understanding
    • Natural Language Generation
    6


  6. Natural Language Generation
    7


  7. Natural Language Generation:
    the task of generating Natural Language
    from a machine representation
    8


  8. Applications of NLG
    9


  9. Applications of NLG
    10
    Summary Generation


  10. Applications of NLG
    11
    Weather Report Generation


  11. Applications of NLG
    12
    Automatic Journalism


  12. Applications of NLG
    13
    Virtual Assistants / Chatbots


  13. LANGUAGE MODELLING


  14. Language Model
    15


  15. Language Model
    A model that gives you
    the probability of
    a sequence of words
    16


  16. Language Model
    P(I’m going home)

    >

    P(Home I’m going)
    17


  17. Language Model
    P(I’m going home)

    >

    P(I’m going house)
    18
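
    To make the definition concrete (not spelled out on the slides): a language model usually factors a sentence one word at a time, e.g. P(I'm going home) = P(I'm) x P(going | I'm) x P(home | I'm, going); the n-gram models below approximate each factor from counts over a corpus.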


  18. Infinite Monkey Theorem
    https://en.wikipedia.org/wiki/Infinite_monkey_theorem
    19


  19. Infinite Monkey Theorem
    from random import choice
    from string import printable

    def monkey_hits_keyboard(n):
        output = [choice(printable) for _ in range(n)]
        print("The monkey typed:")
        print(''.join(output))
    20


  20. Infinite Monkey Theorem
    >>> monkey_hits_keyboard(30)
    The monkey typed:
    %
    a9AK^YKx OkVG)u3.cQ,31("!ac%
    >>> monkey_hits_keyboard(30)
    The monkey typed:
    fWE,ou)cxmV2IZ l}jSV'XxQ**9'|
    21


  21. n-grams
    Sequence of N items
    from a given sample of text
    23


  22. n-grams
    >>> from nltk import ngrams
    >>> list(ngrams("pizza", 3))
    24


  23. n-grams
    >>> from nltk import ngrams
    >>> list(ngrams("pizza", 3))
    [('p', 'i', 'z'), ('i', 'z', 'z'),
    ('z', 'z', 'a')]
    25


  24. n-grams
    >>> from nltk import ngrams
    >>> list(ngrams("pizza", 3))
    [('p', 'i', 'z'), ('i', 'z', 'z'),
    ('z', 'z', 'a')]
    character-based trigrams
    26


  25. n-grams
    >>> s = "The quick brown fox".split()
    >>> list(ngrams(s, 2))
    27


  26. n-grams
    >>> s = "The quick brown fox".split()
    >>> list(ngrams(s, 2))
    [('The', 'quick'), ('quick', 'brown'),
    ('brown', 'fox')]
    28


  27. n-grams
    >>> s = "The quick brown fox".split()
    >>> list(ngrams(s, 2))
    [('The', 'quick'), ('quick', 'brown'),
    ('brown', 'fox')]
    word-based bigrams
    29


  28. From n-grams to Language Model
    30


  29. From n-grams to Language Model
    • Given a large dataset of text
    • Find all the n-grams
    • Compute probabilities, e.g. for bigrams:
      P(w2 | w1) = count(w1 w2) / count(w1)
    31
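
    A minimal sketch of that recipe (not on the slides; the toy corpus and names such as bigram_counts are made up for illustration):

    from collections import Counter
    from nltk import ngrams

    corpus = "the quick brown fox jumps over the lazy dog . the quick brown cat sleeps ."
    tokens = corpus.split()

    # count every word bigram and every single word in the corpus
    bigram_counts = Counter(ngrams(tokens, 2))
    unigram_counts = Counter(tokens)

    # P(w2 | w1) = count(w1 w2) / count(w1)
    def bigram_probability(w1, w2):
        return bigram_counts[(w1, w2)] / unigram_counts[w1]

    print(bigram_probability('the', 'quick'))  # 2/3: 'the' occurs 3 times, 'the quick' twice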


  30. Example: Predictive Text in Mobile
    32


  31. Example: Predictive Text in Mobile
    33


  32. Example: Predictive Text in Mobile
    [screenshot: the keyboard suggests the most likely next word]
    34


  33. Marco is …
    35
    Example: Predictive Text in Mobile


  34. Marco is a good time to
    get the latest flash player
    is required for video
    playback is unavailable
    right now because this
    video is not sure if you
    have a great day.
    36
    Example: Predictive Text in Mobile
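
    The rambling text above is what you get by repeatedly picking the most likely next word: each choice only looks at the previous word, so the sentence drifts from phrase to phrase. A toy version, assuming the bigram_counts built in the sketch after the language-model recipe:

    # bigram_counts: the Counter of word bigrams from the earlier sketch

    def most_likely_next(word):
        # among all bigrams starting with `word`, pick the most frequent continuation
        candidates = {w2: c for (w1, w2), c in bigram_counts.items() if w1 == word}
        return max(candidates, key=candidates.get) if candidates else None

    def greedy_generate(seed, length=10):
        words = [seed]
        for _ in range(length):
            nxt = most_likely_next(words[-1])
            if nxt is None:
                break
            words.append(nxt)
        return ' '.join(words)

    print(greedy_generate('the'))  # quickly starts looping/drifting, like the phone keyboard demo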


  35. Limitations of LM so far
    37


  36. Limitations of LM so far
    • P(word | full history) is too expensive
    • P(word | previous few words) is feasible
    • … Local context only! Lack of global context
    38


  37. QUICK INTRO TO
    NEURAL NETWORKS


  38. Neural Networks
    40


  39. Neural Networks
    41
    [diagram: a small feed-forward network, inputs x1 and x2 feed hidden units h1, h2, h3, which feed the output y1]


  40. Neural Networks
    42
    [same network with its layers labelled: input layer (x1, x2), hidden layer(s) (h1, h2, h3), output layer (y1)]


  41. Neurone Example
    43


  42. Neurone Example
    44
    [diagram: inputs x1 and x2, with weights w1 and w2, feed a single neurone whose output is still marked ?]


  43. Neurone Example
    45
    [same diagram, now with the output written out: F(w1x1 + w2x2), where F is the activation function]
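
    Not on the slides, but in code a neurone is just a weighted sum passed through an activation function F; a minimal sketch with a sigmoid and made-up weights:

    import math

    def sigmoid(z):
        return 1 / (1 + math.exp(-z))

    def neurone(x1, x2, w1, w2, F=sigmoid):
        # F(w1*x1 + w2*x2), as on the slide
        return F(w1 * x1 + w2 * x2)

    print(neurone(x1=0.5, x2=1.0, w1=0.8, w2=-0.3))  # ~0.525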


  44. Training the Network
    46


  45. Training the Network
    47
    • Random weight init
    • Run input through the network
    • Compute error (loss function)
    • Use error to adjust weights (gradient descent + back-propagation)
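
    A toy illustration of that loop (not from the slides): one sigmoid neurone trained on a single example with squared error, plain gradient descent and a made-up learning rate:

    import math, random

    def sigmoid(z):
        return 1 / (1 + math.exp(-z))

    # random weight init
    w1, w2 = random.uniform(-1, 1), random.uniform(-1, 1)
    x1, x2, target = 0.5, 1.0, 1.0   # one training example
    lr = 0.1                         # learning rate

    for step in range(100):
        # forward pass
        y = sigmoid(w1 * x1 + w2 * x2)
        # gradient of the squared error w.r.t. each weight (chain rule = back-propagation)
        error = y - target
        grad = error * y * (1 - y)
        w1 -= lr * grad * x1
        w2 -= lr * grad * x2

    print(w1, w2, sigmoid(w1 * x1 + w2 * x2))  # the output moves towards the target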


  46. More on Training
    48


  47. More on Training
    • Batch size
    • Iterations and Epochs
    • e.g. 1,000 data points, if batch size = 100
    we need 10 iterations to complete 1 epoch
    49
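
    The same bookkeeping in code, using the numbers from the slide:

    import math

    n_samples, batch_size = 1000, 100
    iterations_per_epoch = math.ceil(n_samples / batch_size)
    print(iterations_per_epoch)  # 10 weight updates for one full pass over the data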


  48. RECURRENT NEURAL NETWORKS


  49. Limitation of FFNN
    51


  50. Limitation of FFNN
    52
    Input and output
    of fixed size


  51. Recurrent Neural Networks
    53


  52. Recurrent Neural Networks
    54
    http://colah.github.io/posts/2015-08-Understanding-LSTMs/


  53. Recurrent Neural Networks
    55
    http://colah.github.io/posts/2015-08-Understanding-LSTMs/


  54. Limitation of RNN
    56


  55. Limitation of RNN
    57
    “Vanishing gradient”
    Cannot “remember”
    what happened long ago


  56. Long Short Term Memory
    58


  57. Long Short Term Memory
    59
    http://colah.github.io/posts/2015-08-Understanding-LSTMs/


  58. 60
    https://en.wikipedia.org/wiki/Long_short-term_memory


  59. A BIT OF
    PRACTICE


  60. Deep Learning in Python
    62


  61. Deep Learning in Python
    • Some NN support in scikit-learn
    • Many low-level frameworks: Theano,
    PyTorch, TensorFlow
    • … Keras!
    • Probably more
    63


  62. Keras
    • Simple, high-level API
    • Uses TensorFlow, Theano or CNTK as backend
    • Runs seamlessly on GPU
    • Easier to start with
    65


  63. LSTM Example
    66


  64. LSTM Example
    model = Sequential()
    model.add(
        LSTM(
            128,
            input_shape=(maxlen, len(chars))
        )
    )
    model.add(Dense(len(chars), activation='softmax'))
    67
    Define the network
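
    The definition above assumes the corpus has already been cut into fixed-length character sequences. A sketch of that preparation, along the lines of the standard Keras text-generation example (the file name and the step size are placeholders), producing the chars, maxlen, indices_char, x and y used on these slides:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, Dense
    from keras.optimizers import RMSprop

    text = open('corpus.txt').read()              # placeholder: any plain-text corpus
    chars = sorted(set(text))                     # the character vocabulary
    char_indices = {c: i for i, c in enumerate(chars)}
    indices_char = {i: c for i, c in enumerate(chars)}

    maxlen, step = 40, 3                          # sequence length and stride
    sentences = [text[i:i + maxlen] for i in range(0, len(text) - maxlen, step)]
    next_chars = [text[i + maxlen] for i in range(0, len(text) - maxlen, step)]

    # one-hot encode the input sequences (x) and the character to predict (y)
    x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
    y = np.zeros((len(sentences), len(chars)), dtype=bool)
    for i, sentence in enumerate(sentences):
        for t, char in enumerate(sentence):
            x[i, t, char_indices[char]] = 1
        y[i, char_indices[next_chars[i]]] = 1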


  65. LSTM Example
    optimizer = RMSprop(lr=0.01)
    model.compile(
        loss='categorical_crossentropy',
        optimizer=optimizer
    )
    68
    Configure the network


  66. LSTM Example
    model.fit(x, y,
              batch_size=128,
              epochs=60,
              callbacks=[print_callback])

    model.save('char_model.h5')
    69
    Train the network
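
    print_callback is not defined on the slides; in the standard Keras text-generation example it is a LambdaCallback that prints some generated text at the end of every epoch, roughly:

    from keras.callbacks import LambdaCallback

    def on_epoch_end(epoch, logs):
        # generate and print a short sample here, so you can watch the model improve
        print('--- Generating text after epoch %d ---' % epoch)

    print_callback = LambdaCallback(on_epoch_end=on_epoch_end)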


  67. LSTM Example
    for i in range(output_size):
        ...
        preds = model.predict(x_pred, verbose=0)[0]
        next_index = sample(preds, diversity)
        next_char = indices_char[next_index]
        generated += next_char
    70
    Generate text


  68. LSTM Example
    for i in range(output_size):
        ...
        preds = model.predict(x_pred, verbose=0)[0]
        next_index = sample(preds, diversity)
        next_char = indices_char[next_index]
        generated += next_char
    71
    Seed text
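
    The sample() helper is not shown either; the usual version (again from the standard Keras example, with a small epsilon added to avoid log(0)) re-weights the predicted distribution by a temperature, the diversity parameter above, and draws one character index:

    import numpy as np

    def sample(preds, temperature=1.0):
        # higher temperature => flatter distribution => more surprising characters
        preds = np.asarray(preds).astype('float64')
        preds = np.log(preds + 1e-8) / temperature
        exp_preds = np.exp(preds)
        preds = exp_preds / np.sum(exp_preds)
        probas = np.random.multinomial(1, preds, 1)
        return np.argmax(probas)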


  69. Sample Output
    72


  70. Sample Output
    are the glories it included.
    Now am I lrA to r ,d?ot praki ynhh
    kpHu ndst -h ahh
    umk,hrfheleuloluprffuamdaedospe
    aeooasak sh frxpaphrNumlpAryoaho (…)
    73
    Seed text After 1 epoch


  71. Sample Output
    I go from thee:
    Bear me forthwitht wh, t
    che f uf ld,hhorfAs c c ff.h
    scfylhle, rigrya p s lee
    rmoy, tofhryg dd?ofr hl t y
    ftrhoodfe- r Py (…)
    74
    After ~5 epochs


  72. Sample Output
    a wild-goose flies,
    Unclaim'd of any manwecddeelc uavekeMw
    gh whacelcwiiaeh xcacwiDac w
    fioarw ewoc h feicucra
    h,h, :ewh utiqitilweWy ha.h pc'hr,
    lagfh
    eIwislw ofiridete w
    laecheefb .ics,aicpaweteh fiw?egp t? (…)
    75
    After 20+ epochs


  73. Tuning
    • More layers?
    • More hidden nodes, or fewer?
    • More data?
    • A combination?


  74. Wyr feirm hat. meancucd kreukk?
    , foremee shiciarplle. My,
    Bnyivlaunef sough bus:
    Wad vomietlhas nteos thun. lore
    orain, Ty thee I Boe,
    I rue. niat
    78
    Tuning
    After 1 epoch


  75. Second Lord:
    They would be ruled after this chamber, and
    my fair nues begun out of the fact, to be
    conveyed,
    Whose noble souls I'll have the heart of the
    wars.
    Clown:
    Come, sir, I will make did behold your worship.
    79
    Tuning
    Much later
    http://karpathy.github.io/2015/05/21/rnn-effectiveness/


  76. FINAL REMARKS


  77. A Couple of Tips


  78. A Couple of Tips
    • You’ll need a GPU
    • Develop locally on a very small dataset, then run in the cloud on the real data
    • At least 1M characters of input, and at least 20 epochs of training
    • model.save() !!!


  79. Summary
    • Natural Language Generation is fun
    • Simple models vs. Neural Networks
    • Keras makes your life easier
    • A lot of trial-and-error!


  80. THANK YOU
    @MarcoBonzanini
    speakerdeck.com/marcobonzanini
    GitHub.com/bonzanini
    marcobonzanini.com


  81. Readings & Credits
    • Brandon Rohrer on "Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)":
      https://www.youtube.com/watch?v=WCUNPb-5EYI
    • Chris Olah on "Understanding LSTM Networks":
      http://colah.github.io/posts/2015-08-Understanding-LSTMs/
    • Andrej Karpathy on "The Unreasonable Effectiveness of Recurrent Neural Networks":
      http://karpathy.github.io/2015/05/21/rnn-effectiveness/
    Pics:
    • Weather forecast icon: https://commons.wikimedia.org/wiki/File:Newspaper_weather_forecast_-_today_and_tomorrow.svg
    • Stack of papers icon: https://commons.wikimedia.org/wiki/File:Stack_of_papers_tied.svg
    • Document icon: https://commons.wikimedia.org/wiki/File:Document_icon_(the_Noun_Project_27904).svg
    • News icon: https://commons.wikimedia.org/wiki/File:PICOL_icon_News.svg
    • Cortana icon: https://upload.wikimedia.org/wikipedia/commons/thumb/8/89/Microsoft_Cortana_light.svg/1024px-Microsoft_Cortana_light.svg.png
    • Siri icon: https://commons.wikimedia.org/wiki/File:Siri_icon.svg
    • Google assistant icon: https://commons.wikimedia.org/wiki/File:Google_mic.svg
