Slide 1

Hierarchical Neural Story Generation
Angela Fan, Mike Lewis, Yann Dauphin - Facebook AI @ ACL 2018
Presented by Van Phu Quang Huy, 2018/08/19
Paper Reading Festival Summer 2018

Slide 2

Overview (1/2)
Task: Story Generation - a creative system that can build coherent and fluent passages of text about a topic.
Example of a story:
A light breeze swept the ground, and carried with it still the distant scents of dust and time-worn stone. The Warrior led the way, heaving her mass of armour and muscle over the uneven terrain. She soon crested the last of the low embankments, which still bore the unmistakable fingerprints of haste and fear. She lifted herself up onto the top of the rise, and looked out at the scene before her. [...]

Slide 3

Overview (2/2)
Challenges:
• Must remain thematically consistent across the complete document, which requires modeling very long-range dependencies
• Requires creativity and a high-level plot
Ideas:
• First generate a sentence, called the prompt, that describes the topic of the story
• Then generate the story conditioned on the prompt

Slide 4

Contributions (1/2)
• Build a dataset (prompts paired with stories) collected from Reddit's WRITINGPROMPTS forum
• Introduce a fusion mechanism (a seq2seq model trained on top of a pretrained seq2seq model) to improve the relevance of the generated story to its prompt

Slide 5

Contributions (2/2)
• Introduce a gated self-attention mechanism on top of a convolutional architecture to improve the efficiency of modeling long documents
• Introduce new evaluation metrics for story generation

Slide 6

Writing Prompts Dataset
• Reddit's WRITINGPROMPTS forum (www.reddit.com/r/WritingPrompts/)
• Users write story premises, or prompts; other users respond with stories
• Each prompt can have multiple story responses
• Collecting data and preprocessing:
  • Scraped three years of data
  • Removed automated bot posts, deleted posts, announcements, and overly short stories
  • Used NLTK for tokenization

Slide 7

Approach: Hierarchical Generation

Slide 8

1. Prompt Generation
• Use the convolutional language model of Dauphin et al. (2017)
• Right figure: illustrates a Gated Convolutional Neural Network (GCNN) using Gated Linear Units (GLUs)
• Why a CNN instead of an RNN? Because a CNN allows parallelization across time steps
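As a rough illustration, here is a minimal sketch of one gated convolutional block with a GLU in PyTorch; the layer sizes and depth are made up here, and the paper's fairseq implementation differs in detail:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvBlock(nn.Module):
    """One GCNN block: a causal 1-D convolution followed by a GLU.

    The convolution emits 2*d channels; the GLU splits them into
    (content, gate) halves and computes content * sigmoid(gate).
    """
    def __init__(self, d_model: int, kernel_size: int = 4):
        super().__init__()
        self.pad = kernel_size - 1  # left-pad so position t sees only tokens <= t
        self.conv = nn.Conv1d(d_model, 2 * d_model, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x.transpose(1, 2)              # (batch, time, d) -> (batch, d, time)
        h = self.conv(F.pad(h, (self.pad, 0)))
        h = F.glu(h, dim=1)                # gated linear unit over channels
        return h.transpose(1, 2) + x       # residual connection

x = torch.randn(2, 16, 128)                # (batch, time, d_model)
print(GatedConvBlock(128)(x).shape)        # torch.Size([2, 16, 128])
```

Because every convolution over the sequence is independent of the previous timestep's output, all positions can be computed in parallel during training, unlike an RNN.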

Slide 9

2. Story Generation Conditioned on the Prompt
• Based on the convolutional seq2seq model of Gehring et al. (2017) (right figure)
• Improved with:
  • Gated multi-scale self-attention
  • Model fusion

Slide 10

Modeling Unbounded Context with Gated Multi-Scale Self-attention
Improves on the standard self-attention mechanism with:
• Multi-scale attention
• Gated attention

Slide 11

Reference: Attention (from Vaswani et al. (2017))
An attention function can be described as mapping a query and a set of key-value pairs to an output:
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$
• $Q$: query, $K$: key, $V$: value, $\sqrt{d_k}$: scaling factor
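The formula above translates directly into a few lines of PyTorch; the optional mask argument is an addition here for causal (decoder-side) use:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:                        # True marks positions to hide
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```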

Slide 12

Reference: Self-attention
Example: the self-attention distribution for the word “it” (image from https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html)

Slide 13

Gated Attention
• Use multi-head attention similar to Vaswani et al. (2017) (right figure) to allow each head to attend to information at different positions
• However, queries, keys, and values are produced not by linear projections but by gated deep networks built from Gated Linear Units (GLUs)
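A minimal sketch of that substitution for a single head, assuming one GLU layer per projection (the depth and sizes of the gated networks in the paper may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GLUProjection(nn.Module):
    """A gated stand-in for the usual linear Q/K/V projection."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, 2 * d_model)  # content and gate halves

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.glu(self.proj(x), dim=-1)           # content * sigmoid(gate)

d = 128
to_q, to_k, to_v = GLUProjection(d), GLUProjection(d), GLUProjection(d)
h = torch.randn(2, 16, d)               # decoder hidden states
q, k, v = to_q(h), to_k(h), to_v(h)     # fed into attention as usual
```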

Slide 14

Multi-scale Attention
• Each head uses a separate downsampling function so that different heads attend to information at different scales

Slide 15

• $h^l_t$: contains the hidden states up to time $t$ at layer $l$
• $f_q, f_k, f_v$: gated downsampling networks producing the query $q_t = f_q(h^l_t)$, key $k_t = f_k(h^l_t)$, and value $v_t = f_v(h^l_t)$
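A sketch of one multi-scale head under these definitions; for brevity, strided slicing stands in for the gated downsampling networks $f_q, f_k, f_v$, and per-head output projections are omitted:

```python
import torch

def downsample(h, stride):
    # Strided slicing as a simple stand-in for the paper's
    # gated downsampling networks.
    return h[:, ::stride, :]

def multi_scale_head(h, stride):
    # h: (batch, time, d) decoder hidden states up to the current step
    q = h[:, -1:, :]                    # query from the newest state
    k = v = downsample(h, stride)       # this head sees one time scale
    scores = q @ k.transpose(-2, -1) / (h.size(-1) ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

h = torch.randn(2, 64, 128)
heads = [multi_scale_head(h, s) for s in (1, 2, 4, 8)]  # one scale per head
out = torch.cat(heads, dim=-1)          # concatenated head outputs
```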

Slide 16

Improving Relevance to the Input Prompt with Model Fusion
• Train a seq2seq model that has access to the hidden states of a pretrained seq2seq model
• Can be seen as a type of boosting or residual learning that allows the second model to focus on what the first model failed to learn, such as conditioning on the prompt
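One way to picture the mechanism, as a simplified sketch: a learned gate mixes the frozen pretrained model's hidden states with the trainable model's hidden states before the output projection (the gate placement and sizes here are assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    """Combine hidden states of a frozen pretrained seq2seq (h_pre)
    with those of the seq2seq being trained (h_new)."""
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, 2 * d_model)
        self.out = nn.Linear(2 * d_model, vocab_size)

    def forward(self, h_pre: torch.Tensor, h_new: torch.Tensor) -> torch.Tensor:
        h = torch.cat([h_pre, h_new], dim=-1)
        g = torch.sigmoid(self.gate(h))   # learned gate decides what to keep
        return self.out(g * h)            # logits over the vocabulary
```

During training, only the second seq2seq and the fusion layer are updated; the pretrained model's parameters stay fixed.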

Slide 17

Generation
Use top-k random sampling instead of beam search:
• At each timestep, compute a probability distribution over the vocabulary, then randomly sample from the k most likely candidates of this distribution
• (Beam search tends to produce common phrases and repetitive text from the training set)
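Top-k sampling takes only a few lines in PyTorch (the paper samples from the 10 most likely candidates, i.e. k = 10):

```python
import torch

def top_k_sample(logits: torch.Tensor, k: int = 10) -> torch.Tensor:
    """Sample the next token id from the k most likely candidates."""
    top_logits, top_idx = logits.topk(k, dim=-1)
    probs = torch.softmax(top_logits, dim=-1)   # renormalize over the top k
    choice = torch.multinomial(probs, num_samples=1)
    return top_idx.gather(-1, choice)           # map back to vocabulary ids

logits = torch.randn(1, 50000)                  # (batch, vocab_size)
next_token = top_k_sample(logits, k=10)
```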

Slide 18

Experimental Setup
Baselines:
1. Gated CNN language model, with and without self-attention
2. seq2seq: LSTM seq2seq, Conv seq2seq, Conv seq2seq with decoder self-attention
3. Ensemble of two Conv seq2seq models with self-attention
4. KNN: finds the closest prompt in the training set for each prompt in the test set (see the sketch below)
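A plausible implementation of the KNN baseline, assuming TF-IDF cosine similarity over prompts (the slide does not say how closeness is measured):

```python
# Hypothetical sketch: the similarity metric is an assumption.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def knn_baseline(train_prompts, train_stories, test_prompts):
    vec = TfidfVectorizer()
    train_vecs = vec.fit_transform(train_prompts)
    test_vecs = vec.transform(test_prompts)
    # For each test prompt, return the story of its nearest training prompt.
    nearest = cosine_similarity(test_vecs, train_vecs).argmax(axis=1)
    return [train_stories[i] for i in nearest]
```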

Slide 19

Evaluation
• Automatic evaluation
  • Perplexity
  • Prompt ranking accuracy
• Human evaluation

Slide 20

Results

Slide 21

Generated by proposed model (1/2)
Prompt: Aliens start abducting humans
Generated story: It has been two weeks, and the last of my kind has gone. It is only a matter of time until there will be nothing left. I’m not sure what the hell is going on... I can’t think. I can hear a distant scream. I think of a strange, alien sound. I try to speak, but am interrupted by something, something that feels like a drum, I ca not tell. I mean I’m just a little bit older than an average human. But I can, and I can feel the vibrations. I hear the sound of a ship approaching. The ground quakes at the force of the impact, and a loud explosion shatters the silence.

Slide 22

Generated by proposed model (2/2)
Prompt: The scientists have discovered something terrible.
Generated story: The scientist stood there, a little dazed as he stared. “What is it?” He asked. “This...this...Thing...This is a virus. A chemical that can destroy entire planet and it is a very small, complex, chemical that could destroy any planet.” The scientist replied. His lab assistant looked down at the tablet. “I’ve just discovered it. I can’t believe it. It looks like it’s made of some sort of chemical that’s very dangerous.” “Well, there’s a virus on the ground. It’s very effective...” “I can’t believe what it is.” he said, pointing to the scientist. “We don’t know what this thing is. We haven’t seen anything like it. We can’t even see anything like this.” Dr. Jones stared at the scientist for a moment. “What do you mean what does it do?” “It...It’s a monster.”

Slide 23

Generated by baseline model (1/2)
I’ve always been the curious type. Growing up in the ghetto, I’ve been around for more than a year now. I still haven’t heard it on my own. I’m not sure if I’m a good person. But I’m not the only one. I’ve been to the hospital, but it’s not my first time. I’ve been to my psychiatrist’s journals before. “Hey, are you ... are you okay?” I asked. “Yes” I said. “What’s wrong?” she replied, “I was just ... Doing something.” She said, “I’ve never been a bad person.”

Slide 24

Generated by baseline model (2/2)
The man was an accountant. He had to be. He had to be the next president. I looked back over the top and saw that his wife was crying in the kitchen. I looked at the clock. It seemed to be coming slower, but I knew if I did it would not be long before I was in my own home. I wasn’t sure. I had a hard time finding the right words to say. I was about to leave when he suddenly became angry and began talking to me. “Hello, sir, I’m John. What is your name?” “My name is Manuel and I’m a journalist.” I said

Slide 25

Perplexity (1/2)

Slide 26

Perplexity (2/2)

Slide 27

Prompt ranking

Slide 28

Human evaluation (1/2)

Slide 29

Human evaluation (2/2)

Slide 30

Fusion model evaluation

Slide 31

Discussion
• The proposed model can generate unique text without copying directly from the training set, unlike the baseline model
• Limitations:
  • Random sampling can produce errors: e.g., can't is tokenized into ca and n't, and the model occasionally produces the first token but misses the second
  • Repetition: the model generates similar text multiple times because it frequently attends to what it has recently produced
  • Generated prompts are fairly generic compared to human-written prompts, e.g. many start with the man

Slide 32

Conclusion
• New large-scale dataset for hierarchical story generation
• Evaluation metrics for story writing
• Models that improve generation coherence and the relationship with the desired premise
• Data + code: github.com/pytorch/fairseq

Slide 33

Thank you!