[Paper Reading] Hierarchical Neural Story Generation

Hierarchical Neural Story Generation
Angela Fan, Mike Lewis, Yann Dauphin - Facebook AI @ACL2018
Presented by Van Phu Quang Huy, 2018/08/19
Paper Reading Festival Summer 2018

Overview (1/2)
Task: Story Generation - a creative system that can build coherent and fluent passages of text about a topic
Example of a story: A light breeze swept the ground, and carried with it still the distant scents of dust and time-worn stone. The Warrior led the way, heaving her mass of armour and muscle over the uneven terrain. She soon crested the last of the low embankments, which still bore the unmistakable fingerprints of haste and fear. She lifted herself up onto the top the rise, and looked out at the scene before her. [...]

Overview (2/2)
Challenges:
• Must remain thematically consistent across the complete document, which requires modeling very long-range dependencies
• Requires creativity and a high-level plot
Ideas:
• First generate a sentence, called the prompt, describing the topic of the story
• Then generate the story conditioned on the prompt

Contributions (1/2)
• Build a dataset of prompts and stories collected from Reddit's WRITINGPROMPTS forum
• Introduce a fusion mechanism (a seq2seq model trained on top of a pretrained seq2seq model) to improve the relevance of the generated story to its prompt

Contributions (2/2)
• Introduce a gated self-attention mechanism on top of a convolutional architecture to improve the efficiency of modeling long documents
• Introduce new evaluation metrics for story generation

Writing Prompts Dataset
• Reddit's WRITINGPROMPTS forum (www.reddit.com/r/WritingPrompts/)
  • Users write story premises or prompts, other users respond
  • Each prompt can have multiple story responses
• Collecting data and preprocessing
  • Scraped 3 years of data
  • Removed automated bot posts, deleted posts, announcements, short stories
  • Used NLTK for tokenization

Approach: Hierarchical Generation

1. Prompt Generation
• Use a convolutional language model from Dauphin et al. (2017)
Right figure: illustrates a Gated Convolutional Neural Network (GCNN) using Gated Linear Units (GLU)
• Why a CNN instead of an RNN? Because a CNN can process all positions of a sequence in parallel, whereas an RNN must step through them one at a time
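To make the gating idea concrete, here is a minimal sketch of one gated convolutional layer on a sequence of scalars. It follows the GLU form from Dauphin et al. (2017), where one convolution output is multiplied by the sigmoid of another. The function names, the scalar inputs, and the toy kernels are assumptions made for illustration, not the model's actual code or parameters.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def glu(a, b):
    """Gated linear unit: elementwise a * sigmoid(b)."""
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


def causal_conv1d(xs, kernel):
    """Causal 1-D convolution: the output at step t only sees inputs at steps <= t."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(xs)  # left zero-padding keeps the layer causal
    return [
        sum(w * padded[t + j] for j, w in enumerate(kernel))
        for t in range(len(xs))
    ]


def gated_conv_layer(xs, w_kernel, v_kernel):
    """One convolution produces candidate values, a second one produces the gates."""
    return glu(causal_conv1d(xs, w_kernel), causal_conv1d(xs, v_kernel))


# Hypothetical toy kernels: the value kernel copies the current input,
# the gate kernel is all zeros, so every gate equals sigmoid(0) = 0.5.
print(gated_conv_layer([1.0, 2.0, 3.0], [0.0, 1.0], [0.0, 0.0]))  # [0.5, 1.0, 1.5]
```

Every output position is computed independently of the other outputs, which is what makes the layer parallelizable across time steps.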

2. Story Generation Conditioned on the Prompt
• Based on the convolutional seq2seq model of Gehring et al. (2017) (right figure)
• Improved with:
  • Gated Multi-Scale Self-attention
  • Model Fusion

Modeling Unbounded Context with Gated Multi-Scale Self-attention
Improves on the self-attention mechanism with:
• Multi-scale attention
• Gated attention

Reference: Attention (from Vaswani et al. (2017))
The attention function can be described as mapping a query and a set of key-value pairs to an output:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$
• $Q$: query, $K$: key, $V$: value, $1/\sqrt{d_k}$: scaling factor
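The mapping from a query and key-value pairs to an output can be sketched for a single query vector as follows. This is a plain-Python illustration of scaled dot-product attention as defined in Vaswani et al. (2017). The function names and toy vectors are chosen here for illustration.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query, keys, values):
    """Return the attention output and weights for one query vector."""
    d_k = len(query)
    # Similarity of the query to each key, scaled by 1/sqrt(d_k).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
        for key in keys
    ]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


output, weights = scaled_dot_product_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
print(weights, output)
```

In self-attention, the queries, keys, and values all come from the same sequence, so each position computes a weighted average over the other positions.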

Reference: Self-attention
Example: self-attention distribution for the word “it” (Image from https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html)

Gated Attention
Use multi-head attention similar to Vaswani et al. (2017) (right figure) to allow each head to attend to information at different positions. However, the queries, keys, and values are not given by linear projections but by gated deep neural nets with Gated Linear Units (GLU).

Multi-scale Attention
• Each head uses a separate downsampling function in order to attend to different information

• $h^{L}_{0:t}$: contains the hidden states up to time $t$ at layer $L$
• $q$, $k$, $v$: gated downsampling networks ($q$: query, $k$: key, $v$: value)
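One way to picture what "a separate downsampling function per head" buys is the toy sketch below. It replaces the learned gated downsampling networks with fixed strided subsampling, which is an assumption made purely for illustration. The function names and the rule "head i uses stride i" are hypothetical.

```python
def downsample(states, stride):
    """Keep every `stride`-th state, counting back from the most recent one."""
    return states[::-1][::stride][::-1]


def multi_scale_views(states, num_heads):
    """Head i (1-based) sees the history subsampled with stride i."""
    return [downsample(states, head) for head in range(1, num_heads + 1)]


# Time steps 0..7 stand in for hidden states up to time t = 7.
for head, view in enumerate(multi_scale_views(list(range(8)), 3), start=1):
    print(head, view)
# 1 [0, 1, 2, 3, 4, 5, 6, 7]
# 2 [1, 3, 5, 7]
# 3 [1, 4, 7]
```

Heads with a larger stride cover the same time span with fewer states, so attending over a long history is cheaper for them, while the stride-1 head keeps the fine-grained view.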

Improving Relevance to Input Prompt with Model Fusion
• Train a seq2seq model that has access to the hidden states of a pretrained seq2seq model. This can be seen as a type of boosting or residual learning that allows the second model to focus on what the first model failed to learn, such as conditioning on the prompt.

Generation
Use top-k random sampling instead of beam search
• At each timestep, compute the probability of each word, then randomly sample from the k most likely candidates in this distribution
• (Beam search tends to produce common phrases and repetitive text from the training set)
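The sampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the model is abstracted as a list of per-word scores (logits), `top_k_sample` is a name chosen here, and `k` is left as a free parameter.

```python
import math
import random


def top_k_sample(logits, k, rng=random):
    """Sample one word index from the k highest-scoring candidates."""
    # Indices of the k most likely words.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights restricted to those k candidates.
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only to make the demo repeatable
logits = [2.0, 1.0, 0.5, -1.0, 3.0]  # hypothetical scores over a 5-word vocabulary
print([top_k_sample(logits, k=2, rng=rng) for _ in range(10)])  # only indices 4 and 0 can appear
```

With `k=1` this reduces to greedy decoding. Larger `k` adds variety while still excluding the low-probability tail of the vocabulary.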

Experimental Setup
Baselines
1. Gated CNN language model w/ and w/o self-attention
2. seq2seq: LSTMs, Conv seq2seq, Conv seq2seq w/ decoder self-attention
3. Ensemble of 2 Conv seq2seq w/ self-attention
4. KNN: finds the closest prompt in the training set for each prompt in the test set
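As a rough picture of the KNN baseline, the sketch below retrieves the story attached to the most similar training prompt. The slide does not say how prompts are represented or compared, so the word-count vectors, the cosine similarity, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_story(test_prompt, train_pairs):
    """Return the story whose training prompt is closest to the test prompt."""
    query = Counter(test_prompt.lower().split())
    best = max(
        train_pairs,
        key=lambda pair: cosine(query, Counter(pair[0].lower().split())),
    )
    return best[1]


# Hypothetical (prompt, story) training pairs.
train_pairs = [
    ("aliens abduct humans at night", "story A"),
    ("the scientists discover a virus", "story B"),
]
print(nearest_story("Aliens start abducting humans", train_pairs))  # story A
```

A retrieval baseline like this always returns fluent human-written text, so it mainly tests how well the returned story matches the prompt.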

Evaluation
• Automatic evaluation
  • Perplexity
  • Prompt ranking accuracy
• Human evaluation
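The two automatic metrics can be sketched as below. Perplexity is the standard exponential of the average negative log-probability per token. The slide does not spell out the protocol for prompt ranking accuracy, so the version here is an assumption: each story is scored under its true prompt and several distractor prompts, and the metric is the fraction of stories for which the true prompt scores highest. All names and numbers are illustrative.

```python
import math


def perplexity(token_log_probs):
    """exp of the average negative natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def prompt_ranking_accuracy(score_rows):
    """Each row holds one story's scores under several prompts.

    Index 0 is assumed to be the score under the true prompt,
    and the remaining entries are scores under distractor prompts.
    """
    hits = sum(1 for row in score_rows if all(row[0] > s for s in row[1:]))
    return hits / len(score_rows)


# A model that gives every token probability 1/4 has perplexity 4.
print(perplexity([math.log(0.25)] * 4))

# Hypothetical log-likelihoods: the true prompt wins for the first story only.
print(prompt_ranking_accuracy([[-1.0, -2.0, -3.0], [-5.0, -4.0, -6.0]]))  # 0.5
```

Lower perplexity means the model assigns higher probability to held-out text, while higher ranking accuracy means the generated text depends more strongly on the prompt.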

Results

Generated by proposed model (1/2)
Prompt: Aliens start abducting humans
Generated story: It has been two weeks, and the last of my kind has gone. It is only a matter of time until there will be nothing left. I’m not sure what the hell is going on... I can’t think. I can hear a distant scream. I think of a strange, alien sound. I try to speak, but am interrupted by something, something that feels like a drum, I ca not tell. I mean I’m just a little bit older than an average human. But I can, and I can feel the vibrations . I hear the sound of a ship approaching. The ground quakes at the force of the impact, and a loud explosion shatters the silence.

Generated by proposed model (2/2)
Prompt: The scientists have discovered something terrible.
Generated story: The scientist stood there, a little dazed as he stared. “What is it?” He asked. “This...this...Thing...This is a virus. A chemical that can destroy entire planet and it is a very small, complex, chemical that could destroy any planet.” The scientist replied. His lab assistant looked down at the tablet. “I’ve just discovered it. I can’t believe it. It looks like it’s made of some sort of chemical that’s very dangerous.” “ Well, there’s a virus on the ground. It’s very effective...” “ I can’t believe what it is.” he said, pointing to the scientist . “ We don’t know what this thing is. We haven’t seen anything like it . We can’t even see anything like this. ” Dr. Jones stared at the scientist for a moment. “What do you mean what does it do ?” “It...It ’s a monster.”

Generated by baseline model (1/2)
I’ve always been the curious type. Growing up in the ghetto, I’ve been around for more than a year now. I still haven’t heard it on my own. I’m not sure if I’m a good person. But I’m not the only one. I’ve been to the hospital, but it’s not my first time. I’ve been to my psychiatrist’s journals before. “Hey, are you ... are you okay?” I asked. “Yes” I said. “What’s wrong?” she replied, “I was just ... Doing something.” She said, “I’ve never been a bad person.”

Generated by baseline model (2/2)
The man was an accountant. He had to be. He had to be the next president. I looked back over the top and saw that his wife was crying in the kitchen. I looked at the clock. It seemed to be coming slower, but I knew if I did it would not be long before I was in my own home. I wasn’t sure. I had a hard time finding the right words to say. I was about to leave when he suddenly became angry and began talking to me. “Hello, sir, I’m John. What is your name?” “My name is Manuel and I’m a journalist.” I said

Perplexity (1/2)

Perplexity (2/2)

Prompt ranking

Human evaluation (1/2)

Human evaluation (2/2)

Fusion model evaluation

Discussion
• Compared to the baseline model, the proposed model is capable of generating unique text without copying directly from the training set
• Limitations:
  • Random sampling can produce errors: e.g. can't is tokenized to ca and n't, and the model occasionally produces the first token but misses the second
  • Repetition: the model generates similar text multiple times because it focuses frequently on what it has recently produced
  • In generation of prompts: generated prompts are fairly generic compared to human prompts, e.g. many prompts start with "the man"

Conclusion
• New large-scale dataset for hierarchical story generation
• Evaluation metrics for story writing
• Models to improve generation coherence and relationship with the desired premise
• Data + code: github.com/pytorch/fairseq

Thank you!
