[Paper Reading] Hierarchical Neural Story Generation

Huy Van
August 21, 2018

Notes from my reading of the ACL 2018 paper: "Hierarchical Neural Story Generation"


Transcript

  1. Hierarchical Neural Story Generation - Angela Fan, Mike Lewis, Yann Dauphin - Facebook AI @ACL2018
     Presented by Van Phu Quang Huy, 2018/08/19, Paper Reading Festival Summer 2018
  2. Overview (1/2)
     Task: Story Generation - a creative system that can build coherent and fluent passages of text about a topic.
     Example of a story: A light breeze swept the ground, and carried with it still the distant scents of dust and time-worn stone. The Warrior led the way, heaving her mass of armour and muscle over the uneven terrain. She soon crested the last of the low embankments, which still bore the unmistakable fingerprints of haste and fear. She lifted herself up onto the top the rise, and looked out at the scene before her. [...]
  3. Overview (2/2)
     Challenges:
     • Must remain thematically consistent across the complete document, which requires modeling very long-range dependencies
     • Requires creativity and a high-level plot
     Ideas:
     • First generate a sentence, called the prompt, describing the topic for the story
     • Then generate the story conditioned on the prompt
  4. Contributions (1/2)
     • Build a dataset (including prompts and stories) by collecting from Reddit's WRITINGPROMPTS forum
     • Introduce a fusion mechanism (a seq2seq model trained on top of a pretrained seq2seq model) to improve the relevance of the generated story to its prompt
  5. Contributions (2/2)
     • Introduce a gated self-attention mechanism on top of a convolutional architecture to improve the efficiency of modeling long documents
     • Introduce new evaluation metrics for story generation
  6. Writing Prompts Dataset
     • Reddit's WRITINGPROMPTS forum (www.reddit.com/r/WritingPrompts/)
     • Users write story premises, or prompts, and other users respond with stories
     • Each prompt can have multiple story responses
     • Collecting data and preprocessing:
       • Scraped 3 years of data
       • Removed automated bot posts, deleted posts, announcements, and short stories
       • Used NLTK for tokenization
  7. 1. Prompt Generation
     • Use a convolutional language model from Dauphin et al. (2017)
     • Right figure: illustrates a Gated Convolutional Neural Network (GCNN) using Gated Linear Units (GLU)
     • Why CNN instead of RNN? Because CNNs allow parallelization across timesteps
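The GLU at the heart of the GCNN can be sketched in a few lines. This is a minimal NumPy version that, for simplicity, applies the gating to a plain linear projection rather than to the output of a causal convolution as in the actual model; the weight names are illustrative:

```python
import numpy as np

def glu(x, w_a, b_a, w_b, b_b):
    """Gated Linear Unit (Dauphin et al., 2017):
    h(X) = (X @ W + b) * sigmoid(X @ V + c).
    The sigmoid gate controls how much of the linear
    path is passed on to the next layer."""
    a = x @ w_a + b_a                         # linear path
    g = x @ w_b + b_b                         # gate path
    return a * (1.0 / (1.0 + np.exp(-g)))     # elementwise gating

# Toy usage: 4 positions, 8 input features -> 8 output features.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w_a, w_b = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
b_a, b_b = np.zeros(8), np.zeros(8)
out = glu(x, w_a, b_a, w_b, b_b)
assert out.shape == (4, 8)
```

Because the gate is strictly between 0 and 1, each output element is a damped copy of the corresponding linear-path element, which is what gives the unit its "gating" behavior.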
  8. 2. Story Generation Conditioned on the Prompt
     • Based on the convolutional seq2seq model of Gehring et al. (2017) (right figure)
     • Improved with:
       • Gated multi-scale self-attention
       • Model fusion
  9. Modeling Unbounded Context with Gated Multi-Scale Self-attention
     Improves on the standard self-attention mechanism with:
     • Multi-scale attention
     • Gated attention
  10. Reference: Attention (from Vaswani et al. (2017))
     An attention function can be described as mapping a query and a set of key-value pairs to an output:
     Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
     • Q: query, K: key, V: value, sqrt(d_k): scaling factor
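The scaled dot-product attention above translates almost directly into code; a minimal NumPy sketch:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    (Vaswani et al., 2017)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v                            # weighted sum of values

rng = np.random.default_rng(0)
q = rng.standard_normal((3, 16))   # 3 queries
k = rng.standard_normal((5, 16))   # 5 keys
v = rng.standard_normal((5, 16))   # 5 values
out = scaled_dot_product_attention(q, k, v)
assert out.shape == (3, 16)
```

Each output row is a convex combination of the value rows, weighted by how well its query matches each key.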
  11. Reference: Self-attention
     Example: self-attention distribution for the word “it” (Image from https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html)
  12. Gated Attention
     • Use multi-head attention similar to Vaswani et al. (2017) (right figure) to allow each head to attend to information at different positions
     • However, queries, keys, and values are not given by linear projections but by gated deep neural nets with Gated Linear Units (GLU)
  13. Multi-scale Attention
     • Each head uses a separate downsampling function in order to attend to different information
  14. • h_t^l: contains the hidden states up to time t at layer l
     • Gated downsampling networks produce the query, key, and value from h_t^l
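A toy sketch of the multi-scale idea, with fixed strided subsampling standing in for the paper's learned gated downsampling networks, and causal masking omitted for brevity:

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def downsample(h, stride):
    """Keep every `stride`-th hidden state - a simple stand-in for
    the paper's learned gated downsampling networks."""
    return h[::stride]

def multi_scale_self_attention(h, strides):
    """One attention pass per head; each head attends over keys and
    values downsampled at a different rate, so different heads see
    the sequence at different scales."""
    d = h.shape[-1]
    heads = []
    for s in strides:
        kv = downsample(h, s)
        scores = h @ kv.T / np.sqrt(d)
        heads.append(softmax(scores) @ kv)
    return np.concatenate(heads, axis=-1)  # concatenate head outputs

rng = np.random.default_rng(0)
h = rng.standard_normal((12, 8))           # 12 timesteps, 8 features
out = multi_scale_self_attention(h, strides=[1, 2, 4])
assert out.shape == (12, 8 * 3)
```

Heads with a larger stride summarize coarser, longer-range context, while the stride-1 head keeps full resolution.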
  15. Improving Relevance to Input Prompt with Model Fusion
     • Train a seq2seq model that has access to the hidden states of a pretrained seq2seq model
     • This can be seen as a type of boosting or residual learning that allows the second model to focus on what the first model failed to learn, such as conditioning on the prompt
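A rough sketch of one way such a fusion layer can combine the two models' hidden states: concatenate them, compute a learned gate over the concatenation, and project the gated result. The gating scheme and weight names here are illustrative, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(h_trained, h_pretrained, w_g, b_g, w_o, b_o):
    """Fusion sketch: a gate over the concatenated hidden states
    decides how much of each model's features to keep, then a
    projection maps the fused features back to model width."""
    h = np.concatenate([h_trained, h_pretrained], axis=-1)
    g = sigmoid(h @ w_g + b_g)     # learned gate over both models' states
    return (g * h) @ w_o + b_o     # project fused features down

rng = np.random.default_rng(0)
t, d = 5, 8                                   # 5 timesteps, width 8
h_trained = rng.standard_normal((t, d))       # trainable model's states
h_pretrained = rng.standard_normal((t, d))    # frozen pretrained states
w_g = rng.standard_normal((2 * d, 2 * d))
b_g = np.zeros(2 * d)
w_o = rng.standard_normal((2 * d, d))
b_o = np.zeros(d)
out = fuse(h_trained, h_pretrained, w_g, b_g, w_o, b_o)
assert out.shape == (t, d)
```

Because the pretrained model is frozen, gradients only update the second model and the fusion weights, which pushes the second model toward whatever the first one could not capture.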
  16. Generation
     Use top-k random sampling instead of beam search:
     • At each timestep, compute the probability of each word, then randomly sample from the most likely candidates of this distribution
     • (Beam search tends to produce common phrases and repetitive text from the training set)
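Top-k sampling is simple to implement; a NumPy sketch (the paper samples from the 10 most likely candidates):

```python
import numpy as np

def top_k_sample(logits, k, rng):
    """Top-k random sampling: keep the k highest-scoring tokens,
    renormalize their probabilities, and sample one of them."""
    top = np.argpartition(logits, -k)[-k:]        # indices of k best tokens
    p = np.exp(logits[top] - logits[top].max())   # stable softmax over the k
    p /= p.sum()
    return int(rng.choice(top, p=p))

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.1, -1.0, 1.5, 0.3])
token = top_k_sample(logits, k=2, rng=rng)
assert token in (0, 3)  # only the two highest-scoring tokens can be drawn
```

Unlike beam search, which deterministically follows the highest-probability continuation, this keeps generation varied while still excluding low-probability tokens.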
  17. Experimental Setup
     Baselines:
     1. Gated CNN language model w/ and w/o self-attention
     2. seq2seq: LSTM seq2seq, Conv seq2seq, Conv seq2seq w/ decoder self-attention
     3. Ensemble of two Conv seq2seq models w/ self-attention
     4. KNN: finds the closest prompt in the training set for each prompt in the test set
  18. Evaluation
     • Automatic evaluation
       • Perplexity
       • Prompt ranking accuracy
     • Human evaluation
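As a reminder, perplexity is the exponentiated average negative log-probability the model assigns to each token (lower is better); a minimal sketch:

```python
import numpy as np

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability per token)."""
    nll = -np.log(token_probs)
    return float(np.exp(nll.mean()))

# If the model assigns probability 0.25 to every token, perplexity is 4:
# the model is as uncertain as a uniform choice among 4 tokens.
assert abs(perplexity(np.array([0.25, 0.25, 0.25])) - 4.0) < 1e-9
```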
  19. Generated by proposed model (1/2)
     Prompt: Aliens start abducting humans
     Generated story: It has been two weeks, and the last of my kind has gone. It is only a matter of time until there will be nothing left. I’m not sure what the hell is going on... I can’t think. I can hear a distant scream. I think of a strange, alien sound. I try to speak, but am interrupted by something, something that feels like a drum, I ca not tell. I mean I’m just a little bit older than an average human. But I can, and I can feel the vibrations . I hear the sound of a ship approaching. The ground quakes at the force of the impact, and a loud explosion shatters the silence.
  20. Generated by proposed model (2/2)
     Prompt: The scientists have discovered something terrible.
     Generated story: The scientist stood there, a little dazed as he stared. “What is it?” He asked. “This...this...Thing...This is a virus. A chemical that can destroy entire planet and it is a very small, complex, chemical that could destroy any planet.” The scientist replied. His lab assistant looked down at the tablet. “I’ve just discovered it. I can’t believe it. It looks like it’s made of some sort of chemical that’s very dangerous.” “ Well, there’s a virus on the ground. It’s very effective...” “ I can’t believe what it is.” he said, pointing to the scientist . “ We don’t know what this thing is. We haven’t seen anything like it . We can’t even see anything like this. ” Dr. Jones stared at the scientist for a moment. “What do you mean what does it do ?” “It...It ’s a monster.”
  21. Generated by baseline model (1/2)
     I’ve always been the curious type. Growing up in the ghetto, I’ve been around for more than a year now. I still haven’t heard it on my own. I’m not sure if I’m a good person. But I’m not the only one. I’ve been to the hospital, but it’s not my first time. I’ve been to my psychiatrist’s journals before. “Hey, are you ... are you okay?” I asked. “Yes” I said. “What’s wrong?” she replied, “I was just ... Doing something.” She said, “I’ve never been a bad person.”
  22. Generated by baseline model (2/2)
     The man was an accountant. He had to be. He had to be the next president. I looked back over the top and saw that his wife was crying in the kitchen. I looked at the clock. It seemed to be coming slower, but I knew if I did it would not be long before I was in my own home. I wasn’t sure. I had a hard time finding the right words to say. I was about to leave when he suddenly became angry and began talking to me. “Hello, sir, I’m John. What is your name?” “My name is Manuel and I’m a journalist.” I said
  23. Discussion
     • The proposed model is capable of generating unique text without copying directly from the training set, unlike the baseline model
     • Limitations:
       • Random sampling can produce errors: e.g. can't is tokenized to ca and n't, and the model occasionally produces the first token but misses the second
       • Repetition: the model generates similar text multiple times because it focuses frequently on what it has recently produced
       • In prompt generation: generated prompts are fairly generic compared to human prompts, e.g. many prompts start with “the man”
  24. Conclusion
     • A new large-scale dataset for hierarchical story generation
     • Evaluation metrics for story writing
     • Models to improve generation coherence and relevance to the desired premise
     • Data + code: github.com/pytorch/fairseq