
Discourse: Theories and computational models

I presented a talk on a research paper by a team of researchers from CMU. I studied this subject during my Master's studies.

Mahak Gupta

May 25, 2016

Transcript

  1. TOWARD ABSTRACTIVE SUMMARIZATION USING SEMANTIC REPRESENTATIONS (NAACL HLT* 2015) Authors:

    F. Liu, J. Flanigan, S. Thomson, N. Sadeh, N. Smith Presented By: Mahak Gupta Date: 25th May 2016 * North American Chapter of the Association for Computational Linguistics – Human Language Technologies
  2. THINGS TO EXPECT FROM THIS TALK! • Introduction to summarization

    • Abstract Meaning Representation (AMR), a semantic representation of a language • A semantic-representation approach to summarization • Graphs, Integer Linear Programming • Some mathematical equations (sorry for that … we won't go too deep into them). In spite of that, I'll try to make it an interesting session for the next 45-50 minutes :) 2
  3. AGENDA 1. Introduction, Motivation, Concepts Used 2. Model Used a)

    Semantic Summarization b) Subgraph Prediction 3. Generation 4. Dataset Used 5. Results 6. Future work 3
  4. TEXT SUMMARIZATION • Goal: Reducing a text with a computer

    program in order to create a summary that retains the most important points of the original text. • Summarization Applications • News summaries • Reviews 4
  5. DOCUMENT ORIENTED SUMMARIZATION SINGLE VS. MULTIPLE DOCUMENTS §Single Document Summarization

    : • Given a single document, produce an • Abstract • Outline • Headline § Multiple Document Summarization: • Given a group of documents, produce a gist of the collection • A series of news stories on the same event • A set of webpages about a topic 5 Stanford NLP Course
  6. QUERY-FOCUSED SUMMARIZATION & GENERIC SUMMARIZATION §Generic Summarization • Summarize the

    content of a document §Query-focused Summarization • Summarize a document with respect to an information need expressed in a user query 6 Stanford NLP Course
  7. EXTRACTIVE SUMMARIZATION & ABSTRACTIVE SUMMARIZATION §Extractive Summarization: • Create the

    summary from phrases or sentences in the source document(s) §Abstractive Summarization • Express the ideas in the source document using different words 7 Stanford NLP Course
  8. EXTRACTIVE SUMMARIZATION SEEMS TO BE A SIMPLE APPROACH… IS IT

    ? • Bias: with a limit on the summary size, the selected sentences may miss some critical information • Verbose: the selected sentences may contain irrelevant content 8
  9. CAN YOU SUGGEST SOME APPLICATIONS OF ABSTRACTIVE SUMMARIZATION FROM YOUR

    DAY-TO-DAY LIFE? • Giving a movie review to a friend • Minutes of a meeting (sharing info with colleagues) • Sharing your opinion of a book • Last-minute revision the night before an exam ;) 9
  10. SEMANTIC REPRESENTATION • Semantics = all about meaning. • Semantic

    representation is an abstract language in which meanings can be represented. • Are you aware of any other representations in semantics? - First-order predicate logic • In this paper we mainly discuss Abstract Meaning Representation (AMR) as the semantic representation. 10
  11. ABSTRACT MEANING REPRESENTATIONS • AMRs are rooted, labeled, directed, acyclic

    graphs that are easy for people to read and easy for programs to traverse. Example: "The dog was chasing a cat" becomes the concept chase-01 (from PropBank*) with relations ARG0 → dog and ARG1 → cat; in first-order logic: ∀x [dog(x) → ∃y [cat(y) ∧ chase(x,y)]]. A toy graph encoding of this AMR follows below. 11 * Proposition Bank – http://propbank.github.io
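A minimal sketch (my own illustration, not from the paper or from JAMR) of this AMR as a rooted, labeled, directed graph in Python, using networkx; the variable names c, d and t are arbitrary:

```python
# Toy AMR graph for "The dog was chasing a cat" (illustrative only).
import networkx as nx

amr = nx.DiGraph()
amr.add_node("c", concept="chase-01")   # root concept, a PropBank frame
amr.add_node("d", concept="dog")
amr.add_node("t", concept="cat")
amr.add_edge("c", "d", label="ARG0")    # the chaser
amr.add_edge("c", "t", label="ARG1")    # the one being chased

# Easy for a program to traverse: walk the edges and print each relation.
for src, dst, data in amr.edges(data=True):
    print(amr.nodes[src]["concept"], "-", data["label"], "->",
          amr.nodes[dst]["concept"])
```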
  12. BUT WHAT IS THE LOGIC BEHIND ABSTRACTIVE SUMMARIZATION + SEMANTIC

    REPRESENTATION? - "Semantic Representation in the Human Brain during Listening and Reading" - "A continuous semantic space describes the representation of thousands of object and action categories across the human brain" 12
  13. AGENDA 1. Introduction, Motivation, Concepts Used 2. Model Used a)

    Semantic Summarization b) Subgraph Prediction 3. Generation 4. Dataset Used 5. Results 6. Future work 13
  14. SEMANTIC SUMMARIZATION Pipeline (graph transformation, from the slide figure):

    sentence AMR parsing with JAMR (https://github.com/jflanigan/jamr, Flanigan et al., 2014) → source graph construction (collapsing and concept merging) → subgraph prediction → text generation. Running example: Sentence A: "I saw Joe's dog, which was running in the garden." Sentence B: "The dog was chasing a cat." Summary: "Joe's dog was chasing a cat in the garden." 14
  15. SOURCE GRAPH CONSTRUCTION

    [Figure: the sentence AMR graphs for Sentence A ("I saw Joe's dog, which was running in the garden.") and Sentence B ("The dog was chasing a cat.") are joined under a shared ROOT node and transformed, via collapsing, concept merging, and graph expansion, into a single source graph for the summary "Joe's dog was chasing a cat in the garden."] A simplified merging sketch follows below. 15
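A rough sketch (my own simplification with assumed networkx data structures, not the authors' implementation) of the merging step: concept nodes with the same label in different sentence AMR graphs collapse into one source-graph node, and every sentence graph is attached under a shared ROOT:

```python
# Simplified source-graph construction: merge concepts shared across
# sentence AMR graphs and hang every sentence root under one ROOT node.
import networkx as nx

def build_source_graph(sentence_graphs):
    source = nx.DiGraph()
    source.add_node("ROOT")
    for g in sentence_graphs:
        for _, data in g.nodes(data=True):
            # One source-graph node per distinct concept label.
            source.add_node(data["concept"])
        for u, v, data in g.edges(data=True):
            source.add_edge(g.nodes[u]["concept"], g.nodes[v]["concept"],
                            label=data["label"])
        # Attach this sentence's root concept to the shared ROOT.
        sent_root = next(n for n, deg in g.in_degree() if deg == 0)
        source.add_edge("ROOT", g.nodes[sent_root]["concept"], label="root")
    return source
```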
  16. SOURCE GRAPH CONSTRUCTION (CONT’D) • Summary Edge coverage • Percentage

    of summary graph edges that can be covered by an automatically constructed source graph (a small sketch of this metric follows below).

    Summary Edge Coverage (%)
             Labeled   Unlabeled   Expand
    Train     64.8       67.0       75.5
    Dev.      77.3       78.6       85.4
    Test      63.0       64.7       75.0
    16
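A small sketch (my own, assuming networkx DiGraphs over merged concept nodes with a "label" edge attribute) of how such a coverage percentage could be computed:

```python
# Share of gold-summary AMR edges that also appear in the source graph.
def edge_coverage(summary_graph, source_graph, labeled=True):
    def edge_set(g):
        return {(u, v, d["label"]) if labeled else (u, v)
                for u, v, d in g.edges(data=True)}
    summary_edges = edge_set(summary_graph)
    covered = summary_edges & edge_set(source_graph)
    return 100.0 * len(covered) / len(summary_edges)
```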
  17. SUBGRAPH PREDICTION • Including important information without altering its meaning,

    maintaining brevity, and producing fluent language. • Let G(V, E) be the source graph • We want a subgraph G'(V', E') that maximizes the objective function (reconstructed below) • f(i) = feature vector of node/concept i, g(i, j) = feature vector of edge (i, j) • v_i = 1 if node i is included in the subgraph, e_{i,j} = 1 if the edge from node i to node j is included 17
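The equation itself did not survive the transcript. Based on the definitions above, the objective has the following form, where θ and ψ are learned weight vectors (this is my reconstruction, not a verbatim copy of the slide):

```latex
\max_{v,\,e} \quad \sum_{i \in V} v_i \,\theta^{\top} f(i) \;+\; \sum_{(i,j) \in E} e_{i,j} \,\psi^{\top} g(i,j)
\qquad \text{subject to } v_i,\, e_{i,j} \in \{0, 1\}
```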
  18. NODE FEATURES 18 • Concept: identity feature for the concept label • Freq:

    concept frequency in the input sentence set • Depth: average and smallest depth of the node to the root of the sentence graph • Position: average and foremost position of the sentences containing the concept • Span: average and longest word span of the concept; binarized using 5 length thresholds • Entity: two binary features indicating whether the concept is a named entity / date entity • Bias: bias term, 1 for any node (a simplified feature-extraction sketch follows below)
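A simplified sketch (my own, not the authors' feature code; the occurrence dictionary keys are hypothetical) of how a few of these node features might be computed:

```python
from statistics import mean

def node_features(concept, occurrences):
    """occurrences: one dict per mention of the concept, with hypothetical
    keys 'depth' (distance to the sentence-graph root), 'sent_pos' (position
    of the containing sentence) and 'span_len' (aligned word-span length)."""
    depths = [o["depth"] for o in occurrences]
    positions = [o["sent_pos"] for o in occurrences]
    spans = [o["span_len"] for o in occurrences]
    return {
        f"concept={concept}": 1.0,        # identity feature
        "freq": float(len(occurrences)),  # frequency in the sentence set
        "depth_avg": mean(depths),
        "depth_min": float(min(depths)),
        "pos_avg": mean(positions),
        "pos_first": float(min(positions)),
        "span_avg": mean(spans),
        "span_max": float(max(spans)),
        "bias": 1.0,
    }
```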
  19. EDGE FEATURES 19 • Label: first and second most

    frequent edge labels between the two concepts • Freq: edge frequency (without label, non-expanded edges) in the document sentences • Position: average and foremost position of the sentences containing the edge (without label) • Nodes: node features extracted from the source and target nodes • IsExpanded: a binary feature indicating whether the edge is due to graph expansion • Bias: bias term, 1 for any edge
  20. SUBGRAPH PREDICTION – ILP DECODER: CONSTRAINTS 20 • An Edge

    is selected only if both of its endpoints are selected • The selected subgraph must be a tree • The tree size is limited (a budget on summary length) • Example ILP problem (a toy decoder sketch follows below)
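A toy sketch of such an ILP decoder (my own, using the PuLP library; the paper's tree/connectivity constraints are only noted in a comment, and the scores are placeholders):

```python
# Minimal ILP decoder sketch with PuLP (illustrative, not the paper's code).
from pulp import LpProblem, LpVariable, LpMaximize, lpSum, LpBinary

def decode(node_scores, edge_scores, max_nodes):
    """node_scores: {i: score}; edge_scores: {(i, j): score}."""
    prob = LpProblem("subgraph_prediction", LpMaximize)
    v = {i: LpVariable(f"v_{i}", cat=LpBinary) for i in node_scores}
    e = {ij: LpVariable(f"e_{ij[0]}_{ij[1]}", cat=LpBinary) for ij in edge_scores}

    # Objective: total score of the selected nodes and edges.
    prob += (lpSum(node_scores[i] * v[i] for i in v) +
             lpSum(edge_scores[ij] * e[ij] for ij in e))

    # An edge may be selected only if both of its endpoints are selected.
    for (i, j), var in e.items():
        prob += var <= v[i]
        prob += var <= v[j]

    # Tree size limit (summary length budget).
    prob += lpSum(v.values()) <= max_nodes

    # (The paper additionally forces the selected subgraph to be a connected
    #  tree via flow constraints; those are omitted here for brevity.)
    prob.solve()
    nodes = [i for i in v if v[i].value() == 1]
    edges = [ij for ij in e if e[ij].value() == 1]
    return nodes, edges
```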
  21. AGENDA 1. Introduction, Motivation, Concepts Used 2. Model Used a)

    Semantic Summarization b) Subgraph Prediction 3. Generation 4. Dataset Used 5. Results 6. Future work 21
  22. SUMMARY GENERATION • Heuristic approach to generate a bag of

    words. • Given a predicted subgraph, a system summary is created by taking, for each concept node, the word span it was most frequently aligned to (JAMR provides these alignments). • Words in the resulting spans are emitted in no particular order. • This is not a natural-language summary, but it is suitable for unigram-based evaluation methods such as ROUGE-1 (unigram overlap). • Generating fluent text from AMR graphs is still an open research problem (a toy sketch of the heuristic follows below). 22
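A toy sketch (my own, with an assumed alignment dictionary; not the authors' code) of this bag-of-words generation heuristic:

```python
from collections import Counter

def generate_bag_of_words(subgraph_concepts, alignments):
    """alignments: {concept: list of word-span tuples the concept was aligned
    to across the input sentences (as produced by an aligner such as JAMR)}.
    Returns an unordered bag of words, suitable for ROUGE-1-style evaluation."""
    words = []
    for concept in subgraph_concepts:
        spans = alignments.get(concept, [])
        if not spans:
            continue
        # Pick the span this concept was most frequently aligned to.
        best_span, _ = Counter(spans).most_common(1)[0]
        words.extend(best_span)
    return words

# Hypothetical usage:
# generate_bag_of_words(
#     ["dog", "chase-01", "cat"],
#     {"dog": [("dog",), ("Joe's", "dog")],
#      "chase-01": [("chasing",)],
#      "cat": [("cat",)]})
```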
  23. SUMMARY GENERATION (EXAMPLE) • Joe dog garden chasing cat •

    Joe garden chasing cat dog • Expected output: Joe's dog was chasing the cat in the garden [Figure: optimized subgraph, the output of subgraph prediction] 23
  24. AGENDA 1. Introduction, Motivation, Concepts Used 2. Model Used a)

    Semantic Summarization b) Subgraph Prediction 3. Generation 4. Dataset Used 5. Results 6. Future work 24
  25. DATASET • AMR Bank; Only Proxy report section (like newsarticles)

    • Gold-standard AMR + summaries

             #Docs   Avg. #Sents          Source Graph
                     Summ.   Doc.    Nodes   Edges   Expand
    Train     298     1.5    17.5     127     188    2,670
    Dev.       35     1.4    19.2     143     220    3,203
    Test       33     1.4    20.5     162     255    4,002
    25
  26. AGENDA 1. Introduction, Motivation, Concepts Used 2. Model Used a)

    Semantic Summarization b) Subgraph Prediction 3. Generation 4. Dataset Used 5. Results 6. Future work 26
  27. RESULTS 27

                            Subgraph Prediction            Summarization
                            Nodes                  Edges   ROUGE-1
                            P (%)   R (%)   F (%)  F (%)   P (%)   R (%)   F (%)
    Gold-standard parse
      Perceptron             39.6    46.1    42.6   24.7    41.4    27.1    32.3
      Hinge                  41.2    47.9    44.2   26.4    42.6    28.3    33.5
      Ramp                   54.7    63.5    58.7   39.0    51.9    39.0    44.3
      Ramp + Expand          53.0    61.3    56.8   36.1    50.4    37.4    42.8
    JAMR parse
      Perceptron             42.2    48.9    45.2   14.5    46.1    35.0    39.5
      Hinge                  41.7    48.3    44.7   15.8    44.9    33.6    38.2
      Ramp                   48.1    55.6    51.5   20.0    50.6    40.0    44.4
      Ramp + Expand          47.5    54.6    50.7   19.0    51.2    40.0    44.7
  28. AGENDA 1. Introduction, Motivation, Concepts Used 2. Model Used a)

    Semantic Summarization b) Subgraph Prediction 3. Generation 4. Dataset Used 5. Results 6. Future work 28
  29. FUTURE WORK • Jointly performing subgraph and edge label prediction;

    • Exploring a full-fledged pipeline consisting of an automatic AMR parser, a graph-to-graph summarizer, and an AMR-to-text generator; • Devising an evaluation metric better suited to abstractive summarization; • Tense prediction in AMRs 29
  30. TAKE HOME • Abstractive Summarization vs Extractive Summarization • AMR

    representations • A semantic-representation approach to abstractive summarization • It is an interesting approach that takes the meaning representation of a language into account, but it does not generalize well to unseen data (perhaps because it is still a fairly new idea). • Other graph-based approaches scale better and are now evaluated with ROUGE-2/3/4 scores, i.e. bigrams, trigrams, etc. 30 Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly Redundant Opinions
  31. OKAY, ITS NOW YOUR TURN TO DO SOME TALKING ;)

    1. We've discussed that extractive summarization is a fairly simple approach; what, then, is the motivation for abstractive summarization? 2. What is ROUGE-1, and why is it discussed in this paper? 31 Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly Redundant Opinions
  32. REFERENCES

    • Liu et al. (2015). Toward Abstractive Summarization Using Semantic Representations. NAACL-HLT. • Flanigan et al. (2014). A Discriminative Graph-Based Parser for the Abstract Meaning Representation. ACL. • Ganesan et al. (2010). Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly Redundant Opinions. COLING. • Stanford NLP Course 33