Slide 1

Natural Language Processing with Less Data and More Structures
Diyi Yang
School of Interactive Computing, Georgia Tech

Slide 2

NLP in the Age of Data
✓ Internet search
✓ Machine translation
✓ Automated assistants
✓ Question answering
✓ Sentiment analysis

Slide 3

Done Solving NLP?
Complex and subtle language behavior
○ Social and interpersonal content in language
Low-resourced scenarios
○ Real-world contexts often have limited labeled data
Structured knowledge from social interaction
○ Social intelligence goes beyond any fixed corpus (Bisk et al., 2020)
○ How to mine structured data from interactions (Sap et al., 2019)

Slide 4

Built upon Systemic Functional Linguistics (Halliday, 1961) and Gricean Maxims: Seven Factors for Social NLP (Hovy and Yang, 2021, NAACL)
Social Support Exchange: Yang et al., 2019b, SIGCHI, best paper honorable mention
Loanword and Borrowing: Stewart et al., 2021, Society for Computation in Linguistics
Social Role Identification: Yang et al., 2019a, SIGCHI, best paper honorable mention; Yang et al., 2016, ICWSM, best paper honorable mention
Persuasion: Yang et al., 2019, NAACL; Chen and Yang, 2021, AAAI
Humor Recognition: Yang et al., 2015, EMNLP
Personalized Text Generation: Wu et al., 2021, NAACL

Slide 5

What Makes Language Persuasive (NAACL 2019; EMNLP 2020; AAAI 2021)
“Speak to our head of sales - he has over 15 years’ experience”
“In high demand - only 2 left on our site”
“The picture of widow Bunisia holding her baby in front of her meager home brings tears to my eyes.”
✓ Translate theories into measurable language cues, such as scarcity, authority, emotion, and reciprocity
✓ Model persuasion via semi-supervised networks
✓ Study how the ordering of rhetorical persuasion strategies affects request success

Slide 6

Done Solving NLP?
Complex and subtle language behavior
○ Social and interpersonal content in language
Low-resourced scenarios
○ Real-world contexts often have limited labeled data
Structured knowledge from social interaction
○ Social intelligence goes beyond any fixed corpus (Bisk et al., 2020)
○ How to mine structured data from interactions (Sap et al., 2019)

Slide 7

Overview of This Talk
❏ Low-Resourced Scenarios
  ❏ Text Mixup for Semi-supervised Classification
  ❏ LADA for Named Entity Recognition
❏ Structured Knowledge from Conversations
  ❏ Summarization via Conversation Structures
  ❏ Summarization via Action and Discourse Graphs

Slide 8

Overview of This Talk
➢ Low-Resourced Scenarios
  ➢ Text Mixup for Semi-supervised Classification
Jiaao Chen, Zichao Yang, Diyi Yang. MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification. ACL 2020.

Slide 9

https://swabhs.com/assets/pdf/talks/utaustin-guest-lecture-biases-and-interpretability.pdf

Slide 10

Lots of (Socially) Low-Resourced Settings
❏ Rich social information in text
❏ Often unlabeled in real-world settings
❏ How to utilize limited data for learning

Slide 11

Prior Work on Semi-Supervised Text Classification
○ Confident predictions on unlabeled data for self-training (Lee, 2013; Grandvalet and Bengio, 2004; Meng et al., 2018)
○ Consistency training on unlabeled data (Miyato et al., 2017, 2019; Xie et al., 2019)
○ Pre-training on unlabeled data, then fine-tuning on labeled data (Devlin et al., 2019)

Slide 12

Why Is It Not Enough?
❏ Labeled and unlabeled data are treated separately
❏ Models may easily overfit the labeled data while still underfitting the unlabeled data

Slide 13

Text Mixup, built on mixup in computer vision (Zhang et al., 2017; Berthelot et al., 2019)
✓ performs linear interpolations in textual hidden space between different training sentences
✓ allows information to be shared across different sentences and creates a virtually unlimited number of augmented training samples

Slide 14

Text Mixup
x: sentence 1, y: label 1
x’: sentence 2, y’: label 2
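To make the interpolation concrete, the operation built up over the next slides can be written as below. This follows the standard mixup formulation (Zhang et al., 2017), with $h, h'$ denoting the hidden states of the two sentences at the chosen mixing layer; the notation itself is mine, not from the slides:

$$\lambda \sim \mathrm{Beta}(\alpha, \alpha), \qquad \tilde{h} = \lambda h + (1-\lambda) h', \qquad \tilde{y} = \lambda y + (1-\lambda) y'$$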

Slide 15

Encode separately

Slide 16

Encode separately → Linear interpolation

Slide 17

Encode separately → Linear interpolation → Forward-passing

Slide 18

Encode separately → Linear interpolation → Forward-passing → Interpolate labels
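The four steps above fit in a short sketch. A small feed-forward stack stands in for the BERT encoder here, so this is illustrative rather than the paper's implementation; the Beta(α, α) sampling follows the MixText setup, while the dimensions and the default mixing layer are assumptions (layer choice is discussed on the next slide):

```python
import torch
import torch.nn as nn

class TMixEncoder(nn.Module):
    """Sketch of Text Mixup: encode two sentences separately up to a
    chosen layer, interpolate their hidden states, then forward-pass
    the mix through the remaining layers."""

    def __init__(self, vocab_size=1000, dim=64, n_layers=12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_layers))

    def forward(self, x, x2, y, y2, alpha=0.75, mix_layer=7):
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        h, h2 = self.embed(x), self.embed(x2)
        for layer in self.layers[:mix_layer]:          # 1. encode separately
            h, h2 = torch.relu(layer(h)), torch.relu(layer(h2))
        h = lam * h + (1 - lam) * h2                   # 2. linear interpolation
        for layer in self.layers[mix_layer:]:          # 3. forward-passing
            h = torch.relu(layer(h))
        y_mix = lam * y + (1 - lam) * y2               # 4. interpolate labels
        return h, y_mix
```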

Slide 19

Text Mixup: Which Layers to Mix?
Multi-layer encoders (e.g., BERT) capture different types of information in different layers (Jawahar et al., 2019):
● Surface features, e.g., sentence length (layers 3, 4)
● Syntactic features, e.g., word order (layers 6, 7)
● Semantic features, e.g., tense, subject (layers 7, 9, 12)

Slide 20

MixText = Text Mixup + Consistency Training for Semi-supervised Text Classification
(figure: MixText architecture, with the Text Mixup component labeled)

Slide 21

Back-translations, with German and Russian as intermediate languages
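As a sketch of how such back-translations can be produced, the snippet below round-trips English text through a pivot language using MarianMT checkpoints from the Hugging Face hub. The specific checkpoints, and the use of MarianMT at all, are assumptions for illustration, not necessarily the models used in the paper:

```python
from transformers import MarianMTModel, MarianTokenizer

def back_translate(sentences, pivot="de"):
    """English -> pivot -> English round trip as paraphrase augmentation."""
    fwd = f"Helsinki-NLP/opus-mt-en-{pivot}"
    bwd = f"Helsinki-NLP/opus-mt-{pivot}-en"

    def translate(texts, name):
        tok = MarianTokenizer.from_pretrained(name)
        model = MarianMTModel.from_pretrained(name)
        batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
        return [tok.decode(t, skip_special_tokens=True)
                for t in model.generate(**batch)]

    return translate(translate(sentences, fwd), bwd)

# back_translate(["I'm about to sleep"], pivot="de")
# back_translate(["I'm about to sleep"], pivot="ru")
```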

Slide 22


Slide 23


Slide 24

Interpolate labeled and unlabeled text via Text Mixup
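Mixing labeled with unlabeled text requires a label for the unlabeled side. In the MixMatch/MixText style, it is guessed by averaging the model's predictions over the augmentations (e.g., back-translations) and then sharpening with a temperature; the weighting scheme and the temperature value below are assumptions of this sketch:

```python
import torch

def guess_and_sharpen(probs_on_augs, weights, T=0.5):
    """Guess a label for an unlabeled sentence from predictions on its
    augmentations, then sharpen it so the guessed distribution is more
    confident."""
    q = sum(w * p for w, p in zip(weights, probs_on_augs))  # weighted average
    q = q ** (1.0 / T)                                      # temperature sharpening
    return q / q.sum(dim=-1, keepdim=True)                  # renormalize

# guess_and_sharpen([torch.tensor([0.6, 0.4]), torch.tensor([0.5, 0.5])],
#                   weights=[0.5, 0.5])  # ~[0.60, 0.40], sharper than [0.55, 0.45]
```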

Slide 25

Text mixup

Slide 26

Dataset and Baselines
Baselines:
● BERT (Devlin et al., 2019)
● UDA (Xie et al., 2019)

Slide 27


Slide 28

Main Results

Slide 29

Main Results

Slide 30

Main Results

Slide 31

Ablation on Different Layer Sets in Text Mixup
Performance on AG News with 10 labeled examples per class; results are consistent for other settings and datasets.

Slide 32

Learning with Limited Data
✓ Text Mixup performs interpolations in hidden space to create augmented data
✓ MixText (= Text Mixup + Consistency Training) works for text classification with limited training data
github.com/GT-SALT/MixText

Slide 33

Overview of This Talk
➢ Low-Resourced Scenarios
  ✓ Text Mixup for Semi-supervised Classification
  ➢ LADA for Named Entity Recognition
Jiaao Chen*, Zhenghui Wang*, Ran Tian, Zichao Yang, Diyi Yang. Local Additivity Based Data Augmentation for Semi-supervised NER. EMNLP 2020.

Slide 34

Prior Work on Data Augmentation for NER
Example: On Dec 11, 2020 [DATE], Pfizer-BioNTech [ORG] became the first COVID-19 [DISEASE] vaccine … more than 95% effective against the variants ... in the United Kingdom [PLACE] and South Africa [PLACE].

Slide 35

Prior Work on Data Augmentation for NER
● Adversarial attacks at the token level (Kobayashi, 2018; Wei and Zou, 2019; Lakshmi Narayan et al., 2019)
  ○ Struggle to create diverse examples
● Paraphrasing at the sentence level (Xie et al., 2019; Kumar et al., 2019)
  ○ Fails to maintain token-level labels
● Interpolation-based methods (Zhang et al., 2018; Miao et al., 2020; Chen et al., 2020)
  ○ Inject too much noise through random sampling

Slide 36

Local Additivity based Data Augmentation (LADA)
What if we directly use Mixup for NER?

Slide 37

LADA

Slide 38

LADA

Slide 39

LADA

Slide 40

LADA

Slide 41

LADA

Slide 42

Local Additivity based Data Augmentation (LADA)
What if we directly use Mixup for NER? It didn’t work.
Strategic LADA helps: Intra-LADA and Inter-LADA

Slide 43

Intra-LADA
● Interpolate each token’s hidden representation with those of other tokens from the same sentence, chosen via random permutations
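A minimal sketch of this step, assuming token hidden states h (seq_len × dim) and one-hot token labels y; the Beta parameter and the max(λ, 1-λ) trick are assumptions carried over from common mixup practice, not values from the paper:

```python
import torch

def intra_lada(h, y, alpha=8.0):
    """Mix each token's hidden state (and label) with that of another
    token from the SAME sentence, chosen by a random permutation."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    lam = max(lam, 1 - lam)              # stay close to the original token
    perm = torch.randperm(h.size(0))     # random permutation of positions
    return lam * h + (1 - lam) * h[perm], lam * y + (1 - lam) * y[perm]
```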

Slide 44

Inter-LADA
● Interpolate each token’s hidden representation with tokens from other sentences, randomly sampled from the sentence’s k-nearest neighbors

Slide 45

Inter-LADA
Israel plays down fears of war with Syria.
Sampled neighbours:
1. Parliament Speaker Berri: Israel is preparing for war against Syria and Lebanon.
2. Fears of an Israeli operation causes the redistribution of Syrian troops locations in Lebanon.
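A companion sketch for the inter-sentence case: mix a sentence's token states with those of a sampled near-neighbor sentence. Pairing tokens by position and truncating to the shorter sequence are simplifying assumptions of this sketch, not necessarily the paper's exact alignment:

```python
import torch

def inter_lada(h, y, nbr_h, nbr_y, alpha=8.0):
    """Mix a sentence's token hidden states (and labels) with those of a
    semantically close sentence sampled from its k-nearest neighbors."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    lam = max(lam, 1 - lam)
    n = min(h.size(0), nbr_h.size(0))    # pair tokens by position
    return (lam * h[:n] + (1 - lam) * nbr_h[:n],
            lam * y[:n] + (1 - lam) * nbr_y[:n])
```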

Slide 46

Semi-supervised LADA = LADA + Consistency Training
(figure: an unlabeled sentence and its paraphrases feed into consistency training)

Slide 47

Semi-supervised LADA: Consistency Training

Slide 48

Semi-supervised LADA: Consistency Training
An unlabeled sentence and its paraphrase should have the same number of entities for any given entity type.

Slide 49

Datasets and Baselines
Baselines (pre-trained models):
● Flair (Akbik et al., 2019): BiLSTM-CRF model with pre-trained Flair embeddings
● BERT (Devlin et al., 2019): BERT-base-multilingual-cased

Dataset        CoNLL   GermEval
Train          14,987  24,000
Dev            3,466   2,200
Test           3,684   5,100
Entity Types   4       12

Slide 50

Results

Slide 51

Results

Slide 52

Results

Slide 53

Takeaways
● LADA performs interpolations in hidden space among close examples to generate augmented data
● The sampling strategies of mixup for sequence learning matter
● Semi-supervised LADA, designed for NER, improves performance with limited training data
https://github.com/GT-SALT/LADA

Slide 54

Overview of This Talk
✓ Low-Resourced Scenarios
  ✓ Text Mixup for Semi-supervised Classification
  ✓ LADA for Named Entity Recognition
➢ Structured Knowledge from Social Interaction
  ➢ Summarization via Conversation Structures
Jiaao Chen, Diyi Yang. Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization. EMNLP 2020.

Slide 55

Hannah needs Betty’s number but Amanda does not have it. She needs to contact Larry.

James: Hey! I have been thinking about you : )
Hannah: Oh, that’s nice ; )
James: What are you up to?
Hannah: I’m about to sleep
James: I miss u. I was hoping to see you
Hannah: Have to get up early for work tomorrow
James: What about tomorrow?
Hannah: To be honest I have plans for tomorrow evening
James: Oh ok. What about Sat then?
Hannah: Yeah. Sure I am available on Sat
James: I’ll pick you up at 8?
Hannah: Sounds good. See you then

Slide 56

Compared to documents, conversations involve:
○ Informal language
○ Verbosity
○ Repetition
○ Reconfirmations
○ Hesitations
○ Interruptions

Slide 57

Classical Views of Conversations
1. The global view treats the conversation as a whole
2. The discrete view treats it as multiple separate utterances

Slide 58

More Views from Conversation Structures
❏ Topic View: a single conversation may cover multiple topics
   greetings → invitation → party details → rejection

Slide 59

More Views from Conversation Structures
❏ Topic View: a single conversation may cover multiple topics
   greetings → invitation → party details → rejection
❏ Stage View: conversations develop in certain patterns
   introduction → state problem → solution → wrap up

Slide 60

Extracting Conversation Structures
Utterance 1, Utterance 2, Utterance 3, ..., Utterance n → SentBERT → Representation 1, Representation 2, Representation 3, ..., Representation n

Slide 61

Extracting Topic View
Utterance 1, ..., Utterance n → SentBERT → Representation 1, ..., Representation n → C99 → topic assignments (Topic 1, Topic 2, Topic 2, ..., Topic k)

Slide 62

Extracting Stage View
Utterance 1, ..., Utterance n → SentBERT → Representation 1, ..., Representation n → HMM → stage assignments (Stage 1, Stage 1, Stage 2, ..., Stage k)
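A rough sketch of both extraction pipelines. Sentence-BERT embeddings come from the sentence-transformers package (the checkpoint name is an assumption), the stage HMM from hmmlearn; since C99 has no standard library implementation, a simple similarity-dip segmenter stands in for it here, so treat the topic step as an approximation of the paper's method:

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from hmmlearn.hmm import GaussianHMM

def extract_views(utterances, n_stages=4, sim_threshold=0.5):
    model = SentenceTransformer("all-MiniLM-L6-v2")  # checkpoint is an assumption
    emb = model.encode(utterances)                   # (n_utterances, dim)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)

    # Topic view: start a new topic where adjacent utterances are dissimilar
    # (a stand-in for C99 segmentation)
    topics, t = [0], 0
    for i in range(1, len(emb)):
        if emb[i] @ emb[i - 1] < sim_threshold:
            t += 1
        topics.append(t)

    # Stage view: HMM over utterance embeddings; hidden states act as stages
    hmm = GaussianHMM(n_components=n_stages, covariance_type="diag").fit(emb)
    stages = hmm.predict(emb)
    return topics, stages
```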

Slide 63

Conversation

James: Hey! I have been thinking about you : )
Hannah: Oh, that’s nice ; )
James: What are you up to?
Hannah: I’m about to sleep
James: I miss u. I was hoping to see you
Hannah: Have to get up early for work tomorrow
James: What about tomorrow?
Hannah: To be honest I have plans for tomorrow evening
James: Oh ok. What about Sat then?
Hannah: Yeah. Sure I am available on Sat
James: I’ll pick you up at 8?
Hannah: Sounds good. See you then

Slide 64

Conversation (with Topic View)

[Greetings]
James: Hey! I have been thinking about you : )
Hannah: Oh, that’s nice ; )
[Today’s plan]
James: What are you up to?
Hannah: I’m about to sleep
James: I miss u. I was hoping to see you
Hannah: Have to get up early for work tomorrow
[Plan for tomorrow]
James: What about tomorrow?
Hannah: To be honest I have plans for tomorrow evening
[Plan for Saturday]
James: Oh ok. What about Sat then?
Hannah: Yeah. Sure I am available on Sat
[Pick up time]
James: I’ll pick you up at 8?
Hannah: Sounds good. See you then

Slide 65

Conversation (with Topic and Stage Views)

[Greetings | Openings]
James: Hey! I have been thinking about you : )
Hannah: Oh, that’s nice ; )
[Today’s plan | Intentions]
James: What are you up to?
Hannah: I’m about to sleep
James: I miss u. I was hoping to see you
Hannah: Have to get up early for work tomorrow
[Plan for tomorrow | Discussion]
James: What about tomorrow?
Hannah: To be honest I have plans for tomorrow evening
[Plan for Saturday | Discussion]
James: Oh ok. What about Sat then?
Hannah: Yeah. Sure I am available on Sat
[Pick up time | Conclusion]
James: I’ll pick you up at 8?
Hannah: Sounds good. See you then

Slide 66

Multi-view Seq2Seq to Summarize Conversations

Slide 67

Token-Level Encoding

Slide 68

View-Level Encoding

Slide 69


Slide 70


Slide 71

Dataset and Baselines
Dataset: SAMSum (Gliwa et al., 2019)
Baselines:
❏ Pointer Generator (See et al., 2017) and BART-large (Lewis et al., 2019)

        # Conversations  # Participants  # Turns       Reference Length
Train   14,732           2.40 (0.83)     11.17 (6.45)  23.44 (12.72)
Dev     818              2.39 (0.84)     10.83 (6.37)  23.42 (12.71)
Test    819              2.36 (0.83)     11.25 (6.35)  23.12 (12.20)

Slide 72

Baselines in Summarizing Conversations

Models             Views     ROUGE-1  ROUGE-2  ROUGE-L
Pointer Generator  Discrete  0.401    0.153    0.366
BART               Discrete  0.481    0.245    0.451
BART               Global    0.482    0.245    0.466

ROUGE compares the machine-generated summary to the reference summary, counting overlapping 1-grams (ROUGE-1), 2-grams (ROUGE-2), and the longest common subsequence (ROUGE-L).
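To make the metric concrete, the sketch below computes a toy ROUGE-N recall; real evaluations typically use a library such as rouge-score, with stemming and an F-measure variant, so treat this as illustrative only:

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Toy ROUGE-N recall: the fraction of the reference's n-grams that
    also appear in the candidate, with clipped counts."""
    def ngrams(text):
        toks = text.split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / max(sum(ref.values()), 1)

# rouge_n("james will pick hannah up", "james will pick hannah up at 8")  # ~0.71
```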

Slide 73

Conversation Structure (Single View) Helps

Models             Views     ROUGE-1  ROUGE-2  ROUGE-L
Pointer Generator  Discrete  0.401    0.153    0.366
BART               Discrete  0.481    0.245    0.451
BART               Global    0.482    0.245    0.466
BART               Stage     0.487    0.251    0.472
BART               Topic     0.488    0.251    0.474

Slide 74

Multi-View Models Perform Better

Models             Views          ROUGE-1  ROUGE-2  ROUGE-L
Pointer Generator  Discrete       0.401    0.153    0.366
BART               Discrete       0.481    0.245    0.451
BART               Global         0.482    0.245    0.466
BART               Stage          0.487    0.251    0.472
BART               Topic          0.488    0.251    0.474
Multi-View BART    Topic + Stage  0.493    0.256    0.477

Slide 75

Human annotators rated summary quality on a {-2, 0, 2} scale (Gliwa et al., 2019).

Slide 76

Challenges in Conversation Summarization
1. Informal Language Use
   Greg: It’s valentine’s day! 😜
   Besty: For sombody without partner today is kinda miserable ...

Slide 77

Challenges in Conversation Summarization
1. Informal Language Use
2. Multiple Participants
   Greg: Do you know guys anything ...
   Bob: the most important is …
   Besty: and they will completely …
   Donald: yeah, mostly gas and oil. ...

Slide 78

Challenges in Conversation Summarization
1. Informal Language Use
2. Multiple Participants
3. Multiple Turns
   Greg: Hiya, I have a favour to ask.
   Greg: Can you pick up Marcel ...
   … (16 turns)

Slide 79

Challenges in Conversation Summarization
1. Informal Language Use
2. Multiple Participants
3. Multiple Turns
4. Referral & Coreference
   Greg: Good evening Deana! ...
   Besty: … belong your Cathreen!
   Greg: No. She says they aren’t hers. ...
   Greg: Where did you find them? ...

Slide 80

Challenges in Conversation Summarization
1. Informal Language Use
2. Multiple Participants
3. Multiple Turns
4. Referral & Coreference
5. Repetition & Interruption
   Greg: Well, could you pick him up?
   Besty: What if I can’t?
   Greg: Besty?
   Besty: What if I can’t?
   Greg: Can’t you, really?
   Besty: I can’t. ...

Slide 81

Challenges in Conversation Summarization
1. Informal Language Use
2. Multiple Participants
3. Multiple Turns
4. Referral & Coreference
5. Repetition & Interruption
6. Negation & Rhetorical
   Greg: I don’t think he likes me
   Besty: Why not? He likes you
   Greg: How do u know? He’s not
   Besty: He’s looking at u
   Greg: Really? U sure ...

Slide 82

Challenges in Conversation Summarization
1. Informal Language Use
2. Multiple Participants
3. Multiple Turns
4. Referral & Coreference
5. Repetition & Interruption
6. Negation & Rhetorical
7. Role & Language Change
   Greg: maybe we can meet on 17th?
   Besty: I won’t also be 17th
   Greg: OK, get it
   Besty: But we could meet 14th?
   Greg: I am not sure ...

Slide 83

Challenges in Conversation Summarization
1. Informal Language Use
2. Multiple Participants
3. Multiple Turns
4. Referral & Coreference
5. Repetition & Interruption
6. Negation & Rhetorical
7. Role & Language Change

Slide 84

Visualizing Challenges

Challenge                     % of 100 random examples  ROUGE-1  ROUGE-2  ROUGE-L
Generic                       24                        0.613    0.384    0.579
1. Informal language          25                        0.471    0.241    0.459
2. Multiple participants      10                        0.473    0.243    0.461
3. Multiple turns             23                        0.432    0.213    0.432
4. Referral & coreference     33                        0.445    0.206    0.430
5. Repetition & interruption  18                        0.423    0.180    0.415
6. Negations & rhetorical     20                        0.458    0.227    0.431
7. Role & language change     30                        0.469    0.211    0.450

Slide 85

Overview of This Talk
✓ Low-Resourced Scenarios
  ✓ Text Mixup for Semi-supervised Classification
  ✓ LADA for Named Entity Recognition
✓ Structured Knowledge from Conversations
  ✓ Summarization via Conversation Structures
  ➢ Summarization via Action and Discourse Graphs
Jiaao Chen, Diyi Yang. Structure-Aware Abstractive Conversation Summarization via Discourse and Action Graphs. NAACL 2021.

Slide 86

Structure in Conversations: Discourse Relations

Slide 87

Discourse Relation Graph Extraction
● Pre-train a discourse parser on an annotated corpus (Asher et al., 2016), reaching 77.5 F1
● Use it to predict discourse edges between utterances

Slide 88

Structure in Conversations: Action Graphs

Slide 89

Action Graph Extraction
● Transform the first-person point of view to third person
● Use OpenIE (Angeli et al., 2015) to extract “WHO-DOING-WHAT” triplets
● Construct the action graph from the triplets
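One way to obtain such triplets is sketched below, using Stanford's OpenIE through stanza's CoreNLP client; the slide cites Angeli et al.'s OpenIE, but the exact tooling and the upstream person rewriting shown here are assumptions, not necessarily the paper's pipeline:

```python
# Requires a local CoreNLP install (CORENLP_HOME) for the stanza client.
from stanza.server import CoreNLPClient

utterance = "James will pick Hannah up at 8."  # already rewritten to third person

with CoreNLPClient(annotators=["openie"], be_quiet=True) as client:
    ann = client.annotate(utterance)
    triples = [(t.subject, t.relation, t.object)
               for sent in ann.sentence
               for t in sent.openieTriple]

print(triples)  # "WHO-DOING-WHAT" triples over which the graph is built
```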

Slide 90

Structure-Aware Model

Slide 91

Utterance Encoder: BART encoder

Slide 92

Discourse Graph Encoder: GAT

Slide 93

Action Graph Encoder: GAT
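Both the discourse and action graph encoders are graph attention networks. The snippet below uses PyTorch Geometric's GATConv as a stand-in for the paper's encoder; the dimensions, depth, and sharing of one implementation across the two graphs are assumptions of this sketch:

```python
import torch
from torch_geometric.nn import GATConv

class GraphEncoder(torch.nn.Module):
    """Two-layer graph attention encoder over node representations."""

    def __init__(self, dim=768, heads=4):
        super().__init__()
        self.gat1 = GATConv(dim, dim // heads, heads=heads)  # concat -> dim
        self.gat2 = GATConv(dim, dim, heads=1)

    def forward(self, node_feats, edge_index):
        # node_feats: (n_nodes, dim) utterance or action-node representations
        # edge_index: (2, n_edges) discourse or action edges
        h = torch.relu(self.gat1(node_feats, edge_index))
        return self.gat2(h, edge_index)

# enc = GraphEncoder()
# out = enc(torch.randn(5, 768), torch.tensor([[0, 1, 2], [1, 2, 3]]))
```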

Slide 94

Multi-granularity Decoder

Slide 95

Multi-granularity Decoder, with ReZero connections

Slide 96

Datasets and Baselines
Base model: BART-base (Lewis et al., 2019)

              # Dialogues  # Participants  # Turns  # Discourse Edges  # Action Triples
SAMSum Train  14,732       2.40            11.17    8.47               6.72
SAMSum Dev    818          2.39            10.83    8.34               6.48
SAMSum Test   819          2.36            11.25    8.63               6.81
ADSC Full     45           2.00            7.51     6.51               37.20

Slide 97

Experimental Results (in-domain): recap of the multi-view results

Models                         ROUGE-1  ROUGE-2  ROUGE-L
Pointer Generator              0.401    0.153    0.366
BART (Discrete)                0.481    0.245    0.451
BART (Global)                  0.482    0.245    0.466
BART (Stage)                   0.487    0.251    0.472
BART (Topic)                   0.488    0.251    0.474
Multi-View BART (Topic+Stage)  0.493    0.256    0.477

Slide 98

Experimental Results (in-domain): Baseline Results

Models                ROUGE-1  ROUGE-2  ROUGE-L
Pointer Generator     40.08    15.28    36.63
BART-base             45.15    21.66    44.46
Multi-view BART-base  45.…

Slide 99

Experimental Results (in-domain)

Models               ROUGE-1  ROUGE-2  ROUGE-L
Baseline Results:
Pointer Generator    40.08    15.28    36.63
BART-base            45.15    21.66    44.46
Our Model w. Single Graph:
S-BART w. Discourse  45.89    22.50    44.83
S-BART w. Action     45.67    22.39    44.86

Slide 100

Experimental Results (in-domain)

Models                        ROUGE-1  ROUGE-2  ROUGE-L
Baseline Results:
Pointer Generator             40.08    15.28    36.63
BART-base                     45.15    21.66    44.46
Our Model w. Single Graph:
S-BART w. Discourse           45.89    22.50    44.83
S-BART w. Action              45.67    22.39    44.86
Our S-BART:
S-BART w. Discourse & Action  46.07    22.60    45.00

Slide 101

Experimental Results (out-of-domain)

Models                        ROUGE-1  ROUGE-2  ROUGE-L
BART-base                     20.90    5.04     21.23
S-BART w. Discourse           22.42    5.58     22.16
S-BART w. Action              30.91    20.64    35.30
S-BART w. Discourse & Action  34.74    23.86    38.69

Slide 102

Experimental Results (out-of-domain)

Models                        ROUGE-1  ROUGE-2  ROUGE-L
BART-base                     20.90    5.04     21.23
S-BART w. Discourse           22.42    5.58     22.16
S-BART w. Action              30.91    20.64    35.30
S-BART w. Discourse & Action  34.74    23.86    38.69

Slide 103

Experimental Results (out-of-domain)

Models                        ROUGE-1  ROUGE-2  ROUGE-L
BART-base                     20.90    5.04     21.23
S-BART w. Discourse           22.42    5.58     22.16
S-BART w. Action              30.91    20.64    35.30
S-BART w. Discourse & Action  34.74    23.86    38.69

Slide 104

Human Evaluations (Likert scale from 1 to 5)

Models                        Factualness  Succinctness  Informativeness
Ground Truth                  4.29         4.40          4.06
BART-base                     3.90         4.13          3.74
S-BART w. Discourse           4.11         4.42          3.98
S-BART w. Action              4.17         4.29          3.95
S-BART w. Discourse & Action  4.19         4.41          3.91

Slide 105

Conclusion on Summarizing Conversations
✓ Conversation structures help summarization
✓ Structures also improve generalization performance
✓ Dialogue summarization still faces MANY challenges
github.com/GT-SALT/Multi-View-Seq2Seq
github.com/GT-SALT/Structure-Aware-BART

Slide 106

Overview of This Talk
✓ Low-Resourced Scenarios
  ✓ Text Mixup for Semi-supervised Classification
  ✓ LADA for Named Entity Recognition
✓ Structured Knowledge from Conversations
  ✓ Summarization via Conversation Structures
  ✓ Summarization via Action and Discourse Graphs

Slide 107

Natural Language Processing with Less Data and More Structures
Diyi Yang
Twitter: @Diyi_Yang
www.cc.gatech.edu/~dyang888
Thank You