

Initiative-Aware Self-Supervised Learning for Knowledge-Grounded Conversation

Masato Umakoshi

October 07, 2021



Transcript

  1. Motivation (1/2) 4
     • Initiative: the ability to drive the direction of the conversation
     • Mixed initiative is an intrinsic feature of human-machine conversation
     • Both the user and the system take the initiative in suggesting new conversational directions, e.g. by introducing new topics or asking questions
  2. Motivation (2/2) 5
     • Knowledge selection (KS) has potential for mixed initiative: the KS model can be switched depending on who holds the initiative
     • E.g. if the system should take the initiative, select knowledge that introduces a new topic
     • Problem: there is no dataset labeled with initiative
     → Introduce a self-supervised method inspired by heuristics
  3. Proposed Method 6
     • Propose a mixed-initiative knowledge selection method (MIKe)
     • Employ the idea of initiative and mix two types of KS models
     • To tackle the lack of data for detecting initiative, propose a self-supervised method inspired by an observation
  4. Key Idea for ISLe 8
     • The initiative-aware self-supervised learning (ISLe) scheme is based on the following two insights:
       1. If there is an unsmooth knowledge shift, the KS tends to be user-initiative
       2. If a piece of knowledge selected at one turn is deleted, the resulting knowledge shift tends to be unsmooth
  6. Key Idea for ISLe 10
     • Hypothesize that
       • detecting missing knowledge is almost equivalent to learning to detect unsmooth knowledge shifts (insight 2)
       • detecting unsmooth knowledge shifts is almost equivalent to detecting user-initiative KS (insight 1)
     → detecting missing knowledge is almost equivalent to detecting user-initiative KS
  7. ISLe 11
     • Given all pieces of ground-truth chosen knowledge,
       • randomly delete one piece of knowledge
       • train a model to locate the missing knowledge
     • Use this model's outputs as pseudo labels for user-initiative KS
     • Train the initiative discriminator with these pseudo labels (a sketch of the deletion step follows below)
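As a concrete illustration of the deletion step above, here is a minimal sketch (not the authors' code) of how a pre-training example for the locating task could be built: one piece of the ground-truth selected knowledge is removed at random, and the position of the piece that directly follows the gap becomes the target. The function name `make_isle_example` and the choice never to delete the last piece are my assumptions.

```python
import random
from typing import List, Tuple


def make_isle_example(selected_knowledge: List[str]) -> Tuple[List[str], int]:
    """Randomly delete one piece of ground-truth selected knowledge.

    Returns the remaining sequence and the position (within that sequence)
    of the piece that comes immediately after the deleted one, which is
    what the locating model is trained to predict.
    """
    assert len(selected_knowledge) >= 2, "need at least two selected pieces"
    # Assumption: never delete the last piece, so a "next" piece always exists.
    j = random.randrange(len(selected_knowledge) - 1)
    remaining = selected_knowledge[:j] + selected_knowledge[j + 1:]
    target = j  # after deletion, position j holds the piece right after the gap
    return remaining, target


if __name__ == "__main__":
    remaining, target = make_isle_example(["K1_sel", "K2_sel", "K3_sel", "K4_sel"])
    print(remaining, "-> piece right after the gap:", remaining[target])
```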
  8. Locating Missing Knowledge 12
     • Input: $K_{1,\mathrm{sel}}, \ldots, K_{j-1,\mathrm{sel}}, K_{j+1,\mathrm{sel}}, \ldots, K_{T,\mathrm{sel}}$, i.e. the sequence of selected knowledge with one piece, $K_{j,\mathrm{sel}}$, deleted ($T$: number of turns)
     • Predict which remaining piece comes immediately after the missing one
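A minimal sketch of what such a locator could look like, assuming each remaining knowledge piece has already been pooled into a d-dimensional vector (as in the encoding layer, page 15); it scores every position and is trained with cross-entropy against the gap position from the deletion step. The architecture shown (a small Transformer encoder plus a linear scorer) is an illustration, not necessarily the paper's exact design.

```python
import torch
import torch.nn as nn


class KnowledgeGapLocator(nn.Module):
    """Scores each remaining knowledge vector for 'comes right after the gap'."""

    def __init__(self, d_model: int = 768, nhead: int = 8, num_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.scorer = nn.Linear(d_model, 1)

    def forward(self, knowledge_vecs: torch.Tensor) -> torch.Tensor:
        # knowledge_vecs: (batch, seq_len, d_model), one vector per remaining piece
        h = self.encoder(knowledge_vecs)
        return self.scorer(h).squeeze(-1)  # (batch, seq_len) position logits


# Usage: train with cross-entropy against the gap position from the deletion step.
model = KnowledgeGapLocator()
vecs = torch.randn(4, 6, 768)         # 4 conversations, 6 remaining pieces each
targets = torch.tensor([2, 0, 5, 1])  # index of the piece right after the gap
loss = nn.CrossEntropyLoss()(model(vecs), targets)
loss.backward()
```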
  9. Train (Student) Initiative Discriminator 13
     • Input: the current user utterance $h_{X_\tau}$ and the previously selected knowledge $\{h_{K_{j,\mathrm{sel}}}\}_{j=1}^{\tau-1}$
     • Predict the probability of user-initiative KS, $P(u_\tau)$; the label is the teacher initiative discriminator's output
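A minimal sketch of the student initiative discriminator under the same assumptions: it consumes the pooled current-utterance vector and a summary of the previously selected knowledge, outputs $P(u_\tau)$, and is fit with binary cross-entropy against the teacher's pseudo label. Mean-pooling the history and the MLP shape are my simplifications.

```python
import torch
import torch.nn as nn


class InitiativeDiscriminator(nn.Module):
    """Predicts P(u_tau): the probability that the KS at this turn is user-initiative."""

    def __init__(self, d_model: int = 768):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1)
        )

    def forward(self, h_utt: torch.Tensor, h_sel_history: torch.Tensor) -> torch.Tensor:
        # h_utt: (batch, d)  current user utterance vector
        # h_sel_history: (batch, turns, d)  previously selected knowledge vectors
        history = h_sel_history.mean(dim=1)        # simple pooling (assumption)
        logits = self.mlp(torch.cat([h_utt, history], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)   # (batch,) = P(u_tau)


# Training step: the label comes from the teacher initiative discriminator.
student = InitiativeDiscriminator()
h_utt, h_hist = torch.randn(4, 768), torch.randn(4, 3, 768)
teacher_pseudo_label = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = nn.BCELoss()(student(h_utt, h_hist), teacher_pseudo_label)
loss.backward()
```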
  10. Encoding Layer 15
     • Encode the current user utterance and each piece of knowledge with BERT, followed by a pooling operation:
       $H_{X_\tau} = \mathrm{BERT}(X_\tau) \in \mathbb{R}^{|X_\tau| \times d}$, $h_{X_\tau} = \mathrm{pooling}(H_{X_\tau}) \in \mathbb{R}^{1 \times d}$
       $H_{K_{\tau,i}} = \mathrm{BERT}(K_{\tau,i}) \in \mathbb{R}^{|K_{\tau,i}| \times d}$, $h_{K_{\tau,i}} = \mathrm{pooling}(H_{K_{\tau,i}}) \in \mathbb{R}^{1 \times d}$
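The encoding layer can be sketched with Hugging Face Transformers as below; mean pooling stands in for the pooling operation, which the slide does not specify.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")


def encode(text: str):
    """Return H (token-level, |X| x d) and h (pooled, 1 x d) for one sequence."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        H = bert(**inputs).last_hidden_state.squeeze(0)  # (|X|, d)
    h = H.mean(dim=0, keepdim=True)                      # (1, d), mean pooling
    return H, h


H_x, h_x = encode("are you a basketball fan?")             # current user utterance
H_k, h_k = encode("jordan played 15 seasons in the nba.")  # one knowledge piece
print(H_x.shape, h_x.shape)  # e.g. torch.Size([8, 768]) torch.Size([1, 768])
```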
  11. Modification to the encoder 16
     • Hereafter, a custom transformer encoder (TransformerE) is used, with two modifications:
       • Special positional embeddings representing the turn are added
       • A left-to-right attention mask is applied so that a position cannot attend to later positions
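A minimal sketch of the two modifications, assuming the encoder operates on pooled knowledge/utterance vectors each tagged with a turn index: a learned turn embedding is added to every input vector, and a lower-triangular (left-to-right) mask blocks attention to later positions. The embedding and masking details here are assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class TransformerE(nn.Module):
    """Transformer encoder with turn embeddings and a left-to-right attention mask."""

    def __init__(self, d_model: int = 768, nhead: int = 8, num_layers: int = 2,
                 max_turns: int = 64):
        super().__init__()
        self.turn_emb = nn.Embedding(max_turns, d_model)  # special turn positions
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x: torch.Tensor, turn_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d); turn_ids: (batch, seq) turn index of each element
        x = x + self.turn_emb(turn_ids)
        seq = x.size(1)
        # True above the diagonal = "may not attend": blocks attention to later positions
        causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
        return self.encoder(x, mask=causal)


enc = TransformerE()
out = enc(torch.randn(2, 5, 768), torch.arange(5).expand(2, 5))
print(out.shape)  # torch.Size([2, 5, 768])
```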
  12. User-initiative selector 17
     • Given $h_{X_\tau}$ and $\{h_{K_{\tau,1}}, \ldots, h_{K_{\tau,|\mathcal{K}_\tau|}}\}$, predict a distribution over the knowledge pool, $P(\mathcal{K}_\tau \mid \mathrm{user})$ (a combined sketch of both selectors follows after the next slide)
       $h_{X_\tau}$: user's utterance at turn $\tau$; $h_{K_{\tau,i}}$: $i$-th knowledge piece at turn $\tau$
  13. System-initiative selector 18
     • Given $\{h_{K_{j,\mathrm{sel}}}\}_{j=1}^{\tau-1}$ and $\{h_{K_{\tau,1}}, \ldots, h_{K_{\tau,|\mathcal{K}_\tau|}}\}$, predict a distribution over the knowledge pool, $P(\mathcal{K}_\tau \mid \mathrm{sys})$
       $h_{X_\tau}$: user's utterance at turn $\tau$; $h_{K_{\tau,i}}$: $i$-th knowledge piece at turn $\tau$
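Both selectors can be sketched as dot-product scorers over the knowledge-pool vectors, differing only in the query: the user-initiative selector queries with the current user utterance, while the system-initiative selector queries with a summary of the previously selected knowledge. Mean-pooling the history and plain dot-product scoring are simplifications of whatever attention the paper actually uses.

```python
import torch
import torch.nn.functional as F


def select(query: torch.Tensor, pool: torch.Tensor) -> torch.Tensor:
    """Dot-product scores of one query vector against the knowledge pool,
    normalized into a distribution P(K_tau | .)."""
    scores = pool @ query          # (|K_tau|,)
    return F.softmax(scores, dim=-1)


d = 768
h_x = torch.randn(d)               # current user utterance h_{X_tau}
pool = torch.randn(67, d)          # ~67 knowledge pieces in a WoW pool
h_sel_hist = torch.randn(3, d)     # previously selected knowledge vectors

p_user = select(h_x, pool)                     # user-initiative: query = user utterance
p_sys = select(h_sel_hist.mean(dim=0), pool)   # system-initiative: query = history summary
print(p_user.shape, float(p_user.sum()))       # torch.Size([67]), sums to ~1.0
```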
  14. Initiative discriminator 19
     • Given $h_{X_\tau}$ and $\{h_{K_{j,\mathrm{sel}}}\}_{j=1}^{\tau-1}$, predict the probability of user-initiative KS, $P(u_\tau)$ (same as page 15)
     • Then select knowledge by weighting the two selectors with this probability:
       $P(\mathcal{K}_\tau) = P(u_\tau)\, P(\mathcal{K}_\tau \mid \mathrm{user}) + (1 - P(u_\tau))\, P(\mathcal{K}_\tau \mid \mathrm{sys})$
       $h_{X_\tau}$: user's utterance at turn $\tau$; $h_{K_{\tau,i}}$: $i$-th knowledge piece at turn $\tau$
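The weighting step itself is just a convex combination of the two selector distributions gated by the discriminator output; a direct transcription of the formula above:

```python
import torch


def mix(p_user_init: torch.Tensor, p_k_user: torch.Tensor,
        p_k_sys: torch.Tensor) -> torch.Tensor:
    """P(K_tau) = P(u_tau) * P(K_tau|user) + (1 - P(u_tau)) * P(K_tau|sys)."""
    return p_user_init * p_k_user + (1.0 - p_user_init) * p_k_sys


p_u = torch.tensor(0.7)  # discriminator output P(u_tau)
p_k = mix(p_u, torch.softmax(torch.randn(67), -1), torch.softmax(torch.randn(67), -1))
print(float(p_k.sum()))  # still sums to 1.0 (up to float error)
```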
  15. Decoding Layer 20
     • Given the concatenated representation of the current user utterance and the selected knowledge, $H_{XK_\tau} = [H_{X_\tau}; H_{K_{\tau,\mathrm{sel}}}] \in \mathbb{R}^{|XK_\tau| \times d}$, decode with a copy mechanism (copy vs. generate)
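For the copy mechanism, a pointer-generator-style mixture is one common realization (the paper's exact formulation may differ): a gate p_gen blends the vocabulary distribution with a copy distribution obtained by scattering the decoder's attention weights onto the source token ids.

```python
import torch


def copy_generate(vocab_dist: torch.Tensor, attn: torch.Tensor,
                  src_ids: torch.Tensor, p_gen: torch.Tensor) -> torch.Tensor:
    """Pointer-generator-style output distribution.

    vocab_dist: (V,) generation distribution over the vocabulary
    attn:       (S,) attention over source tokens (utterance + selected knowledge)
    src_ids:    (S,) vocabulary ids of those source tokens
    p_gen:      scalar gate in [0, 1]
    """
    copy_dist = torch.zeros_like(vocab_dist).scatter_add(0, src_ids, attn)
    return p_gen * vocab_dist + (1.0 - p_gen) * copy_dist


V, S = 30522, 12
out = copy_generate(torch.softmax(torch.randn(V), -1),
                    torch.softmax(torch.randn(S), -1),
                    torch.randint(0, V, (S,)), torch.tensor(0.6))
print(float(out.sum()))  # 1.0: still a valid distribution
```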
  16. Learning Objective 21
     • The final learning objective consists of the primary tasks (knowledge selection and response generation) and ISLe
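In sketch form, assuming a simple unweighted sum (the slide does not state how the terms are combined), the joint objective is

$\mathcal{L} = \mathcal{L}_{\mathrm{KS}} + \mathcal{L}_{\mathrm{gen}} + \mathcal{L}_{\mathrm{ISLe}}$

where $\mathcal{L}_{\mathrm{KS}}$ is the knowledge-selection loss, $\mathcal{L}_{\mathrm{gen}}$ the response-generation loss, and $\mathcal{L}_{\mathrm{ISLe}}$ the self-supervised loss.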
  17. Dataset 23
     • WoW (Wizard of Wikipedia)
       • A piece of knowledge is defined as a knowledge sentence
       • 18,430/1,948/1,933 conversations for training/validation/test
       • The test set is split into two subsets, Test Seen (in-domain) and Test Unseen (out-of-domain)
       • There are around 67 pieces of knowledge on average in a knowledge pool
     • Holl-E
       • A piece of knowledge is defined as a knowledge sentence [1]
       • 7,228/930/913 conversations for training/validation/test
       • There are nearly 60 pieces of knowledge on average in a knowledge pool
     [1] Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue [Kim+, ICLR 2020]
  18. Baselines 24
     • SKT+PIPM+KDBTS: previously introduced by Kodama-san
     • DiffKS: leverages the difference between the previously selected knowledge and the current candidates
     • DukeNet: regards tracking the previously selected knowledge and selecting the current knowledge as dual tasks
  19. Automatic Evaluation 25
     • MIKe (the proposed method) beats the baselines in all settings and metrics
     (Table: automatic evaluation results on the WoW dataset; *: statistically significant, t-test, p < 0.05)
  20. Human Evaluation 26
     • MIKe beats the baselines in all settings and metrics
     (Table: human evaluation results on the WoW dataset)
  21. Ablation Study 27
     • Where do the improvements of MIKe come from?
     → All of them!
     • ISLe: initiative-aware self-supervised learning; ID: initiative discriminator; [U|S]IS: [user|system]-initiative selector
     (Table: ablation study on the WoW dataset)
  22. Initiative Discrimination Evaluation 28
     • How well does the initiative discriminator work?
     • Does ISLe help initiative discrimination?
     → ISLe plays an essential role
     • (Even with ISLe, performance seems somewhat limited…)
  23. Case Study (1/2) 29 (user-initiative KS)
     Knowledge pool:
       K1: no knowledge used.
       K2: while basketball is most often played as a team sport with five players on each side, two-on-two, and one-on-one competitions are also common.
       K3: …
       K4: jordan played 15 seasons in the nba for the chicago bulls and washington wizards.
     Context:
       User: are you a basketball fan?
       System: (K2) yes, i am a fan of the five player sport. are you?
       User: not as much as i used to be. i watched the bulls in the 90s when they were the dream team.
     Outputs:
       MIKe: (K4 ✓) i know that jordan played 15 seasons in the nba for the chicago bulls and washington wizards.
       SKT+PIPM+KDBTS: (K2 ✗) i'm not sure but i know that while basketball is most played as a team sport with five players.
  24. Case Study (2/2) 30 (system-initiative KS)
     Knowledge pool:
       K1: no knowledge used.
       K2: instagram is a mobile, desktop, and internet-based photo-sharing application and service that allows users to share pictures and videos either publicly, or privately to pre-approved followers.
       K3: …
       K4: instagram is owned by facebook.
     Context:
       User: i hate to admit it but i spend way too much time on instagram!
       System: (K2) i use it for sharing photos and videos
       User: do you have a lot of followers?
     Outputs:
       MIKe: (K4 ✓) i have a lot of followers and i do know that it is owned by facebook.
       SKT+PIPM+KDBTS: (K2 ✗) i have not i have not.
  25. Conclusion 31
     • Propose a mixed-initiative knowledge selection method (MIKe)
     • Employ the idea of initiative and mix two types of KS models
     • To tackle the lack of data for detecting initiative, propose a self-supervised method inspired by an observation
     • Achieve SOTA on two popular benchmark datasets
  26. Comments 32
     • Initiative discrimination seems not to work very well (page 33), but performance is still improved
     • The authors say that improving it would be one of the future directions
     • Is the notion of initiative well-defined? What about agreement between annotators?