

Initiative-Aware Self-Supervised Learning for Knowledge-Grounded Conversation

Masato Umakoshi

October 07, 2021



Transcript

  1. Motivation (1/2) 4
     • Initiative: the ability to drive the direction of the conversation
     • Mixed initiative is an intrinsic feature of human-machine conversation
     • Both the user and the system take the initiative in suggesting new conversational directions, e.g. by introducing new topics or asking questions
  2. Motivation (2/2) 5
     • Knowledge selection (KS) has potential for mixed initiative: the KS model can be switched depending on who holds the initiative
     • E.g. if the system should take the initiative, select knowledge that introduces a new topic
     • Problem: there is no dataset labeled with initiative
     → Introduce a self-supervised method inspired by heuristics
  3. Proposed Method 6
     • Propose a mixed-initiative knowledge selection method (MIKe)
     • Employ the idea of initiative and mix two types of KS models
     • To tackle the lack of data for detecting initiative, propose a self-supervised method inspired by an observation
  4. Key Idea for ISLe 8
     • The initiative-aware self-supervised learning (ISLe) scheme is based on the following two insights:
       1. If there is an unsmooth knowledge shift, the KS tends to be user-initiative
       2. If a piece of knowledge selected at one turn is deleted, the resulting knowledge shift tends to be unsmooth
  6. Key Idea for ISLe 10
     • Hypothesize that
       • detecting missing knowledge is almost equivalent to learning to detect unsmooth knowledge shifts (insight 2)
       • detecting unsmooth knowledge shifts is almost equivalent to detecting user-initiative KS (insight 1)
     → detecting missing knowledge is almost equivalent to detecting user-initiative KS
  7. ISLe 11
     • Given all pieces of ground-truth chosen knowledge,
       • randomly delete one piece of knowledge
       • train a model to locate the missing knowledge
     • Use this model's outputs as pseudo labels for user-initiative KS
     • Train the initiative discriminator with these pseudo labels (a sketch of the deletion step follows below)
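As a concrete illustration of the deletion step above, here is a minimal sketch (not the authors' code) of how a pre-training example for the locating task could be built: one piece of the ground-truth selected knowledge is removed at random, and the position of the piece that directly follows the gap becomes the target. The function name `make_isle_example` and the choice never to delete the last piece are my assumptions.

```python
import random
from typing import List, Tuple


def make_isle_example(selected_knowledge: List[str]) -> Tuple[List[str], int]:
    """Randomly delete one piece of ground-truth selected knowledge.

    Returns the remaining sequence and the position (within that sequence)
    of the piece that comes immediately after the deleted one, which is
    what the locating model is trained to predict.
    """
    assert len(selected_knowledge) >= 2, "need at least two selected pieces"
    # Assumption: never delete the last piece, so a "next" piece always exists.
    j = random.randrange(len(selected_knowledge) - 1)
    remaining = selected_knowledge[:j] + selected_knowledge[j + 1:]
    target = j  # after deletion, position j holds the piece right after the gap
    return remaining, target


if __name__ == "__main__":
    remaining, target = make_isle_example(["K1_sel", "K2_sel", "K3_sel", "K4_sel"])
    print(remaining, "-> piece right after the gap:", remaining[target])
```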
  8. Locating Missing Knowledge 12
     • Input: $K_{1,\mathrm{sel}}, \ldots, K_{j-1,\mathrm{sel}}, K_{j+1,\mathrm{sel}}, \ldots, K_{T,\mathrm{sel}}$, i.e. the sequence of selected knowledge with one piece, $K_{j,\mathrm{sel}}$, deleted ($T$: number of turns)
     • Predict which remaining piece comes immediately after the missing one
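A minimal sketch of what such a locator could look like, assuming each remaining knowledge piece has already been pooled into a d-dimensional vector (as in the encoding layer, page 15); it scores every position and is trained with cross-entropy against the gap position from the deletion step. The architecture shown (a small Transformer encoder plus a linear scorer) is an illustration, not necessarily the paper's exact design.

```python
import torch
import torch.nn as nn


class KnowledgeGapLocator(nn.Module):
    """Scores each remaining knowledge vector for 'comes right after the gap'."""

    def __init__(self, d_model: int = 768, nhead: int = 8, num_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.scorer = nn.Linear(d_model, 1)

    def forward(self, knowledge_vecs: torch.Tensor) -> torch.Tensor:
        # knowledge_vecs: (batch, seq_len, d_model), one vector per remaining piece
        h = self.encoder(knowledge_vecs)
        return self.scorer(h).squeeze(-1)  # (batch, seq_len) position logits


# Usage: train with cross-entropy against the gap position from the deletion step.
model = KnowledgeGapLocator()
vecs = torch.randn(4, 6, 768)         # 4 conversations, 6 remaining pieces each
targets = torch.tensor([2, 0, 5, 1])  # index of the piece right after the gap
loss = nn.CrossEntropyLoss()(model(vecs), targets)
loss.backward()
```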
  9. Train (Student) Initiative Discriminator 13
     • Input: the current user utterance $h_{X_\tau}$ and the previously selected knowledge $\{h_{K_{j,\mathrm{sel}}}\}_{j=1}^{\tau-1}$
     • Predict the probability of user-initiative KS, $P(u_\tau)$; the label is the teacher initiative discriminator's output
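A minimal sketch of the student initiative discriminator under the same assumptions: it consumes the pooled current-utterance vector and a summary of the previously selected knowledge, outputs $P(u_\tau)$, and is fit with binary cross-entropy against the teacher's pseudo label. Mean-pooling the history and the MLP shape are my simplifications.

```python
import torch
import torch.nn as nn


class InitiativeDiscriminator(nn.Module):
    """Predicts P(u_tau): the probability that the KS at this turn is user-initiative."""

    def __init__(self, d_model: int = 768):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1)
        )

    def forward(self, h_utt: torch.Tensor, h_sel_history: torch.Tensor) -> torch.Tensor:
        # h_utt: (batch, d)  current user utterance vector
        # h_sel_history: (batch, turns, d)  previously selected knowledge vectors
        history = h_sel_history.mean(dim=1)        # simple pooling (assumption)
        logits = self.mlp(torch.cat([h_utt, history], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)   # (batch,) = P(u_tau)


# Training step: the label comes from the teacher initiative discriminator.
student = InitiativeDiscriminator()
h_utt, h_hist = torch.randn(4, 768), torch.randn(4, 3, 768)
teacher_pseudo_label = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = nn.BCELoss()(student(h_utt, h_hist), teacher_pseudo_label)
loss.backward()
```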
  10. Encoding Layer 15
     • Encode the current user utterance and each piece of knowledge with BERT, followed by a pooling operation:
       $H_{X_\tau} = \mathrm{BERT}(X_\tau) \in \mathbb{R}^{|X_\tau| \times d}$, $h_{X_\tau} = \mathrm{pooling}(H_{X_\tau}) \in \mathbb{R}^{1 \times d}$
       $H_{K_{\tau,i}} = \mathrm{BERT}(K_{\tau,i}) \in \mathbb{R}^{|K_{\tau,i}| \times d}$, $h_{K_{\tau,i}} = \mathrm{pooling}(H_{K_{\tau,i}}) \in \mathbb{R}^{1 \times d}$
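The encoding layer can be sketched with Hugging Face Transformers as below; mean pooling stands in for the pooling operation, which the slide does not specify.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")


def encode(text: str):
    """Return H (token-level, |X| x d) and h (pooled, 1 x d) for one sequence."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        H = bert(**inputs).last_hidden_state.squeeze(0)  # (|X|, d)
    h = H.mean(dim=0, keepdim=True)                      # (1, d), mean pooling
    return H, h


H_x, h_x = encode("are you a basketball fan?")             # current user utterance
H_k, h_k = encode("jordan played 15 seasons in the nba.")  # one knowledge piece
print(H_x.shape, h_x.shape)  # e.g. torch.Size([8, 768]) torch.Size([1, 768])
```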
  11. Modification to the encoder 16
     • Hereafter, a custom transformer encoder (TransformerE) is used, with two modifications:
       • Special positional embeddings representing the turn are added
       • A left-to-right attention mask is applied so that a position cannot attend to later positions
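A minimal sketch of the two modifications, assuming the encoder operates on pooled knowledge/utterance vectors each tagged with a turn index: a learned turn embedding is added to every input vector, and a lower-triangular (left-to-right) mask blocks attention to later positions. The embedding and masking details here are assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class TransformerE(nn.Module):
    """Transformer encoder with turn embeddings and a left-to-right attention mask."""

    def __init__(self, d_model: int = 768, nhead: int = 8, num_layers: int = 2,
                 max_turns: int = 64):
        super().__init__()
        self.turn_emb = nn.Embedding(max_turns, d_model)  # special turn positions
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x: torch.Tensor, turn_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d); turn_ids: (batch, seq) turn index of each element
        x = x + self.turn_emb(turn_ids)
        seq = x.size(1)
        # True above the diagonal = "may not attend": blocks attention to later positions
        causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
        return self.encoder(x, mask=causal)


enc = TransformerE()
out = enc(torch.randn(2, 5, 768), torch.arange(5).expand(2, 5))
print(out.shape)  # torch.Size([2, 5, 768])
```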
  12. User-initiative selector 17
     • Given $h_{X_\tau}$ and $\{h_{K_{\tau,1}}, \ldots, h_{K_{\tau,|\mathcal{K}_\tau|}}\}$, predict a distribution over the knowledge pool, $P(\mathcal{K}_\tau \mid \mathrm{user})$ (a combined sketch of both selectors follows after the next slide)
       $h_{X_\tau}$: user's utterance at turn $\tau$; $h_{K_{\tau,i}}$: $i$-th knowledge piece at turn $\tau$
  13. System-initiative selector 18
     • Given $\{h_{K_{j,\mathrm{sel}}}\}_{j=1}^{\tau-1}$ and $\{h_{K_{\tau,1}}, \ldots, h_{K_{\tau,|\mathcal{K}_\tau|}}\}$, predict a distribution over the knowledge pool, $P(\mathcal{K}_\tau \mid \mathrm{sys})$
       $h_{X_\tau}$: user's utterance at turn $\tau$; $h_{K_{\tau,i}}$: $i$-th knowledge piece at turn $\tau$
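Both selectors can be sketched as dot-product scorers over the knowledge-pool vectors, differing only in the query: the user-initiative selector queries with the current user utterance, while the system-initiative selector queries with a summary of the previously selected knowledge. Mean-pooling the history and plain dot-product scoring are simplifications of whatever attention the paper actually uses.

```python
import torch
import torch.nn.functional as F


def select(query: torch.Tensor, pool: torch.Tensor) -> torch.Tensor:
    """Dot-product scores of one query vector against the knowledge pool,
    normalized into a distribution P(K_tau | .)."""
    scores = pool @ query          # (|K_tau|,)
    return F.softmax(scores, dim=-1)


d = 768
h_x = torch.randn(d)               # current user utterance h_{X_tau}
pool = torch.randn(67, d)          # ~67 knowledge pieces in a WoW pool
h_sel_hist = torch.randn(3, d)     # previously selected knowledge vectors

p_user = select(h_x, pool)                     # user-initiative: query = user utterance
p_sys = select(h_sel_hist.mean(dim=0), pool)   # system-initiative: query = history summary
print(p_user.shape, float(p_user.sum()))       # torch.Size([67]), sums to ~1.0
```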
  14. Initiative discriminator 19
     • Given $h_{X_\tau}$ and $\{h_{K_{j,\mathrm{sel}}}\}_{j=1}^{\tau-1}$, predict the probability of user-initiative KS, $P(u_\tau)$ (same as page 15)
     • Then select knowledge by weighting the two selectors with this probability:
       $P(\mathcal{K}_\tau) = P(u_\tau)\, P(\mathcal{K}_\tau \mid \mathrm{user}) + (1 - P(u_\tau))\, P(\mathcal{K}_\tau \mid \mathrm{sys})$
       $h_{X_\tau}$: user's utterance at turn $\tau$; $h_{K_{\tau,i}}$: $i$-th knowledge piece at turn $\tau$
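The weighting step itself is just a convex combination of the two selector distributions gated by the discriminator output; a direct transcription of the formula above:

```python
import torch


def mix(p_user_init: torch.Tensor, p_k_user: torch.Tensor,
        p_k_sys: torch.Tensor) -> torch.Tensor:
    """P(K_tau) = P(u_tau) * P(K_tau|user) + (1 - P(u_tau)) * P(K_tau|sys)."""
    return p_user_init * p_k_user + (1.0 - p_user_init) * p_k_sys


p_u = torch.tensor(0.7)  # discriminator output P(u_tau)
p_k = mix(p_u, torch.softmax(torch.randn(67), -1), torch.softmax(torch.randn(67), -1))
print(float(p_k.sum()))  # still sums to 1.0 (up to float error)
```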
  15. Decoding Layer 20
     • Given the concatenated representation of the current user utterance and the selected knowledge, $H_{XK_\tau} = [H_{X_\tau}; H_{K_{\tau,\mathrm{sel}}}] \in \mathbb{R}^{|XK_\tau| \times d}$, decode with a copy mechanism (copy vs. generate)
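For the copy mechanism, a pointer-generator-style mixture is one common realization (the paper's exact formulation may differ): a gate p_gen blends the vocabulary distribution with a copy distribution obtained by scattering the decoder's attention weights onto the source token ids.

```python
import torch


def copy_generate(vocab_dist: torch.Tensor, attn: torch.Tensor,
                  src_ids: torch.Tensor, p_gen: torch.Tensor) -> torch.Tensor:
    """Pointer-generator-style output distribution.

    vocab_dist: (V,) generation distribution over the vocabulary
    attn:       (S,) attention over source tokens (utterance + selected knowledge)
    src_ids:    (S,) vocabulary ids of those source tokens
    p_gen:      scalar gate in [0, 1]
    """
    copy_dist = torch.zeros_like(vocab_dist).scatter_add(0, src_ids, attn)
    return p_gen * vocab_dist + (1.0 - p_gen) * copy_dist


V, S = 30522, 12
out = copy_generate(torch.softmax(torch.randn(V), -1),
                    torch.softmax(torch.randn(S), -1),
                    torch.randint(0, V, (S,)), torch.tensor(0.6))
print(float(out.sum()))  # 1.0: still a valid distribution
```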
  16. Learning Objective 21
     • The final learning objective consists of the primary tasks (knowledge selection and response generation) and ISLe
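In sketch form, assuming a simple unweighted sum (the slide does not state how the terms are combined), the joint objective is

$\mathcal{L} = \mathcal{L}_{\mathrm{KS}} + \mathcal{L}_{\mathrm{gen}} + \mathcal{L}_{\mathrm{ISLe}}$

where $\mathcal{L}_{\mathrm{KS}}$ is the knowledge-selection loss, $\mathcal{L}_{\mathrm{gen}}$ the response-generation loss, and $\mathcal{L}_{\mathrm{ISLe}}$ the self-supervised loss.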
  17. Dataset 23
     • WoW (Wizard of Wikipedia)
       • A piece of knowledge is defined as a knowledge sentence
       • 18,430/1,948/1,933 conversations for training/validation/test
       • The test set is split into two subsets, Test Seen (in-domain) and Test Unseen (out-of-domain)
       • There are around 67 pieces of knowledge on average in a knowledge pool
     • Holl-E
       • A piece of knowledge is defined as a knowledge sentence [1]
       • 7,228/930/913 conversations for training/validation/test
       • There are nearly 60 pieces of knowledge on average in a knowledge pool
     [1] Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue [Kim+, ICLR 2020]
  18. Baselines 24
     • SKT+PIPM+KDBTS: previously introduced by Kodama-san
     • DiffKS: leverages the difference between the previously selected knowledge and the current candidates
     • DukeNet: regards tracking the previously selected knowledge and selecting the current knowledge as dual tasks
  19. Automatic Evaluation 25
     • MIKe (the proposed method) beats the baselines in all settings and metrics
     (Table: automatic evaluation results on the WoW dataset; *: statistically significant, t-test, p < 0.05)
  20. Human Evaluation 26
     • MIKe beats the baselines in all settings and metrics
     (Table: human evaluation results on the WoW dataset)
  21. Ablation Study 27
     • Where do the improvements of MIKe come from?
     → All of them!
     • ISLe: initiative-aware self-supervised learning; ID: initiative discriminator; [U|S]IS: [user|system]-initiative selector
     (Table: ablation study on the WoW dataset)
  22. Initiative Discrimination Evaluation 28
     • How well does the initiative discriminator work?
     • Does ISLe help initiative discrimination?
     → ISLe plays an essential role
     • (Even with ISLe, performance seems somewhat limited…)
  23. Case Study (1/2) 29 (user-initiative KS)
     Knowledge pool:
       K1: no knowledge used.
       K2: while basketball is most often played as a team sport with five players on each side, two-on-two, and one-on-one competitions are also common.
       K3: …
       K4: jordan played 15 seasons in the nba for the chicago bulls and washington wizards.
     Context:
       User: are you a basketball fan?
       System: (K2) yes, i am a fan of the five player sport. are you?
       User: not as much as i used to be. i watched the bulls in the 90s when they were the dream team.
     Outputs:
       MIKe: (K4 ✓) i know that jordan played 15 seasons in the nba for the chicago bulls and washington wizards.
       SKT+PIPM+KDBTS: (K2 ✗) i'm not sure but i know that while basketball is most played as a team sport with five players.
  24. Case Study (2/2) 30 (system-initiative KS)
     Knowledge pool:
       K1: no knowledge used.
       K2: instagram is a mobile, desktop, and internet-based photo-sharing application and service that allows users to share pictures and videos either publicly, or privately to pre-approved followers.
       K3: …
       K4: instagram is owned by facebook.
     Context:
       User: i hate to admit it but i spend way too much time on instagram!
       System: (K2) i use it for sharing photos and videos
       User: do you have a lot of followers?
     Outputs:
       MIKe: (K4 ✓) i have a lot of followers and i do know that it is owned by facebook.
       SKT+PIPM+KDBTS: (K2 ✗) i have not i have not.
  25. Conclusion 31
     • Propose a mixed-initiative knowledge selection method (MIKe)
     • Employ the idea of initiative and mix two types of KS models
     • To tackle the lack of data for detecting initiative, propose a self-supervised method inspired by an observation
     • Achieve SOTA on two popular benchmark datasets
  26. Comments 32
     • Initiative discrimination seems not to work very well (page 33), but performance is still improved
     • The authors say that improving it would be one of the future directions
     • Is the notion of initiative well-defined? What about agreement between annotators?