Slide 1

Slide 1 text

High-Quality Diversification for Task-Oriented Dialogue Systems
Grace Hui Yang
Joint work with my students Zhiwen Tang and Hrishikesh Kulkarni
March 2, 2023 @ National University of Singapore

Slide 2

Slide 2 text

Task-Oriented Dialogue Systems
● Training the DRL agent with user simulators
○ Training with real human users is expensive
● Most user simulators are rule-based
○ Designed by domain experts
○ Efficient in routine task scenarios
● But they are unable to generate spontaneous responses like real humans

Slide 3

Slide 3 text


Slide 4

Slide 4 text

Increase dialogue diversity
● A long-standing research interest motivated by different needs:
○ Avoid dull responses
○ Obtain robust agents
● Ideas to improve diversification:
○ Enforce diversity in objective functions (Li et al., 2016a; Baheti et al., 2018)
○ Perturb language rules (Niu and Bansal, 2019) or environment parameters (Tobin et al., 2017; Ruiz et al., 2019)
○ Randomize trajectory synthesis (Andrychowicz et al., 2017; Lu et al., 2019)
○ Select more diverse data contributors (Stasaski et al., 2020)
○ Sample training trajectories from a diverse set of environments (Chua et al., 2018; Janner et al., 2019)

Slide 5

Slide 5 text

Training with Diversified User Models
● An ensemble of diversified user models
● One issue is error propagation
○ Errors propagate from (user) model learning to policy learning

Slide 6

Slide 6 text


Slide 7

Slide 7 text

We propose to
● Control the level of diversification
○ Control its intensity, frequency, and amount
○ Reach a balance between exploration/diversity and accuracy
● A diversification method for reinforcement-learning-supported dialogue agents
○ Intermittent Short Extension Ensemble (I-SEE)

Slide 8

Slide 8 text

System Architecture

Slide 9

Slide 9 text

Intermittent Short Extension Ensemble (I-SEE)
● Generate a base trajectory by interacting with an expert user simulator
● Intermittently branch the interaction trajectory
● Extend for a short horizon by interacting with a diversified user model
● Both the base trajectories and the diversified trajectories are used for training
[Figure: a base trajectory τ₀, τ₁, ..., τ₅ with short diversified trajectories branching off it]
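The branching scheme can be made concrete with a minimal sketch. Every interface here (`agent.act`, a `step` method returning state/reward/done, `dume.sample`) is a hypothetical stand-in, not the paper's actual code, and the parameter names `branch_prob` and `horizon` are illustrative.

```python
import random

def rollout_with_isee(agent, expert_simulator, dume,
                      max_turns=20, branch_prob=0.3, horizon=3):
    """Generate one base trajectory plus short diversified branches.

    expert_simulator and each user model expose step(state, action) ->
    (next_state, reward, done); dume.sample() returns one user model from
    the ensemble. All interfaces are illustrative stand-ins.
    """
    base, branches = [], []
    state = expert_simulator.reset()
    for t in range(max_turns):
        action = agent.act(state)
        next_state, reward, done = expert_simulator.step(state, action)
        base.append((state, action, reward, next_state))
        # Intermittently branch: extend from the current state for a short
        # horizon by interacting with a sampled diversified user model.
        if random.random() < branch_prob:
            user_model = dume.sample()
            s = state
            for _ in range(horizon):
                a = agent.act(s)
                s_next, r, d = user_model.step(s, a)
                branches.append((s, a, r, s_next))
                if d:
                    break
                s = s_next
        if done:
            break
        state = next_state
    return base + branches  # both kinds of transitions are used for training
```

Keeping `horizon` small limits how far errors in the learned user models can compound before the rollout returns to the expert simulator.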

Slide 10

Slide 10 text


Slide 11

Slide 11 text

Diversified User Model Ensemble (DUME)
● Create DUME with imitation learning (e.g., behavior cloning)
● User models are neural networks with the same architecture but different initialization parameters
[Figure: the Dialogue Agent interacts with the User Simulator and with the Diversified User Model Ensemble (DUME)]
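A minimal sketch of how such an ensemble might be behavior-cloned, assuming a corpus of (dialogue state, user action) batches and treating each user model as a small classifier over user dialogue acts. The architecture, dimensions, and training schedule are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

def build_dume(dialogue_corpus, ensemble_size=4, state_dim=64, n_actions=32,
               epochs=5, lr=1e-3):
    """Behavior-clone an ensemble of user models from logged dialogues.

    Every member shares the same architecture; diversity comes only from
    the random initialization (a different seed per member).
    dialogue_corpus yields (states, user_actions) tensor batches.
    """
    ensemble = []
    for seed in range(ensemble_size):
        torch.manual_seed(seed)  # different initialization per member
        model = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for states, user_actions in dialogue_corpus:  # supervised pairs
                opt.zero_grad()
                loss = loss_fn(model(states), user_actions)
                loss.backward()
                opt.step()
        ensemble.append(model)
    return ensemble
```

Because all members fit the same data, their disagreement stems from initialization alone, which is what makes the ensemble diversified yet anchored to observed user behavior.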

Slide 12

Slide 12 text

Control the Quality of Diversification
● How big is the ensemble (DUME) size?
● How long is the branching horizon?
● How high is the branching intensity?
[Figure: base trajectory τ₀–τ₅ with diversified trajectories branching off it]

Slide 13

Slide 13 text

Policy Learning with I-SEE
● The dialogue agent starts by interacting with the expert simulator from t=0
● At t=p, the agent switches to interacting with a user model sampled from DUME
● We balance between diversity and quality by controlling
○ The branching intensity
○ The branching horizon
○ The ensemble size
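A hedged sketch of the outer loop, mapping the three knobs onto parameters: `len(dume)` plays the role of the ensemble size E, `horizon` the branching horizon H, and `branch_prob` the branching intensity (η). `rollout_fn` (e.g., the `rollout_with_isee` sketch from the I-SEE slide) and `agent.update` are stand-ins for the base RL algorithm's data collection and update rule.

```python
def train_with_isee(agent, expert_simulator, dume, rollout_fn,
                    n_episodes=10_000, branch_prob=0.3, horizon=3):
    """Outer policy-learning loop (illustrative).

    The three I-SEE knobs: len(dume) ~ ensemble size E, horizon ~
    branching horizon H, branch_prob ~ branching intensity (eta).
    agent.update() stands in for the underlying algorithm's update
    (PPO, DQN, ...), applied to both base and diversified transitions.
    """
    for _ in range(n_episodes):
        transitions = rollout_fn(agent, expert_simulator, dume,
                                 branch_prob=branch_prob, horizon=horizon)
        agent.update(transitions)
```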

Slide 14

Slide 14 text


Slide 15

Slide 15 text


Slide 16

Slide 16 text

Experiments
● MultiWOZ (Budzianowski et al., 2018)
○ 7 domains: train, restaurant, taxi, ...
○ A user goal may involve multiple domains
○ 8,438 dialogues
● Evaluation Metrics
○ Success rate
○ Inform F1
○ Match
○ #Turns
● Baselines
○ PPO (Schulman et al., 2017)
○ DQN (Mnih et al., 2015)
○ DDQ (DQN + unconstrained diversification) (Peng et al., 2018)
○ GDPL (PPO + IRL, leading performer on MultiWOZ) (Takanobu et al., 2019)
○ MADPL (MARL, leading performer on MultiWOZ) (Takanobu et al., 2020)
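For reference, Inform F1 on MultiWOZ-style dialogues is commonly computed as the F1 between the slot values the system informed and those the user requested. The sketch below follows that common formulation; it is not necessarily the exact evaluation script used in the paper.

```python
def inform_f1(informed_slots, requested_slots):
    """Inform F1 for one dialogue (common formulation; inputs are
    collections of slot identifiers or slot-value pairs)."""
    informed, requested = set(informed_slots), set(requested_slots)
    if not informed or not requested:
        return 0.0
    tp = len(informed & requested)          # correctly informed slots
    precision = tp / len(informed)
    recall = tp / len(requested)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```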

Slide 17

Slide 17 text

Experiment Results
Three settings:
● X: the algorithm without diversification
● X + Dvs: full and uncontrolled diversification
● X + I-SEE: the proposed diversification method

Slide 18

Slide 18 text

Analysis of I-SEE
[Figure panels: (a) Ensemble size (E); (b) Diversification horizon (H); (c) Diversification ratio (η)]

Slide 19

Slide 19 text

Analysis of I-SEE
[Figure: Diversity of DUME]

Slide 20

Slide 20 text

Summary
● I-SEE builds a diversified user model ensemble with imitation learning
● It randomizes the initialization parameters of the neural networks
● Errors in user model learning may quickly propagate into policy learning
● To control the degree of noise:
○ The agent intermittently interacts with trainable user models
○ The diversified trajectories are of short horizon

Slide 21

Slide 21 text

Conclusion
● Diversification works
● However, more diversity is not always better
○ Effectiveness peaks when we use only intermittent, short, sampled diversified trajectories
○ Full diversification hurts performance
● Fully automatic training of dialogue agents with simulators can be effective

Slide 22

Slide 22 text

Thank you!
Paper: https://arxiv.org/abs/2106.00891
Code: https://github.com/smt-HS/I-SEE
Lab website: http://infosense.cs.georgetown.edu/

Slide 23

Slide 23 text

Additional Slides