
High-Quality Diversification for Task-Oriented Dialogue Systems

wing.nus
March 17, 2023


Many task-oriented dialogue systems use deep reinforcement learning (DRL) to learn policies that respond to the user appropriately and complete the tasks successfully. Training DRL agents with diverse dialogue trajectories prepares them well for rare user requests and unseen situations. One effective diversification method is to let the agent interact with a diverse set of learned user models. However, trajectories created by these artificial user models may contain generation errors, which can quickly propagate into the agent’s policy. It is thus important to control the quality of the diversification and resist the noise. In this paper, we propose a novel dialogue diversification method for task-oriented dialogue systems trained in simulators. Our method, Intermittent Short Extension Ensemble (I-SEE), constrains the intensity of interaction with an ensemble of diverse user models and effectively controls the quality of the diversification. Evaluations show that I-SEE successfully boosts the performance of several DRL dialogue agents.

Speaker's bio: Dr. Grace Hui Yang is an Associate Professor in the Department of Computer Science at Georgetown University. Dr. Yang is leading the InfoSense (Information Retrieval and Sense-Making) group at Georgetown University, Washington D.C. Dr. Yang obtained her Ph.D. from Carnegie Mellon University in 2011. Her current research interests include deep reinforcement learning, interactive agents, and human-centered AI. Prior to this, she conducted research on question answering, automatic ontology construction, near-duplicate detection, multimedia information retrieval, and opinion and sentiment detection. Dr. Yang's research has been supported by the Defense Advanced Research Projects Agency (DARPA) and the National Science Foundation (NSF). Dr. Yang co-organized the Text Retrieval Conference (TREC) Dynamic Domain Track from 2015 to 2017 and led the effort for SIGIR privacy-preserving information retrieval workshops from 2014 to 2016. Dr. Yang has served on the editorial boards of ACM TOIS and Information Retrieval Journal (from 2014 to 2017) and has actively served as an organizing or program committee member in many conferences such as SIGIR, ECIR, ACL, AAAI, ICTIR, CIKM, WSDM, and WWW. She is a recipient of the NSF Faculty Early Career Development Program (CAREER) Award.

Transcript

  1. High-Quality Diversification for Task-
    Oriented Dialogue Systems
    Grace Hui Yang
    Joint work with my students Zhiwen Tang and Hrishikesh Kulkarni
    March 2, 2023 @ National University of Singapore


  2. Task-Oriented Dialogue Systems
    ● Training the DRL agent with user simulators
    ○ Training with real human users is expensive
    ● Most user simulators are rule-based
    ○ Designed by domain experts
    ○ Efficient in routine task scenarios
    ● But they are unable to generate spontaneous responses like real humans


  3. (image-only slide)


  4. Increase dialogue diversity
    ● Long-lasting research interest motivated by different needs:
    ○ Avoid dull responses
    ○ Obtain robust agents
    ● Ideas to improve diversification:
    ○ Enforce diversity in objective functions (Li et al., 2016a; Baheti et al., 2018)
    ○ Perturb language rules (Niu and Bansal, 2019) or environment parameters (Tobin et al., 2017; Ruiz et al., 2019)
    ○ Randomize trajectory synthesis (Andrychowicz et al., 2017; Lu et al., 2019)
    ○ Select more diverse data contributors (Stasaski et al., 2020)
    ○ Sample training trajectories from a diverse set of environments (Chua et al., 2018; Janner et al., 2019)


  5. Training with Diversified User Models
    ● An ensemble of diversified user models
    ● One issue is error propagation
    ○ Errors propagate from (user) model learning to policy learning


  6. (image-only slide)


  7. We propose to
    ● Control the level of diversification
    ○ Control its intensity, frequency, and amount
    ○ Reach a balance between exploration/diversity and accuracy
    ● A diversification method for reinforcement learning-supported dialogue agents
    ○ Intermittent Short Extension Ensemble (I-SEE)


  8. System Architecture


  9. Intermittent Short Extension Ensemble (I-SEE)
    ● Generate a base trajectory by interacting with an expert user simulator
    ● Intermittently branch the interaction trajectory
    ● Extend each branch for a short horizon by interacting with a diversified user model
    ● Both the base trajectories and the diversified trajectories are used for training
    (Figure: a base trajectory 𝛕₀ with short diversified branches 𝛕₁–𝛕₅)
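    The following Python sketch illustrates the procedure above. It is an assumed reconstruction, not the paper's code: `agent`, `expert_sim`, `dume`, and the `reset`/`step`/`fork`/`act` methods are hypothetical stand-ins for the dialogue policy, the expert user simulator, and the user-model ensemble.

    ```python
    # A minimal sketch of I-SEE trajectory generation, assuming a gym-style
    # simulator interface. All object names and methods here are hypothetical
    # stand-ins, not the paper's actual API.
    import random

    def isee_rollout(agent, expert_sim, dume, max_turns=20,
                     branch_prob=0.2, horizon=3):
        """Roll out a base trajectory against the expert user simulator and
        intermittently branch short extensions against sampled user models."""
        base, branches = [], []
        state = expert_sim.reset()
        for _ in range(max_turns):
            action = agent.act(state)
            next_state, reward, done = expert_sim.step(action)
            base.append((state, action, reward, next_state))
            # Intermittent branching: with small probability, extend the
            # current dialogue state for a short horizon with a user model
            # sampled from the ensemble.
            if random.random() < branch_prob:
                user_model = random.choice(dume)
                b_state = user_model.fork(state)  # hypothetical: copy dialogue state
                for _ in range(horizon):
                    b_action = agent.act(b_state)
                    b_next, b_reward, b_done = user_model.step(b_action)
                    branches.append((b_state, b_action, b_reward, b_next))
                    if b_done:
                        break
                    b_state = b_next
            if done:
                break
            state = next_state
        # Both the base and the diversified transitions are used for training.
        return base + branches
    ```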


  10. (image-only slide)


  11. Diversified User Model Ensemble (DUME)
    ● Create the DUME with imitation learning (e.g., behavior cloning)
    ● User models are neural networks with the same architecture but different initialization parameters
    (Figure: the dialogue agent interacts with the user simulator and with the Diversified User Model Ensemble (DUME))
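    A hedged PyTorch sketch of this construction follows; the architecture, hyperparameters, and the `demos` batch format are illustrative assumptions, not the paper's implementation.

    ```python
    # An assumed sketch of building the DUME with behavior cloning in PyTorch.
    # `demos` is taken to be an iterable of (states, actions) tensor batches
    # from expert demonstrations; this format is an assumption.
    import torch
    import torch.nn as nn

    class UserModel(nn.Module):
        """All ensemble members share this (illustrative) architecture."""
        def __init__(self, state_dim, n_actions, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions))

        def forward(self, state):
            return self.net(state)  # logits over user actions

    def build_dume(demos, state_dim, n_actions, ensemble_size=4, epochs=10):
        """Behavior-clone `ensemble_size` user models on expert demonstrations.
        Diversity comes only from the different random initializations."""
        dume = []
        for seed in range(ensemble_size):
            torch.manual_seed(seed)  # different initialization per member
            model = UserModel(state_dim, n_actions)
            opt = torch.optim.Adam(model.parameters(), lr=1e-3)
            loss_fn = nn.CrossEntropyLoss()
            for _ in range(epochs):
                for states, actions in demos:
                    opt.zero_grad()
                    loss_fn(model(states), actions).backward()
                    opt.step()
            dume.append(model)
        return dume
    ```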


  12. Control the Quality of Diversification
    ● How big is the ensemble (DUME) size?
    ● How long is the branching horizon?
    ● How much is the branching intensity?
    (Figure: a base trajectory 𝛕₀ with diversified branches 𝛕₁–𝛕₅)


  13. Policy Learning with I-SEE
    ● The dialogue agent starts by interacting with the expert simulator from t = 0
    ● At t = p, the agent switches to interacting with a user model sampled from the DUME
    ● We balance diversity and quality by controlling:
    ○ The branching intensity
    ○ The branching horizon
    ○ The ensemble size
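    Tying the pieces together, here is a short, assumed sketch of the training loop with the three knobs; `agent.update` is a hypothetical stand-in for the underlying DRL update (e.g., DQN or PPO), and it reuses the `isee_rollout` and `build_dume` sketches above.

    ```python
    # A hedged sketch of the I-SEE training loop and its three quality knobs.
    # The knob names mirror the slides (ensemble size E, horizon H, ratio 𝜂);
    # the specific default values are illustrative, not the paper's settings.
    E = 4      # ensemble size: number of diversified user models in the DUME
    H = 3      # branching horizon: max turns per diversified extension
    ETA = 0.2  # branching intensity: how often the agent switches to a user model

    def train(agent, expert_sim, demos, state_dim, n_actions, episodes=1000):
        dume = build_dume(demos, state_dim, n_actions, ensemble_size=E)
        for _ in range(episodes):
            # Each rollout starts from the expert simulator at t = 0 and, at
            # sampled switch points t = p, interacts with a DUME member for
            # at most H turns before returning to the base trajectory.
            transitions = isee_rollout(agent, expert_sim, dume,
                                       branch_prob=ETA, horizon=H)
            agent.update(transitions)  # hypothetical policy-update step
    ```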


  14. (image-only slide)


  15. (image-only slide)


  16. Experiments
    ● MultiWOZ (Budzianowski et al., 2018)
    ○ 7 domains: train, restaurant, taxi, ...
    ○ A user goal may involve multiple domains
    ○ 8,438 dialogues
    ● Evaluation Metrics
    ○ Success rate
    ○ Inform F1
    ○ Match
    ○ #Turns
    ● Baselines
    ○ PPO (Schulman et al., 2017)
    ○ DQN (Mnih et al., 2015)
    ○ DDQ (DQN + unconstrained diversification) (Peng et al., 2018)
    ○ GDPL (PPO + IRL, a leading performer on MultiWOZ) (Takanobu et al., 2019)
    ○ MADPL (MARL, a leading performer on MultiWOZ) (Takanobu et al., 2020)


  17. Experiment Results
    Three settings:
    ● X: the algorithm without diversification
    ● X + Dvs: full and uncontrolled diversification
    ● X + I-SEE: the proposed diversification method


  18. Analysis of I-SEE
    (Figure panels: (a) ensemble size (E), (b) diversification horizon (H), (c) diversification ratio (𝜂))


  19. Analysis of I-SEE
    (Figure: diversity of the DUME)


  20. Summary
    ● I-SEE builds a diversified user model ensemble with imitation learning
    ● It randomizes the initialization parameters of the neural networks
    ● Errors in user model learning may quickly propagate into policy learning
    ● To control the degree of noise:
    ○ The agent intermittently interacts with the trainable user models
    ○ The diversified trajectories are kept to a short horizon


  21. Conclusion
    ● Diversification works
    ● However, more diversity is not always better
    ○ Effectiveness peaks when we use only intermittent, short, sampled diversified trajectories
    ○ Full diversification hurts performance
    ● Fully automatic training of dialogue agents with simulators can be effective


  22. Thank you!
    Paper: https://arxiv.org/abs/2106.00891
    Code: https://github.com/smt-HS/I-SEE
    Lab website: http://infosense.cs.georgetown.edu/


  23. Additional Slides
