
Continual Prompt Tuning for Dialog State Tracking

Seunghyun Hwang

February 16, 2023

Transcript

  1. Continual Prompt Tuning for Dialog State Tracking
     Presented by Seunghyun Hwang, 2023. 2. 16.
     Paper: Qi Zhu, Bing Li, Fei Mi, Xiaoyan Zhu, Minlie Huang. ACL 2022.
     Reading club 3 / Research outcome 0
  2. Contents
     1. Continual Prompt Tuning for Dialogue State Tracking - Overview
     2. Background Information
        1. Continual learning
        2. Prompt-based tuning
        3. Dialogue state tracking
     3. Continual Prompt Tuning for Dialogue State Tracking - Model Structure (Method)
     4. Experiment Results
  3. Contents (repeat of the table of contents above)
  4. Continual Prompt Tuning for DST - Overview
     The approach combines Continual Learning, Prompt Tuning, and a Dialogue State Tracking model:
     • Addresses the catastrophic forgetting problem
     • Lets the model support new domains/services after deployment
     • Parameter-efficient, which helps avoid forgetting
     • Enables knowledge transfer between tasks
     • Continually learning new tasks is crucial for a dialog system; a deployed dialog system is often required to do all of the above
  5. Contents (repeated; next section: Background Information)
  6. Continual Learning (Background Information)
     • Similar concept to incremental learning
     • Continually acquiring knowledge from a data stream and reusing it for future learning while avoiding forgetting
     • Three families of continual learning methods:
       • Rehearsal methods[1]
       • Regularization methods[2]
       • Architectural methods[3]
     [1] Rebuffi, Sylvestre-Alvise, et al. "iCaRL: Incremental classifier and representation learning." CVPR 2017
     [2] Kirkpatrick, James, et al. "Overcoming catastrophic forgetting in neural networks." PNAS 2017
     [3] Rusu, Andrei A., et al. "Progressive neural networks." 2016
  7. Continual Learning in Dialogue Systems (Background Information)
     • Various general CL methods have been applied[1]
     • AdapterCL[2]
       • Most closely related to this paper
       • Freezes the pre-trained model and learns an adapter per task
       • This paper's method is more parameter-efficient
     [1] Lee, Sungjin. "Toward continual learning for conversational agents." 2017
     [2] Madotto, Andrea, et al. "Continual learning in task-oriented dialogue systems." 2021
  8. Prompt Tuning (Background Information)
     • Converting downstream tasks into a language-modelling format with a textual prompt is an effective way to use a pre-trained model[1]
     • Soft prompts: prompts whose embeddings are learned through back-propagation[2]
     • Prompt tuning is parameter-efficient and becomes more competitive with fine-tuning as the model size grows[3]
     [1] Brown, Tom, et al. "Language models are few-shot learners." NeurIPS 2020
     [2] Liu, Xiao, et al. "GPT understands, too." 2021
     [3] Lester, Brian, Rami Al-Rfou, and Noah Constant. "The power of scale for parameter-efficient prompt tuning." 2021
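
To make the idea concrete, here is a minimal PyTorch sketch of soft prompt tuning, assuming a Hugging Face-style T5 backbone; the class name `SoftPromptModel` and the argument `prompt_length` are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    """Wraps a frozen seq2seq LM and prepends m trainable soft prompt embeddings."""

    def __init__(self, backbone, prompt_length=100):
        super().__init__()
        self.backbone = backbone                      # e.g. a T5-style encoder-decoder
        for p in self.backbone.parameters():          # freeze every pre-trained weight
            p.requires_grad = False
        d_model = backbone.config.d_model
        # The only trainable parameters: m soft prompt vectors.
        self.prompt = nn.Parameter(torch.randn(prompt_length, d_model) * 0.02)

    def forward(self, input_ids, attention_mask, labels):
        tok_emb = self.backbone.get_input_embeddings()(input_ids)      # (B, L, d)
        batch = tok_emb.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)        # (B, m, d)
        inputs_embeds = torch.cat([prompt, tok_emb], dim=1)            # prepend prompt
        prompt_mask = torch.ones(batch, self.prompt.size(0),
                                 dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        attention_mask = torch.cat([prompt_mask, attention_mask], dim=1)
        # Only self.prompt receives gradients; the backbone stays frozen.
        return self.backbone(inputs_embeds=inputs_embeds,
                             attention_mask=attention_mask,
                             labels=labels)
```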
  9. Prompt Tuning (Background Information)
     • Prompt tuning differs from an embedding adapter[1]
       • An embedding adapter transforms all token embeddings but does not affect the Transformer layers' computation
     • Gu et al.[2] and Vu et al.[3] further explore the transferability of soft prompts across tasks
       • They study one-step adaptation; this work extends prompt transfer to the continual learning setting
     [1] Zhu, Yaoming, et al. "Counter-interference adapter for multilingual machine translation." 2021
     [2] Gu, Yuxian, et al. "PPT: Pre-trained prompt tuning for few-shot learning." 2021
     [3] Vu, Tu, et al. "SPoT: Better frozen model adaptation through soft prompt transfer." 2021
  10. Dialogue State Tracking (DST) (Background Information)
      Dialog system pipeline: NLU (Natural Language Understanding) → DST (Dialogue State Tracking) → DP (Dialogue Policy learning) → NLG (Natural Language Generation)
      DST is a dialogue-level task that maps partial dialogues into dialogue states.
      • Input: a dialogue / a turn
      • Output: dialogue state (e.g. slot-value pairs)
      Example:
      • "Can you help me book a restaurant near Hoegi Station?" → Restaurant_Book(Area = Hoegi)
      • "For five people, thanks!" → people_num = 5; accumulated state: Restaurant_Book(Area = Hoegi, people_num = 5)
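
As a small illustration (not from the paper), the running example above can be written as plain data structures, showing how DST accumulates slot-value pairs across turns:

```python
# Illustrative only: the restaurant-booking example from the slide.
dialogue = [
    "Can you help me book a restaurant near Hoegi Station?",  # turn 1
    "For five people, thanks!",                                # turn 2
]

# State after turn 1: only the area slot is filled.
state_turn1 = {"Restaurant_Book": {"Area": "Hoegi"}}

# State after turn 2: DST accumulates slot-value pairs across turns.
state_turn2 = {"Restaurant_Book": {"Area": "Hoegi", "people_num": "5"}}
```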
  11. Dialogue State Tracking (Background Information)
      • Generation-based models either generate all (slot, value) pairs in one pass[1]
      • or generate the value for each given slot separately[2]
      • Trade-off: efficiency vs. incorporating more slot-specific information
      • This paper integrates multiple slot descriptions into a single query and generates all values in one pass
      [1] Madotto, Andrea, et al. "Continual learning in task-oriented dialogue systems." 2021
      [2] Wu, Chien-Sheng, et al. "Transferable multi-domain state generator for task-oriented dialogue systems." 2019
  12. Contents (repeated; next section: Model Structure / Method)
  13. Problem Setting: Model Overview (Model Structure)
      • Suppose a sequence of tasks T_1, ..., T_t with datasets D_1, ..., D_t.
      • Predict the target y given the input x and the task T_k: model f : X × T → Y.
      • Each service/task T_k has a set of predefined dialogue slots S_k = {s_1, ..., s_{n_k}}.
      • The input x is a dialogue and the output y contains slot-value pairs {(s_1, v_1), ..., (s_{n_k}, v_{n_k})}.
      Prompt Tuning
      • Freeze the pre-trained model and add new soft prompt tokens.
      • Each task T_k gets its own prompt P_k = P_k^1 P_k^2 ... P_k^m, where m is the number of soft prompt tokens.
  14. Problem Setting: DST as Masked-Span Recovery (Model Structure)
      • Define a function g_k : X × Y → V* × V*, where V is the vocabulary.
      • Each service T_k's original data is translated: (x~, y~) = g_k(x, y), where x~ and y~ are the model input and output.
      • Query: Q_k = "d_1^k: <M_1> ... d_{n_k}^k: <M_{n_k}>", where d_i^k is the description of slot s_i and <M_i> is a sentinel (mask) token.
      • Model input: x~ = [x; Q_k; P_k]
      • Model output: y~ = "<M_1> v_1^k ... <M_{n_k}> v_{n_k}^k"
      • Training objective: L_{θ,P_k}(D_k) = − Σ_{j=1}^{|D_k|} log p_θ(y~_j^k | [x_j^k; Q_k; P_k])
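
A rough Python sketch of how (x~, y~) might be assembled under this formulation; the helper name `build_example`, the "none" value for unfilled slots, and the literal sentinel strings are assumptions for illustration.

```python
def build_example(dialogue, slots, values, prompt_tokens):
    """Turn one dialogue into a masked-span-recovery (input, target) pair.

    dialogue      : the dialogue history as a single string (x)
    slots         : list of (slot_name, description) for the current task T_k
    values        : dict mapping slot_name -> ground-truth value
    prompt_tokens : placeholder tokens standing in for the soft prompt P_k
    """
    # Q_k: every slot description followed by its own sentinel token <M_i>.
    query = " ".join(f"{desc}: <M_{i}>" for i, (_, desc) in enumerate(slots, 1))

    # x~ = [x; Q_k; P_k] -- in practice the prompt is added at the embedding
    # level; it is shown inline here only for readability.
    model_input = f"{dialogue} {query} " + " ".join(prompt_tokens)

    # y~ = "<M_1> v_1 ... <M_nk> v_nk": recover each masked value.
    target = " ".join(
        f"<M_{i}> {values.get(name, 'none')}"   # unfilled slots -> 'none' (assumption)
        for i, (name, _) in enumerate(slots, 1)
    )
    return model_input, target


# Example with the restaurant dialogue from the earlier slide.
x, y = build_example(
    dialogue="Can you help me book a restaurant near Hoegi Station? For five people, thanks!",
    slots=[("area", "area of the restaurant"), ("people_num", "number of people")],
    values={"area": "Hoegi", "people_num": "5"},
    prompt_tokens=["<P_1>", "<P_2>"],
)
```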
  15. Continual Learning: Forward Transfer (Model Structure)
      • Continual Prompt Initialization
        • CLInit - selects the last task's prompt P_{k-1} to initialize the current task's prompt P_k
        • SelectInit - selects the previous prompt with the lowest loss to initialize P_k
      • Query Fusion (sketched below)
        • Sample n_1 slots from S_k randomly, where n_1 is sampled uniformly from [1, |S_k|]
        • Sample n_2 slots from ⋃_{i<k} S_i randomly, where n_2 is sampled uniformly from [1, n_1]
        • Combine the n_1 + n_2 sampled slots' descriptions in a random order to build Q_k'
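
A minimal sketch of the Query Fusion sampling described above; the function name `fuse_query` is illustrative, and the handling of an empty previous-slot pool (the first task) is an assumption.

```python
import random

def fuse_query(current_slots, previous_slots):
    """Query fusion: mix current-task slots with slots from earlier tasks
    so the model does not overfit to one fixed, task-specific query format."""
    n1 = random.randint(1, len(current_slots))        # n_1 ~ Uniform[1, |S_k|]
    fused = random.sample(current_slots, n1)

    if previous_slots:                                 # no earlier tasks -> skip
        n2 = random.randint(1, n1)                     # n_2 ~ Uniform[1, n_1]
        n2 = min(n2, len(previous_slots))              # practical cap (assumption)
        fused += random.sample(previous_slots, n2)

    random.shuffle(fused)                              # random order -> Q_k'
    return fused
```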
  16. Continual Learning: Forward Transfer - Memory Replay (Model Structure)
      • Store a few samples for each task and replay them when training on new tasks (sketched below)
      • Store |M| samples of each task T_i in a memory M_i
      • Change the training loss to L_{θ,P_k}(D_k ∪ M_{<k}), i.e. train on the current data together with the stored memory of earlier tasks
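
A small sketch of memory replay under these assumptions: random exemplar selection and simple pooling of D_k with the stored memories. The helper names are illustrative.

```python
import random

def select_memory(task_data, memory_size):
    """Pick |M| exemplars for a finished task.  Random selection here;
    the paper's selection strategy may differ."""
    return random.sample(list(task_data), min(memory_size, len(task_data)))

def replay_pool(current_data, memories):
    """Memory replay: train the k-th prompt on D_k together with the
    stored exemplar sets M_1, ..., M_{k-1}."""
    pool = list(current_data)
    for memory in memories:          # memories = [M_1, ..., M_{k-1}]
        pool.extend(memory)
    random.shuffle(pool)             # mix old and new examples in each epoch
    return pool                      # corresponds to minimizing the loss over D_k ∪ M_{<k}
```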
  17. Continual Learning: Backward Transfer (Model Structure)
      • Memory-Guided Backward Transfer
        • For each previous task T_i, i < k, initialize a new prompt P_i^(k) from P_i
        • Train it on the current task's data D_k, with the memory M_i acting as a regularizer
        • The gradients from the data and from the memory are g_obj and g_reg
        • Update with the combined gradient given on the slide (a sketch follows below)
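
The exact update formula on the slide was not captured in this transcript. The sketch below assumes an A-GEM-style projection (follow g_obj unless it conflicts with the memory gradient g_reg, in which case remove the conflicting component), which is consistent with M_i acting as a regularizer but is not guaranteed to be the paper's exact rule.

```python
import torch

def backward_transfer_step(g_obj: torch.Tensor, g_reg: torch.Tensor) -> torch.Tensor:
    """Combine the gradient on the new task's data (g_obj) with the gradient
    on the old task's memory (g_reg).  Assumed A-GEM-style projection: if the
    two gradients conflict, drop the component of g_obj that would increase
    the memory loss; otherwise use g_obj unchanged."""
    g_obj_flat, g_reg_flat = g_obj.flatten(), g_reg.flatten()
    dot = torch.dot(g_obj_flat, g_reg_flat)
    if dot >= 0:
        return g_obj                                   # no conflict: plain update
    return g_obj - (dot / torch.dot(g_reg_flat, g_reg_flat)) * g_reg
```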
  18. Contents (repeated; next section: Experiment Results)
  19. Experiment Setting (Experiment Results)
      • Dataset
        • Schema-Guided Dialogue dataset (SGD)[1]
      • Evaluation
        • Joint Goal Accuracy (JGA)[2]
        • Effect of forward transfer (FWT)[3]
        • Effect of backward transfer (BWT)[3]
      [1] Rastogi, Abhinav, et al. "Towards scalable multi-domain conversational agents: The schema-guided dialogue dataset." AAAI 2020
      [2] Wu, Chien-Sheng, et al. "Transferable multi-domain state generator for task-oriented dialogue systems." 2019
      [3] Lopez-Paz, David, and Marc'Aurelio Ranzato. "Gradient episodic memory for continual learning." NeurIPS 2017
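
A minimal sketch of Joint Goal Accuracy: a dialogue turn counts as correct only if the full predicted state matches the reference state exactly.

```python
def joint_goal_accuracy(predictions, references):
    """Fraction of turns whose predicted slot-value state matches the
    reference state exactly (every slot, every value)."""
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)


# Each state is a dict of slot -> value for one dialogue turn.
preds = [{"area": "Hoegi", "people_num": "5"}, {"area": "Hoegi"}]
refs  = [{"area": "Hoegi", "people_num": "5"}, {"area": "Hoegi", "people_num": "5"}]
print(joint_goal_accuracy(preds, refs))   # 0.5 -- the second turn misses a slot
```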
  20. Experiment Results
      • Figure: JGA with different model sizes and prompt lengths
      • Figure: FWT with different model sizes and prompt lengths