Description-Driven Task-Oriented Dialog Model

Seunghyun Hwang

November 03, 2022

Transcript

  1. Description-Driven Task-Oriented Dialog Modeling

    Jeffrey Zhao, Raghav Gupta, Yuan Cao, Dian Yu, Mingqiu Wang, Harrison Lee, Abhinav Rastogi, Izhak Shafran, Yonghui Wu (Google Research, 2022)
    Presented by Seunghyun Hwang, 2022.11.3
    Reading club 2, Research outcome 0
  2. Contents

    1. Dialogue state tracking
    2. D3ST - Description-driven task-oriented dialog modeling
    3. Paper Result
    4. DSTC11 Track3
  3. Contents

    1. Dialogue state tracking
    2. D3ST - Description-driven task-oriented dialog modeling
    3. Paper Result
    4. DSTC11 Track3
  4. Task-Oriented Dialogue System

    Example: part of a restaurant reservation dialogue
    USER: Find me a restaurant near Hoegi Station for 5 people.
    SYSTEM: What type of food?
    USER: Soy sauce rice (간장밥).
    SYSTEM: How about "행복은 간장밥"?
  5. Modularized Task-Oriented Dialogue System

    • A task-oriented dialog system is generally divided into four modules:
      • NLU: Natural Language Understanding
      • DST: Dialogue State Tracking
      • NLG: Natural Language Generation
      • DP: Dialogue Policy learning
    Example:
    USER: Can you help me book a restaurant near Hoegi Station?
    SYSTEM: For how many people?
  6. Dialogue State Tracking (DST)

    DST is a dialogue-level task that maps partial dialogues into dialogue states.
    • Input: a dialogue / a turn
    • Output: dialogue state (e.g. slot-value pairs)
    Example:
    USER: Can you help me book a restaurant near Hoegi Station? -> Restaurant_Book (Area = Hoegi)
    USER: For five people, thanks! -> people_num = 5, giving Restaurant_Book (Area = Hoegi, people_num = 5)
  7. Dialogue State Tracking (DST)

    DST is a dialogue-level task that maps partial dialogues into dialogue states.
    • Input: a dialogue / a turn with its previous state
    • Output: dialogue state (e.g. slot-value pairs)
    Input dialogue:
    USER: Can you help me book a restaurant near Hoegi Station?
    SYSTEM: For how many people?
    USER: For five people, thanks!
    Output dialogue state:
    After the first user turn: Restaurant_Book (Area = Hoegi station)
    After the second user turn: Restaurant_Book (Area = Hoegi station, people_num = 5)
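The turn-by-turn accumulation in the example above can be sketched in a few lines. This is only an illustration of how a state is carried forward: the per-turn slot values are written by hand here (an assumption), whereas a real DST model predicts them from the dialogue text. The names `update_state` and `track` are hypothetical.

```python
from typing import Dict, List

DialogueState = Dict[str, str]


def update_state(previous: DialogueState, turn_slots: DialogueState) -> DialogueState:
    """Return a new state in which slots from the current turn overwrite earlier values."""
    new_state = dict(previous)
    new_state.update(turn_slots)
    return new_state


def track(turn_updates: List[DialogueState]) -> List[DialogueState]:
    """Accumulate per-turn slot updates into one dialogue state per user turn."""
    states: List[DialogueState] = []
    state: DialogueState = {}
    for slots in turn_updates:
        state = update_state(state, slots)
        states.append(state)
    return states


# Hand-written slot values for the two user turns of the restaurant example.
turn_updates = [{"Area": "Hoegi station"}, {"people_num": "5"}]
states = track(turn_updates)
for turn_index, state in enumerate(states):
    print(turn_index, state)
```

Each entry of `states` is the full state after that turn, which is the form joint goal accuracy is later computed on.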
  8. Dialogue State Tracking (DST)

    Modules: NLU (Natural Language Understanding), DST (Dialogue State Tracking), NLG (Natural Language Generation), DP (Dialogue Policy learning)
    Example:
    USER: Can you help me book a restaurant near Hoegi Station?
    SYSTEM: For how many people?
  9. Current Approaches to Dialogue State Tracking

    • Inclusion of task descriptions
      • Modify the input data to include a description of each slot or examples of slot values[1]
      • Slot descriptions improve zero-shot transferability[2]
      • Inference is generally expensive because slot values are obtained one at a time.
    [1] Robust Zero-Shot Cross-Domain Slot Filling with Example Values, ACL 2019
    [2] Leveraging Slot Descriptions for Zero-Shot Cross-Domain Dialogue State Tracking, NAACL 2021
  10. Current Approaches to Dialogue Systems

    • Prompting language models
      • Powerful language models like GPT[1] have demonstrated impressive few-shot learning ability even without fine-tuning
      • Madotto et al.[2] applied GPT-2 by priming the model with examples for language understanding, state tracking, dialogue policy and language generation tasks respectively
    • Describing the task with questions
      • Dialogue state tracking as a question answering (QA) or machine reading (MR) problem[3],[4]
    [1] Language models are unsupervised multitask learners, 2019
    [2] Language models as few-shot learner for task-oriented dialogue systems, 2020
    [3] Zero-shot generalization in dialog state tracking through generative question answering, EACL 2021
    [4] From machine reading comprehension to dialogue state tracking: Bridging the gap, ACL 2020
  11. Contents

    1. Dialogue state tracking
    2. D3ST - Description-driven task-oriented dialog modeling
    3. Paper Result
    4. DSTC11 Track3
  12. Description-driven Task-oriented Dialog Modeling[1]

    • Dialogue state tracking
    Pipeline: Input text preprocessing -> Transformer -> Output
    [1] Description-Driven Task-Oriented Dialog Modeling, 2022
  13. Description-driven Task-oriented Dialog Modeling

    • Model
    • Why seq2seq?
      • Seq2seq can easily handle different formats of language instructions
      • Seq2seq has been shown to be an effective approach for DST[1]
      • Uses T5[2] from Google
    [1] Effective sequence-to-sequence dialogue state tracking, EMNLP 2021
    [2] Exploring the limits of transfer learning with a unified text-to-text transformer, JMLR 2020
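  14. Description-Driven Modeling

    Input
    Let $d_i^{\text{slot}},\ i = 1, \ldots, N$ and $d_j^{\text{intent}},\ j = 1, \ldots, M$ be the descriptions for slots and intents.
    • Slot descriptions, intent descriptions: $0{:}\,d_0^{\text{slot}} \;\ldots\; I{:}\,d_I^{\text{slot}},\quad i0{:}\,d_0^{\text{int}} \;\ldots\; iJ{:}\,d_J^{\text{int}}$
    • Input text: $[\text{usr}]\ u_0^{\text{usr}}\ [\text{sys}]\ u_0^{\text{sys}} \;\ldots\; [\text{usr}]\ u_T^{\text{usr}}\ [\text{sys}]\ u_T^{\text{sys}}$
    • Input format: <slot descriptions> <intent descriptions> <input text>
    Output
    • Output format: $[\text{states}]\ a_0^{s}{:}\,v_0^{s} \;\ldots\; a_M^{s}{:}\,v_M^{s}\quad [\text{intents}]\ a_0^{i} \;\ldots\; a_N^{i}$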
  15. Description-Driven Modeling

    Input:
    0:playback device on which the song is to be played 0a) bedroom speaker 0b) tv 0c) kitchen speaker
    1=name of the artist the song is performed by
    2=name of the song
    3=album the song belongs to
    4=genre of the song
    i0=search for a song based on the name and optionally other attributes
    i1=play a song by its name and optionally artist
    [user] i want to find a movie.
    [system] what is your location.
    [user] santa rosa. i want to see it at 3rd street cinema.
    [system] i found 3 movies. does hellboy, how to train your dragon: the hidden world or the upside interest you?
    [user] how to train your dragon: the hidden world is perfect. can you find me some songs from the album summer anthems.
    Output:
    [states] 1:1a 2:summer anthems 4:no other love [intents] i0
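A minimal sketch of how an input string in this style could be assembled from slot descriptions, intent descriptions and dialogue turns. It is not the authors' preprocessing code: the function name `build_d3st_input`, the choice of `:` as the index separator and the single-space joining are all assumptions made for illustration.

```python
import string
from typing import List, Optional, Tuple


def build_d3st_input(
    slots: List[Tuple[str, Optional[List[str]]]],
    intents: List[str],
    turns: List[Tuple[str, str]],
) -> str:
    """Concatenate indexed slot/intent descriptions with the dialogue history."""
    parts: List[str] = []
    for i, (description, values) in enumerate(slots):
        parts.append(f"{i}:{description}")
        # Categorical slots additionally list their possible values as ia), ib), ...
        for letter, value in zip(string.ascii_lowercase, values or []):
            parts.append(f"{i}{letter}) {value}")
    for j, description in enumerate(intents):
        parts.append(f"i{j}:{description}")
    for speaker, utterance in turns:
        parts.append(f"[{speaker}] {utterance}")
    return " ".join(parts)


slots = [
    ("playback device on which the song is to be played",
     ["bedroom speaker", "tv", "kitchen speaker"]),
    ("name of the song", None),  # non-categorical slot: no value list
]
intents = ["play a song by its name and optionally artist"]
turns = [("user", "can you find me some songs from the album summer anthems.")]

prompt = build_d3st_input(slots, intents, turns)
print(prompt)
```

Because slots are referred to only by index, shuffling the order of `slots` between training examples changes the indices but not the descriptions, which is what pushes the model to read the descriptions.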
  16. Description-Driven Modeling

    • Handling categorical slots
      • Some slots are categorical (e.g. whether a hotel provides free Wi-Fi: yes or no)
      • For a categorical slot, the possible values are listed after its description.
      • $i{:}\ d_i^{\text{slot}}\ \ ia)\ v_a \;\ldots\; ik)\ v_k$
    • Property
      • Other description-based dialogue systems rely only weakly on the descriptions to perform the task
      • D3ST uses an index-picking mechanism, so the model has to understand the descriptions fully
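The index-picking idea also means the generated string has to be mapped back to slot values. The sketch below parses an output of the form `[states] ... [intents] ...` and resolves categorical option labels such as `0b`. It is an assumption-laden illustration: `parse_d3st_output` and `resolve_categorical` are hypothetical helpers, the example output is constructed for this sketch rather than taken from the slides, and the simple regular expression would mis-split a value that itself contains a space followed by digits and a colon (for example a time).

```python
import re
from typing import Dict, List, Tuple

# A slot entry starts with "<index>:" at the beginning or after whitespace.
SLOT_PATTERN = re.compile(r"(?:^|\s)(\d+):")


def parse_d3st_output(output: str) -> Tuple[Dict[int, str], List[str]]:
    """Split a generated string into {slot index: raw value} and a list of intent ids."""
    states_part, _, intents_part = output.partition("[intents]")
    states_part = states_part.replace("[states]", "", 1).strip()
    matches = list(SLOT_PATTERN.finditer(states_part))
    states: Dict[int, str] = {}
    for k, match in enumerate(matches):
        end = matches[k + 1].start() if k + 1 < len(matches) else len(states_part)
        states[int(match.group(1))] = states_part[match.end():end].strip()
    return states, intents_part.split()


def resolve_categorical(slot_index: int, value: str,
                        options: Dict[int, Dict[str, str]]) -> str:
    """Map an option label like '0b' back to its surface value; pass other values through."""
    return options.get(slot_index, {}).get(value, value)


# Option table for one categorical slot (slot 0), mirroring the 0a)/0b)/0c) listing.
options = {0: {"0a": "bedroom speaker", "0b": "tv", "0c": "kitchen speaker"}}

raw_states, active_intents = parse_d3st_output("[states] 0:0b 2:summer anthems [intents] i0")
resolved = {i: resolve_categorical(i, v, options) for i, v in raw_states.items()}
print(resolved, active_intents)
```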
  17. Contents

    1. Dialogue state tracking
    2. D3ST - Description-driven task-oriented dialog modeling
    3. Paper Result
    4. DSTC11 Track3
  18. Paper Experiments Setting

    • Dataset
      • MultiWOZ[1]
      • SGD[2]
    • Transformer
      • T5
    • Evaluation
      • Joint goal accuracy (JGA)
    [1] MultiWOZ - a large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling.
    [2] Towards scalable multi-domain conversational agents: The schema-guided dialogue dataset.
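For reference, joint goal accuracy counts a turn as correct only when the entire predicted state equals the reference state. A minimal sketch, assuming states are represented as slot-to-value dictionaries (the function name and the toy data are illustrative, not from the paper):

```python
from typing import Dict, List

DialogueState = Dict[str, str]


def joint_goal_accuracy(predictions: List[DialogueState],
                        references: List[DialogueState]) -> float:
    """Fraction of turns whose predicted state matches the reference state exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(1 for pred, ref in zip(predictions, references) if pred == ref)
    return correct / len(references)


references = [
    {"Area": "Hoegi station"},
    {"Area": "Hoegi station", "people_num": "5"},
    {"Area": "Hoegi station", "people_num": "5", "time": "18:00"},
]
predictions = [
    {"Area": "Hoegi station"},
    {"Area": "Hoegi station", "people_num": "5"},
    {"Area": "Hoegi station", "people_num": "4", "time": "18:00"},  # one wrong slot
]
jga = joint_goal_accuracy(predictions, references)
print(f"JGA = {jga:.3f}")
```

A single wrong slot makes the whole turn count as incorrect, which is why JGA is a strict metric.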
  19. Paper Result – Comparison of Description Types

    [Table: original MultiWOZ data]
    [Table: few SGD data]
    Description types: Language = description, Name = slot name, Random = random word
    => The results for the Random type show that the description in the input is important.
    => When the dataset is small, the language description is much more data-efficient.
  20. Paper Result – Zero-shot transfer

    [Table: cross-domain (leave-one-out) MultiWOZ data]
    => D3ST is good for zero-shot transfer!

    Table – DSTC11 Track3 organizer baseline
    Models        MultiWOZ 2.1 - dev    DSTC11 Track3 - dev
    DST (TRADE)   58.3                  20.1
    D3ST          57.5                  40.1
    => D3ST contains more features than the original text
  21. Contents

    1. Dialogue state tracking
    2. D3ST - Description-driven task-oriented dialog modeling
    3. Paper Result
    4. DSTC11 Track3
  22. DSTC11 Track3

    • Input change
      • MultiWOZ dataset -> MultiWOZ dataset with changed slot values
      • Original text -> audio, ASR text (user)
    • Evaluation method
      • MultiWOZ – DST, POL, NLG
      • Track3 – DST
    • End-to-end models such as the GPT-2 model[1] are difficult to apply to this task.
    [1] End-to-End Neural Pipeline for Goal-Oriented Dialogue Systems using GPT-2, ACL 2020
  23. 1) ASR Error Correction

    • Rule-based
      • Numerical format change (4 → four)
      • Time format (12:04 am → 0:04 am)
    • T5 module
      • Input: ASR text
      • Output: original text, preprocessed with the rules above
    "Actual":"[str] please get me a ticket for one that leaves at 0:49 am and send me the reference number. [end]",
    "Source":"[str] please get me a ticket for one that leaves at 12:49 a m a m and send me the reference number [end]"
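The two rules named on this slide could look roughly like the following. This is a sketch under assumptions: the slides give only one example per rule, so the regular expressions, the 0 to 12 range of spelled-out numbers and the function names are illustrative choices, not the team's actual preprocessing.

```python
import re

# Assumption: only small integers are spelled out; larger ones are left as digits.
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five", "six",
                "seven", "eight", "nine", "ten", "eleven", "twelve"]


def normalize_time(text: str) -> str:
    """Rewrite '12:MM am' as '0:MM am', following the slide's 12:04 am -> 0:04 am example."""
    return re.sub(r"\b12:(\d{2}) am\b", r"0:\1 am", text)


def spell_out_small_numbers(text: str) -> str:
    """Replace standalone integers 0..12 with words, skipping digits inside times or decimals."""
    def replace(match: "re.Match[str]") -> str:
        number = int(match.group())
        return NUMBER_WORDS[number] if number < len(NUMBER_WORDS) else match.group()

    return re.sub(r"(?<![\d:.])\b\d+\b(?![:.]?\d)", replace, text)


def apply_rules(text: str) -> str:
    """Apply the time rule first so that clock digits are never spelled out."""
    return spell_out_small_numbers(normalize_time(text))


normalized = apply_rules("i need 4 tickets for the train at 12:04 am")
print(normalized)
```

Artifacts such as the repeated "a m a m" in the source example are not covered by these rules; on the slide that correction is left to the T5 module.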
  24. 1) ASR Error Correction

    • ASR error correction (rule-based + T5) result
    "Actual":"[str] please get me a ticket for one that leaves at 0:49 am and send me the reference number. [end]",
    "Prediction":"[str] please get me a ticket for one that leaves at 0:49 am and send me the reference number. [end]",
    "Source":"[str] please get me a ticket for one that leaves at 12:49 a m a m and send me the reference number [end]"
  25. 2) Text Based Dialogue System – D3ST

    • Data preprocessing
      • Convert the output text of step (1) to the D3ST format.
      • Description + ASR error correction result
    • Model setting
      • Transformer: T5 – base, large
      • Input: preprocessed text
      • Output: dialogue state
  26. 2) Text Based Dialogue System – D3ST

    • Our D3ST format data
    "src": "0 time of the restaurant booking 1 name of hospital department 2 number of people for the hotel booking 3 number of people booking the restaurant … 34 star rating of the hotel 35 length of stay at the hotel 36 leaving time for the train [user] i need a train leaving from little mountain this thursday. [system] in order to better assist you, may i please have your destination? … [user] please get me a ticket for one that leaves at 0:49 am and send me the reference number.",
    "tgt": "[states] 6 1 36 0:49 am 29 4:38 pm 21 little mountain 12 12f 18 kings lynn",
  27. 3) Post processor

    • Create a database based on wiki
    • Check token similarity
    • If a token is sufficiently similar to a word in the database, it is replaced with the database word
    Text: i also need a train leaving post falls after 9:33 am going to lignier
    Origin word   Similar word   Similarity
    lignier       ligonier       0.9333
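A sketch of this kind of similarity-based substitution using Python's standard `difflib`. The slides do not say which similarity measure or threshold was used; `SequenceMatcher.ratio()` is an assumption here (it gives 14/15 ≈ 0.9333 for the lignier/ligonier pair), and the vocabulary, the 0.9 threshold and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Tuple


def best_match(token: str, vocabulary: Iterable[str],
               threshold: float = 0.9) -> Tuple[str, float]:
    """Return the most similar vocabulary word if it clears the threshold, else the token."""
    best_word, best_score = token, 0.0
    for word in vocabulary:
        score = SequenceMatcher(None, token, word).ratio()
        if score > best_score:
            best_word, best_score = word, score
    if best_score >= threshold:
        return best_word, best_score
    return token, best_score


def correct_tokens(text: str, vocabulary: Iterable[str], threshold: float = 0.9) -> str:
    """Replace each whitespace-separated token by its closest vocabulary entry, if any."""
    words = list(vocabulary)
    return " ".join(best_match(token, words, threshold)[0] for token in text.split())


# Hypothetical stand-in for the wiki-derived database of place names.
vocabulary = ["ligonier", "cambridge"]

word, score = best_match("lignier", vocabulary)
corrected = correct_tokens("going to lignier", vocabulary)
print(word, round(score, 4), "|", corrected)
```

A high threshold keeps ordinary words such as "going" untouched, at the cost of missing heavily distorted names; the right value would have to be tuned on held-out ASR output.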
  28. DSTC11 Track3 – Our approach

    (1) ASR text formatting (special characters, wrong verbs)
    (2) D3ST text-based model
    (3) ASR state formatting (wrong proper nouns)
  29. DSTC11 Track3 – Official result

    [Table: Results of Joint Goal Accuracy]
    [Table: Results of Slot Error Rate]
    [Figure: Results of Joint Goal Accuracy]
    [Figure: Results of Slot Error Rate]
  30. Future Work

    • In the D3ST paper, the T5 XXL model performs very well on the DST task.
    • However, we could only train the T5 large model for a few epochs.
    • "We look forward to articles from the participating teams describing their systems. The deadline for paper submissions is November 14th."
    • We will train the above model for more epochs.