DUET: How to make AI Reservation Agent via Telephony

KyoungTae Doh
NAVER Biz AI / DUET TF
https://linedevday.linecorp.com/jp/2019/sessions/A1-1

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay DUET: How To Make AI Reservation Agent via Telephony > KyoungTae Doh > NAVER Biz AI / DUET TF
  2. 1. People are not good at answering phones
    • Human error is a factor in telephony
    • Not well trained for answering calls
    • We are not aiming for the sky
    Restaurants want to cut costs on answering calls
  3. 2. Having a conversation over the phone with an AI is nothing like interacting with an AI Speaker
    AI Speaker pattern - Command and Control: “Clova, Play songs of BTS” “Yes, playing the music” (plays the music)
    Phone Call pattern - Build-up: “Hello” “This is A Restaurant. May I help you?” “I would like to make a reservation” “Yes, When?”
  4. 3. Having a conversation over the phone with an AI is nothing like interacting with a Chatbot
    • Chatbot has a visual display
    • User can see the entire conversation
    • User cannot remember everything on the phone
    • Chatbot uses a multi-modal interface
    • Phone call can use voice only
  5. Conversation space for AiCall
    Conversation Opening: Greeting + Cap. Check
    Body: User Request, Agent Response
    Conversation Closing: Preclosing Signals
    Phone call situation
    > No wake word context
    > Agent always speaks first
    > Users usually start conversations with a capability check
    > Need to detect preclosing signal
    > A conversation consists of an opening, body and closing
    > From the opening to the closing - time space
  6. What Do Users Want in Spoken Dialogue? Consider time as a critical constraint
    Time: Overlap, Silence, Balance
    Mind turn allocation
    • Conversations can overlap with each other
    • Turn allocation should be handled naturally in this context
    Mind the length of utterances
    • Balancing the length of an utterance is fundamental in spoken dialogue
    • Spoken prompts should be written considering the length of utterances
    Mind silence
    • There could be silence during a conversation
    • Find ways to minimize silence and get users back on track during silence
    > Users want natural, continuous conversations
    > Conversations in the Voice User Interface (VUI) need to consider time as a critical constraint
  7. AiCall Conversation Design Framework: 1 task, 3 activities, 4 sequences, 8 turns
    A: Hello. This is OUTBACK STEAKHOUSE. What can I help you with?
    U: Hi. I’d like to make a reservation.
    A: Okay. When are you coming?
    U: 7 p.m. tomorrow.
    A: How many people in your party?
    U: Four, maybe.
    A: Okay. Let me see. We have a table for 4 at 7. Want to make a reservation?
    U: Yes, please.
    Opening, Scheduling, Confirmation, Reservation Made
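The task / activity / sequence / turn hierarchy on this slide can be sketched as nested data classes. This is only an illustration: the class names and the assignment of sequences to activities (one in Opening, two in Scheduling, one in Confirmation) are assumptions, since the slide gives the totals but not the grouping.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    speaker: str  # "A" for the agent, "U" for the user
    text: str


@dataclass
class Sequence:
    turns: List[Turn] = field(default_factory=list)


@dataclass
class Activity:
    name: str
    sequences: List[Sequence] = field(default_factory=list)


@dataclass
class Task:
    name: str
    activities: List[Activity] = field(default_factory=list)

    def count_sequences(self) -> int:
        return sum(len(a.sequences) for a in self.activities)

    def count_turns(self) -> int:
        return sum(len(s.turns) for a in self.activities for s in a.sequences)


def pair(agent_text: str, user_text: str) -> Sequence:
    """Build one sequence from an agent prompt and the user's reply."""
    return Sequence([Turn("A", agent_text), Turn("U", user_text)])


# Assumed grouping: the slide does not say which sequence belongs to which activity.
reservation = Task("Reservation", [
    Activity("Opening", [
        pair("Hello. This is OUTBACK STEAKHOUSE. What can I help you with?",
             "Hi. I'd like to make a reservation."),
    ]),
    Activity("Scheduling", [
        pair("Okay. When are you coming?", "7 p.m. tomorrow."),
        pair("How many people in your party?", "Four, maybe."),
    ]),
    Activity("Confirmation", [
        pair("Okay. Let me see. We have a table for 4 at 7. Want to make a reservation?",
             "Yes, please."),
    ]),
])

print(len(reservation.activities), reservation.count_sequences(), reservation.count_turns())
# prints: 3 4 8
```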
  8. The amount of information (perceived by users): OVER, MEET, LESS, WRONG
    USER QUESTION, AICALL ANSWER
    Sequence
    Additional Question: When do you close?
    Additional Question: Which subway line?
  9. Partial Repeat Request
    U: How can I get there?
    A: You can find the store…ten-minute walk distance from the subway station.
    U: From where?
    A: From the subway station.
    Definition Request
    U: I am going with my parents, and um can you recommend something for them?
    A: I think tenderloin steak and Toowoomba pasta will be great for your parents.
    U: What is Toowoomba pasta?
    A: Toowoomba pasta is our premium pasta based on spicy cream sauce.
  10. Definition Usage • Activity as a module • Customized for

    each task • Subset of task • Consists of sequences for completing one explicit action
  11. GATEWAY DM NLU ASR Stream Stream Request Text Voice User

    Speaker Wav Stream Response Text SYNTH
  12. GATEWAY DM NLU ASR Stream Send Full Duplex Receive Request

    Text Voice User Phone Wav Stream Response Text SYNTH
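The "Full Duplex" send/receive split on this slide can be illustrated with two coroutines that run concurrently, so the gateway keeps listening while it plays a response. This is a minimal sketch, not the DUET gateway: the function names, the string "chunks", and the in-memory queue standing in for the ASR stream are all assumptions.

```python
import asyncio
from typing import List, Optional, Tuple


async def receive_audio(chunks: List[str], inbound: asyncio.Queue, log: List[str]) -> None:
    """Caller -> gateway: forward each incoming chunk toward the (hypothetical) ASR."""
    for chunk in chunks:
        await inbound.put(chunk)
        log.append(f"recv {chunk}")
        await asyncio.sleep(0)  # yield so the send side can run
    await inbound.put(None)  # end-of-stream marker


async def send_audio(chunks: List[str], log: List[str]) -> None:
    """Gateway -> caller: play synthesized chunks without waiting for the receive side."""
    for chunk in chunks:
        log.append(f"send {chunk}")
        await asyncio.sleep(0)


async def consume_asr(inbound: asyncio.Queue) -> List[str]:
    """Stand-in for ASR: collect everything heard until the end-of-stream marker."""
    heard: List[str] = []
    while True:
        chunk: Optional[str] = await inbound.get()
        if chunk is None:
            return heard
        heard.append(chunk)


async def run_call(caller_chunks: List[str], agent_chunks: List[str]) -> Tuple[List[str], List[str]]:
    inbound: asyncio.Queue = asyncio.Queue()
    log: List[str] = []
    # Both directions are scheduled together; neither blocks the other.
    _, _, heard = await asyncio.gather(
        receive_audio(caller_chunks, inbound, log),
        send_audio(agent_chunks, log),
        consume_asr(inbound),
    )
    return log, heard


log, heard = asyncio.run(run_call(["u1", "u2"], ["a1", "a2"]))
print(log)
print(heard)
```

In the log, send and receive events interleave instead of one direction finishing first, which is the property a half-duplex loop would lack.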
  13. High Considerateness vs. High Involvement
    Amount of discourse, Pace, Overlap, Questions, Stories, Prosodic variation, Loudness, Gesture (Multimodal)
  14. Latency
    Sensitive to latency
    Realtime Duplex Stream, AsyncIO, GPU Optimization, Network optimization, Stream optimization, Using mid result
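One way to read "Using mid result" together with "AsyncIO" is speculative understanding: start NLU on each intermediate ASR result, and reuse that work if the final transcript turns out to be identical. The sketch below assumes this reading; `fake_nlu`, the event format `(text, is_final)`, and the intent labels are hypothetical stand-ins, not part of the talk.

```python
import asyncio
from typing import Dict, List, Tuple

nlu_calls: List[str] = []  # records which texts were actually sent to NLU


async def fake_nlu(text: str) -> str:
    """Hypothetical NLU stand-in: classifies text after a simulated delay."""
    nlu_calls.append(text)
    await asyncio.sleep(0.01)
    return "reserve" if "reservation" in text else "unknown"


async def understand(asr_events: List[Tuple[str, bool]]) -> str:
    """Start NLU on every mid result; reuse it if the final text matches."""
    pending: Dict[str, asyncio.Task] = {}
    for text, is_final in asr_events:
        if text not in pending:
            pending[text] = asyncio.create_task(fake_nlu(text))
        if is_final:
            intent = await pending.pop(text)
            for task in pending.values():
                task.cancel()  # drop speculative work for stale mid results
            return intent
        await asyncio.sleep(0)  # let speculative tasks make progress
    raise ValueError("stream ended without a final result")


events = [
    ("I would like", False),
    ("I would like to make a reservation", False),
    ("I would like to make a reservation", True),
]
intent = asyncio.run(understand(events))
print(intent, len(nlu_calls))
```

Because the last mid result equals the final transcript, NLU is not invoked again when the final result arrives; the already running task is simply awaited.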
  15. Between design and engineering
    • Designed Path != What users really say
    • Understand which is the more efficient way to improve: Design or Engineering
    Finding a "realistic" path
    <Designed> Activity 0 Opening, Activity 10 Rescheduling, Activity 5 Avail. Check, Activity 7 Confirmation
    <Real> Activity 0 Opening, Activity 10 Rescheduling, Activity 5 Avail. Check, Activity 7 Confirmation, Activity 1 Scheduling
  16. Facing the Real Problem
    Task jump is real
    • Conversation without borders
    • Example: Reservation task with FAQ task
    Continue or not: Problem after task jump
    • Turn reconstruction
    • Sequence re-flow
    • Activity reconstruction
    Examples: Reservation, FAQ
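A task jump from Reservation to FAQ and back can be sketched as a dialog manager that answers the FAQ and then re-issues the interrupted prompt, which is one simple form of "sequence re-flow". Everything here is illustrative: the class, the exact-match FAQ lookup, and the closing-time answer are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Made-up FAQ content; exact string matching stands in for real NLU.
FAQ_ANSWERS = {"When do you close?": "We close at 10 p.m."}

SLOT_PROMPTS = {
    "time": "When are you coming?",
    "party_size": "How many people in your party?",
}


@dataclass
class ReservationTask:
    slots: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"time": None, "party_size": None}
    )

    def pending_slot(self) -> Optional[str]:
        """Return the first slot that is still empty, or None when all are filled."""
        return next((name for name, value in self.slots.items() if value is None), None)

    def prompt(self) -> str:
        slot = self.pending_slot()
        return SLOT_PROMPTS[slot] if slot else "Want to make a reservation?"


def respond(task: ReservationTask, user_text: str) -> str:
    if user_text in FAQ_ANSWERS:
        # Task jump: answer the FAQ, then re-issue the interrupted prompt.
        return f"{FAQ_ANSWERS[user_text]} {task.prompt()}"
    slot = task.pending_slot()
    if slot is not None:
        task.slots[slot] = user_text  # treat the utterance as the slot value
    return task.prompt()


task = ReservationTask()
replies = [
    respond(task, text)
    for text in ["7 p.m. tomorrow.", "When do you close?", "Four, maybe."]
]
for reply in replies:
    print(reply)
```

The FAQ turn leaves the reservation slots untouched, so the agent returns to the party-size question instead of restarting the task.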
  17. Redesign both System and Conversation for Improvements
    Dialog management as a traffic light
    • Interface between UX designers and UX engineers
    • DM can be a quick prototyping tool