DUET: How to make AI Reservation Agent via Telephony

KyoungTae Doh
NAVER Biz AI / DUET TF
https://linedevday.linecorp.com/jp/2019/sessions/A1-1

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay DUET: How To Make AI Reservation Agent via Telephony > KyoungTae Doh > NAVER Biz AI / DUET TF
  2. DUET = Project AiCall = Product

  3. None
  4. Why We Started This Project

  5. Does anyone call the restaurant to make reservations these days?

  6. Which device do people use… …when making reservations?

  7. 65 vs 35

  8. Reality != Guesses, Assumptions. Meet the user. Take a different approach. Data. Data. Data.
  9. Before development began… We just called… So many times

  10. 1. People are not good at answering phones
    • Human error is a factor in telephony
    • Not well trained for answering calls
    • We are not aiming for the sky
    Restaurants want to cut costs on answering calls
  11. 2. Having a conversation over the phone with an AI is nothing like interacting with an AI Speaker
    AI Speaker pattern - Command and Control
    “Clova, Play songs of BTS”
    “Yes, playing the music” (play the music)
    Phone Call pattern - Build-up
    “Hello”
    “This is A Restaurant. May I help you?”
    “I would like to make a reservation”
    “Yes, When?”
  12. 3. Having a conversation over the phone with an AI is nothing like interacting with a Chatbot
    • Chatbot has a visual display
    • User can see the entire conversation
    • User cannot remember everything on the phone
    • Chatbot uses a multi-modal interface
    • Phone call can use voice only
  13. Conversational UX

  14. UX Designer interview

  15. Understand the Conversation Space

  16. Conversation space for AiCall
    Conversation Opening: Greeting + Cap. Check
    Body: User Request, Agent Response
    Conversation Closing: Preclosing Signals
    Phone call situation
    > No wake word context
    > Agent always speaks first
    > Users usually start conversations with a capability check
    > Need to detect preclosing signal
    > A conversation consists of an opening, body and closing
    > From the opening to the closing - time space
  17. Understand the Characteristics of Spoken Dialogue

  18. What Do Users Want in Spoken Dialogue? Consider time as a critical constraint
    Time: Overlap, Silence, Balance
    Mind turn allocation
    • Conversations can overlap with each other
    • Turn allocation should be handled naturally in this context
    Mind the length of utterances
    • Balancing the length of an utterance is fundamental in spoken dialogue
    • Spoken prompts should be written considering the length of utterances
    Mind silence
    • There could be silence during a conversation
    • Find ways to minimize silence and get users back on track during silence
    > Users want natural, continuous conversations
    > Conversations in the Voice User Interface (VUI) need to consider time as a critical constraint
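The "Mind silence" point on the slide above can be illustrated with a small decision rule. This is only a sketch under stated assumptions: the function name `on_silence`, the 3-second threshold, the retry limit, and the three action labels are all hypothetical and do not come from the talk.

```python
def on_silence(silence_sec, reprompts_so_far, reprompt_after=3.0, max_reprompts=2):
    """Decide what the agent does while the caller is silent.

    All thresholds are illustrative assumptions, not values from the talk.
    """
    if silence_sec < reprompt_after:
        # Short pauses are normal in speech; keep listening.
        return "wait"
    if reprompts_so_far < max_reprompts:
        # Bring the caller back on track with a short spoken prompt.
        return "reprompt"
    # The caller did not come back after several prompts.
    return "give_up"


if __name__ == "__main__":
    print(on_silence(1.0, 0))  # short pause
    print(on_silence(4.0, 0))  # long pause, first prompt
    print(on_silence(4.0, 2))  # long pause, prompts exhausted
```

Keeping the rule a pure function makes it easy to tune the thresholds against recorded calls without touching the audio pipeline.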
  19. AiCall Conversation Design Framework

  20. AiCall Conversation Design Framework: Activity, Task, Turn, Sequence

  21. AiCall Conversation Design Framework: 3 activities, 1 task, 8 turns, 4 sequences
    A: Hello. This is OUTBACK STEAKHOUSE. What can I help you with?
    U: Hi. I’d like to make a reservation.
    A: Okay. When are you coming?
    U: 7 p.m. tomorrow.
    A: How many people in your party?
    U: Four, maybe.
    A: Okay. Let me see. We have a table for 4 at 7. Want to make a reservation?
    U: Yes, please.
    Opening, Scheduling, Confirmation, Reservation Made
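The counts on the slide above (1 task, 3 activities, 4 sequences, 8 turns) can be reproduced with a small data model. This is a sketch, not the AiCall implementation: the class names mirror the framework terms, but the grouping of the dialogue into the Opening, Scheduling and Confirmation activities, with each agent/user pair treated as one sequence, is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    speaker: str  # "A" for the agent, "U" for the user
    text: str


@dataclass
class Sequence:
    turns: List[Turn] = field(default_factory=list)


@dataclass
class Activity:
    name: str
    sequences: List[Sequence] = field(default_factory=list)


@dataclass
class Task:
    name: str
    activities: List[Activity] = field(default_factory=list)

    def counts(self):
        sequences = [s for a in self.activities for s in a.sequences]
        turns = [t for s in sequences for t in s.turns]
        return {
            "activities": len(self.activities),
            "sequences": len(sequences),
            "turns": len(turns),
        }


def pair(agent_text, user_text):
    """One sequence made of an agent turn followed by a user turn."""
    return Sequence([Turn("A", agent_text), Turn("U", user_text)])


def build_reservation_example():
    # Assumed grouping of the slide's dialogue into activities.
    return Task("Reservation", [
        Activity("Opening", [
            pair("Hello. This is OUTBACK STEAKHOUSE. What can I help you with?",
                 "Hi. I'd like to make a reservation."),
        ]),
        Activity("Scheduling", [
            pair("Okay. When are you coming?", "7 p.m. tomorrow."),
            pair("How many people in your party?", "Four, maybe."),
        ]),
        Activity("Confirmation", [
            pair("Okay. Let me see. We have a table for 4 at 7. "
                 "Want to make a reservation?", "Yes, please."),
        ]),
    ])


if __name__ == "__main__":
    print(build_reservation_example().counts())
```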
  22. Sequence and Expansion

  23. Sequence: the amount of information (perceived by users): OVER, MEET, LESS, WRONG
    USER QUESTION, AICALL ANSWER
    Additional Question: When do you close?
    Additional Question: Which subway line?
  24. Partial Repeat Request
    U: How can I get there?
    A: You can find the store…ten-minute walk distance from the subway station.
    U: From where?
    A: From the subway station.
    Definition Request
    U: I am going with my parents, and um can you recommend something for them?
    A: I think tenderloin steak and Toowoomba pasta will be great for your parents.
    U: What is Toowoomba pasta?
    A: Toowoomba pasta is our premium pasta based on spicy cream sauce.
  25. Activities

  26. Definition, Usage
    • Activity as a module
    • Customized for each task
    • Subset of task
    • Consists of sequences for completing one explicit action
  27. Activity Graph (diagram). Labels: Activity, Task, Action, Not Used Activity, Multi-entry
  28. There’s No Failure in Conversation

  29. AI Technology on Telephony

  30. Architecture diagram (AI speaker). Components: User, Speaker, GATEWAY, ASR, NLU, DM, SYNTH
    Labels: Voice, Stream, Request Text, Response Text, Wav Stream
  31. Architecture diagram (telephony). Components: User, Phone, GATEWAY, ASR, NLU, DM, SYNTH
    Labels: Voice, Stream, Full Duplex (Send, Receive), Request Text, Response Text, Wav Stream
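The full-duplex arrangement named on the slide above (a receive path and a send path running at the same time) can be sketched with two concurrent `asyncio` tasks. Everything below is an assumption for illustration: `asr`, `nlu`, `dm` and `synth` are stand-in stubs that borrow the component names from the diagram, audio is faked as UTF-8 bytes, and the real gateway protocol is not shown in the talk.

```python
import asyncio


async def asr(chunk: bytes) -> str:
    # Stub recognizer: pretend the audio chunk is already text.
    return chunk.decode()


def nlu(text: str) -> dict:
    # Stub NLU: a single keyword rule.
    intent = "reserve" if "reservation" in text.lower() else "unknown"
    return {"intent": intent, "text": text}


def dm(frame: dict) -> str:
    # Stub dialog manager: choose the agent's reply text.
    if frame["intent"] == "reserve":
        return "Okay. When are you coming?"
    return "Sorry, could you say that again?"


def synth(text: str) -> bytes:
    # Stub synthesizer: pretend text is audio.
    return text.encode()


async def receive_loop(inbound: asyncio.Queue, replies: asyncio.Queue) -> None:
    """Receive side: audio in -> ASR -> NLU -> DM -> reply text."""
    while True:
        chunk = await inbound.get()
        if chunk is None:  # end of call
            await replies.put(None)
            return
        text = await asr(chunk)
        await replies.put(dm(nlu(text)))


async def send_loop(replies: asyncio.Queue, outbound: list) -> None:
    """Send side: reply text -> SYNTH -> audio out."""
    while True:
        reply = await replies.get()
        if reply is None:
            return
        outbound.append(synth(reply))


async def handle_call(chunks):
    inbound, replies = asyncio.Queue(), asyncio.Queue()
    outbound = []
    for chunk in chunks:
        inbound.put_nowait(chunk)
    inbound.put_nowait(None)
    # Both directions run concurrently instead of strictly taking turns.
    await asyncio.gather(receive_loop(inbound, replies),
                         send_loop(replies, outbound))
    return outbound


if __name__ == "__main__":
    print(asyncio.run(handle_call([b"I would like to make a reservation"])))
```

Decoupling the two directions with queues is one way to let the agent keep listening while it is still speaking.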
  32. Consider time as a critical constraint: Time, Overlap, Silence, Balance, Barge-in
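Barge-in, listed on the slide above, means the caller can interrupt the agent while a prompt is still playing. A minimal sketch of that idea follows; the function name and the per-chunk speech flags are hypothetical, and a real system would take those flags from a voice activity detector running on the inbound stream.

```python
def play_with_barge_in(prompt_chunks, user_speech_flags):
    """Play prompt chunks until the caller starts speaking.

    prompt_chunks: synthesized audio chunks, in playback order.
    user_speech_flags: one boolean per chunk, True when caller speech is
        detected just before that chunk would play (assumed input).
    Returns (chunks_actually_played, interrupted).
    """
    played = []
    for chunk, user_speaking in zip(prompt_chunks, user_speech_flags):
        if user_speaking:
            # Stop talking immediately and hand the turn to the caller.
            return played, True
        played.append(chunk)
    return played, False


if __name__ == "__main__":
    chunks = ["We have", "a table for 4", "at 7."]
    print(play_with_barge_in(chunks, [False, True, False]))
    print(play_with_barge_in(chunks, [False, False, False]))
```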
  33. None
  34. 8K 16K 24K Speech Recognition

  35. Speech Synthesis: Natural Tone via telephony (Speaker, Voice Actor Tone, Synth Tone)
  36. High Considerateness, High Involvement: Amount of discourse, Pace, Overlap, Questions, Stories, Prosodic variation, Loudness, Gesture (Multimodal)
  37. Using Models for Telephony: Contextual Hints, Multi-turn, Task Movement, Barge-in

  38. Multi-turn NLU vs Single-turn NLU Context Command & control
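The contrast on the slide above can be shown with a toy example: an answer such as "Four, maybe." carries no intent on its own, but becomes interpretable once the previous agent question is known. This is a sketch only; the keyword rules, the `expected_slot` parameter and the slot name `party_size` are assumptions for illustration, not the models used in AiCall.

```python
import re

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def single_turn_nlu(utterance):
    """Command-and-control style: each utterance is read in isolation."""
    if "reservation" in utterance.lower():
        return {"intent": "make_reservation"}
    return {"intent": "unknown"}


def multi_turn_nlu(utterance, expected_slot=None):
    """Same rules, plus the slot the agent just asked about as context."""
    result = single_turn_nlu(utterance)
    if result["intent"] != "unknown" or expected_slot is None:
        return result
    if expected_slot == "party_size":
        for token in re.findall(r"[a-z]+|\d+", utterance.lower()):
            if token.isdigit():
                return {"intent": "inform", "slot": "party_size",
                        "value": int(token)}
            if token in NUMBER_WORDS:
                return {"intent": "inform", "slot": "party_size",
                        "value": NUMBER_WORDS[token]}
    return result


if __name__ == "__main__":
    # Agent asked: "How many people in your party?"
    print(single_turn_nlu("Four, maybe."))
    print(multi_turn_nlu("Four, maybe.", expected_slot="party_size"))
```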

  39. 72 vs 99

  40. 2 weeks ago API problem Kanji to database

  41. Call NLU

  42. Single turn

  43. Multi-turn

  44. 72 vs 99

  45. 72 vs 99

  46. Engineering Issues

  47. ( Engineer Interview video )

  48. Twilio, Nexmo, Other Local Carriers PSTN VoIP WebSocket

  49. Latency
    Sensitive to latency
    Realtime Duplex Stream, AsyncIO, GPU Optimization, Network optimization, Stream optimization, Using mid result
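One reading of "using mid result" on the slide above is that downstream work starts on intermediate (partial) recognition results instead of waiting for the final one. The sketch below assumes that reading; the event format `(text, is_final)` and the function names are hypothetical and not taken from the talk.

```python
def respond_with_mid_results(asr_events, nlu):
    """Run NLU on partial ASR results so the final result is often 'free'.

    asr_events: iterable of (text, is_final) pairs (assumed format).
    nlu: callable taking text and returning a parsed frame.
    """
    cached_text, cached_frame = None, None
    for text, is_final in asr_events:
        if text != cached_text:
            # New hypothesis: parse it right away while audio keeps arriving.
            cached_text, cached_frame = text, nlu(text)
        if is_final:
            # If the final text equals the last partial, no extra NLU call
            # was needed and the answer is available immediately.
            return cached_frame
    return cached_frame


def demo():
    calls = []

    def counting_nlu(text):
        calls.append(text)
        return {"text": text}

    events = [
        ("I would", False),
        ("I would like to make a reservation", False),
        ("I would like to make a reservation", True),
    ]
    frame = respond_with_mid_results(events, counting_nlu)
    return frame, len(calls)


if __name__ == "__main__":
    print(demo())
```

In this run the final event triggers no new NLU call, because the last partial hypothesis already matched it.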
  50. Why UX Engineering is Important

  51. Between design and engineering
    • Designed Path != What users really say
    • Understand the efficient way of improvement: Design or Engineering
    Finding a "realistic" path
    <Designed> Activity 0 Opening, Activity 10 Rescheduling, Activity 5 Avail. Check, Activity 7 Confirmation
    <Real> Activity 0 Opening, Activity 10 Rescheduling, Activity 5 Avail. Check, Activity 7 Confirmation
    Also shown: Activity 1 Scheduling
  52. Facing the Real Problem
    Task jump is real
    • Conversation without borders
    • Example: Reservation task with FAQ task
    Continue or not: Problem after task jump
    • Turn reconstruction
    • Sequence re-flow
    • Activity reconstruction
    Examples: Reservation, FAQ (example dialogue text not recoverable)
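The task jump described on the slide above (a caller asking an FAQ question in the middle of a reservation) can be modeled with a stack of dialog contexts. This is a sketch under assumptions: the class name `DialogStack`, its methods, and the policy of always resuming the interrupted task are hypothetical. The slide itself frames "continue or not" as an open problem.

```python
class DialogStack:
    """Keeps interrupted tasks so the dialog can return to them."""

    def __init__(self):
        self._stack = []

    def start(self, task, activity):
        self._stack.append({"task": task, "activity": activity})

    def jump(self, task, activity):
        # A task jump suspends the current context instead of discarding it.
        self.start(task, activity)

    def current(self):
        return self._stack[-1] if self._stack else None

    def finish(self):
        """Close the current task and return the context to resume, if any."""
        if self._stack:
            self._stack.pop()
        return self.current()


if __name__ == "__main__":
    dialog = DialogStack()
    dialog.start("Reservation", "Scheduling")
    dialog.jump("FAQ", "Business hours")  # e.g. "When do you close?"
    print(dialog.current())
    print(dialog.finish())  # back to the suspended reservation
```

A stack only records where to return. Rebuilding the turn and sequence the caller left, as the slide's bullets suggest, would need extra state per context.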
  53. Redesign both System and Conversation for Improvements
    Dialog management as a traffic light
    • Interface between UX designer and UX Engineers
    • DM can be a quick prototyping tool
  54. Lessons Learned And More…

  55. None
  56. None