Slide 1

Slide 1 text

2019 DevDay | DUET: How to Make an AI Reservation Agent via Telephony | KyoungTae Doh | NAVER Biz AI / DUET TF

Slide 2

Slide 2 text

DUET = project, AiCall = product

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Why We Started This Project

Slide 5

Slide 5 text

Does anyone call the restaurant to make reservations these days?

Slide 6

Slide 6 text

Which device do people use when making reservations?

Slide 7

Slide 7 text

65 vs 35

Slide 8

Slide 8 text

Reality != guesses and assumptions. Meet the user. Take a different approach. Data. Data. Data.

Slide 9

Slide 9 text

Before development began, we just called restaurants. So many times.

Slide 10

Slide 10 text

1. People are not good at answering phones
• Human error is a factor in telephony
• Staff are not well trained to answer calls
• We are not aiming for the sky: restaurants want to cut the cost of answering calls

Slide 11

Slide 11 text

2. Having a conversation over the phone with an AI is nothing like interacting with an AI speaker
• AI speaker pattern: command and control
“Clova, play songs by BTS.” “Yes, playing the music.” (plays the music)
• Phone call pattern: build-up
“Hello.” “This is A Restaurant. May I help you?” “I would like to make a reservation.” “Yes, when?”

Slide 12

Slide 12 text

3. Having a conversation over the phone with an AI is nothing like interacting with a chatbot
• A chatbot has a visual display: the user can see the entire conversation
• On the phone, the user cannot remember everything
• A chatbot uses a multimodal interface
• A phone call is voice only

Slide 13

Slide 13 text

Conversational UX

Slide 14

Slide 14 text

UX Designer interview

Slide 15

Slide 15 text

Understand the Conversation Space

Slide 16

Slide 16 text

Conversation space for AiCall
Conversation opening (Greeting + Capability Check) → Body (User Request → Agent Response) → Preclosing Signals → Conversation closing
> Phone call situation: no wake-word context
> The agent always speaks first
> Users usually start conversations with a capability check
> Preclosing signals need to be detected
> A conversation consists of an opening, a body, and a closing
> From the opening to the closing is a time space
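The opening → body → closing structure above can be read as a small phase machine. A minimal sketch, assuming a hypothetical `ConversationSpace` class and a hypothetical keyword list `PRECLOSING_CUES` (a production system would detect preclosing signals with NLU rather than substring matching):

```python
from enum import Enum


class Phase(Enum):
    OPENING = "opening"
    BODY = "body"
    CLOSING = "closing"


# Hypothetical preclosing cues; substring matching is only for illustration.
PRECLOSING_CUES = ("that's all", "thank you", "thanks", "bye")


class ConversationSpace:
    def __init__(self):
        self.phase = Phase.OPENING
        # No wake word on a phone call, so the agent always speaks first.
        self.transcript = [("agent", "Hello. This is A Restaurant. May I help you?")]

    def on_user_utterance(self, text):
        self.transcript.append(("user", text))
        lowered = text.lower()
        if self.phase is Phase.OPENING:
            # The first user turn (often a capability check) moves the call into the body.
            self.phase = Phase.BODY
        elif self.phase is Phase.BODY and any(cue in lowered for cue in PRECLOSING_CUES):
            # A preclosing signal: head toward the closing instead of waiting for a hang-up.
            self.phase = Phase.CLOSING
        return self.phase
```

Tracking the phase explicitly lets the agent answer "Thank you" differently in the body (a likely preclosing signal) than it would in the opening.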

Slide 17

Slide 17 text

Understand the Characteristics of Spoken Dialogue

Slide 18

Slide 18 text

What Do Users Want in Spoken Dialogue?
Consider time as a critical constraint: Time (Overlap, Silence, Balance)
Mind turn allocation
• Conversations can overlap with each other
• Turn allocation should be handled naturally in this context
Mind the length of utterances
• Balancing the length of an utterance is fundamental in spoken dialogue
• Spoken prompts should be written with the length of utterances in mind
Mind silence
• There can be silence during a conversation
• Find ways to minimize silence and get users back on track when it happens
> Users want natural, continuous conversations
> Conversations in a Voice User Interface (VUI) need to treat time as a critical constraint
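One way to "get users back on track during silence" is a timeout followed by a re-prompt. A minimal asyncio sketch, assuming hypothetical callables `get_utterance` (returns the next recognized utterance) and `say` (plays a prompt); the timeout and retry counts are illustrative, not values from the talk:

```python
import asyncio


async def listen_with_reprompt(get_utterance, say, timeout_s=5.0, max_reprompts=2,
                               reprompt="Sorry, I didn't catch that. Could you say that again?"):
    """Wait for the caller; on silence, re-prompt instead of leaving dead air.

    get_utterance: hypothetical coroutine function returning the next utterance.
    say: hypothetical callable that plays a prompt to the caller.
    """
    for attempt in range(max_reprompts + 1):
        try:
            return await asyncio.wait_for(get_utterance(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt < max_reprompts:
                say(reprompt)
    # Still silent after all re-prompts: let the caller decide on a fallback.
    return None
```

`asyncio.wait_for` cancels the pending listen on timeout, so each attempt starts from a clean state.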

Slide 19

Slide 19 text

AiCall Conversation Design Framework

Slide 20

Slide 20 text

AiCall Conversation Design Framework: Activity, Task, Turn, Sequence

Slide 21

Slide 21 text

AiCall Conversation Design Framework: 3 activities, 1 task, 8 turns, 4 sequences
A: Hello. This is OUTBACK STEAKHOUSE. What can I help you with?
U: Hi. I’d like to make a reservation.
A: Okay. When are you coming?
U: 7 p.m. tomorrow.
A: How many people in your party?
U: Four, maybe.
A: Okay. Let me see. We have a table for 4 at 7. Want to make a reservation?
U: Yes, please.
Activities: Opening → Scheduling → Confirmation → Reservation Made
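The Task > Activity > Sequence > Turn hierarchy can be expressed as plain data classes. A sketch that encodes the OUTBACK example; how the eight turns group into sequences and activities is our reading of the slide (one agent/user pair per sequence), not a documented AiCall schema:

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    speaker: str  # "A" (agent) or "U" (user)
    text: str


@dataclass
class Sequence:
    turns: list[Turn]  # here: an agent turn followed by a user turn


@dataclass
class Activity:
    name: str
    sequences: list[Sequence] = field(default_factory=list)


@dataclass
class Task:
    name: str
    activities: list[Activity] = field(default_factory=list)


def pair(agent_text, user_text):
    return Sequence([Turn("A", agent_text), Turn("U", user_text)])


reservation = Task("Reservation", [
    Activity("Opening", [pair("Hello. This is OUTBACK STEAKHOUSE. What can I help you with?",
                              "Hi. I'd like to make a reservation.")]),
    Activity("Scheduling", [pair("Okay. When are you coming?", "7 p.m. tomorrow."),
                            pair("How many people in your party?", "Four, maybe.")]),
    Activity("Confirmation", [pair("Okay. Let me see. We have a table for 4 at 7. "
                                   "Want to make a reservation?", "Yes, please.")]),
])


def counts(task):
    """Return (activities, sequences, turns) for one task."""
    sequences = [s for a in task.activities for s in a.sequences]
    turns = [t for s in sequences for t in s.turns]
    return len(task.activities), len(sequences), len(turns)


print(counts(reservation))  # (3, 4, 8)
```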

Slide 22

Slide 22 text

Sequence and Expansion

Slide 23

Slide 23 text

(Figure: a user question and the AiCall answer, rated by the amount of information perceived by users: OVER, MEET, LESS, or WRONG)
Additional questions: “When do you close?” “Which subway line?”

Slide 24

Slide 24 text

Partial Repeat Request
U: How can I get there?
A: You can find the store…ten-minute walk from the subway station.
U: From where?
A: From the subway station.
Definition Request
U: I am going with my parents, and, um, can you recommend something for them?
A: I think tenderloin steak and Toowoomba pasta will be great for your parents.
U: What is Toowoomba pasta?
A: Toowoomba pasta is our premium pasta based on a spicy cream sauce.
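Both requests above expand the current sequence without changing the task. A rule-based sketch, assuming a hypothetical menu `GLOSSARY` and hypothetical regex rules; a real agent would classify these with NLU, but the shape of the response (answer, then return to the pending sequence) is the same:

```python
import re

# Hypothetical glossary of menu items the agent can define.
GLOSSARY = {
    "toowoomba pasta": "Toowoomba pasta is our premium pasta based on a spicy cream sauce.",
}


def handle_insert_expansion(user_text, last_agent_text):
    """Answer a definition or partial repeat request; None if it is neither.

    The rules below are illustrative assumptions, not AiCall's actual logic.
    """
    text = user_text.lower().strip(" ?")
    # Definition request: "What is X?"
    match = re.fullmatch(r"what is (.+)", text)
    if match and match.group(1) in GLOSSARY:
        return GLOSSARY[match.group(1)]
    # Partial repeat request: "From where?" targets the origin in the last answer.
    if text == "from where":
        found = re.search(r"from (the [\w ]+?)[.]?$", last_agent_text, re.IGNORECASE)
        if found:
            return f"From {found.group(1)}."
    return None
```

Keeping the last agent utterance around is what makes the partial repeat answerable: the user only repeats the fragment they missed.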

Slide 25

Slide 25 text

Activities

Slide 26

Slide 26 text

Definition / Usage
• Activity as a module
• Customized for each task
• Subset of a task
• Consists of sequences for completing one explicit action

Slide 27

Slide 27 text

Activity Graph (figure: activity nodes within a task, with labels Task, Action, Not Used Activity, and Multi-entry)
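An activity graph with multiple entry points can be checked with a breadth-first search: anything not reachable from any entry is a "not used" activity for that task. A sketch over a hypothetical graph; the activity names reuse those on the later "realistic path" slide plus a hypothetical "Menu Recommendation" node:

```python
from collections import deque

# Hypothetical activity graph for one task: edges are allowed transitions.
ACTIVITY_GRAPH = {
    "Opening": ["Scheduling", "Avail. Check"],
    "Scheduling": ["Avail. Check"],
    "Avail. Check": ["Confirmation", "Rescheduling"],
    "Rescheduling": ["Avail. Check"],
    "Confirmation": [],
    "Menu Recommendation": ["Confirmation"],
}


def reachable(graph, entries):
    """Breadth-first search starting from every entry activity (multi-entry)."""
    seen, queue = set(entries), deque(entries)
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def unused(graph, entries):
    """Activities that no entry point can reach."""
    return set(graph) - reachable(graph, entries)
```

Because activities are modules, the same graph can serve several tasks that simply declare different entry points.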

Slide 28

Slide 28 text

There’s No Failure in Conversation

Slide 29

Slide 29 text

AI Technology on Telephony

Slide 30

Slide 30 text

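(Figure: AI speaker pipeline. User voice → Speaker → GATEWAY; GATEWAY streams wav audio to ASR; the request text goes to NLU and DM; the response text goes to SYNTH, whose wav stream returns through GATEWAY to the speaker)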

Slide 31

Slide 31 text

(Figure: telephony pipeline. The same components, GATEWAY, ASR, NLU, DM, and SYNTH, but the user speaks through a phone and GATEWAY sends and receives the wav stream in full duplex)
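The difference from the speaker pipeline is that sending and receiving run at the same time. A minimal asyncio sketch, assuming audio frames arrive and leave through `asyncio.Queue` objects (a stand-in for the real gateway connection) and `None` marks end of stream:

```python
import asyncio


async def receive_loop(inbound, asr_queue):
    """Forward caller audio frames to ASR until the call ends (None)."""
    while (frame := await inbound.get()) is not None:
        await asr_queue.put(frame)
    await asr_queue.put(None)


async def send_loop(tts_queue, outbound):
    """Play synthesized audio frames back to the caller until None."""
    while (frame := await tts_queue.get()) is not None:
        await outbound.put(frame)


async def full_duplex_call(inbound, outbound, asr_queue, tts_queue):
    # Both directions run concurrently, unlike a request/response speaker turn.
    await asyncio.gather(receive_loop(inbound, asr_queue), send_loop(tts_queue, outbound))
```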

Slide 32

Slide 32 text

Consider time as a critical constraint: Time (Overlap, Silence, Balance) → Barge-in
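Barge-in means the agent stops talking as soon as the caller starts. A minimal asyncio sketch, assuming a hypothetical voice-activity detector that sets an `asyncio.Event` and 20 ms output frames:

```python
import asyncio


async def play_prompt(frames, outbound, frame_s=0.02):
    """Stream a prompt frame by frame (20 ms frames assumed)."""
    for frame in frames:
        await outbound.put(frame)
        await asyncio.sleep(frame_s)


async def speak_with_barge_in(frames, outbound, user_started_speaking):
    """Play a prompt, but stop as soon as the caller starts talking.

    user_started_speaking: asyncio.Event set by a (hypothetical) VAD.
    Returns True if the prompt was interrupted.
    """
    playback = asyncio.create_task(play_prompt(frames, outbound))
    barge_in = asyncio.create_task(user_started_speaking.wait())
    done, _ = await asyncio.wait({playback, barge_in}, return_when=asyncio.FIRST_COMPLETED)
    interrupted = barge_in in done and playback not in done
    for task in (playback, barge_in):
        task.cancel()  # no-op for a task that already finished
    await asyncio.gather(playback, barge_in, return_exceptions=True)
    return interrupted
```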

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

Speech Recognition: 8 kHz vs 16 kHz vs 24 kHz audio
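Narrowband telephone audio is sampled at 8 kHz, so nothing above 4 kHz (the Nyquist limit) reaches the recognizer, and upsampling it to 16 kHz does not restore the missing band. The arithmetic for mono 16-bit linear PCM, as a small sketch (the 20 ms frame size is an assumption):

```python
def pcm_stats(sample_rate_hz, bits=16, frame_ms=20):
    """Basic numbers for mono, uncompressed linear PCM at a given sample rate."""
    return {
        "nyquist_hz": sample_rate_hz // 2,              # highest representable frequency
        "bytes_per_s": sample_rate_hz * bits // 8,      # raw data rate
        "samples_per_frame": sample_rate_hz * frame_ms // 1000,
    }


for rate in (8_000, 16_000, 24_000):
    print(rate, pcm_stats(rate))
```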

Slide 35

Slide 35 text

Speech Synthesis: natural tone via telephony (Speaker, Voice Actor Tone, Synth Tone)

Slide 36

Slide 36 text

High Considerateness vs High Involvement: amount of discourse, pace, overlap, questions, stories, prosodic variation, loudness, gesture (multimodal)

Slide 37

Slide 37 text

Using Models for Telephony: Contextual Hints, Multi-turn, Task Movement, Barge-in
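Contextual hints let the dialog state tell speech recognition what it expects next. As a stand-in technique, this sketch re-ranks an ASR n-best list, boosting hypotheses that contain words the current slot expects; `CONTEXT_HINTS`, the score format, and the boost value are all assumptions, and a production recognizer may bias inside the decoder instead:

```python
# Hypothetical hint lists keyed by the slot the dialog manager is asking for.
CONTEXT_HINTS = {
    "party_size": {"one", "two", "three", "four", "five", "six", "people"},
    "time": {"pm", "am", "tomorrow", "tonight"},
}


def rescore(nbest, expected_slot, boost=0.1):
    """Pick the best ASR hypothesis after biasing toward expected words.

    nbest: list of (text, score) pairs, higher score = better (assumed format).
    """
    hints = CONTEXT_HINTS.get(expected_slot, set())

    def biased(item):
        text, score = item
        hits = sum(1 for word in text.lower().split() if word.strip(",.?!") in hints)
        return score + boost * hits

    return max(nbest, key=biased)[0]
```

After "How many people in your party?", an acoustically ambiguous "for maybe" / "four maybe" resolves to the number.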

Slide 38

Slide 38 text

Multi-turn NLU (context) vs Single-turn NLU (command & control)
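A single-turn NLU needs the intent inside the utterance; a multi-turn NLU can read "Four, maybe." as a party size because of the agent's previous question. A toy sketch of that contrast, with hypothetical rules and a hypothetical `context` dict recording what the agent last asked:

```python
import re

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}


def extract_number(text):
    for token in re.findall(r"[a-z]+|\d+", text.lower()):
        if token.isdigit():
            return int(token)
        if token in NUMBER_WORDS:
            return NUMBER_WORDS[token]
    return None


def single_turn_nlu(text):
    """Command & control: the utterance must carry its own intent."""
    lowered = text.lower()
    if "people" in lowered or "party of" in lowered:
        return {"intent": "set_party_size", "party_size": extract_number(text)}
    return {"intent": "unknown"}


def multi_turn_nlu(text, context):
    """Fall back on the agent's last question to interpret a bare answer."""
    result = single_turn_nlu(text)
    if result["intent"] == "unknown" and context.get("expecting") == "party_size":
        number = extract_number(text)
        if number is not None:
            return {"intent": "set_party_size", "party_size": number}
    return result
```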

Slide 39

Slide 39 text

72 vs 99

Slide 40

Slide 40 text

2 weeks ago: API problem, Kanji to database

Slide 41

Slide 41 text

Call NLU

Slide 42

Slide 42 text

Single turn

Slide 43

Slide 43 text

Multi-turn

Slide 44

Slide 44 text

72 vs 99

Slide 45

Slide 45 text

72 vs 99

Slide 46

Slide 46 text

Engineering Issues

Slide 47

Slide 47 text

( Engineer Interview video )

Slide 48

Slide 48 text

Twilio, Nexmo, and other local carriers: PSTN, VoIP, WebSocket
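When a carrier delivers call audio over a WebSocket, the gateway has to unwrap and decode each message before ASR. A sketch assuming a Twilio Media Streams style JSON message (`{"event": "media", "media": {"payload": "<base64 mu-law, 8 kHz>"}}`); check the carrier's own documentation, since other providers such as Nexmo use different framing. The G.711 mu-law decoder is written out by hand because `audioop` was removed in Python 3.13:

```python
import base64
import json


def ulaw_to_linear(byte):
    """Decode one G.711 mu-law byte to a 16-bit linear PCM sample."""
    u = ~byte & 0xFF
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_media_message(raw):
    """Return PCM samples for a 'media' event, or None for any other event.

    Assumes a Twilio Media Streams style message with a base64 mu-law payload.
    """
    message = json.loads(raw)
    if message.get("event") != "media":
        return None
    payload = base64.b64decode(message["media"]["payload"])
    return [ulaw_to_linear(b) for b in payload]
```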

Slide 49

Slide 49 text

Latency
• Sensitive to latency
• Realtime duplex stream with AsyncIO
• GPU optimization
• Network optimization
• Stream optimization
• Using mid results
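"Using mid results" can mean starting NLU on partial ASR hypotheses so the answer is ready when the caller stops talking. A speculative-execution sketch, assuming a hypothetical ASR stream of `(text, is_final)` pairs and a hypothetical async `nlu` function; if the final text matches an already-analyzed partial, that result is reused:

```python
import asyncio


async def understand_with_mid_results(asr_events, nlu):
    """Run NLU speculatively on partial (mid) ASR results.

    asr_events: async iterator of (text, is_final) pairs (hypothetical ASR stream).
    nlu: coroutine function text -> result (hypothetical).
    Returns (result, reused): reused is True if a speculative result was kept.
    """
    speculative = {}
    try:
        async for text, is_final in asr_events:
            if not is_final:
                if text not in speculative:
                    speculative[text] = asyncio.create_task(nlu(text))
            elif text in speculative:
                return await speculative[text], True   # already computed or in flight
            else:
                return await nlu(text), False
        return None, False
    finally:
        for task in speculative.values():
            if not task.done():
                task.cancel()  # discard stale speculative work
```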

Slide 50

Slide 50 text

Why UX Engineering is Important

Slide 51

Slide 51 text

Between design and engineering
• Designed path != what users really say
• Understand the most efficient way to improve: design or engineering
Finding a "realistic" path
(Figure: designed path vs. realistic path through Activity 0 Opening, Activity 1 Scheduling, Activity 5 Avail. Check, Activity 7 Confirmation, and Activity 10 Rescheduling)
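Finding the realistic path starts with counting which activity transitions real calls actually take. A sketch over hypothetical call logs; the activity names come from the slide, while the designed order in `DESIGNED_PATH` and the logs themselves are assumptions:

```python
from collections import Counter

# Assumed designed path; activity names as on the slide.
DESIGNED_PATH = ["Opening", "Scheduling", "Avail. Check", "Confirmation"]


def transition_counts(call_logs):
    """Count activity-to-activity transitions observed across calls."""
    counts = Counter()
    for activities in call_logs:
        counts.update(zip(activities, activities[1:]))
    return counts


def off_design_transitions(call_logs, designed=DESIGNED_PATH):
    """Transitions users really take that the designed path does not contain."""
    designed_edges = set(zip(designed, designed[1:]))
    return {edge: n for edge, n in transition_counts(call_logs).items()
            if edge not in designed_edges}
```

Frequent off-design edges are candidates for either a conversation redesign or an engineering fix, which is the trade-off the slide asks about.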

Slide 52

Slide 52 text

Facing the Real Problem
Task jump is real
• Conversation without borders
• Example: a reservation task combined with an FAQ task
Continue or not: the problem after a task jump
• Turn reconstruction
• Sequence re-flow
• Activity reconstruction
(Figure: example dialogues moving between the Reservation and FAQ tasks)
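A stack of suspended tasks is one common way to handle a task jump: push the FAQ task on top of the reservation, then pop it and ask whether to continue. A sketch with a hypothetical `TaskStack` class and a hypothetical "Opening hours" FAQ activity; the re-prompt wording is illustrative:

```python
class TaskStack:
    """Suspend the active task on a task jump and resume it later (hypothetical DM)."""

    def __init__(self):
        self.frames = []  # [task, activity] pairs; the last one is active

    def push(self, task, activity):
        """Start a task, or jump into another one (e.g. FAQ during Reservation)."""
        self.frames.append([task, activity])

    def advance(self, activity):
        self.frames[-1][1] = activity

    def active(self):
        return tuple(self.frames[-1]) if self.frames else None

    def pop(self):
        """Finish the active task; return a re-prompt for the suspended one, if any."""
        self.frames.pop()
        if not self.frames:
            return None
        task, activity = self.frames[-1]
        # Continue or not: confirm with the caller instead of silently resuming.
        return f"Shall we continue with your {task.lower()}? We were at {activity.lower()}."
```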

Slide 53

Slide 53 text

Redesign Both the System and the Conversation for Improvements
Dialog management as a traffic light
• Interface between UX designers and UX engineers
• DM can be a quick prototyping tool

Slide 54

Slide 54 text

Lessons Learned And More…

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

No content