Confusion Detection

Confusion Detection in Conversation and the Development of a Multimodal Conversational AI Platform
Mao Saeki, Waseda University / Equmenopolis Inc.

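2 Self-introduction
Mao Saeki
- Doctoral student, Kobayashi Laboratory, Waseda University
- Research Scientist, Equmenopolis Inc.
- Interests: dialogue systems, especially the understanding and generation of paralanguage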

Confusion Detection for Adaptive Conversational Strategies of An Oral Proficiency Assessment Interview Agent
Mao Saeki [1], Kotoka Miyagi [1], Shinya Fujie [2], Shungo Suzuki [1], Tetsuji Ogawa [1], Tetsunori Kobayashi [1], Yoichi Matsuyama [1]
[1] Waseda University, Japan
[2] Chiba Institute of Technology, Japan

4 Introduction: Why confusion detection
- In a conversation, listeners may become confused in situations such as:
  - Failing to hear due to noise
  - Not knowing a word or a concept
- Unless the confusion is resolved, the listener may become uncomfortable, or the conversation may break down

5 Introduction: Detecting the cause of confusion
- Confusion has various causes
- Knowing the cause can lead to more precise assistance
Causes and the matching assistance:
- Couldn't hear: repeat
- Don't know a word: rephrase
- Don't know a concept: give example
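The pairing of causes and assistance on this slide can be sketched as a simple lookup. This is only an illustration: the names `Cause`, `STRATEGY`, and `choose_strategy` are hypothetical and do not come from the presented system.

```python
from enum import Enum


class Cause(Enum):
    """Causes of confusion named on the slide."""
    COULD_NOT_HEAR = "couldn't hear"
    UNKNOWN_WORD = "don't know a word"
    UNKNOWN_CONCEPT = "don't know a concept"


# Assistance paired with each cause, as listed on the slide.
STRATEGY = {
    Cause.COULD_NOT_HEAR: "repeat",
    Cause.UNKNOWN_WORD: "rephrase",
    Cause.UNKNOWN_CONCEPT: "give example",
}


def choose_strategy(cause: Cause) -> str:
    """Return the assistance strategy for a detected cause (hypothetical helper)."""
    return STRATEGY[cause]


if __name__ == "__main__":
    for cause in Cause:
        print(f"{cause.value}: {choose_strategy(cause)}")
```

A lookup like this only helps if the cause can be predicted reliably, which is the second research question of the talk.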

6 Objectives
- Research objective
  - Automated detection of confusion
- Research questions
  - What are the signs of confusion?
  - Is it possible to predict the cause of confusion?

7 Dialog setting: English assessment interview dialog
- Conversation between a Japanese learner of English and a virtual agent, for assessing speaking proficiency
- The interview is conducted online

8 Confusion data collection design
- Confusion is fatal to the conversation when it happens, but it is rare
- We artificially elicit confusion based on a procedure proposed by Cumbal et al. [1]
- The procedure is expanded with 3 additional manipulations, based on Levelt's comprehension process of speech [2]
[1] R. Cumbal et al., "Detection of Listener Uncertainty in Robot-Led Second Language Conversation Practice," in ICMI 2020.
[2] W. J. M. Levelt, "Speaking: From Intention to Articulation." The MIT Press, 26-Aug-1993.

9 Data collection design
[Table: Manipulation / Procedure. Surviving entries: mixing non-existing words; increasing grammatical complexity]

10 Data collection results
- Participants: 47 Japanese English-learners
- Average interview duration: 6 minutes
- Confused data samples: 155
- Not-confused data samples: 372

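11 Analysis on the cause of confusion
[Figure: a confused data sample on a timeline (system utterance "Can you tell me … …" followed by the user turn, with 2 s and 5 s marks), and a Predicted vs. True comparison over labels ① ② ③ ④]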

12 Signs of confusion
Sign of confusion: feature extraction method
- Increased blinking: activity of action unit (AU) 45
- Averting gaze from screen: absolute distance between the current gaze direction and the screen
- Rapid head movement: rotation angle of the head relative to the screen
- Rapid eye movement: absolute distance between the current gaze direction and the screen
- Moving the face towards screen: head rotation and horizontal distance between the screen and head
- Silence: absence of user utterance, using VAD
- Self-talk: relative loudness (the current user loudness divided by the mean loudness of all previous utterances), and head rotation
Features extracted every 40 ms: voice activity, relative loudness, AU 45 intensity, gaze distance from the screen, head rotation, and head distance from the screen

13 Confusion detection results
- Model: LSTM
- Majority baseline accuracy: 0.706
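A majority baseline always predicts the most frequent class. The sketch below computes it from the sample counts on the data collection results slide (155 confused, 372 not-confused). It assumes the reported baseline was computed over those same samples, which is consistent with the result but not stated in the talk. The function name is hypothetical.

```python
def majority_baseline_accuracy(class_counts: dict[str, int]) -> float:
    """Accuracy of a classifier that always predicts the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


# Sample counts from the data collection results slide.
counts = {"confused": 155, "not_confused": 372}
baseline = majority_baseline_accuracy(counts)

if __name__ == "__main__":
    print(round(baseline, 3))  # 0.706
```

Any detection model on this data needs to exceed this value to show that it has learned more than the class imbalance.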

14 Adaptive interview scenario

15 Conclusion
- Goal
  - Detection of confusion by identifying multimodal signs
- Contributions
  - Proposed a data collection method to elicit confusion at different steps of speech processing
  - Showed the difficulty of predicting the cause of confusion using only user video
  - Identified 7 multimodal signs of confusion and conducted an ablation study to understand their importance

The InteLLA Dialogue System and the Development of a Multimodal Conversational AI Platform

18 Challenges of virtual agents
Virtual agents must…
- Know when to speak (or not to speak)
- Understand nonverbal signs
- Produce nonverbal signs
Not just a text chat with a voice interface!
While listening, people make noises and gestures, and will interrupt!
[Illustration: a speaker says "I recently watched …" while the listener reacts with "Oh!", (nod), (laughter), and "Did you watch…"]

19 Layered model of conversational processing and protocols
Matsuyama 2015, Multiparty Conversation Facilitation Robots

21 Comparing human and automated interview
[Figure: ratings on a 0 to 5 axis with endpoints labelled "Strongly agree" and "Strongly disagree", comparing the human interview and the automated interview on the items below]
- I was able to demonstrate my English language ability to the full extent
- The agent was friendly
- The agent was listening carefully to my speech
- The agent was respectful of me
- The agent's accent and intonation was natural
- The agent's speech rate was appropriate
- The agent's gestures were natural
- The conversational flow was natural
- Turn taking was natural
Identified key factors using Backwards Stepwise Regression

Thank you for listening!
