Confusion Detection

Mao Saeki
November 16, 2022

  1. Confusion detection in conversation, and the development of a multimodal conversational AI platform. Mao Saeki, Waseda University, Equmenopolis Inc.

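  2. Self-introduction: Mao Saeki

    - PhD student, Kobayashi Laboratory, Waseda University
    - Research Scientist, Equmenopolis Inc.
    - Interests: dialogue systems, especially the understanding and generation of paralanguage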
  3. Confusion Detection for Adaptive Conversational Strategies of An Oral Proficiency Assessment Interview Agent

    Mao Saeki[1], Kotoka Miyagi[1], Shinya Fujie[2], Shungo Suzuki[1], Tetsuji Ogawa[1], Tetsunori Kobayashi[1], Yoichi Matsuyama[1]
    [1] Waseda University, Japan [2] Chiba Institute of Technology, Japan
  4. Introduction: Why confusion detection

    - In a conversation, listeners may become confused in situations such as:
        - Failing to hear due to noise
        - Not knowing a word or a concept
    - Unless resolved, the listener may become uncomfortable, or the conversation may break down
  5. Introduction: Detecting the cause of confusion

    - Causes shown: couldn't hear, don't know a word, don't know a concept
    - Assistance shown: repeat, rephrase, give example
    - Confusion has various causes
    - Knowing the cause can lead to more precise assistance
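
The idea that a known cause allows more precise assistance can be sketched as a simple lookup. The pairing below (couldn't hear to repeat, unknown word to rephrase, unknown concept to give example) is an assumption read off the order in which the items appear on the slide, and the names `Cause`, `STRATEGY`, and `choose_strategy` are hypothetical, not part of the presented system.

```python
from enum import Enum
from typing import Optional


class Cause(Enum):
    """Causes of confusion named on the slide."""
    COULD_NOT_HEAR = "couldn't hear"
    UNKNOWN_WORD = "don't know a word"
    UNKNOWN_CONCEPT = "don't know a concept"


# Assumed pairing, inferred from the order of items on the slide.
STRATEGY = {
    Cause.COULD_NOT_HEAR: "repeat",
    Cause.UNKNOWN_WORD: "rephrase",
    Cause.UNKNOWN_CONCEPT: "give example",
}


def choose_strategy(cause: Optional[Cause]) -> str:
    """Return an assistance strategy for a detected cause.

    Falling back to "repeat" when the cause is unknown is an
    illustrative choice, not something stated in the deck.
    """
    if cause is None:
        return "repeat"
    return STRATEGY[cause]


if __name__ == "__main__":
    for cause in Cause:
        print(f"{cause.value} -> {choose_strategy(cause)}")
```

A table-driven design keeps the cause classifier and the response policy separate, so the policy can change without retraining anything.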
  6. Objectives

    - Research Objective
        - Automated detection of confusion
    - Research Questions
        - What are the signs of confusion?
        - Is it possible to predict the cause of confusion?
  7. Dialog Setting: English assessment interview dialog

    - Conversation between a Japanese English-learner and a virtual agent for assessing speaking proficiency
    - The interview is conducted online
  8. Confusion data collection design

    - Confusion is fatal to a conversation when it happens, but it is rare
    - We artificially elicit confusion using a procedure proposed by Cumbal et al. [1]
    - The procedure is extended with three additional manipulations, based on Levelt's account of the speech comprehension process [2]

    [1] R. Cumbal et al., “Detection of Listener Uncertainty in Robot-Led Second Language Conversation Practice,” in ICMI 2020
    [2] W. J. M. Levelt, “Speaking: From Intention to Articulation.” The MIT Press, 26-Aug-1993.
  9. Data collection design

    [Table with columns "Manipulation" and "Procedure"; recoverable entries:]
    - Mixing non-existing words
    - Increasing grammatical complexity
  10. Data collection results

    - Participants: 47 Japanese English-learners
    - Average interview duration: 6 minutes
    - Confused data samples: 155
    - Not-confused data samples: 372
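  11. Analysis on the cause of confusion

    [Figure: system and user timeline with the example system utterance "Can you tell me …", a marked confused data sample, and time marks at 2s and 5s; a Predicted vs. True chart over classes ①, ②, ③, ④]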
  12. Signs of confusion

    Sign of confusion: feature extraction method
    - Increased blinking: activity of action unit (AU) 45
    - Averting gaze from screen: absolute distance between the current gaze direction and the screen
    - Rapid head movement: rotation angle of the head relative to the screen
    - Rapid eye movement: absolute distance between the current gaze direction and the screen
    - Moving the face towards screen: head rotation and horizontal distance between the screen and the head
    - Silence: absence of user utterance, detected with voice activity detection (VAD)
    - Self-talk: relative loudness (the current user loudness divided by the mean loudness of all previous utterances) and head rotation

    Voice activity, relative loudness, AU 45 intensity, gaze distance from the screen, head rotation, and head distance from the screen are extracted every 40 ms.
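
The relative loudness feature described above (current loudness divided by the mean loudness of all previous utterances) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the neutral value returned when there is no usable history, and the `frames_in` helper are assumptions. Only the 40 ms frame step and the division by the running mean come from the slide.

```python
from statistics import mean
from typing import Sequence

FRAME_MS = 40  # feature extraction interval stated on the slide


def relative_loudness(current: float, previous: Sequence[float]) -> float:
    """Current loudness divided by the mean loudness of all previous utterances.

    Returning 1.0 (no deviation) when there is no history, or when the
    history is silent, is an assumption made for this sketch.
    """
    if not previous:
        return 1.0
    baseline = mean(previous)
    if baseline == 0:
        return 1.0
    return current / baseline


def frames_in(duration_s: float) -> int:
    """Number of 40 ms feature frames covering a span of the given length."""
    return round(duration_s * 1000 / FRAME_MS)


if __name__ == "__main__":
    history = [0.4, 0.4]  # hypothetical loudness of earlier utterances
    print(relative_loudness(0.2, history))  # quieter than usual, value below 1
    print(frames_in(2))
```

Values well below 1 indicate speech that is quieter than the speaker's own norm, which is how the slide characterizes self-talk.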
  13. Confusion detection results

    - Model: LSTM
    - Majority baseline accuracy: 0.706
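
The majority baseline is the accuracy obtained by always predicting the most frequent class. As a quick check, the sketch below computes it from the sample counts on the data collection results slide (155 confused, 372 not confused). That the reported baseline was computed on exactly these samples is an assumption, although the figures agree. The function name is hypothetical.

```python
from typing import Mapping


def majority_baseline_accuracy(counts: Mapping[str, int]) -> float:
    """Accuracy of a classifier that always predicts the most frequent class."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples")
    return max(counts.values()) / total


if __name__ == "__main__":
    # Counts taken from the data collection results slide.
    counts = {"confused": 155, "not_confused": 372}
    print(f"{majority_baseline_accuracy(counts):.3f}")  # prints 0.706
```

Any confusion detector has to beat this number to show it has learned more than the class imbalance.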
  14. Adaptive interview scenario

  15. Conclusion

    - Goal
        - Detection of confusion by identifying multimodal signs
    - Contributions
        - Proposed a data collection method to elicit confusion in different steps of speech processing
        - Showed the difficulty of predicting the cause of confusion using only user video
        - Identified 7 multimodal signs of confusion and conducted an ablation study to understand their importance
  16. The InteLLA dialogue system and the development of a multimodal conversational AI platform

  17. None
  18. Challenges of virtual agents

    Virtual agents must…
    - Know when to speak (or not to speak)
    - Understand nonverbal signs
    - Produce nonverbal signs

    Not just a text chat with a voice interface!

    [Illustration: a speaker says "I recently watched …" while the listener reacts with "Oh!", (Nod), (laughter), and "Did you watch…"]

    While listening, people make noises, gesture, and interrupt!
  19. Layered model of conversational processing and protocols

    Matsuyama 2015, Multiparty Conversation Facilitation Robots
  20. Comparing human and automated interview

    [Chart: ratings on a 0 to 5 axis labeled "Strongly agree" and "Strongly disagree", comparing human interview and automated interview on the following items:]
    - I was able to demonstrate my English language ability to the full extent
    - The agent was friendly
    - The agent was listening carefully to my speech
    - The agent was respectful of me
    - The agent’s accent and intonation was natural
    - The agent’s speech rate was appropriate
    - The agent’s gestures were natural
    - The conversational flow was natural
    - Turn taking was natural

    Key factors were identified using backward stepwise regression.
  21. Thank you for listening!