Confusion Detection

Mao Saeki
November 16, 2022

Mao Saeki, "Confusion state detection in conversation and the development of a multimodal conversational AI platform," 30th NLP Colloquium (November 2022)

Transcript

  1. Confusion state detection in conversation and the development of a multimodal conversational AI platform
    Mao Saeki
    Waseda University, Equmenopolis Inc.

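  2. Self-introduction
    Mao Saeki (佐伯 真於)
    - Doctoral student, Kobayashi Laboratory, Waseda University
    - Research Scientist, Equmenopolis Inc.
    - Interests: dialogue systems, especially the understanding and generation of paralanguage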

  3. Confusion Detection for Adaptive Conversational Strategies of An Oral Proficiency Assessment Interview Agent
    Mao Saeki[1], Kotoka Miyagi[1], Shinya Fujie[2], Shungo Suzuki[1],
    Tetsuji Ogawa[1], Tetsunori Kobayashi[1], Yoichi Matsuyama[1]
    [1]Waseda University, Japan
    [2]Chiba Institute of Technology, Japan

  4. Introduction
    Why confusion detection
    - In a conversation, listeners may become confused in situations such as:
      - Failing to hear due to noise
      - Not knowing a word or a concept
    - Unless the confusion is resolved, the listener may become uncomfortable, or the conversation may break down

  5. Introduction
    Detecting the cause of confusion
    Couldn't hear -> repeat
    Don't know a word -> rephrase
    Don't know a concept -> give example
    - The causes of confusion vary
    - Knowing the cause can lead to more precise assistance
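
The pairing of causes and responses on this slide can be sketched as a small lookup. This is a minimal illustration, not the system's actual implementation: the label strings, the function name `choose_repair`, and the fallback to "repeat" for an unrecognized cause are all assumptions made for the example.

```python
# Hypothetical labels: the slide names the causes and responses only informally.
REPAIR_STRATEGY = {
    "could_not_hear": "repeat",
    "unknown_word": "rephrase",
    "unknown_concept": "give_example",
}


def choose_repair(cause: str) -> str:
    """Return the repair strategy paired with a detected cause of confusion."""
    # Falling back to "repeat" for an unrecognized cause is an assumption
    # of this sketch, not something stated in the talk.
    return REPAIR_STRATEGY.get(cause, "repeat")


print(choose_repair("unknown_word"))  # prints rephrase
```

A table like this only helps if the cause can actually be predicted, which is the second research question of the talk.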

  6. Objectives
    - Research objective
      - Automated detection of confusion
    - Research questions
      - What are the signs of confusion?
      - Is it possible to predict the cause of confusion?

  7. Dialog Setting
    English assessment interview dialog
    - Conversation between a Japanese learner of English and a virtual agent for assessing speaking proficiency
    - The interview is conducted online

  8. Confusion data collection design
    - Confusion can be fatal to a conversation when it happens, but it is rare
    - We artificially elicit confusion based on a procedure proposed by Cumbal et al. [1]
    - The procedure is expanded with three additional manipulations, based on Levelt's speech comprehension process [2]
    [1] R. Cumbal et al., "Detection of Listener Uncertainty in Robot-Led Second Language Conversation Practice," in ICMI 2020.
    [2] W. J. M. Levelt, "Speaking: From Intention to Articulation," The MIT Press, 1993.

  9. Data collection design
    Manipulation procedure
    - Mixing in non-existing words
    - Increasing grammatical complexity

  10. Data collection results
    Participants: 47 Japanese English-learners
    Average interview duration: 6 minutes
    Confused data samples: 155
    Not-confused data samples: 372

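  11. Analysis on the cause of confusion
    [Figure: timeline of a system utterance ("Can you tell me …") and the user's response, showing a confused data sample with 2s and 5s markers]
    [Figure: chart with axes labeled "Predicted" and "True"]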

  12. Signs of confusion
    Sign of confusion: feature extraction method
    Increased blinking: activity of action unit (AU) 45
    Averting gaze from the screen: absolute distance between the current gaze direction and the screen
    Rapid head movement: rotation angle of the head relative to the screen
    Rapid eye movement: absolute distance between the current gaze direction and the screen
    Moving the face towards the screen: head rotation and horizontal distance between the screen and the head
    Silence: absence of user utterance, detected using voice activity detection (VAD)
    Self-talk: relative loudness (the current user loudness divided by the mean loudness of all previous utterances) and head rotation
    Features extracted every 40 ms: voice activity, relative loudness, AU 45 intensity, gaze distance from the screen, head rotation, and head distance from the screen
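
The relative loudness feature described above (current loudness divided by the mean loudness of all previous utterances) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the use of plain floats as loudness values, and the neutral value of 1.0 when there is no usable history are all choices made for the example and are not described in the slides.

```python
from typing import List

FRAME_MS = 40                          # extraction interval given on the slide
FRAMES_PER_SECOND = 1000 / FRAME_MS    # 25 feature frames per second


def relative_loudness(current: float, previous: List[float]) -> float:
    """Divide the current loudness by the mean loudness of all previous utterances."""
    if not previous:
        # Assumption: with no earlier utterances, treat the loudness as neutral.
        return 1.0
    mean_previous = sum(previous) / len(previous)
    if mean_previous == 0:
        # Assumption: avoid division by zero when all earlier utterances were silent.
        return 1.0
    return current / mean_previous


print(relative_loudness(0.5, [1.0, 1.0]))  # prints 0.5
print(FRAMES_PER_SECOND)                   # prints 25.0
```

A value well below 1.0 means the user is speaking more quietly than usual, which is how the slide ties this feature to self-talk.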

  13. Confusion detection results
    - Model: LSTM
    - Majority baseline accuracy: 0.706
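
The majority baseline figure is consistent with the sample counts on the data collection results slide (155 confused, 372 not-confused). The check below assumes the baseline always predicts the larger class and is scored over all 527 samples. The slides do not state how the baseline was evaluated, so this is a plausibility check rather than the authors' procedure.

```python
# Sample counts taken from the data collection results slide.
confused = 155
not_confused = 372

# A majority-class baseline always predicts the more frequent label,
# so its accuracy equals that label's share of the data.
majority_baseline = max(confused, not_confused) / (confused + not_confused)

print(round(majority_baseline, 3))  # prints 0.706
```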

  14. Adaptive interview scenario

  15. Conclusion
    - Goal
      - Detection of confusion by identifying multimodal signs
    - Contributions
      - Proposed a data collection method to elicit confusion at different steps of speech processing
      - Showed the difficulty of predicting the cause of confusion using only user video
      - Identified 7 multimodal signs of confusion and conducted an ablation study to understand their importance

  16. The InteLLA dialogue system and the development of a multimodal conversational AI platform

  17. (No text on this slide)

  18. Challenges of virtual agents
    Virtual agents must…
    - Know when to speak (or not to speak)
    - Understand nonverbal signs
    - Produce nonverbal signs
    Not just a text chat with a voice interface!
    Example exchange: "I recently watched …" / "Oh!" (nod) (laughter) "Did you watch…"
    While listening, people make noises and gestures, and will interrupt!

  19. Layered model of conversational processing and protocols
    Matsuyama 2015, Multiparty Conversation Facilitation Robots

  20. Comparing human and automated interview
    [Figure: ratings for the human interview and the automated interview, axis ticks 0 to 5, end labels "Strongly agree" and "Strongly disagree"]
    Items shown:
    - I was able to demonstrate my English language ability to the full extent
    - The agent was friendly
    - The agent was listening carefully to my speech
    - The agent was respectful of me
    - The agent's accent and intonation was natural
    - The agent's speech rate was appropriate
    - The agent's gestures were natural
    - The conversational flow was natural
    - Turn taking was natural
    Identified key factors using backward stepwise regression

  21. Thank you for listening!