EmojiCam: Emoji-assisted video communication system leveraging facial expressions - HCII2021 Paper Presentation

This presentation was created for the poster session at HCI International 2021.
http://2021.hci.international/

【Publication】
Kosaku Namikawa, Ippei Suzuki, Ryo Iijima, Sayan Sarcar, Yoichi Ochiai. (2021) EmojiCam: Emoji-Assisted Video Communication System Leveraging Facial Expressions. In: Kurosu M. (eds) Human-Computer Interaction. Design and User Experience Case Studies. HCII 2021. Lecture Notes in Computer Science, vol 12764. Springer, Cham.
https://doi.org/10.1007/978-3-030-78468-3_42
https://digitalnature.slis.tsukuba.ac.jp/2021/07/emojicam-hcii2021/

【Project page】
Coming soon

【Presenter】
Kosaku Namikawa (浪川洪作)
University of Tsukuba
School of Informatics, College of Media Arts, Science and Technology
Digital Nature Group (Yoichi Ochiai)

【Abstract】
This study proposes the design of a communication technique that uses graphical icons, including emojis, as an alternative to facial expressions in video calls. Using graphical icons instead of complex and hard-to-read video expressions simplifies and reduces the amount of information in a video conference. The aim was to facilitate communication by preventing quick and incorrect emotional delivery. In this study, we developed EmojiCam, a system that encodes the emotions of the sender with facial expression recognition or user input and presents graphical icons of the encoded emotions to the receiver. User studies and existing emoji cultures were applied to examine the communication flow and discuss the possibility of using emoji in video calls. Finally, we discuss the new value that this model will bring and how it will change the style of video calling.

Digital Nature Group

July 27, 2021

  1. EmojiCam: Emoji-Assisted Video Communication
    System Leveraging Facial Expressions
    Kosaku Namikawa, Ippei Suzuki, Ryo Iijima, Sayan Sarcar, and Yoichi Ochiai

  2. Outline
    ● Introduction
    ○ Online Conferencing is Increasing
    ○ Related Work: Studies in Computer-Mediated Communication
    ● Implementation
    ○ Basic Idea and System Overview
    ○ Designed Communication Flow and System Details
    ● User Study
    ○ Simple Emotion Estimation Test
    ○ User Interviews and Facial Expressions when EmojiCam Was Used for Chatting
    ● Discussion
    ● Future Work

  3. Online meetings are becoming more and more common

  4. Current Video Calls
    ● Long video calls are tiring
    ○ Not as smooth as face-to-face communication
    ○ Constantly looking at, and being looked at by, other people's faces means maintaining a state of tension
    ● Video calls require a powerful processor and high network bandwidth
    ○ Video encoding and decoding impose a high processing load
    ○ Many users want to reduce traffic on their cellular data plans
    ○ Video is excessive in some situations, such as merely attending (though audio alone is not enough)
    ● "Extensions" of video calls
    ○ Screen sharing, text chat, reactions, whiteboard
    ○ Each function is independent and unlinked, causing distraction during a video call
    ○ More possibilities
    Extensions to replace/assist face-to-face communication without sending video

  5. Related Work
    ● Sending facial expressions (nonverbal information) in computer-mediated communication
    ○ Increasing interest in recognizing and transmitting nonverbal information
    ○ Visual communication markers (VCMs) such as emoji and stickers are used to represent nonverbal information in text communication
    ● Nonverbal communication in video/audio communication
    ○ Due to latency and resolution limitations, nonverbal communication equivalent to face-to-face has not been achieved
    ○ Synchronous communication has been dominated by video and audio
    ■ Supplementary text chat and emoji reactions are used
    ■ Visual communication markers have potential for synchronous communication
    ● Emoji and stickers have been used on social video platforms in recent years

  6. Basic Idea
    ● Social video platform apps as video editors
    ○ Social video platform apps make complex video editing easy
    ○ Everyone is a movie creator, especially young people
    ● Live video editing in a video call
    [Figure: Video Editor → Video Call]

  7. EmojiCam System Overview
    ● Web-based video jack-in application
    ● Face recognition, facial expression recognition, and keyboard input
    ● Overlays emoji on the user's face
    ● Displayed emoji are customizable, including non-emoji characters (any Unicode characters; multiple characters allowed)

  8. Implementation - Communication Flow
    ● Type of communication: the recipient understands the sender's emotions
    ● Encode: sender's emotion → coded emotion (via interface)
    ● Decode: coded emotion → recipient's understanding (via VCMs such as emoji or Chinese characters)
    ● Coded/simplified (verbalized/limited) emotions (see the sketch below)
    ○ A set of emotions and expressions commonly understood by senders and receivers
    ○ Ekman's seven basic emotions (neutral, happy, sad, angry, disgusted, fearful, surprised)
    ○ e.g., Emotion: angry, Expression: 😡 Pouting Face
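To make the encode/decode idea concrete, the coded-emotion table can be thought of as a small lookup structure. The TypeScript sketch below is a hypothetical illustration, assuming the seven Ekman categories; the default strings are placeholders that, per the slides, could be replaced by any Unicode characters.

```typescript
// Hypothetical sketch of the coded-emotion table: the sender's emotion is
// encoded into one of seven labels, and the recipient's side decodes each
// label into a display string (any Unicode characters, multiple allowed).
type CodedEmotion =
  | "neutral" | "happy" | "sad" | "angry"
  | "disgusted" | "fearful" | "surprised";

const display: Record<CodedEmotion, string> = {
  neutral: "😐", happy: "😄", sad: "😢",
  angry: "😡", // the slide's example: angry → Pouting Face
  disgusted: "🤢", fearful: "😨", surprised: "😲",
};

// Decode step: map a coded emotion to what the recipient sees.
function decode(emotion: CodedEmotion): string {
  return display[emotion];
}
```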

  9. Implementation - System Detail
    ● Encode process:
    ○ Automatic face recognition (for emoji positioning)
    ○ Two types of coded-emotion input interfaces:
    ■ Manual (keyboard input)
    ■ Facial expression recognition
    ● face-api.js, SSD MobileNet V1 (see the sketch below)
    ● Decode process:
    ○ Unicode input via a text box for each coded emotion
    ○ Control of emoji size and visibility
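As a rough illustration of how these pieces fit together, here is a minimal sketch of the recognition-and-overlay loop, assuming the public face-api.js API (SSD MobileNet V1 detector plus the expression model). The model path, 100 ms polling interval, emoji map, and font sizing are illustrative assumptions, not the paper's exact implementation, which also supports manual keyboard input.

```typescript
import * as faceapi from "face-api.js";

// Illustrative emoji map for the seven expression classes.
const EMOJI: Record<string, string> = {
  neutral: "😐", happy: "😄", sad: "😢", angry: "😡",
  disgusted: "🤢", fearful: "😨", surprised: "😲",
};

async function runEmojiOverlay(video: HTMLVideoElement, canvas: HTMLCanvasElement) {
  await faceapi.nets.ssdMobilenetv1.loadFromUri("/models");    // face detector
  await faceapi.nets.faceExpressionNet.loadFromUri("/models"); // 7-class expressions
  const ctx = canvas.getContext("2d")!;

  setInterval(async () => {
    // Encode: locate the face and classify the expression.
    const result = await faceapi.detectSingleFace(video).withFaceExpressions();
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    if (!result) return;

    // Take the highest-scoring of the seven expression classes.
    const [emotion] = Object.entries(result.expressions)
      .sort((a, b) => b[1] - a[1])[0];

    // Decode: draw the mapped emoji over the detected face box.
    const { x, y, height } = result.detection.box;
    ctx.font = `${height}px sans-serif`;
    ctx.fillText(EMOJI[emotion] ?? "😐", x, y + height);
  }, 100);
}
```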

  10. User Study 1: Simple Emotion Estimation Test - Overview
    ● Purpose
    a. Verify whether the sender's emotion is conveyed correctly to the recipient
    ● Procedure
    a. Encoding process (questioner)
    i. 11 participants (senders) divided into 2 groups (exemplification / no exemplification)
    ii. Each sender was asked to record a 5-second video of their facial expression for each of the seven emotions
    iii. Two videos were created from each recording:
    ● EmojiCam (emoji overlaid)
    ● Video (no additional processing)
    b. Decoding process (respondent)
    i. 21 participants (recipients) were asked to guess the emotion shown in each video from among the seven emotions
    Summary
    2 groups (exemplification / no exemplification)
    2 videos (Video / EmojiCam)
    2 times
    7 emotions
    21 recipients

  11. User Study 1: Simple Emotion Estimation Test - Result & Analysis
    ● Chi-square test (see the sketch below)
    ● Number of correct estimates (red: EmojiCam, blue: Video)
    ● Facial expression recognition
    ○ No exemplification: lower accuracy
    ■ Difficult for humans to recognize as well, but humans scored higher
    ○ Exemplification: higher accuracy
    ■ Easy for humans to recognize as well
    ■ The emoji's expression works (sad, surprised)
    ■ Some emoji were interpreted in different ways (disgusted)
    [Figure: number of correct estimates per emotion; panels: Exemplification / No Exemplification]
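For reference, the chi-square comparison here amounts to testing correct vs. incorrect counts across the two display conditions. The sketch below shows the generic 2×2 computation; the counts in the example are invented for illustration, not the study's data.

```typescript
// Generic chi-square statistic for a 2x2 contingency table, e.g.
// rows = {Video, EmojiCam}, cols = {correct, incorrect}.
function chiSquare2x2(table: number[][]): number {
  const rowSums = table.map((r) => r[0] + r[1]);
  const colSums = [table[0][0] + table[1][0], table[0][1] + table[1][1]];
  const total = rowSums[0] + rowSums[1];
  let chi2 = 0;
  for (let i = 0; i < 2; i++) {
    for (let j = 0; j < 2; j++) {
      const expected = (rowSums[i] * colSums[j]) / total;
      chi2 += (table[i][j] - expected) ** 2 / expected;
    }
  }
  return chi2; // compare against the chi-square distribution with df = 1
}

// Made-up example: 30/42 correct for Video vs. 38/42 correct for EmojiCam.
console.log(chiSquare2x2([[30, 12], [38, 4]]).toFixed(2));
```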

  12. User Study 2 - Overview
    ● Purpose
    a. Examine how the distribution of emotions changes with the use of EmojiCam
    b. Verify the usability of EmojiCam through user interviews
    ● Procedure
    a. Subjects were divided into two groups:
    i. Normal video call
    ii. With EmojiCam (5 min introduction, 5 min practice)
    b. 15 minutes of chatting in groups of four
    c. Analysis of the facial expression distribution in the recorded data
    d. User interviews conducted using the following items
    ● Questionnaire
    a. What did you feel was different between a normal video call and EmojiCam?
    b. What features would you like to see in EmojiCam?

  13. User Study 2 - Result
    ● Facial expression distribution
    ○ Mann-Whitney U-test applied to the facial expression recognition log data (see the sketch below)
    ○ No significant difference between EmojiCam and Video (p > 0.05)
    ● Q1: What did you feel was different between a normal video call and EmojiCam?
    ○ Participants enjoyed new ways to communicate using EmojiCam
    ○ They watched their own video more than in normal video calls
    ○ Anxiety and inconvenience from not being able to see the other person's or their own facial expressions
    ○ EmojiCam distracted some users, making it difficult to concentrate
    ● Q2: What features would you like to see in EmojiCam?
    ○ Many participants wanted to convey agreement through nod recognition
    ○ The ability to adjust the facial expression recognition
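As a reference for the analysis method only (the inputs below are arbitrary illustrative numbers, not the study's log data), a bare-bones Mann-Whitney U statistic over two independent samples can be sketched like this:

```typescript
// Bare-bones Mann-Whitney U statistic for two independent samples.
// Ties get average ranks; the p-value lookup (normal approximation or
// exact tables) is omitted for brevity.
function mannWhitneyU(a: number[], b: number[]): number {
  const pooled = [...a.map((v) => ({ v, g: 0 })), ...b.map((v) => ({ v, g: 1 }))]
    .sort((x, y) => x.v - y.v);

  // Assign ranks, averaging over runs of tied values.
  const ranks = new Array<number>(pooled.length);
  for (let i = 0; i < pooled.length; ) {
    let j = i;
    while (j < pooled.length && pooled[j].v === pooled[i].v) j++;
    const avg = (i + j + 1) / 2; // average of ranks i+1 .. j
    for (let k = i; k < j; k++) ranks[k] = avg;
    i = j;
  }

  // U for sample a: its rank sum minus its minimum possible rank sum.
  const rankSumA = pooled.reduce((s, p, i) => s + (p.g === 0 ? ranks[i] : 0), 0);
  return rankSumA - (a.length * (a.length + 1)) / 2;
}

// Illustrative only: hypothetical per-participant "happy" frame proportions.
console.log(mannWhitneyU([0.12, 0.30, 0.25], [0.20, 0.18, 0.40, 0.11]));
```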

  14. Discussion - Encode Process: Accuracy of the Input Method
    ● From User Study 2
    ○ Participants wanted to adjust the facial expression recognition
    ● The accuracy of EmojiCam in conveying emotions depends strongly on the accuracy of facial expression recognition
    ● Manual input
    ○ Manual input requires a long period of practice
    ○ Being distracted by the keyboard is a problem for manual input
    ● Problems in facial expression recognition by machine learning
    ○ Datasets are often biased toward Western facial expressions; there are cultural differences
    ○ Per-user optimization may be needed

  15. Discussion - Decode Process: Differences Depending on the Display Method
    ● From User Study 1
    ○ "Happy", "Angry", "Neutral": no significant differences found → easy to recognize from both Video and EmojiCam?
    ○ "Disgusted", "Fearful": no significant differences found → difficult to recognize from both Video and EmojiCam?
    ○ "Sad", "Surprised": Video < EmojiCam → the emoji's expression works
    ● Future work: optimization according to usage and purpose
    ○ For accuracy and efficiency, the emoji to be displayed should be predetermined, not user-defined
    ○ User-defined emoji suit playful use among very close people
    ■ As with the diversified use of emoji in text communication, unique ways of using pictograms emerge through use

  16. Future Work
    ● Facial expression translation via coded emotions
    ○ Addressing cultural/personal differences in facial expressions
    ● Audience reaction summary (see the sketch below)
    ○ Visualize the overall mood of a video call by presenting statistical data in real time
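One possible shape for such a reaction summary, assuming each participant's latest coded emotion is available to the client (all names here are hypothetical, not part of the paper):

```typescript
// Hypothetical sketch: tally every participant's current coded emotion
// into a histogram so the overall mood of the call can be shown live.
type CodedEmotion = "neutral" | "happy" | "sad" | "angry"
  | "disgusted" | "fearful" | "surprised";

function summarizeReactions(
  current: Map<string, CodedEmotion> // participantId -> latest coded emotion
): Record<CodedEmotion, number> {
  const counts: Record<CodedEmotion, number> = {
    neutral: 0, happy: 0, sad: 0, angry: 0,
    disgusted: 0, fearful: 0, surprised: 0,
  };
  for (const emotion of current.values()) counts[emotion]++;
  return counts; // e.g., render as a live bar chart above the participant grid
}
```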

  17. Thank you for listening
    Kosaku Namikawa, Ippei Suzuki, Ryo Iijima, Sayan Sarcar, and Yoichi Ochiai
