
EmojiCam: Emoji-assisted video communication system leveraging facial expressions - HCII2021 Paper Presentation

This presentation was created for the poster session at HCI International 2021.
http://2021.hci.international/

【Publication】
Kosaku Namikawa, Ippei Suzuki, Ryo Iijima, Sayan Sarcar, Yoichi Ochiai. (2021) EmojiCam: Emoji-Assisted Video Communication System Leveraging Facial Expressions. In: Kurosu M. (eds) Human-Computer Interaction. Design and User Experience Case Studies. HCII 2021. Lecture Notes in Computer Science, vol 12764. Springer, Cham.
https://doi.org/10.1007/978-3-030-78468-3_42
https://digitalnature.slis.tsukuba.ac.jp/2021/07/emojicam-hcii2021/

【Project page】
Coming soon

【Presenter】
Kosaku Namikawa (浪川洪作)
University of Tsukuba
School of Informatics, College of Media Arts, Science and Technology
Digital Nature Group (Yoichi Ochiai)

【Abstract】
This study proposes the design of a communication technique that uses graphical icons, including emojis, as an alternative to facial expressions in video calls. Using graphical icons instead of complex and hard-to-read video expressions simplifies and reduces the amount of information in a video conference. The aim was to facilitate communication by preventing quick and incorrect emotional delivery. In this study, we developed EmojiCam, a system that encodes the emotions of the sender with facial expression recognition or user input and presents graphical icons of the encoded emotions to the receiver. User studies and existing emoji cultures were applied to examine the communication flow and discuss the possibility of using emoji in video calls. Finally, we discuss the new value that this model will bring and how it will change the style of video calling.


Digital Nature Group

July 27, 2021

Transcript

  1. © R&D Center for Digital Nature, University of Tsukuba
     EmojiCam: Emoji-Assisted Video Communication System Leveraging Facial Expressions
     Kosaku Namikawa, Ippei Suzuki, Ryo Iijima, Sayan Sarcar, and Yoichi Ochiai
  2. Outline
     • Introduction
       ◦ Online conferencing is increasing
       ◦ Related work: studies in computer-mediated communication
     • Implementation
       ◦ Basic idea and system overview
       ◦ Designed communication flow and system detail
     • User Study
       ◦ Simple emotion estimation test
       ◦ User interviews and facial expressions when EmojiCam was used for chatting
     • Discussion
     • Future Work
  3. Online meetings are becoming more and more common
  4. Current Video Calls
     • Long video calls are tiring
       ◦ Not as smooth as face-to-face communication
       ◦ Constantly looking at, and being looked at by, other people's faces means maintaining a state of tension
     • Video calls require a powerful processor and high network bandwidth
       ◦ Video encoding and decoding impose a high processing load
       ◦ Many users want to reduce the traffic on their cell phone lines
       ◦ Video is too much in some situations, such as merely attending (though audio alone is not enough)
     • "Extensions" of video calls
       ◦ Screen sharing, text chat, reactions, whiteboards
       ◦ Each function is independent and unlinked, causing distraction during a video call
       ◦ More possibilities remain
     → An extension to replace/assist face-to-face communication without sending video
  5. Related Work
     • Sending facial expressions (nonverbal information) in computer-mediated communication
       ◦ Increasing interest in recognizing and transmitting nonverbal information
       ◦ Visual communication markers (VCMs) such as emoji and stickers are used to represent nonverbal information in text communication
     • Nonverbal communication in video/audio communication
       ◦ Due to latency and resolution limitations, nonverbal communication equivalent to face-to-face has not been achieved
       ◦ Synchronous communication has been dominated by video and audio
         ▪ Supplementary text chat and emoji reactions are used
         ▪ Possibility of visual communication markers in synchronous communication
       ◦ Emoji and stickers have been used in recent years on social video platforms
  6. Basic Idea
     • Social video platform apps as video editors
       ◦ Social video platform apps make complex video editing easy
       ◦ Everyone, especially young people, is a movie creator
     • Live video editing in video calls
     (Figure: Video Editor → Video Call)
  7. EmojiCam System Overview
     • Web-based video jack-in application
     • Face detection, facial expression recognition, and keyboard input
     • Overlays an emoji on the user's face
     • The displayed emoji can be customized, including non-emoji glyphs (any Unicode characters, multiple allowed)
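The overlay step (detect a face, then draw an emoji glyph over it) can be sketched as below; `emojiPlacement`, its field names, and the 1.25 scale factor are illustrative assumptions, not the system's actual code.

```javascript
// Hypothetical helper: given a face bounding box from a detector
// (e.g. the box face-api.js returns), compute where and how large to
// draw the emoji glyph so it covers the face. The 1.25 scale factor
// is an assumption chosen so the glyph slightly overhangs the face.
function emojiPlacement(faceBox, scale = 1.25) {
  // faceBox: { x, y, width, height } in canvas pixels
  const size = Math.max(faceBox.width, faceBox.height) * scale;
  return {
    x: faceBox.x + faceBox.width / 2,  // horizontal center of the face
    y: faceBox.y + faceBox.height / 2, // vertical center of the face
    fontSize: size,                    // emoji drawn as a text glyph this tall
  };
}

// Example: a 100×120 px face detected at (50, 40)
const p = emojiPlacement({ x: 50, y: 40, width: 100, height: 120 });
// → { x: 100, y: 100, fontSize: 150 }
```

On a canvas, the glyph could then be drawn with `ctx.font = p.fontSize + "px sans-serif"` and `ctx.fillText(emoji, ...)` centered at (p.x, p.y).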
  8. Implementation - Communication Flow
     • Type of communication: the recipient understands the sender's emotions
     • Encode: sender's emotion → coded emotion (via the interface)
     • Decode: coded emotion → recipient's understanding (via VCMs such as emoji or Chinese characters)
     • Coded/simplified (verbalized/limited) emotion
       ◦ A set of emotions and expressions commonly understood by senders and receivers
       ◦ Ekman's seven basic emotions (neutral, happy, sad, angry, disgusted, fearful, surprised)
       ◦ E.g., emotion: angry; expression: 😡 Pouting Face
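The encode step above amounts to a lookup from one of Ekman's seven basic emotions to a user-customizable Unicode string. A minimal sketch, with assumed default assignments (only angry → 😡 comes from the slide):

```javascript
// Map each of Ekman's seven basic emotions to a displayable Unicode
// string. Entries other than "angry" are illustrative defaults; in
// EmojiCam the table is user-customizable (any Unicode, multiple chars).
const DEFAULT_EMOJI = {
  neutral: "😐",
  happy: "😊",
  sad: "😢",
  angry: "😡", // Pouting Face, the example on this slide
  disgusted: "🤢",
  fearful: "😨",
  surprised: "😲",
};

// Encode: sender's recognized or typed emotion → coded emotion string.
// Unknown labels fall back to neutral (an assumption, not the paper's rule).
function encodeEmotion(emotion, table = DEFAULT_EMOJI) {
  return table[emotion] ?? table.neutral;
}
```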
  9. Implementation - System Detail
     • Encode process
       ◦ Automatic face detection (for emoji positioning)
       ◦ Two types of coded-emotion input interfaces
         ▪ Manual (keyboard input)
         ▪ Facial expression recognition (face-api.js, SSDMobilenetv1)
     • Decode process
       ◦ Unicode input via a text box for each coded emotion
       ◦ Control of emoji size and visibility
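face-api.js's expression recognizer returns a probability per expression, which must be reduced to a single coded emotion. A sketch under assumptions (the 0.5 threshold and the neutral fallback are mine, not the paper's parameters):

```javascript
// scores: per-expression probabilities as produced by face-api.js,
// e.g. { neutral: 0.1, happy: 0.85, sad: 0.02, ... }.
// Returns the highest-scoring emotion, or "neutral" when nothing
// clears the confidence threshold (0.5 here is an assumption).
function dominantEmotion(scores, threshold = 0.5) {
  let best = "neutral";
  let bestScore = threshold;
  for (const [emotion, score] of Object.entries(scores)) {
    if (score > bestScore) {
      best = emotion;
      bestScore = score;
    }
  }
  return best;
}
```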
  10. User Study 1: Simple Emotion Estimation Test - Overview
     • Purpose
       a. Verify that the sender's emotion is conveyed correctly to the recipient
     • Procedure
       a. Encoding process (questioners)
          i. 11 participants (senders) were divided into 2 groups (exemplification / no exemplification)
          ii. Senders recorded a 5-second video of their facial expression for each of the seven emotions
          iii. Two videos were created from each recording:
             • EmojiCam (emoji overlaid)
             • Video (no additional processing)
       b. Decoding process (respondents)
          i. 21 participants (recipients) guessed which of the seven emotions each video showed
     Summary: 2 groups (exemplification / none) × 2 videos (Video / EmojiCam) × 2 times × 7 emotions × 21 recipients
  11. User Study 1: Simple Emotion Estimation Test - Result & Analysis
     • Chi-square test
     • Which condition had the higher number of correct estimates
       ◦ Red: EmojiCam
       ◦ Blue: Video
     • Facial expression recognition
       ◦ No exemplification: lower accuracy
         ▪ Difficult for people to recognize as well
         ▪ But people scored higher
       ◦ Exemplification: higher accuracy
         ▪ Easy for people to recognize as well
         ▪ The emoji's expression works (sad, surprised)
         ▪ Some emoji were interpreted in different ways (disgusted)
     (Figure: results with exemplification / without exemplification)
  12. User Study 2 - Overview
     • Purpose
       a. Examine how the emotional distribution changes with the use of EmojiCam
       b. Verify the usability of EmojiCam through user interviews
     • Procedure
       a. The subjects were divided into two groups
          i. Normal video call
          ii. With EmojiCam (5 min introduction, 5 min practice)
       b. 15 minutes of chatting in groups of four
       c. Analysis of the facial expression distribution in the recorded data
       d. User interviews covering the following items
     • Questionnaire
       a. What did you feel was different between a normal video call and EmojiCam?
       b. What features would you like to see in EmojiCam?
  13. User Study 2 - Result
     • Facial expression distribution
       ◦ Mann-Whitney U test on the facial expression recognition log data
       ◦ No significant difference between EmojiCam and Video (p > 0.05)
     • Q1: What did you feel was different between a normal video call and EmojiCam?
       ◦ Participants enjoyed new ways to communicate using EmojiCam
       ◦ They watched their own video more than in normal video calls
       ◦ Anxiety and inconvenience from not being able to see the other person's or their own facial expressions
       ◦ EmojiCam distracted some users and made it difficult to concentrate
     • Q2: What features would you like to see in EmojiCam?
       ◦ Many participants wanted to convey agreement through nod recognition
       ◦ The ability to adjust the facial expression recognition
  14. Discussion - Encode Process: Accuracy of the Input Method
     • From User Study 2
       ◦ Participants wanted to adjust the facial expression recognition
     • The accuracy of EmojiCam in conveying emotions depends strongly on the accuracy of facial expression recognition
     • Manual input
       ◦ Requires a long period of practice
       ◦ Being distracted by the keyboard is a problem
     • Problems of facial expression recognition by machine learning
       ◦ Datasets are often biased toward Western facial expressions, and there are cultural differences
       ◦ Per-user optimization may be needed
  15. Discussion - Decode Process: Differences Depending on the Display Method
     • From User Study 1
       ◦ "Happy", "Angry", "Neutral": no significant differences → easy to recognize from both Video and EmojiCam?
       ◦ "Disgusted", "Fearful": no significant differences → difficult to recognize from both Video and EmojiCam?
       ◦ "Sad", "Surprised": Video < EmojiCam → the emoji's expression works
     • Future work: optimization according to usage and purpose
       ◦ For accuracy and efficiency, the emoji to be displayed should be predetermined, not user-defined
       ◦ User definitions suit playful use among very intimate people
         ▪ As with the diversified use of emoji in text communication, unique ways of using pictograms emerge through use
  16. Future Work
     • Facial expression translation via coded emotion
       ◦ Addressing cultural/personal differences in facial expressions
     • Audience Reaction Summary
       ◦ Visualize the overall mood of a video call by presenting statistical data in real time
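The Audience Reaction Summary could start from a simple tally of each participant's current coded emotion; the function name and the shape of `participants` are illustrative assumptions.

```javascript
// Tally the current coded emotion of every participant into a
// distribution that a client could render in real time (e.g. one bar
// per emotion). participants: [{ emotion: "happy" }, ...] (assumed shape).
function reactionSummary(participants) {
  const counts = {};
  for (const { emotion } of participants) {
    counts[emotion] = (counts[emotion] ?? 0) + 1;
  }
  return counts;
}
```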
  17. Thank you for listening
     Kosaku Namikawa, Ippei Suzuki, Ryo Iijima, Sayan Sarcar, and Yoichi Ochiai