EmojiCam: Emoji-assisted video communication system leveraging facial expressions - HCII2021 Paper Presentation

© R&D Center for Digital Nature, University of Tsukuba
EmojiCam: Emoji-Assisted Video Communication System Leveraging Facial Expressions
Kosaku Namikawa, Ippei Suzuki, Ryo Iijima, Sayan Sarcar, and Yoichi Ochiai

Outline
• Introduction
  ◦ Online Conferencing Is Increasing
  ◦ Related Work: Studies in Computer-Mediated Communication
• Implementation
  ◦ Basic Idea and System Overview
  ◦ Designed Communication Flow and System Detail
• User Study
  ◦ Simple emotion estimation test
  ◦ User interviews and facial expressions when EmojiCam was used for chatting
• Discussion
• Future Work

Online meetings are becoming more and more common

Current Video Call
• Long video calls are tiring
  ◦ They are not as smooth as face-to-face communication
  ◦ Continuously looking at people's faces, and being looked at, means maintaining a state of tension
• Video calls require a powerful processor and high network bandwidth
  ◦ Video encoding and decoding carry a high processing load
  ◦ Many users want to reduce the amount of traffic on their mobile phone lines
  ◦ Video is excessive in some situations, such as when only listening in (yet audio alone is not enough)
• "Extensions" of video calls
  ◦ Screen sharing, text chat, reactions, whiteboard
  ◦ Each function is independent and not linked to the others, causing distraction during a video call
  ◦ There is room for more possibilities
Extension to replace/assist face-to-face communication without sending video

Related Work
• Sending facial expressions (nonverbal information) in computer-mediated communication
  ◦ Growing interest in recognizing and transmitting nonverbal information
  ◦ Visual communication markers (VCMs) such as emoji and stickers are used to represent nonverbal information in text communication
• Nonverbal communication in video/audio communication
  ◦ Due to latency and resolution limitations, nonverbal communication equivalent to face-to-face (F2F) has not been achieved
  ◦ Synchronous communication has been dominated by video and audio
    ▪ Supplementary text chat and emoji reactions are used
    ▪ Possibility of using visual communication markers in synchronous communication
• Emoji and stickers have been used in recent years on social video platforms

Basic Idea
• Social video platform apps as video editors
  ◦ Social video platform apps make complex video editing easy
  ◦ Everyone is a movie creator, especially young people
• Live video editing in video calls
[Figure labels: Video Editor / Video Call]

EmojiCam System Overview
• Web-based video jack-in application
• Face recognition and facial expression recognition, plus keyboard input
• Overlays an emoji on the face
• The displayed emoji can be customized, including non-emoji characters (any Unicode character; multiple characters are allowed)

Implementation - Communication Flow
• Type of communication: the recipient understands the sender's emotions
• Encode: sender's emotion → coded emotion (via the interface)
• Decode: coded emotion → recipient's understanding (via VCMs such as emoji or Chinese characters)
• Coded/simplified (verbalized/limited) emotion
  ◦ A set of emotions and expressions commonly understood by senders and receivers
  ◦ Ekman's seven basic emotions (neutral, happy, sad, angry, disgusted, fearful, surprised)
  ◦ Example: emotion: angry, expression: 😡 Pouting Face
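The encode/decode flow above can be sketched as two small lookups. This is a minimal illustration, not the EmojiCam implementation: the function names `encode` and `decode` and the table `DEFAULT_VCMS` are hypothetical, and only the "angry" → 😡 pairing comes from the slide; the other emoji are placeholder choices.

```python
from typing import Dict, Optional

# The seven coded emotions listed on the slide.
EMOTIONS = ("neutral", "happy", "sad", "angry", "disgusted", "fearful", "surprised")

# Hypothetical default table. Only "angry" -> 😡 is taken from the slide;
# every other entry is an assumed placeholder.
DEFAULT_VCMS: Dict[str, str] = {
    "neutral": "😐",
    "happy": "😀",
    "sad": "😢",
    "angry": "😡",
    "disgusted": "🤢",
    "fearful": "😨",
    "surprised": "😲",
}


def encode(label: str) -> str:
    """Sender side: reduce an input label to one of the seven coded emotions."""
    coded = label.strip().lower()
    if coded not in EMOTIONS:
        raise ValueError(f"unknown emotion: {label!r}")
    return coded


def decode(coded: str, vcms: Optional[Dict[str, str]] = None) -> str:
    """Recipient side: look up the visual marker shown for a coded emotion."""
    table = DEFAULT_VCMS if vcms is None else vcms
    return table[coded]


print(decode(encode("Angry")))           # default emoji table
print(decode("angry", {"angry": "怒"}))  # user-defined marker, here a Chinese character
```

Keeping the coded emotion separate from the marker table is what lets the displayed character be swapped without touching the input side.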

Implementation - System Detail
• Encode process
  ◦ Automatic face recognition (for emoji positioning)
  ◦ Two types of input interface for the coded emotion
    ▪ Manual (keyboard input)
    ▪ Facial expression recognition
      • face-api.js, SSDMobilenetv1
• Decode process
  ◦ Unicode characters entered in a text box for each coded emotion
  ◦ Controls for emoji size and emoji visibility
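One way to picture the encode step is as pure logic over a recognizer's output. The sketch below does not call face-api.js; it assumes the recognizer has already produced a per-emotion score dictionary and a face bounding box. `Box`, `select_emotion`, and `emoji_placement` are hypothetical names, and the rule that manual input overrides recognition is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Box:
    """Face bounding box in pixels (assumed output of the face detector)."""
    x: float
    y: float
    width: float
    height: float


def dominant_emotion(scores: Dict[str, float]) -> str:
    """Pick the coded emotion with the highest recognition score."""
    return max(scores, key=lambda name: scores[name])


def select_emotion(scores: Dict[str, float], manual: Optional[str] = None) -> str:
    """Assumed policy: a manual (keyboard) choice overrides automatic recognition."""
    return manual if manual is not None else dominant_emotion(scores)


def emoji_placement(box: Box, scale: float = 1.0) -> Tuple[float, float, float]:
    """Return (center_x, center_y, size) so the emoji covers the detected face."""
    size = max(box.width, box.height) * scale
    return (box.x + box.width / 2, box.y + box.height / 2, size)


scores = {"neutral": 0.1, "happy": 0.7, "sad": 0.2}
print(select_emotion(scores))                 # automatic input
print(select_emotion(scores, manual="sad"))   # keyboard input
print(emoji_placement(Box(10, 20, 100, 80), scale=1.5))
```

The `scale` parameter stands in for the emoji size control listed under the decode process.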

User Study 1: Simple Emotion Estimation Test - Overview
• Purpose
  a. Verify whether the sender's emotion is conveyed correctly to the recipient
• Procedure
  a. Encoding process (questioner)
    i. 11 participants (senders) were divided into 2 groups (exemplification / no exemplification)
    ii. Senders were asked to record a 5-second video of facial expressions for the seven emotions
    iii. Two videos were created from the recorded video
      • EmojiCam (emoji overlaid)
      • Video (no additional processing)
  b. Decoding process (respondent)
    i. 21 participants (recipients) were asked to guess which of the seven emotions was shown in the video
Summary
• 2 groups (Exemplification / No exemplification)
• 2 videos (Video / EmojiCam)
• 2 times
• 7 emotions
• 21 recipients

User Study 1: Simple Emotion Estimation Test - Result & Analysis
• Chi-square test
• Number of correct estimates (higher = more correct)
  ◦ Red: EmojiCam
  ◦ Blue: Video
• Facial expression recognition
  ◦ No exemplification: lower accuracy
    ▪ Difficult for people to recognize too
    ▪ But people's accuracy is higher
  ◦ Exemplification: higher accuracy
    ▪ Easy for people to recognize too
    ▪ The emoji's expression works (sad, surprised)
    ▪ The emoji was interpreted in different ways (disgusted)
[Figure panels: Exemplification / No Exemplification]
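The slide names a chi-square test but does not show the table it was applied to. As a hedged illustration, the sketch below assumes a 2×2 table per emotion (condition × correct/incorrect) and uses the Pearson statistic without continuity correction. The function name `chi_square_2x2` is hypothetical, and the counts are placeholders, not the study's data.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Placeholder counts for one emotion, NOT taken from the study:
# rows = EmojiCam / Video, columns = correct / incorrect guesses.
emoji_correct, emoji_wrong = 18, 3
video_correct, video_wrong = 10, 11

stat, p_value = chi_square_2x2(emoji_correct, emoji_wrong, video_correct, video_wrong)
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}")
```

With small expected cell counts, an exact test or a continuity correction may be more appropriate than this plain statistic.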

User Study 2 - Overview
• Purpose
  a. Examine whether the distribution of emotions changes when EmojiCam is used
  b. Verify the usability of EmojiCam through user interviews
• Procedure
  a. The subjects were divided into two groups
    i. Normal video call
    ii. With EmojiCam (5-minute introduction, 5-minute practice)
  b. 15 minutes of chatting in groups of four people
  c. The facial expression distribution of the recorded data was analyzed
  d. User interviews were conducted using the following items
• Questionnaire
  a. What did you feel was different between a normal video call and EmojiCam?
  b. What features would you like to see in EmojiCam?

User Study 2 - Result
• Facial expression distribution
  ◦ Mann-Whitney U test applied to the facial expression recognition log data
  ◦ No significant difference between EmojiCam and Video (p > 0.05)
• Q1: What did you feel was different between a normal video call and EmojiCam?
  ◦ Enjoyed a new way of communicating using EmojiCam
  ◦ Started to watch their own video more than in normal video calls
  ◦ Anxiety and inconvenience from not being able to see the other person's or their own facial expressions
  ◦ EmojiCam distracted the user and made it difficult to concentrate
• Q2: What features would you like to see in EmojiCam?
  ◦ Many participants wanted to convey agreement through nod recognition
  ◦ The ability to adjust the facial expression recognition
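For readers unfamiliar with the Mann-Whitney U test, a minimal sketch follows. The slide does not say which quantity from the log data was compared, so the inputs here are assumed to be one number per participant (for example, the share of frames recognized as "happy"), and the values are placeholders. The p-value uses a plain normal approximation without tie or continuity correction; `mann_whitney_u` is a hypothetical helper name.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided p) using average ranks for ties.

    The p-value is a normal approximation without tie or continuity
    correction, so it is only a rough guide for small samples.
    """
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = len(combined)
    rank_sum_a = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    u1 = rank_sum_a - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return u, math.erfc(abs(z) / math.sqrt(2))


# Placeholder per-participant values, NOT taken from the study.
emojicam_group = [0.31, 0.42, 0.28, 0.39]
video_group = [0.35, 0.30, 0.44, 0.33]
u, p = mann_whitney_u(emojicam_group, video_group)
print(f"U = {u}, p = {p:.3f}")
```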

Discussion - Encode process: accuracy of input method
• From User Study 2
  ◦ Participants wanted to modify the facial expression recognition
• The accuracy of EmojiCam in conveying emotions strongly depends on the accuracy of facial expression recognition
• Manual input
  ◦ Manual input requires a long period of practice
  ◦ Users are distracted by the keyboard during manual input
• Problems in facial expression recognition by machine learning
  ◦ Datasets are often biased toward Western facial expressions, and there are cultural differences
  ◦ Personal optimization may be needed

Discussion - Decode process: differences depending on the display method
• From User Study 1
  ◦ "Happy", "Angry", and "Neutral": no significant differences were found → easy to recognize from both Video and EmojiCam?
  ◦ "Disgusted" and "Fearful": no significant differences were found → difficult to recognize from both Video and EmojiCam?
  ◦ "Sad" and "Surprised": Video < EmojiCam → the emoji's expression works
• Future work: optimization according to usage and purpose
  ◦ For accuracy and efficiency, the emoji to be displayed should be predetermined, not user-defined
  ◦ User definitions are good for playful use with very close acquaintances
    ▪ Similar to the diversified use of emoji in text communication, unique ways of using pictograms emerge in the course of use

Future Work
• Facial expression translation via coded emotion
  ◦ Cultural/personal differences in facial expression
• Audience reaction summary
  ◦ Visualize the overall mood of a video call by presenting statistical data in real time
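The audience reaction summary is proposed as future work, so no design is given. One possible reading, sketched here under that caveat, is to aggregate each participant's current coded emotion into proportions that a real-time display could plot. `mood_summary` and the participant names are hypothetical.

```python
from collections import Counter
from typing import Dict


def mood_summary(current: Dict[str, str]) -> Dict[str, float]:
    """Map each coded emotion to the share of participants currently showing it."""
    counts = Counter(current.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the result from the most to the least frequent emotion.
    return {emotion: count / total for emotion, count in counts.most_common()}


# Hypothetical snapshot of one moment in a four-person call.
snapshot = {"alice": "happy", "bob": "happy", "carol": "neutral", "dave": "surprised"}
print(mood_summary(snapshot))
```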

Thank you for listening
Kosaku Namikawa, Ippei Suzuki, Ryo Iijima, Sayan Sarcar, and Yoichi Ochiai
