Dance Practice System that Shows What You Would Look Like if You Could Master the Dance

Shuhei Tsuchida†1, Mao Haomin†1, Hieaki Okamoto†2, Yuma Suzuki†2, Rintaro Kanada†2, Takayuki Hori†2, Tsutomu Terada†1, Masahiko Tsukamoto†1
†1 Kobe University, †2 Softbank Corp.
8th International Conference on Movement and Computing, 22-24 June, 2022

Demo Video
[Figure labels: Mirror, Deepfake video]

Background

Learn dance movements
There are many studies to support the acquisition of dance movements.
• Haptic feedback [Schönauer et al., ICMI 2012]
• Mirror-based system [Anderson et al., UIST 2013]
• Robot [Nakamura et al., IROS 2005]
• Auditory feedback [Großhauser et al., AES Journal 2012]

Video self-modeling
A technique of showing learners a video of themselves mastering the dance [Fujimoto et al., ACHI 2012].
It has been used in rehabilitation [Steel et al., Journal of Motor Behavior 2017] and skill learning in sports [Ste-Marie et al., Frontiers in Psychology 2011].
→ Reported to be potentially effective.

Problem
Creating an ideal dance movement video is time-consuming and tedious for learners.
[Figure labels: Reference video, Original video, Ideal movement (?)]

Deepfake technique
Everybody Dance Now [Chan et al., ICCV 2019]
https://youtu.be/PCBTZh41Ris

Research purpose
Our goal is to support dance learning by using deep learning to create a video of the learner as if they had mastered the dance, and by having the learner practise while watching that video.

Proposed method

Deepfake video generation
We utilized the official library of the Everybody Dance Now paper [1].
[Figure labels: Reference video, Skeleton information, Skeleton information model, Learned model, Input video, Restored video, Output video]
[1] Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros: Everybody Dance Now, Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5933–5942 (2019).

Deepfake video generation: Comparison
[Figure labels: Input video, Restored video, Output video]

Preliminary investigation
(ⅰ) Types of movement in the input video
(ⅰⅰ) Characteristics of movement in the reference video

(ⅰ) Types of movement in the input video
Two types of input video were compared by their output:
• Learners imitate the dance movements according to the reference video
• Learners move freely

Finding: The output video was of higher quality when the input video showed the learner performing the same dance movements as the reference video.

(ⅰⅰ) Characteristics of movement in the reference video
• Rotate the arm: × One's arm disappears
• Up-and-down: × One's position changes
• Depth: × Depth representation
We should use a reference video that omits these motions in the user study.

User study

Purpose: Our goal is to verify whether watching videos of themselves performing an expert dancer's movements teaches learners movement skills effectively.
Dance movements (learning target): Three dance movements (easy, intermediate, difficult)
Participants: 20 university students in their 20s (19 males and one female)
Conditions: Original video presentation group, Deepfake video presentation group

Dance movements (Learning target)
Targets do not contain:
• movements in depth
• up-and-down movements
• arm-rotating movements
• movements in which the dancer turns their back
[Figure labels: Dance 1, Dance 2, Dance 3]

Presentation group
• Original video presentation group
• Deepfake video presentation group

Experimental process
• Day 1: PreTraining (5 min.), PreTest (3 times)
• Day 2: Training (10 min.), PostTest (3 times)
• Day 3: RetentionTest (3 times)
Scoring is based on the videos from Day 1 to Day 3.
[Figure labels: Original video, Deepfake video, Deepfake video model, Generated video]

Evaluation index
We scored the DTW (dynamic time warping) distance between the participant's video and the reference video.
The feature vector holds unit vector information for all skeletons, over all frames.
[Figure labels: Reference video, Video at test, Skeleton (x, y), Feature vector, Unit vector, All frames]
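The scoring idea on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the slides do not state the skeleton layout, the per-frame cost, or any normalization. The `BONES` list, the 2D keypoint format, and the Euclidean frame cost are all assumptions made for the example.

```python
import math

# Hypothetical skeleton: each pair is (start joint index, end joint index).
# The real joint layout used in the study is not given in the slides.
BONES = [(0, 1), (1, 2), (2, 3)]


def unit_vectors(frame, bones=BONES):
    """Turn one frame of 2D keypoints [(x, y), ...] into a flat feature
    vector made of the unit direction vector of every bone."""
    feature = []
    for start, end in bones:
        dx = frame[end][0] - frame[start][0]
        dy = frame[end][1] - frame[start][1]
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            # Degenerate bone (both joints at the same point): use a zero vector.
            feature.extend((0.0, 0.0))
        else:
            feature.extend((dx / norm, dy / norm))
    return feature


def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping cost between two sequences of feature
    vectors, using Euclidean distance as the per-frame cost (an assumption)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def score(reference_frames, test_frames):
    """DTW cost between a reference video and a test video, both given as
    lists of keypoint frames. Lower means closer to the reference."""
    ref = [unit_vectors(f) for f in reference_frames]
    test = [unit_vectors(f) for f in test_frames]
    return dtw_distance(ref, test)
```

Because each bone is reduced to a unit vector, the score ignores differences in body size, and because DTW aligns frames non-linearly, it tolerates differences in timing. Both properties matter when comparing a learner with a reference dancer.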

Questionnaire
• Difficulty of learning each dance movement (1: very easy to 7: very difficult)
• I think I can learn to dance if I keep practicing. (1: strongly disagree to 5: strongly agree)
• I could master the dance. (1: strongly disagree to 5: strongly agree)
• I felt as if I were dancing. (deepfake presentation group only; 1: strongly disagree to 5: strongly agree)

Result

Average DTW costs for Dance 1 to Dance 3
There was no significant difference between the presentation groups.

Questionnaire 1
“I felt as if I were dancing.”
Strongly disagree 1 – 2 – 3 – 4 – 5 Strongly agree
The responses varied among participants.

Average DTW costs per participant based on the responses

[Figure labels: “I did not feel as if I were dancing.” and “I felt as if I were dancing.”]
Participants who felt as if they were dancing may be more likely to learn through deepfake video.

Questionnaire 2
“I think I can learn to dance if I keep practising.”
The deepfake video presentation group tended to have lower self-efficacy.

Discussion: Self-efficacy may decrease if the desired movement is too far from one's skill level.

Participants' positive comments (1/3)
• I was able to see myself dancing, so it was easy to know how to move.
• In the reference video, I could not understand what kind of movement the dancer was doing. In the deepfake video, the complex movements seemed to be a little easier.
• I thought I was able to notice more differences between my movements and the dancer's in the PreTraining.
• I felt that it was easier to compare the movements of each part of the body because the deepfake had the same body shape.
These comments indicate that the deepfake video can improve the understanding of movements.

Participants' positive comments (2/3)
• It was a strange feeling because it was a video of myself doing a movement that I should not have been able to do, but it was easy to visualize the movement in my brain.
• I was moving my body thinking that I was dancing like in the deepfake video. Sometimes I looked at the mirror image of myself and compared it with the deepfake, and I noticed the points where I was not moving well.
These comments indicate the possibility of supporting the movement.

Participants' positive comments (3/3)
• My motivation went up because I could see myself getting better.
• It was interesting to see a video of myself dancing perfectly because I felt strange.
• I was motivated by the fact that I could see how well I was doing.
• I could see myself dancing well in the video, so I can enjoy practicing with the illusion that I am dancing well.
These comments indicate that deepfake videos can increase learner motivation.

Participants' negative comments
• There was some noise in the deepfake video compared with the reference video, so I could not understand some details of the movements.
• In the video of Dance 1, it was difficult to figure out which foot was in front of the other. If the dance movements contain difficult parts, it was difficult to imagine them even with a deepfake video.
• The quality of the image was not very good, so it was difficult to see the detailed movements of the fingers.
Most of these comments were related to the low quality of the generated images.

Discussion
• The low quality of the generated video, as seen in the participants' comments, may have affected the participants' learning. A system that generates higher-quality images and removes their unnaturalness is required.
• There was no significant difference between the groups for each type of dance. The role of dance difficulty levels in these videos needs further investigation.
• The small screen size (90 cm x 50 cm) may have diminished its role as a mirror and prevented the effect of self-modeling. Using a relatively large display will reduce the error rate.

Summary
• We propose a learning method that uses deep learning to generate and present a deepfake video of the learner performing the same movements as the dancer in a reference video.
• We tested whether the automatically generated deepfake videos are effective for dance learning.
• The experimental results showed that there was no significant difference between the presentation groups.
[Figure labels: Reference video, Skeleton information, Learned model, Output video]

Future plan
Skill Morphing: A skill gap still exists.
[Figure labels: Novice, Expert]

Future plan
Skill Morphing: By generating and presenting dance movements of an intermediate level between novices and experts, learners can practice while referring to dance movements that are one step ahead of their own dance level.
[Figure labels: 100% Novice, 100% Expert, visual motion]
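One simple way to picture an intermediate level between novice and expert is a linear blend of the two skeletons. The sketch below is only an illustration under strong assumptions: the slides do not say how Skill Morphing would be implemented. The function names are hypothetical, both sequences are assumed to be already time-aligned and of equal length, and keypoints are assumed to be 2D.

```python
def morph_pose(novice_frame, expert_frame, alpha):
    """Blend two poses joint by joint.

    alpha = 0.0 returns the novice pose, alpha = 1.0 the expert pose,
    and values in between give an intermediate pose.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    if len(novice_frame) != len(expert_frame):
        raise ValueError("poses must have the same number of joints")
    return [
        ((1.0 - alpha) * nx + alpha * ex, (1.0 - alpha) * ny + alpha * ey)
        for (nx, ny), (ex, ey) in zip(novice_frame, expert_frame)
    ]


def morph_sequence(novice_frames, expert_frames, alpha):
    """Blend two time-aligned keypoint sequences frame by frame."""
    if len(novice_frames) != len(expert_frames):
        raise ValueError("sequences must be time-aligned and equally long")
    return [
        morph_pose(n, e, alpha) for n, e in zip(novice_frames, expert_frames)
    ]
```

Under these assumptions, `morph_sequence(novice, expert, 0.3)` would give a target skeleton sequence slightly ahead of the novice's own level. That sequence could then drive the same video generation step as the reference skeleton.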

Summary (continued)
• Based on the questionnaire results, we organized the challenges of presenting self-videos.
