Slide 1

Slide 1 text

Dance Practice System that Shows What You Would Look Like if You Could Master the Dance
Shuhei Tsuchida†1, Mao Haomin†1, Hieaki Okamoto†2, Yuma Suzuki†2, Rintaro Kanada†2, Takayuki Hori†2, Tsutomu Terada†1, Masahiko Tsukamoto†1
†1 Kobe University, †2 Softbank Corp.
8th International Conference on Movement and Computing, 22-24 June, 2022

Slide 2

Slide 2 text

Demo Video
Figure labels: Mirror, Deepfake video

Slide 3

Slide 3 text

Background

Slide 4

Slide 4 text

Learn dance movements
There are many studies that support the acquisition of dance movements.
• Haptic feedback [Schönauer et al., ICMI2012]
• Mirror-based system [Anderson et al., UIST2013]
• Robot [Nakamura et al., IROS2005]
• Auditory feedback [Großhauser et al., AES Journal2012]

Slide 5

Slide 5 text

Video self-modeling
A technique of showing learners a video of themselves mastering the dance [Fujimoto et al., ACHI2012].
It has been used in rehabilitation [Steel et al., Journal of Motor Behavior 2017] and in skill learning in sports [Ste-Marie et al., Frontiers in Psychology 2011].
→ Reported to be potentially effective.

Slide 6

Slide 6 text

Problem
Creating an ideal dance movement video is time-consuming and tedious for learners.
Figure labels: Reference video, Original video, Ideal movement?

Slide 7

Slide 7 text

Deepfake technique
Everybody Dance Now [Chan et al., ICCV2019]
https://youtu.be/PCBTZh41Ris

Slide 8

Slide 8 text

Research purpose
Our goal is to support dance learning by using deep learning technology to create a video of the learner as someone who has mastered the dance, and by having the learner practise while watching that video.

Slide 9

Slide 9 text

Proposed method

Slide 10

Slide 10 text

Deepfake video generation
We utilized the official library of the Everybody Dance Now paper [1].
Figure labels: Reference video, Skeleton information, Skeleton information model, Learned model, Output video, Input video, Restored video
[1] Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros: Everybody Dance Now, Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5933–5942 (2019).

Slide 11

Slide 11 text

Deepfake video generation
Figure labels: Reference video, Skeleton information, Skeleton information model, Learned model, Output video, Input video, Restored video, Comparison

Slide 14

Slide 14 text

Preliminary investigation
(i) Types of movement in the input video
(ii) Characteristics of movement in the reference video

Slide 15

Slide 15 text

(i) Types of movement in the input video
Input conditions (the figure shows Input and Output for each):
• Learners imitate the dance movements in the reference video
• Learners move freely

Slide 16

Slide 16 text

(i) Types of movement in the input video
Input conditions (the figure shows Input and Output for each):
• Learners imitate the dance movements in the reference video
• Learners move freely
The output video was of higher quality when the model was trained on an input video in which the learner performed the same dance movements as the reference video.

Slide 17

Slide 17 text

(ii) Characteristics of movement in the reference video
• Rotate the arm: × One's arm disappears
• Up-and-down: × Change one's position
• Depth: × Depth representation
We should use a reference video that omits these motions in the user study.

Slide 18

Slide 18 text

User study

Slide 19

Slide 19 text

User study
Purpose: Our goal is to verify whether watching videos of themselves performing an expert dancer's movements teaches participants movement skills effectively.
Dance movements (learning target): Three dance movements (easy, intermediate, difficult)
Participants: 20 university students in their 20s (19 males and one female)
Conditions:
• Original video presentation group
• Deepfake video presentation group

Slide 20

Slide 20 text

Dance movements (Learning target)
Targets do not contain:
• movements in depth
• up-and-down movements
• arm-rotating movements
• movements that turn one's back
Dance 1
Dance 2
Dance 3


Slide 21

Slide 21 text

Presentation group
• Original video presentation group
• Deepfake video presentation group

Slide 24

Slide 24 text

Experimental process
• Day 1: PreTraining (5 min.), PreTest (3 times)
• Day 2: Training (10 min.), PostTest (3 times)
• Day 3: RetentionTest (3 times)
Scoring is based on the Day 1 to Day 3 videos.
Figure labels: Deepfake video, Original video, Deepfake video model, Generated video

Slide 29

Slide 29 text

Evaluation index
We scored the DTW (dynamic time warping) distance between the participant’s video at test and the reference video.
Feature vector: unit vector information for all skeletons (x, y), over all frames.
Figure labels: Skeleton, Reference video, Video at test, Unit vector

Slide 32

Slide 32 text

Evaluation index
We scored the DTW (dynamic time warping) distance between the participant’s video at test and the reference video.
Feature vector: unit vector information for all skeletons (x, y), over all frames.
Values shown in the figure: 292, 314, 308
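The scoring idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each frame is a list of 2D joint positions from some pose estimator, uses a hypothetical bone list BONES, and uses a textbook DTW with a Euclidean per-frame cost and no path-length normalization. The slides do not specify any of these details.

```python
import math

# Hypothetical bone list (assumption): (parent_joint, child_joint) index pairs.
BONES = [(0, 1), (1, 2), (2, 3)]

def unit_vector_features(frames, bones=BONES):
    """frames: list of frames, each a list of (x, y) joint positions.
    Returns one flat feature vector of per-bone unit vectors per frame."""
    features = []
    for joints in frames:
        vec = []
        for parent, child in bones:
            dx = joints[child][0] - joints[parent][0]
            dy = joints[child][1] - joints[parent][1]
            norm = math.hypot(dx, dy) or 1.0  # zero-length bones stay (0, 0)
            vec.extend((dx / norm, dy / norm))
        features.append(vec)
    return features

def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with Euclidean per-frame cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Toy example: the same two poses performed with different timing.
reference = [[(0, 0), (0, 1), (1, 1), (1, 2)], [(0, 0), (1, 0), (1, 1), (2, 1)]]
slower = [reference[0], reference[0], reference[1]]
print(dtw_distance(unit_vector_features(reference), unit_vector_features(slower)))
```

Using unit vectors makes the features independent of body size and distance from the camera, and DTW tolerates differences in timing, so a lower cost means the shape of the movement is closer to the reference.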

Slide 33

Slide 33 text

Questionnaire
• Difficulty of learning each dance movement (1: very easy to 7: very difficult)
• I think I can learn to dance if I keep practicing. (1: strongly disagree to 5: strongly agree)
• I could master the dance. (1: strongly disagree to 5: strongly agree)
• I felt as if I were dancing. (deepfake presentation group only; 1: strongly disagree to 5: strongly agree)

Slide 34

Slide 34 text

Result

Slide 35

Slide 35 text

Result

Slide 36

Slide 36 text

Average DTW costs for Dance 1 to Dance 3
There was no significant difference between the presentation groups.

Slide 37

Slide 37 text

Questionnaire 1
“I felt as if I were dancing.”
Strongly disagree 1 – 2 – 3 – 4 – 5 Strongly agree
The responses varied among participants.

Slide 38

Slide 38 text

Questionnaire 1
“I felt as if I were dancing.”
Strongly disagree 1 – 2 – 3 – 4 – 5 Strongly agree
The responses varied among participants.
Figure annotations: One person, Three people

Slide 39

Slide 39 text

Average DTW costs per participant based on the responses

Slide 40

Slide 40 text

Average DTW costs per participant based on the responses
Groups: “I did not feel as if I were dancing.” / “I felt as if I were dancing.”
Participants who felt as if they were dancing may be more likely to learn through deepfake video.

Slide 41

Slide 41 text

Questionnaire 2
“I think I can learn to dance if I keep practising.”
The deepfake video presentation group tended to have lower self-efficacy.

Slide 42

Slide 42 text

Questionnaire 2
“I think I can learn to dance if I keep practising.”
The deepfake video presentation group tended to have lower self-efficacy.
Discussion: Self-efficacy may decrease if the desired movement is too far from one's skill level.

Slide 43

Slide 43 text

Participants’ positive comments (1/3)
• I was able to see myself dancing, so it was easy to know how to move.
• In the reference video, I could not understand what kind of movement the dancer was doing. In the deepfake video, the complex movements seemed to be a little easier.
• I thought I was able to notice more differences between my movements and the dancer's in the PreTraining.
• I felt that it was easier to compare the movements of each part of the body because the deepfake had the same body shape.
These comments indicate that the deepfake video can improve the understanding of movements.

Slide 44

Slide 44 text

Participants’ positive comments (2/3)
• It was a strange feeling because it was a video of myself doing a movement that I should not have been able to do, but it was easy to visualize the movement in my brain.
• I was moving my body thinking that I was dancing like in the deepfake video. Sometimes I looked at the mirror image of myself and compared it with the deepfake, and I noticed the points where I was not moving well.
These comments indicate the possibility of supporting the movement.

Slide 45

Slide 45 text

Participants’ positive comments (3/3)
• My motivation went up because I could see myself getting better.
• It was interesting to see a video of myself dancing perfectly because I felt strange.
• I was motivated by the fact that I could see how well I was doing.
• I could see myself dancing well in the video, so I can enjoy practicing with the illusion that I am dancing well.
These comments indicate the ability of deepfake videos to increase learner motivation.

Slide 46

Slide 46 text

Participants’ negative comments
• There was some noise in the deepfake video compared with the reference video, so I could not understand some details of the movements.
• In the video of Dance 1, it was difficult to figure out which foot was in front of the other. If the dance movements contain difficult parts, it was difficult to imagine them even when using a deepfake video.
• The quality of the image was not very good, so it was difficult to see the detailed movements of the fingers.
Most of these comments were related to the low quality of the generated images.

Slide 47

Slide 47 text

Discussion
• The low quality of the generated video, as seen in the participants' comments, may have affected the participants' learning. A system that generates higher-quality images and removes their unnaturalness is required.
• There was no significant difference between the groups for each type of dance. The role of dance difficulty levels in these videos needs further investigation.
• The small screen size (90 cm x 50 cm) may have diminished its role as a mirror and prevented the effect of self-modeling. Using a relatively large display may reduce the error rate.

Slide 48

Slide 48 text

Summary
• We propose a learning method that uses deep learning to generate and present a deepfake video in which the learner performs the same movements as the dancer in a reference video.
• We tested whether the automatically generated deepfake videos are effective for dance learning.
• The experimental results showed no significant difference between the presentation groups.
Figure labels: Reference video, Skeleton information, Learned model, Output video

Slide 49

Slide 49 text

Future plan
Skill Morphing: A skill gap still exists.
Figure labels: Novice, Expert

Slide 50

Slide 50 text

Future plan
Skill Morphing: By generating and presenting dance movements at an intermediate level between novices and experts, learners can practice while referring to dance movements that are one step ahead of their own dance level.
Figure labels: 100% Novice, 100% Expert, visual motion
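One naive way to picture an intermediate level between "100% Novice" and "100% Expert" is a linear blend of two pose sequences. The sketch below is only an illustration of that idea under strong assumptions: the function name morph_poses and the weight alpha are hypothetical, both sequences are assumed to be already time-aligned with the same joints, and the slides do not say how skill morphing would actually be implemented.

```python
def morph_poses(novice, expert, alpha):
    """Blend two time-aligned pose sequences of (x, y) joints.
    alpha = 0.0 returns the novice motion, alpha = 1.0 the expert motion."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(novice) != len(expert):
        raise ValueError("sequences must be time-aligned to the same length")
    return [
        [((1 - alpha) * nx + alpha * ex, (1 - alpha) * ny + alpha * ey)
         for (nx, ny), (ex, ey) in zip(n_frame, e_frame)]
        for n_frame, e_frame in zip(novice, expert)
    ]

# Toy example: one frame, two joints, target halfway between novice and expert.
novice = [[(0.0, 0.0), (1.0, 0.0)]]
expert = [[(0.0, 0.0), (1.0, 2.0)]]
print(morph_poses(novice, expert, 0.5))
```

A learner could then be shown a target with alpha slightly above their current level, which matches the "one step ahead" idea on the slide. Blending joint positions directly can distort limb lengths, so a practical system would likely need a more careful motion representation.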

Slide 52

Slide 52 text

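Summary
• We proposed a learning method that uses deep learning to generate and present a video of the learner performing the same movements as the dancer in the reference video.
• We verified whether the automatically generated self-video is effective for dance learning.
• The experimental results showed no significant difference between the presentation groups.
• We organized the challenges of presenting self-video based on the questionnaire results.
Figure labels: Reference video, Skeleton information, Learned model, Output video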

Slide 53

Slide 53 text

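Future work
Skill morphing
• A technique that generates dance movements at an intermediate level between novices and experts
• Learners can practice while referring to dance movements that are one step ahead of their own dance level
Figure labels: 100% Novice, 100% Expert, visual motion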