
Dance Practice System that Shows What You Would Look Like if You Could Master the Dance

This study proposes a dance practice system that lets users learn to dance by watching videos in which they appear to have mastered the movements of a professional dancer. Video self-modeling, which encourages learners to improve their behavior by watching videos of themselves performing exemplary behavior, is effective for teaching movement skills. However, creating an ideal dance movement video is time-consuming and tedious for learners. To solve this problem, we use a deepfake-based video generation technique to automatically generate a video of the learner dancing the same movements as the dancer in the reference video. We conducted a user study with 20 participants to verify whether the deepfake video effectively teaches dance movements. The results showed no significant difference between the groups learning with the original and deepfake videos. In addition, the group using the deepfake video had significantly lower self-efficacy. Based on these experimental results, we discuss design implications for systems that use deepfake videos to support learning dance movements.

shuhei2306

June 29, 2022

Transcript

  1. Dance Practice System that Shows What You Would Look Like if You Could Master the Dance. Shuhei Tsuchida†1, Mao Haomin†1, Hieaki Okamoto†2, Yuma Suzuki†2, Rintaro Kanada†2, Takayuki Hori†2, Tsutomu Terada†1, Masahiko Tsukamoto†1. †1 Kobe University, †2 Softbank Corp. 8th International Conference on Movement and Computing, 22-24 June, 2022
  2. Demo Video: Mirror, Deepfake video

  3. Background

  4. Learn dance movements: Many studies support the acquisition of dance movements. Haptic feedback [Schönauer et al., ICMI2012]; Mirror-based system [Anderson et al., UIST2013]; Robot [Nakamura et al., IROS2005]; Auditory feedback [Großhauser et al., AES Journal2012]
  5. Video self-modeling: A technique of showing learners a video of themselves having mastered the dance [Fujimoto et al., ACHI2012]. It has been used in rehabilitation [Steel et al., Journal of Motor Behavior 2017] and in sports skill learning [Ste-Marie et al., Frontiers in Psychology 2011]. → Reported to be potentially effective.
  6. Problem: Creating an ideal dance movement video is time-consuming and tedious for learners. Figure labels: Reference video, Original video, Ideal movement (?)
  7. Deepfake technique: Everybody Dance Now [Chan et al., ICCV2019] https://youtu.be/PCBTZh41Ris
  8. Research purpose: Our goal is to support dance learning by using deep learning to create a video of the learner having mastered the dance, and by having the learner practise while watching that video.
  9. Proposed method

  10. Deepfake video generation: We used the official library of the Everybody Dance Now paper [1]. Diagram labels: Reference video, Skeleton information, Skeleton information model, Learned model, Output video, Input video, Restored video. [1] Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros: Everybody Dance Now, Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5933–5942 (2019).
  11. Deepfake video generation: Comparison. We used the official library of the Everybody Dance Now paper [1].
  14. Preliminary investigation: (i) Types of movement in the input video; (ii) Characteristics of movement in the reference video
  15. (i) Types of movement in the input video. Figure labels: Input, Output. Conditions: learners imitate the dance movements in the reference video; learners move freely.
  16. (i) Types of movement in the input video. Conditions: learners imitate the dance movements in the reference video; learners move freely. The output video was of higher quality when the model was trained on an input video in which the learner performed the same dance movements as in the reference video.
  17. (ii) Characteristics of movement in the reference video. Movements examined: rotating the arm, up-and-down, depth. Problems observed: × one's arm disappears; × one's position changes; × depth representation. We should use reference videos that omit these motions in the user study.
  18. User study

  19. User study. Purpose: Our goal is to verify whether watching videos of themselves performing an expert dancer's movements teaches learners movement skills effectively. Dance movements (learning target): three dance movements (easy, intermediate, difficult). Participants: 20 university students in their 20s (19 males and one female). Conditions: original video presentation group; deepfake video presentation group.
  20. Dance movements (learning target): Targets do not contain • movements in depth • up-and-down movements • arm-rotating movements • movements in which the dancer turns their back. Dance 1, Dance 2, Dance 3

  21. Presentation group: Original video presentation group; Deepfake video presentation group
  24. Experimental process. Day 1: PreTraining (5 min.), PreTest (3 times). Day 2: Training (10 min.), PostTest (3 times). Day 3: RetentionTest (3 times). Scoring is based on the Day 1 to Day 3 videos. Diagram labels: Original video, Deepfake video model, Generated video, Deepfake video.
  29. Evaluation index: We scored the DTW distance between the participant's video and the reference video. Feature vector: unit vector information for all skeletons, over all frames. Diagram labels: Skeleton, Reference video, Video at test, x, y, Unit vector.
  32. Evaluation index: We scored the DTW distance between the participant's video and the reference video. Feature vector: unit vector information for all skeletons, over all frames. Values shown on the slide: 292, 314, 308.
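
The evaluation index above can be sketched roughly as follows. This is only an illustration under stated assumptions: each frame is a list of 2D joint coordinates, the bone list `BONES` is hypothetical (the slides do not give the skeleton layout), the per-frame cost is the Euclidean distance between feature vectors, and the DTW recurrence is the classic unconstrained one. The authors' actual features, local cost, and normalization are not specified in the slides.

```python
import math

# Hypothetical bone list: pairs of joint indices (parent, child).
# The slides do not specify the skeleton layout; this is an assumption.
BONES = [(0, 1), (1, 2), (2, 3)]


def unit_bone_vectors(joints, bones=BONES):
    """Turn one frame of 2D joints into a flat list of unit bone directions."""
    feature = []
    for a, b in bones:
        dx = joints[b][0] - joints[a][0]
        dy = joints[b][1] - joints[a][1]
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            # Degenerate bone (both joints coincide): use a zero direction.
            feature.extend((0.0, 0.0))
        else:
            feature.extend((dx / norm, dy / norm))
    return feature


def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping cost between two feature sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance only in seq_a
                cost[i][j - 1],      # advance only in seq_b
                cost[i - 1][j - 1],  # advance in both sequences
            )
    return cost[n][m]


def score_video(test_frames, reference_frames, bones=BONES):
    """DTW cost between a test video and the reference video (lower is closer)."""
    test = [unit_bone_vectors(frame, bones) for frame in test_frames]
    ref = [unit_bone_vectors(frame, bones) for frame in reference_frames]
    return dtw_distance(test, ref)
```

Because only bone directions are compared, a performer with a different body size gives the same features as the reference, and because DTW aligns frames in time, holding a pose slightly longer does not by itself add cost in this sketch.
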
  33. Questionnaire • Difficulty of learning each dance movement (1: very easy to 7: very difficult) • "I think I can learn to dance if I keep practicing." (1: strongly disagree to 5: strongly agree) • "I could master the dance." (1: strongly disagree to 5: strongly agree) • "I felt as if I were dancing." (deepfake presentation group only; 1: strongly disagree to 5: strongly agree)
  34. Result

  35. Result

  36. Average DTW costs for Dance 1 to Dance 3: There is no significant difference.
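
The slides do not say which statistical test was used for the group comparison. As one possible way to compare two small groups of per-participant average DTW costs, here is a hedged sketch of an exact two-sided permutation test on the difference of means. The function name `permutation_test` and the choice of test are assumptions for illustration, not the authors' procedure.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes and returns the fraction of splits whose absolute mean
    difference is at least as large as the observed one (the p-value).
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    count = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so floating-point ties count as "as extreme".
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count
```

With two groups of 10 participants this enumerates 184,756 splits, which is still quick; a p-value above the chosen significance level would be read as "no significant difference" in the sense used on the slide.
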
  37. Questionnaire 1: "I felt as if I were dancing." Strongly disagree 1 – 2 – 3 – 4 – 5 Strongly agree. The responses varied among participants.
  38. Questionnaire 1: "I felt as if I were dancing." Strongly disagree 1 – 2 – 3 – 4 – 5 Strongly agree. The responses varied among participants. Annotations: one person; three people.
  39. Average DTW costs per participant based on the responses

  40. Average DTW costs per participant based on the responses: "I did not feel as if I were dancing." versus "I felt as if I were dancing." Participants who felt as if they were dancing may be more likely to learn through the deepfake video.
  41. Questionnaire 2: "I think I can learn to dance if I keep practising." The deepfake video presentation group tended to have lower self-efficacy.
  42. Questionnaire 2: "I think I can learn to dance if I keep practising." The deepfake video presentation group tended to have lower self-efficacy. Discussion: Self-efficacy may decrease if the desired movement is too far from one's skill level.
  43. Participants' positive comments (1/3) • I was able to see myself dancing, so it was easy to know how to move. • In the reference video, I could not understand what kind of movement the dancer was doing. In the deepfake video, the complex movements seemed a little easier. • I thought I was able to notice more differences between my movements and the dancer's in the PreTraining. • I felt that it was easier to compare the movements of each part of the body because the deepfake had the same body shape. These comments indicate that the deepfake video can improve the understanding of movements.
  44. Participants' positive comments (2/3) • It was a strange feeling because it was a video of myself doing a movement that I should not have been able to do, but it was easy to visualize the movement in my brain. • I was moving my body thinking that I was dancing like in the deepfake video. Sometimes I looked at the mirror image of myself and compared it with the deepfake, and I noticed the points where I was not moving well. These comments indicate that the deepfake video may support performing the movement.
  45. Participants' positive comments (3/3) • My motivation went up because I could see myself getting better. • It was interesting to see a video of myself dancing perfectly because I felt strange. • I was motivated by the fact that I could see how well I was doing. • I could see myself dancing well in the video, so I can enjoy practicing with the illusion that I am dancing well. These comments indicate that the deepfake video can increase learner motivation.
  46. Participants' negative comments • There was some noise in the deepfake video compared with the reference video, so I could not understand some details of the movements. • In the video of Dance 1, it was difficult to figure out which foot was in front of the other. If the dance movements contain difficult parts, they are difficult to imagine even with a deepfake video. • The quality of the image was not very good, so it was difficult to see the detailed movements of the fingers. Most of these comments were related to the low quality of the generated images.
  47. Discussion • The low quality of the generated video, as seen in the participants' comments, may have affected the participants' learning. A system that generates higher-quality images without the unnaturalness is required. • There was no significant difference between the groups for each type of dance. The role of dance difficulty levels in these videos needs further investigation. • The small screen size (90 cm x 50 cm) may have diminished its role as a mirror and prevented the effect of self-modeling. Using a relatively large display may reduce the error rate.
  48. Summary • We propose a learning method that uses deep learning to generate and present a deepfake video of the learner performing the same movements as the dancer in a reference video. • We tested whether the automatically generated deepfake videos are effective for dance learning. • The experimental results showed no significant difference between the presentation groups. Diagram labels: Reference video, Skeleton information, Learned model, Output video
  49. Future plan: Skill Morphing. A skill gap still exists. Labels: Novice, Expert
  50. Future plan: Skill Morphing. By generating and presenting dance movements at an intermediate level between novices and experts, learners can practice while referring to dance movements that are one step ahead of their own dance level. Labels: 100% Novice, 100% Expert, visual motion
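
The slides do not describe how skill morphing would be implemented. As a purely illustrative sketch of the idea of an intermediate level between "100% Novice" and "100% Expert", the code below linearly blends joint positions, assuming the novice and expert motions are already time-aligned and have the same number of frames and joints. The names `blend_pose` and `blend_motion` and the linear blend itself are assumptions, not the authors' method.

```python
def blend_pose(novice_pose, expert_pose, alpha):
    """Linearly interpolate one pose; alpha=0 is novice, alpha=1 is expert."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if len(novice_pose) != len(expert_pose):
        raise ValueError("poses must have the same number of joints")
    return [
        tuple((1.0 - alpha) * n + alpha * e for n, e in zip(n_joint, e_joint))
        for n_joint, e_joint in zip(novice_pose, expert_pose)
    ]


def blend_motion(novice_frames, expert_frames, alpha):
    """Blend two time-aligned motions frame by frame."""
    if len(novice_frames) != len(expert_frames):
        raise ValueError("motions must be time-aligned with equal frame counts")
    return [
        blend_pose(n_pose, e_pose, alpha)
        for n_pose, e_pose in zip(novice_frames, expert_frames)
    ]
```

In this sketch, a small `alpha` would give a target only slightly ahead of the learner's current movement. Blending raw joint positions can distort limb lengths, so a real system would likely need a more careful motion representation.
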
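  52. Summary • We propose a learning method that uses deep learning to generate and present a self-video performing the same movements as the dancer in the reference video. • We verified whether the automatically generated self-video is effective for dance learning. • The experimental results showed no significant difference between the presentation groups. • We organized the issues of self-video presentation based on the questionnaire results. Diagram labels: Reference video, Skeleton information, Learned model, Output video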
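  53. Future work: Skill morphing, a technique that generates dance movements at an intermediate level between novices and experts. Learners can practice while referring to dance movements one step ahead of their own dance level. Labels: 100% Novice, 100% Expert, visual motion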