Dance Practice System that Shows What You Would Look Like if You Could Master the Dance

This study proposes a dance practice system that lets users learn dancing by watching videos in which they themselves have mastered the movements of a professional dancer. Video self-modeling, in which learners improve their behavior by watching videos of themselves performing exemplary behavior, effectively teaches movement skills. However, creating an ideal dance movement video is time-consuming and tedious for learners. To solve this problem, we use a deepfake-based video generation technique to automatically generate a video of the learner dancing the same movements as the dancer in the reference video. We conducted a user study with 20 participants to verify whether the deepfake video effectively teaches dance movements. The results showed no significant difference between the groups learning with the original and deepfake videos. In addition, the group using the deepfake video had significantly lower self-efficacy. Based on these experimental results, we discuss the design implications of a system that uses deepfake videos to support learning dance movements.

Shuhei Tsuchida

June 29, 2022

Transcript

  1. Dance Practice System that Shows What You Would Look Like if You Could Master the Dance
    Shuhei Tsuchida†1, Mao Haomin†1, Hideaki Okamoto†2, Yuma Suzuki†2,
    Rintaro Kanada†2, Takayuki Hori†2, Tsutomu Terada†1, Masahiko Tsukamoto†1
    †1 Kobe University
    †2 SoftBank Corp.
    8th International Conference on Movement and Computing
    22-24 June, 2022

  2. Demo Video
    [Demo video panels: Mirror, Deepfake video]

  3. Background

  4. Learning dance movements
    Many studies have supported the acquisition of dance movements.
    • Haptic feedback [Schönauer et al., ICMI 2012]
    • Mirror-based system [Anderson et al., UIST 2013]
    • Robot [Nakamura et al., IROS 2005]
    • Auditory feedback [Großhauser et al., AES Journal 2012]

  5. Video self-modeling
    A technique that shows learners a video of themselves having mastered a dance [Fujimoto et al., ACHI 2012].
    It has been used in rehabilitation [Steel et al., Journal of Motor Behavior 2017] and in skill learning in sports [Ste-Marie et al., Frontiers in Psychology 2011].
    → It has been reported to be potentially effective.

  6. Problem
    Creating an ideal dance movement video is time-consuming and tedious for learners.
    [Figure: reference video; original video; ideal movement (?)]

  7. Deepfake technique
    Everybody Dance Now [Chan et al., ICCV 2019]
    https://youtu.be/PCBTZh41Ris

  8. Research purpose
    Our goal is to support dance learning by using deep learning to create a video of the learner having mastered the dance, and by letting the learner practice while watching that video.

  9. Proposed method

  10. Deepfake video generation
    We used the official implementation of the Everybody Dance Now [1] paper.
    [Figure: training: input video → skeleton information → model → restored video; generation: reference video → skeleton information → learned model → output video]
    [1] Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros: Everybody Dance Now, Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5933–5942 (2019).
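The slide names the pipeline stages but not their internals. The sketch below only illustrates the data flow. It replaces the learned generator of [1] with a hypothetical nearest-neighbour lookup that returns the learner's own frame whose skeleton best matches each reference skeleton. `NearestFrameModel`, `skeleton_distance`, and `generate_video` are illustrative names and are not part of the official implementation. Frames are represented by placeholder file names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Skeleton = Sequence[Point]  # one (x, y) keypoint per joint, same joint order in every frame


def skeleton_distance(a: Skeleton, b: Skeleton) -> float:
    """Sum of Euclidean distances between corresponding joints."""
    return sum(math.dist(p, q) for p, q in zip(a, b))


class NearestFrameModel:
    """Toy stand-in for the learned skeleton-to-image model (NOT the generator used in [1])."""

    def __init__(self) -> None:
        self._pairs: List[Tuple[Skeleton, str]] = []

    def fit(self, input_skeletons: Sequence[Skeleton], input_frames: Sequence[str]) -> None:
        # "Training" on the learner's input video: remember each (skeleton, frame) pair.
        self._pairs = list(zip(input_skeletons, input_frames))

    def render(self, skeleton: Skeleton) -> str:
        # "Generation": return the learner frame whose skeleton is closest to the query.
        _, frame = min(self._pairs, key=lambda pair: skeleton_distance(pair[0], skeleton))
        return frame


def generate_video(model: NearestFrameModel, reference_skeletons: Sequence[Skeleton]) -> List[str]:
    """Map every skeleton from the reference video to a learner frame."""
    return [model.render(s) for s in reference_skeletons]


if __name__ == "__main__":
    model = NearestFrameModel()
    model.fit(
        [[(0.0, 0.0), (0.0, 1.0)], [(0.0, 0.0), (1.0, 0.0)]],  # learner: arm up, arm sideways
        ["learner_arm_up.png", "learner_arm_side.png"],          # placeholder frame names
    )
    reference = [[(0.0, 0.0), (0.9, 0.1)], [(0.0, 0.0), (0.1, 1.1)]]
    print(generate_video(model, reference))  # ['learner_arm_side.png', 'learner_arm_up.png']
```

A lookup like this can only return poses the learner actually performed. That gives one intuition, though not a claim from the study, for why an input video covering the reference movements may yield better output.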

  14. Preliminary investigation

    (i) Types of movement in the input video

    (ii) Characteristics of movement in the reference video

  16. (i) Types of movement in the input video
    [Figure: input and output videos for two kinds of input: learners imitating the dance movements of the reference video, and learners moving freely]
    The output video was of higher quality when the model was trained on an input video in which the learner performed the same dance movements as the reference video.

  17. (ii) Characteristics of movement in the reference video
    • Arm rotation: × the arm disappears
    • Up-and-down movement: × the body's position changes
    • Movement in depth: × depth is poorly represented
    We should use a reference video that omits these motions in the user study.
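The slide does not say how such motions were identified. As a rough, hypothetical pre-check, the three problem cases could be flagged from 2D keypoints of the reference video:

- a low-confidence arm joint (arm hidden during rotation);
- large vertical travel of the hip (up-and-down movement);
- a changing apparent torso length (movement in depth).

The joint names, the (x, y, confidence) keypoint format, and every threshold below are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence)
Frame = Dict[str, Keypoint]            # joint name -> keypoint (joint names are hypothetical)

ARM_JOINTS = ("r_elbow", "r_wrist", "l_elbow", "l_wrist")


@dataclass
class ScreeningResult:
    arm_disappears: bool     # e.g. during arm rotation
    position_changes: bool   # e.g. up-and-down movement
    depth_changes: bool      # e.g. moving toward or away from the camera

    def suitable(self) -> bool:
        return not (self.arm_disappears or self.position_changes or self.depth_changes)


def torso_length(frame: Frame) -> float:
    (nx, ny, _), (hx, hy, _) = frame["neck"], frame["mid_hip"]
    return ((nx - hx) ** 2 + (ny - hy) ** 2) ** 0.5


def screen_reference(frames: Sequence[Frame], min_conf: float = 0.3,
                     max_vertical_shift: float = 0.25,
                     max_scale_change: float = 0.2) -> ScreeningResult:
    """Flag motions that the preliminary investigation found hard to generate.

    All thresholds are illustrative guesses, expressed relative to the mean torso length.
    """
    torsos = [torso_length(f) for f in frames]
    mean_torso = sum(torsos) / len(torsos)
    # An arm joint that the pose estimator loses (low confidence) suggests the arm is hidden.
    arm_disappears = any(f[j][2] < min_conf for f in frames for j in ARM_JOINTS)
    # Large vertical travel of the hip suggests up-and-down movement.
    hip_ys = [f["mid_hip"][1] for f in frames]
    position_changes = max(hip_ys) - min(hip_ys) > max_vertical_shift * mean_torso
    # A changing apparent torso length suggests movement in depth.
    depth_changes = max(torsos) - min(torsos) > max_scale_change * mean_torso
    return ScreeningResult(arm_disappears, position_changes, depth_changes)
```

Thresholds are expressed relative to torso length so the check does not depend on the video resolution or on how far the dancer stands from the camera.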

  18. User study

  19. User study
    Purpose

    Our goal is to verify whether watching videos of oneself performing an expert dancer's movements effectively teaches movement skills.

    Dance movements (learning target)
    Three dance movements (easy, intermediate, difficult)

    Participants
    20 university students in their 20s (19 male, 1 female)

    Conditions

    • Original video presentation group
    • Deepfake video presentation group

  20. Dance movements (learning target)
    The target dances do not contain:

    • movements in depth
    • up-and-down movements
    • arm-rotating movements
    • movements in which the dancer turns their back

    [Videos: Dance 1, Dance 2, Dance 3]

  21. Presentation groups
    • Original video presentation group
    • Deepfake video presentation group

  24. Experimental process
    Day 1: PreTraining (5 min.) → PreTest (3 times)
    Day 2: Training (10 min.) → PostTest (3 times)
    Day 3: RetentionTest (3 times)
    Scoring is based on the videos recorded on Days 1 to 3.
    [Figure labels: deepfake video, original video, model, generated video]

  29. Evaluation index
    We scored each test by the DTW distance between the participant's video and the reference video.
    Skeletons are extracted from the reference video and from the video at test. For every frame, the feature vector holds the (x, y) unit vectors of all skeleton segments, and the DTW distance is computed over all frames.
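Below is a minimal sketch of this index in Python. It assumes 2D joint coordinates per frame and a hypothetical list of bones. The deck does not specify:

- which skeleton segments were used;
- which pose estimator or DTW variant was used;
- whether the cost was normalized by path length.

`frame_features` turns each frame into concatenated bone unit vectors. `dtw_distance` is textbook dynamic time warping with a Euclidean frame cost.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Pose = Sequence[Point]  # (x, y) per joint, same joint order in every frame

# Hypothetical skeleton segments as (start_joint, end_joint) index pairs; the deck
# does not list the segments that were actually used.
BONES: List[Tuple[int, int]] = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]


def frame_features(joints: Pose, bones: Sequence[Tuple[int, int]] = BONES) -> List[float]:
    """Concatenate the (x, y) unit vector of every bone in one frame."""
    feats: List[float] = []
    for a, b in bones:
        dx = joints[b][0] - joints[a][0]
        dy = joints[b][1] - joints[a][1]
        norm = math.hypot(dx, dy) or 1.0  # guard against a zero-length (undetected) bone
        feats.extend((dx / norm, dy / norm))
    return feats


def dtw_distance(seq_a: Sequence[Sequence[float]], seq_b: Sequence[Sequence[float]]) -> float:
    """Classic O(n*m) dynamic time warping with a Euclidean cost between frames."""
    n, m = len(seq_a), len(seq_b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def dance_score(test_frames: Sequence[Pose], reference_frames: Sequence[Pose]) -> float:
    """DTW cost between a test video and the reference video; lower means closer."""
    return dtw_distance([frame_features(f) for f in test_frames],
                        [frame_features(f) for f in reference_frames])


if __name__ == "__main__":
    pose_a = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (2, 1)]
    pose_b = [(0, 0), (0, 1), (1, 1), (1, 2), (-1, 1), (-2, 1)]
    reference = [pose_a, pose_b]
    slower_test = [pose_a, pose_a, pose_b]  # same moves, different timing
    print(dance_score(slower_test, reference))  # 0.0
    print(dance_score([pose_b, pose_b], reference))  # positive: the first move is wrong
```

Normalizing each bone to unit length keeps only its direction. As a result, the score is unaffected by where the participant stands or by differences in limb length between participant and reference dancer. DTW additionally tolerates differences in timing.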

  33. Questionnaire
    • Difficulty of learning each dance movement
      1: very easy to 7: very difficult
    • "I think I can learn to dance if I keep practicing."
      1: strongly disagree to 5: strongly agree
    • "I could master the dance."
      1: strongly disagree to 5: strongly agree
    • "I felt as if I were dancing." (deepfake presentation group only)
      1: strongly disagree to 5: strongly agree

  34. Result

  35. Result

  36. Average DTW costs for Dance 1 to Dance 3
    There was no significant difference between the presentation groups.
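The slides do not say which statistical test was used. One illustrative option is a two-sided permutation test on the difference in mean DTW cost between the two presentation groups. This is an assumption, not necessarily the authors' analysis, and `permutation_test` is a hypothetical helper.

```python
import random
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def permutation_test(group_a: Sequence[float], group_b: Sequence[float],
                     n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided Monte Carlo permutation test on the difference of group means.

    Returns an approximate p-value: the share of random relabelings whose absolute
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            hits += 1
    # Counting the observed labeling itself keeps the p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

The shuffled unit is the participant, assuming each participant contributes one averaged DTW cost. A permutation test also avoids assuming normally distributed costs, which is hard to check with small groups.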

  38. Questionnaire 1
    “I felt as if I were dancing.”
    Strongly disagree 1 – 2 – 3 – 4 – 5 Strongly agree
    The responses varied among participants.
    [Figure labels: one person; three people]

  39. Average DTW cost per participant, grouped by response

  40. Average DTW cost per participant, grouped by response
    [Figure: participants who did not feel as if they were dancing vs. participants who felt as if they were dancing]
    Participants who felt as if they were dancing may be more likely to learn from the deepfake video.
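The deck does not state how responses were split into the two groups or how costs were aggregated. Under those assumptions, the sketch first averages each participant's tests. It then groups participants by a hypothetical cutoff, counting responses of 4 or 5 as "felt as if dancing". `mean_cost_by_response` and its `threshold` parameter are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mean_cost_by_response(records: Iterable[Tuple[str, int, float]],
                          threshold: int = 4) -> Dict[str, float]:
    """records: (participant_id, likert_response, dtw_cost), one entry per test.

    Averages each participant's costs first, so participants with more tests are not
    overweighted, then averages participants within each response group.
    """
    per_participant: Dict[str, List[float]] = defaultdict(list)
    response_of: Dict[str, int] = {}
    for pid, response, cost in records:
        per_participant[pid].append(cost)
        response_of[pid] = response
    groups: Dict[str, List[float]] = defaultdict(list)
    for pid, costs in per_participant.items():
        label = ("felt as if dancing" if response_of[pid] >= threshold
                 else "did not feel as if dancing")
        groups[label].append(sum(costs) / len(costs))
    return {label: sum(v) / len(v) for label, v in groups.items()}
```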

  42. Questionnaire 2
    “I think I can learn to dance if I keep practicing.”
    The deepfake video presentation group tended to have lower self-efficacy.
    Discussion: Self-efficacy may decrease if the desired movement is too far beyond one's skill level.

  43. Participants’ positive comments (1/3)
    • I was able to see myself dancing, so it was easy to know how to move.
    • In the reference video, I could not understand what kind of movement the dancer was doing. In the deepfake video, the complex movements seemed a little easier.
    • I think I was able to notice more differences between my movements and the dancer's during PreTraining.
    • I felt that it was easier to compare the movements of each body part because the deepfake had the same body shape as mine.
    These comments indicate that the deepfake video can improve the understanding of movements.

  44. Participants’ positive comments (2/3)
    • It was a strange feeling because it was a video of myself doing a movement that I should not have been able to do, but it was easy to visualize the movement in my mind.
    • I moved my body thinking that I was dancing like in the deepfake video. Sometimes I looked at my mirror image and compared it with the deepfake, and I noticed the points where I was not moving well.
    These comments indicate that the deepfake video may help learners perform the movements.

  45. Participants’ positive comments (3/3)
    • My motivation went up because I could see myself getting better.
    • It was interesting to see a video of myself dancing perfectly because it felt strange.
    • I was motivated by the fact that I could see how well I was doing.
    • I could see myself dancing well in the video, so I could enjoy practicing with the illusion that I was dancing well.
    These comments indicate that the deepfake video can increase learner motivation.

  46. Participants’ negative comments
    • There was some noise in the deepfake video compared with the reference video, so I could not understand some details of the movements.
    • In the Dance 1 video, it was difficult to figure out which foot was in front of the other. If the dance movements contain difficult parts, it was difficult to imagine them even with the deepfake video.
    • The image quality was not very good, so it was difficult to see the detailed movements of the fingers.
    Most of these comments were related to the low quality of the generated images.

  47. Discussion
    • The low quality of the generated video, noted in the participants' comments, may have affected their learning.
      A system that generates higher-quality images without unnatural artifacts is required.

    • There was no significant difference between the groups for any of the dances.
      The role of the dances' difficulty levels needs further investigation.

    • The small screen size (90 cm × 50 cm) may have weakened its role as a mirror and prevented the self-modeling effect.
      Using a relatively large display is expected to reduce the error rate.

  48. Summary
    • We proposed a learning method that uses deep learning to generate and present a deepfake video of the learner performing the same movements as the dancer in a reference video.
    • We tested whether the automatically generated deepfake videos are effective for dance learning.
    • The experimental results showed no significant difference between the presentation groups.
    [Figure: reference video → skeleton information → learned model → output video]

  49. Future plan
    Skill Morphing:
    A skill gap still exists.
    [Figure: Novice ↔ Expert]

  50. Future plan
    Skill Morphing:
    By generating and presenting dance movements at an intermediate level between novices and experts, learners can practice while referring to dance movements that are one step ahead of their own level.
    [Figure: scale from 100% Novice to 100% Expert; labels: visual, motion]
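The slide does not describe how the intermediate movements would be generated. One naive way to picture the motion side of such a scale is to blend time-aligned novice and expert skeletons joint by joint. Here `alpha` runs from 100% novice (0.0) to 100% expert (1.0). This is an assumption for illustration only. `morph_pose` and `morph_sequence` are hypothetical, and the visual side of the morph is not covered.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Pose = Sequence[Point]  # (x, y) per joint, same joint order in both sequences


def morph_pose(novice: Pose, expert: Pose, alpha: float) -> List[Point]:
    """alpha = 0.0 gives the novice pose (100% novice), 1.0 the expert pose (100% expert)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [((1 - alpha) * nx + alpha * ex, (1 - alpha) * ny + alpha * ey)
            for (nx, ny), (ex, ey) in zip(novice, expert)]


def morph_sequence(novice_seq: Sequence[Pose], expert_seq: Sequence[Pose],
                   alpha: float) -> List[List[Point]]:
    """Assumes both sequences are already time-aligned frame by frame (e.g. via DTW)."""
    if len(novice_seq) != len(expert_seq):
        raise ValueError("sequences must be time-aligned to the same length")
    return [morph_pose(n, e, alpha) for n, e in zip(novice_seq, expert_seq)]
```

Linear blending of joint positions is the simplest choice, but it can shorten limbs when the two poses point in different directions. Interpolating joint angles instead would preserve bone lengths.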

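  52. Summary
    • We propose a learning method that uses deep learning to generate and present a self-video performing the same movements as the dancer in the reference video.
    • We verified whether the automatically generated self-videos are effective for dance learning.
    • The experimental results showed no significant difference between the presentation groups.
    • Based on the questionnaire results, we organized the issues of presenting self-videos.
    [Figure: reference video → skeleton information → learned model → output video]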

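  53. Future work
    Skill morphing
    A technique for generating dance movements at a level intermediate between novices and experts.
    Learners can practice while referring to dance movements one step ahead of their own dance level.
    [Figure: scale from 100% Novice to 100% Expert; labels: visual, motion]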
