
A Crowdsourced Experiment for Tempo Estimation of Electronic Dance Music


Relative to other datasets, state-of-the-art tempo estimation algorithms perform poorly on the GiantSteps Tempo dataset for electronic dance music (EDM). To investigate why, we conducted a large-scale, crowdsourced experiment involving 266 participants from two distinct groups. The quality of the collected data was evaluated with regard to the participants’ input devices and background. In the data itself we observed significant tempo ambiguities, which we attribute to annotator subjectivity and tempo instability. As a further contribution, we then constructed new annotations consisting of tempo distributions for each track. Using these annotations, we re-evaluated two recent state-of-the-art tempo estimation systems, achieving significantly improved results. The main conclusions of this investigation are that current tempo estimation systems perform better than previously thought and that evaluation quality needs to be improved. The new crowdsourced annotations will be released for evaluation purposes.
https://www.youtube.com/watch?v=2kqBMgbXzBI

Hendrik Schreiber

September 25, 2018

Transcript

  1. INTERNATIONAL AUDIO LABORATORIES ERLANGEN
    A joint institution of Fraunhofer IIS and Universität Erlangen-Nürnberg
    A Crowdsourced Experiment for Tempo Estimation of
    Electronic Dance Music
    Hendrik Schreiber

    tagtraum industries incorporated
    [email protected] / @h_schreiber
    Meinard Müller

    International AudioLabs Erlangen
    [email protected]

  2.
    EDM Tempo Estimation
    Piece of Cake?

  3.
    GiantSteps Tempo
    The only EDM Dataset
    • Released by Knees et al. in 2015
    • 664 Beatport previews (2min)
    • Created by scraping a forum
    • Used for benchmarking
    Knees, Peter, et al. "Two Data Sets for Tempo Estimation and Key Detection in Electronic Dance Music Annotated from User Corrections." ISMIR. 2015.

  4.
    Results
    Reported Accuracy2 results
    [Bar chart: GiantSteps 90.2% (Cross DJ)]

  5.
    Results
    Reported Accuracy2 results
    [Bar chart: ACM Mirum, Gtzan and ISMIR Songs between 93.3% and 97.6% (Böck et al.); GiantSteps 90.2% (Cross DJ)]
    Böck, Sebastian, Florian Krebs, and Gerhard Widmer. "Accurate Tempo Estimation Based on Recurrent Neural Networks and Resonating Comb Filters." ISMIR. 2015.

  6.
    Results
    Reported Accuracy2 results
    [Same chart as slide 5, with a question mark added]
    Böck, Sebastian, Florian Krebs, and Gerhard Widmer. "Accurate Tempo Estimation Based on Recurrent Neural Networks and Resonating Comb Filters." ISMIR. 2015.

  7.
    Why is the tempo estimation
    accuracy so low?

  8.
    Experiment
    • 266 participants tapped along to half-overlapping 30s segments of GiantSteps tracks
    • We collected tapping data (see the sketch below)
    • 18,684 segment submissions, ~28/track
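    As an aside, a minimal sketch of how tapping data could be turned into a per-segment tempo estimate, assuming taps arrive as timestamps in seconds (an illustration, not the authors' exact aggregation pipeline):

      import numpy as np

      def taps_to_bpm(tap_times_sec):
          # Estimate tempo from the tap timestamps (seconds) of one 30s segment.
          # The median inter-tap interval is robust to a few stray taps.
          taps = np.sort(np.asarray(tap_times_sec, dtype=float))
          if len(taps) < 2:
              return None  # not enough taps for an estimate
          intervals = np.diff(taps)            # seconds between consecutive taps
          return 60.0 / np.median(intervals)   # beats per minute

      # Tapping roughly every 0.47 s corresponds to ~128 BPM:
      print(taps_to_bpm([0.00, 0.47, 0.94, 1.41, 1.88]))  # ~127.7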

  9.
    Results
    • Three tracks have no real beat

  10.
    Results
    • Three tracks have no real beat
    • Many tracks exhibit perceptual tempo ambiguities (genre-dependent); peaks related by factor 2
    Tapped Tempo Distribution
    [Figure 5: Tempo salience distributions for segments of the track ‘Rude Boy feat. Omar LinX Union Vocal Mix’ by Zeds Dead (Beatport id 1728723). The track’s tempo changes in segment 4, leading to four distinct peaks. With JSD = 0.44 its Jensen-Shannon divergence is high. Axes: BPM vs. salience; curves for segments 1-3, segment 4, segments 5-7.]
    [Figure 6: Distribution of tracks in the dataset per JSD interval with a bin width of 0.05. The blue line shows µJSD, the red line µJSD + 2σJSD.]
    [Table excerpt: Genre, A(Tseg), A(Ttrack); all: 0.25, 0.26]
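    The figures above use the Jensen-Shannon divergence (JSD) to quantify how much the per-segment tempo distributions of a track disagree. A minimal sketch of a generalized JSD over per-segment tempo histograms, assuming each segment's taps have already been binned into a BPM histogram (the paper's exact binning and weighting may differ):

      import numpy as np

      def entropy(p):
          # Shannon entropy (base 2) of a discrete distribution.
          p = p[p > 0]
          return -np.sum(p * np.log2(p))

      def generalized_jsd(distributions):
          # JSD of several tempo salience distributions over the same BPM bins:
          # entropy of the average distribution minus the average entropy.
          # High values mean the segments disagree about the tempo.
          d = np.asarray(distributions, dtype=float)
          d = d / d.sum(axis=1, keepdims=True)   # normalize each histogram
          mixture = d.mean(axis=0)
          return entropy(mixture) - np.mean([entropy(row) for row in d])

      # Toy example: segments 1-3 peak near 140 BPM, segment 4 near 105 BPM.
      bins = np.arange(50, 201)                  # 1-BPM bins from 50 to 200
      seg_a = np.exp(-0.5 * ((bins - 140) / 2.0) ** 2)
      seg_b = np.exp(-0.5 * ((bins - 105) / 2.0) ** 2)
      print(generalized_jsd([seg_a, seg_a, seg_a, seg_b]))  # clearly above 0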

  11.
    Results
    • Three tracks have no real beat
    • Many tracks exhibit perceptual tempo ambiguities (genre-dependent)
    • Some tracks contain tempo changes / no global tempo
    Tapped Tempo Distribution
    [Same figures and table excerpt as slide 10, with the two tempi of the example track labeled Tempo 1 and Tempo 2]

  12.
    Results
    Original ground-truth vs. newly derived ground-truth
    [Bar chart: Accuracy1 81.5% (disagreement 18.5%); Accuracy2 91.1% (disagreement 8.1%)]
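    For reference, the two metrics used throughout: Accuracy1 counts an estimate as correct if it lies within a tolerance (commonly ±4%) of the reference tempo; Accuracy2 additionally accepts so-called octave errors, i.e. estimates off by a factor of 2, 3, 1/2 or 1/3. A minimal sketch:

      def accuracy1(estimate_bpm, reference_bpm, tol=0.04):
          # Correct if the estimate is within +-4% of the reference tempo.
          return abs(estimate_bpm - reference_bpm) <= tol * reference_bpm

      def accuracy2(estimate_bpm, reference_bpm, tol=0.04):
          # Also accepts "octave errors": 2x, 3x, 1/2x and 1/3x the reference.
          factors = (1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)
          return any(accuracy1(estimate_bpm, f * reference_bpm, tol) for f in factors)

      # An 87.5 BPM estimate for a 175 BPM reference fails Accuracy1
      # but passes Accuracy2 (half-tempo octave error):
      print(accuracy1(87.5, 175.0), accuracy2(87.5, 175.0))  # False True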

  13.
    Results
    Original ground-truth vs. newly derived ground-truth
    [Same chart as slide 12]
    Large disagreement

  14.
    Results
    Original ground-truth vs. newly derived ground-truth
    [Same chart as slide 12]
    Large disagreement, only partially explained by octave error

  15.
    Results
    Accuracy1 for two estimation systems: original ground-truth vs. new ground-truth
    [Bar chart: böck et al. 58.9% (original) to 64.8% (new), a gain of 5.9 points; schreiber 63.1% to 70.2%, a gain of 7.1 points]
    Böck, Sebastian, Florian Krebs, and Gerhard Widmer. "Accurate Tempo Estimation Based on Recurrent Neural Networks and Resonating Comb Filters." ISMIR. 2015.

    Schreiber, Hendrik, and M. Müller. "A post-processing procedure for improving music tempo estimates using supervised learning." ISMIR, 2017.
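    How such a re-evaluation can be set up, assuming the new reference tempo of each track is taken to be the most salient value of its crowdsourced tempo distribution (an illustrative choice; the paper describes the actual derivation), reusing the accuracy functions sketched after slide 12:

      import numpy as np

      def most_salient_tempo(bpm_bins, salience):
          # Collapse a tempo distribution to a single reference tempo.
          return bpm_bins[int(np.argmax(salience))]

      def evaluate(estimates, references, metric):
          # Fraction of tracks on which `metric` (accuracy1 or accuracy2) holds.
          return float(np.mean([metric(e, r) for e, r in zip(estimates, references)]))

      # Hypothetical usage (crowd_distributions and system_estimates are placeholders):
      # new_refs = [most_salient_tempo(bins, dist) for dist in crowd_distributions]
      # print(evaluate(system_estimates, new_refs, accuracy1))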

  16.
    Results
    Accuracy1 for two estimation systems: original ground-truth vs. new ground-truth
    [Same chart as slide 15]
    Two different systems perform much better on new ground-truth
    Böck, Sebastian, Florian Krebs, and Gerhard Widmer. "Accurate Tempo Estimation Based on Recurrent Neural Networks and Resonating Comb Filters." ISMIR. 2015.

    Schreiber, Hendrik, and M. Müller. "A post-processing procedure for improving music tempo estimates using supervised learning." ISMIR, 2017.

  17.
    Results
    Accuracy2 for two estimation systems: original ground-truth vs. new ground-truth
    [Bar chart: böck et al. 86.4% (original) to 94% (new), a gain of 7.6 points; schreiber 88.7% to 95.2%, a gain of 6.5 points]
    Böck, Sebastian, Florian Krebs, and Gerhard Widmer. "Accurate Tempo Estimation Based on Recurrent Neural Networks and Resonating Comb Filters." ISMIR. 2015.

    Schreiber, Hendrik, and M. Müller. "A post-processing procedure for improving music tempo estimates using supervised learning." ISMIR, 2017.

  18.
    Results
    Accuracy2 for two estimation systems: original ground-truth vs. new ground-truth
    [Same chart as slide 17, annotated with an exclamation mark]
    Böck, Sebastian, Florian Krebs, and Gerhard Widmer. "Accurate Tempo Estimation Based on Recurrent Neural Networks and Resonating Comb Filters." ISMIR. 2015.

    Schreiber, Hendrik, and M. Müller. "A post-processing procedure for improving music tempo estimates using supervised learning." ISMIR, 2017.

  19.
    Conclusions
    • Some tracks are not suitable for global tempo estimation

  20.
    Conclusions
    • Some tracks are not suitable for global tempo estimation
    • Considerable number of bad annotations

  21.
    Conclusions
    • Some tracks are not suitable for global tempo estimation
    • Considerable number of bad annotations
    • Accuracy of state-of-the-art systems is higher than previously thought

  22.
    Conclusions
    • Some tracks are not suitable for global tempo estimation
    • Considerable number of bad annotations
    • Accuracy of state-of-the-art systems is higher than previously thought
    • We need to improve our datasets!

    (tempo distributions, not single tempo annotations)
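    To illustrate that last point, a sketch of what a distribution-style annotation could look like; the track id and salience values are made up, and the released annotations define their own format:

      # Illustrative only: one way to represent a per-track tempo distribution.
      annotation = {
          "track_id": "example-001",
          "tempo_salience": {       # BPM -> relative salience, summing to 1.0
              87.5: 0.2,            # half-tempo taps
              175.0: 0.7,           # most salient tempo
              58.3: 0.1,            # one-third-tempo taps
          },
      }

      # A single-value annotation keeps only the most salient peak and
      # discards the ambiguity information:
      single_bpm = max(annotation["tempo_salience"], key=annotation["tempo_salience"].get)
      print(single_bpm)  # 175.0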

  23.
    Thank you.
    All data and annotations are available at:
    http://www.tagtraum.com/tempo_estimation.html
