A Crowdsourced Experiment for Tempo Estimation of Electronic Dance Music

Relative to other datasets, state-of-the-art tempo estimation algorithms perform poorly on the GiantSteps Tempo dataset for electronic dance music (EDM). To investigate why, we conducted a large-scale, crowdsourced experiment involving 266 participants from two distinct groups. The quality of the collected data was evaluated with regard to the participants’ input devices and backgrounds. In the data itself, we observed significant tempo ambiguities, which we attribute to annotator subjectivity and tempo instability. As a further contribution, we then constructed new annotations consisting of tempo distributions for each track. Using these annotations, we re-evaluated two recent state-of-the-art tempo estimation systems, achieving significantly improved results. The main conclusions of this investigation are that current tempo estimation systems perform better than previously thought and that evaluation quality needs to be improved. The new crowdsourced annotations will be released for evaluation purposes.
https://www.youtube.com/watch?v=2kqBMgbXzBI

Hendrik Schreiber

September 25, 2018

Transcript

  1. INTERNATIONAL AUDIO LABORATORIES ERLANGEN
    A joint institution of Fraunhofer IIS and Universität Erlangen-Nürnberg
    A Crowdsourced Experiment for Tempo Estimation of Electronic Dance Music
    Hendrik Schreiber, tagtraum industries incorporated, hs@tagtraum.com / @h_schreiber
    Meinard Müller, International AudioLabs Erlangen, meinard.mueller@audiolabs-erlangen.de
  2. EDM Tempo Estimation: Piece of Cake?
  3. GiantSteps Tempo: The Only EDM Dataset
    • Released by Knees et al. in 2015
    • 664 Beatport previews (2 min)
    • Created by scraping a forum
    • Used for benchmarking
    Knees, Peter, et al. "Two Data Sets for Tempo Estimation and Key Detection in Electronic Dance Music Annotated from User Corrections." ISMIR. 2015.
  4. Results: Reported Accuracy2
    [Bar chart: reported Accuracy2 on GiantSteps; Cross DJ: 90.2%]
  6. Results: Reported Accuracy2
    [Bar chart: reported Accuracy2 on ACM Mirum, GTZAN, ISMIR Songs and GiantSteps for Böck et al. and Cross DJ; values shown: 90.2%, 93.3%, 95%, 97.6%, with a question mark added]
    Böck, Sebastian, Florian Krebs, and Gerhard Widmer. "Accurate Tempo Estimation Based on Recurrent Neural Networks and Resonating Comb Filters." ISMIR. 2015.
  7. Why is the tempo estimation accuracy so low?
  8. Experiment
    • 266 participants tapped along to half-overlapping 30 s segments of GiantSteps tracks
    • We collected the tapping data
    • 18,684 segment submissions, ~28 per track
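A minimal Python sketch of the segmentation and of turning one submission's taps into a tempo value. It assumes every preview is exactly 120 s long and that "half-overlapping" means a 15 s hop; the helpers `segment_starts` and `tapped_bpm`, and the use of the median inter-tap interval, are hypothetical illustrations rather than the experiment's actual processing.

```python
from statistics import median


def segment_starts(duration=120.0, length=30.0, hop=15.0):
    """Start times (s) of half-overlapping segments within one preview.

    Assumption: previews are exactly 2 min long; a hop of half the
    segment length gives the half overlap.
    """
    starts = []
    t = 0.0
    while t + length <= duration:
        starts.append(t)
        t += hop
    return starts


def tapped_bpm(tap_times):
    """Hypothetical helper: BPM of one submission from tap times (s).

    Uses the median inter-tap interval, which is more robust to an
    occasional missed or extra tap than the mean would be.
    """
    if len(tap_times) < 2:
        raise ValueError("need at least two taps")
    intervals = [b - a for a, b in zip(tap_times, tap_times[1:])]
    return 60.0 / median(intervals)


print(segment_starts())  # [0.0, 15.0, 30.0, 45.0, 60.0, 75.0, 90.0]
print(tapped_bpm([0.0, 0.5, 1.0, 1.5, 2.0]))  # 120.0
```

Under these assumptions each preview yields seven segments, in line with the segments 1 to 7 named in the Figure 5 caption.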
  11. Results: Tapped Tempo Distributions
    • Three tracks have no real beat
    • Many tracks exhibit perceptual tempo ambiguities (genre-dependent), with peaks related by a factor of 2
    • Some tracks contain tempo changes or have no global tempo (peaks labeled Tempo1 and Tempo2)
    [Figure 5: Tempo salience distributions (salience vs. BPM) for segments 1-3, segment 4 and segments 5-7 of the track 'Rude Boy feat. Omar LinX (Union Vocal Mix)' by Zeds Dead (Beatport id 1728723). The track's tempo changes in segment 4, leading to four distinct peaks. With JSD = 0.44, its Jensen-Shannon divergence is high.]
    [Figure 6 (partially visible): Distribution of tracks in the dataset per interval with a bin width of 0.05 (x-axis: JSD; y-axis: tracks in %). The blue line shows µJSD and the red line shows µJSD + 2σJSD.]
    [Table excerpt: genre "all", A(T_seg) = 0.25, A(T_track) = 0.26]
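The per-track tempo distributions and the Jensen-Shannon divergence (JSD) between segment distributions can be sketched in plain Python. This is an illustration under assumptions: 1-BPM histogram bins between 30 and 300 BPM, equal segment weights and base-2 logarithms. The function names and sample tempi are hypothetical, and the paper's salience distributions may be smoothed or normalized differently, so values from this sketch need not match the reported JSD = 0.44.

```python
import math


def tempo_distribution(bpms, lo=30, hi=300):
    """Normalized histogram of tapped tempi with 1-BPM bins in [lo, hi).

    Assumption: values are rounded to the nearest integer BPM and values
    outside the range are ignored.
    """
    counts = [0.0] * (hi - lo)
    for bpm in bpms:
        i = int(round(bpm)) - lo
        if 0 <= i < len(counts):
            counts[i] += 1.0
    total = sum(counts)
    if total == 0:
        raise ValueError("no tempo values inside [lo, hi)")
    return [c / total for c in counts]


def entropy(p):
    """Shannon entropy in bits; empty bins contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jsd(distributions):
    """Generalized Jensen-Shannon divergence with equal weights (bits).

    JSD = H(mean of the distributions) - mean of H(each distribution).
    """
    n = len(distributions)
    mixture = [sum(col) / n for col in zip(*distributions)]
    return entropy(mixture) - sum(entropy(p) for p in distributions) / n


# Hypothetical tapped tempi for two segments of one track.
seg_a = tempo_distribution([128, 128, 127, 64])
seg_b = tempo_distribution([174, 174, 87, 175])
print(jsd([seg_a, seg_a]))  # 0.0
print(jsd([seg_a, seg_b]))  # 1.0 (no tempo in common)
```

Identical segment distributions give 0; two segments without any tempo in common give the maximum of 1 bit, so a tempo change between segments drives the track's JSD up.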
  14. Results: Original vs. Newly Derived Ground Truth
    [Bar chart: agreement between the original and the newly derived ground truth. Accuracy1: 81.5% (i.e., 18.5% disagreement); Accuracy2: 91.1%; the chart also labels a value of 8.1%.]
    • Large disagreement
    • Only partially explained by octave errors
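Accuracy1 and Accuracy2 are the standard tempo metrics: Accuracy1 accepts an estimate within ±4% of the reference, and Accuracy2 additionally accepts estimates within 4% of 2, 3, 1/2 or 1/3 times the reference, so it forgives octave errors. A minimal sketch follows; the helper names and sample values are hypothetical, and the tolerance is measured relative to the (scaled) reference.

```python
# Factors accepted by Accuracy2: exact tempo, double/half (octave
# errors) and triple/third.
FACTORS = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)


def accuracy1(estimate, reference, tolerance=0.04):
    """True if the estimate is within +/-4% of the reference tempo."""
    return abs(estimate - reference) <= tolerance * reference


def accuracy2(estimate, reference, tolerance=0.04):
    """True if the estimate is within 4% of any accepted multiple."""
    return any(accuracy1(estimate, f * reference, tolerance) for f in FACTORS)


def dataset_accuracy(metric, estimates, references):
    """Fraction of tracks for which metric(estimate, reference) holds."""
    hits = sum(metric(e, r) for e, r in zip(estimates, references))
    return hits / len(references)


# Hypothetical estimates and references (BPM) for four tracks.
est = [128.0, 64.0, 174.0, 100.0]
ref = [127.0, 128.0, 87.0, 140.0]
print(dataset_accuracy(accuracy1, est, ref))  # 0.25
print(dataset_accuracy(accuracy2, est, ref))  # 0.75
```

Because both arguments are just tempi, the same functions can compare two annotations with each other, as in the original-versus-new ground-truth comparison; the gap between the two scores is the part explained by octave (and factor-3) errors.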
  16. Results: Accuracy1 for Two Estimation Systems, Original vs. New Ground Truth
    [Bar chart, systems: Böck et al. and Schreiber. Original → New: 58.9% → 64.8% (+5.9 percentage points) and 63.1% → 70.2% (+7.1 percentage points)]
    • Two different systems perform much better on the new ground truth
    Böck, Sebastian, Florian Krebs, and Gerhard Widmer. "Accurate Tempo Estimation Based on Recurrent Neural Networks and Resonating Comb Filters." ISMIR. 2015.
    Schreiber, Hendrik, and Meinard Müller. "A Post-Processing Procedure for Improving Music Tempo Estimates Using Supervised Learning." ISMIR. 2017.
  18. Results: Accuracy2 for Two Estimation Systems, Original vs. New Ground Truth
    [Bar chart, systems: Böck et al. and Schreiber. Original → New: 86.4% → 94% (+7.6 percentage points) and 88.7% → 95.2% (+6.5 percentage points)]
    Böck, Sebastian, Florian Krebs, and Gerhard Widmer. "Accurate Tempo Estimation Based on Recurrent Neural Networks and Resonating Comb Filters." ISMIR. 2015.
    Schreiber, Hendrik, and Meinard Müller. "A Post-Processing Procedure for Improving Music Tempo Estimates Using Supervised Learning." ISMIR. 2017.
  22. Conclusions
    • Some tracks are not suitable for global tempo estimation
    • Considerable number of bad annotations
    • Accuracy of state-of-the-art systems is higher than previously thought
    • We need to improve our datasets! (tempo distributions, not single tempo annotations)
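The last point argues for keeping tempo distributions instead of single values. As a hypothetical illustration only (not the evaluation used in the paper), the sketch below collapses a list of tapped tempi to its most frequent value and, alternatively, scores an estimate by the share of tapped tempi it matches within 4%. The function names, the scoring rule and the sample taps are assumptions made for this example.

```python
from collections import Counter


def peak_tempo(bpms):
    """Most frequent tapped tempo after rounding to integer BPM."""
    counts = Counter(round(b) for b in bpms)
    return counts.most_common(1)[0][0]


def agreement(estimate, bpms, tolerance=0.04):
    """Hypothetical score: share of tapped tempi within +/-4% of the estimate.

    The tolerance is relative to each tapped tempo, mirroring Accuracy1.
    """
    if not bpms:
        raise ValueError("need at least one tapped tempo")
    hits = sum(1 for b in bpms if abs(estimate - b) <= tolerance * b)
    return hits / len(bpms)


# Hypothetical tapped tempi for one track: most taps near 128 BPM,
# a minority at half tempo.
taps = [128, 128, 127, 129, 64, 64, 128, 65]
print(peak_tempo(taps))        # 128
print(agreement(128.0, taps))  # 0.625
print(agreement(64.0, taps))   # 0.375
```

With a single peak value as ground truth, the half-tempo estimate of 64 BPM would simply fail Accuracy1; the distribution keeps the information that a sizable share of listeners perceived exactly that tempo.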
  23. Thank you. All data and annotations are available at: http://www.tagtraum.com/tempo_estimation.html