Improving Genre Annotations for the Million Song Dataset

Abstract

Any automatic music genre recognition (MGR) system must show its value in tests against a ground truth dataset. Recently, the public dataset most often used for this purpose has been shown to be problematic because of mislabelings, duplicates, and its relatively small size. Another dataset, the Million Song Dataset (MSD), a collection of features and metadata for one million tracks, unfortunately does not contain readily accessible genre labels. Therefore, multiple attempts have been made to add song-level genre annotations, which are required for supervised machine learning tasks. Thus far, the quality of these annotations has not been evaluated.
In this paper we present a method for creating additional genre annotations for the MSD from databases that contain multiple, crowd-sourced genre labels per song (Last.fm, beaTunes). Based on label co-occurrence rates, we derive taxonomies that allow inference of top-level genres, which are the labels most often used in MGR systems.
We then combine multiple datasets using majority voting. This both promises a more reliable ground truth and allows the evaluation of the newly generated and preexisting datasets. To facilitate further research, all derived genre annotations are publicly available on our website.

http://www.tagtraum.com/msd_genre_datasets.html

The paper was published at ISMIR 2015.

Hendrik Schreiber

October 27, 2015

Transcript

  Slide 1
    Improving Genre Annotations for the Million Song Dataset
    Hendrik Schreiber, tagtraum industries incorporated
    hs@tagtraum.com / @h_schreiber
    October 27, ISMIR 2015, Málaga
  Slide 5
    TL;DR
    1. New, high-quality genre annotations for part of the Million Song Dataset (MSD)
    2. Involved in automatic music genre recognition? Please use them!
    3. http://www.tagtraum.com/msd_genre_datasets.html
       Linked to from the MSD site (thanks, Colin!)
  Slide 8
    Automatic Music Genre Recognition is among the most popular MIR tasks.* Why?
    But we don’t use a large, standard dataset.
    * Zhouyu Fu, Guojun Lu, Kai Ming Ting, and Dengsheng Zhang. A survey of audio-based music classification and annotation. IEEE Transactions on Multimedia, 13(2):303–319, 2011.
  Slide 12
    Used Datasets
    Table 2 (J Intell Inf Syst (2013) 41:371–406, p. 377): Datasets used in MGR, the type of data they contain, and the percentage of experimental work (435 references) in the survey (Sturm 2012a) that uses them.
      Dataset | Description | %
      Private | Constructed for research but not made available | 58
      GTZAN | Audio; http://marsyas.info/download/data_sets | 23
      ISMIR2004 | Audio; http://ismir2004.ismir.net/genre_contest | 17
      Latin (Silla et al. 2008) | Features; http://www.ppgia.pucpr.br/∼silla/lmd/ | 5
      Ballroom | Audio; http://mtg.upf.edu/ismir2004/contest/tempoContest/ | 3
      Homburg (Homburg et al. 2005) | Audio; http://www-ai.cs.uni-dortmund.de/audio.html | 3
      Bodhidharma | Symbolic; http://jmir.sourceforge.net/Codaich.html | 3
      USPOP2002 (Berenzweig et al. 2004) | Audio; http://labrosa.ee.columbia.edu/projects/musicsim/uspop2002.html | 2
      1517-artists | Audio; http://www.seyerlehner.info | 1
      RWC (Goto et al. 2003) | Audio; http://staff.aist.go.jp/m.goto/RWC-MDB/ | 1
      SOMeJB | Features; http://www.ifs.tuwien.ac.at/∼andi/somejb/ | 1
      SLAC | Audio & symbols; http://jmir.sourceforge.net/Codaich.html | 1
      SALAMI (Smith et al. 2011) | Features; http://ddmal.music.mcgill.ca/research/salami | 0.7
      Unique | Features; http://www.seyerlehner.info | 0.7
      Million song (Bertin-Mahieux et al. 2011) | Features; http://labrosa.ee.columbia.edu/millionsong/ | 0.7
      ISMIS2011 | Features; http://tunedit.org/challenge/music-retrieval | 0.4
    All datasets listed after Private are public.
    Slide callouts: Private: not reproducible. GTZAN: has its issues. Million song: hardly used.
    Sturm, B.L. A survey of evaluation in music genre recognition. In Proc. Adaptive Multimedia Retrieval, 2012.
  Slide 14
    Too Much Britney
    • Small: 1,000 tracks
    • Replicas
    • Excerpts from the same recording
    • Versions (same music but different recordings)
    • Mislabelings
    • Distortions
    • Excerpts by the same artists: 35% of Reggae excerpts are by Bob Marley, 24% of Pop excerpts are by Britney Spears, …
    Sturm, B.L. An analysis of the GTZAN music genre dataset. In Proceedings of the Second International ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies, pages 7–12. ACM, 2012.
  Slide 18
    Last.fm MSD Tags
    • 522,366 unique tags
    • 505,216 tracks with at least one tag (a track can have multiple tags)
    • No explicit relationships between tags
    • Many tags are genre related, but not all of them:
      rock 101,071
      pop 69,159
      alternative 55,777
      indie 48,175
      electronic 46,270
      female vocalists 42,565
      favorites 39,921
      love 34,901
      dance 33,618
      00s 31,432
      …
    http://labrosa.ee.columbia.edu/millionsong/lastfm
    Yes, great data. But no tag hierarchies/relationships.
    What’s a genre, what’s not? What exactly is the ground truth here that’s usable for MGR?
  Slide 21
    Top-MAGD Annotations
    • Album-level annotations scraped from the All Music Guide website
    • 13 unique tags
    • 406,427 labeled tracks:
      Pop/Rock 238,786
      Electronic 41,075
      Rap 20,939
      Jazz 17,836
      Latin 17,590
      R&B 14,335
      International 14,242
      Country 11,772
      Reggae 6,946
      Blues 6,836
      Vocal 6,195
      Folk 5,865
      New Age 4,010
    Alexander Schindler, Rudolf Mayer, and Andreas Rauber. Facilitating comprehensive benchmarking experiments on the million song dataset. In Proceedings of the 13th International Conference on Music Information Retrieval (ISMIR), pages 469–474, 2012.
    Is this still useful?
    Also, great job.
    But the granularity is not really what we want: Metallica just isn’t Britney.
    Has anybody verified the annotations?
    Are album-level annotations good enough?
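    The granularity problem becomes concrete once each label's share of the 406,427 labeled tracks is computed. The sketch below uses only the counts listed on the slide; the names `TOP_MAGD_COUNTS` and `label_shares` are illustrative.

```python
# Track counts per Top-MAGD label, as listed on the slide.
TOP_MAGD_COUNTS = {
    "Pop/Rock": 238_786,
    "Electronic": 41_075,
    "Rap": 20_939,
    "Jazz": 17_836,
    "Latin": 17_590,
    "R&B": 14_335,
    "International": 14_242,
    "Country": 11_772,
    "Reggae": 6_946,
    "Blues": 6_836,
    "Vocal": 6_195,
    "Folk": 5_865,
    "New Age": 4_010,
}


def label_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each label's share of all labeled tracks, in percent."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    print(f"total tracks: {sum(TOP_MAGD_COUNTS.values()):,}")
    shares = label_shares(TOP_MAGD_COUNTS)
    for label, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{label:<14}{pct:6.2f}%")
```

    The counts add up exactly to 406,427. Pop/Rock alone covers about 58.75% of the tracks and New Age less than 1%, so a single Pop/Rock class has to hold very different music.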
  Slide 23
    How
    1. Make sure all datasets use the same labels (via taxonomies)
    2. Use datasets to evaluate each other
    3. Generate new & improved dataset
    (Slide background: excerpt from the paper on label co-occurrence, taxonomy generation, and matching with the MSD.)
  Slide 25
    beaTunes
    • Consumer application for Windows and Mac
    • Encourages users to correct metadata
    • Collects anonymized, user-submitted metadata in a central database
    https://www.beatunes.com/
  Slide 26
    beaTunes Database
    • 870 million song submissions by 200 thousand users
    • 772 million submissions are labeled with a genre
    • Mapped to more than 85 million distinct songs (one song, many genre labels)
    • 677,038 songs have been matched to the MSD
    https://www.beatunes.com/
  Slide 27
    Mapping User Genre Labels to a Genre Taxonomy
    1. Normalization (lowercase, smart subs, etc.)
    2. Inferring hierarchical relationships via co-occurrence
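    The slide only names the normalization steps. A minimal sketch, assuming Unicode NFKC normalization, underscore and whitespace cleanup, lowercasing, and a small substitution table. The actual beaTunes rules are not given, so `SUBSTITUTIONS` and `normalize_label` are hypothetical.

```python
import re
import unicodedata

# Hypothetical substitution table; the real "smart subs" are not listed on the slide.
SUBSTITUTIONS = {
    "hiphop": "hip-hop",
    "hip hop": "hip-hop",
    "rnb": "r&b",
    "r and b": "r&b",
    "r & b": "r&b",
}


def normalize_label(raw: str) -> str:
    """Normalize one user-submitted genre label (illustrative rules only)."""
    # Fold Unicode variants (e.g. full-width characters) into a canonical form.
    label = unicodedata.normalize("NFKC", raw)
    # Treat underscores as spaces, collapse whitespace runs, trim, and lowercase.
    label = re.sub(r"\s+", " ", label.replace("_", " ")).strip().lower()
    # Map known spelling variants onto one canonical label.
    return SUBSTITUTIONS.get(label, label)


if __name__ == "__main__":
    for raw in ["Hip Hop", "  R & B ", "Alternative_Rock", "ＲＯＣＫ"]:
        print(repr(raw), "->", normalize_label(raw))
```

    Normalizing before counting keeps spelling variants of the same genre from splitting its co-occurrence statistics.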
  Slide 29
    Co-Occurrence Matrix
    Genre labels and their top four co-occurring labels, ordered by relative strength (given in parentheses):
      Rock: Rock (0.609), Pop (0.057), Alternative (0.026), Rock/Pop (0.016)
      Pop: Pop (0.593), Rock (0.077), Rock/Pop (0.014), R&B (0.013)
      Alternative: Alternative (0.394), Rock (0.156), Pop (0.052), Alternative/Punk (0.036)
      R&B: R&B (0.566), Pop (0.061), Soul (0.036), R&B/Soul (0.033)
      Soundtrack: Soundtrack (0.754), Rock (0.024), Pop (0.022), Game (0.011)
      ...
    The underlying values from the co-occurrence matrix C were computed taking only submissions by English speakers and the 1,000 most-used labels into account. (The slide background also shows the paper's Table 1, the top ten genres used by beaTunes users with different languages.)
    Co-occurrence rates aren’t symmetric! For example, Alternative co-occurs with Rock at 0.156, but Rock co-occurs with Alternative at only 0.026.
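    A sketch of how row-normalized, asymmetric co-occurrence rates can arise. It assumes that every ordered pair of two different submissions for the same song counts as one co-occurrence, and that each row a of C is normalized to sum to 1. This is an illustrative definition, not necessarily the paper's exact one; `cooccurrence_rates` is a hypothetical name. Because rows are normalized separately, C[a][b] and C[b][a] generally differ, and repeated submissions of the same label produce the self co-occurrence seen on the diagonal.

```python
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_rates(songs: list[list[str]]) -> dict[str, dict[str, float]]:
    """Row-normalized label co-occurrence rates C[a][b] (assumed definition).

    songs: for each song, the list of genre labels submitted by users.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for submissions in songs:
        # permutations(..., 2) yields ordered pairs of distinct positions, so two
        # users submitting "Rock" for the same song count as a Rock/Rock pair.
        for a, b in permutations(submissions, 2):
            counts[a][b] += 1
    rates = {}
    for a, row in counts.items():
        total = sum(row.values())
        rates[a] = {b: n / total for b, n in row.items()}
    return rates


if __name__ == "__main__":
    # Hypothetical submissions for three songs.
    songs = [["Rock", "Rock", "Alternative"], ["Rock", "Pop"], ["Alternative", "Rock"]]
    C = cooccurrence_rates(songs)
    print(C["Alternative"]["Rock"], C["Rock"]["Alternative"])  # 1.0 0.5
```

    To mirror the table above, the submissions would first be restricted to English speakers and the 1,000 most-used labels.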
  Slide 32
    Rules
    1. If a genre a co-occurs with another genre b more than a minimum threshold τ, and a co-occurs with b more than the other way around, then we assume that a is a sub-genre of b (i.e. C(a,b) > τ and C(a,b) > C(b,a)).
    2. a is a direct sub-genre of b, iff a is a sub-genre of b and C(a,b) > C(a,c) for all c with c ≠ a and c ≠ b.
    Here a, b, c ∈ G, and C(a,b) is the co-occurrence rate between a and b.
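    A sketch of Rules 1 and 2, applied to the top-four rows of the co-occurrence table from the previous slide. Entries beyond the top four are unknown and treated as 0, so this is only an approximation of the full matrix. `TABLE2_ROWS`, `direct_parents`, and `seed_genre` are hypothetical names. Genres without a direct parent become tree roots, i.e. seed genres.

```python
# Top-four rows of the co-occurrence table; missing entries are treated as 0.
TABLE2_ROWS = {
    "Rock": {"Rock": 0.609, "Pop": 0.057, "Alternative": 0.026, "Rock/Pop": 0.016},
    "Pop": {"Pop": 0.593, "Rock": 0.077, "Rock/Pop": 0.014, "R&B": 0.013},
    "Alternative": {"Alternative": 0.394, "Rock": 0.156, "Pop": 0.052,
                    "Alternative/Punk": 0.036},
    "R&B": {"R&B": 0.566, "Pop": 0.061, "Soul": 0.036, "R&B/Soul": 0.033},
}


def direct_parents(C: dict[str, dict[str, float]], tau: float) -> dict[str, str]:
    """Map each genre a to its direct parent b, following Rules 1 and 2.

    Rule 1: a is a sub-genre of b if C[a][b] > tau and C[a][b] > C[b][a].
    Rule 2: b is the direct parent if, in addition, C[a][b] > C[a][c] for all
            other genres c (c != a, c != b).
    """
    parents = {}
    for a, row in C.items():
        others = {b: rate for b, rate in row.items() if b != a}
        if not others:
            continue
        b = max(others, key=others.get)
        best = others[b]
        # Rule 2 requires a strict maximum, so a tie yields no direct parent.
        if list(others.values()).count(best) > 1:
            continue
        reverse = C.get(b, {}).get(a, 0.0)
        if best > tau and best > reverse:  # Rule 1
            parents[a] = b
    return parents


def seed_genre(genre: str, parents: dict[str, str]) -> str:
    """Follow direct-parent links up to the root of the tree (the seed genre)."""
    seen = {genre}
    while genre in parents:
        genre = parents[genre]
        if genre in seen:  # defensive: stop if the links ever form a cycle
            break
        seen.add(genre)
    return genre


if __name__ == "__main__":
    parents = direct_parents(TABLE2_ROWS, tau=0.085)
    print(parents)                            # {'Alternative': 'Rock'}
    print(seed_genre("Alternative", parents))  # Rock
```

    With τ = 0.085, only Alternative gets a parent (Rock), and Pop and Rock remain separate roots. Lowering τ to 0.05 on the same rows makes Pop a sub-genre of Rock (0.077 > 0.057) and R&B a sub-genre of Pop. This shows why τ has to be chosen with the Pop/Rock distinction in mind.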
  Slide 34
    Generated Taxonomies
    Figure 1. Partial, generated trees for the seed-genres Rock, Pop, Hip-Hop, and R&B:
      Rock: Metal, Alternative, Punk, ...
      Pop: Folk Pop, Acoustic Pop, Top 40, ...
      Hip-Hop: East Coast Rap, Turntablism, ...
      RnB: Motown, Funk, Soul, Urban, ...
    Excerpt from the paper shown on the slide:
    By finding all direct sub-genres and their parents, we can now create a set of trees. The number of created trees depends on the threshold τ. We found that, to properly distinguish between Pop and Rock, τ := 0.085 proved to be useful, resulting in 141 trees. The roots of these trees are typically the names of seed-genres like Jazz, Pop, Rock, etc. (see Figure 1). Not all generated trees have children. For example, the tree with the seed-genre Groove consists of just the root. Although Groove co-occurs with R&B, Rock, Funk, and Soul, the co-occurrence rates with genres other than itself are all below τ. Even the co-occurrence with itself is low (0.157). This suggests that Groove is not really a genre, but rather a property of a genre. Another example of a root-only tree is Calypso. Here the co-occurrence with itself is much higher (0.606), and indeed Calypso qualifies as a stand-alone genre that simply does not have any sub-genres in this database. Naturally, the generated taxonomies are only simplified mappings of the more complex relationship graph represented by C. In reality, genres aren’t necessarily exclusive members of one tree or another (e.g. fusion genres). An ontology is the much better construct. But, as we will see, for the purpose of mapping most sub-genres to their seed-genre, trees are useful.
    2.4 Matching with Million Song Dataset: To create song-level genre annotations for the MSD, we queried the beaTunes database for songs with artist/title pairs contained in the MSD and were able to match 677,038 songs. In order to ease the comparison with the HO and Top-MAGD datasets, we associated each matched song with the seed-genre of its most often occurring genre label, taking advantage of the taxonomies created in Section 2.3. Motown, for example, is represented by its seed-genre RnB.
    No parent = seed genre. Seed genres can easily be found and mapped to Top-MAGD labels (Pop/Rock, Electronic, Rap, Jazz, Latin, R&B, International, Country, Reggae, Blues, Vocal, Folk, New Age).
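    A sketch of the per-song step described in the excerpt. It takes the song's most often occurring label, replaces it with its seed genre, and then maps that seed genre to a Top-MAGD label. The seed-to-Top-MAGD table `SEED_TO_TOP_MAGD` and the function `song_top_magd_label` are hypothetical, because the actual mapping is not shown. The `seed_of` lookup would come from the generated trees.

```python
from collections import Counter
from typing import Optional

# Hypothetical seed-genre -> Top-MAGD mapping; the mapping used in the paper is not shown.
SEED_TO_TOP_MAGD = {
    "Rock": "Pop/Rock",
    "Pop": "Pop/Rock",
    "Hip-Hop": "Rap",
    "RnB": "R&B",
    "Electronic": "Electronic",
    "Jazz": "Jazz",
    "Country": "Country",
}


def song_top_magd_label(
    labels: list[str],
    seed_of: dict[str, str],
    mapping: dict[str, str] = SEED_TO_TOP_MAGD,
) -> Optional[str]:
    """Map one song's submitted genre labels to a single Top-MAGD label.

    labels:  all genre labels submitted for the song
    seed_of: genre -> seed genre, e.g. read off the generated trees;
             genres missing here are treated as seed genres themselves
    Returns None if there are no labels or the seed genre has no counterpart.
    """
    if not labels:
        return None
    # Most often occurring label; Counter breaks ties by first occurrence.
    top_label, _ = Counter(labels).most_common(1)[0]
    seed = seed_of.get(top_label, top_label)
    return mapping.get(seed)


if __name__ == "__main__":
    seed_of = {"Motown": "RnB", "Soul": "RnB", "Metal": "Rock"}
    print(song_top_magd_label(["Motown", "Motown", "Soul", "Pop"], seed_of))  # R&B
```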
  Slide 35
    Building Genre Taxonomies with Last.fm Tags
    • Last.fm tags come with a relative strength (0-100)
    • The same procedure can be applied
    • Many more different tags, so the minimum threshold τ has to be adjusted
    • Allows us to find seed genres (top-level)
  Slide 36
    Comparing Annotations
    • beaTunes and Last.fm labels can now be matched to Top-MAGD labels using the generated taxonomies
    • Let’s compare!
  Slide 39
    Pairwise Comparison
                    Last.fm   beaTunes
      Top-MAGD      75.7%     84.0%
      Last.fm       -         80.9%
    High agreement rates, especially between beaTunes and Top-MAGD.
    Glass ceiling for a ground truth with just one value per song?
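    The percentages above are agreement rates between two single-label annotations. A minimal sketch follows. It assumes that each dataset is a dict from MSD track ID to a label that has already been mapped to the common Top-MAGD vocabulary, and that agreement is measured over the tracks present in both. `agreement` and `pairwise_agreement` are hypothetical names, and the paper's exact procedure may differ.

```python
from itertools import combinations


def agreement(a: dict[str, str], b: dict[str, str]) -> tuple[int, float]:
    """Return (number of shared tracks, fraction of them with identical labels)."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0, 0.0
    same = sum(1 for track in shared if a[track] == b[track])
    return len(shared), same / len(shared)


def pairwise_agreement(datasets: dict[str, dict[str, str]]) -> dict[tuple[str, str], float]:
    """Agreement rate for every pair of named datasets."""
    return {
        (x, y): agreement(datasets[x], datasets[y])[1]
        for x, y in combinations(sorted(datasets), 2)
    }


if __name__ == "__main__":
    # Hypothetical track IDs and labels.
    top_magd = {"T1": "Rap", "T2": "Jazz", "T3": "Pop/Rock"}
    beatunes = {"T2": "Jazz", "T3": "Electronic", "T4": "Rap"}
    print(agreement(top_magd, beatunes))  # (2, 0.5)
```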
  Slide 44
    Combined Dataset 1
    • Find songs occurring in all datasets
    • for which at least two of the datasets agree (majority voting)
    • Take note of the minority vote, if it exists (i.e. allow ambiguity)
    • => Combined Dataset 1 (CD1): 133,676 tracks, 98,149 (73.4%) of them found by unanimous consent
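    A sketch of the CD1-style combination. It keeps tracks present in all datasets, takes the label that at least two datasets agree on, and records the minority vote. It is written for the three-dataset case on the slide; with more datasets, ties would need an extra rule. `Vote` and `combine` are hypothetical names.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Vote(NamedTuple):
    majority: str
    minority: Optional[str]  # None if all datasets agree
    unanimous: bool


def combine(datasets: list[dict[str, str]]) -> dict[str, Vote]:
    """Majority vote over tracks that occur in every dataset."""
    common = set(datasets[0]).intersection(*datasets[1:])
    combined = {}
    for track in common:
        votes = Counter(d[track] for d in datasets)
        (label, n), *rest = votes.most_common()
        if n < 2:
            continue  # no two datasets agree: the track is left out
        minority = rest[0][0] if rest else None
        combined[track] = Vote(label, minority, n == len(datasets))
    return combined


if __name__ == "__main__":
    # Hypothetical labels from three datasets.
    top_magd = {"T1": "Rap", "T2": "Jazz", "T3": "Pop/Rock", "T4": "Latin"}
    beatunes = {"T1": "Rap", "T2": "Blues", "T3": "Pop/Rock", "T4": "Reggae"}
    lastfm = {"T1": "Rap", "T2": "Jazz", "T3": "Electronic", "T4": "Folk"}
    cd1 = combine([top_magd, beatunes, lastfm])
    print(sorted(cd1))  # ['T1', 'T2', 'T3']
```

    The unanimous share is `sum(v.unanimous for v in cd1.values()) / len(cd1)`. For the real CD1 this is 98,149 / 133,676, or about 73.4%.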
  Slide 45
    CD1 Genre Distribution
    Figure 2. Majority genre distribution of tracks in CD1 (tracks per genre, %):
      Blues 2.2, Country 3.9, Electronic 11.4, Folk 2.2, International 1.1, Jazz 5.8, Latin 2.1, New Age 1, Pop/Rock 59.8, Rap 4.6, Reggae 2.7, RnB 2.9, Vocal 0.2
  Slide 53
    Combined Dataset 2
    • Just beaTunes and Last.fm tracks, because Top-MAGD can’t distinguish between Pop and Rock
    • Split Pop and Rock
    • Add Metal and Punk
    • Remove Vocal
    • Combine R&B and Soul
    • Rename International to World
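    One way to encode the label changes listed above is a small lookup applied to each track's seed genre. The target label set is taken from Figure 3 (CD2C). The spellings in `RENAME`, including the Hip-Hop to Rap entry, are assumptions, as is the name `to_cd2_label`.

```python
from typing import Optional

# Label set of CD2C, as shown in Figure 3.
CD2_LABELS = {
    "Blues", "Country", "Electronic", "Folk", "Jazz", "Latin", "Metal", "New Age",
    "Pop", "Punk", "Rap", "Reggae", "RnB", "Rock", "World",
}
# Hypothetical spellings of the merges and renames listed on the slide.
RENAME = {
    "R&B": "RnB",
    "Soul": "RnB",             # combine R&B and Soul
    "International": "World",  # rename International to World
    "Hip-Hop": "Rap",          # assumption: Hip-Hop seed genre reported as Rap
}
REMOVED = {"Vocal"}            # remove Vocal


def to_cd2_label(seed: str) -> Optional[str]:
    """Map a seed genre to the CD2 label set; None means the track is left out."""
    if seed in REMOVED:
        return None
    label = RENAME.get(seed, seed)
    # Pop, Rock, Metal and Punk stay separate labels; there is no Pop/Rock class.
    return label if label in CD2_LABELS else None
```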
  Slide 54
    Combined Dataset 2
    • Find songs in both the beaTunes and Last.fm datasets
    • => Combined Dataset 2 (CD2): 280,831 tracks, 191,401 (68.2%) of them have the same genre label
    • Combined Dataset 2 Consensus (CD2C): convenience dataset with only the songs that have the same genre label
  Slide 55
    CD2C Genre Distribution
    Shown next to Figure 2 (majority genre distribution of tracks in CD1).
    Figure 3. Genre distribution of tracks in CD2C (tracks per genre, %):
      Blues 3.2, Country 4.7, Electronic 11.4, Folk 2.2, Jazz 7.7, Latin 1.6, Metal 4.8, New Age 0.6, Pop 6.8, Punk 1.7, Rap 5.7, Reggae 4.2, RnB 5.1, Rock 39.2, World 1
  Slide 56
    Benchmarking Partitions
    • Main “feature” of the Schindler et al. paper
    • Increase reproducibility
    • Traditional training/test splits (90%, 80%, …)
    • Training/test splits with genre stratification
    • Splits with a fixed number of tracks per genre (1,000, 2,000, 3,000)
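    A sketch of two of the partition types: a genre-stratified training/test split and a sample with a fixed number of tracks per genre. A fixed random seed makes the partitions reproducible. The function names are hypothetical. Skipping genres with fewer than n tracks is an assumption, not something stated on the slide.

```python
import random
from collections import defaultdict


def _by_genre(labels: dict[str, str]) -> dict[str, list[str]]:
    """Group track IDs by genre, in sorted order so results do not depend on dict order."""
    groups = defaultdict(list)
    for track in sorted(labels):
        groups[labels[track]].append(track)
    return groups


def stratified_split(labels: dict[str, str], train_fraction: float, seed: int = 0):
    """Split track IDs into train/test lists, preserving per-genre proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for genre, tracks in sorted(_by_genre(labels).items()):
        rng.shuffle(tracks)
        cut = round(len(tracks) * train_fraction)
        train += tracks[:cut]
        test += tracks[cut:]
    return train, test


def fixed_per_genre(labels: dict[str, str], n: int, seed: int = 0) -> dict[str, list[str]]:
    """Sample exactly n tracks per genre; genres with fewer than n tracks are skipped."""
    rng = random.Random(seed)
    return {
        genre: rng.sample(tracks, n)
        for genre, tracks in sorted(_by_genre(labels).items())
        if len(tracks) >= n
    }
```

    For example, with 10 Jazz and 20 Rock tracks, an 80% stratified split puts 8 Jazz and 16 Rock tracks into training.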
  Slide 61
    Summary
    • Multiple large ground truth datasets for the MSD
    • Despite large size, reasonable quality
    • Allow for ambiguity
    • Benchmark partitions to promote experimentation and comparability