Improving Genre Annotations for the Million Song Dataset

Abstract

Any automatic music genre recognition (MGR) system must show its value in tests against a ground truth dataset. Recently, the public dataset most often used for this purpose has been proven problematic, because of mislabeling, duplications, and its relatively small size. Another dataset, the Million Song Dataset (MSD), a collection of features and metadata for one million tracks, unfortunately does not contain readily accessible genre labels. Therefore, multiple attempts have been made to add song-level genre annotations, which are required for supervised machine learning tasks. Thus far, the quality of these annotations has not been evaluated.
In this paper we present a method for creating additional genre annotations for the MSD from databases that contain multiple crowd-sourced genre labels per song (Last.fm, beaTunes). Based on label co-occurrence rates, we derive taxonomies that allow inference of top-level genres, which are the labels most often used in MGR systems.
We then combine multiple datasets using majority voting. This both promises a more reliable ground truth and allows the evaluation of the newly generated and preexisting datasets. To facilitate further research, all derived genre annotations are publicly available on our website.

http://www.tagtraum.com/msd_genre_datasets.html

The paper was published at ISMIR 2015.

Hendrik Schreiber

October 27, 2015

Transcript

  1. Improving Genre Annotations for the Million Song Dataset
    Hendrik Schreiber

    tagtraum industries incorporated
    [email protected] / @h_schreiber
    October 27, ISMIR 2015 Málaga

    View Slide

  2. TL;DR

    View Slide

  3. TL;DR
    1. New, high-quality genre annotations for
    part of the Million Song Dataset (MSD)

    View Slide

  4. TL;DR
    1. New, high-quality genre annotations for
    part of the Million Song Dataset (MSD)
    2. Involved in automatic music genre recognition?
    Please use them!

    View Slide

  5. TL;DR
    1. New, high-quality genre annotations for
    part of the Million Song Dataset (MSD)
    2. Involved in automatic music genre recognition?
    Please use them!
    3. http://www.tagtraum.com/msd_genre_datasets.html
    Linked to from the MSD site. Thanks, Colin!

    View Slide

  6. Thank you.

    View Slide

  7. Automatic Music Genre Recognition

    is among the most popular MIR tasks. *
    Why?
    * Zhouyu Fu, Guojun Lu, Kai Ming Ting, and Dengsheng Zhang. A survey of audio-based music classification and annotation. Multimedia, IEEE
    Transactions on, 13(2):303–319, 2011.

    View Slide

  8. Automatic Music Genre Recognition

    is among the most popular MIR tasks. *
    Why?
    But we don’t use a large, standard dataset.
    * Zhouyu Fu, Guojun Lu, Kai Ming Ting, and Dengsheng Zhang. A survey of audio-based music classification and annotation. Multimedia, IEEE
    Transactions on, 13(2):303–319, 2011.

    View Slide

  9. Used Datasets
    Source: J Intell Inf Syst (2013) 41:371–406, page 377
    Table 2. Datasets used in MGR, the type of data they contain, and the percentage of experimental work (435 references) in our survey (Sturm 2012a) that use them. All datasets listed after Private are public.

    Dataset | Description | %
    Private | Constructed for research but not made available | 58
    GTZAN | Audio; http://marsyas.info/download/data_sets | 23
    ISMIR2004 | Audio; http://ismir2004.ismir.net/genre_contest | 17
    Latin (Silla et al. 2008) | Features; http://www.ppgia.pucpr.br/~silla/lmd/ | 5
    Ballroom | Audio; http://mtg.upf.edu/ismir2004/contest/tempoContest/ | 3
    Homburg (Homburg et al. 2005) | Audio; http://www-ai.cs.uni-dortmund.de/audio.html | 3
    Bodhidharma | Symbolic; http://jmir.sourceforge.net/Codaich.html | 3
    USPOP2002 (Berenzweig et al. 2004) | Audio; http://labrosa.ee.columbia.edu/projects/musicsim/uspop2002.html | 2
    1517-artists | Audio; http://www.seyerlehner.info | 1
    RWC (Goto et al. 2003) | Audio; http://staff.aist.go.jp/m.goto/RWC-MDB/ | 1
    SOMeJB | Features; http://www.ifs.tuwien.ac.at/~andi/somejb/ | 1
    SLAC | Audio & symbols; http://jmir.sourceforge.net/Codaich.html | 1
    SALAMI (Smith et al. 2011) | Features; http://ddmal.music.mcgill.ca/research/salami | 0.7
    Unique | Features; http://www.seyerlehner.info | 0.7
    Million song (Bertin-Mahieux et al. 2011) | Features; http://labrosa.ee.columbia.edu/millionsong/ | 0.7
    ISMIS2011 | Features; http://tunedit.org/challenge/music-retrieval | 0.4

    Sturm, B.L. A survey of evaluation in music genre recognition. In Proc. Adaptive Multimedia Retrieval. 2012.

    View Slide

  10. Used Datasets
    (Table 2 as on slide 9.)
    Sturm, B.L. A survey of evaluation in music genre recognition. In Proc. Adaptive Multimedia Retrieval. 2012.
    Not reproducible

    View Slide

  11. Used Datasets
    (Table 2 as on slide 9.)
    Sturm, B.L. A survey of evaluation in music genre recognition. In Proc. Adaptive Multimedia Retrieval. 2012.
    Not reproducible
    Hardly used

    View Slide

  12. Used Datasets
    (Table 2 as on slide 9.)
    Sturm, B.L. A survey of evaluation in music genre recognition. In Proc. Adaptive Multimedia Retrieval. 2012.
    Not reproducible
    Has its issues
    Hardly used

    View Slide

  13. View Slide

  14. Too Much Britney
    • Small: 1,000 tracks
    • Replicas
    • Excerpts from the same recording
    • Versions (same music but different recordings)
    • Mis-labelings
    • Distortions
    • Excerpts by the same artists: 35% of Reggae excerpts are by
    Bob Marley, 24% of Pop excerpts are by Britney Spears, …
    Sturm, B.L. An analysis of the GTZAN music genre dataset. In Proceedings of the second international ACM workshop on Music information retrieval
    with user-centered and multimodal strategies, pages 7–12. ACM, 2012.

    View Slide

  15. What’s wrong with MSD?

    View Slide

  16. What’s wrong with MSD?
    No song-level genre annotations.

    View Slide

  17. Last.fm MSD Tags
    • 522,366 unique tags
    • 505,216 tracks with at least one tag, i.e. multiple tags per song
    • No explicit relationships between tags
    • Many tags are genre related, but not all of them:


    rock 101,071

    pop 69,159

    alternative 55,777

    indie 48,175

    electronic 46,270

    female vocalists 42,565

    favorites 39,921

    love 34,901

    dance 33,618

    00s 31,432


    http://labrosa.ee.columbia.edu/millionsong/lastfm

    View Slide

  18. Last.fm MSD Tags
    • 522,366 unique tags
    • 505,216 tracks with at least one tag, i.e. multiple tags per song
    • No explicit relationships between tags
    • Many tags are genre related, but not all of them:


    rock 101,071

    pop 69,159

    alternative 55,777

    indie 48,175

    electronic 46,270

    female vocalists 42,565

    favorites 39,921

    love 34,901

    dance 33,618

    00s 31,432


    http://labrosa.ee.columbia.edu/millionsong/lastfm
    Yes, great data.
    But no tag hierarchies/
    relationships.

    What’s a genre, what’s not?
    What exactly is the ground truth
    here that’s usable for MGR?

    View Slide

  19. Top-MAGD Annotations
    • Album-level annotations scraped from All Music Guide website
    • 13 unique tags
    • 406,427 labeled tracks:


    Pop/Rock 238,786

    Electronic 41,075

    Rap 20,939

    Jazz 17,836

    Latin 17,590

    R&B 14,335

    International 14,242

    Country 11,772

    Reggae 6,946

    Blues 6,836

    Vocal 6,195

    Folk 5,865

    New Age 4,010
    Alexander Schindler, Rudolf Mayer, and Andreas Rauber. Facilitating comprehensive benchmarking experiments on the million song dataset. In
    Proceedings of the 13th International Conference on Music Information Retrieval (ISMIR), pages 469–474, 2012.

    View Slide

  20. Top-MAGD Annotations
    • Album-level annotations scraped from All Music Guide website
    • 13 unique tags
    • 406,427 labeled tracks:


    Pop/Rock 238,786

    Electronic 41,075

    Rap 20,939

    Jazz 17,836

    Latin 17,590

    R&B 14,335

    International 14,242

    Country 11,772

    Reggae 6,946

    Blues 6,836

    Vocal 6,195

    Folk 5,865

    New Age 4,010
    Is this still useful?
    Alexander Schindler, Rudolf Mayer, and Andreas Rauber. Facilitating comprehensive benchmarking experiments on the million song dataset. In
    Proceedings of the 13th International Conference on Music Information Retrieval (ISMIR), pages 469–474, 2012.

    View Slide

  21. Top-MAGD Annotations
    • Album-level annotations scraped from All Music Guide website
    • 13 unique tags
    • 406,427 labeled tracks:


    Pop/Rock 238,786

    Electronic 41,075

    Rap 20,939

    Jazz 17,836

    Latin 17,590

    R&B 14,335

    International 14,242

    Country 11,772

    Reggae 6,946

    Blues 6,836

    Vocal 6,195

    Folk 5,865

    New Age 4,010
    Alexander Schindler, Rudolf Mayer, and Andreas Rauber. Facilitating comprehensive benchmarking experiments on the million song dataset. In
    Proceedings of the 13th International Conference on Music Information Retrieval (ISMIR), pages 469–474, 2012.
    Also, great job.

    But the granularity is not really
    what we want:

    Metallica just isn’t Britney.

    Has anybody verified the
    annotations?

    Are album-level annotations
    good enough?

    View Slide

  22. Goal
    Intelligently merge several datasets into

    one high-quality genre annotation dataset.

    View Slide

  23. How
    1. Make sure all datasets use the same

    labels (via taxonomies)
    2. Use datasets to evaluate each other
    3. Generate new & improved dataset
    (Excerpt from the paper, partially cropped:)

    5. | Hip-Hop | Hip-Hop | Jazz | Reggae | Latin | Soundtrack
    6. | Hip-Hop/Rap | R&B | Alternative | R&B | Dance | Jazz
    7. | Soundtrack | Soundtrack | Dance | Soundtrack | House | Electronica/Dance
    8. | R&B | Jazz | R&B | Blues | Otros | [illegible] (Rock)
    9. | Electronic | Country | Rock/Pop | Electronic | Blues | Altern. & Punk
    10. | Country | Altern. & Punk | Soundtrack | Rap | Electronica | Hip-Hop/Rap
    Table 1. Top ten genres used by beaTunes users with different languages. N denotes the number of submissions in millions.
    (Header row and ranks 1 to 4 are cropped.)

    Co-Occurrence Rank | 1. | 2. | 3. | 4.
    Rock | Rock (0.609) | Pop (0.057) | Alternative (0.026) | Rock/Pop (0.016)
    Pop | Pop (0.593) | Rock (0.077) | Rock/Pop (0.014) | R&B (0.013)
    Alternative | Alternative (0.394) | Rock (0.156) | Pop (0.052) | Alternative/Punk (0.036)
    R&B | R&B (0.566) | Pop (0.061) | Soul (0.036) | R&B/Soul (0.033)
    Soundtrack | Soundtrack (0.754) | Rock (0.024) | Pop (0.022) | Game (0.011)
    ... | ... | ... | ... | ...
    Table 2. Genre labels in the beaTunes database and their top four co-occurring labels ordered by relative strength given in parenthesis. The underlying values from the co-occurrence matrix C were computed taking only submissions by English speakers and the 1,000 most-used labels into account.

    (2)
    Because this rule allows a genre to be a sub-genre of multiple genres, we add:

    $$a \text{ is a direct sub-genre of } b \iff a \text{ is a sub-genre of } b \;\wedge\; C_{a,b} > C_{a,c} \quad \text{with } c \neq a \wedge c \neq b;\; a, b, c \in G \qquad (4)$$

    By finding all direct sub-genres and their parents, we can now create a set of trees. The number of created trees depends on the threshold τ. We found, that to properly distinguish between genres like Pop, Rock, Dance, R&B, Folk, and Other, τ := 0.085 proved to be useful, resulting in 141 trees. The roots of these trees are typically the names of seed-genres like Jazz, Pop, Rock, etc. (see Figure 1).

    Not all generated trees have children. For example, the tree with the seed-genre Groove consists of just the root. Although Groove co-occurs with R&B, Rock, Funk, and Soul, the co-occurrence rates with genres other than itself are all below τ. Even the co-occurrence with itself is low (0.157). This suggests, that Groove is not really a genre, but more a property of a genre. Another example for a root-only tree is Calypso. Here the co-occurrence with itself is much higher (0.606) and indeed Calypso qualifies as stand-alone genre that simply does not have any sub-genres in this database.

    Naturally, the generated taxonomies are only simplified mappings of the more complex relationship graph represented by C. In reality, genres aren’t necessarily exclusive members of one tree or another (e.g. fusion genres). An ontology is the much better construct. But, as we will see, for the purpose of mapping most sub-genres to their seed-genre, trees are useful.

    Rock: Metal, Alternative, Punk, ...
    Pop: Folk Pop, Acoustic Pop, Top 40, ...
    Hip-Hop: East Coast Rap, Turntablism, ...
    RnB: Motown, Funk, Soul, Urban, ...
    Figure 1. Partial, generated trees for the seed-genres Rock, Pop, Hip-Hop, and R&B.

    2.4 Matching with Million Song Dataset
    To create song-level genre annotations for the MSD, we queried the beaTunes database for songs with artist/title pairs contained in the MSD and were able to match 677,038 songs. In order to ease the comparison with the HO and Top-MAGD datasets, we associated each matched song with the seed-genre of its most often occurring genre label, taking advantage of the taxonomies created in Section 2.3. Motown, for example, is represented by its seed-genre RnB. In many cases, the found seed-genres are [...]

    View Slide

  24. But first, a little…

    View Slide

  25. beaTunes
    • Consumer application for Windows and Mac
    • Encourages users to correct metadata
    • Collects anonymized, user-submitted metadata in
    central database
    https://www.beatunes.com/

    View Slide

  26. beaTunes Database
    • 870 million song submissions by 200 thousand users
    • 772 million submissions are labeled with a genre
    • Mapped to more than 85 million distinct songs
    (one song, many genre labels)
    • 677,038 songs have been matched to MSD
    https://www.beatunes.com/

    View Slide

  27. Mapping User Genre Labels
    to a Genre Taxonomy
    1. Normalization (lowercase, smart subs, etc.)
    2. Inferring hierarchical relationships via co-occurrence

    View Slide

  28. Co-Occurrence Matrix
    (Excerpt from the paper, partially cropped:)

    2. | Pop | Pop | Rock | Pop | Pop | Pop
    3. | Alternative | Alternative | Electronic | Jazz | Jazz | J-Pop
    4. | Jazz | Hip-Hop/Rap | Hip-Hop | Hip-Hop | Soundtrack | R&B
    5. | Hip-Hop | Hip-Hop | Jazz | Reggae | Latin | Soundtrack
    6. | Hip-Hop/Rap | R&B | Alternative | R&B | Dance | Jazz
    7. | Soundtrack | Soundtrack | Dance | Soundtrack | House | Electronica/Dance
    8. | R&B | Jazz | R&B | Blues | Otros | [illegible] (Rock)
    9. | Electronic | Country | Rock/Pop | Electronic | Blues | Altern. & Punk
    10. | Country | Altern. & Punk | Soundtrack | Rap | Electronica | Hip-Hop/Rap
    Table 1. Top ten genres used by beaTunes users with different languages. N denotes the number of submissions in millions.
    (Header row and rank 1 are cropped.)

    Co-Occurrence Rank | 1. | 2. | 3. | 4.
    Rock | Rock (0.609) | Pop (0.057) | Alternative (0.026) | Rock/Pop (0.016)
    Pop | Pop (0.593) | Rock (0.077) | Rock/Pop (0.014) | R&B (0.013)
    Alternative | Alternative (0.394) | Rock (0.156) | Pop (0.052) | Alternative/Punk (0.036)
    R&B | R&B (0.566) | Pop (0.061) | Soul (0.036) | R&B/Soul (0.033)
    Soundtrack | Soundtrack (0.754) | Rock (0.024) | Pop (0.022) | Game (0.011)
    ... | ... | ... | ... | ...
    Table 2. Genre labels in the beaTunes database and their top four co-occurring labels ordered by relative strength given in parenthesis. The underlying values from the co-occurrence matrix C were computed taking only submissions by English speakers and the 1,000 most-used labels into account.

    Genre labels and their top four co-occurring labels
    ordered by relative strength given in parenthesis.
    View Slide

  29. Co-Occurrence Matrix
    Genre labels and their top four co-occurring labels
    ordered by relative strength given in parenthesis.
    Co-occurrence rates
    aren’t symmetric!
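The asymmetry noted on this slide is what one would expect if every row of the matrix is normalized on its own. The following is only a sketch under stated assumptions: the slides do not say how C is computed, so the per-song weighting, the row normalization, the function name `cooccurrence_rates`, and the toy label counts are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical toy data: per-song counts of user-submitted genre labels.
SONGS: List[Dict[str, int]] = [
    {"rock": 8, "alternative": 2},
    {"rock": 10},
    {"rock": 9, "pop": 1},
    {"alternative": 5, "rock": 5},
]


def cooccurrence_rates(songs: List[Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """Row-normalized label co-occurrence (an assumed definition, not the paper's)."""
    raw: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for counts in songs:
        total = sum(counts.values())
        if total == 0:
            continue
        # Weight each label pair by the labels' relative share within the song.
        for a, n_a in counts.items():
            for b, n_b in counts.items():
                raw[a][b] += (n_a / total) * (n_b / total)
    rates: Dict[str, Dict[str, float]] = {}
    for a, row in raw.items():
        row_sum = sum(row.values())
        # The raw sums are symmetric; dividing by the row sum is what
        # makes rates[a][b] differ from rates[b][a].
        rates[a] = {b: value / row_sum for b, value in row.items()}
    return rates


if __name__ == "__main__":
    c = cooccurrence_rates(SONGS)
    print("alternative -> rock:", round(c["alternative"]["rock"], 3))
    print("rock -> alternative:", round(c["rock"]["alternative"], 3))
```

In this toy data, "alternative" co-occurs with "rock" at a much higher rate than the other way around, because "rock" is used on many songs that carry no "alternative" label.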

    View Slide

  30. Rules

    View Slide

  31. Rules
    1. If a genre a co-occurs with another genre b more
    than a minimum threshold τ, and a co-occurs with b
    more than the other way around, then we assume
    that a is a sub-genre of b.

    View Slide

  32. Rules
    1. If a genre a co-occurs with another genre b more
    than a minimum threshold τ, and a co-occurs with b
    more than the other way around, then we assume
    that a is a sub-genre of b.
    2. a is a direct sub-genre of b, iff a is a sub-genre of b
    and C_{a,b} > C_{a,c} for every other genre c (c≠a and c≠b),
    where a,b,c ∈ G and C_{a,b} is the co-occurrence rate
    between a and b.
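The two rules can be sketched in a few lines of Python. This is an illustration, not the author's implementation: the rates are the ones visible in Table 2, τ = 0.085 is the value quoted in the paper excerpt, and treating pairs that are missing from the table as 0.0 is an assumption of this sketch. The helper names `rate`, `is_sub_genre`, and `direct_parent` are hypothetical.

```python
from typing import Dict, Optional

Matrix = Dict[str, Dict[str, float]]

# Rates as shown in Table 2 (row label -> co-occurring label -> rate).
C: Matrix = {
    "Rock": {"Rock": 0.609, "Pop": 0.057, "Alternative": 0.026, "Rock/Pop": 0.016},
    "Pop": {"Pop": 0.593, "Rock": 0.077, "Rock/Pop": 0.014, "R&B": 0.013},
    "Alternative": {"Alternative": 0.394, "Rock": 0.156, "Pop": 0.052,
                    "Alternative/Punk": 0.036},
    "R&B": {"R&B": 0.566, "Pop": 0.061, "Soul": 0.036, "R&B/Soul": 0.033},
    "Soundtrack": {"Soundtrack": 0.754, "Rock": 0.024, "Pop": 0.022, "Game": 0.011},
}
TAU = 0.085  # threshold quoted in the paper excerpt


def rate(c: Matrix, a: str, b: str) -> float:
    """C_{a,b}; pairs not listed are treated as 0.0 (assumption)."""
    return c.get(a, {}).get(b, 0.0)


def is_sub_genre(c: Matrix, a: str, b: str, tau: float) -> bool:
    """Rule 1: C_{a,b} exceeds tau and also exceeds C_{b,a}."""
    return a != b and rate(c, a, b) > tau and rate(c, a, b) > rate(c, b, a)


def direct_parent(c: Matrix, a: str, tau: float) -> Optional[str]:
    """Rule 2: the b with the strictly highest C_{a,b} among all labels
    other than a, provided a is also a sub-genre of b. None means a is a root."""
    others = sorted(((r, b) for b, r in c.get(a, {}).items() if b != a), reverse=True)
    if not others:
        return None
    best_rate, best = others[0]
    if len(others) > 1 and others[1][0] == best_rate:
        return None  # a tie violates the strict inequality of rule 2
    return best if is_sub_genre(c, a, best, tau) else None


if __name__ == "__main__":
    for label in C:
        print(label, "->", direct_parent(C, label, TAU))
```

With these values, Alternative ends up below Rock, while Pop stays a root because its rate with Rock (0.077) is below τ, which matches the paper's remark about distinguishing Pop and Rock.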

    View Slide

  33. Generated Taxonomies
    Rock: Metal, Alternative, Punk, ...
    Pop: Folk Pop, Acoustic Pop, Top 40, ...
    Hip-Hop: East Coast Rap, Turntablism, ...
    RnB: Motown, Funk, Soul, Urban, ...
    Figure 1. Partial, generated trees for the seed-genres Rock, Pop, Hip-Hop, and R&B.

    (Excerpt from the paper, partially cropped:)

    (2)
    Because this rule allows a genre to be a sub-genre of multiple genres, we add:

    $$a \text{ is a direct sub-genre of } b \iff a \text{ is a sub-genre of } b \;\wedge\; C_{a,b} > C_{a,c} \quad \text{with } c \neq a \wedge c \neq b;\; a, b, c \in G \qquad (4)$$

    By finding all direct sub-genres and their parents, we can now create a set of trees. The number of created trees depends on the threshold τ. We found, that to properly distinguish between Pop and Rock, τ := 0.085 proved to be useful, resulting in 141 trees. The roots of these trees are typically the names of seed-genres like Jazz, Pop, Rock, etc. (see Figure 1).

    Not all generated trees have children. For example, the tree with the seed-genre Groove consists of just the root. Although Groove co-occurs with R&B, Rock, Funk, and Soul, the co-occurrence rates with genres other than itself are all below τ. Even the co-occurrence with itself is low (0.157). This suggests, that Groove is not really a genre, but more a property of a genre. Another example for a root-only tree is Calypso. Here the co-occurrence with itself is much higher (0.606) and indeed Calypso qualifies as stand-alone genre that simply does not have any sub-genres in this database.

    Naturally, the generated taxonomies are only simplified mappings of the more complex relationship graph represented by C. In reality, genres aren’t necessarily exclusive members of one tree or another (e.g. fusion genres). An ontology is the much better construct. But, as we will see, [...]

    2.4 Matching with Million Song Dataset
    To create song-level genre annotations for the MSD, we queried the beaTunes database for songs with artist/title pairs contained in the MSD and were able to match 677,038 songs. In order to ease the comparison with the HO and Top-MAGD datasets, we associated each matched song with the seed-genre of its most often occurring genre label, taking advantage of the taxonomies created in [...]

    View Slide

  34. Generated Taxonomies
    No parent = seed genre
    Seed genres can easily be found and mapped to Top-MAGD labels
    (Pop/Rock, Electronic, Rap, Jazz, Latin, R&B, International,
    Country, Reggae, Blues, Vocal, Folk, New Age).
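Finding a seed genre amounts to walking up the tree until a label has no parent. A minimal sketch follows, using the partial trees from Figure 1 as the parent map. The names `PARENT`, `SEED_TO_TOP_MAGD`, `seed_genre`, and `to_top_magd` are hypothetical, and the seed-to-Top-MAGD mapping is an illustrative guess rather than the mapping used for the published datasets.

```python
from typing import Dict, Optional

# Child -> direct parent, taken from the partial trees in Figure 1.
PARENT: Dict[str, str] = {
    "Metal": "Rock", "Alternative": "Rock", "Punk": "Rock",
    "Folk Pop": "Pop", "Acoustic Pop": "Pop", "Top 40": "Pop",
    "East Coast Rap": "Hip-Hop", "Turntablism": "Hip-Hop",
    "Motown": "RnB", "Funk": "RnB", "Soul": "RnB", "Urban": "RnB",
}

# Illustrative only: an assumed mapping from seed genres to Top-MAGD labels.
SEED_TO_TOP_MAGD: Dict[str, str] = {
    "Rock": "Pop/Rock", "Pop": "Pop/Rock", "Hip-Hop": "Rap", "RnB": "R&B",
}


def seed_genre(label: str, parent: Dict[str, str]) -> str:
    """Follow parent links until a label without a parent (the seed genre)."""
    seen = set()
    while label in parent and label not in seen:  # 'seen' guards against cycles
        seen.add(label)
        label = parent[label]
    return label


def to_top_magd(label: str) -> Optional[str]:
    """Map any label to a Top-MAGD label via its seed genre, if one is known."""
    return SEED_TO_TOP_MAGD.get(seed_genre(label, PARENT))


if __name__ == "__main__":
    print(seed_genre("Motown", PARENT))  # the paper's own example: Motown -> RnB
    print(to_top_magd("Metal"))
```

A label that never appears as a child, such as Calypso in the paper excerpt, is returned unchanged because it is its own seed genre.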

    View Slide

  35. Building Genre Taxonomies
    with Last.fm Tags
    • Last.fm tags come with a relative strength (0-100)
    • Same procedure can be applied
    • Many more different tags -> minimum threshold τ has
    to be adjusted
    • Allows us to find seed genres (top-level)

    View Slide

  36. Comparing Annotations
    • beaTunes and Last.fm labels can now be matched to
    Top-MAGD labels using the generated taxonomies
    • Let’s compare!

    View Slide

  37. Pairwise Comparison
    Dataset | Last.fm | beaTunes
    Top-MAGD | 75.7% | 84.0%
    Last.fm | - | 80.9%
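A pairwise comparison of this kind can be sketched as follows. The assumption here, which the slides do not spell out, is that the rate is the share of tracks with identical labels among the tracks present in both datasets. The function name `agreement_rate` and the track IDs are hypothetical.

```python
from typing import Dict, Optional


def agreement_rate(a: Dict[str, str], b: Dict[str, str]) -> Optional[float]:
    """Share of tracks labeled identically, over the tracks found in both datasets."""
    shared = a.keys() & b.keys()
    if not shared:
        return None  # nothing to compare
    same = sum(1 for track in shared if a[track] == b[track])
    return same / len(shared)


if __name__ == "__main__":
    # Hypothetical toy annotations: track ID -> top-level genre.
    top_magd = {"T1": "Pop/Rock", "T2": "Jazz", "T3": "Rap", "T4": "Blues"}
    beatunes = {"T1": "Pop/Rock", "T2": "Jazz", "T3": "R&B", "T5": "Folk"}
    rate = agreement_rate(top_magd, beatunes)
    print(f"{rate:.1%}")  # 2 of the 3 shared tracks agree
```

Both datasets have to use the same label set before such a comparison makes sense, which is why the taxonomy mapping comes first.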

    View Slide

  38. Pairwise Comparison
    Dataset | Last.fm | beaTunes
    Top-MAGD | 75.7% | 84.0%
    Last.fm | - | 80.9%
    High agreement rates,
    especially between beaTunes
    and Top-MAGD

    View Slide

  39. Pairwise Comparison
    Dataset | Last.fm | beaTunes
    Top-MAGD | 75.7% | 84.0%
    Last.fm | - | 80.9%
    Glass-ceiling for ground truth
    with just one value/song?

    View Slide

  40. Combined Dataset 1

    View Slide

  41. Combined Dataset 1
    • Find songs occurring in all datasets

    View Slide

  42. Combined Dataset 1
    • Find songs occurring in all datasets
    • For which at least two of the datasets agree

    (majority voting)

    View Slide

  43. Combined Dataset 1
    • Find songs occurring in all datasets
    • For which at least two of the datasets agree

    (majority voting)
    • Take note of minority vote, if existent

    (i.e. allow ambiguity)

    View Slide

  44. Combined Dataset 1
    • Find songs occurring in all datasets
    • For which at least two of the datasets agree

    (majority voting)
    • Take note of minority vote, if existent

    (i.e. allow ambiguity)
    • => Combined Dataset 1 (CD1):

    133,676 tracks

    98,149 (73.4%) found by unanimous consent
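The voting procedure on this slide can be sketched as below. It is a minimal illustration under assumptions: every dataset is a plain mapping from track ID to one top-level genre, and the names `combine_by_majority`, the result keys, and the toy track IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def combine_by_majority(datasets: List[Dict[str, str]]) -> Dict[str, Dict[str, Optional[object]]]:
    """Keep tracks present in all datasets for which a strict majority agrees."""
    common = set.intersection(*(set(d) for d in datasets))
    combined: Dict[str, Dict[str, Optional[object]]] = {}
    for track in sorted(common):
        votes = Counter(d[track] for d in datasets)
        genre, count = votes.most_common(1)[0]
        if count * 2 <= len(datasets):
            continue  # no majority (with three datasets: all three disagree)
        minority = [g for g in votes if g != genre]
        combined[track] = {
            "majority": genre,
            "minority": minority[0] if minority else None,  # kept to allow ambiguity
            "unanimous": count == len(datasets),
        }
    return combined


if __name__ == "__main__":
    # Hypothetical toy annotations.
    top_magd = {"T1": "Jazz", "T2": "Pop/Rock", "T3": "Rap", "T4": "Folk"}
    lastfm = {"T1": "Jazz", "T2": "Electronic", "T3": "R&B", "T4": "Folk"}
    beatunes = {"T1": "Jazz", "T2": "Pop/Rock", "T3": "Blues"}
    for track, entry in combine_by_majority([top_magd, lastfm, beatunes]).items():
        print(track, entry)
```

In the toy data, T1 is unanimous, T2 has a 2:1 majority with Electronic noted as minority vote, T3 is dropped because all three sources disagree, and T4 is dropped because it is missing from one dataset.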

    View Slide

  45. CD1 Genre Distribution
    Figure 2. Majority genre distribution of tracks in CD1.
    Tracks per Genre [%]:
    Blues 2.2
    Country 3.9
    Electronic 11.4
    Folk 2.2
    Intern. 1.1
    Jazz 5.8
    Latin 2.1
    New Age 1
    Pop Rock 59.8
    Rap 4.6
    Reggae 2.7
    RnB 2.9
    Vocal 0.2

    View Slide

  46. Metallica is not Britney

    View Slide

  47. Combined Dataset 2

    View Slide

  48. Combined Dataset 2
    • Just beaTunes and Last.fm tracks, because Top-MAGD
    can’t distinguish between Pop and Rock

    View Slide

  49. Combined Dataset 2
    • Just beaTunes and Last.fm tracks, because Top-MAGD
    can’t distinguish between Pop and Rock
    • Split Pop and Rock

    View Slide

  50. Combined Dataset 2
    • Just beaTunes and Last.fm tracks, because Top-MAGD
    can’t distinguish between Pop and Rock
    • Split Pop and Rock
    • Add Metal and Punk

    View Slide

  51. Combined Dataset 2
    • Just beaTunes and Last.fm tracks, because Top-MAGD
    can’t distinguish between Pop and Rock
    • Split Pop and Rock
    • Add Metal and Punk
    • Remove Vocal

    View Slide

  52. Combined Dataset 2
    • Just beaTunes and Last.fm tracks, because Top-MAGD
    can’t distinguish between Pop and Rock
    • Split Pop and Rock
    • Add Metal and Punk
    • Remove Vocal
    • Combine R&B and Soul

    View Slide

  53. Combined Dataset 2
    • Just beaTunes and Last.fm tracks, because Top-MAGD
    can’t distinguish between Pop and Rock
    • Split Pop and Rock
    • Add Metal and Punk
    • Remove Vocal
    • Combine R&B and Soul
    • Rename International to World

    View Slide

  54. Combined Dataset 2
    • Find songs in both beaTunes and Last.fm datasets
    • => Combined Dataset 2 (CD2):

    280,831 tracks

    191,401 (68.2%) have the same genre label
    • Combined Dataset 2 Consensus (CD2C):

    Convenience dataset with only the songs that have the
    same genre label

    View Slide
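
The relationship between CD2 and CD2C can be sketched in a few lines. The sketch assumes both sources are {track_id: genre} mappings that already use the CD2 genre vocabulary; the function name and sample labels are hypothetical.

```python
def build_cd2(beatunes, lastfm):
    """Return (cd2, cd2c).

    cd2 holds both labels for every track found in both sources;
    cd2c is the consensus subset where the two labels are identical.
    """
    shared = beatunes.keys() & lastfm.keys()
    cd2 = {track: (beatunes[track], lastfm[track]) for track in shared}
    cd2c = {track: a for track, (a, b) in cd2.items() if a == b}
    return cd2, cd2c


# Hypothetical labels for illustration only.
beatunes = {"T1": "Rock", "T2": "Metal", "T3": "Pop", "T4": "Jazz"}
lastfm = {"T1": "Rock", "T2": "Punk", "T3": "Pop", "T5": "World"}

cd2, cd2c = build_cd2(beatunes, lastfm)
consensus_share = len(cd2c) / len(cd2)
```

Applied to the real data, the same ratio is 191,401 / 280,831, which rounds to the 68.2% reported on the slide.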

  55. CD2C Genre Distribution
    CD1: Tracks per Genre [%]
    Blues 2.2, Country 3.9, Electronic 11.4, Folk 2.2, Intern. 1.1,
    Jazz 5.8, Latin 2.1, New Age 1, Pop Rock 59.8, Rap 4.6,
    Reggae 2.7, RnB 2.9, Vocal 0.2
    Figure 2. Majority genre distribution of tracks in CD1.
    CD2C: Tracks per Genre [%]
    Blues 3.2, Country 4.7, Electronic 11.4, Folk 2.2, Jazz 7.7,
    Latin 1.6, Metal 4.8, New Age 0.6, Pop 6.8, Punk 1.7, Rap 5.7,
    Reggae 4.2, RnB 5.1, Rock 39.2, World 1
    Figure 3. Genre distribution of tracks in CD2C.

    View Slide

  56. Benchmarking Partitions
    • Main “feature” of Schindler et al. paper
    • Increase reproducibility
    • Traditional training/test splits (90%, 80%, …)
    • Training/test splits with genre stratification
    • Splits with fixed number per genre (1,000, 2,000, 3,000)

    View Slide
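
A training/test split with genre stratification could look roughly like the sketch below. It is not the procedure used to generate the published partitions; the function name, the seed handling, and the sample data are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, seed=0):
    """Split track IDs so each genre keeps about the same train share.

    labels: {track_id: genre}. Returns (train_ids, test_ids).
    """
    by_genre = defaultdict(list)
    for track, genre in sorted(labels.items()):
        by_genre[genre].append(track)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for genre in sorted(by_genre):
        tracks = by_genre[genre]
        rng.shuffle(tracks)
        cut = round(len(tracks) * train_fraction)
        train.extend(tracks[:cut])
        test.extend(tracks[cut:])
    return train, test


# Hypothetical labels: 10 Rock tracks and 5 Jazz tracks.
labels = {f"R{i}": "Rock" for i in range(10)}
labels.update({f"J{i}": "Jazz" for i in range(5)})

train, test = stratified_split(labels, train_fraction=0.8)
```

Splitting within each genre keeps small genres represented in both partitions, which a plain random split over an unbalanced collection does not guarantee.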

  57. Summary

    View Slide

  58. Summary
    • Multiple large ground truth datasets for the MSD

    View Slide

  59. Summary
    • Multiple large ground truth datasets for the MSD
    • Despite large size, reasonable quality

    View Slide

  60. Summary
    • Multiple large ground truth datasets for the MSD
    • Despite large size, reasonable quality
    • Allow for ambiguity

    View Slide

  61. Summary
    • Multiple large ground truth datasets for the MSD
    • Despite large size, reasonable quality
    • Allow for ambiguity
    • Benchmark partitions to promote experimentation and
    comparability

    View Slide

  62. Thank you.
    http://www.tagtraum.com/msd_genre_datasets.html
    [email protected] / @h_schreiber

    View Slide

  63. Thank you.
    Questions?
    http://www.tagtraum.com/msd_genre_datasets.html
    [email protected] / @h_schreiber

    View Slide