Dictionary Learning for Music Genre Recognition

Source Code: https://github.com/mfcabrera/wtg
As a mandatory interdisciplinary project in my M.Sc. at the TUM, we worked in the Machine Learning and Geometric Optimization group, implementing a system for Music Genre Recognition using K-SVD and SVM.

Miguel Cabrera

October 25, 2013

Transcript

  1. Research Group for
    Geometric Optimization
    and Machine Learning
    Music Genre Recognition using Dictionary Learning
    Interdisciplinary Project
    Miguel Cabrera, Thomas Pieronczyk
    Research Group for Geometric Optimization and Machine Learning
    October 25, 2013
    Slide 1/36 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  2. Table of contents
    Introduction
    Framework
    Experiments
    Results
    Conclusion

  3. Introduction
    Music Information Retrieval (MIR)
    Artist, instrument and chord recognition
    Music annotation (tagging)
    Mood and genre classification
    Music Genre Recognition (MGR)

  4. MGR Challenges
    High dimensional
    No formal definition
    Highly subjective
    One song → Many genres
    Constantly new genres appearing

  5. Objectives
    Main objective: a system that predicts the musical genre of a piece of music
    Combination of:
    Yeh and Yang's Dictionary Learning Framework for Music Genre Recognition (MGR) [1]
    with the K-SVD algorithm from Aharon et al. [2].

  6. Framework
    Figure: Framework (Source: [1]). Block diagram with a training path (training songs, audio-signal transformation & feature extraction, codebook generation, encoding, code word aggregation, training) and a testing path (test song, audio-signal transformation & feature extraction, encoding, code word aggregation, prediction); the codebook is shared between both paths, and ground truth enters if the method is supervised.

  7. Framework
    Figure: Framework - Audio Feature Extraction (Source: [1])

  8. Features for musical genre recognition
    Meta-Data Features
    Short-Time Audio Features

  9. Short-Time Audio Features - Spectrogram
    Figure: Spectrogram - Classical
    Figure: Spectrogram - Rock
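
A magnitude spectrogram like the ones in the figures above can be obtained from a short-time Fourier transform. The following is only a minimal NumPy sketch: the function name, the frame length, the hop size and the Hann window are illustrative assumptions, not the settings used in the project.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=1024, hop=512):
    """Return a (frame_len // 2 + 1, n_frames) magnitude spectrogram.

    Assumes a mono signal with at least frame_len samples.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping frames and apply the window to each.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real signal.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Usage: one second of a 440 Hz tone sampled at 22050 Hz.
sr = 22050
t = np.arange(sr) / sr
spec = magnitude_spectrogram(np.sin(2 * np.pi * 440 * t))
```

Each column of `spec` is one short-time frame; such columns are the kind of vectors that the later dictionary learning stage takes as input.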

  10. Short-Time Audio Features - Constant Q Transform
    Figure: Constant Q Transform - Classical
    Figure: Constant Q Transform - Rock

  11. Framework
    Figure: Framework - Codebook Generation (Source: [1])

  12. Dictionary Learning
    Given an input signal vector $y \in \mathbb{R}^n$, the sparse representation problem can be mathematically formulated as:
    $x^* = \arg\min_x \frac{1}{2} \|y - Dx\|_2^2 + \lambda \|x\|_1$  (1)
    Figure (Source: [5])
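
To make problem (1) concrete, the sketch below minimises the same objective with the iterative shrinkage-thresholding algorithm (ISTA). ISTA is chosen here only for illustration; the experiments in this deck use Orthogonal Matching Pursuit instead. The function names and the iteration count are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise shrinkage: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(D, y, x, lam):
    """Value of 0.5 * ||y - D x||_2^2 + lam * ||x||_1."""
    return 0.5 * np.sum((y - D @ x) ** 2) + lam * np.sum(np.abs(x))

def ista(D, y, lam, n_iter=200):
    """Approximately minimise the objective above by proximal gradient steps."""
    # 1 / L, where L = ||D||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)          # gradient of the quadratic term
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Usage: with an identity dictionary the minimiser is soft_threshold(y, lam).
D = np.eye(3)
y = np.array([3.0, -0.5, 1.0])
x_star = ista(D, y, lam=1.0)
```

A larger `lam` pushes more entries of the solution to exactly zero, which is what makes the representation sparse.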

  13. K-SVD
    Dictionary Learning Algorithm
    $\min_{D,X} \|Y - DX\|_F^2 \quad \text{subject to} \quad \forall i,\ \|x_i\|_0 \le T_0$  (2)
    K-SVD is a generalization of the K-Means algorithm.

  14. K-SVD - Algorithm
    Initialization: Initialize the dictionary $D \in \mathbb{R}^{n \times K}$
    Sparse Coding Step: Sparse coding of the examples based on the current dictionary
    Codebook Update Step: Update the dictionary atoms to better fit the data
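
The three steps above can be sketched in NumPy as follows. This is a compact illustration under stated assumptions, not the project's implementation: the function names, the initialisation from randomly chosen data columns, the fixed number of iterations and the simple OMP routine are all choices made for this sketch.

```python
import numpy as np

def omp(D, y, sparsity):
    """Greedy sparse code of y over the atoms of D with at most `sparsity` non-zeros."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        k = int(np.argmax(np.abs(D.T @ residual)))   # best-correlated atom
        if k in support:
            break
        support.append(k)
        # Re-fit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def ksvd(Y, n_atoms, sparsity, n_iter=10, seed=0):
    """Learn a dictionary D (n x n_atoms) and sparse codes X for the columns of Y."""
    rng = np.random.default_rng(seed)
    # Initialization "from data": random training columns, scaled to unit norm.
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # Sparse coding step with the current dictionary.
        X = np.column_stack([omp(D, y, sparsity) for y in Y.T])
        # Codebook update step: refit one atom at a time.
        for k in range(n_atoms):
            users = np.flatnonzero(X[k])             # signals that use atom k
            if users.size == 0:
                continue
            X[k, users] = 0.0
            E = Y[:, users] - D @ X[:, users]        # error without atom k
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                        # best rank-1 fit: new atom
            X[k, users] = s[0] * Vt[0]               # and its coefficients
    return D, X
```

Because each atom update only touches the signals that already use that atom, the number of non-zeros per column of `X` never exceeds the target sparsity.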

  15. Dictionary Learning for MGR
    Dictionaries are trained separately per genre
    The resulting dictionary is the concatenation of all the separately trained dictionaries
    $D = [D_1, D_2, D_3, \ldots, D_c]$  (3)
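
The concatenation in (3) amounts to stacking the per-genre dictionaries column-wise. In the sketch below, `train_dictionary` is a hypothetical placeholder that merely samples unit-norm data columns; in the actual system a K-SVD run per genre would take its place. All names and sizes are assumptions for illustration.

```python
import numpy as np

def train_dictionary(Y, n_atoms, seed=0):
    """Placeholder for a per-genre K-SVD run: samples unit-norm data columns."""
    rng = np.random.default_rng(seed)
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    return D / (np.linalg.norm(D, axis=0, keepdims=True) + 1e-12)

def build_concatenated_dictionary(frames_by_genre, atoms_per_genre):
    """Train one dictionary per genre and stack them: D = [D_1, ..., D_c]."""
    genres = sorted(frames_by_genre)
    blocks = [train_dictionary(frames_by_genre[g], atoms_per_genre) for g in genres]
    D = np.hstack(blocks)
    # atom_genre[j] records which genre dictionary atom j came from.
    atom_genre = np.repeat(genres, atoms_per_genre)
    return D, atom_genre

# Usage with random stand-in features for two genres (5-dimensional frames).
rng = np.random.default_rng(0)
frames = {"rock": rng.standard_normal((5, 20)),
          "blues": rng.standard_normal((5, 20))}
D, atom_genre = build_concatenated_dictionary(frames, atoms_per_genre=4)
```

Keeping `atom_genre` alongside `D` makes it possible to tell later which genre block an active atom belongs to.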

  17. Framework
    Figure: Framework - Encoding (Source: [1])

  18. Encoding
    After codebook generation, the training data is re-encoded with the concatenated dictionary $D = [D_1, D_2, D_3, \ldots, D_c]$, resulting in a sparse representation.

  19. Framework
    Figure: Framework - Aggregation (Source: [1])

  20. Histogram Aggregation
    We aggregate the encoded frames into “texture windows” [3]
    Texture window: the minimum amount of time that is necessary to identify a particular music “texture”
    Window size: 3-5 seconds
    Song represented as a bag of histograms, i.e. 6 histograms per song
    The histograms inherit the label of their song
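
One way to turn the sparse codes of a song into texture-window histograms is sketched below. The slides do not spell out how a histogram is formed, so summing absolute coefficients per atom over non-overlapping windows and L1-normalising the result are assumptions of this sketch, as are the function and parameter names.

```python
import numpy as np

def texture_window_histograms(X, frames_per_window):
    """Aggregate sparse codes into one histogram per texture window.

    X has shape (n_atoms, n_frames). The result has shape
    (n_windows, n_atoms); each row sums to 1 unless the window is empty.
    """
    n_atoms, n_frames = X.shape
    n_windows = n_frames // frames_per_window      # drop an incomplete last window
    trimmed = np.abs(X[:, : n_windows * frames_per_window])
    # Group consecutive frames into windows and sum within each window.
    hists = trimmed.reshape(n_atoms, n_windows, frames_per_window).sum(axis=2).T
    totals = hists.sum(axis=1, keepdims=True)
    return hists / np.where(totals == 0, 1.0, totals)

# Usage: 2 atoms, 4 frames, windows of 2 frames each.
X = np.array([[1.0, 0.0, 0.0, 2.0],
              [0.0, -1.0, 0.0, 0.0]])
H = texture_window_histograms(X, frames_per_window=2)
```

Every row of `H` would then be treated as one training example carrying the genre label of the song it came from.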

  25. Framework
    Figure: Framework - Aggregation (Source: [1])

  26. SVM with Histogram Intersection Kernel
    We use a Support Vector Machine for the classification step.
    Histogram Intersection Kernel
    $K_{HI}(h_a, h_b) = \sum_{j=1}^{k} \min(h_a(j), h_b(j))$
    Measures the degree of similarity between two histograms.
    Computationally comparable with a linear SVM
    Works better than linear and non-linear SVM for histogram features.
    Implementation based on the popular LIBSVM for Matlab: http://www.cs.berkeley.edu/~smaji/projects/fiksvm/
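
The kernel formula above translates almost directly into NumPy. The sketch below computes the full Gram matrix between two sets of histograms; it only illustrates the formula and is unrelated to the Matlab implementation linked on the slide. The function name is an assumption.

```python
import numpy as np

def histogram_intersection_kernel(A, B):
    """Gram matrix K[i, j] = sum_k min(A[i, k], B[j, k]) for row-wise histograms."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Broadcast to shape (len(A), len(B), n_bins), then sum over the bins.
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

# Usage: two L1-normalised histograms compared with each other.
H = np.array([[0.5, 0.5],
              [1.0, 0.0]])
K = histogram_intersection_kernel(H, H)
```

For L1-normalised histograms the kernel value lies between 0 (no overlap) and 1 (identical histograms). A Gram matrix of this form can be passed to any SVM implementation that accepts precomputed kernels.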

  32. Experiments

  33. Experiments Summary I
    Data
    GTZAN dataset comprising 1000 songs of 30 sec. length, equally divided into 10 genres: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, and rock.
    One of the most frequently used datasets in MGR.
    But: exhibits several problems such as replications, mislabelings, and distortions. [6]
    Features
    CQT
    Spectrogram
    Features are normalized

  41. Experiments Summary II
    Dictionary Learning
    Initialization: random and from data.
    Dictionary Size: 50-400
    Target Sparsity: 1-3
    Pursuit Algorithm: Orthogonal Matching Pursuit (OMP)
    Classification
    Parameter Selection: Experimentally using 10-fold cross-validation
    Performance measures: Accuracy at histogram and clip level
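
The two performance measures differ in what counts as one prediction: a single histogram or a whole clip. The slides do not state how histogram predictions are combined into a clip prediction; the sketch below assumes a simple majority vote, and all names in it are hypothetical.

```python
from collections import Counter

def clip_level_predictions(histogram_predictions, clip_ids):
    """Majority vote over the histogram-level predictions of each clip.

    Ties go to the label that was seen first within the clip.
    """
    votes = {}
    for pred, clip in zip(histogram_predictions, clip_ids):
        votes.setdefault(clip, Counter())[pred] += 1
    return {clip: counter.most_common(1)[0][0] for clip, counter in votes.items()}

def accuracy(predicted, truth):
    """Fraction of positions where the prediction equals the ground truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Usage: two clips with three histograms each.
hist_pred = ["rock", "rock", "blues", "jazz", "jazz", "jazz"]
hist_true = ["rock", "rock", "rock", "jazz", "jazz", "jazz"]
clip_ids = [0, 0, 0, 1, 1, 1]
clip_pred = clip_level_predictions(hist_pred, clip_ids)
hist_acc = accuracy(hist_pred, hist_true)
clip_acc = accuracy([clip_pred[0], clip_pred[1]], ["rock", "jazz"])
```

Under such a voting scheme a clip can be classified correctly even when some of its histograms are not, so clip-level accuracy can exceed histogram-level accuracy.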

  49. Data Partitioning
    Two partitioning schemes
    90-10
    90%: used for dictionary and SVM training
    10%: encoded with the learned dictionary and used as the test set
    Full data
    100% used for dictionary and SVM training
    Performance evaluation with 10-fold cross-validation
    This scheme is the one used in the literature
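
The two partitioning schemes can be sketched with the standard library alone. The helper names, the fixed random seed and the use of plain item indices are assumptions for illustration; the slides do not say how the items were shuffled or assigned to folds.

```python
import random

def split_90_10(n_items, seed=0):
    """Shuffle the indices and hold out the last 10% as a test set."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = int(round(0.9 * n_items))
    return indices[:cut], indices[cut:]

def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]      # k nearly equal folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

# Usage with 1000 items, the size of the GTZAN dataset.
train_idx, test_idx = split_90_10(1000)
folds = list(k_fold_indices(1000, k=10))
```

In the 90-10 scheme the held-out items play no part in dictionary learning, whereas in the full-data scheme every item contributes to the dictionary before the cross-validation folds are formed.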

  57. Results

  58. Results - Dictionary Update Iterations
    $\|Y - DX\|_F^2 \quad \text{for } J = 1, 2, \ldots, k$  (4)

  59. Encoding - Atom Usage: Blues
    Figure: Atom usage counts (x-axis: dictionaries blues, classical, country, disco, hip hop, jazz, metal, pop, reggae, rock; y-axis: atom counts, 0-1200)

  60. Research Group for
    Geometric Optimization
    and Machine Learning
    Encoding - Atom Usage : Rock
    [Figure: Atom usage counts. x-axis: Dictionaries (blues, classical, country, disco, hip hop, jazz, metal, pop, reggae, rock); y-axis: Atom counts (0 to 1100)]
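
Atom usage plots of this kind count how often the atoms of each genre's dictionary receive a nonzero coefficient when clips of one genre are encoded. A minimal sketch, assuming each sparse code is stored as a dict mapping atom index to coefficient and that the per-genre sub-dictionaries are concatenated in equally sized consecutive blocks; both the representation and the function names are hypothetical.

```python
def atom_usage_histogram(sparse_codes, n_atoms):
    """Count, for each atom, how many sparse codes use it with a nonzero coefficient."""
    counts = [0] * n_atoms
    for code in sparse_codes:
        for atom_index, coefficient in code.items():
            if coefficient != 0.0:
                counts[atom_index] += 1
    return counts

def usage_per_subdictionary(counts, atoms_per_genre):
    """Sum atom counts over consecutive blocks, one block per sub-dictionary."""
    return [sum(counts[start:start + atoms_per_genre])
            for start in range(0, len(counts), atoms_per_genre)]

# Toy example: 4 atoms in total, 2 atoms per sub-dictionary, sparsity 1.
codes = [{0: 0.7}, {1: -1.2}, {0: 0.3}, {3: 2.0}]
counts = atom_usage_histogram(codes, n_atoms=4)
per_dictionary = usage_per_subdictionary(counts, atoms_per_genre=2)
```

If the encoding is discriminative, the block belonging to the clips' own genre should collect most of the counts.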
    Slide 29/36 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  61. Research Group for
    Geometric Optimization
    and Machine Learning
    Results - Histogram Level
    Table: Results summary using a 90-10% split and normalized spectrogram
    Dictionary size   Target sparsity   Cross-validation performance   Test-set performance
    500               1                 75.96                          62.83
    1000              1                 80.20                          63.40
    2000              1                 81.69                          65.67
    3000              1                 84.40                          60.00
    4000              1                 77.61                          65.16
    500               2                 65.10                          53.66
    1000              2                 68.88                          60.00
    2000              2                 74.55                          61.00
    3000              2                 76.29                          62.3
    4000              2                 76.29                          62.3
    500               3                 65.16                          54.87
    1000              3                 54.87                          56.16
    2000              3                 71.81                          58.83
    3000              3                 72.10                          59.0
    4000              3                 74.22                          59.3
    Slide 30/36 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  62. Research Group for
    Geometric Optimization
    and Machine Learning
    Results - Clip Level
    Table: Results using full data and cross-validation with normalized spectrogram
    Dictionary size   Target sparsity   Cross-validation performance   Cross-validation performance (clip level)
    500               1                 75.02                          79.53
    1000              1                 78.05                          81.50
    2000              1                 82.11                          84.26
    3000              1                 83.40                          85.10
    4000              1                 83.23                          85.12
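
The slides do not state how the clip-level figure is derived from the finer-grained predictions. One common choice, shown here purely as an assumption, is a majority vote over the labels predicted for the segments of a clip; the function names and the toy labels are illustrative.

```python
from collections import Counter

def clip_level_prediction(segment_labels):
    """Return the most frequent label among a clip's segment-level predictions."""
    return Counter(segment_labels).most_common(1)[0][0]

def clip_level_accuracy(segment_labels_per_clip, true_labels):
    """Fraction of clips whose majority-vote label matches the true label."""
    correct = sum(clip_level_prediction(labels) == truth
                  for labels, truth in zip(segment_labels_per_clip, true_labels))
    return correct / len(true_labels)

# Toy example with three clips of three segments each.
segments = [["rock", "metal", "rock"],
            ["jazz", "blues", "blues"],
            ["pop", "disco", "disco"]]
truth = ["rock", "blues", "pop"]
accuracy = clip_level_accuracy(segments, truth)
```

Voting of this kind can smooth out isolated segment errors, which would be consistent with the clip-level column being higher than the other cross-validation column in every row of the table.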
    Slide 31/36 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  63. Research Group for
    Geometric Optimization
    and Machine Learning
    Results - Dictionary Size vs Performance
    [Figure: Performance with different dictionary sizes. x-axis: Dictionary Size (500 to 4000); y-axis: Performance (74 to 86); series: frame-level performance, clip-level performance]
    Slide 32/36 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  64. Research Group for
    Geometric Optimization
    and Machine Learning
    Results - Confusion Matrix
               blues classical country disco hiphop jazz metal pop reggae rock
    blues          8         0       0     1      0    0     0   0      0    1
    classical      0        10       0     0      0    0     0   0      0    0
    country        0         0      10     0      0    0     0   0      0    1
    disco          0         0       0     8      0    0     1   2      0    0
    hiphop         0         0       0     0     10    0     0   0      0    0
    jazz           0         0       0     0      0   10     0   0      0    0
    metal          0         0       0     0      0    0     8   0      0    0
    pop            0         0       0     1      0    0     0   8      1    0
    reggae         2         0       0     0      0    0     0   0      9    1
    rock           0         0       0     0      0    0     1   0      0    7
    Figure: Confusion matrix from the test runs (color scale 0 to 10)
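
Overall accuracy can be read off a confusion matrix as the diagonal sum divided by the total count. The sketch below applies this to the entries as they appear in this transcript; which axis holds the true labels is not stated on the slide, and the function names are illustrative.

```python
GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]

# Entries copied row by row from the slide transcript.
CONFUSION = [
    [8, 0, 0, 1, 0, 0, 0, 0, 0, 1],
    [0, 10, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 10, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 8, 0, 0, 1, 2, 0, 0],
    [0, 0, 0, 0, 10, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 10, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 8, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 8, 1, 0],
    [2, 0, 0, 0, 0, 0, 0, 0, 9, 1],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 7],
]

def overall_accuracy(matrix):
    """Diagonal sum (correct decisions) divided by the sum of all entries."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def column_sums(matrix):
    """Sum of each column, i.e. the number of items along the column axis."""
    return [sum(column) for column in zip(*matrix)]

accuracy = overall_accuracy(CONFUSION)                 # 88 of 100 on the diagonal
per_column = dict(zip(GENRES, column_sums(CONFUSION)))
```

Every column of the transcribed matrix sums to 10, which would be consistent with the columns holding the true genres of 10 test clips each.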
    Slide 33/36 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  65. Research Group for
    Geometric Optimization
    and Machine Learning
    Results - State-of-the-art accuracies
    Figure: State-of-the-art accuracies (bar chart, axis: Accuracy (%), 0 to 100). Methods compared:
    Tzanetakis et al. [TC02b]
    Panagakis et al. [PBK08]
    Yeh et al. [YY12]
    K-SVD + Histogram SVM
    Panagakis et al. [PKIA09]
    Chang et al. [CsRJI10]
    Slide 34/36 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  66. Research Group for
    Geometric Optimization
    and Machine Learning
    Conclusion
    K-SVD combined with an SVM using a histogram intersection
    kernel (HIK) performs comparably to other state-of-the-art techniques.
    A target sparsity of 1 gave the best results for this particular task.
    Learning sub-dictionaries for each class
    enhances the discriminative power of the encoding system.
    This technique works better when the dictionary is
    initialized with frames from the data.
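
The histogram intersection kernel compares two histograms by summing the bin-wise minima. A minimal sketch, assuming histograms are equal-length lists of non-negative values; the function names and toy histograms are illustrative and not taken from the project.

```python
def histogram_intersection(h1, h2):
    """Histogram intersection kernel: sum of the bin-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))

def gram_matrix(histograms):
    """Kernel value for every pair of histograms."""
    return [[histogram_intersection(a, b) for b in histograms] for a in histograms]

# Two toy atom-usage histograms, each normalized to sum to 1.
histograms = [[0.5, 0.5, 0.0], [0.25, 0.25, 0.5]]
K = gram_matrix(histograms)
```

A Gram matrix like `K` is what an SVM implementation that accepts precomputed kernels would consume.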
    Slide 35/36 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  67. Research Group for
    Geometric Optimization
    and Machine Learning
    Thank you.
    Slide 36/36 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  68. Research Group for
    Geometric Optimization
    and Machine Learning
    Sources I
    [1] Yeh, Chin-Chia Michael and Yang, Yi-Hsuan.
    Supervised dictionary learning for music genre classification.
    ACM, 2012. ISBN: 978-1-4503-1329-2.
    [2] Aharon, M., Elad, M. and Bruckstein, A.
    K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation.
    IEEE Transactions on Signal Processing.
    [3] Tzanetakis, G. and Cook, P.
    Musical genre classification of audio signals.
    IEEE Transactions on Speech and Audio Processing.
    Slide 1/8 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  69. Research Group for
    Geometric Optimization
    and Machine Learning
    Sources II
    [4] Maji, Subhransu, Berg, Alexander C. and Malik, Jitendra.
    Fast Intersection / Additive Kernel SVM Toolbox.
    [5] Course slides: Information retrieval in high dimensional data, WS1213 (image).
    [6] Sturm, Bob L.
    An Analysis of the GTZAN Music Genre Dataset.
    Proceedings of the Second International ACM Workshop on Music Information Retrieval
    with User-centered and Multimodal Strategies.
    Slide 2/8 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  70. Research Group for
    Geometric Optimization
    and Machine Learning
    Backup Slides
    Slide 3/8 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  71. Research Group for
    Geometric Optimization
    and Machine Learning
    Graphical User Interface
    Slide 4/8 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  72. Research Group for
    Geometric Optimization
    and Machine Learning
    K-SVD - Initialization Phase
    Initialization
    Set the dictionary matrix $D^{(0)} \in \mathbb{R}^{n \times K}$ with $\ell_2$-normalized
    columns. Set $J = 1$.
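
A minimal sketch of this initialization, assuming the K initial atoms are drawn at random from the training frames and scaled to unit l2 norm. The function name `init_dictionary_from_frames`, the fixed seed and the list-of-atoms storage (one list per column of D) are assumptions for illustration.

```python
import math
import random

def init_dictionary_from_frames(frames, n_atoms, seed=0):
    """Pick n_atoms distinct nonzero frames at random and scale each to unit l2 norm."""
    # All-zero frames cannot be normalized, so they are excluded up front.
    candidates = [frame for frame in frames if any(v != 0.0 for v in frame)]
    chosen = random.Random(seed).sample(candidates, n_atoms)
    atoms = []
    for frame in chosen:
        norm = math.sqrt(sum(v * v for v in frame))
        atoms.append([v / norm for v in frame])
    return atoms

# Toy example: four 2-dimensional frames, one of them all zero.
frames = [[3.0, 4.0], [0.0, 2.0], [0.0, 0.0], [1.0, 1.0]]
D0 = init_dictionary_from_frames(frames, n_atoms=2)
```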
    Slide 5/8 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  73. Research Group for
    Geometric Optimization
    and Machine Learning
    K-SVD - Sparse Coding Step
    Sparse Coding Step
    Use any pursuit algorithm to compute the representation vectors $x_i$
    for each example $y_i$, by approximating the solution of
    $\min_{x_i} \|y_i - D x_i\|_2^2 \quad \text{subject to} \quad \|x_i\|_0 \le T_0, \qquad i = 1, 2, \ldots, N. \qquad (5)$
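
As one example of a pursuit algorithm for problem (5), the sketch below uses plain matching pursuit: in each of at most T0 steps it picks the atom most correlated with the current residual. This is not necessarily the pursuit used in the project. Unit-norm atoms stored as a list of lists and the dict-based sparse code are assumptions for illustration.

```python
def matching_pursuit(D, y, T0):
    """Greedy sparse coding of y over unit-norm atoms D with at most T0 nonzeros.

    D is a list of atoms (columns of the dictionary), y is the signal.
    Returns a dict mapping atom index to coefficient.
    """
    residual = list(y)
    coefficients = {}
    for _ in range(T0):
        # Inner product of every atom with the current residual.
        correlations = [sum(d * r for d, r in zip(atom, residual)) for atom in D]
        k = max(range(len(D)), key=lambda j: abs(correlations[j]))
        if correlations[k] == 0.0:
            break  # residual is orthogonal to all atoms
        coefficients[k] = coefficients.get(k, 0.0) + correlations[k]
        residual = [r - correlations[k] * d for r, d in zip(residual, D[k])]
    return coefficients

# Toy example with two orthonormal atoms.
D = [[1.0, 0.0], [0.0, 1.0]]
code_sparsity_1 = matching_pursuit(D, [3.0, -4.0], T0=1)
code_sparsity_2 = matching_pursuit(D, [3.0, -4.0], T0=2)
```

With T0 = 1 the code keeps only the single best-matching atom, so each example is effectively assigned to one atom.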
    Slide 6/8 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  74. Research Group for
    Geometric Optimization
    and Machine Learning
    K-SVD - Codebook Update Step
    Codebook Update Step
    For each column $k = 1, 2, \ldots, K$ in $D^{(J-1)}$, update it as follows:
    Define the group of examples that use this atom,
    $\omega_k = \{ i \mid 1 \le i \le N,\ x_T^k(i) \neq 0 \}$,
    where $x_T^k$ is the $k$-th row of $X$.
    Compute the overall representation error matrix, $E_k$, by
    $E_k = Y - \sum_{j \neq k} d_j x_T^j \qquad (6)$
    Restrict $E_k$ by choosing only the columns corresponding to $\omega_k$, and
    obtain $E_k^R$.
    Apply the SVD $E_k^R = U \Delta V^T$. Choose the updated dictionary
    column $d_k$ to be the first column of $U$. Update the coefficient vector $x_R^k$
    to be the first column of $V$ multiplied by $\Delta(1, 1)$.
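
Why the SVD appears in this step: with all other atoms held fixed, updating atom k and its coefficients amounts to finding the rank-1 matrix closest to the restricted error matrix in Frobenius norm. By the Eckart-Young theorem that matrix is given by the leading singular triplet. A sketch of the argument, where $u_1$, $v_1$ and $\sigma_i$ are notation introduced here for the first columns of $U$ and $V$ and the singular values on the diagonal of $\Delta$:

```latex
\min_{d_k,\, x_R^k} \left\| E_k^R - d_k x_R^k \right\|_F^2
\quad \text{subject to} \quad \|d_k\|_2 = 1,
\qquad
E_k^R = U \Delta V^T
\;\Longrightarrow\;
d_k = u_1, \quad x_R^k = \Delta(1,1)\, v_1^T,
\qquad
\left\| E_k^R - d_k x_R^k \right\|_F^2 = \sum_{i \ge 2} \sigma_i^2 .
```

Because only the columns in $\omega_k$ are touched, the update cannot introduce new nonzero coefficients, so the sparsity constraint from the sparse coding step is preserved.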
    Slide 7/8 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013

  75. Research Group for
    Geometric Optimization
    and Machine Learning
    K-SVD - Update Iteration Step
    Increase Iteration Step
    Set J = J + 1
    Slide 8/8 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013
