
Dictionary Learning for Music Genre Recognition


Source Code: https://github.com/mfcabrera/wtg
As a mandatory interdisciplinary project for my M.Sc. at TUM, we worked in the Machine Learning and Geometric Optimization group, implementing a system for Music Genre Recognition using K-SVD and SVM.

Miguel Cabrera

October 25, 2013

Transcript

  1. Research Group for Geometric Optimization and Machine Learning: Music Genre Recognition using Dictionary Learning

    Interdisciplinary Project. Miguel Cabrera, Thomas Pieronczyk. Research Group for Geometric Optimization and Machine Learning. October 25, 2013. Slide 1/36 | Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013
  2. Table of contents

    Introduction, Framework, Experiments, Results, Conclusion.
  3. Introduction

    Music Information Retrieval (MIR): artist, instrument and chord recognition; music annotation (tagging); mood and genre classification. Focus here: Music Genre Recognition (MGR).
  4. MGR Challenges

    High dimensional. No formal definition. Highly subjective. One song → many genres. New genres appear constantly.
  5. Objectives

    Main objective: a system that predicts the musical genre of a piece of music. Combination of Yeh and Yang's Dictionary Learning Framework for Music Genre Recognition (MGR) [1] with the K-SVD algorithm from Aharon et al. [2].
  6. Framework

    Figure: Framework (Source: [1]). Training pipeline: training songs → audio-signal transformation & feature extraction → codebook generation → encoding → code word encoding aggregation → training. Testing pipeline: test song → audio-signal transformation & feature extraction → encoding → code word encoding aggregation → prediction. Further diagram labels: ground truth, "if supervised", codebook.
  7. Framework

    Figure: Framework - Audio Feature Extraction (Source: [1]).
  8. Features for musical genre recognition

    Meta-data features. Short-time audio features.
  9. Short-Time Audio Features - Spectrogram

    Figures: Spectrogram (Classical); Spectrogram (Rock).
  10. Short-Time Audio Features - Constant Q Transformation

    Figures: Constant Q Transform (Classical); Constant Q Transform (Rock).
  11. Framework

    Figure: Framework - Codebook Generation (Source: [1]).
  12. Dictionary Learning

    Given an input signal vector $y \in \mathbb{R}^n$, the sparse representation problem can be mathematically formulated as:
    $$x^* = \arg\min_x \tfrac{1}{2}\|y - Dx\|_2^2 + \lambda \|x\|_1 \quad (1)$$
    Figure source: [5].
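As a small worked check of objective (1), the sketch below evaluates the objective for a given code vector. It also computes the minimiser by soft-thresholding under the simplifying assumption that D is the identity matrix, which is not the case for a learned dictionary. The function names are illustrative and not taken from the project code.

```python
def lasso_objective(y, D, x, lam):
    """Evaluate 0.5 * ||y - D x||_2^2 + lam * ||x||_1.

    D is stored as n rows, each holding K entries (one per atom).
    """
    residual = [yi - sum(dik * xk for dik, xk in zip(row, x))
                for yi, row in zip(y, D)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(xk) for xk in x)


def soft_threshold(y, lam):
    """Closed-form minimiser of the objective when D is the identity (assumption)."""
    return [(1.0 if yi > 0 else -1.0) * max(abs(yi) - lam, 0.0) for yi in y]


y = [3.0, -0.5, 1.0]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x_star = soft_threshold(y, lam=1.0)
print(x_star, lasso_objective(y, identity, x_star, lam=1.0))
```

Entries of y whose magnitude is at most λ are set to zero, which is how the l1 penalty produces sparse codes.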
  13. K-SVD Dictionary Learning Algorithm

    $$\min_{D,X} \|Y - DX\|_F^2 \quad \text{subject to} \quad \forall i,\ \|x_i\|_0 \le T_0 \quad (2)$$
    ⇓ K-SVD is a generalization of the K-Means algorithm.
  14. K-SVD - Algorithm

    Initialization: initialize the dictionary $D \in \mathbb{R}^{n \times K}$. Sparse coding step: sparse-code the examples based on the current dictionary. Codebook update step: update the dictionary atoms to better fit the data.
  15. Dictionary Learning for MGR

    Dictionaries are trained separately per genre. The resulting dictionary is the concatenation of all the separately trained dictionaries:
    $$D = [D_1, D_2, D_3, \ldots, D_c] \quad (3)$$
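The concatenation in (3) can be sketched as follows. This is a minimal illustration, assuming each per-genre dictionary is stored as n rows with one entry per atom; the helper name and the toy values are hypothetical and not taken from the project code.

```python
def concatenate_dictionaries(genre_dicts):
    """Stack per-genre dictionaries column-wise into D = [D_1, ..., D_c].

    genre_dicts maps a genre name to a dictionary stored as n rows.
    Returns (D, atom_genres), where atom_genres[k] names the genre whose
    sub-dictionary contributed column k of D.
    """
    genres = list(genre_dicts)
    n = len(genre_dicts[genres[0]])
    D = [[] for _ in range(n)]
    atom_genres = []
    for genre in genres:
        sub = genre_dicts[genre]
        if len(sub) != n:
            raise ValueError("all sub-dictionaries must share the feature dimension n")
        for row, sub_row in zip(D, sub):
            row.extend(sub_row)
        atom_genres.extend([genre] * len(sub[0]))
    return D, atom_genres


# Toy example: two atoms learned for "blues", one for "rock" (made-up values).
toy = {"blues": [[1.0, 0.0], [0.0, 1.0]], "rock": [[0.6], [0.8]]}
D, atom_genres = concatenate_dictionaries(toy)
```

Keeping the column-to-genre mapping alongside D makes it possible to see which genre's atoms a frame activates.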
  17. Framework

    Figure: Framework - Encoding (Source: [1]).
  18. Encoding

    After codebook generation, the training data is re-encoded with the concatenated dictionary $D = [D_1, D_2, D_3, \ldots, D_c]$, resulting in a sparse representation.
  19. Framework

    Figure: Framework - Aggregation (Source: [1]).
  20. Histogram Aggregation

    We aggregate the encoded frames into "texture windows" [3]. Texture window: the minimum amount of time necessary to identify a particular music "texture". Window size: 3-5 seconds. Each song is represented as a bag of histograms, i.e. 6 histograms per song. The histograms inherit the label of their song.
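A minimal sketch of the aggregation step follows. The deck does not say how a histogram is computed from the sparse codes, so this version assumes non-overlapping windows, counts how often each atom has a non-zero coefficient inside a window, and normalises the counts to sum to one. All names are illustrative.

```python
def texture_window_histograms(frame_codes, n_atoms, frames_per_window):
    """Turn per-frame sparse codes into one atom-usage histogram per window.

    frame_codes: list of coefficient vectors of length n_atoms, one per frame.
    Assumption: windows do not overlap and trailing frames that do not fill
    a whole window are dropped.
    """
    histograms = []
    last_start = len(frame_codes) - frames_per_window
    for start in range(0, last_start + 1, frames_per_window):
        counts = [0.0] * n_atoms
        for code in frame_codes[start:start + frames_per_window]:
            for k, coeff in enumerate(code):
                if coeff != 0:
                    counts[k] += 1.0
        total = sum(counts)
        histograms.append([c / total for c in counts] if total else counts)
    return histograms


# Six toy frames over three atoms, grouped into windows of three frames.
codes = [[1.0, 0, 0], [0, 2.0, 0], [0.5, 0, 0],
         [0, 0, 1.0], [0, 0, -1.0], [0, 0, 3.0]]
hists = texture_window_histograms(codes, n_atoms=3, frames_per_window=3)
```

Each returned histogram would then carry the genre label of the song it came from.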
  25. Framework

    Figure: Framework - Aggregation (Source: [1]).
  26. SVM with Histogram Intersection Kernel

    We use a Support Vector Machine for the classification step, with the Histogram Intersection Kernel
    $$K_{HI}(h_a, h_b) = \sum_{j=1}^{k} \min(h_a(j), h_b(j))$$
    which measures the degree of similarity between two histograms. Computationally comparable with a linear SVM. Works better than linear and non-linear SVMs for histogram features. Implementation (http://www.cs.berkeley.edu/~smaji/projects/fiksvm/) based on the popular LIBSVM for MATLAB.
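The kernel above is simple enough to write out directly. The sketch below is a plain-Python illustration of the formula and of the Gram matrix that a kernel SVM would consume; it does not reproduce the MATLAB toolbox used in the project, and the function names are illustrative.

```python
def histogram_intersection(h_a, h_b):
    """K_HI(h_a, h_b) = sum_j min(h_a(j), h_b(j))."""
    if len(h_a) != len(h_b):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h_a, h_b))


def gram_matrix(histograms):
    """Pairwise kernel values for a list of histograms."""
    return [[histogram_intersection(a, b) for b in histograms]
            for a in histograms]


h1, h2 = [2, 1, 0], [1, 3, 4]
print(histogram_intersection(h1, h2))  # prints 2
```

The kernel value of a histogram with itself is the sum of its bins, so L1-normalised histograms have self-similarity 1.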
  32. Experiments
  33. Experiments Summary I

    Data: GTZAN dataset comprising 1000 songs of 30 sec. length, equally divided into 10 genres: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, and rock. It is one of the most frequently used datasets in MGR, but it exhibits several problems such as replications, mislabelings, and distortions [6]. Features: CQT; spectrogram. Features are normalized.
  41. Experiments Summary II

    Dictionary learning. Initialization: random and from data. Dictionary size: 50-400. Target sparsity: 1-3. Pursuit algorithm: Orthogonal Matching Pursuit (OMP). Classification. Parameter selection: experimentally, using 10-fold cross-validation. Performance measures: accuracy at histogram and clip level.
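The two performance measures can be illustrated with a short sketch. The deck does not state how the histogram predictions of a clip are combined into a clip-level prediction; a majority vote is assumed here, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def clip_prediction(histogram_predictions):
    """Majority vote over the predicted labels of one clip's histograms (assumption)."""
    return Counter(histogram_predictions).most_common(1)[0][0]


def accuracies(clips):
    """clips: list of (true_label, [predicted label per histogram]).

    Returns (histogram-level accuracy, clip-level accuracy).
    """
    hist_total = hist_correct = clip_correct = 0
    for true_label, preds in clips:
        hist_total += len(preds)
        hist_correct += sum(p == true_label for p in preds)
        clip_correct += clip_prediction(preds) == true_label
    return hist_correct / hist_total, clip_correct / len(clips)


toy_clips = [
    ("rock", ["rock", "rock", "metal", "rock", "blues", "rock"]),
    ("jazz", ["jazz", "blues", "blues", "jazz", "jazz", "jazz"]),
    ("pop", ["disco", "disco", "pop", "disco", "pop", "disco"]),
]
hist_acc, clip_acc = accuracies(toy_clips)
```

In this toy data the vote corrects isolated histogram errors, so the clip-level accuracy (2 of 3 clips) is higher than the histogram-level accuracy (10 of 18 histograms).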
  49. Data Partitioning

    Two partitioning schemes. 90-10: 90% is used for dictionary and SVM training; the remaining 10% is encoded with the learned dictionary and used as the test set. Full data: 100% is used for dictionary and SVM training; performance is evaluated with 10-fold cross-validation; this scheme is the one used in the literature.
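Both schemes can be sketched at the level of song indices. The deck does not say whether the 90-10 split was stratified by genre or how folds were drawn, so the per-genre hold-out and the shuffled folds below are assumptions, and the function names are illustrative.

```python
import random


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Return (train_idx, test_idx), holding out test_fraction of each genre."""
    rng = random.Random(seed)
    by_genre = {}
    for idx, genre in enumerate(labels):
        by_genre.setdefault(genre, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_genre.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = random.Random(seed)
    order = list(range(n_items))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


genres = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]
labels = [g for g in genres for _ in range(100)]  # GTZAN-like layout
train_idx, test_idx = stratified_split(labels)
folds = list(k_fold_indices(len(labels), k=10))
```

With 100 songs per genre, the stratified split holds out 10 songs per genre, giving 900 training and 100 test songs.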
  57. Results
  58. Results - Dictionary Update Iterations

    $\|Y - DX\|_F^2$ for $J = 1, 2, \ldots, k$ (4)
  59. Encoding - Atom Usage: Blues

    Figure: Atom usage counts. Axis labels: Dictionaries (blues, classical, country, disco, hip hop, jazz, metal, pop, reggae, rock) and Atom counts (0 to 1200).
  60. Encoding - Atom Usage: Rock

    Figure: Atom usage counts. Axis labels: Dictionaries (blues, classical, country, disco, hip hop, jazz, metal, pop, reggae, rock) and Atom counts (0 to 1100).
  61. Results - Histogram Level

    Table: Results summary using 90-10% split and normalized spectrogram.

    Dictionary size   Target sparsity   Cross-validation performance (%)   Test set performance (%)
    500               1                 75.96                              62.83
    1000              1                 80.20                              63.40
    2000              1                 81.69                              65.67
    3000              1                 84.40                              60.00
    4000              1                 77.61                              65.16
    500               2                 65.10                              53.66
    1000              2                 68.88                              60.00
    2000              2                 74.55                              61.00
    3000              2                 76.29                              62.3
    4000              2                 76.29                              62.3
    500               3                 65.16                              54.87
    1000              3                 54.87                              56.16
    2000              3                 71.81                              58.83
    3000              3                 72.10                              59.0
    4000              3                 74.22                              59.3
  62. Results - Clip Level

    Table: Results using full data and cross-validation with normalized spectrogram.

    Dictionary size   Target sparsity   Cross-validation performance (%)   Cross-validation performance, clip level (%)
    500               1                 75.02                              79.53
    1000              1                 78.05                              81.50
    2000              1                 82.11                              84.26
    3000              1                 83.40                              85.10
    4000              1                 83.23                              85.12
  63. Results - Dictionary Size vs Performance

    Figure: Performance with different dictionary sizes. Axis labels: Dictionary Size (500 to 4000) and Performance (74 to 86). Series: frame level performance, clip level performance.
  64. Results - Confusion Matrix

    Figure: Confusion matrix from the test runs (cell values in the order they appear in the figure; color scale 0 to 10).

                blues  classical  country  disco  hiphop  jazz  metal  pop  reggae  rock
    blues           8          0        0      1       0     0      0    0       0     1
    classical       0         10        0      0       0     0      0    0       0     0
    country         0          0       10      0       0     0      0    0       0     1
    disco           0          0        0      8       0     0      1    2       0     0
    hiphop          0          0        0      0      10     0      0    0       0     0
    jazz            0          0        0      0       0    10      0    0       0     0
    metal           0          0        0      0       0     0      8    0       0     0
    pop             0          0        0      1       0     0      0    8       1     0
    reggae          2          0        0      0       0     0      0    0       9     1
    rock            0          0        0      0       0     0      1    0       0     7
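As a quick check of what this confusion matrix implies, the sketch below computes the overall accuracy as the diagonal sum divided by the total count. It assumes the 100 cell values extracted from the figure form a 10×10 grid read row by row; the result does not depend on which axis holds the true genre.

```python
# Cell values as extracted from the figure, read as a 10 x 10 grid (assumption).
# Genre order: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock.
confusion = [
    [8, 0, 0, 1, 0, 0, 0, 0, 0, 1],
    [0, 10, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 10, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 8, 0, 0, 1, 2, 0, 0],
    [0, 0, 0, 0, 10, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 10, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 8, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 8, 1, 0],
    [2, 0, 0, 0, 0, 0, 0, 0, 9, 1],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 7],
]


def overall_accuracy(matrix):
    """Fraction of items on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


print(overall_accuracy(confusion))  # prints 0.88
```

The matrix covers 100 test clips, of which 88 lie on the diagonal.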
  65. Results - State-of-the-art accuracies

    Figure: State-of-the-art accuracies. Accuracy (%), 0 to 100, for: Tzanetakis et al. [TC02b], Panagakis et al. [PBK08], Yeh et al. [YY12], K-SVD + Histogram SVM, Panagakis et al. [PKIA09], Chang et al. [CsRJI10].
  66. Conclusion

    K-SVD in combination with an SVM using the histogram intersection kernel (HIK) performs comparably to other state-of-the-art techniques. Sparsity 1 is the best set-up for this particular task. Learning a number of sub-dictionaries for each class enhances the discriminative power of the encoding system. This technique works better when the dictionary is initialized with frames from the data.
  67. Thank you.
  68. Sources I

    [1] Yeh, Chin-Chia Michael and Yang, Yi-Hsuan. Supervised dictionary learning for music genre classification. ACM, 2012. ISBN: 978-1-4503-1329-2.
    [2] Aharon, M., Elad, M. and Bruckstein, A. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Transactions on Signal Processing.
    [3] Tzanetakis, G. and Cook, P. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing.
  69. Sources II

    [4] Maji, Subhransu, Berg, Alexander C. and Malik, Jitendra. Fast Intersection / Additive Kernel SVM Toolbox.
    [5] Course slides: Information retrieval in high dimensional data, WS1213 (image).
    [6] Sturm, Bob L. An Analysis of the GTZAN Music Genre Dataset. Proceedings of the Second International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies.
  70. Backup Slides
  71. Graphical User Interface
  72. K-SVD - Initialization Phase

    Initialization: set the dictionary matrix $D^{(0)} \in \mathbb{R}^{n \times K}$ with $\ell_2$-normalized columns. Set $J = 1$.
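The initialization step can be sketched as below. This version draws the atoms from the data, one of the two initializations the experiments mention, and scales each to unit l2 norm. The function name, the seed parameter, and the choice to skip all-zero frames are assumptions made for the illustration.

```python
import math
import random


def init_dictionary_from_data(frames, n_atoms, seed=0):
    """Pick n_atoms distinct non-zero frames at random and l2-normalise them.

    frames: list of feature vectors (one per audio frame).
    Returns the dictionary as a list of atoms (the columns of D).
    """
    rng = random.Random(seed)
    candidates = [f for f in frames if any(v != 0 for v in f)]
    chosen = rng.sample(candidates, n_atoms)
    atoms = []
    for frame in chosen:
        norm = math.sqrt(sum(v * v for v in frame))
        atoms.append([v / norm for v in frame])
    return atoms


toy_frames = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
atoms = init_dictionary_from_data(toy_frames, n_atoms=2)
```

A random initialization would replace the sampling step with random vectors, followed by the same normalization.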
  73. K-SVD - Sparse Coding Step

    Use any pursuit algorithm to compute the representation vectors $x_i$ for each example $y_i$, by approximating the solution of
    $$\min_{x_i} \|y_i - D x_i\|_2^2 \quad \text{subject to} \quad \|x_i\|_0 \le T_0, \qquad i = 1, 2, \ldots, N \quad (5)$$
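One pursuit algorithm that fits this step is Orthogonal Matching Pursuit, which the experiments name. The sketch below is a small plain-Python version for illustration, assuming unit-norm atoms stored as a list of columns and solving the least-squares refit through the normal equations. It is not the implementation used in the project, and the helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_linear(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    sol = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(M[r][c] * sol[c] for c in range(r + 1, m))
        sol[r] = (M[r][m] - tail) / M[r][r]
    return sol


def omp(atoms, y, t0):
    """Greedy approximation of min ||y - D x||_2^2 subject to ||x||_0 <= t0."""
    support, coeffs = [], []
    residual = list(y)
    for _ in range(min(t0, len(atoms))):
        # Select the unused atom most correlated with the current residual.
        k = max((j for j in range(len(atoms)) if j not in support),
                key=lambda j: abs(dot(atoms[j], residual)))
        if abs(dot(atoms[k], residual)) < 1e-12:
            break  # residual is already explained numerically
        support.append(k)
        # Refit all selected coefficients by least squares (normal equations).
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], y) for i in support]
        coeffs = solve_linear(gram, rhs)
        residual = [yi - sum(c * atoms[j][i] for c, j in zip(coeffs, support))
                    for i, yi in enumerate(y)]
    x = [0.0] * len(atoms)
    for c, j in zip(coeffs, support):
        x[j] = c
    return x


s = 2 ** -0.5
toy_atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
x1 = omp(toy_atoms, [2.0, 2.0, 0.1], t0=1)
x2 = omp(toy_atoms, [2.0, 2.0, 0.1], t0=2)
```

With target sparsity 1 the loop runs once, so the code reduces to picking the single best-matching atom and its projection coefficient.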
  74. K-SVD - Codebook Update Step

    For each column $k = 1, 2, \ldots, K$ in $D^{(J-1)}$, update it as follows. Define the group of examples that use this atom,
    $$\omega_k = \{\, i \mid 1 \le i \le N,\ x_T^k(i) \ne 0 \,\}.$$
    Compute the overall representation error matrix $E_k$ by
    $$E_k = Y - \sum_{j \ne k} d_j x_T^j \quad (6)$$
    Restrict $E_k$ by choosing only the columns corresponding to $\omega_k$, and obtain $E_k^R$. Apply the SVD decomposition $E_k^R = U \Delta V^T$. Choose the updated dictionary column $\tilde{d}_k$ to be the first column of $U$. Update the coefficient vector $x_R^k$ to be the first column of $V$ multiplied by $\Delta(1, 1)$.
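The update of a single atom can be sketched in plain Python as below. To keep the example free of external libraries, a power iteration stands in for the SVD: it approximates only the leading singular pair, which is all this step uses. Matrices are stored as lists of rows, and the function names are illustrative rather than taken from the project code.

```python
import math


def leading_rank1(E, iters=200):
    """Approximate E (n rows, m columns) by d x^T with ||d||_2 = 1.

    Power iteration stands in for a library SVD: d approximates the first
    column of U, and x the first column of V multiplied by Delta(1, 1).
    """
    n, m = len(E), len(E[0])
    # Start from the column of largest norm so the start vector is non-zero.
    start = max(range(m), key=lambda j: sum(E[i][j] ** 2 for i in range(n)))
    d = [E[i][start] for i in range(n)]
    norm = math.sqrt(sum(v * v for v in d))
    if norm == 0.0:
        return None  # E is all zeros, nothing to fit
    d = [v / norm for v in d]
    for _ in range(iters):
        x = [sum(E[i][j] * d[i] for i in range(n)) for j in range(m)]
        d = [sum(E[i][j] * x[j] for j in range(m)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
    x = [sum(E[i][j] * d[i] for i in range(n)) for j in range(m)]
    return d, x


def update_atom(Y, D, X, k):
    """Update atom k of D and row k of X in place (Y: n x N, D: n x K, X: K x N)."""
    n, N, K = len(Y), len(Y[0]), len(D[0])
    omega = [i for i in range(N) if X[k][i] != 0]
    if not omega:
        return  # atom unused: left unchanged in this sketch
    # Restricted error matrix: residual without atom k, columns in omega only.
    E_R = [[Y[r][i] - sum(D[r][j] * X[j][i] for j in range(K) if j != k)
            for i in omega] for r in range(n)]
    result = leading_rank1(E_R)
    if result is None:
        return
    d, x = result
    for r in range(n):
        D[r][k] = d[r]
    for value, i in zip(x, omega):
        X[k][i] = value


# Toy data: two examples lie along the direction (2, 1); the third is zero.
Y = [[2.0, 4.0, 0.0], [1.0, 2.0, 0.0]]
D = [[1.0], [0.0]]
X = [[1.0, 2.0, 0.0]]
update_atom(Y, D, X, 0)
```

Because only the columns in ω_k are touched, coefficients that were zero stay zero and the sparsity pattern of X is preserved.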
  75. K-SVD - Update Iteration Step

    Increase the iteration step: set $J = J + 1$.