Source Code: https://github.com/mfcabrera/wtg
As a mandatory interdisciplinary project for my M.Sc. at TUM, we worked with the Machine Learning and Geometric Optimization group, implementing a system for music genre recognition using K-SVD and SVM.
Research Group for Geometric Optimization and Machine Learning
Music Genre Recognition using Dictionary Learning
Interdisciplinary Project
Miguel Cabrera, Thomas Pieronczyk
October 25, 2013
Interdisciplinary Project | Music Genre Recognition using Dictionary Learning | July 2013
Table of contents
- Introduction
- Framework
- Experiments
- Results
- Conclusion
Introduction
Music Information Retrieval (MIR) includes tasks such as:
- Artist, instrument and chord recognition
- Music annotation (tagging)
- Mood and genre classification
Focus of this project: Music Genre Recognition (MGR)
MGR Challenges
- High-dimensional data
- No formal definition of genre
- Highly subjective
- One song can belong to many genres
- New genres constantly appear
Objectives
Main objective: a system that predicts the musical genre of a piece of music.
Approach: combine Yeh and Yang's dictionary learning framework for Music Genre Recognition (MGR) [1] with the K-SVD algorithm of Aharon et al. [2].
Framework
Figure: Framework (Source: [1]).
Training: training songs → audio-signal transformation & feature extraction → codebook generation → encoding → code-word encoding aggregation → training (ground truth provided if supervised).
Testing: test song → audio-signal transformation & feature extraction → encoding with the codebook → code-word encoding aggregation → prediction.
Framework: Audio Feature Extraction
Figure: Framework - Audio Feature Extraction (Source: [1])
Features for musical genre recognition
- Meta-data features
- Short-time audio features
Short-Time Audio Features: Spectrogram
Figure: Spectrogram - Classical. Figure: Spectrogram - Rock.
Short-Time Audio Features: Constant-Q Transform (CQT)
Figure: Constant-Q Transform - Classical. Figure: Constant-Q Transform - Rock.
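The slides do not say how the CQT was computed. As a rough illustration, the sketch below evaluates the constant-Q definition directly with NumPy for a single frame: bin k has centre frequency f_k = f_min · 2^(k/b), and its window length shrinks with frequency so that the ratio Q of centre frequency to bandwidth stays constant. The function name `naive_cqt`, the Hamming window, and the parameters `f_min` and `bins_per_octave` are assumptions for illustration; practical implementations use much faster kernel-based algorithms.

```python
import numpy as np

def naive_cqt(x, fs, f_min, n_bins, bins_per_octave=12):
    """Direct constant-Q transform of one frame (slow, for illustration only).

    Bin k has centre frequency f_k = f_min * 2**(k / bins_per_octave) and a
    window of N_k = Q * fs / f_k samples, so every bin spans the same number
    of cycles (constant ratio Q = f_k / bandwidth).
    """
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    mags = np.zeros(n_bins)
    for k, f_k in enumerate(freqs):
        n_k = int(round(Q * fs / f_k))   # long windows at low, short at high frequencies
        if n_k > len(x):
            raise ValueError("frame is too short for bin %d" % k)
        n = np.arange(n_k)
        kernel = np.hamming(n_k) * np.exp(-2j * np.pi * Q * n / n_k)
        mags[k] = np.abs(np.dot(x[:n_k], kernel)) / n_k
    return freqs, mags
```

For a 1-second, 440 Hz sine sampled at 8 kHz with f_min = 110 Hz, the largest magnitude falls in bin 24 (110 · 2² = 440 Hz), two octaves above f_min.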
Framework: Codebook Generation
Figure: Framework - Codebook Generation (Source: [1])
Dictionary Learning
Given an input signal vector $y \in \mathbb{R}^n$ and a dictionary $D$, the sparse representation problem can be formulated as:
$$x^* = \arg\min_x \frac{1}{2}\|y - Dx\|_2^2 + \lambda \|x\|_1 \quad (1)$$
Figure source: [5]
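Equation (1) is the ℓ1-regularized (lasso) form of sparse coding. One standard way to solve it, not necessarily the one used in this project (the experiments use OMP), is the iterative shrinkage-thresholding algorithm (ISTA): a gradient step on the quadratic term followed by soft-thresholding. A minimal NumPy sketch, with hypothetical function names and an assumed fixed iteration count:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink every entry towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(y, D, lam, n_iter=500):
    """Approximately solve (1): argmin_x 0.5*||y - D x||_2^2 + lam*||x||_1."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the quadratic term's gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)           # gradient of 0.5*||y - D x||^2
        x = soft_threshold(x - grad / L, lam / L)
    return x

def objective(y, D, x, lam):
    """Value of the cost function in (1)."""
    return 0.5 * np.sum((y - D @ x) ** 2) + lam * np.sum(np.abs(x))
```

With D equal to the identity, the solution is simply y soft-thresholded by λ: y = (3, 0.5, −2) and λ = 1 give x* = (2, 0, −1).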
K-SVD Dictionary Learning Algorithm
$$\min_{D,X} \|Y - DX\|_F^2 \quad \text{subject to} \quad \forall i,\ \|x_i\|_0 \le T_0 \quad (2)$$
K-SVD is a generalization of the K-Means algorithm.
K-SVD Algorithm
- Initialization: initialize the dictionary $D \in \mathbb{R}^{n \times K}$.
- Sparse coding step: sparse-code the examples using the current dictionary.
- Codebook update step: update the dictionary atoms to better fit the data.
Dictionary Learning for MGR
- Dictionaries are trained separately for each genre.
- The resulting dictionary is the concatenation of all the separately trained dictionaries:
$$D = [D_1, D_2, D_3, \dots, D_c] \quad (3)$$
Framework: Encoding
Figure: Framework - Encoding (Source: [1])
Encoding
After codebook generation, the training data is re-encoded with the concatenated dictionary $D = [D_1, D_2, D_3, \dots, D_c]$, resulting in a sparse representation.
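A sketch of the encoding step, assuming unit-norm atoms and a target sparsity T0 = 1 (the setting the conclusion reports as best). In that case OMP reduces to picking, for each frame, the atom with the largest absolute correlation. The helper names, and the `owner` array that maps each atom back to the genre sub-dictionary it came from, are hypothetical:

```python
import numpy as np

def normalize_columns(D):
    """Scale every atom (column) to unit l2 norm."""
    return D / np.linalg.norm(D, axis=0, keepdims=True)

def concatenate_dictionaries(sub_dicts):
    """Stack per-genre dictionaries D_1..D_c side by side, as in (3), and record
    which genre (sub-dictionary index) each atom comes from."""
    D = np.hstack(sub_dicts)
    owner = np.concatenate([np.full(d.shape[1], c) for c, d in enumerate(sub_dicts)])
    return D, owner

def encode_sparsity_one(Y, D):
    """Encode every frame (column of Y) with a single atom of D.

    With unit-norm atoms, OMP at target sparsity T0 = 1 picks the atom with the
    largest |<d_k, y>| and uses <d_k, y> as its coefficient.
    """
    corr = D.T @ Y                             # (K atoms, N frames)
    best = np.argmax(np.abs(corr), axis=0)     # chosen atom per frame
    X = np.zeros_like(corr)
    cols = np.arange(Y.shape[1])
    X[best, cols] = corr[best, cols]
    return X, best
```

The `owner` array is what makes it possible to tell, per frame, which genre's sub-dictionary was used.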
Framework: Aggregation
Figure: Framework - Aggregation (Source: [1])
Histogram Aggregation
- The encoded frames are aggregated into “texture windows” [3].
- Texture window: the minimum amount of time necessary to identify a particular music “texture”.
- Window size: 3-5 seconds.
- Each song is represented as a bag of histograms (6 histograms per song).
- The histograms inherit the song's label.
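The slides do not specify the histogram bins exactly; a common choice, assumed here, is to count how often each atom is used (nonzero coefficient) inside a texture window and to normalize the counts. The function name and the `frames_per_window` parameter are hypothetical, since the frame rate is not given:

```python
import numpy as np

def texture_window_histograms(X, frames_per_window, normalize=True):
    """Turn sparse codes X (K atoms x N frames) into one atom-usage histogram per
    non-overlapping texture window; trailing frames that do not fill a window are dropped."""
    K, N = X.shape
    n_windows = N // frames_per_window
    hists = np.zeros((n_windows, K))
    for w in range(n_windows):
        block = X[:, w * frames_per_window:(w + 1) * frames_per_window]
        hists[w] = np.count_nonzero(block, axis=1)   # how often each atom is used
    if normalize:
        totals = hists.sum(axis=1, keepdims=True)
        hists = hists / np.where(totals == 0, 1.0, totals)
    return hists
```

With 30-second clips and 5-second windows, this yields the 6 histograms per song mentioned above; each histogram is then given its song's genre label.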
Framework: Aggregation
Figure: Framework - Aggregation (Source: [1])
SVM with Histogram Intersection Kernel
We use a Support Vector Machine (SVM) for the classification step.
Histogram Intersection Kernel (HIK):
$$K_{HI}(h_a, h_b) = \sum_{j=1}^{k} \min\big(h_a(j), h_b(j)\big)$$
- Measures the degree of similarity between two histograms.
- Computationally comparable to a linear SVM.
- Works better than linear and other non-linear kernels for histogram features.
- Implementation [4] (footnote 1), based on the popular LIBSVM for MATLAB.
1 http://www.cs.berkeley.edu/~smaji/projects/fiksvm/
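A direct NumPy sketch of the histogram intersection kernel above, computing the full Gram matrix between two sets of histograms stored as rows. It is a brute-force illustration with a hypothetical function name, not the fast evaluation provided by the toolbox in [4]:

```python
import numpy as np

def histogram_intersection_kernel(A, B):
    """Gram matrix K[a, b] = sum_j min(A[a, j], B[b, j]) for histograms stored as rows."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Broadcast to (n_A, n_B, k) and sum the element-wise minima over the bins.
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)
```

In scikit-learn, such a matrix can be passed to `SVC(kernel="precomputed")`: `fit` takes the n_train × n_train Gram matrix of the training histograms, and `predict` takes the n_test × n_train matrix between test and training histograms.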
Experiments
Experiments Summary I
Data
- GTZAN dataset: 1000 songs of 30 seconds each, equally divided into 10 genres (blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, and rock).
- One of the most frequently used datasets in MGR.
- However, it has several known problems, such as replications, mislabelings, and distortions [6].
Features
- CQT
- Spectrogram
- Features are normalized.
Experiments Summary II
Dictionary Learning
- Initialization: random and from data
- Dictionary size: 50-400
- Target sparsity: 1-3
- Pursuit algorithm: Orthogonal Matching Pursuit (OMP)
Classification
- Parameter selection: experimentally, using 10-fold cross-validation
- Performance measures: accuracy at histogram and clip level
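The slides report accuracy at histogram and clip level but do not say how histogram predictions are combined into a clip prediction. A minimal sketch assuming simple majority voting over each song's histograms (ties broken towards the smallest label); the function names are hypothetical:

```python
import numpy as np

def clip_level_predictions(hist_pred, song_ids):
    """Majority vote over the histogram-level predictions of each song."""
    votes = {}
    for s in np.unique(song_ids):
        labels, counts = np.unique(hist_pred[song_ids == s], return_counts=True)
        votes[s] = labels[np.argmax(counts)]   # np.unique sorts, so ties go to the smallest label
    return votes

def accuracies(hist_pred, hist_true, song_ids):
    """Return (histogram-level accuracy, clip-level accuracy)."""
    hist_acc = np.mean(hist_pred == hist_true)
    clip_pred = clip_level_predictions(hist_pred, song_ids)
    # Every histogram inherits its song's label, so the first one per song suffices.
    clip_true = {s: hist_true[song_ids == s][0] for s in clip_pred}
    clip_acc = np.mean([clip_pred[s] == clip_true[s] for s in clip_pred])
    return hist_acc, clip_acc
```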
Data Partitioning
Two partitioning schemes:
90-10
- 90% used for dictionary and SVM training.
- 10% encoded with the learned dictionary and used as the test set.
Full data
- 100% used for dictionary and SVM training.
- Performance evaluated with 10-fold cross-validation.
- This is the scheme used in the literature.
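A sketch of how the folds could be drawn, assuming (this is not stated in the slides) that folds are formed per song and stratified by genre, so that all histograms of a song land in the same fold. Taking one fold as the test set gives the 90-10 split; rotating over all ten folds gives 10-fold cross-validation. Names are hypothetical:

```python
import numpy as np

def stratified_song_folds(song_labels, n_folds=10, seed=0):
    """Assign every song (not every histogram) to one of n_folds folds,
    spreading each genre evenly across the folds."""
    rng = np.random.default_rng(seed)
    song_labels = np.asarray(song_labels)
    fold_of_song = np.empty(len(song_labels), dtype=int)
    for label in np.unique(song_labels):
        songs = rng.permutation(np.flatnonzero(song_labels == label))
        for f, chunk in enumerate(np.array_split(songs, n_folds)):
            fold_of_song[chunk] = f
    return fold_of_song

# Example: GTZAN-sized setup, 10 genres x 100 songs, 6 histograms per song.
labels = np.repeat(np.arange(10), 100)
folds = stratified_song_folds(labels)
hist_fold = folds[np.repeat(np.arange(len(labels)), 6)]  # histograms inherit the song's fold
test_mask = hist_fold == 0                               # fold 0 as the 10% test split
```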
Results
Results: Dictionary Update Iterations
Figure: representation error over the dictionary update iterations,
$$\|Y - DX\|_F^2 \quad \text{for } J = 1, 2, \dots, k \quad (4)$$
Encoding: Atom Usage - Blues
Figure: Atom usage counts per dictionary (x-axis: dictionaries blues, classical, country, disco, hip hop, jazz, metal, pop, reggae, rock; y-axis: atom counts, 0-1200).
Encoding: Atom Usage - Rock
Figure: Atom usage counts per dictionary (x-axis: dictionaries blues, classical, country, disco, hip hop, jazz, metal, pop, reggae, rock; y-axis: atom counts, 0-1100).
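Plots like the two above can presumably be produced by counting, for the encoded frames of one genre (e.g. the blues songs), how many nonzero coefficients fall into each sub-dictionary. A sketch assuming all sub-dictionaries have the same number of atoms and are concatenated in genre order; the function name is hypothetical:

```python
import numpy as np

def atom_usage_per_subdictionary(X, atoms_per_genre, n_genres):
    """Count how often atoms of each genre's sub-dictionary are used in the codes X
    (K = n_genres * atoms_per_genre rows, one column per frame)."""
    per_atom = np.count_nonzero(X, axis=1)          # number of uses of each atom
    return per_atom.reshape(n_genres, atoms_per_genre).sum(axis=1)
```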
Results: State-of-the-Art Accuracies
Figure: State-of-the-art accuracies (accuracy in %, 0-100) of Tzanetakis et al. [TC02b], Panagakis et al. [PBK08], Yeh et al. [YY12], K-SVD + Histogram SVM (this project), Panagakis et al. [PKIA09], and Chang et al. [CsRJI10].
Conclusion
- K-SVD combined with an SVM using the histogram intersection kernel (HIK) performs comparably to other state-of-the-art techniques.
- A target sparsity of 1 is the best setup for this particular task.
- Learning a separate sub-dictionary for each class enhances the discriminative power of the encoding system.
- The technique works better when the dictionary is initialized with frames from the data.
Thank you.
Sources
[1] Yeh, Chin-Chia Michael and Yang, Yi-Hsuan. Supervised dictionary learning for music genre classification. ACM, 2012. ISBN: 978-1-4503-1329-2.
[2] Aharon, M., Elad, M. and Bruckstein, A. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Transactions on Signal Processing.
[3] Tzanetakis, G. and Cook, P. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing.
[4] Maji, Subhransu, Berg, Alexander C. and Malik, Jitendra. Fast Intersection / Additive Kernel SVM Toolbox.
[5] Course slides, Information Retrieval in High Dimensional Data, WS1213 (image).
[6] Sturm, Bob L. An Analysis of the GTZAN Music Genre Dataset. Proceedings of the Second International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies.
Backup Slides
Graphical User Interface (figure)
K-SVD - Initialization Phase
Set the dictionary matrix $D^{(0)} \in \mathbb{R}^{n \times K}$ with $\ell_2$-normalized columns. Set $J = 1$.
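A sketch of the two initialization variants compared in the experiments (random and from data), each returning a dictionary with ℓ2-normalized columns. Drawing K distinct training frames at random is an assumption about how "from data" was implemented; the function name is hypothetical:

```python
import numpy as np

def init_dictionary(Y, K, from_data=True, seed=0):
    """Initial dictionary D(0) with K l2-normalized columns.

    from_data=True : K distinct training frames (columns of Y) picked at random.
    from_data=False: i.i.d. Gaussian random atoms.
    """
    rng = np.random.default_rng(seed)
    n, N = Y.shape
    if from_data:
        idx = rng.choice(N, size=K, replace=False)   # requires K <= N
        D = Y[:, idx].astype(float)
    else:
        D = rng.standard_normal((n, K))
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0        # all-zero frames are kept as zero columns
    return D / norms
```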
K-SVD - Sparse Coding Step
Use any pursuit algorithm to compute the representation vectors $x_i$ for each example $y_i$ by approximating the solution of
$$\min_{x_i} \|y_i - D x_i\|_2^2 \quad \text{subject to} \quad \|x_i\|_0 \le T_0, \qquad i = 1, 2, \dots, N. \quad (5)$$
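The experiments use Orthogonal Matching Pursuit for this step. A compact NumPy sketch of OMP, assuming unit-norm atoms: at each iteration it selects the atom most correlated with the current residual and then re-fits all selected coefficients by least squares. The function name and the early stop when an atom would be selected twice are choices made for this illustration:

```python
import numpy as np

def omp(D, y, T0):
    """Greedy approximation of min_x ||y - D x||_2^2 s.t. ||x||_0 <= T0, as in (5).

    Assumes the columns (atoms) of D have unit l2 norm.
    """
    y = np.asarray(y, dtype=float)
    x = np.zeros(D.shape[1])
    support = []
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(T0):
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with the residual
        if k in support:                            # only when no atom correlates with the residual
            break
        support.append(k)
        # "Orthogonal" step: re-fit all selected coefficients jointly by least squares
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x
```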
K-SVD - Codebook Update Step
For each column $k = 1, 2, \dots, K$ of $D^{(J-1)}$, update it as follows:
- Define the group of examples that use this atom, $\omega_k = \{\, i \mid 1 \le i \le N,\ x_T^k(i) \neq 0 \,\}$, where $x_T^k$ is the $k$-th row of $X$.
- Compute the overall representation error matrix $E_k$:
$$E_k = Y - \sum_{j \neq k} d_j x_T^j \quad (6)$$
- Restrict $E_k$ by choosing only the columns corresponding to $\omega_k$, obtaining $E_k^R$.
- Apply the SVD decomposition $E_k^R = U \Delta V^T$.
- Choose the updated dictionary column $d_k$ to be the first column of $U$.
- Update the coefficient vector $x_R^k$ to be the first column of $V$ multiplied by $\Delta(1,1)$.
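A NumPy sketch of one codebook-update sweep as described above, with the sparsity pattern of X kept fixed. Atoms that no example uses are simply left unchanged here, which is a simplification; the function name is hypothetical:

```python
import numpy as np

def ksvd_dictionary_update(Y, D, X):
    """One K-SVD codebook-update sweep over all atoms of D.

    Y: (n, N) examples, D: (n, K) dictionary, X: (K, N) sparse codes.
    The support (zero pattern) of X is kept fixed; returns updated copies.
    """
    D = D.copy()
    X = X.copy()
    for k in range(D.shape[1]):
        omega = np.flatnonzero(X[k, :])          # examples that use atom k
        if omega.size == 0:
            continue                             # unused atom: left as-is in this sketch
        # E_k restricted to omega: residual with atom k's contribution added back
        E_R = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
        U, S, Vt = np.linalg.svd(E_R, full_matrices=False)
        D[:, k] = U[:, 0]                        # new unit-norm atom
        X[k, omega] = S[0] * Vt[0, :]            # matching coefficients
    return D, X
```

Because each atom update replaces $d_k x_T^k$ on $\omega_k$ by the best rank-1 approximation of $E_k^R$, the representation error $\|Y - DX\|_F$ cannot increase during a sweep.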
K-SVD - Update Iteration Step
Increase the iteration counter: set $J = J + 1$ and repeat the sparse coding and codebook update steps.