Source Code: https://github.com/mfcabrera/wtg

As a mandatory interdisciplinary project in my M.Sc. at TUM, we worked with the Research Group for Geometric Optimization and Machine Learning on a system for music genre recognition using K-SVD dictionary learning and an SVM classifier.
Music Genre Recognition using Dictionary Learning
Interdisciplinary Project
Miguel Cabrera, Thomas Pieronczyk
Research Group for Geometric Optimization and Machine Learning
October 25, 2013
Music Information Retrieval (MIR) covers tasks such as:
- Artist, instrument, and chord recognition
- Music annotation (tagging)
- Mood and genre classification
- Music Genre Recognition (MGR), the focus of this project
Why genre recognition is hard:
- High-dimensional data
- No formal definition of a genre
- Highly subjective
- One song can belong to many genres
- New genres are constantly appearing
Project objective: a system that predicts the musical genre of a piece of music. The approach combines Yeh and Yang's dictionary learning framework for Music Genre Recognition (MGR) [1] with the K-SVD algorithm of Aharon et al. [2].
Figure: The dictionary-learning framework for MGR: audio transformation & feature extraction, codebook generation, encoding, and code word aggregation, followed by classifier training (with ground truth) and prediction on test songs (Source: [1]).
Figure: Framework - Audio Feature Extraction (Source: [1]).
Features for musical genre recognition:
- Meta-data features
- Short-time audio features
Figure: Framework - Codebook Generation (Source: [1]).
Given an input signal vector $y \in \mathbb{R}^n$, the sparse representation problem can be formulated as:

$$x^* = \underset{x}{\arg\min}\ \tfrac{1}{2}\|y - Dx\|_2^2 + \lambda\|x\|_1 \qquad (1)$$

(Figure source: [5])
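As a hedged illustration of equation (1), the sketch below solves the l1-regularized problem with scikit-learn's Lasso on random placeholder data; the dictionary D, signal y, and lambda value are arbitrary stand-ins, not the project's actual features or settings.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, K = 64, 256                       # signal dimension, number of atoms
D = rng.standard_normal((n, K))      # placeholder dictionary (columns = atoms)
D /= np.linalg.norm(D, axis=0)       # l2-normalized columns
y = rng.standard_normal(n)           # placeholder input signal

# Solve x* = argmin_x 0.5*||y - Dx||_2^2 + lam*||x||_1  (Eq. 1).
# scikit-learn's Lasso minimizes (1/(2n))||y - Dx||^2 + alpha*||x||_1,
# so alpha = lam / n reproduces the objective above.
lam = 0.1
lasso = Lasso(alpha=lam / n, fit_intercept=False, max_iter=10_000)
x_star = lasso.fit(D, y).coef_
print("non-zero coefficients:", np.count_nonzero(x_star))
```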
The dictionary learning problem:

$$\min_{D,X} \|Y - DX\|_F^2 \quad \text{subject to} \quad \forall i,\ \|x_i\|_0 \le T_0 \qquad (2)$$

K-SVD is a generalization of the K-Means algorithm.
The K-SVD algorithm proceeds in three phases (a sketch follows below):
- Initialization: initialize the dictionary $D \in \mathbb{R}^{n \times K}$.
- Sparse coding step: sparse-code the examples using the current dictionary.
- Codebook update step: update the dictionary atoms to better fit the data.
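Below is a minimal NumPy/scikit-learn sketch of this loop, assuming OMP as the pursuit algorithm and random frames from the data as initial atoms. The function name `ksvd` and all default sizes are illustrative; this is not the project's actual implementation.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def ksvd(Y, K, T0, n_iter=10, seed=0):
    """Minimal K-SVD sketch: Y is an (n x N) data matrix, K the number of atoms,
    T0 the target sparsity per example."""
    rng = np.random.default_rng(seed)
    n, N = Y.shape
    # Initialization: K random training frames as initial atoms, l2-normalized
    D = Y[:, rng.choice(N, size=K, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0) + 1e-12
    X = np.zeros((K, N))
    for _ in range(n_iter):
        # Sparse coding step: OMP with at most T0 non-zero coefficients per example
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=T0)
        X = omp.fit(D, Y).coef_.T                  # (K x N) coefficient matrix
        # Codebook update step: update each atom and its coefficients in turn
        for k in range(K):
            omega = np.flatnonzero(X[k, :])        # examples that use atom k
            if omega.size == 0:
                continue
            X[k, omega] = 0.0                      # remove atom k's contribution
            E_R = Y[:, omega] - D @ X[:, omega]    # restricted representation error
            U, s, Vt = np.linalg.svd(E_R, full_matrices=False)
            D[:, k] = U[:, 0]                      # new atom: first left singular vector
            X[k, omega] = s[0] * Vt[0, :]          # updated coefficients
    return D, X
```

Called as `D, X = ksvd(Y, K=200, T0=1)` on an (n x N) matrix of feature frames.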
Dictionary learning for MGR: dictionaries are trained separately per genre, and the resulting dictionary is the concatenation of all separately trained dictionaries (see the sketch below):

$$D = [D_1, D_2, D_3, \ldots, D_c] \qquad (3)$$
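A sketch of the per-genre training and concatenation in (3), reusing the `ksvd` function sketched above. The genre subset, frame matrices, and dictionary sizes are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
genres = ["blues", "classical", "country"]                 # subset, for illustration
# Placeholder feature frames per genre, shape (n_features x n_frames);
# the real input would be normalized CQT frames of that genre's clips.
frames_by_genre = {g: rng.standard_normal((64, 500)) for g in genres}

# One sub-dictionary per genre, then column-wise concatenation (Eq. 3)
sub_dicts = [ksvd(frames_by_genre[g], K=100, T0=1)[0] for g in genres]
D = np.hstack(sub_dicts)
print(D.shape)   # (64, 300): c * K atoms in total
```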
Figure: Framework - Encoding (Source: [1]).
After codebook generation, the training data is re-encoded with the concatenated dictionary $D = [D_1, D_2, \ldots, D_c]$, resulting in a sparse representation (see the sketch below).
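A sketch of this re-encoding step, assuming OMP as the pursuit algorithm (as in the experiments below). The dictionary and frames are random placeholders standing in for the concatenated dictionary and the CQT frames.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(1)
# D would be the concatenated per-genre dictionary from the previous sketch;
# a random placeholder of the same shape keeps this snippet self-contained.
D = rng.standard_normal((64, 300))
D /= np.linalg.norm(D, axis=0)
Y_frames = rng.standard_normal((64, 1000))             # placeholder feature frames (n x N)

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=1)     # target sparsity 1, as in the experiments
codes = omp.fit(D, Y_frames).coef_.T                   # (c*K x N) sparse codes, one column per frame
```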
Figure: Framework - Aggregation (Source: [1]).
We aggregate the encoded frames into "texture windows" [3]:
- Texture window: the minimum amount of time necessary to identify a particular musical "texture"
- Window size: 3-5 seconds
- Each song is represented as a bag of histograms, i.e. 6 histograms per song
- The histograms inherit the song's genre label

A sketch of this aggregation follows below.
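This is a hedged sketch of the aggregation step. The exact pooling used in the project is not spelled out on the slide; here each texture window is summarized by counting how often each atom is activated, which is one plausible choice.

```python
import numpy as np

def aggregate_histograms(codes, frames_per_window):
    """codes: (n_atoms x n_frames) sparse codes for one song.
    Returns one normalized atom-activation histogram per texture window."""
    n_atoms, n_frames = codes.shape
    n_windows = n_frames // frames_per_window
    hists = []
    for w in range(n_windows):
        block = codes[:, w * frames_per_window:(w + 1) * frames_per_window]
        h = np.count_nonzero(block, axis=1).astype(float)   # atom usage counts in this window
        hists.append(h / max(h.sum(), 1.0))                 # normalize to a histogram
    return np.array(hists)                                  # (n_windows x n_atoms)

# e.g. a 30 s clip with 5 s texture windows yields 6 histograms per song
```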
Classification: SVM with a Histogram Intersection Kernel

We use a Support Vector Machine for the classification step, with the histogram intersection kernel

$$K_{HI}(h_a, h_b) = \sum_{j=1}^{k} \min\bigl(h_a(j), h_b(j)\bigr)$$

- Measures the degree of similarity between two histograms
- Computational cost comparable to a linear SVM
- Works better than linear and other non-linear kernels for histogram features
- Implementation based on the fast intersection-kernel SVM toolbox [4] for the popular LIBSVM for Matlab: http://www.cs.berkeley.edu/~smaji/projects/fiksvm/

A hedged Python sketch of the kernel follows below.
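The sketch below computes the histogram intersection kernel explicitly and feeds it to scikit-learn's SVC in precomputed-kernel mode. The project itself used the fiksvm/LIBSVM Matlab toolbox [4]; the histograms and labels here are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

def hik(A, B):
    """Histogram intersection kernel matrix: K[i, j] = sum_d min(A[i, d], B[j, d])."""
    return np.array([[np.minimum(a, b).sum() for b in B] for a in A])

rng = np.random.default_rng(2)
H_train = rng.random((60, 100))            # placeholder texture-window histograms
y_train = np.repeat(np.arange(10), 6)      # placeholder genre labels
H_test = rng.random((12, 100))

svm = SVC(kernel="precomputed", C=1.0)
svm.fit(hik(H_train, H_train), y_train)            # Gram matrix of training histograms
pred = svm.predict(hik(H_test, H_train))           # kernel between test and training histograms
```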
Experiments I: Data and Features

Data:
- GTZAN dataset: 1000 songs of 30 s length, equally divided into 10 genres: blues, classical, country, disco, hip hop, jazz, metal, pop, reggae, and rock
- One of the most frequently used datasets in MGR
- But: it exhibits several problems, such as replications, mislabelings, and distortions [6]

Features:
- CQT spectrogram (a feature-extraction sketch follows below)
- Features are normalized
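A hedged sketch of the feature extraction using librosa's constant-Q transform. The file path, hop length, bin counts, and per-frame normalization are illustrative assumptions, not the project's exact settings.

```python
import numpy as np
import librosa

# Load a 30 s clip (path is a placeholder for a GTZAN file) and compute a CQT spectrogram.
y, sr = librosa.load("path/to/blues.00000.au", duration=30.0)
C = np.abs(librosa.cqt(y, sr=sr, hop_length=512, n_bins=84, bins_per_octave=12))

# Per-frame feature vectors, normalized (here: unit l2 norm per frame)
frames = C / (np.linalg.norm(C, axis=0, keepdims=True) + 1e-12)   # shape (n_bins, n_frames)
```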
Experiments II: Setup

Dictionary learning:
- Initialization: random and from data
- Dictionary size: 50-400
- Target sparsity: 1-3
- Pursuit algorithm: Orthogonal Matching Pursuit (OMP)

Classification:
- Parameter selection: experimental, using 10-fold cross-validation (see the sketch below)
- Performance measures: accuracy at histogram and clip level
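An illustrative parameter-selection loop with 10-fold cross-validation over dictionary size and target sparsity. `build_histograms` is a hypothetical helper wrapping the earlier feature-extraction, dictionary-learning, and aggregation steps; `hik` is the kernel sketched above. This is not the project's actual experiment script.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def cv_accuracy(H, y, n_splits=10):
    """Mean 10-fold cross-validated accuracy of the HIK-SVM on histograms H with labels y."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    accs = []
    for tr, te in skf.split(H, y):
        svm = SVC(kernel="precomputed", C=1.0)
        svm.fit(hik(H[tr], H[tr]), y[tr])
        accs.append(np.mean(svm.predict(hik(H[te], H[tr])) == y[te]))
    return float(np.mean(accs))

for dict_size in (50, 100, 200, 400):
    for sparsity in (1, 2, 3):
        # build_histograms is a hypothetical helper: features -> per-genre K-SVD -> encoding -> histograms
        H, y = build_histograms(dict_size, sparsity)
        print(dict_size, sparsity, cv_accuracy(H, y))
```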
Two partitioning schemes:

90-10:
- 90% of the data used for dictionary and SVM training
- 10% encoded with the learned dictionary and used as the test set

Full data:
- 100% used for dictionary and SVM training
- Performance evaluated with 10-fold cross-validation
- This is the scheme used in the literature

A sketch of the 90-10 split follows below.
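A small sketch of the 90-10 scheme, splitting at the clip level so that histograms of the same clip never appear on both sides. The stratified split and the GTZAN layout below are assumptions about how the split could be done, not the project's exact script.

```python
import numpy as np
from sklearn.model_selection import train_test_split

clip_ids = np.arange(1000)
labels = np.repeat(np.arange(10), 100)          # 10 genres x 100 clips (GTZAN layout)

# 90-10 split at the clip level; stratify keeps the genre balance in both sets.
train_clips, test_clips = train_test_split(clip_ids, test_size=0.1,
                                           stratify=labels, random_state=0)
# Dictionaries and the SVM are trained only on train_clips; test_clips are encoded
# with the learned dictionary and used exclusively for evaluation.
```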
Figure: Reconstruction error $\|Y - DX\|_F^2$ over the dictionary update iterations $J = 1, 2, \ldots, k$ (4).
Figure: Atom usage counts when encoding blues clips, broken down by genre sub-dictionary (blues, classical, country, disco, hip hop, jazz, metal, pop, reggae, rock); y-axis: atom counts.
Figure: Atom usage counts when encoding rock clips, broken down by genre sub-dictionary; y-axis: atom counts.
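For reference, a small sketch of how such atom-usage counts could be computed from the sparse codes; the function and its arguments are illustrative, not taken from the project code.

```python
import numpy as np

def atom_usage_per_subdictionary(codes, atoms_per_genre, genres):
    """codes: (c*K x n_frames) sparse codes of all frames of one genre's clips.
    Returns, per genre sub-dictionary, how often its atoms are activated."""
    usage = np.count_nonzero(codes, axis=1)                  # per-atom activation counts
    return {g: int(usage[i * atoms_per_genre:(i + 1) * atoms_per_genre].sum())
            for i, g in enumerate(genres)}
```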
Conclusions:
- K-SVD in combination with an SVM using the histogram intersection kernel performs comparably to other state-of-the-art techniques
- A target sparsity of 1 is the best setup for this particular task
- Learning a sub-dictionary for each class enhances the discriminative power of the encoding system
- The technique works better when the dictionary is initialized with frames from the data
References

[1] C.-C. M. Yeh and Y.-H. Yang. Supervised dictionary learning for music genre classification. ACM, 2012. ISBN 978-1-4503-1329-2.
[2] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Transactions on Signal Processing.
[3] G. Tzanetakis and P. Cook. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing.
[4] S. Maji, A. C. Berg, and J. Malik. Fast Intersection / Additive Kernel SVM Toolbox.
[5] Course slides, Information Retrieval in High Dimensional Data, WS 2012/13 (image source).
[6] B. L. Sturm. An Analysis of the GTZAN Music Genre Dataset. Proceedings of the Second International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies.
Appendix: the K-SVD algorithm in detail.

Initialization: set the dictionary matrix $D^{(0)} \in \mathbb{R}^{n \times K}$ with $\ell_2$-normalized columns. Set $J = 1$.
Sparse coding step: use any pursuit algorithm to compute the representation vectors $x_i$ for each example $y_i$ by approximating the solution of

$$\min_{x_i} \|y_i - Dx_i\|_2^2 \quad \text{subject to} \quad \|x_i\|_0 \le T_0, \qquad i = 1, 2, \ldots, N. \qquad (5)$$
Codebook update step: for each column $k = 1, 2, \ldots, K$ of $D^{(J-1)}$, update it as follows:
- Define the group of examples that use this atom: $\omega_k = \{\, i \mid 1 \le i \le N,\ x_T^k(i) \ne 0 \,\}$.
- Compute the overall representation error matrix

$$E_k = Y - \sum_{j \ne k} d_j x_T^j \qquad (6)$$

- Restrict $E_k$ to the columns corresponding to $\omega_k$, obtaining $E_k^R$.
- Apply the SVD decomposition $E_k^R = U \Delta V^T$. Choose the updated dictionary column $d_k$ to be the first column of $U$, and update the coefficient vector $x_R^k$ to be the first column of $V$ multiplied by $\Delta(1,1)$.
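A small numeric check of this update: the rank-1 SVD update is the best rank-1 approximation of the restricted error matrix, so the Frobenius error cannot increase and the new atom stays l2-normalized. The matrix below is a random placeholder.

```python
import numpy as np

rng = np.random.default_rng(3)
E_R = rng.standard_normal((64, 40))              # placeholder restricted error matrix E_k^R

U, s, Vt = np.linalg.svd(E_R, full_matrices=False)
d_k = U[:, 0]                                    # updated atom: first left singular vector
x_R = s[0] * Vt[0, :]                            # updated coefficients for the examples in omega_k

err_before = np.linalg.norm(E_R, "fro")
err_after = np.linalg.norm(E_R - np.outer(d_k, x_R), "fro")
print(err_before, err_after)                     # err_after <= err_before
assert np.isclose(np.linalg.norm(d_k), 1.0)      # the atom remains l2-normalized
```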