Kaggle Free Sound Audio Tagging 2019 Model Pipeline

Maxwell
June 10, 2019

Transcript

  1. Free Sound Audio Tagging 2019

    Maxwell https://www.kaggle.com/maxwell110
    Environment: Ubuntu 18.04, GeForce GTX 1080 Ti x 2

    Data
    - ~5k train-curated audios, ~20k train-noisy audios, test audios
    - Remove silent audios
    - Trim silent parts (librosa.effects.trim)

    Log mel spectrogram (librosa.feature.melspectrogram, librosa.core.power_to_db)
    - Sampling rate: 44.1 kHz
    - FFT window size: 80 ms
    - Hop length: 10 ms
    - Mel bands: 64
    - Output size: 64 x (number of frames, which depends on each audio's length)

    Feature 1: frequency-wise statistical features
    - 25 statistical features per mel band (stat 1 ... stat 25: mean, std, mean grad, ...)
    - 64 x 25 features, normalized with a constant range

    Feature 2: clustered features
    - Distances to each cluster center (MiniBatchKMeans, n_clusters: 200)
    - Standardized

    Training and prediction
    - 80 classes, balanced 5 folds
    - Backbones for feature extraction: MobileNetV2, ResNet50, DenseNet121
    - Spectrogram branch: BN -> Conv2D point-wise 2D convolution, 10 filters -> Conv2D point-wise 2D convolution, 3 filters -> backbone -> Global Average Pooling 2D -> Dense layer, 80 classes
    - Feature 1 branch: Dense layer, 1536 nodes + BN
    - Feature 2 branch: Dense layer, 384 nodes + BN
    - Six models (Model 1 to Model 6); losses labeled on the diagram: BCE, SoftMax, BCE, BCE, BCE, BCE

    Prediction
    - TTA with width shift range: ±0.2, height shift range: ±6/64
    - 5 log mel spectrogram lengths: [263, 388, 513, 638, 763]
    - Blending of Models 1 to 6: weighted geometric average; the weights are optimized using the 5-fold OOF predictions

    Training
    - Data augmentation with width shift range: ±0.6, height shift range: ±12/64
    - Random log mel spectrogram length cropping with padding, 263 <= length <= 763
    - Mixup with alpha = 0.5
    - 3-stage learning with 3 LR schedules (Cyclic, ReduceLROnPlateau x 2):
      1. Train without Feature 2, on train-curated ONLY
      2. Train with Feature 2, on train-curated and ALL train-noisy, using a MODIFIED BCE (ignore records with high error)
      3. Train with Feature 2, on train-curated and SELECTED train-noisy where the BCEs are good

    Results
    - Local: 0.876 on train-curated
    - Public LB: 0.719 (31st / 880 teams)
    - Private LB: 0.72820 (28th / 880 teams)
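Feature 1 reduces each mel band to a fixed set of statistics over time, so clips of any length give the same 64-row matrix. The slide names only mean, std and mean grad out of its 25 statistics, so this NumPy sketch computes those three plus six hypothetical extras (mean absolute gradient, min, max, median, 10th and 90th percentiles), giving 64 x 9 rather than 64 x 25. "Mean grad" is assumed to be the mean frame-to-frame difference. The constant-range normalization takes caller-supplied bounds because the slide does not state them:

```python
import numpy as np

def freq_wise_stats(log_mel):
    """Per-mel-band statistics over time; returns an (n_mels, n_stats) array."""
    grad = np.diff(log_mel, axis=1)  # frame-to-frame differences along time
    stats = [
        log_mel.mean(axis=1),
        log_mel.std(axis=1),
        grad.mean(axis=1),           # "mean grad" (assumed definition)
        # The statistics below are hypothetical stand-ins for the unnamed ones.
        np.abs(grad).mean(axis=1),
        log_mel.min(axis=1),
        log_mel.max(axis=1),
        np.median(log_mel, axis=1),
        np.percentile(log_mel, 10, axis=1),
        np.percentile(log_mel, 90, axis=1),
    ]
    return np.stack(stats, axis=1)

def normalize_constant_range(x, low, high):
    """Scale with fixed bounds (not per-clip), so every clip shares one scale."""
    return (x - low) / (high - low)
```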
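Feature 2 re-expresses each clip as its distances to 200 cluster centers, then standardizes them. In scikit-learn, `MiniBatchKMeans.transform` returns exactly those distances. A minimal sketch, assuming the clustering input is the flattened Feature 1 matrix (the slide does not say what was clustered) and using `n_init=3` and a fixed seed as arbitrary choices:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.preprocessing import StandardScaler

def fit_cluster_features(X, n_clusters=200, seed=0):
    """Fit k-means and a scaler on X (n_samples, n_features); return both and the features."""
    km = MiniBatchKMeans(n_clusters=n_clusters, n_init=3, random_state=seed)
    dist = km.fit_transform(X)          # (n_samples, n_clusters): distance to each center
    scaler = StandardScaler().fit(dist)  # standardize each distance column
    return km, scaler, scaler.transform(dist)

def transform_cluster_features(km, scaler, X):
    """Apply the fitted clustering and scaling to new clips (e.g. validation or test)."""
    return scaler.transform(km.transform(X))
```

Fitting the clustering and scaler on training clips only and reusing them for the other splits keeps the feature definition identical across train, OOF and test.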
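The architecture can be sketched in tf.keras. Two point-wise convolutions map the 1-channel spectrogram to 3 channels, presumably so that standard image backbones can be used. This sketch makes several assumptions that are not on the slide: the three branches are concatenated before the 80-class output, the dense layers use ReLU, the output is a sigmoid trained with BCE (the SoftMax variant labeled on the diagram is not shown), the backbone uses `weights=None` so nothing is downloaded, and the time axis is fixed at `n_frames`. The `use_feature2` flag mirrors stage 1, which trains without Feature 2:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

BACKBONES = {
    "mobilenetv2": tf.keras.applications.MobileNetV2,
    "resnet50": tf.keras.applications.ResNet50,
    "densenet121": tf.keras.applications.DenseNet121,
}

def build_model(backbone="mobilenetv2", n_mels=64, n_frames=263,
                n_feat1=64 * 25, n_feat2=200, n_classes=80, use_feature2=True):
    # Spectrogram branch: BN -> 1x1 conv (10) -> 1x1 conv (3) -> CNN -> GAP
    spec_in = layers.Input(shape=(n_mels, n_frames, 1), name="log_mel")
    x = layers.BatchNormalization()(spec_in)
    x = layers.Conv2D(10, kernel_size=1)(x)
    x = layers.Conv2D(3, kernel_size=1)(x)
    cnn = BACKBONES[backbone](include_top=False, weights=None,
                              input_shape=(n_mels, n_frames, 3))
    x = layers.GlobalAveragePooling2D()(cnn(x))

    # Feature 1 branch: Dense 1536 + BN
    f1_in = layers.Input(shape=(n_feat1,), name="feature1")
    f1 = layers.BatchNormalization()(layers.Dense(1536, activation="relu")(f1_in))

    inputs, branches = [spec_in, f1_in], [x, f1]
    if use_feature2:
        # Feature 2 branch: Dense 384 + BN
        f2_in = layers.Input(shape=(n_feat2,), name="feature2")
        f2 = layers.BatchNormalization()(layers.Dense(384, activation="relu")(f2_in))
        inputs.append(f2_in)
        branches.append(f2)

    h = layers.Concatenate()(branches)
    out = layers.Dense(n_classes, activation="sigmoid")(h)
    model = Model(inputs, out)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```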
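Random length cropping with padding makes the network see clips of 263 to 763 frames during training. A minimal NumPy sketch, assuming a uniformly drawn target length, a random crop or pad offset, and padding with the spectrogram's minimum value (the slide does not state the padding value or position):

```python
import numpy as np

def random_crop_or_pad(spec, rng, min_len=263, max_len=763, pad_value=None):
    """Return spec (n_mels, n_frames) cropped or padded to a random length in [min_len, max_len]."""
    n_mels, n_frames = spec.shape
    target = int(rng.integers(min_len, max_len + 1))
    if pad_value is None:
        pad_value = spec.min()  # assumption: pad with the quietest value in the clip
    if n_frames >= target:
        # Long clip: take a random window of `target` frames.
        start = int(rng.integers(0, n_frames - target + 1))
        return spec[:, start:start + target]
    # Short clip: place it at a random offset inside a padded canvas.
    out = np.full((n_mels, target), pad_value, dtype=spec.dtype)
    start = int(rng.integers(0, target - n_frames + 1))
    out[:, start:start + n_frames] = spec
    return out
```

Within one batch, all clips need the same length, so in practice one target length would be drawn per batch rather than per clip.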
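Mixup with alpha = 0.5 blends pairs of clips and their multi-hot labels with a weight drawn from Beta(0.5, 0.5). A minimal NumPy sketch, assuming one mixing weight per batch and partners chosen by shuffling the batch (both are common choices, not stated on the slide):

```python
import numpy as np

def mixup_batch(x, y, rng, alpha=0.5):
    """Mix a batch x (n, ...) and multi-hot labels y (n, n_classes) with a shuffled copy."""
    lam = rng.beta(alpha, alpha)      # mixing weight in (0, 1)
    perm = rng.permutation(len(x))    # partner for each sample
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix
```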
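Stages 2 and 3 both rely on a per-record BCE: stage 2 ignores records whose error is high, and stage 3 keeps only the noisy records whose BCE is good. A minimal NumPy sketch of that logic, where the cut-off `max_loss` is a hypothetical parameter (the slide gives no threshold) and the loss is written in NumPy rather than as a training-framework loss:

```python
import numpy as np

def per_record_bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy averaged over classes, one value per record."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    bce = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return bce.mean(axis=1)

def modified_bce(y_true, y_pred, max_loss):
    """Mean BCE over records with BCE <= max_loss; high-error records are ignored."""
    losses = per_record_bce(y_true, y_pred)
    keep = losses <= max_loss
    if not keep.any():
        return 0.0
    return float(losses[keep].mean())

def select_noisy(y_true, y_pred, max_loss):
    """Indices of noisy records whose BCE is at most max_loss (stage-3 selection)."""
    return np.flatnonzero(per_record_bce(y_true, y_pred) <= max_loss)
```

For example, with labels `[[1, 0], [1, 0]]` and predictions `[[0.9, 0.1], [0.1, 0.9]]`, the second record (BCE about 2.30) is dropped at `max_loss=1.0`, leaving a loss of about 0.105.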
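The prediction-time TTA runs each clip at the five listed lengths with small random shifts and combines the results. A minimal NumPy sketch under these assumptions: center crop or pad to each length, shifts implemented with wrap-around `np.roll` (not the fill behavior of an image augmentation library), and an arithmetic mean of the predictions. `predict_fn` is a hypothetical callable that maps one spectrogram to class probabilities. The same `random_shift` with `width_range=0.6` and `height_range=12/64` would cover the training-time shift augmentation:

```python
import numpy as np

TTA_LENGTHS = [263, 388, 513, 638, 763]

def fit_length(spec, length, pad_value=None):
    """Center-crop or center-pad the time axis of spec (n_mels, n_frames) to `length`."""
    n_mels, n_frames = spec.shape
    if pad_value is None:
        pad_value = spec.min()
    if n_frames >= length:
        start = (n_frames - length) // 2
        return spec[:, start:start + length]
    out = np.full((n_mels, length), pad_value, dtype=spec.dtype)
    start = (length - n_frames) // 2
    out[:, start:start + n_frames] = spec
    return out

def random_shift(spec, rng, width_range=0.2, height_range=6 / 64):
    """Roll along time (width) and mel (height) by random fractions of each axis."""
    n_mels, n_frames = spec.shape
    dx = int(round(rng.uniform(-width_range, width_range) * n_frames))
    dy = int(round(rng.uniform(-height_range, height_range) * n_mels))
    return np.roll(spec, shift=(dy, dx), axis=(0, 1))

def predict_tta(spec, predict_fn, rng, lengths=TTA_LENGTHS):
    """Average predict_fn over one shifted view per TTA length."""
    preds = [predict_fn(random_shift(fit_length(spec, n), rng)) for n in lengths]
    return np.mean(preds, axis=0)
```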
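The blend is a weighted geometric average, $p = \prod_i p_i^{w_i}$ with $\sum_i w_i = 1$, computed in log space for stability. The slide says the weights are optimized on the 5-fold OOF predictions but not how. This sketch assumes the competition metric, lwlrap, as the objective (computed as scikit-learn's LRAP weighted by each clip's number of positive labels) and swaps in a plain random search over Dirichlet-distributed weights as a stand-in optimizer:

```python
import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

def lwlrap(y_true, y_score):
    """Label-weighted LRAP: LRAP with each clip weighted by its number of positive labels."""
    weights = y_true.sum(axis=1)
    keep = weights > 0
    return label_ranking_average_precision_score(
        y_true[keep], y_score[keep], sample_weight=weights[keep])

def geometric_blend(preds, weights, eps=1e-7):
    """Weighted geometric mean of a list of (n_samples, n_classes) probability arrays."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    logp = np.log(np.clip(np.stack(preds), eps, 1.0))  # (n_models, n_samples, n_classes)
    return np.exp(np.tensordot(w, logp, axes=1))

def optimize_weights(oof_preds, y_true, n_trials=500, seed=0):
    """Random search over blend weights, scored by lwlrap on OOF predictions."""
    rng = np.random.default_rng(seed)
    n = len(oof_preds)
    best_w = np.full(n, 1.0 / n)
    best_s = lwlrap(y_true, geometric_blend(oof_preds, best_w))
    for _ in range(n_trials):
        w = rng.dirichlet(np.ones(n))
        s = lwlrap(y_true, geometric_blend(oof_preds, w))
        if s > best_s:
            best_w, best_s = w, s
    return best_w, best_s
```

The weights found on the OOF predictions would then be applied unchanged to the test predictions of the six models.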