Kaggle Freesound Audio Tagging 2019 Model Pipeline

Maxwell
June 10, 2019

Model Pipeline for FS2019
Automatically recognize sounds and apply tags of varying natures

28th place solution
https://www.kaggle.com/c/freesound-audio-tagging-2019

  1. Freesound Audio Tagging 2019

    Feature Extraction
    - Remove silent audios: train-curated (~5k audios), train-noisy (~20k audios), test
    - Trim silent parts: librosa.effects.trim
    - Feature 1: LogMel spectrogram of shape 64 x (depends on each audio length),
      via librosa.feature.melspectrogram and librosa.core.power_to_db
      - Sampling rate: 44.1 kHz
      - FFT window size: 80 ms
      - Hop length: 10 ms
      - Mel bands: 64
    - Feature 2
      - Frequency-wise statistical features: 25 statistics (mean, std, mean grad, ...)
        for each of the 64 mel bands => 64 x 25 features, normalized with a constant range
      - Clustered features: distances to each cluster center (MiniBatchKMeans, n_clusters: 200),
        flattened => standardized

    Model (Resources: GeForce GTX 1080 Ti x 2)
    - Backbones: MobileNet V2, ResNet 50, DenseNet 121
    - Feature 1 => BN => Conv2D point-wise (10 filters) => Conv2D point-wise (3 filters)
      => backbone => GAP 2D
    - Feature 2 => BN (some models use Feature 2 w/o clustered features)
    - concat => Dense (1536) + BN => Dense (384) + BN => Dense (80)
    - Output / loss: BCE, SoftMax, BCE, BCE, BCE, BCE

    Training
    - 5 folds using iterative-stratification
    - Augmentation
      - Shift range: width = 0.6, height = 12/64
      - Random LogMel length extraction: 263 <= length <= 763, padding with zero
      - MixUp: alpha = 0.5
    - 3-stage learning with 3 LR schedules (Cyclic, ReduceLROnPlateau x 2)
      1. Train w/o Feature 2 on train-curated ONLY
      2. Train w/ Feature 2 on train-curated and ALL train-noisy, using a MODIFIED BCE
         (ignoring audios with high BCE)
      3. Train w/ Feature 2 on train-curated and SELECTED train-noisy (low-BCE audios)

    Prediction
    - TTA (x3): shift width = 0.2, height = 6/64
    - 5-length ensemble: 263, 388, 513, 638, 763 frames
    - 5-fold ensemble
    - Geometric blending of 6 models, coefficients optimized using 5-fold OOF predictions
      - MobileNet V2: 0.24 / 0.22
      - ResNet 50: 0.15 / 0.07
      - DenseNet 121: 0.12 / 0.20

    Results
    - Local: 0.876 on train-curated
    - Public LB: 0.719 (31st / 880 teams)
    - Private LB: 0.72820 (28th / 880 teams)

    Copyright 2019 @ Maxwell_110
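The "Remove Silent Audios" and "Trim Silent Parts" steps above use librosa.effects.trim. Below is a simplified NumPy sketch of the same idea. It computes the RMS of each frame, compares it in dB against the loudest frame, and cuts the quiet head and tail. It is not librosa's exact code. The top_db, frame_length and hop_length values are assumptions, and so is the peak-amplitude rule in the hypothetical is_silent helper.

```python
import numpy as np


def trim_silence(y: np.ndarray, top_db: float = 60.0,
                 frame_length: int = 2048, hop_length: int = 512) -> np.ndarray:
    """Cut leading/trailing audio whose frame RMS is more than top_db below the loudest frame.

    Simplified sketch of the idea behind librosa.effects.trim (assumed parameters).
    """
    # Pad very short clips so that at least one full frame exists.
    y_pad = np.pad(y, (0, max(0, frame_length - len(y))))
    n_frames = 1 + (len(y_pad) - frame_length) // hop_length
    # Row i of idx holds the sample positions covered by frame i.
    idx = hop_length * np.arange(n_frames)[:, None] + np.arange(frame_length)[None, :]
    rms = np.sqrt(np.mean(y_pad[idx] ** 2, axis=1))
    if rms.max() == 0.0:
        return y[:0]  # digital silence: nothing to keep
    db = 20.0 * np.log10(np.maximum(rms, 1e-10) / rms.max())
    loud = np.flatnonzero(db > -top_db)
    start = loud[0] * hop_length
    end = min(len(y), loud[-1] * hop_length + frame_length)
    return y[start:end]


def is_silent(y: np.ndarray, min_peak: float = 1e-4) -> bool:
    """Hypothetical rule for a "silent audio": peak amplitude below min_peak."""
    return True if len(y) == 0 else bool(np.max(np.abs(y)) < min_peak)


# Usage: a 1000-sample burst surrounded by silence.
clip = np.zeros(10_000)
clip[4000:5000] = 1.0
trimmed = trim_silence(clip)
print(len(clip), len(trimmed), is_silent(np.zeros(100)))
```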
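As a worked example of the LogMel settings above: librosa.feature.melspectrogram takes the FFT window and hop as sample counts (n_fft, hop_length). The millisecond values therefore have to be converted at 44.1 kHz. Presumably the call then looks like librosa.feature.melspectrogram(y=y, sr=44100, n_fft=3528, hop_length=441, n_mels=64), followed by power_to_db. The helper names below are hypothetical. The frame count assumes librosa's default centered STFT.

```python
# Hypothetical helpers that turn the slide's time-based settings into the
# sample counts an STFT / mel-spectrogram function expects.
SR = 44_100    # sampling rate (Hz)
WIN_MS = 80    # FFT window size (ms)
HOP_MS = 10    # hop length (ms)
N_MELS = 64    # mel bands


def ms_to_samples(ms: float, sr: int = SR) -> int:
    """Samples spanned by `ms` milliseconds at sampling rate `sr`."""
    return int(round(sr * ms / 1000))


def n_frames(n_samples: int, hop_length: int) -> int:
    """Frame count of a centered STFT with an even n_fft: 1 + floor(n_samples / hop_length)."""
    return 1 + n_samples // hop_length


n_fft = ms_to_samples(WIN_MS)        # 3528 samples
hop_length = ms_to_samples(HOP_MS)   # 441 samples
frames_5s = n_frames(5 * SR, hop_length)
print(n_fft, hop_length, frames_5s)
```

At a 10 ms hop, the 263 to 763 frame lengths used in training and prediction cover roughly 2.6 to 7.6 s of audio.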
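The frequency-wise statistical features above reduce each mel band over time to a fixed set of numbers. The slide names only mean, std and "mean grad" out of its 25 statistics. This sketch therefore computes six per band, not 25. It reads "mean grad" as the mean absolute frame-to-frame difference, which is an assumption. The min, max and median columns are illustrative stand-ins. The fixed lo/hi values in normalize_constant_range are also hypothetical.

```python
import numpy as np


def band_stats(logmel: np.ndarray) -> np.ndarray:
    """Per-mel-band statistics of a (n_mels, n_frames) log-mel spectrogram, shape (n_mels, 6).

    Mean and std are named on the slide; the gradient statistic is assumed to be the
    mean absolute frame-to-frame difference; min / max / median are stand-ins.
    """
    grad = np.abs(np.diff(logmel, axis=1))
    stats = [
        logmel.mean(axis=1),
        logmel.std(axis=1),
        grad.mean(axis=1),          # "mean grad" (assumed definition)
        logmel.min(axis=1),         # illustrative extra statistic
        logmel.max(axis=1),         # illustrative extra statistic
        np.median(logmel, axis=1),  # illustrative extra statistic
    ]
    return np.stack(stats, axis=1)


def normalize_constant_range(x: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Map the fixed range [lo, hi] to [0, 1] (lo and hi are hypothetical constants)."""
    return (x - lo) / (hi - lo)


rng = np.random.default_rng(0)
logmel = rng.normal(size=(64, 500))
f = band_stats(logmel)
print(f.shape)
```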
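For the clustered features above, scikit-learn's MiniBatchKMeans(n_clusters=200) learns the centers. Its transform method returns each sample's distance to every center. The sketch below reproduces only that distance step in NumPy. It reads "Flatten => Standardized" as flattening and standardizing the per-clip statistics before measuring distances; that ordering is an assumption. It also uses 20 randomly picked rows as hypothetical centers in place of a fitted model.

```python
import numpy as np


def standardize(X: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-column zero mean / unit variance."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)


def distances_to_centers(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Euclidean distance from each row of X to each center, shape (n_samples, n_clusters).

    Uses ||x||^2 - 2 x.c + ||c||^2 so no (n, k, d) intermediate array is built.
    """
    sq = ((X ** 2).sum(axis=1)[:, None]
          - 2.0 * X @ centers.T
          + (centers ** 2).sum(axis=1)[None, :])
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negatives caused by rounding


rng = np.random.default_rng(0)
stat_feats = rng.normal(size=(300, 64, 6))                   # (clips, mel bands, stats)
X = standardize(stat_feats.reshape(len(stat_feats), -1))     # flatten, then standardize
# Hypothetical centers; in the pipeline these come from MiniBatchKMeans(n_clusters=200).
centers = X[rng.choice(len(X), size=20, replace=False)]
cluster_feats = distances_to_centers(X, centers)
print(cluster_feats.shape)
```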
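The input stem above (BN => Conv2D point-wise 10 filters => Conv2D point-wise 3 filters) presumably turns the 1-channel log-mel into the 3-channel, image-shaped input that MobileNet V2, ResNet 50 and DenseNet 121 expect. A 1x1 convolution is a per-pixel matrix multiply over channels, so NumPy can show the shapes directly. The weights here are random placeholders, BN is shown in inference form with made-up statistics, and no activation is applied because the slide does not name one.

```python
import numpy as np


def batch_norm_inference(x, gamma, beta, mean, var, eps=1e-3):
    """Per-channel BatchNorm in inference mode on a (..., C) array."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def pointwise_conv(x, kernel, bias):
    """1x1 (point-wise) Conv2D on (H, W, C_in) with kernel (C_in, C_out) and bias (C_out,)."""
    return x @ kernel + bias


rng = np.random.default_rng(0)
logmel = rng.normal(size=(64, 513))[..., None]    # (mel bands, frames, 1 channel)
x = batch_norm_inference(logmel, gamma=1.0, beta=0.0, mean=logmel.mean(), var=logmel.var())
x = pointwise_conv(x, rng.normal(size=(1, 10)), np.zeros(10))   # 1 -> 10 channels
x = pointwise_conv(x, rng.normal(size=(10, 3)), np.zeros(3))    # 10 -> 3 channels
print(x.shape)  # (64, 513, 3)
```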
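The random LogMel length extraction above (263 <= length <= 763, padding with zero) can be sketched as follows. It draws a length, crops a random window of that many frames, and zero-pads clips that are too short. Several details are assumptions, not statements from the deck: the uniform draw, one shared length per mini-batch (so the batch can be stacked), and padding at the end of the clip.

```python
import numpy as np

MIN_LEN, MAX_LEN = 263, 763  # frame range from the slide


def sample_length(rng: np.random.Generator, lo: int = MIN_LEN, hi: int = MAX_LEN) -> int:
    """Draw one crop length per mini-batch (uniform distribution assumed)."""
    return int(rng.integers(lo, hi + 1))


def crop_or_pad(logmel: np.ndarray, length: int, rng: np.random.Generator) -> np.ndarray:
    """Random crop of `length` frames; zero-pad at the end if the clip is shorter."""
    n_mels, n_frames = logmel.shape
    if n_frames >= length:
        start = int(rng.integers(0, n_frames - length + 1))
        return logmel[:, start:start + length]
    out = np.zeros((n_mels, length), dtype=logmel.dtype)
    out[:, :n_frames] = logmel
    return out


rng = np.random.default_rng(0)
clips = [rng.normal(size=(64, n)) for n in (150, 500, 1200)]  # variable-length clips
L = sample_length(rng)
batch = np.stack([crop_or_pad(c, L, rng) for c in clips])
print(batch.shape)
```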
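MixUp with alpha = 0.5, as listed above, blends each example and its multi-hot labels with a shuffled partner, using a weight drawn from Beta(alpha, alpha). This is the common per-batch variant (one lambda per batch). The deck does not say whether the author drew one lambda per batch or per sample.

```python
import numpy as np


def mixup(x, y, rng, alpha=0.5):
    """Mixup: convex combination of each example and its labels with a shuffled partner.

    One mixing weight per batch, lam ~ Beta(alpha, alpha).
    """
    lam = float(rng.beta(alpha, alpha))
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam, perm


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 64, 300))                      # batch of log-mel crops
y = (rng.random(size=(8, 80)) < 0.05).astype(float)    # multi-label targets, 80 classes
x_mix, y_mix, lam, perm = mixup(x, y, rng)
```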
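The "3 LR schedule (Cyclic, ReduceLROnPlateau x 2)" above pairs a cyclic policy with plateau-based decay, presumably in stage order. Below is a minimal sketch of both. The triangular policy is one common cyclic schedule, and the slide does not say which variant was used. All learning rates, step sizes, factors and patience values are hypothetical.

```python
import math


def triangular_clr(step: int, base_lr: float, max_lr: float, step_size: int) -> float:
    """Triangular cyclical LR: rises from base_lr to max_lr over step_size steps, then falls back."""
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


class PlateauScheduler:
    """Multiply the LR by `factor` once the monitored loss has not improved for `patience` epochs."""

    def __init__(self, lr: float, factor: float = 0.5, patience: int = 3, min_lr: float = 1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss: float) -> float:
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


# Usage: stage 1 with a cyclic schedule, later stages with plateau-based decay.
stage1_lrs = [triangular_clr(s, 1e-4, 1e-3, step_size=100) for s in range(400)]
plateau = PlateauScheduler(lr=1e-3, factor=0.5, patience=2)
for loss in [0.30, 0.25, 0.26, 0.27, 0.24]:
    current_lr = plateau.step(loss)
```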
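Training stages 2 and 3 above rely on per-clip BCE. Stage 2 ignores noisy clips with high BCE in the loss. Stage 3 keeps only low-BCE noisy clips. The sketch assumes three things the slide does not fix: the worst fraction of noisy clips per batch is dropped, curated clips are always kept, and stage 3 keeps a fixed fraction of the lowest-loss clips. drop_ratio and keep_ratio are hypothetical parameters.

```python
import numpy as np


def per_clip_bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy averaged over classes, one value per clip."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    return -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)).mean(axis=1)


def modified_bce(y_true, y_pred, is_noisy, drop_ratio=0.2):
    """Batch BCE that ignores the highest-loss noisy clips (hypothetical drop_ratio)."""
    losses = per_clip_bce(y_true, y_pred)
    keep = np.ones(len(losses), dtype=bool)
    noisy = np.flatnonzero(is_noisy)
    n_drop = int(len(noisy) * drop_ratio)
    if n_drop > 0:
        worst = noisy[np.argsort(losses[noisy])[-n_drop:]]  # highest-loss noisy clips
        keep[worst] = False
    return float(losses[keep].mean())


def select_low_loss(losses, keep_ratio=0.5):
    """Indices of the noisy clips with the lowest BCE (hypothetical keep_ratio)."""
    n_keep = int(len(losses) * keep_ratio)
    return np.argsort(losses)[:n_keep]
```

In a real training loop the losses would come from the model's own predictions. Stage 3 would call select_low_loss once over the whole train-noisy set rather than per batch.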
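The prediction-time steps above are TTA (x3) with shift width = 0.2 and height = 6/64, plus the 5-length ensemble over 263, 388, 513, 638 and 763 frames. Both can be sketched as averaging a model's outputs over altered copies of one spectrogram. The sketch makes several assumptions: zero-filled translation as the shift, a deterministic crop from the start (or zero padding) for each length, and plain arithmetic averaging. predict_fn stands in for a trained network.

```python
import numpy as np

TTA_WIDTH, TTA_HEIGHT = 0.2, 6 / 64           # shift ranges from the slide
ENSEMBLE_LENGTHS = (263, 388, 513, 638, 763)  # frame lengths from the slide


def shift_zero_fill(spec: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Translate a (n_mels, n_frames) array by dy bins and dx frames, zero-filling the gap."""
    h, w = spec.shape
    assert abs(dy) < h and abs(dx) < w
    out = np.zeros_like(spec)
    dst = (slice(max(0, dy), h + min(0, dy)), slice(max(0, dx), w + min(0, dx)))
    src = (slice(max(0, -dy), h - max(0, dy)), slice(max(0, -dx), w - max(0, dx)))
    out[dst] = spec[src]
    return out


def random_shift(spec, rng, width=TTA_WIDTH, height=TTA_HEIGHT):
    """Shift by up to width * n_frames frames and height * n_mels bins in either direction."""
    h, w = spec.shape
    max_dx, max_dy = int(width * w), int(round(height * h))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    dy = int(rng.integers(-max_dy, max_dy + 1))
    return shift_zero_fill(spec, dy, dx)


def tta_predict(predict_fn, spec, rng, n_tta=3):
    """Average predictions over n_tta randomly shifted copies."""
    return np.mean([predict_fn(random_shift(spec, rng)) for _ in range(n_tta)], axis=0)


def fit_length(spec, length):
    """Crop from the start or zero-pad at the end (assumed inference behaviour)."""
    if spec.shape[1] >= length:
        return spec[:, :length]
    return np.pad(spec, ((0, 0), (0, length - spec.shape[1])))


def length_ensemble(predict_fn, spec, rng, lengths=ENSEMBLE_LENGTHS):
    """Average the TTA predictions over the five input lengths."""
    return np.mean([tta_predict(predict_fn, fit_length(spec, L), rng) for L in lengths], axis=0)


rng = np.random.default_rng(0)
dummy_model = lambda s: np.full(80, s.mean())  # hypothetical stand-in for a trained network
probs = length_ensemble(dummy_model, rng.normal(size=(64, 600)), rng)
print(probs.shape)
```

The 5-fold ensemble would repeat this for each fold's model and average once more.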
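The "Geometric Blending" of 6 models above is read here as a weighted geometric mean of the models' probabilities, exp(sum_i w_i * log p_i), using the six slide coefficients (which sum to 1). That reading is an assumption. The slide says the weights were optimized on 5-fold OOF predictions; that search is not reproduced here, and the model outputs below are placeholders.

```python
import numpy as np

# Coefficients from the slide, in the order listed:
# MobileNet V2 0.24 / 0.22, ResNet 50 0.15 / 0.07, DenseNet 121 0.12 / 0.20
WEIGHTS = np.array([0.24, 0.22, 0.15, 0.07, 0.12, 0.20])


def geometric_blend(preds: np.ndarray, weights: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Weighted geometric mean of probabilities; preds has shape (n_models, n_clips, n_classes).

    Weights are renormalized to sum to 1 before blending.
    """
    w = weights / weights.sum()
    log_p = np.log(np.clip(preds, eps, 1.0))
    return np.exp(np.tensordot(w, log_p, axes=1))  # contract over the model axis


rng = np.random.default_rng(0)
preds = rng.uniform(0.01, 0.99, size=(6, 4, 80))  # placeholder model outputs
blended = geometric_blend(preds, WEIGHTS)
print(blended.shape)
```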