7th Place Solution for Freesound Audio Tagging 2019
Tokyo BISH Bash #01
Hitachi, Ltd. Social Infrastructure Information Systems Division
2/26/2020
Tatsuya Uratani


Hello! I am Tatsuya Uratani. You can find me at @uratatsu on Kaggle.

Uratatsu
Major: Evolutionary Psychology, Evolutionary Ecology
Occupation: Data Scientist

Team Shirogane
@kaerururu
● NLP ML engineer
● Kaggle Master
@Hidehisa Arai
● The University of Tokyo
● Kaggle Master

Competition Overview
Welcome to the Kaggle competition.

Freesound Audio Tagging 2019
Task
● 80-class multi-label classification of audio data
● Clean data and noisy data from different sources
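As a rough illustration of what "multi-label" means for the training targets, the sketch below turns a label string into a multi-hot vector. It assumes the labels of one clip arrive as a single comma-separated string, and it uses only four of the class names from the sound samples; `CLASSES` and `encode_labels` are hypothetical names, and the real task has 80 classes.

```python
# Illustrative subset of the class list (the real task has 80 classes).
CLASSES = ["Accordion", "Church_bell", "Meow", "Sigh"]


def encode_labels(label_string, classes=CLASSES):
    """Turn 'Accordion,Meow' into a multi-hot target vector.

    Assumption: labels for one clip are comma-separated in a single string.
    """
    index = {name: i for i, name in enumerate(classes)}
    target = [0.0] * len(classes)
    for name in label_string.split(","):
        target[index[name.strip()]] = 1.0  # several positions may be set
    return target


print(encode_labels("Accordion,Meow"))
```

Unlike single-label classification, more than one entry of the target can be 1 at once, so each class is scored independently.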

Competition Rules
● GPU Kernel: < 1 hour run-time (inference only)
● External data and pre-trained models are not allowed
● 2-stage kernel competition
Resulting trade-offs
● Ensemble of light models or a single heavy model
● TTA or ensemble
● Faster preprocessing

Competition Points
● Becomes image classification once the audio is converted to a spectrogram
● Many preprocessing parameters
● Variable-length inputs
● Multi-class, multi-label
● Clean (4970) and Noisy (19815) datasets

Basic Solution
Variable-length audio → Convert to Melspectrogram → Random Crop (128 * 128) → Convolutional Neural Network → 80 classes
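The random-crop step of this baseline can be sketched as follows. This is only an illustration under stated assumptions: the spectrogram is a list of frequency rows, each a list of time frames, and clips shorter than the crop width are right-padded with a constant (the slide does not say how short clips were handled). `random_crop` and `pad_value` are hypothetical names.

```python
import random


def random_crop(spec, crop_width=128, pad_value=0.0, rng=random):
    """Cut a fixed-width window out of a variable-length spectrogram.

    spec: list of frequency rows, each a list of time frames.
    Assumption: clips shorter than crop_width are right-padded.
    """
    n_time = len(spec[0])
    if n_time < crop_width:
        pad = crop_width - n_time
        return [row + [pad_value] * pad for row in spec]
    start = rng.randint(0, n_time - crop_width)  # inclusive bounds
    return [row[start:start + crop_width] for row in spec]


short_clip = [[1.0] * 50 for _ in range(128)]
long_clip = [[1.0] * 300 for _ in range(128)]
print(len(random_crop(short_clip)[0]), len(random_crop(long_clip)[0]))
```

Every clip, whatever its length, ends up as a fixed-size input for the CNN.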

Sound Sample
● Accordion
● Church_bell
● Female_singing
● Sigh
● Motorcycle
● Traffic_noise_and_roadway_noise
● Meow
● Bathtub_(filling_or_washing)

7th Place Solution ~Shirogane Solution~

Shirogane Solution (pipeline diagram)
Input
● Noisy Log Mel Spectrogram, 128 * 128 Strength Crop (CV: 0.01 up, LB: 0.005 up)
● Curated Log Mel Spectrogram, 128 * 128 Strength Crop
Data Augmentation (two sets appear in the diagram)
● pitch
● fade, treble & bass, pitch, equalize, reverb
Augmentation in batch (two sets appear in the diagram)
● mix up, Random Resized Crop (Inception only), HorizontalFlip
● mix up, Random Erasing, Coarse Dropout, Random Resized Crop (Inception only), HorizontalFlip
Models
● Inception 1ch, Inception 3ch, CustomCNN
● Pretrain → weight
Predict
● 20 TTA for each model
● Inception 1ch: LB 0.720↑
● CustomCNN: LB 0.720↑
● Inception 3ch: LB 0.724↑
Blending
● CV: 0.01 up
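The prediction stage (TTA followed by blending) can be sketched as below. This assumes that "20tta" means averaging a model's class scores over 20 augmented views of the same clip, and that blending is a weighted average across models; the slides give neither the views nor the weights, so the numbers here are purely hypothetical.

```python
def average_predictions(predictions):
    """Average per-class scores over several TTA views of one clip."""
    n = len(predictions)
    return [sum(scores) / n for scores in zip(*predictions)]


def blend(model_scores, weights):
    """Weighted average of per-class scores from several models."""
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, scores)) / total
        for scores in zip(*model_scores)
    ]


# Two TTA views, two classes, for two hypothetical models.
model_a = average_predictions([[0.0, 1.0], [1.0, 1.0]])
model_b = average_predictions([[0.5, 0.5], [0.5, 0.5]])
print(blend([model_a, model_b], weights=[1.0, 1.0]))
```

Averaging over views reduces the variance introduced by random crops, and blending combines models that make different errors.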

Data Augmentation with SoX (Point 1)
● Fade
● Pitch
● Reverb
● Treble & Bass
● Equalize
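One way to drive these effects is to build SoX command lines from Python. The sketch below only constructs the argument lists; it does not run SoX. The effect names (`fade`, `pitch`, `reverb`, `treble`, `bass`, `equalizer`) are SoX effects, but every parameter value is a hypothetical example, since the slides do not give the settings the team used. `build_sox_command` and `augmentation_commands` are illustrative helper names.

```python
def build_sox_command(src, dst, effect, *params):
    """Return the argv list for: sox SRC DST EFFECT PARAMS..."""
    return ["sox", src, dst, effect, *[str(p) for p in params]]


# Hypothetical settings, one entry per effect named on the slide.
EXAMPLE_EFFECTS = {
    "fade": ("fade", "t", 0.5),                # linear fade-in over 0.5 s
    "pitch": ("pitch", 200),                   # shift up by 200 cents
    "reverb": ("reverb", 50),                  # reverberance of 50 %
    "treble": ("treble", 3),                   # boost treble by 3 dB
    "bass": ("bass", -3),                      # cut bass by 3 dB
    "equalize": ("equalizer", 1000, "1q", 3),  # +3 dB peak at 1 kHz, Q = 1
}


def augmentation_commands(src, stem):
    """Build one command per effect, writing to STEM_<effect>.wav."""
    return {
        name: build_sox_command(src, f"{stem}_{name}.wav", *spec)
        for name, spec in EXAMPLE_EFFECTS.items()
    }


for cmd in augmentation_commands("clip.wav", "clip").values():
    print(" ".join(cmd))
```

On a machine with SoX installed, each list could be passed to `subprocess.run(cmd, check=True)` to write the augmented file.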

Strength Adaptive Crop (Point 2)
● Crop position is chosen according to the distribution of signal strength (dB)
● Why did we notice this?
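The slide does not show the implementation, so the following is only one possible reading of "crop by distribution of strength": sample the crop start with probability proportional to the total dB strength inside the window, so that loud regions are cropped more often than silence. `strength_adaptive_crop` and the way strength is computed (sum over frequency, shifted so the quietest frame is zero) are assumptions made for illustration.

```python
import random


def strength_adaptive_crop(spec, crop_width, rng=random):
    """Sample a crop whose start is weighted by signal strength.

    spec: list of frequency rows, each a list of dB values per frame.
    Returns (start_frame, cropped_spec).
    """
    n_time = len(spec[0])
    if n_time <= crop_width:
        return 0, [row[:] for row in spec]

    # Per-frame strength: sum over frequency, shifted to be non-negative.
    strength = [sum(row[t] for row in spec) for t in range(n_time)]
    floor = min(strength)
    strength = [s - floor for s in strength]

    # Total strength of every candidate window, via a sliding sum.
    window = sum(strength[:crop_width])
    weights = [window]
    for start in range(1, n_time - crop_width + 1):
        window += strength[start + crop_width - 1] - strength[start - 1]
        weights.append(max(window, 0.0))  # guard against rounding below zero

    if sum(weights) <= 0:  # completely flat clip: fall back to uniform
        start = rng.randint(0, n_time - crop_width)
    else:
        start = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return start, [row[start:start + crop_width] for row in spec]
```

With a clip that is silent except for a short burst, the sampled windows always overlap the burst, whereas a uniform random crop would often return pure silence.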

Custom Convolution Network (Point 3)
● Custom pooling layer

    # Excerpt from the model; F is torch.nn.functional.
    def forward(self, x):
        # Add a channel dimension: (batch, H, W) -> (batch, 1, H, W)
        x = x.view(x.size(0), 1, x.size(1), x.size(2))
        x = self.conv1(x, pool_size=(1, 1), pool_type="both")
        x = self.conv2(x, pool_size=(4, 1), pool_type="both")
        x = self.conv3(x, pool_size=(1, 3), pool_type="both")
        x = self.conv4(x, pool_size=(4, 1), pool_type="both")
        x = self.conv5(x, pool_size=(1, 3), pool_type="both")
        # ... (rest elided on the slide) ...

    # Inside each conv block: "both" sums max pooling and average pooling.
    elif pool_type == "both":
        x1 = F.max_pool2d(x, kernel_size=pool_size)
        x2 = F.avg_pool2d(x, kernel_size=pool_size)
        x = x1 + x2

Inception with Random Resized Crop (Point 4)
● We use the Inception-V3 model
● Default Random Resized Crop
● Low score on a single fold → score jumps up with 5 folds

    torchvision.transforms.RandomResizedCrop(
        size,
        scale=(0.08, 1.0),
        ratio=(0.75, 1.3333333333333333),
        interpolation=2)
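To make the effect of the default `scale` and `ratio` concrete, the sketch below samples crop boxes in plain Python. It follows the general idea of Random Resized Crop (pick an area fraction from `scale`, an aspect ratio log-uniformly from `ratio`, then a random position) but it is an illustrative re-implementation with a simplified fallback, not torchvision's code. `sample_resized_crop_box` is a hypothetical name.

```python
import math
import random


def sample_resized_crop_box(height, width, scale=(0.08, 1.0),
                            ratio=(3 / 4, 4 / 3), rng=random, max_tries=10):
    """Return (top, left, crop_height, crop_width) for one random crop."""
    area = height * width
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(max_tries):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(*log_ratio))  # log-uniform aspect ratio
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            return top, left, h, w
    # Simplified fallback (an assumption of this sketch): use the whole image.
    return 0, 0, height, width


rng = random.Random(0)
for _ in range(3):
    print(sample_resized_crop_box(128, 128, rng=rng))
```

With `scale=(0.08, 1.0)` a crop can cover as little as 8 % of the spectrogram before being resized back, which is a strong augmentation and may be related to the low single-fold scores noted above.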

Other Teams' Solutions ~Other Top Solutions~

Other Teams' Solutions
● Melspectrogram Layer (2nd): they used it to search the log-mel hyperparameters end to end
● SpecMix (8th) (figure: SpecAugment compared with SpecMix)
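For readers unfamiliar with the SpecAugment side of that comparison, a minimal sketch of its frequency and time masking is shown below. It covers only the masking part (time warping is omitted) and does not attempt to reproduce SpecMix, whose details are not given on the slide. The function name and the maximum mask sizes are hypothetical.

```python
import random


def spec_augment(spec, max_freq_mask=8, max_time_mask=8, fill=0.0, rng=random):
    """Mask one random band of frequency rows and one band of time frames.

    spec: list of frequency rows, each a list of time frames.
    Returns a new spectrogram; the input is left untouched.
    """
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]

    # Frequency mask: blank out f consecutive rows starting at f0.
    f = rng.randint(0, min(max_freq_mask, n_freq))
    f0 = rng.randint(0, n_freq - f)
    for i in range(f0, f0 + f):
        out[i] = [fill] * n_time

    # Time mask: blank out t consecutive frames starting at t0.
    t = rng.randint(0, min(max_time_mask, n_time))
    t0 = rng.randint(0, n_time - t)
    for row in out:
        for j in range(t0, t0 + t):
            row[j] = fill
    return out


masked = spec_augment([[1.0] * 16 for _ in range(16)], rng=random.Random(0))
print(sum(v == 0.0 for row in masked for v in row), "cells masked")
```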

Other Teams' Solutions
● Multitask learning with noisy labels (4th)
● Semi-supervised learning (SSL) with noisy data (4th)
● ResNet34 + EnvNet-v2 (4th)

How we work ~Unsophisticated work~

Resources
Tools
● Discussion: Slack
● To do: Trello
● Code: Kaggle Kernel
Our Experience
● First pure image competition for all members
● All members were Kaggle Experts
Machine Resources
● GPU: P100 * 7 → P100 * 4 (Kaggle Kernel)

Our Policy
Try every method in Discussion / Public Kernels
● We investigated every discussion thread and high-scoring kernel, and tried most of the methods.
Number of experiments first (not jobs, not research)
● Understanding a method is also important, but it is more important not to leave the GPUs idle during the competition.

Unsophisticated Work
View and listen to a lot of data
● Some sounds are very dirty. Viewing and listening to the data is the most important step for forming hypotheses.
Try many famous architectures
● We tried many famous architectures (e.g. ResNet, DenseNet, Wide ResNet, ResNeXt, SE-ResNeXt...).

Kaggle is a trademark of Google LLC. Freesound is a trademark of Beijing Xiaoniao Tingting Technology Co., LTD. SoX