7th Place Solution for Freesound Audio Tagging 2019

Tokyo BISH Bash #01
Hitachi, Ltd. Social Infrastructure Information Systems Division
2/26/2020
Tatsuya Uratani

© Hitachi, Ltd. 2020. All rights reserved.

Hello! I am Tatsuya Uratani.
You can find me at @uratatsu on Kaggle.

Uratatsu
• Major: Evolutionary Psychology, Evolutionary Ecology
• Occupation: Data Scientist

Team Shirogane
@kaerururu
• NLP ML engineer
• Kaggle Master
@Hidehisa Arai
• The University of Tokyo
• Kaggle Master

1. Competition Overview
Welcome to the Kaggle competition.

Freesound Audio Tagging 2019
Task
• 80-class multilabel classification of audio data
• Clean data and noisy data from different sources

Competition Rules
• GPU Kernel, < 1 hour run-time (inference only)
• External data and pre-trained models are not allowed
• 2-stage kernel competition
Implications:
• Ensemble of light models or a single heavy model
• TTA or ensemble
• Faster preprocessing

Competition Points
• Becomes image classification once the audio is converted to a spectrogram
• Many preprocessing parameters
• Inputs of various lengths
• Multi-class, multi-label
• Clean (4970) and noisy (19815) datasets

Basic Solution
• Input: audio clips of various lengths
• Convert to mel spectrogram
• 128 * 128 random crop
• Convolutional neural network
• Output: 80 classes
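The random-crop step of the basic solution can be sketched as follows. This is a minimal NumPy illustration, not the team's code: the function name `random_crop`, the right-padding of short clips with the spectrogram's minimum value, and the toy input shapes are all assumptions made for the example.

```python
import numpy as np

def random_crop(spec, width=128, rng=None):
    """Return a (n_mels, width) crop from a (n_mels, n_frames) spectrogram."""
    rng = np.random.default_rng() if rng is None else rng
    n_mels, n_frames = spec.shape
    if n_frames < width:
        # Assumption: clips shorter than the crop are right-padded with the
        # minimum value (roughly "silence" on a dB scale).
        pad = width - n_frames
        spec = np.pad(spec, ((0, 0), (0, pad)), mode="constant",
                      constant_values=spec.min())
        n_frames = width
    # Pick the left edge uniformly among all valid positions.
    start = int(rng.integers(0, n_frames - width + 1))
    return spec[:, start:start + width]

rng = np.random.default_rng(0)
long_clip = rng.normal(size=(128, 500))   # stand-in for a long mel spectrogram
short_clip = rng.normal(size=(128, 40))   # stand-in for a short one
crop_long = random_crop(long_clip, rng=rng)
crop_short = random_crop(short_clip, rng=rng)
```

Both crops come out as 128 * 128 arrays, so clips of any length can be batched for the CNN.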

Sound Sample
• Accordion
• Church_bell
• Female_singing
• Sigh
• Motorcycle
• Traffic_noise_and_roadway_noise
• Meow
• Bathtub_(filling_or_washing)

2. 7th Place Solution ～Shirogane Solution～

Shirogane Solution
Noisy data (pretraining)
• Data augmentation: pitch
• Log mel spectrogram, 128 * 128 Strength Crop
• Augmentation in batch: mix up, Random Resized Crop (Inception only), HorizontalFlip
• Pretrain Inception 1ch, Inception 3ch, and CustomCNN, then pass the weights on
• CV: 0.01 up, LB: 0.005 up
Curated data
• Data augmentation: fade, treble & bass, pitch, equalize, reverb
• Log mel spectrogram, 128 * 128 Strength Crop
• Augmentation in batch: mix up, Random Erasing, Coarse Dropout, Random Resized Crop (Inception only), HorizontalFlip
• Models: Inception 1ch (LB 0.720↑), CustomCNN (LB 0.720↑), Inception 3ch (LB 0.724↑)
• Predict: 20 TTA for each model
• Blending: CV 0.01 up
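The "mix up" step listed under augmentation in batch can be sketched as below for multi-label targets. This is a generic illustration of mixup, not the team's implementation: the function name `mixup_batch`, the Beta parameter `alpha=1.0`, and the toy batch shapes are assumptions.

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Blend each sample (and its label vector) with a randomly chosen partner."""
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))      # mixing weight in [0, 1]
    perm = rng.permutation(len(x))           # partner index for every sample
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]  # soft multi-label targets
    return x_mixed, y_mixed, lam

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1, 128, 128))                      # toy batch of spectrogram crops
y = (rng.random(size=(4, 80)) < 0.05).astype(np.float64)   # toy 80-class multi-hot labels
x_mixed, y_mixed, lam = mixup_batch(x, y, alpha=1.0, rng=rng)
```

Because the labels are mixed with the same weight as the inputs, the targets stay in [0, 1] and work unchanged with a binary cross-entropy loss.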

Point 1: Data Augmentation with SoX
• Fade
• Pitch
• Reverb
• Treble & Bass
• Equalize
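One way to drive these SoX effects from Python is to build the command line and hand it to `subprocess`. The sketch below only assembles the argument list. The effect names (`pitch`, `treble`, `bass`, `equalizer`, `reverb`, `fade`) are SoX's own, but the helper `build_sox_command`, the effect order, and every parameter value are assumptions for illustration, not the settings the team used.

```python
def build_sox_command(src, dst, pitch_cents=None, treble_db=None, bass_db=None,
                      equalizer=None, reverberance=None, fade_in_seconds=None):
    """Assemble a `sox` argument list; effects are applied in the order listed."""
    cmd = ["sox", src, dst]
    if pitch_cents is not None:
        cmd += ["pitch", str(pitch_cents)]          # shift in cents (100 = one semitone)
    if treble_db is not None:
        cmd += ["treble", str(treble_db)]           # gain in dB
    if bass_db is not None:
        cmd += ["bass", str(bass_db)]               # gain in dB
    if equalizer is not None:
        freq_hz, width_q, gain_db = equalizer
        cmd += ["equalizer", str(freq_hz), f"{width_q}q", str(gain_db)]
    if reverberance is not None:
        cmd += ["reverb", str(reverberance)]        # reverberance in percent
    if fade_in_seconds is not None:
        cmd += ["fade", str(fade_in_seconds)]       # fade-in length in seconds
    return cmd

cmd = build_sox_command("in.wav", "out.wav", pitch_cents=200, fade_in_seconds=0.5)
# To actually run it (requires SoX on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Writing augmented files ahead of training keeps the SoX calls out of the training loop.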

Point 2: Strength Adaptive Crop
• Crop according to the distribution of signal strength (dB)
• Why did we notice this?
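The slide does not spell out the algorithm, so the following is only one plausible reading of "crop by distribution of strength": sample the crop position with probability proportional to the total strength inside each candidate window. The function name `strength_adaptive_crop`, the per-frame mean as the strength measure, and the toy spectrogram are all assumptions.

```python
import numpy as np

def strength_adaptive_crop(spec_db, width=128, rng=None):
    """Sample a crop whose position favours the louder parts of the clip.

    Returns (crop, start). Hypothetical sketch, not the team's implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_mels, n_frames = spec_db.shape
    if n_frames <= width:
        return spec_db, 0                       # nothing to choose from
    # Per-frame strength, shifted so the quietest frame has weight 0.
    frame_strength = spec_db.mean(axis=0)
    frame_strength = frame_strength - frame_strength.min()
    # window_strength[s] = sum of frame_strength[s : s + width]
    csum = np.concatenate([[0.0], np.cumsum(frame_strength)])
    window_strength = csum[width:] - csum[:-width]
    total = window_strength.sum()
    if total <= 0:
        probs = np.full(len(window_strength), 1.0 / len(window_strength))
    else:
        probs = window_strength / total
    start = int(rng.choice(len(window_strength), p=probs))
    return spec_db[:, start:start + width], start

# Toy clip: silence (-80 dB) with one loud event in frames 300..359.
spec = np.full((128, 600), -80.0)
spec[:, 300:360] = 0.0
rng = np.random.default_rng(0)
starts = [strength_adaptive_crop(spec, rng=rng)[1] for _ in range(50)]
```

With this toy input, every sampled window overlaps the loud event, whereas a uniform random crop would often return pure silence.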

Point 3: Custom Convolution Network
• Custom pooling layer

    # Model forward pass: each block pools along one axis at a time.
    def forward(self, x):
        # Add a channel dimension: (batch, H, W) -> (batch, 1, H, W)
        x = x.view(x.size(0), 1, x.size(1), x.size(2))
        x = self.conv1(x, pool_size=(1, 1), pool_type="both")
        x = self.conv2(x, pool_size=(4, 1), pool_type="both")
        x = self.conv3(x, pool_size=(1, 3), pool_type="both")
        x = self.conv4(x, pool_size=(4, 1), pool_type="both")
        x = self.conv5(x, pool_size=(1, 3), pool_type="both")
    ………………………………………………………………………………
    # Inside the conv block (F is torch.nn.functional):
    # "both" sums max pooling and average pooling.
        elif pool_type == "both":
            x1 = F.max_pool2d(x, kernel_size=pool_size)
            x2 = F.avg_pool2d(x, kernel_size=pool_size)
            x = x1 + x2
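To see what the "both" pooling computes, here is a small NumPy stand-in for the `max_pool2d` + `avg_pool2d` sum, assuming the default behaviour where the stride equals the kernel size and incomplete windows at the edge are dropped. The helper name `pool_both` and the toy input are assumptions for illustration.

```python
import numpy as np

def pool_both(x, pool_size):
    """Sum of non-overlapping max pooling and average pooling on (N, C, H, W)."""
    ph, pw = pool_size
    n, c, h, w = x.shape
    out_h, out_w = h // ph, w // pw
    # Drop rows/columns that do not fill a complete window.
    x = x[:, :, :out_h * ph, :out_w * pw]
    windows = x.reshape(n, c, out_h, ph, out_w, pw)
    return windows.max(axis=(3, 5)) + windows.mean(axis=(3, 5))

x = np.arange(16, dtype=np.float64).reshape(1, 1, 4, 4)
y = pool_both(x, (4, 1))   # shrink the first spatial axis only
```

With `pool_size=(4, 1)` only the first spatial axis shrinks, and with `(1, 3)` only the second, so alternating the two lets the network reduce the two spectrogram axes at different rates.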

Point 4: Inception with Random Resized Crop
• We use the Inception-V3 model
• Random Resized Crop with default parameters
• Low score on a single fold → the score jumps with 5 folds

    torchvision.transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=2)
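The effect of the `scale` and `ratio` defaults can be illustrated with a small NumPy sketch of a random resized crop: sample a box covering 8% to 100% of the area with an aspect ratio between 3/4 and 4/3, then resize it back to a square. This is a simplified re-implementation written for this example, not torchvision's code. It uses nearest-neighbour resizing rather than bilinear interpolation, and its fallback simply returns the whole image.

```python
import math
import numpy as np

def sample_crop_box(height, width, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                    rng=None, attempts=10):
    """Sample (top, left, h, w) with random area fraction and aspect ratio."""
    rng = np.random.default_rng() if rng is None else rng
    area = height * width
    for _ in range(attempts):
        target_area = area * rng.uniform(scale[0], scale[1])
        # Sample the aspect ratio uniformly in log space.
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            top = int(rng.integers(0, height - h + 1))
            left = int(rng.integers(0, width - w + 1))
            return top, left, h, w
    return 0, 0, height, width    # simplified fallback: use the whole image

def random_resized_crop(img, size, rng=None):
    """Crop a random box from a 2-D array and resize it to (size, size)."""
    top, left, h, w = sample_crop_box(img.shape[0], img.shape[1], rng=rng)
    crop = img[top:top + h, left:left + w]
    rows = np.arange(size) * h // size     # nearest-neighbour source rows
    cols = np.arange(size) * w // size     # nearest-neighbour source columns
    return crop[np.ix_(rows, cols)]

rng = np.random.default_rng(0)
spec = rng.normal(size=(128, 128))
out = random_resized_crop(spec, 128, rng=rng)
```

On a spectrogram this stretches both the time and frequency axes by random amounts, which is a much stronger distortion than a plain random crop.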

Resources
Tools
• Discussion: Slack
• To do: Trello
• Code: Kaggle Kernel
Our Experience
• First pure image competition for all members
• All members were Kaggle Experts
Machine Resources
• GPU: P100 * 7 → P100 * 4 (Kaggle Kernel)

Our Policy
Try every method in Discussion / Public Kernels
• We investigated all discussions and high-scoring kernels, and tried most of the methods.
Number of experiments first (not a job, not research)
• Understanding a method is also important, but it is more important not to leave the GPUs idle during the competition.

Unsophisticated Work
View and listen to a lot of data
• Some sounds are very dirty. Viewing and listening to the data is the most important step for forming hypotheses.
Try many well-known architectures
• We tried many well-known architectures (e.g. ResNet, DenseNet, Wide ResNet, ResNeXt, SE-ResNeXt...).

Kaggle is a trademark of Google LLC.
Freesound is a trademark of Beijing Xiaoniao Tingting Technology Co., LTD.
SoX: http://sox.sourceforge.net/

