Slide 1

Slide 1 text

7th Place Solution for Freesound Audio Tagging 2019
Tokyo BISH Bash #01
Hitachi, Ltd. Social Infrastructure Information Systems Division
2/26/2020
Tatsuya Uratani

Slide 2

Slide 2 text

© Hitachi, Ltd. 2020. All rights reserved. 7th Place Solution for Freesound Audio Tagging 2019

Slide 3

Slide 3 text

Hello! I am Tatsuya Uratani.
You can find me at @uratatsu on Kaggle.

Slide 4

Slide 4 text

Uratatsu
Major: Evolutionary psychology, evolutionary ecology
Occupation: Data scientist

Slide 5

Slide 5 text

Team Shirogane
@kaerururu
● NLP ML engineer
● Kaggle Master
@Hidehisa Arai
● The University of Tokyo
● Kaggle Master

Slide 6

Slide 6 text

1. Competition Overview
Welcome to the Kaggle competition.

Slide 7

Slide 7 text

Freesound Audio Tagging 2019
Task
● 80-class multilabel classification of audio data
● Clean data and noisy data from different sources

Slide 8

Slide 8 text

Competition Rules
● GPU Kernel: < 1 hour run-time (inference only)
● External data and pre-trained models are not allowed
● 2-stage kernel competition

Light-model ensemble or heavy single model
TTA or ensemble
Faster preprocessing

Slide 9

Slide 9 text

Competition Points
● Becomes an image classification task once the audio is converted to a spectrogram
● Many preprocessing parameters
● Variable-length inputs
● Multi-class, multi-label
● Clean (4970) and Noisy (19815) datasets

Slide 10

Slide 10 text

Basic Solution
Various-length audio → Convert to Melspectrogram → 128 * 128 Random Crop → Convolutional Neural Network → 80 classes
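The random-crop step of this pipeline can be sketched as below. This is a minimal NumPy sketch, not the team's code: the function name `random_crop`, the padding of short clips with the minimum value, and the random stand-in spectrograms are all assumptions made for illustration.

```python
import numpy as np

def random_crop(spec, width=128, rng=None):
    """Return a (n_mels, width) window cut at a random time offset."""
    rng = rng or np.random.default_rng()
    n_mels, n_frames = spec.shape
    if n_frames < width:
        # Assumption: clips shorter than the crop are right-padded with
        # their minimum value (roughly "silence" on a dB scale).
        pad = width - n_frames
        spec = np.pad(spec, ((0, 0), (0, pad)),
                      mode="constant", constant_values=spec.min())
        n_frames = width
    start = int(rng.integers(0, n_frames - width + 1))
    return spec[:, start:start + width]

# Stand-ins for log-mel spectrograms (random values, not real audio).
long_clip = np.random.default_rng(0).normal(size=(128, 431))
short_clip = np.random.default_rng(1).normal(size=(128, 60))

long_crop = random_crop(long_clip, rng=np.random.default_rng(2))
short_crop = random_crop(short_clip, rng=np.random.default_rng(3))
print(long_crop.shape, short_crop.shape)
```

Every clip, whatever its length, ends up as a fixed 128 * 128 input, which is what lets an image-style CNN consume variable-length audio.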

Slide 11

Slide 11 text

Sound Sample
● Accordion
● Church_bell
● Female_singing
● Sigh
● Motorcycle
● Traffic_noise_and_roadway_noise
● Meow
● Bathtub_(filling_or_washing)

Slide 12

Slide 12 text

2. 7th Place Solution
~Shirogane Solution~

Slide 13

Slide 13 text

Shirogane Solution

Models
・Inception 1ch: LB 0.720↑
・CustomCNN: LB 0.720↑
・Inception 3ch: LB 0.724↑

Noisy Log Mel Spectrogram (CV: 0.01 up, LB: 0.005 up)

Augmentation in batch
・mix up
・Random Resized Crop (Inception only)
・HorizontalFlip

Data Augmentation
・pitch

Data Augmentation
・fade
・treble & bass
・pitch
・equalize
・reverb

Predict: 20tta × 3
Blending (CV: 0.01 up)

Pretrain: Inception 1ch, Inception 3ch, CustomCNN weight

Curated Log Mel Spectrogram
128 * 128 Strength Crop (× 2)

Augmentation in batch
・mix up
・Random Erasing
・Coarse Dropout
・Random Resized Crop (Inception only)
・HorizontalFlip
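The "mix up" entry in the in-batch augmentation list refers to blending pairs of samples and their labels. Below is a minimal NumPy sketch of standard mixup, not the team's implementation: the function name `mixup_batch`, `alpha=1.0`, and the batch shapes are assumptions chosen for illustration.

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Blend each sample (and its multi-hot label) with a shuffled partner."""
    rng = rng or np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))   # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))        # partner index for every sample
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]
    return x_mixed, y_mixed, lam

rng = np.random.default_rng(0)
# Hypothetical batch: 4 one-channel 128 * 128 spectrograms, 80 classes.
x = rng.normal(size=(4, 1, 128, 128))
y = (rng.random(size=(4, 80)) < 0.05).astype(np.float32)

x_mixed, y_mixed, lam = mixup_batch(x, y, alpha=1.0, rng=rng)
print(x_mixed.shape, y_mixed.shape, round(lam, 3))
```

Because the labels are multi-hot, the mixed targets stay in [0, 1] and can be used directly with a binary cross-entropy style loss.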

Slide 14

Slide 14 text

Point 1: Data Augmentation with SoX
● Fade
● Pitch
● Reverb
● Treble & Bass
● Equalize
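One way to drive these effects from Python is to build SoX command lines, one per effect. The sketch below only constructs the commands; the effect names (`fade`, `pitch`, `reverb`, `treble`, `bass`, `equalizer`) are SoX effects, but every parameter value and file name here is a hypothetical choice, not a setting from the talk.

```python
import shlex

# Hypothetical settings: SoX effect name followed by its arguments.
SOX_EFFECTS = {
    "fade": ["fade", "t", "0.5"],                  # 0.5 s linear fade-in
    "pitch": ["pitch", "200"],                     # shift up by 200 cents
    "reverb": ["reverb", "50"],                    # reverberance 50%
    "treble_bass": ["treble", "3", "bass", "-3"],  # +3 dB treble, -3 dB bass
    "equalize": ["equalizer", "1000", "1q", "6"],  # +6 dB around 1 kHz, Q = 1
}

def sox_command(src, dst, effect):
    """Return the argv list for `sox src dst <effect args>`."""
    return ["sox", src, dst, *SOX_EFFECTS[effect]]

commands = {
    name: sox_command("input.wav", f"aug_{name}.wav", name)
    for name in SOX_EFFECTS
}
for cmd in commands.values():
    print(shlex.join(cmd))
```

If SoX is installed, each list can be passed to `subprocess.run(cmd, check=True)` to write the augmented file.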

Slide 15

Slide 15 text

Point 2: Strength Adaptive Crop
● Crop by distribution of strength of dB
Why did we notice this?
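The slide gives only the one-line idea, so the following is one possible reading rather than the team's method: treat the summed dB strength of each candidate window as an unnormalized probability and sample the crop position from it, so that loud regions are chosen more often than silence. The function name `strength_crop` and the weighting scheme are assumptions.

```python
import numpy as np

def strength_crop(spec_db, width=128, rng=None):
    """Sample a crop whose position is weighted by signal strength (assumed scheme)."""
    rng = rng or np.random.default_rng()
    n_frames = spec_db.shape[1]
    if n_frames <= width:
        return spec_db  # nothing to choose from; padding is left to the caller
    # Per-frame strength, shifted so the quietest frame has weight 0.
    frame_strength = spec_db.mean(axis=0)
    frame_strength = frame_strength - frame_strength.min()
    # Total strength of every candidate window (moving sum over `width` frames).
    window_strength = np.convolve(frame_strength, np.ones(width), mode="valid")
    total = window_strength.sum()
    if total > 0:
        probs = window_strength / total
    else:
        # Completely flat signal: fall back to a uniform random crop.
        probs = np.full(len(window_strength), 1.0 / len(window_strength))
    start = int(rng.choice(len(window_strength), p=probs))
    return spec_db[:, start:start + width]

# Synthetic example: silence at -80 dB with a loud burst in frames 300-399.
spec_db = np.full((128, 500), -80.0)
spec_db[:, 300:400] = 0.0
crop = strength_crop(spec_db, rng=np.random.default_rng(0))
print(crop.shape, crop.max())
```

In this synthetic example, windows that contain only silence have zero weight, so the sampled crop always overlaps the loud burst. A uniform random crop would often land on silence instead.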

Slide 16

Slide 16 text

Point 3: Custom Convolution Network
● Custom pooling layer

    def forward(self, x):
        # Add a channel axis: (batch, d1, d2) -> (batch, 1, d1, d2)
        x = x.view(x.size(0), 1, x.size(1), x.size(2))
        x = self.conv1(x, pool_size=(1, 1), pool_type="both")
        x = self.conv2(x, pool_size=(4, 1), pool_type="both")
        x = self.conv3(x, pool_size=(1, 3), pool_type="both")
        x = self.conv4(x, pool_size=(4, 1), pool_type="both")
        x = self.conv5(x, pool_size=(1, 3), pool_type="both")
    ………………………………………………………………………………
        elif pool_type == "both":
            # F is torch.nn.functional; sum of max pooling and average pooling
            x1 = F.max_pool2d(x, kernel_size=pool_size)
            x2 = F.avg_pool2d(x, kernel_size=pool_size)
            x = x1 + x2
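To see what the `pool_type == "both"` branch computes, here is a small NumPy sketch of non-overlapping pooling (stride equal to the kernel size, which is the default for `F.max_pool2d` and `F.avg_pool2d`). The helper names `pool2d` and `pool_both` are hypothetical and exist only for this illustration.

```python
import numpy as np

def pool2d(x, pool_size, reduce):
    """Non-overlapping 2D pooling over the last two axes."""
    ph, pw = pool_size
    # Drop ragged edges, matching the default floor behaviour.
    h = x.shape[-2] // ph * ph
    w = x.shape[-1] // pw * pw
    x = x[..., :h, :w]
    blocks = x.reshape(*x.shape[:-2], h // ph, ph, w // pw, pw)
    return reduce(blocks, axis=(-3, -1))

def pool_both(x, pool_size):
    """Sum of max pooling and average pooling, as in the "both" branch."""
    return pool2d(x, pool_size, np.max) + pool2d(x, pool_size, np.mean)

x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
both_2x2 = pool_both(x, (2, 2))
print(both_2x2[0, 0])

# Asymmetric kernels shrink only one axis at a time.
feat = np.zeros((1, 1, 128, 128))
print(pool_both(feat, (4, 1)).shape, pool_both(feat, (1, 3)).shape)
```

The asymmetric kernels in `forward` ((4, 1) and (1, 3)) reduce the two spectrogram axes at different rates, and summing max and average pooling keeps both the peak and the mean of each region.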

Slide 17

Slide 17 text

Point 4: Inception with Random Resized Crop
● We use the Inception-V3 model
● Default Random Resized Crop
● Low score with a single fold → the score jumps with 5 folds

    torchvision.transforms.RandomResizedCrop(
        size,
        scale=(0.08, 1.0),
        ratio=(0.75, 1.3333333333333333),
        interpolation=2)
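The default `scale` and `ratio` values mean a crop can cover as little as 8% of the spectrogram, with an aspect ratio between 3/4 and 4/3, before being resized back to `size`. The sketch below is a simplified standard-library re-implementation of that box-sampling idea, written for illustration only; it is not torchvision's code, and the function name `sample_crop_box` and the whole-image fallback are assumptions.

```python
import math
import random

def sample_crop_box(height, width, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                    rng=None, attempts=10):
    """Sample (top, left, h, w) for a random-resized-crop style box."""
    rng = rng or random.Random()
    area = height * width
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(attempts):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(*log_ratio))  # log-uniform aspect ratio
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            return top, left, h, w
    # Simplified fallback (assumption): use the whole image.
    return 0, 0, height, width

boxes = [sample_crop_box(128, 128, rng=random.Random(seed)) for seed in range(5)]
for box in boxes:
    print(box)
```

Each box would then be resized to the network input size. For a spectrogram this rescales both axes, so it is a much stronger distortion than a plain random crop.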

Slide 18

Slide 18 text

3. Other Solutions
~Other Top Solutions~

Slide 19

Slide 19 text

Other Solutions
● Melspectrogram Layer (2nd)
They use it to search the hyperparameters of the log-mel transform end-to-end
● SpecMix (8th)
[Figure: SpecAugment, SpecMix]
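For context, SpecAugment-style masking blanks out one random band of frequency bins and one random band of time frames in the spectrogram. The sketch below illustrates only that masking idea and does not attempt to reproduce the 8th-place SpecMix method; the function name `spec_mask`, the maximum band widths, and filling with the mean value are assumptions.

```python
import numpy as np

def spec_mask(spec, max_freq_width=16, max_time_width=16, rng=None):
    """Mask one frequency band and one time band of a (n_mels, n_frames) array."""
    rng = rng or np.random.default_rng()
    out = spec.copy()                      # leave the input untouched
    n_mels, n_frames = out.shape
    fill = out.mean()                      # assumption: fill with the mean value

    f = int(rng.integers(0, max_freq_width + 1))   # band height (may be 0)
    f0 = int(rng.integers(0, n_mels - f + 1))
    out[f0:f0 + f, :] = fill

    t = int(rng.integers(0, max_time_width + 1))   # band width (may be 0)
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = fill
    return out

# Random stand-in for a log-mel spectrogram.
spec = np.random.default_rng(0).normal(size=(128, 128))
masked = spec_mask(spec, rng=np.random.default_rng(1))
changed = int((masked != spec).sum())
print(masked.shape, changed)
```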

Slide 20

Slide 20 text

Other Solutions
● Multitask learning with noisy labels (4th)
● Semi-supervised learning (SSL) with noisy data (4th)
● ResNet34 + EnvNet-v2 (4th)

Slide 21

Slide 21 text

4. How we work
~Unsophisticated work~

Slide 22

Slide 22 text

Resources

Tools
Discussion: Slack
To Do: Trello
Code: Kaggle Kernel

Our Experience
First pure image competition for all members.
All members were Kaggle Experts.

Machine Resources
GPU: P100 * 7 → P100 * 4 (Kaggle Kernel)

Slide 23

Slide 23 text

Our Policy

Try every method in Discussion / Public Kernels
We investigated all the discussions and high-scoring kernels, and tried most of the methods.

Number of experiments first (not a job, not research)
Understanding a method is also important, but it is more important not to leave the GPUs idle during the competition.

Slide 24

Slide 24 text

Unsophisticated Works

View and listen to a lot of data
Some sounds are very dirty. Viewing and listening to the data is the most important step for forming hypotheses.

Try many famous architectures
We tried many famous architectures (e.g. ResNet, DenseNet, Wide ResNet, ResNeXt, SE-ResNeXt...).

Slide 25

Slide 25 text

Kaggle is a trademark of Google LLC.
Freesound is a trademark of Beijing Xiaoniao Tingting Technology Co., LTD.
SoX http://sox.sourceforge.net/