7th Place Solution for Freesound Audio Tagging 2019

Slide 1

Slide 1 text

7th Place Solution for Freesound Audio Tagging 2019
Tokyo BISH Bash #01
Hitachi, Ltd. Social Infrastructure Information Systems Division
2/26/2020
Tatsuya Uratani

Slide 2

Slide 2 text

Slide 3

Slide 3 text

Slide 4

Slide 4 text

Slide 5

Slide 5 text

Slide 6

Slide 6 text

Slide 7

Slide 7 text

Slide 8

Slide 8 text

© Hitachi, Ltd. 2020. All rights reserved.

Competition Rules
● GPU Kernel: < 1 hour run-time (inference only)
● External data and pre-trained models are not allowed
● 2-stage kernel competition

● Light models ensemble or heavy single model
● TTA or ensemble
● Faster preprocessing
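The choice between TTA and ensembling on this slide comes down to how many forward passes fit in the one-hour budget and how their outputs are averaged. A minimal NumPy sketch of both averaging steps follows. The names `predict_fn`, `augment_fn`, and the blending weights are hypothetical placeholders, not taken from the slides.

```python
import numpy as np

def tta_predict(predict_fn, augment_fn, x, n_tta, rng):
    """Average predictions over n_tta randomly augmented views of x."""
    preds = [predict_fn(augment_fn(x, rng)) for _ in range(n_tta)]
    return np.mean(preds, axis=0)

def blend(model_preds, weights):
    """Weighted average over models.

    model_preds has shape (n_models, n_samples, n_classes).
    """
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalise so probabilities stay in [0, 1]
    return np.tensordot(weights, np.asarray(model_preds), axes=1)
```

Each extra TTA view or ensemble member costs one more forward pass per clip, so the two options compete for the same inference time.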

Slide 9

Slide 9 text

Competition Points
● Becomes an image classification task once the audio is converted to spectrograms
● Many preprocessing parameters
● Variable-length inputs
● Multi-class, multi-label
● Clean (4970) and Noisy (19815) datasets
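Variable-length clips produce spectrograms with different numbers of frames, while a CNN batch needs a fixed shape. One common way to handle this is to crop long clips at random and pad short ones, sketched below. The function name, the padding value, and the plain uniform crop are assumptions for illustration. This is not the "Strength Crop" named later in the deck.

```python
import numpy as np

def crop_or_pad(spec, width, rng):
    """Return a (n_mels, width) window from a (n_mels, n_frames) spectrogram."""
    n_mels, n_frames = spec.shape
    if n_frames >= width:
        # Long clip: pick a random window of the target width.
        start = rng.integers(0, n_frames - width + 1)
        return spec[:, start:start + width]
    # Short clip: pad on the right with the quietest value (assumed silence floor).
    padded = np.full((n_mels, width), spec.min(), dtype=spec.dtype)
    padded[:, :n_frames] = spec
    return padded
```

For example, `crop_or_pad(spec, 128, np.random.default_rng(0))` yields a 128-frame window regardless of clip length.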

Slide 10

Slide 10 text

Slide 11

Slide 11 text

Slide 12

Slide 12 text

Slide 13

Slide 13 text

Shirogane Solution

Pretrain: Noisy Log Mel Spectrogram (CV: 0.01 up, LB: 0.005 up)
● Data Augmentation
・pitch
● 128 * 128 Strength Crop
● Augmentation in batch
・mix up
・Random Resized Crop (Inception only)
・HorizontalFlip
● Models: Inception 1ch, Inception 3ch, CustomCNN → weight

Curated Log Mel Spectrogram
● Data Augmentation
・fade
・treble & bass
・pitch
・equalize
・reverb
● 128 * 128 Strength Crop
● Augmentation in batch
・mix up
・Random Erasing
・Coarse Dropout
・Random Resized Crop (Inception only)
・HorizontalFlip

Predict (20tta per model)
● Inception 1ch: LB 0.720↑
● CustomCNN: LB 0.720↑
● Inception 3ch: LB 0.724↑

Blending (CV: 0.01 up)
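The "mix up" item listed under in-batch augmentation can be sketched as follows, assuming the standard formulation in which each sample is blended with a randomly chosen partner from the same batch. The function name and the `alpha` parameter are illustrative. The slides do not state the value the team used.

```python
import numpy as np

def mixup_batch(x, y, alpha, rng):
    """Blend each sample (and its label vector) with a shuffled partner."""
    lam = rng.beta(alpha, alpha)       # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))     # partner index for every sample
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]
    return x_mixed, y_mixed, lam
```

Because the labels are mixed with the same coefficient as the inputs, the targets become soft multi-label vectors, which fits the multi-label setting of this competition.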

Slide 14

Slide 14 text

Slide 15

Slide 15 text

Slide 16

Slide 16 text

Custom Convolution Network
Point 3
● Custom pooling layer

    def forward(self, x):
        x = x.view(x.size(0), 1, x.size(1), x.size(2))
        x = self.conv1(x, pool_size=(1, 1), pool_type="both")
        x = self.conv2(x, pool_size=(4, 1), pool_type="both")
        x = self.conv3(x, pool_size=(1, 3), pool_type="both")
        x = self.conv4(x, pool_size=(4, 1), pool_type="both")
        x = self.conv5(x, pool_size=(1, 3), pool_type="both")
    ………………………………………………………………………………
        elif pool_type == "both":
            x1 = F.max_pool2d(x, kernel_size=pool_size)
            x2 = F.avg_pool2d(x, kernel_size=pool_size)
            x = x1 + x2
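To make the arithmetic of the "both" pooling concrete, here is a NumPy sketch that mirrors the max-plus-average sum on a small array. It assumes non-overlapping windows (stride equal to the kernel size, which is the PyTorch default when no stride is given) and drops any remainder rows or columns. The helper names are illustrative and are not part of the team's code.

```python
import numpy as np

def pool2d(x, pool_size, mode):
    """Non-overlapping 2D pooling over the last two axes."""
    kh, kw = pool_size
    h, w = x.shape[-2] // kh, x.shape[-1] // kw
    # Drop the remainder, then split each spatial axis into (blocks, kernel).
    blocks = x[..., :h * kh, :w * kw].reshape(*x.shape[:-2], h, kh, w, kw)
    reduce = np.max if mode == "max" else np.mean
    return reduce(blocks, axis=(-3, -1))

def pool_both(x, pool_size):
    """Sum of max pooling and average pooling, as in pool_type="both"."""
    return pool2d(x, pool_size, "max") + pool2d(x, pool_size, "avg")

x = np.arange(16.0).reshape(1, 1, 4, 4)
print(pool_both(x, (4, 1)))  # pools the first spatial axis only
```

With `pool_size=(4, 1)` only the first spatial axis shrinks, and with `(1, 3)` only the second, so the alternating sizes in `forward` reduce the two axes in turns rather than together.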

Slide 17

Slide 17 text

Inception with Random Resized Crop
Point 4
● We use the Inception-V3 model
● Default Random Resized Crop settings
● Low score on a single fold → the score jumps with 5 folds

    torchvision.transforms.RandomResizedCrop(
        size,
        scale=(0.08, 1.0),
        ratio=(0.75, 1.3333333333333333),
        interpolation=2)
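To show what the default `scale` and `ratio` values imply, the sketch below samples a crop box in the style of RandomResizedCrop using only the standard library: a random area fraction, a log-uniform aspect ratio, and up to ten attempts to fit the box inside the image. The function name is hypothetical, and the fallback here is simplified to the whole image, which is an assumption rather than torchvision's exact behaviour.

```python
import math
import random

def sample_crop_box(height, width, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), rng=random):
    """Sample (top, left, h, w) for a random resized crop."""
    area = height * width
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(10):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(*log_ratio))  # log-uniform aspect ratio
        w = round(math.sqrt(target_area * aspect))
        h = round(math.sqrt(target_area / aspect))
        if 0 < w <= width and 0 < h <= height:
            top = rng.randint(0, height - h)    # randint is inclusive on both ends
            left = rng.randint(0, width - w)
            return top, left, h, w
    # Simplified fallback (assumption): use the whole image.
    return 0, 0, height, width
```

With the lower bound of `scale` at 0.08, a sampled crop may cover as little as 8% of the spectrogram before it is resized back to `size`, which makes this a strong augmentation.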

Slide 18

Slide 18 text

Slide 19

Slide 19 text

Slide 20

Slide 20 text

Slide 21

Slide 21 text

Slide 22

Slide 22 text

Resources

Tools
● Discussion: Slack
● To Do: Trello
● Code: Kaggle Kernel

Machine Resources
● GPU: P100 * 7 → P100 * 4 (Kaggle Kernel)

Our Experience
● First pure image competition for all members
● All members are Kaggle Experts

Slide 23

Slide 23 text

Our Policy

Try every method in Discussion / Public Kernels
We investigated all discussions and high-scoring kernels, and tried most of the methods.

Number of experiments first (not a job, not research)
Understanding the methods is also important, but it is more important not to let the GPUs sit idle during the competition.

Slide 24

Slide 24 text

Unsophisticated Works

View and listen to a lot of data
Some sounds are very dirty. Viewing and listening to the data is the most important step in forming hypotheses.

Try many well-known architectures
We tried many well-known architectures (e.g. ResNet, DenseNet, Wide ResNet, ResNeXt, SE-ResNeXt...).

Slide 25

Slide 25 text

Kaggle is a trademark of Google LLC.
Freesound is a trademark of Beijing Xiaoniao Tingting Technology Co., LTD.
SoX http://sox.sourceforge.net/