1D Features
Resources:
TITAN RTX, 1080Ti x 2, 2080Ti x 2 *1
Cornell Birdcall Identification
Kaggle
Xeno-Canto
Extended Xeno-Canto *2
Copyright 2020 @ Maxwell_110
*1 My room became tropical.
*2 Credit to Vopani
*3 Kerneler-kun has become a notebook expert! Check his profile.
Feature Extraction
Training / Prediction
BirdVox
ff1010
fs2019
ESC50
264 birds
nocall
Trim silent parts
(librosa.effects.trim)
Load Audio
(librosa.load)
Resample to 22.05 kHz
remove only
silent start/end parts
Log Mel Spectrogram (2D)
(librosa.feature.melspectrogram,
librosa.core.power_to_db)
Audio Data (1D)
5 - 10 (s) variable audio length
64 mel bins, 10 ms hop, 80 ms STFT window
Event aware extraction
5 (s) constant audio length
Event aware extraction
64
500 - 1000
5 x 22050
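The shapes above follow from the stated settings (22.05 kHz, 64 mel bins, 10 ms hop, 80 ms window). A small arithmetic sketch, assuming sizes are truncated to whole samples and frames are counted with centered framing (1 + n_samples // hop); both conventions are assumptions, not stated on the slide:

```python
SR = 22050                 # sampling rate from the slide (22.05 kHz)
N_MELS = 64                # mel bins from the slide
HOP_MS, WIN_MS = 10, 80    # hop and STFT window lengths from the slide

# Assumption: sizes are truncated to whole samples.
hop_length = SR * HOP_MS // 1000   # 220 samples
n_fft = SR * WIN_MS // 1000        # 1764 samples

def n_frames(seconds):
    """Frame count under centered framing: 1 + n_samples // hop_length."""
    return 1 + (seconds * SR) // hop_length

shape_5s = (N_MELS, n_frames(5))     # (64, 502)
shape_10s = (N_MELS, n_frames(10))   # (64, 1003)
samples_1d = 5 * SR                  # 110250 samples for the 1D input
```

This reproduces the roughly 64 x (500 - 1000) spectrogram size for 5 - 10 s clips and the 5 x 22050 sample length for the 1D models.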
1 Augmentation
p: 0.5
width / height shift:
0.2 / 0.1
Scale: -0.05 / +0.05
2 Random Eraser
p: 0.5
erase num: 1
width: [0, 0.1]
height: [0.1, 0.3]
fill with -1
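The Random Eraser settings above can be sketched in NumPy. This is a minimal illustration, assuming the width and height ranges are fractions of the time and mel axes respectively; the function name `random_erase` and the array layout (n_mels, n_frames) are hypothetical:

```python
import numpy as np

def random_erase(spec, rng, p=0.5, width=(0.0, 0.1), height=(0.1, 0.3), fill=-1.0):
    """Erase one random rectangle of a (n_mels, n_frames) array with probability p."""
    if rng.random() >= p:
        return spec
    out = spec.copy()
    n_mels, n_frames = out.shape
    # Assumption: rectangle size is drawn as a fraction of each axis.
    h = int(round(rng.uniform(*height) * n_mels))
    w = int(round(rng.uniform(*width) * n_frames))
    top = rng.integers(0, n_mels - h + 1)
    left = rng.integers(0, n_frames - w + 1)
    out[top:top + h, left:left + w] = fill
    return out

rng = np.random.default_rng(0)
spec = np.zeros((64, 500), dtype=np.float32)
erased = random_erase(spec, rng, p=1.0)  # p=1.0 forces the erase for demonstration
```

The input array is left untouched; only the returned copy contains the filled rectangle.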
Standardize
[- 1, + 1]
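One way to read "Standardize [-1, +1]" is per-sample min-max scaling. A minimal sketch under that assumption; the slide does not say whether the statistics are per sample or global, and `scale_to_unit_range` is a hypothetical name:

```python
import numpy as np

def scale_to_unit_range(x, eps=1e-8):
    """Min-max scale an array to approximately [-1, +1]."""
    lo, hi = x.min(), x.max()
    # eps guards against division by zero for constant inputs.
    return 2.0 * (x - lo) / (hi - lo + eps) - 1.0

log_mel = np.array([[-80.0, -40.0], [-20.0, 0.0]])  # toy dB values
scaled = scale_to_unit_range(log_mel)
```

Under this reading, the Random Eraser fill value of -1 corresponds to the minimum of the scaled range.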
To 2D Models
To 1D Models
p: 0.5
width shift: 0.2
NoiseInjection
Augmentation Standardize
[- 1, + 1]
cut out
at random
2D
Features
ResNet 18
nocall
3 Binary Model (call / nocall)
4 Multi-Label Model (264 types)
Multi-Task-Learning (MTL) for primary and noisy background labels
call
2D Models
2048 nodes + BN + ReLU
512 nodes + BN + ReLU
2D: GAP / 1D: MAP *4
2048 nodes + BN + ReLU
512 nodes + BN + ReLU
2D: GAP / 1D: MAP *4
264 nodes (primary labels)
264 nodes (background labels)
ResNet 18
2D Models
MTL
Loss
3-stage training from scratch
1. primary only
-
,
= [, ]
- 2D: 200 epochs + Early Stopping (ES)
Adam, CyclicLR 1e-4 ~ 1e-3
- 1D: 100 epochs + ES
SGD, CosineAnnealing 1e-1 ~ 1e-6
=> Adam, ReduceLROnPlateau 1e-4
2. + background
-
,
= [, ]
- Adam (5e-5)
- ReduceLROnPlateau (x 0.25)
3. + Pseudo Labeling
-
,
= [, ]
- Adam (5e-5)
- ReduceLROnPlateau (x 0.25)
- Background-branch predictions above 0.15 are added to the primary
labels as soft labels for the primary branch. All values are
clipped to [0, 1].
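The pseudo-labeling rule in stage 3 can be sketched as follows. This is a minimal illustration, assuming the predicted probability itself (not a hard 1) is added as the soft label; the function name `add_pseudo_labels` is hypothetical:

```python
import numpy as np

def add_pseudo_labels(primary_labels, background_pred, threshold=0.15):
    """Add confident background predictions to the primary targets as soft labels."""
    # Keep only background predictions strictly above the threshold.
    soft = np.where(background_pred > threshold, background_pred, 0.0)
    # Clip so targets stay valid probabilities in [0, 1].
    return np.clip(primary_labels + soft, 0.0, 1.0)

primary = np.array([1.0, 0.0, 0.0, 0.0])        # toy one-hot primary label
background = np.array([0.3, 0.1, 0.6, 0.15])    # toy background-branch output
targets = add_pseudo_labels(primary, background)
```

In this toy case the first class stays at 1.0 after clipping, the third class receives a soft label of 0.6, and the values at or below 0.15 are ignored.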
primary branch
background branch
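The MTL loss over the two branches could be combined as a weighted sum. A minimal sketch, assuming binary cross-entropy per branch and a pair of stage-dependent weights; the loss type, the function names, and the weight values used here are all hypothetical, since the slide does not state them:

```python
import numpy as np

def bce(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all samples and classes."""
    p = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def mtl_loss(y_primary, p_primary, y_background, p_background,
             w_primary, w_background):
    """Weighted sum of the primary-branch and background-branch losses."""
    return (w_primary * bce(y_primary, p_primary)
            + w_background * bce(y_background, p_background))

y_p = np.array([[1.0, 0.0]]); p_p = np.array([[0.5, 0.5]])
y_b = np.array([[0.0, 1.0]]); p_b = np.array([[0.2, 0.9]])
# A "primary only" stage corresponds to a zero background weight.
stage1 = mtl_loss(y_p, p_p, y_b, p_b, w_primary=1.0, w_background=0.0)
```

With a zero background weight the result reduces to the primary-branch loss alone, which matches the idea of training the primary labels first and adding the background branch later.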
PANNs 1D
1D Model
Blending
Public: 36th (0.623)
Private: 39th (0.580)
*3
• Class-Wise Blending
- For each bird class
- Optimize blend coefficients with BCE loss
• Class-Wise threshold optimization
- For each bird class
- Maximize macro-F1 (not sample-wise)
• 3-epoch ensemble for PANNs 1D
• 5-fold ensemble for ResNet18, PANNs 1D
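The class-wise blending and threshold steps above can be sketched with a simple grid search. This is an illustration only, assuming two models and coarse grids; the actual optimizer, grid, and number of blended models are not given on the slide, and all function names are hypothetical:

```python
import numpy as np

def bce(y, p, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def f1(y, pred):
    tp = np.sum((y == 1) & (pred == 1))
    fp = np.sum((y == 0) & (pred == 1))
    fn = np.sum((y == 1) & (pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def blend_weights(y, p_a, p_b, grid=np.linspace(0, 1, 21)):
    """Per class, pick w minimising the BCE of w * p_a + (1 - w) * p_b."""
    return np.array([
        min(grid, key=lambda w: bce(y[:, c], w * p_a[:, c] + (1 - w) * p_b[:, c]))
        for c in range(y.shape[1])
    ])

def best_thresholds(y, p, grid=np.linspace(0.05, 0.95, 19)):
    """Per class, pick the threshold maximising that class's F1."""
    return np.array([
        max(grid, key=lambda t: f1(y[:, c], (p[:, c] >= t).astype(int)))
        for c in range(y.shape[1])
    ])

# Toy validation data: 4 clips, 2 classes; model A is informative on
# class 0 only, model B on class 1 only.
y = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
p_a = np.array([[0.9, 0.5], [0.1, 0.5], [0.8, 0.5], [0.2, 0.5]])
p_b = np.array([[0.5, 0.1], [0.5, 0.9], [0.5, 0.2], [0.5, 0.8]])

w = blend_weights(y, p_a, p_b)
blended = w * p_a + (1 - w) * p_b
thr = best_thresholds(y, blended)
pred = (blended >= thr).astype(int)
macro_f1 = np.mean([f1(y[:, c], pred[:, c]) for c in range(y.shape[1])])
```

Because macro-F1 is the mean of the per-class F1 scores and each threshold affects only its own class, maximizing F1 class by class also maximizes macro-F1 on the data used for the search.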
Model Architecture
*4 mean on time axis,
max and mean on freq axis
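The two pooling variants named in footnote *4 can be sketched in NumPy. This is a minimal illustration, assuming a (channels, freq, time) feature map and that the max and mean over the frequency axis are summed; the slide states neither the layout nor how the two are combined, and the function names are hypothetical:

```python
import numpy as np

def gap_pool(feature_map):
    """Global average pooling: (channels, freq, time) -> (channels,)."""
    return feature_map.mean(axis=(1, 2))

def map_pool(feature_map):
    """Mean over time, then max + mean over frequency: -> (channels,)."""
    x = feature_map.mean(axis=2)            # (channels, freq)
    return x.max(axis=1) + x.mean(axis=1)   # assumption: summed

fm = np.arange(24, dtype=float).reshape(2, 3, 4)  # toy map: 2 channels, 3 freq, 4 time
gap = gap_pool(fm)
mapped = map_pool(fm)
```

Both variants reduce each channel to a single value, so the dense layers that follow see a fixed-size vector regardless of clip length.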