Slide 31
Our approach:
Self-attention-based weakly supervised method
› Self-attention (Transformer) has shown outstanding performance in various fields (NLP, ASR, etc.)
› First application to this field [Miyazaki*+, 2020] (*LINE summer internship 2019)
› Can capture global information effectively
[Architecture diagram] Sound input (time × frequency) → CNN-based feature extraction → concatenation with a special token for the weak label → stacked Transformer encoder (multi-head self-attention + feed-forward, × n times) → sound classifiers, yielding weak label estimation (from the special token) and frame-level recognition results.