Introduction of DCASE 2021 Challenge Task 2 / dcase2021task2

Introduction of DCASE 2021 Challenge Task 2
Tokyo BISH Bash #4, Mar. 30, 2021
Yohei Kawaguchi, Hitachi, Ltd. R&D Group
© Hitachi, Ltd. 2021. All rights reserved.

Slide 1: Special thanks to DCASE Challenge Task 2 co-organizers

Slide 2: Looking Back on DCASE 2020 Task 2

Slide 3: Task scope & applications
■ Machine condition monitoring
  ◼ Determine from its sound whether a machine is normal or anomalous
Background photo created by fanjianhua - www.freepik.com https://www.freepik.com/photos/background
[Koizumi et al, DCASE2020]

Slide 4: Challenge & positioning
■ How can we detect anomalies without anomalous training data?
Figure: tasks ordered by the number of training samples of target events (massive, few-shot, zero resource)
  ◼ Sound Event Detection <Regular task>: detecting "DEFINED" sounds. Easy to collect: brakes squeaking, car, children, people speaking, etc.
  ◼ Rare-Sound Event Detection <DCASE 2017 challenge>: detecting "DEFINED" sounds. Difficult to collect: baby crying, glass breaking, gunshot, etc.
  ◼ Unsupervised Anomalous Sound Detection <DCASE 2020 challenge>: detecting "UNKNOWN" sounds. Impossible to collect. Impossible to list exhaustive patterns of anomalies... We are here!
[Koizumi et al, DCASE2020]

Slide 5: Task setup in 2020
■ Dataset: ToyADMOS [Koizumi+, 2020] & MIMII [Purohit+, 2020]
■ Metrics: AUC & pAUC
6 machine types, (4+3) machine IDs
Around 1000 samples of 10 sec normal sounds
Training data size:
  per ID:   10 sec × 1000 samples ≒ 2.8 h
  in total: 2.8 h × 7 IDs × 6 types ≒ 116.7 h
Important to share training clips between different machines
[Koizumi et al, DCASE2020]
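The two metrics named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the official evaluator. It assumes the pairwise-comparison form of AUC and a pAUC that compares anomalous clips only against the highest-scoring fraction p of normal clips. The value p = 0.1, the function names, and the toy scores are all assumptions made for this sketch; the slide does not state them.

```python
import math

def auc(normal_scores, anomalous_scores):
    """Fraction of (normal, anomalous) pairs in which the anomalous clip scores higher."""
    hits = sum(1 for a in anomalous_scores for n in normal_scores if a > n)
    return hits / (len(normal_scores) * len(anomalous_scores))

def pauc(normal_scores, anomalous_scores, p=0.1):
    """AUC restricted to the low false-positive-rate range [0, p].

    Only the floor(p * N_normal) highest-scoring normal clips are compared,
    i.e. the normal clips most likely to cause false alarms.
    p = 0.1 is an assumption made for this sketch.
    """
    k = math.floor(p * len(normal_scores))
    if k == 0:
        raise ValueError("too few normal clips for this value of p")
    hardest_normals = sorted(normal_scores, reverse=True)[:k]
    return auc(hardest_normals, anomalous_scores)

# Toy anomaly scores (made up for illustration): higher means more anomalous.
normal = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.4, 0.35, 0.22, 0.9]
anomalous = [0.8, 0.95, 0.5, 0.45]

print(auc(normal, anomalous))   # 0.925
print(pauc(normal, anomalous))  # 0.25
```

In this toy data a single normal clip with a very high score barely affects AUC but dominates pAUC, which is why pAUC is informative when false alarms must be kept rare.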

Slide 6: Results in 2020
■ 117 submissions from 40 teams
  ◼ Top five teams achieved consistently high scores in all machine types.
  ◼ Some teams achieved high scores on several machine types, but they dropped in rank owing to relatively low Toy-conveyor scores.
  ◼ Some top rankers had very low AUC for some machine IDs, even though the average AUC was high. (Evaluation metric should be revised.)
[Koizumi et al, DCASE2020]

Slide 7: Top rankers’ solution
■ Outlier exposure-like ASD
  ◼ Classify machine IDs instead of performing outlier detection
Figure: three feature-space panels (axes: dim 1, dim 2) titled Baseline system, Anomaly simulation approach, and Classification approach; label: Outlier exposure-like approach
Developed by top rankers independently!
Pros: Effective use of training data, resulting in high score
Cons:
  Different machines are too similar. → False positives
  Different machines are too different. → False negatives
  Difficult to control it!
[Koizumi et al, DCASE2020]
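The classification approach can be sketched as follows. This is an illustration under assumptions, not any team's actual system: a classifier is assumed to output one logit per machine ID, and the anomaly score is taken here to be the negative log-probability of the clip's known machine ID. The function names and logits are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw classifier outputs into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def anomaly_score(logits, true_machine_id):
    """Negative log-probability assigned to the clip's known machine ID.

    A normal clip should be recognised as its own machine (low score);
    an anomalous clip tends to look less like its own machine (high score).
    """
    probs = softmax(logits)
    return -math.log(probs[true_machine_id])

# Hypothetical logits from a 4-way machine-ID classifier; both clips come from ID 0.
normal_clip_logits = [5.0, 0.5, 0.2, 0.1]     # confidently classified as ID 0
anomalous_clip_logits = [1.0, 0.9, 1.2, 0.8]  # classifier is unsure

score_normal = anomaly_score(normal_clip_logits, 0)
score_anomalous = anomaly_score(anomalous_clip_logits, 0)
print(score_normal, score_anomalous)
```

The cons listed on the slide show up directly in such a score: if two machine IDs sound alike, the classifier is unsure even for normal clips (false positives), and if they sound very different, it stays confident even for anomalous clips (false negatives).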

Slide 8: DCASE 2021 Task 2
Detailed information: http://dcase.community/challenge2021/task-unsupervised-detection-of-anomalous-sounds

Slide 9: New challenge in 2021: Domain shift
■ Normal conditions are not always constant.
  → Domain shift: the distribution of normal test data differs from that of the training data.
■ Seasonal and accidental variations
  ◼ Production demand changes. → Operation speed changes.
    ✓ e.g., 300-400 rpm for winter and 200-300 rpm for summer
  ◼ Environmental condition changes.
    ✓ e.g., SNR, noise from other machines
Last year’s solutions will suffer from this change.
Figure: winter and summer data occupy different regions of the feature space (axes: dim1, dim2)
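A toy example shows why a detector trained under one normal condition suffers under another. It is a sketch under loud assumptions: the only feature is the operation speed itself, normal operation is modelled by a single Gaussian, and the threshold of 3 standard deviations is arbitrary. Only the rpm ranges come from the slide.

```python
import statistics

def fit_gaussian(values):
    """Model normal operation by the mean and standard deviation of one feature."""
    return statistics.mean(values), statistics.stdev(values)

def anomaly_score(value, model):
    """Absolute z-score: distance from the training mean in standard deviations."""
    mean, std = model
    return abs(value - mean) / std

# Hypothetical feature: operation speed in rpm, using the ranges from the slide.
winter_normal = list(range(300, 401, 5))  # training data (original condition)
summer_normal = list(range(200, 301, 5))  # normal test data after the shift

model = fit_gaussian(winter_normal)
threshold = 3.0  # assumed decision threshold for this sketch

winter_alarms = sum(anomaly_score(v, model) > threshold for v in winter_normal)
summer_alarms = sum(anomaly_score(v, model) > threshold for v in summer_normal)
print(winter_alarms, summer_alarms)
```

Every summer clip here is normal, yet many are flagged because they fall far from the winter training distribution. These false alarms are the failure mode that domain shift introduces.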

Slide 10: Taxonomy in 2021
■ Concept hierarchy
Figure: All machines → Machine type (Fan, Gearbox, Pump, Slide rail, ToyCar, ToyTrain, Valve) → Section (00, 01, 02, 03, 04, 05) → Domain (Source domain, Target domain)
7 machine types
The section is a unit for calculating performance metrics and is almost identical to what was called "machine ID" in the 2020 version.
Source domain: the original condition
Target domain: another different condition, e.g., operating speed, machine load, viscosity, heating temperature, environmental noise, SNR, etc.

Slide 11: Dataset in 2021
7 machine types, (3+3) sections, 2 domains
✓ Training data in the source and target domains contain 1000 and 3 clips, respectively.
✓ Each clip is a 10-second monaural wave file.
The combination of the additional training and evaluation datasets is like the development dataset, but it does not contain normal/anomalous labels.
Dates shown in figure: Mar. 1, Apr. 1, Jun. 1
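The dataset figures on this slide can be turned into a quick size estimate, in the same spirit as the 2020 arithmetic. This is a rough sketch: it reads "(3+3) sections" as three sections in the development dataset plus three in the additional training and evaluation datasets, and it assumes the 1000 and 3 training clips are per section. Both readings are assumptions, and the variable names are made up.

```python
# Figures taken from the slide.
MACHINE_TYPES = ["Fan", "Gearbox", "Pump", "Slide rail", "ToyCar", "ToyTrain", "Valve"]
TRAIN_CLIPS = {"source": 1000, "target": 3}
CLIP_SECONDS = 10

# Assumption: 3 of the (3+3) sections belong to the development dataset.
SECTIONS_IN_DEV = 3

clips_per_section = sum(TRAIN_CLIPS.values())
total_clips = len(MACHINE_TYPES) * SECTIONS_IN_DEV * clips_per_section
total_hours = total_clips * CLIP_SECONDS / 3600
target_share = TRAIN_CLIPS["target"] / clips_per_section

print(f"training clips per section: {clips_per_section}")
print(f"training clips in total:    {total_clips}")
print(f"training audio in total:    {total_hours:.1f} h")
print(f"share from target domain:   {target_share:.2%}")
```

The last figure is the point of the exercise: under these assumptions, well under one percent of the training clips come from the target domain.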

Slide 12: Task setup in 2021
■ Similar to 2020, but with some differences
  ✓ 1000 clips from source domain
  ✓ 3 clips from target domain
  ◼ The domain of each clip is known.
  ◼ Harmonic mean of AUCs and pAUCs over all machine types
  ◼ Decision results must be submitted, but they will not be used for ranking.
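The effect of ranking by a harmonic mean can be seen with a small example. The scores below are made up for illustration, and pooling AUC and pAUC values into one flat list is a simplification assumed for this sketch rather than the official aggregation procedure.

```python
import statistics

# Hypothetical AUC and pAUC values pooled over machine types (made up).
# Most values are high, but one machine type performs poorly (0.55 and 0.52).
scores = [0.95, 0.90, 0.93, 0.88, 0.96, 0.91, 0.55, 0.52]

arithmetic = statistics.mean(scores)
harmonic = statistics.harmonic_mean(scores)

print(f"arithmetic mean: {arithmetic:.3f}")
print(f"harmonic mean:   {harmonic:.3f}")
```

The harmonic mean is never larger than the arithmetic mean and is pulled down more strongly by low values, so a system can no longer hide a weak machine type behind a high average. This addresses the issue noted in the 2020 results.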

Slide 13: Description of each section in development dataset (1/2)
■ Many kinds of domain shifts are prepared.
■ The file name of each clip contains attribute information.

Slide 14: Description of each section in development dataset (2/2)
■ Many kinds of domain shifts are prepared.
■ The file name of each clip contains attribute information.

Slide 15: You can add house rules freely! (1/2)
■ Taking on the fine-tuning scenario is valuable.
■ It is a realistic restriction in many cases.
Examples of house rules
Rule                            Test data    Available training data
Freestyle (2021 official rule)  source test  ✓ source train data ✓ target train data
                                target test  ✓ source train data ✓ target train data
Fine-tuning                     source test  ✓ source train data
                                target test  ✓ target train data for fine-tuning without source train data

Slide 16: You can add house rules freely! (2/2)
■ Model generalization is also well worth pursuing.
■ The more general, the more useful.
Examples of house rules (rows ordered toward more general models)
Rule                            Restriction for models
Freestyle (2021 official rule)  Any number of models can be switched.
1 model per section             Models cannot be switched/fine-tuned in the same section.
1 model per machine type        Models cannot be switched in the same machine type.
1 model for all machines        Models cannot be switched.

Slide 17: FAQ
Q1. There is no difference between normal and anomalous sounds. This is an annotation error, right?
A1. It may just be a difficult clip for you. The annotations are based on whether an anomaly actually occurred and are independent of the opinions of mechanical engineers. There is no guarantee that a human can hear the difference, so don't worry about it. (The goal of the 2021 task is NOT to imitate mechanical engineers.)

Slide 18: FAQ
Q2. Can I use external public datasets or pre-trained models?
A2. Yes. If you want to use them, please let us know by June 1. We will add them to the external resource list and publish the list on the web.

Slide 19: FAQ
Q3. Can I use multiple test clips in the evaluation dataset for calculating anomaly scores?
A3. No. The anomaly score for each clip must NOT be calculated using any test clip other than that clip itself; training clips may be used.
Q4. Can I use test clips in the evaluation dataset for parameter tuning?
A4. No. Of course. Sorry. Typo fixed.
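One way to read A3 is as a constraint on where score normalisation statistics may come from. The sketch below contrasts a scoring function that depends only on the clip itself and on training clips with one that depends on the other test clips. It is an interpretation for illustration only; the function names and the raw scores are hypothetical.

```python
import statistics

def fit_normalizer(train_scores):
    """Compute normalisation statistics from training clips only."""
    return statistics.mean(train_scores), statistics.stdev(train_scores)

def score_clip(raw_score, normalizer):
    """Allowed: depends only on this clip's raw score and training statistics."""
    mean, std = normalizer
    return (raw_score - mean) / std

def score_clips_not_allowed(raw_test_scores):
    """NOT allowed under A3: each score depends on the other test clips."""
    mean = statistics.mean(raw_test_scores)
    std = statistics.stdev(raw_test_scores)
    return [(s - mean) / std for s in raw_test_scores]

train_raw = [1.0, 1.2, 0.9, 1.1, 1.0]  # hypothetical raw scores of training clips
test_raw = [1.1, 3.0, 0.95]            # hypothetical raw scores of test clips

normalizer = fit_normalizer(train_raw)
allowed = [score_clip(s, normalizer) for s in test_raw]
print(allowed)
```

A quick check for compliance: the score of a clip must stay the same no matter which other test clips are processed alongside it. `score_clip` passes this check, while `score_clips_not_allowed` does not.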

Slide 20: FAQ
Q5. Do you plan to release the attribute information for the additional training dataset as well?
A5. Yes. We will release it with the additional training dataset on April 1.
Q6. Do you plan to publish the ground truth data?
A6. Yes. The ground truth will be published after the challenge.

Slide 21: FAQ
Q7. Do you plan to publish these slides?
A7. Yes. Of course.

Slide 22: Concluding remarks
■ Looking back on DCASE 2020 Task 2
  ◼ Unsupervised anomalous sound detection
  ◼ 117 submissions from 40 teams
  ◼ New paradigm: Outlier exposure-like ASD
  ◼ Many things to be revised
■ DCASE 2021 Task 2
  ◼ Domain shifts
  ◼ Metric: Harmonic mean of AUCs and pAUCs over all machines.
  ◼ Decision results must be submitted.
  ◼ Attribute information

Slide 24: WE ARE HIRING AI RESEARCHERS
Send your CV and research experiences (publication records, OSS contributions, product development, etc.)
Address: [email protected]
Subject line: AI Researcher
Location: Tokyo, Japan (Hitachi Central Research Laboratory)
Qualifications
- PhD/MS in Computer Science
- Expertise in acoustic scene classification, sound event detection, anomalous sound detection, speech recognition, voice activity detection, speech separation, speech enhancement, echo cancellation, dereverberation, speaker diarization, speaker recognition, microphone-array processing, voice conversion, text-to-speech, dialogue system, multimodal processing, or human-robot interaction.

Slide 26: “DCASE” Tech. for Machine Sound Check Business
Business targets:
➢ Predictive maintenance
➢ Product inspection
1. We develop signal processing and machine learning for anomalous sound detection.
➢ [Harsh+ DCASE2020] See Poster-10
➢ [Suefusa+ ICASSP2020], [Harsh+ DCASE 2019], [Kawaguchi+ ICASSP2019], etc.
2. We encourage open innovation through dataset release and challenge coordination.
➢ MIMII Dataset [Harsh+ DCASE2019]
➢ DCASE2020 Challenge Task 2: [Koizumi+ DCASE2020] (collab with NTT and Doshisha Univ.) http://dcase.community/challenge2020/task-unsupervised-detection-of-anomalous-sounds

Slide 27: Speech Processing for Multi-speaker Conversation
Our target applications
➢ Meeting support
➢ Human Machine Interface
We focus on multi-speaker processing.
➢ [Fujita+ Interspeech’19, ASRU’19] [Horiguchi+ Interspeech’20]
Our diarization detects overlapping speech.
➢ [Kanda+ CHiME-5 challenge] got 2nd place.
Our speech recognition is robust under noisy and multi-speaker environment.
➢ See our AI-related publications from https://hitachi-speech.github.io/

