Slide 1

Slide 1 text

Introduction of DCASE 2021 Challenge Task 2
Tokyo BISH Bash #4, Mar. 30, 2021
Yohei Kawaguchi, Hitachi, Ltd. R&D Group
© Hitachi, Ltd. 2021. All rights reserved.

Slide 2

Slide 2 text

Special thanks to DCASE Challenge Task 2 co-organizers

Slide 3

Slide 3 text

Looking Back on DCASE 2020 Task 2

Slide 4

Slide 4 text

Task scope & applications
• Machine condition monitoring
◼ Determine whether a machine is normal or anomalous from its sound.
[Koizumi et al., DCASE2020]
Background photo created by fanjianhua - www.freepik.com (https://www.freepik.com/photos/background)

Slide 5

Slide 5 text

Challenge & positioning
• How can we detect anomalies without anomalous training data?
Number of training samples of target events: Massive → Few-shot → Zero resource
◼ Sound Event Detection (massive; easy to collect): brakes squeaking, car, children, people speaking, etc.
◼ Rare-Sound Event Detection (few-shot; difficult to collect): baby crying, glass breaking, gunshot, etc.
◼ Unsupervised Anomalous Sound Detection (zero resource; impossible to collect): it is impossible to list exhaustive patterns of anomalies. We are here!
The first two detect “DEFINED” sounds; unsupervised anomalous sound detection detects “UNKNOWN” sounds.
[Koizumi et al., DCASE2020]

Slide 6

Slide 6 text

Task setup in 2020
• Dataset: ToyADMOS [Koizumi+, 2020] & MIMII [Purohit+, 2020]
◼ 6 machine types, (4+3) machine IDs
◼ Around 1000 samples of 10-sec normal sounds per ID
◼ Training data size: 10 sec × 1000 samples ≒ 2.8 h per ID; 2.8 h × 7 IDs × 6 types ≒ 116.7 h in total
◼ Important to share training clips between different machines
• Metrics: AUC & pAUC
[Koizumi et al., DCASE2020]
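
Both metrics listed above can be computed from per-clip anomaly scores alone. The sketch below uses the pairwise-ranking form of AUC and a pAUC restricted to false-positive rates in [0, p]. It assumes p = 0.1 and counts tied scores as half a correct pair; the function names are illustrative, not taken from the official evaluation code.

```python
from typing import Sequence


def _step(diff: float) -> float:
    """Heaviside step with H(0) = 0.5, so tied scores count as half a correct pair."""
    if diff > 0:
        return 1.0
    if diff < 0:
        return 0.0
    return 0.5


def auc(normal_scores: Sequence[float], anomaly_scores: Sequence[float]) -> float:
    """Fraction of (anomalous, normal) pairs in which the anomalous clip scores higher."""
    hits = sum(_step(a - n) for a in anomaly_scores for n in normal_scores)
    return hits / (len(anomaly_scores) * len(normal_scores))


def pauc(normal_scores: Sequence[float], anomaly_scores: Sequence[float],
         p: float = 0.1) -> float:
    """Partial AUC over false-positive rates in [0, p], normalized to [0, 1].

    Only the floor(p * N_normal) highest-scoring normal clips, i.e. the ones
    that become false positives first as the threshold is lowered, are
    compared against every anomalous clip.
    """
    n_top = int(p * len(normal_scores))  # floor, since the product is non-negative
    if n_top < 1:
        raise ValueError("p * (number of normal clips) must be at least 1")
    top_normals = sorted(normal_scores, reverse=True)[:n_top]
    hits = sum(_step(a - n) for a in anomaly_scores for n in top_normals)
    return hits / (len(anomaly_scores) * n_top)


# Hypothetical scores: ten normal clips and two anomalous clips.
normal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
anomalous = [0.85, 1.2]
print(auc(normal, anomalous), pauc(normal, anomalous))
```

In this toy example the AUC is 0.9 but the pAUC is only 0.5, because one anomalous clip scores below the highest-scoring normal clip. pAUC focuses on the low-false-positive region that matters in practice.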

Slide 7

Slide 7 text

Results in 2020
• 117 submissions from 40 teams
◼ The top five teams achieved consistently high scores across all machine types.
◼ Some teams achieved high scores on several machine types but dropped in rank owing to relatively low ToyConveyor scores.
◼ Some top rankers had very low AUCs for some machine IDs even though their average AUC was high. (The evaluation metric should be revised.)
[Koizumi et al., DCASE2020]
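
The last point is easy to see numerically. With hypothetical per-ID AUCs (three strong IDs and one worse than chance), the arithmetic mean stays around 0.81, while a harmonic mean, the form adopted for the 2021 metric, drops to about 0.63 because it is dominated by the smallest value. This is only an illustration with made-up numbers, not results from any submission.

```python
from statistics import fmean, harmonic_mean

# Hypothetical per-ID AUCs for one machine type: three strong IDs and one
# ID on which the system performs worse than chance.
per_id_auc = [0.98, 0.97, 0.99, 0.30]

arithmetic = fmean(per_id_auc)        # the weak ID is largely hidden
harmonic = harmonic_mean(per_id_auc)  # dominated by the smallest value
print(f"arithmetic mean = {arithmetic:.3f}, harmonic mean = {harmonic:.3f}")
```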

Slide 8

Slide 8 text

Top rankers’ solution
• Outlier exposure-like ASD
◼ Classify machine IDs instead of performing outlier detection.
[Figure: 2-D feature spaces (dim 1 vs. dim 2) for the baseline system and for the outlier exposure-like approaches: anomaly simulation approach and classification approach]
◼ Developed independently by several top rankers!
◼ Pros: effective use of training data, resulting in high scores
◼ Cons: difficult to control
✓ Different machines are too similar → false positives
✓ Different machines are too different → false negatives
[Koizumi et al., DCASE2020]
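
The classification approach turns a machine-ID classifier into an anomaly detector. A common scoring rule, assumed here rather than taken from any specific top-ranked system, is the negative log-probability that the classifier assigns to the clip's own ID. The sketch shows only that scoring step, given logits from some classifier that is not shown.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the machine-ID logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def id_anomaly_score(logits: Sequence[float], true_id: int) -> float:
    """Anomaly score = -log p(true machine ID | clip).

    A normal clip should be recognized as the ID it was recorded from, giving
    a score near 0; a clip the classifier cannot attribute to its own ID
    gets a higher score.
    """
    prob = softmax(logits)[true_id]
    return -math.log(max(prob, 1e-12))  # clamp to avoid log(0)


# Hypothetical logits from a 4-ID classifier for clips recorded on ID 0.
confident = [8.0, 0.0, 0.0, 0.0]
confused = [1.0, 1.0, 1.0, 1.0]
print(id_anomaly_score(confident, 0), id_anomaly_score(confused, 0))
```

This also shows where the listed cons come from. If two IDs sound almost the same, even normal clips get a low probability for their own ID (false positives). If the IDs are trivially separable, anomalous clips are still classified correctly and score low (false negatives).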

Slide 9

Slide 9 text

DCASE 2021 Task 2
Detailed information: http://dcase.community/challenge2021/task-unsupervised-detection-of-anomalous-sounds

Slide 10

Slide 10 text

New challenge in 2021: Domain shift
• Normal conditions are not always constant.
→ Domain shift: the distribution of normal test data differs from that of the training data.
• Seasonal and accidental variations
◼ Production demand changes → operating speed changes.
✓ e.g., 300-400 rpm in winter and 200-300 rpm in summer
◼ Environmental conditions change.
✓ e.g., SNR, noise from other machines
• Last year’s solutions will suffer from these changes.
[Figure: winter and summer normal data occupying different regions of a 2-D feature space (dim1 vs. dim2)]
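
A deliberately tiny illustration of why domain shift hurts: assume the only feature is rotation speed (in practice features come from the sound) and a z-score detector fitted on winter data only. A perfectly normal summer clip then lands almost three standard deviations from the training mean and would likely be flagged as anomalous. All values are hypothetical, loosely following the rpm ranges above.

```python
from statistics import fmean, pstdev
from typing import Sequence, Tuple


def fit_detector(normal_features: Sequence[float]) -> Tuple[float, float]:
    """'Train' on normal data only: remember its mean and standard deviation."""
    return fmean(normal_features), pstdev(normal_features)


def anomaly_score(model: Tuple[float, float], x: float) -> float:
    """Distance from the training mean in standard deviations (a z-score)."""
    mean, std = model
    return abs(x - mean) / std


# Toy 1-D "feature": rotation speed in rpm (hypothetical values).
winter_train = [300, 320, 340, 360, 380, 400]  # source domain, all normal
model = fit_detector(winter_train)

print(anomaly_score(model, 350))  # normal winter clip: 0.0
print(anomaly_score(model, 250))  # normal summer clip: roughly 2.9 std away
```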

Slide 11

Slide 11 text

Taxonomy in 2021
• Concept hierarchy: all machines → machine type → section → domain
◼ 7 machine types: Fan, Gearbox, Pump, Slide rail, ToyCar, ToyTrain, Valve
◼ Sections 00, 01, 02, 03, 04, 05: the section is a unit for calculating performance metrics and is almost identical to what was called "machine ID" in the 2020 version.
◼ Domains
✓ Source domain: the original condition
✓ Target domain: a different condition, e.g., operating speed, machine load, viscosity, heating temperature, environmental noise, SNR, etc.
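
The hierarchy can be written down as a small data structure, which is handy when grouping clips for per-section and per-domain scoring. The sketch assumes that sections 00-02 belong to the development set and 03-05 to the evaluation set; the `Unit` class and helper are hypothetical names.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence

# Names as listed on the slide.
MACHINE_TYPES = ["Fan", "Gearbox", "Pump", "Slide rail", "ToyCar", "ToyTrain", "Valve"]
DOMAINS = ["source", "target"]


@dataclass(frozen=True)
class Unit:
    """One leaf of the hierarchy: machine type -> section -> domain."""
    machine_type: str
    section: str
    domain: str


def enumerate_units(sections: Sequence[str]) -> List[Unit]:
    """All (machine type, section, domain) combinations for the given sections."""
    return [Unit(m, s, d) for m, s, d in product(MACHINE_TYPES, sections, DOMAINS)]


# Assumption: sections 00-02 form the development set and 03-05 the evaluation set.
dev_units = enumerate_units(["00", "01", "02"])
eval_units = enumerate_units(["03", "04", "05"])
print(len(dev_units), len(eval_units))
```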

Slide 12

Slide 12 text

Dataset in 2021
• 7 machine types, (3+3) sections, 2 domains
✓ Training data in the source and target domains contain 1000 and 3 clips, respectively.
✓ Each clip is a 10-second monaural wave file.
• The combination of the additional training and evaluation datasets is like the development dataset, but it does not contain normal/anomalous labels.
Release dates: Mar. 1, Apr. 1, Jun. 1
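
Repeating the 2020 size calculation for the 2021 development training data shows how imbalanced the domains are. The sketch assumes 3 development sections per machine type (from "(3+3) sections") with 1000 source and 3 target training clips each. It gives 21,063 clips (about 58.5 h), of which only about 0.3% come from the target domain.

```python
# Figures from the slide; the split into 3 development sections per machine
# type is an assumption based on "(3+3) sections".
N_MACHINE_TYPES = 7
N_DEV_SECTIONS = 3
SOURCE_CLIPS, TARGET_CLIPS = 1000, 3  # training clips per section
CLIP_SECONDS = 10

clips_per_section = SOURCE_CLIPS + TARGET_CLIPS
total_clips = N_MACHINE_TYPES * N_DEV_SECTIONS * clips_per_section
total_hours = total_clips * CLIP_SECONDS / 3600
target_share = TARGET_CLIPS / clips_per_section

print(f"{total_clips} clips, {total_hours:.1f} h, target share {target_share:.2%}")
```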

Slide 13

Slide 13 text

Task setup in 2021
• Similar to 2020, but with some differences
◼ Training data per section
✓ 1000 clips from the source domain
✓ 3 clips from the target domain
◼ You can know which domain each clip belongs to.
◼ Metric: harmonic mean of AUCs and pAUCs over all machine types
◼ Decision results must be submitted, but they will not be used for ranking.
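
Because decision results must be submitted, each anomaly score needs a threshold. A minimal option, assumed here and not prescribed by the task, is a percentile (90th, as an example) of the scores the model gives to normal training clips. Since the domain of each clip is known, separate thresholds per domain are possible, although with only 3 target clips such an estimate would be very rough.

```python
import math
from typing import List, Sequence


def fit_threshold(normal_train_scores: Sequence[float], percentile: float = 90.0) -> float:
    """Nearest-rank percentile of anomaly scores on normal training clips."""
    ordered = sorted(normal_train_scores)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


def decide(test_scores: Sequence[float], threshold: float) -> List[int]:
    """Binary decisions: 1 = anomalous, 0 = normal."""
    return [1 if score > threshold else 0 for score in test_scores]


train_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical scores
threshold = fit_threshold(train_scores)
print(threshold, decide([5, 9, 12], threshold))
```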

Slide 14

Slide 14 text

Description of each section in the development dataset (1/2)
• Many kinds of domain shifts are prepared.
• The file name of each clip contains attribute information.

Slide 15

Slide 15 text

Description of each section in the development dataset (2/2)
• Many kinds of domain shifts are prepared.
• The file name of each clip contains attribute information.
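
Since the attributes live in the file names, a small parser makes it easy to group clips by section, domain, and label. The naming pattern below is an assumption for development-set clips and should be checked against the actual files. The attribute part is kept as a raw string, and the example file name is hypothetical.

```python
import os
import re
from typing import NamedTuple, Optional

# Assumed naming pattern for development-set clips (verify against the files):
#   section_<NN>_<source|target>_<train|test>_<normal|anomaly>_<NNNN>[_<attributes>].wav
_NAME = re.compile(
    r"section_(?P<section>\d{2})_(?P<domain>source|target)_(?P<split>train|test)_"
    r"(?P<label>normal|anomaly)_(?P<index>\d{4})(?:_(?P<attributes>.+))?\.wav"
)


class ClipInfo(NamedTuple):
    section: str
    domain: str
    split: str
    label: str
    index: int
    attributes: Optional[str]  # raw attribute string, left unparsed


def parse_clip_name(path: str) -> ClipInfo:
    """Split a clip's file name into section, domain, split, label, index, attributes."""
    match = _NAME.fullmatch(os.path.basename(path))
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    g = match.groupdict()
    return ClipInfo(g["section"], g["domain"], g["split"], g["label"],
                    int(g["index"]), g["attributes"])


# Hypothetical file name; the attribute part varies by machine type.
print(parse_clip_name("dev/fan/train/section_01_target_train_normal_0002_speed_300.wav"))
```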

Slide 16

Slide 16 text

You can add house rules freely! (1/2)
• Tackling the fine-tuning scenario is a valuable challenge.
• It is a realistic restriction in many cases.
Examples of house rules (available training data for each kind of test data):
◼ Freestyle (2021 official rule)
✓ Source test: source train data and target train data
✓ Target test: source train data and target train data
◼ Fine-tuning
✓ Source test: source train data
✓ Target test: target train data for fine-tuning, without source train data

Slide 17

Slide 17 text

You can add house rules freely! (2/2)
• Model generalization is also well worth pursuing.
• The more general, the more useful.
Examples of house rules (restrictions on models), from least to most general:
◼ Freestyle (2021 official rule): any number of models can be switched.
◼ 1 model per section: models cannot be switched or fine-tuned within the same section.
◼ 1 model per machine type: models cannot be switched within the same machine type.
◼ 1 model for all machines: models cannot be switched.

Slide 18

Slide 18 text

FAQ
Q1. There is no difference between normal and anomalous sounds. This is an annotation error, right?
A1. It may just be a difficult clip for you. The annotations are based on whether an anomaly really occurred and are independent of the opinions of mechanical engineers. There is no guarantee that a human can find the difference. Don’t worry about it. (The goal of the 2021 task is NOT to imitate mechanical engineers.)

Slide 19

Slide 19 text

FAQ
Q2. Can I use external public datasets or pre-trained models?
A2. Yes. If you want to use them, please let us know by June 1. We will add them to the external resource list and publish the list on the web.

Slide 20

Slide 20 text

FAQ
Q3. Can I use multiple test clips in the evaluation dataset for calculating anomaly scores?
A3. No. The anomaly score for each clip must be calculated using only that test clip and the training clips, NOT other test clips.
Q4. Can I use test clips in the evaluation dataset for parameter tuning?
A4. No, of course not.
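
One way to respect the rule in Q3 is to make the scoring function take exactly one test clip plus the training data, so cross-clip information cannot leak in. The sketch uses a k-nearest-neighbor distance on embeddings purely as an example scorer; the embeddings, `k = 2`, and function names are assumptions, and any per-clip scorer would fit the same shape.

```python
import math
from typing import List, Sequence

Embedding = Sequence[float]


def knn_score(clip: Embedding, train: Sequence[Embedding], k: int = 2) -> float:
    """Mean Euclidean distance from one test embedding to its k nearest training embeddings."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training clips")
    distances = sorted(math.dist(clip, t) for t in train)
    return sum(distances[:k]) / k


def score_clips(test: Sequence[Embedding], train: Sequence[Embedding], k: int = 2) -> List[float]:
    """Score each test clip independently of every other test clip.

    Nothing is pooled across the test set (no test-set normalization,
    ranking, or clustering), so each score depends only on its own clip and
    the training clips.
    """
    return [knn_score(clip, train, k) for clip in test]


# Hypothetical 2-D embeddings of normal training clips and two test clips.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(score_clips([(0.0, 0.0), (3.0, 4.0)], train))
```

A quick sanity check of the rule: adding or removing other test clips never changes the score of a given clip.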

Slide 21

Slide 21 text

FAQ
Q5. Do you plan to release attribute information for the additional training dataset as well?
A5. Yes. We will release it together with the additional training dataset on April 1.
Q6. Do you plan to publish the ground truth data?
A6. Yes. The ground truth will be published after the challenge.

Slide 22

Slide 22 text

FAQ
Q7. Do you plan to publish these slides?
A7. Yes, of course.

Slide 23

Slide 23 text

Concluding remarks
• Looking back on DCASE 2020 Task 2
◼ Unsupervised anomalous sound detection
◼ 117 submissions from 40 teams
◼ New paradigm: outlier exposure-like ASD
◼ Many things to be revised
• DCASE 2021 Task 2
◼ Domain shifts
◼ Metric: harmonic mean of AUCs and pAUCs over all machines
◼ Decision results must be submitted.
◼ Attribute information

Slide 24

Slide 24 text

Job Opportunity Information

Slide 25

Slide 25 text

WE ARE HIRING AI RESEARCHERS
Location: Tokyo, Japan, Hitachi Central Research Laboratory
Qualifications
- PhD/MS in Computer Science
- Expertise in acoustic scene classification, sound event detection, anomalous sound detection, speech recognition, voice activity detection, speech separation, speech enhancement, echo cancellation, dereverberation, speaker diarization, speaker recognition, microphone-array processing, voice conversion, text-to-speech, dialogue systems, multimodal processing, or human-robot interaction.
Send your CV and research experience (publication records, OSS contributions, product development, etc.)
Address: [email protected]
Subject line: AI Researcher

Slide 26

Slide 26 text

Hitachi Central Research Laboratory

Slide 27

Slide 27 text

“DCASE” Tech. for Machine Sound Check Business
Business targets:
➢ Predictive maintenance
➢ Product inspection
1. We develop signal processing and machine learning for anomalous sound detection.
➢ [Harsh+ DCASE2020] See Poster-10
➢ [Suefusa+ ICASSP2020], [Harsh+ DCASE 2019], [Kawaguchi+ ICASSP2019], etc.
2. We encourage open innovation through dataset release and challenge coordination.
➢ MIMII Dataset [Harsh+ DCASE2019]
➢ DCASE2020 Challenge Task 2: [Koizumi+ DCASE2020] (in collaboration with NTT and Doshisha Univ.) http://dcase.community/challenge2020/task-unsupervised-detection-of-anomalous-sounds

Slide 28

Slide 28 text

Speech Processing for Multi-speaker Conversation
Our target applications:
➢ Meeting support
➢ Human-machine interface
We focus on multi-speaker processing.
➢ Our diarization detects overlapping speech. [Fujita+ Interspeech’19, ASRU’19] [Horiguchi+ Interspeech’20]
➢ Our speech recognition is robust in noisy, multi-speaker environments. [Kanda+ CHiME-5 challenge] got 2nd place.
➢ See our AI-related publications at https://hitachi-speech.github.io/

Slide 29

Slide 29 text

No content