Introduction of DCASE 2021 Challenge Task 2 / dcase2021task2

y-kawagu
April 06, 2021

Rev. 2: On page 19, a typo in an answer (A3) was fixed.

Transcript

  1. Introduction of DCASE 2021 Challenge Task 2

    Tokyo BISH Bash #4, Mar. 30, 2021
    Yohei Kawaguchi, Hitachi, Ltd. R&D Group
    © Hitachi, Ltd. 2021. All rights reserved.
  2. Special thanks to the DCASE Challenge Task 2 co-organizers
  3. Task scope & applications

    Machine condition monitoring
    ◼ Determine whether a machine is normal or anomalous from its sound. [Koizumi et al., DCASE2020]
    (Background photo created by fanjianhua - www.freepik.com, https://www.freepik.com/photos/background)
  4. Challenge & positioning

    How can we detect anomalies without anomalous training data?
    Tasks by the number of training samples of target events:
    ◼ Massive: Sound Event Detection (regular task). Target sounds are easy to collect, e.g., brakes squeaking, car, children, people speaking. Detects "DEFINED" sounds.
    ◼ Few-shot: Rare Sound Event Detection (DCASE 2017 Challenge). Target sounds are difficult to collect, e.g., baby crying, glass breaking, gunshot. Detects "DEFINED" sounds.
    ◼ Zero resource: Unsupervised Anomalous Sound Detection (DCASE 2020 Challenge). Anomalous sounds are impossible to collect, and it is impossible to list all patterns of anomalies exhaustively. Detects "UNKNOWN" sounds. We are here!
    [Koizumi et al., DCASE2020]
  5. Task setup in 2020

    Dataset: ToyADMOS [Koizumi+, 2020] & MIMII [Purohit+, 2020]
    Metrics: AUC & pAUC
    ◼ 6 machine types, (4+3) machine IDs each
    ◼ Around 1000 samples of 10-sec normal sounds per ID
    ◼ Training data size: per ID, 10 sec × 1000 samples ≒ 2.8 h; in total, 10 sec × 1000 samples × 7 IDs × 6 types ≒ 116.7 h
    It is important to share training clips between different machines. [Koizumi et al., DCASE2020]
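Both metrics can be computed directly from per-clip anomaly scores. The following is a minimal sketch, assuming that a higher score means "more anomalous" and that pAUC is the AUC over the false-positive-rate range [0, p], obtained by comparing anomalous clips only with the ⌊p·N⌋ highest-scoring normal clips. The default p = 0.1, counting ties as half a pair, and the function names `auc` and `pauc` are assumptions for illustration, not taken from the slides.

```python
from typing import Sequence


def auc(normal_scores: Sequence[float], anomalous_scores: Sequence[float]) -> float:
    """Fraction of (anomalous, normal) pairs ranked correctly; ties count 0.5."""
    correct = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                correct += 1.0
            elif a == n:
                correct += 0.5
    return correct / (len(anomalous_scores) * len(normal_scores))


def pauc(normal_scores: Sequence[float], anomalous_scores: Sequence[float],
         p: float = 0.1) -> float:
    """AUC over FPR in [0, p]: compare anomalous clips only with the
    floor(p * N_normal) highest-scoring normal clips (at least one, as a guard)."""
    k = max(1, int(p * len(normal_scores)))
    hardest_normals = sorted(normal_scores, reverse=True)[:k]
    return auc(hardest_normals, anomalous_scores)


# Hypothetical anomaly scores for ten normal and two anomalous test clips.
normal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
anomalous = [0.95, 0.85]
print(auc(normal, anomalous), pauc(normal, anomalous, p=0.2))  # 0.85 0.25
```

The pairwise loop costs O(N_normal × N_anomalous), which is acceptable for a few hundred test clips; a sort-based implementation scales better.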
  6. Results in 2020

    117 submissions from 40 teams
    ◼ The top five teams achieved consistently high scores on all machine types.
    ◼ Some teams achieved high scores on several machine types but dropped in rank owing to relatively low ToyConveyor scores.
    ◼ Some top-ranked teams had very low AUCs for some machine IDs even though their average AUC was high. (The evaluation metric should be revised.)
    [Koizumi et al., DCASE2020]
  7. Top rankers' solution

    Outlier exposure-like ASD
    ◼ Classify machine IDs instead of performing outlier detection.
    ◼ Developed independently by several top rankers!
    Figure panels (feature space, dim 1 vs. dim 2): baseline system, anomaly simulation approach, classification approach
    Pros: Effective use of training data, resulting in high scores
    Cons: Difficult to control!
    ◼ Different machines are too similar. → False positives
    ◼ Different machines are too different. → False negatives
    [Koizumi et al., DCASE2020]
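One common way to turn a machine-ID classifier into an anomaly detector is to score each clip by how unlikely the classifier finds the ID the clip was actually recorded from. The sketch below assumes a hypothetical classifier that outputs one logit per machine ID and uses the negative log softmax probability of the true ID as the anomaly score; it illustrates the idea only and is not the exact scoring rule of any particular top-ranked system.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]


def id_anomaly_score(logits: Sequence[float], true_id: int) -> float:
    """Negative log-probability that the (hypothetical) ID classifier assigns to
    the machine ID the clip came from; higher means more anomalous."""
    return -log_softmax(logits)[true_id]


confident = [5.0, 0.0, 0.0]  # classifier is sure the clip comes from ID 0
print(id_anomaly_score(confident, true_id=0), id_anomaly_score(confident, true_id=1))
```

This also mirrors the trade-off on the slide: if two machines sound almost identical, the classifier splits probability between them, so even normal clips receive high scores (false positives).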
  8. DCASE 2021 Task 2

    Detailed information: http://dcase.community/challenge2021/task-unsupervised-detection-of-anomalous-sounds
  9. New challenge in 2021: Domain shift

    Normal conditions are not always constant.
    → Domain shift: the distribution of normal test data differs from that of the training data.
    Seasonal and accidental variations
    ◼ Production demand changes. → Operating speed changes.
      ✓ e.g., 300-400 rpm in winter and 200-300 rpm in summer
    ◼ Environmental conditions change.
      ✓ e.g., SNR, noise from other machines
    Last year's solutions will suffer from such changes.
    Figure: winter and summer distributions in a 2-D feature space (dim1, dim2)
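A toy one-dimensional example shows why a model fitted only on the source domain struggles. The rpm ranges come from the slide; the single feature (operating speed), the evenly spaced values, the Gaussian model of winter data, and the 3σ threshold are illustrative assumptions.

```python
import statistics

# Hypothetical single feature per clip: operating speed in rpm.
winter_train = [300 + 10 * i for i in range(11)]   # source domain: 300, 310, ..., 400 rpm
summer_normal = [200 + 10 * i for i in range(11)]  # target domain: 200, 210, ..., 300 rpm (all normal)

mu = statistics.mean(winter_train)
sigma = statistics.stdev(winter_train)
THRESHOLD = 3.0  # flag values more than 3 standard deviations from the winter mean


def z_score(x: float) -> float:
    return abs(x - mu) / sigma


false_alarms = [x for x in summer_normal if z_score(x) > THRESHOLD]
print(f"{len(false_alarms)} of {len(summer_normal)} normal summer clips flagged as anomalous")
# 6 of 11 normal summer clips flagged as anomalous
```

None of the winter values is flagged, but more than half of the perfectly normal summer values are: the domain shift alone produces false positives.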
  10. Taxonomy in 2021

    Concept hierarchy: All machines → Machine type → Section → Domain
    ◼ 7 machine types: Fan, Gearbox, Pump, Slide rail, ToyCar, ToyTrain, Valve
    ◼ Sections 00, 01, 02, 03, 04, 05. The section is a unit for calculating performance metrics and is almost identical to what was called a "machine ID" in the 2020 version.
    ◼ Domain
      ✓ Source domain: the original condition
      ✓ Target domain: another, different condition, e.g., operating speed, machine load, viscosity, heating temperature, environmental noise, SNR, etc.
  11. Dataset in 2021

    7 machine types × (3+3) sections × 2 domains
    ✓ Training data in the source and target domains contain 1000 and 3 clips, respectively.
    ✓ Each clip is a 10-second monaural wave file.
    The combination of the additional training and evaluation datasets is like the development dataset, but it does not contain normal/anomalous labels.
    Release timeline (figure): Mar. 1, Apr. 1, Jun. 1
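A back-of-the-envelope size calculation in the style of the 2020 slide, assuming that every section has the same 1000 source-domain and 3 target-domain training clips of 10 seconds each:

```python
CLIP_SECONDS = 10
MACHINE_TYPES = 7
SECTIONS_PER_TYPE = 6   # (3+3) sections per machine type
SOURCE_CLIPS = 1000     # source-domain training clips per section (assumed)
TARGET_CLIPS = 3        # target-domain training clips per section (assumed)

n_sections = MACHINE_TYPES * SECTIONS_PER_TYPE
source_hours = n_sections * SOURCE_CLIPS * CLIP_SECONDS / 3600
target_minutes = n_sections * TARGET_CLIPS * CLIP_SECONDS / 60

print(f"source: {source_hours:.1f} h in total, {SOURCE_CLIPS * CLIP_SECONDS / 3600:.1f} h per section")
print(f"target: {target_minutes:.0f} min in total, {TARGET_CLIPS * CLIP_SECONDS} s per section")
```

Under these assumptions each section offers about 2.8 hours of source audio but only 30 seconds of target audio, so training on target data alone is unlikely to be enough.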
  12. Task setup in 2021

    Similar to 2020, but with some differences:
    ◼ Training data
      ✓ 1000 clips from the source domain
      ✓ 3 clips from the target domain
    ◼ You can know which domain each clip belongs to.
    ◼ Metric: harmonic mean of AUCs and pAUCs over all machine types
    ◼ Decision results must be submitted, but they will not be used for ranking.
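The harmonic mean is dominated by its smallest values, so a single poorly detected section pulls the score down much more than it would pull down an arithmetic mean; this addresses the 2020 issue of high averages hiding very low AUCs for some machine IDs. A minimal sketch with hypothetical per-section AUCs, using Python's `statistics` module; `pooled_score` assumes all AUCs and pAUCs are pooled into a single harmonic mean.

```python
from statistics import harmonic_mean, mean

# Hypothetical per-section AUCs: four sections work well, one is far below chance.
aucs = [0.95, 0.95, 0.95, 0.95, 0.20]

print(f"arithmetic mean: {mean(aucs):.3f}")           # 0.800
print(f"harmonic mean:   {harmonic_mean(aucs):.3f}")  # 0.543


def pooled_score(aucs, paucs):
    """Assumed form of the 2021 score: one harmonic mean over all AUC and pAUC values."""
    return harmonic_mean(list(aucs) + list(paucs))
```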
  13. Description of each section in the development dataset (1/2)

    ◼ Many kinds of domain shifts are prepared.
    ◼ The file name of each clip contains attribute information.
    [Table of sections and their domain shifts, marked "OK!"]
  14. Description of each section in the development dataset (2/2)

    ◼ Many kinds of domain shifts are prepared.
    ◼ The file name of each clip contains attribute information.
    [Table of sections and their domain shifts, marked "OK!"]
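Because the attributes are encoded in the file names, they can be recovered with a small parser. The pattern below is a hypothetical naming scheme assumed purely for illustration (the field order, the `section_NN` prefix, and the example file name are not taken from the slides); adjust it to the dataset's actual naming convention.

```python
import re
from typing import Dict, Optional

# HYPOTHETICAL naming pattern, assumed for illustration only:
#   section_<NN>_<source|target>_<train|test>_<normal|anomaly>_<index>[_<attributes>].wav
CLIP_NAME = re.compile(
    r"section_(?P<section>\d{2})_(?P<domain>source|target)_(?P<split>train|test)_"
    r"(?P<label>normal|anomaly)_(?P<index>\d+)(?:_(?P<attributes>.+))?\.wav$"
)


def parse_clip_name(filename: str) -> Optional[Dict[str, str]]:
    """Split a clip file name into section, domain, split, label, index and attributes."""
    match = CLIP_NAME.search(filename)
    if match is None:
        return None
    info = match.groupdict()
    info["attributes"] = info["attributes"] or ""  # clips without attribute fields
    return info


print(parse_clip_name("section_00_target_train_normal_0001_speed_250rpm.wav"))
```

Knowing the domain and attributes of each training clip makes it possible, for example, to condition a model on the operating condition or to weight the few target-domain clips more heavily.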
  15. You can add house rules freely! (1/2)

    ◼ Taking on the fine-tuning scenario is valuable.
    ◼ It is a realistic restriction in many cases.
    Examples of house rules (available training data for each kind of test data):
    ◼ Freestyle (2021 official rule)
      ✓ Source test: source train data + target train data
      ✓ Target test: source train data + target train data
    ◼ Fine-tuning
      ✓ Source test: source train data
      ✓ Target test: target train data for fine-tuning, without source train data
  16. You can add house rules freely! (2/2)

    ◼ Model generalization is also well worth pursuing.
    ◼ The more general, the more useful.
    Examples of house rules (restrictions on models, from least to most general):
    ◼ Freestyle (2021 official rule): any number of models can be switched.
    ◼ 1 model per section: models cannot be switched or fine-tuned within the same section.
    ◼ 1 model per machine type: models cannot be switched within the same machine type.
    ◼ 1 model for all machines: models cannot be switched.
  17. FAQ

    Q1. There is no difference between the normal and anomalous sounds. This is an annotation error, right?
    A1. It may just be a difficult clip for you. The annotations are based on whether an anomaly really occurred and are independent of the opinions of mechanical engineers. There is no guarantee that a human can find the difference. Don't worry about it. (The goal of the 2021 task is NOT to imitate mechanical engineers.)
  18. FAQ

    Q2. Can I use external public datasets or pre-trained models?
    A2. Yes. If you want to use them, please let us know by June 1. We will add them to the external resource list and publish the list on the web.
  19. FAQ

    Q3. Can I use multiple test clips in the evaluation dataset for calculating anomaly scores?
    A3. No. The anomaly score of each test clip must be calculated using only that clip and the training clips; other test clips must NOT be used. (Sorry. Typo fixed.)
    Q4. Can I use test clips in the evaluation dataset for parameter tuning?
    A4. No, of course not.
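A3 implies that scoring must be done clip by clip: a score may depend on the clip itself and on the training data, but not on statistics shared across the test set (for example, normalizing scores over all test clips). The sketch below illustrates a compliant scorer using a k-nearest-neighbor distance, a technique swapped in here purely for illustration; the 2-D embeddings and the function name are hypothetical.

```python
import math
from typing import List, Sequence


def knn_anomaly_score(test_embedding: Sequence[float],
                      train_embeddings: List[Sequence[float]],
                      k: int = 3) -> float:
    """Mean Euclidean distance from ONE test clip to its k nearest training clips."""
    distances = sorted(math.dist(test_embedding, t) for t in train_embeddings)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


# Hypothetical 2-D embeddings of normal training clips.
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
test_clips = [[0.5, 0.5], [5.0, 5.0]]

# Each score depends only on its own clip and the training data (rule A3);
# no statistic is shared across test clips.
scores = [knn_anomaly_score(clip, train) for clip in test_clips]
print(scores)
```

Because each call sees a single test clip, adding or removing other test clips can never change a clip's score.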
  20. FAQ

    Q5. Do you plan to release attribute information for the additional training dataset as well?
    A5. Yes. We will release it with the additional training dataset on April 1.
    Q6. Do you plan to publish the ground truth?
    A6. Yes. The ground truth will be published after the challenge.
  21. FAQ

    Q7. Do you plan to publish these slides?
    A7. Yes, of course.
  22. Concluding remarks

    Looking back at DCASE 2020 Task 2
    ◼ Unsupervised anomalous sound detection
    ◼ 117 submissions from 40 teams
    ◼ New paradigm: outlier exposure-like ASD
    ◼ Many things to be revised
    DCASE 2021 Task 2
    ◼ Domain shifts
    ◼ Metric: harmonic mean of AUCs and pAUCs over all machines
    ◼ Decision results must be submitted.
    ◼ Attribute information
  23. WE ARE HIRING AI RESEARCHERS

    Location: Tokyo, Japan (Hitachi Central Research Laboratory)
    Qualifications
    - PhD/MS in Computer Science
    - Expertise in acoustic scene classification, sound event detection, anomalous sound detection, speech recognition, voice activity detection, speech separation, speech enhancement, echo cancellation, dereverberation, speaker diarization, speaker recognition, microphone-array processing, voice conversion, text-to-speech, dialogue systems, multimodal processing, or human-robot interaction
    Send your CV and research experience (publication records, OSS contributions, product development, etc.)
    Address: [email protected]
    Subject line: AI Researcher
  24. "DCASE" Tech. for Machine Sound Check Business

    Business targets:
    ➢ Predictive maintenance
    ➢ Product inspection
    1. We develop signal processing and machine learning for anomalous sound detection.
    ➢ [Suefusa+ ICASSP2020], [Harsh+ DCASE 2019], [Kawaguchi+ ICASSP2019], etc.
    ➢ [Harsh+ DCASE2020] See Poster-10
    2. We encourage open innovation through dataset release and challenge coordination.
    ➢ MIMII Dataset [Harsh+ DCASE2019]
    ➢ DCASE2020 Challenge Task 2: [Koizumi+ DCASE2020] (in collaboration with NTT and Doshisha Univ.) http://dcase.community/challenge2020/task-unsupervised-detection-of-anomalous-sounds
  25. Speech Processing for Multi-speaker Conversation

    Our target applications:
    ➢ Meeting support
    ➢ Human-machine interface
    We focus on multi-speaker processing.
    ◼ Our diarization detects overlapping speech. ➢ [Fujita+ Interspeech'19, ASRU'19] [Horiguchi+ Interspeech'20]
    ◼ Our speech recognition is robust in noisy, multi-speaker environments. ➢ [Kanda+ CHiME-5 challenge] took 2nd place.
    ➢ See our AI-related publications at https://hitachi-speech.github.io/