
Research introduction of anomalous sound detection for real-world scenarios


Takuya Fujimura

October 28, 2024


Transcript

  1. Takuya Fujimura (Nagoya University) Research introduction of anomalous sound detection

    for real-world scenarios Technical seminar on anomalous sound detection@Doshisha University 2024/10/28
  2. 1/49 Self-introduction Takuya Fujimura ◼1st year Ph.D. student@Nagoya University My

    research topics ◼Anomalous sound detection ◼(Unsupervised) speech enhancement Our ASD team ◼Prof. Tomoki Toda ◼Mr. Ibuki Kuroyanagi (2nd year Ph.D. student) ◼Me Co-researcher ◼Assoc. Prof. Keisuke Imoto
  3. 2/49 Research introduction of anomalous sound detection for real-world scenarios

    I’ll be talking mainly about my recent work, which has focused on practical challenges. T. Fujimura, K. Imoto, T. Toda, "Discriminative neighborhood smoothing for generative anomalous sound detection," Proc. EUSIPCO, Aug. 2024. [arxiv: https://arxiv.org/abs/2403.11508] T. Fujimura, I. Kuroyanagi, T. Toda, "Improvements of Discriminative Feature Space Training for Anomalous Sound Detection in Unlabeled Conditions,” arXiv e-prints: 2409.09332, 2024. [arxiv: https://arxiv.org/abs/2409.09332]
  4. 3/49 Outline Introduction of ASD task ◼Problem settings ◼Two basic

    approaches: Generative and Discriminative Practical challenge 1: Instability ◼“Discriminative neighborhood smoothing for generative anomalous sound detection” Practical challenge 2: Data collection (annotation) costs ◼“Improvements of Discriminative Feature Space Training for Anomalous Sound Detection in Unlabeled Conditions” Summary
  5. 4/49 Anomalous Sound Detection (ASD) Detecting anomalous behavior from machine

    sound ◼Calculate an anomaly score from machine sound ◼Detect anomalies based on the anomaly score Problem setting ◼Difficult to collect anomalous sound samples → Develop ASD systems using only normal sound data Basic approaches ◼Generative and Discriminative (Diagram: machine sound → ASD system → Anomaly score → Thresholding → Normal or Anomalous)
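To make the pipeline on this slide concrete, here is a minimal sketch (not from the deck) of the score-then-threshold decision rule; calibrating the threshold as a high percentile of scores on held-out normal clips is an illustrative assumption, since only normal data is available for tuning.

```python
# Hedged sketch of the ASD decision rule: score each clip, then threshold.
# The percentile calibration below is an illustrative choice, not the deck's.
import numpy as np

def is_anomalous(scores: np.ndarray, threshold: float) -> np.ndarray:
    """Flag a clip as anomalous when its anomaly score exceeds the threshold."""
    return scores > threshold

# Only normal data is available, so calibrate the threshold as a high
# percentile of anomaly scores computed on held-out normal clips.
normal_scores = np.array([0.10, 0.12, 0.08, 0.15, 0.11])  # illustrative values
threshold = float(np.quantile(normal_scores, 0.99))
print(is_anomalous(np.array([0.09, 0.30]), threshold))    # -> [False  True]
```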
  6. 5/49 Generative Approach ◼Directly construct a generative model for normal

    sounds in the audio feature domain ◼Anomaly score: Degree of deviation from the generative model Normal Observation Anomaly score AE Anomalies are not accurately reconstructed because they are not in the training data AE Example: Autoencoder (AE) ◼AE is trained to reconstruct audio features of normal sounds ◼Anomaly score: Reconstruction error of the observed sound
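A minimal sketch of the autoencoder example above, assuming audio feature vectors and a plain MLP bottleneck (the deck does not specify the architecture); the anomaly score is the reconstruction error.

```python
# Hedged sketch: an autoencoder trained only on normal audio features;
# anomaly score = reconstruction error. Architecture and dims are assumptions.
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, dim: int = 128, bottleneck: int = 8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, bottleneck))
        self.dec = nn.Sequential(nn.Linear(bottleneck, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dec(self.enc(x))

def anomaly_score(model: AE, feats: torch.Tensor) -> torch.Tensor:
    """Mean squared reconstruction error per clip (higher = more anomalous)."""
    with torch.no_grad():
        recon = model(feats)
    return ((feats - recon) ** 2).mean(dim=-1)

feats = torch.randn(4, 128)          # stand-in for audio feature vectors
print(anomaly_score(AE(), feats))    # four scores, higher = more anomalous
```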
  7. 6/49 Discriminative Approach ◼Construct a discriminative model to classify differences

    in normal sounds using annotated labels ◼Anomaly score: Distance between observation and training data in the discriminative feature space (i.e., decrease in the posterior probability of the correct class) Normal Labels (e.g., machine types and operation params.) Feature Extractor Valve (pat 02) Valve (pat 01) Slide rail (vel 300, ac 0.30)
  8. 7/49 Discriminative Approach ◼Construct a discriminative model to classify differences

    in normal sounds using annotated labels ◼Anomaly score: Distance between observation and training data in the discriminative feature space (i.e., decrease in the posterior probability of the correct class) Normal Labels (e.g., machine types and operation params.) Feature Extractor Valve (pat 02) Valve (pat 01) Slide rail (vel 300, ac 0.30) Labels (e.g., machine types and operation params.) … … Labels provided in DCASE Challenge Task2
  9. 8/49 Discriminative Approach ◼Construct a discriminative model to classify differences

    in normal sounds using annotated labels ◼Anomaly score: Distance between observation and training data in the discriminative feature space (i.e., decrease in the posterior probability of the correct class) Normal Labels (e.g., machine types and operation params.) Feature Extractor Valve (pat 02) Valve (pat 01) Slide rail (vel 300, ac 0.30)
  10. 9/49 Discriminative Approach ◼Construct a discriminative model to classify differences

    in normal sounds using annotated labels ◼Anomaly score: Distance between observation and training data in the discriminative feature space (i.e., decrease in the posterior probability of the correct class) Anomalies are not accurately classified because they are not in the training data Observation Valve (pat 02) Valve (pat 01) Slide rail (vel 300, ac 0.30) Feature Extractor
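One common way to realize the anomaly score defined on these slides is the distance to the nearest normal training embedding in the classifier's feature space; the sketch below (cosine distance, nearest neighbor) is an illustrative instance, not necessarily the deck's exact formulation.

```python
# Hedged sketch of a discriminative anomaly score: cosine distance from a
# test embedding to its closest normal training embedding. The feature
# extractor (a classifier over the annotated labels) is assumed given.
import numpy as np

def knn_anomaly_score(test_emb: np.ndarray, train_embs: np.ndarray) -> float:
    """1 - max cosine similarity to the normal training embeddings."""
    a = test_emb / np.linalg.norm(test_emb)
    b = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    return float(1.0 - (b @ a).max())

rng = np.random.default_rng(0)
train = rng.normal(size=(100, 16))   # embeddings of normal training clips
test = rng.normal(size=16)           # embedding of one test clip
print(knn_anomaly_score(test, train))
```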
  11. 10/49 Basic approaches and research direction

                                         Generative                                           Discriminative
     Problem setting during training     Modeling a probability density function (Harder)     Modeling a posterior probability function (Easier)
     Performance                         Insufficient → Improve to a sufficient level         High → Further improve
  12. 11/49 Basic approaches and research direction

                                         Generative                                           Discriminative
     Problem setting during training     Modeling a probability density function (Harder)     Modeling a posterior probability function (Easier)
     Performance                         Insufficient → Improve to a sufficient level         High → Further improve
     Stability                           Relatively stable                                    Unstable → Improve stability
     Use of labels                       —                                                    Useful but the annotation is costly → Reduce the annotation cost
  13. 12/49 Outline Introduction of ASD task ◼Problem settings ◼Two basic

    approaches: Generative and Discriminative Practical challenge 1: Instability ◼“Discriminative neighborhood smoothing for generative anomalous sound detection” Practical challenge 2: Data collection (annotation) costs ◼“Improvements of Discriminative Feature Space Training for Anomalous Sound Detection in Unlabeled Conditions” Summary
  14. 13/49 Practical challenge 1: Instability Performance examples of Gen. and

     Dis. approaches

     Machine    A    B    C    D    Basic performance    Stability
     Gen.      60   65   55   58    ✗                    ✓
     Dis.      80   85   30   86    ✓                    ✗

     In practical applications, stability under various conditions is important
  15. 14/49 Practical challenge 1: Instability Performance examples of Gen. and

     Dis. approaches

     Machine    A    B    C    D    Basic performance    Stability
     Gen.      60   65   55   58    ✗                    ✓
     Dis.      80   85   30   86    ✓                    ✗
     Goal      68   70   72   70    ✓                    ✓

     Goal: To achieve stable and (moderately) good performance Specific goals 1. To outperform Gen. 2. To avoid critical performance degradation seen in Dis.
  16. 15/49 Analysis of critical performance degradation Ideal discriminative feature space

    The distance from the training data is the correct anomaly score (Training data) Case of critical performance degradation: ☹ Normal sounds are also misclassified (due to the high difficulty of the classification task) → The distance from the training data does not work as the anomaly score
  17. 16/49 Proposed method Specific goals 1. To outperform Gen. 2.

    To avoid critical performance degradation seen in Dis. Two important observations 1. Gen. is relatively stable, but its performance is insufficient 2. Even in a discriminative feature space that leads to critical performance degradation, normal and anomalous samples still tend to be distinguished (i.e., although the distance from the training data does not work, the space still provides useful information) Training data
  18. 18/49 Discriminative neighborhood smoothing of generative anomaly scores ◼ Performance

    of Gen. is not high ◼ We assume that normal and anomalous sounds are distinguishable in the discriminative feature space
  19. 20/49 Specific goals 1. To outperform Gen. 2. To avoid

    critical performance degradation seen in Dis. Discriminative neighborhood smoothing of generative anomaly scores ↑ Improve the performance of the original Gen. in an ensemble manner utilizing the discriminative space (Note: this method utilizes test data) ↑Remove the risk by not measuring the distance from the training data in the discriminative feature space Training data
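A sketch of the smoothing idea above, under the assumptions that Euclidean distance is used in the discriminative space and that the neighborhood is averaged uniformly (the paper may differ in these details): each test clip's generative score is replaced by the mean score of its K nearest test-set neighbors, so distances to the training data are never consulted.

```python
# Hedged sketch of discriminative neighborhood smoothing: smooth generative
# (AE) anomaly scores over the K nearest neighbors among the *test* clips in
# the discriminative feature space. Metric and uniform averaging are assumptions.
import numpy as np

def neighborhood_smoothing(gen_scores: np.ndarray, test_embs: np.ndarray, k: int) -> np.ndarray:
    # Pairwise Euclidean distances between test embeddings.
    d = np.linalg.norm(test_embs[:, None, :] - test_embs[None, :, :], axis=-1)
    # Indices of the K nearest neighbors (the clip itself is included).
    nn_idx = np.argsort(d, axis=1)[:, :k]
    # Smoothed score = mean generative score over the neighborhood.
    return gen_scores[nn_idx].mean(axis=1)

rng = np.random.default_rng(0)
embs = rng.normal(size=(50, 16))    # discriminative embeddings of test clips
scores = rng.random(50)             # generative anomaly scores of the same clips
print(neighborhood_smoothing(scores, embs, k=5)[:3])
```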
  20. 21/49 Experimental evaluation Goal ◼To outperform Gen. ◼To avoid critical

    performance degradation seen in Dis. Comparison methods ◼Gen.: AE [Koizumi+, 2020] ◼Dis.: SerialOE [Kuroyanagi+, 2022] ◼Proposed method: Combines generative anomaly scores calculated by AE and the discriminative feature extractor of SerialOE Setups ◼Dataset: DCASE2021 Task2 ◼Metric: Harmonic mean of AUC and pAUC in the source domain (0 to 100, higher is better)
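For reference, the metric named here can be computed roughly as below; note that scikit-learn's max_fpr option returns a standardized partial AUC, so this is an approximation of the DCASE pAUC definition rather than an exact reimplementation.

```python
# Hedged sketch of the metric: harmonic mean of AUC and pAUC (low-FPR AUC).
# sklearn's max_fpr gives a standardized pAUC, so treat this as approximate.
from scipy.stats import hmean
from sklearn.metrics import roc_auc_score

def auc_pauc_hmean(y_true, scores, p: float = 0.1) -> float:
    auc = roc_auc_score(y_true, scores)
    pauc = roc_auc_score(y_true, scores, max_fpr=p)
    return float(hmean([auc, pauc]))

print(auc_pauc_hmean([0, 0, 0, 1, 1], [0.1, 0.2, 0.3, 0.8, 0.9]))  # -> 1.0
```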
  21. 22/49 Evaluation results Goal ◼To outperform Gen. ◼To avoid critical

    performance degradation seen in Dis. K (# of neighbors) was simply determined by validation: the best K was selected from the hyperparameter set
  22. 23/49 Evaluation results Goal ◼To outperform Gen. ◼To avoid critical

    performance degradation seen in Dis. Performance of the proposed method ◼Significantly improves the performance of AE, yielding an absolute improvement of 7% in All-hmean ◼It has the potential to achieve better performance than Dis. when the proper # of neighbors is provided
  23. 24/49 Evaluation results Goal ◼To outperform Gen. ◼To avoid critical

    performance degradation seen in Dis. Performance of the proposed method ◼Significantly improves the performance of AE, yielding an absolute improvement of 7% in All-hmean ◼It has the potential to achieve better performance than Dis. when the proper # of neighbors is provided
  24. 25/49 Evaluation results Goal ◼To outperform Gen. ◼To avoid critical

    performance degradation seen in Dis. Performance of the proposed method ◼Significantly improves the performance of AE, yielding an absolute improvement of 7% in All-hmean ◼It has the potential to achieve better performance than Dis. when the proper # of neighbors is provided
  25. 26/49 Evaluation results Goal ◼To outperform Gen. ◼To avoid critical

    performance degradation seen in Dis. ToyCar-5 (Training, Normal / Test, Anomalous / Test, Normal): ☹ Normal sounds are also misclassified → Dis. results in performance degradation ☺ Even if such a feature space is formed, Proposed avoids performance degradation
  26. 27/49 Visualization of the smoothing process Original AE (Gen.) AUC:

    72.87 % Proposed (Oracle) AUC: 77.29 % Modified to higher values Modified to lower values
  27. 28/49 Practical challenge 1: Instability Background: Trade-off between performance and

    stability ◼The performance of Gen. is insufficient ◼Dis. can sometimes cause critical performance degradation Proposed method 1. Improve the performance of Gen. by utilizing the discriminative feature space 2. Remove the risk by not measuring the distance from the training data in the discriminative space Results ◼Significantly improved the performance of the original Gen. ◼Robustly worked even when Dis. faced the critical performance degradation problem
  28. 29/49 Outline Introduction of ASD task ◼Problem settings ◼Two basic

    approaches: Generative and Discriminative Practical challenge 1: Instability ◼“Discriminative neighborhood smoothing for generative anomalous sound detection” Practical challenge 2: Data collection (annotation) costs ◼“Improvements of Discriminative Feature Space Training for Anomalous Sound Detection in Unlabeled Conditions” Summary
  29. 30/49 Practical challenge 2: Annotation costs Effectiveness of labels in

    the discriminative approach Annotating operation params. is costly (e.g., a log of "10:00 speed 30, 10:05 speed 20, 10:10 speed 25, …") Information of labels and their effectiveness: ◼Operation param.: Capture differences in machine sounds → ☺ Detect anomalies based on differences in machine sounds ◼Detailed param.: Capture more detailed differences → ☺ Detect more subtle anomalies ◼Noise type: Capture differences in noise → ☹ Detect anomalies based on differences in noise
  30. 31/49 Practical challenge 2: Annotation costs Effectiveness of labels in

    the discriminative approach Annotating operation params. is costly (e.g., a log of "10:00 speed 30, 10:05 speed 20, 10:10 speed 25, …") Information of labels and their effectiveness: ◼Operation param.: Capture differences in machine sounds → ☺ Detect anomalies based on differences in machine sounds ◼Detailed param.: Capture more detailed differences → ☺ Detect more subtle anomalies ◼Noise type: Capture differences in noise → ☹ Detect anomalies based on differences in noise Goal: To improve performance without relying on annotated labels
  31. 32/49 Situation we will consider DCASE 2024 Task2 Challenge settings

    ◼Machine types (e.g., Slide rail, Valve, …): Available ◼Operation param.: Unavailable (on all or some machines)
  32. 33/49 Proposed method: pseudo-labeling Normal sounds FE for pseudo-labeling

    Feature space GMM w/ BIC pseudo-labels FE for ASD A feature space that reflects differences in machine sounds Cluster-B Cluster-C Cluster-A We obtain pseudo-labels by clustering
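A minimal sketch of the clustering step on this slide: fit GMMs with different component counts to the embeddings, keep the one with the lowest BIC, and use its cluster assignments as pseudo-labels. The candidate range for the component count is an assumption.

```python
# Hedged sketch of GMM-with-BIC pseudo-labeling. The search range for the
# number of components is an illustrative assumption.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_bic_pseudo_labels(embs: np.ndarray, max_k: int = 8) -> np.ndarray:
    best_gmm, best_bic = None, np.inf
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(embs)
        bic = gmm.bic(embs)
        if bic < best_bic:
            best_gmm, best_bic = gmm, bic
    return best_gmm.predict(embs)  # cluster indices serve as pseudo-labels

rng = np.random.default_rng(0)
embs = np.concatenate([rng.normal(0, 1, (50, 8)), rng.normal(5, 1, (50, 8))])
print(np.unique(gmm_bic_pseudo_labels(embs)))  # ideally two clusters emerge
```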
  33. 34/49 4 types of FE for pseudo-labeling Normal sounds FE

    for pseudo-labeling Feature space GMM w/ BIC pseudo-labels FE for ASD

     Method name   Training task                                                   Training data
     Class         Classification of available labels (supervised)                DCASE: Small-scale machine sound dataset (including target machine sounds)
     Triplet       Triplet learning (self-supervised)                             DCASE: Small-scale machine sound dataset (including target machine sounds)
     PANNs         Audio event classification (supervised)                        Audioset: Large-scale audio event dataset (not including target machine sounds)
     OpenL3        Audio-video clip correspondence prediction (self-supervised)   Audioset: Large-scale audio event dataset (not including target machine sounds)
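As one example of the off-the-shelf extractors in the table, embeddings for pseudo-labeling could be obtained with the openl3 package roughly as follows; content_type="env", the file name, and time-averaging to one clip vector are illustrative assumptions, not details from the deck.

```python
# Hedged example: OpenL3 clip embeddings for pseudo-labeling.
import openl3
import soundfile as sf

audio, sr = sf.read("machine_clip.wav")  # hypothetical input file
emb, ts = openl3.get_audio_embedding(audio, sr, content_type="env")
clip_emb = emb.mean(axis=0)              # average over time: one vector per clip
```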
  34. 35/49 Triplet Anchor (original audio) Positive = Anchor + Noise

    Negative = Resize(Anchor) Pull (Anchor ↔ Positive) Push (Anchor ↔ Negative) Ignore noise → Noise-robust feature space Reflect differences in the machine sound
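A sketch of the triplet construction on this slide; implementing Resize(⋅) as time-axis resampling of the waveform and the scale factor are assumptions, since the deck does not define them here.

```python
# Hedged sketch of the triplet on this slide: positive = anchor + noise
# (teach the extractor to ignore noise), negative = Resize(anchor) (stay
# sensitive to changes in the machine sound itself).
import torch
import torch.nn.functional as F

def make_triplet(anchor: torch.Tensor, noise: torch.Tensor, scale: float = 1.2):
    positive = anchor + noise                        # same machine sound, extra noise
    negative = F.interpolate(anchor[None, None, :],  # resample the waveform in time
                             scale_factor=scale, mode="linear",
                             align_corners=False)[0, 0]
    return anchor, positive, negative

triplet_loss = torch.nn.TripletMarginLoss(margin=1.0)
# loss = triplet_loss(f(anchor), f(positive), f(negative))  # f: feature extractor
```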
  35. 36/49 Experimental evaluation Goal ◼To improve performance under unlabeled conditions

    Dataset Comparison methods ◼N/A: This does not use pseudo-labels ◼Proposed pseudo-labeling: Class, Triplet, PANNs, and OpenL3 ◼GT: This uses ground-truth labels

     Dataset name         # of machines w/ operation param.   # of machines w/o operation param.
     Original DCASE2023   14                                   0
     Modified DCASE2023    0                                  14
     Original DCASE2024    9                                   7
  36. 37/49 Evaluation results ◼Skipping evaluation results of 9 machines w/

    operation param. from DCASE 2024 (Results were almost the same) ◼Metric: the harmonic mean of AUC and pAUC over all domains ◼Show the arithmetic mean and standard deviation across 5 trials 7 machines w/o operation param. (DCASE 2023) 7 other machines w/o operation param. (DCASE 2023) 7 machines w/o operation param. (DCASE 2024)
  37. 38/49 Evaluation results ◼Ground truth labels are extremely effective in

    most cases ☺ All pseudo-labeling methods improve performance ◼PANNs and OpenL3 generally achieve better performance
  38. 39/49 Evaluation results ◼Ground truth labels are extremely effective in

    most cases ☺ All pseudo-labeling methods improve performance ◼PANNs and OpenL3 generally achieve better performance
  39. 40/49 Evaluation results ◼Ground truth labels are extremely effective in

    most cases ☺ All pseudo-labeling methods improve performance ◼PANNs and OpenL3 generally achieve better performance
  40. 41/49 Evaluation results ◼Ground truth labels are extremely effective in

    most cases ☺ All pseudo-labeling methods improve performance ◼PANNs and OpenL3 generally achieve better performance
  41. 42/49 Evaluation results ◼Ground truth labels are extremely effective in

    most cases ☺ All pseudo-labeling methods improve performance ◼PANNs and OpenL3 generally achieve better performance For example, OpenL3 achieved an absolute improvement of 30%
  42. 43/49 Good case of OpenL3 Shaker in 23eval (AUC of

    N/A was 44.02) ◼OpenL3 successfully reflected the ground-truth labels ☺ It significantly improved the performance Feature spaces colored by ground truth labels
  43. 44/49 Bad case of PANNs and OpenL3 valve in 23dev

    (AUC of N/A was 74.18) ◼PANNs and OpenL3 did not reflect the ground-truth labels ☹ The generated pseudo-labels degraded performance
  44. 45/49 Bad case of PANNs and OpenL3 valve in 23dev

    (AUC of N/A was 74.18) ◼The clusters also reflected differences in types of noise → unhelpful pseudo-labels Contain similar noise Contain similar noise Machine sound Machine sound
  45. 46/49 Good case of Triplet valve in 23dev (AUC of

    N/A was 74.18) ◼Resize(⋅) effectively captured differences in machine sounds ☺ The generated pseudo-labels improved the performance Machine sound Noise Machine sound Noise No noise No noise
  46. 47/49 Practical challenge 2: Annotation costs Background: Annotation is costly

    Proposed method: Pseudo-labeling Results ◼All types of our pseudo-labeling methods were effective ◼PANNs and OpenL3, trained on Audioset, provided especially useful pseudo-labels, but they sometimes suffered from noise ◼For some machines, Resize(⋅) captured differences in machine sounds very effectively Future work ◼More detailed analysis (e.g., differences between PANNs and OpenL3, further ablation studies) ◼Noise-robust pseudo-labeling method
  47. 48/49 Outline Introduction of ASD task ◼Problem settings ◼Two basic

    approaches: Generative and Discriminative Practical challenge 1: Instability ◼“Discriminative neighborhood smoothing for generative anomalous sound detection” Practical challenge 2: Data collection (annotation) costs ◼“Improvements of Discriminative Feature Space Training for Anomalous Sound Detection in Unlabeled Conditions” Summary
  48. 49/49 Summary Promising research directions ◼Accumulate and utilize observed data

    ◼Combine or selectively use Gen. and Dis. ◼Develop noise-robust pseudo-labels ◼There is still room for improvement in domain shift and stability (first-shot) problems

                      Generative            Discriminative
     Performance      Insufficient          High
     Stability        Relatively stable     Unstable                              ← 1st topic
     Use of labels    —                     Useful but the annotation is costly   ← 2nd topic