
[ACCV22] Visual Explanation Generation Based on Lambda Attention Branch Networks

More Decks by Semantic Machine Intelligence Lab., Keio Univ.


Transcript

  1. Tsumugi Iida¹, Takumi Komatsu¹, Kanta Kaneda¹, Tsubasa Hirakawa², Takayoshi Yamashita², Hironobu Fujiyoshi², Komei Sugiura¹

     ¹Keio University  ²Chubu University
     Visual Explanation Generation Based on Lambda Attention Branch Networks
  2. Introduction: Visual explanations can provide insights into unexplained phenomena

     Visual explanations for deep neural networks are important in terms of:
     ・Enhancing accountability (e.g., health care)
     ・Providing scientific insight to experts (e.g., solar flares)
     [Figure: magnetogram and its visual explanation]
  4. Problem Statement: Visual explanation generation for a classification problem

     Input:
     • Image 𝒙 ∈ ℝ^{C×H×W}
     Outputs:
     • Predicted class
     • Visual explanation: attention map 𝜶 ∈ ℝ^{1×H×W}
     [Figure: IDRiD visual explanation showing important and unimportant regions]
  5. Related Works: Explanation generation for transformers has not been fully established

     • Attention Branch Network [Fukui+, CVPR19]: generates explanations for CNNs via a branch structure
     • Attention Rollout [Abnar+, 20]: standard explanation method for transformers, chaining transformer attention
     • RISE [Petsiuk+, BMVC18]: generic explanation generation method; proposed the standard Insertion-Deletion score (ID) metric
     Problems:
     • Visual explanations for Lambda-based transformers have not been established
     • ID is inappropriate for images with sparse important regions
     [Figure: generic image vs. sparse image]
  6. Related Works: Lambda Networks [Bello+, ICLR21]

     Lambda Layer:
     • Compatible with CNNs
     • Captures a wide range of relationships with less computation than ViT
     [Figure: ViT vs. Lambda]
  7. Related Works: Lambda Networks [Bello+, ICLR21]

     Lambda Layer: a transformer specialized for images; can capture relationships between all pixels with less computation than ViT
     1. Apply convolutions to 𝒉 to generate query, key, and value:
        Q = Conv(𝒉), V = Conv(𝒉), K = Softmax(Conv(𝒉))
     2. Apply a convolution to the value to generate 𝝀_p; compute the product of key and value to generate 𝝀_c:
        𝝀_p = Conv(V), 𝝀_c = KᵀV
     3. Compute the output 𝒉′ by the following equation:
        𝒉′ = (𝝀_p + 𝝀_c)ᵀQ
  8. Related Works: Lambda Networks [Bello+, ICLR21]

     𝝀_p = Conv(V), 𝝀_c = KᵀV
     𝒉′ = (𝝀_p + 𝝀_c)ᵀQ
     𝝀_c : compressed summary of the key-value context, applied to the query Q
     ◦ Explanation generation strategies:
     1. Visualize 𝝀_c directly
     2. Introduce a new module to generate the explanation
  9. Proposed Method: Lambda Attention Branch Networks can generate visual explanations for Lambda-based transformers

     [Figure: network overview showing the extracted features]
  10. Proposed Method: Lambda Attention Branch Networks can generate visual explanations for Lambda-based transformers

     Introduces a branch structure to generate an attention map 𝜶 ∈ ℝ^{1×H×W}
  11. Proposed Method: Lambda Attention Branch Networks can generate visual explanations for Lambda-based transformers

      • Performs classification based on 𝜶 ⊙ 𝒉_mid
      • 𝜶 contributes to both explanation and accuracy
  12. Proposed Method: Introduced saliency-guided training [Ismail+, NeurIPS21] to reduce noise in the attention map

      1. Generate a masked image 𝒙̃ based on the attention map
         [Figure: input 𝒙, attention map, masked image 𝒙̃]
      2. Minimize the KL divergence between the outputs for 𝒙 and 𝒙̃:
         ℒ_KL = D_KL( f(𝒙) ‖ f(𝒙̃) )
  13. Background of proposed metric: IDs is inappropriate for images with sparse important regions

      Insertion-Deletion score: IDs = AUC(Insertion) − AUC(Deletion)
      Problems of the Insertion-Deletion score (ID):
      • Out-of-distribution inputs
      • Prefers coarse explanations
      [Figure: Deletion applied to a coarse vs. a fine-grained attention map]
  14. Proposed Metric: Patch Insertion-Deletion score

      1. Divide 𝒙 into m × m patches 𝒑_ij
      2. Insert / delete pixels according to the importance in the attention map:
         𝒙_n = 𝒑_ij if (i, j) ∈ Top-n importance, 𝒃_ij otherwise
      3. Plot the predicted probability p(ŷ = 1 | 𝒙_n) against n
      4. Compute the AUC for Insertion and Deletion
  15. Experimental Setting: Conducted experiments on two public datasets

      Indian Diabetic Retinopathy Image Dataset (IDRiD)
      • Detecting diabetic retinopathy from retinal fundus images
      • Binary classification task
      • Num of samples: Training 330, Validation 83, Test 103
      DeFN Magnetogram Dataset
      • Solar flare prediction
      • Binary classification task
      • Num of samples: Training (2010-2015) 45530, Validation (2016) 7795, Test (2017) 7790
  16. Quantitative Results (IDRiD): Outperforms baseline methods in IDs and PIDs
      (m: patch size)

      IDRiD                     ID      PID
                                        m=2     m=4     m=8     m=16
      RISE [Petsiuk+, BMVC18]   0.319   0.179   0.130   0.136   0.148
      Lambda                   -0.101  -0.105  -0.116  -0.123   0.093
      Ours                      0.431   0.458   0.473   0.470   0.455
      [Figure: IDRiD visual explanation]
  17. Quantitative Results (Magnetograms): Outperforms baseline methods in IDs and PIDs
      (m: patch size)

      DeFN                      ID      PID
                                        m=16    m=32    m=64    m=128
      RISE [Petsiuk+, BMVC18]   0.235   0.261   0.296   0.379   0.461
      Lambda                    0.374   0.414   0.403   0.378   0.291
      Ours                      0.506   0.748   0.755   0.757   0.756
      [Figure: magnetogram visual explanation]
  18. Qualitative Results (IDRiD): The proposed method generated fine-grained explanations

      • Ours: fine-grained / appropriate
      • RISE: coarse / inappropriate
      • Lambda: focuses on outside corners
      [Figure: Input, RISE, Lambda, Ours]
  19. Qualitative Results (Magnetograms): The proposed method generated fine-grained explanations

      • Ours: fine-grained / appropriate
      • RISE: coarse / inappropriate
      • Lambda: focuses on outside corners
      [Figure: Input, RISE, Lambda, Ours]
  23. Conclusion

      • We proposed Lambda Attention Branch Networks (LABN), which have a parallel branching structure to obtain clear visual explanations
      • We also proposed the PID score, an effective evaluation metric for images with sparse important regions
      • LABN outperformed the baseline methods in terms of the ID and PID scores