Slide 1

Slide 1 text

Visual Explanation Generation Based on Lambda Attention Branch Networks
Tsumugi Iida¹, Takumi Komatsu¹, Kanta Kaneda¹, Tsubasa Hirakawa², Takayoshi Yamashita², Hironobu Fujiyoshi², Komei Sugiura¹
1. Keio University  2. Chubu University

Slide 2

Slide 2 text

Introduction: Visual explanations can provide insights into unexplained phenomena
Visual explanations for deep neural networks are important for:
・Enhancing accountability (e.g., health care)
・Providing scientific insight to experts (e.g., solar flares)
[Figure: magnetogram and its visual explanation]

Slide 4

Slide 4 text

Problem Statement: Visual explanation generation for a classification problem
Input:
• Image 𝒙 ∈ ℝ^(3×H×W)
Outputs:
• Predicted class
• Visual explanation: attention map 𝜶 ∈ ℝ^(1×H×W)
[Figure: IDRiD image with its visual explanation, marking important and unimportant regions]

Slide 5

Slide 5 text

Related Works: Explanation generation for transformers has not been fully established
・Attention Branch Network [Fukui+, CVPR19]: generates explanations for CNNs via a branch structure
・Attention Rollout [Abnar+, 20]: generates explanations by chaining transformer attention; a standard explanation generation method for transformers
・RISE [Petsiuk+, BMVC18]: generic explanation generation method; proposed a standard metric, the Insertion-Deletion score (ID)
Problems:
・Visual explanations for Lambda-based transformers have not been established
・ID is inappropriate for images with sparse important regions
[Figure: generic image vs. image with sparse important regions]

Slide 6

Slide 6 text

Related Works: Lambda Networks [Bello+, ICLR21]
Lambda Layer:
• Compatible with CNNs
• Captures a wide range of relationships with less computation than ViT
[Figure: ViT vs. Lambda]

Slide 7

Slide 7 text

Related Works: Lambda Networks [Bello+, ICLR21]
Lambda Layer: a transformer architecture specialized for images; it captures relationships between all pixels with less computation than ViT.
1. Apply convolutions to 𝒉 to generate the query, key, and value:
   Q = Conv(𝒉), V = Conv(𝒉), K = Softmax(Conv(𝒉))
2. Apply a convolution to the value to generate 𝝀_p, and take the product of key and value to generate 𝝀_c:
   𝝀_p = Conv(V), 𝝀_c = Kᵀ V
3. Compute the output 𝒉′ by:
   𝒉′ = (𝝀_p + 𝝀_c)ᵀ Q
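A minimal PyTorch sketch of a single-head lambda layer following the slide's equations. The dimension names (dim_k, dim_v) and the kernel size r are illustrative assumptions; multi-head lambdas and the normalization details of the original paper are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LambdaLayerSketch(nn.Module):
    """Single-head lambda layer sketch (simplified from Bello+, ICLR21)."""
    def __init__(self, dim, dim_k=16, dim_v=64, r=7):
        super().__init__()
        # 1x1 convolutions produce query, key, and value from the input h
        self.to_q = nn.Conv2d(dim, dim_k, 1, bias=False)
        self.to_k = nn.Conv2d(dim, dim_k, 1, bias=False)
        self.to_v = nn.Conv2d(dim, dim_v, 1, bias=False)
        # lambda convolution: builds the position lambdas, lambda_p = Conv(V)
        self.pos_conv = nn.Conv3d(1, dim_k, (1, r, r), padding=(0, r // 2, r // 2))

    def forward(self, h):
        b, _, H, W = h.shape
        q = self.to_q(h).flatten(2)                      # (b, k, n), n = H*W
        k = F.softmax(self.to_k(h).flatten(2), dim=-1)   # keys normalized over positions
        v = self.to_v(h)                                 # (b, v, H, W)
        # content lambda: K^T V summarizes the whole context, shared by every query
        lam_c = torch.einsum('bkn,bvn->bkv', k, v.flatten(2))
        # position lambdas: one (k, v) matrix per position, from Conv(V)
        lam_p = self.pos_conv(v.unsqueeze(1)).flatten(3) # (b, k, v, n)
        # output h' = (lambda_p + lambda_c)^T Q, applied position-wise
        out = torch.einsum('bkv,bkn->bvn', lam_c, q) \
            + torch.einsum('bkvn,bkn->bvn', lam_p, q)
        return out.reshape(b, -1, H, W)
```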

Slide 8

Slide 8 text

Related Works: Lambda Networks [Bello+, ICLR21]
𝝀_p = Conv(V), 𝝀_c = Kᵀ V
𝒉′ = (𝝀_p + 𝝀_c)ᵀ Q
𝝀_c: a compressed representation of the context that is applied to 𝑄
○ Explanation generation strategies:
1. Visualize 𝝀_c
2. Introduce a new module to generate the explanation
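The slides do not show how 𝝀_c is mapped back onto pixels for strategy 1. One plausible proxy, sketched here purely as an assumption (not the paper's prescribed procedure) and reusing the LambdaLayerSketch above, is to visualize the softmax-normalized keys, which weight how much each position contributes to 𝝀_c = Kᵀ V.

```python
import torch.nn.functional as F

def lambda_key_heatmap(layer, h):
    # Assumption: average the normalized key channels to get a spatial
    # importance map, since the keys weight each position's share of lambda_c.
    b, _, H, W = h.shape
    k = F.softmax(layer.to_k(h).flatten(2), dim=-1)  # (b, dim_k, H*W)
    return k.mean(dim=1).reshape(b, H, W)            # (b, H, W) heat map
```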

Slide 9

Slide 9 text

Proposed Method: Lambda Attention Branch Networks (LABN) can generate visual explanations for Lambda-based transformers

Slide 10

Slide 10 text

Proposed Method: LABN can generate visual explanations for Lambda-based transformers
The feature extractor outputs the extracted features 𝒉.

Slide 11

Slide 11 text

Proposed Method: LABN can generate visual explanations for Lambda-based transformers
Introduces a branch structure that generates an attention map 𝜶 ∈ ℝ^(1×H×W).

Slide 12

Slide 12 text

Proposed Method: LABN can generate visual explanations for Lambda-based transformers
• Performs classification based on 𝜶 ⊙ 𝒉, the attention-weighted extracted features
• 𝜶 contributes to both explanation and accuracy
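A minimal PyTorch sketch of this branch structure. Layer sizes are illustrative assumptions, and the auxiliary classification loss that ABN-style attention branches typically carry is omitted.

```python
import torch.nn as nn

class LABNHeadSketch(nn.Module):
    """Sketch: attention branch + perception branch over extracted features h."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        # attention branch: a one-channel map alpha in [0, 1], the visual explanation
        self.attention_branch = nn.Sequential(
            nn.Conv2d(feat_dim, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        # perception branch: classifies the attention-weighted features
        self.perception_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(feat_dim, num_classes),
        )

    def forward(self, h):
        alpha = self.attention_branch(h)             # (b, 1, H, W)
        logits = self.perception_branch(alpha * h)   # classification from alpha ⊙ h
        return logits, alpha
```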

Slide 13

Slide 13 text

Proposed Method: Introduced saliency-guided training [Ismail+, NeurIPS21] to reduce noise in the attention map
1. Generate a masked image 𝒙̃ from the input 𝒙 based on the attention map
2. Minimize the KL divergence between the outputs for 𝒙 and 𝒙̃:
   ℒ_KL = D_KL( f(𝒙) ∥ f(𝒙̃) )
[Figure: input 𝒙, attention map, and masked image 𝒙̃]
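A minimal sketch of this KL term, assuming PyTorch, a `model` that returns class logits, and that 𝒙̃ is formed by dropping the lowest-attention pixels; the mask ratio and the zero fill are illustrative choices, not the authors' exact recipe.

```python
import torch
import torch.nn.functional as F

def saliency_guided_kl_loss(model, x, alpha, mask_ratio=0.5):
    # Build the masked image x~ by zeroing the lowest-attention pixels.
    b, _, H, W = x.shape
    flat = alpha.flatten(1)                          # (b, H*W) attention per pixel
    k = int(mask_ratio * flat.shape[1])
    low = flat.topk(k, dim=1, largest=False).indices
    keep = torch.ones_like(flat).scatter_(1, low, 0.0).reshape(b, 1, H, W)
    x_masked = x * keep                              # zero fill is an assumption
    # L_KL = D_KL( f(x) || f(x~) ) between the two predictive distributions
    log_p = F.log_softmax(model(x), dim=-1)          # f(x)
    log_q = F.log_softmax(model(x_masked), dim=-1)   # f(x~)
    return F.kl_div(log_q, log_p, reduction='batchmean', log_target=True)
```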

Slide 14

Slide 14 text

Background of the proposed metric: IDs is inappropriate for images with sparse important regions
Insertion-Deletion score: IDs = AUC(Insertion) − AUC(Deletion)
Problems of the Insertion-Deletion score (ID):
• Produces out-of-distribution inputs
• Prefers coarse explanations
[Figure: deletion applied to a coarse vs. a fine-grained attention map]

Slide 15

Slide 15 text

Proposed Metric: Patch Insertion-Deletion (PID) score
1. Divide 𝒙 into patches 𝒑_ij of size m × m
2. Insert / delete patches according to the importance given by the attention map:
   the patch of 𝒙_n at (i, j) is 𝒑_ij if (i, j) ∈ Top-n importance, and the baseline patch 𝒃_ij otherwise
3. Plot the predicted probability p(ŷ = 1 | 𝒙_n) against n
4. Compute the AUC
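A minimal sketch of the insertion half of this metric, assuming PyTorch/NumPy, a single image whose sides are divisible by m, and a black baseline for the deleted patches. The deletion curve is symmetric (start from 𝒙 and replace patches with the baseline), PID = AUC(Insertion) − AUC(Deletion), and conceptually the pixel-wise ID of the previous slide corresponds to m = 1.

```python
import numpy as np
import torch

def patch_insertion_auc(model, x, attn, m, baseline=None):
    """Insertion half of the PID score (sketch). Assumes x: (C, H, W) with
    H, W divisible by m, attn: (H, W), and a model returning class logits;
    baseline patches b_ij default to black (zeros)."""
    C, H, W = x.shape
    canvas = torch.zeros_like(x) if baseline is None else baseline.clone()
    # patch importance = mean attention inside each m x m patch
    imp = attn.reshape(H // m, m, W // m, m).mean(dim=(1, 3))
    order = imp.flatten().argsort(descending=True)
    probs = []
    for idx in order:  # insert patches p_ij from most to least important
        i, j = divmod(idx.item(), W // m)
        canvas[:, i*m:(i+1)*m, j*m:(j+1)*m] = x[:, i*m:(i+1)*m, j*m:(j+1)*m]
        with torch.no_grad():
            p = torch.softmax(model(canvas.unsqueeze(0)), dim=-1)[0, 1]
        probs.append(p.item())  # p(y_hat = 1 | x_n) after n patches
    return np.trapz(probs, dx=1.0 / len(probs))  # AUC of the insertion curve
```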

Slide 16

Slide 16 text

Experimental Setting: Conducted experiments on two public datasets
Indian Diabetic Retinopathy Image Dataset (IDRiD)
• Dataset for detecting diabetic retinopathy from retinal fundus images
• Binary classification task

  IDRiD        Num of samples
  Training     330
  Validation   83
  Test         103

DeFN Magnetogram Dataset
• Dataset for solar flare prediction
• Binary classification task

  DeFN Magnetograms   Time period   Num of samples
  Training            2010-2015     45530
  Validation          2016          7795
  Test                2017          7790

Slide 17

Slide 17 text

Quantitative Results (IDRiD): Outperforms the baseline methods in both ID and PID scores (m: patch size)

  Method                    ID       PID (m=2)   PID (m=4)   PID (m=8)   PID (m=16)
  RISE [Petsiuk+, BMVC18]   0.319    0.179       0.130       0.136       0.148
  Lambda                    -0.101   -0.105      -0.116      -0.123      0.093
  Ours                      0.431    0.458       0.473       0.470       0.455

[Figure: IDRiD image with its visual explanation]

Slide 18

Slide 18 text

Quantitative Results (Magnetograms): Outperforms the baseline methods in both ID and PID scores (m: patch size)

  Method                    ID      PID (m=16)   PID (m=32)   PID (m=64)   PID (m=128)
  RISE [Petsiuk+, BMVC18]   0.235   0.261        0.296        0.379        0.461
  Lambda                    0.374   0.414        0.403        0.378        0.291
  Ours                      0.506   0.748        0.755        0.757        0.756

[Figure: magnetogram with its visual explanation]

Slide 19

Slide 19 text

Qualitative Results (IDRiD): The proposed method generated fine-grained explanations
• Ours: fine-grained / appropriate
• RISE: coarse / inappropriate
• Lambda: focuses on the outside corners
[Figure: input with explanations from RISE, Lambda, and Ours]

Slide 20

Slide 20 text

Qualitative Results (Magnetograms): The proposed method generated fine-grained explanations
• Ours: fine-grained / appropriate
• RISE: coarse / inappropriate
• Lambda: focuses on the outside corners
[Figure: input with explanations from RISE, Lambda, and Ours]

Slide 24

Slide 24 text

Conclusion
• We proposed the Lambda Attention Branch Network (LABN), which has a parallel branch structure to obtain clear visual explanations
• We also proposed the PID score, an effective evaluation metric for images with sparse important regions
• LABN outperformed the baseline methods in terms of both the ID and PID scores