[ACCV22] Visual Explanation Generation Based on Lambda Attention Branch Networks

  1. Tsumugi Iida1 Takumi Komatsu1 Kanta Kaneda1 Tsubasa Hirakawa2
    Takayoshi Yamashita2 Hironobu Fujiyoshi2 Komei Sugiura1
    1. Keio University 2. Chubu University
    Visual Explanation Generation Based on
    Lambda Attention Branch Networks


  2. Introduction: Visual explanations can provide insights into unexplained phenomena
    Visual explanations for deep neural networks are important for
    ・Enhancing accountability (e.g., health care)
    ・Providing scientific insight to experts (e.g., solar flare prediction)
    (Figure: magnetogram and its visual explanation)


  4. Problem Statement: Visual explanation generation for classification problems
    Input:
    • Image 𝒙 ∈ ℝ^{3×H×W}
    Outputs:
    • Predicted class
    • Visual explanation (attention map 𝜶 ∈ ℝ^{1×H×W})
    (Figure: IDRiD input and visual explanation highlighting important and unimportant regions)


  5. Related Works: Explanation generation for transformers has not been fully established
    Attention Branch Network [Fukui+, CVPR19]: generates explanations for CNNs via a branch structure
    Attention Rollout [Abnar+, 20]: chains transformer attention; a standard explanation generation method for transformers
    RISE [Petsiuk+, BMVC18]: generic explanation generation method; proposed a standard metric, the Insertion-Deletion score (ID)
    Problem
    • Visual explanations for Lambda-based transformers have not been established
    • ID is inappropriate for images with sparse important regions
    (Figure: generic image vs. sparse image)


  6. Related Works: Lambda Networks [Bello+, ICLR21]
    Lambda Layer
    • Compatible with CNNs
    • Captures a wide range of relationships with less computation than ViT
    (Figure: comparison of ViT and Lambda)


  7. Related Works: Lambda Networks [Bello+, ICLR21]
    Lambda Layer
    A transformer specialized for images; can capture relationships between all pixels with less computation than ViT
    Apply convolutions to 𝒉 to generate query, key, and value:
    𝑄 = Conv(𝒉), 𝑉 = Conv(𝒉), 𝐾 = Softmax(Conv(𝒉))
    Apply a convolution to the value to generate the position lambda 𝝀_p; compute the product of key and value to generate the content lambda 𝝀_c:
    𝝀_p = Conv(𝑉), 𝝀_c = 𝐾^⊤𝑉
    Compute the output 𝒉′ by the following equation:
    𝒉′ = (𝝀_p + 𝝀_c)^⊤ 𝑄
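The lambda layer equations above can be sketched compactly. The following single-head PyTorch sketch is illustrative only (the module name, the 1x1/3x3 convolution choices, and the dimensions dim_k and dim_v are assumptions, not the authors' implementation): Q, K, V come from convolutions over 𝒉, the content lambda is 𝐾^⊤𝑉, the position lambda is a convolution over 𝑉, and the output is 𝒉′ = (𝝀_p + 𝝀_c)^⊤𝑄.

```python
import torch
import torch.nn as nn

class LambdaLayerSketch(nn.Module):
    """Single-head sketch of the equations above: Q, K, V from 1x1 convolutions,
    content lambda = K^T V, position lambda = Conv(V), output h' = (lam_p + lam_c)^T Q."""
    def __init__(self, dim, dim_k=16, dim_v=64):
        super().__init__()
        self.to_q = nn.Conv2d(dim, dim_k, 1, bias=False)
        self.to_k = nn.Conv2d(dim, dim_k, 1, bias=False)
        self.to_v = nn.Conv2d(dim, dim_v, 1, bias=False)
        # position lambda: a convolution over the values, one (k x v) matrix per pixel
        self.pos_conv = nn.Conv2d(dim_v, dim_k * dim_v, 3, padding=1, bias=False)
        self.dim_k, self.dim_v = dim_k, dim_v

    def forward(self, h):                                  # h: (B, dim, H, W)
        b, _, H, W = h.shape
        n = H * W
        q = self.to_q(h).flatten(2).transpose(1, 2)        # (B, n, k)
        k = self.to_k(h).flatten(2).transpose(1, 2)        # (B, n, k)
        v_map = self.to_v(h)                               # (B, v, H, W)
        v = v_map.flatten(2).transpose(1, 2)               # (B, n, v)
        k = k.softmax(dim=1)                               # normalize keys over positions
        lam_c = torch.einsum('bnk,bnv->bkv', k, v)         # content lambda (B, k, v)
        lam_p = self.pos_conv(v_map)                       # (B, k*v, H, W)
        lam_p = lam_p.flatten(2).transpose(1, 2).reshape(b, n, self.dim_k, self.dim_v)
        lam = lam_p + lam_c.unsqueeze(1)                   # (B, n, k, v)
        out = torch.einsum('bnkv,bnk->bnv', lam, q)        # h'_n = lambda_n^T q_n
        return out.transpose(1, 2).reshape(b, self.dim_v, H, W)
```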


  8. Related Works: Lambda Networks [Bello+, ICLR21]
    Lambda Layer
    𝝀_p = Conv(𝑉), 𝝀_c = 𝐾^⊤𝑉, 𝒉′ = (𝝀_p + 𝝀_c)^⊤𝑄
    𝝀 : a compressed context applied to 𝑄
    ○Explanation generation strategy
    1. Visualize 𝝀
    2. Introduce a new module to generate the explanation


  9. Proposed Method: Lambda Attention Branch Networks can generate visual explanations for Lambda-based transformers


  10. Proposed Method: Lambda Attention Branch Networks can generate visual explanations for Lambda-based transformers
    The feature extractor produces the extracted features


  11. Proposed Method: Lambda Attention Branch Networks can generate visual explanations for Lambda-based transformers
    Introduces a branch structure to generate an attention map 𝜶 ∈ ℝ^{1×H′×W′}


  12. Proposed Method: Lambda Attention Branch Networks can generate visual explanations for Lambda-based transformers
    • Performs classification based on 𝜶 ⊙ 𝒉_ext, the attention-weighted extracted features
    • 𝜶 contributes to both explanation and accuracy
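As a rough illustration of the branch wiring described above (a minimal sketch; the module names, the 1x1 convolution, and the sigmoid choice are assumptions rather than the authors' design), the attention branch produces 𝜶 from the extracted features, and the classification head operates on 𝜶 ⊙ 𝒉_ext:

```python
import torch.nn as nn

class AttentionBranchSketch(nn.Module):
    """Illustrative wiring: an attention branch yields alpha, a classification head
    processes the attention-weighted features alpha * h_ext."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        # attention branch: 1x1 convolution + sigmoid gives a single-channel map in [0, 1]
        self.attn = nn.Sequential(nn.Conv2d(feat_dim, 1, kernel_size=1), nn.Sigmoid())
        # classification head: global average pooling + linear classifier
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(feat_dim, num_classes))

    def forward(self, h_ext):                 # h_ext: (B, feat_dim, H', W') extracted features
        alpha = self.attn(h_ext)              # (B, 1, H', W') attention map
        logits = self.head(alpha * h_ext)     # classification based on alpha ⊙ h_ext
        return logits, alpha
```

In ABN-style designs the attention branch usually also carries its own auxiliary classification loss; that detail is omitted in this sketch.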


  13. Proposed Method: Introduced saliency-guided training [Ismail+, NeurIPS21] to reduce noise in the attention map
    1. Generate a masked image 𝒙̃ from the input 𝒙 based on the attention map
    2. Minimize the KL divergence between the outputs for 𝒙 and 𝒙̃:
    ℒ_KL = 𝐷_KL(𝑓(𝒙) ∥ 𝑓(𝒙̃))
    (Figure: input 𝒙, attention map, masked image 𝒙̃)
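A minimal sketch of this masking-and-KL step, assuming the lowest-attention pixels are simply zeroed out (the masking fraction, the zero fill, and the function name are illustrative assumptions, not the paper's exact recipe):

```python
import torch.nn.functional as F

def saliency_guided_kl_loss(model, x, alpha, mask_frac=0.5):
    """Mask the lowest-attention pixels of x and penalize the change in the model's
    output distribution, D_KL(f(x) || f(x_tilde))."""
    b = x.size(0)
    # resize the attention map to the input resolution and find a per-image threshold
    attn = F.interpolate(alpha, size=x.shape[-2:], mode='bilinear', align_corners=False)
    flat = attn.view(b, -1)
    k = max(1, int(mask_frac * flat.size(1)))
    thresh = flat.kthvalue(k, dim=1).values.view(b, 1, 1, 1)
    keep = (attn > thresh).float()                     # 1 = keep, 0 = mask out
    x_tilde = x * keep                                 # masked image x~
    p = F.log_softmax(model(x), dim=1)                 # f(x)
    q = F.log_softmax(model(x_tilde), dim=1)           # f(x~)
    return F.kl_div(q, p, log_target=True, reduction='batchmean')  # D_KL(f(x) || f(x~))
```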


  14. Background of proposed metric: IDs is inappropriate for images with sparse important regions
    Insertion-Deletion score: IDs = AUC(Insertion) − AUC(Deletion)
    Problems with the Insertion-Deletion score (ID)
    • Out-of-distribution inputs
    • Prefers coarse explanations
    (Figure: deletion results for a coarse attention map vs. a fine-grained attention map)


  15. Proposed Metric: Patch Insertion-Deletion score
    1. Divide 𝒙 into 𝑚 × 𝑚 patches 𝒑_ij
    2. Insert / delete pixels according to the importance in the attention map:
    𝒙_n = 𝒑_ij if (i, j) ∈ Top-𝑛 importance, 𝒃_ij otherwise (𝒃_ij : baseline patch)
    3. Plot the predicted probability 𝑝(ŷ = 1 | 𝒙_n) against 𝑛
    4. Compute the AUC
    (Figure: insertion and deletion curves of 𝑝(ŷ = 1 | 𝒙_n) vs. 𝑛)
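A rough sketch of how the PID score could be computed for one image, assuming a binary classifier that returns P(ŷ = 1 | image) as a scalar and a zero image as the baseline for deleted patches (the grid size m, the baseline choice, and the function names are illustrative assumptions):

```python
import numpy as np

def patch_insertion_deletion(model, x, attn, m=8, baseline=None):
    """Sketch of the Patch Insertion-Deletion (PID) score for one image.
    model: callable returning P(y_hat = 1 | image) as a float (assumption),
    x: (H, W, C) image, attn: (H, W) attention map, m: number of patches per side."""
    H, W = attn.shape
    ph, pw = H // m, W // m
    if baseline is None:
        baseline = np.zeros_like(x)                    # zero image as the baseline b_ij
    # rank the m x m patches by mean attention, most important first
    patch_scores = attn.reshape(m, ph, m, pw).mean(axis=(1, 3)).ravel()
    order = np.argsort(-patch_scores)

    def auc(start, fill):
        img = start.copy()
        probs = [model(img.copy())]                    # n = 0
        for idx in order:                              # replace patches one by one
            i, j = divmod(int(idx), m)
            img[i*ph:(i+1)*ph, j*pw:(j+1)*pw] = fill[i*ph:(i+1)*ph, j*pw:(j+1)*pw]
            probs.append(model(img))
        probs = np.asarray(probs)
        # trapezoidal AUC with the x-axis normalized to the fraction of patches
        return float((probs[:-1] + probs[1:]).sum() / (2 * (len(probs) - 1)))

    insertion = auc(baseline, x)   # start from the baseline, insert important patches first
    deletion = auc(x, baseline)    # start from the image, delete important patches first
    return insertion - deletion    # PID = AUC(insertion) - AUC(deletion)
```

Insertion starts from the baseline and adds the most important patches first; deletion starts from the image and removes them first; PID is the difference of the two AUCs.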


  16. Experimental Setting: Conducted experiments on two public datasets
    Indian Diabetic Retinopathy Image Dataset (IDRiD)
    • Dataset for detecting diabetic retinopathy from retinal fundus images
    • Binary classification task
    DeFN Magnetogram Dataset
    • Dataset for solar flare prediction
    • Binary classification task

    IDRiD      | Num of samples
    Training   | 330
    Validation | 83
    Test       | 103

    DeFN Magnetograms | Time period | Num of samples
    Training          | 2010-2015   | 45530
    Validation        | 2016        | 7795
    Test              | 2017        | 7790


  17. Quantitative Results: IDRiD
    Outperforms baseline methods in IDs and PIDs (𝑚 : patch size)

    Method                  | ID     | PID (𝑚=2) | PID (𝑚=4) | PID (𝑚=8) | PID (𝑚=16)
    RISE [Petsiuk+, BMVC18] | 0.319  | 0.179     | 0.130     | 0.136     | 0.148
    Lambda                  | -0.101 | -0.105    | -0.116    | -0.123    | 0.093
    Ours                    | 0.431  | 0.458     | 0.473     | 0.470     | 0.455

    (Figure: IDRiD input and visual explanation)


  18. Quantitative Results: Magnetograms
    Outperforms baseline methods in IDs and PIDs (𝑚 : patch size)

    Method                  | ID    | PID (𝑚=16) | PID (𝑚=32) | PID (𝑚=64) | PID (𝑚=128)
    RISE [Petsiuk+, BMVC18] | 0.235 | 0.261      | 0.296      | 0.379      | 0.461
    Lambda                  | 0.374 | 0.414      | 0.403      | 0.378      | 0.291
    Ours                    | 0.506 | 0.748      | 0.755      | 0.757      | 0.756

    (Figure: magnetogram input and visual explanation)


  19. Qualitative Results: IDRiD
    The proposed method generated fine-grained explanations
    (Figure: attention maps for Input, RISE, Lambda, Ours)
    • Ours: fine-grained and appropriate
    • RISE: coarse and inappropriate
    • Lambda: focuses on the outside corners


  20. Qualitative Results: Magnetograms
    The proposed method generated fine-grained explanations
    (Figure: attention maps for Input, RISE, Lambda, Ours)
    • Ours: fine-grained and appropriate
    • RISE: coarse and inappropriate
    • Lambda: focuses on the outside corners

  24. Conclusion
    • We proposed the Lambda Attention Branch Network, which has a parallel branching structure to obtain clear visual explanations
    • We also proposed the PID score, an effective evaluation metric for images with sparse important regions
    • LABN outperformed the baseline methods in terms of the ID and PID scores
