Slide 1

NeurIPS 2020 Papers on Dataset Shift and Machine Learning
Masanari Kimura (@machinery81)

Slide 2

TL;DR
• Briefly describe the problem of dataset shift.
• Categorize the NeurIPS 2020 papers that deal with dataset shift.
• Introduce the papers accepted at NeurIPS 2020 that focus on dataset shift.
• https://papers.nips.cc/paper/2020

Slide 3

Taxonomy of NeurIPS 2020 Papers about Dataset Shift

Slide 4

Taxonomy of NeurIPS 2020 Papers about Dataset Shift

Covariate Shift, Target Shift, Affine Distribution Shift
• [1] Uehara, Masatoshi, Masahiro Kato, and Shota Yasui. "Off-Policy Evaluation and Learning for External Validity under a Covariate Shift." NeurIPS 2020.
• [3] Fang, Tongtong, et al. "Rethinking Importance Weighting for Deep Learning under Distribution Shift." NeurIPS 2020.
• [8] Reisizadeh, Amirhossein, et al. "Robust Federated Learning: The Case of Affine Distribution Shifts." NeurIPS 2020.

Meta-Analysis of Distribution Shift
• [4] Taori, Rohan, et al. "Measuring robustness to natural distribution shifts in image classification." NeurIPS 2020.
• [6] Kulinski, Sean, Saurabh Bagchi, and David I. Inouye. "Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests." NeurIPS 2020.

Slide 5

Domain Adaptation
• [5] Tachet des Combes, Remi, et al. "Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift." NeurIPS 2020.
• [7] Chen, Yining, et al. "Self-training Avoids Using Spurious Features Under Domain Shift." NeurIPS 2020.
• [9] Kang, Guoliang, et al. "Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation." NeurIPS 2020.
• [10] Venkat, Naveen, et al. "Your Classifier can Secretly Suffice Multi-Source Domain Adaptation." NeurIPS 2020.
• [11] Zhang, Kun, et al. "Domain adaptation as a problem of inference on graphical models." NeurIPS 2020.
• [12] Cui, Shuhao, et al. "Heuristic Domain Adaptation." NeurIPS 2020.
• [13] Park, Kwanyong, et al. "Discover, Hallucinate, and Adapt: Open Compound Domain Adaptation for Semantic Segmentation." NeurIPS 2020.
• [14] Ge, Yixiao, et al. "Self-paced contrastive learning with hybrid memory for domain adaptive object re-id." NeurIPS 2020.
• [15] Balaji, Yogesh, Rama Chellappa, and Soheil Feizi. "Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation." NeurIPS 2020.
• [16] Saito, Kuniaki, et al. "Universal domain adaptation through self supervision." NeurIPS 2020.
• [17] Wang, Ximei, et al. "Transferable Calibration with Lower Bias and Variance in Domain Adaptation." NeurIPS 2020.
• [18] Tachet des Combes, Remi, et al. "Domain adaptation with conditional distribution matching and generalized label shift." NeurIPS 2020.
• [19] Luo, Yawei, et al. "Adversarial style mining for one-shot unsupervised domain adaptation." NeurIPS 2020.

Slide 6

Sampling Bias, Selection Bias
• [20] Purushwalkam, Senthil, and Abhinav Gupta. "Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases." NeurIPS 2020.
• [21] Mazaheri, Bijan, Siddharth Jain, and Jehoshua Bruck. "Robust Correction of Sampling Bias Using Cumulative Distribution Functions." NeurIPS 2020.
• [22] Flanigan, Bailey, Paul Gölz, Anupam Gupta, and Ariel Procaccia. "Neutralizing Self-Selection Bias in Sampling for Sortition." NeurIPS 2020.

Out-Of-Distribution Detection
• [2] Tack, Jihoon, et al. "CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances." NeurIPS 2020.
• [25] Teney, Damien, et al. "On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law." NeurIPS 2020.
• [26] Jeong, Taewon, and Heeyoung Kim. "OOD-MAML: Meta-Learning for Few-Shot Out-of-Distribution Detection and Classification." NeurIPS 2020.
• [27] Nandy, Jay, Wynne Hsu, and Mong Li Lee. "Towards Maximizing the Representation Gap between In-Domain & Out-of-Distribution Examples." NeurIPS 2020.
• [28] Bitterwolf, Julian, Alexander Meinke, and Matthias Hein. "Certifiably Adversarially Robust Detection of Out-of-Distribution Data." NeurIPS 2020.
• [29] Kirichenko, Polina, Pavel Izmailov, and Andrew Gordon Wilson. "Why normalizing flows fail to detect out-of-distribution data." NeurIPS 2020.
• [30] Xiao, Zhisheng, Qing Yan, and Yali Amit. "Likelihood regret: An out-of-distribution detection score for variational auto-encoder." NeurIPS 2020.
• [31] Liu, Weitang, et al. "Energy-based Out-of-distribution Detection." NeurIPS 2020.

Slide 7

Dataset Shift and Machine Learning

Slide 8

Dataset Shift in Machine Learning

Characterization of dataset shift problems based on (Moreno-Torres et al., 2012) [23]:
• Covariate Shift
• Target Shift
• Concept Shift
• Sample Selection Bias
• Domain Shift
• Source Component Shift

[23] Moreno-Torres, Jose G., et al. "A unifying view on dataset shift in classification." Pattern Recognition 45.1 (2012): 521-530.

Slide 9

Covariate Shift, Target Shift and Concept Shift

Definition. (Covariate Shift Assumption)
$$p_{\mathrm{tr}}(x) \neq p_{\mathrm{te}}(x), \qquad p_{\mathrm{tr}}(y \mid x) = p_{\mathrm{te}}(y \mid x)$$

Definition. (Target Shift Assumption)
$$p_{\mathrm{tr}}(y) \neq p_{\mathrm{te}}(y), \qquad p_{\mathrm{tr}}(x \mid y) = p_{\mathrm{te}}(x \mid y)$$

Definition. (Concept Shift Assumption)
$$p_{\mathrm{tr}}(y \mid x) \neq p_{\mathrm{te}}(y \mid x), \qquad p_{\mathrm{tr}}(x \mid y) \neq p_{\mathrm{te}}(x \mid y)$$
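
To make these assumptions concrete, the following minimal NumPy sketch (the Gaussian input densities and the noisy thresholding label rule are illustrative assumptions, not from the slides) generates a covariate-shifted train/test pair: $p(x)$ changes while $p(y \mid x)$ is shared. Changing $p(y)$ while fixing $p(x \mid y)$ would give target shift, and changing $p(y \mid x)$ itself would give concept shift.

```python
import numpy as np

rng = np.random.default_rng(0)

def label(x):
    # Shared conditional p(y|x): the same noisy labeling rule is used for
    # train and test, which is exactly the covariate shift assumption.
    return (x + 0.1 * rng.standard_normal(x.shape) > 1.0).astype(int)

# p_tr(x) != p_te(x): the input densities differ between train and test.
x_tr = rng.normal(loc=0.0, scale=1.0, size=5000)
x_te = rng.normal(loc=2.0, scale=1.0, size=5000)

# p_tr(y|x) == p_te(y|x): labels come from the same rule.
y_tr, y_te = label(x_tr), label(x_te)

print("P(y=1) on train:", y_tr.mean())  # positives are rare under p_tr
print("P(y=1) on test :", y_te.mean())  # positives dominate under p_te
```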

Slide 10

Covariate Shift: Example; $f$: prefecture ↦ income

Definition. (Covariate Shift Assumption)
$$p_{\mathrm{tr}}(x) \neq p_{\mathrm{te}}(x), \qquad p_{\mathrm{tr}}(y \mid x) = p_{\mathrm{te}}(y \mid x)$$

[Figure: train and test data with the fitted $f(x)$; from https://doda.jp/guide/heikin/area/]

[24] Shimodaira, Hidetoshi. "Improving Predictive Inference under Covariate Shift by Weighting the Log-Likelihood Function." Journal of Statistical Planning and Inference 90 (2): 227-244, 2000.

Slide 11

Target Shift: Example; $f$: prefecture ↦ income

Definition. (Target Shift Assumption)
$$p_{\mathrm{tr}}(y) \neq p_{\mathrm{te}}(y), \qquad p_{\mathrm{tr}}(x \mid y) = p_{\mathrm{te}}(x \mid y)$$

[Figure: train and test data with the fitted $f(x)$; from https://doda.jp/guide/heikin/area/]

Slide 12

Concept Shift: Example; $f$: prefecture ↦ income

Definition. (Concept Shift Assumption)
$$p_{\mathrm{tr}}(y \mid x) \neq p_{\mathrm{te}}(y \mid x), \qquad p_{\mathrm{tr}}(x \mid y) \neq p_{\mathrm{te}}(x \mid y)$$

[Figure: average income in 1997 vs. 2014; from https://nensyu-labo.com/heikin_suii.htm]

Slide 13

Sample Selection Bias

Definition. (Sample Selection Bias Assumption)
Let $\xi$ be a selection function that tends to include or exclude observations. Then
$$p_{\mathrm{tr}}(\xi \mid x, y) \neq p_{\mathrm{te}}(\xi \mid x, y)$$
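
As a toy illustration of the selection function (the concrete Bernoulli rule below is an assumption for exposition), applying $\xi$ only at training time skews the observed training distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = (x > 0).astype(int)

# Selection function xi: at training time positives are under-sampled,
# p_tr(xi=1 | x, y=1) = 0.2, while at test time everything is kept,
# p_te(xi=1 | x, y) = 1.0.
keep = rng.random(x.shape) < np.where(y == 1, 0.2, 1.0)
x_tr, y_tr = x[keep], y[keep]

print("P(y=1) in the population :", y.mean())     # ~0.50
print("P(y=1) in the train set  :", y_tr.mean())  # ~0.17, biased by xi
```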

Slide 14

Domain Shift

Definition. (Domain Shift Assumption)
A situation characterized by a change in the measurement system or in the method of description.

[Figure from https://dailyportalz.jp/kiji/130606160826]

Slide 15

Source Component Shift

Definition. (Source Component Shift Assumption)
An adaptation scenario where the observed data are assumed to be composed of a number of different components, whose proportions vary between the train and test data.

Slide 16

Covariate Shift / Target Shift

Slide 17

• Importance weighting (IW) is a common technique for handling distribution shift. However, IW does not work well on complex data.
• In this paper, the authors rethink IW and theoretically show that it suffers from a circular dependency.
• They propose dynamic IW, which makes IW usable together with deep neural networks.

[3] Fang, Tongtong, et al. "Rethinking Importance Weighting for Deep Learning under Distribution Shift." NeurIPS 2020.

Slide 18

Importance Weighting for Distribution Shift

Importance weighting is one of the most popular ways to tackle distribution shift:
$$\mathbb{E}_{p_{\mathrm{te}}(x,y)}[f(x,y)] = \mathbb{E}_{p_{\mathrm{tr}}(x,y)}[w^*(x,y) f(x,y)], \quad \text{where } w^*(x,y) = \frac{p_{\mathrm{te}}(x,y)}{p_{\mathrm{tr}}(x,y)}.$$
This means that the weighted expectation of $f$ over $p_{\mathrm{tr}}(x,y)$ is an unbiased estimator of the expectation over $p_{\mathrm{te}}(x,y)$.

Example. (Covariate Shift Adaptation)
$$\mathbb{E}_{p_{\mathrm{te}}(x)}[f(x)] = \int f(x)\,p_{\mathrm{te}}(x)\,dx = \int f(x)\,\frac{p_{\mathrm{te}}(x)}{p_{\mathrm{tr}}(x)}\,p_{\mathrm{tr}}(x)\,dx = \mathbb{E}_{p_{\mathrm{tr}}(x)}\!\left[\frac{p_{\mathrm{te}}(x)}{p_{\mathrm{tr}}(x)} f(x)\right]$$
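
When both densities are known, the identity is easy to check by Monte Carlo. A minimal sketch (the Gaussian densities and the test function $f$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(x) + x ** 2                       # an arbitrary test function
pdf = lambda x, mu: np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

# Known Gaussian train and test densities: p_tr = N(0,1), p_te = N(1,1).
x_tr = rng.normal(0.0, 1.0, size=200_000)
x_te = rng.normal(1.0, 1.0, size=200_000)

w = pdf(x_tr, 1.0) / pdf(x_tr, 0.0)                    # w*(x) = p_te(x) / p_tr(x)
print("E_te[f]     :", f(x_te).mean())                 # direct test expectation
print("E_tr[w * f] :", (w * f(x_tr)).mean())           # importance-weighted train
# The two estimates agree up to Monte Carlo error.
```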

Slide 19

Components of Importance Weighting

Importance weighting handles distribution shift in two steps:
1. Weight estimation, from data drawn from $p_{\mathrm{tr}}(x,y)$ and $p_{\mathrm{te}}(x)$;
2. Weighted classification, via $\mathbb{E}_{p_{\mathrm{te}}(x,y)}[f(x,y)] = \mathbb{E}_{p_{\mathrm{tr}}(x,y)}[w^*(x,y) f(x,y)]$.

Weight estimation needs enough expressive power if the form of the data is complex.
→ Consider boosting the expressive power with an external feature extractor.

Slide 20

Circular Dependency

A chicken-or-egg causality dilemma:
• We need $w^*$ to train $f$.
• We need a trained $f$ to estimate $w^*$.

Figure from (Fang et al., 2020)

Slide 21

Static/Dynamic Importance Weighting

Figure from (Fang et al., 2020)

Slide 22

Feasibility of Non-linear Transformations

Theorem 1. For a fixed, deterministic and invertible transformation $\pi: (x, y) \mapsto z$, let $p_{\mathrm{tr}}(z)$ and $p_{\mathrm{te}}(z)$ be the p.d.f.s induced by $p_{\mathrm{tr}}(x,y)$, $p_{\mathrm{te}}(x,y)$ and $\pi$. Then
$$w^*(x,y) = \frac{p_{\mathrm{te}}(x,y)}{p_{\mathrm{tr}}(x,y)} = \frac{p_{\mathrm{te}}(z)}{p_{\mathrm{tr}}(z)} = w^*(z).$$

Figure from (Fang et al., 2020)
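
The proof idea is the change-of-variables formula (assuming $\pi$ is differentiable): both induced densities pick up the same Jacobian factor, which cancels in the ratio:
$$w^*(z) = \frac{p_{\mathrm{te}}(z)}{p_{\mathrm{tr}}(z)} = \frac{p_{\mathrm{te}}(\pi^{-1}(z))\,\lvert\det J_{\pi^{-1}}(z)\rvert}{p_{\mathrm{tr}}(\pi^{-1}(z))\,\lvert\det J_{\pi^{-1}}(z)\rvert} = \frac{p_{\mathrm{te}}(x,y)}{p_{\mathrm{tr}}(x,y)} = w^*(x,y).$$
So estimating $w^*$ in such a transformed space is just as valid as estimating it in the original space.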

Slide 23

Algorithm

Figure from (Fang et al., 2020)
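
To make the alternating structure concrete, here is a heavily simplified, self-contained sketch of a dynamic-IW-style loop (the toy data, the tiny NumPy network, and the crude kernel-based ratio estimator are all assumptions for illustration; the paper's actual algorithm, including its weight estimator, differs in details). Each epoch re-estimates the weights in the network's current feature space, then takes a weighted gradient step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy covariate-shifted data (an illustrative assumption, not the paper's setup).
x_tr = rng.normal(0.0, 1.0, size=(512, 1))
y_tr = (x_tr[:, 0] > 0.5).astype(float)
x_te = rng.normal(1.0, 1.0, size=(512, 1))         # unlabeled test inputs

# Tiny one-hidden-layer network: features h(x) and a linear head.
W1 = rng.normal(scale=0.5, size=(1, 16)); b1 = np.zeros(16)
w2 = rng.normal(scale=0.5, size=16);      b2 = 0.0

def features(x):
    return np.tanh(x @ W1 + b1)

def estimate_weights(h_tr, h_te, sigma=1.0):
    # Crude kernel-based density-ratio estimate in the CURRENT feature space:
    # w(x) ~ mean_j k(h(x), h_te_j) / mean_j k(h(x), h_tr_j).
    def kmean(h, h_ref):
        d2 = ((h[:, None, :] - h_ref[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2)).mean(axis=1)
    w = kmean(h_tr, h_te) / (kmean(h_tr, h_tr) + 1e-12)
    return w / w.mean()                            # normalize to mean one

lr = 0.5
for epoch in range(200):
    h_tr, h_te = features(x_tr), features(x_te)
    w = estimate_weights(h_tr, h_te)               # step 1: weights from current features
    p = 1.0 / (1.0 + np.exp(-(h_tr @ w2 + b2)))    # step 2: weighted classification
    g = w * (p - y_tr) / len(y_tr)                 # d(weighted CE) / d(logits)
    dh = np.outer(g, w2) * (1 - h_tr ** 2)         # backprop through tanh
    W1 -= lr * (x_tr.T @ dh); b1 -= lr * dh.sum(axis=0)
    w2 -= lr * (h_tr.T @ g);  b2 -= lr * g.sum()
```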

Slide 24

Experimental Results

Figures from (Fang et al., 2020)

Slide 25

• Derive an efficiency bound for off-policy evaluation (OPE) under covariate shift.
• Propose estimators constructed from estimators of the density ratio, the behavior policy, and the conditional expected reward.

[1] Uehara, Masatoshi, Masahiro Kato, and Shota Yasui. "Off-Policy Evaluation and Learning for External Validity under a Covariate Shift." NeurIPS 2020.

Slide 26

• They propose a new federated learning scheme called FLRA, a federated learning framework with robustness to affine distribution shifts.
• FLRA has a small communication overhead and a low computational complexity.
• They use the PAC-Bayes framework to prove a generalization error bound for FLRA's learnt classifier.

[8] Reisizadeh, Amirhossein, et al. "Robust Federated Learning: The Case of Affine Distribution Shifts." NeurIPS 2020.

Slide 27

Meta-Analysis of Distribution Shift

Slide 28

• They study how robust current ImageNet models are to distribution shifts arising from natural variations in datasets.
• Most research on robustness focuses on synthetic image perturbations, which leaves open how robustness to synthetic distribution shift relates to distribution shift arising in real data.
• Informed by an evaluation of 204 ImageNet models in 213 different test conditions, they find that there is often little to no transfer of robustness from current synthetic to natural distribution shift.

[4] Taori, Rohan, et al. "Measuring robustness to natural distribution shifts in image classification." NeurIPS 2020.

Slide 29

• They motivate and define the problem of feature shift detection for localizing which specific sensor values have been manipulated.
• They define conditional distribution hypothesis tests and use this formalization as the key theoretical tool to approach this problem.
• They propose a score-based test statistic inspired by Fisher divergence but adapted for a novel context as a distribution divergence measure.

[6] Kulinski, Sean, Saurabh Bagchi, and David I. Inouye. "Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests." NeurIPS 2020.

Slide 30

Domain Adaptation

Slide 31

• Recent work has shown limitations of adversarial-learning-based approaches when label distributions differ between the source and target domains.
• In this paper, they propose a new assumption, generalized label shift (GLS), to improve robustness against mismatched label distributions.
• Under GLS, they provide theoretical guarantees on the transfer performance of any classifier.

[18] Tachet des Combes, Remi, et al. "Domain adaptation with conditional distribution matching and generalized label shift." NeurIPS 2020.

Slide 32

• In unsupervised domain adaptation, existing theory focuses on situations where the source and target domains are close.
• In practice, conditional entropy minimization and pseudo-labeling work even when the domain shifts are much larger than those analyzed by existing theory.
• They identify and analyze one particular setting where the domain shift can be large, but these algorithms provably work.

[7] Chen, Yining, et al. "Self-training Avoids Using Spurious Features Under Domain Shift." NeurIPS 2020.

Slide 33

• Previous domain-discrepancy minimization methods are mainly based on adversarial training, which tends to ignore pixel-wise relationships and to be less discriminative.
• In this paper, they propose to build pixel-level cycle associations between source and target pixel pairs.
• https://github.com/kgl-prml/PixelLevel-Cycle-Association

[9] Kang, Guoliang, et al. "Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation." NeurIPS 2020.

Slide 34

• Existing methods for multi-source domain adaptation (MSDA) aim to minimize domain shift using auxiliary distribution-alignment objectives.
• In this work, they present a different perspective on MSDA, in which deep models are observed to implicitly align the domains under label supervision.
• Thus, they aim to utilize this implicit alignment, without additional training objectives, to perform adaptation.

[10] Venkat, Naveen, et al. "Your Classifier can Secretly Suffice Multi-Source Domain Adaptation." NeurIPS 2020.

Slide 35

• They propose to use a graphical model as a compact way to encode the change property of the joint distribution.
• Such a graphical model distinguishes between constant and varying modules of the distribution and specifies the properties of the changes across domains.
• This provides an end-to-end framework for domain adaptation that can incorporate additional knowledge about how the joint distribution changes.

[11] Zhang, Kun, et al. "Domain adaptation as a problem of inference on graphical models." NeurIPS 2020.

Slide 36

• In visual domain adaptation (DA), separating domain-specific characteristics from domain-invariant representations is an ill-posed problem.
• In this paper, they address the modeling of domain-invariant and domain-specific information from the heuristic search perspective.
• With the guidance of heuristic representations, they formulate a principled framework of Heuristic Domain Adaptation (HDA) with well-founded theoretical guarantees.
• https://github.com/cuishuhao/HDA

[12] Cui, Shuhao, et al. "Heuristic Domain Adaptation." NeurIPS 2020.

Slide 37

• They study the problem of open compound domain adaptation (OCDA).
• In this setting, the target is a union of multiple homogeneous domains without domain labels. Unseen target data also needs to be considered at test time, reflecting realistic data collection from both mixed and novel situations.
• To this end, they propose a new OCDA framework for semantic segmentation that incorporates three key functionalities: discover, hallucinate, and adapt.

[13] Park, Kwanyong, et al. "Discover, Hallucinate, and Adapt: Open Compound Domain Adaptation for Semantic Segmentation." NeurIPS 2020.

Slide 38

• They propose a unified contrastive learning framework to incorporate all available information from both source and target domains for joint feature learning.
• They design a self-paced contrastive learning strategy with a novel clustering reliability criterion to prevent training error amplification caused by noisy pseudo-class labels.

[14] Ge, Yixiao, et al. "Self-paced contrastive learning with hybrid memory for domain adaptive object re-id." NeurIPS 2020.

Slide 39

• They derive a computationally efficient dual form of the robust optimal transport optimization that is amenable to modern deep learning applications.
• They demonstrate the effectiveness of this formulation in two applications: GANs and domain adaptation.

[15] Balaji, Yogesh, Rama Chellappa, and Soheil Feizi. "Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation." NeurIPS 2020.

Slide 40

• They propose DANCE, a universal domain adaptation framework that can be applied out of the box without prior knowledge of the specific category shift.
• They design two novel loss functions, neighborhood clustering and entropy separation, for category-shift-agnostic adaptation.

[16] Saito, Kuniaki, et al. "Universal domain adaptation through self supervision." NeurIPS 2020.

Slide 41

• They uncover a dilemma in the open problem of calibration in domain adaptation (DA).
• They propose a Transferable Calibration (TransCal) method, achieving more accurate calibration with lower bias and variance in a unified hyperparameter-free optimization framework.

[17] Wang, Ximei, et al. "Transferable Calibration with Lower Bias and Variance in Domain Adaptation." NeurIPS 2020.

Slide 42

• They present an adversarial style mining (ASM) method to solve one-shot unsupervised domain adaptation (OSUDA) problems. ASM combines a style transfer module and a task-specific model in an adversarial manner, making them mutually beneficial during the learning process.

[19] Luo, Yawei, et al. "Adversarial style mining for one-shot unsupervised domain adaptation." NeurIPS 2020.

Slide 43

Sampling Bias / Selection Bias

Slide 44

• They present quantitative experiments to demystify the performance gains of self-supervised representation learning.
• They hypothesize that these gains could be due to dataset biases: the pre-training and downstream datasets are biased in an advantageous manner.

[20] Purushwalkam, Senthil, and Abhinav Gupta. "Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases." NeurIPS 2020.

Slide 45

• Current approaches for alleviating covariate shift rely on estimating the ratio of the training and target probability density functions.
• These techniques require parameter tuning and can be unstable across different datasets.
• They propose a CDF-based framework for handling covariate shift.

[21] Mazaheri, Bijan, Siddharth Jain, and Jehoshua Bruck. "Robust Correction of Sampling Bias Using Cumulative Distribution Functions." NeurIPS 2020.

Slide 46

• They propose a sampling algorithm that, even in the presence of limited participation and self-selection bias, retains individual fairness properties while also allowing the deterministic satisfaction of quotas.
• The proposed algorithm satisfies:
  • End-to-end fairness
  • Deterministic quota satisfaction
  • Computational efficiency

[22] Flanigan, Bailey, Paul Gölz, Anupam Gupta, and Ariel Procaccia. "Neutralizing Self-Selection Bias in Sampling for Sortition." NeurIPS 2020.

Slide 47

Out-Of-Distribution Detection

Slide 48

• The authors propose a contrastive-learning-based out-of-distribution detection method.
• The proposed method contrasts each sample with distributionally-shifted augmentations of itself.
• They also propose a new detection score.

[2] Tack, Jihoon, et al. "CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances." NeurIPS 2020.

Slide 49

Out-Of-Distribution Detection

Out-of-distribution (OOD) detection:
• is the task of identifying whether a test input is drawn from far outside the training distribution;
• aims to detect OOD samples using only the training data.

[32] Bulusu, Saikiran, et al. "Anomalous example detection in deep learning: A survey." IEEE Access 8 (2020): 132330-132347.
Figure from (Bulusu et al., 2020)

Slide 50

Contrastive Learning

The idea of contrastive learning is to learn an encoder $f_\theta$ that extracts the information needed to distinguish similar samples from the others:
$$\mathcal{L}_{\mathrm{con}}(x, \{x_+\}, \{x_-\}) = -\frac{1}{|\{x_+\}|} \log \frac{\sum_{x' \in \{x_+\}} \exp\!\left(\mathrm{sim}(z(x), z(x')) / \tau\right)}{\sum_{x' \in \{x_+\} \cup \{x_-\}} \exp\!\left(\mathrm{sim}(z(x), z(x')) / \tau\right)},$$
where $z(x) = f_\theta(x)$ or $z(x) = g_\phi(f_\theta(x))$.
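
A direct NumPy transcription of this loss for a single anchor (the function and variable names are illustrative assumptions):

```python
import numpy as np

def cosine_sim(a, B):
    # Cosine similarity between one embedding a and each row of B.
    a = a / np.linalg.norm(a)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return B @ a

def contrastive_loss(z, z_pos, z_neg, tau=0.5):
    # L_con(x, {x+}, {x-}) for one anchor embedding z = f_theta(x);
    # z_pos: (P, d) positive embeddings, z_neg: (N, d) negatives.
    s_pos = np.exp(cosine_sim(z, z_pos) / tau)
    s_neg = np.exp(cosine_sim(z, z_neg) / tau)
    return -np.log(s_pos.sum() / (s_pos.sum() + s_neg.sum())) / len(z_pos)

rng = np.random.default_rng(0)
z = rng.normal(size=8)
print(contrastive_loss(z, rng.normal(size=(2, 8)), rng.normal(size=(30, 8))))
```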

Slide 51

SimCLR (Chen et al., 2020)
$$\mathcal{L}_{\mathrm{SimCLR}}(\mathcal{B}; \mathcal{T}) = \frac{1}{2B} \sum_{i=1}^{B} \left[ \mathcal{L}_{\mathrm{con}}\big(\tilde{x}_i^{(1)}, \tilde{x}_i^{(2)}, \tilde{\mathcal{B}}_{-i}\big) + \mathcal{L}_{\mathrm{con}}\big(\tilde{x}_i^{(2)}, \tilde{x}_i^{(1)}, \tilde{\mathcal{B}}_{-i}\big) \right],$$
where $\tilde{\mathcal{B}} = \{\tilde{x}_i^{(1)}\}_{i=1}^{B} \cup \{\tilde{x}_i^{(2)}\}_{i=1}^{B}$ and $\tilde{\mathcal{B}}_{-i} = \{\tilde{x}_j^{(1)}\}_{j \neq i} \cup \{\tilde{x}_j^{(2)}\}_{j \neq i}$.

[33] Chen, Ting, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. "A Simple Framework for Contrastive Learning of Visual Representations." ICML 2020.
Figure from (Chen et al., 2020)

Slide 52

Contrastive Learning for Distribution-Shifting Transformations

Contrasting shifted instances:
$$\mathcal{L}_{\mathrm{con\text{-}SI}} = \mathcal{L}_{\mathrm{SimCLR}}\Big(\bigcup_{S \in \mathcal{S}} \mathcal{B}_S;\; \mathcal{T}\Big), \quad \text{where } \mathcal{B}_S = \{S(x_i)\}_{i=1}^{B}.$$

Classifying shifted instances:
$$\mathcal{L}_{\mathrm{cls\text{-}SI}} = \frac{1}{2B} \frac{1}{K} \sum_{S \in \mathcal{S}} \sum_{\tilde{x}_S} -\log p_{\mathrm{cls\text{-}SI}}(y^S = S \mid \tilde{x}_S).$$

The final loss:
$$\mathcal{L}_{\mathrm{CSI}} = \mathcal{L}_{\mathrm{con\text{-}SI}} + \lambda \cdot \mathcal{L}_{\mathrm{cls\text{-}SI}}.$$
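
A small sketch of how the shifted batch and the shift-classification term can be assembled (rotations as the set $\mathcal{S}$ follow the paper's image experiments; the random logits stand in for an actual $p_{\mathrm{cls\text{-}SI}}$ head and are an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32, 32))                # a toy image batch (assumption)

# S = {rot0, rot90, rot180, rot270}: every image is transformed by every
# shift, giving the union of batches B_S used in both loss terms.
shifted = np.concatenate([np.rot90(x, k, axes=(1, 2)) for k in range(4)])
shift_labels = np.repeat(np.arange(4), len(x))  # y^S targets for L_cls-SI

def cls_si_loss(logits, labels):
    # Cross-entropy of the auxiliary shift classifier (the L_cls-SI term).
    logits = logits - logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

logits = rng.normal(size=(len(shifted), 4))     # stand-in for the cls-SI head
lam = 1.0
print("L_cls-SI:", cls_si_loss(logits, shift_labels))
# Full objective: L_CSI = L_SimCLR(shifted batch) + lam * L_cls-SI.
```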

Slide 53

Experimental Results: Unlabeled Datasets

Slide 54

Experimental Results: Labeled Datasets

Slide 55

Ablation Study

Slide 56

• VQA-CP has become the standard OOD benchmark for visual question answering, but they discovered three troubling practices in its current use:
1. Most published methods rely on explicit knowledge of the construction of the OOD splits.
2. The OOD test set is used for model selection.
3. A model's in-domain performance is assessed after retraining it on the in-domain splits.

[25] Teney, Damien, et al. "On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law." NeurIPS 2020.

Slide 57

The VQA-CP Dataset

[25] Teney, Damien, et al. "On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law." NeurIPS 2020.
Figure from (Teney et al., 2020)

Slide 58

Bad Practice on the VQA-CP Dataset

Many existing methods exploit the fact that the training and test distributions are approximately inverse of each other.

[25] Teney, Damien, et al. "On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law." NeurIPS 2020.
Figure from (Teney et al., 2020)

Slide 59

• They propose OOD-MAML, a meta-learning method for performing K-shot N-way classification and OOD detection simultaneously.
• In OOD-MAML, they introduce two types of meta-parameters:
1. parameters related to the base model, as in MAML;
2. fake-sample parameters, which play the role of generating OOD samples.

[26] Jeong, Taewon, and Heeyoung Kim. "OOD-MAML: Meta-Learning for Few-Shot Out-of-Distribution Detection and Classification." NeurIPS 2020.

Slide 60

• Existing formulations for DPN models often lead to indistinguishable representations between OOD examples and in-domain examples with high data uncertainty among multiple classes.
• In this work, they propose a novel loss function for DPN models that maximizes the representation gap between in-domain and OOD examples.

[27] Nandy, Jay, Wynne Hsu, and Mong Li Lee. "Towards Maximizing the Representation Gap between In-Domain & Out-of-Distribution Examples." NeurIPS 2020.

Slide 61

• They propose GOOD, a novel training method to achieve guaranteed OOD detection in a worst-case setting.
• GOOD provably outperforms OE, the state of the art in OOD detection, in worst-case OOD detection, and has state-of-the-art performance on EMNIST, a particularly challenging out-distribution dataset.

[28] Bitterwolf, Julian, Alexander Meinke, and Matthias Hein. "Certifiably Adversarially Robust Detection of Out-of-Distribution Data." NeurIPS 2020.

Slide 62

• They investigate why normalizing flows perform poorly at OOD detection.
• Focusing on flows based on coupling layers, they demonstrate that flows learn local pixel correlations and generic image-to-latent-space transformations that are not specific to the target image datasets.

[29] Kirichenko, Polina, Pavel Izmailov, and Andrew Gordon Wilson. "Why normalizing flows fail to detect out-of-distribution data." NeurIPS 2020.

Slide 63

• An important application of generative modeling should be the ability to detect out-of-distribution (OOD) samples by setting a threshold on the likelihood.
• In this paper, they make the observation that many such methods fail when applied to generative models based on VAEs.
• They propose Likelihood Regret, an efficient OOD score for VAEs.

[30] Xiao, Zhisheng, Qing Yan, and Yali Amit. "Likelihood regret: An out-of-distribution detection score for variational auto-encoder." NeurIPS 2020.

Slide 64

• Previous methods relying on the softmax confidence score suffer from overconfident posterior distributions for OOD data.
• They propose a unified framework for OOD detection that uses an energy score.
• They show that energy scores distinguish in-distribution from out-of-distribution samples better than softmax scores do.
• Unlike softmax confidence scores, energy scores are theoretically aligned with the probability density of the inputs and are less susceptible to the overconfidence issue.

[31] Liu, Weitang, et al. "Energy-based Out-of-distribution Detection." NeurIPS 2020.
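
A minimal sketch contrasting the maximum-softmax-probability (MSP) score with the energy score $E(x; f) = -T \log \sum_k e^{f_k(x)/T}$ on two hand-picked logit vectors (the logits and temperature are illustrative assumptions):

```python
import numpy as np

def msp_score(logits):
    # Classic maximum-softmax-probability confidence.
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return p.max()

def energy(logits, T=1.0):
    # E(x; f) = -T * logsumexp(f(x) / T); larger energy => more OOD-like.
    z = logits / T
    return -T * (z.max() + np.log(np.exp(z - z.max()).sum()))

in_dist  = np.array([9.0, 1.0, 0.5])   # peaked logits: confident prediction
ood_like = np.array([2.0, 1.8, 1.9])   # flat, low-magnitude logits

for name, l in [("in-dist ", in_dist), ("ood-like", ood_like)]:
    print(name, "MSP:", round(msp_score(l), 3), " energy:", round(energy(l), 3))
```

The flat logits get a much higher (less negative) energy than the peaked ones, while their MSP confidence gap is comparatively compressed, which is the intuition behind using energy as the detection score.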

Slide 65

Conclusion and Future Trends

Most real-world machine learning problems involve dataset shift.
• However, most machine learning algorithms rely on the i.i.d. assumption (e.g., empirical risk minimization and the law of large numbers).

Do the good properties that hold under the i.i.d. assumption carry over under dataset shift?
• Experimental performance
• Statistical properties: consistency, unbiasedness, asymptotic variance, etc.
• Convergence guarantees
• Explainability

Slide 66

References
• [1] Uehara, Masatoshi, Masahiro Kato, and Shota Yasui. "Off-Policy Evaluation and Learning for External Validity under a Covariate Shift." NeurIPS 2020.
• [2] Tack, Jihoon, et al. "CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances." NeurIPS 2020.
• [3] Fang, Tongtong, et al. "Rethinking Importance Weighting for Deep Learning under Distribution Shift." NeurIPS 2020.
• [4] Taori, Rohan, et al. "Measuring robustness to natural distribution shifts in image classification." NeurIPS 2020.
• [5] Tachet des Combes, Remi, et al. "Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift." NeurIPS 2020.
• [6] Kulinski, Sean, Saurabh Bagchi, and David I. Inouye. "Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests." NeurIPS 2020.
• [7] Chen, Yining, et al. "Self-training Avoids Using Spurious Features Under Domain Shift." NeurIPS 2020.
• [8] Reisizadeh, Amirhossein, et al. "Robust Federated Learning: The Case of Affine Distribution Shifts." NeurIPS 2020.
• [9] Kang, Guoliang, et al. "Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation." NeurIPS 2020.
• [10] Venkat, Naveen, et al. "Your Classifier can Secretly Suffice Multi-Source Domain Adaptation." NeurIPS 2020.
• [11] Zhang, Kun, et al. "Domain adaptation as a problem of inference on graphical models." NeurIPS 2020.

Slide 67

• [12] Cui, Shuhao, et al. "Heuristic Domain Adaptation." NeurIPS 2020.
• [13] Park, Kwanyong, et al. "Discover, Hallucinate, and Adapt: Open Compound Domain Adaptation for Semantic Segmentation." NeurIPS 2020.
• [14] Ge, Yixiao, et al. "Self-paced contrastive learning with hybrid memory for domain adaptive object re-id." NeurIPS 2020.
• [15] Balaji, Yogesh, Rama Chellappa, and Soheil Feizi. "Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation." NeurIPS 2020.
• [16] Saito, Kuniaki, et al. "Universal domain adaptation through self supervision." NeurIPS 2020.
• [17] Wang, Ximei, et al. "Transferable Calibration with Lower Bias and Variance in Domain Adaptation." NeurIPS 2020.
• [18] Tachet des Combes, Remi, et al. "Domain adaptation with conditional distribution matching and generalized label shift." NeurIPS 2020.
• [19] Luo, Yawei, et al. "Adversarial style mining for one-shot unsupervised domain adaptation." NeurIPS 2020.
• [20] Purushwalkam, Senthil, and Abhinav Gupta. "Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases." NeurIPS 2020.
• [21] Mazaheri, Bijan, Siddharth Jain, and Jehoshua Bruck. "Robust Correction of Sampling Bias Using Cumulative Distribution Functions." NeurIPS 2020.
• [22] Flanigan, Bailey, Paul Gölz, Anupam Gupta, and Ariel Procaccia. "Neutralizing Self-Selection Bias in Sampling for Sortition." NeurIPS 2020.
• [23] Moreno-Torres, Jose G., et al. "A unifying view on dataset shift in classification." Pattern Recognition 45.1 (2012): 521-530.
• [24] Shimodaira, Hidetoshi. "Improving Predictive Inference under Covariate Shift by Weighting the Log-Likelihood Function." Journal of Statistical Planning and Inference 90 (2): 227-244, 2000.

Slide 68

• [25] Teney, Damien, et al. "On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law." NeurIPS 2020.
• [26] Jeong, Taewon, and Heeyoung Kim. "OOD-MAML: Meta-Learning for Few-Shot Out-of-Distribution Detection and Classification." NeurIPS 2020.
• [27] Nandy, Jay, Wynne Hsu, and Mong Li Lee. "Towards Maximizing the Representation Gap between In-Domain & Out-of-Distribution Examples." NeurIPS 2020.
• [28] Bitterwolf, Julian, Alexander Meinke, and Matthias Hein. "Certifiably Adversarially Robust Detection of Out-of-Distribution Data." NeurIPS 2020.
• [29] Kirichenko, Polina, Pavel Izmailov, and Andrew Gordon Wilson. "Why normalizing flows fail to detect out-of-distribution data." NeurIPS 2020.
• [30] Xiao, Zhisheng, Qing Yan, and Yali Amit. "Likelihood regret: An out-of-distribution detection score for variational auto-encoder." NeurIPS 2020.
• [31] Liu, Weitang, et al. "Energy-based Out-of-distribution Detection." NeurIPS 2020.
• [32] Bulusu, Saikiran, et al. "Anomalous example detection in deep learning: A survey." IEEE Access 8 (2020): 132330-132347.
• [33] Chen, Ting, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. "A Simple Framework for Contrastive Learning of Visual Representations." In Proceedings of the 37th International Conference on Machine Learning (ICML 2020), PMLR 119:1597-1607.