
NeurIPS2020 papers on Dataset Shift and Machine Learning

Masanari Kimura
February 26, 2021

This deck summarizes papers dealing with dataset shift that were presented at NeurIPS 2020.

Transcript

  1. TL;DR • Briefly describe the problem of dataset shift. •

    Categorize NeurIPS2020 papers that deal with dataset shift. • Introduce the papers accepted for NeurIPS2020 that focus on dataset shift. • https://papers.nips.cc/paper/2020
  2. Taxonomy of NeurIPS2020 papers about Dataset Shift. Covariate Shift,

    Target Shift, Affine Distribution Shift • [1] Uehara, Masatoshi, Masahiro Kato, and Shota Yasui. n.d. “Off-Policy Evaluation and Learning for External Validity under a Covariate Shift.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [3] Fang, Tongtong, et al. "Rethinking Importance Weighting for Deep Learning under Distribution Shift." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [8] Reisizadeh, Amirhossein, et al. "Robust Federated Learning: The Case of Affine Distribution Shifts." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. Meta-Analysis of Distribution Shift • [4] Taori, Rohan, et al. "Measuring robustness to natural distribution shifts in image classification." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [6] Kulinski, Sean, Saurabh Bagchi, and David I. Inouye. "Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  3. Domain Adaptation • [5] Tachet des Combes, Remi, et

    al. "Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [7] Chen, Yining, et al. "Self-training Avoids Using Spurious Features Under Domain Shift." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [9] Kang, Guoliang, et al. "Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [10] Venkat, Naveen, et al. "Your Classifier can Secretly Suffice Multi-Source Domain Adaptation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [11] Zhang, Kun, et al. "Domain adaptation as a problem of inference on graphical models." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [12] Cui, Shuhao, et al. "Heuristic Domain Adaptation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [13] Park, Kwanyong, et al. "Discover, Hallucinate, and Adapt: Open Compound Domain Adaptation for Semantic Segmentation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [14] Ge, Yixiao, et al. "Self-paced contrastive learning with hybrid memory for domain adaptive object re-id." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [15] Balaji, Yogesh, Rama Chellappa, and Soheil Feizi. "Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [16] Saito, Kuniaki, et al. "Universal domain adaptation through self supervision." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [17] Wang, Ximei, et al. "Transferable Calibration with Lower Bias and Variance in Domain Adaptation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [18] Combes, Remi Tachet des, et al. "Domain adaptation with conditional distribution matching and generalized label shift." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [19] Luo, Yawei, et al. "Adversarial style mining for one-shot unsupervised domain adaptation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  4. Sampling Bias, Selection Bias • [20] Purushwalkam, Senthil, and

    Abhinav Gupta. 2020. “Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [21] Mazaheri, Bijan, Siddharth Jain, and Jehoshua Bruck. 2020. “Robust Correction of Sampling Bias Using Cumulative Distribution Functions.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [22] Flanigan, Bailey, Paul Gölz, Anupam Gupta, and Ariel Procaccia. 2020. “Neutralizing Self-Selection Bias in Sampling for Sortition.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. Out-Of-Distribution Detection • [2] Tack, Jihoon, et al. "CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [25] Teney, Damien, et al. "On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [26] Jeong, Taewon, and Heeyoung Kim. "OOD-MAML: Meta-Learning for Few-Shot Out-of-Distribution Detection and Classification." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [27] Nandy, Jay, Wynne Hsu, and Mong Li Lee. "Towards Maximizing the Representation Gap between In-Domain & Out-of- Distribution Examples." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [28] Bitterwolf, Julian, Alexander Meinke, and Matthias Hein. "Certifiably Adversarially Robust Detection of Out-of-Distribution Data." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [29] Kirichenko, Polina, Pavel Izmailov, and Andrew Gordon Wilson. "Why normalizing flows fail to detect out-of-distribution data." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [30] Xiao, Zhisheng, Qing Yan, and Yali Amit. "Likelihood regret: An out-of-distribution detection score for variational auto-encoder." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [31] Liu, Weitang, et al. "Energy-based Out-of-distribution Detection." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  5. Dataset Shift in Machine Learning. Characterization of dataset shift

    problems based on (Moreno-Torres et al., 2012) [23] • Covariate Shift • Target Shift • Concept Shift • Sample Selection Bias • Domain Shift • Source Component Shift [23] Moreno-Torres, Jose G., et al. "A unifying view on dataset shift in classification." Pattern recognition 45.1 (2012): 521-530.
  6. Covariate Shift, Target Shift and Concept Shift

    Definition. (Covariate Shift Assumption) $p_{\mathrm{tr}}(x) \neq p_{\mathrm{te}}(x)$, $p_{\mathrm{tr}}(y \mid x) = p_{\mathrm{te}}(y \mid x)$. Definition. (Target Shift Assumption) $p_{\mathrm{tr}}(y) \neq p_{\mathrm{te}}(y)$, $p_{\mathrm{tr}}(x \mid y) = p_{\mathrm{te}}(x \mid y)$. Definition. (Concept Shift Assumption) $p_{\mathrm{tr}}(y \mid x) \neq p_{\mathrm{te}}(y \mid x)$, $p_{\mathrm{tr}}(x \mid y) \neq p_{\mathrm{te}}(x \mid y)$.
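
To make the first two conditions concrete, here is a small NumPy sketch (my own toy example with made-up distributions) that constructs a covariate-shift pair and a concept-shift pair of train/test sets; target shift would analogously fix $p(x \mid y)$ and change $p(y)$.

```python
import numpy as np

rng = np.random.default_rng(0)

def px_a(n):            # input distribution A
    return rng.normal(0.0, 1.0, n)

def px_b(n):            # input distribution B (shifted mean)
    return rng.normal(2.0, 1.0, n)

def py_a(x):            # labelling rule A, i.e. one choice of p(y | x)
    return (x + rng.normal(0.0, 0.1, x.shape) > 0.5).astype(int)

def py_b(x):            # labelling rule B, a different p(y | x)
    return (x + rng.normal(0.0, 0.1, x.shape) > 1.5).astype(int)

def sample(n, p_x, p_y_given_x):
    x = p_x(n)
    return x, p_y_given_x(x)

# Covariate shift: p_tr(x) != p_te(x), but p_tr(y|x) = p_te(y|x).
train_cov, test_cov = sample(1000, px_a, py_a), sample(1000, px_b, py_a)

# Concept shift: p_tr(y|x) != p_te(y|x), while p(x) stays the same here.
train_con, test_con = sample(1000, px_a, py_a), sample(1000, px_a, py_b)
```
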
  7. Covariate Shift: Example; $f$: prefecture $\mapsto$ income

    Definition. (Covariate Shift Assumption) $p_{\mathrm{tr}}(x) \neq p_{\mathrm{te}}(x)$, $p_{\mathrm{tr}}(y \mid x) = p_{\mathrm{te}}(y \mid x)$. [Figure: train vs. test prefectures and the fitted $f(x)$.] Photo by https://doda.jp/guide/heikin/area/ [24] Shimodaira, Hidetoshi. 2000. “Improving Predictive Inference under Covariate Shift by Weighting the Log-Likelihood Function.” Journal of Statistical Planning and Inference 90 (2): 227–44.
  8. Target Shift: Example; $f$: prefecture $\mapsto$ income

    Definition. (Target Shift Assumption) $p_{\mathrm{tr}}(y) \neq p_{\mathrm{te}}(y)$, $p_{\mathrm{tr}}(x \mid y) = p_{\mathrm{te}}(x \mid y)$. [Figure: train vs. test data and the fitted $f(x)$.] Photo by https://doda.jp/guide/heikin/area/
  9. Concept Shift: Example; $f$: prefecture $\mapsto$ income

    Definition. (Concept Shift Assumption) $p_{\mathrm{tr}}(y \mid x) \neq p_{\mathrm{te}}(y \mid x)$, $p_{\mathrm{tr}}(x \mid y) \neq p_{\mathrm{te}}(x \mid y)$. [Figure: average income by prefecture, 1997 vs. 2014.] Photo by https://nensyu-labo.com/heikin_suii.htm
  10. Sample Selection Bias

    Definition. (Sample Selection Bias Assumption) Let $\xi$ be the selection function that tends to include or exclude observations. Then $p_{\mathrm{tr}}(\xi \mid x, y) \neq p_{\mathrm{te}}(\xi \mid x, y)$.
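
A small toy sketch (my own, with a made-up selection rule) of what such a selection function $\xi$ does: training examples are kept with a probability that depends on $(x, y)$, while the test data are drawn without selection.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=10_000)
y = (x + rng.normal(scale=0.5, size=x.shape) > 0).astype(int)

# Selection function xi: positives with small x are rarely observed at training time.
p_select = np.where((y == 1) & (x < 0), 0.1, 0.9)
keep = rng.random(x.shape) < p_select

x_tr, y_tr = x[keep], y[keep]   # biased training sample
x_te, y_te = x, y               # test data drawn without selection bias
print("P(y = 1) in train:", y_tr.mean(), " vs. test:", y_te.mean())
```
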
  11. Domain Shift. Definition. (Domain Shift Assumption) A situation characterized

    by the change in the measurement system or in the method of description. Photo by https://dailyportalz.jp/kiji/130606160826
  12. Source Component Shift. Definition. (Source Component Shift) An adaptation

    scenario where the observed data are assumed to be composed of a number of different components whose proportions vary between the training and test data.
  13. • Importance weighting (IW) is a standard technique for handling distribution

    shift. However, IW does not work well on complex data. • In this paper, the authors rethink IW and theoretically show that it suffers from a circular dependency. • They propose dynamic IW, which makes it possible to use IW with deep neural networks. [3] Fang, Tongtong, et al. "Rethinking Importance Weighting for Deep Learning under Distribution Shift." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  14. Importance Weighting for Distribution Shift

    Importance weighting is the most popular way to tackle distribution shift: $\mathbb{E}_{p_{\mathrm{te}}(x, y)}[f(x, y)] = \mathbb{E}_{p_{\mathrm{tr}}(x, y)}[w^*(x, y) f(x, y)]$, where $w^*(x, y) = \frac{p_{\mathrm{te}}(x, y)}{p_{\mathrm{tr}}(x, y)}$. This means that the importance-weighted expectation of $f$ over $p_{\mathrm{tr}}(x, y)$ is an unbiased estimate of the expectation over $p_{\mathrm{te}}(x, y)$. Example. (Covariate Shift Adaptation) $\mathbb{E}_{p_{\mathrm{te}}(x)}[f(x)] = \int f(x)\, p_{\mathrm{te}}(x)\, dx = \int f(x)\, \frac{p_{\mathrm{te}}(x)}{p_{\mathrm{tr}}(x)}\, p_{\mathrm{tr}}(x)\, dx = \mathbb{E}_{p_{\mathrm{tr}}(x)}\!\left[ \frac{p_{\mathrm{te}}(x)}{p_{\mathrm{tr}}(x)} f(x) \right]$.
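
As a concrete illustration of the identity above, here is a minimal NumPy/SciPy sketch (my own toy example; the densities $p_{\mathrm{tr}}$ and $p_{\mathrm{te}}$ are assumed known, so the weights are exact) of importance-weighted regression under covariate shift.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy covariate shift: p_tr(x) != p_te(x), shared p(y|x).
p_tr, p_te = norm(loc=0.0, scale=1.0), norm(loc=1.5, scale=0.8)

def label(x):
    return np.sin(x) + 0.1 * rng.normal(size=x.shape)   # shared conditional p(y|x)

x_tr = p_tr.rvs(size=2000, random_state=1)
y_tr = label(x_tr)

# Exact importance weights w*(x) = p_te(x) / p_tr(x) (densities known in this toy example).
w = p_te.pdf(x_tr) / p_tr.pdf(x_tr)

# Importance-weighted least squares for y ~ a*x + b: the weighted training loss is an
# unbiased estimate of the risk under the test distribution.
X = np.stack([x_tr, np.ones_like(x_tr)], axis=1)
a, b = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y_tr))

x_te = p_te.rvs(size=2000, random_state=2)
print("test MSE:", np.mean((a * x_te + b - label(x_te)) ** 2))
```
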
  15. Components of Importance Weighting

    Importance weighting handles distribution shift in two steps: 1. Weight estimation, using samples from $p_{\mathrm{tr}}(x, y)$ and $p_{\mathrm{te}}(x)$; 2. Weighted classification, via $\mathbb{E}_{p_{\mathrm{te}}(x, y)}[f(x, y)] = \mathbb{E}_{p_{\mathrm{tr}}(x, y)}[w^*(x, y) f(x, y)]$. Weight estimation needs sufficient expressive power when the data are complex. → Consider boosting the expressive power with an external feature extractor.
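
One standard way to implement the weight-estimation step (a generic recipe, not this paper's dynamic IW) is discriminative density-ratio estimation: train a probabilistic classifier to separate training inputs from test inputs and convert its output into $w(x) \approx \frac{n_{\mathrm{tr}}}{n_{\mathrm{te}}} \frac{P(\mathrm{te} \mid x)}{P(\mathrm{tr} \mid x)}$. A hedged scikit-learn sketch, assuming `x_tr` and `x_te` are arrays of shape (n, d):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_weights(x_tr: np.ndarray, x_te: np.ndarray) -> np.ndarray:
    """Estimate w(x) = p_te(x) / p_tr(x) at the training points via a domain classifier."""
    X = np.vstack([x_tr, x_te])
    d = np.concatenate([np.zeros(len(x_tr)), np.ones(len(x_te))])   # 0 = train, 1 = test
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p_test = clf.predict_proba(x_tr)[:, 1]                          # P(domain = test | x)
    ratio = p_test / np.clip(1.0 - p_test, 1e-12, None)
    return ratio * (len(x_tr) / len(x_te))                          # correct class imbalance

# Usage: w = estimate_weights(x_tr, x_te); pass w as sample_weight to a weighted loss.
```
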
  16. Circular Dependency. Chicken-or-egg causality dilemma: • We need $w^*$

    to train $f$. • We need a trained $f$ to estimate $w^*$. Photo by (Fang et al., 2020)
  17. Feasibility of non-linear transformation

    Theorem 1. For a fixed, deterministic, and invertible transformation $\pi: (x, y) \mapsto z$, let $p_{\mathrm{tr}}(z)$ and $p_{\mathrm{te}}(z)$ be the p.d.f.s induced from $p_{\mathrm{tr}}(x, y)$ and $p_{\mathrm{te}}(x, y)$ by $\pi$. Then, $w^*(x, y) = \frac{p_{\mathrm{te}}(x, y)}{p_{\mathrm{tr}}(x, y)} = \frac{p_{\mathrm{te}}(z)}{p_{\mathrm{tr}}(z)} = w^*(z)$. Photo by (Fang et al., 2020)
  18. • They derive an efficiency bound for off-policy evaluation (OPE) under covariate

    shift. • They propose estimators constructed from estimators of the density ratio, the behavior policy, and the conditional expected reward. [1] Uehara, Masatoshi, Masahiro Kato, and Shota Yasui. "Off-Policy Evaluation and Learning for External Validity under a Covariate Shift." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  19. • They propose a new federated learning scheme called FLRA,

    a Federated Learning framework with Robustness to Affine distribution shifts. • FLRA has a small communication overhead and a low computation complexity. • They use the PAC-Bayes framework to prove a generalization error bound for FLRA’s learnt classifier 25 [8] Reisizadeh, Amirhossein, et al. "Robust Federated Learning: The Case of Affine Distribution Shifts." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  20. • They study how robust current ImageNet models are to

    distribution shifts arising from natural variations in datasets. • Most research on robustness focuses on synthetic image perturbations, which leaves open how robustness on synthetic distribution shift relates to distribution shift arising in real data. • Informed by an evaluation of 204 ImageNet models in 213 different test conditions, they find that there is often little to no transfer of robustness from current synthetic to natural distribution shift. 27 [4] Taori, Rohan, et al. "Measuring robustness to natural distribution shifts in image classification." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  21. • They motivate and define the problem of feature shift

    detection for localizing which specific sensor values have been manipulated. • They define conditional distribution hypothesis tests and use this formalization as the key theoretical tool to approach this problem. • They propose a score-based test statistic inspired by Fisher divergence but adapted for a novel context as a distribution divergence measure. 28 [6] Kulinski, Sean, Saurabh Bagchi, and David I. Inouye. "Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  22. [18] Combes, Remi Tachet des, et al. "Domain adaptation

    with conditional distribution matching and generalized label shift." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • Recent work has shown limitations of adversarial learning-based approach when label distributions differ between the source and target domains. • In this paper, they propose a new assumption, generalized label shift (GLS), to improve robustness against mismatched label distributions. • Under GLS, they provide theoretical guarantees on the transfer performance of any classifier.
  23. • In unsupervised domain adaptation, existing theory focuses on situations

    where the source and target domains are close. • In practice, conditional entropy minimization and pseudo-labeling work even when the domain shifts are much larger than those analyzed by existing theory. • They identify and analyze one particular setting where the domain shift can be large, but these algorithms provably work. 31 [7] Chen, Yining, et al. "Self-training Avoids Using Spurious Features Under Domain Shift." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  24. • Previous domain discrepancy minimization methods are mainly based on

    adversarial training, which tends to ignore pixel-wise relationships and to be less discriminative. • In this paper, they propose to build pixel-level cycle associations between source and target pixel pairs. • https://github.com/kgl-prml/PixelLevel-Cycle-Association [9] Kang, Guoliang, et al. "Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  25. [10] Venkat, Naveen, et al. "Your Classifier can Secretly

    Suffice Multi-Source Domain Adaptation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • Existing methods aim to minimize this domain-shift using auxiliary distribution alignment objectives for Multi- Source Domain Adaptation. • In this work, they present a different perspective to MSDA wherein deep models are observed to implicitly align the domains under label supervision. • Thus, they aim to utilize implicit alignment without additional training objectives to perform adaptation
  26. • They propose to use a graphical model as a

    compact way to encode the change property of the joint distribution. • Such a graphical model distinguishes between constant and varied modules of the distribution and specifies the properties of the changes across domains. • This provides an end-to-end framework for domain adaptation, in which additional knowledge about how the joint distribution changes can be incorporated when available. [11] Zhang, Kun, et al. "Domain adaptation as a problem of inference on graphical models." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  27. • In visual domain adaptation (DA), separating the domain-specific characteristics

    from the domain-invariant representations is an ill-posed problem. • In this paper, they address the modeling of domain- invariant and domain-specific information from the heuristic search perspective. • With the guidance of heuristic representations, they formulate a principled framework of Heuristic Domain Adaptation (HDA) with well-founded theoretical guarantees. • https://github.com/cuishuhao/HDA 35 [12] Cui, Shuhao, et al. "Heuristic Domain Adaptation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  28. • They study the problem of open compound domain adaptation

    (OCDA). • In this setting, the target is a union of multiple homogeneous domains without domain labels. The unseen target data also needs to be considered at the test time, reflecting the realistic data collection from both mixed and novel situations. • To this end, they propose a new OCDA framework for semantic segmentation that incorporates three key functionalities: discover, hallucinate, and adapt. 36 [13] Park, Kwanyong, et al. "Discover, Hallucinate, and Adapt: Open Compound Domain Adaptation for Semantic Segmentation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  29. [14] Ge, Yixiao, et al. "Self-paced contrastive learning with

    hybrid memory for domain adaptive object re-id." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • They propose a unified contrastive learning framework to incorporate all available information from both source and target domains for joint feature learning. • They design a self-paced contrastive learning strategy with a novel clustering reliability criterion to prevent training error amplification caused by noisy pseudo-class labels.
  30. • They derive a computationally-efficient dual form of the robust

    Optimal Transport optimization that is amenable to modern deep learning applications. • They demonstrate the effectiveness of their formulation in two applications: GANs and domain adaptation. [15] Balaji, Yogesh, Rama Chellappa, and Soheil Feizi. "Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  31. • They propose DANCE, a universal domain adaptation framework that

    can be applied out-of- the-box without prior knowledge of specific category shift. • They design two novel loss functions, neighborhood clustering and entropy separation, for category shift-agnostic adaptation 39 [16] Saito, Kuniaki, et al. "Universal domain adaptation through self supervision." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  32. • They uncover a dilemma in the open problem of

    Calibration in DA. • They propose a Transferable Calibration (TransCal) method, achieving more accurate calibration with lower bias and variance in a unified hyperparameter- free optimization framework 40 [17] Wang, Ximei, et al. "Transferable Calibration with Lower Bias and Variance in Domain Adaptation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  33. • They present an adversarial style mining (ASM) method to

    solve One-Shot Unsupervised Domain Adaptation (OSUDA) problems. ASM combines a style transfer module and a task-specific model in an adversarial manner, making them mutually benefit each other during the learning process. [19] Luo, Yawei, et al. "Adversarial style mining for one-shot unsupervised domain adaptation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  34. • They present quantitative experiments to demystify the performance gains

    of self-supervised representation learning. • They hypothesize that this could be due to dataset biases: the pre-training and downstream datasets are biased in an advantageous manner. 43 [20] Purushwalkam, Senthil, and Abhinav Gupta. 2020. “Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  35. • Current approaches for alleviating the covariate shift rely on

    estimating the ratio of training and target probability density functions. • These techniques require parameter tuning and can be unstable across different datasets. • They propose a CDF-based framework for handling the covariate shift. 44 [21] Mazaheri, Bijan, Siddharth Jain, and Jehoshua Bruck. 2020. “Robust Correction of Sampling Bias Using Cumulative Distribution Functions.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  36. • They propose a sampling algorithm that, even in the presence

    of limited participation and self-selection bias, retains individual fairness properties while also allowing deterministic satisfaction of quotas. • The proposed algorithm satisfies: • End-to-End Fairness • Deterministic Quota Satisfaction • Computational Efficiency [22] Flanigan, Bailey, Paul Gölz, Anupam Gupta, and Ariel Procaccia. 2020. “Neutralizing Self-Selection Bias in Sampling for Sortition.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  37. • The authors propose a contrastive learning-based out-of-distribution detection method. •

    The proposed method contrasts each sample with distributionally shifted augmentations of itself. • They propose a new detection score. [2] Tack, Jihoon, et al. "CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  38. Out-Of-Distribution Detection [32] Bulusu, Saikiran, et al. "Anomalous example

    detection in deep learning: A survey." IEEE Access 8 (2020): 132330-132347. Photo by (Bulusu et al., 2020) Out-of-distribution detection • is the task of identifying whether a test input is drawn from far outside the training distribution; • aims to detect OOD samples using only the training data.
  39. Contrastive Learning

    The idea of contrastive learning is to learn an encoder $f_\theta$ that extracts the information needed to distinguish similar samples from the others: $\mathcal{L}_{\mathrm{con}}(x, \{x^+\}, \{x^-\}) = -\frac{1}{|\{x^+\}|} \log \frac{\sum_{x' \in \{x^+\}} \exp\left( \mathrm{sim}(z(x), z(x')) / \tau \right)}{\sum_{x' \in \{x^+\} \cup \{x^-\}} \exp\left( \mathrm{sim}(z(x), z(x')) / \tau \right)}$, where $z(x) = f_\theta(x)$ or $z(x) = g_\phi(f_\theta(x))$.
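
A minimal PyTorch sketch of this objective (my own illustrative implementation of the formula above, not the authors' code), computing $\mathcal{L}_{\mathrm{con}}$ for one anchor given the embeddings of its positives and negatives:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_anchor, z_pos, z_neg, tau: float = 0.5):
    """L_con(x, {x+}, {x-}) with cosine similarity, following the slide's definition.

    z_anchor: (d,); z_pos: (P, d); z_neg: (N, d) -- already encoded by z(.).
    """
    z_anchor = F.normalize(z_anchor, dim=-1)
    z_all = F.normalize(torch.cat([z_pos, z_neg], dim=0), dim=-1)
    sims = z_all @ z_anchor / tau                      # similarities to the anchor
    log_ratio = torch.logsumexp(sims[: len(z_pos)], dim=0) - torch.logsumexp(sims, dim=0)
    return -log_ratio / len(z_pos)
```
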
  40. SimCLR (Chen et al., 2020)

    $\mathcal{L}_{\mathrm{SimCLR}}(\mathcal{B}; \mathcal{T}) = \frac{1}{2B} \sum_{i=1}^{B} \left[ \mathcal{L}_{\mathrm{con}}(\tilde{x}_i^{(1)}, \tilde{x}_i^{(2)}, \tilde{\mathcal{B}}_{-i}) + \mathcal{L}_{\mathrm{con}}(\tilde{x}_i^{(2)}, \tilde{x}_i^{(1)}, \tilde{\mathcal{B}}_{-i}) \right]$, where $\tilde{\mathcal{B}} = \{\tilde{x}_i^{(1)}\}_{i=1}^{B} \cup \{\tilde{x}_i^{(2)}\}_{i=1}^{B}$ and $\tilde{\mathcal{B}}_{-i} = \{\tilde{x}_j^{(1)}\}_{j \neq i} \cup \{\tilde{x}_j^{(2)}\}_{j \neq i}$. [33] Chen, Ting, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. “A Simple Framework for Contrastive Learning of Visual Representations.” In Proceedings of the 37th International Conference on Machine Learning, edited by Hal Daumé III and Aarti Singh, 119:1597–1607. Proceedings of Machine Learning Research. PMLR. Photo by (Chen et al., 2020)
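
Continuing the sketch above, the batched SimCLR loss can be assembled from the per-anchor `contrastive_loss` (again my own illustrative code; `z1` and `z2` are the projected embeddings of the two augmented views, each of shape (B, d)):

```python
import torch

def simclr_loss(z1, z2, tau: float = 0.5):
    """L_SimCLR over the augmented batch B~ = {z1_i} ∪ {z2_i}, following the slide."""
    B = z1.shape[0]
    z_all = torch.cat([z1, z2], dim=0)          # view 1 at index i, view 2 at index i + B
    total = z1.new_zeros(())
    for i in range(B):
        # B~_{-i}: every embedding except the two views of example i.
        others = torch.cat([z_all[:i], z_all[i + 1 : i + B], z_all[i + B + 1 :]], dim=0)
        total = total + contrastive_loss(z1[i], z2[i : i + 1], others, tau)
        total = total + contrastive_loss(z2[i], z1[i : i + 1], others, tau)
    return total / (2 * B)
```
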
  41. Contrastive Learning for Distribution-Shifting Transformations

    Contrasting Shifted Instances: $\mathcal{L}_{\mathrm{con\text{-}SI}} = \mathcal{L}_{\mathrm{SimCLR}}\left( \bigcup_{S \in \mathcal{S}} \mathcal{B}_S ; \mathcal{T} \right)$, where $\mathcal{B}_S = \{ S(x_i) \}_{i=1}^{B}$. Classifying Shifted Instances: $\mathcal{L}_{\mathrm{cls\text{-}SI}} = \frac{1}{2B} \frac{1}{K} \sum_{S \in \mathcal{S}} \sum_{\tilde{x}_S \in \tilde{\mathcal{B}}_S} -\log p_{\mathrm{cls\text{-}SI}}(y^S = S \mid \tilde{x}_S)$. The final loss: $\mathcal{L}_{\mathrm{CSI}} = \mathcal{L}_{\mathrm{con\text{-}SI}} + \lambda \cdot \mathcal{L}_{\mathrm{cls\text{-}SI}}$.
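
Putting the pieces together, a rough PyTorch sketch (my own, not the authors' implementation; `encoder`, `proj_head`, `cls_head`, `aug`, and `shifts` are assumed components, e.g. rotations as the shifting transformations) of how the two terms combine into $\mathcal{L}_{\mathrm{CSI}}$, reusing `simclr_loss` from above:

```python
import torch
import torch.nn.functional as F

def csi_loss(x, shifts, encoder, proj_head, cls_head, aug, tau: float = 0.5, lam: float = 1.0):
    """L_CSI = L_con-SI + lam * L_cls-SI for one batch x (illustrative sketch only)."""
    B = x.shape[0]
    # Union of shifted batches B_S = {S(x_i)} over all shifting transformations S.
    xs = torch.cat([S(x) for S in shifts], dim=0)
    shift_labels = torch.arange(len(shifts), device=x.device).repeat_interleave(B)
    # L_con-SI: SimCLR on the union, with two random augmentations t ~ T per sample,
    # so shifted versions of the same image act as negatives for each other.
    z1 = proj_head(encoder(aug(xs)))
    z2 = proj_head(encoder(aug(xs)))
    con_si = simclr_loss(z1, z2, tau)
    # L_cls-SI: an auxiliary head predicts which shifting transformation was applied.
    cls_si = F.cross_entropy(cls_head(encoder(aug(xs))), shift_labels)
    return con_si + lam * cls_si
```
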
  42. • VQA-CP has become the standard OOD benchmark for visual

    question answering, but they discovered three troubling practices in its current use: 1. Most published methods rely on explicit knowledge of the construction of the OOD splits. 2. The OOD test set is used for model selection. 3. A model’s in-domain performance is assessed after retraining it on in-domain splits 55 [25] Teney, Damien, et al. "On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  43. VQA-CP dataset. [25] Teney, Damien, et al. "On the

    Value of Out-of-Distribution Testing: An Example of Goodhart's Law.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. Photo by (Teney et al., 2020)
  44. Bad Practice on the VQA-CP dataset. [25] Teney, Damien,

    et al. "On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. Photo by (Teney et al., 2020) Many existing methods exploit the fact that the training and test distributions are approximately inverses of each other.
  45. • They propose OOD-MAML, a meta-learning method for

    performing K-shot N-way classification and OOD detection simultaneously. • In OOD-MAML, they introduce two types of meta-parameters: 1. parameters related to the base model, as in MAML; 2. fake-sample parameters, which play the role of generating OOD samples. [26] Jeong, Taewon, and Heeyoung Kim. "OOD-MAML: Meta-Learning for Few-Shot Out-of-Distribution Detection and Classification.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  46. • The existing formulation for DPN models often leads to

    representations that cannot distinguish between OOD examples and in-domain examples with high data uncertainty among multiple classes. • In this work, they propose a novel loss function for DPN models that maximizes the representation gap between in-domain and OOD examples. [27] Nandy, Jay, Wynne Hsu, and Mong Li Lee. "Towards Maximizing the Representation Gap between In-Domain & Out-of-Distribution Examples.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  47. • They propose GOOD, a novel training method to achieve

    guaranteed OOD detection in a worst-case setting. • GOOD provably outperforms OE, the state-of-the-art in OOD detection, in worst case OOD detection and has state-of-the-art performance on EMNIST which is a particularly challenging out-distribution dataset. 60 [28] Bitterwolf, Julian, Alexander Meinke, and Matthias Hein. "Certifiably Adversarially Robust Detection of Out-of-Distribution Data.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  48. • They investigate why normalizing flows perform poorly for OOD

    detection. • They demonstrate that flows learn local pixel correlations and generic image-to-latent-space transformations which are not specific to the target image datasets, focusing on flows based on coupling layers. 61 [29] Kirichenko, Polina, Pavel Izmailov, and Andrew Gordon Wilson. "Why normalizing flows fail to detect out-of-distribution data.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  49. • An important application of generative modeling should be the

    ability to detect out-of-distribution (OOD) samples by setting a threshold on the likelihood. • In this paper, they make the observation that many of these methods fail when applied to generative models based on VAEs • They proposed Likelihood Regret, an efficient OOD score for VAEs. 62 [30] Xiao, Zhisheng, Qing Yan, and Yali Amit. "Likelihood regret: An out-of-distribution detection score for variational auto-encoder.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
  50. • Previous methods relying on the softmax confidence score suffer

    from overconfident posterior distributions for OOD data. • They propose a unified framework for OOD detection that uses an energy score. • They show that energy scores distinguish in-distribution from OOD samples better than softmax scores. • Unlike softmax confidence scores, energy scores are theoretically aligned with the probability density of the inputs and are less susceptible to the overconfidence issue. [31] Liu, Weitang, et al. "Energy-based Out-of-distribution Detection." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020.
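
For reference, the energy score used in this line of work is computed from a classifier's logits as $E(x; f) = -T \cdot \log \sum_k \exp(f_k(x)/T)$; a minimal sketch (mine) of scoring and thresholding, where `model` and the threshold `tau` are placeholders:

```python
import torch

def energy_score(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """E(x; f) = -T * log sum_k exp(f_k(x) / T), computed per sample from logits."""
    return -T * torch.logsumexp(logits / T, dim=-1)

# Usage: lower energy indicates in-distribution; flag inputs above a threshold as OOD.
# logits = model(x)                      # hypothetical classifier
# is_ood = energy_score(logits) > tau    # tau chosen on held-out in-distribution data
```
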
  51. Conclusion and Future Trends. Most real-world machine learning problems

    involve dataset shift. • However, most machine learning algorithms rely on the i.i.d. assumption, e.g., empirical risk minimization and the law of large numbers. Do the good properties that hold under the i.i.d. assumption carry over under dataset shift? • Experimental performance • Statistical properties: consistency, unbiasedness, asymptotic variance, etc. • Convergence guarantees • Explainability
  52. References • [1] Uehara, Masatoshi, Masahiro Kato, and Shota Yasui.

    n.d. “Off-Policy Evaluation and Learning for External Validity under a Covariate Shift.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [2] Tack, Jihoon, et al. "CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [3] Fang, Tongtong, et al. "Rethinking Importance Weighting for Deep Learning under Distribution Shift." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [4] Taori, Rohan, et al. "Measuring robustness to natural distribution shifts in image classification." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [5] Tachet des Combes, Remi, et al. "Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [6] Kulinski, Sean, Saurabh Bagchi, and David I. Inouye. "Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [7] Chen, Yining, et al. "Self-training Avoids Using Spurious Features Under Domain Shift." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [8] Reisizadeh, Amirhossein, et al. "Robust Federated Learning: The Case of Affine Distribution Shifts." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [9] Kang, Guoliang, et al. "Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [10] Venkat, Naveen, et al. "Your Classifier can Secretly Suffice Multi-Source Domain Adaptation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [11] Zhang, Kun, et al. "Domain adaptation as a problem of inference on graphical models." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. 65
  53. • [12] Cui, Shuhao, et al. "Heuristic Domain Adaptation." 34th

    Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [13] Park, Kwanyong, et al. "Discover, Hallucinate, and Adapt: Open Compound Domain Adaptation for Semantic Segmentation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [14] Ge, Yixiao, et al. "Self-paced contrastive learning with hybrid memory for domain adaptive object re-id." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [15] Balaji, Yogesh, Rama Chellappa, and Soheil Feizi. "Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [16] Saito, Kuniaki, et al. "Universal domain adaptation through self supervision." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [17] Wang, Ximei, et al. "Transferable Calibration with Lower Bias and Variance in Domain Adaptation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [18] Combes, Remi Tachet des, et al. "Domain adaptation with conditional distribution matching and generalized label shift." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [19] Luo, Yawei, et al. "Adversarial style mining for one-shot unsupervised domain adaptation." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [20] Purushwalkam, Senthil, and Abhinav Gupta. 2020. “Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [21] Mazaheri, Bijan, Siddharth Jain, and Jehoshua Bruck. 2020. “Robust Correction of Sampling Bias Using Cumulative Distribution Functions.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [22] Flanigan, Bailey, Paul Gölz, Anupam Gupta, and Ariel Procaccia. 2020. “Neutralizing Self-Selection Bias in Sampling for Sortition.” 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [23] Moreno-Torres, Jose G., et al. "A unifying view on dataset shift in classification." Pattern recognition 45.1 (2012): 521-530. • [24] Shimodaira, Hidetoshi. 2000. “Improving Predictive Inference under Covariate Shift by Weighting the Log-Likelihood Function.” Journal of Statistical Planning and Inference 90 (2): 227–44. 66
  54. • [25] Teney, Damien, et al. "On the Value of

    Out-of-Distribution Testing: An Example of Goodhart's Law." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [26] Jeong, Taewon, and Heeyoung Kim. "OOD-MAML: Meta-Learning for Few-Shot Out-of-Distribution Detection and Classification." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [27] Nandy, Jay, Wynne Hsu, and Mong Li Lee. "Towards Maximizing the Representation Gap between In-Domain & Out-of-Distribution Examples." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [28] Bitterwolf, Julian, Alexander Meinke, and Matthias Hein. "Certifiably Adversarially Robust Detection of Out-of-Distribution Data." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [29] Kirichenko, Polina, Pavel Izmailov, and Andrew Gordon Wilson. "Why normalizing flows fail to detect out-of-distribution data." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [30] Xiao, Zhisheng, Qing Yan, and Yali Amit. "Likelihood regret: An out-of-distribution detection score for variational auto-encoder." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [31] Liu, Weitang, et al. "Energy-based Out-of-distribution Detection." 34th Conference on Neural Information Processing Systems (NeurIPS) 2020. Neural Information Processing Systems, 2020. • [32] Bulusu, Saikiran, et al. "Anomalous example detection in deep learning: A survey." IEEE Access 8 (2020): 132330-132347. • [33] Chen, Ting, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. “A Simple Framework for Contrastive Learning of Visual Representations.” In Proceedings of the 37th International Conference on Machine Learning, edited by Hal Daumé Iii and Aarti Singh, 119:1597– 1607. Proceedings of Machine Learning Research. PMLR. 67