and D. Vetrov, “Greedy policy search: A simple baseline for learnable test-time augmentation,” arXiv preprint arXiv:2002.09103, vol. 2, no. 7, 2020. [34] T. Pearce, A. Brintrup, M. Zaki, and A. Neely, “High-quality prediction intervals for deep learning: A distribution-free, ensembled approach,” in International Conference on Machine Learning. PMLR, 2018, pp. [35] D. Su, Y. Y. Ting, and J. Ansel, “Tight prediction intervals using expanded interval minimization,” arXiv preprint arXiv:1806.11222, 2018. [36] A. G. Roy, S. Conjeti, N. Navab, C. Wachinger, A. D. N. Initiative et al., “Bayesian quicknat: Model uncertainty in deep whole-brain segmentation for structure-wise quality control,” NeuroImage, vol. 195, pp. 11–22, 2019. [37] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” in International Conference on Machine Learning. PMLR, 2017, pp. 1321–1330. [38] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, “mixup: Beyond empirical risk minimization,” in International Conference on Learning Representations, 2018. [39] S. Thulasidasan, G. Chennupati, J. A. Bilmes, T. Bhattacharya, and S. Michalak, “On mixup training: Improved calibration and predictive uncertainty for deep neural networks,” in Advances in Neural Informa- tion Processing Systems, 2019, pp. 13 888–13 899. [40] K.Patel,W.Beluch,D.Zhang,M.Pfeiffer,andB.Yang,“On-manifold adversarial data augmentation improves uncertainty calibration,” in 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021, pp. 8029–8036.