Slide 1

Recent Findings on Density-Ratio Approaches in Machine Learning
Workshop on FIMI, March 30th, 2022
Masahiro Kato
The University of Tokyo, Imaizumi Lab / CyberAgent, Inc., AILab

Slide 2

Density-Ratio Approaches in Machine Learning (ML)
β–  Consider two distributions P and Q with a common support.
β–  Let p* and q* be the density functions of P and Q, respectively.
β–  Define the density ratio (function) as r*(x) = p*(x)/q*(x).
β–  Approaches using density ratios are useful in many ML applications.
(Figure: the densities p*(x) and q*(x) and their density ratio r*(x) = p*(x)/q*(x).)
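As a concrete illustration (not from the slides), the density ratio of two univariate Gaussians can be evaluated in closed form; the means and standard deviations below are arbitrary choices for the example.

```python
import numpy as np

# Illustrative sketch: the density ratio r*(x) = p*(x) / q*(x) for two
# one-dimensional Gaussians sharing a common support (the whole real line).
# The means and standard deviations are arbitrary choices for this example.

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def density_ratio(x, mean_p=0.0, std_p=1.0, mean_q=1.0, std_q=1.0):
    """r*(x) = p*(x) / q*(x); well defined because both densities are positive."""
    return gaussian_pdf(x, mean_p, std_p) / gaussian_pdf(x, mean_q, std_q)

# r*(x) > 1 where P is more likely than Q; by symmetry here, r*(0.5) = 1.
print(density_ratio(np.array([-1.0, 0.5, 2.0])))
```

The ratio exceeds 1 in regions where P places more mass than Q, which is the property the later slides exploit for outlier detection and importance weighting.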

Slide 3

Density-Ratio Approaches in ML
β–  Many ML applications involve two or more distributions:
β€’ Classification.
β€’ Generative adversarial networks (GANs).
β€’ Divergences between probability measures.
β€’ Multi-armed bandit (MAB) problems (change of measures).
β†’ In these tasks, the density ratio appears as a key component.

Slide 4

Empirical Perspective
β–  An estimator of the density ratio r* provides a solution in many tasks:
β€’ Inlier-based outlier detection: finding outliers based on the density ratio (Hido et al. (2008)).
β€’ Causal inference: conditional moment restrictions can be approximated via the density ratio (Kato et al. (2022)).
β€’ GANs (Goodfellow et al. (2014), Uehara et al. (2016)).
β€’ Variational Bayesian (VB) methods (Tran et al. (2017)), etc.

Slide 5

Theoretical Viewpoint
β–  Using density ratios is also useful in theoretical analysis:
β€’ Likelihood ratios give tight lower bounds for decision-making problems.
  Ex. Lower bounds in MAB problems (Lai and Robbins (1985)).
β€’ Transforming an original problem via a change of measure can yield a tight theoretical result.
  Ex. Large deviation principles for martingales (Fan et al. (2013, 2014)).
(Figure: theoretical results are transported between P and Q through the ratio p*(x)/q*(x).)

Slide 6

Presentation Outline: Recent (Our) Findings on Density Ratios
1. Density-Ratio Estimation and its Applications
   Kato and Teshima (ICML2021), "Non-negative Bregman Divergence Minimization for Deep Direct Density Ratio Estimation"
2. Causal Inference and Density Ratios
   Kato, Imaizumi, McAlinn, Yasui, and Kakehi (ICLR2022), "Learning Causal Relationships from Conditional Moment Restrictions by Importance Weighting"
3. Density Ratios and Divergences between Probability Measures
   Kato, Imaizumi, and Minami (2022), "Unified Perspective on Probability Divergence via Maximum Likelihood Density Ratio Estimation"
4. Change-of-Measure Arguments in the Best Arm Identification Problem
   Kato, Ariu, Imaizumi, Uehara, Nomura, and Qin (2022), "Best Arm Identification with a Fixed Budget under a Small Gap"

Slide 7

Density-Ratio Estimation and its Applications

Slide 8

Density-Ratio Estimation (DRE)
β–  Consider estimating the density ratio from observations.
β–  Two sets of observations: {X_i}_{i=1}^n ∼ p* and {Z_j}_{j=1}^m ∼ q*.
β–  Two-step method: estimate p*(x) and q*(x) separately; then construct an estimator of r*(x) from their ratio.
  Γ— Poor empirical performance. Γ— Weak theoretical guarantees.
β†’ Consider direct estimation of r*(x): LSIF, KLIEP, and PU learning.

Slide 9

Least-Squares Importance Fitting (LSIF)
β–  Let r be a model of the density ratio r*.
β–  The squared-error risk is R(r) = E_Q[(r*(X) - r(X))Β²].
β–  The minimizer r̂ of the empirical risk is an estimator of r*.
β–  Instead of R(r), which depends on the unknown r*, we minimize an empirical version of the equivalent risk
  RΜƒ(r) = -2 E_P[r(X)] + E_Q[rΒ²(X)].
β–  This method is called LSIF (Kanamori et al. (2009)).
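A minimal LSIF-style sketch (not the authors' code) with a linear-in-parameters Gaussian-kernel model: for such a model the empirical objective -2 mean_P[r(X)] + mean_Q[rΒ²(Z)] plus a small ridge term has a closed-form minimizer. All distributions, kernel widths, and regularization constants are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=2000)   # samples from p* = N(0, 1)
Z = rng.normal(0.0, 2.0, size=2000)   # samples from q* = N(0, 4)

centers = X[:100]                      # kernel centers taken from the P-sample
sigma = 1.0

def phi(x):
    """Gaussian kernel features; r(x) = theta @ phi(x) is the ratio model."""
    x = np.atleast_1d(x)
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / sigma) ** 2)

# Empirical LSIF objective: -2 E_P[r(X)] + E_Q[r^2(Z)] (+ ridge).
# For a linear-in-parameters model the minimizer is available in closed form:
#   H = E_Q[phi phi^T], h = E_P[phi], theta = (H + lam I)^{-1} h.
H = phi(Z).T @ phi(Z) / len(Z)
h = phi(X).mean(axis=0)
lam = 1e-3
theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)

def r_hat(x):
    return phi(x) @ theta

# The estimated ratio should be large near 0 (where p* dominates q*)
# and small far in the tails (where q* dominates p*).
print(r_hat(np.array([0.0, 4.0])))
```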

Slide 10

LSIF
β–  Derivation:
  r* = argmin_r E_Q[(r*(X) - r(X))Β²]
     = argmin_r E_Q[r*(X)Β² - 2 r*(X) r(X) + rΒ²(X)]
     = argmin_r E_Q[-2 r*(X) r(X) + rΒ²(X)]
     = argmin_r { -2 E_P[r(X)] + E_Q[rΒ²(X)] }.
β€’ Here, we used E_Q[r*(X) r(X)] = ∫ r*(x) r(x) q*(x) dx = ∫ r(x) p*(x) dx = E_P[r(X)].
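The key identity in the last bullet can be checked numerically. The sketch below uses p* = N(0, 1) and q* = N(1, 1), for which r*(x) = exp(0.5 - x) in closed form, and an arbitrary bounded test function r; all of these are assumptions made for the example only.

```python
import numpy as np

# Numerical check (illustrative) of the identity used in the LSIF derivation:
#   E_Q[r*(Z) r(Z)] = ∫ r*(x) r(x) q*(x) dx = ∫ r(x) p*(x) dx = E_P[r(X)].
rng = np.random.default_rng(1)
n = 200_000
X = rng.normal(0.0, 1.0, size=n)  # ~ p* = N(0, 1)
Z = rng.normal(1.0, 1.0, size=n)  # ~ q* = N(1, 1)

r_star = lambda x: np.exp(0.5 - x)      # closed-form ratio for these Gaussians
r_test = lambda x: 1.0 / (1.0 + x**2)   # arbitrary bounded model r

lhs = np.mean(r_star(Z) * r_test(Z))  # Monte Carlo E_Q[r*(Z) r(Z)]
rhs = np.mean(r_test(X))              # Monte Carlo E_P[r(X)]
print(lhs, rhs)
```

The two Monte Carlo averages agree up to sampling error, which is exactly why the surrogate risk can be estimated without knowing r*.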

Slide 11

KL Importance Estimation Procedure (KLIEP)
β–  KLIEP (Sugiyama et al. (2007)) is another direct DRE method, based on the KL divergence between p*(x) and the model p(x) = r(x) q*(x):
  KL(p*(x) βˆ₯ p(x)) = ∫ p*(x) log (p*(x)/p(x)) dx
                   = ∫ p*(x) log (p*(x)/(r(x) q*(x))) dx
                   = ∫ p*(x) log (p*(x)/q*(x)) dx - ∫ p*(x) log r(x) dx.
β–  From r* = argmin_r KL(p*(x) βˆ₯ p(x)), we estimate r* as
  r̂ = argmin_r -(1/n) Ξ£_{i=1}^n log r(X_i)  s.t.  (1/m) Ξ£_{j=1}^m r(Z_j) = 1.
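A minimal KLIEP-style sketch (again not the authors' code): maximize the mean log-ratio over the P-sample with a nonnegative kernel model, renormalizing after each gradient step so that the empirical constraint (1/m) Ξ£_j r(Z_j) = 1 holds. The optimizer, step size, and kernel choices below are arbitrary assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=1000)   # ~ p* = N(0, 1)
Z = rng.normal(0.0, 2.0, size=1000)   # ~ q* = N(0, 4)

centers, sigma = X[:50], 1.0
K_X = np.exp(-0.5 * ((X[:, None] - centers[None, :]) / sigma) ** 2)
K_Z = np.exp(-0.5 * ((Z[:, None] - centers[None, :]) / sigma) ** 2)

alpha = np.ones(len(centers))
for _ in range(500):
    r_X = K_X @ alpha
    grad = (K_X / r_X[:, None]).mean(axis=0)      # gradient of mean_i log r(X_i)
    alpha = np.maximum(alpha + 0.01 * grad, 0.0)  # ascent step + nonnegativity
    alpha /= (K_Z @ alpha).mean()                 # enforce mean_j r(Z_j) = 1

def r_hat(x):
    x = np.atleast_1d(x)
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / sigma) ** 2) @ alpha

print(r_hat(np.array([0.0, 4.0])), (K_Z @ alpha).mean())
```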

Slide 12

Inlier-based Outlier Detection
β–  Find outliers using inliers (correct samples) and the density ratio (Hido et al. (2008)):
β€’ Inliers are sampled from p*(x).
β€’ Test data (inliers + outliers) are sampled from q*(x).
β–  Detect outliers using the density ratio r*(x) = p*(x)/q*(x): outliers receive small ratio values because they are unlikely under p*.
(Figure from Sugiyama (2016): p*(x), q*(x), and r*(x) = p*(x)/q*(x). Table: mean AUC values over 20 trials for the benchmark datasets (Hido et al. (2008)).)
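A toy sketch of the scoring step. For clarity it uses the analytic ratio of known Gaussians instead of an estimated one; in practice r* would come from an LSIF/KLIEP-style estimator. The mixture weights mirror the (arbitrary) sample counts of this example.

```python
import numpy as np

# Inlier-based outlier detection sketch: inliers follow p* = N(0, 1); the test
# data mixes inliers with an anomalous component N(5, 1), so q* is a mixture.
rng = np.random.default_rng(0)
inlier_test = rng.normal(0.0, 1.0, size=200)
outlier_test = rng.normal(5.0, 1.0, size=20)
test = np.concatenate([inlier_test, outlier_test])

def log_gauss(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2 * np.pi))

def score(x):
    """r*(x) = p*(x)/q*(x), with q* the 200:20 test mixture. Small score => outlier."""
    p = np.exp(log_gauss(x, 0.0, 1.0))
    q = (200 * np.exp(log_gauss(x, 0.0, 1.0)) + 20 * np.exp(log_gauss(x, 5.0, 1.0))) / 220
    return p / q

scores = score(test)
# Inliers should receive clearly higher ratio values than outliers on average.
print(scores[:200].mean(), scores[200:].mean())
```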

Slide 13

Bregman (BR) Divergence Minimization Perspective
β–  LSIF and KLIEP can be regarded as special cases of BR divergence minimization (Sugiyama et al. (2012)).
β€’ Let g(t) be a twice continuously differentiable convex function.
β€’ Using the BR divergence, the empirical objective can be written as
  BRΜ‚_g(r) := Ê_Q[βˆ‚g(r(Z_j)) r(Z_j) - g(r(Z_j))] - Ê_P[βˆ‚g(r(X_i))].
β€’ By changing g(t), we obtain the objective functions of various direct DRE methods.
  Ex. g(t) = (t - 1)Β²: LSIF; g(t) = t log t - t: KLIEP.
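The unification can be verified symbolically on samples: one empirical BR objective reproduces the LSIF objective (up to an additive constant) and the KLIEP-type objective for the two choices of g above. The Gaussian sampling setup is an arbitrary assumption for the check.

```python
import numpy as np

# Sketch of the BR-divergence view: one empirical objective
#   BR_g(r) = mean_Z[ g'(r(Z)) r(Z) - g(r(Z)) ] - mean_X[ g'(r(X)) ]
# recovers the LSIF and KLIEP-type objectives for specific convex g.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=500)  # ~ p*
Z = rng.normal(1.0, 1.0, size=500)  # ~ q*
r = lambda x: np.exp(0.5 - x)       # any positive ratio model (here the true one)

def br_objective(g, dg):
    rX, rZ = r(X), r(Z)
    return np.mean(dg(rZ) * rZ - g(rZ)) - np.mean(dg(rX))

# g(t) = (t - 1)^2 gives the LSIF objective up to the constant +1:
br_lsif = br_objective(lambda t: (t - 1) ** 2, lambda t: 2 * (t - 1))
lsif = -2 * np.mean(r(X)) + np.mean(r(Z) ** 2)

# g(t) = t log t - t gives the KLIEP-type objective:
br_kl = br_objective(lambda t: t * np.log(t) - t, lambda t: np.log(t))
kliep = np.mean(r(Z)) - np.mean(np.log(r(X)))
print(br_lsif - lsif, br_kl - kliep)
```

Since objectives that differ by a constant have the same minimizer, both methods are instances of the same BR minimization template.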

Slide 14

Learning from Positive and Unlabeled Data (PU Learning)
β–  PU learning trains a classifier only from positive and unlabeled data (du Plessis et al. (2014, 2015)).
β€’ Positive label: y = +1; negative label: y = -1.
β€’ Positive data: {x_i^P}_{i=1}^{n_P} ∼ p(x | y = +1).
β€’ Unlabeled data: {x_i^U}_{i=1}^{n_U} ∼ p(x).

Slide 15

Learning from Positive and Unlabeled Data (PU Learning)
β–  With the log loss, a classifier f can be trained by minimizing
  β„›(f) := π ∫ -log f(x) p(x | y = +1) dx + [ ∫ -log(1 - f(x)) p(x) dx - π ∫ -log(1 - f(x)) p(x | y = +1) dx ],
  where π is the class prior defined as π = p(y = +1).
β–  Overfitting problem in PU learning (Kiryo et al. (2017)).
β–  The empirical PU risk is not lower bounded and can go to -∞: the bracketed term equals (1 - π) E[-log(1 - f(X)) | y = -1] β‰₯ 0 in population, but its empirical counterpart can be driven to -∞ by a flexible model.

Slide 16

Overfitting and Non-negative Correction
β–  Kiryo et al. (2017) propose a non-negative correction based on the population inequality
  ∫ -log(1 - f(x)) p(x) dx - π ∫ -log(1 - f(x)) p(x | y = +1) dx β‰₯ 0.
β–  The non-negative PU risk is given as
  β„›_nnPU(f) := π ∫ -log f(x) p(x | y = +1) dx + max{ 0, ∫ -log(1 - f(x)) p(x) dx - π ∫ -log(1 - f(x)) p(x | y = +1) dx }.
β–  In population, β„›(f) = β„›_nnPU(f).
β–  Minimize an empirical version of β„›_nnPU(f).
(Figure from Kiryo et al. (2017).)
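A tiny numerical sketch of the problem and the fix: a "memorizing" classifier drives the empirical negative-class risk estimate far below zero, while the non-negative correction clips it at zero. The data points, prior, and classifier are hypothetical choices for illustration.

```python
import numpy as np

# Overfitting in the empirical PU risk and the nnPU clip (Kiryo et al. style).
pi = 0.3                                 # class prior p(y = +1), assumed known
x_pos = np.array([0.0, 1.0, 2.0])        # positive sample
x_unl = np.array([10.0, 11.0, 12.0])     # unlabeled sample (disjoint here)
eps = 1e-6

def f(x):
    """Overfitted scores: ~1 on the memorized positive points, ~0 elsewhere."""
    pos_set = set(x_pos.tolist())
    return np.array([1 - eps if xi in pos_set else eps for xi in np.atleast_1d(x)])

pos_part = pi * np.mean(-np.log(f(x_pos)))            # pi * E_+[-log f]
neg_part = (np.mean(-np.log(1 - f(x_unl)))            # E_U[-log(1 - f)]
            - pi * np.mean(-np.log(1 - f(x_pos))))    # - pi * E_+[-log(1 - f)]

risk_pu = pos_part + neg_part             # unbounded below as f overfits
risk_nn = pos_part + max(0.0, neg_part)   # non-negative correction
print(risk_pu, risk_nn)
```

The uncorrected empirical risk goes strongly negative even though its population counterpart is nonnegative, which is exactly the pathology the max(0, Β·) term removes.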

Slide 17

Overfitting and Non-negative Correction
β–  In DRE, we face a similar overfitting problem.
β–  Kato and Teshima (2021) apply the non-negative method to DRE:
1. PU learning can also be regarded as BR divergence minimization (the optimal classifier is p(y = +1 | x) = π p(x | y = +1)/p(x)).
2. They apply the non-negative correction to DRE.
β–  In maximum likelihood nonparametric density estimation, this overfitting problem is known as the roughness problem (Good and Gaskin (1971)).
(Figure: the densities q*(x) and p*(x).)

Slide 18

Inlier-based Outlier Detection with Deep Neural Networks (DNNs)
β–  Inlier-based outlier detection with high-dimensional data (e.g., CIFAR-10).
β–  We can use DNNs for DRE by combining them with the non-negative correction.
β–  The PU-learning-based DRE shows the best performance.
(Table from Kato and Teshima (2021).)

Slide 19

Causal Inference and Density Ratios

Slide 20

Structural Equation Model
β–  Consider the following linear model between Y and X:
  Y = Xᡀβ + ε,  E[Xε] ≠ 0.
β–  E[Xε] ≠ 0 implies correlation between ε and X.
β–  This situation is called endogeneity.
β–  In this case, the OLS estimator is neither unbiased nor consistent.
β€’ Xᡀβ is not the conditional mean E[Y | X] (E[Y | X] ≠ Xᡀβ).
β–  This model is called a structural equation.

Slide 21

NPIV: Wage Equation
β–  The true wage equation:
  log(wage) = β₀ + years_of_education Γ— β₁ + ability Γ— β₂ + u,  E[u | years of education, ability] = 0.
β–  We cannot observe "ability," so we estimate the following model:
  log(wage) = β₀ + years_of_education Γ— β₁ + ε,  where ε = ability Γ— β₂ + u.
β€’ If "years of education" is correlated with "ability," then E[years_of_education Γ— ε] ≠ 0,
β†’ so we cannot consistently estimate β₁ with OLS.

Slide 22

Instrumental Variable (IV) Method
β–  By using IVs, we can estimate the parameter β.
β–  An IV is a random variable Z satisfying the following conditions:
1. Uncorrelated with the error term: E[Zε] = 0.
2. Correlated with the endogenous variable X.
β–  Angrist and Krueger (1991): using the quarter of birth as the IV.
(Diagram: Z (IV) affects X (years of education), which affects Y (wage) through β; U (ability) confounds X and Y.)
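An illustrative simulation (not from the slides): with an endogenous regressor, OLS is biased, while the IV estimator recovers the true coefficient. All coefficients and noise levels are arbitrary choices for the example.

```python
import numpy as np

# Endogeneity and the IV fix in a linear model Y = beta * X + eps.
rng = np.random.default_rng(0)
n, beta_true = 50_000, 1.5

u = rng.normal(size=n)               # unobserved confounder ("ability")
z = rng.normal(size=n)               # instrument: moves x, unrelated to eps
x = 0.8 * z + u + 0.3 * rng.normal(size=n)   # endogenous regressor
eps = 2.0 * u + rng.normal(size=n)   # error correlated with x through u
y = beta_true * x + eps

beta_ols = np.sum(x * y) / np.sum(x * x)  # biased because E[x eps] != 0
beta_iv = np.sum(z * y) / np.sum(z * x)   # IV estimator: uses E[z eps] = 0
print(beta_ols, beta_iv)
```

The OLS estimate drifts far from the true value because the confounder enters both x and the error, while the instrument-based estimate is consistent.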

Slide 23

Nonparametric Instrumental Variable (NPIV) Regression
β–  A nonparametric version of IV problems (Newey and Powell (2003)):
  Y = f*(X) + ε,  E[ε | X] ≠ 0.
β€’ We want to estimate the structural function f*.
β€’ Since E[ε | X] ≠ 0, least squares does not yield a consistent estimator.
β–  Instrumental variable Z: the condition for IVs is E[ε | Z] = 0.
β–  Algorithms: two-stage least squares with series regression (Newey and Powell (2003)), minimax optimization.

Slide 24

NPIV via Importance Weighting
β–  Kato, Imaizumi, McAlinn, Yasui, and Kakehi (ICLR2022) solve the problem with an approach similar to covariate shift adaptation (Shimodaira (2000)).
β–  From E_{Y,X}[ε | Z] = 0, if we know r*(y, x | z) = p*(y, x | z)/p(y, x), we can estimate f* by minimizing an empirical approximation of E_Z[(E_{Y,X}[ε | Z])Β²]:
  fΜ‚ = argmin_f (1/n) Ξ£_{k=1}^n { (1/n) Ξ£_{i=1}^n (Y_i - f(X_i)) r*(Y_i, X_i | Z_k) }Β².
β–  We also show theoretical results on the estimation error.

Slide 25

NPIV via Importance Weighting
β–  Estimate r*(y, x | z) = p*(y, x | z)/p(y, x) = p*(y, x, z)/(p(y, x) p(z)) by applying the idea of LSIF:
  r* = argmin_r E_Z[E_{Y,X}[(r*(Y, X | Z) - r(Y, X | Z))Β²]]
     = argmin_r E_Z[E_{Y,X}[r*(Y, X | Z)Β² - 2 r*(Y, X | Z) r(Y, X | Z) + rΒ²(Y, X | Z)]]
     = argmin_r E_Z[E_{Y,X}[-2 r*(Y, X | Z) r(Y, X | Z) + rΒ²(Y, X | Z)]]
     = argmin_r { -2 E_Z[E_{Y,X}[r(Y, X | Z)]] + E_{Y,X,Z}[rΒ²(Y, X | Z)] }.
β–  KLIEP-based estimation is proposed by Suzuki et al. (2009).

Slide 26

Density Ratios and Divergences between Probability Measures

Slide 27

Reconsidering BR Divergence Minimization from the Likelihood Approach
β–  Reconsider DRE methods from a maximum likelihood estimation perspective.
β€’ We can define several likelihoods based on different sampling schemes.
β–  Maximum likelihood estimation under the stratified sampling scheme is not included in BR divergence minimization.
β€’ The corresponding risk belongs to the integral probability metrics (IPMs).
β€’ IPMs include the Wasserstein distance and MMD as special cases.
β–  We reveal the relationships between probability divergences and density ratios,
β†’ expanding the range of applications of density ratios.

Slide 28

Likelihood of Density Ratios
β–  Let r(x) be a model of r*(x) = p*(x)/q*(x).
β–  A model of p*(x) is given as p(x) = r(x) q*(x).
β–  For observations {X_i}_{i=1}^n ∼ p*, the likelihood of the model p(x) is
  β„’(r) = ∏_{i=1}^n p(X_i) = ∏_{i=1}^n r(X_i) q*(X_i).
β–  The log likelihood is given as β„“(r) = Ξ£_{i=1}^n log r(X_i) + Ξ£_{i=1}^n log q*(X_i), where the second term does not depend on r.

Slide 29

Nonparametric Maximum Likelihood Estimation of Density Ratios
β–  We can estimate r* by solving
  max_r (1/n) Ξ£_{i=1}^n log r(X_i)  s.t.  ∫ r(z) q*(z) dz = 1.
β€’ The constraint is based on ∫ r*(x) q*(x) dx = ∫ p*(x) dx = 1.
β€’ This formulation is equivalent to KLIEP.
β–  Similarly, for observations {Z_j}_{j=1}^m ∼ q*, we can estimate 1/r* by solving
  max_r -(1/m) Ξ£_{j=1}^m log r(Z_j)  s.t.  ∫ (1/r(x)) p*(x) dx = 1.

Slide 30

KL Divergence and Likelihood of Density Ratios
β–  The KL divergence is KL(β„™ βˆ₯ β„š) := ∫ p*(x) log(p*(x)/q*(x)) dx.
β–  The KL divergence can be interpreted as a maximized log likelihood:
  KL(β„™ βˆ₯ β„š) = sup_{r ∈ β„› s.t. ∫ r(z) q*(z) dz = 1} ∫ log r(x) p*(x) dx.
β–  Derivation:
  KL(β„™ βˆ₯ β„š) = ∫ p*(x) log(p*(x)/q*(x)) dx
  = sup_{f ∈ β„±} { 1 + ∫ f(x) p*(x) dx - ∫ exp(f(x)) q*(x) dx }
  = 1 + ∫ f*(x) p*(x) dx - ∫ exp(f*(x)) q*(x) dx,  where f* = log(p*/q*), so ∫ exp(f*(x)) q*(x) dx = 1
  = ∫ f*(x) p*(x) dx
  = sup_{r ∈ β„› s.t. ∫ r(z) q*(z) dz = 1} ∫ log r(x) p*(x) dx.
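The identity can be illustrated numerically: for p* = N(0, 1) and q* = N(1, 1) the analytic KL divergence is 1/2, and the supremum is attained at r = p*/q*, so the Monte Carlo average of log r*(X) under P should match it. The Gaussian choice is an assumption for the example.

```python
import numpy as np

# KL(P || Q) as a maximized log likelihood: at the optimal ratio r* = p*/q*,
# E_P[log r*(X)] equals KL(P || Q). For N(0,1) vs N(1,1), log r*(x) = 0.5 - x.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=200_000)   # ~ p*

log_r_star = 0.5 - X                     # closed form of log(p*(x)/q*(x))
kl_mc = np.mean(log_r_star)              # Monte Carlo E_P[log r*(X)]
kl_analytic = 0.5                        # KL(N(0,1) || N(1,1)) = (mu_p - mu_q)^2 / 2
print(kl_mc, kl_analytic)
```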

Slide 31

Stratified Sampling Scheme
β–  Assume that for all x ∈ π’Ÿ, both r*(x) and 1/r*(x) exist.
β–  Define the likelihood of r under a stratified sampling scheme.
β–  The likelihood uses both {X_i}_{i=1}^n ∼ p* and {Z_j}_{j=1}^m ∼ q* simultaneously:
  β„’(r) = ∏_{i=1}^n pΜƒ_r(X_i) ∏_{j=1}^m qΜƒ_r(Z_j),
  where pΜƒ_r(x) = r(x) q*(x) and qΜƒ_r(z) = (1/r(z)) p*(z) are the models of p* and q*.
β€’ This sampling scheme has been considered in causal inference (Imbens and Lancaster (1996)).

Slide 32

Stratified Sampling Scheme
β–  The objective function is given as
  max_r { Ξ£_{i=1}^n log r(X_i) - Ξ£_{j=1}^m log r(Z_j) }  s.t.  ∫ (1/r(x)) p*(x) dx = ∫ r(z) q*(z) dz = 1.
β–  We consider the following equivalent unconstrained problem:
  max_r { Ξ£_{i=1}^n log r(X_i) - Ξ£_{j=1}^m log r(Z_j) - (1/m) Ξ£_{j=1}^m r(Z_j) - (1/n) Ξ£_{i=1}^n 1/r(X_i) }.
β–  This transformation is based on the results of Silverman (1982).
β€’ We can apply this trick to KLIEP (see Nguyen et al. (2008)).
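The two normalization constraints can be sanity-checked by Monte Carlo: at the true ratio, E_Q[r*(Z)] = ∫ p*(x) dx = 1 and E_P[1/r*(X)] = ∫ q*(x) dx = 1. Gaussians are used below so that r*(x) = exp(0.5 - x) is available in closed form; this setup is an assumption for the example.

```python
import numpy as np

# Check that the true ratio satisfies both constraints of the stratified scheme.
rng = np.random.default_rng(0)
n = 100_000
X = rng.normal(0.0, 1.0, size=n)   # ~ p* = N(0, 1)
Z = rng.normal(1.0, 1.0, size=n)   # ~ q* = N(1, 1)

r_star = lambda x: np.exp(0.5 - x)        # p*(x)/q*(x) for these Gaussians
mean_Q_r = np.mean(r_star(Z))             # estimates E_Q[r*(Z)] = 1
mean_P_inv_r = np.mean(1.0 / r_star(X))   # estimates E_P[1/r*(X)] = 1
print(mean_Q_r, mean_P_inv_r)
```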

Slide 33

Integral Probability Metrics (IPMs) and Likelihood of Density Ratios
β–  An IPM with a function class β„± defines the distance between two probability distributions P and Q as
  sup_{f ∈ β„±} { ∫ f(x) p*(x) dx - ∫ f(z) q*(z) dz }.
β€’ If β„± is the class of 1-Lipschitz continuous functions, this distance becomes the Wasserstein distance.

Slide 34

IPMs and Likelihood of Density Ratios
β–  Consider an exponential density-ratio model, r(x) = exp(f(x)).
β–  An IPM is the maximized log likelihood under the stratified sampling scheme: IPM_{C(β„±)}(β„™ βˆ₯ β„š), where
  C(β„±) = { f ∈ β„± : ∫ exp(f(z)) q*(z) dz = ∫ exp(-f(x)) p*(x) dx = 1 }.

Slide 35

Density Ratio Metrics (DRMs)
β–  The density ratio metrics (DRMs; Kato, Imaizumi, and Minami (2022)):
  DRM_β„±^λ(β„™ βˆ₯ β„š) = sup_{f ∈ C(β„±)} { λ ∫ f(x) p*(x) dx - (1 - λ) ∫ f(x) q*(x) dx },
  C(β„±) = { f ∈ β„± : ∫ exp(f(x)) q*(x) dx = ∫ exp(-f(x)) p*(x) dx = 1 }.
β€’ A distance based on the weighted average of maximum log likelihoods of the density ratio under the stratified sampling scheme (λ ∈ [0, 1]).
β€’ Bridges the IPMs and the KL divergence.
β€’ DRMs include the KL divergence and IPMs as special cases.

Slide 36

Density Ratio Metrics (DRMs)
β€’ If λ = 1/2, DRM_β„±^{1/2}(P βˆ₯ Q) = (1/2) IPM_{C(β„±)}(P βˆ₯ Q).
β€’ If λ = 1, DRM_β„±^1(P βˆ₯ Q) = KL(P βˆ₯ Q).
β€’ If λ = 0, DRM_β„±^0(P βˆ₯ Q) = KL(Q βˆ₯ P).
β–  The choice of β„± controls the smoothness of the density-ratio model.
  Ex. Non-negative correction, spectral normalization (Miyato et al. (2018)).
β–  Probability divergences can be defined without density ratios.
β†’ What are the advantages of this view? VB methods, DualGAN, causal inference, ...
(Figure from Kato et al. (2022).)

Slide 37

Change-of-Measure Arguments in the Best Arm Identification Problem

Slide 38

MAB Problem
β–  There are K arms, [K] = {1, 2, …, K}, and a fixed time horizon T.
β€’ Pull an arm A_t ∈ [K] in each round t.
β€’ Observe the reward of the chosen arm A_t: Y_t = Ξ£_{a ∈ [K]} 1[A_t = a] Y_{a,t}, where Y_{a,t} is the (potential) reward of arm a in round t.
β€’ Stop the trial at round t = T.
(Diagram: in each round t = 1, …, T, one of the arms 1, 2, …, K is pulled and Y_t = Ξ£_{a ∈ [K]} 1[A_t = a] Y_{a,t} is observed.)

Slide 39

MAB Problem
β–  The distribution of Y_{a,t} does not change across rounds.
β–  Denote the mean reward of arm a by ΞΌ_a = E[Y_{a,t}].
β–  Best arm: the arm with the highest mean reward.
β€’ Denote the best arm by a* = argmax_{a ∈ [K]} ΞΌ_a.

Slide 40

BAI with a Fixed Budget
β–  Best arm identification (BAI) with a fixed budget is an instance of the MAB problem.
β–  In the final round T, we estimate the best arm and denote the estimate by â*_T.
β–  Probability of misidentification: β„™(â*_T ≠ a*).
β–  Goal: minimize the probability of misidentification β„™(â*_T ≠ a*).
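An illustrative fixed-budget simulation (not from the paper): with uniform allocation and the empirical best arm as the final guess, the misidentification probability shrinks rapidly as the budget T grows. The two-armed Gaussian instance and trial counts are arbitrary choices for the example.

```python
import numpy as np

# Fixed-budget BAI sketch: uniform allocation over two Gaussian arms,
# returning the arm with the highest empirical mean at round T.
rng = np.random.default_rng(0)
means = np.array([0.5, 0.0])   # arm with index 0 is the best arm

def misid_rate(T, n_trials=2000):
    """Monte Carlo estimate of P(hat a*_T != a*) for budget T."""
    errors = 0
    for _ in range(n_trials):
        pulls = T // 2   # uniform allocation: each arm pulled T/2 times
        mu_hat = [rng.normal(m, 1.0, size=pulls).mean() for m in means]
        errors += int(np.argmax(mu_hat) != 0)
    return errors / n_trials

rate_small, rate_large = misid_rate(20), misid_rate(200)
print(rate_small, rate_large)
```

The sharp drop between T = 20 and T = 200 reflects the exponential decay of the misidentification probability discussed on the next slides.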

Slide 41

Theoretical Performance Evaluation
β–  How do we evaluate the performance of BAI algorithms?
β–  β„™(â*_T ≠ a*) converges to 0 at an exponential speed; that is, β„™(â*_T ≠ a*) = exp(-T(⋆)) for a constant term (⋆).
β–  We evaluate the term (⋆) via lim sup_{T→∞} -(1/T) log β„™(â*_T ≠ a*).
β–  A performance lower (upper) bound on β„™(â*_T ≠ a*) corresponds to an upper (lower) bound on lim sup_{T→∞} -(1/T) log β„™(â*_T ≠ a*).

Slide 42

Information-Theoretic Lower Bound
β–  Information-theoretic lower bound:
β€’ A lower bound based on information about the distributions.
β€’ This kind of lower bound is typically based on the likelihood ratio, Fisher information, and KL divergence.
β–  The derivation technique is called the change-of-measure argument.
β€’ This technique has been used in the MAB problem (Lai and Robbins (1985)).
β€’ In BAI, Kaufmann et al. (2016) suggest a lower bound.

Slide 43

Lower Bound: Transportation Lemma
β–  Denote the true distribution (bandit model) by v.
β–  Denote the set of alternative hypotheses by Alt(v).
β–  Consistent algorithm: returns the true best arm with probability 1 as T → ∞.
β–  Transportation Lemma (Lemma 1 of Kaufmann et al. (2016)): for any v′ ∈ Alt(v) and any consistent algorithm, if K = 2,
  lim sup_{T→∞} -(1/T) log β„™(â*_T ≠ a*) ≀ lim sup_{T→∞} (1/T) E_{v′}[ Ξ£_{a=1}^K Ξ£_{t=1}^T 1[A_t = a] log( f′_a(Y_t)/f_a(Y_t) ) ],
  where f_a and f′_a are the densities of arm a's reward under v and v′, and the summand log(f′_a(Y_t)/f_a(Y_t)) is the log-likelihood ratio.

Slide 44

Open Problem: Optimal Algorithm?
β–  Open problems:
1. Kaufmann et al.'s bound is only applicable to the two-armed bandit (K = 2).
2. There is no optimal algorithm whose upper bound achieves the lower bound.
β–  Kato, Ariu, Imaizumi, Uehara, Nomura, and Qin (2022) propose an optimal algorithm under a small-gap setting:
1. Consider a small-gap situation: Ξ”_a = ΞΌ_{a*} - ΞΌ_a → 0 for all a ∈ [K].
2. Prove a large deviation upper bound.
3. The upper bound then matches the lower bound in the limit of Ξ”_a.

Slide 45

Lower Bound under a Small Gap
β–  Let I_a(μ_a) be the Fisher information of the parameter μ_a of arm a.
β–  Let w_a be an arm-allocation ratio, w_a = lim sup_{T→∞} E[Σ_{t=1}^T 1[A_t = a]]/T.
β–  Lower bound (Lemma 1 of Kato et al. (2022)): writing the best arm as arm 1,
  lim sup_{T→∞} -(1/T) log β„™(â*_T ≠ a*) ≀ sup_{(w_a)} min_{a ≠ a*} Ξ”_aΒ² / ( 2 ( 1/(I_1(ΞΌ_1) w_1) + 1/(I_a(ΞΌ_a) w_a) ) ) + o(Ξ”_aΒ²).

Slide 46

Upper Bound: Large Deviation Principles (LDPs) for Martingales
β–  Let ΞΌΜ‚_{a,T} be an estimator of the mean reward ΞΌ_a.
β–  Consider returning argmax_a ΞΌΜ‚_{a,T} as the estimated best arm. Then
  β„™(â*_T ≠ a*) ≀ Ξ£_{a ≠ a*} β„™(ΞΌΜ‚_{a,T} β‰₯ ΞΌΜ‚_{a*,T}) = Ξ£_{a ≠ a*} β„™(ΞΌΜ‚_{a*,T} - ΞΌΜ‚_{a,T} - Ξ”_a ≀ -Ξ”_a).
β–  LDP: evaluation of β„™(ΞΌΜ‚_{a*,T} - ΞΌΜ‚_{a,T} - Ξ”_a ≀ C), where C is a constant.
β€’ The central limit theorem (CLT) evaluates β„™(√T (ΞΌΜ‚_{a*,T} - ΞΌΜ‚_{a,T} - Ξ”_a) ≀ C).
β€’ We cannot use the CLT to obtain the upper bound.

Slide 47

Upper Bound: LDPs for Martingales
β–  There are well-known existing results on large deviation principles.
  Ex. CramΓ©r's theorem and the GΓ€rtner-Ellis theorem.
β€’ These results cannot be applied to the BAI problem owing to the non-stationarity of the stochastic process.
β–  Fan et al. (2013, 2014): LDPs for martingales.
β€’ Key tool: change-of-measure arguments.

Slide 48

Upper Bound: Large Deviation Principles for Martingales
β–  Let β„™ be the probability measure of the original problem.
β–  Define U_T = ∏_{t=1}^T exp(λ ξ_t) / E[exp(λ ξ_t) | β„±_{t-1}], where (ξ_t) is the martingale difference sequence.
β–  Define the conjugate probability measure β„™_λ by dβ„™_λ = U_T dβ„™.
1. Derive the bound under β„™_λ.
2. Then transform it into the bound under β„™ via the density ratio dβ„™/dβ„™_λ.
(Diagram: an upper bound under β„™_λ is converted into an upper bound under β„™ by changing measures through dβ„™_λ/dβ„™ = U_T.)

Slide 49

Upper Bound: Large Deviation Principles for Martingales
β–  Kato et al. (2022) generalize the result of Fan et al. (2013, 2014).
β€’ Under an appropriately designed BAI algorithm, we show that the upper bound matches the lower bound.
β–  Upper bound (Theorem 4.1 of Kato et al. (2022)): if Σ_{t=1}^T 1[A_t = a]/T → w_a almost surely, then under some regularity conditions, writing the best arm as arm 1,
  lim sup_{T→∞} -(1/T) log β„™(â*_T ≠ a*) β‰₯ min_{a ≠ a*} Ξ”_aΒ² / ( 2 ( 1/(I_1(ΞΌ_1) w_1) + 1/(I_a(ΞΌ_a) w_a) ) ) + o(Ξ”_aΒ²).
β–  This result implies a Gaussian approximation of the LDP as Ξ”_a → 0.

Slide 50

Conclusion

Slide 51

Conclusion
β–  Density-ratio approaches:
β€’ Inlier-based outlier detection.
β€’ PU learning.
β€’ Causal inference.
β€’ Multi-armed bandit problems (change-of-measure arguments).
β†’ Useful in many ML applications.
β–  Other topics: double/debiased machine learning (Chernozhukov et al. (2018)), variational Bayesian methods (Tran et al. (2017)), etc.

Slide 52

Reference
β€’ Angrist, J. D. and Krueger, A. B. (1991), "Does Compulsory School Attendance Affect Schooling and Earnings?," The Quarterly Journal of Economics, 106, 979–1014.
β€’ Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018), "Double/debiased Machine Learning for Treatment and Structural Parameters," Econometrics Journal, 21, C1–C68.
β€’ du Plessis, M. C., Niu, G., and Sugiyama, M. (2014), "Analysis of Learning from Positive and Unlabeled Data," in Conference on Neural Information Processing Systems.
β€’ du Plessis, M. C., Niu, G., and Sugiyama, M. (2015), "Convex Formulation for Learning from Positive and Unlabeled Data," in International Conference on Machine Learning.
β€’ Fan, X., Grama, I., and Liu, Q. (2013), "CramΓ©r Large Deviation Expansions for Martingales under Bernstein's Condition," Stochastic Processes and their Applications, 123, 3919–3942.
β€’ Fan, X., Grama, I., and Liu, Q. (2014), "A Generalization of CramΓ©r Large Deviations for Martingales," Comptes Rendus Mathematique, 352, 853–858.
β€’ Good, I. J. and Gaskins, R. A. (1971), "Nonparametric Roughness Penalties for Probability Densities," Biometrika, 58, 255–277.
β€’ Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014), "Generative Adversarial Nets," in Conference on Neural Information Processing Systems.
β€’ Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., and Kanamori, T. (2011), "Statistical Outlier Detection Using Direct Density Ratio Estimation," Knowledge and Information Systems, 26, 309–336.
β€’ Imbens, G. W. and Lancaster, T. (1996), "Efficient Estimation and Stratified Sampling," Journal of Econometrics, 74, 289–318.
β€’ Kanamori, T., Hido, S., and Sugiyama, M. (2009), "A Least-squares Approach to Direct Importance Estimation," Journal of Machine Learning Research, 10, 1391–1445.
β€’ Kato, M. and Teshima, T. (2021), "Non-negative Bregman Divergence Minimization for Deep Direct Density Ratio Estimation," in International Conference on Machine Learning.
β€’ Kato, M., Imaizumi, M., McAlinn, K., Yasui, S., and Kakehi, H. (2022), "Learning Causal Relationships from Conditional Moment Restrictions by Importance Weighting," in International Conference on Learning Representations.
β€’ Kato, M., Imaizumi, M., and Minami, K. (2022), "Unified Perspective on Probability Divergence via Maximum Likelihood Density Ratio Estimation: Bridging KL-Divergence and Integral Probability Metrics."
β€’ Kato, M., Ariu, K., Imaizumi, M., Uehara, M., Nomura, M., and Qin, C. (2022), "Best Arm Identification with a Fixed Budget under a Small Gap."
β€’ Kaufmann, E., CappΓ©, O., and Garivier, A. (2016), "On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models," Journal of Machine Learning Research, 17, 1–42.
β€’ Kiryo, R., Niu, G., du Plessis, M. C., and Sugiyama, M. (2017), "Positive-Unlabeled Learning with Non-Negative Risk Estimator," in Conference on Neural Information Processing Systems.
β€’ Lai, T. L. and Robbins, H. (1985), "Asymptotically Efficient Adaptive Allocation Rules," Advances in Applied Mathematics, 6, 4–22.
β€’ Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. (2018), "Spectral Normalization for Generative Adversarial Networks," in International Conference on Learning Representations.
β€’ Newey, W. K. and Powell, J. L. (2003), "Instrumental Variable Estimation of Nonparametric Models," Econometrica, 71, 1565–1578.
β€’ Nguyen, X., Wainwright, M. J., and Jordan, M. (2008), "Estimating Divergence Functionals and the Likelihood Ratio by Penalized Convex Risk Minimization," in Conference on Neural Information Processing Systems.
β€’ Shimodaira, H. (2000), "Improving Predictive Inference under Covariate Shift by Weighting the Log-Likelihood Function," Journal of Statistical Planning and Inference, 90, 227–244.
β€’ Silverman, B. W. (1982), "On the Estimation of a Probability Density Function by the Maximum Penalized Likelihood Method," The Annals of Statistics, 10, 795–810.
β€’ Sugiyama, M., Nakajima, S., Kashima, H., von BΓΌnau, P., and Kawanabe, M. (2007), "Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation," in Conference on Neural Information Processing Systems.
β€’ Sugiyama, M., Suzuki, T., and Kanamori, T. (2011), "Density Ratio Matching under the Bregman Divergence: A Unified Framework of Density Ratio Estimation," Annals of the Institute of Statistical Mathematics, 64.
β€’ Sugiyama, M., Suzuki, T., and Kanamori, T. (2012), Density Ratio Estimation in Machine Learning, New York, NY, USA: Cambridge University Press, 1st ed.
β€’ Sugiyama, M. (2016), Introduction to Statistical Machine Learning.
β€’ Suzuki, T., Sugiyama, M., Sese, J., and Kanamori, T. (2008), "Approximating Mutual Information by Maximum Likelihood Density Ratio Estimation," in Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008, Proceedings of Machine Learning Research, vol. 4, 5–20.
β€’ Tran, D., Ranganath, R., and Blei, D. M. (2017), "Hierarchical Implicit Models and Likelihood-Free Variational Inference," in Conference on Neural Information Processing Systems.
β€’ Uehara, M., Sato, I., Suzuki, M., Nakayama, K., and Matsuo, Y. (2016), "Generative Adversarial Nets from a Density Ratio Estimation Perspective."