Reference

• Kato, M., and Teshima, T. (2022), “Non-negative Bregman Divergence Minimization for Deep Direct Density Ratio Estimation,” ,” in International Conference on Machine Learning.

• Kato, M., Imaizumi, M., McAlinn, K., Yasui, S., and Kakehi, H. (2022), “Learning Causal Relationships from Conditional Moment Restrictions by Importance Weighting,” in International Conference on

Learning Representations.

• Kato, M., Imaizumi, M., and Minami, K. (2022), “Unified Perspective on Probability Divergence via Maximum Likelihood Density Ratio Estimation: Bridging KL-Divergence and Integral Probability Metrics,” .

• Kanamori, T., Hido, S., and Sugiyama, M. (2009), “A least-squares approach to direct importance estimation.” Journal of Machine Learning Research, 10(Jul.):1391–1445.

• Kiryo, R., Niu, G., du Plessis, M. C., and Sugiyama, M. (2017), “Positive-Unlabeled Learning with Non-Negative Risk Estimator,” in Conference on Neural Information Processing Systems.

• Imbens, G. W. and Lancaster, T. (1996), “Efficient estimation and stratified sampling,” Journal of Econometrics, 74, 289–318.

• Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J.(2018), “Double/debiased machine learning for treatment and structural parameters,” Econometrics Journal,

21, C1–C68.

• Good, I. J. and Gaskins, R. A. (1971), “Nonparametric Roughness Penalties for Probability Densities,” Biometrika, 58, 255–277.

• Sugiyama, M., Nakajima, S., Kashima, H., von Bünau, P., and Kawanabe, M. (2007). Direct importance estimation with model selection and its application to covariate shift adaptation. In Proceedings of the

20th International Conference on Neural Information Processing Systems (NIPS'07). Curran Associates Inc., Red Hook, NY, USA, 1433–1440.

• Sugiyama, M., Suzuki, T., and Kanamori, T. (2011), “Density Ratio Matching under the Bregman Divergence: A Unified Framework of Density Ratio Estimation,” Annals of the Institute of Statistical

Mathematics, 64.— (2012), Density Ratio Estimation in Machine Learning, New York, NY, USA: Cambridge University Press, 1st ed.

• Sugiyama, M., (2016), “Introduction to Statistical Machine Learning.”

• Silverman, B. W. (1982), “On the Estimation of a Probability Density Function by the Maximum Penalized Likelihood Method,” The Annals of Statistics, 10, 795 – 810. 2

• Suzuki, T., Sugiyama, M., Sese, Jun., and Kanamori, T. (2008). Approximating mutual information by maximum likelihood density ratio estimation. In Proceedings of the Workshop on New Challenges for

Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008,volume 4 of Proceedings of Machine Learning Research, pp. 5–20. PMLR.

• Uehara, M., Sato, I., Suzuki, M., Nakayama, K., and Matsuo, Y. (2016), “Generative Adversarial Nets from a Density Ratio Estimation Perspective.”

• Tran, D., Ranganath, R., and Blei, D. M. (2017), “Hierarchical Implicit Models and Likelihood-Free Variational Inference,” in International Conference on Neural Information, Red Hook, NY, USA, p. 5529–

5539.

• Nguyen, X., Wainwright, M. J., and Jordan, M. (2008), “Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization,” in Conference on Neural Information Processing

Systems, vol. 20.

• Whitney K. Newey and James L. Powell. Instrumental variable estimation of nonparametric models. Econometrica, 71(5):1565–1578, 2003.

• Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., and Kanamori, T. (2011), “Statistical outlier detection using direct density ratio estimation,” Knowledge and Information Systems, 26, 309–336

• Lai, T. and Robbins, H. (1985), “Asymptotically efficient adaptive allocation rules,” Advances in Applied Mathematics

• Kaufmann, E., Cappe, O., and Garivier, A. (2016), “On the Complexity of Best-Arm Identification in Multi-Armed ´ Bandit Models,” Journal of Machine Learning Research, 17, 1–42

• Fan, X., Grama, I., and Liu, Q. (2013), “Cramer large deviation expansions for martingales under Bernstein’s condi- ´ tion,” Stochastic Processes and their Applications, 123, 3919–3942.

• Fan, X., Grama, I., and Liu, Q. (2014), “A generalization of Cramer large deviations for martingales,” ´ Comptes Rendus Mathematique, 352, 853– 858.

• Shimodaira, H. (2000), “Improving predictive inference under covariate shift by weighting the log-likelihood function,” Journal of statistical planning and inference, 90, 227–244.

52