Estimate intervention effects
– Need a causal graph to select the variables to be adjusted for, e.g., using the backdoor criterion (Pearl, 1995); see the sketch below
• Also useful for machine learning
– E.g., domain adaptation (Zhang et al., 2020), fairness (Kusner et al., 2017), and interpretability (Blöbaum & Shimizu, 2017)
[Figure: Messerli (2012): chocolate consumption plotted against the number of Nobel laureates per country, with GDP as a possible common cause.]
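A minimal sketch of backdoor adjustment, assuming a hypothetical data-generating process in which GDP is the only common cause of chocolate consumption and Nobel laureates (so {GDP} satisfies the backdoor criterion); the simulated direct effect of chocolate is zero:

```python
# A minimal sketch of backdoor adjustment (hypothetical numbers): GDP is the
# only common cause of chocolate and nobel, so {GDP} satisfies the backdoor
# criterion; the true direct effect of chocolate on nobel is zero.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
gdp = rng.normal(size=n)
chocolate = 0.8 * gdp + rng.normal(size=n)
nobel = 0.7 * gdp + rng.normal(size=n)          # chocolate has no effect

# Naive regression of nobel on chocolate is confounded by GDP.
naive = np.linalg.lstsq(np.c_[chocolate, np.ones(n)], nobel, rcond=None)[0][0]
# Adjusting for GDP (the backdoor set) recovers the true effect (~0).
adjusted = np.linalg.lstsq(np.c_[chocolate, gdp, np.ones(n)], nobel, rcond=None)[0][0]
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")   # e.g., 0.34 vs 0.00
```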
Use background knowledge
• Often need to use both background knowledge AND data
• Causal discovery: infer the causal graph from data
[Figure: candidate causal graphs over chocolate consumption, Nobel laureates, and GDP.]
The non-parametric approach uses conditional independence (Pearl, 2001; Spirtes et al., 1993)
– Makes no assumptions about functional forms or distributions
– Its limit is finding the Markov equivalence class of models
• Additional assumptions are needed to go beyond this limit
– Restrictions on functional forms and distributions
– Uniquely identifiable models, or smaller equivalence classes
• LiNGAM is one example (Shimizu et al., 2006; Shimizu, 2014); see the sketch below
– Non-Gaussianity assumption exploited via independence
– Growing literature on its variants (Peters et al., 2018; Shimizu & Blöbaum, 2020)
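A minimal sketch of LiNGAM estimation, assuming the Python `lingam` package (`pip install lingam`) is available; the uniform errors supply the non-Gaussianity the model needs for unique identifiability:

```python
# A minimal sketch using the Python `lingam` package (pip install lingam).
# Uniform (non-Gaussian) errors make the linear model uniquely identifiable.
import numpy as np
import lingam

rng = np.random.default_rng(0)
n = 1000
x0 = rng.uniform(-1, 1, size=n)
x1 = 1.5 * x0 + rng.uniform(-1, 1, size=n)      # x0 -> x1
x2 = 0.8 * x1 + rng.uniform(-1, 1, size=n)      # x1 -> x2
X = np.column_stack([x0, x1, x2])

model = lingam.DirectLiNGAM()
model.fit(X)
print(model.causal_order_)      # estimated causal ordering, e.g., [0, 1, 2]
print(model.adjacency_matrix_)  # estimated connection strengths
```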
and find the causal graph(s) consistent with the data
– Typical example 1:
• Directed acyclic graph (DAG)
• No hidden common causes (all variables observed)
– Typical example 2:
• DAG
• Hidden common causes may exist
[Figure: example DAG over x1, x2, x3 with error variables e1, e2, e3.]
Structural model: xᵢ = fᵢ(parents of xᵢ, eᵢ), where eᵢ is an error variable
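A minimal sketch of this structural model for typical example 1, with an illustrative graph and coefficients: each variable is generated from its parents and an independent error, following a topological order of the DAG:

```python
# A minimal sketch of the structural model with an illustrative graph
# x1 -> x3 -> x2: each variable is generated from its parents and an
# independent error variable, following a topological order of the DAG.
import numpy as np

rng = np.random.default_rng(0)
n = 500
e1, e2, e3 = rng.normal(size=(3, n))
x1 = e1                  # root: no parents
x3 = 0.5 * x1 + e3       # x3 = f3(x1, e3), here linear
x2 = -0.8 * x3 + e2      # x2 = f2(x3, e2)
```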
graph
– Directed acyclic graph
– No hidden common causes (all variables have been observed)
2. Find the graph that best matches the data among the causal graphs satisfying the assumptions (a two-variable sketch follows below).
If x and y are independent in the data, select (c).
If x and y are dependent in the data, (a) and (b) both remain; they are indistinguishable (not uniquely identifiable): a Markov equivalence class.
[Figure: three candidate graphs over x and y: (a) x → y, (b) y → x, (c) no edge.]
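A minimal sketch of step 2 for the two-variable case, with a Pearson correlation test as a stand-in for a general independence test (it detects only linear dependence):

```python
# A minimal sketch of step 2 for two variables: an independence test decides
# between graph (c) and the equivalence class {(a), (b)}.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)   # data actually from graph (a): x -> y

r, p = stats.pearsonr(x, y)
if p > 0.05:
    print("x and y look independent: select (c)")
else:
    print("x and y are dependent: (a) and (b) remain (Markov equivalent)")
```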
et al., 1995)
• Those for time series cases (Malinsky & Spirtes, 2018)
• Equivalence classes including cyclic graphs (Richardson, 1996)
• Lower bounds on intervention effects (Maathuis et al., 2009; Malinsky & Spirtes, 2017)
[Figure: example graphs with hidden variables f, f1, f2 over x, y, w, z; credit: F. Eberhardt, CRM Workshop 2016.]
information available than conditional independence
• E.g., linearity + a non-Gaussian continuous distribution (sketched below)
[Figure: graphs (a) and (b) result in different distributions of x1 and x2, with no difference in terms of their conditional independence.]
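A minimal sketch of how that extra information is used: with linear relations and non-Gaussian errors, only the causal direction yields a regression residual independent of the regressor. The `dep` score below (correlation of squares) is a crude stand-in for a proper independence measure such as HSIC:

```python
# A minimal sketch: with linear relations and non-Gaussian (uniform) errors,
# only the causal direction gives a residual independent of the regressor.
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = rng.uniform(-1, 1, n)                    # non-Gaussian cause
y = 0.8 * x + rng.uniform(-1, 1, n)          # true model: x -> y

def residual(a, b):
    slope, intercept = np.polyfit(a, b, 1)   # regress b on a
    return b - (slope * a + intercept)

def dep(a, b):
    # crude proxy for an independence test: ~0 when a and b are independent
    return abs(np.corrcoef(a ** 2, b ** 2)[0, 1])

print(dep(x, residual(x, y)))   # ~0: causal direction accepted
print(dep(y, residual(y, x)))   # clearly > 0: reverse direction rejected
```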
independence of error variables, e.g., by HSIC (Gretton et al., 2005; sketched below)
– Prediction accuracy using the Markov boundary (Biza et al., 2020)
– Compare with results on other datasets in which the causal graphs are expected to be similar
– Check against background knowledge
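A minimal sketch of the HSIC-based check, using the biased HSIC statistic with Gaussian kernels of fixed width; a real application would calibrate the statistic, e.g., with a permutation test:

```python
# A minimal sketch of an HSIC-style residual check (biased V-statistic,
# Gaussian kernels, fixed width; calibrate by permutation in practice).
import numpy as np

def rbf_gram(v, width=1.0):
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2 * width ** 2))

def hsic(a, b):
    n = len(a)
    K, L = rbf_gram(a), rbf_gram(b)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
e = rng.uniform(-1, 1, 500)           # residual under a correct model
print(hsic(x, e))                     # small: independent
print(hsic(x, 0.5 * x + e))           # larger: dependent
```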
Peters+14JMLR)
• xᵢ = fᵢ(par(xᵢ)) + eᵢ (sketched below)
• xᵢ = gᵢ⁻¹(fᵢ(par(xᵢ)) + eᵢ)
• Discrete variables
– Poisson DAG model and its extensions (Park+18JMLR)
• Mixed types of variables: LiNGAM + logistic-type model
– Identifiability condition for two variables (Wenjuan+18IJCAI)
– Probably also OK for multivariate cases, using the idea of Thm. 28 of Peters et al. (2014)
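A minimal sketch of the additive-noise idea for two variables: fit a nonlinear regression in each direction and keep the direction whose residual is independent of the input. A cubic polynomial regression and the correlation-of-squares proxy stand in for general regression and HSIC:

```python
# A minimal sketch of the additive-noise-model direction test for two
# variables, with cubic polynomial regression as the nonlinear fit.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-2, 2, n)
y = x ** 3 + 0.3 * rng.uniform(-1, 1, n)   # true model: x -> y, nonlinear

def residual(a, b):
    # residual of a nonlinear (cubic polynomial) regression of b on a
    return b - np.polyval(np.polyfit(a, b, deg=3), a)

def dep(a, b):
    # crude independence proxy: ~0 for independent a and b
    return abs(np.corrcoef(a ** 2, b ** 2)[0, 1])

print(dep(x, residual(x, y)))   # ~0: forward additive model fits
print(dep(y, residual(y, x)))   # clearly > 0: backward model rejected
```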
common causes
• For unconfounded pairs with no hidden common causes, estimate the causal directions
• For confounded pairs with hidden common causes, leave them unknown
[Figure: underlying model with hidden common cause f1 over x0 to x4, vs. the output graph in which the confounded pair is left undetermined.]
causes leads to dependence between an explanatory variable and its residual (Tashiro et al., 2014)
• Key result (Maeda & Shimizu, 2020); see the sketch below
– Find a set of variables that gives independent residuals when a variable is regressed on every subset of it
– If this succeeds, the variables in such a set (x1 and x2) are the unconfounded ancestors of the variable (x4)
• For nonlinear additive models, the existence of hidden intermediate variables also leads to dependence (Maeda & Shimizu, 2021)
[Figure: example graphs with hidden common causes f1, f2 over x1 to x4.]
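A hedged sketch of this procedure, assuming the RCD implementation that ships with the Python `lingam` package and its convention (assumed here) of reporting confounded pairs as NaN entries in the estimated adjacency matrix:

```python
# A hedged sketch using lingam.RCD (assumed available in the `lingam`
# package): x1, x2 are unconfounded ancestors of x4, while x3 and x4 share
# the hidden common cause f; the confounded pair should be left undetermined.
import numpy as np
import lingam

rng = np.random.default_rng(0)
n = 2000
f = rng.uniform(-1, 1, n)                       # hidden common cause
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
x3 = 3.0 * f + rng.uniform(-1, 1, n)            # confounded with x4 via f
x4 = 1.0 * x1 - 1.0 * x2 + 3.0 * f + rng.uniform(-1, 1, n)

model = lingam.RCD()
model.fit(np.column_stack([x1, x2, x3, x4]))
print(model.adjacency_matrix_)  # x1 -> x4 and x2 -> x4 oriented; the
                                # (x3, x4) entries expected to be NaN
```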
– Two pure measurement variables per latent factor are needed to identify the measurement model (Silva et al., 2006; Xie et al., 2020)
• Estimate the latent factors and then their causal graph
[Figure: latent factors f1, f2 measured through x1 to x4, with the causal direction between f1 and f2 unknown.]
Model: 𝒇 = B𝒇 + 𝝐, 𝒙 = G𝒇 + 𝒆
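A minimal sketch of this setup with illustrative coefficients: a causal link between two latent factors, each measured by two pure indicator variables, matching the identification condition above:

```python
# A minimal sketch of the latent-factor model f = Bf + eps, x = Gf + e,
# with two pure measurements per latent factor (illustrative coefficients).
import numpy as np

rng = np.random.default_rng(0)
n = 2000
f1 = rng.uniform(-1, 1, n)                 # latent factor 1
f2 = 0.8 * f1 + rng.uniform(-1, 1, n)      # latent causal link f1 -> f2
F = np.vstack([f1, f2])

G = np.array([[1.0, 0.0],     # x1, x2 load only on f1 (pure measurements)
              [0.9, 0.0],
              [0.0, 1.0],     # x3, x4 load only on f2
              [0.0, 0.8]])
X = (G @ F + 0.3 * rng.uniform(-1, 1, (4, n))).T   # n x 4 observed data
# Methods such as Silva et al. (2006) / Xie et al. (2020) first recover the
# measurement structure G, then estimate the causal graph among f1 and f2.
```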
for science
– Many well-developed methods are available in cases where a causal graph can be drawn with background knowledge
– Helping to draw causal graphs from data is the key: causal discovery
• LiNGAM-related papers: https://sites.google.com/view/sshimizu06/lingam/lingampapers
• Next default assumptions
– Hidden common causes / latent factors
– Mixed data: continuous and discrete
– (Cyclicity (Lacerda et al., 2008))
discovery of linear non-Gaussian acyclic models with latent confounders. In Proc. 23rd International Conference on Artificial Intelligence and Statistics (AISTATS2020), 2020.
• F. H. Messerli. Chocolate consumption, cognitive function, and Nobel laureates. New England Journal of Medicine, 2012.
• T. Rosenström, M. Jokela, S. Puttonen, M. Hintsanen, L. Pulkki-Råback, J. S. Viikari, O. T. Raitakari and L. Keltikangas-Järvinen. Pairwise measures of causal direction in the epidemiology of sleep problems and depression. PLoS ONE, 7(11): e50841, 2012.
• A. Moneta, D. Entner, P. O. Hoyer and A. Coad. Causal inference by independent component analysis: Theory and applications. Oxford Bulletin of Economics and Statistics, 75(5): 705-730, 2013.
• O. Boukrina and W. W. Graves. Neural networks underlying contributions from semantics in reading aloud. Frontiers in Human Neuroscience, 7: 518, 2013.
• P. Campomanes, M. Neri, B. A. C. Horta, U. F. Roehrig, S. Vanni, I. Tavernelli and U. Rothlisberger. Origin of the spectral shifts among the early intermediates of the rhodopsin photocycle. Journal of the American Chemical Society, 136(10): 3842-3851, 2014.
• J. Peters, D. Janzing and B. Schölkopf. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press, 2018.
• S. Shimizu and P. Blöbaum. Recent advances in semi-parametric methods for causal discovery. In Direction Dependence in Statistical Models: Methods of Analysis (W. Wiedermann, D. Kim, E. Sungur, and A. von Eye, eds.). Wiley, 2020.
P. Spirtes, C. Glymour and R. Scheines. Causation, Prediction, and Search. Springer, 1993.
• S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7: 2003-2030, 2006.
• S. Shimizu. LiNGAM: Non-Gaussian methods for estimating causal structures. Behaviormetrika, 41(1): 65-98, 2014.
• P. Spirtes, C. Meek and T. S. Richardson. Causal inference in the presence of latent variables and selection bias. In Proc. 11th Conf. on Uncertainty in Artificial Intelligence (UAI1995), 1995.
• D. Malinsky and P. Spirtes. Causal structure learning from multivariate time series in settings with unmeasured confounding. In Proc. 2018 ACM SIGKDD Workshop on Causal Discovery (KDD-CD), 2018.
• T. S. Richardson. A discovery algorithm for directed cyclic graphs. In Proc. 12th Conf. on Uncertainty in Artificial Intelligence (UAI1996), 1996.
causal effects in high-dimensional and possibly confounded systems. International Journal of Approximate Reasoning, 2017.
• S. Shimizu, T. Inazumi, Y. Sogawa, A. Hyvärinen, Y. Kawahara, T. Washio, P. O. Hoyer and K. Bollen. DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12(Apr): 1225-1248, 2011.
• A. Hyvärinen and S. M. Smith. Pairwise likelihood ratios for estimation of non-Gaussian structural equation models. Journal of Machine Learning Research, 14(Jan): 111-152, 2013.
• A. Hyvärinen. New approximations of differential entropy for independent component analysis and projection pursuit. In Advances in Neural Information Processing Systems 12 (NIPS1999), 1999.
• P. O. Hoyer, D. Janzing, J. Mooij, J. Peters and B. Schölkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 21 (NIPS2008), pp. 689-696, 2009.
• K. Zhang and A. Hyvärinen. Distinguishing causes from effects using nonlinear acyclic causal models. In JMLR Workshop and Conference Proceedings, Causality: Objectives and Assessment (Proc. NIPS2008 workshop on causality), 6: 157-164, 2010.
• J. Peters, J. Mooij, D. Janzing and B. Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15: 2009-2053, 2014.
O. Hoyer. Discovering cyclic causal models by independent components analysis. In Proc. 24th Conf. on Uncertainty in Artificial Intelligence (UAI2008), pp. 366-374, Helsinki, Finland, 2008.
• P. O. Hoyer, S. Shimizu, A. Kerminen and M. Palviainen. Estimation of causal effects using linear non-Gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2): 362-378, 2008.
• S. Salehkaleybar, A. Ghassami, N. Kiyavash and K. Zhang. Learning linear non-Gaussian causal models in the presence of latent variables. Journal of Machine Learning Research, 21: 1-24, 2020.
• S. Shimizu, P. O. Hoyer and A. Hyvärinen. Estimation of linear non-Gaussian acyclic models for latent factors. Neurocomputing, 72: 2024-2027, 2009.
• Y. Zeng, S. Shimizu, R. Cai, F. Xie, M. Yamamoto and Z. Hao. Causal discovery with multi-domain LiNGAM for latent factors. In Proc. IJCAI2021, 2021.
• X. Zheng, B. Aragam, P. K. Ravikumar and E. P. Xing. DAGs with NO TEARS: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018), 2018.
• J. D. Ramsey, S. J. Hanson and C. Glymour. Multi-subject search correctly identifies causal connections and most causal directions in the DCM models of the Smith et al. simulation study. NeuroImage, 58(3): 838-848, 2011.
• S. Shimizu. Joint estimation of linear non-Gaussian acyclic models. Neurocomputing, 81: 104-107, 2012.
Causal structure discovery with application to prescriptive pricing. In Proc. 27th International Joint Conference on Artificial Intelligence (IJCAI2018), pp. xx--xx, Stockholm, Sweden, 2018.
• Y. Komatsu, S. Shimizu and H. Shimodaira. Assessing statistical reliability of LiNGAM via multiscale bootstrap. In Proc. International Conference on Artificial Neural Networks (ICANN2010), pp. 309-314, Thessaloniki, Greece, 2010.
• K. Biza, I. Tsamardinos and S. Triantafillou. Tuning causal discovery algorithms. In Proc. Probabilistic Graphical Models (PGM2020), 2020.
• R. Silva, R. Scheines, C. Glymour and P. Spirtes. Learning the structure of linear latent variable models. Journal of Machine Learning Research, 7: 191-246, 2006.
• F. Xie, R. Cai, B. Huang, C. Glymour, Z. Hao and K. Zhang. Generalized independent noise condition for estimating latent variable causal graphs. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020.
• K. Zhang, M. Gong, P. Stojanov, B. Huang, Q. Liu and C. Glymour. Domain adaptation as a problem of inference on graphical models. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020.
• M. J. Kusner, J. Loftus, C. Russell and R. Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017.
• P. Blöbaum and S. Shimizu. Estimation of interventional effects of features on prediction. In Proc. 2017 IEEE International Workshop on Machine Learning for Signal Processing (MLSP2017), pp. xx--xx, Tokyo, Japan, 2017.