Slide 1

LiNGAM approach to causal discovery
Shohei Shimizu
Shiga University & RIKEN
The KDD2021 Workshop on Causal Discovery (CD2021)

Slide 2

What is causal discovery?
• Methodology for inferring causal graphs from data
[Figure: Data → causal graph, inferred under assumptions: Functional form? Distribution? Hidden common cause present? Acyclic? etc. (Maeda & Shimizu, 2020)]

Slide 3

Causal graphs are the key to statistical causal inference
• Estimate intervention effects
  – Need the causal graph to select the variables to be adjusted for, e.g., using the backdoor criterion (Pearl, 1995)
• Also useful for machine learning
  – E.g., domain adaptation (Zhang et al., 2020), fairness (Kusner et al., 2017), and interpretability (Blöbaum & Shimizu, 2017)
[Figure: Chocolate consumption vs. number of Nobel laureates (Messerli, 2012), with GDP as a possible common cause]

Slide 4

How do we draw a causal graph?
• Common way: use background knowledge
• Often we need to use both background knowledge AND data
• Causal discovery: infer the causal graph from data
[Figure: Which causal graph over Chocolate, Nobel laureates, and GDP is the right one?]

Slide 5

Application areas
https://sites.google.com/view/sshimizu06/lingam/lingampapers/applications-and-tailor-made-methods
• Epidemiology: sleep problems and depressive mood (Rosenström et al., 2012)
• Economics: firm growth indicators (employment, sales, R&D, operating income) over time (Moneta et al., 2013)
• Neuroscience (Boukrina & Graves, 2013)
• Chemistry (Campomanes et al., 2014)
• Preventive medicine (Kotoku et al., 2020)
• Climatology (Liu & Niyogi, 2020)

Slide 6

Causal discovery is a challenge in causal inference
• The classical non-parametric approach uses conditional independence (Pearl, 2001; Spirtes et al., 1993)
  – Makes no assumptions about functional forms or distributions
  – Its limit is identification up to the Markov equivalence class
• Additional assumptions are needed to go beyond this limit
  – Restrictions on functional forms and distributions
  – Uniquely identifiable models, or smaller sets of equivalent models
• LiNGAM is one example (Shimizu et al., 2006; Shimizu, 2014)
  – A non-Gaussianity assumption is used to exploit independence
  – Growing literature on its variants (Peters et al., 2018; Shimizu & Blöbaum, 2020)

Slide 9

Methods of causal discovery

Slide 10

Framework
• Structural causal model (Pearl, 2001): $x_i = f_i(\text{parents of } x_i,\ e_i)$, where $e_i$ is an error variable
• Make assumptions and find the causal graph(s) consistent with the data
  – Typical example 1:
    • Directed acyclic graph (DAG)
    • No hidden common causes (all relevant variables observed)
  – Typical example 2:
    • DAG
    • Hidden common causes may exist
[Figure: A DAG over x1, x2, x3 with error variables e1, e2, e3]
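
To make the framework concrete, here is a minimal simulation sketch (not from the slides): it generates data from a small structural causal model $x_i = f_i(\text{parents of } x_i, e_i)$ for a three-variable DAG. The functional forms, coefficients, and error distributions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000

# Independent error variables (uniform here, i.e., non-Gaussian; the framework itself
# does not prescribe a particular distribution)
e1, e2, e3 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)

# Structural equations x_i = f_i(parents of x_i, e_i), following the order x1, x2, x3
x1 = e1
x2 = 1.5 * x1 + e2               # x1 -> x2
x3 = -0.8 * x1 + 0.5 * x2 + e3   # x1 -> x3 and x2 -> x3

X = np.column_stack([x1, x2, x3])  # the "observed" dataset generated by this SCM
print(X.shape)                     # (10000, 3)
```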

Slide 11

Non-parametric approach
To what extent can we infer the causal graph without making any assumptions about the functional form or distribution?
(Spirtes, Glymour & Scheines, 2001, 2nd ed.)

Slide 12

Non-parametric approach: Example
1. Make assumptions on the underlying causal graph:
   – Directed acyclic graph
   – No hidden common causes (all relevant variables observed)
2. Find the graph that best matches the data among the causal graphs satisfying these assumptions.
Three candidates: (a) x → y, (b) x ← y, (c) x and y unconnected
• If x and y are independent in the data, select (c).
• If x and y are dependent in the data, select (a) and (b).
• (a) and (b) are indistinguishable (not uniquely identifiable): they form a Markov equivalence class.

Slide 14

Various extensions
• Equivalent models including unobserved common causes (Spirtes et al., 1995)
• Extensions for time series (Malinsky & Spirtes, 2018)
• Equivalence classes including cyclic graphs (Richardson, 1996)
• Lower bounds on intervention effects (Maathuis et al., 2009; Malinsky & Spirtes, 2017)
[Figure: Example graphs over x, y, z, w with hidden variables f, f1, f2 (F. Eberhardt, CRM Workshop 2016)]

Slide 15

Semi-parametric approach: make additional assumptions on functional forms and distributions
What assumptions make causal graphs identifiable?

Slide 16

Make additional assumptions on functional forms and distributions
• More information becomes available than conditional independence alone
• E.g., linearity + non-Gaussian continuous distributions
[Figure: (a) x → y and (b) y → x result in different joint distributions of x and y, although there is no difference in terms of their conditional independence]

Slide 17

LiNGAM model is identifiable (Shimizu, Hyvärinen, Hoyer & Kerminen, 2006)
• Linear Non-Gaussian Acyclic Model:
  $x_i = \sum_{k(j) < k(i)} b_{ij} x_j + e_i$, or in matrix form, $\boldsymbol{x} = B\boldsymbol{x} + \boldsymbol{e}$
  – $k(i)$ $(i = 1, \dots, p)$: causal (topological) order of $x_i$
  – Error variables $e_i$: independent and non-Gaussian
• Coefficients and causal orders are identifiable
• The causal graph is identifiable
[Figure: Causal graph over x1, x2, x3 with coefficients b21, b31, b32 and error variables e1, e2, e3]
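
A minimal sketch (illustrative data generation, not the estimation method) of the matrix form $\boldsymbol{x} = B\boldsymbol{x} + \boldsymbol{e}$: choose a strictly lower-triangular $B$ consistent with a causal order, draw independent non-Gaussian errors, and solve the reduced form $\boldsymbol{x} = (I - B)^{-1}\boldsymbol{e}$. The coefficient values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5000, 3

# Hypothetical coefficient matrix B: entry (i, j) is the effect of x_j on x_i.
# Zeros on and above the diagonal encode the causal order x1, x2, x3.
B = np.array([[0.0,  0.0, 0.0],
              [0.8,  0.0, 0.0],
              [0.3, -0.6, 0.0]])

# Independent, non-Gaussian errors (uniform); Gaussian errors would break identifiability
E = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, p))

# Reduced form of x = Bx + e:  x = (I - B)^{-1} e   (rows of X are observations)
X = E @ np.linalg.inv(np.eye(p) - B).T
print(X.shape)  # (5000, 3)
```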

Slide 18

How do we use non-Gaussianity and independence?
Underlying model: $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$ $(b_{21} \neq 0)$, with $e_1$, $e_2$ non-Gaussian.
• Regress the effect $x_2$ on the cause $x_1$:
  $r_2^{(1)} = x_2 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_1)} x_1 = x_2 - b_{21} x_1 = e_2$
  → $x_1 = e_1$ and $r_2^{(1)}$ are independent
• Regress the cause $x_1$ on the effect $x_2$:
  $r_1^{(2)} = x_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} x_2 = \left(1 - \frac{b_{21}\,\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)}\right) e_1 - \frac{b_{21}\,\mathrm{var}(x_1)}{\mathrm{var}(x_2)} e_2$
  → $x_2 = b_{21} e_1 + e_2$ and $r_1^{(2)}$ are dependent, although they are uncorrelated
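
A small numerical illustration of this asymmetry (a sketch, not from the slides): in both regression directions the residual is uncorrelated with the regressor, but only in the anti-causal direction is it dependent. A crude correlation between squared variables stands in for a proper independence test.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100000

# Underlying model: x1 = e1, x2 = b21 * x1 + e2, with non-Gaussian (uniform) errors
b21 = 0.8
e1, e2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
x1, x2 = e1, b21 * e1 + e2

def ols_residual(y, x):
    """Residual of the simple least-squares regression of y on x (with intercept)."""
    slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    return y - slope * x - (y.mean() - slope * x.mean())

r2_1 = ols_residual(x2, x1)   # regress the effect x2 on the cause x1
r1_2 = ols_residual(x1, x2)   # regress the cause x1 on the effect x2

corr = lambda a, b: np.corrcoef(a, b)[0, 1]

# Both residuals are (nearly) uncorrelated with their regressors ...
print(corr(x1, r2_1), corr(x2, r1_2))
# ... but only the anti-causal residual is *dependent* on its regressor.
# Crude check: correlation between squared regressor and squared residual.
print(corr(x1 ** 2, r2_1 ** 2))   # close to 0 (independent)
print(corr(x2 ** 2, r1_2 ** 2))   # clearly nonzero (dependent but uncorrelated)
```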

Slide 19

Independence measure (Hyvärinen & Smith, 2013)
• The difference between the mutual informations of the explanatory variable and its residual for the two directions can be computed from one-dimensional entropies:
  $I\left(x_1, r_2^{(1)}\right) - I\left(x_2, r_1^{(2)}\right) = \left[ H(x_1) + H\!\left(\frac{r_2^{(1)}}{\mathrm{sd}\left(r_2^{(1)}\right)}\right) \right] - \left[ H(x_2) + H\!\left(\frac{r_1^{(2)}}{\mathrm{sd}\left(r_1^{(2)}\right)}\right) \right]$
• Maximum-entropy approximation of the entropy $H$ (Hyvärinen, 1999):
  $H(u) \approx H(\nu) - k_1 \left[ E\{\log \cosh u\} - \gamma \right]^2 - k_2 \left[ E\{u \exp(-u^2/2)\} \right]^2$, where $\nu$ is a standardized Gaussian variable
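
A sketch of this pairwise measure following the slide's formula. The numerical constants $k_1$, $k_2$, $\gamma$ below are the commonly used values for Hyvärinen's maximum-entropy approximation and should be treated as assumptions of this sketch; negative values of the measure favor $x_1 \to x_2$, because the residual is then closer to independent of its regressor in that direction.

```python
import numpy as np

def maxent_entropy(u):
    """Maximum-entropy approximation of the differential entropy of a standardized
    variable (Hyvarinen, 1999). k1, k2, gamma are the commonly used published
    values; treat them as assumptions of this sketch."""
    k1, k2, gamma = 79.047, 7.4129, 0.37457
    u = (u - u.mean()) / u.std()
    h_gauss = 0.5 * (1.0 + np.log(2.0 * np.pi))   # entropy of a standard Gaussian
    return (h_gauss
            - k1 * (np.mean(np.log(np.cosh(u))) - gamma) ** 2
            - k2 * np.mean(u * np.exp(-u ** 2 / 2)) ** 2)

def ols_residual(y, x):
    slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    return y - slope * x - (y.mean() - slope * x.mean())

def pairwise_measure(x1, x2):
    """Approximate I(x1, r2^(1)) - I(x2, r1^(2)) from one-dimensional entropies.
    Negative values favor x1 -> x2; positive values favor x2 -> x1."""
    r2_1, r1_2 = ols_residual(x2, x1), ols_residual(x1, x2)
    return (maxent_entropy(x1) + maxent_entropy(r2_1)
            - maxent_entropy(x2) - maxent_entropy(r1_2))

# Example on two-variable LiNGAM data whose true direction is x1 -> x2
rng = np.random.default_rng(3)
e1, e2 = rng.uniform(-1, 1, 100000), rng.uniform(-1, 1, 100000)
x1, x2 = e1, 0.8 * e1 + e2
print(pairwise_measure(x1, x2))   # expected to be negative here, favoring x1 -> x2
```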

Slide 20

Evaluation of estimated causal graphs

Slide 21

Before estimating causal graphs
• Assess the assumptions by:
  – Gaussianity tests
  – Histograms (are the variables continuous?)
  – Checking for very high correlations (multicollinearity?)
  – Background knowledge

Slide 22

After estimating causal graphs
• Assess the assumptions by:
  – Testing the independence of error variables, e.g., with HSIC (Gretton et al., 2005); a toy sketch follows below
  – Prediction accuracy using the Markov boundary (Biza et al., 2020)
  – Comparing with results on other datasets in which the causal graphs are expected to be similar
  – Checking against background knowledge
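
For the independence check of error variables, below is a minimal from-scratch sketch of the (biased) empirical HSIC statistic with Gaussian kernels. It is only a toy version of the test of Gretton et al. (2005), without a proper significance threshold, and the median-heuristic bandwidth is an assumed choice; use a tested implementation for real analyses.

```python
import numpy as np

def rbf_gram(v, sigma=None):
    """Gaussian-kernel Gram matrix of a 1-D sample; bandwidth by a median heuristic."""
    v = v.reshape(-1, 1)
    d2 = (v - v.T) ** 2
    if sigma is None:
        med = np.median(d2[d2 > 0])
        sigma = np.sqrt(0.5 * med) if med > 0 else 1.0
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(a, b):
    """Biased empirical HSIC statistic: values near 0 suggest independence."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    K, L = rbf_gram(a), rbf_gram(b)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Example: check whether a regression residual behaves like an independent error
rng = np.random.default_rng(4)
e1, e2 = rng.uniform(-1, 1, 500), rng.uniform(-1, 1, 500)
x1, x2 = e1, 0.8 * e1 + e2
slope = np.cov(x1, x2, bias=True)[0, 1] / np.var(x1)
resid = x2 - slope * x1                      # estimated error of x2 given its parent x1
print(hsic(x1, resid))                       # small: consistent with independence
print(hsic(x2, x1 - np.cov(x2, x1, bias=True)[0, 1] / np.var(x2) * x2))  # typically larger
```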

Slide 23

Statistical reliability assessment
• Bootstrap probability (BP) of directed paths and edges
• Interpret only the causal effects whose BP is larger than a threshold, say 5%
[Figure: Bootstrapped graphs over x0, x1, x2, x3 with edge/path probabilities such as 99%, 96%, and 10%, and an estimated total effect of 20.9]
• LiNGAM Python package: https://github.com/cdt15/lingam
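
The slide points to the cdt15/lingam Python package for these bootstrap probabilities. The sketch below assumes that package's DirectLiNGAM estimator and its bootstrap interface (n_sampling, get_probabilities); check the repository documentation for the exact API before relying on it.

```python
# pip install lingam
import numpy as np
import lingam  # assumed: the cdt15/lingam package referenced on the slide

# Simulated three-variable LiNGAM data (illustrative coefficients)
rng = np.random.default_rng(5)
n = 1000
e = rng.uniform(-1, 1, (n, 3))
x0 = e[:, 0]
x1 = 1.5 * x0 + e[:, 1]
x2 = -0.8 * x0 + 0.5 * x1 + e[:, 2]
X = np.column_stack([x0, x1, x2])

model = lingam.DirectLiNGAM()
result = model.bootstrap(X, n_sampling=100)   # resample and re-estimate 100 times

# Matrix of bootstrap probabilities of directed edges (orientation follows the
# package's adjacency-matrix convention); interpret only edges/paths whose
# probability exceeds the chosen threshold.
print(result.get_probabilities(min_causal_effect=0.01))
```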

Slide 24

To relax the model assumptions

Slide 25

Other identifiable models
• Nonlinearity + "additive" noise (Hoyer+08NIPS; Zhang+09UAI; Peters+14JMLR); see the sketch after this list
  – Additive noise model: $x_i = f_i(\mathrm{par}(x_i)) + e_i$
  – Post-nonlinear model: $x_i = g_i^{-1}(f_i(\mathrm{par}(x_i)) + e_i)$
• Discrete variables
  – Poisson DAG model and its extensions (Park+18JMLR)
• Mixed types of variables: LiNGAM + logistic-type model
  – Identifiability condition for two variables (Wenjuan+18IJCAI)
  – Probably also holds for multivariate cases using the idea of Thm. 28 of Peters et al. (2014)
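
For the bivariate additive noise model, the standard recipe is to fit a nonlinear regression in each direction and prefer the direction whose residual looks independent of the regressor. A minimal sketch follows; the polynomial regression and the squared-correlation dependence proxy are illustrative simplifications (in practice one would use a flexible regressor and an HSIC-type independence test).

```python
import numpy as np

def anm_dependence_score(cause, effect, deg=7):
    """Fit a polynomial regression effect ~ f(cause) and return a crude proxy for
    dependence between the regressor and the residual; smaller suggests the
    additive-noise model fits better in this direction."""
    coefs = np.polyfit(cause, effect, deg)
    resid = effect - np.polyval(coefs, cause)
    return abs(np.corrcoef(cause ** 2, resid ** 2)[0, 1])

# Simulated additive-noise data (illustrative choice): true direction x -> y
rng = np.random.default_rng(6)
x = rng.uniform(-1, 1, 5000)
y = x ** 3 + rng.uniform(-0.3, 0.3, 5000)

score_xy = anm_dependence_score(x, y)  # residual should look independent of x
score_yx = anm_dependence_score(y, x)  # residual typically depends on y
print(score_xy, score_yx)
print("inferred direction:", "x -> y" if score_xy < score_yx else "y -> x")
```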

Slide 28

For better statistical reliability

Slide 29

For better statistical reliability
• Use background knowledge in the estimation (see the sketch below), e.g.:
  – Known causal orders
  – Specified functional forms
  – Specified distributions
• E.g., in manufacturing, the causal order of these three groups is often known:
  – Manufacturing conditions
  – Intermediate characteristics
  – Final characteristic(s)
[Figure: Manufacturing conditions → intermediate characteristics → final characteristic]
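
One common way to encode known causal orders is a prior-knowledge matrix. This sketch assumes the cdt15/lingam package's prior_knowledge option for DirectLiNGAM and the convention that entry [i, j] = 0 forbids a directed path from x_j to x_i while -1 means "unknown"; both the option and the convention are assumptions to verify against the package documentation.

```python
import numpy as np
import lingam  # assumed: the cdt15/lingam package

# Tiers known from background knowledge (column indices of the data matrix):
# manufacturing conditions -> intermediate characteristics -> final characteristic
tiers = [[0, 1],        # manufacturing conditions
         [2, 3, 4],     # intermediate characteristics
         [5]]           # final characteristic

p = sum(len(t) for t in tiers)
pk = -np.ones((p, p))   # -1: no prior knowledge

# Forbid directed paths from a later tier back to an earlier tier.
# Assumed convention: pk[i, j] = 0 means "x_j has no directed path to x_i".
for a, earlier in enumerate(tiers):
    for later in tiers[a + 1:]:
        for i in earlier:
            for j in later:
                pk[i, j] = 0

model = lingam.DirectLiNGAM(prior_knowledge=pk)
# model.fit(X)  # X: (n_samples, 6) data matrix ordered consistently with the tiers
```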

Slide 30

For better statistical reliability
• Simultaneously analyze different datasets to exploit their similarity (Ramsey et al., 2011; Shimizu, 2012)
  – Similarity: the causal orders are the same; the distributions and coefficients may differ
  – Accuracy was greatly improved on simulated fMRI data (Ramsey et al., 2011)
[Figure: Two datasets sharing the same causal graph over x1, x2, x3 but with different coefficients]

Slide 31

LiNGAM with hidden common causes

Slide 32

Estimate causal structures of variables that do not share hidden common causes
• For unconfounded pairs with no hidden common causes, estimate the causal directions
• For confounded pairs with hidden common causes, leave the directions unknown
[Figure: Underlying model over x1, x2, x3, x4 with hidden common causes f1, f2, and the corresponding output graph]

Slide 33

Non-Gaussianity and independence work again
• The existence of hidden common causes leads to dependence between an explanatory variable and its residual (Tashiro et al., 2014)
• Key result (Maeda & Shimizu, 2020), illustrated in the sketch below:
  – Find a set of variables that gives independent residuals when a variable is regressed on every subset of that set
  – If this succeeds, the variables in such a set (x1 and x2) are the unconfounded ancestors of the variable (x4)
• For nonlinear additive models, the existence of hidden intermediate variables also leads to dependence (Maeda & Shimizu, 2021)
[Figure: Graphs over x1, x2, x3, x4 with hidden common causes f1, f2]
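
A minimal sketch of this key check (not the full RCD algorithm of Maeda & Shimizu, 2020): for a candidate ancestor set of a variable, regress the variable on every non-empty subset of the set and require the residual to look independent of the regressors. A crude dependence proxy stands in for a proper independence test such as HSIC.

```python
import numpy as np
from itertools import combinations

def dependence_proxy(Z, r):
    """Crude stand-in for an independence test (HSIC would be used in practice)."""
    return max(abs(np.corrcoef(Z[:, k] ** 2, r ** 2)[0, 1]) for k in range(Z.shape[1]))

def residual(y, Z):
    """Residual of regressing y on the columns of Z (with intercept)."""
    A = np.column_stack([np.ones(len(y)), Z])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

def looks_unconfounded(y, X, candidates, threshold=0.05):
    """True if, for every non-empty subset of `candidates`, the residual of y
    looks independent of the regressors; a confounded candidate would fail."""
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            Z = X[:, list(subset)]
            if dependence_proxy(Z, residual(y, Z)) > threshold:
                return False
    return True

# Example: x1 and x2 are unconfounded ancestors of x4 in this simulated model
rng = np.random.default_rng(7)
e = rng.uniform(-1, 1, (5000, 3))
x1, x2 = e[:, 0], e[:, 1]
x4 = 0.7 * x1 - 0.9 * x2 + e[:, 2]
print(looks_unconfounded(x4, np.column_stack([x1, x2]), [0, 1]))  # True
```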

Slide 37

Estimate causal structures of variables that share hidden common causes (Hoyer, Shimizu, Kerminen & Palviainen, 2008; Salehkaleybar et al., 2020)
• LiNGAM with unobserved common causes is ICA (Hyvärinen et al., 2001):
  $\boldsymbol{x} = B\boldsymbol{x} + \Lambda\boldsymbol{f} + \boldsymbol{e}$, so $\boldsymbol{x} = \left[\, (I-B)^{-1} \;\; (I-B)^{-1}\Lambda \,\right] \begin{bmatrix} \boldsymbol{e} \\ \boldsymbol{f} \end{bmatrix}$, where $\boldsymbol{e}$ and $\boldsymbol{f}$ are the independent components
• Apply ICA and look at the zero/non-zero pattern of the mixing matrix:
  – $x_1 \to x_2$ with hidden common cause $f_1$: $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \lambda_{11} \\ b_{21} & 1 & \lambda_{21} + b_{21}\lambda_{11} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$
  – $x_2 \to x_1$ with hidden common cause $f_1$: $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & b_{12} & \lambda_{11} + b_{12}\lambda_{21} \\ 0 & 1 & \lambda_{21} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$
  – No directed edge, only the hidden common cause $f_1$: $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \lambda_{11} \\ 0 & 1 & \lambda_{21} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$
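
A rough illustration of "apply ICA and look at the zero/non-zero pattern" using scikit-learn's FastICA. Note that the hidden-confounder case above has more independent components than observed variables and therefore needs an overcomplete ICA variant, which standard FastICA does not provide; the sketch below uses the confounder-free square case only to show the mechanics, and the recovered pattern matches $(I-B)^{-1}$ only up to permutation, scaling, and sign of the components.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(8)
n, p = 20000, 3

# Confounder-free LiNGAM: x = Bx + e  =>  x = (I - B)^{-1} e
B = np.array([[0.0,  0.0, 0.0],
              [0.8,  0.0, 0.0],
              [0.0, -0.7, 0.0]])
E = rng.uniform(-1, 1, (n, p))
X = E @ np.linalg.inv(np.eye(p) - B).T

ica = FastICA(n_components=p, random_state=0)
ica.fit(X)

# Estimated mixing matrix; its zero/non-zero pattern corresponds to (I - B)^{-1}
# only after resolving the permutation, scaling, and sign indeterminacies of ICA.
print(np.round(ica.mixing_, 2))
print(np.round(np.linalg.inv(np.eye(p) - B), 2))
```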

Slide 38

LiNGAM for latent factors

Slide 39

LiNGAM for latent factors (Shimizu et al., 2009)
• Model: $\boldsymbol{f} = B\boldsymbol{f} + \boldsymbol{\epsilon}$, $\boldsymbol{x} = G\boldsymbol{f} + \boldsymbol{e}$
  – Two pure measurement variables per latent factor are needed to identify the measurement model (Silva et al., 2006; Xie et al., 2020)
• Estimate the latent factors and then their causal graph
[Figure: Latent factors f1, f2 with measurement variables x1, x2, x3, x4; the causal direction between f1 and f2 is the target]

Slide 40

Find common and unique factors across multiple datasets (Zeng et al., 2021)
• Model: $\boldsymbol{f}^{(m)} = B^{(m)}\boldsymbol{f}^{(m)} + \boldsymbol{\epsilon}^{(m)}$, $\boldsymbol{x}^{(m)} = G^{(m)}\boldsymbol{f}^{(m)} + \boldsymbol{e}^{(m)}$, $m = 1, \dots, M$
• Score function: likelihood + DAGness penalty (Zheng et al., 2018)
• Feature extraction across multiple datasets + causal discovery of the latent factors
[Figure: Latent factors shared across datasets and factors unique to each dataset]

Slide 41

Final summary

Slide 42

Final summary
• Statistical causal inference is a fundamental tool for science
  – Many well-developed methods are available for cases where a causal graph can be drawn from background knowledge
  – Helping to draw causal graphs with data is the key: causal discovery
• LiNGAM-related papers: https://sites.google.com/view/sshimizu06/lingam/lingampapers
• Next default assumptions
  – Hidden common causes / latent factors
  – Mixed data: continuous and discrete
  – (Cyclicity (Lacerda et al., 2008))

Slide 43

References
• T. N. Maeda and S. Shimizu. RCD: Repetitive causal discovery of linear non-Gaussian acyclic models with latent confounders. In Proc. 23rd International Conference on Artificial Intelligence and Statistics (AISTATS2020), 2020.
• F. H. Messerli. Chocolate Consumption, Cognitive Function, and Nobel Laureates. New England Journal of Medicine, 2012.
• T. Rosenström, M. Jokela, S. Puttonen, M. Hintsanen, L. Pulkki-Råback, J. S. Viikari, O. T. Raitakari and L. Keltikangas-Järvinen. Pairwise measures of causal direction in the epidemiology of sleep problems and depression. PLoS ONE, 7(11): e50841, 2012.
• A. Moneta, D. Entner, P. O. Hoyer and A. Coad. Causal inference by independent component analysis: Theory and applications. Oxford Bulletin of Economics and Statistics, 75(5): 705-730, 2013.
• O. Boukrina and W. W. Graves. Neural networks underlying contributions from semantics in reading aloud. Frontiers in Human Neuroscience, 7:518, 2013.
• P. Campomanes, M. Neri, B. A. C. Horta, U. F. Roehrig, S. Vanni, I. Tavernelli and U. Rothlisberger. Origin of the spectral shifts among the early intermediates of the rhodopsin photocycle. Journal of the American Chemical Society, 136(10): 3842-3851, 2014.
• J. Peters, D. Janzing and B. Schölkopf. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press, 2018.
• S. Shimizu and P. Blöbaum. Recent advances in semi-parametric methods for causal discovery. In Direction Dependence in Statistical Models: Methods of Analysis (W. Wiedermann, D. Kim, E. Sungur, and A. von Eye, eds.). Wiley, 2020.

Slide 44

References
• J. Pearl. Causality. Cambridge University Press, 2001.
• P. Spirtes, C. Glymour and R. Scheines. Causation, Prediction, and Search. Springer, 1993.
• S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7: 2003--2030, 2006.
• S. Shimizu. LiNGAM: Non-Gaussian methods for estimating causal structures. Behaviormetrika, 41(1): 65--98, 2014.
• P. Spirtes, C. Meek and T. S. Richardson. Causal Inference in the Presence of Latent Variables and Selection Bias. In Proc. 11th Conference on Uncertainty in Artificial Intelligence (UAI1995), 1995.
• D. Malinsky and P. Spirtes. Causal Structure Learning from Multivariate Time Series in Settings with Unmeasured Confounding. In Proc. 2018 ACM SIGKDD Workshop on Causal Discovery (KDD-CD), 2018.
• T. S. Richardson. A Discovery Algorithm for Directed Cyclic Graphs. In Proc. 12th Conference on Uncertainty in Artificial Intelligence (UAI1996), 1996.

Slide 45

References
• D. Malinsky and P. Spirtes. Estimating bounds on causal effects in high-dimensional and possibly confounded systems. International Journal of Approximate Reasoning, 2017.
• S. Shimizu, T. Inazumi, Y. Sogawa, A. Hyvärinen, Y. Kawahara, T. Washio, P. O. Hoyer and K. Bollen. DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12(Apr): 1225--1248, 2011.
• A. Hyvärinen and S. M. Smith. Pairwise likelihood ratios for estimation of non-Gaussian structural equation models. Journal of Machine Learning Research, 14(Jan): 111--152, 2013.
• A. Hyvärinen. New approximations of differential entropy for independent component analysis and projection pursuit. In Advances in Neural Information Processing Systems 12 (NIPS1999), 1999.
• P. O. Hoyer, D. Janzing, J. Mooij, J. Peters and B. Schölkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 21 (NIPS2008), pp. 689-696, 2009.
• K. Zhang and A. Hyvärinen. Distinguishing causes from effects using nonlinear acyclic causal models. In JMLR Workshop and Conference Proceedings, Causality: Objectives and Assessment (Proc. NIPS2008 workshop on causality), 6: 157-164, 2010.
• J. Peters, J. Mooij, D. Janzing and B. Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15: 2009--2053, 2014.

Slide 46

References
• G. Lacerda, P. Spirtes, J. Ramsey and P. O. Hoyer. Discovering cyclic causal models by independent components analysis. In Proc. 24th Conference on Uncertainty in Artificial Intelligence (UAI2008), pp. 366-374, Helsinki, Finland, 2008.
• P. O. Hoyer, S. Shimizu, A. Kerminen and M. Palviainen. Estimation of causal effects using linear non-Gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2): 362-378, 2008.
• S. Salehkaleybar, A. Ghassami, N. Kiyavash and K. Zhang. Learning linear non-Gaussian causal models in the presence of latent variables. Journal of Machine Learning Research, 21: 1-24, 2020.
• S. Shimizu, P. O. Hoyer and A. Hyvärinen. Estimation of linear non-Gaussian acyclic models for latent factors. Neurocomputing, 72: 2024-2027, 2009.
• Y. Zeng, S. Shimizu, R. Cai, F. Xie, M. Yamamoto and Z. Hao. Causal discovery with multi-domain LiNGAM for latent factors. In Proc. 30th International Joint Conference on Artificial Intelligence (IJCAI2021), 2021.
• X. Zheng, B. Aragam, P. K. Ravikumar and E. P. Xing. DAGs with NO TEARS: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018), 2018.
• J. D. Ramsey, S. J. Hanson and C. Glymour. Multi-subject search correctly identifies causal connections and most causal directions in the DCM models of the Smith et al. simulation study. NeuroImage, 58(3): 838--848, 2011.
• S. Shimizu. Joint estimation of linear non-Gaussian acyclic models. Neurocomputing, 81: 104-107, 2012.

Slide 47

References
• W. Wenjuan, F. Lu and L. Chunchen. Mixed causal structure discovery with application to prescriptive pricing. In Proc. 27th International Joint Conference on Artificial Intelligence (IJCAI2018), pp. xx--xx, Stockholm, Sweden, 2018.
• Y. Komatsu, S. Shimizu and H. Shimodaira. Assessing statistical reliability of LiNGAM via multiscale bootstrap. In Proc. International Conference on Artificial Neural Networks (ICANN2010), pp. 309-314, Thessaloniki, Greece, 2010.
• K. Biza, I. Tsamardinos and S. Triantafillou. Tuning causal discovery algorithms. In Proc. Probabilistic Graphical Models (PGM2020), 2020.
• R. Silva, R. Scheines, C. Glymour and P. Spirtes. Learning the structure of linear latent variable models. Journal of Machine Learning Research, 7: 191-246, 2006.
• F. Xie, R. Cai, B. Huang, C. Glymour, Z. Hao and K. Zhang. Generalized independent noise condition for estimating latent variable causal graphs. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020.
• K. Zhang, M. Gong, P. Stojanov, B. Huang, Q. Liu and C. Glymour. Domain adaptation as a problem of inference on graphical models. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020.
• M. J. Kusner, J. Loftus, C. Russell and R. Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017.
• P. Blöbaum and S. Shimizu. Estimation of interventional effects of features on prediction. In Proc. 2017 IEEE International Workshop on Machine Learning for Signal Processing (MLSP2017), pp. xx--xx, Tokyo, Japan, 2017.