LiNGAM approach to causal discovery (Preliminary version)

Talk at the KDD2021 Workshop on Causal Discovery (CD2021) on 15 August.

As of 13 August, I'm still revising the slides to shorten the talk to 45 min.

Shohei SHIMIZU

August 13, 2021

Transcript

  1. LiNGAM approach to causal discovery

    Shohei SHIMIZU, Shiga University & RIKEN
    The KDD2021 Workshop on Causal Discovery (CD2021)
  2. What is causal discovery?

    • Methodology for inferring causal graphs using data
    • Assumptions: functional form? distribution? hidden common causes present? acyclic? etc.
    (figure: data + assumptions → causal graph; Maeda and Shimizu, 2020)
  3. Causal graphs are the key to statistical causal inference

    • Estimate intervention effects
      – Need a causal graph to select the variables to adjust for, e.g., using the backdoor criterion (Pearl, 1995)
    • Also useful for machine learning
      – E.g., domain adaptation (Zhang et al., 2020), fairness (Kusner et al., 2017), and interpretability (Blöbaum & Shimizu, 2017)
    (figure: number of Nobel laureates vs. chocolate consumption (Messerli, 2012), and a causal graph over chocolate, Nobel laureates, and GDP)
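The following toy simulation shows why the graph matters for adjustment. It assumes, hypothetically, that GDP causes both chocolate consumption and the number of Nobel laureates, and that chocolate has no effect on Nobel laureates. All coefficients are made up. In this assumed graph, {GDP} blocks the backdoor path chocolate ← GDP → Nobel. Regressing on chocolate alone therefore gives a spurious slope, and adding GDP removes it.

```python
import numpy as np

def slopes(n=100_000, seed=0):
    """Toy data from a hypothetical graph chocolate <- GDP -> nobel (no chocolate -> nobel edge)."""
    rng = np.random.default_rng(seed)
    gdp = rng.normal(size=n)
    chocolate = 0.8 * gdp + rng.normal(size=n)
    nobel = 0.6 * gdp + rng.normal(size=n)

    # Unadjusted: regress nobel on chocolate (with intercept).
    X1 = np.column_stack([np.ones(n), chocolate])
    naive = np.linalg.lstsq(X1, nobel, rcond=None)[0][1]

    # Adjusted for GDP, which blocks the backdoor path in the assumed graph.
    X2 = np.column_stack([np.ones(n), chocolate, gdp])
    adjusted = np.linalg.lstsq(X2, nobel, rcond=None)[0][1]
    return naive, adjusted

naive, adjusted = slopes()
print(f"unadjusted slope: {naive:.3f}, adjusted slope: {adjusted:.3f}")
```

Under these assumptions, the unadjusted slope should be close to 0.48/1.64 ≈ 0.29, and the adjusted slope should be close to 0.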
  4. How do we draw a causal graph?

    • Common way: use background knowledge
    • Often need to use both background knowledge AND DATA
    • Causal discovery: infer the causal graph from data
    (figure: which of several candidate graphs over chocolate, Nobel laureates, and GDP is correct?)
  5. Application areas (https://sites.google.com/view/sshimizu06/lingam/lingampapers/applications-and-tailor-made-methods)

    • Epidemiology: sleep problems and depressed mood, which causes which? (Rosenström et al., 2012)
    • Economics: growth rates of employment, sales, R&D, and operating income of firms at times t, t+1, t+2 (Moneta et al., 2013)
    • Neuroscience (Boukrina & Graves, 2013)
    • Chemistry (Campomanes et al., 2014)
    • Prevention medicine (Kotoku et al., 2020)
    • Climatology (Liu & Niyogi, 2020)
  6. Causal discovery is a challenge in causal inference

    • The classical non-parametric approach uses conditional independence (Pearl, 2001; Spirtes et al., 1993)
      – Makes no assumptions about functional forms or distributions
      – Its limit is identifying the Markov equivalence class of models
    • Additional assumptions are needed to go beyond this limit
      – Restrictions on functional forms and distributions
      – Yield uniquely identifiable models or smaller numbers of equivalent models
    • LiNGAM is one example (Shimizu et al., 2006; Shimizu, 2014)
      – Non-Gaussian assumption to exploit independence
      – Growing literature on its variants (Peters et al., 2018; Shimizu & Blöbaum, 2020)
  9. Methods of causal discovery

  10. Framework

    • Structural causal model (Pearl, 2001): $x_i = f_i(\text{parents of } x_i, e_i)$, where $e_i$ is an error variable
    • Make assumptions and find the causal graph(s) consistent with the data
      – Typical example 1: directed acyclic graph (DAG); no hidden common causes (all variables observed)
      – Typical example 2: DAG; hidden common causes may exist
    (figure: example graph over x1, x2, x3 with error variables e1, e2, e3)
  11. Non-parametric approach

    To what extent can we infer the causal graph without making any assumptions about the functional form or distribution? (Spirtes, Glymour & Scheines, 2001, 2nd ed.)
  12. Non-parametric approach: Example

    1. Make assumptions on the underlying causal graph
       – Directed acyclic graph
       – No hidden common causes (all variables observed)
    2. Among the causal graphs that satisfy these assumptions, find those that best match the data
    • Three candidates: (a) and (b), the two possible causal directions between x and y, and (c), no edge
    • If x and y are independent in the data, select (c)
    • If x and y are dependent in the data, select (a) or (b); these two are indistinguishable (not uniquely identifiable) because they form a Markov equivalence class
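Here is a minimal sketch of step 2 for two variables. It assumes linear-Gaussian data, so that a Fisher z-test of zero correlation can stand in for a general independence test. The function name `choose_candidates` and the 0.05 level are illustrative choices, not taken from the talk. The test can only return (c) or the equivalence class {(a), (b)}. It never returns a single direction.

```python
import math
import numpy as np

def choose_candidates(x, y, alpha=0.05):
    """Return ['c'] if x and y look independent, else the Markov equivalence class ['a', 'b']."""
    n = len(x)
    r = float(np.clip(np.corrcoef(x, y)[0, 1], -0.999999, 0.999999))
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - 3)   # Fisher z statistic
    p_value = math.erfc(abs(z) / math.sqrt(2))                  # two-sided normal p-value
    return ["c"] if p_value > alpha else ["a", "b"]

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y_dep = 0.5 * x + rng.normal(size=500)   # generated as x -> y
y_ind = rng.normal(size=500)             # generated independently of x
print(choose_candidates(x, y_dep))       # dependent data: the class ['a', 'b'], direction unknown
print(choose_candidates(x, y_ind))       # usually ['c']; a 0.05-level test rejects about 5% of the time
```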
  14. Various extensions

    • Equivalence classes of models including unobserved common causes (Spirtes et al., 1995)
    • Extensions to time series (Malinsky & Spirtes, 2018)
    • Equivalence classes including cyclic graphs (Richardson, 1996)
    • Lower bounds on intervention effects (Maathuis et al., 2009; Malinsky & Spirtes, 2017)
    (figure from F. Eberhardt, CRM Workshop 2016: example graphs over x, y, w, z with hidden variables f, f1, f2)
  15. Semi-parametric approach: make additional assumptions on functional forms and distributions

    What assumptions make causal graphs identifiable?
  16. Make additional assumptions on functional forms and distributions

    • More information becomes available than conditional independence alone
    • E.g., linearity + non-Gaussian continuous distributions
    • The two graphs (a) and (b) below do not differ in their conditional independence, but under these assumptions they result in different distributions of the two variables
    (figure: two candidate graphs (a) and (b) over x and y)
  17. LiNGAM model is identifiable (Shimizu, Hoyer, Hyvärinen & Kerminen, 2006)

    • Linear Non-Gaussian Acyclic Model:
      $x_i = \sum_{k(j) < k(i)} b_{ij} x_j + e_i$, or in matrix form $\mathbf{x} = B\mathbf{x} + \mathbf{e}$
      – $k(i)$ ($i = 1, \dots, p$): causal (topological) order of $x_i$
      – Error variables $e_i$ are independent and non-Gaussian
    • Coefficients and causal orders are identifiable
    • Hence the causal graph is identifiable
    (figure: example causal graph with $x_3 \to x_1$ ($b_{13}$), $x_3 \to x_2$ ($b_{23}$), $x_1 \to x_2$ ($b_{21}$), and errors $e_1, e_2, e_3$)
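The next block simulates $\mathbf{x} = B\mathbf{x} + \mathbf{e}$ for the three-variable example graph. The coefficient values and the uniform errors are hypothetical. Because $B$ is acyclic, $I - B$ is invertible, so the data can be drawn as $\mathbf{x} = (I - B)^{-1}\mathbf{e}$.

The sketch then illustrates only the easy half of identifiability. Once the causal order (here $x_3, x_1, x_2$) is known, regressing each variable on its predecessors by ordinary least squares recovers $B$. Recovering that order from data is the part that needs non-Gaussianity.

```python
import numpy as np

# Hypothetical coefficients for the example graph: x3 -> x1, x3 -> x2, x1 -> x2.
# Row i holds the coefficients b_ij of x_i's parents x_j (0-based indices).
B = np.array([
    [0.0, 0.0, 1.5],    # x1 = 1.5 * x3 + e1
    [0.7, 0.0, -0.8],   # x2 = 0.7 * x1 - 0.8 * x3 + e2
    [0.0, 0.0, 0.0],    # x3 = e3
])

def simulate_lingam(B, n, seed=0):
    """Draw n samples of x = Bx + e with independent, non-Gaussian (uniform) errors."""
    rng = np.random.default_rng(seed)
    p = B.shape[0]
    E = rng.uniform(-1.0, 1.0, size=(n, p))       # independent non-Gaussian errors
    return E @ np.linalg.inv(np.eye(p) - B).T     # x = (I - B)^{-1} e, one sample per row

def ols_given_order(X, order):
    """Regress each variable on all variables that precede it in the causal order."""
    p = X.shape[1]
    B_hat = np.zeros((p, p))
    for pos, i in enumerate(order):
        preds = list(order[:pos])
        if preds:
            coef, *_ = np.linalg.lstsq(X[:, preds], X[:, i], rcond=None)
            B_hat[i, preds] = coef
    return B_hat

X = simulate_lingam(B, n=20_000)
B_hat = ols_given_order(X, order=[2, 0, 1])   # causal order x3, x1, x2
print(np.round(B_hat, 2))
```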
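  18. How do we use non-Gaussianity and independence?

    • Underlying model: $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$ ($b_{21} \neq 0$), with $e_1, e_2$ non-Gaussian
    • Regress effect $x_2$ on cause $x_1$:
      $r_2^{(1)} = x_2 - \frac{\operatorname{cov}(x_2, x_1)}{\operatorname{var}(x_1)} x_1 = x_2 - b_{21} x_1 = e_2$
      – $x_1 = e_1$ and the residual $r_2^{(1)}$ are independent
    • Regress cause $x_1$ on effect $x_2$:
      $r_1^{(2)} = x_1 - \frac{\operatorname{cov}(x_1, x_2)}{\operatorname{var}(x_2)} x_2 = \left(1 - \frac{b_{21}\operatorname{cov}(x_1, x_2)}{\operatorname{var}(x_2)}\right) e_1 - \frac{b_{21}\operatorname{var}(x_1)}{\operatorname{var}(x_2)} e_2$
      – $x_2 = b_{21} e_1 + e_2$ and $r_1^{(2)}$ are dependent, although they are uncorrelated: both put non-zero weight on the non-Gaussian $e_2$ (Darmois–Skitovich theorem)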
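The asymmetry can be checked numerically with the sketch below. It assumes $b_{21} = 1$ and unit-variance errors: uniform for the non-Gaussian case, normal for comparison. As a crude dependence signal it uses the correlation between squared values. That correlation is zero for independent variables, but it is not a full independence test.

Under these assumptions the backward pair has a squared-value correlation of $-0.6/1.4 \approx -0.43$ for uniform errors. For Gaussian errors it is $0$, because uncorrelated jointly Gaussian variables are independent.

```python
import numpy as np

def squared_corr(a, b):
    """Correlation of a**2 and b**2: zero when a and b are independent (a crude check only)."""
    return np.corrcoef(a**2, b**2)[0, 1]

def uniform_errors(rng, n):
    return rng.uniform(-np.sqrt(3), np.sqrt(3), n)   # mean 0, variance 1, non-Gaussian

def gaussian_errors(rng, n):
    return rng.normal(0.0, 1.0, n)

def pairwise_residuals(draw_errors, b21=1.0, n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    e1, e2 = draw_errors(rng, n), draw_errors(rng, n)
    x1 = e1
    x2 = b21 * x1 + e2
    x1, x2 = x1 - x1.mean(), x2 - x2.mean()
    c12 = np.mean(x1 * x2)
    r2_1 = x2 - c12 / np.mean(x1 * x1) * x1      # regress effect x2 on cause x1
    r1_2 = x1 - c12 / np.mean(x2 * x2) * x2      # regress cause x1 on effect x2
    return {
        "corr(x2, r1_2)": np.corrcoef(x2, r1_2)[0, 1],   # ~0: OLS residuals are uncorrelated
        "forward": squared_corr(x1, r2_1),               # ~0: x1 and r2_1 (about e2) independent
        "backward": squared_corr(x2, r1_2),              # non-zero unless errors are Gaussian
    }

print(pairwise_residuals(uniform_errors))
print(pairwise_residuals(gaussian_errors))
```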
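  19. Independence measure (Hyvärinen & Smith, 2013)

    • The difference between the mutual information of the explanatory variable and its residual in the two directions can be computed from one-dimensional entropies (for standardized $x_1, x_2$):
      $I(x_1, r_2^{(1)}) - I(x_2, r_1^{(2)}) = \left\{H(x_1) + H\!\left(\frac{r_2^{(1)}}{\operatorname{sd}(r_2^{(1)})}\right)\right\} - \left\{H(x_2) + H\!\left(\frac{r_1^{(2)}}{\operatorname{sd}(r_1^{(2)})}\right)\right\}$
    • Maximum entropy approximation of the entropy $H$ (Hyvärinen, 1999):
      $H(u) \approx H(\nu) - k_1\left[E\{\log\cosh u\} - \gamma\right]^2 - k_2\left[E\{u \exp(-u^2/2)\}\right]^2$
      where $\nu$ is a standard Gaussian variable and $k_1, k_2, \gamma$ are constants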
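The measure can be sketched in NumPy as follows. The constants $k_1 = 79.047$, $k_2 = 7.4129$, $\gamma = 0.37457$ and $H(\nu) = (1 + \log 2\pi)/2$ are the values commonly used for this approximation; as far as we know they also appear in the DirectLiNGAM code of the LiNGAM Python package. The function names, the centred exponential errors, and $b_{21} = 1$ are assumptions for illustration.

A negative value favours $x_1 \to x_2$. In that direction the residual of the correct regression is independent of its regressor, so $I(x_1, r_2^{(1)})$ is the smaller term.

```python
import numpy as np

K1, K2, GAMMA = 79.047, 7.4129, 0.37457   # constants of the max-entropy approximation

def entropy(u):
    """Maximum entropy approximation of the differential entropy of a standardized variable u."""
    h_gauss = (1 + np.log(2 * np.pi)) / 2          # H(nu) for a standard Gaussian nu
    return (h_gauss
            - K1 * (np.mean(np.log(np.cosh(u))) - GAMMA) ** 2
            - K2 * (np.mean(u * np.exp(-u**2 / 2))) ** 2)

def standardize(v):
    return (v - v.mean()) / v.std()

def diff_mutual_info(x1, x2):
    """Approximate I(x1, r2^(1)) - I(x2, r1^(2)); negative values point to x1 -> x2."""
    x1, x2 = standardize(x1), standardize(x2)
    rho = np.mean(x1 * x2)                 # cov / var for standardized variables
    r2_1 = x2 - rho * x1                   # residual of x2 regressed on x1
    r1_2 = x1 - rho * x2                   # residual of x1 regressed on x2
    return ((entropy(x1) + entropy(r2_1 / r2_1.std()))
            - (entropy(x2) + entropy(r1_2 / r1_2.std())))

rng = np.random.default_rng(0)
n = 5_000
e1 = rng.exponential(1.0, n) - 1.0         # skewed, non-Gaussian errors
e2 = rng.exponential(1.0, n) - 1.0
x1 = e1
x2 = 1.0 * x1 + e2                         # true direction: x1 -> x2
print(diff_mutual_info(x1, x2))
print(diff_mutual_info(x2, x1))            # exactly the negative of the line above
```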
  20. Evaluation of estimated causal graphs

  21. Before estimating causal graphs

    • Assess assumptions by
      – Gaussianity tests and histograms (non-Gaussian? continuous?)
      – Checking for too-high correlations (multicollinearity?)
      – Background knowledge
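Here is a rough screening sketch. It assumes the data sit in a NumPy array with one column per variable. For each variable it reports sample skewness and excess kurtosis. Both are near 0 for Gaussian data, so values near 0 are a warning sign for LiNGAM-type methods. It also reports the largest absolute pairwise correlation.

The helper name `screen_variables` and the three example variables are hypothetical. A formal Gaussianity test (e.g., Shapiro–Wilk) and histograms remain the checks the slide suggests.

```python
import numpy as np

def screen_variables(X):
    """Per-variable skewness / excess kurtosis and the largest absolute pairwise correlation."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each column
    skewness = np.mean(Z**3, axis=0)
    excess_kurtosis = np.mean(Z**4, axis=0) - 3.0
    corr = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(corr, 0.0)                   # ignore self-correlations
    return skewness, excess_kurtosis, np.max(np.abs(corr))

rng = np.random.default_rng(0)
n = 10_000
a = rng.uniform(-1, 1, n)                 # non-Gaussian (population excess kurtosis -1.2)
b = rng.normal(size=n)                    # Gaussian: a warning sign for LiNGAM
c = a + 0.01 * rng.normal(size=n)         # nearly collinear with a
skew, kurt, max_corr = screen_variables(np.column_stack([a, b, c]))
print(np.round(skew, 2), np.round(kurt, 2), round(float(max_corr), 3))
```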
  22. After estimating causal graphs

    • Assess assumptions by
      – Testing the independence of the estimated error variables, e.g., with HSIC (Gretton et al., 2005)
      – Prediction accuracy using the Markov boundary (Biza et al., 2020)
      – Comparing with results on other datasets whose causal graphs are expected to be similar
      – Checking against background knowledge
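Below is a compact sketch of an HSIC permutation test with Gaussian kernels. It uses the biased statistic $\operatorname{tr}(KHLH)/n^2$ and the median heuristic for kernel widths. These choices, the function names, and the number of permutations are assumptions for illustration. After fitting a LiNGAM model, one would apply such a test to pairs of estimated residuals.

```python
import numpy as np

def _gram(z):
    """Gaussian-kernel Gram matrix with the median pairwise distance as bandwidth."""
    d = np.abs(z[:, None] - z[None, :])
    width = np.median(d[d > 0])
    return np.exp(-d**2 / (2 * width**2))

def hsic_test(x, y, n_perm=200, seed=0):
    """Return (HSIC statistic, permutation p-value) for H0: x and y are independent."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kc = H @ _gram(x) @ H
    L = _gram(y)
    stat = np.sum(Kc * L) / n**2                 # tr(K H L H) / n^2 (L is symmetric)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)                   # shuffling y breaks any dependence
        if np.sum(Kc * L[np.ix_(p, p)]) / n**2 >= stat:
            exceed += 1
    return stat, (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y_dep = x**2 + 0.05 * rng.normal(size=200)       # uncorrelated with x, but dependent
y_ind = rng.uniform(-1, 1, 200)
print(hsic_test(x, y_dep))
print(hsic_test(x, y_ind))
```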
  23. Statistical reliability assessment

    • Bootstrap probability (bp) of directed paths and edges
    • Interpret causal effects whose bp is larger than a threshold, say 5%
    (figure: bootstrap probabilities of directed paths from x3 to x1, e.g., 99%, 96%, 10%; total effect 20.9)
    LiNGAM Python package: https://github.com/cdt15/lingam
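The next block is a toy version of bootstrap probability for a single pair. It assumes the entropy-based pairwise measure of Hyvärinen & Smith (2013) as the direction estimator, redefined here so the block runs on its own. The LiNGAM Python package linked above has its own bootstrap functionality for whole graphs, which this sketch does not use. The data-generating choices are hypothetical.

```python
import numpy as np

K1, K2, GAMMA = 79.047, 7.4129, 0.37457

def entropy(u):
    """Maximum entropy approximation of the entropy of a standardized variable."""
    return ((1 + np.log(2 * np.pi)) / 2
            - K1 * (np.mean(np.log(np.cosh(u))) - GAMMA) ** 2
            - K2 * (np.mean(u * np.exp(-u**2 / 2))) ** 2)

def prefers_x1_to_x2(x1, x2):
    """True if the pairwise measure favours x1 -> x2 over x2 -> x1."""
    x1 = (x1 - x1.mean()) / x1.std()
    x2 = (x2 - x2.mean()) / x2.std()
    rho = np.mean(x1 * x2)
    r2_1, r1_2 = x2 - rho * x1, x1 - rho * x2
    diff = ((entropy(x1) + entropy(r2_1 / r2_1.std()))
            - (entropy(x2) + entropy(r1_2 / r1_2.std())))
    return diff < 0

def bootstrap_probability(x1, x2, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which x1 -> x2 is estimated."""
    rng = np.random.default_rng(seed)
    n = len(x1)
    hits = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)              # resample rows with replacement
        hits += int(prefers_x1_to_x2(x1[idx], x2[idx]))
    return hits / n_boot

rng = np.random.default_rng(0)
n = 5_000
x1 = rng.exponential(1.0, n) - 1.0
x2 = 1.0 * x1 + (rng.exponential(1.0, n) - 1.0)  # true direction: x1 -> x2
print(bootstrap_probability(x1, x2))
```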
  24. To relax the model assumptions

  25. Other identifiable models

    • Nonlinearity + "additive" noise (Hoyer+08NIPS, Zhang+09UAI, Peters+14JMLR)
      – $x_i = f_i(\mathrm{par}(x_i)) + e_i$
      – $x_i = g_i^{-1}\left(f_i(\mathrm{par}(x_i)) + e_i\right)$
    • Discrete variables
      – Poisson DAG model and its extensions (Park+18JMLR)
    • Mixed types of variables: LiNGAM + logistic-type model
      – Identifiability condition for two variables (Wenjuan+18IJCAI)
      – Probably also fine for multivariate cases using the idea of Thm. 28 of Peters et al. (2014)
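Here is a minimal sketch of the additive-noise idea for two variables, in the spirit of Hoyer+08NIPS but not their implementation. Fit a regression in each direction, then prefer the direction whose residual looks independent of the input.

Assumptions:
- A cubic polynomial fit via `numpy.polyfit` stands in for a flexible regressor.
- The biased HSIC statistic with median-heuristic Gaussian kernels measures dependence.
- The data-generating function is made up.

```python
import numpy as np

def _gram(z):
    d = np.abs(z[:, None] - z[None, :])
    return np.exp(-d**2 / (2 * np.median(d[d > 0]) ** 2))

def hsic(x, y):
    """Biased HSIC statistic tr(KHLH)/n^2 with Gaussian kernels."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.sum((H @ _gram(x) @ H) * _gram(y)) / n**2

def anm_score(cause, effect, degree=3):
    """Dependence between the putative cause and the residual of a polynomial fit."""
    coefs = np.polyfit(cause, effect, degree)
    residual = effect - np.polyval(coefs, cause)
    return hsic(cause, residual)

rng = np.random.default_rng(0)
n = 300
x = rng.uniform(-2, 2, n)
y = x**3 + rng.uniform(-1, 1, n)         # hypothetical nonlinear model x -> y
forward, backward = anm_score(x, y), anm_score(y, x)
print("x -> y" if forward < backward else "y -> x", forward, backward)
```

The backward fit cannot absorb the noise additively. Its residual therefore stays dependent on the input, which is the asymmetry this family of models exploits.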
  28. For better statistical reliability

  29. For better statistical reliability

    • Use background knowledge in estimation
      – Causal orders
      – Specified functional forms
      – Specified distributions
    • E.g., in manufacturing, the causal order of these 3 groups is often known:
      – Manufacturing conditions
      – Intermediate characteristics
      – Final characteristic(s)
    (figure: causal graph from manufacturing conditions 1, …, 10 through intermediate characteristics 1, …, 100 to the final characteristic)
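One simple way to encode such a group-level causal order is a matrix that forbids edges from a later group to an earlier one. The sketch below is an illustration under stated assumptions:
- `forbidden_edges` is a hypothetical helper.
- The group sizes are made up.
- The LiNGAM Python package has its own prior-knowledge option, whose format this does not reproduce.

```python
import numpy as np

def forbidden_edges(group_of):
    """forbidden[i, j] is True when x_j may not cause x_i, i.e. x_j is in a later group.

    group_of[k] is the position of variable k in the known group order
    (0 = manufacturing conditions, 1 = intermediate characteristics, 2 = final).
    """
    g = np.asarray(group_of)
    return g[None, :] > g[:, None]

# Hypothetical layout: 3 conditions, 4 intermediate characteristics, 1 final characteristic.
groups = [0] * 3 + [1] * 4 + [2]
F = forbidden_edges(groups)
print(int(F.sum()), "directed edges ruled out by background knowledge")
```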
  30. For better statistical reliability

    • Simultaneously analyze different datasets to exploit their similarity (Ramsey et al., 2011; Shimizu, 2012)
      – Similarity: causal orders are the same; distributions and coefficients may differ
      – Accuracy greatly improved on simulated fMRI data (Ramsey et al., 2011)
    (figure: Dataset 1 and Dataset 2 share the causal graph over x1, x2, x3 with errors e1, e2, e3 but have different coefficients, e.g., 4, −3, 2 vs. −0.5, 5)
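The toy illustration below shows why pooling can help, under stated assumptions. Two datasets share the direction $x_1 \to x_2$ but have different coefficients and error scales. The per-dataset pairwise entropy measure (Hyvärinen & Smith, 2013, redefined here) is then simply summed across datasets. This is not the estimator of Ramsey et al. (2011) or Shimizu (2012). It only shows that evidence from datasets with a common causal order can be accumulated.

```python
import numpy as np

K1, K2, GAMMA = 79.047, 7.4129, 0.37457

def entropy(u):
    """Maximum entropy approximation of the entropy of a standardized variable."""
    return ((1 + np.log(2 * np.pi)) / 2
            - K1 * (np.mean(np.log(np.cosh(u))) - GAMMA) ** 2
            - K2 * (np.mean(u * np.exp(-u**2 / 2))) ** 2)

def diff_mutual_info(x1, x2):
    """Approximate I(x1, r2^(1)) - I(x2, r1^(2)); negative values favour x1 -> x2."""
    x1 = (x1 - x1.mean()) / x1.std()
    x2 = (x2 - x2.mean()) / x2.std()
    rho = np.mean(x1 * x2)
    r2_1, r1_2 = x2 - rho * x1, x1 - rho * x2
    return ((entropy(x1) + entropy(r2_1 / r2_1.std()))
            - (entropy(x2) + entropy(r1_2 / r1_2.std())))

def simulate(b21, scale, n, rng):
    """One hypothetical dataset: shared order x1 -> x2, dataset-specific coefficient and scale."""
    x1 = scale * (rng.exponential(1.0, n) - 1.0)
    x2 = b21 * x1 + scale * (rng.exponential(1.0, n) - 1.0)
    return x1, x2

rng = np.random.default_rng(0)
datasets = [simulate(0.8, 1.0, 2_000, rng), simulate(-1.2, 3.0, 2_000, rng)]
per_dataset = [diff_mutual_info(x1, x2) for x1, x2 in datasets]
pooled = sum(per_dataset)                # evidence accumulated under a shared causal order
print(np.round(per_dataset, 3), round(float(pooled), 3))
```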
  31. LiNGAM with hidden common causes

  32. Estimate causal structures of variables that do not share hidden common causes

    • For unconfounded pairs (no hidden common causes), estimate the causal directions
    • For confounded pairs (with hidden common causes), leave the causal relation unknown
    (figure: underlying model with observed variables x1, x2, x3, … and hidden common causes f1, f2, next to the output graph)
  33. Non-Gaussianity and independence work again

    • The existence of hidden common causes leads to dependence between an explanatory variable and its residual (Tashiro et al., 2014)
    • Key result (Maeda & Shimizu, 2020)
      – Find a set of variables such that, when a variable is regressed on each subset of this set, the residual is independent
      – If this succeeds, the variables in such a set (x1 and x2) are the unconfounded ancestors of the variable (x4)
    • For nonlinear additive models, the existence of hidden intermediate variables also leads to dependence (Maeda & Shimizu, 2021)
    (figure: example graphs over x1, x2 with hidden common causes f1, f2)
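Here is a sketch of the first bullet with two variables. It assumes uniform (non-Gaussian) unit-variance variables and compares $x_1 = f + e_1$, $x_2 = b\,x_1 + f + e_2$, where $f$ is hidden, with the same model without $f$ in $x_2$. As a crude dependence check it uses the correlation between squared values; HSIC would be the more general choice.

With $f$ present, the least-squares residual of $x_2$ on $x_1$ is $0.5f - 0.5e_1 + e_2$ for any $b$. Its squared-value correlation with $x_1$ works out to $-0.6/4.2 \approx -0.14$. Without the confounding path the residual is $e_2$ and the correlation is about 0.

```python
import numpy as np

def residual_dependence(confounded, b=0.5, n=200_000, seed=0):
    """Correlation of squares between x1 and the OLS residual of x2 regressed on x1."""
    rng = np.random.default_rng(seed)

    def draw():
        return rng.uniform(-np.sqrt(3), np.sqrt(3), n)    # unit-variance, non-Gaussian

    f, e1, e2 = draw(), draw(), draw()
    x1 = f + e1
    x2 = b * x1 + (f if confounded else 0.0) + e2          # f acts as a hidden common cause
    x1c, x2c = x1 - x1.mean(), x2 - x2.mean()
    beta = np.dot(x1c, x2c) / np.dot(x1c, x1c)             # OLS slope
    r = x2c - beta * x1c                                   # residual of x2 on x1
    return np.corrcoef(x1c**2, r**2)[0, 1]

print(residual_dependence(confounded=True))
print(residual_dependence(confounded=False))
```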
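  37. Estimate causal structures of variables that share hidden common causes (Hoyer, Shimizu, Kerminen & Palviainen, 2008; Salehkaleybar et al., 2020)

    • LiNGAM with unobserved common causes is an ICA model (Hyvärinen et al., 2001):
      $\mathbf{x} = B\mathbf{x} + \Lambda\mathbf{f} + \mathbf{e} \;\Rightarrow\; \mathbf{x} = \begin{bmatrix} (I - B)^{-1} & (I - B)^{-1}\Lambda \end{bmatrix} \begin{bmatrix} \mathbf{e} \\ \mathbf{f} \end{bmatrix}$, with independent components $\mathbf{e}$ and $\mathbf{f}$
    • Apply ICA and look at the zero/non-zero pattern of the mixing matrix, e.g., with one hidden common cause $f_1$:
      – $x_1 \to x_2$: $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \lambda_{11} \\ b_{21} & 1 & \lambda_{21} + b_{21}\lambda_{11} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$
      – $x_2 \to x_1$: $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & b_{12} & \lambda_{11} + b_{12}\lambda_{21} \\ 0 & 1 & \lambda_{21} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$
      – No directed edge: $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \lambda_{11} \\ 0 & 1 & \lambda_{21} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$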
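The mixing matrices for the three graphs can be checked numerically. This sketch uses made-up values for $b_{21}$, $b_{12}$, $\lambda_{11}$, $\lambda_{21}$. For each graph it builds $[(I - B)^{-1}, (I - B)^{-1}\Lambda]$ and prints which entries are non-zero, which is the pattern ICA-based methods inspect. It does not run ICA itself.

```python
import numpy as np

def mixing_matrix(B, Lam):
    """Columns map (e_1, ..., e_p, f_1, ..., f_q) to x for x = Bx + Lam f + e."""
    A = np.linalg.inv(np.eye(B.shape[0]) - B)
    return np.hstack([A, A @ Lam])

lam11, lam21 = 0.9, 0.6           # hypothetical loadings of the hidden cause f1
b21, b12 = 0.7, 1.3               # hypothetical direct effects
Lam = np.array([[lam11], [lam21]])

graphs = {
    "x1 -> x2": np.array([[0.0, 0.0], [b21, 0.0]]),
    "x2 -> x1": np.array([[0.0, b12], [0.0, 0.0]]),
    "no edge": np.zeros((2, 2)),
}
for name, B in graphs.items():
    M = mixing_matrix(B, Lam)
    print(name, (np.abs(M) > 1e-12).astype(int).tolist())
```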
  38. LiNGAM for latent factors

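  39. LiNGAM for latent factors (Shimizu et al., 2009)

    • Model: $\mathbf{f} = B\mathbf{f} + \boldsymbol{\epsilon}$, $\mathbf{x} = G\mathbf{f} + \mathbf{e}$
      – 2 pure measurement variables per latent factor are needed to identify the measurement model (Silva et al., 2006; Xie et al., 2020)
    • Estimate the latent factors and then their causal graph
    (figure: latent factors f1, f2 with measured variables x1, …, x4; the causal direction between f1 and f2 is unknown)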
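Here is a generative sketch of this model, with all numbers hypothetical. There are two latent factors with $f_1 \to f_2$, and each is measured by two pure indicators. The sketch checks one implication of the model on simulated data:

$$\operatorname{cov}(\mathbf{x}) = G\,\operatorname{cov}(\mathbf{f})\,G^\top + \operatorname{cov}(\mathbf{e})$$

Estimating $G$ and the causal direction between the factors is beyond this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent structural model f = B f + eps, with a hypothetical edge f1 -> f2.
B = np.array([[0.0, 0.0], [0.8, 0.0]])
eps = rng.uniform(-1, 1, size=(n, 2))            # non-Gaussian latent errors
F = eps @ np.linalg.inv(np.eye(2) - B).T         # f = (I - B)^{-1} eps, one sample per row

# Measurement model x = G f + e: x1, x2 measure only f1; x3, x4 only f2 (pure indicators).
G = np.array([[1.0, 0.0], [0.7, 0.0], [0.0, 1.0], [0.0, 1.2]])
E = rng.uniform(-0.5, 0.5, size=(n, 4))
X = F @ G.T + E

implied = G @ np.cov(F, rowvar=False) @ G.T + np.diag(np.var(E, axis=0, ddof=1))
print(np.round(np.cov(X, rowvar=False) - implied, 3))   # close to zero
```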
  40. Find common and unique factors across multiple datasets (Zeng et al., 2021)

    • Model: $\mathbf{f}^{(m)} = B^{(m)}\mathbf{f}^{(m)} + \boldsymbol{\epsilon}^{(m)}$, $\mathbf{x}^{(m)} = G^{(m)}\mathbf{f}^{(m)} + \mathbf{e}^{(m)}$, $m = 1, \dots, M$
    • Score function: likelihood + DAGness (Zheng et al., 2018)
    • Feature extraction across multiple datasets + causal discovery of latent factors
    (figure: latent factors estimated from each dataset; which factors are shared across datasets, and how are they causally related?)
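The DAGness term of Zheng et al. (2018) is $h(W) = \operatorname{tr}\!\left(e^{W \circ W}\right) - d$. It is zero exactly when the weighted graph $W$ has no directed cycles.

The NumPy sketch below rests on two assumptions:
- The matrix exponential is computed by a truncated power series, which is adequate for small, well-scaled matrices; `scipy.linalg.expm` would be the usual choice.
- $W_{ij} \neq 0$ means an edge $x_i \to x_j$.

How Zeng et al. (2021) combine this term with the likelihood is not shown.

```python
import numpy as np

def dagness(W, n_terms=30):
    """h(W) = tr(exp(W * W)) - d, with exp computed by a truncated power series."""
    d = W.shape[0]
    M = W * W                       # elementwise square: non-negative weights
    term = np.eye(d)
    total = np.eye(d)
    for k in range(1, n_terms):
        term = term @ M / k         # M^k / k!
        total = total + term
    return np.trace(total) - d

W_dag = np.array([[0.0, 1.5, 0.0],
                  [0.0, 0.0, -0.8],
                  [0.0, 0.0, 0.0]])   # x1 -> x2 -> x3
W_cyc = W_dag.copy()
W_cyc[2, 0] = 0.5                    # adds x3 -> x1, closing a cycle
print(dagness(W_dag), dagness(W_cyc))
```

For the acyclic $W$, every power of $W \circ W$ has a zero diagonal, so $h = 0$. The cycle contributes positive diagonal terms from the third power onward.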
  41. Final summary

  42. Final summary

    • Statistical causal inference is a fundamental tool for science
      – Many well-developed methods are available when a causal graph can be drawn from background knowledge
      – Helping to draw causal graphs with data is the key: causal discovery
    • LiNGAM-related papers: https://sites.google.com/view/sshimizu06/lingam/lingampapers
    • Next default assumptions
      – Hidden common causes / latent factors
      – Mixed data: continuous and discrete
      – (Cyclicity (Lacerda et al., 2008))
  43. References

    • T. N. Maeda and S. Shimizu. RCD: Repetitive causal discovery of linear non-Gaussian acyclic models with latent confounders. In Proc. 23rd International Conference on Artificial Intelligence and Statistics (AISTATS2020), 2020.
    • F. H. Messerli. Chocolate Consumption, Cognitive Function, and Nobel Laureates. New England Journal of Medicine, 2012.
    • T. Rosenström, M. Jokela, S. Puttonen, M. Hintsanen, L. Pulkki-Råback, J. S. Viikari, O. T. Raitakari and L. Keltikangas-Järvinen. Pairwise measures of causal direction in the epidemiology of sleep problems and depression. PLoS ONE, 7(11): e50841, 2012.
    • A. Moneta, D. Entner, P. O. Hoyer and A. Coad. Causal inference by independent component analysis: Theory and applications. Oxford Bulletin of Economics and Statistics, 75(5): 705–730, 2013.
    • O. Boukrina and W. W. Graves. Neural networks underlying contributions from semantics in reading aloud. Frontiers in Human Neuroscience, 7: 518, 2013.
    • P. Campomanes, M. Neri, B. A. C. Horta, U. F. Roehrig, S. Vanni, I. Tavernelli and U. Rothlisberger. Origin of the spectral shifts among the early intermediates of the rhodopsin photocycle. Journal of the American Chemical Society, 136(10): 3842–3851, 2014.
    • J. Peters, D. Janzing and B. Schölkopf. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press, 2018.
    • S. Shimizu and P. Blöbaum. Recent advances in semi-parametric methods for causal discovery. In Direction Dependence in Statistical Models: Methods of Analysis (W. Wiedermann, D. Kim, E. Sungur and A. von Eye, eds.), Chapter. Wiley, 2020.
  44. References

    • J. Pearl. Causality. Cambridge University Press, 2001.
    • P. Spirtes, C. Glymour and R. Scheines. Causation, Prediction, and Search. Springer, 1993.
    • S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7: 2003–2030, 2006.
    • S. Shimizu. LiNGAM: Non-Gaussian methods for estimating causal structures. Behaviormetrika, 41(1): 65–98, 2014.
    • P. Spirtes, C. Meek and T. S. Richardson. Causal Inference in the Presence of Latent Variables and Selection Bias. In Proc. 11th Conference on Uncertainty in Artificial Intelligence (UAI1995), 1995.
    • D. Malinsky and P. Spirtes. Causal Structure Learning from Multivariate Time Series in Settings with Unmeasured Confounding. In Proc. 2018 ACM SIGKDD Workshop on Causal Discovery (KDD-CD), 2018.
    • T. S. Richardson. A Discovery Algorithm for Directed Cyclic Graphs. In Proc. 12th Conference on Uncertainty in Artificial Intelligence (UAI1996), 1996.
  45. References

    • D. Malinsky and P. Spirtes. Estimating bounds on causal effects in high-dimensional and possibly confounded systems. International Journal of Approximate Reasoning, 2017.
    • S. Shimizu, T. Inazumi, Y. Sogawa, A. Hyvärinen, Y. Kawahara, T. Washio, P. O. Hoyer and K. Bollen. DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12(Apr): 1225–1248, 2011.
    • A. Hyvärinen and S. M. Smith. Pairwise likelihood ratios for estimation of non-Gaussian structural equation models. Journal of Machine Learning Research, 14(Jan): 111–152, 2013.
    • A. Hyvärinen. New approximations of differential entropy for independent component analysis and projection pursuit. In Advances in Neural Information Processing Systems 12 (NIPS1999), 1999.
    • P. O. Hoyer, D. Janzing, J. Mooij, J. Peters and B. Schölkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 21 (NIPS2008), pp. 689–696, 2009.
    • K. Zhang and A. Hyvärinen. Distinguishing causes from effects using nonlinear acyclic causal models. In JMLR Workshop and Conference Proceedings, Causality: Objectives and Assessment (Proc. NIPS2008 Workshop on Causality), 6: 157–164, 2010.
    • J. Peters, J. Mooij, D. Janzing and B. Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15: 2009–2053, 2014.
  46. References

    • G. Lacerda, P. Spirtes, J. Ramsey and P. O. Hoyer. Discovering cyclic causal models by independent components analysis. In Proc. 24th Conference on Uncertainty in Artificial Intelligence (UAI2008), pp. 366–374, Helsinki, Finland, 2008.
    • P. O. Hoyer, S. Shimizu, A. Kerminen and M. Palviainen. Estimation of causal effects using linear non-gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2): 362–378, 2008.
    • S. Salehkaleybar, A. Ghassami, N. Kiyavash and K. Zhang. Learning Linear Non-Gaussian Causal Models in the Presence of Latent Variables. Journal of Machine Learning Research, 21: 1–24, 2020.
    • S. Shimizu, P. O. Hoyer and A. Hyvärinen. Estimation of linear non-Gaussian acyclic models for latent factors. Neurocomputing, 72: 2024–2027, 2009.
    • Y. Zeng, S. Shimizu, R. Cai, F. Xie, M. Yamamoto and Z. Hao. Causal Discovery with Multi-Domain LiNGAM for Latent Factors. In Proc. IJCAI2021, 2021.
    • X. Zheng, B. Aragam, P. Ravikumar and E. P. Xing. DAGs with NO TEARS: Continuous Optimization for Structure Learning. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018), 2018.
    • J. D. Ramsey, S. J. Hanson and C. Glymour. Multi-subject search correctly identifies causal connections and most causal directions in the DCM models of the Smith et al. simulation study. NeuroImage, 58(3): 838–848, 2011.
    • S. Shimizu. Joint estimation of linear non-Gaussian acyclic models. Neurocomputing, 81: 104–107, 2012.
  47. References

    • W. Wenjuan, F. Lu and L. Chunchen. Mixed Causal Structure Discovery with Application to Prescriptive Pricing. In Proc. 27th International Joint Conference on Artificial Intelligence (IJCAI2018), pp. xx--xx, Stockholm, Sweden, 2018.
    • Y. Komatsu, S. Shimizu and H. Shimodaira. Assessing statistical reliability of LiNGAM via multiscale bootstrap. In Proc. International Conference on Artificial Neural Networks (ICANN2010), pp. 309–314, Thessaloniki, Greece, 2010.
    • K. Biza, I. Tsamardinos and S. Triantafillou. Tuning causal discovery algorithms. In Proc. Probabilistic Graphical Models (PGM2020), 2020.
    • R. Silva, R. Scheines, C. Glymour and P. Spirtes. Learning the structure of linear latent variable models. Journal of Machine Learning Research, 7: 191–246, 2006.
    • F. Xie, R. Cai, B. Huang, C. Glymour, Z. Hao and K. Zhang. Generalized independent noise condition for estimating latent variable causal graphs. NeurIPS, 33, 2020.
    • K. Zhang, M. Gong, P. Stojanov, B. Huang, Q. Liu and C. Glymour. Domain Adaptation as a Problem of Inference on Graphical Models. NeurIPS, 33, 2020.
    • M. J. Kusner, J. Loftus, C. Russell and R. Silva. Counterfactual Fairness. In Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017.
    • P. Blöbaum and S. Shimizu. Estimation of interventional effects of features on prediction. In Proc. 2017 IEEE International Workshop on Machine Learning for Signal Processing (MLSP2017), pp. xx--xx, Tokyo, Japan, 2017.