Shohei SHIMIZU
July 05, 2024

# Causal discovery based on non-Gaussianity and nonlinearity

6th Pacific Causal Inference Conference (PCIC 2024)

July 05, 2024

## Transcript

1. ### Causal discovery based on non-Gaussianity and nonlinearity

SHIMIZU Shohei, Faculty of Data Science, Shiga University, and Center for Advanced Intelligence Project (AIP), RIKEN

6th Pacific Causal Inference Conference (PCIC 2024)

SHIMIZU Shohei (Shiga Univ & RIKEN), 5th July 2024
2. ### Abstract

- Most causal inference tools require information on causal structures.
- Statistical causal discovery uses data to infer the causal structure of variables, i.e., the causal graph.
- This talk outlines the basic ideas of statistical causal discovery.
- In particular, it introduces methods based on non-Gaussianity and nonlinearity that identify more causal directions.
3. ### Causal inference tools need causal structural information

Estimation of intervention effects from observational data:

- Draw the causal graph based on background knowledge.
- Derive which variables should be used for adjustment.
- Observe and adjust for those variables (if any), and estimate the intervention effect:

$$E(\text{Nobel} \mid do(\text{Chocolate})) = E_{par(C)}\left[ E(N \mid C, par(C)) \right]$$

Here $C$ stands for Chocolate and $N$ for Nobel.

[Figure: chocolate consumption and Nobel laureates [Messerli, 2012], with GDP as a third variable (common cause). Other labels in the figure: sleep disorder, depression mood.]
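The adjustment formula on this slide can be illustrated with a small simulation. The sketch below is not from the talk: it assumes a hypothetical linear data-generating process in which GDP is the only common cause and chocolate has no effect on Nobel laureates. Under that linearity assumption, the slope of Nobel on Chocolate in a regression that also includes GDP plays the role of the adjusted intervention effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (an assumption for illustration):
# GDP drives both variables, and chocolate has NO effect on Nobel laureates.
gdp = rng.normal(size=n)
chocolate = 0.8 * gdp + rng.normal(size=n)
nobel = 0.0 * chocolate + 1.0 * gdp + rng.normal(size=n)


def ols_slopes(y, *regressors):
    """Least-squares slopes of y on the regressors (an intercept is included)."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


# Without adjustment, the common cause induces a spurious association.
naive_effect = ols_slopes(nobel, chocolate)[0]
# Adjusting for par(C) = {GDP} recovers the (zero) intervention effect.
adjusted_effect = ols_slopes(nobel, chocolate, gdp)[0]

print(f"naive slope (no adjustment):   {naive_effect:.3f}")
print(f"adjusted slope (GDP included): {adjusted_effect:.3f}")
```

The naive slope is clearly positive even though the simulated intervention effect is zero, which is why the choice of adjustment variables depends on knowing the causal graph.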
4. ### Causal discovery [Spirtes et al., 2001, Shimizu, 2022]

- Methodology for inferring causal graphs using data
- Helps select covariates in causal effect estimation

Causal discovery:

- Infer the causal graph in data-driven ways
- Need assumptions to infer the causal graph
  - Various methods for different assumptions
  - Basic setup
    - All the common causes are measured
    - Acyclicity

[Figure: several candidate causal graphs over three variables x1, x2, and x3, joined by "or", ending with "... ?".]

Use the structural causal model [Pearl, 2000] to represent background knowledge and assumptions on the distributions, functional forms, and causal graph structure (acyclic or cyclic):

$$x_i = f_i(par(x_i), e_i)$$
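A structural causal model of the form x_i = f_i(par(x_i), e_i) can be sampled by evaluating the equations in a causal order. The following is a minimal sketch with a made-up three-variable acyclic model; the graph, the functions, and the `sample` helper are all assumptions for illustration, not anything shown in the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical acyclic SCM: x1 -> x2 -> x3 and x1 -> x3.
# Each entry maps a variable to (parents, f_i), with x_i = f_i(par(x_i), e_i).
# The dict is written in a causal order (parents before children).
scm = {
    "x1": ((), lambda p, e: e),
    "x2": (("x1",), lambda p, e: np.sin(p["x1"]) + 0.5 * e),
    "x3": (("x1", "x2"), lambda p, e: 0.7 * p["x1"] - 1.2 * p["x2"] + e),
}


def sample(scm, n, rng, do=None):
    """Draw n samples in causal order; `do` fixes variables by intervention."""
    do = do or {}
    values = {}
    for name, (parents, f) in scm.items():
        if name in do:
            # An intervention replaces the structural equation by a constant.
            values[name] = np.full(n, float(do[name]))
        else:
            e = rng.uniform(-1.0, 1.0, size=n)  # non-Gaussian error term
            values[name] = f({p: values[p] for p in parents}, e)
    return values


observational = sample(scm, n, rng)
interventional = sample(scm, n, rng, do={"x2": 1.0})

print("mean of x3, observational:", np.mean(observational["x3"]))
print("mean of x3 under do(x2=1):", np.mean(interventional["x3"]))
```

In this toy model the mean of x3 under do(x2 = 1) is close to -1.2, the coefficient of x2 in the equation for x3, because the other terms average to zero.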
5. ### Basic idea of non-parametric approach [Spirtes et al., 2001]

- First, make assumptions on the underlying causal graph:
  - Directed acyclic graph
  - No hidden common causes (all have been observed)
- Second, find the graph that best matches the data among the causal graphs that satisfy the assumptions:
  - If x and y are independent in the data, select (c).
  - If x and y are dependent in the data, select (a) and (b).
  - (a) and (b) are indistinguishable: they form a Markov equivalence class.

[Figure: three candidate graphs over x and y, labeled (a), (b), and (c). (a) and (b) connect x and y with an edge, in opposite directions; (c) has no edge.]
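The selection rule on this slide can be sketched as a single independence test. The code below is only an illustration: it uses a Fisher z-test on the correlation coefficient, which detects linear dependence rather than general dependence, and the function names and the significance level `alpha` are assumptions.

```python
import math

import numpy as np


def fisher_z_pvalue(x, y):
    """Two-sided p-value for zero correlation (Fisher z-transform)."""
    r = np.corrcoef(x, y)[0, 1]
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_graphs(x, y, alpha=0.01):
    """Return the candidate graphs that remain after the test."""
    if fisher_z_pvalue(x, y) < alpha:
        # Dependence found: an edge exists, but its direction is not identified.
        return ["(a)", "(b)"]
    return ["(c)"]


rng = np.random.default_rng(0)
n = 2_000

# Dependent pair (simulated here with x causing y).
x_dep = rng.normal(size=n)
y_dep = 0.5 * x_dep + rng.normal(size=n)

# Independent pair.
x_ind = rng.normal(size=n)
y_ind = rng.normal(size=n)

print("dependent data   ->", select_graphs(x_dep, y_dep))
print("independent data ->", select_graphs(x_ind, y_ind))
```

Even when the data are generated with x causing y, the test can only return both (a) and (b). That is the Markov equivalence class that the methods on the following slides aim to split.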
6. ### Additional information on functional forms and/or distributions helpful [Shimizu, 2014, 2022]

- Semiparametric approach (named after independent component analysis)
- Linearity and non-Gaussian continuous distributions result in different distributions of x and y for the two causal directions [Shimizu et al., 2006]
- Estimation is made by maximizing the independence (beyond uncorrelatedness) between the variable and the error
- The correct model gives stronger independence

[Figure: models (a) and (b), which show no difference in terms of their conditional independence.]

(a) $y = b_{yx} x + e_y$

(b) $x = b_{xy} y + e_x$
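The idea that the correct direction gives stronger independence between the explanatory variable and the error can be checked on simulated data. This is a rough sketch, not the estimator used in LiNGAM: the uniform distributions, the coefficient 0.8, and the crude dependence score (absolute correlation of squared values) are all assumptions chosen to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated truth: x causes y, with non-Gaussian (uniform) variables.
x = rng.uniform(-1.0, 1.0, size=n)
e_y = rng.uniform(-1.0, 1.0, size=n)
y = 0.8 * x + e_y


def ols_residual(target, regressor):
    """Residual of a simple least-squares regression of target on regressor."""
    t = target - target.mean()
    r = regressor - regressor.mean()
    return t - (t @ r) / (r @ r) * r


def dependence(regressor, resid):
    """Crude dependence score: |corr| between squared centered values.

    The residual is always uncorrelated with the regressor, so this looks
    beyond uncorrelatedness. It is near 0 when the two are independent.
    """
    r = regressor - regressor.mean()
    return abs(np.corrcoef(r ** 2, resid ** 2)[0, 1])


dep_x_to_y = dependence(x, ols_residual(y, x))  # model (a): x -> y
dep_y_to_x = dependence(y, ols_residual(x, y))  # model (b): y -> x

direction = "x -> y" if dep_x_to_y < dep_y_to_x else "y -> x"
print(f"dependence under x -> y: {dep_x_to_y:.3f}")
print(f"dependence under y -> x: {dep_y_to_x:.3f}")
print("selected direction:", direction)
```

If the uniform draws are replaced by Gaussian ones, both scores come out near zero and the two directions can no longer be told apart, which is the role non-Gaussianity plays here.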
7. ### Example identifiable models

- Linear non-Gaussian acyclic model: LiNGAM [Shimizu et al., 2006]

  $$x_i = \sum_{x_j \in par(x_i)} b_{ij} x_j + e_i$$

- Linear non-Gaussian cyclic model with some constraints, including stability [Lacerda et al., 2008]
- Nonlinear additive noise model and post-nonlinear causal model [Hoyer et al., 2009, Zhang and Chan, 2006, Zhang and Hyvärinen, 2009, Peters et al., 2014]

  $$x_i = g_i^{-1}\left(f_i(par(x_i)) + e_i\right)$$

- Discrete variable model and mixed cases [Park and Raskutti, 2017, Wei et al., 2018, Zeng et al., 2022]
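As a worked step, the LiNGAM equations can be stacked into matrix form, which makes the link to independent component analysis visible. The notation below (the matrices $B$ and $A$) is introduced here for illustration and follows the usual presentation in Shimizu et al. [2006].

```latex
% Stack the structural equations x_i = \sum_j b_{ij} x_j + e_i:
x = B x + e
% Acyclicity means the variables can be ordered so that B is strictly
% lower triangular, hence I - B is invertible:
x = (I - B)^{-1} e = A e
% With mutually independent, non-Gaussian e_i, this is an ICA model.
% ICA identifies A up to permutation and scaling of its columns, and
% the acyclicity of B is what resolves those two indeterminacies.
```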
8. ### Example: Reliable AI with application to Credit rating [Takahashi et al., 2024]

- Counterfactual explanatory scores for AI prediction [Galhotra et al., 2021], computed based on probabilities of necessity, sufficiency, and both [Pearl, 1999]
- Estimate the causal structure using background knowledge and LiNGAM
- Evaluated on anonymized credit rating data of 14,018 business customers provided by Shiga Bank
- Background knowledge: Credit rating is a sink

[Figures excerpted from Takahashi et al., 2024:]

- Fig. 5. Causal graph estimated by DirectLiNGAM (black lines). Prior information on the causal structure (red lines).
- Fig. 6. Nesuf estimated from causal graph and No graph.
- Fig. 7. Reversal probability score estimated from the causal graph. Nec (blue) is the probability that the prediction would change by lowering the value of that variable for a company whose rating is predicted to be high. Suf (orange) is the probability that the prediction would change by increasing the value of the variable for a company whose rating is predicted to be low.
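For reference, the three probabilities of causation in Pearl [1999] that these scores build on can be written as follows for a binary cause $X \in \{x, x'\}$ and a binary outcome $Y \in \{y, y'\}$. This is the textbook form only; the scores used in the cited papers are defined with respect to a model's predictions and may condition on additional context.

```latex
% Probability of necessity: y would not have occurred without x,
% given that x and y did in fact occur.
PN = P(Y_{x'} = y' \mid X = x, Y = y)

% Probability of sufficiency: setting x would have produced y,
% given that x and y were in fact both absent.
PS = P(Y_{x} = y \mid X = x', Y = y')

% Probability of necessity and sufficiency.
PNS = P(Y_{x} = y, \; Y_{x'} = y')
```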
9. ### Example: Mutual performance enhancement with LLM (Large Language Model) [Takayama et al., 2024]

Statistical Causal Prompting:

- Use the LLM to obtain "background" knowledge
- Use causal discovery to infer the remaining unknown relations

[Figure: the four steps of the method on an example dataset with the variables Dyspnoea, X-ray, Cancer, Smoker, and Pollution. SCD: statistical causal discovery.]

- 1st step: Data-driven causal discovery (without any constraints on the edges)
  - Input: dataset
  - Output: causal graph (not so plausible from the domain experts' point of view)
- 2nd step: Knowledge generation on causal relationships by the LLM with ZSCOT
  - Input: prompts with the result of causal discovery
  - Output: response with a detailed qualitative discussion of causal relationships, as domain experts
  - Example prompt: "According to the result shown above, the state of "cancer" may have no direct impact on the state of "X-ray". Please explain whether this statistically suggested hypothesis is plausible with an explanation that leverages your expert knowledge on this causal relationship."
  - Example response: "While it is suggested that there may not be a direct causal effect from the state of "Cancer" to the state of "X-ray", it seems to be not plausible considering the domain knowledge, as explained below: …"
- 3rd step: Knowledge integration and evaluation of the probability of causal relationships with the LLM
  - Input: prompts with the dialog in the 2nd step
  - Output: <yes> or <no> with its log probability
  - Example prompt: "Considering objectively this discussion above, if the state of "cancer" is changed, will it have a direct impact on the state of "X-ray"? Please answer this question with <yes> or <no>."
  - Example result: most likely response Yes (log probability: -0.0408), 2nd most likely response No (log probability: -3.219), i.e., yes: 96%, no: 4%
- 4th step: Retrying causal discovery (with the constraints determined in the 3rd step)
  - The probability matrix generated in the 3rd step is transformed into background knowledge for the causal discovery
  - Input: dataset + constraints (e.g., Cancer → X-ray: forced, X-ray → Cancer: forbidden, etc.)
  - Output: modified causal graph (plausible from both the domain experts' and statistical points of view)
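The arithmetic behind the 3rd and 4th steps can be sketched as follows: log probabilities of the two answers are turned into a probability of "yes", which is then mapped to an edge constraint. The function names and the two thresholds are hypothetical choices for illustration; the cited paper defines its own rule for turning the probability matrix into background knowledge.

```python
import math


def yes_probability(logprob_yes, logprob_no):
    """Probability of <yes>, renormalized over the two candidate answers."""
    p_yes = math.exp(logprob_yes)
    p_no = math.exp(logprob_no)
    return p_yes / (p_yes + p_no)


def to_constraint(p_yes, forced_above=0.95, forbidden_below=0.05):
    """Map a probability to an edge constraint (thresholds are assumptions)."""
    if p_yes >= forced_above:
        return "forced"
    if p_yes <= forbidden_below:
        return "forbidden"
    return "unconstrained"


# Log probabilities from the slide for the edge Cancer -> X-ray.
p = yes_probability(-0.0408, -3.219)
constraint = to_constraint(p)

print(f"P(yes) = {p:.2f}, constraint for Cancer -> X-ray: {constraint}")
```

With the numbers on the slide this gives a probability of about 0.96 for "yes", matching the 96% / 4% split shown there.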
10. ### Evaluated on a leakage-free dataset of health checkup data

- Better performance in terms of domain expertise
- The proposed method generates prior knowledge with SCP in GPT-4

[Figure: causal graphs with edge weights over HbA1c, Age, DBP, SBP, Waist, BMI, and LDL. (a) Without prior knowledge. (b) With prior knowledge generated from Pattern2 and 4.]
11. ### Extension: Linear unobserved variable models

- LiNGAM with unobserved common causes [Tashiro et al., 2014, Maeda and Shimizu, 2020]

  $$x_i = \sum_{x_j \in par(x_i)} b_{ij} x_j + \sum_{u_k \in par(x_i)} \lambda_{ik} u_k + e_i$$

- Complete algorithm for a general case: non-ancestral bow-free acyclic path diagrams (non-ancestral BAPs) [Wang and Drton, 2023]
- The method of Tashiro et al. [2014] finds all ancestral relations if the graph is an ancestral BAP [Wang and Drton, 2023]

Results from "RCD: Repetitive causal discovery of linear non-Gaussian acyclic models with latent confounders" [Maeda and Shimizu, 2020]:

Table 1: The results of the application to sociological data. The first three numeric columns refer to bidirected arrows (latent confounders), the last three to directed arrows (causality).

| Method | Bidirected: # of estimation | Bidirected: # of successes | Bidirected: Precision | Directed: # of estimation | Directed: # of successes | Directed: Precision |
|---|---|---|---|---|---|---|
| RCD | 4 | 4 | 1.0 | 5 | 4 | 0.8 |
| FCI | 3 | 3 | 1.0 | 3 | 1 | 0.3 |
| RFCI | 3 | 3 | 1.0 | 3 | 1 | 0.3 |
| GFCI | 0 | 0 | 0.0 | 0 | 0 | 0.0 |
| PC | - | - | - | 2 | 1 | 0.5 |
| GES | - | - | - | 2 | 1 | 0.5 |
| RESIT | - | - | - | 12 | 4 | 0.3 |
| LiNGAM | - | - | - | 5 | 4 | 0.8 |

[Figure 4: Variables and causal relations in the General Social Survey data set used for the evaluation.]

[Figure 5: Causal graph produced by RCD. The dashed arrow between x3 and x5 is an incorrect inference, but the other arrows are reasonable based on Figure 4.]
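To see why unobserved common causes call for the extended model on this slide, consider two observed variables that share a hidden cause and have no direct edge. In the sketch below, which uses made-up uniform distributions and a crude dependence score (both assumptions for illustration), the regression residual stays dependent on the regressor in both directions, so neither plain LiNGAM direction fits.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical model: u is an unobserved common cause of x and y,
# and there is no direct causal effect between x and y.
u = rng.uniform(-1.0, 1.0, size=n)
x = u + rng.uniform(-1.0, 1.0, size=n)
y = u + rng.uniform(-1.0, 1.0, size=n)


def ols_residual(target, regressor):
    """Residual of a simple least-squares regression of target on regressor."""
    t = target - target.mean()
    r = regressor - regressor.mean()
    return t - (t @ r) / (r @ r) * r


def squared_corr_dependence(regressor, resid):
    """Crude dependence score: |corr| between squared centered values."""
    r = regressor - regressor.mean()
    return abs(np.corrcoef(r ** 2, resid ** 2)[0, 1])


dep_x_to_y = squared_corr_dependence(x, ols_residual(y, x))
dep_y_to_x = squared_corr_dependence(y, ols_residual(x, y))

print(f"dependence when fitting x -> y: {dep_x_to_y:.3f}")
print(f"dependence when fitting y -> x: {dep_y_to_x:.3f}")
```

Both scores stay well away from zero, which is the kind of signal that methods for latent confounders exploit instead of forcing a direction onto the pair.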
12. ### Extension: Nonlinear unobserved variable models

- Causal additive models with unobserved variables [Maeda and Shimizu, 2021], assuming acyclicity and a kind of faithfulness

  $$x_i = \sum_{x_j \in \text{observed } par(x_i)} f_j^{(i)}(x_j) + \sum_{u_k \in \text{unobserved } par(x_i)} g_k^{(i)}(u_k) + e_i$$

- Extends LiNGAM in two ways:
  - Hidden common causes
  - (Additive) nonlinearity
- Background knowledge can be incorporated [Maeda and Shimizu, 2024]
- More constraints [Schultheiss and Bühlmann, 2024]

[Figure: underlying structure, underlying model, and estimated model output, drawn over observed variables x and unobserved variables u.]
13. ### Final summary

- Many well-developed methods are available when causal graphs are known from background knowledge
- Helping draw causal graphs with data is the key: causal discovery
- Software
  - Python package for LiNGAM-related methods [Ikeuchi et al., 2023], and packages for non-parametric methods in Python [Zheng et al., 2024] and R [Kalisch et al., 2012]
  - Commercial software (no-code tools): Causalas by SCREEN AS, Node AI by NTT Communications, and more
- For more information on these methods and their applications, see https://www.shimizulab.org/lingam/lingampapers on my website (search for "Shohei Shimizu")
14. ### Reference I

- Sainyam Galhotra, Romila Pradhan, and Babak Salimi. Explaining black-box algorithms using probabilistic contrastive counterfactuals. In Proceedings of the 2021 International Conference on Management of Data, pages 577–590, 2021.
- Patrik O. Hoyer, Dominik Janzing, Joris Mooij, Jonas Peters, and Bernhard Schölkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 21, pages 689–696. Curran Associates Inc., 2009.
- Takashi Ikeuchi, Mayumi Ide, Yan Zeng, Takashi Nicholas Maeda, and Shohei Shimizu. Python package for causal discovery based on LiNGAM. Journal of Machine Learning Research, 24(14):1–8, 2023. URL http://jmlr.org/papers/v24/21-0321.html.
- Markus Kalisch, Martin Mächler, Diego Colombo, Marloes H. Maathuis, and Peter Bühlmann. Causal inference using graphical models with the R package pcalg. Journal of Statistical Software, 47(11):1–26, 2012.
- G. Lacerda, P. Spirtes, J. Ramsey, and P. O. Hoyer. Discovering cyclic causal models by independent components analysis. In Proc. 24th Conference on Uncertainty in Artificial Intelligence (UAI2008), pages 366–374, 2008.
- Takashi Nicholas Maeda and Shohei Shimizu. RCD: Repetitive causal discovery of linear non-Gaussian acyclic models with latent confounders. In Proc. 23rd International Conference on Artificial Intelligence and Statistics (AISTATS2020), volume 108 of Proceedings of Machine Learning Research, pages 735–745. PMLR, 26–28 Aug 2020.
15. ### Reference II

- Takashi Nicholas Maeda and Shohei Shimizu. Causal additive models with unobserved variables. In Proc. 37th Conference on Uncertainty in Artificial Intelligence (UAI2021), pages 97–106. PMLR, 2021.
- Takashi Nicholas Maeda and Shohei Shimizu. Use of prior knowledge to discover causal additive models with unobserved variables and its application to time series data. arXiv preprint arXiv:2401.07231, 2024.
- F. H. Messerli. Chocolate consumption, cognitive function, and Nobel laureates. New England Journal of Medicine, 367:1562–1564, 2012.
- Gunwoong Park and Garvesh Raskutti. Learning quadratic variance function (QVF) DAG models via overdispersion scoring (ODS). Journal of Machine Learning Research, 18(224), 2017.
- Judea Pearl. Probabilities of causation: Three counterfactual interpretations and their identification. Synthese, 121:93–149, 1999.
- Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.
- Jonas Peters, Joris M. Mooij, Dominik Janzing, and Bernhard Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15:2009–2053, 2014.
- Christoph Schultheiss and Peter Bühlmann. Assessing the overall and partial causal well-specification of nonlinear additive noise models. Journal of Machine Learning Research, 25(159):1–41, 2024. URL http://jmlr.org/papers/v25/23-1397.html.
16. ### Reference III

- Shohei Shimizu. LiNGAM: Non-Gaussian methods for estimating causal structures. Behaviormetrika, 41(1):65–98, 2014.
- Shohei Shimizu. Statistical Causal Discovery: LiNGAM Approach. Springer, Tokyo, 2022.
- Shohei Shimizu, Patrik O. Hoyer, Aapo Hyvärinen, and Antti Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030, 2006.
- Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2001. 2nd ed.
- D. Takahashi, S. Shimizu, and T. Tanaka. Counterfactual explanations of black-box machine learning models using causal discovery with applications to credit rating. In Proc. Int. Joint Conf. on Neural Networks (IJCNN2024), part of the 2024 IEEE World Congress on Computational Intelligence (WCCI2024), 2024.
- Masayuki Takayama, Tadahisa Okuda, Thong Pham, Tatsuyoshi Ikenoue, Shingo Fukuma, Shohei Shimizu, and Akiyoshi Sannai. Integrating large language models in causal discovery: A statistical causal approach. arXiv preprint arXiv:2402.01454, 2024.
- Tatsuya Tashiro, Shohei Shimizu, Aapo Hyvärinen, and Takashi Washio. ParceLiNGAM: A causal ordering method robust against latent confounders. Neural Computation, 26(1):57–83, 2014.
- Y. Samuel Wang and Mathias Drton. Causal discovery with unobserved confounding and non-Gaussian data. Journal of Machine Learning Research, 24(271):1–61, 2023. URL http://jmlr.org/papers/v24/21-1329.html.
17. ### Reference IV

- Wenjuan Wei, Lu Feng, and Chunchen Liu. Mixed causal structure discovery with application to prescriptive pricing. In Proc. 27th International Joint Conference on Artificial Intelligence (IJCAI2018), pages 5126–5134, 2018.
- Yan Zeng, Shohei Shimizu, Hidetoshi Matsui, and Fuchun Sun. Causal discovery for linear mixed data. In Proceedings of the First Conference on Causal Learning and Reasoning (CLeaR2022), volume 177 of Proceedings of Machine Learning Research, pages 994–1009. PMLR, 11–13 Apr 2022.
- K. Zhang and L.-W. Chan. ICA with sparse connections. In Proc. 7th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2006), pages 530–537, 2006.
- K. Zhang and A. Hyvärinen. On the identifiability of the post-nonlinear causal model. In Proc. 25th Conference on Uncertainty in Artificial Intelligence (UAI2009), pages 647–655, 2009.
- Yujia Zheng, Biwei Huang, Wei Chen, Joseph Ramsey, Mingming Gong, Ruichu Cai, Shohei Shimizu, Peter Spirtes, and Kun Zhang. Causal-learn: Causal discovery in Python. Journal of Machine Learning Research, 25(60):1–8, 2024. URL http://jmlr.org/papers/v25/23-0970.html.