Slide 1

Slide 1 text

Non-Gaussian methods for causal discovery
Shohei Shimizu, Shiga University and RIKEN
CMStatistics 2023, Berlin
Organized Session: Statistical Learning of Non-Gaussian Data

Slide 2

Slide 2 text

What is causal discovery?
• Methodology for inferring causal graphs from data
• Helps select covariates in causal effect estimation
Assumptions
• Functional form?
• Distribution?
• Hidden common cause present?
• Acyclic? etc.
[Figure: Data and Assumptions lead to a Causal graph (Maeda and Shimizu, 2020)]

Slide 3

Slide 3 text

Applications
https://www.shimizulab.org/lingam/lingampapers/applications-and-tailor-made-methods
• Epidemiology (Rosenstrom et al., 2012) [Figure: Sleep problems → Depression mood, or Depression mood → Sleep problems?]
• Economics (Moneta et al., 2012) [Figure: graph over OpInc.gr, Empl.gr, Sales.gr, and R&D.gr at times t, t+1, and t+2]
• Neuroscience (Ogawa et al., 2022)
• Chemistry (Campomanes et al., 2014)
• Preventive medicine (Kotoku et al., 2020)
• Finance (Jiang & Shimizu, 2023)

Slide 4

Slide 4 text

Methods of causal discovery

Slide 5

Slide 5 text

Non-parametric approach: Example (Spirtes et al., 1993; 2001)
1. Make assumptions about the underlying causal graph
 – Directed acyclic graph
 – No hidden common causes (all have been observed)
2. Among the causal graphs that satisfy the assumptions, find the graph that best matches the data.
Three candidates over x and y: (a), (b), (c)
• If x and y are independent in the data, select (c).
• If x and y are dependent in the data, select (a) and (b).
• (a) and (b) are indistinguishable: they form a Markov equivalence class.
[Figure: the three candidate graphs (a), (b), (c) over x and y, with (c) on the right]
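The selection rule on this slide can be sketched in a few lines. This is a minimal illustration, not the constraint-based algorithm of Spirtes et al.: it assumes that a small absolute Pearson correlation can stand in for independence (reasonable only for the linear Gaussian toy data generated here), and the function names, the threshold of 0.05, and the simulated data are all hypothetical choices made for the example.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def candidate_graphs(xs, ys, threshold=0.05):
    """Return the slide's candidate labels that remain compatible with the data.

    Assumption: |correlation| below `threshold` is read as independence.
    """
    if abs(pearson(xs, ys)) < threshold:
        return ["(c)"]        # x and y look independent: no edge
    return ["(a)", "(b)"]     # dependent: an edge exists, direction unknown


# Toy data (hypothetical): one dependent pair and one independent pair.
rng = random.Random(0)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
y_dependent = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y_independent = [rng.gauss(0, 1) for _ in range(n)]

print(candidate_graphs(x, y_dependent))
print(candidate_graphs(x, y_independent))
```

Note that the rule never returns a single directed graph for dependent data: without further assumptions the two directions stay in the same equivalence class.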

Slide 6

Slide 6 text

Additional information on functional forms and/or distributions is helpful
• Semiparametric approach
• E.g., linearity + non-Gaussian continuous distributions result in different distributions of x and y under (a) and (b) (Shimizu, Hoyer, Hyvarinen & Kerminen, 2006; Shimizu, 2022)
• (a) and (b) do not differ in terms of their conditional independence
[Figure: candidate graphs (a) and (b) over x and y]

Slide 7

Slide 7 text

Semiparametric approach: Examples of identifiable models
• Linear Non-Gaussian Acyclic Model: LiNGAM (Shimizu et al., 2006)
  $x_i = \sum_{\mathrm{par}(x_i)} b_{ij} x_j + e_i$
• Nonlinearity + "additive" noise (Hoyer et al., 2009; Zhang & Hyvarinen, 2009; Peters et al., 2014)
  $x_i = f_i(\mathrm{par}(x_i)) + e_i$
  $x_i = g_i^{-1}(f_i(\mathrm{par}(x_i)) + e_i)$
• Discrete variable model or mixed cases (Park et al., 2018; Wei et al., 2018; Zeng et al., 2022)
[Figure: causal graph over $x_1$, $x_2$, $x_3$ with error variables $e_1$, $e_2$, $e_3$; the causal graph is identifiable]
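The LiNGAM equation can also be written in matrix form, which makes the link to independent component analysis visible. The following is a standard restatement rather than something shown on the slide; the symbols $B$ (coefficient matrix with entries $b_{ij}$) and $A$ (mixing matrix) are notation introduced here for illustration.

```latex
\begin{align}
  x &= B x + e, \\
  x &= (I - B)^{-1} e = A e .
\end{align}
```

Because the graph is acyclic, the variables can be ordered so that $B$ is strictly lower triangular, which guarantees that $I - B$ is invertible. The components of $e$ are independent and non-Gaussian, so $x = Ae$ has the form of an independent component analysis model.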

Slide 8

Slide 8 text

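How do independence and non-Gaussianity work? (Shimizu et al., 2011)
Underlying model ($x_1 \to x_2$):
  $x_1 = e_1$
  $x_2 = b_{21} x_1 + e_2 \quad (b_{21} \neq 0)$
  $e_1, e_2$ are non-Gaussian
Regress effect on cause:
  $r_2^{(1)} = x_2 - \frac{\mathrm{cov}(x_2, x_1)}{\mathrm{var}(x_1)} x_1 = x_2 - b_{21} x_1 = e_2$
  $x_1 = e_1$ and the residual $r_2^{(1)}$ are independent.
Regress cause on effect:
  $r_1^{(2)} = x_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} x_2 = \left(1 - \frac{b_{21}\,\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)}\right) e_1 - \frac{b_{21}\,\mathrm{var}(x_1)}{\mathrm{var}(x_2)} e_2$
  $x_2 = b_{21} e_1 + e_2$ and the residual $r_1^{(2)}$ are dependent, although they are uncorrelated.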
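The asymmetry between the two regressions can be checked numerically. The sketch below assumes uniform (hence non-Gaussian) errors and $b_{21} = 0.8$, both arbitrary choices for illustration. It uses the correlation between squared values as a crude stand-in for an independence measure; this proxy is an assumption of the example, not the measure used by Shimizu et al. (2011).

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residual(target, regressor):
    """Residual of the least-squares regression of target on regressor."""
    n = len(target)
    mt, mr = sum(target) / n, sum(regressor) / n
    cov = sum((t - mt) * (r - mr) for t, r in zip(target, regressor))
    var = sum((r - mr) ** 2 for r in regressor)
    slope = cov / var  # sample version of cov(target, regressor) / var(regressor)
    return [(t - mt) - slope * (r - mr) for t, r in zip(target, regressor)]


def squared_corr(xs, ys):
    """Correlation of squared centered values (illustrative dependence proxy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return pearson([(x - mx) ** 2 for x in xs], [(y - my) ** 2 for y in ys])


# Underlying model: x1 = e1, x2 = b21 * x1 + e2, with uniform errors (assumed).
rng = random.Random(0)
n, b21 = 20000, 0.8
e1 = [rng.uniform(-1, 1) for _ in range(n)]
e2 = [rng.uniform(-1, 1) for _ in range(n)]
x1 = e1
x2 = [b21 * a + b for a, b in zip(x1, e2)]

r2_given_1 = residual(x2, x1)  # regress effect on cause
r1_given_2 = residual(x1, x2)  # regress cause on effect

print("effect on cause:", pearson(x1, r2_given_1), squared_corr(x1, r2_given_1))
print("cause on effect:", pearson(x2, r1_given_2), squared_corr(x2, r1_given_2))
```

Both residuals are uncorrelated with their regressor by construction, so only the second number in each printed pair separates the two directions. With Gaussian errors that difference would vanish, which is why non-Gaussianity is required.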

Slide 9

Slide 9 text

Hidden common causes
Additional information on functional forms and/or distributions is helpful

Slide 10

Slide 10 text

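Semiparametric approach: Linear non-Gaussian case
• Dependence between explanatory variables and the regression residuals implies the existence of hidden variables and/or a wrong causal direction (Tashiro et al., 2014)
 – Regress $x_1$ on $x_2$ (in the presence of $U$)
 – The residual and $x_2$ are not independent because of the hidden $U$
  $x_2 = (b_{21}\lambda_1 + \lambda_2) u + b_{21} e_1 + e_2$
  $r_1^{(2)} = x_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} x_2 = \left\{\lambda_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} (b_{21}\lambda_1 + \lambda_2)\right\} u + \left\{1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} b_{21}\right\} e_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} e_2$
[Figure: hidden common cause $U$ pointing to $x_1$ (coefficient $\lambda_1$) and to $x_2$ (coefficient $\lambda_2$), an edge between $x_1$ and $x_2$ (coefficient $b_{21}$), and error variables $e_1$, $e_2$]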
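The effect of a hidden common cause on the residual can be illustrated with a small simulation. This is a sketch under stated assumptions: the structural equations $x_1 = \lambda_1 u + e_1$ and $x_2 = b_{21} x_1 + \lambda_2 u + e_2$ are inferred from the slide's expressions, all variables are drawn from a uniform distribution, and the squared-value correlation is a crude dependence proxy chosen for this example. The code regresses the effect $x_2$ on the cause $x_1$, the direction whose residual would be independent of the regressor if $U$ were absent.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(lam1, lam2, b21=0.8, n=50000, seed=0):
    """Draw (x1, x2) from the assumed model with hidden common cause u."""
    rng = random.Random(seed)
    x1, x2 = [], []
    for _ in range(n):
        u, e1, e2 = (rng.uniform(-1, 1) for _ in range(3))
        v1 = lam1 * u + e1
        x1.append(v1)
        x2.append(b21 * v1 + lam2 * u + e2)
    return x1, x2


def residual_dependence(x1, x2):
    """Regress x2 on x1, then return corr(x1^2, residual^2) on centered values."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    c1 = [a - m1 for a in x1]
    c2 = [b - m2 for b in x2]
    slope = sum(a * b for a, b in zip(c1, c2)) / sum(a * a for a in c1)
    res = [b - slope * a for a, b in zip(c1, c2)]
    return pearson([a * a for a in c1], [r * r for r in res])


no_hidden = residual_dependence(*simulate(lam1=0.0, lam2=0.0))
with_hidden = residual_dependence(*simulate(lam1=1.0, lam2=1.0))

print("without U:", no_hidden)
print("with U:   ", with_hidden)
```

A clearly nonzero value in the second case signals that no simple regression of one variable on the other leaves an independent residual, which is the cue used to suspect a hidden common cause.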

Slide 11

Slide 11 text

Semiparametric approach: Causal additive models with unobserved variables (Maeda & Shimizu, 2021)
• Acyclicity (and a kind of faithfulness)
• Extends LiNGAM in two ways
 – Hidden common causes
 – (Additive) nonlinearity
• Can be applied to time series cases like structural VAR (Maeda & Shimizu, in prep.)
Model:
  $x_i = \sum_{\mathrm{observed}\ \mathrm{par}(x_i)} f_j^{(i)}(x_j) + \sum_{\mathrm{unobserved}\ \mathrm{par}(x_i)} g_k^{(i)}(u_k) + e_i$
[Figure: two panels, "Underlying structure" (observed variables $x$ and unobserved variables $u$) and "Output" (the graph over the observed variables)]

Slide 12

Slide 12 text

Code and software

Slide 13

Slide 13 text

Python packages and other no-code tools
• Semiparametric: LiNGAM (Ikeuchi et al., 2023) and causal-learn (Zheng et al., 2023)
• Nonparametric: pcalg (Kalisch et al., 2012), causal-learn, Tigramite
• Commercial software (no-code tools)
 – Causalas by SCREEN AS, Node AI by NTT Communications, Ntech Predict by neutral, Causal analysis by NEC
[Screenshot: Colaboratory notebook that imports numpy, pandas, lingam, and Digraph from graphviz, then defines a make_graph(dag) helper that draws a causal graph over x0 to x5 with coefficient labels on the edges]
[Figure panels: Causal graph; Total effects and bootstrap probabilities]
Model evaluation
• Independence of error variables
• Classical SEM model fit indices like RMSEA (semopy)
[Figure: example values: Pearson correlation 0.03; F-correlation (Bach & Jordan) 0.86]

Slide 14

Slide 14 text

Summary

Slide 15

Slide 15 text

Statistical causal inference is a fundamental tool for science
• Many well-developed methods are available when causal graphs are known from background knowledge
• Helping draw causal graphs with data is the key: causal discovery
 – LiNGAM-related papers: https://www.shimizulab.org/lingam/lingampapers
• Next default assumptions:
 – Hidden common causes (Spirtes et al., 1995; Hoyer et al., 2008; Wang & Drton, 2023)
 – Mixed data: continuous and discrete variables (Sedgewick et al., 2019; Wei et al., 2018; Zeng et al., 2022)
 – (Cyclicity (Lacerda et al., 2008) & non-stationarity (Huang et al., 2019))