
Non-Gaussian methods for causal discovery

Shohei SHIMIZU
December 17, 2023

16th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics 2023), Berlin
Organized Invited Session: Statistical Learning of Non-Gaussian Data



Transcript

  1. Non-Gaussian methods for causal discovery. Shohei Shimizu, Shiga University and RIKEN. CMStatistics 2023, Berlin. Organized Session: Statistical Learning of Non-Gaussian Data.
  2. What is causal discovery?
    • Methodology for inferring causal graphs using data
    • Helps select covariates in causal effect estimation
    • Assumptions: Functional form? Distribution? Hidden common cause present? Acyclic? etc.
    [Figure: data plus assumptions lead to a causal graph (Maeda and Shimizu, 2020)]
  3. Applications (https://www.shimizulab.org/lingam/lingampapers/applications-and-tailor-made-methods)
    • Epidemiology (Rosenstrom et al., 2012): sleep problems → depression mood, or depression mood → sleep problems?
    • Economics (Moneta et al., 2012)
    • Neuroscience (Ogawa et al., 2022)
    • Chemistry (Campomanes et al., 2014)
    • Preventive medicine (Kotoku et al., 2020)
    • Finance (Jiang & Shimizu, 2023)
    [Figure: causal graph over Empl.gr, Sales.gr, R&D.gr, and OpInc.gr at times t, t+1, and t+2]
  4. Non-parametric approach: Example (Spirtes et al., 1993; 2001)
    1. Make assumptions on the underlying causal graph
      – Directed acyclic graph
      – No hidden common causes (all have been observed)
    2. Find the graph that best matches the data among the causal graphs that satisfy the assumptions.
    Three candidates: (a) and (b) are the two graphs with a directed edge between x and y; (c) has no edge.
    • If x and y are independent in the data, select (c).
    • If x and y are dependent in the data, select (a) and (b).
    • (a) and (b) are indistinguishable: they form a Markov equivalence class.
  5. Additional information on functional forms and/or distributions is helpful
    • Semiparametric approach
    • E.g., linearity + a non-Gaussian continuous distribution results in different distributions of x and y under (a) and (b) (Shimizu, Hoyer, Hyvarinen & Kerminen, 2006; Shimizu, 2022)
    • (a) and (b) show no difference in terms of their conditional independence
    [Figure: the two candidate graphs (a) and (b) over x and y]
  6. Semiparametric approach: Example identifiable models
    • Linear Non-Gaussian Acyclic Model: LiNGAM (Shimizu et al., 2006)
      $x_i = \sum_{x_j \in \mathrm{par}(x_i)} b_{ij} x_j + e_i$
    • Nonlinearity + "additive" noise (Hoyer et al., 2009; Zhang & Hyvarinen, 2009; Peters et al., 2014)
      $x_i = f_i(\mathrm{par}(x_i)) + e_i$
      $x_i = g_i^{-1}(f_i(\mathrm{par}(x_i)) + e_i)$
    • Discrete variable model or mixed cases (Park et al., 2018; Wei et al., 2018; Zeng et al., 2022)
    [Figure: identifiable causal graph over x_1, x_2, x_3 with error variables e_1, e_2, e_3]
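The three model classes on this slide can be sketched for a single cause and effect, x1 → x2. This is only an illustration under assumptions that are not on the slide: the function name `sample_pairs`, the coefficient 0.8, the choice f = tanh(2·), the choice g⁻¹(z) = z³, and uniform noise are all hypothetical.

```python
import math
import random


def sample_pairs(model, n=5, seed=0):
    """Draw n (x1, x2) pairs with x1 -> x2 under one of three model classes.

    All functional forms and constants below are illustrative assumptions.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x1 = rng.uniform(-1, 1)  # non-Gaussian cause
        e2 = rng.uniform(-1, 1)  # non-Gaussian noise, independent of x1
        if model == "lingam":
            # Linear non-Gaussian: x2 = b21 * x1 + e2
            x2 = 0.8 * x1 + e2
        elif model == "anm":
            # Nonlinear additive noise: x2 = f(x1) + e2
            x2 = math.tanh(2 * x1) + e2
        elif model == "pnl":
            # Post-nonlinear: x2 = g^{-1}(f(x1) + e2), with invertible g^{-1}(z) = z^3
            x2 = (math.tanh(2 * x1) + e2) ** 3
        else:
            raise ValueError(f"unknown model: {model}")
        pairs.append((x1, x2))
    return pairs


for name in ("lingam", "anm", "pnl"):
    print(name, sample_pairs(name, n=2))
```

In each case the noise enters additively before any outer transformation, which is the structure the identifiability results cited on the slide rely on.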
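  7. How do independence and non-Gaussianity work? (Shimizu et al., 2011)
    Underlying model: $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$ ($b_{21} \neq 0$), where $e_1$, $e_2$ are non-Gaussian.
    • Regress effect on cause. Residual:
      $r_2^{(1)} = x_2 - \frac{\mathrm{cov}(x_2, x_1)}{\mathrm{var}(x_1)} x_1 = x_2 - b_{21} x_1 = e_2$
      $x_1 = e_1$ and $r_2^{(1)}$ are independent.
    • Regress cause on effect. Residual:
      $r_1^{(2)} = x_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} x_2 = \left(1 - \frac{b_{21}\,\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)}\right) e_1 - \frac{b_{21}\,\mathrm{var}(x_1)}{\mathrm{var}(x_2)} e_2$
      $x_2 = b_{21} e_1 + e_2$ and $r_1^{(2)}$ are dependent, although they are uncorrelated.
    [Figure: x_1 → x_2 with error variables e_1 and e_2]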
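The asymmetry between the two regressions can be checked numerically. The sketch below simulates x1 = e1, x2 = b21·x1 + e2 with uniform (hence non-Gaussian) errors and regresses in both directions. The correlation of squared values is used here as a crude dependence score; it is an assumption made for illustration, not the independence measure used in the cited work, and b21 = 1 and the sample size are likewise arbitrary.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def corr(a, b):
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5


def residual(target, regressor):
    """Least-squares residual: target - cov(target, regressor) / var(regressor) * regressor."""
    slope = cov(target, regressor) / cov(regressor, regressor)
    return [t - slope * r for t, r in zip(target, regressor)]


def squared_corr(a, b):
    """Correlation between a^2 and b^2: a crude, illustrative dependence score."""
    return corr([x * x for x in a], [y * y for y in b])


rng = random.Random(0)
n = 20000
b21 = 1.0  # assumed coefficient
e1 = [rng.uniform(-1, 1) for _ in range(n)]
e2 = [rng.uniform(-1, 1) for _ in range(n)]
x1 = e1
x2 = [b21 * a + b for a, b in zip(x1, e2)]

r2_on_1 = residual(x2, x1)  # regress effect on cause
r1_on_2 = residual(x1, x2)  # regress cause on effect

forward = squared_corr(x1, r2_on_1)   # close to zero: regressor and residual independent
backward = squared_corr(x2, r1_on_2)  # clearly nonzero: regressor and residual dependent
uncorrelated = corr(x2, r1_on_2)      # zero up to rounding, by construction of least squares

print(forward, backward, uncorrelated)
```

With Gaussian errors the squared-value correlation would vanish in both directions, which is why non-Gaussianity is needed to tell the directions apart.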
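  8. Semiparametric approach: Linear non-Gaussian case
    • Dependence between explanatory variables and the regression residuals implies the existence of hidden variables and/or a wrong causal direction (Tashiro et al., 2014)
      – Regress $x_1$ on $x_2$ (in the presence of $U$)
      – The residual and $x_2$ are not independent because of the hidden $U$
    $x_2 = (b_{21}\lambda_1 + \lambda_2) u + b_{21} e_1 + e_2$
    $r_1^{(2)} = x_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} x_2 = \left\{\lambda_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} (b_{21}\lambda_1 + \lambda_2)\right\} u + \left\{1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} b_{21}\right\} e_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} e_2$
    [Figure: x_1 → x_2 with coefficient b_21, hidden common cause U with coefficients λ_1 and λ_2, and error variables e_1 and e_2]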
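A small simulation can illustrate the effect of a hidden common cause. The sketch assumes the linear model x1 = λ1·u + e1, x2 = b21·x1 + λ2·u + e2 with uniform u, e1, e2. The helper names `dependence` and `simulate`, the parameter values, and the squared-value correlation used as a dependence score are assumptions for illustration, not the test used by Tashiro et al. (2014).

```python
import random


def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def dependence(regressor, target):
    """Regress target on regressor, then correlate regressor^2 with residual^2."""
    slope = cov(target, regressor) / cov(regressor, regressor)
    res = [t - slope * r for t, r in zip(target, regressor)]
    s_reg = [r * r for r in regressor]
    s_res = [v * v for v in res]
    return cov(s_reg, s_res) / (cov(s_reg, s_reg) * cov(s_res, s_res)) ** 0.5


def simulate(lam1, lam2, b21=1.0, n=50000, seed=0):
    """x1 = lam1*u + e1, x2 = b21*x1 + lam2*u + e2 with uniform u, e1, e2."""
    rng = random.Random(seed)
    x1, x2 = [], []
    for _ in range(n):
        u, e1, e2 = (rng.uniform(-1, 1) for _ in range(3))
        v1 = lam1 * u + e1
        x1.append(v1)
        x2.append(b21 * v1 + lam2 * u + e2)
    return x1, x2


# With a hidden common cause, the residual depends on the regressor in both directions.
x1, x2 = simulate(lam1=1.0, lam2=1.0)
confounded = (dependence(x1, x2), dependence(x2, x1))

# Without it, regressing the effect on the cause leaves an independent residual.
x1, x2 = simulate(lam1=0.0, lam2=0.0)
unconfounded = (dependence(x1, x2), dependence(x2, x1))

print(confounded, unconfounded)
```

In the confounded setting neither regression direction produces an independent residual, so residual dependence alone cannot separate a wrong direction from a hidden variable.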
  9. Semiparametric approach: Causal additive models with unobserved variables (Maeda & Shimizu, 2021)
    • Acyclicity (and a kind of faithfulness)
    • Extends LiNGAM in two ways
      – Hidden common causes
      – (Additive) nonlinearity
    • Can be applied to time series cases like structural VAR (Maeda & Shimizu, in prep.)
    Model: $x_i = \sum_{\text{observed } x_j \in \mathrm{par}(x_i)} f_j^{(i)}(x_j) + \sum_{\text{unobserved } u_k \in \mathrm{par}(x_i)} g_k^{(i)}(u_k) + e_i$
    [Figure: underlying structure with observed and unobserved variables, and the corresponding output over the observed variables]
  10. Python packages and other no-code tools
    • Semiparametric: LiNGAM (Ikeuchi et al., 2023) and causal-learn (Zheng et al., 2023)
    • Nonparametric: pcalg (Kalisch et al., 2012), causal-learn, Tigramite
    • Commercial software (no-code tools)
      – Causalas by SCREEN AS, Node AI by NTT Communications, Ntech Predict by neutral, Causal analysis by NEC
    [Screenshot: Colaboratory notebook (LiNGAM.ipynb) that imports numpy, pandas, lingam, and Digraph from graphviz, and defines a make_graph helper that draws a DAG from "from", "to", and optional "coef" lists; the resulting causal graph has nodes x0 to x5 with edge coefficients as labels]
    [Figure panels: Causal graph; Total effects and bootstrap probabilities; Model evaluation]
    • Model evaluation
      – Independence of error variables: Pearson correlation 0.03; F-correlation (Bach & Jordan) 0.86
      – Classical SEM model fit indices like RMSEA (semopy)
  11. Statistical causal inference is a fundamental tool for science
    • Many well-developed methods are available when causal graphs are known from background knowledge
    • Helping draw causal graphs with data is the key: causal discovery
      – LiNGAM-related papers: https://www.shimizulab.org/lingam/lingampapers
    • Next default assumptions:
      – Hidden common causes (Spirtes et al., 1995; Hoyer et al., 2008; Wang & Drton, 2023)
      – Mixed data: continuous and discrete variables (Sedgewick et al., 2019; Wei et al., 2018; Zeng et al., 2022)
      – (Cyclicity (Lacerda et al., 2008) & non-stationarity (Huang et al., 2019))