Shohei SHIMIZU
December 17, 2023

# Non-Gaussian methods for causal discovery

Shohei Shimizu (17 Dec 2023)
Non-Gaussian methods for causal discovery
16th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics 2023), Berlin
Organized Invited Session: Statistical Learning of Non-Gaussian Data

## Transcript

1. Non-Gaussian methods for causal discovery
Shohei Shimizu
Shiga University and RIKEN
CMStatistics 2023, Berlin
Organized Session: Statistical Learning of Non-Gaussian Data

2. What is causal discovery?
• Methodology for inferring causal graphs using data
• Helps select covariates in causal effect estimation
Maeda and Shimizu (2020)
Assumptions
• Functional form?
• Distribution?
• Hidden common cause present?
• Acyclic? etc.
[Figure: Data → Causal graph, given the assumptions]

3. Applications
• Epidemiology (Rosenstrom et al., 2012): Sleep problems → Depression mood, or Depression mood → Sleep problems?
• Economics (Moneta et al., 2012) [Figure: graph over OpInc.gr, Empl.gr, Sales.gr, and R&D.gr at times t, t+1, and t+2]
• Neuroscience (Ogawa et al., 2022)
• Chemistry (Campomanes et al., 2014)
• Prevention Medicine (Kotoku et al., 2020)
• Finance (Jiang & Shimizu, 2023)

4. Methods of causal discovery

5. Non-parametric approach: Example
(Spirtes et al., 1993; 2001)
1. Make assumptions on the underlying causal graph
– Directed acyclic graph
– No hidden common causes (all have been observed)
2. Find the graph that best matches the data among such causal graphs that satisfy the assumptions.
Three candidates: (a) and (b) connect x and y by a directed edge, in opposite directions; (c) has no edge between x and y.
If x and y are independent in the data, select (c).
If x and y are dependent in the data, select (a) and (b).
(a) and (b) are indistinguishable: Markov equivalence class
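
The selection rule on this slide can be sketched in a few lines. This is only an illustration, not the procedure of Spirtes et al.: the function name `select_candidates` is hypothetical, and a Fisher z-test on the Pearson correlation stands in for a general independence test, which is an assumption that only detects linear dependence.

```python
import math
from statistics import NormalDist

import numpy as np


def select_candidates(x, y, alpha=0.01):
    """Return the candidate graphs compatible with the data.

    Assumption: a Fisher z-test on the Pearson correlation is used as a
    stand-in for an independence test (it only sees linear dependence).
    """
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    z = math.atanh(r) * math.sqrt(n - 3)           # Fisher z statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    if abs(z) > critical:
        return ["(a)", "(b)"]  # dependent: the direction stays undetermined
    return ["(c)"]             # independent: no edge


rng = np.random.default_rng(0)
n = 5_000
x = rng.normal(size=n)
y_dep = 0.8 * x + rng.normal(size=n)  # y depends on x
y_ind = rng.normal(size=n)            # y generated independently of x

print(select_candidates(x, y_dep))
print(select_candidates(x, y_ind))
```

Whatever the test, the dependent case never returns a single graph: (a) and (b) imply the same (in)dependence pattern, which is exactly the Markov equivalence class.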

6. Semiparametric approach
• E.g., linearity + non-Gaussian continuous distribution gives candidates (a) and (b) different distributions of x and y
(Shimizu, Hoyer, Hyvarinen & Kerminen, 2006; Shimizu, 2022)
[Figure: candidates (a) and (b), the two causal directions between x and y]
No difference in terms of their conditional independence

7. Semiparametric approach: Example identifiable models
• Linear Non-Gaussian Acyclic Model: LiNGAM (Shimizu et al., 2006)
(Hoyer et al., 2009; Zhang & Hyvarinen, 2009; Peters et al., 2014)
• Discrete variable model or mixed cases
(Park et al., 2018; Wei et al., 2018; Zeng et al., 2022)
$$x_i = \sum_{x_j \in \mathrm{par}(x_i)} b_{ij} x_j + e_i$$
$$x_i = g_i^{-1}\left(f_i(\mathrm{par}(x_i)) + e_i\right)$$
$$x_i = f_i(\mathrm{par}(x_i)) + e_i$$
[Figure: causal graph over $x_1$, $x_2$, $x_3$ with error variables $e_1$, $e_2$, $e_3$]
Causal graph identifiable

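8. How independence and non-Gaussianity work?
(Shimizu et al., 2011)
Underlying model:
$$x_1 = e_1$$
$$x_2 = b_{21} x_1 + e_2 \quad (b_{21} \neq 0)$$
$e_1$, $e_2$ are non-Gaussian
[Figure: $x_1$ → $x_2$ with error variables $e_1$, $e_2$]
Regress effect on cause. Residual:
$$r_2^{(1)} = x_2 - \frac{\mathrm{cov}(x_2, x_1)}{\mathrm{var}(x_1)} x_1 = x_2 - b_{21} x_1 = e_2$$
$x_1 = e_1$ and $r_2^{(1)}$ are independent.
Regress cause on effect. Residual:
$$r_1^{(2)} = x_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} x_2 = \left(1 - \frac{b_{21}\,\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)}\right) e_1 - \frac{b_{21}\,\mathrm{var}(x_1)}{\mathrm{var}(x_2)} e_2$$
$x_2 = b_{21} e_1 + e_2$ and $r_1^{(2)}$ are dependent, although they are uncorrelated.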
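
A small simulation makes the asymmetry concrete. It is a sketch under assumptions that are not on the slide: uniform errors, $b_{21} = 0.8$, and the correlation between squared values as a crude proxy for dependence (a stand-in, not a proper independence test). The helper names `ols_residual` and `squared_corr` are hypothetical.

```python
import numpy as np


def ols_residual(target, regressor):
    """Residual of the least-squares regression of target on regressor."""
    t = target - target.mean()
    r = regressor - regressor.mean()
    slope = np.cov(t, r)[0, 1] / np.var(r, ddof=1)
    return t - slope * r


def squared_corr(a, b):
    """Correlation of a**2 and b**2 after centering.

    Assumption: used only as a crude dependence proxy; it is close to 0
    when a and b are independent.
    """
    a = a - a.mean()
    b = b - b.mean()
    return np.corrcoef(a**2, b**2)[0, 1]


rng = np.random.default_rng(0)
n = 20_000
w = np.sqrt(3.0)  # uniform on [-w, w] has unit variance and is non-Gaussian
e1 = rng.uniform(-w, w, n)
e2 = rng.uniform(-w, w, n)

b21 = 0.8  # assumed coefficient, chosen for illustration
x1 = e1
x2 = b21 * x1 + e2

r2_on_1 = ols_residual(x2, x1)  # regress effect on cause
r1_on_2 = ols_residual(x1, x2)  # regress cause on effect

dep_effect_on_cause = squared_corr(x1, r2_on_1)
dep_cause_on_effect = squared_corr(x2, r1_on_2)
lin_cause_on_effect = np.corrcoef(x2, r1_on_2)[0, 1]

print("effect on cause, dependence proxy:", dep_effect_on_cause)
print("cause on effect, dependence proxy:", dep_cause_on_effect)
print("cause on effect, plain correlation:", lin_cause_on_effect)
```

The plain correlation is zero in both directions by construction of least squares, so only a measure that looks beyond correlation separates the two regressions. With Gaussian errors the proxy would vanish in both directions as well, which is why non-Gaussianity is needed.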

9. Hidden common causes

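10. Semiparametric approach: Linear non-Gaussian case
• Dependence between explanatory variables and the regression residuals implies existence of hidden variables and/or wrong causal direction (Tashiro et al., 2014)
– Regress $x_1$ on $x_2$ (in the presence of $U$)
– The residual and $x_2$ are not independent because of hidden $U$
[Figure: $U$ → $x_1$ (coefficient $\lambda_1$), $U$ → $x_2$ (coefficient $\lambda_2$), $x_1$ → $x_2$ (coefficient $b_{21}$), with error variables $e_1$, $e_2$]
$$x_2 = (b_{21}\lambda_1 + \lambda_2)\,u + b_{21} e_1 + e_2$$
$$r_1^{(2)} = x_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} x_2 = \left\{\lambda_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)}\,(b_{21}\lambda_1 + \lambda_2)\right\} u + \left\{1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)}\,b_{21}\right\} e_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)}\,e_2$$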
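
The effect of the hidden common cause can be checked numerically. The sketch below assumes the linear model $x_1 = \lambda_1 u + e_1$, $x_2 = b_{21} x_1 + \lambda_2 u + e_2$ with uniform $u$, $e_1$, $e_2$, and again uses the correlation between squared values as a crude dependence proxy. The function names, the parameter values, and the proxy are illustrative assumptions, not the method of Tashiro et al.

```python
import numpy as np


def ols_residual(target, regressor):
    """Residual of the least-squares regression of target on regressor."""
    t = target - target.mean()
    r = regressor - regressor.mean()
    slope = np.cov(t, r)[0, 1] / np.var(r, ddof=1)
    return t - slope * r


def squared_corr(a, b):
    """Correlation of a**2 and b**2 after centering (crude dependence proxy)."""
    a = a - a.mean()
    b = b - b.mean()
    return np.corrcoef(a**2, b**2)[0, 1]


def simulate(lam1, lam2, b21=0.8, n=50_000, seed=0):
    """Draw (x1, x2) from the assumed linear model with hidden common cause u."""
    rng = np.random.default_rng(seed)
    w = np.sqrt(3.0)  # unit-variance uniform variables
    u, e1, e2 = rng.uniform(-w, w, size=(3, n))
    x1 = lam1 * u + e1
    x2 = b21 * x1 + lam2 * u + e2
    return x1, x2


def residual_dependence(x1, x2):
    """Dependence proxy between regressor and residual, in both directions."""
    dep_2_on_1 = squared_corr(x1, ols_residual(x2, x1))
    dep_1_on_2 = squared_corr(x2, ols_residual(x1, x2))
    return dep_2_on_1, dep_1_on_2


with_u = residual_dependence(*simulate(lam1=1.0, lam2=1.0))
without_u = residual_dependence(*simulate(lam1=0.0, lam2=0.0))

print("hidden common cause present (x2 on x1, x1 on x2):", with_u)
print("no hidden common cause      (x2 on x1, x1 on x2):", without_u)
```

Without $u$, only the wrong direction shows dependence. With $u$, both directions do, so residual dependence in every direction is a signal that a hidden common cause may be present.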

11. Semiparametric approach: Causal additive models with unobserved variables
(Maeda & Shimizu, 2021)
$$x_i = \sum_{x_j \in \text{observed } \mathrm{par}(x_i)} f_j^{(i)}(x_j) + \sum_{u_k \in \text{unobserved } \mathrm{par}(x_i)} g_k^{(i)}(u_k) + e_i$$
• Acyclicity (and a kind of faithfulness)
• Extends LiNGAM in two ways
– Hidden common causes
• Can be applied to time series cases like structural VAR (Maeda & Shimizu, in prep.)
[Figure: Model (underlying structure over observed $x_1$ to $x_7$ and unobserved $u_1$, $u_2$) and Output (graph over $x_1$ to $x_7$)]

12. Codes and Software

13. Python packages and other no-code tools
• Semiparametric: LiNGAM (Ikeuchi et al., 2023) and causal-learn (Zheng et al., 2023)
• Nonparametric: pcalg (Kalisch et al., 2012), causal-learn, Tigramite
• Commercial software (no-code tools)
– Causalas by SCREEN AS, Node AI by NTT Communications, Ntech Predict by neutral, Causal analysis by NEC
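[Screenshot: notebook "tLiNGAM.IPYNB - Colaboratory"]
    import numpy as np
    import pandas as pd
    import lingam
    from graphviz import Digraph

    np.set_printoptions(precision=..., suppress=True)  # precision value not legible in the source

    seed = ...  # value not legible in the source
    eps = ...   # value not legible in the source

    # Create the data
    def make_graph(dag):
        d = Digraph(engine='dot')
        if 'coef' in dag:
            # Label each edge with its coefficient
            for from_, to, coef in zip(dag['from'], dag['to'], dag['coef']):
                d.edge(f'x{from_}', f'x{to}', label=f'{coef:.2f}')
        else:
            for from_, to in zip(dag['from'], dag['to']):
                d.edge(f'x{from_}', f'x{to}', label='')
        return d

    dag = {
        'from': [...],  # list entries not legible in the source
        'to': [...],
        'coef': [...],
    }
    make_graph(dag)
[Figure: rendered causal graph over x0 to x5 with edge coefficients 3.00, 6.00, 4.00, 8.00, 3.00, 1.00, 2.00]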

[Screenshot panels: Causal graph; Total effects and bootstrap probabilities; Model evaluation]
Model evaluation:
• Independence of error variables
• Classical SEM model fit indices like RMSEA (semopy)
Pearson correlation: 0.03
F-correlation (Bach & Jordan): 0.86

14. Summary

15. Statistical causal inference is a fundamental tool for science
• Many well-developed methods are available when causal graphs are known from background knowledge
• Helping draw causal graphs with data is the key: causal discovery
– LiNGAM-related papers: https://www.shimizulab.org/lingam/lingampapers
• Next default assumptions:
– Hidden common causes (Spirtes et al., 1995; Hoyer et al., 2008; Wang & Drton, 2023)
– Mixed data: continuous and discrete variables (Sedgewick et al., 2019; Wei et al., 2018; Zeng et al., 2022)
– (Cyclicity (Lacerda et al., 2008) and non-stationarity (Huang et al., 2019))