Shohei SHIMIZU
November 23, 2020

Linear non-Gaussian models with latent variables for causal discovery (for the NeurIPS2020 Workshop)

Shohei Shimizu
Shiga University and RIKEN

NeurIPS 2020 Workshop on Causal Discovery and Causality-Inspired Machine Learning, Dec. 11 (EST)


Transcript

1. Linear non-Gaussian models with latent variables for causal discovery
Shohei Shimizu, Shiga University and RIKEN
The 2020 NeurIPS Workshop on Causal Discovery and Causality-Inspired Machine Learning

2. Statistical Causal Inference
• Infer causal relations from data
 – Intervention effects
 – Counterfactuals
• If we had increased chocolate consumption, would the number of Nobel laureates have increased? To what extent? (Messerli, 2012, New England Journal of Medicine)

3. Ordinary (?) causal inference
1. Decide which quantity is to be estimated: the intervention effect
2. Draw the causal graph based on background knowledge
3. Derive which variables should be used for adjustment
4. Observe and adjust for those variables (if any), and estimate the intervention effect:
 E[Nobel | do(Chocolate)] = E_{variables adjusted for}[ E[Nobel | Chocolate, variables adjusted for] ]
 E[Nobel | do(Chocolate = a lot)] − E[Nobel | do(Chocolate = not much)]
(Figure: causal graph over Chocolate, Nobel, and GDP.)

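The adjustment formula above can be illustrated on simulated data. This is a sketch with made-up numbers: GDP is simulated as a common cause of chocolate consumption and Nobel counts, the "true" effect of chocolate is set to 0.5, and adjusting for GDP recovers it while the unadjusted regression does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
gdp = rng.normal(size=n)                              # common cause
choc = 0.8 * gdp + rng.normal(size=n)                 # chocolate consumption
nobel = 0.5 * choc + 0.7 * gdp + rng.normal(size=n)   # simulated true effect: 0.5

# Naive regression of nobel on choc alone is confounded by GDP.
naive = np.polyfit(choc, nobel, 1)[0]

# Adjusting for GDP (regress on both choc and gdp) recovers ~0.5,
# i.e. E[Nobel | do(Chocolate)] changes by about 0.5 per unit of chocolate.
X = np.column_stack([choc, gdp, np.ones(n)])
adjusted = np.linalg.lstsq(X, nobel, rcond=None)[0][0]
```

Here the adjustment set {GDP} is read off the assumed causal graph, which is exactly step 3 of the slide.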
4. Causal discovery
• Infer the causal graph in data-driven ways
• Assumptions are needed to infer the causal graph
 – Various methods exist for different assumptions
 – Basic setup:
  • All the common causes are measured
  • Acyclicity
(Figure: candidate graphs over x1: Chocolate, x2: Nobel, x3: GDP.)

5. Causal discovery is a challenge of causal inference (Pearl, 2019)
• Classic methods use conditional independence between variables (Pearl, 2001; Spirtes et al., 1993)
 – Their limit is finding the Markov equivalence class
• More assumptions are needed to go beyond this limit
 – Restrictions on the functional forms and/or the distributions of variables
 – LiNGAM is an example (Shimizu et al., 2006; Shimizu, 2014)
  • A non-Gaussianity assumption makes it possible to examine independence
  • Unique identification is possible

6. How do independence and non-Gaussianity work? (Shimizu et al., 2011)
• Underlying model: x1 = e1, x2 = b21 x1 + e2 (b21 ≠ 0), where e1 and e2 are non-Gaussian
• Regress effect on cause:
 r2^(1) = x2 − (cov(x1, x2)/var(x1)) x1 = x2 − b21 x1 = e2
 → x1 = e1 and the residual r2^(1) are independent
• Regress cause on effect:
 r1^(2) = x1 − (cov(x1, x2)/var(x2)) x2 = (1 − b21 cov(x1, x2)/var(x2)) e1 − (b21 var(x1)/var(x2)) e2
 → x2 = b21 e1 + e2 and the residual r1^(2) are dependent, although they are uncorrelated

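This asymmetry can be checked numerically. A sketch with an assumed coefficient b21 = 1.2 and uniform (hence non-Gaussian) noise; the correlation between squared quantities is used as a crude stand-in for an independence test:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
e1 = rng.uniform(-1, 1, n)
e2 = rng.uniform(-1, 1, n)
x1 = e1
x2 = 1.2 * x1 + e2                 # true direction: x1 -> x2

# Regress effect on cause: the residual equals e2, independent of x1.
r2 = x2 - (np.cov(x2, x1)[0, 1] / np.var(x1)) * x1
# Regress cause on effect: the residual is uncorrelated with x2 but dependent.
r1 = x1 - (np.cov(x1, x2)[0, 1] / np.var(x2)) * x2

corr = lambda a, b: np.corrcoef(a, b)[0, 1]
lin_forward = corr(r2, x1)         # ~0 by construction
lin_backward = corr(r1, x2)        # also ~0: uncorrelatedness cannot decide
dep_forward = corr(r2**2, x1**2)   # ~0: residual truly independent of x1
dep_backward = corr(r1**2, x2**2)  # clearly nonzero: dependence in the wrong direction
```

Both residuals are uncorrelated with their regressors, but only the causal direction yields an independent residual, which is exactly what the slide's derivation predicts.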
9. Other identifiable models
• Continuous variables
 – Nonlinearity + "additive" noise (Hoyer+08NIPS; Zhang+09UAI; Peters+14JMLR)
  • x_i = f_i(par(x_i)) + e_i
  • x_i = g_i^{-1}(f_i(par(x_i)) + e_i) (the post-nonlinear model)
• Discrete variables
 – The Poisson DAG model and its extensions (Park+18JMLR)
• Mixed continuous and discrete variables
 – A logistic-distribution assumption for the discrete variables (two variables) (Wenjuan+18IJCAI)

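A minimal sketch of how the additive-noise assumption is used in practice: fit a nonlinear regression in both directions and keep the direction whose residual is independent of the regressor. The cubic function and the squared-correlation dependence proxy below are illustrative choices, not the tests used in the cited papers.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x = rng.uniform(-2, 2, n)
y = x + 0.5 * x**3 + rng.uniform(-1, 1, n)   # true model: x -> y, additive noise

def residual(a, b, deg=5):
    """Residual of a polynomial regression of b on a."""
    return b - np.polyval(np.polyfit(a, b, deg), a)

corr = lambda u, v: np.corrcoef(u, v)[0, 1]
# Dependence proxy: |corr| between squared residual and squared regressor.
dep_forward = abs(corr(residual(x, y)**2, x**2))   # small: residual indep. of x
dep_backward = abs(corr(residual(y, x)**2, y**2))  # larger: wrong direction
```

The backward fit leaves a residual whose variance shrinks for large |y|, so the dependence proxy flags the wrong direction while the forward residual is just the noise term.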
10. Python toolbox: https://github.com/cdt15/lingam
• ICA-based LiNGAM algorithm
• DirectLiNGAM
• VAR-LiNGAM and VARMA-LiNGAM
• LiNGAM for multiple datasets
• (Bottom-up) ParceLiNGAM
• Nonlinear method: ANM
• Planning to implement more
The code shown on the slide, recovered from a garbled screenshot (the numeric settings and data values are not recoverable and are left elided):

    import lingam
    from graphviz import Digraph

    np.set_printoptions(precision=..., suppress=True)

    # Create data ...

    def make_graph(dag):
        d = Digraph(engine='dot')
        if 'coef' in dag:
            for from_, to, coef in zip(dag['from'], dag['to'], dag['coef']):
                d.edge(f'x{from_}', f'x{to}', label=f'{coef:.2f}')
        else:
            for from_, to in zip(dag['from'], dag['to']):
                d.edge(f'x{from_}', f'x{to}', label='')
        return d

    dag = {'from': [...], 'to': [...], 'coef': [...]}
    make_graph(dag)

(Figures: the estimated causal graph and the bootstrap probabilities.)


11. Application examples
• Economics: growth rates of firms' operating income (OpInc.gr), employment (Empl.gr), sales (Sales.gr) and R&D (R&D.gr) at t, t+1, t+2 (Moneta et al., 2012)
• Epidemiology: sleep problems and depression mood (Rosenström et al., 2012)
• Neuroscience (Boukrina & Graves, 2013)
• Chemistry (Campomanes et al., 2014)

12. Typical requirements/questions from users
• Hidden common causes
• Background knowledge
• Mixed types of variables (marketing research)
• Cyclicity (biology)
• Multicollinearity (manufacturing)

13. LiNGAM with hidden common causes (Hoyer, Shimizu, Kerminen, & Palviainen, 2008)
• Example causal graph (x1 → x2, both affected by a hidden common cause f1):
 x1 = λ11 f1 + e1
 x2 = b21 x1 + λ21 f1 + e2
• The model: x_i = Σ_{j≠i} b_ij x_j + Σ_{k=1}^{Q} λ_ik f_k + e_i
• Its matrix form: x = B x + Λ f + e

14. Step 1: Estimate causal structures of variables that share hidden common causes
• ICA: independent component analysis (Comon, 1991; Eriksson et al., 2004; Hyvärinen et al., 2001)
 – Independent components are independent and non-Gaussian
• LiNGAM with hidden common causes is ICA:
 x = B x + Λ f + e ⇒ x = (I − B)^{-1}(Λ f + e) = [(I − B)^{-1}, (I − B)^{-1} Λ] [e; f]

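The rewriting into mixing-matrix form can be verified numerically. A quick check with arbitrary toy coefficients (B, Λ, f, e below are made up):

```python
import numpy as np

B = np.array([[0.0, 0.0],
              [0.8, 0.0]])          # x1 -> x2
Lam = np.array([[0.5],
                [0.3]])             # one hidden common cause f1
f = np.array([1.7])
e = np.array([0.2, -0.4])

# Solve the structural equations x = Bx + Λf + e directly ...
x = np.linalg.solve(np.eye(2) - B, Lam @ f + e)

# ... and via the stacked mixing matrix A = [(I-B)^{-1}, (I-B)^{-1}Λ]
# applied to the independent non-Gaussian sources (e, f).
IB_inv = np.linalg.inv(np.eye(2) - B)
A = np.hstack([IB_inv, IB_inv @ Lam])
x_mix = A @ np.concatenate([e, f])
```

Both routes give the same x, which is why the model can be handed to ICA: the sources (e, f) are independent and non-Gaussian, and A is the mixing matrix.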
15. Basic idea (Hoyer, Shimizu, Kerminen & Palviainen, 2008)
• All three candidate models are identifiable by ICA
 – Under faithfulness, the zero/non-zero patterns of their mixing matrices differ
 – So apply ICA and inspect the zero/non-zero pattern
• x1 → x2 with hidden common cause f1:
 [x1; x2] = [[1, 0, λ11], [b21, 1, λ21 + b21 λ11]] [e1; e2; f1]
• x2 → x1 with hidden common cause f1:
 [x1; x2] = [[1, b12, λ11 + b12 λ21], [0, 1, λ21]] [e1; e2; f1]
• No direct effect, hidden common cause f1 only:
 [x1; x2] = [[1, 0, λ11], [0, 1, λ21]] [e1; e2; f1]

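The zero/non-zero argument can be spelled out with toy coefficients (the numerical values below are arbitrary; under faithfulness the combined entries such as λ21 + b21 λ11 do not cancel to zero):

```python
import numpy as np

l11, l21, b21, b12 = 0.5, 0.3, 0.8, 0.8

# Mixing matrices over the sources (e1, e2, f1) for the three models:
A_x1_to_x2 = np.array([[1.0, 0.0, l11],
                       [b21, 1.0, l21 + b21 * l11]])
A_x2_to_x1 = np.array([[1.0, b12, l11 + b12 * l21],
                       [0.0, 1.0, l21]])
A_no_edge  = np.array([[1.0, 0.0, l11],
                       [0.0, 1.0, l21]])

# Each model leaves a different zero/non-zero fingerprint, which is what
# lets an ICA-based procedure distinguish them.
patterns = [tuple((A != 0).ravel()) for A in (A_x1_to_x2, A_x2_to_x1, A_no_edge)]
```

All three patterns are distinct, so reading off which entries of the estimated ICA mixing matrix are zero identifies the model.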
16. Identifiability (for more than 2 variables) (Salehkaleybar et al., 2020)
• Causal orders are identifiable, but intervention effects are not in some cases
 – If the descendants of the observed variables and of the hidden common causes do not overlap, both the causal orders and the intervention effects are identifiable
 – If there is some overlap, only the causal orders are identifiable; the intervention effects are not
• Two-variable example:
 [x1; x2] = [[1, 0, λ11], [b21, 1, λ21 + b21 λ11]] [e1; e2; f1]
• Three-variable example (with x3 = λ31 f1 + e3):
 [x1; x2; x3] = [[1, 0, 0, λ11], [b21, 1, 0, λ21 + b21 λ11], [0, 0, 1, λ31]] [e1; e2; e3; f1]
(The slide contrasts these as the overlap and no-overlap cases.)

17. Step 2: Estimate causal structures of variables that do not share hidden common causes
• Find the unconfounded pairs of variables and the confounded pairs
• For the unconfounded pairs, estimate the causal directions
• Tashiro, Shimizu, Hyvärinen and Washio (2014); Maeda and Shimizu (2020); Wang and Drton (2020)
(Figure: an underlying model over x1, ..., x4 with hidden f1 and f2, and the corresponding output.)

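A sketch of this pairwise idea on simulated data. The squared-correlation dependence proxy below stands in for the proper independence tests of the cited methods: for an unconfounded pair, regressing effect on cause leaves an independent residual, while for a confounded pair neither direction does.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
u = lambda: rng.uniform(-1, 1, n)    # non-Gaussian noise

e1, e2, e3, e4, f = u(), u(), u(), u(), u()
x1 = e1
x2 = 0.8 * x1 + e2                   # unconfounded pair (x1, x2)
x3 = f + e3
x4 = f + e4                          # pair (x3, x4) confounded by hidden f

def dep(a, b):
    """Dependence proxy: |corr| between squared OLS residual and squared regressor."""
    r = b - (np.cov(a, b)[0, 1] / np.var(a)) * a
    return abs(np.corrcoef(r**2, a**2)[0, 1])

unconfounded_best = min(dep(x1, x2), dep(x2, x1))  # ~0 in the causal direction
confounded_best = min(dep(x3, x4), dep(x4, x3))    # clearly nonzero both ways
```

Pairs where no regression direction yields an independent residual are flagged as confounded; for the remaining pairs the direction with the independent residual is the estimated causal direction.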
18. Final summary
• Causal structure learning in the presence of hidden common causes
 – A challenge of causal discovery
 – Independence matters, rather than uncorrelatedness
• Other important topics:
 – Mixed data with continuous and discrete variables (Wenjuan+18IJCAI)
 – Cyclic models (Lacerda+08UAI)
 – Background knowledge
 – More collaborations with domain experts
• Other latent variable models: latent factors (Shimizu et al., 2009), latent classes (Shimizu et al., 2008), etc.
• Y. Zeng, S. Shimizu, R. Cai, F. Xie, M. Yamamoto and Z. Hao (2020, arXiv preprint)

19. References
• J. Pearl. The seven tools of causal inference with reflections on machine learning. Communications of the ACM, 62(3): 54-60, 2019.
• S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7: 2003-2030, 2006.
• S. Shimizu. LiNGAM: Non-Gaussian methods for estimating causal structures. Behaviormetrika, 41(1): 65-98, 2014.
• S. Shimizu, T. Inazumi, Y. Sogawa, A. Hyvärinen, Y. Kawahara, T. Washio, P. O. Hoyer and K. Bollen. DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12(Apr): 1225-1248, 2011.
• J. Pearl. Causality. Cambridge University Press, 2001.
• P. Spirtes, C. Glymour and R. Scheines. Causation, Prediction, and Search. Springer, 1993.
• P. O. Hoyer, D. Janzing, J. M. Mooij, J. Peters and B. Schölkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 21 (NIPS2008), pp. 689-696, 2009.
• K. Zhang and A. Hyvärinen. On the identifiability of the post-nonlinear causal model. In Proc. 25th Conference on Uncertainty in Artificial Intelligence (UAI2009), pp. 647-655, Montreal, Canada, 2009.
• J. Peters, J. M. Mooij, D. Janzing and B. Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15: 2009-2053, 2014.