Slide 1

Linear Non-Gaussian Models with Latent Variables for Causal Discovery
Shohei Shimizu, Shiga University and RIKEN
The 2020 NeurIPS Workshop on Causal Discovery and Causality-Inspired Machine Learning

Slide 2

Statistical Causal Inference
• Infer causal relations from data
  – Intervention effects
  – Counterfactuals
• Example: if we had increased chocolate consumption, would the number of Nobel laureates increase? To what extent? (Messerli, 2012, New England Journal of Medicine)

Slide 3

Ordinary (?) causal inference
1. Decide which quantity is to be estimated: the intervention effect
2. Draw the causal graph based on background knowledge
3. Derive which variables should be used for adjustment
4. Observe and adjust for those variables (if any), and estimate the intervention effect as follows (see the sketch below)

Causal graph: GDP → Chocolate, GDP → Nobel, Chocolate → Nobel

$E[\text{Nobel} \mid do(\text{Chocolate})] = E_{\text{variables adjusted for}}\big[ E[\text{Nobel} \mid \text{Chocolate}, \text{variables adjusted for}] \big]$

$E[\text{Nobel} \mid do(\text{Chocolate} = \text{a lot})] - E[\text{Nobel} \mid do(\text{Chocolate} = \text{not much})]$
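To make step 4 concrete, here is a minimal sketch (my illustration, not from the slides), assuming GDP is the only variable to adjust for and a linear model: regress Nobel on Chocolate and GDP, then average the fitted conditional expectation over the empirical distribution of GDP.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 5000

# Synthetic data following the slide's graph: GDP -> Chocolate, GDP -> Nobel, Chocolate -> Nobel
gdp = rng.exponential(size=n)
chocolate = 0.8 * gdp + rng.uniform(-1, 1, size=n)
nobel = 0.5 * chocolate + 1.2 * gdp + rng.uniform(-1, 1, size=n)

# E[Nobel | do(Chocolate = c)] = E_GDP[ E[Nobel | Chocolate = c, GDP] ]
model = LinearRegression().fit(np.column_stack([chocolate, gdp]), nobel)

def intervention_mean(c):
    # Average the conditional expectation over the empirical GDP distribution
    X = np.column_stack([np.full(n, c), gdp])
    return model.predict(X).mean()

# E[Nobel | do(Chocolate = a lot)] - E[Nobel | do(Chocolate = not much)]
print(intervention_mean(2.0) - intervention_mean(1.0))  # ~0.5, the true direct effect
```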

Slide 4

Causal discovery
• Infer the causal graph in data-driven ways
• Assumptions are needed to infer the causal graph
  – Various methods exist for different assumptions
  – Basic setup:
    • All the common causes are measured
    • Acyclicity

[Figure: candidate causal graphs over $x_1$ (Chocolate), $x_2$ (Nobel), and $x_3$ (GDP): $x_1 \to x_2$? $x_2 \to x_1$? A structure involving $x_3$? Which one generated the data?]

Slide 5

Causal discovery is a challenge of causal inference (Pearl, 2019)
• Classic methods use conditional independence between variables (Pearl, 2001; Spirtes et al., 1993)
  – Their limit: the model is identified only up to its Markov equivalence class
• More assumptions are needed to go beyond this limit
  – Restrictions on the functional forms and/or the distributions of the variables
• LiNGAM is an example (Shimizu et al., 2006; Shimizu, 2014)
  – A non-Gaussianity assumption makes independence (not just uncorrelatedness) informative
  – Unique identification of the graph is possible

Slide 6

How do independence and non-Gaussianity work? (Shimizu et al., 2011)

Underlying model ($x_1 \to x_2$; $e_1$ and $e_2$ are independent and non-Gaussian):
$x_1 = e_1, \qquad x_2 = b_{21} x_1 + e_2 \quad (b_{21} \neq 0)$

Regressing the effect on the cause gives a residual that is independent of the regressor:
$r_2^{(1)} = x_2 - \frac{\operatorname{cov}(x_2, x_1)}{\operatorname{var}(x_1)} x_1 = x_2 - b_{21} x_1 = e_2,$
so $x_1 = e_1$ and $r_2^{(1)}$ are independent.

Regressing the cause on the effect does not:
$r_1^{(2)} = x_1 - \frac{\operatorname{cov}(x_1, x_2)}{\operatorname{var}(x_2)} x_2 = \left(1 - \frac{b_{21} \operatorname{cov}(x_1, x_2)}{\operatorname{var}(x_2)}\right) e_1 - \frac{b_{21} \operatorname{var}(x_1)}{\operatorname{var}(x_2)} e_2,$
so $x_2 = b_{21} e_1 + e_2$ and $r_1^{(2)}$ are dependent, although they are uncorrelated.
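A numerical sketch of this asymmetry (my illustration, not the slide's): with uniform, hence non-Gaussian, noise, both residuals are uncorrelated with their regressors, but only the wrong direction shows dependence, already visible in the correlation of squared values; with Gaussian noise this check would return zero in both directions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Underlying model x1 -> x2 with independent, non-Gaussian (uniform) noise
e1 = rng.uniform(-1, 1, n)
e2 = rng.uniform(-1, 1, n)
x1 = e1
x2 = 0.8 * x1 + e2

def residual(y, x):
    # OLS residual of y regressed on x
    return y - np.cov(y, x, ddof=0)[0, 1] / np.var(x) * x

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

r21 = residual(x2, x1)  # correct direction: residual equals e2
r12 = residual(x1, x2)  # wrong direction: residual mixes e1 and e2

# Both residuals are uncorrelated with their regressors ...
print(corr(x1, r21), corr(x2, r12))              # both ~ 0
# ... but only the wrong direction is *dependent* on its regressor
print(corr(x1**2, r21**2), corr(x2**2, r12**2))  # ~ 0 vs. clearly nonzero
```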

Slide 9

Other identifiable models
• Continuous variables
  – Nonlinearity + "additive" noise (Hoyer+08NIPS; Zhang+09UAI; Peters+14JMLR)
    • Additive noise model: $x_i = f_i(\mathrm{pa}(x_i)) + e_i$
    • Post-nonlinear model: $x_i = g_i^{-1}(f_i(\mathrm{pa}(x_i)) + e_i)$
• Discrete variables
  – Poisson DAG model and its extensions (Park+18JMLR)
• Mixed continuous and discrete variables
  – A logistic-distribution assumption for the discrete variables (two-variable case) (Wenjuan+18IJCAI)
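For intuition, a rough sketch (my illustration) of how the additive noise model is used for direction finding: fit a nonlinear regression in each direction and prefer the direction whose residual is independent of the input. The correlation-of-squares score below is a crude stand-in for a proper independence test such as HSIC, and the kernel-ridge settings are arbitrary choices.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(2)
n = 2000

# Ground truth: x -> y through a nonlinear function with additive noise
x = rng.uniform(-2, 2, n)
y = np.tanh(2 * x) + 0.2 * rng.uniform(-1, 1, n)

def dependence_score(a, b):
    # Crude dependence measure: |corr| of the values plus |corr| of the squares
    c1 = np.corrcoef(a, b)[0, 1]
    c2 = np.corrcoef((a - a.mean()) ** 2, (b - b.mean()) ** 2)[0, 1]
    return abs(c1) + abs(c2)

def residual_dependence(cause, effect):
    # Fit effect = f(cause) + e nonparametrically, then score residual dependence
    reg = KernelRidge(kernel="rbf", alpha=0.1, gamma=1.0)
    reg.fit(cause.reshape(-1, 1), effect)
    resid = effect - reg.predict(cause.reshape(-1, 1))
    return dependence_score(cause, resid)

s_xy = residual_dependence(x, y)  # hypothesis x -> y
s_yx = residual_dependence(y, x)  # hypothesis y -> x
print("x -> y" if s_xy < s_yx else "y -> x", s_xy, s_yx)
```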

Slide 10

Python toolbox: https://github.com/cdt15/lingam
• ICA-based LiNGAM algorithm
• DirectLiNGAM
• VAR-LiNGAM and VARMA-LiNGAM
• LiNGAM for multiple datasets
• (Bottom-up) ParceLiNGAM
• Nonlinear method: ANM
• Planning to implement more

[Figure: a notebook snippet that imports lingam, generates data, and draws the estimated causal graph with graphviz, shown next to the rendered causal graph and the bootstrap probabilities of its edges]
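Since the slide's code screenshot did not survive extraction, here is a minimal usage sketch instead; DirectLiNGAM, fit, causal_order_, and adjacency_matrix_ are the toolbox's documented interface, while the synthetic data is my own.

```python
import numpy as np
import lingam

rng = np.random.default_rng(0)
n = 1000

# Synthetic LiNGAM data: x0 -> x1 -> x2 with uniform (non-Gaussian) noise
x0 = rng.uniform(size=n)
x1 = 2.0 * x0 + rng.uniform(size=n)
x2 = 1.5 * x1 + rng.uniform(size=n)
X = np.column_stack([x0, x1, x2])

model = lingam.DirectLiNGAM()
model.fit(X)

print(model.causal_order_)      # estimated causal ordering of the columns
print(model.adjacency_matrix_)  # estimated connection-strength matrix B
```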

Slide 11

Applications
https://sites.google.com/view/sshimizu06/lingam/lingampapers/applications-and-tailor-made-methods
• Epidemiology: sleep problems and depressive mood; which causes which? (Rosenström et al., 2012)
• Economics: growth rates of firms' operating income, employment, sales, and R&D across years (Moneta et al., 2013)
• Neuroscience (Boukrina & Graves, 2013)
• Chemistry (Campomanes et al., 2014)

Slide 12

Typical requirements/questions from users
• Hidden common causes
• Background knowledge
• Mixed types of variables (marketing research)
• Cyclicity (biology)
• Multicollinearity (manufacturing)

Slide 13

LiNGAM with hidden common causes (Hoyer, Shimizu, Kerminen, & Palviainen, 2008)
• Example causal graph: $x_1 \to x_2$ with a hidden common cause $f_1$ of both, i.e.
  $x_1 = \lambda_{11} f_1 + e_1$
  $x_2 = b_{21} x_1 + \lambda_{21} f_1 + e_2$
• The model:
  $x_i = \sum_{j \neq i} b_{ij} x_j + \sum_{k=1}^{Q} \lambda_{ik} f_k + e_i$
• Its matrix form:
  $\boldsymbol{x} = B \boldsymbol{x} + \Lambda \boldsymbol{f} + \boldsymbol{e}$

Slide 14

1. Estimate causal structures of variables that share hidden common causes
• ICA: independent component analysis (Comon, 1994; Eriksson & Koivunen, 2004; Hyvärinen et al., 2001)
  – The independent components are mutually independent and non-Gaussian
• LiNGAM with hidden common causes is ICA:
  $\boldsymbol{x} = B\boldsymbol{x} + \Lambda\boldsymbol{f} + \boldsymbol{e}$ rewrites as
  $\boldsymbol{x} = \big[\, (I - B)^{-1} \;\; (I - B)^{-1}\Lambda \,\big] \begin{bmatrix} \boldsymbol{e} \\ \boldsymbol{f} \end{bmatrix},$
  an ICA model whose sources are the stacked $\boldsymbol{e}$ and $\boldsymbol{f}$
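A small numpy sketch (my illustration) of this reduction: data generated from the structural equations coincides exactly with data obtained by mixing the stacked independent sources $[\boldsymbol{e}; \boldsymbol{f}]$ with the matrix $[(I - B)^{-1} \;\; (I - B)^{-1}\Lambda]$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5

# Two observed variables, one hidden common cause (the example from the slides)
B = np.array([[0.0, 0.0],
              [0.8, 0.0]])     # b21 = 0.8
Lam = np.array([[1.0],
                [0.5]])        # lambda11 = 1.0, lambda21 = 0.5

e = rng.uniform(-1, 1, (2, n))  # non-Gaussian errors
f = rng.uniform(-1, 1, (1, n))  # non-Gaussian hidden common cause

# Solve the structural equations x = B x + Lam f + e directly ...
x_struct = np.linalg.solve(np.eye(2) - B, Lam @ f + e)

# ... and compare with the ICA form x = A s, where s stacks e and f
I_B_inv = np.linalg.inv(np.eye(2) - B)
A = np.hstack([I_B_inv, I_B_inv @ Lam])  # ICA mixing matrix
x_ica = A @ np.vstack([e, f])

print(np.allclose(x_struct, x_ica))  # True
```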

Slide 15

Basic idea (Hoyer, Shimizu, Kerminen & Palviainen, 2008)
• All three models are identifiable by ICA
  – Under faithfulness, the zero/non-zero patterns of their mixing matrices differ
  – Apply ICA and inspect the zero/non-zero pattern

$x_1 \to x_2$ with confounder $f_1$:
$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & \lambda_{11} \\ b_{21} & 1 & \lambda_{21} + b_{21}\lambda_{11} \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ f_1 \end{pmatrix}$

$x_2 \to x_1$ with confounder $f_1$:
$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & b_{12} & \lambda_{11} + b_{12}\lambda_{21} \\ 0 & 1 & \lambda_{21} \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ f_1 \end{pmatrix}$

No direct effect, confounder $f_1$ only:
$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & \lambda_{11} \\ 0 & 1 & \lambda_{21} \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ f_1 \end{pmatrix}$
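To see the idea, one can write down the three mixing matrices with illustrative coefficients (my choice) and compare their zero/non-zero patterns; ICA recovers the mixing matrix only up to permutation and scaling of its columns, but that is enough to read off the pattern.

```python
import numpy as np

# Illustrative coefficient values
b21, b12, l11, l21 = 0.8, 0.8, 1.0, 0.5

mixing = {
    "x1 -> x2": np.array([[1, 0, l11], [b21, 1, l21 + b21 * l11]]),
    "x2 -> x1": np.array([[1, b12, l11 + b12 * l21], [0, 1, l21]]),
    "no edge":  np.array([[1, 0, l11], [0, 1, l21]]),
}

# The three candidate models leave different zeros in the mixing matrix
for name, A in mixing.items():
    print(name, (np.abs(A) > 1e-9).astype(int).tolist())
```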

Slide 16

Identifiability (for more than two variables) (Salehkaleybar et al., 2020)
• Causal orders are identifiable, but intervention effects are not in some cases
  – If there is no overlap between the descendants of the observed variables and those of the hidden common causes, both the causal orders and the intervention effects are identifiable
  – If there is some overlap, only the causal orders are identifiable; the intervention effects are not

Overlap:
$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & \lambda_{11} \\ b_{21} & 1 & \lambda_{21} + b_{21}\lambda_{11} \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ f_1 \end{pmatrix}$

No overlap:
$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & \lambda_{11} \\ b_{21} & 1 & 0 & \lambda_{21} + b_{21}\lambda_{11} \\ 0 & 0 & 1 & \lambda_{31} \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ e_3 \\ f_1 \end{pmatrix}$

Slide 17

2. Estimate causal structures of variables that do not share hidden common causes
• Find the unconfounded pairs of variables and the confounded pairs
• For the unconfounded pairs, estimate the causal directions
• Tashiro, Shimizu, Hyvärinen, and Washio (2014); Maeda and Shimizu (2020); Wang and Drton (2020)

[Figure: an underlying model over $x_1, \dots, x_4$ with hidden common causes $f_1$ and $f_2$, next to the output graph, in which only the unconfounded relations are oriented]
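As a hedged usage sketch: RCD (Maeda & Shimizu, 2020) is implemented in the lingam toolbox, and, assuming its documented interface, entries of the estimated adjacency matrix that the method attributes to a hidden common cause are reported as NaN.

```python
import numpy as np
import lingam

rng = np.random.default_rng(4)
n = 2000

# f confounds x2 and x3; the pair (x0, x1) is unconfounded
f = rng.uniform(size=n)
x0 = rng.uniform(size=n)
x1 = 1.5 * x0 + rng.uniform(size=n)
x2 = 2.0 * f + rng.uniform(size=n)
x3 = 1.0 * f + rng.uniform(size=n)
X = np.column_stack([x0, x1, x2, x3])

model = lingam.RCD()
model.fit(X)

# Unconfounded relations get coefficients; confounded pairs appear as NaN
print(model.adjacency_matrix_)
```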

Slide 18

Final summary
• Causal structure learning in the presence of hidden common causes
  – A challenge of causal discovery
  – Independence matters, rather than mere uncorrelatedness
• Other important topics:
  – Mixed data with continuous and discrete variables (Wenjuan+18IJCAI)
  – Cyclic models (Lacerda+08UAI)
  – Background knowledge
  – More collaborations with domain experts
• Other latent-variable models:
  – Latent factors (Shimizu et al., 2009; Zeng, Shimizu, Cai, Xie, Yamamoto, & Hao, 2020, arXiv preprint)
  – Latent classes (Shimizu et al., 2008), etc.

Slide 19

References
• J. Pearl. The seven tools of causal inference, with reflections on machine learning. Communications of the ACM, 62(3): 54--60, 2019.
• S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7: 2003--2030, 2006.
• S. Shimizu. LiNGAM: Non-Gaussian methods for estimating causal structures. Behaviormetrika, 41(1): 65--98, 2014.
• S. Shimizu, T. Inazumi, Y. Sogawa, A. Hyvärinen, Y. Kawahara, T. Washio, P. O. Hoyer and K. Bollen. DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12(Apr): 1225--1248, 2011.
• J. Pearl. Causality. Cambridge University Press, 2001.
• P. Spirtes, C. Glymour and R. Scheines. Causation, Prediction, and Search. Springer, 1993.
• P. O. Hoyer, D. Janzing, J. Mooij, J. Peters and B. Schölkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 21 (NIPS2008), pp. 689--696, 2009.
• K. Zhang and A. Hyvärinen. On the identifiability of the post-nonlinear causal model. In Proc. 25th Conference on Uncertainty in Artificial Intelligence (UAI2009), pp. 647--655, Montreal, Canada, 2009.
• J. Peters, J. Mooij, D. Janzing and B. Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15: 2009--2053, 2014.

Slide 20

References
• G. Park and G. Raskutti. Learning quadratic variance function (QVF) DAG models via OverDispersion Scoring (ODS). Journal of Machine Learning Research, 18: 1--44, 2018.
• W. Wenjuan, F. Lu and L. Chunchen. Mixed causal structure discovery with application to prescriptive pricing. In Proc. 27th International Joint Conference on Artificial Intelligence (IJCAI2018), pp. xx--xx, Stockholm, Sweden, 2018.
• T. Rosenström, M. Jokela, S. Puttonen, M. Hintsanen, L. Pulkki-Råback, J. S. Viikari, O. T. Raitakari and L. Keltikangas-Järvinen. Pairwise measures of causal direction in the epidemiology of sleep problems and depression. PLoS ONE, 7(11): e50841, 2012.
• A. Moneta, D. Entner, P. O. Hoyer and A. Coad. Causal inference by independent component analysis: Theory and applications. Oxford Bulletin of Economics and Statistics, 75(5): 705--730, 2013.
• O. Boukrina and W. W. Graves. Neural networks underlying contributions from semantics in reading aloud. Frontiers in Human Neuroscience, 7: 518, 2013.
• P. Campomanes, M. Neri, B. A. C. Horta, U. F. Roehrig, S. Vanni, I. Tavernelli and U. Rothlisberger. Origin of the spectral shifts among the early intermediates of the rhodopsin photocycle. Journal of the American Chemical Society, 136(10): 3842--3851, 2014.
• P. O. Hoyer, S. Shimizu, A. Kerminen and M. Palviainen. Estimation of causal effects using linear non-Gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2): 362--378, 2008.
• P. Comon. Independent component analysis, a new concept? Signal Processing, 36(3): 287--314, 1994.
• J. Eriksson and V. Koivunen. Identifiability, separability, and uniqueness of linear ICA models. IEEE Signal Processing Letters, 11(7): 601--604, 2004.

Slide 21

References
• A. Hyvärinen, J. Karhunen and E. Oja. Independent Component Analysis. Wiley, 2001.
• S. Salehkaleybar, A. Ghassami, N. Kiyavash and K. Zhang. Learning linear non-Gaussian causal models in the presence of latent variables. Journal of Machine Learning Research, 21: 1--24, 2020.
• T. Tashiro, S. Shimizu, A. Hyvärinen and T. Washio. ParceLiNGAM: A causal ordering method robust against latent confounders. Neural Computation, 26(1): 57--83, 2014.
• T. N. Maeda and S. Shimizu. RCD: Repetitive causal discovery of linear non-Gaussian acyclic models with latent confounders. In Proc. 23rd International Conference on Artificial Intelligence and Statistics (AISTATS2020), 2020.
• Y. S. Wang and M. Drton. Causal discovery with unobserved confounding and non-Gaussian data. arXiv preprint arXiv:2007.11131, 2020.
• G. Lacerda, P. Spirtes, J. Ramsey and P. O. Hoyer. Discovering cyclic causal models by independent components analysis. In Proc. 24th Conference on Uncertainty in Artificial Intelligence (UAI2008), pp. 366--374, Helsinki, Finland, 2008.
• S. Shimizu, P. O. Hoyer and A. Hyvärinen. Estimation of linear non-Gaussian acyclic models for latent factors. Neurocomputing, 72: 2024--2027, 2009.
• Y. Zeng, S. Shimizu, R. Cai, F. Xie, M. Yamamoto and Z. Hao. Causal discovery with multi-domain LiNGAM for latent factors. arXiv preprint arXiv:2009.09176, 2020.
• S. Shimizu and A. Hyvärinen. Discovery of linear non-Gaussian acyclic models in the presence of latent classes. In Proc. 14th International Conference on Neural Information Processing (ICONIP2007), pp. 752--761, Kitakyushu, Japan, 2008.