Linear non-Gaussian models with latent variables for causal discovery (Pacific Causal Inference Conference)

Slide 1

Slide 1 text

Linear non-Gaussian models with latent variables for causal discovery
Shohei Shimizu
Shiga University and RIKEN

Slide 2

Slide 2 text

Causal discovery
• A challenge of causal inference (Pearl, 2019)
• Exploratory analysis for finding causal hypotheses
  – Infers one or more causal graphs in data-driven ways
  – Computes intervention effects based on the inferred graph
  – Leads to better hypotheses when combined with domain knowledge, and is useful for designing future surveys and experiments
[Figure: example applications. Epidemiology: do sleep problems cause depression mood, or the reverse? (Rosenstrom et al., 2012). Economics: time-lagged causal graph over Empl.gr, Sales.gr, R&D.gr and OpInc.gr at times t, t+1 and t+2 (Moneta et al., 2012). Chemistry (Campomanes et al., 2014).]

Slide 3

Slide 3 text

A linear non-Gaussian acyclic model: LiNGAM (Shimizu et al., 2006; Shimizu, 2014)
• Classic methods use conditional independence of variables (Pearl, 2001; Spirtes et al., 1993)
  – Their limit is identifying the Markov equivalence class of models
• More assumptions are needed to go beyond this limit
  – Restrictions on the functional forms and/or the distributions of variables
• LiNGAM is an example
  – Uses the non-Gaussian assumption to examine independence
  – Gives unique identification or a smaller number of equivalent models

Slide 4

Slide 4 text

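How do independence and non-Gaussianity work? (Shimizu et al., 2011)
Underlying model:
  $x_1 = e_1$
  $x_2 = b_{21} x_1 + e_2 \quad (b_{21} \neq 0)$
  $e_1$, $e_2$ are non-Gaussian
Regress effect on cause:
  $r_2^{(1)} = x_2 - \frac{\mathrm{cov}(x_2, x_1)}{\mathrm{var}(x_1)} x_1 = x_2 - b_{21} x_1 = e_2$
  The residual $r_2^{(1)} = e_2$ and $x_1 \, (= e_1)$ are independent
Regress cause on effect:
  $r_1^{(2)} = x_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} x_2 = \left\{ 1 - \frac{b_{21} \mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} \right\} e_1 - \frac{b_{21} \mathrm{var}(x_1)}{\mathrm{var}(x_2)} e_2$
  $r_1^{(2)}$ and $x_2$ are dependent, although they are uncorrelated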
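
The asymmetry between the two regression directions can be checked numerically. The following is a minimal sketch under stated assumptions, not the DirectLiNGAM implementation: the noise is uniform, the coefficient is set to 1, and dependence is scored by a crude proxy (the absolute correlation between squared values) instead of a proper independence measure. All helper names are hypothetical.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def corr(a, b):
    """Pearson correlation of two equally long sequences."""
    a, b = center(a), center(b)
    sab = sum(x * y for x, y in zip(a, b))
    saa = sum(x * x for x in a)
    sbb = sum(y * y for y in b)
    return sab / (saa * sbb) ** 0.5

def residual(target, regressor):
    """Least-squares residual of target regressed on a single regressor."""
    t, r = center(target), center(regressor)
    slope = sum(x * y for x, y in zip(t, r)) / sum(y * y for y in r)
    return [x - slope * y for x, y in zip(t, r)]

def dependence(a, b):
    """Crude dependence proxy (an assumption of this sketch):
    |corr(a^2, b^2)| is close to 0 for independent variables."""
    a, b = center(a), center(b)
    return abs(corr([x * x for x in a], [y * y for y in b]))

rng = random.Random(0)
n = 20000
e1 = [rng.uniform(-1, 1) for _ in range(n)]   # non-Gaussian noise
e2 = [rng.uniform(-1, 1) for _ in range(n)]
b21 = 1.0                                     # assumed coefficient
x1 = e1                                       # cause
x2 = [b21 * a + b for a, b in zip(x1, e2)]    # effect

r2 = residual(x2, x1)   # regress effect on cause: residual is close to e2
r1 = residual(x1, x2)   # regress cause on effect

score_causal = dependence(x1, r2)       # regressor vs residual, correct direction
score_anticausal = dependence(x2, r1)   # regressor vs residual, wrong direction

print("correct direction:", round(score_causal, 3))
print("wrong direction:  ", round(score_anticausal, 3))
print("corr(x2, r1):     ", round(corr(x2, r1), 6))
```

In the wrong direction the residual is uncorrelated with the regressor by construction of least squares, yet the proxy score stays clearly away from zero, which is the point of the slide: independence, not uncorrelatedness, distinguishes the two directions.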

Slide 5

Slide 5 text

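Python toolbox
https://github.com/cdt15/lingam
• ICA-based LiNGAM algorithm
• DirectLiNGAM
• AR-LiNGAM and VARMA-LiNGAM
• LiNGAM for multiple datasets
• (Bottom-up) ParceLiNGAM
Planning to implement more

    import numpy as np
    import lingam
    from graphviz import Digraph

    # Numeric literals are not legible in the source; lost values are marked "...".
    np.set_printoptions(suppress=True)  # the slide also set precision=...
    # seed = ...; eps = ...e-...

    # Create the data
    def make_graph(dag):
        """Draw a DAG given as a dict with 'from', 'to' and optional 'coef' lists."""
        d = Digraph(engine='dot')
        if 'coef' in dag:
            # Label each edge with its coefficient
            for from_, to, coef in zip(dag['from'], dag['to'], dag['coef']):
                d.edge(f'x{from_}', f'x{to}', label=f'{coef:.2f}')
        else:
            for from_, to in zip(dag['from'], dag['to']):
                d.edge(f'x{from_}', f'x{to}', label='')
        return d

    # dag = {'from': [...], 'to': [...], 'coef': [...]}
    # make_graph(dag)

[Figure: causal graph over x0 to x5 with edge coefficients 3.00, 6.00, 4.00, 8.00, 3.00, 1.00 and 2.00; panels labeled "Bootstrap prob." and "Causal graph"]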

Slide 6

Slide 6 text

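LiNGAM with hidden common causes (Hoyer, Shimizu, Kerminen, & Palviainen, 2008)
• Example causal graph and its model:
  [Figure: hidden common cause $f_1$ with edges to $x_1$ and $x_2$, and an edge from $x_1$ to $x_2$]
  $x_1 = \lambda_{11} f_1 + e_1$
  $x_2 = b_{21} x_1 + \lambda_{21} f_1 + e_2$
• Model: $x_i = \sum_{j \neq i} b_{ij} x_j + \sum_{k=1}^{Q} \lambda_{ik} f_k + e_i$
• Its matrix form: $x = Bx + \Lambda f + e$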

Slide 7

Slide 7 text

Two lines of research
1. Estimate causal structures of variables that share hidden common causes
2. Estimate causal structures of variables that do not share hidden common causes
[Figure: for each line of research, two candidate causal graphs with hidden common causes, labeled "or" and "?"]

Slide 8

Slide 8 text

1. Estimate causal structures of variables that share hidden common causes
• ICA: Independent Component Analysis (Comon, 1991; Eriksson et al., 2004; Hyvärinen et al., 2001)
  – Factor analysis with no factor rotation indeterminacy
  – Factors are independent and non-Gaussian
• LiNGAM with hidden common causes is ICA
  – LiNGAM with hidden common causes: $x = Bx + \Lambda f + e$
  – ICA: $x = (I - B)^{-1} \Lambda f + (I - B)^{-1} e$

Slide 9

Slide 9 text

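Basic idea (Hoyer, Shimizu, Kerminen & Palviainen, 2008)
• All three models are identifiable ICA models
  – Under faithfulness, the zero/non-zero patterns of their mixing matrices differ
  – Apply ICA and inspect the zero/non-zero pattern
• $x_1 \to x_2$ with hidden common cause $f_1$:
  $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \lambda_{11} \\ b_{21} & 1 & \lambda_{21} + b_{21} \lambda_{11} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$
• $x_2 \to x_1$ with hidden common cause $f_1$:
  $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & b_{12} & \lambda_{11} + b_{12} \lambda_{21} \\ 0 & 1 & \lambda_{21} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$
• No edge between $x_1$ and $x_2$, hidden common cause $f_1$ only:
  $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \lambda_{11} \\ 0 & 1 & \lambda_{21} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$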
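
The zero/non-zero patterns can be made concrete by computing the mixing matrix $(I - B)^{-1} [I \mid \Lambda]$ for each of the three two-variable models. This is a minimal sketch with arbitrary, assumed coefficient values and hypothetical helper names. It shows the population mixing matrices only; in practice ICA recovers a mixing matrix only up to permutation and scaling of its columns, which the sketch ignores.

```python
def mixing_matrix(b21, b12, lam11, lam21):
    """Return A with [x1, x2]^T = A [e1, e2, f1]^T for the model
    x1 = b12*x2 + lam11*f1 + e1,  x2 = b21*x1 + lam21*f1 + e2."""
    if b21 != 0 and b12 != 0:
        raise ValueError("an acyclic model allows at most one direction")
    # (I - B)^{-1} for B = [[0, b12], [b21, 0]]; det(I - B) = 1 when acyclic.
    inv = [[1.0, b12], [b21, 1.0]]
    # [I | Lambda]: columns for e1, e2 and the hidden common cause f1.
    noise_and_confounder = [[1.0, 0.0, lam11], [0.0, 1.0, lam21]]
    return [[sum(inv[i][k] * noise_and_confounder[k][j] for k in range(2))
             for j in range(3)] for i in range(2)]

def zero_pattern(A, tol=1e-12):
    """One string per row: 'x' for a non-zero entry, '0' for a zero entry."""
    return ["".join("x" if abs(a) > tol else "0" for a in row) for row in A]

# Assumed example coefficients; any generic non-zero values give the same patterns.
patterns = {
    "x1 -> x2": zero_pattern(mixing_matrix(0.8, 0.0, 0.5, 0.7)),
    "x2 -> x1": zero_pattern(mixing_matrix(0.0, 0.8, 0.5, 0.7)),
    "no edge": zero_pattern(mixing_matrix(0.0, 0.0, 0.5, 0.7)),
}

for name, rows in patterns.items():
    print(name, rows)
```

The three models differ only in where the zeros fall in the columns belonging to $e_1$ and $e_2$, which is what makes the causal direction readable from an estimated mixing matrix.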

Slide 10

Slide 10 text

Identifiability (Salehkaleybar et al., 2020)
• If the descendants of observed variables and the descendants of hidden common causes do not overlap, the causal orders and intervention effects are identifiable
• If there is some overlap, only the causal orders are identifiable; the intervention effects are not
[Figure: example graphs with their mixing matrices for the "Overlap" case (two observed variables) and the "No overlap" case (three observed variables)]

Slide 11

Slide 11 text

2. Estimate causal structures of variables that do not share hidden common causes
• A simple case: only exogenous variables share hidden common causes
[Figure: "Underlying model", in which exogenous variables share hidden common causes, and the "Output" of the method]

Slide 12

Slide 12 text

Bottom-up approach for estimating causal orders (Tashiro, Shimizu, Hyvärinen, & Washio, 2014)
• Do the following for all the variables $x_i$ ($i = 1, \dots, p$)
  – Regress $x_i$ on the other variables
  – If and only if the explanatory variables and the residual are independent, the variable is an unconfounded sink
• Exclude the sink
• Repeat on the remaining variables
[Figure: sinks are removed one at a time; the algorithm stops when the remaining variables share a hidden common cause]
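
The loop above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the ParceLiNGAM implementation: independence is scored by a crude proxy (absolute correlation of squared values) with an arbitrary threshold, where the published method uses proper statistical independence tests. The example data, function names and threshold are all hypothetical.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(G, c):
    """Solve G beta = c by Gauss-Jordan elimination with partial pivoting."""
    k = len(c)
    M = [row[:] + [ci] for row, ci in zip(G, c)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(k):
            if r != i:
                factor = M[r][i] / M[i][i]
                M[r] = [a - factor * b for a, b in zip(M[r], M[i])]
    return [M[i][k] / M[i][i] for i in range(k)]

def residual(y, xs):
    """Least-squares residual of centered y on the centered regressors xs."""
    G = [[dot(a, b) for b in xs] for a in xs]
    beta = solve(G, [dot(a, y) for a in xs])
    return [yi - sum(b * x[i] for b, x in zip(beta, xs)) for i, yi in enumerate(y)]

def dependence(r, x):
    """Crude dependence proxy (an assumption): |corr(r^2, x^2)|."""
    a, b = center([v * v for v in r]), center([v * v for v in x])
    return abs(dot(a, b) / (dot(a, a) * dot(b, b)) ** 0.5)

def bottom_up_order(data, threshold=0.05):
    """Peel off unconfounded sinks; return (sinks in removal order, unresolved)."""
    cols = {name: center(v) for name, v in data.items()}
    remaining, sinks = sorted(cols), []
    while len(remaining) > 1:
        scores = {}
        for name in remaining:
            others = [cols[o] for o in remaining if o != name]
            r = residual(cols[name], others)
            scores[name] = max(dependence(r, x) for x in others)
        best = min(scores, key=scores.get)
        if scores[best] > threshold:   # no variable passes: stop
            break
        sinks.append(best)
        remaining.remove(best)
    return sinks, remaining

rng = random.Random(0)
n = 20000
def noise():
    return [rng.uniform(-1, 1) for _ in range(n)]

f, e1, e2, e3 = noise(), noise(), noise(), noise()   # f is hidden
x1 = [a + b for a, b in zip(f, e1)]
x2 = [a + b for a, b in zip(f, e2)]
x3 = [a + b + c for a, b, c in zip(x1, x2, e3)]      # unconfounded sink

sinks, unresolved = bottom_up_order({"x1": x1, "x2": x2, "x3": x3})
print("sinks found:", sinks, "unresolved:", unresolved)
```

In this example the hidden variable `f` confounds `x1` and `x2`, so after `x3` is removed neither remaining variable yields an independent residual and the loop stops, mirroring the stopping behaviour described on the slide.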

Slide 13

Slide 13 text

A generalization for finding unconfounded parents of non-sink variables (Maeda & Shimizu, 2020)
• 1. Find unconfounded ancestors of each variable
  – Find a set of variables that gives independent residuals when $x_j$ is regressed on every subset of it (Lemma 3)
• 2. Find unconfounded parents among the unconfounded ancestors found
  – Regress $x_j$ on the unconfounded ancestors of $x_j$ except $x_i$
  – Regress $x_i$ on the unconfounded common ancestors of $x_i$ and $x_j$
  – If the two residuals are correlated, $x_i$ is an (unconfounded) parent of $x_j$; otherwise it is not (Lemma 4)
Wang and Drton (2020, arXiv preprint) considered criteria that can be applied to more general cases
[Figure: example graphs over observed variables and hidden common causes illustrating the two steps]
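
The parent check in step 2 can be illustrated on a three-variable example. This is a minimal sketch of the residual-correlation idea as read from the slide, not the RCD implementation: the ancestor sets are assumed to be known already (step 1 is skipped), there is a single other ancestor so a one-regressor fit suffices, and the coefficients and helper names are hypothetical.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def corr(a, b):
    a, b = center(a), center(b)
    sab = sum(x * y for x, y in zip(a, b))
    return sab / (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5

def residual(target, regressor):
    """Least-squares residual of target regressed on a single regressor."""
    t, r = center(target), center(regressor)
    slope = sum(x * y for x, y in zip(t, r)) / sum(y * y for y in r)
    return [x - slope * y for x, y in zip(t, r)]

def residual_correlation(x_i, x_j, other_ancestor):
    """Correlation between the two residuals used in the parent check."""
    r_j = residual(x_j, other_ancestor)   # x_j on its ancestors except x_i
    r_i = center(x_i)                     # x_i and x_j have no common ancestors here
    return corr(r_i, r_j)

rng = random.Random(1)
n = 20000
def noise():
    return [rng.uniform(-1, 1) for _ in range(n)]

e1, e2, e3 = noise(), noise(), noise()
x1 = e1
x2 = [0.8 * a + b for a, b in zip(x1, e2)]
# Case A: x1 -> x2 -> x3, so x1 is an ancestor of x3 but not a parent.
x3_chain = [0.5 * b + c for b, c in zip(x2, e3)]
# Case B: x1 additionally has a direct edge to x3, so x1 is a parent.
x3_direct = [0.5 * b + 0.7 * a + c for a, b, c in zip(x1, x2, e3)]

corr_chain = residual_correlation(x1, x3_chain, x2)
corr_direct = residual_correlation(x1, x3_direct, x2)
print("ancestor only:", round(corr_chain, 3))
print("direct parent:", round(corr_direct, 3))
```

Only the direct edge leaves a correlation between the residuals, since regressing on the intermediate variable removes everything that `x1` contributes through the chain.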

Slide 14

Slide 14 text

Final summary
• Causal structure learning in the presence of hidden common causes
  – A challenge of causal discovery
  – Independence matters rather than uncorrelatedness
• Future lines of research
  – Mixed data with continuous and discrete variables
  – Multiple datasets
  – More collaborations with domain experts
• Other latent variable models
  – Latent factors (Shimizu et al., 2009)
  – Latent classes (Shimizu et al., 2008), etc.
Y. Zeng, S. Shimizu, R. Cai, F. Xie, M. Yamamoto, Z. Hao (2020, arXiv preprint)

Slide 15

Slide 15 text

References
• J. Pearl. The seven tools of causal inference with reflections on machine learning. Communications of the ACM, 62(3): 54-60, 2019.
• T. Rosenström, M. Jokela, S. Puttonen, M. Hintsanen, L. Pulkki-Råback, J. S. Viikari, O. T. Raitakari and L. Keltikangas-Järvinen. Pairwise measures of causal direction in the epidemiology of sleep problems and depression. PLoS ONE, 7(11): e50841, 2012.
• A. Moneta, D. Entner, P. O. Hoyer and A. Coad. Causal inference by independent component analysis: Theory and applications. Oxford Bulletin of Economics and Statistics, 75(5): 705-730, 2013.
• P. Campomanes, M. Neri, B. A. C. Horta, U. F. Roehrig, S. Vanni, I. Tavernelli and U. Rothlisberger. Origin of the spectral shifts among the early intermediates of the rhodopsin photocycle. Journal of the American Chemical Society, 136(10): 3842-3851, 2014.
• S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7: 2003-2030, 2006.
• S. Shimizu. LiNGAM: Non-Gaussian methods for estimating causal structures. Behaviormetrika, 41(1): 65-98, 2014.
• S. Shimizu, T. Inazumi, Y. Sogawa, A. Hyvärinen, Y. Kawahara, T. Washio, P. O. Hoyer and K. Bollen. DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12(Apr): 1225-1248, 2011.
• J. Pearl. Causality. Cambridge University Press, 2001.
• P. Spirtes, C. Glymour and R. Scheines. Causation, Prediction, and Search. Springer, 1993.

Slide 16

Slide 16 text

References
• P. O. Hoyer, S. Shimizu, A. Kerminen and M. Palviainen. Estimation of causal effects using linear non-Gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2): 362-378, 2008.
• P. Comon. Independent component analysis, a new concept? Signal Processing, 1994.
• J. Eriksson and V. Koivunen. Identifiability, separability, and uniqueness of linear ICA models. IEEE Signal Processing Letters, 2004.
• A. Hyvärinen, J. Karhunen and E. Oja. Independent Component Analysis. Wiley, 2001.
• S. Salehkaleybar, A. Ghassami, N. Kiyavash and K. Zhang. Learning linear non-Gaussian causal models in the presence of latent variables. Journal of Machine Learning Research, 21: 1-24, 2020.
• T. Tashiro, S. Shimizu, A. Hyvärinen and T. Washio. ParceLiNGAM: A causal ordering method robust against latent confounders. Neural Computation, 26(1): 57-83, 2014.
• T. N. Maeda and S. Shimizu. RCD: Repetitive causal discovery of linear non-Gaussian acyclic models with latent confounders. In Proc. 23rd International Conference on Artificial Intelligence and Statistics (AISTATS2020), 2020.
• Y. S. Wang and M. Drton. Causal discovery with unobserved confounding and non-Gaussian data. arXiv preprint arXiv:2007.11131, 2020.
• S. Shimizu, P. O. Hoyer and A. Hyvärinen. Estimation of linear non-Gaussian acyclic models for latent factors. Neurocomputing, 72: 2024-2027, 2009.
• S. Shimizu and A. Hyvärinen. Discovery of linear non-Gaussian acyclic models in the presence of latent classes. In Proc. 14th Int. Conf. on Neural Information Processing (ICONIP2007), pp. 752-761, Kitakyushu, Japan, 2008.
• Y. Zeng, S. Shimizu, R. Cai, F. Xie, M. Yamamoto and Z. Hao. Causal discovery with multi-domain LiNGAM for latent factors. arXiv preprint arXiv:2009.09176, 2020.