Linear non-Gaussian models with latent variables for causal discovery (for the NeurIPS2020 Workshop)

Shohei SHIMIZU
November 23, 2020

Shohei Shimizu
Shiga University and RIKEN

NeurIPS 2020 Workshop on Causal Discovery and Causality-Inspired Machine Learning, Dec. 11 (EST)

Transcript

  1. Linear non-Gaussian models with latent variables for causal discovery

    Shohei Shimizu, Shiga University and RIKEN
    The 2020 NeurIPS Workshop on Causal Discovery and Causality-Inspired Machine Learning
  2. Statistical Causal Inference

    • Infer causal relations from data
      – Intervention effects
      – Counterfactuals
    • If we had increased chocolate consumption, would the number of Nobel laureates increase?
    • To what extent?
    (Messerli, 2012, New England Journal of Medicine)
  3. Ordinary (?) causal inference

    1. Decide which quantity to estimate: the intervention effect
    2. Draw the causal graph based on background knowledge
    3. Derive which variables should be used for adjustment
    4. Observe and adjust for those variables (if any), and estimate the intervention effect as follows:

    $E[\text{Nobel} \mid do(\text{Chocolate})] = E_{\text{variables adjusted for}}\big[E[\text{Nobel} \mid \text{Chocolate}, \text{variables adjusted for}]\big]$

    $E[\text{Nobel} \mid do(\text{Chocolate} = \text{a lot})] - E[\text{Nobel} \mid do(\text{Chocolate} = \text{not much})]$

    [Figure: causal graph over Chocolate, Nobel, and GDP]
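    The adjustment formula on this slide can be illustrated with a small simulation. The following is a minimal sketch, assuming a linear model in which GDP is the only common cause of Chocolate and Nobel; the data are synthetic and every coefficient is made up for illustration, not taken from Messerli (2012).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Synthetic data: all coefficients below are hypothetical.
gdp = rng.normal(size=n)                                    # common cause
chocolate = 1.0 * gdp + rng.normal(size=n)                  # GDP -> Chocolate
nobel = 0.5 * chocolate + 2.0 * gdp + rng.normal(size=n)    # Chocolate -> Nobel, GDP -> Nobel

# Fit E[Nobel | Chocolate, GDP] by ordinary least squares.
design = np.column_stack([np.ones(n), chocolate, gdp])
beta, *_ = np.linalg.lstsq(design, nobel, rcond=None)


def mean_under_do(c):
    """Average the fitted E[Nobel | Chocolate=c, GDP] over the observed GDP values."""
    return np.mean(beta[0] + beta[1] * c + beta[2] * gdp)


# Intervention effect of raising Chocolate from 0 to 1, adjusted for GDP.
adjusted_effect = mean_under_do(1.0) - mean_under_do(0.0)

# Naive contrast that ignores GDP: slope of Nobel regressed on Chocolate alone.
naive_effect = np.cov(nobel, chocolate)[0, 1] / np.var(chocolate, ddof=1)

print(f"adjusted: {adjusted_effect:.2f}, naive: {naive_effect:.2f}")
```

    In this toy setup the adjusted estimate should land near the simulated coefficient of 0.5, while the unadjusted slope also absorbs the GDP path.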
  4. Causal discovery

    • Infer the causal graph in data-driven ways
    • Need assumptions to infer the causal graph
      – Various methods for different assumptions
      – Basic setup
        • All the common causes are measured
        • Acyclicity
    [Figure: candidate causal graphs among $x_1$ (Chocolate), $x_2$ (Nobel), and $x_3$ (GDP)]
  5. Causal discovery is a challenge of causal inference (Pearl, 2019)

    • Classic methods use conditional independence of variables (Pearl, 2001; Spirtes et al., 1993)
      – At best, they identify the Markov equivalence class of models
    • Need more assumptions to go beyond this limit
      – Restrictions on the functional forms and/or the distributions of variables
    • LiNGAM is an example (Shimizu et al., 2006; Shimizu, 2014)
      – Non-Gaussian assumption to examine independence
      – Unique identification possible
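  6. How do independence and non-Gaussianity work? (Shimizu et al., 2011)

    Underlying model ($x_1 \rightarrow x_2$):
      $x_1 = e_1$
      $x_2 = b_{21} x_1 + e_2 \quad (b_{21} \neq 0)$
      $e_1$, $e_2$ are non-Gaussian

    Regress effect on cause:
      $r_2^{(1)} = x_2 - \frac{\mathrm{cov}(x_2, x_1)}{\mathrm{var}(x_1)} x_1 = x_2 - b_{21} x_1 = e_2$
      The regressor $x_1 = e_1$ and the residual $r_2^{(1)}$ are independent.

    Regress cause on effect:
      $r_1^{(2)} = x_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} x_2 = \left(1 - \frac{b_{21}\,\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)}\right) e_1 - \frac{b_{21}\,\mathrm{var}(x_1)}{\mathrm{var}(x_2)} e_2$
      The regressor $x_2 = b_{21} e_1 + e_2$ and the residual $r_1^{(2)}$ are dependent, although they are uncorrelated.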
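    A small simulation can make the two regressions concrete. The sketch below assumes uniform errors and $b_{21} = 1$ (both are illustrative choices, not from the slides) and uses the correlation between squared values as a crude dependence check; it is not the independence measure used in DirectLiNGAM.

```python
import numpy as np


def residual(target, regressor):
    """Residual of the least-squares regression of target on regressor (both centered first)."""
    t = target - target.mean()
    r = regressor - regressor.mean()
    slope = np.cov(t, r)[0, 1] / np.var(r, ddof=1)
    return t - slope * r


def squared_corr(a, b):
    """Correlation between squared values: near zero if a and b are independent."""
    return np.corrcoef(a ** 2, b ** 2)[0, 1]


rng = np.random.default_rng(0)
n = 100_000
e1 = rng.uniform(-1, 1, n)   # non-Gaussian errors (assumed uniform here)
e2 = rng.uniform(-1, 1, n)
x1 = e1
x2 = 1.0 * x1 + e2           # b21 = 1.0 is a hypothetical value

r2_given_1 = residual(x2, x1)   # regress effect on cause
r1_given_2 = residual(x1, x2)   # regress cause on effect

dep_causal = squared_corr(x1, r2_given_1)     # regressor vs residual, causal direction
dep_reverse = squared_corr(x2, r1_given_2)    # regressor vs residual, reverse direction
lin_reverse = np.corrcoef(x2, r1_given_2)[0, 1]

print(f"causal direction:  squared-value correlation = {dep_causal:.3f}")
print(f"reverse direction: squared-value correlation = {dep_reverse:.3f}, "
      f"linear correlation = {lin_reverse:.3f}")
```

    In the reverse direction the linear correlation vanishes by construction of least squares, yet the squared values remain clearly correlated. With Gaussian errors both checks would be near zero, which is why non-Gaussianity is needed.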
  9. Other identifiable models

    • Continuous variables
      – Nonlinearity + "additive" noise (Hoyer+08NIPS, Zhang+09UAI, Peters+14JMLR)
        • $x_i = f_i(\mathrm{par}(x_i)) + e_i$
        • $x_i = g_i^{-1}(f_i(\mathrm{par}(x_i)) + e_i)$
    • Discrete variables
      – Poisson DAG model and its extensions (Park+18JMLR)
    • Mixed types of variables: continuous and discrete variables
      – A logistic-distribution assumption for discrete variables (two variables) (Wenjuan+18IJCAI)
  10. Python toolbox https://github.com/cdt15/lingam

    • ICA-based LiNGAM algorithm
    • DirectLiNGAM
    • AR-LiNGAM and VARMA-LiNGAM
    • LiNGAM for multiple datasets
    • (Bottomup-) ParceLiNGAM
    • Nonlinear method: ANM
    Planning to implement more

        import numpy as np
        import lingam
        from graphviz import Digraph

        np.set_printoptions(precision=…, suppress=True)
        seed = …
        eps = 1e-…

        # Create data
        def make_graph(dag):
            # Draw a DAG given as 'from'/'to' node indices with optional 'coef' edge labels
            d = Digraph(engine='dot')
            if 'coef' in dag:
                for from_, to, coef in zip(dag['from'], dag['to'], dag['coef']):
                    d.edge(f'x{from_}', f'x{to}', label=f'{coef:.2f}')
            else:
                for from_, to in zip(dag['from'], dag['to']):
                    d.edge(f'x{from_}', f'x{to}', label='')
            return d

        dag = {'from': […], 'to': […], 'coef': […]}
        make_graph(dag)

    [Figure: causal graph over x0 to x5 with edge labels 3.00, 6.00, 4.00, 8.00, 3.00, 1.00, 2.00; panel titles: Bootstrap prob., Causal graph]
  11. Applications

    https://sites.google.com/view/sshimizu06/lingam/lingampapers/applications-and-tailor-made-methods

    • Epidemiology (Rosenström et al., 2012)
      [Figure: sleep problems and depression mood, with the causal direction between them in question]
    • Economics (Moneta et al., 2013)
      [Figure: time-lagged graph among OpInc.gr, Empl.gr, Sales.gr, and R&D.gr at times t, t+1, and t+2]
    • Neuroscience (Boukrina & Graves, 2013)
    • Chemistry (Campomanes et al., 2014)
  12. Typical requirements/questions from users

    • Hidden common causes
    • Background knowledge
    • Mixed types of variables
      – Marketing research
    • Cyclicity
      – Biology
    • Multicollinearity
      – Manufacturing
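  13. LiNGAM with hidden common causes (Hoyer, Shimizu, Kerminen, & Palviainen, 2008)

    • Example causal graph: $x_1 \rightarrow x_2$, with hidden common cause $f_1$ and errors $e_1$, $e_2$
    • The model for this example:
      $x_1 = \lambda_{11} f_1 + e_1$
      $x_2 = b_{21} x_1 + \lambda_{21} f_1 + e_2$
    • Model: $x_i = \sum_{j \neq i} b_{ij} x_j + \sum_{k=1}^{Q} \lambda_{ik} f_k + e_i$
    • Its matrix form: $\boldsymbol{x} = B\boldsymbol{x} + \Lambda\boldsymbol{f} + \boldsymbol{e}$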
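    To see what the hidden common cause does to an ordinary regression, data can be drawn from the two-variable example. This is a minimal sketch: the values of $b_{21}$, $\lambda_{11}$, $\lambda_{21}$ and the uniform distributions are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical parameter values for the two-variable example.
B = np.array([[0.0, 0.0],
              [0.8, 0.0]])        # b21 = 0.8
Lam = np.array([[1.0],
                [1.0]])           # lambda11 = lambda21 = 1.0

e = rng.uniform(-1, 1, size=(2, n))   # non-Gaussian errors e1, e2
f = rng.uniform(-1, 1, size=(1, n))   # hidden common cause f1

# Solve x = B x + Lam f + e for x, i.e. (I - B) x = Lam f + e.
x = np.linalg.solve(np.eye(2) - B, Lam @ f + e)

# Regressing x2 on x1 without observing f1 does not recover b21.
slope = np.cov(x[1], x[0])[0, 1] / np.var(x[0], ddof=1)
print(f"b21 = {B[1, 0]:.2f}, regression slope of x2 on x1 = {slope:.2f}")
```

    The gap between the slope and $b_{21}$ is the confounding contributed by $f_1$ through $\lambda_{11}$ and $\lambda_{21}$.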
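  14. 1. Estimate causal structures of variables that share hidden common causes

    • ICA: Independent Component Analysis (Comon, 1994; Eriksson & Koivunen, 2004; Hyvärinen et al., 2001)
      – The components are assumed to be mutually independent and non-Gaussian
    • LiNGAM with hidden common causes is an ICA model
      – LiNGAM with hidden common causes: $\boldsymbol{x} = B\boldsymbol{x} + \Lambda\boldsymbol{f} + \boldsymbol{e}$
      – ICA: $\boldsymbol{x} = \begin{bmatrix} (I - B)^{-1} & (I - B)^{-1}\Lambda \end{bmatrix} \begin{bmatrix} \boldsymbol{e} \\ \boldsymbol{f} \end{bmatrix}$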
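  15. Basic idea (Hoyer, Shimizu, Kerminen & Palviainen, 2008)

    • All three models are identifiable ICA models
      – Under faithfulness, the zero/non-zero patterns of their mixing matrices differ
      – Apply ICA and inspect the zero/non-zero pattern
    • $x_1 \rightarrow x_2$ with hidden common cause $f_1$:
      $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \lambda_{11} \\ b_{21} & 1 & \lambda_{21} + b_{21}\lambda_{11} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$
    • $x_1 \leftarrow x_2$ with hidden common cause $f_1$:
      $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & b_{12} & \lambda_{11} + b_{12}\lambda_{21} \\ 0 & 1 & \lambda_{21} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$
    • No edge between $x_1$ and $x_2$, hidden common cause $f_1$ only:
      $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \lambda_{11} \\ 0 & 1 & \lambda_{21} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$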
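    The three zero/non-zero patterns can be checked numerically from the matrix form $\boldsymbol{x} = B\boldsymbol{x} + \Lambda\boldsymbol{f} + \boldsymbol{e}$. The sketch below builds each mixing matrix directly from assumed parameter values (all hypothetical); it does not run ICA itself. The helper names `mixing_matrix` and `zero_pattern` are illustrative, not part of any library.

```python
import numpy as np


def mixing_matrix(B, Lam):
    """Return A = [(I - B)^{-1}, (I - B)^{-1} Lam], so that x = A @ [e; f]."""
    inv = np.linalg.inv(np.eye(B.shape[0]) - B)
    return np.hstack([inv, inv @ Lam])


def zero_pattern(A, tol=1e-12):
    """1 where an entry of A is non-zero (beyond tol), 0 elsewhere."""
    return (np.abs(A) > tol).astype(int)


# Hypothetical parameter values; columns of A correspond to (e1, e2, f1).
Lam = np.array([[0.7],
                [1.2]])
models = {
    "x1 -> x2": np.array([[0.0, 0.0], [0.8, 0.0]]),
    "x2 -> x1": np.array([[0.0, 0.8], [0.0, 0.0]]),
    "no edge": np.zeros((2, 2)),
}

patterns = {name: zero_pattern(mixing_matrix(B, Lam)) for name, B in models.items()}
for name, pattern in patterns.items():
    print(name)
    print(pattern)
```

    ICA recovers the mixing matrix only up to permutation and scaling of its columns, so in practice the patterns have to be compared up to column permutation.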
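  16. Identifiability (for more than 2 variables) (Salehkaleybar et al., 2020)

    • Causal orders are identifiable, but intervention effects are not in some cases
      – If the descendants of observed variables and of hidden common causes do not overlap, both the causal orders and the intervention effects are identifiable
      – If there is some overlap, only the causal orders are identifiable; the intervention effects are not
    • Two-variable example:
      $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \lambda_{11} \\ b_{21} & 1 & \lambda_{21} + b_{21}\lambda_{11} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$
    • Three-variable example:
      $\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \lambda_{11} \\ b_{21} & 1 & 0 & \lambda_{21} + b_{21}\lambda_{11} \\ 0 & 0 & 1 & \lambda_{31} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ f_1 \end{bmatrix}$
    [Figure: the corresponding causal graphs, with panels labeled "Overlap" and "No overlap"]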
  17. 2. Estimate causal structures of variables that do not share hidden common causes

    • Find unconfounded pairs of variables and confounded pairs
    • For unconfounded pairs, estimate the causal directions
    • Tashiro, Shimizu, Hyvärinen, and Washio (2014), Maeda and Shimizu (2020), Wang and Drton (2020)
    [Figure: underlying model over $x_1$ to $x_4$ with hidden common causes $f_1$ and $f_2$, and the output graph over $x_1$ to $x_4$]
  18. Final summary

    • Causal structure learning in the presence of hidden common causes
      – A challenge of causal discovery
      – Independence matters rather than uncorrelatedness
    • Other important topics:
      – Mixed data with continuous and discrete variables (Wenjuan+18IJCAI)
      – Cyclic models (Lacerda+08UAI)
      – Background knowledge
      – More collaborations with domain experts
    • Other latent variable models
      – Latent factors (Shimizu et al., 2009)
      – Latent classes (Shimizu & Hyvärinen, 2008), etc.
    Y. Zeng, S. Shimizu, R. Cai, F. Xie, M. Yamamoto, Z. Hao (2020, arXiv preprint)
  19. References

    • J. Pearl. The seven tools of causal inference with reflections on machine learning. Communications of the ACM, 62(3): 54-60, 2019.
    • S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen. A linear non-gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7: 2003-2030, 2006.
    • S. Shimizu. LiNGAM: Non-Gaussian methods for estimating causal structures. Behaviormetrika, 41(1): 65-98, 2014.
    • S. Shimizu, T. Inazumi, Y. Sogawa, A. Hyvärinen, Y. Kawahara, T. Washio, P. O. Hoyer and K. Bollen. DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12(Apr): 1225-1248, 2011.
    • J. Pearl. Causality. Cambridge University Press, 2001.
    • P. Spirtes, C. Glymour, R. Scheines. Causation, Prediction, and Search. Springer, 1993.
    • P. O. Hoyer, D. Janzing, J. Mooij, J. Peters and B. Schölkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 21 (NIPS2008), pp. 689-696, 2009.
    • K. Zhang and A. Hyvärinen. On the identifiability of the post-nonlinear causal model. In Proc. 25th Conf. on Uncertainty in Artificial Intelligence (UAI2009), pp. 647-655, Montreal, Canada, 2009.
    • J. Peters, J. Mooij, D. Janzing and B. Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15: 2009-2053, 2014.
  20. References

    • G. Park and G. Raskutti. Learning quadratic variance function (QVF) DAG models via OverDispersion Scoring (ODS). Journal of Machine Learning Research, 18: 1-44, 2018.
    • W. Wenjuan, F. Lu, and L. Chunchen. Mixed Causal Structure Discovery with Application to Prescriptive Pricing. In Proc. 27th International Joint Conference on Artificial Intelligence (IJCAI2018), pp. xx--xx, Stockholm, Sweden, 2018.
    • T. Rosenström, M. Jokela, S. Puttonen, M. Hintsanen, L. Pulkki-Råback, J. S. Viikari, O. T. Raitakari and L. Keltikangas-Järvinen. Pairwise measures of causal direction in the epidemiology of sleep problems and depression. PLoS ONE, 7(11): e50841, 2012.
    • A. Moneta, D. Entner, P. O. Hoyer and A. Coad. Causal inference by independent component analysis: Theory and applications. Oxford Bulletin of Economics and Statistics, 75(5): 705-730, 2013.
    • O. Boukrina and W. W. Graves. Neural networks underlying contributions from semantics in reading aloud. Frontiers in Human Neuroscience, 7: 518, 2013.
    • P. Campomanes, M. Neri, B. A. C. Horta, U. F. Roehrig, S. Vanni, I. Tavernelli and U. Rothlisberger. Origin of the spectral shifts among the early intermediates of the rhodopsin photocycle. Journal of the American Chemical Society, 136(10): 3842-3851, 2014.
    • P. O. Hoyer, S. Shimizu, A. Kerminen and M. Palviainen. Estimation of causal effects using linear non-gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2): 362-378, 2008.
    • P. Comon. Independent component analysis, a new concept? Signal Processing, 1994.
    • J. Eriksson, V. Koivunen. Identifiability, separability, and uniqueness of linear ICA models. IEEE Signal Processing Letters, 2004.
  21. References

    • A. Hyvärinen, J. Karhunen, E. Oja. Independent Component Analysis. Wiley, 2001.
    • S. Salehkaleybar, A. Ghassami, N. Kiyavash, K. Zhang. Learning Linear Non-Gaussian Causal Models in the Presence of Latent Variables. Journal of Machine Learning Research, 21: 1-24, 2020.
    • T. Tashiro, S. Shimizu, A. Hyvärinen and T. Washio. ParceLiNGAM: A causal ordering method robust against latent confounders. Neural Computation, 26(1): 57-83, 2014.
    • T. N. Maeda, S. Shimizu. RCD: Repetitive causal discovery of linear non-Gaussian acyclic models with latent confounders. In Proc. 23rd International Conference on Artificial Intelligence and Statistics (AISTATS2020), 2020.
    • Y. S. Wang, M. Drton. Causal Discovery with Unobserved Confounding and non-Gaussian Data. arXiv preprint arXiv:2007.11131, 2020.
    • G. Lacerda, P. Spirtes, J. Ramsey and P. O. Hoyer. Discovering cyclic causal models by independent components analysis. In Proc. 24th Conf. on Uncertainty in Artificial Intelligence (UAI2008), pp. 366-374, Helsinki, Finland, 2008.
    • S. Shimizu, P. O. Hoyer and A. Hyvärinen. Estimation of linear non-Gaussian acyclic models for latent factors. Neurocomputing, 72: 2024-2027, 2009.
    • Y. Zeng, S. Shimizu, R. Cai, F. Xie, M. Yamamoto, Z. Hao. Causal Discovery with Multi-Domain LiNGAM for Latent Factors. arXiv preprint arXiv:2009.09176, 2020.
    • S. Shimizu and A. Hyvärinen. Discovery of linear non-gaussian acyclic models in the presence of latent classes. In Proc. 14th Int. Conf. on Neural Information Processing (ICONIP2007), pp. 752-761, Kitakyushu, Japan, 2008.