Shohei SHIMIZU
November 05, 2021

# LiNGAM Python package

Explains what the LiNGAM Python package can do. Presented at a seminar for causal discovery users.


## Transcript

Nov 2021


4. ### LiNGAM model is identifiable (Shimizu, Hyvarinen, Hoyer & Kerminen, 2006)

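• Linear Non-Gaussian Acyclic Model:

$$x_i = \sum_{k(j) < k(i)} b_{ij} x_j + e_i \quad \text{or} \quad \boldsymbol{x} = B\boldsymbol{x} + \boldsymbol{e}$$

  – $k(i)$ ($i = 1, \ldots, p$): causal (topological) order of $x_i$
  – Error variables $e_i$ are independent and non-Gaussian
• Coefficients and causal orders identifiable
• Causal graph identifiable

[Figure: causal graph over $x_1$, $x_2$, $x_3$ with error variables $e_1$, $e_2$, $e_3$ and edge coefficients $b_{21}$, $b_{23}$, $b_{13}$]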
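To make the model equation concrete, here is a minimal sketch that draws data from $\boldsymbol{x} = B\boldsymbol{x} + \boldsymbol{e}$. The function name `simulate_lingam`, the uniform error distribution and the coefficient values are illustrative assumptions, not taken from the slides or from the lingam package.

```python
import numpy as np

def simulate_lingam(B, n_samples, seed=0):
    """Sample from x = B x + e with independent, uniform (non-Gaussian) errors."""
    rng = np.random.default_rng(seed)
    p = B.shape[0]
    E = rng.uniform(-1.0, 1.0, size=(n_samples, p))  # one row per sample
    # Row-wise x = B x + e  <=>  (I - B) X^T = E^T, so solve instead of inverting.
    X = np.linalg.solve(np.eye(p) - B, E.T).T
    return X, E

# Illustrative coefficients only (columns are x1, x2, x3): x3 -> x1 -> x2.
B = np.array([[0.0, 0.0, 1.5],
              [-1.3, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
X, E = simulate_lingam(B, n_samples=1000)
```

Because the graph is acyclic, $B$ can be permuted to strictly lower triangular form, so $I - B$ is always invertible and the solve step is well defined.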
5. ### Statistical reliability assessment

• Bootstrap probability (bp) of directed paths and edges
• Interpret causal effects having bp larger than a threshold, say 5%

[Figure: example directed paths and edges among x0, x1, x2, x3, annotated with bootstrap probabilities (99%, 96%, 10%) and a total effect of 20.9]

LiNGAM Python package: https://github.com/cdt15/lingam
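The idea of a bootstrap probability can be sketched in a few lines: resample the rows with replacement, re-estimate the coefficient matrix, and count how often each edge survives. This is only a sketch under stated assumptions. `fit_known_order` is a hypothetical stand-in estimator (ordinary least squares with an assumed known causal order) used in place of a LiNGAM estimator, and `min_effect` is an illustrative cut-off for calling an edge present.

```python
import numpy as np

def fit_known_order(X, order):
    """Hypothetical stand-in estimator: OLS of each variable on its predecessors."""
    p = X.shape[1]
    B = np.zeros((p, p))
    Xc = X - X.mean(axis=0)
    for pos in range(1, p):
        target, parents = order[pos], order[:pos]
        coef, *_ = np.linalg.lstsq(Xc[:, parents], Xc[:, target], rcond=None)
        B[target, parents] = coef
    return B

def bootstrap_edge_probabilities(X, estimator, n_sampling=100, min_effect=0.1, seed=0):
    """Fraction of resamples in which |b_ij| exceeds min_effect, for every (i, j)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros((p, p))
    for _ in range(n_sampling):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        counts += np.abs(estimator(X[idx])) > min_effect
    return counts / n_sampling

rng = np.random.default_rng(1)
x0 = rng.uniform(-1, 1, 500)
x1 = 1.5 * x0 + rng.uniform(-1, 1, 500)
x2 = 0.05 * x1 + rng.uniform(-1, 1, 500)  # weak edge, below the cut-off
X = np.column_stack([x0, x1, x2])
bp = bootstrap_edge_probabilities(X, lambda D: fit_known_order(D, [0, 1, 2]))
```

Entry `bp[i, j]` is the bootstrap probability of the edge from variable `j` to variable `i`. Edges whose probability falls below the chosen threshold would be left uninterpreted.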
6. ### Before estimating causal graphs

• Assessing assumptions by
  – Gaussianity test
  – Histograms
    • continuous?
  – Too high correlation?
    • multicollinearity?
  – Background knowledge
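Two of these pre-checks are easy to sketch with NumPy alone. The Jarque-Bera statistic is one possible Gaussianity test; the slide does not name a specific one. The 0.95 correlation threshold and the helper names are illustrative assumptions.

```python
import numpy as np

def jarque_bera_statistic(x):
    """JB = n/6 * (S^2 + K^2/4), with skewness S and excess kurtosis K."""
    z = (x - x.mean()) / x.std()
    skew = np.mean(z ** 3)
    excess_kurt = np.mean(z ** 4) - 3.0
    return len(x) / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)

def highly_correlated_pairs(X, threshold=0.95):
    """Index pairs (i, j), i < j, whose absolute correlation exceeds threshold."""
    R = np.corrcoef(X, rowvar=False)
    i, j = np.triu_indices_from(R, k=1)
    mask = np.abs(R[i, j]) > threshold
    return list(zip(i[mask].tolist(), j[mask].tolist()))

rng = np.random.default_rng(0)
a = rng.uniform(-1, 1, 2000)
b = rng.uniform(-1, 1, 2000)
X = np.column_stack([a, b, a + 0.01 * b])  # third column nearly duplicates the first

CHI2_2DOF_5PCT = 5.991  # 5% critical value of chi-square with 2 degrees of freedom
non_gaussian = [jarque_bera_statistic(X[:, k]) > CHI2_2DOF_5PCT for k in range(3)]
flagged = highly_correlated_pairs(X)
```

A variable that fails to reject Gaussianity weakens the identifiability argument, and a flagged pair hints at multicollinearity that may make the regression steps unstable.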
7. ### After estimating causal graphs

• Assessing assumptions by
  – Testing independence of error variables, for example, by HSIC (Gretton et al., 2005)
  – Prediction accuracy using Markov boundary (Biza et al., 2020)
  – Compare with the results of other datasets in which causal graphs are expected to be similar
  – Check against background knowledge
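As a rough illustration of the HSIC check, the sketch below computes the biased HSIC estimate with Gaussian kernels and a permutation p-value for two one-dimensional residual series. The median bandwidth heuristic, the permutation test and the function names are assumptions of this sketch; the lingam package may implement the test differently.

```python
import numpy as np

def _gaussian_gram(x):
    """Gaussian kernel Gram matrix with the median-distance bandwidth heuristic."""
    d2 = (x[:, None] - x[None, :]) ** 2
    sigma2 = np.median(d2[d2 > 0])
    return np.exp(-d2 / (2.0 * sigma2))

def hsic(x, y):
    """Biased HSIC estimate: trace(K H L H) / n^2 for 1-D samples x and y."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(_gaussian_gram(x) @ H @ _gaussian_gram(y) @ H) / n ** 2

def hsic_permutation_pvalue(x, y, n_perm=200, seed=0):
    """Share of permuted statistics that are at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    observed = hsic(x, y)
    null = [hsic(x, rng.permutation(y)) for _ in range(n_perm)]
    return (1 + sum(s >= observed for s in null)) / (1 + n_perm)

rng = np.random.default_rng(0)
e1 = rng.uniform(-1, 1, 200)
e2 = rng.uniform(-1, 1, 200)
p_indep = hsic_permutation_pvalue(e1, e2)                 # independent pair
p_dep = hsic_permutation_pvalue(e1, e1 ** 2 + 0.1 * e2)   # uncorrelated but dependent
```

The second pair is nearly uncorrelated yet clearly dependent, which is the kind of violation a correlation check would miss and an independence test is meant to catch.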
8. ### DirectLiNGAM algorithm (Shimizu et al., 2011)

• Repeat linear regression and independence evaluation
  – https://lingam.readthedocs.io/en/latest/tutorial/lingam.html
• p>n cases (Wang & Drton, 2020)
  – https://github.com/ysamwang/highDNG

$$\begin{bmatrix} x_3 \\ x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 1.5 & 0 & 0 \\ 0 & -1.3 & 0 \end{bmatrix} \begin{bmatrix} x_3 \\ x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} e_3 \\ e_1 \\ e_2 \end{bmatrix}$$

$$\begin{bmatrix} r_1^{(3)} \\ r_2^{(3)} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ -1.3 & 0 \end{bmatrix} \begin{bmatrix} r_1^{(3)} \\ r_2^{(3)} \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$$

[Figure: causal graph over x3, x1, x2, and the reduced graph over the residuals $r_1^{(3)}$, $r_2^{(3)}$ obtained after regressing on x3]
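The "repeat regression and independence evaluation" loop can be sketched as follows. This is a simplified illustration, not the package's implementation: the independence score uses a maximum-entropy approximation of a pairwise likelihood ratio, which is an assumption of this sketch (the slide does not specify the measure), and all function names are hypothetical.

```python
import numpy as np

def _entropy(u):
    """Maximum-entropy approximation of differential entropy for standardized u."""
    k1, k2, gamma = 79.047, 7.4129, 0.37457
    return ((1 + np.log(2 * np.pi)) / 2
            - k1 * (np.mean(np.log(np.cosh(u))) - gamma) ** 2
            - k2 * np.mean(u * np.exp(-u ** 2 / 2)) ** 2)

def _residual(xi, xj):
    """Residual of xi after least-squares regression on xj."""
    return xi - (np.cov(xi, xj)[0, 1] / np.var(xj, ddof=1)) * xj

def _standardize(x):
    return (x - x.mean()) / x.std()

def causal_order(X):
    """Repeatedly pick the variable most independent of its residuals, then regress it out."""
    X = X - X.mean(axis=0)  # centred copy; the caller's array is left untouched
    remaining, order = list(range(X.shape[1])), []
    while len(remaining) > 1:
        scores = []
        for i in remaining:
            score = 0.0
            for j in remaining:
                if i == j:
                    continue
                xi, xj = _standardize(X[:, i]), _standardize(X[:, j])
                ri_j, rj_i = _residual(xi, xj), _residual(xj, xi)
                # Positive diff favours i -> j; only evidence against i is penalised.
                diff = ((_entropy(xj) + _entropy(_standardize(ri_j)))
                        - (_entropy(xi) + _entropy(_standardize(rj_i))))
                score -= min(0.0, diff) ** 2
            scores.append(score)
        root = remaining[int(np.argmax(scores))]
        order.append(root)
        remaining.remove(root)
        for i in remaining:  # remove the root's effect from the other variables
            X[:, i] = _residual(X[:, i], X[:, root])
    return order + remaining

# Data following the slide's example: x3 -> x1 (1.5) and x1 -> x2 (-1.3).
rng = np.random.default_rng(0)
n = 2000
x3 = rng.uniform(-1, 1, n)
x1 = 1.5 * x3 + rng.uniform(-1, 1, n)
x2 = -1.3 * x1 + rng.uniform(-1, 1, n)
order = causal_order(np.column_stack([x1, x2, x3]))  # column indices 0, 1, 2
```

With clearly non-Gaussian errors and enough samples, the returned order would be expected to list the column of x3 first, followed by x1 and x2, mirroring the two matrix equations above.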
9. ### Prior knowledge

https://lingam.readthedocs.io/en/latest/tutorial/pk_direct.html

• Prior knowledge about topological orders: $k(3) < k(1) < k(2)$
• Use prior knowledge in estimating topological causal orders and in pruning redundant edges

[Figure: x3, x1, x2 and the residuals $r_1^{(3)}$, $r_2^{(3)}$]
10. ### Multiple datasets

• Simultaneously analyze different datasets to use similarity (Ramsey et al., 2011; Shimizu, 2012)
  – Similarity: Causal orders same, distributions and coefficients may differ
  – https://lingam.readthedocs.io/en/latest/tutorial/multiple_dataset.html

[Figure: causal graphs over x1, x2, x3 with error variables e1, e2, e3 for Dataset 1 (coefficients 4, -3, 2) and Dataset 2 (coefficients -0.5, 5)]
11. ### Multiple datasets: Longitudinal data

• Longitudinal data consist of multiple samples collected over a period of time (Kadowaki et al., 2013)
• https://lingam.readthedocs.io/en/latest/tutorial/longitudinal.html
12. ### Analysis of predictive mechanisms

• Combine the causal model and predictive model to model the prediction mechanism
  – Causal model: $x_4 = f_4(y, e_4)$
  – Predictive model: $\hat{y} = f(x_1, x_2, x_3, x_4)$
  – Prediction mechanism model: $E(\hat{y} \mid do(x_i = c))$

[Figure: three graphs over $X_1$, $X_2$, $X_3$, $X_4$ and $Y$ or $\hat{Y}$: causal model, predictive model, and prediction mechanism model]

https://lingam.readthedocs.io/en/latest/tutorial/causal_effect.html#identification-of-feature-with-greatest-causal-influence-on-prediction
13. ### Illustrative example

• Auto-MPG (miles per gallon) dataset
• Linear regression
• Which variable has the greatest intervention effect on MPG prediction?
• Which variable should be intervened on to obtain a certain MPG prediction? (Control)

[Figure: causal graph over Cylinders, Displacement, Weight, Horsepower, Acceleration and MPG, together with the predicted MPG]

| Desired MPG prediction | Suggested intervention on cylinders |
|---|---|
| 15 | 8 |
| 21 | 6 |
| 30 | 4 |
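For a linear causal model and a linear predictor, both questions reduce to a small linear solve. The sketch below computes $E(\hat{y} \mid do(x_i = c))$ and then inverts it to find the intervention value that yields a desired prediction. The function names and the two-variable toy numbers are illustrative assumptions; they are not the Auto-MPG estimates or the package's API.

```python
import numpy as np

def expected_prediction_under_do(B, error_means, w, intercept, i, c):
    """E[y_hat | do(x_i = c)] for x = B x + e and y_hat = w @ x + intercept."""
    B_do = B.copy()
    B_do[i, :] = 0.0  # do(x_i = c) removes every edge into x_i
    u = np.asarray(error_means, dtype=float).copy()
    u[i] = c          # and fixes x_i at c
    x_mean = np.linalg.solve(np.eye(len(u)) - B_do, u)
    return float(w @ x_mean + intercept)

def intervention_for_target(B, error_means, w, intercept, i, target):
    """Value c with E[y_hat | do(x_i = c)] = target; the map is affine in c."""
    at0 = expected_prediction_under_do(B, error_means, w, intercept, i, 0.0)
    at1 = expected_prediction_under_do(B, error_means, w, intercept, i, 1.0)
    slope = at1 - at0
    if slope == 0.0:
        raise ValueError("x_i has no effect on the prediction")
    return (target - at0) / slope

# Toy numbers only: x0 -> x1 with coefficient 2, predictor y_hat = x0 + 3 * x1.
B = np.array([[0.0, 0.0], [2.0, 0.0]])
w = np.array([1.0, 3.0])
effect_x0 = expected_prediction_under_do(B, [0.0, 0.0], w, 0.0, i=0, c=1.0)
effect_x1 = expected_prediction_under_do(B, [0.0, 0.0], w, 0.0, i=1, c=1.0)
c_needed = intervention_for_target(B, [0.0, 0.0], w, 0.0, i=0, target=14.0)
```

In this toy setting, intervening on `x0` moves the prediction more than intervening on `x1`, because `x0` also acts through `x1`. Comparing such effects across features is the "greatest intervention effect" question, and `intervention_for_target` is the "control" question.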
14. ### Time series model

• Subsampling data:
  – SVAR: Structural Vector Autoregressive model (Swanson & Granger, 1997)
  – Identifiability using non-Gaussianity (Hyvarinen et al., 2010)
    • https://lingam.readthedocs.io/en/latest/tutorial/var.html
  – VARMA instead of VAR (Kawahara et al., 2011)
    • https://lingam.readthedocs.io/en/latest/tutorial/varma.html
• Nonstationarity
  – Assumption: Differences are stationary (Moneta et al., 2013)

$$\boldsymbol{x}(t) = \sum_{\tau=0}^{k} B_\tau \boldsymbol{x}(t-\tau) + \boldsymbol{e}(t)$$

[Figure: time series graph over x1(t-1), x2(t-1), x1(t), x2(t) with error variables e1(t-1), e2(t-1), e1(t), e2(t)]
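A small simulation shows how the structural form relates to an ordinary VAR. With one lag, $\boldsymbol{x}(t) = B_0 \boldsymbol{x}(t) + B_1 \boldsymbol{x}(t-1) + \boldsymbol{e}(t)$ has the reduced form $\boldsymbol{x}(t) = M_1 \boldsymbol{x}(t-1) + \boldsymbol{n}(t)$ with $M_1 = (I - B_0)^{-1} B_1$ and $\boldsymbol{n}(t) = (I - B_0)^{-1} \boldsymbol{e}(t)$. The sketch fits the reduced form by least squares; the coefficient values and function names are illustrative assumptions.

```python
import numpy as np

def simulate_svar1(B0, B1, n_steps, seed=0):
    """Simulate x(t) = B0 x(t) + B1 x(t-1) + e(t) with uniform (non-Gaussian) errors."""
    rng = np.random.default_rng(seed)
    p = B0.shape[0]
    A = np.linalg.inv(np.eye(p) - B0)  # x(t) = A (B1 x(t-1) + e(t))
    X = np.zeros((n_steps, p))
    for t in range(1, n_steps):
        X[t] = A @ (B1 @ X[t - 1] + rng.uniform(-1, 1, p))
    return X

def fit_var1(X):
    """Least-squares fit of the reduced form x(t) = M1 x(t-1) + n(t)."""
    C, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    M1 = C.T
    residuals = X[1:] - X[:-1] @ C
    return M1, residuals

B0 = np.array([[0.0, 0.0], [0.5, 0.0]])  # instantaneous effect x1(t) -> x2(t)
B1 = np.array([[0.4, 0.0], [0.1, 0.3]])  # lagged effects
X = simulate_svar1(B0, B1, n_steps=5000)
M1_hat, residuals = fit_var1(X)
M1_true = np.linalg.inv(np.eye(2) - B0) @ B1
```

The residuals $\boldsymbol{n}(t)$ themselves satisfy a LiNGAM model with coefficient matrix $B_0$, so a LiNGAM estimator applied to `residuals` would be one way to recover the instantaneous effects, after which the lagged matrices follow from $B_1 = (I - B_0) M_1$.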
15. ### Hidden common cause (1)

• Assumption: only exogenous variables may share hidden common causes

[Figure: causal graph over x1, x2, x3, and the same graph with a hidden common cause f1]

https://lingam.readthedocs.io/en/latest/tutorial/bottom_up_parce.html
16. ### Hidden common cause (2) RCD

• For unconfounded pairs with no hidden common causes, estimate the causal directions
• For confounded pairs with hidden common causes, let them remain unknown

[Figure: underlying model over $x_1, \ldots, x_4$ with hidden common causes $f_1$, $f_2$, and the corresponding output graph over $x_1, \ldots, x_4$]

https://lingam.readthedocs.io/en/latest/tutorial/rcd.html
17. ### Time series model with hidden common causes

• SVAR with hidden common causes
  – Malinsky and Spirtes (2018)
  – Gerhardus and Runge (2020)
  – Nonparametric
  – Conditional independence
  – Python: https://github.com/jakobrunge/tigramite
18. ### Nonlinear model

• Additive noise model: $x_i = f_i(\mathrm{par}(x_i)) + e_i$
• R code: http://web.math.ku.dk/~peters/code.html
19. ### Methods based on conditional independencies

• GUI: Tetrad
  – https://github.com/cmu-phil/tetrad
• Python: causal-learn (including LiNGAM variants)
  – https://github.com/cmu-phil/causal-learn
• R: pcalg
  – https://cran.r-project.org/web/packages/pcalg/index.html
20. ### Future plan

• A nonlinear version of RCD: CAM-UV
• Latent factors
• Mixed data with continuous and discrete variables
• Overcomplete ICA based method for hidden common cause cases under development
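21. ### LiNGAM for latent factors (Shimizu et al., 2009)

• Model:

$$\boldsymbol{f} = B\boldsymbol{f} + \boldsymbol{\epsilon}, \qquad \boldsymbol{x} = G\boldsymbol{f} + \boldsymbol{e}$$

  – Two pure measurement variables per latent factor needed to identify the measurement model (Silva et al., 2006; Xie et al., 2020)
• Estimate the latent factors and then their causal graph

[Figure: measured variables $x_1, \ldots, x_4$ and estimated latent factors $\hat{f}_1$, $\hat{f}_2$, with the causal direction between the factors marked "?"]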
22. ### Find common and unique factors across multiple datasets (Zeng et al., 2021)

• Model:

$$\boldsymbol{f}^{(m)} = B^{(m)} \boldsymbol{f}^{(m)} + \boldsymbol{\epsilon}^{(m)}, \qquad \boldsymbol{x}^{(m)} = G^{(m)} \boldsymbol{f}^{(m)} + \boldsymbol{e}^{(m)}, \qquad m = 1, \ldots, M$$

• Score function: likelihood + DAGness (Zheng et al., 2018)
• Feature extraction across multiple datasets + causal discovery of latent factors

[Figure: estimated latent factors of each dataset with unknown causal relations ("?"), asking whether a factor estimated in one dataset equals a factor estimated in another]