
LiNGAM Python package

Shohei SHIMIZU
November 05, 2021


Slides explaining what the LiNGAM Python package can do, presented at a seminar with causal discovery users.



Transcript

  1. LiNGAM Python package
    Shohei SHIMIZU
    Shiga University & RIKEN
    13 Nov 2021


  2. LiNGAM Python package
    • https://github.com/cdt15/lingam
    Please give it a star!
    Takashi Ikeuchi
    SCREEN AS


  3. Documentation
    • https://lingam.readthedocs.io/en/latest/#


  4. LiNGAM model is identifiable
    (Shimizu, Hyvarinen, Hoyer & Kerminen, 2006)
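    • Linear Non-Gaussian Acyclic Model:
      $x_i = \sum_{k(j) < k(i)} b_{ij} x_j + e_i$, or in matrix form $\boldsymbol{x} = B\boldsymbol{x} + \boldsymbol{e}$
    – $k(i)$ ($i = 1, \ldots, p$): causal (topological) order of $x_i$
    – Error variables $e_i$ are independent and non-Gaussian
    • Coefficients and causal orders identifiable
    • Causal graph identifiable
    [Figure: causal graph over $x_1$, $x_2$, $x_3$ with error variables $e_1$, $e_2$, $e_3$ and edge coefficients $b_{21}$, $b_{23}$, $b_{13}$]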
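    To make the model on this slide concrete, here is a minimal sketch that draws data from x = Bx + e. The coefficient values, the uniform error distribution and the helper name `simulate_lingam` are assumptions chosen for illustration; they do not come from the slides or the package.

```python
import numpy as np


def simulate_lingam(B, n_samples, rng):
    """Draw samples from x = Bx + e with independent, uniform (non-Gaussian) errors."""
    p = B.shape[0]
    # Independent errors, uniform on [-1, 1]: zero mean and clearly non-Gaussian.
    E = rng.uniform(-1.0, 1.0, size=(n_samples, p))
    # Solving x = Bx + e gives x = (I - B)^{-1} e. With one sample per row,
    # this reads X = E @ ((I - B)^{-1}).T.
    X = E @ np.linalg.inv(np.eye(p) - B).T
    return X, E


# Hypothetical coefficients; B[i, j] is the direct effect of x_j on x_i.
# Zero-based columns 0, 1, 2 stand for x_1, x_2, x_3, so the non-zero
# entries below play the roles of b_13, b_21 and b_23.
B = np.array([
    [0.0, 0.0, 1.5],
    [-1.3, 0.0, 0.8],
    [0.0, 0.0, 0.0],
])
rng = np.random.default_rng(0)
X, E = simulate_lingam(B, 1000, rng)
```

    Because B can be permuted to a strictly lower triangular matrix (order x_3, x_1, x_2), I - B is always invertible and the simulated graph is acyclic.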


  5. Statistical reliability assessment
    • Bootstrap probability (bp) of directed paths and edges
    • Interpret causal effects having bp larger than a threshold, say 5%
    [Figure: causal graphs estimated on bootstrap samples of x0, x1, x2, x3, annotated with bootstrap probabilities 99%, 96% and 10% and a total effect of 20.9]
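    The bootstrap probability of an edge can be sketched as the fraction of resampled datasets in which the edge is estimated. In the sketch below, the estimator is a deliberately simple stand-in (least squares under an assumed known causal order), and the names `fit_given_order`, `bootstrap_edge_probabilities` and `min_effect` are hypothetical; a real analysis would re-run the full causal discovery method on every resample.

```python
import numpy as np


def fit_given_order(X, order):
    """Stand-in estimator: OLS of each variable on its predecessors in `order`."""
    p = X.shape[1]
    Xc = X - X.mean(axis=0)
    B = np.zeros((p, p))
    for pos, i in enumerate(order):
        parents = list(order[:pos])
        if parents:
            coef, *_ = np.linalg.lstsq(Xc[:, parents], Xc[:, i], rcond=None)
            B[i, parents] = coef
    return B


def bootstrap_edge_probabilities(X, estimator, n_boot, min_effect, rng):
    """Fraction of bootstrap resamples in which |B[i, j]| exceeds `min_effect`."""
    n, p = X.shape
    counts = np.zeros((p, p))
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        B_hat = estimator(X[idx])
        counts += np.abs(B_hat) > min_effect
    return counts / n_boot


# Toy data (assumed for illustration): x2 -> x0 -> x1, uniform errors.
rng = np.random.default_rng(1)
n = 3000
e = rng.uniform(-1.0, 1.0, size=(n, 3))
x2 = e[:, 2]
x0 = 1.5 * x2 + e[:, 0]
x1 = -1.3 * x0 + e[:, 1]
X = np.column_stack([x0, x1, x2])

bp = bootstrap_edge_probabilities(
    X, lambda Xb: fit_given_order(Xb, [2, 0, 1]),
    n_boot=100, min_effect=0.1, rng=rng,
)
# Keep only edges whose bootstrap probability exceeds the 5% threshold.
reliable = bp > 0.05
```

    Here `bp[i, j]` refers to the edge from x_j to x_i, matching the convention of B in x = Bx + e.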


  6. Before estimating causal graphs
    • Assessing assumptions by
    – Gaussianity tests
    – Histograms: are the variables continuous?
    – Correlations: are any too high (multicollinearity)?
    – Background knowledge
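    Two of these checks can be sketched with NumPy alone. The Jarque-Bera statistic is used here as one possible Gaussianity test (the slides do not say which test is meant), and the 0.95 correlation threshold and the function names are assumptions for illustration.

```python
import numpy as np


def jarque_bera(x):
    """Jarque-Bera statistic and asymptotic p-value (small p suggests non-Gaussianity)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    z = (x - x.mean()) / x.std()
    skewness = np.mean(z**3)
    excess_kurtosis = np.mean(z**4) - 3.0
    stat = n / 6.0 * (skewness**2 + excess_kurtosis**2 / 4.0)
    # Under Gaussianity the statistic is asymptotically chi-square with 2
    # degrees of freedom, whose survival function is exp(-stat / 2).
    p_value = np.exp(-stat / 2.0)
    return stat, p_value


def high_correlation_pairs(X, threshold=0.95):
    """Pairs (i, j, r) of columns whose absolute correlation exceeds `threshold`."""
    corr = np.corrcoef(X, rowvar=False)
    p = corr.shape[0]
    return [(i, j, corr[i, j])
            for i in range(p) for j in range(i + 1, p)
            if abs(corr[i, j]) > threshold]


# Toy data (assumed): a uniform column, a Gaussian column, and a near copy
# of the first column that should be flagged as almost collinear.
rng = np.random.default_rng(2)
n = 2000
uniform_col = rng.uniform(-1.0, 1.0, size=n)
gaussian_col = rng.normal(size=n)
near_copy = uniform_col + 0.01 * rng.normal(size=n)
X = np.column_stack([uniform_col, gaussian_col, near_copy])

jb = [jarque_bera(X[:, j]) for j in range(X.shape[1])]
pairs = high_correlation_pairs(X, threshold=0.95)
```

    A column that looks Gaussian, or a pair of columns that is nearly collinear, is a signal to revisit the variable selection before estimating a graph.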


  7. After estimating causal graphs
    • Assessing assumptions by
    – Testing independence of error variables, for example, by HSIC
    (Gretton et al., 2005)
    – Prediction accuracy using Markov boundary (Biza et al., 2020)
    – Compare with the results of other datasets in which causal graphs
    are expected to be similar
    – Check against background knowledge
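    As an illustration of the first check, the sketch below computes the biased empirical HSIC statistic with Gaussian kernels. The median-distance bandwidth and the function names are assumptions; only the statistic is computed, so turning it into a test would still require a null distribution (for example by permutation).

```python
import numpy as np


def gaussian_gram(x, sigma):
    """Gram matrix of a Gaussian kernel on a one-dimensional sample."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma**2))


def median_bandwidth(x):
    """Median of the non-zero pairwise distances (a common heuristic)."""
    d = np.abs(x[:, None] - x[None, :])
    return np.median(d[d > 0])


def hsic(x, y):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = x.size
    K = gaussian_gram(x, median_bandwidth(x))
    L = gaussian_gram(y, median_bandwidth(y))
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


# Toy check (assumed data): one independent pair and one pair that is
# uncorrelated but clearly dependent.
rng = np.random.default_rng(3)
n = 300
x = rng.uniform(-1.0, 1.0, size=n)
independent = rng.uniform(-1.0, 1.0, size=n)
dependent = x**2 + 0.1 * rng.uniform(-1.0, 1.0, size=n)

hsic_indep = hsic(x, independent)
hsic_dep = hsic(x, dependent)
```

    In practice `x` and `y` would be pairs of estimated error variables (residuals) from the fitted model; large values relative to the null suggest that the independence assumption is violated.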


  8. DirectLiNGAM algorithm
    (Shimizu et al., 2011)
    • Repeat linear regression and independence evaluation
    – https://lingam.readthedocs.io/en/latest/tutorial/lingam.html
    • p>n cases (Wang & Drton, 2020)
    – https://github.com/ysamwang/highDNG
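    $\begin{bmatrix} x_3 \\ x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 1.5 & 0 & 0 \\ 0 & -1.3 & 0 \end{bmatrix} \begin{bmatrix} x_3 \\ x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} e_3 \\ e_1 \\ e_2 \end{bmatrix}$

    $\begin{bmatrix} r_1^{(3)} \\ r_2^{(3)} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ -1.3 & 0 \end{bmatrix} \begin{bmatrix} r_1^{(3)} \\ r_2^{(3)} \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$

    [Figure: causal order x3, x1, x2; once x3 is identified as exogenous, the residuals $r_1^{(3)}$ and $r_2^{(3)}$ are analyzed in the same way]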
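    The loop of regression and independence evaluation can be sketched as follows. This is a simplified illustration, not the package's implementation: independence is scored with a likelihood-ratio-style measure built on a maximum-entropy approximation of differential entropy (in the style of Hyvarinen and Smith's pairwise measures), and all function names are hypothetical. The toy data follow the slide's example with x_3 -> x_1 (1.5) and x_1 -> x_2 (-1.3), with uniform errors assumed.

```python
import numpy as np


def standardize(x):
    return (x - x.mean()) / x.std()


def residual(y, x):
    """Residual of y after least-squares regression on x (both zero mean)."""
    return y - (np.dot(x, y) / np.dot(x, x)) * x


def entropy(u):
    """Maximum-entropy approximation of differential entropy for standardized u."""
    k1, k2, gamma = 79.047, 7.4129, 0.37457
    return ((1.0 + np.log(2.0 * np.pi)) / 2.0
            - k1 * (np.mean(np.log(np.cosh(u))) - gamma) ** 2
            - k2 * np.mean(u * np.exp(-u**2 / 2.0)) ** 2)


def causal_order(X):
    """Repeatedly pick the most exogenous-looking variable and regress it out."""
    X = X - X.mean(axis=0)  # centered copy; the caller's array is untouched
    remaining = list(range(X.shape[1]))
    order = []
    while len(remaining) > 1:
        scores = []
        for i in remaining:
            score = 0.0
            for j in remaining:
                if i == j:
                    continue
                xi, xj = standardize(X[:, i]), standardize(X[:, j])
                ri_j = standardize(residual(xi, xj))  # x_i regressed on x_j
                rj_i = standardize(residual(xj, xi))  # x_j regressed on x_i
                # Positive when the direction x_i -> x_j fits better.
                diff = (entropy(xj) + entropy(ri_j)) - (entropy(xi) + entropy(rj_i))
                score -= min(0.0, diff) ** 2
            scores.append(score)
        root = remaining[int(np.argmax(scores))]
        order.append(root)
        remaining.remove(root)
        for j in remaining:  # remove the effect of the chosen variable
            X[:, j] = residual(X[:, j], X[:, root])
    order.append(remaining[0])
    return order


rng = np.random.default_rng(0)
n = 20000
e = rng.uniform(-1.0, 1.0, size=(n, 3))
x3 = e[:, 2]
x1 = 1.5 * x3 + e[:, 0]
x2 = -1.3 * x1 + e[:, 1]
X = np.column_stack([x1, x2, x3])  # columns 0, 1, 2 hold x_1, x_2, x_3
order = causal_order(X)
```

    With strongly non-Gaussian errors and a large sample, the returned list should place column 2 (x_3) first, followed by columns 0 and 1, which corresponds to the order x_3, x_1, x_2 on the slide.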


  9. Prior knowledge
    https://lingam.readthedocs.io/en/latest/tutorial/pk_direct.html
    • Prior knowledge about topological orders: k(3) < k(1) < k(2)
    • Use prior knowledge in estimating topological causal orders
    and in pruning redundant edges
    [Figure: causal order x3, x1, x2 with the residuals $r_1^{(3)}$ and $r_2^{(3)}$]
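    One way to encode such ordering knowledge is a matrix of allowed and forbidden directed paths. The encoding below (0 = x_j is known not to cause x_i, -1 = unknown) is an assumption made for this sketch and should be checked against the package tutorial before use; the helper names are hypothetical.

```python
import numpy as np


def prior_knowledge_from_order(n_variables, earlier_later_pairs):
    """Matrix pk with pk[i, j] = 0 if x_j is known not to cause x_i, else -1."""
    pk = np.full((n_variables, n_variables), -1, dtype=int)
    np.fill_diagonal(pk, 0)  # no variable causes itself in an acyclic model
    for earlier, later in earlier_later_pairs:
        pk[earlier, later] = 0  # the later variable cannot cause the earlier one
    return pk


def surely_exogenous(pk):
    """Variables that, according to pk, cannot be caused by any other variable."""
    return [i for i in range(pk.shape[0]) if np.all(pk[i] == 0)]


# k(3) < k(1) < k(2), with x_1, x_2, x_3 stored as columns 0, 1, 2.
pairs = [(2, 0), (0, 1), (2, 1)]
pk = prior_knowledge_from_order(3, pairs)
exogenous = surely_exogenous(pk)
forbidden_edges = pk == 0  # mask of coefficients that pruning may set to zero
```

    The same matrix serves both purposes named on the slide: it restricts which variables are candidates when searching for the next variable in the causal order, and it marks the edges that can be dropped during pruning.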


  10. Multiple datasets
    • Simultaneously analyze different datasets to use similarity
    (Ramsey et al. 2011; Shimizu, 2012)
    – Similarity: Causal orders same, distributions and coefficients may differ
    – https://lingam.readthedocs.io/en/latest/tutorial/multiple_dataset.html
    [Figure: causal graphs over x1, x2, x3 with error variables e1, e2, e3 for Dataset 1 (edge coefficients 4, -3, 2) and Dataset 2 (edge coefficients -0.5, 5)]


  11. Multiple datasets: Longitudinal data
    • Longitudinal data consist of multiple samples collected over a
    period of time (Kadowaki et al., 2013)
    • https://lingam.readthedocs.io/en/latest/tutorial/longitudinal.html


  12. Analysis of predictive mechanisms
    • Combine the causal model and predictive model
    to model the prediction mechanism
    [Figure: causal model over $X_1$, $X_2$, $X_3$, $X_4$ and $Y$; predictive model from $X_1, \ldots, X_4$ to $\hat{Y}$; prediction mechanism model combining the two]
    Causal model: $x_4 = f_4(y, e_4)$
    Predictive model: $\hat{y} = f(x_1, x_2, x_3, x_4)$
    Prediction mechanism model: $E(\hat{y} \mid do(x_i = c))$
    https://lingam.readthedocs.io/en/latest/tutorial/causal_effect.html#identification-of-feature-with-greatest-causal-influence-on-prediction
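    For a linear causal model and a linear predictor, the expected prediction under an intervention has a closed form, which the sketch below works through. Linearity of the predictor, the toy coefficients and all function names are assumptions for illustration; this is not the package's API.

```python
import numpy as np


def expected_x_under_do(B, mu_e, i, c):
    """E[x | do(x_i = c)] for the linear model x = Bx + e with E[e] = mu_e."""
    p = B.shape[0]
    B_do = B.copy()
    B_do[i, :] = 0.0  # remove every edge pointing into x_i
    mu = np.array(mu_e, dtype=float)
    mu[i] = c  # x_i is held fixed at c
    return np.linalg.solve(np.eye(p) - B_do, mu)


def expected_prediction_under_do(B, mu_e, w, b0, i, c):
    """E[y_hat | do(x_i = c)] for the linear predictor y_hat = w @ x + b0."""
    return float(w @ expected_x_under_do(B, mu_e, i, c) + b0)


def intervention_for_target(B, mu_e, w, b0, i, target):
    """Value c such that E[y_hat | do(x_i = c)] equals `target`."""
    y_at_0 = expected_prediction_under_do(B, mu_e, w, b0, i, 0.0)
    y_at_1 = expected_prediction_under_do(B, mu_e, w, b0, i, 1.0)
    slope = y_at_1 - y_at_0  # the expectation is affine in c
    if np.isclose(slope, 0.0):
        raise ValueError("x_i has no effect on the prediction")
    return (target - y_at_0) / slope


# Toy chain x0 -> x1 -> x2 with coefficients 2 and -1, zero-mean errors.
B = np.array([
    [0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, -1.0, 0.0],
])
mu_e = np.zeros(3)
w = np.array([0.5, 1.0, 2.0])  # hypothetical regression weights
b0 = 1.0                       # hypothetical intercept

effect_x0 = expected_prediction_under_do(B, mu_e, w, b0, i=0, c=1.0)
effect_x1 = expected_prediction_under_do(B, mu_e, w, b0, i=1, c=1.0)
c_needed = intervention_for_target(B, mu_e, w, b0, i=0, target=2.5)
```

    Comparing `expected_prediction_under_do` across features and intervention values indicates which feature moves the prediction most, and `intervention_for_target` inverts the relation to find the intervention that yields a desired prediction.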


  13. Illustrative example
    • Auto-MPG (miles per gallon) dataset
    • Linear regression
    • Which variable has the greatest intervention effect
    on MPG prediction?
    • Which variable should be intervened on to obtain a
    certain MPG prediction? (Control)
    [Figure: causal graph over Cylinders, Displacement, Weight, Horsepower, Acceleration and MPG, together with the predicted MPG]

    Desired MPG prediction | Suggested intervention on cylinders
    15                     | 8
    21                     | 6
    30                     | 4


  14. Time series model
    • Subsampling data:
    – SVAR: Structural Vector Autoregressive model (Swanson & Granger, 1997)
    – Identifiability using non-Gaussianity (Hyvarinen et al., 2010)
    • https://lingam.readthedocs.io/en/latest/tutorial/var.html
    – VARMA instead of VAR (Kawahara et al., 2011)
    • https://lingam.readthedocs.io/en/latest/tutorial/varma.html
    • Nonstationarity
    – Assumption: Differences are stationary (Moneta et al., 2013)
    $\boldsymbol{x}(t) = \sum_{\tau=0}^{k} B_\tau \boldsymbol{x}(t-\tau) + \boldsymbol{e}(t)$
    [Figure: causal graph over x1(t-1), x2(t-1), x1(t), x2(t) with error variables e1(t-1), e2(t-1), e1(t), e2(t)]
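    The link between the structural model and an ordinary VAR can be sketched for one lag. Rearranging x(t) = B_0 x(t) + B_1 x(t-1) + e(t) gives the reduced form x(t) = M_1 x(t-1) + n(t) with M_1 = (I - B_0)^{-1} B_1, so that B_1 = (I - B_0) M_1. The coefficient values and function names below are assumptions for illustration, and the true B_0 is used where a real analysis would plug in the LiNGAM estimate obtained from the VAR residuals.

```python
import numpy as np


def simulate_svar1(B0, B1, n_steps, rng):
    """Simulate x(t) = B0 x(t) + B1 x(t-1) + e(t) with uniform errors."""
    p = B0.shape[0]
    A = np.linalg.inv(np.eye(p) - B0)
    E = rng.uniform(-1.0, 1.0, size=(n_steps, p))
    X = np.zeros((n_steps, p))
    X[0] = A @ E[0]
    for t in range(1, n_steps):
        X[t] = A @ (B1 @ X[t - 1] + E[t])
    return X


def fit_var1(X):
    """Least-squares fit of the reduced-form VAR(1): x(t) = M1 x(t-1) + n(t)."""
    past, present = X[:-1], X[1:]
    W, *_ = np.linalg.lstsq(past, present, rcond=None)  # present ~ past @ W
    M1 = W.T
    residuals = present - past @ W
    return M1, residuals


# Hypothetical coefficients: instantaneous effect x1(t) -> x2(t), plus lagged effects.
B0 = np.array([[0.0, 0.0],
               [0.5, 0.0]])
B1 = np.array([[0.4, 0.0],
               [0.2, 0.3]])
rng = np.random.default_rng(4)
X = simulate_svar1(B0, B1, 5000, rng)

M1_hat, residuals = fit_var1(X)
# Recover the lagged structural coefficients from the reduced-form fit.
B1_hat = (np.eye(2) - B0) @ M1_hat
```

    The residuals follow n(t) = (I - B_0)^{-1} e(t), which is itself a LiNGAM model; this is why non-Gaussianity makes the instantaneous effects B_0 identifiable.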


  15. Hidden common cause (1)
    • Assumption: hidden common causes are allowed only for exogenous
    variables
    [Figure: two causal graphs over x1, x2, x3, one of them including a hidden common cause f1]
    https://lingam.readthedocs.io/en/latest/tutorial/bottom_up_parce.html


  16. Hidden common cause (2) RCD
    • For unconfounded pairs with no hidden common causes, estimate the
    causal directions
    • For confounded pairs with hidden common causes, let them remain
    unknown
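    [Figure: underlying model over $x_1$, $x_2$, $x_3$, $x_4$ with hidden common causes $f_1$ and $f_2$, next to the RCD output over $x_1$, $x_2$, $x_3$, $x_4$]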
    https://lingam.readthedocs.io/en/latest/tutorial/rcd.html


  17. Time series model with hidden common causes
    • SVAR with hidden common causes
    – Malinsky and Spirtes (2018)
    – Gerhardus and Runge (2020)
    – Nonparametric
    – Conditional independence
    – Python: https://github.com/jakobrunge/tigramite


  18. Nonlinear model
    • Additive noise model: $x_i = f_i(\mathrm{par}(x_i)) + e_i$
    • R code: http://web.math.ku.dk/~peters/code.html


  19. Methods based on conditional independencies
    • GUI: Tetrad
    – https://github.com/cmu-phil/tetrad
    • Python: causal-learn (including LiNGAM variants)
    – https://github.com/cmu-phil/causal-learn
    • R: pcalg
    – https://cran.r-project.org/web/packages/pcalg/index.html


  20. Future plan
    • A nonlinear version of RCD: CAM-UV
    • Latent factors
    • Mixed data with continuous and discrete variables
    • Overcomplete-ICA-based method for hidden common cause
    cases (under development)


  21. LiNGAM for latent factors (Shimizu et al., 2009)
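    • Model:
      $\boldsymbol{f} = B\boldsymbol{f} + \boldsymbol{\epsilon}$
      $\boldsymbol{x} = G\boldsymbol{f} + \boldsymbol{e}$
    – Two pure measurement variables per latent factor needed to identify the
    measurement model (Silva et al., 2006; Xie et al., 2020)
    • Estimate the latent factors and then their causal graph
    [Figure: measurement variables $x_1$, $x_2$, $x_3$, $x_4$ and latent factors $f_1$, $f_2$; the causal relation between the factors is marked "?"]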


  22. Find common and unique factors across multiple datasets (Zeng et al., 2021)
    • Model: for $m = 1, \ldots, M$,
      $\boldsymbol{f}^{(m)} = B^{(m)} \boldsymbol{f}^{(m)} + \boldsymbol{\epsilon}^{(m)}$
      $\boldsymbol{x}^{(m)} = G^{(m)} \boldsymbol{f}^{(m)} + \boldsymbol{e}^{(m)}$
    • Score function: likelihood + DAGness (Zheng et al., 2018)
    • Feature extraction across multiple datasets
    + causal discovery of latent factors
    [Figure: latent factors of each dataset and their causal graphs, asking whether a factor in one dataset equals a factor in another]
