LiNGAM approach to causal discovery (Preliminary version)

Talk at the KDD2021 Workshop on Causal Discovery (CD2021) on 15 August.

As of 13 August, I'm still modifying the slides to shorten the talk to 45 min.

Shohei SHIMIZU

August 13, 2021

Transcript

  1. LiNGAM approach to causal discovery
    Shohei SHIMIZU
    Shiga University & RIKEN
    The KDD2021 Workshop on Causal Discovery (CD2021)

  2. What is causal discovery?
    • Methodology for inferring causal graphs using data
    Assumptions
    • Functional form?
    • Distribution?
    • Hidden common cause present?
    • Acyclic? etc.
    [Figure: Data + Assumptions → Causal graph (Maeda and Shimizu, 2020)]

  3. Causal graphs are the key to statistical causal inference
    • Estimate intervention effects
    – Need causal graph to select variables to be adjusted, e.g., using backdoor criterion (Pearl, 1995)
    • Also useful for machine learning
    – E.g., domain adaptation (Zhang et al., 2020), fairness (Kusner et al., 2017), and interpretability (Blöbaum & Shimizu, 2017)
    [Figure: number of Nobel laureates vs. chocolate consumption (Messerli, 2012); graph nodes: Chocolate, Nobel laureates, GDP]
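
Why the causal graph matters for choosing adjustment variables can be illustrated with a small simulation. This is a minimal sketch, assuming a hypothetical linear model in which GDP is a common cause of chocolate consumption and the number of Nobel laureates, and chocolate has no effect at all; the coefficients, variable names, and the helper `ols_slope` are made up for illustration and are not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical structural equations (illustrative only):
# gdp -> chocolate, gdp -> nobel, and no edge chocolate -> nobel.
gdp = rng.normal(size=n)
chocolate = 0.8 * gdp + rng.normal(size=n)
nobel = 1.5 * gdp + rng.normal(size=n)

def ols_slope(y, *regressors):
    """Least-squares coefficient of the first regressor (intercept included)."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

naive = ols_slope(nobel, chocolate)          # ignores the common cause
adjusted = ols_slope(nobel, chocolate, gdp)  # adjusts for GDP (a backdoor set here)

print(f"naive slope:    {naive:.3f}")
print(f"adjusted slope: {adjusted:.3f}")
```

Under these assumptions the naive slope is clearly non-zero, while the slope adjusted for GDP is close to the true effect of zero. Which variables to adjust for depends on the graph, which is the reason the graph is needed first.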

  4. How do we draw a causal graph?
    • Common way: Use background knowledge
    • Often need to use both background knowledge AND DATA
    • Causal discovery: Infer the causal graph from data
    [Figure: candidate causal graphs over Chocolate, Nobel laureates, and GDP. Which one is correct?]

  5. Application areas
    https://sites.google.com/view/sshimizu06/lingam/lingampapers/applications-and-tailor-made-methods
    • Epidemiology: do sleep problems cause depression mood, or the reverse? (Rosenström et al., 2012)
    • Economics: growth rates of employment, sales, R&D, and operating income at times t, t+1, t+2 (Moneta et al., 2013)
    • Neuroscience (Boukrina & Graves, 2013)
    • Chemistry (Campomanes et al., 2014)
    • Preventive medicine (Kotoku et al., 2020)
    • Climatology (Liu & Niyogi, 2020)

  6. Causal discovery is a challenge in causal inference
    • Classical non-parametric approach uses conditional independence (Pearl, 2001; Spirtes et al., 1993)
    – Makes no assumptions about functional forms or distributions
    – At best, it identifies the Markov equivalence class of models
    • Additional assumptions needed to go beyond the limit
    – Restrictions on functional forms and distributions
    – Yield uniquely identifiable models or smaller sets of equivalent models
    • LiNGAM is one example (Shimizu et al., 2006; Shimizu, 2014).
    – Non-Gaussian assumption to exploit independence
    – Growing literature on its variants (Peters et al., 2018; Shimizu & Blöbaum, 2020)

  9. Methods of causal discovery

  10. Framework
    • Structural causal model (Pearl, 2001)
      $x_i = f_i(\text{parents of } x_i, e_i)$, where $e_i$ is an error variable
    • Make assumptions and find the causal graph(s) consistent with the data
    – Typical example 1:
    • Directed acyclic graph (DAG)
    • No hidden common cause (all observed)
    – Typical example 2:
    • DAG
    • Hidden common causes may exist
    [Figure: example graph over x1, x2, x3 with error variables e1, e2, e3]

  11. Non-parametric approach
    To what extent can we infer the causal graph without making any assumptions about the functional form or distribution?
    (Spirtes, Glymour & Scheines, 2001, 2nd ed.)

  12. Non-parametric approach: Example
    1. Make assumptions on the underlying causal graph
    – Directed acyclic graph
    – No hidden common causes (all have been observed)
    2. Among the causal graphs that satisfy the assumptions, find the graph that best matches the data.
    Three candidates: (a) x → y, (b) x ← y, (c) x and y with no edge
    • If x and y are independent in the data, select (c).
    • If x and y are dependent in the data, select (a) and (b).
    • (a) and (b) are indistinguishable (not uniquely identifiable): they form a Markov equivalence class

  14. Various extensions
    • Equivalent models including unobserved common causes (Spirtes et al., 1995)
    • Equivalent models for time series (Malinsky & Spirtes, 2018)
    • Equivalence class including cyclic graphs (Richardson, 1996)
    • Lower bound on intervention effects (Maathuis et al., 2009; Malinsky & Spirtes, 2017)
    [Figure: example graphs over x, y, w, z, one of them with latent variables f1 and f2 (F. Eberhardt, CRM Workshop 2016)]

  15. Semi-parametric approach: Make additional assumptions on functional forms and distributions
    What are the assumptions for making causal graphs identifiable?

  16. Make additional assumptions on functional forms and distributions
    • More information available than conditional independence
    • E.g., linearity + non-Gaussian continuous distribution
    – (a) x → y and (b) x ← y result in different joint distributions of x and y
    – No difference between (a) and (b) in terms of conditional independence

  17. LiNGAM model is identifiable
    (Shimizu, Hyvärinen, Hoyer & Kerminen, 2006)
    • Linear Non-Gaussian Acyclic Model:
      $x_i = \sum_{k(j) < k(i)} b_{ij} x_j + e_i$, or in matrix form $\boldsymbol{x} = B\boldsymbol{x} + \boldsymbol{e}$
    – $k(i)$ ($i = 1, \dots, p$): causal (topological) order of $x_i$
    – Error variables $e_i$ independent and non-Gaussian
    • Coefficients and causal orders identifiable
    • Causal graph identifiable
    [Figure: causal graph over x1, x2, x3 with error variables e1, e2, e3 and coefficients b21, b23, b13]
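
The matrix form of the model can be made concrete by generating data from it. This is a minimal sketch, assuming a hypothetical coefficient matrix `B` for the graph x3 → x1, x3 → x2, x1 → x2 and uniform errors; the numerical values are illustrative and do not come from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical coefficient matrix: B[i, j] is the direct effect of x_{j+1} on x_{i+1}.
B = np.array([
    [0.0, 0.0, 0.8],    # x1 = 0.8 * x3 + e1
    [1.5, 0.0, -0.7],   # x2 = 1.5 * x1 - 0.7 * x3 + e2
    [0.0, 0.0, 0.0],    # x3 = e3
])

# Independent non-Gaussian (uniform) errors, one row per observation.
E = rng.uniform(-1.0, 1.0, size=(n, 3))

# x = Bx + e  is equivalent to  x = (I - B)^{-1} e
X = E @ np.linalg.inv(np.eye(3) - B).T

# The structural equations hold for every observation.
assert np.allclose(X, X @ B.T + E)

# Acyclicity: ordering the variables by a causal order (x3, x1, x2)
# turns B into a strictly lower triangular matrix.
order = [2, 0, 1]
B_perm = B[np.ix_(order, order)]
is_strictly_lower = np.allclose(B_perm, np.tril(B_perm, k=-1))
print(is_strictly_lower)
```

Estimation goes the other way: given only `X`, recover the causal order and `B`.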

  18. How do we use non-Gaussianity and independence?
    Underlying model: $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$ ($b_{21} \neq 0$), with $e_1$, $e_2$ non-Gaussian
    • Regress effect x2 on cause x1:
      $r_2^{(1)} = x_2 - \frac{\mathrm{cov}(x_2, x_1)}{\mathrm{var}(x_1)} x_1 = x_2 - b_{21} x_1 = e_2$
      $x_1 = e_1$ and the residual $r_2^{(1)}$ are independent
    • Regress cause x1 on effect x2:
      $r_1^{(2)} = x_1 - \frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)} x_2 = \left(1 - \frac{b_{21}\,\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_2)}\right) e_1 - \frac{b_{21}\,\mathrm{var}(x_1)}{\mathrm{var}(x_2)} e_2$
      $x_2 = b_{21} e_1 + e_2$ and the residual $r_1^{(2)}$ are dependent, although they are uncorrelated
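
The asymmetry between the two regressions can be checked numerically. This is a minimal sketch, assuming uniform errors and a hypothetical coefficient `b21 = 2.0`; the dependence measure used here, the absolute correlation between the cubed regressor and the residual, is only a crude stand-in chosen for brevity and is not the measure used in the LiNGAM literature.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
b21 = 2.0  # hypothetical coefficient

# Underlying model: x1 = e1, x2 = b21 * x1 + e2, with uniform (non-Gaussian) errors.
e1 = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)
e2 = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)
x1 = e1
x2 = b21 * x1 + e2

def residual(target, regressor):
    """Residual of the least-squares regression of target on regressor."""
    t = target - target.mean()
    r = regressor - regressor.mean()
    return t - (np.cov(t, r)[0, 1] / np.var(r, ddof=1)) * r

def nonlinear_dependence(regressor, resid):
    """|corr(regressor^3, resid)|: zero in expectation if the two are independent."""
    z = regressor - regressor.mean()
    return abs(np.corrcoef(z**3, resid)[0, 1])

dep_causal = nonlinear_dependence(x1, residual(x2, x1))      # regress effect on cause
dep_anticausal = nonlinear_dependence(x2, residual(x1, x2))  # regress cause on effect

print(f"effect on cause: {dep_causal:.4f}")
print(f"cause on effect: {dep_anticausal:.4f}")
```

In both directions the residual is uncorrelated with the regressor by construction, so only a nonlinear measure can reveal the difference. With Gaussian errors this difference would vanish.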

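  19. Independence measure (Hyvärinen & Smith, 2013)
    • Can compute the difference of mutual information between explanatory variable and its residual for the two directions using one-dimensional entropies:
      $I(x_1, r_2^{(1)}) - I(x_2, r_1^{(2)}) = \left[ H(x_1) + H\!\left( \frac{r_2^{(1)}}{\mathrm{sd}(r_2^{(1)})} \right) \right] - \left[ H(x_2) + H\!\left( \frac{r_1^{(2)}}{\mathrm{sd}(r_1^{(2)})} \right) \right]$
    • Maximum entropy approximation of entropy $H$ (Hyvärinen, 1999):
      $H(u) \approx H(v) - k_1 \left[ E\{\log\cosh u\} - \gamma \right]^2 - k_2 \left[ E\{ u \exp(-u^2/2) \} \right]^2$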
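
The two formulas above translate almost line by line into code. This is a minimal sketch: `v` is taken to be a standard Gaussian variable, and the constants `k1`, `k2`, and `gamma` are the values commonly quoted for this approximation; the slide does not list them, so treat them as assumptions. The data-generating model and helper names are hypothetical.

```python
import numpy as np

def entropy(u):
    """Maximum-entropy approximation of the entropy of a standardized variable u."""
    k1, k2, gamma = 79.047, 7.4129, 0.37457           # assumed constants
    h_gauss = (1.0 + np.log(2.0 * np.pi)) / 2.0       # H(v) for standard Gaussian v
    return (h_gauss
            - k1 * (np.mean(np.log(np.cosh(u))) - gamma) ** 2
            - k2 * np.mean(u * np.exp(-u**2 / 2.0)) ** 2)

def standardize(x):
    return (x - x.mean()) / x.std()

def residual(target, regressor):
    """Least-squares residual of target on regressor (inputs already centered)."""
    slope = np.cov(target, regressor, ddof=0)[0, 1] / np.var(regressor)
    return target - slope * regressor

def diff_mutual_info(x1, x2):
    """I(x1, r2^(1)) - I(x2, r1^(2)); negative values favour x1 -> x2."""
    x1, x2 = standardize(x1), standardize(x2)
    r2 = residual(x2, x1)   # regress x2 on x1
    r1 = residual(x1, x2)   # regress x1 on x2
    return (entropy(x1) + entropy(r2 / r2.std())) - (entropy(x2) + entropy(r1 / r1.std()))

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.uniform(-1, 1, size=n)
x2 = 2.0 * x1 + rng.uniform(-1, 1, size=n)   # hypothetical true model: x1 -> x2

d = diff_mutual_info(x1, x2)
print(f"difference of mutual information: {d:.4f}")
```

Swapping the arguments flips the sign, so a single number decides the direction for a pair of variables.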

  20. Evaluation of estimated causal graphs

  21. Before estimating causal graphs
    • Assessing assumptions by
    – Gaussianity test
    – Histograms: continuous?
    – Too high correlation: multicollinearity?
    – Background knowledge
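
Two of these checks are easy to script. This is a minimal sketch on simulated data: sample excess kurtosis is used as a quick numerical indication of non-Gaussianity (it is not a formal Gaussianity test), and the largest absolute pairwise correlation flags possible multicollinearity. The function names and data are hypothetical.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis; approximately 0 for Gaussian data."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**4) - 3.0)

def max_abs_correlation(X):
    """Largest absolute off-diagonal correlation between the columns of X."""
    corr = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(corr, 0.0)
    return float(np.max(np.abs(corr)))

rng = np.random.default_rng(1)
n = 20_000
x1 = rng.uniform(-1, 1, size=n)               # non-Gaussian (uniform)
x2 = 0.5 * x1 + rng.uniform(-1, 1, size=n)
g = rng.normal(size=n)                        # Gaussian reference
X = np.column_stack([x1, x2])

kurt_uniform = excess_kurtosis(x1)
kurt_gauss = excess_kurtosis(g)
max_corr = max_abs_correlation(X)
print(kurt_uniform, kurt_gauss, max_corr)
```

A variable whose excess kurtosis is near zero may still be non-Gaussian (e.g., skewed), so histograms and a proper test remain useful.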

  22. After estimating causal graphs
    • Assessing assumptions by
    – Testing independence of error variables, e.g., by HSIC (Gretton et al., 2005)
    – Prediction accuracy using Markov boundary (Biza et al., 2020)
    – Comparing with the results on other datasets in which causal graphs are expected to be similar
    – Checking against background knowledge
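
The HSIC check on error variables can be sketched as follows. This computes only the biased empirical HSIC statistic with Gaussian kernels and a median-distance bandwidth; a real test would also need a null distribution (e.g., by permutation), which is omitted here. The simulated "residuals" and function names are hypothetical.

```python
import numpy as np

def gaussian_gram(x):
    """Gaussian-kernel Gram matrix with a median-distance bandwidth heuristic."""
    d2 = (x[:, None] - x[None, :]) ** 2
    sigma2 = np.median(d2[d2 > 0])
    return np.exp(-d2 / (2.0 * sigma2))

def hsic(x, y):
    """Biased empirical HSIC statistic: trace(K H L H) / n^2."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K, L = gaussian_gram(x), gaussian_gram(y)
    return float(np.trace(K @ H @ L @ H) / n**2)

rng = np.random.default_rng(0)
n = 500
e1 = rng.uniform(-1, 1, size=n)
e2 = rng.uniform(-1, 1, size=n)                      # independent of e1
e_bad = e1**2 + 0.1 * rng.uniform(-1, 1, size=n)     # uncorrelated with e1, yet dependent

hsic_indep = hsic(e1, e2)
hsic_dep = hsic(e1, e_bad)
print(f"independent pair: {hsic_indep:.5f}")
print(f"dependent pair:   {hsic_dep:.5f}")
```

The dependent pair has near-zero linear correlation, so a correlation check alone would miss the violation that HSIC picks up.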

  23. Statistical reliability assessment
    • Bootstrap probability (bp) of directed paths and edges
    • Interpret causal effects whose bp is larger than a threshold, say 5%
    [Figure: example graphs over x0, x1, x2, x3 from bootstrap samples, with bootstrap probabilities (99%, 96%, 10%) and a total effect of 20.9]
    LiNGAM Python package: https://github.com/cdt15/lingam
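
The idea of a bootstrap probability can be sketched for a single pair of variables. This is a minimal illustration that does not use the package linked above: the direction estimator is a crude stand-in (it compares the absolute correlation between the cubed regressor and the residual in the two directions), and the data-generating model is hypothetical.

```python
import numpy as np

def dependence(regressor, target):
    """|corr(regressor^3, residual)| after regressing target on regressor."""
    r = regressor - regressor.mean()
    t = target - target.mean()
    resid = t - (r @ t) / (r @ r) * r
    return abs(np.corrcoef(r**3, resid)[0, 1])

def estimated_direction(x1, x2):
    """Return '1->2' if regressing x2 on x1 leaves the less dependent residual."""
    return "1->2" if dependence(x1, x2) < dependence(x2, x1) else "2->1"

rng = np.random.default_rng(0)
n, n_boot = 5_000, 200
x1 = rng.uniform(-1, 1, size=n)
x2 = 2.0 * x1 + rng.uniform(-1, 1, size=n)   # hypothetical true model: x1 -> x2

hits = 0
for _ in range(n_boot):
    idx = rng.integers(0, n, size=n)          # resample rows with replacement
    hits += estimated_direction(x1[idx], x2[idx]) == "1->2"

bp = hits / n_boot                            # bootstrap probability of x1 -> x2
print(f"bootstrap probability of x1 -> x2: {bp:.2f}")
```

The same counting applies to directed paths and to full graphs: re-estimate on each resample and report the fraction of resamples in which the edge or path appears.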

  24. To relax the model assumptions

  25. Other identifiable models
    • Nonlinearity + “additive” noise (Hoyer+08NIPS, Zhang+09UAI, Peters+14JMLR)
    – $x_i = f_i(\mathrm{par}(x_i)) + e_i$
    – $x_i = g_i^{-1}(f_i(\mathrm{par}(x_i)) + e_i)$
    • Discrete variables
    – Poisson DAG model and its extensions (Park+18JMLR)
    • Mixed types of variables: LiNGAM + logistic-type model
    – Identifiability condition for two variables (Wenjuan+18IJCAI)
    – Probably also holds for multivariate cases, using the idea of Thm. 28 of Peters et al. (2014)

  28. For better statistical reliability

  29. For better statistical reliability
    • Use background knowledge in estimation
    – Causal orders
    – Specify functional forms
    – Specify distribution
    • E.g., in manufacturing, the causal order of these 3 groups is often known
    – Manufacturing conditions
    – Intermediate characteristics
    – Final characteristic(s)
    [Figure: manufacturing conditions (1, …, 10) → intermediate characteristics (1, …, 100) → final characteristic]

  30. For better statistical reliability
    • Simultaneously analyze different datasets to use similarity (Ramsey et al., 2011; Shimizu, 2012)
    – Similarity: causal orders are the same; distributions and coefficients may differ
    – Accuracy greatly improved on simulated fMRI data (Ramsey et al., 2011)
    [Figure: Dataset 1 and Dataset 2 over x1, x2, x3 with error variables e1, e2, e3; edge coefficients 4, -3, 2 in Dataset 1 and -0.5, 5 in Dataset 2]

  31. LiNGAM with hidden common causes

  32. Estimate causal structures of variables that do not share hidden common causes
    • For unconfounded pairs (no hidden common causes), estimate the causal directions
    • For confounded pairs (with hidden common causes), leave the causal directions unknown
    [Figure: underlying model with observed variables and hidden common causes f1, f2, next to the output graph]

  33. Non-Gaussianity and independence work again
    • Existence of hidden common causes leads to dependence between the explanatory variable and its residual (Tashiro et al., 2014)
    • Key result (Maeda & Shimizu, 2020)
    – Find a set of variables that gives independent residuals when a variable is regressed on each of its subsets
    – If this succeeds, the variables in such a set (x1 and x2) are the unconfounded ancestors of the variable (x4)
    • For nonlinear additive models, existence of hidden intermediate variables also leads to dependence (Maeda & Shimizu, 2021)
    [Figure: example graphs with hidden common causes]

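  37. Estimate causal structures of variables that share hidden common causes
    (Hoyer, Shimizu, Kerminen & Palviainen, 2008; Salehkaleybar et al., 2020)
    • LiNGAM with unobserved common causes is an ICA model (Hyvärinen et al., 2001)
      $\boldsymbol{x} = B\boldsymbol{x} + \Lambda\boldsymbol{f} + \boldsymbol{e}$, i.e., $\boldsymbol{x} = \begin{bmatrix} (I - B)^{-1} & (I - B)^{-1}\Lambda \end{bmatrix} \begin{bmatrix} \boldsymbol{e} \\ \boldsymbol{f} \end{bmatrix}$, where $\boldsymbol{e}$ and $\boldsymbol{f}$ are the independent components
    • Apply ICA and look at the zero/non-zero pattern
    – x1 → x2, with hidden common cause f1:
      $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \lambda_{11} \\ b_{21} & 1 & \lambda_{21} + b_{21}\lambda_{11} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$
    – x2 → x1, with hidden common cause f1:
      $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & b_{12} & \lambda_{11} + b_{12}\lambda_{21} \\ 0 & 1 & \lambda_{21} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$
    – No edge between x1 and x2, with hidden common cause f1:
      $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \lambda_{11} \\ 0 & 1 & \lambda_{21} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ f_1 \end{bmatrix}$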
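
The zero/non-zero patterns of the three cases can be reproduced numerically. This is a minimal sketch that builds the mixing matrix directly from hypothetical values of `b21`, `b12`, and the loadings; it does not run ICA itself, which in practice recovers such a matrix only up to permutation and scaling of its columns.

```python
import numpy as np

def mixing_matrix(B, Lam):
    """Return A = [(I - B)^{-1}, (I - B)^{-1} Lam], so that x = A @ [e; f]."""
    inv = np.linalg.inv(np.eye(B.shape[0]) - B)
    return np.hstack([inv, inv @ Lam])

# Hypothetical values, for illustration only.
b21, b12 = 0.5, -0.8
Lam = np.array([[1.0], [2.0]])   # lambda_11, lambda_21

A_x1_to_x2 = mixing_matrix(np.array([[0.0, 0.0], [b21, 0.0]]), Lam)
A_x2_to_x1 = mixing_matrix(np.array([[0.0, b12], [0.0, 0.0]]), Lam)
A_no_edge = mixing_matrix(np.zeros((2, 2)), Lam)

def pattern(A, tol=1e-12):
    """1 where an entry is non-zero, 0 where it is (numerically) zero."""
    return (np.abs(A) > tol).astype(int).tolist()

for name, A in [("x1 -> x2", A_x1_to_x2), ("x2 -> x1", A_x2_to_x1), ("no edge", A_no_edge)]:
    print(name, pattern(A))
```

The position of the zero in the first two columns distinguishes the three cases, while the third column (the hidden common cause) is non-zero in all of them.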

  38. LiNGAM for latent factors

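  39. LiNGAM for latent factors (Shimizu et al., 2009)
    • Model: $\boldsymbol{f} = B\boldsymbol{f} + \boldsymbol{\epsilon}$, $\boldsymbol{x} = G\boldsymbol{f} + \boldsymbol{e}$
    – 2 pure measurement variables per latent factor are needed to identify the measurement model (Silva et al., 2006; Xie et al., 2020)
    • Estimate the latent factors and then their causal graph
    [Figure: two latent factors f1 and f2, each with its own measurement variables among x1, …, x4; the causal relation between the factors is unknown (?)]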

  40. Find common and unique factors across multiple datasets (Zeng et al., 2021)
    • Model: $\boldsymbol{f}^{(m)} = B^{(m)} \boldsymbol{f}^{(m)} + \boldsymbol{\epsilon}^{(m)}$, $\boldsymbol{x}^{(m)} = G^{(m)} \boldsymbol{f}^{(m)} + \boldsymbol{e}^{(m)}$ ($m = 1, \dots, M$)
    • Score function: likelihood + DAGness (Zheng et al., 2018)
    • Feature extraction across multiple datasets + causal discovery of latent factors
    [Figure: latent factors and their measurement variables in each dataset; which latent factors are shared across datasets?]
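
The "DAGness" term refers to the acyclicity function of Zheng et al. (2018), h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when the weighted graph W has no directed cycle. This is a minimal sketch of that function only, using a truncated power series for the matrix exponential; it is not the full score of Zeng et al. (2021), and the example matrices are hypothetical.

```python
import numpy as np

def dagness(W, n_terms=30):
    """h(W) = tr(exp(W * W)) - d, with exp computed by a truncated power series."""
    d = W.shape[0]
    A = W * W                       # elementwise square: non-negative entries
    term = np.eye(d)
    trace_exp = float(d)            # k = 0 term: tr(I) = d
    for k in range(1, n_terms + 1):
        term = term @ A / k         # A^k / k!
        trace_exp += np.trace(term)
    return trace_exp - d

# Hypothetical weighted adjacency matrices.
W_dag = np.array([[0.0, 0.0, 0.0],
                  [1.5, 0.0, 0.0],
                  [0.8, -0.7, 0.0]])
W_cyclic = W_dag.copy()
W_cyclic[0, 2] = 0.5                # adds an edge that closes a cycle

h_dag = dagness(W_dag)
h_cyclic = dagness(W_cyclic)
print(h_dag, h_cyclic)
```

Because h is smooth in W, it can be added to a likelihood as a penalty or constraint and optimized with continuous methods instead of searching over graphs.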

  41. Final summary

  42. Final summary
    • Statistical causal inference is a fundamental tool for science
    – Many well-developed methods are available when a causal graph can be drawn from background knowledge
    – Helping to draw causal graphs from data is the key: causal discovery
    • LiNGAM-related papers: https://sites.google.com/view/sshimizu06/lingam/lingampapers
    • Next default assumptions
    – Hidden common causes / latent factors
    – Mixed data: continuous and discrete
    – (Cyclicity (Lacerda et al., 2008))

  43. References
    • T. N. Maeda, S. Shimizu. RCD: Repetitive causal discovery of linear non-Gaussian acyclic models with latent
    confounders. In Proc. 23rd International Conference on Artificial Intelligence and Statistics (AISTATS2020), 2020
    • F. H. Messerli, Chocolate Consumption, Cognitive Function, and Nobel Laureates. New England Journal of
    Medicine, 2012.
    • T. Rosenström, M. Jokela, S. Puttonen, M. Hintsanen, L. Pulkki-Råback, J. S. Viikari, O. T. Raitakari and L.
    Keltikangas-Järvinen. Pairwise measures of causal direction in the epidemiology of sleep problems and
    depression. PLoS ONE, 7(11): e50841, 2012
    • A. Moneta, D. Entner, P. O. Hoyer and A. Coad. Causal inference by independent component analysis: Theory
    and applications. Oxford Bulletin of Economics and Statistics, 75(5): 705-730, 2013.
    • O. Boukrina and W. W. Graves. Neural networks underlying contributions from semantics in reading aloud.
    Frontiers in Human Neuroscience, 7:518, 2013.
    • P. Campomanes, M. Neri, B. A.C. Horta, U. F. Roehrig, S. Vanni, I. Tavernelli and U. Rothlisberger. Origin of the
    spectral shifts among the early intermediates of the rhodopsin photocycle. Journal of the American Chemical
    Society, 136(10): 3842-3851, 2014.
    • J. Peters, D. Janzing and B. Schölkopf. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press, 2018.
    • S. Shimizu and P. Blöbaum. Recent advances in semi-parametric methods for causal discovery. In Direction
    Dependence in Statistical Models: Methods of Analysis (W. Wiedermann, D. Kim, E. Sungur, and A. von Eye, eds.),
    Chapter. Wiley, 2020.

  44. References
    • J. Pearl. Causality. Cambridge University Press, 2001.
    • P. Spirtes, C. Glymour, R. Scheines. Causation, Prediction, and Search. Springer, 1993.
    • S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen. A linear non-gaussian acyclic model for causal
    discovery. Journal of Machine Learning Research, 7: 2003--2030, 2006
    • S. Shimizu. LiNGAM: Non-Gaussian methods for estimating causal structures. Behaviormetrika, 41(1): 65--98,
    2014
    • P. Spirtes, C. Meek, T. S. Richardson. Causal Inference in the Presence of Latent Variables and Selection Bias.
    In Proc. 11th Conf. on Uncertainty in Artificial Intelligence (UAI1995), 1995.
    • D. Malinsky and P. Spirtes. Causal Structure Learning from Multivariate Time Series in Settings with
    Unmeasured Confounding. In Proc. 2018 ACM SIGKDD Workshop on Causal Discovery (KDD-CD), 2018.
    • T. S. Richardson. A Discovery Algorithm for Directed Cyclic Graphs. In Proc. 12th Conf. on Uncertainty in
    Artificial Intelligence (UAI1996), 1996.

  45. References
    • D. Malinsky and P. Spirtes, Estimating bounds on causal effects in high-dimensional and possibly
    confounded systems. International J. Approximate Reasoning, 2017
    • S. Shimizu, T. Inazumi, Y. Sogawa, A. Hyvärinen, Y. Kawahara, T. Washio, P. O. Hoyer and K. Bollen.
    DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of
    Machine Learning Research, 12(Apr): 1225--1248, 2011.
    • A. Hyvärinen and S. M. Smith. Pairwise likelihood ratios for estimation of non-Gaussian structural equation
    models. Journal of Machine Learning Research, 14(Jan): 111--152, 2013.
    • A. Hyvärinen. New approximations of differential entropy for independent component analysis and
    projection pursuit. In Advances in Neural Information Processing Systems 12 (NIPS1999), 1999.
    • P. O. Hoyer, D. Janzing, J. Mooij, J. Peters and B. Schölkopf. Nonlinear causal discovery with additive noise
    models. In Advances in Neural Information Processing Systems 21 (NIPS2008), pp. 689-696, 2009.
    • K. Zhang and A. Hyvärinen. Distinguishing causes from effects using nonlinear acyclic causal models. In
    JMLR Workshop and Conference Proceedings, Causality: Objectives and Assessment (Proc. NIPS2008 workshop
    on causality), 6: 157-164, 2010.
    • J. Peters, J. Mooij, D. Janzing and B. Schölkopf. Causal discovery with continuous additive noise models.
    Journal of Machine Learning Research, 15: 2009--2053, 2014.

  46. References
    • G. Lacerda, P. Spirtes, J. Ramsey and P. O. Hoyer. Discovering cyclic causal models by independent
    components analysis. In Proc. 24th Conf. on Uncertainty in Artificial Intelligence (UAI2008), pp. 366-374, Helsinki,
    Finland, 2008.
    • P. O. Hoyer, S. Shimizu, A. Kerminen and M. Palviainen. Estimation of causal effects using linear non-gaussian
    causal models with hidden variables. International Journal of Approximate Reasoning, 49(2): 362-378, 2008.
    • S. Salehkaleybar, A. Ghassami, N. Kiyavash, K. Zhang. Learning Linear Non-Gaussian Causal Models in the
    Presence of Latent Variables. Journal of Machine Learning Research, 21:1-24, 2020.
    • S. Shimizu, P. O. Hoyer and A. Hyvärinen. Estimation of linear non-Gaussian acyclic models for latent factors.
    Neurocomputing, 72: 2024-2027, 2009.
    • Y. Zeng, S. Shimizu, R. Cai, F. Xie, M. Yamamoto, Z. Hao. Causal Discovery with Multi-Domain LiNGAM for
    Latent Factors. Proc. IJCAI2021.
    • X. Zheng, B. Aragam, P. Ravikumar and E. P. Xing. DAGs with NO TEARS: Continuous Optimization for
    Structure Learning. In Advances in Neural Information Processing Systems 31 (NeurIPS2018), 2018.
    • J. D. Ramsey, S. J. Hanson and C. Glymour. Multi-subject search correctly identifies causal connections and
    most causal directions in the DCM models of the Smith et al. simulation study. NeuroImage, 58(3): 838--848,
    2011.
    • S. Shimizu. Joint estimation of linear non-Gaussian acyclic models. Neurocomputing, 81: 104-107, 2012.

  47. References
    • W. Wenjuan, F. Lu, and L. Chunchen. Mixed Causal Structure Discovery with Application to Prescriptive
    Pricing. In Proc. 27th International Joint Conference on Artificial Intelligence (IJCAI2018), pp. xx--xx, Stockholm,
    Sweden, 2018.
    • Y. Komatsu, S. Shimizu and H. Shimodaira. Assessing statistical reliability of LiNGAM via multiscale bootstrap.
    In Proc. International Conference on Artificial Neural Networks (ICANN2010), pp.309-314, Thessaloniki, Greece,
    2010.
    • K. Biza, I. Tsamardinos, S. Triantafillou. Tuning causal discovery algorithms. In Proc. Probabilistic Graphical
    Models (PGM2020), 2020.
    • R. Silva, R. Scheines, C. Glymour, and P. Spirtes. Learning the structure of linear latent variable models.
    Journal of Machine Learning Research, 7:191–246, 2006.
    • F. Xie, R. Cai, B. Huang, C. Glymour, Z. Hao, and K. Zhang. Generalized independent noise condition for
    estimating latent variable causal graphs. NeurIPS, 33, 2020.
    • K. Zhang, M. Gong, P. Stojanov, B. Huang, Q. Liu, C. Glymour. Domain Adaptation as a Problem of Inference on
    Graphical Models. NeurIPS, 33, 2020.
    • M. J. Kusner, J. Loftus, C. Russell, R. Silva. Counterfactual Fairness. In Advances in Neural Information
    Processing Systems 30 (NIPS 2017), 2017
    • P. Blöbaum and S. Shimizu. Estimation of interventional effects of features on prediction. In Proc. 2017 IEEE
    International Workshop on Machine Learning for Signal Processing (MLSP2017), pp. xx--xx, Tokyo, Japan, 2017.