Slide 16
Slide 16 text
Causal effects between gene expression levels
Maathuis et al. (2010) Predicting causal effects in large-scale systems from observational data.
Nature Methods.
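• 5,361 variables, sample size 63
• Estimate a causal graph by causal discovery, then use it to estimate (a lower bound on) each causal effect (the lower bound can be loose)
• Checked against the results of actual intervention experiments, the predictions were more accurate than random guessing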
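The lower-bound step in the second bullet can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a linear model, and it takes the candidate parent sets of the cause as given input, whereas IDA derives them from a graph estimated by causal discovery. The function names `regression_effect` and `ida_lower_bound` and the toy data are hypothetical.

```python
import numpy as np

def regression_effect(data, x, y, adjust):
    """Coefficient of column x when regressing column y on x plus an adjustment set."""
    if y in adjust:
        # If y is a parent of x, x cannot cause y, so the effect is 0.
        return 0.0
    cols = [x] + list(adjust)
    design = np.column_stack([data[:, cols], np.ones(len(data))])
    coef, *_ = np.linalg.lstsq(design, data[:, y], rcond=None)
    return float(coef[0])

def ida_lower_bound(data, x, y, candidate_parent_sets):
    """One effect estimate per candidate parent set; the minimum absolute value is the lower bound."""
    effects = [regression_effect(data, x, y, ps) for ps in candidate_parent_sets]
    return min(abs(e) for e in effects), effects

# Toy data (assumption): Z -> X -> Y and Z -> Y, with a true effect of X on Y of 0.5.
rng = np.random.default_rng(0)
n = 20000
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 0.5 * x + 0.7 * z + rng.normal(size=n)
data = np.column_stack([z, x, y])  # columns: Z=0, X=1, Y=2

# Suppose the estimated graph leaves the Z - X edge undirected,
# so the parents of X could be {} or {Z}.
lower, effects = ida_lower_bound(data, x=1, y=2, candidate_parent_sets=[[], [0]])
print(effects, lower)
```

Adjusting for nothing gives a confounded estimate of about 0.84, while adjusting for Z gives about 0.5. Taking the minimum absolute value reports the conservative figure, which is why the bound can be loose when the graph is poorly identified.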
[…] yielded only 5 ± 2.1 true positives (10% ± 4.2%). Moreover, IDA improved substantially on Lasso [4] and Elastic-net [5], two state-of-the-art high-dimensional regression approaches commonly used to determine variable importance but not designed for causal inference (Fig. 1a, Supplementary Table 1 and Supplementary Methods). For m = 10 and q = 50, these methods yielded 10 (20%) and 8 (16%) true positives, respectively. Finally, we found that the superior performance of IDA compared to that of the other methods was insensitive to the choice of m value for m = 1, ..., 50 (Fig. 1b).
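The true-positive counts quoted above can be illustrated with a short sketch. It assumes that the target set is the top m% of effects measured in the intervention experiments and that a method is scored by how many of its q largest predicted effects fall in that set. The function name `count_true_positives` and the synthetic arrays are hypothetical, and details of the paper's procedure may differ.

```python
import numpy as np

def count_true_positives(predicted, observed, m_percent, q):
    """Count how many of the q largest predicted effects are in the top m% of observed effects."""
    pred = np.abs(np.asarray(predicted)).ravel()
    obs = np.abs(np.asarray(observed)).ravel()
    n_target = int(round(len(obs) * m_percent / 100))
    target = set(np.argsort(-obs)[:n_target].tolist())  # indices of the largest observed effects
    top_q = np.argsort(-pred)[:q].tolist()              # indices of the largest predictions
    return sum(i in target for i in top_q)

# Synthetic example (assumption): 1,000 cause-effect pairs.
rng = np.random.default_rng(1)
observed = rng.normal(size=1000)
noisy = observed + rng.normal(scale=0.5, size=1000)  # an informative but imperfect predictor
shuffled = rng.permutation(observed)                 # stands in for random guessing

perfect = count_true_positives(observed, observed, m_percent=10, q=50)
informative = count_true_positives(noisy, observed, m_percent=10, q=50)
random_hits = count_true_positives(shuffled, observed, m_percent=10, q=50)
print(perfect, informative, random_hits)
```

Under random guessing the expected count is q × m / 100, that is 5 for q = 50 and m = 10, which is the 10% level quoted at the start of the excerpt.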
As a second test, we used data from the DREAM4 In Silico Network Challenge [6], a competition in reverse engineering of gene regulation networks. These data include several types of simulated mRNA expression levels, based on sophisticated biologically motivated simulation methods [6], for five networks of 10 genes and five networks of 100 genes. We used two types of observational data: (i) steady-state gene expression levels from unknown multifactorial perturbations of the networks and (ii) time series data on gene expression levels from the response and recovery of the networks to unknown external perturba[…]
[…] primary interest in many fields of science. The […]od for determining such relationships uses ran[…]lled perturbation experiments. In many settings, […]xperiments are expensive and time consuming. […]rable to obtain causal information from observa[…]t is, from data obtained by observing the system […]out subjecting it to interventions.
[…]tablished methods to estimate causal effects […]onal data when the possible causal relationships […]riables are known [1]. Many real-world problems, […]e large-scale systems without such information. […]enerally impossible to estimate causal effects in […] we recently proposed and mathematically justi[…]l method to obtain bounds on total causal effects, […]umptions (Supplementary Methods). We call this […]ntion-calculus when the DAG is absent (IDA).
[…]en experimentally validated until now, and there […]rimental validation of causal inference methods […]ere an experimental validation of IDA. As a first […] a compendium of gene […]ofiles of Saccharomyces […]taining 267 full-genome […]ofiles of yeast deletion […]ventional data), together […]nome expression profiles […]rol experiments (observa[…] obtained under the same […]ter initial data cleaning […]y Methods), the interven[…]ained expression measure[…] genes for 234 single-gene […]nt strains, and the obser[…]ontained expression mea[…]e same 5,361 genes for 63 […]res.
[…]nterventional data as the
[…] for estimating the total […]
[Figure 1. (a) True positives (0 to 1,000) against false positives (0 to 4,000). (b) pAUC × 10^5 (0 to 2.5) against m values (0 to 50). Methods compared in both panels: IDA, Lasso, Elastic-net, Random.]
Extension to settings with unobserved common causes
(Malinsky & Spirtes, 2017)
Code: https://github.com/dmalinsk/lv-ida