Statistical Causal Discovery and AI

Slide 1

Slide 1 text

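Statistical Causal Discovery and AI
Shohei Shimizu
Faculty of Data Science, Shiga University
RIKEN Center for Advanced Intelligence Project
November 10, 2022, Seminar on Mathematics and Data Science of Brain Pathology
5-year term, 2 positions: https://www.shiga-u.ac.jp/wp/wp-content/uploads/DSC_CREST_20221201.pdf
https://twitter.com/sshimizu2006/status/1577993255433564162?s=20&t=70ZDqbghQk-FN6f0ixHJxQ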

Slide 16

Slide 16 text

Causal effects among gene expression levels

Maathuis et al. (2010) Predicting causal effects in large-scale systems from observational data. Nature Methods.

• 5,361 variables, sample size 63
• Estimate a causal graph by causal discovery, then use it to estimate (a lower bound on) the causal effects (the lower bound can be loose)
• Checked against the results of actual intervention experiments, the predictions were more accurate than random guessing

Excerpt from the paper shown on the slide (the screenshot is cropped, so the excerpt starts and ends mid-sentence; bracketed numbers are the paper's own reference numbers):

"[...] yielded only 5 ± 2.1 true positives (10% ± 4.2%). Moreover, IDA improved substantially on Lasso [4] and Elastic-net [5], two state-of-the-art high-dimensional regression approaches commonly used to determine variable importance but not designed for causal inference (Fig. 1a, Supplementary Table 1 and Supplementary Methods). For m = 10 and q = 50, these methods yielded 10 (20%) and 8 (16%) true positives, respectively. Finally, we found that the superior performance of IDA compared to that of the other methods was insensitive to the choice of m value for m = 1, ..., 50 (Fig. 1b). As a second test, we used data from the DREAM4 In Silico Network Challenge [6], a competition in reverse engineering of gene regulation networks. These data include several types of simulated mRNA expression levels, based on sophisticated biologically motivated simulation methods [6], for five networks of 10 genes and five networks of 100 genes. We used two types of observational data: (i) steady-state gene expression levels from unknown multifactorial perturbations of the networks and (ii) time series data on gene expression levels from the response and recovery of the networks to unknown external perturbations [...]"

The rest of the screenshot is cut off at the left margin. What remains legible: controlled perturbation experiments are expensive and time consuming in many settings, so it is desirable to obtain causal information from observational data; the method gives bounds on total causal effects and is called intervention-calculus when the DAG is absent (IDA); the first validation used a compendium of gene expression profiles of Saccharomyces (yeast) containing 267 full-genome profiles of deletion mutants (interventional data) together with profiles of control experiments (observational data). After initial data cleaning, the interventional data contained expression measurements for 234 single-gene deletion mutant strains, and the observational data contained measurements of the same 5,361 genes for 63 samples.

[Figure 1 from the paper: (a) true positives vs. false positives for IDA, Lasso, Elastic-net, and Random; (b) pAUC vs. m values for the same four methods.]

Extension to the case with unobserved common causes (Malinsky & Spirtes, 2017)
Code: https://github.com/dmalinsk/lv-ida
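
The second bullet above (estimate a graph, then bound the causal effect) can be sketched in a few lines. This is only an illustrative sketch under strong assumptions, not the authors' implementation: it assumes linear relationships, and it assumes a separate causal discovery step has already produced the candidate parent sets of the intervened variable. The function names `adjusted_effect` and `effect_lower_bound`, the simulated data, and the candidate sets are all hypothetical. For each candidate parent set, the effect of x on y is taken as the regression coefficient of x when adjusting for those parents, and the smallest absolute value across candidates serves as the lower bound.

```python
import numpy as np


def adjusted_effect(data, x, y, parents):
    """Slope of column x in an OLS regression of column y on x and `parents`."""
    cols = [x] + list(parents)
    design = np.column_stack([np.ones(data.shape[0]), data[:, cols]])
    coef, *_ = np.linalg.lstsq(design, data[:, y], rcond=None)
    return float(coef[1])  # coef[0] is the intercept, coef[1] belongs to x


def effect_lower_bound(data, x, y, candidate_parent_sets):
    """Return (min |effect|, all effects) over the candidate parent sets of x."""
    effects = []
    for parents in candidate_parent_sets:
        if y in parents:
            # If y is a parent of x in this candidate graph, x cannot cause y.
            effects.append(0.0)
        else:
            effects.append(adjusted_effect(data, x, y, parents))
    return min(abs(e) for e in effects), effects


# Simulated linear data (hypothetical): g0 -> g1 -> g3 <- g2
rng = np.random.default_rng(0)
n = 5000
g0 = rng.normal(size=n)
g1 = 0.8 * g0 + rng.normal(size=n)
g2 = rng.normal(size=n)
g3 = 1.5 * g1 + 0.7 * g2 + rng.normal(size=n)
data = np.column_stack([g0, g1, g2, g3])

# Assumed output of a discovery step: the g0 - g1 edge is unoriented,
# so the parents of g1 are either none or {g0}.
candidate_parent_sets = [(), (0,)]
lower, effects = effect_lower_bound(data, x=1, y=3,
                                    candidate_parent_sets=candidate_parent_sets)
print(lower, effects)
```

Taking the minimum over candidate graphs is what makes the bound conservative: if any graph consistent with the data says the effect is small or zero, the bound reflects that, which matches the slide's remark that the lower bound can be loose.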
