Slide 1

Slide 1 text

Causal Inference in Machine Learning @davilagrau

Slide 2

Slide 2 text

About me… Andrés-Leonardo Martínez-Ortiz, a.k.a. almo, holds a PhD in Software, Systems and Computing and a Master's in Computer Science. Based in Zurich, almo is a member of the Google Machine Learning Site Reliability Engineering team, leading several programs aiming for reliability, efficiency and convergence. He is also a member of the IEEE, ACM, Linux Foundation and Computer Society. @davilagrau almo.dev almo

Slide 3

Slide 3 text

About this talk… Feynman Effect ● How artificial could an intelligence be? ● How much could a machine learn? ● How could AI be smarter?

Slide 4

Slide 4 text

How artificial could an intelligence be?

Slide 5

Slide 5 text

Testing intelligence (1950) Turing Test (binary) — Judges: 4, Level: adult humans (general). Mini Turing Test (binary) — Judges: 1, Level: 3-year-old human (restricted). Photo by Crazy Cake on Unsplash. Photo by Caleb Woods on Unsplash

Slide 6

Slide 6 text

Testing Intelligence (2020) — the ladder of causation
L3 Counterfactuals (imagining). Activities: imagining, retrospection, understanding. What if I had done…? Why? Was it X that caused Y? What if X had not occurred? What if I had acted differently?
L2 Intervention (acting). Activities: doing, intervening. What if I do…? How? What would Y be if I do X? How can I make Y happen?
L1 Association (seeing). Activities: seeing, observing. What if I see…? How are the variables related? How would seeing X change my belief in Y?

Slide 7

Slide 7 text

SuperIntelligence. Hint: it is not even possible to know whether an AI is superintelligent. https://spectrum.ieee.org/tech-talk/robotics/artificial-intelligence/super-artificialintelligence (January 2021) L3 + n

Slide 8

Slide 8 text

How much could a machine learn?

Slide 9

Slide 9 text

The simplest learning algorithm: Y = aX + b. Photo by Joanna Kosinska on Unsplash
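A minimal sketch of fitting Y = aX + b by ordinary least squares; the synthetic data and the true coefficients (a=2.0, b=1.0) are invented for illustration.

```python
import numpy as np

# Synthetic data for the illustrative model Y = aX + b (true a=2.0, b=1.0).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=200)
Y = 2.0 * X + 1.0 + rng.normal(0, 0.5, size=200)

# Ordinary least squares: solve for [a, b] with the design matrix [X, 1].
A = np.column_stack([X, np.ones_like(X)])
(a, b), *_ = np.linalg.lstsq(A, Y, rcond=None)
print(f"estimated a={a:.3f}, b={b:.3f}")  # close to 2.0 and 1.0
```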

Slide 10

Slide 10 text

Photo by Maxim Hopman on Unsplash Photo by Tangerine Newt on Unsplash How much ice cream (X) does a murderer (Y) eat? P(Y|X) != P(Y|do(X))
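A hedged simulation of this classic spurious correlation: temperature (the hypothetical confounder) drives both ice cream consumption and murders, so the observed association P(Y|X) is strong even though P(Y|do(X)) shows no effect. All numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Temperature is a common cause of both variables (all coefficients invented).
temperature = rng.normal(20, 8, n)
ice_cream   = 5 * temperature + rng.normal(0, 10, n)   # X: driven by temperature only
murders     = 2 * temperature + rng.normal(0, 10, n)   # Y: driven by temperature only

# Seeing (level 1): P(Y|X) -- strong association although X never causes Y.
print("observational corr(X, Y):", np.corrcoef(ice_cream, murders)[0, 1])

# Doing (level 2): do(X) -- assign ice cream at random (an RCT), independently of
# temperature; the association with murders vanishes, so P(Y|do(X)) is flat.
ice_cream_do = rng.uniform(ice_cream.min(), ice_cream.max(), n)
print("interventional corr(X, Y):", np.corrcoef(ice_cream_do, murders)[0, 1])
```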

Slide 11

Slide 11 text

L1 Association (seeing, observing, …): Exercise (X) vs. Cholesterol (Y), all data pooled. L2 Intervention (acting, doing, …): the same relationship stratified by Age (20, 30, 40, 50, 60), the common cause of both Exercise and Cholesterol. [Figure: scatter plots of Cholesterol vs. Exercise, pooled and per age group, with the causal diagram Age → Exercise, Age → Cholesterol, Exercise → Cholesterol.] P(Y|X, do(Age))
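A hedged sketch of the adjustment idea on this slide: Age confounds Exercise and Cholesterol, so the naive regression of Y on X is biased, while averaging the X–Y slope within age strata recovers the assumed causal effect. All coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

# Invented structural model: Age -> Exercise, Age -> Cholesterol, Exercise -> Cholesterol.
age = rng.integers(20, 61, n)                                    # confounder
exercise = 0.1 * age + rng.normal(0, 1, n)                       # X increases with age
cholesterol = 2.0 * age - 3.0 * exercise + rng.normal(0, 1, n)   # true effect of X: -3.0

# Naive (level-1) estimate: regress Y on X only -- biased by the age backdoor path.
naive = np.polyfit(exercise, cholesterol, 1)[0]

# Adjustment: estimate the X -> Y slope within each age stratum, then average.
slopes = [np.polyfit(exercise[age == a], cholesterol[age == a], 1)[0]
          for a in np.unique(age)]
adjusted = np.mean(slopes)

print(f"naive slope    = {naive:+.2f}   (biased upward by age)")
print(f"adjusted slope = {adjusted:+.2f}   (close to the assumed -3.0)")
```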

Slide 12

Slide 12 text

AIAAIC Repository (https://www.aiaaic.org): AI, algorithmic, and automation incidents and controversies collected, dissected, examined, and divulged. ● AI is fragile: "Don't shake me!" ● Those who cannot remember history are doomed to repeat it, and AI cannot remember old stories. ● AI doesn't like "why" questions. ● AI doesn't like maths or logic. ● AI doesn't like decisions: "Don't make me choose." Photo by Margarita Zueva on Unsplash

Slide 13

Slide 13 text

How could AI be smarter?

Slide 14

Slide 14 text

Causal Formulation. Query examples on the earlier diagrams: Exercise (X) and Cholesterol (Y) confounded by Age — P(X|Y, do(Age=a)); Ice Cream (X) and Murders (Y) confounded by Temperature — P(X|Y, do(Temperature=t)). Main elements ● Causal diagrams define a knowledge language. ● Do-calculus, i.e. the algebraic formulation, defines a query language. Main results ● The topological structure of a causal graph allows us to infer causal connections among the variables. ● In some cases, do-calculus queries (level 2 and level 3) can be expressed in terms of observational data (level 1).
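One concrete, hedged instance of that last result is the standard backdoor adjustment, assuming Age blocks every backdoor path between Exercise (X) and Cholesterol (Y): a level-2 query rewritten entirely with level-1, observational quantities.

```latex
P\bigl(Y \mid do(X=x)\bigr) \;=\; \sum_{a} P\bigl(Y \mid X=x,\ \mathrm{Age}=a\bigr)\, P(\mathrm{Age}=a)
```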

Slide 15

Slide 15 text

Causal Diagrams (Directed Acyclic Graphs). Three basic junctions: Mediator (chain) A → B → C; Fork A ← B → C; Collider A → B ← C. d-separation on a larger DAG over A, B, C, D, F, G, H, I, J: P(J|A,B,C,D,F,G,H,I) = P(J|H,I).
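A hedged sketch checking the three junctions with networkx's d-separation helper; the graphs are the textbook structures above, and the helper's name depends on the networkx release (d_separated in older versions, is_d_separator in newer ones).

```python
import networkx as nx

# Pick whichever d-separation helper this networkx release provides.
d_separated = getattr(nx, "is_d_separator", getattr(nx, "d_separated", None))

chain    = nx.DiGraph([("A", "B"), ("B", "C")])   # mediator: A -> B -> C
fork     = nx.DiGraph([("B", "A"), ("B", "C")])   # fork:     A <- B -> C
collider = nx.DiGraph([("A", "B"), ("C", "B")])   # collider: A -> B <- C

# Conditioning on B blocks the chain and the fork but opens the collider.
print(d_separated(chain,    {"A"}, {"C"}, {"B"}))   # True
print(d_separated(fork,     {"A"}, {"C"}, {"B"}))   # True
print(d_separated(collider, {"A"}, {"C"}, {"B"}))   # False
print(d_separated(collider, {"A"}, {"C"}, set()))   # True: A, C marginally independent
```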

Slide 16

Slide 16 text

Causal Inference Engine. Knowledge and Assumptions feed the Causal Model, which yields Testable Implications. A Query is posed: can we answer it? If no, go back to the causal model and its assumptions (steps 2 and 3); if yes, the engine produces an Estimand (the recipe). Combining the estimand with Data, Statistical Estimation returns the Estimate (the answer).
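A hedged sketch of this pipeline using DoWhy from PyWhy (mentioned later in the deck); the dataset and column names are invented, and the calls follow DoWhy's documented CausalModel workflow with common_causes declaring the assumed confounder.

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Invented data: age confounds exercise (treatment) and cholesterol (outcome).
rng = np.random.default_rng(0)
n = 5_000
age = rng.integers(20, 61, n)
exercise = 0.1 * age + rng.normal(0, 1, n)
cholesterol = 2.0 * age - 3.0 * exercise + rng.normal(0, 1, n)
df = pd.DataFrame({"age": age, "exercise": exercise, "cholesterol": cholesterol})

# Steps 1-2: knowledge and assumptions become a causal model.
model = CausalModel(data=df, treatment="exercise", outcome="cholesterol",
                    common_causes=["age"])
# Steps 4-6: the query "effect of exercise on cholesterol" becomes an estimand (recipe).
estimand = model.identify_effect(proceed_when_unidentifiable=True)
# Steps 7-9: estimand + data -> statistical estimation -> estimate (answer).
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(estimate.value)   # close to the assumed -3.0
```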

Slide 17

Slide 17 text

Photo by Viktor Forgacs on Unsplash. Causal Modeling and Testing. 1. The full joint P(A, B, C, D, …, Z) — O(NP). 2. Factorization over parents: P(A, B, …, Z) = ∏ P(X | pa(X)) — heuristic and contextual*. [Figure: DAG over A, B, C, D, F, G, H, I, J with exogenous variables U_i, U_j, U_k; d-separation.] *Randomized Controlled Trial
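A minimal sketch of point 2, the factorization over parents; the three-variable chain and its conditional probability tables are invented for illustration.

```python
import itertools

# Toy Bayesian network: A -> B -> C, so P(A, B, C) = P(A) P(B|A) P(C|B).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # p_c_given_b[b][c]

def joint(a, b, c):
    """P(A=a, B=b, C=c) as the product over each variable's parents."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# The factorized joint is a proper distribution: it sums to 1.
total = sum(joint(a, b, c) for a, b, c in itertools.product([0, 1], repeat=3))
print(total)   # 1.0
```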

Slide 18

Slide 18 text

Photo by Sharon Pittaway on Unsplash. Implementing the Inference Engine ● TensorFlow Probability: Bayesian modeling with chained joint distributions; also supports structural time series. ● PyWhy: end-to-end open-source Python library for causal inference that supports explicit modeling and testing of causal assumptions. ● Probability Trees: Kotlin implementation of predicate and causal inference in probability trees (almo.dev).
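A hedged TensorFlow Probability sketch of chaining a joint distribution; the chain Z → X → Y and its parameters are invented, and only the JointDistributionSequential pattern is taken from the library's documented API.

```python
import tensorflow_probability as tfp

tfd = tfp.distributions

# Chain the joint P(Z, X, Y) = P(Z) P(X|Z) P(Y|X, Z) as a sequential model.
# Each lambda receives the previously sampled values in reverse order (nearest first).
joint = tfd.JointDistributionSequential([
    tfd.Normal(loc=0., scale=1.),                      # Z
    lambda z: tfd.Normal(loc=2. * z, scale=1.),        # X | Z
    lambda x, z: tfd.Normal(loc=x + z, scale=0.5),     # Y | X, Z
])

samples = joint.sample(5)          # a list [z, x, y] of tensors with 5 draws each
log_probs = joint.log_prob(samples)
print(log_probs)
```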

Slide 19

Slide 19 text

Probability Trees — example tree:
Root (O=1), NodeID 01
├─ X=0, p=0.4545 (01.0)
│  ├─ Y=0, p=0.4000 (01.0.0)
│  │  ├─ Z=0, p=0.5000 (01.0.0.0)
│  │  └─ Z=1, p=0.5000 (01.0.0.1)
│  └─ Y=1, p=0.6000 (01.0.1)
│     ├─ Z=1, p=0.3333 (01.0.1.0)
│     └─ Z=0, p=0.6667 (01.0.1.1)
└─ X=1, p=0.5455 (01.1)
   ├─ Z=0, p=0.3333 (01.1.0)
   │  ├─ Y=0, p=0.5000 (01.1.0.0)
   │  └─ Y=1, p=0.5000 (01.1.0.1)
   └─ Z=1, p=0.6667 (01.1.1)
      ├─ Y=1, p=0.2000 (01.1.1.0)
      └─ Y=0, p=0.8000 (01.1.1.1)

Slide 20

Slide 20 text

Discrete Probability Trees: modelling random experiments and stochastic processes. Main features: ● Computing arbitrary events through propositional calculus and causal precedence. ● Computing the three fundamental operations of structural causal models (do-calculus): ○ Conditions ○ Interventions ○ Counterfactuals. Algorithms for Causal Reasoning in Probability Trees, Genewein et al. (2020), DeepMind. https://arxiv.org/abs/2010.12237

Slide 21

Slide 21 text

Probability Trees — recursive definition. A node n is a tuple n = (u, S, C): ● an id ● a list of statements ● a list of transitions. A transition is a pair (p, m) ∈ [0,1] × N: ● a transition probability ● a child node m. The root is the node without parents, carrying the statement "O=1". A leaf is a node without children.
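A minimal Python sketch of this recursive definition; the marginal() helper and the tiny two-level tree are illustrative additions, not part of the formal definition.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    uid: str                                           # u: identifier
    statements: list                                   # S: list of (variable, value) pairs
    transitions: list = field(default_factory=list)    # C: list of (p, child Node)

def marginal(node, variable, value, mass=1.0):
    """P(variable = value): sum the probability mass of every path whose
    statements bind the variable to the requested value."""
    if (variable, value) in node.statements:
        return mass
    return sum(marginal(child, variable, value, mass * p)
               for p, child in node.transitions)

# Two-level example: root "O=1", then X, then Y under each X branch.
root = Node("01", [("O", 1)], [
    (0.4, Node("01.0", [("X", 0)], [(0.3, Node("01.0.0", [("Y", 0)])),
                                    (0.7, Node("01.0.1", [("Y", 1)]))])),
    (0.6, Node("01.1", [("X", 1)], [(0.9, Node("01.1.0", [("Y", 0)])),
                                    (0.1, Node("01.1.1", [("Y", 1)]))])),
])

print(marginal(root, "Y", 1))   # 0.4*0.7 + 0.6*0.1 = 0.34
```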

Slide 22

Slide 22 text

Events and min-cuts. An event is a collection of total realizations, i.e. paths from the root to a leaf. Formally, an event is a cut δ(T, F), where the true set T and the false set F contain all the nodes at which the event becomes true and false, respectively. The critical nodes act as a Markov blanket: every variable bound on a path from the root to a critical node is exogenous downstream.

Slide 23

Slide 23 text

Min-cut Algorithms. (Bug in the slide's figure: it should be V.)

Slide 24

Slide 24 text

Causal Events — Precedence. (Bugs in the slide's figure: the symbols should be T_e and F_e.) Note: the event "Y=1 precedes Z=0" cannot be stated logically; it is a causal event, requiring a probability tree.

Slide 25

Slide 25 text

Causal Events — Conditions: P(A|B). (Bug in the slide's figure: it should be q.) Question: what is the probability of the event A given that the event B is true? Note: downstream (prediction) or upstream (inference).

Slide 26

Slide 26 text

Causal Events Interventions P(A|do(B=b)) Question: What is the probability of the event A given that the event B was made true? Note: only downstream (prediction)
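A tiny invented numeric model contrasting the question on the previous slide (conditioning) with this one (intervening): Z causes both X and Y, and Y does not depend on X at all, so seeing X=1 moves Y while doing X=1 does not.

```python
# Invented model Z -> X, Z -> Y (Y independent of X given Z), by direct enumeration.
p_z = {0: 0.5, 1: 0.5}
p_x_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}   # p_x_given_z[z][x]
p_y_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}   # p_y_given_z[z][y]

# P(Y=1 | X=1): seeing X=1 updates the belief about Z (upstream), which moves Y.
num = sum(p_z[z] * p_x_given_z[z][1] * p_y_given_z[z][1] for z in (0, 1))
den = sum(p_z[z] * p_x_given_z[z][1] for z in (0, 1))
print("P(Y=1 | X=1)     =", num / den)                                        # 0.74

# P(Y=1 | do(X=1)): setting X cuts the Z -> X influence, so Z keeps its prior and
# the effect propagates only downstream -- here Y does not move at all.
print("P(Y=1 | do(X=1)) =", sum(p_z[z] * p_y_given_z[z][1] for z in (0, 1)))  # 0.5
```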

Slide 27

Slide 27 text

And before finishing…

Slide 28

Slide 28 text

Some references… ● Algorithms for Causal Reasoning in Probability Trees, T. Genewein, G. Delétang, V. Mikulik, M. Martic, S. Legg and P. A. Ortega, DeepMind. ● Causality for Machine Learning, B. Schölkopf, Max Planck Institute for Intelligent Systems. ● DAGs with NO TEARS: Continuous Optimization for Structure Learning, X. Zheng, B. Aragam, P. Ravikumar and E. P. Xing, Carnegie Mellon University. ● Hands-on Bayesian Neural Networks: A Tutorial for Deep Learning Users, L. V. Jospin, H. Laga, F. Boussaid, W. Buntine and M. Bennamoun.

Slide 29

Slide 29 text

And a game: finding the causes is possible… but really hard. Number of labeled DAGs on n nodes (hint: https://oeis.org/A003024):
n = 1: 1
n = 2: 3
n = 3: 25
n = 4: 543
n = 5: 29281
n = 6: 3781503
n = 7: 1138779265
n = 8: 783702329343
n = 9: 1213442454842881
n = 10: 4175098976430598143
n = 11: 31603459396418917607425
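A hedged sketch of where these numbers come from, via the standard recurrence behind the OEIS entry; the function name is illustrative.

```python
from math import comb

# Counting labeled DAGs (OEIS A003024) with the standard recurrence
# a(n) = sum_{k=1..n} (-1)^(k+1) * C(n,k) * 2^(k*(n-k)) * a(n-k),  a(0) = 1.
def count_dags(n):
    a = [1]  # a[0] = 1
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

for n in range(1, 12):
    print(n, count_dags(n))   # reproduces 1, 3, 25, 543, 29281, ...
```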

Slide 30

Slide 30 text

Causal Inference in Machine Learning @davilagrau Thank you! @davilagrau almo.dev almo