Trustworthy Learning and Reasoning in Complex Domains

Detecting complex patterns of events with significant causal and temporal dependencies across multiple data streams is extremely difficult. Training a complicated model would require a large amount of data, which is unrealistic given that complex events are often rare. For instance, only a tiny fraction of CCTV footage shows violence, and only a minor fraction of the activities recorded in computer systems are acts of Advanced Persistent Threats (APTs). Neuro-symbolic architectures can deliver excellent results, especially when features are linked together through effective probabilistic circuits compiled from human-generated logic. Moreover, uncertainty awareness has been shown to increase the trust human operators can place in such autonomous architectures. Indeed, there is no such thing as a certain datum in the real world: everything comes with shades of uncertainty. Traditional uncertainty estimation methodologies in AI quantify uncertainty via point probabilities, which can be more misleading than approaches such as Bayesian statistics. Starting from the role that (probabilistic) logic plays in supporting human sensemaking (Toniolo et al. 2015; Cerutti and Thimm 2019), in this talk I will illustrate how we can achieve efficient and effective uncertainty-aware learning and reasoning in probabilistic circuits (Cerutti et al. 2019; 2021) and neural networks (Sensoy et al. 2020). I will then illustrate two neuro-symbolic architectures for complex event processing (Xing et al. 2020; Roig Vilamala et al. 2020) and discuss their future uncertainty-aware extensions and potential real-world impact, including in cyber-threat intelligence analysis (Baroni et al. 2021).

Bibliography:

Baroni, Pietro, Federico Cerutti, Daniela Fogli, Massimiliano Giacomin, Francesco Gringoli, Giovanni Guida, and Paul Sullivan. 2021. ‘Self-Aware Effective Identification and Response to Viral Cyber Threats’. In 2021 13th International Conference on Cyber Conflict (CyCon), 353–70.

Cerutti, Federico, Lance Kaplan, Angelika Kimmig, and Murat Sensoy. 2019. ‘Probabilistic Logic Programming with Beta-Distributed Random Variables’. In Proceedings of the AAAI Conference on Artificial Intelligence.

Cerutti, Federico, Lance M. Kaplan, Angelika Kimmig, and Murat Sensoy. 2021. ‘Handling Epistemic and Aleatory Uncertainties in Probabilistic Circuits’. Accepted subject to minor corrections. https://arxiv.org/abs/2102.10865.

Cerutti, Federico, and Matthias Thimm. 2019. ‘A General Approach to Reasoning with Probabilities’. International Journal of Approximate Reasoning 111.

Roig Vilamala, Marc, Harry Taylor, Tianwei Xing, Luis Garcia, Mani Srivastava, Lance M. Kaplan, Alun Preece, Angelika Kimmig, and Federico Cerutti. 2020. ‘A Hybrid Neuro-Symbolic Approach for Complex Event Processing (Extended Abstract)’. In Proceedings of ICLP2020.

Sensoy, Murat, Lance Kaplan, Federico Cerutti, and Maryam Saleki. 2020. ‘Uncertainty-Aware Deep Classifiers Using Generative Models’. In Proceedings of the AAAI Conference on Artificial Intelligence, 5620–27.

Toniolo, A., T.J. Norman, A. Etuk, F. Cerutti, R.W. Ouyang, M. Srivastava, N. Oren, T. Dropps, J.A. Allen, and P. Sullivan. 2015. ‘Supporting Reasoning with Different Types of Evidence in Intelligence Analysis’. In Proceedings of AAMAS 2015, 781–89.

Xing, Tianwei, Luis Garcia, Marc Roig Vilamala, Federico Cerutti, Lance M Kaplan, Alun D Preece, and Mani B Srivastava. 2020. ‘Neuroplex: Learning to Detect Complex Events in Sensor Networks through Knowledge Injection’. In Proceedings of SenSys2020, edited by Jin Nakazawa and Polly Huang, 489–502. ACM.

Federico Cerutti

September 17, 2021

Transcript

  1. None
  2. Augmenting human sensemaking abilities to achieve causal insights and foresights

    (a.k.a. situational understanding) 2
  3. Overture. A brief historical case. Act I. On conjectures, refutations,

    and argumentation. Act II. There is no certain datum in the world. Act III. Interesting problems are complex. Epilogue. 3
  4. 4 https://archive.org/details/in.ernet.dli.2015.228218/page/n775/mode/2up?view=theater

  5. 5 Image: Wikipedia

  6. Empiricism All hypotheses and theories must be tested against observations

    of the natural world, rather than resting solely on a priori reasoning, intuition, or revelation. 6
  7. 7 Image: Wikipedia

  8. 8 Image: Wikipedia

  9. 9 Image: Wikipedia

  10. 10 https://www.jpl.nasa.gov/spaceimages/details.php?id=PIA02210

  11. 11 Image: Wikipedia

  12. The path of the planet Uranus did not conform to

    the path predicted by Newton’s law of gravitation in the presence of the known planets. Explanations: • Human/instrument measurement error • Newton’s laws are mistaken • An invisible magic teapot caused the perturbation in order to show the hubris of modern science • . . . • Newton’s laws—confirmed by a significant amount of evidence—are correct, and the perturbation is caused by another, unknown, planet 12 Image: Wikipedia
  13. Scientific theories are capable of being refuted: they are falsifiable

    Verification and falsification are different processes: • No accumulation of confirming instances is sufficient • Only one contradicting instance suffices to refute a theory Scientific theories are tentative 13 Image: Wikipedia
  14. Overture. A brief historical case. Act I. On conjectures, refutations,

    and argumentation. Act II. There is no certain datum in the world. Act III. Interesting problems are complex. Epilogue. 14
  15. Does MMR vaccination cause autism? 15

  16. Argument from Correlation to Cause Correlation Premise: There is a

    positive correlation between A and B. Conclusion: A causes B. CQ1: Is there really a correlation between A and B? CQ2: Is there any reason to think that the correlation is any more than a coincidence? CQ3: Could there be some third factor, C, that is causing both A and B? Walton, Reed, Macagno, Argumentation Schemes, CUP, 2008 16
  17. 17

  18. EARLY REPORT Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive

    developmental disorder in children. A J Wakefield, S H Murch, A Anthony, J Linnell, D M Casson, M Malik, M Berelowitz, A P Dhillon, M A Thomson, P Harvey, A Valentine, S E Davies, J A Walker-Smith. [Scanned first page of the 1998 Lancet paper; summary text illegible] 18
  19. 19 From Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder

    in children by Wakefield et al., The Lancet, 1998
  20. 20

  21. The New England Journal of Medicine, Copyright © 2002 by

    the Massachusetts Medical Society, VOLUME 347, NOVEMBER 7, 2002, NUMBER 19. A POPULATION-BASED STUDY OF MEASLES, MUMPS, AND RUBELLA VACCINATION AND AUTISM. KREESTEN MELDGAARD MADSEN, M.D., ANDERS HVIID, M.Sc., MOGENS VESTERGAARD, M.D., DIANA SCHENDEL, PH.D., JAN WOHLFAHRT, M.Sc., POUL THORSEN, M.D., JØRN OLSEN, M.D., AND MADS MELBYE, M.D. [Scanned first page; abstract text illegible] 21
  22. 22 From A Population-based Study of Measles, Mumps, and Rubella

    Vaccination and Autism by Madsen et al., The New England Journal of Medicine, 2002
  23. β =⇒ α γ =⇒ β ε =⇒ δ δ

    ∈ β 23
  24. β =⇒ α γ =⇒ β ε =⇒ δ δ

    ∈ β 24
  25. Results (tiny summary) HCI Assessment of argumentation semantics against human

    intuition (ECAI 2014) Algorithms Efficient algorithms and ensemble approaches (KR 2014, AAAI 2015, ECAI 2016, KER 2018, IJAR 2018, AIJ 2019, IJCAI 2021) Impact Implementation in the CISpaces.org online system (AAMAS 2015, SPIE 2018, COMMA 2018, JURIX 2018, AI3 2021) 25
  26. CISpaces.org Fact extraction from Twitter Argumentation graph manipulation Natural Language

    Generation for Automatic Reporting Available for use by professional analysts in the US Army Research Laboratory, and the UK Joint Forces Intelligence Group TRL4: validation in a laboratory environment https://tiresia.unibs.it/cispaces 26 F. Cerutti, T. J. Norman, A. Toniolo, and S. E. Middleton. CISpaces.org: from Fact Extraction to Report Generation. COMMA 2018, 269–281, 2018.
  27. Overture. A brief historical case. Act I. On conjectures, refutations,

    and argumentation. Act II. There is no certain datum in the world. Act III. Interesting problems are complex. Epilogue. 27
  28. Qualification problem “ For example, the successful use of a

    boat to cross a river requires, if the boat is a rowboat, that the oars and rowlocks be present and unbroken, and that they fit each other. Many other qualifications can be added, making the rules for using a rowboat almost impossible to apply, and yet anyone will still be able to think of additional requirements not yet stated. ” J. McCarthy, “Circumscription—A Form of Non-Monotonic Reasoning,” Artificial Intelligence, 13(1–2): 27–39, 1980. 28
  29. Uncertainty. Reliability of the Source: A Completely reliable; B Usually

    reliable; C Fairly reliable; D Not usually reliable; E Unreliable; F Reliability cannot be judged. Credibility of the Information: 1 Confirmed by other sources; 2 Probably true; 3 Possibly true; 4 Doubtful; 5 Improbable; 6 Truth cannot be judged. 29
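The source-reliability / information-credibility scheme above (the NATO Admiralty code) is straightforward to operationalise. The sketch below is a hypothetical encoding for illustration only, not part of any system described in the talk:

```python
# Hypothetical encoding of the Admiralty rating scheme from the slide:
# a rating such as "B2" combines source reliability with information credibility.
RELIABILITY = {
    "A": "Completely reliable", "B": "Usually reliable", "C": "Fairly reliable",
    "D": "Not usually reliable", "E": "Unreliable", "F": "Reliability cannot be judged",
}
CREDIBILITY = {
    "1": "Confirmed by other sources", "2": "Probably true", "3": "Possibly true",
    "4": "Doubtful", "5": "Improbable", "6": "Truth cannot be judged",
}

def describe(rating: str) -> str:
    """Expand a two-character Admiralty rating such as "B2"."""
    source, info = rating[0], rating[1]
    return f"{RELIABILITY[source]} source; {CREDIBILITY[info]} information"

print(describe("B2"))  # Usually reliable source; Probably true information
```

Note that both axes are deliberately qualitative: the scheme encodes provenance and plausibility separately, rather than collapsing them into a single point probability.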
  30. 0.1::burglary. 0.2::earthquake. 0.7::hears_alarm(john). alarm :- burglary.

    alarm :- earthquake. calls(john) :- alarm, hears_alarm(john). evidence(calls(john)). query(burglary). Compiled formulas: alarm ↔ burglary ∨ earthquake; calls(john) ↔ alarm ∧ hears_alarm(john); calls(john). [Figure: arithmetic circuit with ⊕/⊗ gates, leaf probabilities ρ (ρ(b) = 0.1, ρ(e) = 0.2, ρ(h(j)) = 0.7) and indicator values λ] 30
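The query on this slide can be checked by brute force: with only three probabilistic facts there are eight possible worlds, so we can enumerate them directly. This is a minimal sketch of the distribution semantics underlying the program, not the circuit-based evaluation the slide depicts:

```python
from itertools import product

# The three probabilistic facts of the slide's ProbLog program.
facts = {"burglary": 0.1, "earthquake": 0.2, "hears_alarm": 0.7}

p_evidence = 0.0   # P(calls(john))
p_joint = 0.0      # P(burglary, calls(john))
for world in product([True, False], repeat=3):
    b, e, h = world
    # Probability of this world = product of fact/negation probabilities.
    p = 1.0
    for truth, prob in zip(world, facts.values()):
        p *= prob if truth else 1.0 - prob
    alarm = b or e        # alarm :- burglary. alarm :- earthquake.
    calls = alarm and h   # calls(john) :- alarm, hears_alarm(john).
    if calls:             # condition on evidence(calls(john))
        p_evidence += p
        if b:
            p_joint += p

print(round(p_joint / p_evidence, 4))  # P(burglary | calls(john)) -> 0.3571
```

Enumeration is exponential in the number of probabilistic facts, which is exactly why the slide compiles the program into an arithmetic circuit instead.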
  31. Where do the numbers come from? Observations (Day: Earthquake?): 1 T, 2

    T, 3 F, 4 F, 5 F, 6 F, 7 F, 8 F, 9 F, 10 F. π: the true (unknown) probability of earthquake in a given period of time. Let y be the number of occurrences of earthquake per period of time (y = 2). From Bayes’ theorem, we can estimate the posterior distribution of π given the data on the basis of a prior: g(π|y) ∝ g(π) · f(y|π). The conjugate prior of a binomial is the Beta distribution. If g(π; a, b) = Beta(a, b) = [Γ(a + b) / (Γ(a) Γ(b))] π^(a−1) (1 − π)^(b−1), then g(π|y) = Beta(y + a, n − y + b). If a = b = 1 (uniform prior), then g(π|y) = Beta(y + 1, n − y + 1). In the example, g(π|y = 2, n = 10) = Beta(3, 9). 31
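Thanks to conjugacy, the posterior update on this slide is a one-liner. A minimal sketch, assuming a uniform Beta(1, 1) prior and the ten observations of the table:

```python
# Ten daily observations of "earthquake?" from the slide: 2 True, 8 False.
data = [True, True] + [False] * 8

# Conjugate update: a Beta(a, b) prior with y successes in n trials
# gives a Beta(a + y, b + n - y) posterior.
a, b = 1, 1                        # uniform prior
y, n = sum(data), len(data)        # y = 2, n = 10
post_a, post_b = a + y, b + n - y

print(post_a, post_b)              # Beta(3, 9), as on the slide
print(post_a / (post_a + post_b))  # posterior mean E[pi] = 0.25
```

The point is that the posterior keeps a full distribution over π, rather than collapsing the ten observations into the single point estimate 2/10.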
  32. [Three density plots; Y-axes are deliberately misaligned for better graphical

    representation.] X1 ∼ Beta(3, 9): E[X1] = 0.2500, Var(X1) = 1.4423 · 10^−2, 95% confidence interval [0.0602, 0.5178]. X2 ∼ Beta(21, 81): E[X2] = 0.2059, Var(X2) = 1.5873 · 10^−3, 95% confidence interval [0.1336, 0.2891]. X3 ∼ Beta(201, 801): E[X3] = 0.2006, Var(X3) = 1.5988 · 10^−4, 95% confidence interval [0.1764, 0.2259]. Although E[X1] ≃ E[X2] ≃ E[X3] ≃ 0.2, they represent remarkably different random variables. 32
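The means and variances on this slide follow from the closed forms E[X] = a/(a + b) and Var(X) = ab/((a + b)²(a + b + 1)). A quick check in pure Python (the confidence intervals need the inverse incomplete beta function and are omitted here):

```python
def beta_mean_var(a, b):
    """Closed-form mean and variance of a Beta(a, b) random variable."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# The three distributions from the slide: same mean, shrinking variance.
for a, b in [(3, 9), (21, 81), (201, 801)]:
    mean, var = beta_mean_var(a, b)
    print(f"Beta({a}, {b}): E = {mean:.4f}, Var = {var:.4e}")
```

The printed values reproduce the slide's figures: the mean stays near 0.2 while each hundredfold increase in evidence shrinks the variance by roughly an order of magnitude.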
  33. Microsoft Human-AI Interaction Guidelines Guideline 1: Make clear what the

    system can do. Guideline 2: Make clear how well the system can do what it can do. . . . S. Amershi et al., “Guidelines for Human-AI Interaction,” CHI 2019. EU Requirements of Trustworthy AI: Human agency and oversight; Technical robustness and safety; Privacy and data governance; Transparency; Diversity, non-discrimination, and fairness; Societal and environmental wellbeing; Accountability. EUROPEAN COMMISSION, 2019. High-Level Expert Group on Artificial Intelligence. 33
  34. ω2::burglary. ω3::earthquake. ω4::hears_alarm(john). alarm :- burglary.

    alarm :- earthquake. calls(john) :- alarm, hears_alarm(john). evidence(calls(john)). query(burglary). Beta parameters per identifier: ω1 Beta(∞, 1) / Beta(1, ∞); ω2 Beta(2, 18) / Beta(18, 2); ω3 Beta(2, 8) / Beta(8, 2); ω4 Beta(3.5, 1.5) / Beta(1.5, 3.5). Cerutti, Kaplan, Kimmig, Şensoy, Handling Epistemic and Aleatory Uncertainties in Probabilistic Circuits, Under Submission, 2021, https://arxiv.org/abs/2102.10865 34
  35. [Figure: the arithmetic circuit of slide 30, first annotated with mean/variance

    pairs at its leaves (e.g. ρ(b) = 0.05, 0.10; ρ(e) = 0.10, 0.20; ρ(h(j)) = 0.50, 0.40), then re-drawn with Beta-distributed leaves ω1, . . . , ω4 in place of point probabilities] Cerutti, Kaplan, Kimmig, Şensoy, Handling Epistemic and Aleatory Uncertainties in Probabilistic Circuits, Under Submission, 2021, https://arxiv.org/abs/2102.10865 35
  36. Let n be a ⊕-gate and C its children:

    E[Xn] = Σ_{c∈C} E[Xc], cov[Xn] = Σ_{c∈C} Σ_{c′∈C} cov[Xc, Xc′], and cov[Xn, Xz] = Σ_{c∈C} cov[Xc, Xz] for z ∈ NA \ {n}. Let n be a ⊗-gate and C its children: E[Xn] = Π_{c∈C} E[Xc], cov[Xn] ≃ Σ_{c∈C} Σ_{c′∈C} (E[Xn]² / (E[Xc] E[Xc′])) cov[Xc, Xc′], and cov[Xn, Xz] ≃ Σ_{c∈C} (E[Xn] / E[Xc]) cov[Xc, Xz] for z ∈ NA \ {n}. For the ratio of the two root values X_r̄ and X_r: E[X_r̄ / X_r] ≃ E[X_r̄] / E[X_r], and cov[X_r̄ / X_r] ≃ (1 / E[X_r]²) cov[X_r̄] + (E[X_r̄]² / E[X_r]⁴) cov[X_r] − (2 E[X_r̄] / E[X_r]³) cov[X_r̄, X_r]. Cerutti, Kaplan, Kimmig, Şensoy, Handling Epistemic and Aleatory Uncertainties in Probabilistic Circuits, Under Submission, 2021, https://arxiv.org/abs/2102.10865 36
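The gate rules above can be exercised on a toy example. The sketch below keeps only the c = c′ (variance) terms of the covariance sums, i.e. it assumes sibling subcircuits are independent, which does not hold in general circuits; it illustrates the propagation scheme, not the paper's full algorithm:

```python
# First-order moment propagation through a tiny circuit, assuming sibling
# subcircuits are independent (only the variance terms of the sums survive).
def beta_moments(a, b):
    """Mean and variance of a Beta(a, b) leaf."""
    mean = a / (a + b)
    return mean, a * b / ((a + b) ** 2 * (a + b + 1))

def sum_gate(children):
    # E[Xn] = sum of child means; under independence, variances add too.
    return sum(m for m, _ in children), sum(v for _, v in children)

def product_gate(children):
    # E[Xn] = product of child means; first-order (delta-method) variance:
    # Var[Xn] ~= sum_c (E[Xn] / E[Xc])^2 Var[Xc]
    mean = 1.0
    for m, _ in children:
        mean *= m
    return mean, sum((mean / m) ** 2 * v for m, v in children)

b = beta_moments(2, 18)     # burglary leaf from slide 34, E = 0.1
h = beta_moments(3.5, 1.5)  # hears_alarm(john) leaf from slide 34, E = 0.7

# Mean and variance of the conjunction burglary AND hears_alarm(john):
print(product_gate([b, h]))
```

The appeal of this scheme is cost: a single pass over the circuit propagates both means and (co)variances, instead of sampling the circuit many times as Monte Carlo would.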
  37. [Plot: query posterior densities estimated by Monte Carlo sampling (MC)

    and by the proposed approach (CPB)] Cerutti, Kaplan, Kimmig, Şensoy, Handling Epistemic and Aleatory Uncertainties in Probabilistic Circuits, Under Submission, 2021, https://arxiv.org/abs/2102.10865 37
  38. [Plots: correlation with the golden standard as a function of the

    number of Monte Carlo samples (10–200), and execution times of CPB versus MC] Cerutti, Kaplan, Kimmig, Şensoy, Handling Epistemic and Aleatory Uncertainties in Probabilistic Circuits, Under Submission, 2021, https://arxiv.org/abs/2102.10865 38
  39. Overture. A brief historical case. Act I. On conjectures, refutations,

    and argumentation. Act II. There is no certain datum in the world. Act III. Interesting problems are complex. Epilogue. 39
  40. None
  41. A Trustworthy Loss Function Classification becomes regression, outputting pieces of

    evidence in favour of different classes. Expected squared error (a.k.a. Brier score) with Dir(mi | αi) (prior for a Multinomial), penalising the divergence from the uniform distribution: L = Σ_{i=1}^N E[‖yi − mi‖²₂] + λt Σ_{i=1}^N KL(Dir(µi | α̃i) ‖ Dir(µi | 1)), where: • λt avoids premature convergence to the uniform distribution; • α̃i = yi + (1 − yi) ⊙ αi are the Dirichlet parameters the neural network, in a forward pass, has put on the wrong classes, and the idea is to minimise them as much as possible; • KL(Dir(µi | α̃i) ‖ Dir(µi | 1)) = ln[Γ(Σ_{k=1}^K α̃i,k) / (Γ(K) Π_{k=1}^K Γ(α̃i,k))] + Σ_{k=1}^K (α̃i,k − 1) [ψ(α̃i,k) − ψ(Σ_{j=1}^K α̃i,j)], where ψ(x) = (d/dx) ln Γ(x) is the digamma function. Şensoy, Kaplan, and Kandemir. “Evidential deep learning to quantify classification uncertainty.” NeurIPS 2018. 41
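The Brier-score term of the loss above has a closed form once the Dirichlet moments are plugged in: E[(y_k − m_k)²] = (y_k − m̂_k)² + m̂_k(1 − m̂_k)/(S + 1), with m̂_k = α_k/S and S = Σ_k α_k. A pure-Python sketch of that term only (the KL regulariser, which needs the digamma function, is omitted here):

```python
def edl_squared_error(alpha, y):
    """Closed-form E[||y - m||^2] for m ~ Dir(alpha) and one-hot label y:
    the Brier-score term of the evidential loss (KL regulariser omitted)."""
    S = sum(alpha)
    loss = 0.0
    for a_k, y_k in zip(alpha, y):
        m_k = a_k / S                        # expected class probability
        loss += (y_k - m_k) ** 2             # squared error of the mean
        loss += m_k * (1.0 - m_k) / (S + 1)  # Dirichlet variance of class k
    return loss

# No evidence at all (uniform Dirichlet): high loss.
print(edl_squared_error([1, 1, 1], [1, 0, 0]))
# Strong evidence on the correct class: the loss shrinks.
print(edl_squared_error([100, 1, 1], [1, 0, 0]))
```

Because the variance term appears explicitly, the network is rewarded not just for putting its mean prediction on the right class but for accumulating enough evidence to make that prediction confident.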
  42. EDL + GAN for adversarial training Şensoy, Kaplan, Cerutti, and

    Saleki. “Uncertainty-Aware Deep Classifiers using Generative Models.” AAAI 2020 42
  43. Robustness against FGS Anomaly detection (mnist) (cifar10) Şensoy, Kaplan, Cerutti,

    and Saleki. “Uncertainty-Aware Deep Classifiers using Generative Models.” AAAI 2020 43
  44. Roig Vilamala et al. “A Hybrid Neuro-Symbolic Approach for Complex

    Event Processing (Extended Abstract).” In ICLP2020. Xing et al. “Neuroplex: Learning to Detect Complex Events in Sensor Networks through Knowledge Injection.” In SenSys2020. 44
  45. NeuroPLEX Xing et al. “Neuroplex: Learning to Detect Complex Events

    in Sensor Networks through Knowledge Injection.” In SenSys2020. 45
  46. Overture. A brief historical case. Act I. On conjectures, refutations,

    and argumentation. Act II. There is no certain datum in the world. Act III. Interesting problems are complex. Epilogue. 46
  47. None
  48. None
  49. Roig Vilamala et al. “A Hybrid Neuro-Symbolic Approach for Complex

    Event Processing (Extended Abstract).” In ICLP2020. Xing et al. “Neuroplex: Learning to Detect Complex Events in Sensor Networks through Knowledge Injection.” In SenSys2020. 49
  50. None
  51. Co-I S. Chakraborty IBM Research T. J. Watson • M.

    Giacomin Brescia • L. Kaplan US CCDC ARL A. Kimmig KU Leuven • S. Julier UCL • Y. McDermott-Rees Swansea • T. Norman Southampton N. Oren Aberdeen • G. Pearson UK MoD Dstl • A. Preece Cardiff • M. Şensoy Ozyegin M. Srivastava UCLA • M. Thimm Hagen • N. Tintarev Maastricht • A. Toniolo St. Andrews M. Vallati Huddersfield Intern/PhD/Post-Doc C. Allen Cardiff • A. Fanelli Brescia • L. Garcia UCLA • S. Habib UCL • C. Hougen Michigan O. Lipinski Southampton • K. Mishra US CCDC ARL • M. Roig Vilamala Cardiff • H. Rose UCL G. Pellier-Hollows Cardiff • T. Xing UCLA • T. Zanetti Cardiff 51