
OpenTalks.AI - Алексей Потапов, Universality and efficient inference: a stumbling block of all approaches to AGI


OpenTalks.AI

March 01, 2018



Transcript

  1. Universality and efficient inference: a stumbling block of all approaches

    to AGI Prof. Alexey Potapov ITMO University, SingularityNET 2018 OpenTalks.ai @ Moscow
  2. Subfields and approaches 2 • Deep Learning • Cognitive Architectures

    • Probabilistic Models • Universal Algorithmic Intelligence • Reinforcement Learning
  3. Subfields and approaches 3 • Deep Learning • Cognitive Architectures

    • Probabilistic Models • Universal Algorithmic Intelligence • Reinforcement Learning Where is the key to AGI?
  4. Can neurophysiologists infer this? 7
    $$P(\mathbf{x},\mathbf{z}) = \frac{1}{Z}\, e^{\mathbf{b}^{T}\mathbf{x} + \mathbf{c}^{T}\mathbf{z} + \mathbf{z}^{T}W\mathbf{x}}, \qquad P(\mathbf{x}) = \frac{1}{Z}\sum_{\mathbf{z}} e^{\mathbf{b}^{T}\mathbf{x} + \mathbf{c}^{T}\mathbf{z} + \mathbf{z}^{T}W\mathbf{x}} = \frac{1}{Z}\, e^{\sum_j b_j x_j} \prod_i \Bigl(1 + e^{\,c_i + \sum_j w_{ij} x_j}\Bigr)$$
    $$\frac{\partial \ln P(\mathbf{x} \mid \theta)}{\partial \theta} = \frac{\partial}{\partial \theta} \ln \sum_{\mathbf{z}} e^{-E(\mathbf{x},\mathbf{z})} - \frac{\partial}{\partial \theta} \ln \sum_{\mathbf{x},\mathbf{z}} e^{-E(\mathbf{x},\mathbf{z})} = -\sum_{\mathbf{z}} P(\mathbf{z} \mid \mathbf{x},\theta)\, \frac{\partial E(\mathbf{x},\mathbf{z})}{\partial \theta} + \sum_{\mathbf{x},\mathbf{z}} P(\mathbf{x},\mathbf{z})\, \frac{\partial E(\mathbf{x},\mathbf{z})}{\partial \theta}$$
  5. Can neurophysiologists infer this? 8 [same equations as the previous slide] No way
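To make the formulas above concrete, here is a minimal NumPy sketch of how this gradient is typically estimated in practice: the data-dependent expectation is computed exactly, while the intractable model expectation is approximated by one step of Gibbs sampling (CD-1). The network sizes, learning rate and toy input are hypothetical, not anything from the slides.

```python
# Minimal NumPy sketch of the RBM log-likelihood gradient from the slide above.
# The data term sum_z P(z|x) dE/dtheta is exact; the model term sum_{x,z} P(x,z) dE/dtheta
# is approximated with one Gibbs step (contrastive divergence, CD-1).
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
W = rng.normal(0, 0.1, (n_hidden, n_visible))
b = np.zeros(n_visible)   # visible biases
c = np.zeros(n_hidden)    # hidden biases

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_gradients(x):
    """One contrastive-divergence step for a single binary visible vector x."""
    p_h = sigmoid(c + W @ x)                       # P(z=1 | x): positive phase
    h = (rng.random(n_hidden) < p_h).astype(float)
    p_v = sigmoid(b + W.T @ h)                     # reconstruct visibles
    v = (rng.random(n_visible) < p_v).astype(float)
    p_h_neg = sigmoid(c + W @ v)                   # negative phase
    dW = np.outer(p_h, x) - np.outer(p_h_neg, v)
    db = x - v
    dc = p_h - p_h_neg
    return dW, db, dc

x = rng.integers(0, 2, n_visible).astype(float)    # toy binary observation
dW, db, dc = cd1_gradients(x)
W += 0.1 * dW; b += 0.1 * db; c += 0.1 * dc        # gradient ascent on ln P(x)
```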
  6. Another example: Sigma 9 Rosenbloom, P., Demski, A. & Ustun,

    V. (2017). The Sigma Cognitive Architecture and System: Towards Functionally Elegant Grand Unification. Journal of Artificial General Intelligence, 7(1), pp. 1-103. • Graphical Architecture Hypothesis • Four desiderata: • grand unification • generic cognition • functional elegance • sufficient efficiency • Deconstruction of all cognitive functions with the use of factor-graphs as a general cognitive firmware Cool…
  7. Another example: Sigma 10 [same points and reference as the previous slide] Cool… but wait, what problems are solvable?
  8. What do we want? 11 • Intelligence measures an agent’s

    ability to achieve goals in a wide range of environments with insufficient knowledge and resources (B. Goertzel, P. Wang, M. Hutter, etc.) • Let’s take this definition seriously and ask: what is done within the different approaches to achieve this?
  9. Universal Algorithmic Intelligence: Solomonoff Induction 12 • Universal priors over programs ρ (binary strings) for a Universal Turing Machine U: $P(\rho) = 2^{-l(\rho)}$ • Marginal probability: $M_U(x) = \sum_{\rho:\, U(\rho)=x*} 2^{-l(\rho)}$ • Prediction: $M_U(y \mid x) = M_U(xy)\,/\,M_U(x)$ • Optimal prediction for any (computable) data source Q: $\sum_{i=1}^{n} \bigl( Q(x_{i+1}{=}1 \mid x_{1:i}) - P_U(x_{i+1}{=}1 \mid x_{1:i}) \bigr)^2 \le \frac{\ln 2}{2}\, K_U(Q)$ → Convergence! • No “no free lunch theorem”!
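A toy sketch of the mixture M_U(x), assuming a deliberately tiny (non-universal) reference machine in which a program is a bit string that the machine outputs repeatedly; enumerating programs up to a small length stands in for the uncomputable sum over all programs. The machine, lengths and example sequence are illustrative assumptions only.

```python
# Toy approximation of the Solomonoff mixture M(x) over a tiny reference machine:
# run(p) outputs the program's bits repeated forever, and M(x) sums 2^(-len(p))
# over all short programs whose output starts with x.
from itertools import product

def run(program, n):
    """Toy reference machine: output the program's bits repeated up to length n."""
    return [program[i % len(program)] for i in range(n)]

def marginal(x, max_len=10):
    """M(x) ~= sum over programs p whose output starts with x of 2^(-len(p))."""
    m = 0.0
    for l in range(1, max_len + 1):
        for p in product((0, 1), repeat=l):
            if run(p, len(x)) == list(x):
                m += 2.0 ** (-l)
    return m

x = (0, 1, 1, 0, 1, 1)                       # observed sequence
m_x = marginal(x)
p_next_1 = marginal(x + (1,)) / m_x          # predictive probability M(x1)/M(x)
print(f"M(x) = {m_x:.6f}, P(next bit = 1 | x) = {p_next_1:.3f}")
```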
  10. Universal Induction + RL 13 • AIXI as an optimal universal intelligence → a strict statement of the task for general intelligence • Universal Intelligence Quantity: $\Upsilon(\pi) = \sum_{\nu} 2^{-K(\nu)}\, V_{\nu}^{\pi}$, where $V_{\nu}^{\pi} = \mathbf{E}\bigl(\sum_{t=1}^{\infty} R_t\bigr)$ Legg S. Machine Super Intelligence. Department of Informatics, University of Lugano (2008)
  11. Universal Induction + RL 14 [same points and formulas as the previous slide] • Heavily criticized • Practically impossible • not related to natural intelligence • which is not that universal
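A toy rendering of Legg's Υ(π) that also hints at why it is "practically impossible": K(ν) is uncomputable, so this sketch substitutes hand-assigned description lengths for two hypothetical environments and estimates V by Monte-Carlo rollouts. All environments, policies and lengths are assumptions for illustration.

```python
# Toy illustration of the Universal Intelligence Quantity
# Upsilon(pi) = sum_nu 2^(-K(nu)) * V_nu(pi), with hand-picked stand-ins for K(nu).
import random

def env_constant(action):            # reward 1 iff the action is 1
    return 1.0 if action == 1 else 0.0

def env_alternate_factory():
    state = {"t": 0}
    def env(action):                 # reward 1 iff the action equals t mod 2
        r = 1.0 if action == (state["t"] % 2) else 0.0
        state["t"] += 1
        return r
    return env

environments = [                     # (factory, stand-in "description length" in bits)
    (lambda: env_constant, 3),
    (env_alternate_factory, 5),
]

def value(policy, env_factory, horizon=20, episodes=30):
    total = 0.0
    for _ in range(episodes):
        env = env_factory()
        total += sum(env(policy(t)) for t in range(horizon))
    return total / episodes

def upsilon(policy):
    return sum(2.0 ** (-k) * value(policy, f) for f, k in environments)

print("always-1 policy:   ", upsilon(lambda t: 1))
print("alternating policy:", upsilon(lambda t: t % 2))
```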
  12. Universal Algorithmic Intelligence 15 [diagram: a spectrum from XIX-century automata through computers to a universal Turing machine, paralleled by narrow AI → AGI → AIXI] • If any approach doesn’t even try to address the problem of efficient universal induction in mathematical/technical notions, it is most likely doomed to result in yet another narrow AI
  13. Example: Deep Learning 16 • Turing-incomplete (cannot represent arbitrary regularities) + discriminative models ⇒ • Weak generalization • Require large training sets; no one-shot learning • Cannot learn invariants • Vulnerable to adversarial examples • Difficulties with transfer and unsupervised learning • From the AGI perspective: • Encode higher-order statistics, but not causal, logical, spatio-temporal relations • Bad at high-level reasoning and planning, etc. Images from: Szegedy, C. et al. Intriguing properties of neural networks. arXiv 1312.6199 (2013). Gary Marcus. Keynote @ AGI-16
  14. Is DL that bad? 17 Image from: https://deepmind.com/blog/differentiable-neural-computers/ • RNN instead of a finite state machine • External memory with soft addressing • End-to-end differentiable algorithms • Differentiable neural computer, Neural GPU, Neural programmer-interpreter, Differentiable Forth interpreter, etc. • Memory-augmented NNs, including deep RL
  15. Is DL that bad? 18 Image from: https://deepmind.com/blog/differentiable-neural-computers/ [same points as the previous slide] • Apparent trend towards universal induction within DL
  16. Is DL that bad? 19 [same points as the previous slide] • But gradient descent is not enough to learn algorithms
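A minimal sketch of the "external memory with soft addressing" idea behind DNC-style memory-augmented networks: reads are a softmax-weighted sum over all memory rows and writes are blended by the same weights, so the whole path stays differentiable. Memory size, the query key and the sharpness value are toy assumptions, not the actual DNC equations.

```python
# Content-based soft addressing over an external memory (DNC-flavoured toy).
import numpy as np

rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 16))             # 8 slots, 16-dimensional content

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_read(memory, key, sharpness=5.0):
    """Cosine similarity -> softmax weights -> differentiable weighted read."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    weights = softmax(sharpness * sims)        # soft address over all slots
    return weights @ memory, weights

def soft_write(memory, weights, erase, add):
    """Blended write: each slot is erased and updated in proportion to its weight."""
    return memory * (1 - np.outer(weights, erase)) + np.outer(weights, add)

key = memory[3] + 0.1 * rng.normal(size=16)    # query resembling slot 3
read_vec, w = content_read(memory, key)
print("read weights:", np.round(w, 2))         # mass should concentrate near slot 3
memory = soft_write(memory, w, erase=np.ones(16) * 0.5, add=rng.normal(size=16))
```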
  17. What about probabilistic models? 20 • Graphical models in computer

    vision, knowledge representations, etc. • Probabilistic programming • Probabilistic models of cognition Images from: Mansinghka, V., Kulkarni, T., Perov, Y., Tenenbaum, J.: Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs. Advances in NIPS, arXiv:1307.0060 [cs.AI] (2013).
  18. Learning Probabilistic Programs 21 https://arxiv.org/pdf/1407.2646v1.pdf • Higher-order PPLs allow for

    learning probabilistic programs from data by means of probabilistic programs (while learning of graphical models cannot be expressed in terms of graphical models) • Probabilistic Programming implements a form of universal induction
  19. Learning Probabilistic Programs 22 https://arxiv.org/pdf/1407.2646v1.pdf [same points as the previous slide] • MCMC inference is not scalable enough
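To show what this kind of inference looks like under the hood, here is a minimal sketch of single-site Metropolis-Hastings over the latent of a tiny hand-written generative model; real PPLs (Church, WebPPL, Anglican, etc.) automate this over arbitrary program traces, which is exactly what becomes expensive at scale. The model, data and step size are hypothetical.

```python
# Tiny generative model (latent rate -> observed counts) and Metropolis-Hastings
# over its latent variable, as a stand-in for PPL trace-based MCMC.
import math, random

data = [4, 6, 5, 7, 3]                            # observed counts (toy)

def log_joint(rate):
    """log P(rate) + log P(data | rate) for rate ~ Exp(1), counts ~ Poisson(rate)."""
    if rate <= 0:
        return float("-inf")
    log_prior = -rate                             # Exp(1) density, up to a constant
    log_lik = sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in data)
    return log_prior + log_lik

def metropolis_hastings(steps=5000, step_size=0.5):
    rate = 1.0
    samples = []
    for _ in range(steps):
        proposal = rate + random.gauss(0.0, step_size)   # symmetric random walk
        if math.log(random.random()) < log_joint(proposal) - log_joint(rate):
            rate = proposal
        samples.append(rate)
    return samples

samples = metropolis_hastings()
posterior_mean = sum(samples[1000:]) / len(samples[1000:])
print(f"posterior mean rate ~= {posterior_mean:.2f}")    # near the data mean
```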
  20. More efficient universal induction 23 • Reference machine optimization •

    Prior choice of the reference machine (e.g. full-fledged programming language) • Incremental learning • Search method improvement • Genetic programming • HSearch • Incremental self-improvement (Gödel machine)
  21. More efficient universal induction 24 [same points as the previous slide] • Still not enough
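As one concrete instance of the search-method improvements listed above, here is a toy genetic-programming run that evolves small arithmetic expressions to fit data. The primitive set, fitness function and target regularity are hypothetical, and the crossover is the "blind" kind that the later slides argue should be replaced by guided search.

```python
# Toy genetic programming: evolve expression trees over x to fit observed points.
import random, operator

OPS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]
TARGET = [(x, x * x + 1) for x in range(-3, 4)]          # hidden regularity: x^2 + 1

def random_tree(depth=3):
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x", random.randint(-2, 2)])
    return (random.choice(OPS), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, int):
        return tree
    (fn, _), left, right = tree
    return fn(evaluate(left, x), evaluate(right, x))

def fitness(tree):
    return -sum((evaluate(tree, x) - y) ** 2 for x, y in TARGET)

def crossover(a, b):
    """Blind crossover: splice a random subtree of b into a random point of a."""
    if not isinstance(a, tuple) or random.random() < 0.3:
        return b if not isinstance(b, tuple) else random.choice(b[1:])
    op, l, r = a
    return (op, crossover(l, b), r) if random.random() < 0.5 else (op, l, crossover(r, b))

population = [random_tree() for _ in range(200)]
for gen in range(30):
    population.sort(key=fitness, reverse=True)
    parents = population[:50]
    population = parents + [crossover(random.choice(parents), random.choice(parents))
                            for _ in range(150)]
print("best fitness:", fitness(max(population, key=fitness)))
```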
  22. Metacomputations in Universal Intelligence 25 • Program specialization = construction

    of its efficient projection on one of its parameters • E.g. specialized interpreter w.r.t. program = compiled program • Specialized specializer w.r.t. interpreter = compiler • Specialized universal induction w.r.t. Turing-incomplete reference machine = narrow machine learning method • Specialized MCMC w.r.t. generative model = discriminative model Khudobakhshov V. Metacomputations and program-based knowledge representation // AGI-13 Potapov A. Rodionov S. Making universal induction efficient by specialization // AGI-14
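A toy illustration of the specialization idea from this slide: a general function power(x, n) is specialized with respect to its known parameter n, producing a residual program with the loop unfolded, which is the same move that turns interpreter + program into a compiled program. The example and the string-based code generation are illustrative assumptions, not the cited papers' machinery.

```python
# Program specialization sketch: partial evaluation of power(x, n) w.r.t. a static n.
def power(x, n):
    """General program: one static parameter (n), one dynamic parameter (x)."""
    result = 1
    for _ in range(n):
        result *= x
    return result

def specialize_power(n):
    """Toy specializer: unfold the loop at specialization time and emit a residual program."""
    body = " * ".join(["x"] * n) if n > 0 else "1"
    source = f"def power_{n}(x):\n    return {body}\n"
    namespace = {}
    exec(source, namespace)                     # build the residual program
    return namespace[f"power_{n}"], source

power_5, source = specialize_power(5)
print(source)                                   # def power_5(x): return x * x * x * x * x
assert power_5(2) == power(2, 5) == 32
```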
  23. Example: ‘Compilation’ of PP into DNN 26 • A generative

    model specified as a probabilistic program is ‘compiled’ into a discriminative model specified as a neural network https://arxiv.org/abs/1610.09900
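A minimal sketch of this 'compilation' move: sample (latent, data) pairs from a generative program, then fit a discriminative model that maps data back to the latent, amortizing inference into a single forward pass. The tiny Gaussian program and the linear least-squares regressor (a stand-in for the paper's neural network) are assumptions for illustration only.

```python
# Amortized inference: forward-simulate a generative program, then train a
# discriminative mapping data -> latent (here a linear model as a DNN stand-in).
import numpy as np

rng = np.random.default_rng(1)

def generative_program(n_obs=10):
    """Probabilistic program: mu ~ N(0, 4); each observation ~ N(mu, 1)."""
    mu = rng.normal(0.0, 2.0)
    return mu, rng.normal(mu, 1.0, n_obs)

# Training pairs (data -> latent) drawn from the generative model
latents, datasets = zip(*(generative_program() for _ in range(5000)))
X = np.stack(datasets)
y = np.array(latents)

# Fit the 'compiled' discriminative model by least squares
X1 = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Inference on new data is now one forward pass instead of MCMC
_, new_data = generative_program()
predicted_mu = np.append(new_data, 1.0) @ w
print(f"predicted posterior mean of mu ~= {predicted_mu:.2f}")
```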
  24. Metacomputations in Universal Intelligence 27 • Still not enough → Partial specialization • Discriminative models are not always possible • But we can do much better than blind or metaheuristic search • E.g. genetic algorithms with data-guided trainable crossover Potapov A., Rodionov S. Genetic Algorithms with DNN-Based Trainable Crossover as an Example of Partial Specialization of General Search // Proc. Artificial General Intelligence, AGI’17. P. 101-111.
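A simplified stand-in for the "data-guided trainable crossover" idea: instead of the paper's DNN, the crossover operator here learns per-bit statistics from the current elite and samples children from them, so recombination is guided by data rather than blind. The problem, sizes and rates are toy values, not the AGI'17 experiments.

```python
# GA where the crossover operator is learned online from high-fitness parents.
import random

TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]   # hidden optimum (toy)

def fitness(ind):
    return sum(1 for a, b in zip(ind, TARGET) if a == b)

def trained_crossover(elite):
    """'Trainable' crossover: estimate P(bit=1) from the elite, then sample a child."""
    probs = [sum(ind[i] for ind in elite) / len(elite) for i in range(len(TARGET))]
    return [1 if random.random() < p else 0 for p in probs]

def mutate(ind, rate=0.05):
    return [1 - b if random.random() < rate else b for b in ind]

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(60)]
for gen in range(40):
    population.sort(key=fitness, reverse=True)
    elite = population[:15]
    population = elite + [mutate(trained_crossover(elite)) for _ in range(45)]

best = max(population, key=fitness)
print("best fitness:", fitness(best), "of", len(TARGET))
```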
  25. Meta-learning with DNNs as an example 28 • A neural

    network that embeds its own meta-levels • Learning to learn using gradient descent • Learning to learn by gradient descent by gradient descent • Learning to reinforcement learn • RL2: Fast Reinforcement Learning via Slow Reinforcement Learning • Meta-Learning with Memory-Augmented Neural Networks • Designing Neural Network Architectures using Reinforcement Learning • … https://arxiv.org/pdf/1611.02167.pdf
  26. One approach to AGI 30 • Extended probabilistic programming language: • Probabilistic programs as generative models (basic) • Representation of discriminative models (available) • Self-referential interpreter with controllable inference → A cognitive architecture with knowledge management to deal with learnt domain-dependent specialized models • OpenCog • Cognitive architecture with Turing-complete knowledge representation → OpenCoggy probabilistic programming with inference meta-learning, extended with deep learning models https://wiki.opencog.org/w/OpenCoggy_Probabilistic_Programming https://blog.opencog.org/2017/10/14/inference-meta-learning-part-i/ https://github.com/opencog/semantic-vision/wiki/About-the-SynerGAN-architecture