
OpenTalks.AI - Алексей Потапов, Universality and efficient inference: a stumbling block of all approaches to AGI


OpenTalks.AI

March 01, 2018



Transcript

  1. Universality and efficient inference: a stumbling block of all approaches

    to AGI Prof. Alexey Potapov ITMO University, SingularityNET 2018 OpenTalks.ai @ Moscow
  2. Subfields and approaches 2 • Deep Learning • Cognitive Architectures

    • Probabilistic Models • Universal Algorithmic Intelligence • Reinforcement Learning
  3. Subfields and approaches 3 • Deep Learning • Cognitive Architectures

    • Probabilistic Models • Universal Algorithmic Intelligence • Reinforcement Learning Where is the key to AGI?
  4. Can neurophysiologists infer this? 7
    $$P(\mathbf{x},\mathbf{z}) = \frac{1}{Z}\, e^{\mathbf{b}^{T}\mathbf{x} + \mathbf{c}^{T}\mathbf{z} + \mathbf{z}^{T}W\mathbf{x}}, \qquad P(\mathbf{x}) = \frac{1}{Z}\sum_{\mathbf{z}} e^{\mathbf{b}^{T}\mathbf{x} + \mathbf{c}^{T}\mathbf{z} + \mathbf{z}^{T}W\mathbf{x}} = \frac{1}{Z}\, e^{\sum_j b_j x_j} \prod_i \Bigl(1 + e^{\,c_i + \sum_j w_{ij} x_j}\Bigr)$$
    $$\frac{\partial \ln P(\mathbf{x} \mid \theta)}{\partial \theta} = \frac{\partial}{\partial \theta} \ln \sum_{\mathbf{z}} e^{-E(\mathbf{x},\mathbf{z})} - \frac{\partial}{\partial \theta} \ln \sum_{\mathbf{x},\mathbf{z}} e^{-E(\mathbf{x},\mathbf{z})} = -\sum_{\mathbf{z}} P(\mathbf{z} \mid \mathbf{x},\theta)\, \frac{\partial E(\mathbf{x},\mathbf{z})}{\partial \theta} + \sum_{\mathbf{x},\mathbf{z}} P(\mathbf{x},\mathbf{z})\, \frac{\partial E(\mathbf{x},\mathbf{z})}{\partial \theta}$$
  5. Can neurophysiologists infer this? 8 [same equations as the previous slide] No way
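To make the formulas above concrete, here is a minimal NumPy sketch of how this gradient is typically estimated in practice: the data-dependent expectation is computed exactly, while the intractable model expectation is approximated by one step of Gibbs sampling (CD-1). The network sizes, learning rate and toy input are hypothetical, not anything from the slides.

```python
# Minimal NumPy sketch of the RBM log-likelihood gradient from the slide above.
# The data term sum_z P(z|x) dE/dtheta is exact; the model term sum_{x,z} P(x,z) dE/dtheta
# is approximated with one Gibbs step (contrastive divergence, CD-1).
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
W = rng.normal(0, 0.1, (n_hidden, n_visible))
b = np.zeros(n_visible)   # visible biases
c = np.zeros(n_hidden)    # hidden biases

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_gradients(x):
    """One contrastive-divergence step for a single binary visible vector x."""
    p_h = sigmoid(c + W @ x)                       # P(z=1 | x): positive phase
    h = (rng.random(n_hidden) < p_h).astype(float)
    p_v = sigmoid(b + W.T @ h)                     # reconstruct visibles
    v = (rng.random(n_visible) < p_v).astype(float)
    p_h_neg = sigmoid(c + W @ v)                   # negative phase
    dW = np.outer(p_h, x) - np.outer(p_h_neg, v)
    db = x - v
    dc = p_h - p_h_neg
    return dW, db, dc

x = rng.integers(0, 2, n_visible).astype(float)    # toy binary observation
dW, db, dc = cd1_gradients(x)
W += 0.1 * dW; b += 0.1 * db; c += 0.1 * dc        # gradient ascent on ln P(x)
```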
  6. Another example: Sigma 9 Rosenbloom, P., Demski, A. & Ustun,

    V. (2017). The Sigma Cognitive Architecture and System: Towards Functionally Elegant Grand Unification. Journal of Artificial General Intelligence, 7(1), pp. 1-103. • Graphical Architecture Hypothesis • Four desiderata: • grand unification • generic cognition • functional elegance • sufficient efficiency • Deconstruction of all cognitive functions with the use of factor-graphs as a general cognitive firmware Cool…
  7. Another example: Sigma 10 [same points and reference as the previous slide] Cool… but wait, what problems are solvable?
  8. What do we want? 11 • Intelligence measures an agent’s

    ability to achieve goals in a wide range of environments with insufficient knowledge and resources (B. Goertzel, P. Wang, M. Hutter, etc.) • Let’s take this definition seriously and ask: what is done within the different approaches to achieve this?
  9. Universal Algorithmic Intelligence: Solomonoff Induction 12 • Universal priors over programs ρ (binary strings) for a Universal Turing Machine U: $P(\rho) = 2^{-l(\rho)}$ • Marginal probability: $M_U(x) = \sum_{\rho:\, U(\rho)=x*} 2^{-l(\rho)}$ • Prediction: $M_U(y \mid x) = M_U(xy)\,/\,M_U(x)$ • Optimal prediction for any (computable) data source Q: $\sum_{i=1}^{n} \bigl( Q(x_{i+1}{=}1 \mid x_{1:i}) - P_U(x_{i+1}{=}1 \mid x_{1:i}) \bigr)^2 \le \frac{\ln 2}{2}\, K_U(Q)$ → Convergence! • No “no free lunch theorem”!
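A toy sketch of the mixture M_U(x), assuming a deliberately tiny (non-universal) reference machine in which a program is a bit string that the machine outputs repeatedly; enumerating programs up to a small length stands in for the uncomputable sum over all programs. The machine, lengths and example sequence are illustrative assumptions only.

```python
# Toy approximation of the Solomonoff mixture M(x) over a tiny reference machine:
# run(p) outputs the program's bits repeated forever, and M(x) sums 2^(-len(p))
# over all short programs whose output starts with x.
from itertools import product

def run(program, n):
    """Toy reference machine: output the program's bits repeated up to length n."""
    return [program[i % len(program)] for i in range(n)]

def marginal(x, max_len=10):
    """M(x) ~= sum over programs p whose output starts with x of 2^(-len(p))."""
    m = 0.0
    for l in range(1, max_len + 1):
        for p in product((0, 1), repeat=l):
            if run(p, len(x)) == list(x):
                m += 2.0 ** (-l)
    return m

x = (0, 1, 1, 0, 1, 1)                       # observed sequence
m_x = marginal(x)
p_next_1 = marginal(x + (1,)) / m_x          # predictive probability M(x1)/M(x)
print(f"M(x) = {m_x:.6f}, P(next bit = 1 | x) = {p_next_1:.3f}")
```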
  10. Universal Induction + RL 13 • AIXI as an optimal universal intelligence → a strict statement of the task for general intelligence • Universal Intelligence Quantity: $\Upsilon(\pi) = \sum_{\nu} 2^{-K(\nu)}\, V_{\nu}^{\pi}$, where $V_{\nu}^{\pi} = \mathbf{E}\bigl(\sum_{t=1}^{\infty} R_t\bigr)$ Legg S. Machine Super Intelligence. Department of Informatics, University of Lugano (2008)
  11. Universal Induction + RL 14 [same points and formulas as the previous slide] • Heavily criticized • Practically impossible • not related to natural intelligence • which is not that universal
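A toy rendering of Legg's Υ(π) that also hints at why it is "practically impossible": K(ν) is uncomputable, so this sketch substitutes hand-assigned description lengths for two hypothetical environments and estimates V by Monte-Carlo rollouts. All environments, policies and lengths are assumptions for illustration.

```python
# Toy illustration of the Universal Intelligence Quantity
# Upsilon(pi) = sum_nu 2^(-K(nu)) * V_nu(pi), with hand-picked stand-ins for K(nu).
import random

def env_constant(action):            # reward 1 iff the action is 1
    return 1.0 if action == 1 else 0.0

def env_alternate_factory():
    state = {"t": 0}
    def env(action):                 # reward 1 iff the action equals t mod 2
        r = 1.0 if action == (state["t"] % 2) else 0.0
        state["t"] += 1
        return r
    return env

environments = [                     # (factory, stand-in "description length" in bits)
    (lambda: env_constant, 3),
    (env_alternate_factory, 5),
]

def value(policy, env_factory, horizon=20, episodes=30):
    total = 0.0
    for _ in range(episodes):
        env = env_factory()
        total += sum(env(policy(t)) for t in range(horizon))
    return total / episodes

def upsilon(policy):
    return sum(2.0 ** (-k) * value(policy, f) for f, k in environments)

print("always-1 policy:   ", upsilon(lambda t: 1))
print("alternating policy:", upsilon(lambda t: t % 2))
```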
  12. Universal Algorithmic Intelligence 15 [diagram: a spectrum from XIX-century automata through computers to a universal Turing machine, paralleled by narrow AI → AGI → AIXI] • If any approach doesn’t even try to address the problem of efficient universal induction in mathematical/technical notions, it is most likely doomed to result in yet another narrow AI
  13. Example: Deep Learning 16 • Turing-incomplete (cannot represent arbitrary regularities) + discriminative models ⇒ • Weak generalization • Require large training sets; no one-shot learning • Cannot learn invariants • Vulnerable to adversarial examples • Difficulties with transfer and unsupervised learning • From the AGI perspective: • Encode higher-order statistics, but not causal, logical, spatio-temporal relations • Bad at high-level reasoning and planning, etc. Images from: Szegedy, C. et al. Intriguing properties of neural networks. arXiv 1312.6199 (2013). Gary Marcus. Keynote @ AGI-16
  14. Is DL that bad? 17 Image from: https://deepmind.com/blog/differentiable-neural-computers/ • RNN instead of a finite state machine • External memory with soft addressing • End-to-end differentiable algorithms • Differentiable neural computer, Neural GPU, Neural programmer-interpreter, Differentiable Forth interpreter, etc. • Memory-augmented NNs, including deep RL
  15. Is DL that bad? 18 Image from: https://deepmind.com/blog/differentiable-neural-computers/ [same points as the previous slide] • Apparent trend towards universal induction within DL
  16. Is DL that bad? 19 [same points as the previous slide] • But gradient descent is not enough to learn algorithms
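A minimal sketch of the "external memory with soft addressing" idea behind DNC-style memory-augmented networks: reads are a softmax-weighted sum over all memory rows and writes are blended by the same weights, so the whole path stays differentiable. Memory size, the query key and the sharpness value are toy assumptions, not the actual DNC equations.

```python
# Content-based soft addressing over an external memory (DNC-flavoured toy).
import numpy as np

rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 16))             # 8 slots, 16-dimensional content

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_read(memory, key, sharpness=5.0):
    """Cosine similarity -> softmax weights -> differentiable weighted read."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    weights = softmax(sharpness * sims)        # soft address over all slots
    return weights @ memory, weights

def soft_write(memory, weights, erase, add):
    """Blended write: each slot is erased and updated in proportion to its weight."""
    return memory * (1 - np.outer(weights, erase)) + np.outer(weights, add)

key = memory[3] + 0.1 * rng.normal(size=16)    # query resembling slot 3
read_vec, w = content_read(memory, key)
print("read weights:", np.round(w, 2))         # mass should concentrate near slot 3
memory = soft_write(memory, w, erase=np.ones(16) * 0.5, add=rng.normal(size=16))
```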
  17. What about probabilistic models? 20 • Graphical models in computer

    vision, knowledge representations, etc. • Probabilistic programming • Probabilistic models of cognition Images from: Mansinghka, V., Kulkarni, T., Perov, Y., Tenenbaum, J.: Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs. Advances in NIPS, arXiv:1307.0060 [cs.AI] (2013).
  18. Learning Probabilistic Programs 21 https://arxiv.org/pdf/1407.2646v1.pdf • Higher-order PPLs allow for

    learning probabilistic programs from data by means of probabilistic programs (while learning of graphical models cannot be expressed in terms of graphical models) • Probabilistic Programming implements a form of universal induction
  19. Learning Probabilistic Programs 22 https://arxiv.org/pdf/1407.2646v1.pdf [same points as the previous slide] • MCMC inference is not scalable enough
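To show what this kind of inference looks like under the hood, here is a minimal sketch of single-site Metropolis-Hastings over the latent of a tiny hand-written generative model; real PPLs (Church, WebPPL, Anglican, etc.) automate this over arbitrary program traces, which is exactly what becomes expensive at scale. The model, data and step size are hypothetical.

```python
# Tiny generative model (latent rate -> observed counts) and Metropolis-Hastings
# over its latent variable, as a stand-in for PPL trace-based MCMC.
import math, random

data = [4, 6, 5, 7, 3]                            # observed counts (toy)

def log_joint(rate):
    """log P(rate) + log P(data | rate) for rate ~ Exp(1), counts ~ Poisson(rate)."""
    if rate <= 0:
        return float("-inf")
    log_prior = -rate                             # Exp(1) density, up to a constant
    log_lik = sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in data)
    return log_prior + log_lik

def metropolis_hastings(steps=5000, step_size=0.5):
    rate = 1.0
    samples = []
    for _ in range(steps):
        proposal = rate + random.gauss(0.0, step_size)   # symmetric random walk
        if math.log(random.random()) < log_joint(proposal) - log_joint(rate):
            rate = proposal
        samples.append(rate)
    return samples

samples = metropolis_hastings()
posterior_mean = sum(samples[1000:]) / len(samples[1000:])
print(f"posterior mean rate ~= {posterior_mean:.2f}")    # near the data mean
```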
  20. More efficient universal induction 23 • Reference machine optimization •

    Prior choice of the reference machine (e.g. full-fledged programming language) • Incremental learning • Search method improvement • Genetic programming • HSearch • Incremental self-improvement (Gödel machine)
  21. More efficient universal induction 24 [same points as the previous slide] • Still not enough
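As one concrete instance of the search-method improvements listed above, here is a toy genetic-programming run that evolves small arithmetic expressions to fit data. The primitive set, fitness function and target regularity are hypothetical, and the crossover is the "blind" kind that the later slides argue should be replaced by guided search.

```python
# Toy genetic programming: evolve expression trees over x to fit observed points.
import random, operator

OPS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]
TARGET = [(x, x * x + 1) for x in range(-3, 4)]          # hidden regularity: x^2 + 1

def random_tree(depth=3):
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x", random.randint(-2, 2)])
    return (random.choice(OPS), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, int):
        return tree
    (fn, _), left, right = tree
    return fn(evaluate(left, x), evaluate(right, x))

def fitness(tree):
    return -sum((evaluate(tree, x) - y) ** 2 for x, y in TARGET)

def crossover(a, b):
    """Blind crossover: splice a random subtree of b into a random point of a."""
    if not isinstance(a, tuple) or random.random() < 0.3:
        return b if not isinstance(b, tuple) else random.choice(b[1:])
    op, l, r = a
    return (op, crossover(l, b), r) if random.random() < 0.5 else (op, l, crossover(r, b))

population = [random_tree() for _ in range(200)]
for gen in range(30):
    population.sort(key=fitness, reverse=True)
    parents = population[:50]
    population = parents + [crossover(random.choice(parents), random.choice(parents))
                            for _ in range(150)]
print("best fitness:", fitness(max(population, key=fitness)))
```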
  22. Metacomputations in Universal Intelligence 25 • Program specialization = construction

    of its efficient projection on one of its parameters • E.g. specialized interpreter w.r.t. program = compiled program • Specialized specializer w.r.t. interpreter = compiler • Specialized universal induction w.r.t. Turing-incomplete reference machine = narrow machine learning method • Specialized MCMC w.r.t. generative model = discriminative model Khudobakhshov V. Metacomputations and program-based knowledge representation // AGI-13 Potapov A. Rodionov S. Making universal induction efficient by specialization // AGI-14
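A toy illustration of the specialization idea from this slide: a general function power(x, n) is specialized with respect to its known parameter n, producing a residual program with the loop unfolded, which is the same move that turns interpreter + program into a compiled program. The example and the string-based code generation are illustrative assumptions, not the cited papers' machinery.

```python
# Program specialization sketch: partial evaluation of power(x, n) w.r.t. a static n.
def power(x, n):
    """General program: one static parameter (n), one dynamic parameter (x)."""
    result = 1
    for _ in range(n):
        result *= x
    return result

def specialize_power(n):
    """Toy specializer: unfold the loop at specialization time and emit a residual program."""
    body = " * ".join(["x"] * n) if n > 0 else "1"
    source = f"def power_{n}(x):\n    return {body}\n"
    namespace = {}
    exec(source, namespace)                     # build the residual program
    return namespace[f"power_{n}"], source

power_5, source = specialize_power(5)
print(source)                                   # def power_5(x): return x * x * x * x * x
assert power_5(2) == power(2, 5) == 32
```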
  23. Example: ‘Compilation’ of PP into DNN 26 • A generative

    model specified as a probabilistic program is ‘compiled’ into a discriminative model specified as a neural network https://arxiv.org/abs/1610.09900
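A minimal sketch of this 'compilation' move: sample (latent, data) pairs from a generative program, then fit a discriminative model that maps data back to the latent, amortizing inference into a single forward pass. The tiny Gaussian program and the linear least-squares regressor (a stand-in for the paper's neural network) are assumptions for illustration only.

```python
# Amortized inference: forward-simulate a generative program, then train a
# discriminative mapping data -> latent (here a linear model as a DNN stand-in).
import numpy as np

rng = np.random.default_rng(1)

def generative_program(n_obs=10):
    """Probabilistic program: mu ~ N(0, 4); each observation ~ N(mu, 1)."""
    mu = rng.normal(0.0, 2.0)
    return mu, rng.normal(mu, 1.0, n_obs)

# Training pairs (data -> latent) drawn from the generative model
latents, datasets = zip(*(generative_program() for _ in range(5000)))
X = np.stack(datasets)
y = np.array(latents)

# Fit the 'compiled' discriminative model by least squares
X1 = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Inference on new data is now one forward pass instead of MCMC
_, new_data = generative_program()
predicted_mu = np.append(new_data, 1.0) @ w
print(f"predicted posterior mean of mu ~= {predicted_mu:.2f}")
```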
  24. Metacomputations in Universal Intelligence 27 • Still not enough → Partial specialization • Discriminative models are not always possible • But we can do much better than blind or metaheuristic search • E.g. genetic algorithms with data-guided trainable crossover Potapov A., Rodionov S. Genetic Algorithms with DNN-Based Trainable Crossover as an Example of Partial Specialization of General Search // Proc. Artificial General Intelligence, AGI’17. P. 101-111.
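A simplified stand-in for the "data-guided trainable crossover" idea: instead of the paper's DNN, the crossover operator here learns per-bit statistics from the current elite and samples children from them, so recombination is guided by data rather than blind. The problem, sizes and rates are toy values, not the AGI'17 experiments.

```python
# GA where the crossover operator is learned online from high-fitness parents.
import random

TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]   # hidden optimum (toy)

def fitness(ind):
    return sum(1 for a, b in zip(ind, TARGET) if a == b)

def trained_crossover(elite):
    """'Trainable' crossover: estimate P(bit=1) from the elite, then sample a child."""
    probs = [sum(ind[i] for ind in elite) / len(elite) for i in range(len(TARGET))]
    return [1 if random.random() < p else 0 for p in probs]

def mutate(ind, rate=0.05):
    return [1 - b if random.random() < rate else b for b in ind]

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(60)]
for gen in range(40):
    population.sort(key=fitness, reverse=True)
    elite = population[:15]
    population = elite + [mutate(trained_crossover(elite)) for _ in range(45)]

best = max(population, key=fitness)
print("best fitness:", fitness(best), "of", len(TARGET))
```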
  25. Meta-learning with DNNs as an example 28 • A neural

    network that embeds its own meta-levels • Learning to learn using gradient descent • Learning to learn by gradient descent by gradient descent • Learning to reinforcement learn • RL2: Fast Reinforcement Learning via Slow Reinforcement Learning • Meta-Learning with Memory-Augmented Neural Networks • Designing Neural Network Architectures using Reinforcement Learning • … https://arxiv.org/pdf/1611.02167.pdf
  26. One approach to AGI 30 • Extended probabilistic programming language: • Probabilistic programs as generative models (basic) • Representation of discriminative models (available) • Self-referential interpreter with controllable inference → A cognitive architecture with knowledge management to deal with learnt domain-dependent specialized models • OpenCog • Cognitive architecture with Turing-complete knowledge representation → OpenCoggy probabilistic programming with inference meta-learning, extended with deep learning models https://wiki.opencog.org/w/OpenCoggy_Probabilistic_Programming https://blog.opencog.org/2017/10/14/inference-meta-learning-part-i/ https://github.com/opencog/semantic-vision/wiki/About-the-SynerGAN-architecture