
The intertwined quest for understanding biological intelligence and creating artificial intelligence

In “The intertwined quest for understanding biological intelligence and creating artificial intelligence,” Surya Ganguli maps out his vision for a new research program that seeks to “unify the disciplines of neuroscience, psychology, cognitive science and AI.” He points to a handful of clues at the confluence of these disciplines and hints at untapped sources of insight waiting to be discovered by the observant explorer. Next Monday, at the Cognitively Informed Reading Group, we will survey some places where past treasure was found, such as temporal difference learning, wake-sleep, variational methods, memory networks and world models. We will then visit two outposts on the frontiers of computational neuroscience and social psychology where some strange new patterns are emerging... Join us on _Monday, February 11th at 11:30am in A.14_ to take part in this quest.

Required Reading

"The intertwined quest for understanding biological intelligence and creating artificial intelligence" (Ganguli, 2018):
https://hai.stanford.edu/news/the_intertwined_quest_for_understanding_biological_intelligence_and_creating_artificial_intelligence/

Suggested Reading

Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures (Bartunov et al., 2018): https://arxiv.org/pdf/1807.04587.pdf

Machine Theory of Mind (Rabinowitz et al., 2018): https://arxiv.org/pdf/1802.07740.pdf

Breandan Considine

February 11, 2019



Transcript

  1. The intertwined quest for understanding biological intelligence and creating artificial intelligence
     Surya Ganguli
     Presentation by Breandan Considine
  2. Revisiting some old and new ideas in intelligence
     Microdynamics: characterizing the low-level structure and function of intelligent systems
     ► Biologically plausible learning
     ► The weight transport problem and weight symmetry
     Macrodynamics: characterizing the high-level behavioral traits of intelligent systems
     ► Psychology, behaviorism and the origins of RL
     ► Model-based vs. model-free learning
     Themes to keep in mind: What aspects of intelligence does our model explain well? What aspects does it fail to account for?
  3. Neural networks new and old
     How does something like this implement an algorithm like this?
     “The recent excitement about neural networks” (Crick, 1989)
  4. Neural networks, revisited
     ✔ Loosely inspired by biological neural networks
     ✔ Structural similarity between biological circuits and RNNs
     ✔ Growing evidence that RNNs predict cortical activity
     ❌ Synchronous updates are not biologically plausible
     ❌ Feedback and feedforward weights must be the same
     ❌ Forward/backward passes use different computations
     ❌ Error gradients must be stored separately
     “How Important Is Weight Symmetry in Backpropagation?”
     “The weight transport problem” (Grossberg, 1987)
  5. Feedback Alignment
     ✔ Random feedback weights support learning (Lillicrap, 2014)
     ► B+B = I, where B+ is the Moore-Penrose pseudoinverse
     ► Input data has zero mean and unit variance
     ► Forward weights are initialized to zero
     ► Output layer weights are adapted to minimize error
     ✔ Feedback alignment also works in DNNs (Lillicrap, 2016)
     ✔ Any random matrix B is sufficient as long as, on average, e^T W B e > 0 (equivalent to requiring that the feedback signal Be and the backprop signal W^T e are within 90° of each other)
     ✔ Learning leads to better alignment between W and B
     Random synaptic feedback weights support error backpropagation for deep learning (Lillicrap et al., 2016)
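
To make the feedback-alignment mechanism concrete, here is a minimal NumPy sketch (not from the deck): a one-hidden-layer network in which the backward pass routes the output error through a fixed random matrix B instead of W2.T. The data, dimensions and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task (assumed for illustration): zero-mean, unit-variance inputs,
# targets produced by a fixed random linear map.
X = rng.standard_normal((256, 30))
T = rng.standard_normal((30, 10))
Y = X @ T

# One hidden layer: x -> h = tanh(x W1) -> y = h W2
W1 = rng.standard_normal((30, 64)) * 0.1
W2 = rng.standard_normal((64, 10)) * 0.1
B  = rng.standard_normal((10, 64)) * 0.1   # fixed random feedback matrix, replaces W2.T

lr = 0.02
for step in range(2000):
    H = np.tanh(X @ W1)
    Y_hat = H @ W2
    E = Y_hat - Y                          # output error e

    # Feedback alignment: deliver the error through B rather than W2.T,
    # so no weight transport is required. Backprop would use E @ W2.T here.
    dH = (E @ B) * (1.0 - H ** 2)

    W2 -= lr * H.T @ E / len(X)
    W1 -= lr * X.T @ dH / len(X)

print("final MSE:", float(np.mean(E ** 2)))
```

Over training, the forward weights tend to align with the fixed feedback matrix, which is the sense in which "learning leads to better alignment between W and B."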
  6. Direct Feedback Alignment
     Blocked Direct Feedback Alignment (Zarlenga & Niklasson, 2018)
     ✔ Able to achieve zero training error
     ✔ Works in both deep and convolutional networks
     ✔ Competitive with backprop on CIFAR-(10, 100)?
     ❌ DFA assumes there is a global feedback path?
     ❌ Layers do not learn until prior layers are aligned
     ❌ Like backprop, relies on synchronous updates
     ❌ Difficulty scaling to more challenging datasets
     ❌ Requires delivery of signed error vectors
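
For contrast with the feedback-alignment sketch above, here is a minimal sketch of direct feedback alignment (an illustrative assumption, not code from the deck): the signed output error is delivered directly to every hidden layer through that layer's own fixed random matrix, rather than being propagated layer by layer.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((256, 30))
Y = rng.standard_normal((256, 10))          # toy targets, assumed for illustration

# Two hidden layers; B1 and B2 are fixed random feedback matrices that
# project the output error directly onto each hidden layer (the "global feedback path").
W1 = rng.standard_normal((30, 64)) * 0.1
W2 = rng.standard_normal((64, 64)) * 0.1
W3 = rng.standard_normal((64, 10)) * 0.1
B1 = rng.standard_normal((10, 64)) * 0.1
B2 = rng.standard_normal((10, 64)) * 0.1

lr = 0.02
for step in range(1000):
    H1 = np.tanh(X @ W1)
    H2 = np.tanh(H1 @ W2)
    E = H2 @ W3 - Y                          # signed output error

    dH2 = (E @ B2) * (1.0 - H2 ** 2)         # error sent straight to layer 2
    dH1 = (E @ B1) * (1.0 - H1 ** 2)         # ...and straight to layer 1

    W3 -= lr * H2.T @ E / len(X)
    W2 -= lr * H1.T @ dH2 / len(X)
    W1 -= lr * X.T @ dH1 / len(X)
```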
  7. Difference Target Propagation
     ► The output target is driven by the gradient of the global loss
     ► Roughly, this formalizes the notion of “invertibility”
     ► If f_i and g_i are linear mappings and g_i uses a random matrix, DTP is equivalent to feedback alignment
     Difference Target Propagation (Lee et al., 2014)
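
For reference, my reconstruction of the standard difference target propagation formulation (Lee et al., 2014), not copied from the slide: each layer receives a target that corrects the approximate inverse g_i by a reconstruction error, and the layer's weights are trained to move its activity toward that target.

```latex
% h_i = f_i(h_{i-1}): forward activity; g_i: learned approximate inverse mapping
% layer-(i+1) activity back to layer i.
\hat{h}_i = h_i + g_i\!\left(\hat{h}_{i+1}\right) - g_i\!\left(h_{i+1}\right)
% Local layer loss: push the layer's output toward its target.
L_i = \bigl\lVert f_i(h_{i-1}) - \hat{h}_i \bigr\rVert_2^2
```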
  8. Difference Target Propagation and SDTP
     Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures (Bartunov et al., 2018)
  9. Assessing the Scalability of Biologically-Motivated Deep Learning
     ✔ DTP avoids weight transport by training a distinct set of feedback connections
     ✔ Errors guiding weight updates are computed locally from backward activities
     ❌ DTP requires explicit gradient computation for learning the output layer weights
     ❌ Seems to have difficulty scaling to larger datasets like ImageNet
     ❌ Untested on CNNs, ResNets, and architectures more complicated than MLPs
     See also: Greedy Layerwise Learning Can Scale to ImageNet (Belilovsky, 2019)
     Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures (Bartunov et al., 2018)
  10. Training Neural Networks with Local Error Signals
     ► Combines a similarity matching loss (over the elements of S(X)) with a prediction loss
     ✔ Competitive with SOTA on CIFAR-(10, 100), Fashion-MNIST, Kuzushiji-MNIST…
     ✔ Connected to symmetric NMF; can be implemented using Hebbian learning
     Training Neural Networks with Local Error Signals (Nøkland et al., 2019)
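
A rough sketch of a layer-local similarity-matching loss in the spirit of Nøkland & Eidnes (2019). The pairwise cosine similarities and mean-squared difference below are my assumptions for illustration; the exact definitions of S(X) and the prediction loss are in the paper.

```python
import numpy as np

def similarity_matrix(X):
    """Pairwise cosine similarities between the rows of X (assumed similarity measure)."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    return Xn @ Xn.T

def local_similarity_loss(H, Y_onehot):
    """Layer-local loss: match the similarity structure of the hidden
    activations H to the similarity structure of the one-hot labels."""
    return float(np.mean((similarity_matrix(H) - similarity_matrix(Y_onehot)) ** 2))

# Toy usage: a batch of 8 examples with 16-dimensional activations and 3 classes.
rng = np.random.default_rng(0)
H = rng.standard_normal((8, 16))
Y = np.eye(3)[rng.integers(0, 3, size=8)]
print(local_similarity_loss(H, Y))
```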
  11. Reinforcement Learning and Classical Conditioning: the Rescorla-Wagner Model
     ✔ Learning only happens when a stimulus is unexpected
     ✔ Conditioned stimuli summed to form a total prediction
     ✔ Predicts several anomalous features of animal learning
     ❌ Does not account for higher-order conditioning effects
     ❌ Does not account for the order of events within a trial
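
For reference, the textbook form of the Rescorla-Wagner update (not transcribed from the slide): the associative strength of each conditioned stimulus present on a trial changes in proportion to the prediction error between the outcome and the summed prediction.

```latex
% V_j: associative strength of stimulus j; \lambda: outcome (US magnitude) on the trial;
% \alpha_i, \beta: stimulus- and outcome-dependent learning rates.
\Delta V_i = \alpha_i \beta \Bigl( \lambda - \sum_{j \in \text{trial}} V_j \Bigr)
```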
  12. Temporal Difference Learning
     ► How do we predict a reward that depends on future values of some stimulus?
     ► This relation is only valid if the value estimates are correct; otherwise there is a TD error
     ► Substituting the TD error into the Rescorla-Wagner model recovers Sutton & Barto’s original TD learning rule
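
The equations on the slide are not captured in the transcript; in standard TD(0) notation (a textbook reconstruction, not the slide's own formulas) the self-consistency relation, the TD error that measures its violation, and the resulting update are:

```latex
% Values are self-consistent only when V(S_t) matches the expected discounted return:
V(S_t) \approx \mathbb{E}\left[ r_t + \gamma V(S_{t+1}) \right]
% TD error when the estimates are not yet correct:
\delta_t = r_t + \gamma V(S_{t+1}) - V(S_t)
% Rescorla-Wagner-style update with the TD error as the prediction error:
V(S_t) \leftarrow V(S_t) + \alpha \, \delta_t
```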
  13. Temporal difference learning
     ✔ Explains higher-order conditioning effects
     ✔ TD error predicts firing rates in VTA and SNc
     ✔ Can be augmented with eligibility traces
     ❌ Can converge to wrong value estimates
     ❌ How are we to get P(r | S_t) and P(S_t+1 | S_t)?
     ❌ Needs to know the dynamics of the world model
     ❌ Sensitive to the order of events
  14. Model-based vs. Model-free RL
     ✔ Close parallels with System 1 and System 2 thinking
     ► Thinking, Fast and Slow (Kahneman, 2011)
     ✔ Can we model the world robustly? At what cost?
     ✔ Evidence that planning evolved independently
  15. World Models
     ✔ Modeling the world from an egocentric PoV
     ✔ Similarities to work on hippocampal learning
     ✔ Incorporates planning and “experience replay”
     ✔ More sample efficient than direct RL
     ❌ The agent can learn to cheat in the world model
     ► A number of similar ideas are emerging in robotics planning (e.g. Bharadhwaj 2019, Faust 2018)
     Recurrent World Models Facilitate Policy Evolution (Ha & Schmidhuber, 2018)
  16. Machine Theory of Mind
     Testing for an understanding of other agents. Can ToMNet...
     1. Characterize various species of agents by observing their behavior?
     2. Predict S_n+1 for various agents using its learned representation?
     3. Model agent species under partial observability?
     4. Identify false beliefs (also in the POMDP setting)?
     5. Explain its theory of an agent’s internal beliefs?
     ❌ Does not consider the multi-agent setting
     ❌ The ToMNet observer has full observability
     Machine Theory of Mind (Rabinowitz et al., 2018)
  17. ToM: Characterizing Agent Behavior
     ► Testing for an understanding of other agents
     ► Multiple species of goal-directed agents
     ► Blind, sighted, stateless, sighted & stateless
     ► Agents have different partial observability constraints
     ► ToMNet has full observability over the gridworld
     Machine Theory of Mind (Rabinowitz et al., 2018)
  18. Intrinsic Social Motivation
     ✔ Extends ToMNet into the multi-agent RL setting
     ✔ Agents reason about each other’s behavior
     ✔ Game-theoretic social dilemma tasks
     ✔ Reward for causal influence encourages cooperation
     ✔ Agents learn to communicate on an explicit channel
     ✔ Emergent communication without an explicit channel
     ✔ Connection to intrinsic motivation
     Intrinsic Social Motivation via Causal Influence in MARL (Jaques et al., 2018)
  19. What lessons have we learned?
     ► You get what you measure.
     ► If all we care about is performance on a single task, how useful is the model?
     ► DNNs alone may not be able to reproduce the full range of biological behaviors
     ► Intelligence is shaped by environment: the experimental setup is important!
     ► Maybe we need a strong simulator to recreate intelligence in its natural setting
     ► Lacking realism, maybe biologically-derived architectures are good enough
     ► If so, we need to pay careful attention to the macro- and microdynamics
  20. References
     1. Learning to Predict by the Methods of Temporal Differences (Sutton, 1988)
     2. The wake-sleep algorithm for unsupervised neural networks (Hinton et al., 1995)
     3. Decision theory, reinforcement learning, and the brain (Dayan & Daw, 2008)
     4. Random feedback weights support learning in deep neural networks (Lillicrap et al., 2014)
     5. Compositional Inductive Biases in Function Learning (Schulz et al., 2016)
     6. Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net (Goyal et al., 2017)
     7. What insects can tell us about the origins of consciousness (Barron & Klein, 2016)
     8. A Personal Journey into Bayesian Networks (Pearl, 2018)
     9. Learning to Play With Intrinsically-Motivated, Self-Aware Agents (Haber et al., 2018)
     10. Intrinsic Social Motivation via Causal Influence in Multi-Agent RL (Jaques et al., 2018)
     11. Training Neural Networks with Local Error Signals (Nøkland & Eidnes, 2019)
     12. Neuroscience-Inspired Artificial Intelligence (Hassabis et al., 2017)