

The intertwined quest for understanding biological intelligence and creating artificial intelligence

In “The intertwined quest for understanding biological intelligence and creating artificial intelligence,” Surya Ganguli maps out his vision for a new research program that seeks to “unify the disciplines of neuroscience, psychology, cognitive science and AI.” He points to a handful of clues at the confluence of these disciplines and hints at untapped sources of insight waiting to be discovered by the observant explorer. Next Monday, at the Cognitively Informed Reading Group, we will survey some places where past treasure was found, such as temporal difference learning, wake-sleep, variational methods, memory networks and world models. We will then visit two outposts on the frontiers of computational neuroscience and social psychology where some strange new patterns are emerging... Join us on _Monday, February 11th at 11:30am in A.14_ to take part in this quest.

Required Reading

"The intertwined quest for understanding biological intelligence and creating artificial intelligence" (Ganguli, 2018):
https://hai.stanford.edu/news/the_intertwined_quest_for_understanding_biological_intelligence_and_creating_artificial_intelligence/

Suggested Reading

Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures (Bartunov et al., 2018): https://arxiv.org/pdf/1807.04587.pdf

Machine Theory of Mind (Rabinowitz et al., 2018): https://arxiv.org/pdf/1802.07740.pdf

Breandan Considine

February 11, 2019

Transcript

  1. The intertwined quest for understanding biological intelligence and creating artificial intelligence
     Surya Ganguli. Presentation by Breandan Considine.
  2. Revisiting some old and new ideas in intelligence
     Microdynamics: characterizing the low-level structure and function of intelligent systems
     ► Biologically plausible learning
     ► Weight transport problem and weight symmetry
     Macrodynamics: characterizing the high-level behavioral traits of intelligent systems
     ► Psychology, behaviorism and the origins of RL
     ► Model-based vs. model-free learning
     Themes to keep in mind: What aspects of intelligence does our model explain well? What aspects of intelligence does it fail to account for?
  3. Neural networks new and old
     How does something like this implement an algorithm like this? (figures on slide)
     “The recent excitement about neural networks” (Crick, 1989)
  4. Neural networks, revisited
     ✔ Loosely inspired by biological neural networks
     ✔ Structural similarity of biological circuits and RNNs
     ✔ Growing evidence that RNNs predict cortical activity
     ❌ Synchronous updates are not biologically plausible
     ❌ Feedback and feedforward weights must be the same
     ❌ Forward/backward passes use different computations
     ❌ Error gradients must be stored separately
     See “How Important Is Weight Symmetry in Backpropagation?” and “The weight transport problem” (Grossberg, 1987)
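To make the weight-transport complaint concrete, here is a minimal numpy sketch (not from the slides; dimensions and data are arbitrary) of a two-layer backward pass. The hidden-layer error signal reuses the very same matrix W2 that the forward pass used, which is exactly the symmetry biology is unlikely to provide.

```python
# Where "weight transport" shows up in backprop: the backward pass for the
# hidden layer reuses the forward matrix W2, so feedback weights must mirror
# feedforward weights exactly.
import numpy as np

rng = np.random.default_rng(0)
x  = rng.standard_normal(10)
W1 = rng.standard_normal((10, 32)) * 0.1
W2 = rng.standard_normal((32, 5)) * 0.1
t  = rng.standard_normal(5)

h = np.maximum(x @ W1, 0.0)      # forward pass uses W1, W2
y = h @ W2
e = y - t                        # output error (squared-error loss)

dh = (W2 @ e) * (h > 0)          # backward pass needs W2 again (transported)
grad_W1 = np.outer(x, dh)
grad_W2 = np.outer(h, e)
```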
  5. Feedback Alignment
     ✔ Random feedback weights support learning (Lillicrap, 2014)
     ► B^+ B = I, where B^+ is the Moore-Penrose pseudoinverse
     ► Input data has zero mean and unit variance
     ► Forward weights are initialized to zero
     ► Output layer weights are adapted to minimize error
     ✔ Feedback alignment also works in DNNs (Lillicrap, 2016)
     ✔ Any random matrix B is sufficient as long as, on average, e^T W B e > 0 (equivalent to requiring that the feedback signal Be and the backprop signal W^T e are within 90° of each other)
     ✔ Learning leads to better alignment between W and B
     Random synaptic feedback weights support error backpropagation for deep learning (Lillicrap et al., 2016)
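A minimal feedback-alignment sketch on a toy regression problem, assuming the standard formulation from Lillicrap et al. (2016): the only change from backprop is that the hidden-layer error is routed through a fixed random matrix B instead of W2 transposed. The problem setup and hyperparameters are made up; the printout just illustrates the "within 90°" condition.

```python
# Feedback alignment on a toy regression task (illustrative, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 10, 32, 5, 0.05

X = rng.standard_normal((256, n_in))             # zero-mean, unit-variance inputs
T = X @ rng.standard_normal((n_in, n_out))       # toy linear regression targets

W1 = rng.standard_normal((n_in, n_hid)) * 0.1    # forward weights (learned)
W2 = rng.standard_normal((n_hid, n_out)) * 0.1
B  = rng.standard_normal((n_out, n_hid)) * 0.1   # fixed random feedback weights

for step in range(501):
    H = np.maximum(X @ W1, 0.0)                  # forward pass (ReLU hidden layer)
    Y = H @ W2
    e = Y - T                                    # output error

    # Backprop would send back e @ W2.T; feedback alignment uses the fixed
    # random matrix B instead, avoiding weight transport.
    dH = (e @ B) * (H > 0)

    if step % 250 == 0:
        bp, fa = e @ W2.T, e @ B
        cos = (bp * fa).sum() / (np.linalg.norm(bp) * np.linalg.norm(fa) + 1e-12)
        print(f"step {step}: MSE={np.mean(e**2):.3f}  cos(BP signal, FA signal)={cos:.2f}")

    W2 -= lr * H.T @ e / len(X)
    W1 -= lr * X.T @ dH / len(X)
```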
  6. Direct Feedback Alignment
     ✔ Able to achieve zero training error
     ✔ Works in both deep and convolutional networks
     ✔ Competitive with backprop on CIFAR-(10, 100)?
     ❌ DFA assumes there is a global feedback path?
     ❌ Layers do not learn until prior layers are aligned
     ❌ Like backprop, relies on synchronous updates
     ❌ Difficulty scaling to more challenging datasets
     ❌ Requires delivery of signed error vectors
     Blocked Direct Feedback Alignment (Zarlenga & Niklasson, 2018)
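For contrast, a sketch of direct feedback alignment on a three-layer net, again with made-up sizes and data: each hidden layer receives the output error directly through its own fixed random matrix, rather than having errors relayed layer by layer.

```python
# Direct feedback alignment, sketched for a 3-layer net: every hidden layer
# gets the *output* error through its own fixed random matrix B_i.
import numpy as np

rng = np.random.default_rng(1)
sizes = [10, 64, 64, 5]                            # illustrative layer widths
Ws = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]
Bs = [rng.standard_normal((sizes[-1], n)) * 0.1 for n in sizes[1:-1]]   # fixed

X = rng.standard_normal((128, sizes[0]))
T = X @ rng.standard_normal((sizes[0], sizes[-1]))
lr = 0.05

for _ in range(300):
    H1 = np.maximum(X @ Ws[0], 0.0)
    H2 = np.maximum(H1 @ Ws[1], 0.0)
    Y = H2 @ Ws[2]
    e = Y - T

    dH2 = (e @ Bs[1]) * (H2 > 0)    # error arrives directly from the output...
    dH1 = (e @ Bs[0]) * (H1 > 0)    # ...so layer 1 need not wait for layer 2

    Ws[2] -= lr * H2.T @ e   / len(X)
    Ws[1] -= lr * H1.T @ dH2 / len(X)
    Ws[0] -= lr * X.T  @ dH1 / len(X)
```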
  7. Difference Target Propagation
     ► The output target is driven by the gradient of the global loss
     ► Roughly, this formalizes the notion of “invertibility”
     ► If f_i and g_i are linear mappings and g_i has a random weight matrix, DTP is equivalent to feedback alignment
     Difference Target Propagation (Lee et al., 2014)
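A compressed sketch of how DTP computes layer-local targets, assuming the formulation in Lee et al. (2014): each g_i is a learned approximate inverse of f_i, the top target is a gradient step on the global loss, and the "difference" term corrects for imperfect inversion. Training of the inverses g_i is omitted for brevity, and the layer sizes are arbitrary.

```python
# Difference target propagation, sketched with tanh layers.
import numpy as np

rng = np.random.default_rng(2)
sizes = [10, 32, 32, 5]
Wf = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]
Wg = [rng.standard_normal((n, m)) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]

f = lambda i, h: np.tanh(h @ Wf[i])   # forward map from layer i to layer i+1
g = lambda i, h: np.tanh(h @ Wg[i])   # learned approximate inverse of f (training omitted)

def dtp_targets(hs, loss_grad, alpha=0.1):
    """Propagate targets down from the top layer, with the difference correction."""
    targets = [None] * len(hs)
    targets[-1] = hs[-1] - alpha * loss_grad                 # gradient step at the output
    for i in reversed(range(1, len(hs) - 1)):
        # (hs[i] - g(i, hs[i+1])) compensates for g being only an approximate inverse
        targets[i] = hs[i] + g(i, targets[i + 1]) - g(i, hs[i + 1])
    return targets

# Toy usage: forward pass, then purely local squared-error updates per layer.
X = rng.standard_normal((64, sizes[0]))
T = rng.standard_normal((64, sizes[-1]))
hs = [X]
for i in range(len(Wf)):
    hs.append(f(i, hs[-1]))

targets = dtp_targets(hs, loss_grad=hs[-1] - T)
for i in range(len(Wf)):
    err = hs[i + 1] - targets[i + 1]                         # local error signal
    Wf[i] -= 0.01 * hs[i].T @ (err * (1 - hs[i + 1] ** 2)) / len(X)
```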
  8. Difference Target Propagation and SDTP
     Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures (Bartunov et al., 2018)
  9. Assessing Scalability of Biologically-Motivated Deep Learning
     ✔ DTP avoids weight transport by training a distinct set of feedback connections
     ✔ Errors guiding weight updates are computed locally with backward activities
     ❌ DTP requires explicit gradient computation for learning the output layer weights
     ❌ Seems to have difficulty scaling to larger datasets like ImageNet
     ❌ Untested on CNNs, ResNets, and architectures more complicated than MLPs
     See also Greedy Layerwise Learning Can Scale to ImageNet (Belilovsky, 2019)
     Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures (Bartunov et al., 2018)
  10. Training Neural Networks with Local Error Signals
      ► Similarity matching loss: elements of S(X) are populated by (equation on slide)
      ► Prediction loss: (equation on slide)
      ✔ Competitive with SOTA on CIFAR-(10, 100), Fashion-MNIST, Kuzushiji-MNIST…
      ✔ Connected to symmetric NMF; can be implemented using Hebbian learning
      Training Neural Networks with Local Error Signals (Nøkland et al., 2019)
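Since the loss definitions were figures on the slide, here is a rough sketch of the two local losses as I read Nøkland & Eidnes (2019): a similarity-matching term pushing the mini-batch's pairwise activation similarities toward the pairwise label similarities, plus a local prediction (cross-entropy) term from a per-layer linear readout. The paper's exact construction (auxiliary nets, normalization details) is simplified away, and all names below are illustrative.

```python
# Rough sketch of per-layer local losses: similarity matching + local prediction.
import numpy as np

def cosine_similarity_matrix(Z):
    Zn = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-8)
    return Zn @ Zn.T                       # (batch, batch) pairwise similarities

def local_losses(H, Y_onehot, W_local):
    """H: hidden activations, Y_onehot: labels, W_local: local linear readout."""
    sim_loss = np.mean((cosine_similarity_matrix(H)
                        - cosine_similarity_matrix(Y_onehot)) ** 2)
    logits = H @ W_local                   # local prediction loss (cross-entropy)
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    pred_loss = -np.mean(np.log((p * Y_onehot).sum(axis=1) + 1e-8))
    return sim_loss, pred_loss

# Toy usage with random activations and labels:
rng = np.random.default_rng(4)
H = rng.standard_normal((16, 20))
Y = np.eye(10)[rng.integers(0, 10, size=16)]
W_local = rng.standard_normal((20, 10)) * 0.1
print(local_losses(H, Y, W_local))
```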
  11. Reinforcement Learning and Classical Conditioning: the Rescorla-Wagner Model
      ✔ Learning only happens when a stimulus is unexpected
      ✔ Conditioned stimuli are summed to form a total prediction
      ✔ Predicts several anomalous features of animal learning
      ❌ Does not account for higher-order conditioning effects
      ❌ Does not account for the order of events within a trial
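The Rescorla-Wagner update is ΔV_i = α_i (λ − Σ_j V_j), applied to every stimulus present on a trial, so learning is driven by the summed prediction error. The toy simulation below (made-up parameters) reproduces blocking, one of the "anomalous" conditioning effects the model predicts.

```python
# Rescorla-Wagner on a toy "blocking" experiment: once stimulus A fully predicts
# the reward, pairing A+B adds almost nothing to B's association.
alpha, lam = 0.3, 1.0                 # learning rate and reward magnitude
V = {"A": 0.0, "B": 0.0}

for _ in range(50):                   # phase 1: A alone -> reward
    V["A"] += alpha * (lam - V["A"])

for _ in range(50):                   # phase 2: A+B compound -> reward
    error = lam - (V["A"] + V["B"])   # surprise is the summed prediction error
    V["A"] += alpha * error
    V["B"] += alpha * error

print(V)                              # V["B"] stays near 0: blocking
```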
  12. Temporal Difference Learning
      ► How do we predict a reward that depends on future values of some stimulus?
      ► This relation is only valid if the values are correct; otherwise there is a TD error (equation on slide)
      ► Substituting the TD error into the RW model, we recover Sutton & Barto’s original TD learning rule (equation on slide)
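The equations not captured in the transcript are the standard ones from Sutton (1988): the TD error δ = r + γV(s') − V(s) and the update V(s) ← V(s) + αδ. Below is a tabular TD(0) sketch on a made-up five-state chain.

```python
# Tabular TD(0) on a deterministic 5-state chain with reward at the end.
import numpy as np

n_states, gamma, alpha = 5, 0.9, 0.1
V = np.zeros(n_states)

for episode in range(2000):
    s = 0
    while s < n_states - 1:
        s_next = s + 1                        # deterministic chain s -> s+1
        r = 1.0 if s_next == n_states - 1 else 0.0
        v_next = 0.0 if s_next == n_states - 1 else V[s_next]
        delta = r + gamma * v_next - V[s]     # TD error
        V[s] += alpha * delta
        s = s_next

print(np.round(V, 3))   # V[s] approaches gamma ** (n_states - 2 - s)
```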
  13. Temporal difference learning
      ✔ Explains higher-order conditioning effects
      ✔ TD error predicts firing rate in VTA, SNc
      ✔ Can be augmented with eligibility traces
      ❌ Can converge to wrong value estimates
      ❌ How are we to get P(r|S_t) and P(S_{t+1}|S_t)?
      ❌ Need to know the dynamics of the world model
      ❌ Sensitive to the order of events
  14. Model-based vs. Model-free RL
      ✔ Close parallels with System 1 and System 2 thinking
      ► Thinking, Fast and Slow (Kahneman, 2011)
      ✔ Can we model the world robustly? At what cost?
      ✔ Evidence that planning evolved independently
  15. World Models
      ✔ Modeling the world from an egocentric PoV
      ✔ Similarities to work in hippocampal learning
      ✔ Incorporates planning and “experience replay”
      ✔ More sample efficient than direct RL
      ❌ Agent can learn to cheat in the world model
      ► A number of similar ideas are emerging in robotics planning (e.g. Bharadhwaj 2019, Faust 2018)
      Recurrent World Models Facilitate Policy Evolution (Ha & Schmidhuber, 2018)
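A very compressed, hypothetical sketch of the world-model loop: an encoder compresses the observation to a latent z, a recurrent memory predicts the next latent from (z, action), and a small controller acts on (z, h). All modules and dimensions here are stand-ins; the paper uses a VAE, an MDN-RNN, and a CMA-ES-trained controller. The "dream" rollout at the end is where an agent can learn to exploit errors in its own model.

```python
# Schematic world-model components and an imagined ("dreamed") rollout.
import torch
import torch.nn as nn

obs_dim, z_dim, h_dim, a_dim = 64, 8, 32, 2

encoder    = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
memory     = nn.GRUCell(z_dim + a_dim, h_dim)     # stand-in for the MDN-RNN
predict_z  = nn.Linear(h_dim, z_dim)              # next-latent prediction head
controller = nn.Linear(z_dim + h_dim, a_dim)      # tiny policy acting on (z, h)

obs = torch.randn(1, obs_dim)
h = torch.zeros(1, h_dim)
z = encoder(obs)

# "Dreaming": roll the learned model forward without touching the environment.
for t in range(10):
    a = torch.tanh(controller(torch.cat([z, h], dim=-1)))
    h = memory(torch.cat([z, a], dim=-1), h)
    z = predict_z(h)            # imagined next latent state
```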
  16. Machine Theory of Mind
      Testing for an understanding of other agents. Can ToMNet...
      1. Characterize various species of agents by observing their behavior?
      2. Predict S_{n+1} for various agents using its learned representation?
      3. Model agent species under partial observability?
      4. Identify false beliefs (also under partial observability)?
      5. Explain its theory of an agent’s internal beliefs?
      ❌ Does not consider the multi-agent setting
      ❌ The ToMNet observer has full observability
      Machine Theory of Mind (Rabinowitz et al., 2018)
  17. ToM: Characterizing Agent Behavior
      ► Testing for an understanding of other agents
      ► Multiple species of goal-directed agents
      ► Blind, sighted, stateless, sighted & stateless
      ► Each agent has different partial observability constraints
      ► ToMNet has full observability over the gridworld
      Machine Theory of Mind (Rabinowitz et al., 2018)
  18. Intrinsic Social Motivation
      ✔ Extends ToMNet into the multi-agent RL setting
      ✔ Agents reason about each other’s behavior
      ✔ Game-theoretic social dilemma tasks
      ✔ Reward for causal influence encourages cooperation
      ✔ Agents learn to communicate on an explicit channel
      ✔ Emergent communication without an explicit channel
      ✔ Connection to intrinsic motivation
      Intrinsic Social Motivation via Causal Influence in MARL (Jaques et al., 2018)
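A sketch of the causal-influence reward as I read Jaques et al. (2018): agent k is rewarded by how much its actual action shifts agent j's policy away from the counterfactual marginal obtained by averaging over the actions k could have taken. The policy_j and pi_k objects below are hypothetical stand-ins, not the paper's code.

```python
# Causal-influence intrinsic reward, sketched with toy policies.
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))

def influence_reward(policy_j, pi_k, actions_k, a_k, state):
    """policy_j(state, a_k) -> distribution over j's actions given k's action."""
    conditional = policy_j(state, a_k)                 # j's policy given k's actual action
    marginal = sum(pi_k[a] * policy_j(state, a)        # counterfactual marginal over
                   for a in actions_k)                 # k's alternative actions
    return kl(conditional, marginal)

# Toy usage: j mostly copies k's action, so k has high causal influence on j.
actions = [0, 1]
pi_k = {0: 0.5, 1: 0.5}
policy_j = lambda s, a_k: np.array([0.9, 0.1]) if a_k == 0 else np.array([0.1, 0.9])
print(influence_reward(policy_j, pi_k, actions, a_k=0, state=None))
```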
  19. What lessons have we learned?
      ► You get what you measure.
      ► If all we care about is performance on a single task, how useful is the model?
      ► DNNs alone may not be able to reproduce the full range of biological behaviors
      ► Intelligence is shaped by its environment; the experimental setup is important!
      ► Maybe we need a strong simulator to recreate intelligence in its natural setting
      ► Lacking realism, maybe biologically-derived architectures are good enough
      ► If so, we need to pay careful attention to the macro- and microdynamics
  20. References
      1. Learning to Predict by the Methods of Temporal Differences (Sutton, 1988)
      2. The wake-sleep algorithm for unsupervised neural networks (Hinton et al., 1995)
      3. Decision theory, reinforcement learning, and the brain (Dayan & Daw, 2008)
      4. Random feedback weights support learning in deep neural networks (Lillicrap et al., 2014)
      5. Compositional Inductive Biases in Function Learning (Schultz et al., 2016)
      6. Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net (Goyal et al., 2017)
      7. What insects can tell us about the origins of consciousness (Barron & Klein, 2016)
      8. A Personal Journey into Bayesian Networks (Pearl, 2018)
      9. Learning to Play With Intrinsically-Motivated, Self-Aware Agents (Haber et al., 2018)
      10. Intrinsic Social Motivation via Causal Influence in Multi-Agent RL (Jaques et al., 2018)
      11. Training Neural Networks with Local Error Signals (Nøkland & Eidnes, 2019)
      12. Neuroscience-Inspired Artificial Intelligence (Hassabis et al., 2017)