On the Rational Boundedness of Cognitive Control: Independent vs. Interactive Parallelism (Cosyne 2019)

Sebastian Musslick
Princeton Neuroscience Institute
Cosyne 2019 - Continual Learning in Biological and Artificial Neural Networks
Slides available at: https://speakerdeck.com/musslick

Cognitive control – reconfigure information processing away from default (automatic) settings (Cohen et al., 1990; Botvinick & Cohen, 2015)
[Figure labels: read email, follow talk]

Cognitive control is limited (Posner & Snyder, 1975; Shiffrin &
Schneider, 1977)

Capacity constraints on control allocation to multiple tasks
[Figure labels: read email, follow talk]

Bounds of cognitive control are…
§ A defining feature of cognitive control (Posner & Snyder, 1975; Shiffrin & Schneider, 1977)
§ A premise of general theories of cognition
  § ACT-R (Anderson, 1986; 2013)
  § EPIC (Meyer & Kieras, 1997)
  § SOAR (Laird, 2012)
  § Multi-Threaded Cognition (Salvucci & Taatgen, 2008, 2010)
  § Bounded Rationality (Simon, 1957)
§ An explanatory variable in recent models of control allocation
  § Opportunity Cost Model (Kurzban, Duckworth, Kable & Myers, 2013)
  § Expected Value of Control Theory (Shenhav, Botvinick & Cohen, 2013; Musslick, Shenhav, Botvinick & Cohen, 2015)
  § Value of Computation (Lieder & Griffiths, 2015; Lieder, Shenhav, Musslick & Griffiths, 2018)

Bounds of cognitive control are…
Structural limitations? Metabolic constraints?

Shared Representations vs. Separate Representations

Under which conditions can we multitask?

Name the color of the following stimulus and, at the same time, point to where it is…
Stimulus: BROWN

Stimulus: YELLOW

Point left if the written word is RED; point right if the written word is GREEN.
Stimulus: RED

Name the color of the following stimulus and, at the same time: point left if the written word is RED; point right if the written word is GREEN.
Stimulus: RED

Accuracy Results
[Figure: Accuracy by Task, Percent Correct (%), for three conditions: Color Naming + Location Pointing, Word Mapping, Color Naming + Word Mapping. Bar labels: 74, 87, 5.1.]
Anne Mennen, Abigail Novick

[Figure: network model (Cohen et al., 1990; Feng et al., 2014). Stimulus dimensions (color, word, location) project to an internal (hidden) representation and on to response dimensions (verbal, manual), with a hidden control signal and an output control signal.]

[Figure: the same network (Cohen et al., 1990; Feng et al., 2014), shown in three builds captioned in turn:]
§ multitasking is possible
§ multitasking is not possible
§ purpose of cognitive control is to limit interference

[Figure: the same network with its layers labeled: input representations (stimulus: color, word, location), task (internal) representations (hidden layer), output representations (response: verbal, manual), plus the hidden control signal and the output control signal.]

What is the maximum number of tasks that the network
can perform in parallel without interference?

Task Dependencies
[Figure: a bipartite task graph for tasks a, b, c next to the corresponding dependency graph over a, b, c, shown for several task configurations.]

Parallel Processing Capability
Maximum number of tasks that can be performed independently?
[Figure: bipartite task graph and dependency graph over tasks a, b, c, d, e, f, g, h, i, j.]

Parallel Processing Capability
Maximum number of tasks that can be performed independently?
An independent vertex set of a graph G is a subset of the vertices such that no two vertices in the subset are connected by an edge of G.
A maximum independent vertex set is an independent vertex set containing the largest possible number of vertices for a given graph.
[Figure: dependency graph with tasks a, j, g highlighted.]
(Musslick, Özcimder, Dey, Patwary, Willke & Cohen, CogSci, 2016)

Parallel Processing Capability
[Figure: bipartite task graph over tasks a, b, c, d, e, f, g, h, i, j next to its dependency graph, with tasks a, j, g highlighted.]
(Musslick, Özcimder, Dey, Patwary, Willke & Cohen, CogSci, 2016)
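The definition of a maximum independent vertex set can be made concrete with a brute-force search. This is a sketch for small graphs only, since finding a maximum independent set is NP-hard in general. The five-node graph below is hypothetical and is not the graph shown on the slide.

```python
from itertools import combinations


def maximum_independent_set(nodes, edges):
    """Brute-force search, trying the largest subsets first."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            # Independent: no two chosen vertices are joined by an edge.
            if all(frozenset(pair) not in edge_set for pair in combinations(subset, 2)):
                return set(subset)
    return set()


# Hypothetical dependency graph: a simple path a - b - c - d - e.
NODES = ["a", "b", "c", "d", "e"]
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]

MIS = maximum_independent_set(NODES, EDGES)
print(sorted(MIS), len(MIS))
```

In this reading, the size of the returned set corresponds to the largest number of tasks that could run in parallel without interference for the given dependency graph.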

Parallel processing capacity decreases as a function of
§ Overlap between task processing pathways (Feng et al., 2014; Musslick et al., 2016)

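Parallel processing capacity decreases as a function of
§ Network depth
[Figure: Maximum Induced Path plotted against Network Size, annotated with an expression of the form 2log(…); the symbols of the expression are not legible in the transcript.]
(Alon, Reichman, Shinkar, Wagner, Musslick, Cohen, Griffiths, Dey, Ozcimder, NIPS, 2017)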

[Figure: Multitasking Error (Mean Squared Error) for Independent vs. Dependent task combinations. A task graph is extracted from a trained neural network based on single task representations and used to predict performance for all possible multitasking combinations.]
(Musslick, Özcimder, Dey, Patwary, Willke & Cohen, CogSci, 2016)
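One way to read "extract based on single task representations" is to compare the hidden-layer activity each task evokes on its own and to treat highly similar tasks as sharing a representation. The sketch below follows that reading and is not the authors' procedure. The activation vectors, the use of Pearson correlation, and the 0.5 threshold are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length activation vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical mean hidden-unit activations when each task runs alone.
HIDDEN = {
    "task_1": [0.9, 0.8, 0.1, 0.0],
    "task_2": [0.8, 0.9, 0.0, 0.1],
    "task_3": [0.0, 0.1, 0.9, 0.8],
}
THRESHOLD = 0.5  # assumed cutoff for "shared representation"


def shared_pairs(hidden, threshold):
    """Task pairs whose single-task representations correlate above threshold."""
    return {
        (a, b)
        for a, b in combinations(sorted(hidden), 2)
        if pearson(hidden[a], hidden[b]) > threshold
    }


SHARED = shared_pairs(HIDDEN, THRESHOLD)
print(SHARED)
```

Pairs flagged this way could then serve as edges when building a task graph, from which multitasking combinations are classified as independent or dependent.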

Why Shared Representations?
§ Multi-Task Learning: Improved Learning Efficiency & Generalization Performance (e.g. Baxter, 1995; Caruana, 1997; Collobert & Weston, 2008; Bengio et al., 2013)
[Figure: a shared intermediate representation feeding the primary task output, auxiliary task 1 output, auxiliary task 2 output, …]

[Figure: two-task network with a hidden gating signal and an output gating signal.]
§ 2 x training signal
§ 1 task executable at a time

[Figure: two-task network with a hidden gating signal and an output gating signal.]
§ 1 x training signal
§ 2 tasks executable at a time

(iterations to learn)^2 ∝ multitasking capacity
[Figure: network with a hidden gating signal and an output gating signal; label: # of tasks sharing an input dimension.]
Andrew Saxe
(Musslick, Saxe, Dey, Özcimder, Henselman & Cohen, CogSci, 2017)

Neural Network Simulation
[Figure: network with stimulus input grouped into input dimensions, an internal (hidden) representation with a task input, and output grouped into response dimensions.]

Neural Network Simulation
[Figure: two tasks that rely on different input features.]

Neural Network Simulation
§ Tasks rely on different input features
§ Tasks rely on the same input features
§ Tasks rely on partially overlapping input features
[Figure: Learned Task Correlation and Multitasking Accuracy (%) plotted against Feature Overlap (correlation between task features).]
(Musslick, Saxe, Dey, Özcimder, Henselman & Cohen, CogSci, 2017)

Neural Network Simulation
[Figure: Iterations Required To Train and Multitasking Accuracy (%) plotted against Initial Task Correlation.]
(Musslick, Saxe, Dey, Özcimder, Henselman & Cohen, CogSci, 2017)

Neural Network Simulation
[Figure: Iterations Required To Train and Multitasking Accuracy (%) plotted against Initial Task Correlation, and against Initial Task Similarity for 100% Feature Overlap, 80% Feature Overlap and 0% Feature Overlap.]
(Musslick, Saxe, Dey, Özcimder, Henselman & Cohen, CogSci, 2017)

Separating representations is optimal if
a) Time cost for serialization is high
b) There is little benefit for shared representations in terms of learning efficiency
[Figure: P(separate representations).]
(Sagiv, Musslick, Niv & Cohen, CogSci, 2018)

I. Increased multitasking capacity due to automatization ('reflex action'), not due to enhanced mental capacity
II. Multitasking capability must be domain-specific
Thea Alba

Multidimensional Scaling

Multitasking Training Study
[Figure: network with stimulus dimensions color, word, location and response dimensions verbal, manual; tasks labeled: word reading, word pointing.]
Abigail Novick

(iterations to learn)^2 ∝ multitasking capacity
Ø Shared representations promote learning efficiency
Ø Parallel processing capacity drops precipitously with the amount of shared representation and can be virtually invariant to network size
Tradeoff
Ø Improvements in parallel processing performance achieved by pattern separation

Thank you!
Jonathan Cohen, Ted Willke, Biswadip Dey, Kayhan Ozcimder, Andrew Saxe, Abigail Novick, Anne Mennen, Penina Krieger, Yotam Sagiv, Sachin Ravi, Daniel Reichman, Giovanni Petri & many others
