On the Rational Boundedness of Cognitive Control: Independent vs. Interactive Parallelism (Cosyne 2019)

One of the most compelling characteristics of controlled processing is how limited we are in exercising it. These limitations form one of the most basic and influential tenets of cognitive psychology: controlled processing relies on a central, limited-capacity processing mechanism that imposes seriality on control-dependent processes. In the first part of this talk, I present a challenge to this view that distinguishes control-dependent and automatic processing by their reliance on shared vs. separated (task-dedicated) representations. Specifically, I propose that control functions to avert conflicting use of representations shared by multiple processes. That is, constraints on the use of control arise as a rational response to the shared use of representations, rather than from the control mechanism itself. I use graph-theoretic methods to formalize this theory, and show that the multitasking capability of a network architecture drops precipitously with an increase in shared representations, and is virtually invariant to network size. This raises an important question: insofar as shared representation introduces the risk of cross-talk and thereby limitations in multitasking, why would the brain prefer shared task representations over separate ones? In computational simulations and behavioral experiments, I demonstrate a tradeoff between learning efficiency, promoted by shared representations, and multitasking capability, best achieved via separated representations. The commonly observed trajectory from controlled to automatic processing during learning may therefore reflect a rational optimization of this tradeoff: shared representations initially afford a bias toward efficient learning in novel task environments at the expense of seriality and control-dependence; but experience in environments where multitasking affords sufficient advantage ultimately promotes acquisition of separated, task-dedicated representations.

Sebastian Musslick

March 04, 2019

Transcript

  1. Sebastian Musslick Princeton Neuroscience Institute Cosyne 2019 - Continual Learning

    in Biological and Artificial Neural Networks Slides available at: https://speakerdeck.com/musslick
  2. Cognitive control – reconfigure information processing away from default (automatic)

    settings (Cohen et al., 1990; Botvinick & Cohen, 2015) read email follow talk
  3. read email follow talk Capacity constraints on control allocation to

    multiple tasks Cognitive control is limited (Posner & Snyder, 1975; Shiffrin & Schneider, 1977)
  4. Bounds of cognitive control are… § A defining feature of

    cognitive control (Posner & Snyder, 1975; Shiffrin & Schneider, 1977) § A premise of general theories of cognition § ACT-R (Anderson, 1986; 2013) § EPIC (Meyer & Kieras, 1997) § SOAR (Laird, 2012) § Multi-Threaded Cognition (Salvucci & Taatgen, 2008, 2010) § Bounded Rationality (Simon, 1957) § An explanatory variable in recent models of control allocation § Opportunity Cost Model (Kurzban, Duckworth, Kable & Myers, 2013) § Expected Value of Control Theory (Shenhav, Botvinick & Cohen, 2013; Musslick, Shenhav, Botvinick & Cohen, 2015) § Value of Computation (Lieder & Griffiths, 2015; Lieder, Shenhav, Musslick & Griffiths, 2018)
  5. Bounds of cognitive control are… § A defining feature of

    cognitive control (Posner & Snyder, 1975; Shiffrin & Schneider, 1977) § A premise of general theories of cognition § ACT-R (Anderson, 1986; 2013) § EPIC (Meyer & Kieras, 1997) § SOAR (Laird, 2012) § Multi-Threaded Cognition (Salvucci & Taatgen, 2008, 2010) § An explanatory variable in recent models of control allocation § Opportunity Cost Model (Kurzban, Duckworth, Kable & Myers, 2013) § Expected Value of Control Theory (Shenhav, Botvinick & Cohen, 2013; Musslick, Shenhav, Botvinick & Cohen, 2015) § Value of Computation (Lieder & Griffiths, 2015; Lieder, Shenhav, Musslick & Griffiths, 2018) Structural limitations? Metabolic constraints?
  6. Name the color of the following stimulus and, at the

    same time, point to where it is… BROWN
  7. point left if the written word is RED point right

    if the written word is GREEN RED
  8. RED

  9. RED

  10. Name the color of the following stimulus and, at the

    same time: point left if the written word is RED point right if the written word is GREEN RED
  11. RED

  12. Accuracy Results

    Accuracy by Task (percent correct). Conditions, in order: Color Naming + Location Pointing (first part), Word Mapping (second part), Color Naming + Word Mapping (third part). Bar labels, in order: 74, 87, 5.1. Anne Mennen, Abigail Novick
  13. (Cohen et al., 1990 ; Feng et al., 2014) verbal

    manual response color word location stimulus internal (hidden) representation
  14. (Cohen et al., 1990 ; Feng et al., 2014) verbal

    manual response color word location stimulus internal (hidden) representation
  15. hidden control signal output control signal color word location verbal

    manual (Cohen et al., 1990 ; Feng et al., 2014) stimulus internal (hidden) representation response
  16. hidden control signal (Cohen et al., 1990 ; Feng et

    al., 2014) stimulus internal (hidden) representation response color word location output control signal verbal manual
  17. hidden control signal (Cohen et al., 1990 ; Feng et

    al., 2014) stimulus internal (hidden) representation response color word location output control signal verbal manual multitasking is possible
  18. hidden control signal (Cohen et al., 1990 ; Feng et

    al., 2014) stimulus internal (hidden) representation response color word location output control signal verbal manual multitasking is not possible
  19. hidden control signal (Cohen et al., 1990 ; Feng et

    al., 2014) stimulus internal (hidden) representation response color word location output control signal verbal manual purpose of cognitive control is to limit interference
  20. hidden control signal output control signal color word location verbal

    manual stimulus internal (hidden) representation response output representations task (internal) input representations
  21. What is the maximum number of tasks that the network

    can perform in parallel without interference?
  22. What is the maximum number of tasks that the network

    can perform in parallel without interference?
  23. Task Dependencies

    [Figure: bipartite task graph and the corresponding dependency graph for tasks a, b, c]
  24. Parallel Processing Capability Maximum number of tasks that can be

    performed independently? [Figure: bipartite task graph and dependency graph over tasks a to j]
  25. Parallel Processing Capability An independent vertex set of a graph

    G is a subset of the vertices such that no two vertices in the subset are connected by an edge of G. A maximum independent vertex set is an independent vertex set containing the largest possible number of vertices for a given graph. a j g Maximum number of tasks that can be performed independently? dependency graph (Musslick, Özcimder, Dey, Patwary, Willke & Cohen, CogSci, 2016)
  26. Parallel Processing Capability An independent vertex set of a graph

    G is a subset of the vertices such that no two vertices in the subset are connected by an edge of G. A maximum independent vertex set is an independent vertex set containing the largest possible number of vertices for a given graph. Maximum number of tasks that can be performed independently? bipartite task graph dependency graph a b c d e f g h i j a j g (Musslick, Özcimder, Dey, Patwary, Willke & Cohen, CogSci, 2016)
  27. Parallel processing capacity decreases as a function of § Overlap

    between task processing pathways (Feng et al., 2014; Musslick et al., 2016)
  28. Parallel processing capacity decreases as a function of § Network

    depth [Plot: Maximum Induced Path as a function of Network Size; the plotted bound, of the form 2log(…), is not recoverable from the extraction] (Alon, Reichman, Shinkar, Wagner, Musslick, Cohen, Griffiths, Dey, Ozcimder, NIPS, 2017)
  29. Multitasking Error (Mean Squared Error)

    Extract a task graph from a trained neural network based on single task representations; predict performance for all possible multitasking combinations. [Plot: Mean Squared Error for Independent vs. Dependent task combinations] (Musslick, Özcimder, Dey, Patwary, Willke & Cohen, CogSci, 2016)
  30. Why Shared Representations? § Multi-Task Learning: Improved Learning Efficiency &

    Generalization Performance (e.g. Baxter, 1995; Caruana, 1997; Collobert & Weston, 2008; Bengio et al., 2013) shared intermediate representation auxiliary task 1 output auxiliary task 2 output primary task output …
  31. 2 tasks executable at a time !" !# output gating

    signal $" hidden gating signal $#
  32. ∝ multitasking capacity # of tasks sharing an input dimension

    Andrew Saxe "# "$ %$ %# hidden gating signal output gating signal (Musslick, Saxe, Dey, Özcimder, Henselman & Cohen, CogSci, 2017) iterations to learn2
  33. Neural Network Simulation stimulus stimulus !" !# !$ %" %#

    %$ internal (hidden) representation task … stimulus grouped into input dimensions output grouped into response dimensions
  34. Neural Network Simulation

    Three conditions: tasks rely on different input features; tasks rely on the same input features; tasks rely on partially overlapping input features. [Plot: Learned Task Correlation and Multitasking Accuracy (%) as a function of Feature Overlap (correlation between task features)] (Musslick, Saxe, Dey, Özcimder, Henselman & Cohen, CogSci, 2017)
  35. Neural Network Simulation

    [Plot: Iterations Required To Train and Multitasking Accuracy (%) as a function of Initial Task Correlation] (Musslick, Saxe, Dey, Özcimder, Henselman & Cohen, CogSci, 2017)
  36. Neural Network Simulation

    [Plots: Iterations Required To Train and Multitasking Accuracy (%) as a function of Initial Task Correlation; the same measures as a function of Initial Task Similarity for 100%, 80%, and 0% Feature Overlap] (Musslick, Saxe, Dey, Özcimder, Henselman & Cohen, CogSci, 2017)
  37. Separating representations is optimal if

    a) Time cost for serialization is high b) There is little benefit for shared representations in terms of learning efficiency [Plot: P(separate representations)] (Sagiv, Musslick, Niv & Cohen, CogSci, 2018)
  38. I. Increased multitasking capacity due to automatization ('reflex action'), not

    due to enhanced mental capacity II. Multitasking capability must be domain-specific Thea Alba
  39. ∝ multitasking capacity iterations to learn2 Ø Shared representations promote

    learning efficiency Ø Parallel processing capacity drops precipitously with the amount of shared representation and can be virtually invariant to network size Tradeoff Ø Improvements in parallel processing performance achieved by pattern separation
  40. Jonathan Cohen Ted Willke & many others Biswadip Dey Kayhan

    Ozcimder Andrew Saxe Abigail Novick Anne Mennen Penina Krieger Yotam Sagiv Sachin Ravi Daniel Reichman Giovanni Petri Thank you!
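
The definition on slides 25 and 26 (parallel processing capability as the size of a maximum independent vertex set of the dependency graph) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the author's implementation: the task names, the `TASKS` mapping loosely modeled on the color/word/location demonstration, the function names, and in particular the interference rule (two tasks conflict if they share an input or an output component, or if some third task links the input of one to the output of the other). The search is brute force and only suitable for small graphs.

```python
from itertools import combinations

# Hypothetical bipartite task graph: task name -> (input component, output component).
TASKS = {
    "color_naming": ("color", "verbal"),
    "word_reading": ("word", "verbal"),
    "word_mapping": ("word", "manual"),
    "location_pointing": ("location", "manual"),
}


def dependency_graph(tasks):
    """Return task name -> set of tasks it interferes with (assumed rule)."""
    edges = set(tasks.values())  # all input -> output pathways in the network
    conflicts = {name: set() for name in tasks}
    for a, b in combinations(tasks, 2):
        in_a, out_a = tasks[a]
        in_b, out_b = tasks[b]
        shares_component = in_a == in_b or out_a == out_b
        # A third pathway from one task's input to the other's output
        # lets the two tasks interfere even without a shared component.
        cross_pathway = (in_a, out_b) in edges or (in_b, out_a) in edges
        if shares_component or cross_pathway:
            conflicts[a].add(b)
            conflicts[b].add(a)
    return conflicts


def max_independent_set(conflicts):
    """Brute-force search for a largest set of mutually non-conflicting tasks."""
    names = sorted(conflicts)
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if all(b not in conflicts[a] for a, b in combinations(subset, 2)):
                return set(subset)
    return set()


conflicts = dependency_graph(TASKS)
best = max_independent_set(conflicts)
print(sorted(best), len(best))
```

Under these assumptions, color naming and location pointing form an independent set, whereas color naming and word mapping conflict because of the word-to-verbal pathway, which mirrors the contrast between slides 17 and 18.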