Slide 1

Building and Replicating Models of Visual Search Behavior with Tensorflow and the Scientific Python Stack
David Nicholson
Emory University, Biology, Prinz lab
NickleDave @nicholdav

Slide 2

Acknowledgements
● Atlanteans
● DARPA Lifelong Learning Machines (L2M) program
Constantine Dovrolis, Zsolt Kira, Sarah Pallas, Astrid Prinz

Slide 3

Acknowledgements

Slide 4

Introduction
Visual search:
● in the real world
http://info.ni.tu-berlin.de/photodb/

Slide 5

Introduction
Visual search:
● in the real world
● in the lab

Slide 6

Introduction
Why build models of visual search behavior?
1. understand brain mechanisms of goal-driven perception

Slide 7

Introduction
Why build models of visual search behavior?
1. understand brain mechanisms of goal-driven perception
   ○ Does the model we build with this mechanism behave like humans and other animals?

Slide 8

Introduction
Why build models of visual search behavior?
1. understand brain mechanisms of goal-driven perception
   ○ Does the model we build with this mechanism behave like humans and other animals?
2. design artificial intelligence algorithms that draw from these mechanisms
   ○ Does our agent behave like humans and other animals?

Slide 9

Introduction The discrete item display visual search task

Slide 10

Introduction The discrete item display visual search task
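
To make the task concrete, here is a minimal NumPy sketch of the kind of stimulus used: a grid of discrete items where distractors share features and a single target differs, with "set size" given by the number of items. This is only an illustration, not the searchstims package used later in the talk; the colors, grid size, and bar shape are arbitrary choices.

```python
import numpy as np

def make_display(set_size, target_present, grid=5, cell=32, rng=None):
    """Toy discrete item display: colored vertical bars on a grey background.
    Distractors are green bars; the target (if present) is a red bar.
    Illustrative only, not the searchstims API."""
    rng = rng or np.random.default_rng()
    img = np.full((grid * cell, grid * cell, 3), 128, dtype=np.uint8)
    # pick `set_size` distinct grid cells to hold items
    cells = rng.choice(grid * grid, size=set_size, replace=False)
    for i, c in enumerate(cells):
        row, col = divmod(c, grid)
        color = (255, 0, 0) if (target_present and i == 0) else (0, 255, 0)
        y, x = row * cell + cell // 2, col * cell + cell // 2
        img[y - 8:y + 8, x - 2:x + 2] = color  # draw a small vertical bar
    return img

stim = make_display(set_size=8, target_present=True)
```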

Slide 11

Introduction
Models of the discrete item display visual search task
Models of capacity limitations:
● serial, attention-limited - e.g. Guided Search
● parallel, noise-limited - Signal Detection Theory-based models
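
For the parallel, noise-limited account, a toy Signal Detection Theory "max rule" observer makes the capacity story explicit: every item contributes an independent noisy sample, and accuracy on present/absent judgments falls as set size grows even though all items are processed in parallel. A minimal sketch follows; the d' and criterion values are arbitrary, chosen only to show the trend, and this is not any specific published model's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sdt_max_rule_accuracy(set_size, d_prime=2.0, criterion=1.5, n_trials=100_000):
    """Proportion correct for a parallel, noise-limited (max-rule) observer:
    respond "present" if the largest of `set_size` noisy samples exceeds the criterion."""
    # target-absent trials: every item is a distractor (standard normal noise)
    absent = rng.normal(size=(n_trials, set_size))
    # target-present trials: one item's sample is shifted by d'
    present = rng.normal(size=(n_trials, set_size))
    present[:, 0] += d_prime
    hit_rate = (present.max(axis=1) > criterion).mean()
    correct_rejection_rate = (absent.max(axis=1) <= criterion).mean()
    return 0.5 * (hit_rate + correct_rejection_rate)  # assumes 50% present trials

for n in (1, 2, 4, 8):
    print(f"set size {n}: accuracy {sdt_max_rule_accuracy(n):.3f}")
```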

Slide 12

Introduction
Models of the discrete item display visual search task
Models of capacity limitations
None of these models are "pixels-in, behavior out"

Slide 13

Introduction
Neural networks as models of the visual system
What I mean by "neural networks":
https://www.youtube.com/watch?v=aircAruvnKk

Slide 14

Introduction
Neural networks as models of the visual system
Specifically, convolutional neural networks (CNNs)
https://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/

Slide 15

Introduction
Neural networks as models of the visual system
The architecture of CNNs resembles the visual system
[Figure: AlexNet architecture (Krizhevsky et al. 2012), convolutional layers followed by fully-connected layers; adapted from Wang et al. 2015]

Slide 16

Introduction
Neural networks as models of the visual system
The architecture of CNNs resembles the visual system
[Figure: "the (primate) visual system", from low-level features (edges, orientation) through mid-level features (texture, shape) to a high-level semantic representation]

Slide 17

Introduction Neural networks as models of the visual system Task-optimized CNNs learn representations similar to those in the visual system of the (primate) brain Khaligh-Razavi, Kriegeskorte 2014

Slide 18

Introduction Neural networks as models of the visual system Task-optimized CNNs learn representations similar to those in the visual system of the (primate) brain Yamins DiCarlo 2014

Slide 19

Introduction Neural networks as brain models Hypothesis: if CNNs are using representations similar to those in the (primate) visual system, then they should behave like humans when performing related tasks

Slide 20

Introduction Neural networks as brain models Hypothesis: if CNNs are using representations similar to those in the (primate) visual system, then they should behave like humans when performing the discrete item display search task

Slide 21

Introduction
Neural networks as brain models
Hypothesis: if CNNs are using representations similar to those in the (primate) visual system, then they should behave like humans when performing the discrete item display search task
Põder, 2017. "Capacity limitations of visual search in deep convolutional neural network"

Slide 22

Experiment 1: replicate fine-tuning approach
Methods
● use AlexNet and VGG16 architectures with weights pre-trained on the ImageNet dataset

Slide 23

Experiment 1: replicate fine-tuning approach
Methods
● use AlexNet and VGG16 architectures with weights pre-trained on the ImageNet dataset
● randomly initialize fully-connected layers

Slide 24

Experiment 1: replicate fine-tuning approach
Methods
● use AlexNet and VGG16 architectures with weights pre-trained on the ImageNet dataset
● randomly initialize fully-connected layers
● train fully-connected layers with a small learning rate: 0.0001

Slide 25

Experiment 1: replicate fine-tuning approach
Methods
● use AlexNet and VGG16 architectures with weights pre-trained on the ImageNet dataset
● randomly initialize fully-connected layers
● train fully-connected layers with a small learning rate: 0.0001
● apply the base learning rate to other layers: 1e-20

Slide 26

Experiment 1: replicate fine-tuning approach
Methods
● use AlexNet and VGG16 architectures with weights pre-trained on the ImageNet dataset
● randomly initialize fully-connected layers
● train fully-connected layers with a small learning rate: 0.0001
● apply the base learning rate to other layers: 1e-20
● generate visual search stimuli with searchstims, a Python package built with PyGame (https://github.com/NickleDave/searchstims)

Slide 27

Experiment 1: replicate fine-tuning approach
Methods
● use AlexNet and VGG16 architectures with weights pre-trained on the ImageNet dataset
● randomly initialize fully-connected layers
● train fully-connected layers with a small learning rate: 0.0001
● apply the base learning rate to other layers: 1e-20
● generate visual search stimuli with searchstims, a Python package built with PyGame (https://github.com/NickleDave/searchstims)
● train 5 replicates of each network on a dataset with 6400 samples of a single visual search stimulus, balanced across "set size"
● measure accuracy on a separate 800-sample test set
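
A minimal TensorFlow sketch of the fine-tuning setup listed above, using VGG16 from tf.keras.applications (AlexNet is not bundled with Keras). Details the slides do not specify, such as the input size, the two-unit present/absent output, and the use of plain SGD, are assumptions for illustration. The point is the two learning rates: a small rate (0.0001) for the newly initialized fully-connected layers and an effectively-zero "base" rate (1e-20) for the pre-trained convolutional layers.

```python
import tensorflow as tf

# ImageNet-pretrained convolutional base (input size assumed; not given on the slides)
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
# randomly initialized fully-connected layers; 2 logits for target present / absent
head = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(2),
])
model = tf.keras.Sequential([base, head])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
opt_head = tf.keras.optimizers.SGD(learning_rate=1e-4)   # small rate for new FC layers
opt_base = tf.keras.optimizers.SGD(learning_rate=1e-20)  # "base" rate: effectively frozen

@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        loss = loss_fn(labels, model(images, training=True))
    # compute gradients separately for the pre-trained base and the new head,
    # then apply each optimizer's learning rate to its own set of weights
    grads_base, grads_head = tape.gradient(
        loss, [base.trainable_variables, head.trainable_variables])
    opt_base.apply_gradients(zip(grads_base, base.trainable_variables))
    opt_head.apply_gradients(zip(grads_head, head.trainable_variables))
    return loss
```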

Slide 28

Experiment 1: replicate fine-tuning approach
Results
● Both AlexNet and VGG16 show a human-like drop in accuracy when trained this way
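
The accuracy-versus-set-size curves behind this result take only a few lines of NumPy to compute; the helper below is hypothetical (not from the project's code), assuming arrays of true labels, predicted labels, and the set size of each test trial.

```python
import numpy as np

def accuracy_by_set_size(y_true, y_pred, set_sizes):
    """Return {set size: proportion correct} across a set of test trials."""
    y_true, y_pred, set_sizes = map(np.asarray, (y_true, y_pred, set_sizes))
    return {int(n): float((y_pred[set_sizes == n] == y_true[set_sizes == n]).mean())
            for n in np.unique(set_sizes)}
```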

Slide 29

Experiment 1: replicate fine-tuning approach
Results
● Both AlexNet and VGG16 show a human-like drop in accuracy when trained this way
● but training histories suggest the models have not converged

Slide 30

Experiment 1: replicate fine-tuning approach
Results
● notice different rates of convergence across set sizes: 1 > 2 > 4 > 8

Slide 31

Experiment 2: typical learning rate, augment data
Methods
● train fully-connected layers with a typical learning rate: 0.001
● freeze weights in other layers pre-trained on ImageNet; no "base" rate
● increase the number of training examples for larger set sizes
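
In Keras terms, Experiment 2 changes the setup roughly as sketched below: the ImageNet-pretrained convolutional layers are frozen outright instead of receiving a 1e-20 learning rate, and only the new fully-connected head is trained at 0.001. As before, the input size, output layer, and choice of SGD are assumptions; generating extra training examples for the larger set sizes would be done with searchstims and is not shown.

```python
import tensorflow as tf

base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
base.trainable = False  # freeze pre-trained conv layers; no "base" rate this time

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(2),  # target present / absent logits
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3),  # "typical" rate
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
```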

Slide 32

Experiment 2: typical learning rate, augment data
Results
● training histories show that the accuracy of the models now converges on an asymptotic value

Slide 33

Experiment 2: typical learning rate, augment data
Results
● training histories show that the accuracy of the models now converges on an asymptotic value
● but we still see different rates of convergence

Slide 34

Experiment 2 Results

Slide 35

Experiment 2 Results

Slide 36

Experiment 2
Results:
● improvement comes from augmented data

Slide 37

Experiment 2: typical learning rate, augment data
Results:
● improvement comes from augmented data

Slide 38

Discussion
Implications for artificial intelligence:
● translational invariance is still an issue
● possible solutions:
   ○ spatial transformer networks (Jaderberg et al. 2015)
   ○ dynamic routing with capsules (e.g. Sabour et al. 2017)
● are these mechanisms competitive with just augmenting the dataset?

Slide 39

Discussion
Implications for neuroscience
● "training" the visual system may include "augmentation" to induce translational invariance
   ○ e.g. see just a few objects, but from many different perspectives
   ○ cf. work by Linda Smith et al.
● the visual system has other mechanisms to enable translational invariance
   ○ such as: moving the eyes
● hard to compare behavior of deep learning models with behavior of animals when tasks measure factors that impair performance
   ○ do we have a good model, or just bad training?
   ○ but this is important to do; we can't ignore tasks with clear effects

Slide 40

Questions, comments?
Please check out:
https://github.com/NickleDave/thrillington
https://www.nengo.ai/
https://www.nengo.ai/nengo-dl/
For more work like this, check out this conference:
https://ccneuro.org/2019/
and these podcasts:
https://braininspired.co/
http://unsupervisedthinkingpodcast.blogspot.com/