

The thin line between reconstruction, classification, and hallucination in brain decoding

CRC Workshop at Rauischholzhausen Castle (Germany)
2024.7.2

Preprint:
https://arxiv.org/abs/2405.10078

Abstract:
Visual image reconstruction aims to recover arbitrary stimulus/perceived images from brain activity. To achieve this, especially with limited training data, it is crucial that the model leverages a compositional representation that spans the image space, with each feature effectively mapped from brain activity. In light of these considerations, we critically assessed recent “photorealistic” reconstructions based on generative AIs applied to a large-scale fMRI/stimulus dataset (Natural Scene Dataset, NSD). We found a notable decrease in the reconstruction performance with a different dataset specifically designed to prevent train–test overlaps (Deeprecon). The target features of NSD images revealed a strikingly limited diversity with a small number of semantic clusters shared between the training and test sets. Simulations also showed a lack of generalizability with a small number of clusters. This can be explained by “rank deficient prediction,” where any input is mapped into the subspace spanned by training features. By diversifying the training set with the number of clusters that linearly scales with the feature dimension, the decoders exhibited improved generalizability beyond the trained clusters, achieving compositional prediction. It is also important to note that text/semantic features alone are insufficient for a complete mapping to the visual space, even if they are perfectly predicted from brain activity. Building on these observations, we argue that recent “photorealistic” reconstructions may predominantly be a blend of classification into trained categories and the generation of convincing yet inauthentic images (hallucinations) through text-to-image diffusion. To avoid such spurious reconstructions, we offer guidelines for developing generalizable methods and conducting reliable evaluations.

Yuki Kamitani

July 02, 2024



Transcript

  1. The thin line between reconstruction, classification and hallucination in brain decoding. Yuki Kamitani, Kyoto University & ATR. http://kamitani-lab.ist.i.kyoto-u.ac.jp @ykamit. Image: Pierre Huyghe, ‘Uumwelt’ (2018).
  2. Acknowledgements. Kyoto University and ATR: Ken Shirakawa, Yoshihiro Nagano, Shuntaro Aoki, Misato Tanaka, Yusuke Muraki, Tomoyasu Horikawa (NTT), Guohua Shen (UEC), Kei Majima (NIRS). Grants: JSPS KAKENHI, JST CREST, NEDO.
  3. Reconstruction. Test image vs. reconstruction (Takagi and Nishimoto, 2023; Ozcelik and VanRullen, 2023). • Reconstruction from visual features + text feature-guided diffusion • Natural Scene Dataset (NSD; Allen et al., 2022)
  4. World, Brain, and Mind. Treatise on Man (Descartes, 1677); Ernst Mach’s drawing of his own visual scene (Mach, 1900).
  5. Fechner’s inner and outer psychophysics. Richer contents revealed? #braindecoding. Brain decoding as psychological measurement, not a parlor trick!
  6. Brain decoding: let the machine recognize brain activity patterns that humans cannot recognize (Kamitani & Tong, Nature Neuroscience 2005). Machine learning prediction. Neural mind-reading via shared representation.
  7. Visual image reconstruction (Miyawaki, Uchida, Yamashita, Sato, Morito, Tanabe, Sadato, Kamitani, Neuron 2008). [Figure: presented vs. reconstructed images.]
  8. 10 x 10 binary “pixels”: 2^100 ≈ 10^30 possible images (a 1 followed by ~30 zeros). Brain data for only a tiny subset of images can be measured.
  9. Modular (compositional) decoding with local contrast features: multi-scale image bases combined to reconstruct the presented image (contrast) from fMRI signals. Training: ~400 random images. Test: images not used in training (arbitrary images; 2^100).
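A minimal sketch of this kind of modular (compositional) decoding: synthetic data, simple multi-scale patch bases, one ridge decoder per basis, and naive averaging when combining bases. These choices are illustrative assumptions, not the original implementation.

```python
# Modular (compositional) decoding sketch in the spirit of Miyawaki et al. (2008).
# Synthetic data, the basis construction, the per-basis ridge decoders, and the
# naive averaging used to combine bases are all illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_vox, size = 400, 1000, 10      # training samples, voxels, 10x10 images

def multiscale_bases(size):
    """Local image bases: 1x1, 1x2, 2x1, and 2x2 patches tiling the image."""
    bases = []
    for h, w in [(1, 1), (1, 2), (2, 1), (2, 2)]:
        for r in range(size - h + 1):
            for c in range(size - w + 1):
                b = np.zeros((size, size))
                b[r:r + h, c:c + w] = 1.0
                bases.append(b.ravel())
    return np.array(bases)                 # (n_bases, size*size)

B = multiscale_bases(size)

# Synthetic training set: random binary images and noisy "fMRI" responses.
imgs = rng.integers(0, 2, size=(n_train, size * size)).astype(float)
W_true = rng.normal(size=(size * size, n_vox))
X = imgs @ W_true + rng.normal(scale=0.5, size=(n_train, n_vox))

# Decoding target for each basis: mean image contrast within that basis.
C = (imgs @ B.T) / B.sum(axis=1)          # (n_train, n_bases)

# One ridge decoder per basis (all fitted in one linear solve for brevity).
lam = 10.0
W_dec = np.linalg.solve(X.T @ X + lam * np.eye(n_vox), X.T @ C)

def reconstruct(x_new):
    """Decode the contrast of each basis, then average overlapping bases."""
    c_hat = x_new @ W_dec                  # predicted contrast per basis
    return ((c_hat @ B) / B.sum(axis=0)).reshape(size, size)

recon = reconstruct(X[0])                  # sanity check on a training sample
```

The original work optimized the combination weights of the bases; here a plain average over overlapping bases stands in for that step.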
  10. Classification vs. reconstruction. Classification: classes are predefined and shared between train and test. Reconstruction: the ability to predict arbitrary instances in the space of interest; zero-shot prediction beyond training outputs (as opposed to “double dipping”). How to build a reconstruction model with limited data? • Compositional representation: instances are represented by a combination of elemental features (e.g., pixels, wavelets) • Effective mapping from brain activity to each elemental feature
  11. Visual image reconstruction by decoding local contrasts (Neuron, 2008); decoding dream contents in semantic categories (Science, 2013). [Schematic: low-level to high-level representations; DNN.]
  12. Nameless, faceless features of DNN. AlexNet (Krizhevsky et al., 2012): convolutional layers DNN1–DNN5 and fully-connected layers DNN6–DNN8. • Won the object recognition challenge in 2012 • 60 million parameters and 650,000 neurons (units) • Trained with 1.2 million annotated images to classify 1,000 object categories. [Figure: original two-GPU architecture diagram and caption from Krizhevsky et al., 2012.]
  13. Brain-to-DNN decoding (translation) and hierarchical correspondence (Horikawa and Kamitani, 2015; Nature Communications 2017; Nonaka et al., 2019). [Scatter plot: true vs. predicted feature values for unit #562 of CNN8, predicted from VC.]
  14. [Figure: panels comparing “true” features with predictions from decoders trained on different datasets and their combinations (GODtrain, FMD, Scenetrain, GOD5_FMD5, GOD5_Scene5, GOD5_FMD5_Scene5), all from visual cortex (VC).]
  15. (Chen, Horikawa, Majima, Aoki, Abdelhack, Tanaka, Kamitani, Science Advances 2023). Training: natural images; a decoder maps fMRI activity to stimulus DNN features. Test: illusory images; decoded DNN features are fed to a generator for reconstruction.
  16. Translator–generator pipeline (Shirakawa, Nagano, Tanaka, Aoki, Majima, Muraki, Kamitani, arXiv 2024). • Translator: align brain activity to a machine’s latent features (neural-to-machine latent translation) • Generator: recover an image from the latent features. Latent features: DNN features of an image; a compositional representation that spans the image space of interest.
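A rough sketch of the translator–generator pipeline, assuming a ridge-regression translator and a placeholder generator callable; the actual methods use their own feature decoders and image generators, and all shapes below are synthetic.

```python
# Translator–generator pipeline sketch. Ridge regression as the translator and
# a placeholder `generator` callable are illustrative assumptions, not the
# actual implementations.
import numpy as np
from sklearn.linear_model import Ridge

def fit_translator(X_train, Y_train, alpha=100.0):
    """Translator: map brain activity (n_samples x n_voxels) to latent
    DNN features (n_samples x n_features)."""
    return Ridge(alpha=alpha).fit(X_train, Y_train)

def reconstruct(translator, x_test, generator):
    """Neural-to-machine latent translation followed by image generation."""
    y_hat = translator.predict(x_test.reshape(1, -1))
    return generator(y_hat)                # e.g., feature inversion or diffusion

# Synthetic example: fMRI samples, voxels, and DNN feature dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(1200, 2000))          # brain activity
Y = X @ rng.normal(size=(2000, 1000))      # latent DNN features (synthetic)
translator = fit_translator(X, Y)
image = reconstruct(translator, X[0], generator=lambda y: y)  # identity placeholder
```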
  17. Realistic output using text-guided methods. Test image vs. reconstructions by Takagi & Nishimoto (2023), Ozcelik & VanRullen (2023), and Shen et al. (2019). • Reconstruction from visual features + text feature (CLIP)-guided diffusion • Natural Scene Dataset (NSD; Allen et al., 2022)
  18. Cherry-picking from multiple generations. • Takagi & Nishimoto (2023) generated multiple images and selected the best one, a highly questionable procedure • Plausible reconstructions even from random brain data • Something more than cherry-picking? [Figure: examples from Takagi & Nishimoto (2023).]
  19. Failure of replication with a different dataset. • The text-guided methods fail to generalize • The Shen et al. (2019) method shows consistent performance across datasets. [Figure: test images and reconstructions; replication with a different dataset.]
  20. Issues with NSD. • Only ~40 semantic clusters, described by single words • Significant overlap between training and test sets • For each test image, visually similar images are found in the training set
  21. Train/test splits in our previous studies. Miyawaki et al. (2008): train = 440 random binary images; test = simple shapes (+ an independent set of random images). Shen et al. (2019): train = 1200 natural images; test = natural images from different categories + simple shapes. • Designed to avoid visual and semantic overlaps, testing out-of-distribution/domain generalization. Why artificial images in the test set? Pitfalls of naturalistic approaches: • The brain has evolved with “natural” images, but we can perceive artificial images too; models should account for this • Risk of unintended shortcuts with an increasing scale/complexity of data analysis; use artificial images for control
  22. Failure of zero-shot prediction. • CLIP almost always fails to identify the true sample against training samples • The predictions for the cluster excluded from training fall on the other clusters • Severely limited prediction outside the training set (≈ classification)
  23. Failed recovery from true features. • The text-guided methods cannot recover the original image from the true features • But the outputs have realistic appearances (hallucination). “Realistic reconstructions” may primarily be a blend of 1. classification into trained categories and 2. hallucination: the generation of convincing yet inauthentic images through text-to-image diffusion. [Figure: outputs from Takagi & Nishimoto (2023), Ozcelik & VanRullen (2023), and Shen et al. (2019).]
  24. Output dimension collapse: a regression model’s output collapses to a subspace spanned by the training latent features. Brain data \(X \in \mathbb{R}^{n \times d}\), latent features \(Y \in \mathbb{R}^{n \times k}\), ridge-regression weights \(W = X^\top (X X^\top + \lambda I)^{-1} Y\). Prediction from new brain data \(x\): \(\hat{y} = x W = \alpha Y\), where \(\alpha = x X^\top (X X^\top + \lambda I)^{-1}\), i.e., a linear weighted summation of the training latent features (if all input dimensions are used for the prediction of each latent feature).
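A small numerical check of this collapse; the shapes and regularization value are arbitrary demo choices, and the point is only that the ridge prediction is exactly a weighted sum of the training latent features.

```python
# Output dimension collapse demo: any ridge prediction lies in the span of
# the training latent features. Sizes and lambda are arbitrary demo values.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 200, 100                     # training samples, voxels, latent dims
X = rng.normal(size=(n, d))                # brain data
Y = rng.normal(size=(n, k))                # training latent features
lam = 1.0

K_inv = np.linalg.inv(X @ X.T + lam * np.eye(n))
W = X.T @ K_inv @ Y                        # ridge weights (dual form)

x_new = rng.normal(size=(1, d))            # arbitrary new brain pattern
y_hat = x_new @ W

alpha = x_new @ X.T @ K_inv                # weights over training samples
print(np.allclose(y_hat, alpha @ Y))       # True: y_hat = sum_i alpha_i * y_i
```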
  25. Is out-of-distribution prediction possible? Simulation with clustered data. [Plot: prediction accuracy and cluster identification vs. number of training clusters (10^1 to 10^3, total sample size fixed), for test samples inside and outside the training clusters, with chance level.] • Diverse training features enable effective out-of-distribution prediction • Compositional prediction: the ability to predict in unseen domains by a combination of predicted features
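A sketch of such a simulation; cluster sizes, noise levels, and the ridge translator are illustrative assumptions, not the paper’s settings. The qualitative point is that prediction for a held-out cluster improves as the number of training clusters grows while the total sample size stays fixed.

```python
# Clustered-data simulation sketch: fixed total sample size, varying number of
# training clusters, testing prediction on a held-out cluster. All settings are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_brain, d_latent, n_total = 300, 100, 1024
A = rng.normal(size=(d_latent, d_brain)) / np.sqrt(d_latent)  # latent -> brain map

def out_of_cluster_corr(n_clusters, noise=0.05, spread=0.01, lam=1.0):
    # Training latent features scattered around cluster centers.
    centers = rng.normal(size=(n_clusters, d_latent))
    idx = rng.integers(0, n_clusters, size=n_total)
    Y_tr = centers[idx] + spread * rng.normal(size=(n_total, d_latent))
    X_tr = Y_tr @ A + noise * rng.normal(size=(n_total, d_brain))

    # Test sample drawn from a new, unseen cluster.
    y_te = rng.normal(size=(1, d_latent))
    x_te = y_te @ A + noise * rng.normal(size=(1, d_brain))

    # Ridge translator (brain -> latent) and out-of-cluster prediction.
    W = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d_brain), X_tr.T @ Y_tr)
    y_hat = x_te @ W
    return np.corrcoef(y_hat.ravel(), y_te.ravel())[0, 1]

for n_clusters in [2, 8, 32, 128, 512]:
    print(n_clusters, round(out_of_cluster_corr(n_clusters), 3))
```

With few clusters the training latent features span only a low-rank subspace, so the prediction for the unseen cluster collapses onto it; as the number of clusters approaches the latent dimension, the correlation rises.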
  26. How much diversity is necessary? The training latent features should cover the latent feature space (Y): the whole space, or at least its effective axes. Training data (latent features, Y) should be diverse enough, but • it does not need to be exponentially diverse • the necessary diversity scales linearly with the dimension of the latent features
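One way to see the linear scaling, building on the collapse argument above; a sketch in standard linear-algebra terms rather than a statement quoted from the slides.

```latex
% Sketch: why diversity only needs to scale linearly with the latent dimension.
% Under a linear decoder, predictions lie in the span of the training latent features:
\[
  \hat{y} \;\in\; \operatorname{span}\{y_1, \dots, y_n\} \;\subseteq\; \mathbb{R}^{k}.
\]
% Reaching arbitrary targets in \(\mathbb{R}^{k}\) requires this span to have full
% rank \(k\), i.e., at least \(k\) linearly independent training features. With
% clustered training data, that means on the order of \(k\) distinct cluster
% directions (plus within-cluster variation), not exponentially many clusters.
```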
  27. Questionable train/test splits. Nishimoto et al. (2011) • (Note: retrieval via an encoding model rather than reconstruction) • 37/48 scenes in test contain nearly identical frames in train • Presumably temporally adjacent frames were split into train and test. Shared categories between training and test: • “EEG image reconstruction” (e.g., Kavasidis et al., 2017): 2000 images of 40 object categories in ImageNet, shared between train and test • “Music reconstruction” (Denk et al., 2023): 540 music pieces from 10 music genres shared between train and test
  28. Is image-level information thrown away in hierarchical processing? • Original images can be recovered from the true features of higher CNN layers by pixel optimization with a weak prior • Large receptive fields do not necessarily impair neural coding if the number/density of units is fixed (Zhang and Sejnowski, 1999; Majima et al., 2017)
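A minimal sketch of feature inversion by pixel optimization with a weak prior. The choice of AlexNet, the layer index, the total-variation prior, and all hyperparameters are assumptions for illustration, not the original procedure.

```python
# Recover an image from "true" CNN features by optimizing pixels, with a weak
# total-variation prior. Model, layer, and hyperparameters are illustrative.
import torch
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"
cnn = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).features.to(device).eval()
for p in cnn.parameters():
    p.requires_grad_(False)

def features(x, layer=10):
    """Activations of a higher convolutional layer (index 10 = conv5 here)."""
    for i, m in enumerate(cnn):
        x = m(x)
        if i == layer:
            return x
    return x

def tv_prior(x):
    """Weak total-variation prior encouraging piecewise-smooth images."""
    return ((x[..., 1:, :] - x[..., :-1, :]).abs().mean()
            + (x[..., :, 1:] - x[..., :, :-1]).abs().mean())

# Target features from an "original" image (random here, for a self-contained demo).
target_img = torch.rand(1, 3, 224, 224, device=device)
with torch.no_grad():
    target_feat = features(target_img)

# Optimize pixels to match the target features.
x = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)
for step in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(features(x), target_feat) + 1e-3 * tv_prior(x)
    loss.backward()
    opt.step()
    x.data.clamp_(0.0, 1.0)                 # keep pixels in a valid range
```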
  29. Caveats with evaluation by identification. • Given a prediction, identify the most similar one among candidates • Even with only the quality of distinguishing two broad categories (e.g., dark vs. bright), the pairwise identification accuracy can reach 75% • High identification accuracy does not imply good reconstruction • Use multiple evaluations, including visual inspections
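A worked example of how the 75% figure can arise; a sketch assuming two equally frequent broad categories and a prediction that carries only the category and nothing else.

```latex
% Pairwise identification when the prediction only encodes a binary category
% (e.g., dark vs. bright) and the two categories are equally frequent:
\[
  \Pr[\text{correct identification}]
  \;=\; \tfrac{1}{2}\cdot 1 \;+\; \tfrac{1}{2}\cdot \tfrac{1}{2} \;=\; 0.75,
\]
% where the first term covers distractors from the other category (always
% rejected) and the second covers distractors from the same category (chance).
```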
  30. How are we fooled by generative AIs? Introducing generative AIs: • Our (old) intuition: only truthful things look realistic • But now, generative AIs are producing a lot of realistic-looking but not truthful things • Researchers need to update this intuition to better estimate Pr[T|R] (T: truthful, R: realistic-looking)
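The slide’s point can be phrased with Bayes’ rule; the decomposition below is the standard identity, added here as a sketch rather than quoted from the deck.

```latex
% Probability that a realistic-looking result is truthful:
\[
  \Pr[T \mid R]
  \;=\;
  \frac{\Pr[R \mid T]\,\Pr[T]}
       {\Pr[R \mid T]\,\Pr[T] \;+\; \Pr[R \mid \lnot T]\,\Pr[\lnot T]} .
\]
% Generative AIs raise \(\Pr[R \mid \lnot T]\): untruthful outputs that still look
% realistic. A realistic appearance alone therefore becomes weaker evidence of truth.
```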
  31. Illusions in AI-driven scientific research (Messeri & Crockett, 2024). Alchemist: "It's shining golden... I’ve found how to make gold!” Illusion of explanatory depth; illusion of explanatory breadth; illusion of objectivity.
  32. Summary • Brain decoding as psychological measurement • Reconstruction: zero-shot prediction of arbitrary instances using a compositional latent representation • Classification + hallucination by text-guided methods? • Cherry-picking • Low-diversity data with train/test overlap • Recovery failure: misspecification of latent features • Output dimension collapse and linear scaling of diversity • Image information preserved across the visual hierarchy • How are we fooled by generative AIs