AlexNet
• Won the object recognition challenge in 2012
• 60 million parameters and 650,000 neurons (units)
• Trained with 1.2 million annotated images to classify 1,000 object categories
[Figure: AlexNet architecture (Krizhevsky et al., 2012). Eight layers (DNN1–DNN8): five convolutional layers followed by three fully-connected layers, split across two GPUs, explicitly showing the delineation of responsibilities between them. One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts at the bottom; the GPUs communicate only at certain layers. The network's input is 150,528-dimensional, and the number of neurons in the remaining layers is 253,440–186,624–64,896–64,896–43,264–4096–4096–1000.]
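As a concrete check on these numbers, here is a minimal sketch, assuming PyTorch and torchvision are available (torchvision's AlexNet is a later single-GPU variant of the original two-GPU model, so its parameter count is close to, not exactly, the 60 million quoted above):

```python
import torch
from torchvision.models import alexnet

# Build AlexNet with random weights (no download needed)
model = alexnet(weights=None)

# Count parameters: roughly 61 million for the torchvision variant
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params:,}")

# One forward pass: a 224x224 RGB image is 3 * 224 * 224 = 150,528 input values
x = torch.randn(1, 3, 224, 224)
logits = model(x)
print(logits.shape)  # torch.Size([1, 1000]): one score per object category
```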
DNN as a model of the brain?
• Optimized for real-world data or tasks to perform relevant functions
• Allows for precise alignment between DNN activations and neural signals
• Acts as a mechanistic model for neural processes (e.g., Doerig et al., 2023)
• Serves as a feature generator, functioning as an interface between the brain and the mind (Kamitani et al., 2025)
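A minimal sketch of the "feature generator" view, assuming PyTorch and torchvision (the layer names below are illustrative node names for torchvision's AlexNet, not taken from the cited papers): intermediate activations are extracted per stimulus image and can then be aligned with neural signals recorded for the same stimuli.

```python
import torch
from torchvision.models import alexnet
from torchvision.models.feature_extraction import create_feature_extractor

model = alexnet(weights=None).eval()
# Example nodes: 'features.8' (a convolutional layer) and 'classifier.5' (fc7 activation)
extractor = create_feature_extractor(model, return_nodes=["features.8", "classifier.5"])

images = torch.randn(10, 3, 224, 224)  # stand-in for 10 stimulus images
with torch.no_grad():
    feats = extractor(images)

conv_units = feats["features.8"].flatten(1)  # (10, n_conv_units) feature matrix
fc7_units = feats["classifier.5"]            # (10, 4096) feature matrix
```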
• Only a subset of DNNs exhibits hierarchical similarity; higher-performing DNNs often show less resemblance to the brain (Nonaka et al., 2021).
• Prediction accuracy for individual DNN units is modest (correlations up to ~0.5), and normalizing by the noise ceiling inflates the reported accuracy.
Encoding analysis and unit contribution:
• Only a subset of DNN units contributes to neural prediction, so encoding analysis is not suited to characterizing layer-wise representations (Nonaka et al., 2021).
Dependency on analysis methods:
• Both low-level and high-level brain areas can be explained by specific DNN layers, depending on the analytic approach (Sexton & Love, 2022).
Dependence on training data over DNN architecture:
• Prediction relies heavily on the diversity of the training data rather than on the network architecture itself (Conwell et al., 2023).
  ◦ Linear prediction may still be too flexible (see the sketch after this list).
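To make the critique concrete, here is a sketch of the linear encoding analysis these points refer to, with hypothetical data shapes standing in for real DNN features X and voxel responses Y (sklearn's RidgeCV chooses the regularization strength):

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((1200, 4096))  # DNN unit activations per stimulus (hypothetical)
Y = rng.standard_normal((1200, 500))   # fMRI voxel responses per stimulus (hypothetical)

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
model = RidgeCV(alphas=np.logspace(0, 5, 6)).fit(X_tr, Y_tr)
Y_pred = model.predict(X_te)

# Per-voxel prediction accuracy: Pearson correlation between predicted
# and measured responses on held-out stimuli
r = np.array([np.corrcoef(Y_te[:, v], Y_pred[:, v])[0, 1] for v in range(Y.shape[1])])
print(f"mean r = {r.mean():.3f}")
```

The flexibility concern is visible here: with 4,096 regressors per voxel, a linear map of this kind can fit many unrelated feature spaces, and dividing r by a noise ceiling estimated from repeated presentations further raises the reported accuracy.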
• Even high-level DNN layers do not discard much pixel-level information.
• Large receptive fields do not necessarily reduce neural coding capacity if the unit density remains sufficient (Zhang & Sejnowski, 1999; Majima et al., 2017).
• Near-perfect recovery of the input image even from high-level layers with a very weak image prior (deep image prior; Ulyanov et al., 2020), as sketched below.
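The recovery result can be illustrated even without the full deep image prior: the sketch below inverts high-level features by plain gradient descent on the pixels (the deep image prior replaces this pixel parameterization with an untrained CNN). Assumes PyTorch/torchvision; the layer name and step count are illustrative, not taken from the cited work.

```python
import torch
from torchvision.models import alexnet
from torchvision.models.feature_extraction import create_feature_extractor

model = alexnet(weights=None).eval()
extractor = create_feature_extractor(model, return_nodes=["classifier.5"])  # fc7

target_img = torch.randn(1, 3, 224, 224)  # stand-in for a real image
with torch.no_grad():
    target_feat = extractor(target_img)["classifier.5"]

# Optimize the pixels so their fc7 features match the target features
x = torch.zeros(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(extractor(x)["classifier.5"], target_feat)
    loss.backward()
    opt.step()
# x now approximates an image with the same high-level features as target_img
```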
1) AI researchers have often tried to build knowledge into their agents,
2) this always helps in the short term, and is personally satisfying to the researcher, but
3) in the long run it plateaus and even inhibits further progress, and
4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning.

Sutton, R. (2019). The Bitter Lesson. http://www.incompleteideas.net/IncIdeas/BitterLesson.html

[Figure: Inspiration from neuroscience constructs (e.g., orientation selectivity, reward prediction error, working memory, ...)]
• Characterization in latent space
• Predictive validity with real-world variables (behavior, image, text, etc.)
• Radical behaviorism (Skinner): prediction and control over explanation

Kamitani, Y. (2023). The "midlife crisis" of the science of brain and mind [in Japanese]. Kaneko Shobo note. https://www.note.kanekoshobo.co.jp/n/nd90894f959b1
"It's shining golden... I’ve found how to make gold!” Illusion of explanatory depth Illusion of explanatory breadth Illusion of objectivity Shirakawa et al., (2024); “Spurious reconstruction” Kamitani et al. (2025) ਆ୩ʢ2022ʣ࣮ݧσʔλղੳ࠶ೖɿจΛʮϑΣΠΫχϡʔεʯʹ ͠ͳ͍ͨΊʹ Speaker Deck https://speakerdeck.com/ykamit/shi-yan- detajie-xi-zai-ru-men-lun-wen-wo-hueikuniyusu-nisinaitameni.