Transformers as Scientific Models?

Presented at The 2024 Swedish Congress of Philosophy.

Andreas Chatzopoulos

June 09, 2024

Transcript

  1. UNIVERSITY OF GOTHENBURG
    NEURON MODEL
    ‣ McCulloch & Pitts, 1940s
    ‣ Crude model of a biological neuron
    ‣ Can serve as logic gates
    [Diagram: neuron units with inputs a and b, wired as or, and, and not gates]
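
The logic-gate claim on this slide can be illustrated with a minimal sketch of a threshold unit. The function names, the weights, and the thresholds below are assumptions chosen for illustration (the negative weight stands in for an inhibitory input); they are not taken from the slide's diagram.

```python
def mp_neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Hypothetical gate wirings: binary inputs, hand-picked weights and thresholds.
def gate_and(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)  # both inputs must be active

def gate_or(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)  # one active input suffices

def gate_not(b):
    return mp_neuron([b], [-1], threshold=0)       # an active input suppresses firing

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "and:", gate_and(a, b), "or:", gate_or(a, b))
    print("not 0:", gate_not(0), "not 1:", gate_not(1))
```

Nothing is learned here: the gate behavior comes entirely from the fixed weights and thresholds, which is what makes the unit a crude model.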
  2. PERCEPTRON
    ‣ Rosenblatt, 1950s
    ‣ More realistic: takes varying synaptic strength into account (Hebbian theory)
    [Diagram: inputs a and b, weighted by w1 and w2, summed (Σ) to give output y]
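
The weighted sum in the perceptron diagram can be sketched as follows. This is a minimal illustration, not the talk's own code: the error-driven update is the standard perceptron learning rule, and the training data (the OR function), the learning rate, and the function names are assumptions.

```python
def predict(weights, bias, x):
    """Weighted sum of the inputs followed by a hard threshold."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Adjust the weights ("synaptic strengths") whenever a prediction is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Assumed toy task: learn the OR function from its truth table.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

if __name__ == "__main__":
    w, b = train_perceptron(OR_SAMPLES)
    print("weights:", w, "bias:", b)
    print([predict(w, b, x) for x, _ in OR_SAMPLES])
```

Unlike the fixed-threshold neuron, the weights here change with experience, which is the sense in which the model accounts for varying synaptic strength.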
  3. AI IN 2024
    ‣ Mostly Artificial Neural Networks
    ‣ Not intended as brain models, but as tools to accomplish various tasks
  4. EXCEPTION
    ‣ Convolutional Neural Networks: in many ways developed with inspiration from the structure of the visual cortex
  5. CONVOLUTIONAL NEURAL NETWORKS (CNNs)
    ‣ Layered system that processes information hierarchically to extract image features
    ‣ Similar to how neurons in the visual cortex respond to particular stimuli
  6. NETWORKS AS TOOLS
    ‣ Others are developed as tools to accomplish various tasks, with little regard to the functions of the brain
  7. TRANSFORMERS
    ‣ Developed in 2017
    ‣ Designed to handle sequential data like text and speech; the attention mechanism allows the transformer to focus on specific parts of the input sequence
    ‣ Basis for Large Language Models (LLMs)
    ‣ Constructed to solve a specific problem, not to model a particular brain function
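
How attention lets a transformer "focus on specific parts of the input sequence" can be sketched with scaled dot-product attention in plain Python. This is a simplified illustration: it omits the learned projection matrices and multiple heads of a real transformer, and the toy vectors and function names are assumptions.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query, return a weighted mix of the values plus the weights used."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of the query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

if __name__ == "__main__":
    # Hypothetical 3-token sequence with 2-dimensional keys and values.
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
    outputs, weights = attention([[4.0, 0.0]], keys, values)
    print("attention weights:", weights[0])
    print("output:", outputs[0])
```

The query points in the same direction as the first key, so most of the weight lands on the first token. The "focus" is a soft weighting over the sequence rather than a hard selection.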
  8. BRAIN-TRANSFORMERS?
    ‣ Neurons and astrocytes in the brain could theoretically implement the core computations performed by transformer networks
    ‣ Provides a novel perspective on the relationship between LLMs and the brain
  9. Could we accidentally have stumbled upon a model of a mechanism that actually exists in the brain?
  10. PHILOSOPHICAL ANALYSIS
    HOW-ACTUAL MODEL
    ‣ Models a phenomenon in the way it actually occurs: a model of how things actually are
    HOW-POSSIBLY MODEL
    ‣ Propositional model of how a phenomenon might possibly occur: how things could possibly be
  11. EPISTEMIC PLAUSIBILITY
    ‣ Scientific progress entails moving towards how-actual
    ‣ A hypothesis moves towards corroboration
    [Diagram: scale running from how-possibly through how-plausibly to how-actual]
  12. Even if this could be seen as a model of a functionality that actually exists in the brain, it would only be a model of one particular feature of the brain
  13. What if it is not an exact model of this feature? What if it is somewhat accurate?
  14. PHILOSOPHICAL ANALYSIS (Stuart Glennan)
    HOW SHOULD THIS DIFFERENCE BE UNDERSTOOD?
    [Diagram: a target and several models, labeled how-actual, how-possibly, and "?"]
  15. PHILOSOPHICAL ANALYSIS (Stuart Glennan)
    The relationship between model and target is one of similarity in degrees and respects
    [Diagram: Target, Model]
  16. PHILOSOPHICAL ANALYSIS (Stuart Glennan)
    Since the relationship is about similarities, about representing more or less, we cannot say:
    ‣ Model A represents a possibility that is actual
    ‣ Model B represents a possibility that is not actual
    THEY REPRESENT IN DEGREES AND RESPECTS, NOT EITHER/OR
  17. PHILOSOPHICAL ANALYSIS (Stuart Glennan)
    If we hold a model to less strict similarity requirements, it may succeed in representing a target, if only roughly
  18. PHILOSOPHICAL ANALYSIS (Stuart Glennan)
    INSTEAD OF DIVIDING MODELS INTO POSSIBLY/ACTUAL:
    ‣ Adjust their similarity requirements
    ‣ A model that succeeds in representing a target due to decreased similarity requirements should be viewed as a how-roughly model
    If we hold a model to less strict similarity requirements, it may succeed in representing a target, if only roughly
  19. EXAMPLE: COPERNICAN MODEL
    ‣ Postulates circular orbits for the planets
    ‣ We know this to be incorrect:
      ‣ An inaccurate, how-possibly model according to the old view
    ‣ Shift to thinking about similarities:
      ‣ With decreased similarity requirements, the model would be correct
      ‣ A how-roughly model that captures important features, even if it is not 100% correct
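
The idea of adjusting similarity requirements can be made concrete with a small numerical sketch. It is a deliberately simplified stand-in, not a reconstruction of Copernicus's actual system: a sun-centred circle is compared with a Keplerian ellipse for one planet. The orbital values for Mars are approximate, and the function names and tolerance values are assumptions chosen for illustration.

```python
import math

def ellipse_radius(a, e, theta):
    """Sun-planet distance on a Keplerian ellipse at true anomaly theta."""
    return a * (1 - e ** 2) / (1 + e * math.cos(theta))

def max_relative_error(a, e, steps=360):
    """Largest gap between a circle of radius a and the ellipse, as a fraction of a."""
    return max(
        abs(a - ellipse_radius(a, e, 2 * math.pi * k / steps)) / a
        for k in range(steps)
    )

def represents(a, e, tolerance):
    """The circular model 'succeeds' if its worst error is within the tolerance."""
    return max_relative_error(a, e) <= tolerance

# Approximate values for Mars (assumed): semi-major axis in AU and eccentricity.
MARS_A, MARS_E = 1.524, 0.0934

if __name__ == "__main__":
    print("worst relative error:", max_relative_error(MARS_A, MARS_E))
    print("strict requirement (1%):", represents(MARS_A, MARS_E, 0.01))
    print("relaxed requirement (10%):", represents(MARS_A, MARS_E, 0.10))
```

The same model fails under the strict requirement and succeeds under the relaxed one, which is the sense in which it can be called a how-roughly model rather than simply a false one.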
  20. HOW DO WE TEST THIS?
    Models can be tested by running them in simulations where their performance is examined
    ‣ When we interact with ChatGPT, we are running the LLM in a simulation
    ‣ When we use image recognition with CNNs, we run simulations that employ these models
  21. BUILDING MODELS AND SIMULATIONS
    The same methodology could be employed for many different cognitive theories
    ‣ Language processing (LLMs)
    ‣ Visual processing (CNNs)
    ‣ ...
  22. TESTING BY EMBEDDING
    To test the simulations, it's often advantageous to use them in agents placed in an environment
  23. RIGHT NOW
    ‣ Agents in an environment in a reinforcement learning scenario, where the agent behavior is studied in various ways
    ‣ Simulated ecosystem where behavior is learned through reinforcement learning
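
What "an agent in an environment in a reinforcement learning scenario" involves can be sketched with tabular Q-learning in a toy corridor. This is not the simulated ecosystem from the talk: the environment, the reward, the hyperparameters, and all names below are assumptions made for illustration only.

```python
import random

N_STATES = 5        # hypothetical corridor with cells 0..4; the goal is cell 4
ACTIONS = (-1, +1)  # action index 0 = step left, index 1 = step right

def step(state, action_index):
    """Move the agent; reward 1.0 only when it reaches the goal cell."""
    nxt = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Q-learning with a uniformly random behavior policy (off-policy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # safety cap on episode length
            a = rng.randrange(2)
            nxt, reward, done = step(state, a)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q

def greedy_policy(q):
    """Best action index per non-goal cell according to the learned values."""
    return [max(range(2), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]

if __name__ == "__main__":
    print("greedy policy (1 = right):", greedy_policy(train()))
```

The agent is given no rule about where the goal is; the behavior that can be studied afterwards, here the greedy policy, emerges from interaction with the environment.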
  24. GOING FORWARD
    I want to develop this:
    ‣ Augment agents with LLMs
    ‣ The theory behind this is that the agents' "cognitive functions" could be viewed as how-roughly models that are tested in an environment
  25. TO SUM IT UP
    Just as the Copernican model could teach us things about the solar system without being 100% correct, maybe transformers and LLMs can teach us about the brain without being 100% accurate models of the brain, or even being intended as brain models to begin with. They could be seen as how-roughly models. Simulated environments with embedded agents that employ transformer-based LLMs could help us test this, and the same methodology could be used to test other cognitive theories.