
Stardust
November 28, 2024

[Paper Introduction] The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces


Transcript

  1. Paper introduction: The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces

    Authors: Ahmed Oumar El-Shangiti, Tatsuya Hiraoka, Hilal AlQuabeh, Benjamin Heinzerling, Kentaro Inui
    Paper link: https://arxiv.org/pdf/2410.13194
  2. Research Question: Do LLMs leverage the linear subspace of entity-numerical attributes when solving numerical reasoning tasks?

    Numerical (logical) reasoning task: "Was Cristiano born before Messi?" corresponds to (Cristiano, born-before, Messi).
    Factual recall: "When was Cristiano born?" gives (Cristiano, born-in, 1985); "When was Messi born?" gives (Messi, born-in, 1987).
    1985 < 1987, and "<" means born-before.
    Linear subspace: low-dimensional (linear) subspaces are used during knowledge extraction [Heinzerling and Inui, 2024].
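
The decomposition on this slide, answering a comparison by recalling two facts and then comparing them, can be sketched in a few lines of Python. The table `BIRTH_YEAR` and the helpers `recall_birth_year` and `born_before` are hypothetical names for illustration; the years are the ones given on the slide. This only illustrates the structure of the task, not how the model computes the answer internally.

```python
# Hypothetical lookup table; the years are those shown on the slide.
BIRTH_YEAR = {"Cristiano": 1985, "Messi": 1987}


def recall_birth_year(entity: str) -> int:
    """Factual recall: (entity, born-in, year)."""
    return BIRTH_YEAR[entity]


def born_before(a: str, b: str) -> bool:
    """Numerical reasoning: (a, born-before, b) holds iff year(a) < year(b)."""
    return recall_birth_year(a) < recall_birth_year(b)


print(born_before("Cristiano", "Messi"))  # True, since 1985 < 1987
```
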
  3. Overview of this study

    - Show the LLM's capability to solve numerical reasoning tasks from the viewpoint of behavioral observation.
    - Look into the LLM's representations: identify the linear subspace corresponding to numerical attributes with partial least squares (PLS), then intervene in the representation to test whether the model uses the linearly represented information.
    - Run experiments on three numerical properties to demonstrate that LLMs leverage the linear subspace for reasoning tasks.
  4. Experimental settings

    Task / Dataset: 5,000 numerical reasoning questions, each involving two entities, based on Wikidata.
    Model: Llama3-8B-instruction
    Preprocess: To focus the subsequent experiments on entities for which the LLM has reliable numerical knowledge, any entities that the LLM could not answer correctly were filtered out.
    Main experiment (internal representation): Examine the inner workings of the LLM during knowledge extraction and numerical reasoning using PLS. → Details on the next slide.
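
The preprocessing step above can be sketched as a simple filter. Everything here is an assumption for illustration: the function name `filter_known_entities`, the exact-match criterion, and the stand-in `fake_answers` (including its deliberately wrong entry), which replaces an actual query to the LLM.

```python
from typing import Callable, Dict, List


def filter_known_entities(
    gold: Dict[str, int],
    ask_model: Callable[[str], int],
) -> List[str]:
    """Keep only entities whose attribute the model recalls correctly.

    Exact match against the gold value is an assumed criterion.
    """
    return [entity for entity, value in gold.items() if ask_model(entity) == value]


# Stand-in for an LLM query; the wrong answer for "Messi" is made up
# purely to show an entity being filtered out.
fake_answers = {"Cristiano": 1985, "Messi": 1990}
kept = filter_known_entities(
    {"Cristiano": 1985, "Messi": 1987},
    lambda entity: fake_answers[entity],
)
print(kept)  # ['Cristiano']
```
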
  5. Internal representation experiment (I) ~Prediction~

    Process
    (1) Filter out the entities whose comparison the model predicted incorrectly.
    (2) Feed a context vector that contains the comparison prompt (e.g., "Was Cristiano born prior to Messi?").
    (3) Extract the hidden state of the last token of each entity from the LLM's hidden states at a particular layer.
    (4) Use these hidden states to train a PLS model [Wold, 1975] with 5 components to predict the corresponding numerical attribute of each entity.
    Input: X_i = h_C^(l) \in R^d. Output: y_i (e.g., 1965).
    Matrix shapes shown on the slide: N x 1, N x 5, 5 x 1, N x d, d x 5.
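
Step (4) can be sketched with a small NumPy implementation of single-target PLS (PLS1, NIPALS variant). This is a minimal sketch under stated assumptions, not the authors' code: the slides do not say which PLS implementation was used, the function names `fit_pls1` and `predict_pls1` are hypothetical, and the "hidden states" are synthetic random vectors with one planted linear direction. Only the number of components (5) comes from the slide.

```python
import numpy as np


def fit_pls1(X, y, n_components=5):
    """PLS1 via NIPALS. Returns (x_mean, y_mean, W, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean           # centered residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                         # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xr @ w                            # scores, shape (N,)
        tt = t @ t
        p = Xr.T @ t / tt                     # X loadings, shape (d,)
        c = yr @ t / tt                       # y loading (scalar)
        Xr = Xr - np.outer(t, p)              # deflate X
        yr = yr - c * t                       # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)  # (d, k), (d, k), (k,)
    coef = W @ np.linalg.solve(P.T @ W, q)    # regression vector, shape (d,)
    return x_mean, y_mean, W, coef


def predict_pls1(model, X):
    x_mean, y_mean, _, coef = model
    return (X - x_mean) @ coef + y_mean


# Synthetic stand-in for hidden states: N entities, hidden size d.
rng = np.random.default_rng(0)
N, d = 200, 32
years = rng.uniform(1900, 2000, size=N)
direction = rng.normal(size=d)                # planted linear direction
X = np.outer(years - years.mean(), direction) / 50 + rng.normal(size=(N, d))

model = fit_pls1(X, years, n_components=5)
pred = predict_pls1(model, X)
print(model[2].shape)  # (32, 5): one weight vector per component
```

The first column of `W` is the first PLS direction, the unit vector along which the centered inputs covary most with the target.
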
  6. Fitting results of the 5-component PLS model

    An R^2 near 0.8 indicates a good fit. Five components are enough to explain the data.
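
For reference, R^2 is defined as 1 - SS_res / SS_tot, the fraction of the target's variance explained by the predictions. A minimal sketch follows; the function name `r_squared` is hypothetical and the toy values are made up purely to exercise it, they are not results from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up toy values, not data from the paper.
y_true = [1980, 1985, 1990, 1995]
y_pred = [1982, 1984, 1991, 1993]
score = r_squared(y_true, y_pred)
print(round(score, 2))  # 0.92  (SS_res = 10, SS_tot = 125)
```
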
  7. Internal representation experiment (II) ~Comparison~

    Process: similar to (I), but the prediction target is the Yes/No answer for a context vector containing a comparison.
  8. Internal representation experiment (III) ~Intervention~

    Motivation: PLS can capture correlations between X and Y but does not indicate causality.
    Process: use the 1st PLS component v:
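
The intervention formula itself is not included in the transcript. As a rough illustration only, the sketch below assumes an additive shift of the hidden state along the unit-normalized first PLS direction, h' = h + α·v, where α is the scale hyperparameter mentioned under Limitations. The additive form, the normalization, and the function name `intervene` are all assumptions, and the vectors are random stand-ins rather than real activations.

```python
import numpy as np


def intervene(h, v, alpha):
    """Shift hidden state h along direction v by alpha (assumed additive form)."""
    v_unit = v / np.linalg.norm(v)
    return h + alpha * v_unit


rng = np.random.default_rng(0)
h = rng.normal(size=8)   # stand-in hidden state
v = rng.normal(size=8)   # stand-in for the 1st PLS direction
h_new = intervene(h, v, alpha=2.0)

# The projection onto v moves by exactly alpha;
# components orthogonal to v are left unchanged.
v_unit = v / np.linalg.norm(v)
shift = float((h_new - h) @ v_unit)
```

If the model relies on this direction, the patched state `h_new` should change its Yes/No answer to the comparison prompt once α is large enough, which is what a causal test of this kind checks.
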
  9. Effects of the proposed intervention

    ※ The axis "Effect of intervention" is not clear. Why [0,1]?
    - Obviously better
    - Slightly better; v is not so important for this task
    - Probably using PLS again...?
  10. Limitations and Discussions

    Limitations
    ① Error analysis
    ② Only a single Llama3 model
    ③ Only three numerical attributes
    ④ Hyperparameter sensitivity on α
    Discussions (personal opinion)
    - Can the accuracy of the PLS model indicate the LLM's internal mechanism?
    - Is it surprising enough? The linearity is also not complete.
    - Should the effectiveness of the intervention be judged on the LLM's output?
    - ④ seems to be a critical limitation.
    - Some wording (geometry, causality...)