PyData Pittsburgh - June 2024

This deck contains slides for the introductory and closing remarks from PyData Pittsburgh's June 2024 event, Radically Improving Neural Networks with Insights from Modern Neuroscience.

It also contains the slides for Patrick's lightning talk summarizing Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet, a research paper from Anthropic.

Patrick Harrison

June 12, 2024

Transcript

  1. Caveats: 1. None of this is my own work 2. I am not a machine learning interpretability expert
  2. Two hypotheses: 1. Everything the model "knows" about the input is contained in these numbers 2. Combinations of neurons activating together correspond to real-world concepts
  3. def add(x, y) : [-3.1 0.1 1.7 ... -0.8] [0 0 0 1.4 0 0 0 0.8 0 ... 0 0 -2.7 0]
  4. def add(x, y) : [-3.1 0.1 1.7 ... -0.8] [-2.9 0.3 1.5 ... -0.5] [0 0 0 1.4 0 0 0 0.8 0 ... 0 0 -2.7 0]
  5. def add(x, y) : [-3.1 0.1 1.7 ... -0.8] [-2.9 0.3 1.5 ... -0.5] [0 0 0 1.4 0 0 0 0.8 0 ... 0 0 -2.7 0] (these should be close)
  6. def add(x, y) : [-3.1 0.1 1.7 ... -0.8] [-2.9 0.3 1.5 ... -0.5] [0 0 0 1.4 0 0 0 0.8 0 ... 0 0 -2.7 0] (these should be close) (this should be sparse)
  7. def add(x, y) : [-2.9 0.3 1.5 ... -0.5] [-3.1 0.1 1.7 ... -0.8] (these should be close) [0 0 0 1.4 0 0 0 0.8 0 ... 0 0 -2.7 0] "features"
  8. [0 0 0 10 0 0 0 0.8 0 ... 0 0 -2.7 0] "features"
  9. [-3.1 0.1 1.7 ... -0.8] [0 0 0 10 0 0 0 0.8 0 ... 0 0 -2.7 0] "features"
  10. You can now financially support PyData Pittsburgh. Contributions are tax-deductible! Eligible for employer match! https://pypgh.org/donate
  11. Data science for improving outcomes in mental health crises. Pim Welle, Chief Data Scientist, Allegheny County Department of Human Services
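
Slides 3 through 9 of the transcript appear to walk through the sparse autoencoder idea used in the Scaling Monosemanticity paper: a dense vector of activations is mapped to a wider, mostly-zero vector of "features", the features are mapped back to a reconstruction that should be close to the original, and a single feature can then be set to a large value (10 on the slides) before decoding. The following is only a minimal sketch of that idea in plain Python, not the paper's implementation. The names `encode`, `decode`, `sae_loss`, and `clamp_feature`, the ReLU encoder, the L1 coefficient, the toy sizes, and the random untrained weights are all assumptions made for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode(activation, w_enc, b_enc):
    """Map a dense activation vector to non-negative features (ReLU is an assumption)."""
    pre = matvec(w_enc, activation)
    return [max(0.0, p + b) for p, b in zip(pre, b_enc)]


def decode(features, w_dec, b_dec):
    """Linearly map features back to an approximation of the activation vector."""
    return [o + b for o, b in zip(matvec(w_dec, features), b_dec)]


def sae_loss(activation, features, reconstruction, l1_coeff=0.1):
    """Squared reconstruction error ("these should be close")
    plus an L1 penalty on the features ("this should be sparse")."""
    recon_error = sum((a - r) ** 2 for a, r in zip(activation, reconstruction))
    sparsity = sum(abs(f) for f in features)
    return recon_error + l1_coeff * sparsity


def clamp_feature(features, index, value):
    """Return a copy of the features with one entry forced to a fixed value."""
    clamped = list(features)
    clamped[index] = value
    return clamped


# Hypothetical toy sizes; real models use far larger dimensions.
D_MODEL, N_FEATURES = 4, 8
rng = random.Random(0)

# Random, untrained weights: the reconstruction will NOT be close until
# the loss above has been minimized over many activation vectors.
w_enc = [[rng.gauss(0, 0.5) for _ in range(D_MODEL)] for _ in range(N_FEATURES)]
b_enc = [0.0] * N_FEATURES
w_dec = [[rng.gauss(0, 0.5) for _ in range(N_FEATURES)] for _ in range(D_MODEL)]
b_dec = [0.0] * D_MODEL

# Toy activation vector (illustrative values only).
activation = [-3.1, 0.1, 1.7, -0.8]

features = encode(activation, w_enc, b_enc)
reconstruction = decode(features, w_dec, b_dec)
loss = sae_loss(activation, features, reconstruction)

# Force one feature to a large value, then decode the edited features.
steered_features = clamp_feature(features, 3, 10.0)
steered = decode(steered_features, w_dec, b_dec)

print("features:", features)
print("loss:", loss)
print("steered reconstruction:", steered)
```

Because the decoder in this sketch is linear, clamping one feature shifts the decoded vector along that feature's decoder column, which is one way to read the step from slide 8 to slide 9.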