Decoder-only architecture or How the f*ck does ChatGPT work?

As part of my learning journey in generative AI—specifically the attention mechanism—I gave this presentation at work.
It was mainly a way for me to reinforce my own understanding.

Stefan M.

March 06, 2025

Transcript

  1. Overview: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads) → FFNN (MLP with 3 layers) → Sampling (temperature, top_p) → Tokenizer (id2text) → .
  2. Why? Embeddings: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector). To get multiple dimensions of a token. It returns a vector that tells how close one token is to another.
  3. Why? Embeddings: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector). To get multiple dimensions of a token. It returns a vector that tells how close one token is to another. Example with 3 dimensions: King, Queen, Cat, Dog.
  4. Why? Embeddings: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector). To get multiple dimensions of a token. It returns a vector that tells how close one token is to another. Fun fact: text-embedding-3-small has 1536 dimensions, text-embedding-3-large has 3072. Example with 3 dimensions: King, Queen, Cat, Dog.
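To make the "how close is one token to another" idea concrete, here is a minimal cosine-similarity sketch in Python; the 3-dimensional vectors are made up for illustration, not taken from a real embedding model.

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 = same direction, 0 = unrelated, -1 = opposite.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up 3-d embeddings, mirroring the slide's King/Queen/Cat/Dog example.
king  = np.array([0.90, 0.80, 0.10])
queen = np.array([0.85, 0.75, 0.20])
cat   = np.array([0.10, 0.20, 0.90])
dog   = np.array([0.15, 0.25, 0.85])

print(cosine_similarity(king, queen))  # high: King is close to Queen
print(cosine_similarity(cat, dog))     # high: Cat is close to Dog
print(cosine_similarity(king, cat))    # low: King is far from Cat
```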
  5. Why? Attention: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads). How much each token vector attends to the others.
  6. Why? Attention: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads). How much each token vector attends to the others: IOKI: IOKI(0.20) loves(0.30) AI(0.50); loves: IOKI(0.46) loves(0.10) AI(0.53); AI: IOKI(0.76) loves(0.14) AI(0.10).
  7. Why? FFNN: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads) → FFNN (MLP with 3 layers). Refine each vector individually: "What is this token about?"
  8. Why? Sampling: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads) → FFNN (MLP with 3 layers) → Sampling (temperature, top_p). Find the next token based on the last vector.
  9. Why? Tokenizer: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads) → FFNN (MLP with 3 layers) → Sampling (temperature, top_p) → Tokenizer (id2text). To transform the token back to text.
  10. DOT: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads) → FFNN (MLP with 3 layers) → Sampling (temperature, top_p) → Tokenizer (id2text) → .
  11. The secret: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads) → FFNN (MLP with 3 layers) → Sampling (temperature, top_p) → Tokenizer (id2text) → . Repeat?!
  12. The secret: IOKI loves AI. → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads) → FFNN (MLP with 3 layers) → Sampling (temperature, top_p) → Tokenizer (id2text) → EOS. Repeat?! Repeat?!
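The "Repeat?!" is the whole secret: the sampled token is appended to the input and the entire pipeline runs again, until the model emits an end-of-sequence token. A minimal sketch of that loop, with `model_forward` and `sample` as hypothetical stand-ins for the steps detailed on the following slides:

```python
EOS_ID = 0  # assumed id of the end-of-sequence token; the real id depends on the vocabulary

def generate(prompt_ids, model_forward, sample, max_new_tokens=50):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = model_forward(ids)   # one full pass: embeddings → attention → FFNN → projection
        next_id = sample(probs)      # temperature / top_p sampling
        if next_id == EOS_ID:        # the model says the text is finished
            break
        ids.append(next_id)          # feed the new token back in: Repeat?!
    return ids
```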
  13. Tokenizer (text2id): IOKI loves AI → Tokenizer (text2id) → IO: 3982, KI: 66495, loves: 19620, AI: 20837.
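In code, this step is a library call. A sketch with OpenAI's tiktoken library; the exact ids depend on which vocabulary is loaded, so the slide's numbers may come from a different tokenizer:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")     # one GPT-style BPE vocabulary
ids = enc.encode("IOKI loves AI")
print(ids)                                     # token ids, e.g. [3982, 66495, ...] on the slide
print([enc.decode([i]) for i in ids])          # the pieces, e.g. ['IO', 'KI', ' loves', ' AI']
```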
  14. Embeddings: IOKI loves AI → Tokenizer (text2id) → Vectorizer/Embeddings (id2vector). IO: 3982 → [0.12, -0.45, 0.88]; KI: 66495 → [-0.67, 0.33, -0.22]; loves: 19620 → [0.04, 0.91, -0.77]; AI: 20837 → [0.55, -0.12, 0.34].
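id2vector is literally a row lookup in a big learned matrix. A sketch with random numbers standing in for the trained weights (real models use far more than 3 dimensions, as the fun fact above notes):

```python
import numpy as np

vocab_size, d_model = 100_000, 3                          # 3 dims only to match the slide
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, d_model))  # learned during training

ids = [3982, 66495, 19620, 20837]    # IO, KI, loves, AI
vectors = embedding_table[ids]       # id2vector = indexing one row per token
print(vectors.shape)                 # (4, 3): one 3-d vector per token
```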
  15. Attention - Remember Why? IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads). How much does each word/token/vector "attend" to all the other words/tokens/vectors?
  16. Attention - Remember Why? IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads). How much "IO" relates to the other tokens: KI, loves, AI.
  17. Attention: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads). … → [0.12, -0.45, 0.88] → Q[-0.3, 0.20, -0.92], K[[0.14, -0.22, 0.54], [...], [...], [...]], V[0.64, 0.18, -0.45].
  18. Attention: … → [0.12, -0.45, 0.88] → Q[-0.3, 0.20, -0.92], K[[0.14, -0.22, 0.54], [...], [...], [...]], V[0.64, 0.18, -0.45]. Split per head: Head 1: Q[-0.3], K[[0.14], [.], [.], [.]], V[0.64]; Head 2: Q[0.20], K[[-0.22], [.], [.], [.]], V[-0.18]; Head 3: Q[0.88], K[[0.54], [.], [.], [.]], V[-0.45].
  19. Attention: … → [0.12, -0.45, 0.88] → Q[-0.3, 0.20, -0.92], K[[0.14, -0.22, 0.54], [...], [...], [...]], V[0.64, 0.18, -0.45]. With all tokens per head: Head 1: [Q[-0.3], Q[.], Q[.], Q[.]], K[[0.14], [.], [.], [.]], [V[0.64], V[.], V[.], V[.]]; Head 2: [Q[0.20], Q[.], Q[.], Q[.]], K[[-0.22], [.], [.], [.]], [V[-0.18], V[.], V[.], V[.]]; Head 3: [Q[0.88], Q[.], Q[.], Q[.]], K[[0.54], [.], [.], [.]], [V[-0.45], V[.], V[.], V[.]].
  20. Attention: same per-head split as before; each head computes softmax(Q · Kᵀ) · V.
  21. Attention: same per-head split; each head computes softmax(Q · Kᵀ) · V, then the head outputs are merged: concat(Vhead1, Vhead2, Vhead3).
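Putting slides 17–21 together: project each token vector to Q, K, V, give each head a slice, apply softmax(Q · Kᵀ) · V per head, and concatenate. A sketch in NumPy; real implementations also scale by √d and, in a decoder, mask out future tokens, which the slides leave out:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, n_heads):
    """X: (seq_len, d_model) token vectors; returns refined vectors of the same shape."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v                 # project every token vector
    heads = []
    for h in range(n_heads):
        s = slice(h * d_head, (h + 1) * d_head)         # each head sees its own slice of Q/K/V
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_head)  # scaled dot product
        weights = softmax(scores, axis=-1)              # rows sum to 1: how much each token attends
        heads.append(weights @ V[:, s])                 # mix the value vectors
    return np.concatenate(heads, axis=-1)               # concat(Vhead1, Vhead2, Vhead3)

# Toy run matching the slide: 3-d vectors, 3 heads of size 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                             # 4 tokens: IO, KI, loves, AI
W_q, W_k, W_v = (rng.normal(size=(3, 3)) for _ in range(3))
print(multi_head_attention(X, W_q, W_k, W_v, n_heads=3).shape)  # (4, 3)
```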
  22. FFNN: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads) → FFNN (MLP with 3 layers). … → New_IO[0.22, -0.22, 0.99] → Refined_New_IO[...]; … → New_KI[...] → Refined_New_KI[...]; … → New_loves[...] → Refined_New_loves[...]; … → New_AI[...] → Refined_New_AI[...].
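The FFNN is applied to each token vector separately, with no mixing between tokens. A sketch of such a position-wise MLP (three layers of units: input, hidden, output); the sizes and the ReLU choice are illustrative:

```python
import numpy as np

def ffnn(x, W1, b1, W2, b2):
    # Expand, apply a non-linearity, project back to d_model: the "refined" vector.
    hidden = np.maximum(0.0, x @ W1 + b1)   # ReLU here; GELU is common in real models
    return hidden @ W2 + b2

rng = np.random.default_rng(0)
d_model, d_hidden = 3, 12                   # real models expand roughly 4x
W1, b1 = rng.normal(size=(d_model, d_hidden)), np.zeros(d_hidden)
W2, b2 = rng.normal(size=(d_hidden, d_model)), np.zeros(d_model)

new_io = np.array([0.22, -0.22, 0.99])      # the slide's New_IO vector
print(ffnn(new_io, W1, b1, W2, b2))         # → Refined_New_IO
```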
  23. Sampling: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads) → FFNN (MLP with 3 layers) → Sampling (temperature, top_p). Projection before "sampling"!
  24. Projection: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads) → FFNN (MLP with 3 layers) → Sampling (temperature, top_p). Refined_New_AI[0.33, 0.11, 0.97] → P[..., N]. P.size = number of tokens in the dictionary; P.values = probability of each token in the dictionary.
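The projection multiplies the last refined vector by one more learned matrix to get a score for every token in the dictionary, then softmax turns the scores into probabilities. A sketch with random stand-in weights:

```python
import numpy as np

vocab_size, d_model = 100_000, 3
rng = np.random.default_rng(0)
W_unembed = rng.normal(size=(d_model, vocab_size))   # learned projection ("unembedding") matrix

refined_new_ai = np.array([0.33, 0.11, 0.97])        # last token's refined vector from the slide
logits = refined_new_ai @ W_unembed                  # one raw score per dictionary token
P = np.exp(logits - logits.max())
P /= P.sum()                                         # softmax
print(P.size, P.sum())                               # 100000 1.0 — P.size = dictionary size
```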
  25. Sampling: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads) → FFNN (MLP with 3 layers) → Sampling (temperature, top_p). P[..., N]… which one to choose?
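One way to answer "which to choose?" is temperature plus nucleus (top_p) sampling, as named on the slides. A sketch; the cutoff logic is one common formulation, not the only one:

```python
import numpy as np

def sample(P, temperature=0.8, top_p=0.9, rng=np.random.default_rng()):
    """Pick the next token id from the probability vector P[..., N]."""
    # Temperature: < 1 sharpens the distribution, > 1 flattens it.
    logits = np.log(P + 1e-12) / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # top_p: keep only the smallest set of tokens whose combined mass reaches top_p.
    order = np.argsort(p)[::-1]
    cutoff = np.searchsorted(np.cumsum(p[order]), top_p) + 1
    keep = order[:cutoff]
    return int(rng.choice(keep, p=p[keep] / p[keep].sum()))
```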
  26. Tokenizer (id2text): IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads) → FFNN (MLP with 3 layers) → Sampling (temperature, top_p) → Tokenizer (id2text). 13 → .
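The final step reuses the tokenizer in the other direction; with the tiktoken encoder from the earlier sketch, and assuming the slide's vocabulary maps id 13 to ".":

```python
print(enc.decode([13]))  # → "." — id2text is just the reverse lookup
```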
  27. Overview: IOKI loves AI → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads) → FFNN (MLP with 3 layers) → Sampling (temperature, top_p) → Tokenizer (id2text) → .
  28. Overview: IOKI loves AI. → Tokenizer (text2id) → Embeddings (id2vector) → Attention (QKV in multi heads) → FFNN (MLP with 3 layers) → Sampling (temperature, top_p) → Tokenizer (id2text) → EOS. Repeat?!
  29. Training data in Embeddings: IOKI loves AI → Tokenizer → Embeddings (trained on training data). IO: 3982 → [0.12, -0.45, 0.88]; KI: 66495 → [-0.67, 0.33, -0.22]; loves: 19620 → [0.04, 0.91, -0.77]; AI: 20837 → [0.55, -0.12, 0.34].
  30. Parameter in Attention: IOKI loves AI → Tokenizer → Embeddings → Attention (learned parameters). … → [0.12, -0.45, 0.88] → Q[-0.3, 0.20, -0.92], K[[0.14, -0.22, 0.54], [...], [...], [...]], V[0.64, 0.18, -0.45].
  31. Parameter in Attention: [0.12, -0.45, 0.88] * WeightsQ = Q; [...] * WeightsK = K; [...] * WeightsV = V.
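WeightsQ, WeightsK, and WeightsV are the learned parameters; Q, K, and V fall out of plain matrix multiplications. A sketch with random stand-ins for the trained weights:

```python
import numpy as np

d_model = 3
rng = np.random.default_rng(0)
W_Q, W_K, W_V = (rng.normal(size=(d_model, d_model)) for _ in range(3))  # learned in training

x = np.array([0.12, -0.45, 0.88])     # embedding of "IO" from the slide
Q, K, V = x @ W_Q, x @ W_K, x @ W_V   # exactly the slide: [0.12, -0.45, 0.88] * WeightsQ = Q
print(Q, K, V)
```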
  32. A single neuron: p7 = p1 * w1 + p2 * w2 + …, where w1 and w2 are learned weights on the inputs p1 and p2.
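The same idea in two lines, with made-up numbers:

```python
p1, p2 = 0.5, -0.2          # incoming activations (made up)
w1, w2 = 0.8, 1.5           # learned weights: these are the "parameters" being trained
p7 = p1 * w1 + p2 * w2      # 0.5*0.8 + (-0.2)*1.5 = 0.1
print(p7)
```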
  33. IOKI loves AI → Tokenizer → Embeddings (Training data) → Attention (Parameter) → FFNN (Parameter) → Sampling (Parameter; Projection before "sampling"!).