Build your own LLM, Live, with MicroGPT

Andrej Karpathy's MicroGPT is a single-file, no-dependency mini GPT that you can run live with me. We'll talk about how GPT works, we'll look at the code, we'll reflect on the Torch-based nanoGPT "big brother", and you'll end the talk with stronger intuitions about how our next-best-token-guessing overlords work. Plus you'll have a mini GPT running on your laptop in 200 lines of Python.
https://www.meetup.com/pydata-london-meetup/events/314472144/?eventOrigin=group_upcoming_events

ianozsvald

May 05, 2026

Transcript

  1. Strategist / Trainer / Speaker / Author, 25+ years. Figuring out where LLMs fit into Data Science. Interim Chief Data Scientist. Signing the 3rd Edition! By [ian]@ianozsvald[.com] Ian Ozsvald
  2. Goals today: you → run Karpathy's MicroGPT (no dependencies!). Talk about Transformers & Attention. Char-only tokens, single names, tiny model.
  3. Demo... Download and run it, please! Demo at the command line. Does it run for you?
  4. What does it look like? This is nanoGPT (3 layers); imagine just 1 layer for MicroGPT. We talk about inference first. https://bbycroft.net/llm (huge credits)
  5. Q, K, V: learned projections (learned ahead of time). Queries ask what this token is looking for; Keys know if they answer a Query; Values are the answers to share back. Many heads learn different things. All in aid of modifying the information that flows forwards.
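The Q/K/V story on this slide can be sketched as a single causal attention head in pure Python. This is a minimal illustration under assumed dimensions and hand-rolled helpers (`matvec`, `softmax` are my names), not the projection code from MicroGPT itself.

```python
import math

def matvec(W, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(tokens, Wq, Wk, Wv):
    """One causal attention head: each position sees itself and earlier ones."""
    qs = [matvec(Wq, t) for t in tokens]   # Queries: what am I looking for?
    ks = [matvec(Wk, t) for t in tokens]   # Keys: do I answer that query?
    vs = [matvec(Wv, t) for t in tokens]   # Values: the answer to share back
    d = len(qs[0])
    out = []
    for i, q in enumerate(qs):
        # Score query i against keys 0..i only (the causal mask).
        scores = [sum(qj * kj for qj, kj in zip(q, ks[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Mix the values by attention weight: information flows forwards.
        mixed = [sum(w * vs[j][c] for j, w in enumerate(weights))
                 for c in range(d)]
        out.append(mixed)
    return out
```

A real model runs many such heads in parallel, each with its own learned Wq, Wk, Wv, so different heads can learn different things.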
  6. Word/token embedding: a, e, i, o, u. At iteration 999, vowels are self-similar (a → e + o + u, less so i) and dissimilar to consonants (e.g. z).
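The slide's observation — vowel embeddings clustering together and sitting apart from consonants — is a similarity comparison between learned vectors. A minimal sketch with cosine similarity follows; the 2-d vectors below are invented for illustration, not taken from a trained model.

```python
import math

# Toy "learned" character embeddings: vowels point one way, z another.
emb = {
    "a": [0.9, 0.1],
    "e": [0.8, 0.2],
    "o": [0.85, 0.15],
    "z": [0.1, 0.9],   # a consonant, pointing a different way
}

def cosine(u, v):
    """Cosine similarity: 1.0 for parallel vectors, near 0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)
```

After training, cosine("a", "e") being much larger than cosine("a", "z") is exactly the self-similar-vowels pattern the slide describes.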
  7. Backpropagation. We talk about learning after inference. Backprop depends on ._children in Value() (see code). The 4k params in microGPT produce 59k calculations. Minimise the loss against the known next token.
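The "._children in Value()" dependency can be shown with a minimal micrograd-style autograd class. This is a simplified sketch of the idea, not the actual Value class from microGPT: each operation records its inputs in ._children, and backward() walks that graph in reverse to accumulate gradients.

```python
class Value:
    """A scalar that remembers how it was computed, for backprop."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._children = _children          # the inputs this node came from
        self._backward = lambda: None       # local gradient rule

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad           # d(a+b)/da = 1
            other.grad += out.grad          # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph via ._children, then propagate
        # gradients from the loss back to every parameter.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()
```

Every forward calculation adds nodes to this graph, which is why 4k parameters fan out into tens of thousands of recorded calculations.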
  8. Computation graph for the 4k params. Just 15 layers shown, to see some of the circa 60k calculations.
  9. Final nodes for the loss; n is the token position. If probs[target_id] == 1.0 then the negative log loss is 0.0 (otherwise positive). Now we can modify the weights to try to reduce the loss.
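The slide's loss rule is just the negative log of the probability the model assigned to the known next token. A minimal sketch, with illustrative probability values:

```python
import math

def neg_log_loss(probs, target_id):
    """Negative log likelihood of the correct next token.

    -log(1.0) == 0.0 for a perfect prediction; any probability
    below 1.0 gives a positive loss to push down via backprop.
    """
    return -math.log(probs[target_id])
```

This is why training can always make progress: the loss is zero only when all probability mass is on the right token, and grows sharply as that probability shrinks.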
  10. Props to bbycroft. microGPT is 1 layer (4k params); the dot is 3-layer nanoGPT (85k). GPT 3 (174B) is big; GPT 5.5 is ...
  11. PlayGroup.org.uk: 6 months of intellectually motivated research. Make GPT Funny, ARC AGI 1, nanoGPT + more. May 15th → Write GPT from Scratch (ambitious…). Network of 70+; message me on LinkedIn.
  12. Conclusion. MicroGPT opens the door → nanoGPT. Start with inference, then work back to training. Knowing the 'how' is surprisingly complex (I'm not there yet). Join the network of fellow searchers at PlayGroup.org.uk.