Build your own LLM, Live, with MicroGPT

Andrej Karpathy's MicroGPT is a single-file, no-dependency mini GPT that you can run live with me. We'll talk about how GPT works, we'll look at the code, we'll reflect on the Torch-based nanoGPT "big brother", and you'll end the talk with stronger intuitions about how our next-best-token-guessing overlords work. Plus you'll have a mini GPT running on your laptop in 200 lines of Python.
https://www.meetup.com/pydata-london-meetup/events/314472144/?eventOrigin=group_upcoming_events

ianozsvald

May 05, 2026

Transcript

  1. Strategist / Trainer / Speaker / Author, 25+ years. Figuring out where LLMs fit into Data Science. Interim Chief Data Scientist. Signing the 3rd Edition! By [ian]@ianozsvald[.com] Ian Ozsvald
  2. Goals today: you → run Karpathy's MicroGPT (no dependencies!). Talk about Transformers & Attention. Char-only tokens, single names, tiny model.
  3. Demo... Download and run it, please! Demo at the command line. Does it run for you?
  4. What does it look like? This is nanoGPT (3 layers); imagine just 1 layer for MicroGPT. We talk about inference first. https://bbycroft.net/llm (huge credits)
  5. Q, K, V: learned projections (learned ahead of time). Queries ask what this token is looking for; Keys know if they answer a Query; Values are the answers to share back. Many heads learn different things. All in aid of modifying the information that flows forwards.
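The Q/K/V story on this slide can be sketched as a single causal attention head in pure Python. This is a minimal illustration under assumed dimensions and hand-rolled helpers (`matvec`, `softmax` are my names), not the projection code from MicroGPT itself.

```python
import math

def matvec(W, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(tokens, Wq, Wk, Wv):
    """One causal attention head: each position sees itself and earlier ones."""
    qs = [matvec(Wq, t) for t in tokens]   # Queries: what am I looking for?
    ks = [matvec(Wk, t) for t in tokens]   # Keys: do I answer that query?
    vs = [matvec(Wv, t) for t in tokens]   # Values: the answer to share back
    d = len(qs[0])
    out = []
    for i, q in enumerate(qs):
        # Score query i against keys 0..i only (the causal mask).
        scores = [sum(qj * kj for qj, kj in zip(q, ks[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Mix the values by attention weight: information flows forwards.
        mixed = [sum(w * vs[j][c] for j, w in enumerate(weights))
                 for c in range(d)]
        out.append(mixed)
    return out
```

A real model runs many such heads in parallel, each with its own learned Wq, Wk, Wv, so different heads can learn different things.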
  6. Word/token embedding: a, e, i, o, u. At iteration 999, vowels are self-similar (a → e + o + u, less so i) and dissimilar to consonants (e.g. z).
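The slide's observation — vowel embeddings clustering together and sitting apart from consonants — is a similarity comparison between learned vectors. A minimal sketch with cosine similarity follows; the 2-d vectors below are invented for illustration, not taken from a trained model.

```python
import math

# Toy "learned" character embeddings: vowels point one way, z another.
emb = {
    "a": [0.9, 0.1],
    "e": [0.8, 0.2],
    "o": [0.85, 0.15],
    "z": [0.1, 0.9],   # a consonant, pointing a different way
}

def cosine(u, v):
    """Cosine similarity: 1.0 for parallel vectors, near 0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)
```

After training, cosine("a", "e") being much larger than cosine("a", "z") is exactly the self-similar-vowels pattern the slide describes.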
  7. Backpropagation. We talk about learning after inference. Backprop depends on ._children in Value() (see code). The 4k params in microGPT produce 59k calculations. Minimise the loss against the known next token.
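The "._children in Value()" dependency can be shown with a minimal micrograd-style autograd class. This is a simplified sketch of the idea, not the actual Value class from microGPT: each operation records its inputs in ._children, and backward() walks that graph in reverse to accumulate gradients.

```python
class Value:
    """A scalar that remembers how it was computed, for backprop."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._children = _children          # the inputs this node came from
        self._backward = lambda: None       # local gradient rule

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad           # d(a+b)/da = 1
            other.grad += out.grad          # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph via ._children, then propagate
        # gradients from the loss back to every parameter.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()
```

Every forward calculation adds nodes to this graph, which is why 4k parameters fan out into tens of thousands of recorded calculations.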
  8. Computation graph for the 4k params. Just 15 layers shown, to see some of the circa 60k calculations.
  9. Final nodes for the loss; n is the token position. If probs[target_id] == 1.0 then the negative log loss is 0.0 (otherwise positive). Now we can modify the weights to try to reduce the loss.
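The slide's loss rule is just the negative log of the probability the model assigned to the known next token. A minimal sketch, with illustrative probability values:

```python
import math

def neg_log_loss(probs, target_id):
    """Negative log likelihood of the correct next token.

    -log(1.0) == 0.0 for a perfect prediction; any probability
    below 1.0 gives a positive loss to push down via backprop.
    """
    return -math.log(probs[target_id])
```

This is why training can always make progress: the loss is zero only when all probability mass is on the right token, and grows sharply as that probability shrinks.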
  10. Props to bbycroft. microGPT is 1 layer (4k params); the dot is 3-layer nanoGPT (85k). GPT 3 (174B) is big; GPT 5.5 is ...
  11. PlayGroup.org.uk: 6 months of intellectually motivated research. Make GPT Funny, ARC AGI 1, nanoGPT + more. May 15th → Write GPT from Scratch (ambitious…). Network of 70+; message me on LinkedIn.
  12. Conclusion. MicroGPT opens the door → nanoGPT. Start with inference, then work back to training. Knowing the 'how' is surprisingly complex (I'm not there yet). Join the network of fellow searchers at PlayGroup.org.uk.