ARC AGI Kaggle with Llama 3: First Steps
PyDataLondon 2024-07 lightning talk
@IanOzsvald – ianozsvald.com
Slide 2: Abstraction & Reasoning Challenge
LLMs are great at memorisation, but can they reason?
F. Chollet argues that they are bad at reasoning
$1M prize if an LLM (or any other approach) can solve these challenges
Abstract shapes, "initial → target" pairs, given as JSON
Open-weights models only (submissions run in an offline environment)
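For context, each ARC task is a small JSON document with "train" demonstration pairs and "test" inputs, where grids are lists of lists of small integers (colours). The snippet below loads a hand-made, illustrative task in that shape; it is not a real competition task.

import json

# Hand-made, ARC-style task for illustration only (not a real competition task).
# "train" holds demonstration initial/target pairs; "test" holds inputs to solve.
task_json = """
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
    {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]}
  ],
  "test": [
    {"input": [[3, 0], [0, 3]]}
  ]
}
"""

task = json.loads(task_json)
for pair in task["train"]:
    print("initial:", pair["input"], "-> target:", pair["output"])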
Slide 3
What rules do you need?
Slide 4: First solution
llama.cpp with a quantised Llama 3 8B (and 70B)
Python llama.cpp bindings
Ask for 200 solutions
Try grid, list, and grid+list representations
Grid only is poor; list is better; grid+list is slightly better again
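A minimal sketch of what "ask for 200 solutions" could look like with the llama-cpp-python bindings, assuming a local GGUF file and a grid+list prompt; the model path, prompt wording and sampling settings are illustrative assumptions, not the talk's actual code.

from llama_cpp import Llama

# Assumed local GGUF file for a quantised Llama 3 8B; the path is illustrative.
llm = Llama(
    model_path="Meta-Llama-3-8B-Instruct.IQ2_XS.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # offload all layers to the GPU (e.g. a 24GB 3090)
)

def grid_and_list(grid):
    """Render a grid as rows of cells plus a flat list (the grid+list representation)."""
    rows = "\n".join(" ".join(str(cell) for cell in row) for row in grid)
    flat = [cell for row in grid for cell in row]
    return f"{rows}\nas a flat list: {flat}"

def make_prompt(pair):
    return (
        "Write a Python function transform(grid) that turns the initial grid "
        "into the target grid. Answer with only the Python code.\n"
        f"Initial:\n{grid_and_list(pair['input'])}\n"
        f"Target:\n{grid_and_list(pair['output'])}\n"
    )

# Sample many candidate programs at a non-zero temperature for later filtering.
pair = {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]}
candidates = []
for _ in range(200):
    out = llm(make_prompt(pair), max_tokens=400, temperature=0.8)
    candidates.append(out["choices"][0]["text"])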
Slide 5: Llama (normally) writes code
Failure modes: bad syntax, no code, calls to raw_input, injecting the training data back into the answer (changing ints to strings)
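Those failure modes can be filtered out cheaply before any candidate is executed. The checks below (parseable syntax, no input()/raw_input calls, a transform() definition) are my assumed filter, sketched to match the failure modes on the slide rather than the talk's actual code.

import ast

def looks_usable(candidate_code: str) -> bool:
    """Reject candidates that cannot possibly run offline:
    empty output, syntax errors, interactive input, or no transform()."""
    if not candidate_code.strip():
        return False                       # "no code" failure mode
    try:
        tree = ast.parse(candidate_code)   # "bad syntax" failure mode
    except SyntaxError:
        return False
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    if {"input", "raw_input"} & names:     # interactive calls cannot run offline
        return False
    defs = {node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)}
    return "transform" in defs             # we asked for a transform(grid) function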
Slide 6: Summary
Llama 3 8B IQ2 (heavily quantised): some generated solutions run correctly on the 3x3 "train" problem
Very fast; runs on a 3090 (24GB VRAM)
Do you use Llama 3? Alpaca? RoPE?
Do you have text-correctness metrics?
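To turn "some run correctly on the train problem" into a number, candidates that pass the filter can be executed against the demonstration pairs and counted. The sketch below does that naively (note: exec with no real sandboxing), reusing names from the earlier illustrative snippets; it is an assumed metric, not the talk's actual evaluation code.

def solves_train_pairs(candidate_code: str, train_pairs) -> bool:
    """Exec one candidate and check it reproduces every train output exactly."""
    namespace = {}
    try:
        exec(candidate_code, namespace)   # caution: no sandboxing of generated code
        transform = namespace["transform"]
        return all(transform(p["input"]) == p["output"] for p in train_pairs)
    except Exception:
        return False

# Crude correctness metric over the sampled candidates for one task,
# reusing `candidates`, `task` and `looks_usable` from the earlier sketches.
usable = [c for c in candidates if looks_usable(c)]
solved = sum(solves_train_pairs(c, task["train"]) for c in usable)
print(f"{solved}/{len(candidates)} candidates solve the train pairs")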