LLMs vs Chess

LLM vs Chess PyDataLondon 2026-01 lightning talk @IanOzsvald – ianozsvald.com

Can LLMs play chess?
Recent paper made curious choices
A new benchmark?
I’m learning chess, so I needed a distraction
By [ian]@ianozsvald[.com] Ian Ozsvald

What’s in the other paper?
Random bot & Komodo engine (ELO 250)
No move history, no FEN (?), inverted board for Black
Unicode board, tools of legal moves

My approach
Stockfish level 0 (ELO 1350?) or configurable ELO (or... is it?)
GLM 4.7, Opus 4.5, GPT 5.2, DeepSeek Terminus 3.1
LLM plays Black, run 3 games, UCI moves (e.g. e2e4) only
Max 3 attempts at a legal move, ‘resign’ allowed
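The "max 3 attempts, resign allowed" rule can be sketched as a small retry loop. This is a minimal illustration, not the code used in the talk: `choose_move`, `ask_model` and the `'forfeit'` outcome after three rejected replies are assumptions, and the legal-move set is passed in rather than computed by a chess library.

```python
import re

# UCI long algebraic notation: from-square, to-square, optional promotion piece.
UCI_PATTERN = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")


def is_uci_syntax(text):
    """Return True if text looks like a UCI move such as 'e2e4' or 'e7e8q'."""
    return UCI_PATTERN.fullmatch(text) is not None


def choose_move(ask_model, legal_moves, max_attempts=3):
    """Ask for a move up to max_attempts times.

    ask_model is a hypothetical callable: it takes a feedback string (empty on
    the first try) and returns the model's raw reply.
    Returns (move, attempts_used), where move is 'resign', a legal UCI move,
    or 'forfeit' (an assumed outcome) when every attempt was rejected.
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        reply = ask_model(feedback).strip().lower()
        if reply == "resign":
            return "resign", attempt
        if not is_uci_syntax(reply):
            feedback = f"'{reply}' is not a UCI move, e.g. e7e5."
        elif reply not in legal_moves:
            feedback = f"'{reply}' is not legal in this position."
        else:
            return reply, attempt
    return "forfeit", max_attempts


# Scripted stand-in for the model: two rejected replies, then a legal one.
replies = iter(["Nf6", "e2e4", "e7e5"])
legal_for_black = {"e7e5", "g8f6", "d7d5"}
result = choose_move(lambda feedback: next(replies), legal_for_black)
print(result)
```

Separating the syntax check from the legality check lets the feedback say which of the two went wrong, which is useful when logging why a model used up its attempts.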

Prompt: FEN includes the 50-move rule counter, castling rights and en passant square, which are missing from the graphical board
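To make that concrete, here is a rough sketch that splits a FEN string into its six fields and draws the piece placement as a plain ASCII board. The helper names (`parse_fen`, `ascii_board`, `FenFields`) are illustrative, not from the talk; the example position is the one after 1. e4.

```python
from typing import NamedTuple


class FenFields(NamedTuple):
    placement: str        # piece placement, rank 8 first
    active_colour: str    # 'w' or 'b'
    castling: str         # e.g. 'KQkq', or '-' when no rights remain
    en_passant: str       # target square such as 'e3', or '-'
    halfmove_clock: int   # plies since last capture or pawn move (50-move rule)
    fullmove_number: int


def parse_fen(fen):
    """Split a FEN string into its six space-separated fields."""
    placement, colour, castling, en_passant, halfmove, fullmove = fen.split()
    return FenFields(placement, colour, castling, en_passant,
                     int(halfmove), int(fullmove))


def ascii_board(placement):
    """Expand the placement field into eight text rows, '.' for empty squares."""
    rows = []
    for rank in placement.split("/"):
        row = ""
        for ch in rank:
            row += "." * int(ch) if ch.isdigit() else ch
        rows.append(row)
    return "\n".join(rows)


fen = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"
fields = parse_fen(fen)
print(ascii_board(fields.placement))
print(fields.active_colour, fields.castling, fields.en_passant, fields.halfmove_clock)
```

The drawn board only uses the first field; side to move, castling rights, the en passant square and the halfmove clock live in the remaining fields, so a board-only prompt cannot convey them.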

GLM 4.7: board state prior to loss – good description, quick to resign!

Claude Opus 4.5: Bishop c8 and rook move (is it blocked?)

They all write bad (non-UCI) answers!
DeepSeek Terminus 3.1: JSONDecodeError
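One way to stop a malformed reply from crashing a game is to catch the decode error and treat it as a failed attempt. The sketch below assumes the model was asked to answer with a JSON object; the `"move"` key and the `extract_move` helper are hypothetical, not taken from the talk.

```python
import json


def extract_move(raw_reply, key="move"):
    """Return the move string from a JSON reply, or None if it cannot be read."""
    try:
        payload = json.loads(raw_reply)
    except json.JSONDecodeError:
        # Chatty or truncated replies land here instead of raising.
        return None
    if not isinstance(payload, dict):
        return None
    move = payload.get(key)
    return move if isinstance(move, str) else None


print(extract_move('{"move": "e7e5"}'))               # clean JSON
print(extract_move('Sure! I play {"move": "e7e5"}'))  # prose around the JSON
print(extract_move('{"move": 5}'))                    # wrong value type
```

Returning `None` lets the caller count the reply against the attempt limit rather than aborting the run.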

Outcomes & next steps
Stockfish level 0 (1350 ELO?) vs 4 models
They all lose, or resign. Stockfish wins 6*4 models
They all make illegal moves, sometimes repeatedly
Can’t make Stockfish ‘easy’ (maybe I’m missing something?)
Ablation – ASCII board or no FEN?
More games!
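For reference, Stockfish exposes two strength controls over UCI: the `Skill Level` option, and the `UCI_LimitStrength` / `UCI_Elo` pair. The sketch below only builds the command strings an engine wrapper would send; `strength_commands` is an illustrative helper, and whether the talk set these options this way is not known.

```python
def strength_commands(skill_level=None, elo=None):
    """Build UCI 'setoption' lines for weakening Stockfish.

    skill_level maps to the 'Skill Level' option; elo enables
    'UCI_LimitStrength' and sets 'UCI_Elo'.
    """
    commands = []
    if skill_level is not None:
        commands.append(f"setoption name Skill Level value {skill_level}")
    if elo is not None:
        commands.append("setoption name UCI_LimitStrength value true")
        commands.append(f"setoption name UCI_Elo value {elo}")
    return commands


for line in strength_commands(skill_level=0):
    print(line)
for line in strength_commands(elo=1350):
    print(line)
```

The engine enforces its own allowed range for `UCI_Elo`, which would limit how "easy" it can be made through these options alone.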
