Running llama.cpp on the CPU

January 09, 2024

A 5-minute lightning talk introducing llama.cpp, showing how to run GGUF models on the CPU without needing a GPU. I show llama2, WizardCoder and Llava multimodal, with command-line arguments and links to the source GGUF files.
To be written up on: https://notanumber.email/ and https://ianozsvald.com/
License - Creative Commons By Attribution

ianozsvald

Transcript

llama.cpp – what do we get?
PyDataLondon 2024-01 lightning talk
@IanOzsvald – ianozsvald.com

llama.cpp
No need for a GPU+VRAM
llama.cpp runs on CPU+RAM
Nothing sent off your machine
By [ian]@ianozsvald[.com] Ian Ozsvald
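
As a rough sketch of what running on CPU+RAM looks like in practice, the Python below assembles a call to the llama.cpp command-line binary. It assumes a llama.cpp build from around the time of this talk, where the binary is called `./main` and accepts `-m` (model file), `-p` (prompt), `-n` (tokens to generate) and `-t` (threads). The helper name, model path and prompt are illustrative and do not come from the slides.

```python
import os
import shlex


def build_llama_command(model_path, prompt, n_predict=128, threads=None,
                        binary="./main"):
    """Return the argument list for a CPU-only llama.cpp run.

    Assumption: `binary` is a llama.cpp build that accepts -m, -p, -n and -t.
    """
    if threads is None:
        # Fall back to 1 if the CPU count cannot be determined.
        threads = os.cpu_count() or 1
    return [
        binary,
        "-m", model_path,       # GGUF model file
        "-p", prompt,           # prompt text
        "-n", str(n_predict),   # number of tokens to generate
        "-t", str(threads),     # CPU threads to use
    ]


if __name__ == "__main__":
    cmd = build_llama_command(
        "models/llama-2-7b-chat.Q5_K_M.gguf",  # hypothetical local path
        "Suggest three uses for a local LLM.",
        n_predict=64,
        threads=4,
    )
    # Print the command rather than running it, so this works without a model.
    print(shlex.join(cmd))
    # To run it for real: import subprocess; subprocess.run(cmd, check=True)
```

Later llama.cpp releases may name the binary differently, so treat `./main` as an assumption to check against your own build.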

Prototype ideas!
llama-2-7b-chat.Q5_K_M.gguf: 5GB on disk and in RAM, near real time
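
The 5GB figure can be sanity-checked with simple arithmetic: a quantised file is roughly the parameter count times the bits per weight, divided by 8. The sketch below uses approximate values that are assumptions rather than numbers from the slides: about 6.7 billion parameters for the 7B Llama 2 model and roughly 5.7 bits per weight for Q5_K_M. It ignores file metadata and the context cache that also occupies RAM at run time.

```python
def approx_gguf_size_gb(n_params, bits_per_weight):
    """Estimate the size of a quantised model in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumed, approximate figures (not taken from the slides).
LLAMA2_7B_PARAMS = 6.7e9
Q5_K_M_BITS_PER_WEIGHT = 5.7

size_gb = approx_gguf_size_gb(LLAMA2_7B_PARAMS, Q5_K_M_BITS_PER_WEIGHT)
print(f"Estimated size: {size_gb:.1f} GB")
```

The same estimate for a 34B model at roughly 5.5 bits per weight comes out in the low 20s of GB, the same ballpark as the WizardCoder figure in this talk.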

Experiment with coding assistants (base llama2 model not good at this)

WizardCoder is good (tuned llama2)
wizardcoder-python-34b-v1.0.Q5_K_S.gguf: 22GB on disk & RAM, 15s for example
You can replace CoPilot with this for completions

Llava multi-modal
Extract facts from images?
llava-v1.5-7b-Q4_K.gguf: 4GB on disk & RAM, 5s for example
llama.cpp provides ./server
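
Once `./server` is running with a model loaded, it can be queried over HTTP from any language. The sketch below assumes the server is listening on its default address of `http://127.0.0.1:8080` and uses its `/completion` endpoint, which takes a JSON body with `prompt` and `n_predict` and returns the generated text in a `content` field. The function names and the example prompt are illustrative, not from the talk.

```python
import json
import urllib.request


def build_completion_request(prompt, n_predict=64,
                             base_url="http://127.0.0.1:8080"):
    """Build (but do not send) a POST request for the /completion endpoint."""
    payload = {"prompt": prompt, "n_predict": n_predict}
    return urllib.request.Request(
        f"{base_url}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text.

    Requires a llama.cpp server to be running already.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["content"]


if __name__ == "__main__":
    req = build_completion_request("Building a website can be done in",
                                   n_predict=32)
    # Show what would be sent; nothing goes over the network here.
    print(req.get_method(), req.full_url)
    # With a server running: print(complete("Building a website can be done in"))
```

Building the request separately from sending it keeps the example checkable without a running server, and everything stays on the local machine.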

Summary
Try Mixtral, Phi2, UniNER etc
Wild wild west (Aug+ is sane)
What could you prototype?
Let’s discuss in the break – what are you building?