llama.cpp – what do we get?
PyDataLondon 2024-01 lightning talk
@IanOzsvald – ianozsvald.com
Slide 2
No need for a GPU + VRAM
llama.cpp runs on CPU + RAM
Nothing is sent off your machine
llama.cpp
Slide 3
Prototype ideas!
llama-2-7b-chat.Q5_K_M.gguf
5GB on disk and in RAM, near real time
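A minimal sketch of driving this model from Python with the llama-cpp-python bindings (my assumption – the talk may have used the raw llama.cpp CLI instead; parameters are illustrative):

# pip install llama-cpp-python  (assumed bindings, not shown in the talk)
from llama_cpp import Llama

# Load the quantised GGUF entirely into RAM - no GPU needed
llm = Llama(model_path="llama-2-7b-chat.Q5_K_M.gguf", n_ctx=2048)

out = llm("Q: Name three uses of a local LLM. A:", max_tokens=128)
print(out["choices"][0]["text"])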
Slide 4
Experiment with coding assistants
(the base Llama 2 model isn't good at this)
Slide 5
WizardCoder is good (a code-tuned Llama 2)
wizardcoder-python-34b-v1.0.Q5_K_S.gguf
22GB on disk & RAM
15s for the example shown
You can replace GitHub Copilot with this for completions
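A sketch of the same bindings pointed at the WizardCoder GGUF for a code completion (the prompt style and stop token are my assumptions; editor plugins wrap the model in a similar way):

from llama_cpp import Llama

# Same bindings, now with the 22GB code-tuned model in RAM
llm = Llama(model_path="wizardcoder-python-34b-v1.0.Q5_K_S.gguf", n_ctx=4096)

# Complete a partial function, stopping before the next definition
prompt = "def rolling_mean(values, window):\n    "
out = llm(prompt, max_tokens=96, stop=["\ndef "])
print(prompt + out["choices"][0]["text"])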
Slide 6
LLaVA multi-modal
Extract facts from images?
llava-v1.5-7b-Q4_K.gguf
4GB on disk & RAM
5s for the example shown
llama.cpp provides ./server (an HTTP API)
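A sketch of querying that server from Python; the /completion endpoint, default port 8080 and the --mmproj projector file for LLaVA are assumptions based on llama.cpp's server README of the time:

# Start the server first, e.g.:
#   ./server -m llava-v1.5-7b-Q4_K.gguf --mmproj mmproj-model-f16.gguf
import requests

resp = requests.post(
    "http://localhost:8080/completion",  # default port is an assumption
    json={"prompt": "Describe a good lightning talk:", "n_predict": 64},
)
print(resp.json()["content"])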
Slide 8
Try Mixtral, Phi-2, UniNER etc.
Wild wild west (models from Aug 2023 onward are sane)
What could you prototype?
Let's discuss in the break – what are you building?
Summary