llama.cpp – what do we get?
PyDataLondon 2024-01 lightning talk
@IanOzsvald – ianozsvald.com
Slide 2
No need for a GPU + VRAM
llama.cpp runs on CPU + RAM
Nothing is sent off your machine
llama.cpp
Slide 3
Prototype ideas!
llama-2-7b-chat.Q5_K_M.gguf
5GB on disk and in RAM, near real time
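A minimal sketch of driving this model from Python with the llama-cpp-python bindings (my assumption – the talk may have used the raw llama.cpp CLI instead; parameters are illustrative):

# pip install llama-cpp-python  (assumed bindings, not shown in the talk)
from llama_cpp import Llama

# Load the quantised GGUF entirely into RAM - no GPU needed
llm = Llama(model_path="llama-2-7b-chat.Q5_K_M.gguf", n_ctx=2048)

out = llm("Q: Name three uses of a local LLM. A:", max_tokens=128)
print(out["choices"][0]["text"])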
Slide 4
Experiment with coding assistants
(the base Llama 2 model isn't good at this)
Slide 5
WizardCoder is good (a code-tuned Llama 2)
wizardcoder-python-34b-v1.0.Q5_K_S.gguf
22GB on disk & RAM
15s for the example shown
You can replace GitHub Copilot with this for completions
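A sketch of the same bindings pointed at the WizardCoder GGUF for a code completion (the prompt style and stop token are my assumptions; editor plugins wrap the model in a similar way):

from llama_cpp import Llama

# Same bindings, now with the 22GB code-tuned model in RAM
llm = Llama(model_path="wizardcoder-python-34b-v1.0.Q5_K_S.gguf", n_ctx=4096)

# Complete a partial function, stopping before the next definition
prompt = "def rolling_mean(values, window):\n    "
out = llm(prompt, max_tokens=96, stop=["\ndef "])
print(prompt + out["choices"][0]["text"])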
Slide 6
LLaVA multi-modal
Extract facts from images?
llava-v1.5-7b-Q4_K.gguf
4GB on disk & RAM
5s for the example shown
llama.cpp provides ./server (an HTTP API)
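A sketch of querying that server from Python; the /completion endpoint, default port 8080 and the --mmproj projector file for LLaVA are assumptions based on llama.cpp's server README of the time:

# Start the server first, e.g.:
#   ./server -m llava-v1.5-7b-Q4_K.gguf --mmproj mmproj-model-f16.gguf
import requests

resp = requests.post(
    "http://localhost:8080/completion",  # default port is an assumption
    json={"prompt": "Describe a good lightning talk:", "n_predict": 64},
)
print(resp.json()["content"])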
Slide 8
Try Mixtral, Phi-2, UniNER etc.
Wild wild west (models from Aug 2023 onward are sane)
What could you prototype?
Let's discuss in the break – what are you building?
Summary