
Running llama.cpp on the CPU

ianozsvald
January 09, 2024


A 5-minute lightning talk introducing llama.cpp, showing how we can run GGUF models on the CPU without needing a GPU. I show Llama 2, WizardCoder and LLaVA (multimodal), with command-line arguments and links to the source GGUF files.
To be written up on: https://notanumber.email/ and https://ianozsvald.com/
License: Creative Commons Attribution (CC BY)


Transcript

  1. No need for a GPU+VRAM: llama.cpp runs on CPU+RAM, and nothing is sent off your machine (a build-and-run sketch follows).
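
As a rough sketch of what running on CPU+RAM looks like in practice (the build steps are standard llama.cpp; the model filename and flags are illustrative assumptions, not taken from the slides):

    # Build llama.cpp (CPU-only by default) and run a GGUF model locally.
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp && make
    # The model file here is illustrative; any chat-tuned GGUF works.
    ./main -m models/llama-2-7b-chat.Q4_K_M.gguf \
           -p "Explain what a GGUF file is." -n 256 -t 8
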
  2. Experiment with coding assistants (the base Llama 2 model is not good at this; an interactive sketch follows).
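
A minimal sketch of that experiment, assuming a base Llama 2 GGUF on disk (filename and flags are illustrative): llama.cpp's interactive mode lets you probe its coding ability directly.

    # Interactive session with a base (non-code-tuned) model;
    # completions tend to be weak, which motivates WizardCoder below.
    ./main -m models/llama-2-13b.Q4_K_M.gguf -i --color \
           -p "Write a Python function that parses a CSV file." -n 200
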
  3. WizardCoder is good (a fine-tuned Llama 2): wizardcoder-python-34b-v1.0.Q5_K_S.gguf, 22GB on disk & RAM, 15s for the example. You can replace CoPilot with this for completions (sketched below).
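
The WizardCoder filename is from the slide; the prompt template and flags below are a hedged sketch (WizardCoder models are usually prompted with an Instruction/Response format, and -e tells llama.cpp to interpret the \n escapes):

    # Code completion with the WizardCoder GGUF named on the slide.
    ./main -m models/wizardcoder-python-34b-v1.0.Q5_K_S.gguf -e \
           -p "### Instruction: Write a Python function that reverses a string.\n### Response:" \
           -n 200 -t 8
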
  4. LLaVA multi-modal: extract facts from images? llava-v1.5-7b-Q4_K.gguf, 4GB on disk & RAM, 5s for the example. llama.cpp provides ./server (a launch sketch follows).
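
A sketch of launching that server for LLaVA: the main model filename is from the slide, while the mmproj projector filename is an assumption (LLaVA GGUF releases ship a separate multimodal projector file).

    # Serve LLaVA over HTTP, then browse to the URL and upload an image.
    # The mmproj filename is illustrative.
    ./server -m models/llava-v1.5-7b-Q4_K.gguf \
             --mmproj models/mmproj-model-f16.gguf \
             --host 127.0.0.1 --port 8080
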
  5. Summary: try Mixtral, Phi2, UniNER etc. It’s the wild wild west (models from Aug onwards are sane). What could you prototype? Let’s discuss in the break – what are you building?