Finetuning LLMs on consumer GPUs


Aniket Maurya

November 07, 2023

Transcript

  1. November 2023
     1. Finetuning LLMs on consumer GPUs
     2. LLM Evaluation framework and datasets
     3. Deep Dive into Transformers
     4. Effortlessly analyze multifaceted financial documents with LlamaIndex
  2. Finetuning LLMs on custom datasets
     Aniket Maurya, Developer Advocate at Lightning AI
     November 2023
     X.com/aniketmaurya
     linkedin.com/in/aniketmaurya
  3. Agenda
     • Overview of LLMs
     • Parameter-efficient finetuning with an instruction dataset
     • Training on consumer GPUs
  4. What are LLMs
     Source: "Attention Is All You Need"
  5. Parameter-Efficient Finetuning
     Source: https://lightning.ai/pages/community/tutorial/lora-llm/
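The linked tutorial covers LoRA: the pretrained weight matrix W is frozen and a low-rank update BA is learned, so only r * (d_in + d_out) parameters per layer are trained. A minimal sketch of the idea in PyTorch (class name, rank, and scaling are illustrative, not Lit-GPT's actual API):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen pretrained linear layer plus a trainable low-rank update."""
        def __init__(self, linear: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.linear = linear
            for p in self.linear.parameters():
                p.requires_grad_(False)  # freeze the pretrained weights
            self.lora_a = nn.Parameter(0.01 * torch.randn(r, linear.in_features))
            self.lora_b = nn.Parameter(torch.zeros(linear.out_features, r))  # zeros: no change at init
            self.scaling = alpha / r

        def forward(self, x):
            # y = x W^T + scaling * x A^T B^T; only A and B receive gradients
            return self.linear(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)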
  6. Why Finetune LLMs
     • Remove untruthfulness and toxicity
     • Customize the output and tone of language
     • Privacy and control
  7–9. Finetuning LLMs
     • Set up the model
     • Prepare the data
     • Finetune the model
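These three steps map onto a short training loop. A minimal sketch using Hugging Face transformers (the checkpoint name, prompt template, and hyperparameters are placeholders; Lit-GPT wraps the same steps in its finetuning scripts):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # 1. Set up the model
    model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).cuda()

    # 2. Prepare the data: instruction/response pairs rendered into a prompt template
    samples = [{"instruction": "Summarize: ...", "response": "..."}]  # placeholder dataset
    def encode(sample):
        text = f"### Instruction:\n{sample['instruction']}\n\n### Response:\n{sample['response']}"
        return tokenizer(text, return_tensors="pt").input_ids.cuda()

    # 3. Finetune: standard next-token cross-entropy
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    for sample in samples:
        input_ids = encode(sample)
        loss = model(input_ids, labels=input_ids).loss  # HF shifts the labels internally
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()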
  10. Lit-GPT
      • 4-bit quantized finetuning and inference
      • Minimal code, easy to debug and hack
      • TPU support
      • Flash-Attention 2
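On the Flash-Attention point: PyTorch 2.x exposes fused attention kernels through torch.nn.functional.scaled_dot_product_attention, which avoids materializing the full seq x seq attention matrix; this is one way frameworks like Lit-GPT get Flash-Attention-style speed. A minimal illustration (shapes are arbitrary):

    import torch
    import torch.nn.functional as F

    # (batch, heads, seq_len, head_dim)
    q = torch.randn(1, 32, 2048, 64, device="cuda", dtype=torch.bfloat16)
    k, v = torch.randn_like(q), torch.randn_like(q)

    # Dispatches to a fused (Flash-Attention-style) kernel when one is available
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)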
  11. Memory Required to Load Llama
      • Llama 7B, fp32: ~28 GB
      • Llama 7B, fp16: ~14 GB
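These figures are just parameter count times bytes per parameter:

    params = 7e9                 # Llama 7B
    print(params * 4 / 1e9)      # fp32: 4 bytes/param   -> ~28 GB
    print(params * 2 / 1e9)      # fp16: 2 bytes/param   -> ~14 GB
    print(params * 0.5 / 1e9)    # 4-bit: 0.5 bytes/param -> ~3.5 GB (plus some quantization overhead)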
  12–13. Memory Usage
      • Activation memory
      • Gradient memory
      • Optimizer memory
      • Model memory
      Source: https://tinkerd.net/blog/machine-learning/distributed-training/
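For full finetuning, the model weights are only one of these four terms. Ignoring activations (which scale with batch size and context length) and any fp32 master weights, a rough estimate for Llama 7B with Adam:

    params = 7e9
    model_mem = params * 2            # ~14 GB: weights in fp16
    grad_mem  = params * 2            # ~14 GB: one gradient per parameter, fp16
    optim_mem = params * 2 * 4        # ~56 GB: Adam keeps fp32 momentum and variance
    print((model_mem + grad_mem + optim_mem) / 1e9, "GB before activations")  # ~84 GB

This is why full finetuning of a 7B model does not fit on a consumer GPU, and why the techniques on the next slides matter.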
  14–16. Avoid OOM
      • Reduce the model's context length
      • Use lower precision
      • 4-bit quantization
      • Shard the model across multiple GPUs
      • Reduce the micro batch size
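Lit-GPT enables 4-bit quantization through its own options; as a generic illustration of the same idea, here is how a model can be loaded in 4-bit NF4 with Hugging Face transformers + bitsandbytes (the checkpoint id is a placeholder):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # normal-float 4-bit
        bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
        bnb_4bit_compute_dtype=torch.bfloat16, # matmuls still run in bf16
    )
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",            # placeholder checkpoint
        quantization_config=bnb_config,
        device_map="auto",
    )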
  17. Conclusion
      • Lit-GPT with LoRA finetuning
      • Lower precision and 4-bit quantization
      • Distributed training and activation checkpointing
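A hedged sketch of the last bullet using Lightning Fabric (strategy and precision strings follow Lightning 2.x; the model, optimizer, and batch are placeholders standing in for the LLM):

    import torch
    from lightning.fabric import Fabric

    # bf16 precision + FSDP sharding of parameters/gradients/optimizer states across GPUs
    fabric = Fabric(accelerator="cuda", devices=2, strategy="fsdp", precision="bf16-true")
    fabric.launch()

    model = torch.nn.Linear(4096, 4096)  # placeholder; a real transformer block could also
                                         # be wrapped with torch.utils.checkpoint for
                                         # activation checkpointing
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model, optimizer = fabric.setup(model, optimizer)

    batch = torch.randn(8, 4096, device=fabric.device)
    loss = model(batch).sum()  # placeholder loss
    fabric.backward(loss)      # Fabric routes backward through the sharded setup
    optimizer.step()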