Finetuning LLMs on consumer GPUs


Aniket Maurya

November 07, 2023

Transcript

  1. November 2023
     1. Finetuning LLMs on consumer GPUs
     2. LLM Evaluation framework and datasets
     3. Deep Dive into Transformers
     4. Effortlessly analyze multifaceted financial documents with LlamaIndex
  2. Finetuning LLMs on custom datasets
     Aniket Maurya, Developer Advocate at Lightning AI
     November 2023
     X.com/aniketmaurya
     linkedin.com/in/aniketmaurya
  3. Agenda
     • Overview of LLMs
     • Parameter-efficient finetuning with an instruction dataset
     • Training on consumer GPUs
  4. What are LLMs
     Source: "Attention Is All You Need"
  5. Parameter-Efficient Finetuning
     Source: https://lightning.ai/pages/community/tutorial/lora-llm/
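The linked tutorial covers LoRA: the pretrained weight matrix W is frozen and a low-rank update BA is learned, so only r * (d_in + d_out) parameters per layer are trained. A minimal sketch of the idea in PyTorch (class name, rank, and scaling are illustrative, not Lit-GPT's actual API):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen pretrained linear layer plus a trainable low-rank update."""
        def __init__(self, linear: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.linear = linear
            for p in self.linear.parameters():
                p.requires_grad_(False)  # freeze the pretrained weights
            self.lora_a = nn.Parameter(0.01 * torch.randn(r, linear.in_features))
            self.lora_b = nn.Parameter(torch.zeros(linear.out_features, r))  # zeros: no change at init
            self.scaling = alpha / r

        def forward(self, x):
            # y = x W^T + scaling * x A^T B^T; only A and B receive gradients
            return self.linear(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)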
  6. Why Finetune LLMs
     • Remove untruthfulness and toxicity
     • Customize the output and tone of language
     • Privacy and control
  7–9. Finetuning LLMs
     • Set up the model
     • Prepare the data
     • Finetune the model
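These three steps map onto a short training loop. A minimal sketch using Hugging Face transformers (the checkpoint name, prompt template, and hyperparameters are placeholders; Lit-GPT wraps the same steps in its finetuning scripts):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # 1. Set up the model
    model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).cuda()

    # 2. Prepare the data: instruction/response pairs rendered into a prompt template
    samples = [{"instruction": "Summarize: ...", "response": "..."}]  # placeholder dataset
    def encode(sample):
        text = f"### Instruction:\n{sample['instruction']}\n\n### Response:\n{sample['response']}"
        return tokenizer(text, return_tensors="pt").input_ids.cuda()

    # 3. Finetune: standard next-token cross-entropy
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    for sample in samples:
        input_ids = encode(sample)
        loss = model(input_ids, labels=input_ids).loss  # HF shifts the labels internally
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()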
  10. Lit-GPT
      • 4-bit quantized finetuning and inference
      • Minimal code, easy to debug and hack
      • TPU support
      • Flash-Attention 2
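On the Flash-Attention point: PyTorch 2.x exposes fused attention kernels through torch.nn.functional.scaled_dot_product_attention, which avoids materializing the full seq x seq attention matrix; this is one way frameworks like Lit-GPT get Flash-Attention-style speed. A minimal illustration (shapes are arbitrary):

    import torch
    import torch.nn.functional as F

    # (batch, heads, seq_len, head_dim)
    q = torch.randn(1, 32, 2048, 64, device="cuda", dtype=torch.bfloat16)
    k, v = torch.randn_like(q), torch.randn_like(q)

    # Dispatches to a fused (Flash-Attention-style) kernel when one is available
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)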
  11. Memory Required to Load Llama
      • Llama 7B, fp32: ~28 GB
      • Llama 7B, fp16: ~14 GB
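These figures are just parameter count times bytes per parameter:

    params = 7e9                 # Llama 7B
    print(params * 4 / 1e9)      # fp32: 4 bytes/param   -> ~28 GB
    print(params * 2 / 1e9)      # fp16: 2 bytes/param   -> ~14 GB
    print(params * 0.5 / 1e9)    # 4-bit: 0.5 bytes/param -> ~3.5 GB (plus some quantization overhead)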
  12–13. Memory Usage
      • Activation memory
      • Gradient memory
      • Optimizer memory
      • Model memory
      Source: https://tinkerd.net/blog/machine-learning/distributed-training/
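For full finetuning, the model weights are only one of these four terms. Ignoring activations (which scale with batch size and context length) and any fp32 master weights, a rough estimate for Llama 7B with Adam:

    params = 7e9
    model_mem = params * 2            # ~14 GB: weights in fp16
    grad_mem  = params * 2            # ~14 GB: one gradient per parameter, fp16
    optim_mem = params * 2 * 4        # ~56 GB: Adam keeps fp32 momentum and variance
    print((model_mem + grad_mem + optim_mem) / 1e9, "GB before activations")  # ~84 GB

This is why full finetuning of a 7B model does not fit on a consumer GPU, and why the techniques on the next slides matter.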
  14–16. Avoid OOM
      • Reduce the model's context length
      • Use lower precision
      • 4-bit quantization
      • Shard the model across multiple GPUs
      • Reduce the micro batch size
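Lit-GPT enables 4-bit quantization through its own options; as a generic illustration of the same idea, here is how a model can be loaded in 4-bit NF4 with Hugging Face transformers + bitsandbytes (the checkpoint id is a placeholder):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # normal-float 4-bit
        bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
        bnb_4bit_compute_dtype=torch.bfloat16, # matmuls still run in bf16
    )
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",            # placeholder checkpoint
        quantization_config=bnb_config,
        device_map="auto",
    )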
  17. Conclusion
      • Lit-GPT with LoRA finetuning
      • Lower precision and 4-bit quantization
      • Distributed training and activation checkpointing
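A hedged sketch of the last bullet using Lightning Fabric (strategy and precision strings follow Lightning 2.x; the model, optimizer, and batch are placeholders standing in for the LLM):

    import torch
    from lightning.fabric import Fabric

    # bf16 precision + FSDP sharding of parameters/gradients/optimizer states across GPUs
    fabric = Fabric(accelerator="cuda", devices=2, strategy="fsdp", precision="bf16-true")
    fabric.launch()

    model = torch.nn.Linear(4096, 4096)  # placeholder; a real transformer block could also
                                         # be wrapped with torch.utils.checkpoint for
                                         # activation checkpointing
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model, optimizer = fabric.setup(model, optimizer)

    batch = torch.randn(8, 4096, device=fabric.device)
    loss = model(batch).sum()  # placeholder loss
    fabric.backward(loss)      # Fabric routes backward through the sharded setup
    optimizer.step()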