How to Get Large Language Models Small

Lothar Wieske
December 05, 2023

SEACON digital 2023
software engineering + architecture conference | December 4 + 5
Online/Hybrid

Tuesday, December 5, 2023


Transcript

  1. https://unsplash.com/de/fotos/gluhbirne-auf-schwarzer-oberflache-fmTde1Fe23A QLoRA can replicate 16-bit full
     finetuning performance with a quantized 4-bit base model and Low-rank Adapters (LoRA) at least up to the 33B
     and 65B scales (a 4-bit + LoRA loading sketch follows after this transcript).
  2. https://unsplash.com/de/fotos/bnW9O5ZOys4 Breakthroughs in generative artificial intelligence have the
     potential to bring about sweeping changes to the global economy, according to Goldman Sachs Research. As tools
     using advances in natural language processing work their way into businesses and society, they could drive a 7%
     (or almost $7 trillion) increase in global GDP and lift productivity growth by 1.5 percentage points over a
     10-year period (April 5th, 2023).
     https://www.goldmansachs.com/insights/pages/generative-ai-could-raise-global-gdp-by-7-percent.html
  3. https://unsplash.com/de/fotos/TOzgRFJ0JxY Contrary to how it may seem when we observe its output, an LM is a
     system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training
     data, according to probabilistic information about how they combine, but without any reference to meaning: a
     stochastic parrot. (On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?)
  4. The generative AI landscape, layer by layer (a distillation-loss sketch follows after this transcript):
     Hardware Accelerators: Amazon Trainium/Inferentia, Google TPUv4, Nvidia H100, …
     Infrastructure: Cloud / Datacenter / Edge / Embedded
     Open Foundation Models: Bigscience (BLOOM), OpenAI (GPT-2)
     Model Hubs: Huggingface, Replicate
     Closed Foundation Models via APIs: OpenAI (GPT-3, GPT-4)
     Generative AI Apps: GitHub Copilot
     Foundation Model Customization: Knowledge Distillation, Finetuning
     https://unsplash.com/de/fotos/2987j1bsfxo
  5. https://unsplash.com/de/fotos/실험실-플라스크를-들고-있는-사람-H9t723yPjYI LoRA Paper, Table 4, Page 8 / Hu et
     al.: LoRA: Low-Rank Adaptation of Large Language Models (arXiv:2106.09685v2). Both rows use GPT-3 175 B as the
     base model; a minimal LoRA layer sketch follows after this transcript.

     Method      # Params (Training)   WikiSQL Acc.   MNLI-M Acc.   SAMSum (R1/R2/RL)
     FineTuning  175 B                 73.8           89.5          52.0/28.0/44.5
     LoRA        5 M                   73.4           91.7          53.8/29.8/45.9
  6. https://unsplash.com/de/fotos/rote-farbe-auf-weissem-pflaster-JqWvPQZXhIc QLoRA can be seen as an equalizing
     factor that helps to close the resource gap between large corporations and small teams with consumer GPUs,
     democratizing the finetuning of large language models (a weight-memory sketch follows after this transcript).
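
Sketch for slide 1: the 4-bit-base-plus-adapters recipe can be expressed with the Hugging Face transformers, peft,
and bitsandbytes libraries. This is a minimal sketch, not the speaker's setup; the model name and all hyperparameter
values below are illustrative assumptions.

    # Sketch only: QLoRA-style setup with a 4-bit (NF4) quantized base model plus
    # LoRA adapters. Assumes transformers, peft, bitsandbytes, and accelerate are
    # installed; model name and hyperparameters are illustrative, not from the talk.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # store base weights in 4 bit
        bnb_4bit_quant_type="nf4",              # NormalFloat4, as in the QLoRA paper
        bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
        bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
    )

    model = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom-7b1",                 # assumed example model
        quantization_config=bnb_config,
        device_map="auto",
    )
    model = prepare_model_for_kbit_training(model)  # casts norms, enables checkpointing hooks

    lora_config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,  # assumed values
        target_modules=["query_key_value"],      # BLOOM's fused attention projection
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the low-rank adapters require gradients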
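
Sketch for slide 4: knowledge distillation, named there as a customization technique, trains a small student to
match a large teacher's temperature-softened output distribution (Hinton et al., 2015). The names, temperature T,
and mixing weight alpha below are illustrative assumptions.

    # Sketch: classic knowledge-distillation loss. The student mimics the teacher's
    # softened logits; T and alpha are assumed values, not from the talk.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft targets: KL between temperature-softened distributions,
        # scaled by T^2 to keep gradient magnitudes comparable.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard targets: ordinary cross-entropy against the true labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

    # Usage with dummy logits: batch of 4 examples, 10-class toy task.
    student = torch.randn(4, 10, requires_grad=True)
    teacher = torch.randn(4, 10)
    labels = torch.randint(0, 10, (4,))
    loss = distillation_loss(student, teacher, labels)
    loss.backward()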
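
Sketch for slide 5: LoRA freezes the pretrained weight W0 and trains only a low-rank update, h = W0 x + (alpha/r) B A x
with rank r much smaller than the layer dimensions, which is why 5 M trainable parameters can stand in for a 175 B
finetune. A minimal module under assumed dimensions and hyperparameters:

    # Sketch: a LoRA-wrapped linear layer (Hu et al., arXiv:2106.09685).
    # The frozen base weight W0 gets a trainable low-rank update scaled by alpha/r;
    # dimensions and hyperparameters here are illustrative.
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, r=8, alpha=16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # freeze the pretrained weight W0
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # down-projection
            self.B = nn.Parameter(torch.zeros(base.out_features, r))        # up-projection, zero-init
            self.scaling = alpha / r     # delta_W = (alpha/r) * B @ A starts at zero

        def forward(self, x):
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

    layer = LoRALinear(nn.Linear(1024, 1024), r=8, alpha=16)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(trainable)  # 2 * 8 * 1024 = 16384 trainable params next to ~1M frozen ones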
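
Sketch for slide 6: the resource-gap claim is largely weight-memory arithmetic. At 16 bits a parameter costs
2 bytes; at 4 bits roughly 0.5 bytes plus small quantization constants. A back-of-envelope calculation for a 65 B
model, ignoring activations, KV cache, and adapter/optimizer state:

    # Back-of-envelope weight memory for a 65 B parameter model.
    params = 65e9
    gib = 1024 ** 3
    print(f"16-bit weights: {params * 2.0 / gib:5.1f} GiB")  # ~121 GiB: multi-GPU server territory
    print(f" 8-bit weights: {params * 1.0 / gib:5.1f} GiB")  # ~61 GiB
    print(f" 4-bit weights: {params * 0.5 / gib:5.1f} GiB")  # ~30 GiB: a single 48 GB card, per the QLoRA paper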