How to Get Large Language Models Small

Lothar Wieske
December 05, 2023

SEACON digital 2023
software engineering + architecture conference | December 4 + 5
Online/Hybrid

Tuesday, December 5, 2023


Transcript

  1. https://unsplash.com/de/fotos/gluhbirne-auf-schwarzer-oberflache-fmTde1Fe23A QLoRA can replicate 16-bit full
     finetuning performance with a quantized 4-bit base model and Low-rank Adapters (LoRA) at least up to the 33B
     and 65B scales (a 4-bit + LoRA loading sketch follows after this transcript).
  2. https://unsplash.com/de/fotos/bnW9O5ZOys4 Breakthroughs in generative artificial intelligence have the
     potential to bring about sweeping changes to the global economy, according to Goldman Sachs Research. As tools
     using advances in natural language processing work their way into businesses and society, they could drive a 7%
     (or almost $7 trillion) increase in global GDP and lift productivity growth by 1.5 percentage points over a
     10-year period (April 5th, 2023).
     https://www.goldmansachs.com/insights/pages/generative-ai-could-raise-global-gdp-by-7-percent.html
  3. https://unsplash.com/de/fotos/TOzgRFJ0JxY Contrary to how it may seem when we observe its output, an LM is a
     system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training
     data, according to probabilistic information about how they combine, but without any reference to meaning: a
     stochastic parrot. (On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?)
  4. The generative AI landscape, layer by layer (a distillation-loss sketch follows after this transcript):
     Hardware Accelerators: Amazon Trainium/Inferentia, Google TPUv4, Nvidia H100, …
     Infrastructure: Cloud / Datacenter / Edge / Embedded
     Open Foundation Models: Bigscience (BLOOM), OpenAI (GPT-2)
     Model Hubs: Huggingface, Replicate
     Closed Foundation Models via APIs: OpenAI (GPT-3, GPT-4)
     Generative AI Apps: GitHub Copilot
     Foundation Model Customization: Knowledge Distillation, Finetuning
     https://unsplash.com/de/fotos/2987j1bsfxo
  5. https://unsplash.com/de/fotos/실험실-플라스크를-들고-있는-사람-H9t723yPjYI LoRA Paper, Table 4, Page 8 / Hu et
     al.: LoRA: Low-Rank Adaptation of Large Language Models (arXiv:2106.09685v2). Both rows use GPT-3 175 B as the
     base model; a minimal LoRA layer sketch follows after this transcript.

     Method      # Params (Training)   WikiSQL Acc.   MNLI-M Acc.   SAMSum (R1/R2/RL)
     FineTuning  175 B                 73.8           89.5          52.0/28.0/44.5
     LoRA        5 M                   73.4           91.7          53.8/29.8/45.9
  6. https://unsplash.com/de/fotos/rote-farbe-auf-weissem-pflaster-JqWvPQZXhIc QLoRA can be seen as an equalizing
     factor that helps to close the resource gap between large corporations and small teams with consumer GPUs,
     democratizing the finetuning of large language models (a weight-memory sketch follows after this transcript).
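
Sketch for slide 1: the 4-bit-base-plus-adapters recipe can be expressed with the Hugging Face transformers, peft,
and bitsandbytes libraries. This is a minimal sketch, not the speaker's setup; the model name and all hyperparameter
values below are illustrative assumptions.

    # Sketch only: QLoRA-style setup with a 4-bit (NF4) quantized base model plus
    # LoRA adapters. Assumes transformers, peft, bitsandbytes, and accelerate are
    # installed; model name and hyperparameters are illustrative, not from the talk.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # store base weights in 4 bit
        bnb_4bit_quant_type="nf4",              # NormalFloat4, as in the QLoRA paper
        bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
        bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
    )

    model = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom-7b1",                 # assumed example model
        quantization_config=bnb_config,
        device_map="auto",
    )
    model = prepare_model_for_kbit_training(model)  # casts norms, enables checkpointing hooks

    lora_config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,  # assumed values
        target_modules=["query_key_value"],      # BLOOM's fused attention projection
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the low-rank adapters require gradients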
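
Sketch for slide 4: knowledge distillation, named there as a customization technique, trains a small student to
match a large teacher's temperature-softened output distribution (Hinton et al., 2015). The names, temperature T,
and mixing weight alpha below are illustrative assumptions.

    # Sketch: classic knowledge-distillation loss. The student mimics the teacher's
    # softened logits; T and alpha are assumed values, not from the talk.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft targets: KL between temperature-softened distributions,
        # scaled by T^2 to keep gradient magnitudes comparable.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard targets: ordinary cross-entropy against the true labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

    # Usage with dummy logits: batch of 4 examples, 10-class toy task.
    student = torch.randn(4, 10, requires_grad=True)
    teacher = torch.randn(4, 10)
    labels = torch.randint(0, 10, (4,))
    loss = distillation_loss(student, teacher, labels)
    loss.backward()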
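
Sketch for slide 5: LoRA freezes the pretrained weight W0 and trains only a low-rank update, h = W0 x + (alpha/r) B A x
with rank r much smaller than the layer dimensions, which is why 5 M trainable parameters can stand in for a 175 B
finetune. A minimal module under assumed dimensions and hyperparameters:

    # Sketch: a LoRA-wrapped linear layer (Hu et al., arXiv:2106.09685).
    # The frozen base weight W0 gets a trainable low-rank update scaled by alpha/r;
    # dimensions and hyperparameters here are illustrative.
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, r=8, alpha=16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # freeze the pretrained weight W0
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # down-projection
            self.B = nn.Parameter(torch.zeros(base.out_features, r))        # up-projection, zero-init
            self.scaling = alpha / r     # delta_W = (alpha/r) * B @ A starts at zero

        def forward(self, x):
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

    layer = LoRALinear(nn.Linear(1024, 1024), r=8, alpha=16)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(trainable)  # 2 * 8 * 1024 = 16384 trainable params next to ~1M frozen ones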
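
Sketch for slide 6: the resource-gap claim is largely weight-memory arithmetic. At 16 bits a parameter costs
2 bytes; at 4 bits roughly 0.5 bytes plus small quantization constants. A back-of-envelope calculation for a 65 B
model, ignoring activations, KV cache, and adapter/optimizer state:

    # Back-of-envelope weight memory for a 65 B parameter model.
    params = 65e9
    gib = 1024 ** 3
    print(f"16-bit weights: {params * 2.0 / gib:5.1f} GiB")  # ~121 GiB: multi-GPU server territory
    print(f" 8-bit weights: {params * 1.0 / gib:5.1f} GiB")  # ~61 GiB
    print(f" 4-bit weights: {params * 0.5 / gib:5.1f} GiB")  # ~30 GiB: a single 48 GB card, per the QLoRA paper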