Slide 1

Slide 1 text

NVIDIA NeMo Framework for Generative AI
Mana Murakami, Senior Solution Architect, NVIDIA | Nov 2nd, 2023

Slide 2

Slide 2 text

Agenda • Generative AI • NVIDIA NeMo Framework

Slide 3

Slide 3 text

Generative AI

Slide 4

Slide 4 text

[Slide graphic: example generative AI models, including Wikipedia-scale corpora, NLLB-200, CODEX, MegaMolBART, e-Diffi, GPT-3]

How has NVIDIA contributed to the acceleration of AI? NVIDIA has been a pioneer in the field of AI since the very beginning. Our GPU platform has enabled the rapid development of AI: from the training of neural networks, to inference in the data center, on-device AI in the car and in the cloud, and the deployment of AI to tackle challenging problems like conversational AI and translation. NVIDIA's GPU-accelerated computing platform is the engine of AI; it is the most important computing platform of our time.

** Generated using the NVIDIA NeMo service 530B model

Slide 5

Slide 5 text

[Chart] Transformer and LLM research papers per year, 2017 to 2022 (y-axis: 0 to 14,000 papers), annotated with milestone models: TRANSFORMER, BERT, GPT-3, CODEX, MegaMolBART, NLLB-200, DALL-E 2, ChatGPT. Additional axis labels: parameters (M), year.

Slide 6

Slide 6 text

LLM development workflow
STEP 1: Pre-Training
STEP 2: Fine-Tuning (SFT and RLHF)
STEP 3: Inference

Slide 7

Slide 7 text

LLM development workflow (detail)
STEP 1: Pre-Training
STEP 2: Fine-Tuning (SFT and RLHF), applied to the pre-trained checkpoint
STEP 3: Inference

Slide 8

Slide 8 text

NVIDIA NeMo

Slide 9

Slide 9 text

NVIDIA NeMo for custom LLM development
Customization techniques: P-tuning, SFT, Adapters, RLHF, ALiBi
Generative AI stack: NeMo Framework, NVIDIA AI Enterprise … NVIDIA DGX Cloud

Slide 10

Slide 10 text

NeMo Framework: end-to-end framework for generative AI, from training to inference
✓ 3D Parallelism: Data, Tensor & Pipeline, and Sequence Parallelism, Selective Activation Recomputation
✓ LLM customization: Adapters, RLHF, ALiBi, SFT
✓ Cluster support: SLURM, Nephele, Kubernetes (K8s)
✓ LLMs: BERT >100B, T5-MoE, T5, GPT-3, Inform
✓ Multi-modal: Stable Diffusion, ViT, ViT-CLIP, InstructPix2Pix, Imagen
Runs on NVIDIA DGX SuperPODs, NVIDIA DGX Cloud, and NVIDIA DGX Systems
https://developer.nvidia.com/nemo-framework

Slide 11

Slide 11 text

NeMo Framework software stack
• NeMo Training container: NeMo Toolkit, NeMo Megatron Launcher, Megatron Core, PyTorch Lightning, and NGC PyTorch, for 3D-parallel training of generative AI / LLM models on NVIDIA GPUs
• NeMo Inference container: TensorRT-LLM and Triton Inference Server, for LLM inference on NVIDIA GPUs

Slide 12

Slide 12 text

LLM development workflow (this section: STEP 1, Pre-Training)
STEP 1: Pre-Training
STEP 2: Fine-Tuning (SFT and RLHF)
STEP 3: Inference

Slide 13

Slide 13 text

Memory-efficient training: Tensor & Pipeline Parallelism, Sequence Parallelism, Selective Activation Recomputation
[Figure] Activation recomputation across GPUs over time: traditionally, for very large LLMs (>175B parameters), every activation is recomputed; selective recomputation keeps most activations saved and recomputes only a subset per batch (legend: Saved vs. Recomputed; GPU 0 / GPU 1 / GPU 2 timelines). A generic PyTorch sketch of activation recomputation follows.
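For intuition, here is a minimal, generic PyTorch sketch of activation recomputation (full recomputation of each block, not NeMo's selective scheme); the module sizes and layer count are arbitrary choices for illustration.

# Generic activation recomputation ("gradient checkpointing") in plain PyTorch.
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.ff = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim), torch.nn.GELU(), torch.nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        return x + self.ff(x)

blocks = torch.nn.ModuleList([Block() for _ in range(4)])
x = torch.randn(8, 1024, requires_grad=True)

for block in blocks:
    # Activations inside `block` are not stored during forward; they are recomputed
    # in backward, trading extra compute for a smaller activation-memory footprint.
    x = checkpoint(block, x, use_reentrant=False)

x.sum().backward()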

Slide 14

Slide 14 text

3D Parallelism (NVIDIA Megatron)
Reference: Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM, Deepak Narayanan et al., 2021
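As a rough sketch of how 3D parallelism is typically configured in NeMo-style Megatron training, the settings below use option names that follow NeMo's megatron_gpt_config.yaml; the file path is hypothetical and the exact keys should be checked against the installed NeMo version.

# Hedged sketch: setting 3D-parallelism knobs on a Megatron GPT config with OmegaConf.
from omegaconf import OmegaConf

cfg = OmegaConf.load("megatron_gpt_config.yaml")  # hypothetical local copy of the config

# Shard each weight matrix across 4 GPUs (tensor parallel) and split the layer stack
# across 2 pipeline stages; data parallelism uses the remaining GPUs in the job.
cfg.model.tensor_model_parallel_size = 4
cfg.model.pipeline_model_parallel_size = 2

# Sequence parallelism and selective activation recomputation reduce activation memory.
cfg.model.sequence_parallel = True
cfg.model.activations_checkpoint_granularity = "selective"

print(OmegaConf.to_yaml(cfg.model))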

Slide 15

Slide 15 text

3D Parallelism (NVIDIA Megatron)
Reference: Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM, Deepak Narayanan et al., 2021
DGX A100 interconnect bandwidth: NVLink (gen 3): 300 GB/s, InfiniBand (HDR): 25 GB/s

Slide 16

Slide 16 text

FSDP (Fully Sharded Data Parallel) and ZeRO-3
Available as FSDP in PyTorch and as ZeRO-3 in DeepSpeed; a hybrid of Data Parallelism and Model Parallelism.
• How does it differ from 3D Parallelism? When sharding across N GPUs, the input data is still split N ways, exactly as in Data Parallelism (see the sketch below).
https://www.deepspeed.ai/2021/03/07/zero3-offload.html
[Figure] Data parallel vs. ZeRO-3
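A minimal PyTorch FSDP sketch, for illustration only; the model, sizes, and launch details are arbitrary, and the script assumes it is started with a distributed launcher such as torchrun.

# Minimal FSDP example: parameters, gradients, and optimizer state are sharded
# across all ranks (ZeRO-3 style), while each rank still sees its own slice of
# the input batch, as in plain data parallelism.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")  # environment set up by torchrun
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).cuda()

model = FSDP(model)  # wrap before creating the optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024, device="cuda")  # this rank's shard of the global batch
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()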

Slide 17

Slide 17 text

3D Parallelism vs. FSDP
[Bullet-point comparison of FSDP and 3D Parallelism; the items cover FSDP behavior and activation handling]

Slide 18

Slide 18 text

3D Parallelism vs. FSDP (continued)
[Bullet-point comparison of FSDP and 3D Parallelism; the items cover FSDP behavior, activation handling, and choice of parallelism]

Slide 19

Slide 19 text

NeMo Framework training performance (training on 300B tokens)
Time to train 300B tokens, in days (A100, BF16), pre-training with NeMo Framework:

Model      | 3072 GPUs | 1600 GPUs | 800 GPUs | 480 GPUs | 160 GPUs | 64 GPUs
GPT-3: 2B  |       0.2 |       0.3 |      0.6 |      1.1 |      3.2 |     8.0
GPT-3: 5B  |       0.4 |       0.8 |      1.6 |      2.7 |      8.0 |    20.0
GPT-3: 20B |       1.7 |       3.2 |      6.4 |     10.7 |     32.0 |    79.9
GPT-3: 43B |       3.6 |       6.9 |     13.7 |     22.9 |     68.7 |   171.7

(GPU counts correspond to 384, 200, 100, 60, 20, and 8 DGX A100 systems, respectively.)

Rule of thumb: seconds ≈ 8PT / (nX), where P = model parameters, T = tokens, n = GPU count, X = TFLOPS per GPU (A100 theoretical peak ~312, measured average ~163). A quick calculation with this formula follows.
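A small sanity check of the rule of thumb above; the helper function name and default throughput value are my own choices, and the result is only an estimate, not a reproduction of the measured table.

# Estimate pre-training time from the slide's rule of thumb: seconds ≈ 8PT / (nX).
def estimated_training_days(params: float, tokens: float, gpus: int,
                            tflops_per_gpu: float = 163.0) -> float:
    """Rough pre-training time estimate for a GPT-style model, in days."""
    flops_needed = 8.0 * params * tokens                     # the 8PT term from the slide
    seconds = flops_needed / (gpus * tflops_per_gpu * 1e12)  # convert TFLOPS to FLOPS
    return seconds / 86400.0

# GPT-3 5B, 300B tokens, 64 A100 GPUs at the averaged 163 TFLOPS each:
print(f"{estimated_training_days(5e9, 300e9, 64):.1f} days")
# Prints roughly 13.3 days; the measured value for this cell in the table above is 20.0 days,
# so treat the formula as an order-of-magnitude estimate.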

Slide 20

Slide 20 text

LLM development workflow (this section: STEP 2, Fine-Tuning)
STEP 1: Pre-Training
STEP 2: Fine-Tuning (SFT and RLHF)
STEP 3: Inference

Slide 21

Slide 21 text

Publicly available pre-trained models for generative AI
With NVIDIA NeMo Framework, pre-trained models published on Hugging Face and NGC can be used directly. Example: fine-tuning a Japanese BERT (cl-tohoku/bert-base-japanese) for text classification.

from nemo.collections import nlp as nemo_nlp
from nemo.utils.exp_manager import exp_manager
import pytorch_lightning as pl
from omegaconf import OmegaConf

# Update config settings
config = OmegaConf.load("text_classification_config.yaml")
config.model.tokenizer.vocab_file = "vocab.txt"
config.model.dataset.num_classes = 2
config.model.train_ds.file_path = "train_nemo_format.tsv"
config.model.validation_ds.file_path = "dev_nemo_format.tsv"
config.model.language_model.pretrained_model_name = "cl-tohoku/bert-base-japanese"

# Build the trainer and the NeMo text classification model
trainer = pl.Trainer(**config.trainer)
model = nemo_nlp.models.TextClassificationModel(cfg=config.model, trainer=trainer)

[Diagram] cl-tohoku/bert-base-japanese → .nemo model
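A minimal continuation of the snippet above (same variable names, assuming the same config); the output filename is arbitrary and only illustrates saving the result as a .nemo checkpoint.

# Continuation: set up experiment logging/checkpointing, train, then export.
exp_manager(trainer, config.get("exp_manager", None))
trainer.fit(model)

# Save the fine-tuned model as a .nemo file (hypothetical filename).
model.save_to("bert_japanese_text_classification.nemo")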

Slide 22

Slide 22 text

Using community models with NVIDIA NeMo Framework
Hugging Face checkpoints such as Llama 2 and StarCoder can be converted to the NeMo format. Example: converting Llama-2-7b-hf.

#!/bin/sh
git-lfs clone https://huggingface.co/meta-llama/Llama-2-7b-hf
python3 /opt/NeMo/scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py \
  --in-file=./Llama-2-7b-hf/ --out-file=llama2-7b.nemo

[ ] convert-llama2-from-huggingface-format-to-nemo-format
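As a rough sketch of what can be done with the converted file, the .nemo checkpoint produced above can be restored in Python inside the NeMo training container; the class path and arguments below assume a recent NeMo release and should be checked against your installed version.

# Hedged sketch: restore the converted Llama 2 checkpoint produced by the script above.
import pytorch_lightning as pl
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy

trainer = pl.Trainer(devices=1, accelerator="gpu", strategy=NLPDDPStrategy())
model = MegatronGPTModel.restore_from("llama2-7b.nemo", trainer=trainer)

# Inspect a few fields of the restored model config.
print(model.cfg.num_layers, model.cfg.hidden_size)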

Slide 23

Slide 23 text

NeMo model customization tools
A spectrum from prompt engineering to instruction tuning: accuracy for specific use-cases increases along with the data, compute, and investment required.
• PROMPT ENGINEERING: Few-shot learning, Chain-of-thought reasoning, System prompting (see the sketch below)
• PROMPT LEARNING: Prompt tuning, P-tuning
• PARAMETER EFFICIENT FINE-TUNING (PEFT): Adapters, LoRA, IA3
• INSTRUCTION TUNING: SFT, RLHF
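As a tiny illustration of the left end of this spectrum, prompt engineering needs no training at all: a few-shot prompt can be assembled purely in client code. The example reviews and labels below are made up.

# Hypothetical few-shot sentiment prompt: customization via the prompt only, no weight updates.
examples = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked after a week.", "negative"),
]
query = "Setup took five minutes and everything just worked."

prompt = "Classify the sentiment of each review as positive or negative.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)  # send this string to whichever deployed LLM endpoint you use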

Slide 24

Slide 24 text

Fine-tuning resources for NeMo Framework
• INSTRUCTION TUNING
  • [Playbook] NeMo Framework Supervised fine-tuning (SFT) with Llama2
  • [Documentation] Reinforcement Learning from Human Feedback
  • [Documentation] Instruction Following Taught by Supervised Fine-Tuning (SFT)
  • [Documentation] Model Fine-Tuning
  • [Jupyter Notebook] SFT example for Text Classification
• PEFT
  • [Playbook] NeMo Framework PEFT with Llama2
  • [Documentation] Generalized PEFT Framework
  • PEFT Training and Inference for GPT-style Models
  • PEFT Training and Inference for mT5/T5-style Models
  • [Jupyter Notebook] Optimize GPT model for Extractive Q&A using LoRA
• Prompt Learning
  • [Documentation] Model Prompt Learning
(A data-format sketch for SFT follows this list.)
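A hedged sketch of preparing SFT training data: the NeMo SFT recipes generally consume JSONL with one example per line, with field names that must match the prompt template in the SFT config. The "input"/"output" keys and example records below are assumptions; verify against the playbook you follow.

# Write a toy SFT dataset as JSONL (field names assumed, not authoritative).
import json

records = [
    {"input": "Summarize: NeMo Framework covers pre-training, fine-tuning and inference.",
     "output": "NeMo Framework supports the full LLM lifecycle."},
    {"input": "Translate to French: good morning", "output": "bonjour"},
]

with open("sft_train.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")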

Slide 25

Slide 25 text

LLM development workflow (this section: STEP 3, Inference)
STEP 1: Pre-Training
STEP 2: Fine-Tuning (SFT and RLHF)
STEP 3: Inference

Slide 26

Slide 26 text

TensorRT-LLM: a library for accelerating LLM inference on NVIDIA GPUs
https://github.com/NVIDIA/TensorRT-LLM
Models are defined through a Python API, much like in deep-learning frameworks:

# define a new activation
def silu(input: Tensor) -> Tensor:
    return input * sigmoid(input)

# implement models like in DL frameworks
class BertModel(Module):
    def __init__(...):
        self.layers = ModuleList([...])

    def forward(...):
        hidden = self.embedding(...)
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden

Key features: in-flight LLM batching with Triton, multi-GPU and multi-node LLM inference.
Numbers are preliminary, based on internal evaluation.

Slide 27

Slide 27 text

TensorRT-LLM vs. TensorRT and FasterTransformer
• TensorRT-LLM incorporates the LLM optimizations of FasterTransformer
• Models are described through a PyTorch-like Python API
• The API compiles the model into a TensorRT engine
• Differences from FasterTransformer and TensorRT:
  • TensorRT also exposes a Python API
  • TensorRT-LLM adds LLM-specific functionality on top of TensorRT
  • Kernels come from FasterTransformer, TensorRT, OpenAI Triton, and CUTLASS

Slide 28

Slide 28 text

Benefits of TensorRT-LLM
• Optimized inference: custom MHA kernels, in-flight batching, paged attention, quantized KV cache
• Lower TCO
• Up to 5x gains and lower energy per inference (H100 FP8 with in-flight batching in TensorRT-LLM vs. A100 FP16 PyTorch)

Slide 29

Slide 29 text

Other features of NeMo Framework
• Data curation at scale: [Documentation] NeMo Data Curator
• Hyperparameter and configuration search for efficient training: [GTC Session] s41904 How to Avoid the Staggering Cost of Training State-of-the-art Large Language Models
• Guardrails for LLM applications: NeMo-Guardrails (a sketch of its Python API follows)
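A minimal sketch of NeMo Guardrails usage via the nemoguardrails Python package; the config directory and its contents are assumed and not shown here, so treat this as an outline rather than a ready-to-run setup.

# Hedged sketch: load a guardrails configuration and wrap an LLM conversation with it.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./guardrails_config")  # hypothetical config folder
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "Tell me about the NeMo Framework."}
])
print(response["content"])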

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Summary: NeMo Framework
• An end-to-end framework for LLMs, from pre-training to inference
• 3D parallelism built on NVIDIA Megatron
• Hugging Face checkpoints can be converted into NeMo (NVIDIA) checkpoints
• Customization via instruction tuning, PEFT, and prompt learning
• Inference serving with TensorRT-LLM + Triton Inference Server

Slide 32

Slide 32 text

Current model support and how to get the containers: NeMo Framework
• Language Models: GPT, T5, mT5, T5-MoE, BERT
• Text-to-Image Models: Stable Diffusion v1.5/v2.0, Imagen
• Image-to-Image Models: Dreambooth, InstructPix2Pix
• Vision Transformers, CLIP
Containers: Download Now (Language, Now Available!) | Apply Now (Multimodal, Coming Soon!)
[Figure] Prompt: A 'sks' dog mecha robot. Instruction: Make it on a beach.

Slide 33

Slide 33 text

Appendix.
• NVIDIA Generative AI Solutions
• NVIDIA NeMo Framework
• NeMo Guardrails
TechBlog
• What are Large Language Models?
• What Are Large Language Models Used For?
• What are Foundation Models?
• How To Create A Custom Language Model?
• Adapting P-Tuning to Solve Non-English Downstream Tasks
• NVIDIA AI Platform Delivers Big Gains for Large Language Models
• The King’s Swedish: AI Rewrites the Book in Scandinavia
• eBook Asset
• No Hang Ups With Hangul: KT Trains Smart Speakers, Customer Call Centers With NVIDIA AI
GTC Sessions
• How to Build Generative AI for Enterprise Use-cases
• Leveraging Large Language Models for Generating Content
• Power Of Large Language Models: The Current State and Future Potential
• Generative AI Demystified
• Efficient At-Scale Training and Deployment of Large Language Models
• Hyperparameter Tool

Slide 34

Slide 34 text

No content