Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

The Great Age of AI: Uncharted AI / Jeongkyu Shin, Lablup Inc.

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

“Make AI Accessible”

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

- The smallest reference NVIDIA SuperPod (1 SU) configuration requires:
  - 12 switches
  - >500 cables
  - 400 Gbps InfiniBand NDR
- Dominant traffic patterns:
  - Cloud: North-South
  - Kubernetes: Hybrid (NS+EW)
  - HPC & AI: East-West
  - End-to-end MLOps: Hybrid (NS+EW)
- Software stack: NVIDIA NCCL
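East-West traffic is exactly what collective-communication libraries like NCCL generate. As a minimal sketch (not from the slides), here is how a PyTorch job uses the NCCL backend for an all-reduce across GPUs; every rank exchanges data with every other rank, which is pure East-West traffic:

```python
# Minimal NCCL all-reduce sketch (illustrative; launch with:
#   torchrun --nproc-per-node=4 allreduce_demo.py)
import os
import torch
import torch.distributed as dist

def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank contributes its own tensor; all_reduce sums them in place
    # across all GPUs, i.e. node-to-node (East-West) traffic over NCCL.
    t = torch.ones(1024, device="cuda") * dist.get_rank()
    dist.all_reduce(t, op=dist.ReduceOp.SUM)

    if dist.get_rank() == 0:
        print(f"sum across {dist.get_world_size()} ranks: {t[0].item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```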

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Compute, Network, Storage, Energy

Slide 16

Slide 16 text

“Make AI Scalable”

Slide 17

Slide 17 text

1,500+ GPUs on a single site
100+ and growing!
13,000 enterprise GPUs managed
2M environment downloads
100+ enterprise accounts in 3 continents

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Open-source Backend.AI Core / Lablup Enterprise

Slide 21

Slide 21 text

open-source

Slide 22

Slide 22 text

aiodocker, aiomonitor, aiotools, raftify, callosum, Backend.AI
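These are Lablup-maintained open-source Python projects. As a small usage sketch (assuming a local Docker daemon and `pip install aiodocker`, not taken from the slides), aiodocker exposes the Docker API over asyncio:

```python
# List local Docker containers with aiodocker (asyncio-native Docker client).
import asyncio
import aiodocker

async def main():
    docker = aiodocker.Docker()  # connects to the default Docker socket
    try:
        for container in await docker.containers.list():
            info = await container.show()  # full `docker inspect` payload
            print(info["Name"], info["State"]["Status"])
    finally:
        await docker.close()

asyncio.run(main())
```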

Slide 23

Slide 23 text

raftify @ 2024 Contribution Academy

Slide 24

Slide 24 text

Maintenance: raftify, aiodocker, aiomonitor, aiotools, callosum, Backend.AI
Contributing: DPDK, FreeBSD, OpenStack, NBA, tensorflow, numpy, vllm, openblas, python, bitsandbytes, zeromq, pyzmq, aiohttp, googletest
Domains: Messaging, Infrastructure / OS, ML/AI

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Backend.AI CLI Installer (https://www.backend.ai/ko/download)
- TUI interface for easy installation
- Automates package-based installation
- Supports auto-install meta-configuration
- Enables offline installation
- Allows anyone to effortlessly set up a Backend.AI cluster

Slide 27

Slide 27 text

Same as Slide 26.

Slide 28

Slide 28 text

bndev (https://bnd.ai/bndev)
- Simplifies Backend.AI development
- Easily sets up and manages complex Backend.AI dev environments
- Enables seamless customization and hacking of Backend.AI
- Empowers everyone to own and modify their AI infrastructure

Slide 29

Slide 29 text

Backend.AI Core (open-source) / Lablup Enterprise

Slide 30

Slide 30 text

core

Slide 31

Slide 31 text

24.09 Core
- ION: model recipes for Backend.AI Inference
- NIM: as a first-class inference runtime
- Intel Gaudi: Gaudi 2/3 integration
- Agent Selector Plugin: fine-grained job batching
- OIDC: unified authentication
- Cilium integration: eBPF
- Model Store
- GraceHopper: GH200 / GB200 support
- Priority Scheduler (toy sketch below)
- GPUDirect Storage: speedup
- Legacy+Cloud: hybrid clustering
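The slides do not detail the Priority Scheduler's algorithm, so the following is a hypothetical illustration of priority-based job queuing in general, not Backend.AI's actual scheduler: jobs carry a priority and a submission order, and the dispatcher always pops the highest-priority (oldest-first on ties) pending job.

```python
# Hypothetical priority job queue (illustrative only; not the actual
# Backend.AI Sokovan scheduler). Lower number = higher priority;
# submission order breaks ties so equal-priority jobs run FIFO.
import heapq
import itertools
from dataclasses import dataclass, field

@dataclass(order=True)
class QueuedJob:
    priority: int
    seq: int                       # monotonically increasing tie-breaker
    name: str = field(compare=False)

class PriorityJobQueue:
    def __init__(self):
        self._heap: list[QueuedJob] = []
        self._counter = itertools.count()

    def submit(self, name: str, priority: int = 10) -> None:
        heapq.heappush(self._heap, QueuedJob(priority, next(self._counter), name))

    def next_job(self) -> str | None:
        return heapq.heappop(self._heap).name if self._heap else None

q = PriorityJobQueue()
q.submit("nightly-batch", priority=50)
q.submit("interactive-notebook", priority=1)
q.submit("finetune-llm", priority=10)
print(q.next_job())  # interactive-notebook (highest priority first)
```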

Slide 32

Slide 32 text

Next-gen Sokovan: multi-engine architecture, with Kubernetes as a first-class citizen

Docker backend: Backend.AI Manager -> Backend.AI Agent (Docker backend) -> Docker (moby), with native accelerator plugin(s) on each computing node
- Job queue managed by Backend.AI Sokovan
- User/project resource quotas
- Full accelerator support
- Full GPU virtualization support for CUDA (proprietary enterprise version only!)
- Full storage folder (vfolder) support

Kubernetes backend: Backend.AI Agent (K8s backend v2) -> K8s API server -> standalone K8s cluster (controller node plus computing nodes, each running kubelet with vendor-provided device operators)
- Each node treated as an individual virtual node (proxied compute node with proxied accelerator devices)
- Vendor-provided accelerator support
- Limited storage folder (vfolder) support
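A hedged sketch of what a multi-engine agent layer can look like in Python (hypothetical names, not the actual Sokovan interfaces): the manager schedules against one abstract interface, and Docker or Kubernetes backends implement it.

```python
# Hypothetical multi-backend agent interface (illustrative sketch only;
# the real Sokovan internals differ). The manager talks to ComputeBackend
# and stays unaware of whether Docker or Kubernetes runs the workload.
from abc import ABC, abstractmethod

class ComputeBackend(ABC):
    @abstractmethod
    def launch(self, image: str, gpus: int) -> str:
        """Start a workload and return an opaque session ID."""

    @abstractmethod
    def terminate(self, session_id: str) -> None: ...

class DockerBackend(ComputeBackend):
    def launch(self, image: str, gpus: int) -> str:
        # Would call the Docker (moby) API with accelerator plugins here.
        return f"docker-session:{image}"

    def terminate(self, session_id: str) -> None:
        print(f"stopping container {session_id}")

class KubernetesBackend(ComputeBackend):
    def launch(self, image: str, gpus: int) -> str:
        # Would create a Pod spec with vendor device-operator resources here.
        return f"k8s-pod:{image}"

    def terminate(self, session_id: str) -> None:
        print(f"deleting pod {session_id}")

def schedule(backend: ComputeBackend, image: str, gpus: int) -> str:
    # The scheduler's dispatch path is identical for every backend.
    return backend.launch(image, gpus)

print(schedule(DockerBackend(), "python:3.12", gpus=1))
print(schedule(KubernetesBackend(), "python:3.12", gpus=8))
```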

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

enterprise

Slide 43

Slide 43 text

______ made easy

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

Scaling made easy

Slide 46

Slide 46 text

FastTrack 2 (https://cloud-mlops.backend.ai)
- Project-based pipeline management
- Various templates:
  - Foundation model training
  - Fine-tuning
  - Automatic deploy
- Custom task nodes with partners
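The slides describe FastTrack pipelines only at a high level. As a generic illustration of pipeline management (hypothetical task names, not the FastTrack API), a pipeline is a DAG of task nodes executed in dependency order:

```python
# Generic pipeline-as-DAG sketch (hypothetical; not the FastTrack API).
# Each task declares its upstream dependencies; we run a topological order.
from graphlib import TopologicalSorter

def prepare_data():    print("preparing dataset")
def train():           print("training foundation model")
def finetune():        print("fine-tuning")
def deploy():          print("deploying model endpoint")

# task -> set of tasks that must finish first
pipeline = {
    prepare_data: set(),
    train:        {prepare_data},
    finetune:     {train},
    deploy:       {finetune},
}

for task in TopologicalSorter(pipeline).static_order():
    task()
```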

Slide 47

Slide 47 text

finetun.ing (https://www.finetun.ing)
- "Fine-tuning via prompt": NO DATA required!
  - Uses Nemotron-4-340B
- Starts from Llama 3.1 and Gemma 2
- Built on FastTrack
- Runs on Backend.AI Cloud
- The waitlist is now open!
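The slides imply that training data is synthesized from the user's prompt (Nemotron-4-340B is a model suited to synthetic-data generation) rather than uploaded. A hedged sketch of that idea, with `generate` as a stand-in stub, not the finetun.ing API:

```python
# Hypothetical sketch of prompt-driven synthetic data generation
# (illustrative only; `generate` stands in for a completion call to a
# large teacher model such as Nemotron-4-340B and is not a real API).

def generate(prompt: str) -> str:
    # Canned output so the sketch runs without a model behind it.
    return f"<model output for: {prompt[:40]}...>"

def synthesize_dataset(task_description: str, n_examples: int) -> list[dict]:
    """Turn a plain-language task description into instruction/response pairs."""
    examples = []
    for i in range(n_examples):
        instruction = generate(
            f"Write one diverse user request for this task: {task_description} "
            f"(variation #{i})"
        )
        response = generate(f"Answer this request well: {instruction}")
        examples.append({"instruction": instruction, "response": response})
    return examples  # then fed into a normal fine-tuning pipeline

print(synthesize_dataset("summarize support tickets", 2))
```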

Slide 48

Slide 48 text

Backend.AI Cluster Designer
- GUI-based cluster design tool
- Tailors cluster configuration to desired scale and performance
- Automatically calculates effective performance, required hardware, and estimated costs
- Ideal for validating the optimal architecture before actual deployment
- Design your own cluster at our demo booth!
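The kind of calculation such a tool automates can be shown with back-of-the-envelope numbers. All figures below are illustrative assumptions, not Cluster Designer's actual model:

```python
# Back-of-the-envelope cluster sizing (every number here is an assumption
# for illustration; the real Cluster Designer model is not in the slides).
peak_tflops_per_gpu = 989          # e.g. H100 SXM FP16 dense, illustrative
mfu = 0.35                         # assumed sustained model FLOPs utilization
target_pflops_effective = 100      # desired sustained effective throughput

effective_tflops_per_gpu = peak_tflops_per_gpu * mfu
gpus_needed = -(-(target_pflops_effective * 1000) // effective_tflops_per_gpu)

gpus_per_node = 8
nodes = -(-gpus_needed // gpus_per_node)   # ceiling division
cost_per_node_usd = 300_000                # assumed, hardware only

print(f"GPUs: {int(gpus_needed)}, nodes: {int(nodes)}, "
      f"est. cost: ${int(nodes * cost_per_node_usd):,}")
```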

Slide 49

Slide 49 text

Backend.AI Helmsman Agent
- Interactive cluster management interface
- Enables complex cluster operations through chat in the terminal
- Uses Gemma-based fine-tuned models to accurately understand user intent
- Supports building conversational fine-tuning pipelines on-premises
- Integrates packages such as TorchTune, LangGraph, and LangChain
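Conceptually, a chat-driven operations agent maps free-form text to concrete cluster actions. A deliberately simplified sketch (keyword matching stands in for the Gemma-based intent model; the action names are hypothetical):

```python
# Toy intent router (illustrative only; Helmsman uses a fine-tuned
# Gemma-based model for intent understanding, not keyword rules).
from typing import Callable

def scale_up(msg: str) -> str:     return f"scaling session group per: {msg!r}"
def show_status(msg: str) -> str:  return "3 nodes healthy, 24/32 GPUs busy"
def drain_node(msg: str) -> str:   return f"draining node per: {msg!r}"

INTENTS: dict[str, Callable[[str], str]] = {
    "scale": scale_up,
    "status": show_status,
    "drain": drain_node,
}

def handle(message: str) -> str:
    for keyword, action in INTENTS.items():
        if keyword in message.lower():
            return action(message)
    return "Sorry, I did not understand that request."

print(handle("What's the cluster status?"))
print(handle("Please drain node gpu-07 for maintenance"))
```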

Slide 50

Slide 50 text

Acceleration made easy

Slide 51

Slide 51 text

X64: high performance, widely used in PCs & HPC
Armv9: energy-efficient, dominates mobile devices
RISC-V: open-source, highly customizable

Slide 52

Slide 52 text

CPU architectures:
- X64: high performance, widely used in PCs & HPC
- Armv9: energy-efficient, dominates mobile devices
- RISC-V: open-source, highly customizable

Accelerators (on x86-64 based nodes and edge platforms):
- NVIDIA: Volta (V100), Turing (Titan RTX / RTX 8000), Ampere (A10 / A40 / A100), Hopper (H100 / H100 NVL / H200), Blackwell, GraceHopper (GH200 / GB200), Jetson (TX / Xavier / AGX Orin)
- AMD: Instinct (MI250 / MI300), RDNA (RDNA2)
- Intel: Gaudi 2/3
- Rebellions: ATOM / ATOM+
- FuriosaAI: Warboy / RNGD
- Google: TPU v3/4/5, Coral EdgeTPU
- Amazon: Inferentia / Trainium v2
- Groq: GroqCard
- Graphcore: IPU / BOW

Slide 53

Slide 53 text

Inference made easy

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

PALI: Performant AI Launcher for Inference

Slide 56

Slide 56 text

PALI: Performant AI Launcher for Inference
- ION: model recipes for Backend.AI Inference
- NIM: as a first-class inference runtime
- Hugging Face models via model URL
- Model Store: open models and partner models
- Lablup GPU Virtualizer
- Backend.AI Model Player
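The slides do not show PALI's internals; as an illustration of the kind of launch step it packages, here is a minimal, standalone vLLM example (the model ID is an arbitrary small open model chosen for the sketch, not one named in the slides):

```python
# Minimal offline inference with vLLM (illustrative of the kind of
# runtime PALI launches; not PALI itself). Requires `pip install vllm`
# and a CUDA-capable GPU.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-0.5B-Instruct")   # any HF model ID/URL works
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["What is Backend.AI?"], params)
for out in outputs:
    print(out.outputs[0].text)
```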

Slide 57

Slide 57 text

PALI2: Scalable AI H/W Infrastructure
- PALI Square: dedicated hardware infrastructure appliance for PALI
- Easily scalable by connecting multiple PALI-equipped appliances
- Optimized architecture for AI workloads: delivers high performance and low latency
- Building blocks: PALI (Performant AI Launcher for Inference), Intel Gaudi 2/3 integration, GraceHopper GH200 / GB200, A6000 / L40 on x86-64 based nodes, Model Store (per architecture / chip)

Slide 58

Slide 58 text

PALI2: Scalable AI H/W Infrastructure
Upcoming releases:
- PALI based on the NVIDIA GH200 reference platform (Korea): pre-orders in October, sales from Q4
- Instant.AI by Kyocera Mirai Envision (Japan): launching on October 1st, 2024 (https://www.kcme.jp/product/instant-ai-server/)
- PALI^2 appliances for US and European markets: expected as early as Q4 this year

Slide 59

Slide 59 text

PALANG: PALI for LANGuage models
- Comprehensive language model platform
- Incorporates PALI, FastTrack, Talkativot, and Helmsman
- Simplifies large-scale language model deployment and operation: provides ready-to-use inference and fine-tuning settings
- Talkativot enables easy creation of customized chatbots

Slide 60

Slide 60 text

PALANG: PALI for LANGuage models (component view)
- PALI: Performant AI Launcher for Inference
- FastTrack 2: MLOps
- Helmsman: conversational Backend.AI management UX
- Talkativot: chatbot UI for language models
- Model Store and LLM model weights, per architecture / chip
- Hardware: GraceHopper GH200 / GB200, x86-64 based nodes with A6000 / L40 / H100 / B200
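Talkativot's own API is not shown in the slides; as a generic sketch of a chatbot front-end talking to a deployed model, here is a minimal client for an OpenAI-compatible endpoint (the URL and model name are placeholders):

```python
# Minimal chat client against an OpenAI-compatible inference endpoint
# (placeholder URL/model; illustrative of a chatbot UI's backend call,
# not Talkativot's actual API). Requires `pip install requests`.
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # placeholder

def chat(prompt: str) -> str:
    resp = requests.post(ENDPOINT, json={
        "model": "placeholder-model",
        "messages": [{"role": "user", "content": prompt}],
    }, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

print(chat("Summarize what PALANG provides."))
```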

Slide 61

Slide 61 text

"G": one more thing
- Gemma2-based language model family
- Easily customizable through Finetun.ing
- Versatile applications:
  - Backend model for Helmsman
  - Enterprise-level agents

Slide 62

Slide 62 text

Scaling, Acceleration, Inference: made easy

Slide 63

Slide 63 text

AI made easy

Slide 64

Slide 64 text

No content

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

- CORE 24.09 / 25.03: from cell to society
- ION: open model recipes for AI inference
- BNDEV: DevStack manager
- FastTrack 2: MLOps
- CLI Installer: interactive terminal UI
- Site Designer
- Helmsman: conversational Backend.AI management UX
- PALI: Performant AI Launcher for Inference
- PALI2: PALI appliance
- PALANG: language model-oriented AI inference platform
- GARNET: Gemma2-based SLM
- Next-gen Sokovan: also with Kubernetes
- finetun.ing: model tuning by discussion, without data
- WebUI 3 Neo: rewritten UI/UX

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

No content

Slide 70

Slide 70 text

No content

Slide 71

Slide 71 text

No content

Slide 72

Slide 72 text

No content

Slide 73

Slide 73 text

No content

Slide 74

Slide 74 text

No content

Slide 75

Slide 75 text

No content

Slide 76

Slide 76 text

No content