지속적인 AI를 위한 모델 추론 최적화 및 경량화

지속적인 AI를 위한 모델 추론 최적화 및 경량화 노타 김태호
CTO

Speaker Introduction Senior Research Scientist, KAIST Institute Research Intern, Universite
de Montreal (P.I. Yoshua Bengio) Research Intern, CUHK (P.I. Xiaogang Wang) CTO / Co-Founder, Nota B.S. Bio and Brain Engineering, KAIST M.S. Electrical Engineering, KAIST 08' 13' 15' 15' 17' Ph.D. (ABD) Electrical Engineering, KAIST 15' CEO / Co-Founder, Nota 14' 20'

7 © 2025. Nota Inc. Growing Divide Between Hardware and
Software Software-driven optimization is essential to close the gap Transformer GPT-1 BERT GPT-2 Megatron LM Microsoft T-NLG GPT-3 GShard Switch Transformer Megatron-Turing P100 (12GB) TPUv2 (16GB) V100 (32GB) TPUv3 (32GB) A100 (40GB) A100-80 (80GB) H100 (80GB) 0 0 1 10 100 1000 10000 2015 2016 2017 2018 2019 2021 2022 2023 매개변수 수 (단위: 십억) Bridge the gap with NPU and SW AI Model Size : 410x / 2 Yrs HW Capacity : 2x / 2 Yrs Source: AI and Memory Wall (2024)

8 © 2025. Nota Inc. Hardware for Training and Inference
학습용 AI칩은 한 종류만 쓰이지만 추론용 AI칩은 매우 많다. Training Inference

9 © 2025. Nota Inc. Can we deploy and optimize
any models on any devices? No! What’s the problems? Inference Models

10 © 2025. Nota Inc. Things Don’t Always Go As
Planned AI Model Target Device

11 © 2025. Nota Inc. Problems: What’s happening on the
journey? Diverse model conversion options Out of Memory Operation not supported Latency is too high Various types of SDKs AI Model Target Hardware A Target Hardware B Target Hardware C

13 © 2025. Nota Inc. Series C ($43M in total)
AI Optimization SDK for Samsung NPU Supports 35+ AI Devices Founded (Spun out from KAIST) Released NetsPresso® SDK Brief History 2024 2015 2023 Strategic Financial Location / Team Company Overview Snapshot EU Subsidiary Berlin, Germany Headquarter Daejeon, S. Korea R&D Center Seoul, S. Korea US Subsidiary, Sunnyvale, US R&D 69% BD Ops In progress for IPO to KOSDAQ (Q4 2025) CB Insights Global AI 100 A long-term contract with a top-tier mobile AP vendor 2025 Korean MIT

14 © 2025. Nota Inc. Nota AI in the News
2025 AI 100 by CB Insights • Recognized for its cutting- edge AI technology and transformative impact. • Being named by CB Insights as one of the top 100 most promising private AI companies in 2025 is a significant achievement and a strong indicator of its innovation and potential in the accelerated computing and hardware space, particularly within AI infrastructure at the edge.

15 © 2025. Nota Inc. NetsPresso®: One-Stop-Shop for AI Development
NetsPresso Bridges the Gap Between Diverse Hardware and AI Models AI HWs (IP, SoC) AI SW Stack Industry SDV Mobile Surveillance Home IoT Task/Model/ Application Mobility Computer Vision Speech/Audio LLMs GenAI

16 © 2025. Nota Inc. Solution Nota AI Business NetsPresso®
Accelerates Edge AI Development Process Platform • Type: Traffic accident • Percentage: 80% • Emergency Level: 4 (Serious) • Possibility of second accident: 20% • Expected level of traffic congestion: 5 (Very serious) • Immediate action: Slow down, move over, and give right of way to emergency vehicles. • Variable message sign: "Accident ahead, slow down and be cautious." • Summary: A traffic accident occurred on the road, causing traffic congestion and the need for immediate attention to clear the incident and ensure road safety. Accident Alert Type Message …

18 © 2025. Nota Inc. Data Collection Training Optimization Deployment
Testing NetsPresso is an AI model optimization solution that considers hardware specifications. It provides a modular pipeline that supports the entire AI lifecycle, from model development and device optimization to testing. AI model optimization solution supporting the AI lifecycle process Automation Modular Python Package Seamless integration with existing pipelines Intuitive GUI Fast and easy AI model deployment

19 © 2025. Nota Inc. Dataset Deployment Module AI Model
Optimization Solution Considering Hardware Specifications Model zoo Work flow 1 Model Development Trainer Compressor Work flow 2 Device Optimization Quantizer IR Converter Graph Optimization Work flow 3 Testing on Real Devices Compile Benchmarker Device farm

20 © 2025. Nota Inc. Dataset Model zoo Trainer Compressor
Workflow 1: Fast and Efficient AI Model Development Model Development Compressed model Expanded Compatible Model Range Quickly test the latest SOTA models. Automated Model Training Find optimal hyperparameters and reduce training time. Model Compression Reduce computational load and shrink model size using Pruning and Filter Decomposition.

21 © 2025. Nota Inc. 01. DarkNet-based - CSPDarkNet 02.
Efficient CNNs - MobileNetV3 - MobileNetV4 - MixNet Support for Various AI Models and Automated Training List of models compatible with NetsPresso Trainer * Some models are optimized for NetsPresso Trainer. For a full list, refer to the documentation. Automated model training and visualization of training experiments 04. Hybrid Models - MobileViT 03. Transformer-based - ViT (Vision Transformer) - MixTransformer - EfficientFormer NetsPresso Trainer (GUI)

22 © 2025. Nota Inc. Comparison of Model Training and
Compression Results ▪ Measured on QCS6490 ▪ Input Resolution: 224 x 224 input shape ▪ Latency reduced by 80.98% ▪ Accepted at ECCV 2024: https://arxiv.org/abs/2404.11630 * The smaller the value, the faster the inference speed ▪ Measured on Snapdragon Gen3 ▪ Input Resolution: 512 x 512 ▪ Latency reduced by 81.59% ▪ All operations run on the NPU AI Acceleration Benefits Using NetsPresso 371.7 70.7 -80.98% Image Classification (DeiT) 101.6 18.7 -81.59% Semantic Segmentation (SegFormer-b0)

23 © 2025. Nota Inc. model Quantize IR Convert Graph
Optimize Workflow 2: Device Optimization for Hardware-Specific AI Device Optimization Optimized model Model Operator HW Optimization Modify model operators that are incompatible with hardware to ensure smooth AI model execution. Improved Computational Speed Compress models without performance degradation to enhance computational speed Model Compression Reduce model size, memory usage, and power consumption.

24 © 2025. Nota Inc. • Layer-wise profiling on target
devices. • Supports Full Int8 and Mixed Precision Quantization. • Implements Automatic Mixed Precision Quantization, considering the trade-off between accuracy loss and inference speed. Applies quantization with an optimal balance Performance Measurement on Real Devices Identifies bottleneck layers through visualization Achieves an optimal balance through layer visualization Scheduled for Release: End of May 2025 NetsPresso Optimizer(GUI)

26 © 2025. Nota Inc. model Compile Benchmark Device Farm
Workflow 3: Testing on Real Devices Testing on Real Devices Validation & Insight Cross-HW Performance Analysis Measure and compare performance on different hardware. Real-World Testing Execute AI models on real devices in a Device Farm. Test Automation Quickly and easily verify model compatibility and performance on target devices.

27 © 2025. Nota Inc. Seamless Deployment from AI Model
Conversion to Testing https://launchx.netspresso.ai/ NetsPresso LaunchX(GUI) Automatic Conversion Measure model performance Real Device

30 © 2025. Nota Inc. 노타 최적화 성공 사례 생성형
AI 불가능 생성형 AI 4.47 ms 최적화 이미지 생성 1885.8 ms 이미지 생성 738.5 ms 2.4x 가속 회사 Without Nota AI With Nota AI Transformer Model 101.6 ms 5.4x 가속 Transformer Model 18.7 ms Transformer Model 3.93 ms 5.4x 가속 Transformer Model 0.73 ms ViT Model 불가능 최적화 ViT Model 가능 Transformer Model 2.09 ms 5.2x 가속 Transformer Model 0.40 ms * Measured on Qualcomm Snapdragon 8 Gen 3, Arm Ethos-U55, Renesas RZ/V2L, and NXP i.MX 93 respectively.

지속적인 AI를 위한 모델 추론 최적화 및 경량화

지속적인 AI를 위한 모델 추론 최적화 및 경량화

Lablup Inc.

More Decks by Lablup Inc.

Featured

Transcript

지속적인 AI를 위한 모델 추론 최적화 및 경량화 노타 김태호

Speaker Introduction Senior Research Scientist, KAIST Institute Research Intern, Universite

4 © 2025. Nota Inc. 위대한 시기

5 © 2025. Nota Inc. 증가하는 학습비용, 감소하는 추론비용

6 © 2025. Nota Inc. 위태한 시기

7 © 2025. Nota Inc. Growing Divide Between Hardware and

8 © 2025. Nota Inc. Hardware for Training and Inference

9 © 2025. Nota Inc. Can we deploy and optimize

10 © 2025. Nota Inc. Things Don’t Always Go As

11 © 2025. Nota Inc. Problems: What’s happening on the

© 2025. Nota Inc. Nota’s Story Company Intro

13 © 2025. Nota Inc. Series C ($43M in total)

14 © 2025. Nota Inc. Nota AI in the News

15 © 2025. Nota Inc. NetsPresso®: One-Stop-Shop for AI Development

16 © 2025. Nota Inc. Solution Nota AI Business NetsPresso®

© 2025. Nota Inc. Nota’s Story NetsPresso® Platform

18 © 2025. Nota Inc. Data Collection Training Optimization Deployment

19 © 2025. Nota Inc. Dataset Deployment Module AI Model

20 © 2025. Nota Inc. Dataset Model zoo Trainer Compressor

21 © 2025. Nota Inc. 01. DarkNet-based - CSPDarkNet 02.

22 © 2025. Nota Inc. Comparison of Model Training and

23 © 2025. Nota Inc. model Quantize IR Convert Graph

24 © 2025. Nota Inc. • Layer-wise profiling on target

25 © 2025. Nota Inc.

26 © 2025. Nota Inc. model Compile Benchmark Device Farm

27 © 2025. Nota Inc. Seamless Deployment from AI Model

28 © 2025. Nota Inc.

29 © 2025. Nota Inc. Device Farm

30 © 2025. Nota Inc. 노타 최적화 성공 사례 생성형

31 © 2025. Nota Inc. NetsPresso® Platform Ecosystem NetsPresso® Positioned