Gemma for Your Device - IO Extended Incheon 2025

Gemma for Your Device
Sungmin Han

Google I/O Extended 25 Proprietary & Confidential
Index
• Edge AI
• Gemma 3
• Demonstration
• Conclusion

Edge AI #1

Edge AI
Edge AI runs AI directly at or near the point where data is generated, without a centralized processing server.

“Your Data, Your Rule”

Why Edge AI
• Low cost: predictable costs, plus cost savings from using compute-efficient AI models
• Privacy: data never travels beyond the point where it is generated, so personal information stays protected
• Transparency: open-weights models let you inspect the model's characteristics directly, and you can apply the latest fine-tuning and alignment techniques as needed

Cloud-based model: the model (hidden) runs on an API server in a private network; you reach it over a private connection with a token.

Open-weights model: the model runs on your server in your network, with a customized connection; you can download and train it.

Memory in recent smartphones
• Galaxy S25: 12GB (+4GB)
• iPhone 17: 12GB (+4GB)
• Pixel 10: 12GB

Memory in edge devices
• Raspberry Pi 5 with Hailo: 8GB (host memory for Hailo-8 / 8L; system memory for Hailo-10H)
• NVIDIA Jetson Orin Nano: 8GB, 128-bit LPDDR5
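As a rough illustration of what these memory figures mean in practice, the sketch below checks whether a model file fits in a device's memory budget. The model sizes are the Gemma 3 figures from this deck's size slide and the device memory values come from the slides above; the 50% usable fraction is a hypothetical assumption, not a measured value, since the OS, other apps, the KV cache, and activations also need memory.

```python
# Model file sizes in GB, from the deck's Gemma 3 size slide (int4, except 27B at int8)
MODEL_SIZES_GB = {"Gemma 3 1B": 0.529, "Gemma 3 4B": 2.56, "Gemma 3 12B": 7.55, "Gemma 3 27B": 27.05}
# Device memory in GB, as listed on the slides above
DEVICE_MEMORY_GB = {"Pixel 10": 12, "Raspberry Pi 5": 8, "Jetson Orin Nano": 8}


def fits(model_gb: float, device_gb: float, usable_fraction: float = 0.5) -> bool:
    """Return True if the model weights fit in the assumed share of device memory.

    usable_fraction is a hypothetical budget: the OS, other apps, the KV cache,
    and activations all compete for the remaining memory.
    """
    return model_gb <= device_gb * usable_fraction


for device, mem in DEVICE_MEMORY_GB.items():
    ok = [name for name, size in MODEL_SIZES_GB.items() if fits(size, mem)]
    print(f"{device} ({mem}GB): {', '.join(ok) or 'none'}")
```

Raising or lowering `usable_fraction` changes which models qualify; it is a tuning knob, not a rule.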

On-device model (edge serving): a GPU / NPU-optimized model running on your device.

Gemma 3 #2

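Gemma: flexible inference with lightweight open models
• An open model with published weights, free for research and commercial use, with the option to run it on your own servers.
• Its lightweight models still deliver high quality, which makes Gemma well suited to edge serving.
• The latest models also support multimodality (vision).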


Gemma Architecture (source: https://developers.googleblog.com/en/gemma-explained-overview-gemma-model-family-architectures/)
[Diagram: INPUT → Embedding → N× (Multi-head Attention → Add & Norm → Feed Forward → Add & Norm) → Linear → Softmax → OUTPUT]
• Transformer decoder-based architecture
• Specialized for lightweight models
• Feed Forward is implemented as an MLP
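The decoder stack in the diagram can be sketched as a toy NumPy forward pass: embedding, N decoder blocks (causal multi-head attention and an MLP feed-forward, each followed by Add & Norm), then Linear and Softmax. It follows the slide's simplified diagram only. The sizes, random weights, parameter-free layer norm, and reuse of the embedding matrix for the final Linear layer are assumptions for illustration; actual Gemma models differ in details (for example, they use RMSNorm and a gated feed-forward layer), which this sketch does not reproduce.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # "Norm" in Add & Norm: normalize each token vector (no learned gain/bias, an assumption)
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def causal_mha(x, wq, wk, wv, wo, n_heads):
    # Multi-head self-attention with a causal mask: token t attends only to tokens <= t
    seq, d = x.shape
    hd = d // n_heads

    def split(m):  # (seq, d) -> (heads, seq, hd)
        return m.reshape(seq, n_heads, hd).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)        # (heads, seq, seq)
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -1e9, scores)
    out = softmax(scores) @ v                               # (heads, seq, hd)
    return out.transpose(1, 0, 2).reshape(seq, d) @ wo


def mlp(x, w1, w2):
    # Feed Forward as a two-layer MLP with GELU (tanh approximation)
    h = x @ w1
    h = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    return h @ w2


def decoder_block(x, p, n_heads):
    x = layer_norm(x + causal_mha(x, p["wq"], p["wk"], p["wv"], p["wo"], n_heads))  # Add & Norm
    return layer_norm(x + mlp(x, p["w1"], p["w2"]))                                # Add & Norm


def tiny_decoder(token_ids, params, n_layers, n_heads):
    x = params["embed"][token_ids]              # INPUT -> Embedding
    for i in range(n_layers):                   # N x decoder blocks
        x = decoder_block(x, params["layers"][i], n_heads)
    logits = x @ params["embed"].T              # Linear (tied to the embedding, an assumption)
    return softmax(logits)                      # Softmax -> next-token probabilities


rng = np.random.default_rng(0)
vocab, d, ff, n_layers, n_heads = 50, 16, 64, 2, 4


def w(*shape):
    return rng.normal(0, 0.1, shape)


params = {
    "embed": w(vocab, d),
    "layers": [
        {"wq": w(d, d), "wk": w(d, d), "wv": w(d, d), "wo": w(d, d), "w1": w(d, ff), "w2": w(ff, d)}
        for _ in range(n_layers)
    ],
}
probs = tiny_decoder(np.array([1, 5, 7, 3]), params, n_layers, n_heads)
print(probs.shape)  # (4, 50)
```

Because of the causal mask, changing the last token leaves the outputs for earlier positions unchanged, which is what makes token-by-token generation possible.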

Gemma 3 sizes (int4 / int8)
• 1B: 529MB
• 4B: 2.56GB
• 12B: 7.55GB
• 27B: 27.05GB (int8)
Video source: Google DeepMind https://deepmind.google/discover/blog/gemma-scope-helping-the-safety-community-shed-light-on-the-inner-workings-of-language-models/
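These sizes follow roughly from parameters × bits per weight ÷ 8. The sketch below computes that weights-only lower bound. The nominal parameter counts (1B, 4B, 12B, 27B) are an assumption: real checkpoints have slightly different counts and may keep some tensors at higher precision, so the listed file sizes come out somewhat larger.

```python
def weight_bytes(n_params: float, bits_per_weight: int) -> float:
    """Weights-only size estimate: each parameter takes bits_per_weight / 8 bytes."""
    return n_params * bits_per_weight / 8


# Nominal parameter counts (an assumption; real checkpoints differ slightly)
for name, n_params, bits in [("1B", 1e9, 4), ("4B", 4e9, 4), ("12B", 12e9, 4), ("27B", 27e9, 8)]:
    gb = weight_bytes(n_params, bits) / 1e9
    print(f"{name} @ int{bits}: ~{gb:.1f} GB")
# 1B @ int4: ~0.5 GB
# 4B @ int4: ~2.0 GB
# 12B @ int4: ~6.0 GB
# 27B @ int8: ~27.0 GB
```

Comparing these estimates with the listed 529MB, 2.56GB, 7.55GB, and 27.05GB shows how much extra the real files carry beyond the raw quantized weights.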

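import os
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig, Gemma3ForCausalLM

# Gemma weights are gated on Hugging Face; read the access token from the environment
HF_TOKEN = os.environ.get("HF_TOKEN")

model_id = "google/gemma-3-1b-it"
# 8-bit weight quantization via bitsandbytes (needs the bitsandbytes package and a CUDA GPU)
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = Gemma3ForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    token=HF_TOKEN,
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, token=HF_TOKEN)

# A batch holding one conversation: a system prompt and a user turn
messages = [
    [
        {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
        {"role": "user", "content": [{"type": "text", "text": "Write a poem on Hugging Face, the company"}]},
    ],
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)  # token IDs must stay integers, so do not cast them to bfloat16

with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=64)
outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(outputs[0])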
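`BitsAndBytesConfig(load_in_8bit=True)` stores the weights in 8 bits. The core idea of 8-bit weight quantization can be sketched with symmetric per-row absmax scaling in NumPy. This is a simplified illustration, not bitsandbytes' actual LLM.int8() algorithm, which additionally handles outlier features in higher precision.

```python
import numpy as np


def quantize_int8(w):
    # Symmetric per-row absmax quantization: map [-max|w|, max|w|] onto [-127, 127]
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero for all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q, scale):
    # Recover approximate float weights; the error per element is at most scale / 2
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(4, 8)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
print(q.dtype, q.nbytes, w.nbytes)  # int8 32 128
```

The int8 tensor is a quarter the size of the float32 original; only one scale per row is stored alongside it.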

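• 1B: the lightest model; a text-only model suited to small-scale applications
• 4B: balances performance and flexibility; supports multimodality
• 12B: a powerful language model designed for complex tasks; supports multimodality
• 27B: improved understanding, suited to sophisticated applications; supports multimodality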

Source: Google DeepMind https://deepmind.google/models/gemma/gemma-3/

Benchmark charts: MMLU-Pro, LiveCodeBench, MATH (source: Google DeepMind https://deepmind.google/models/gemma/gemma-3/)

Gemma 3n

For your device: designed for on-device experiences, Gemma 3n is a multimodal LLM built to perform at its best on phones, tablets, and laptops. Built with Per-Layer Embeddings (PLE) and MatFormer, Gemma 3n is designed to deliver high-quality results even on accelerators with limited VRAM.
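MatFormer's nesting idea can be illustrated with a toy feed-forward layer: a smaller sub-model uses only the first k hidden units of the larger model's weights, so both sizes share one set of parameters. This is a hedged toy sketch of the nesting idea only, with ReLU and random weights as assumptions; it is not Gemma 3n's implementation, and it does not model Per-Layer Embeddings.

```python
import numpy as np


def nested_ffn(x, w_in, w_out, width):
    """Feed-forward layer that uses only the first `width` hidden units.

    Smaller widths are prefixes of the full weight matrices, so every
    sub-model is nested inside the full model and shares its parameters.
    """
    h = np.maximum(x @ w_in[:, :width], 0.0)  # ReLU activation (an assumption)
    return h @ w_out[:width, :]


def ffn_param_count(d_model, width):
    # Input projection plus output projection, no biases
    return d_model * width + width * d_model


rng = np.random.default_rng(0)
d_model, full_width = 8, 32
w_in = rng.normal(size=(d_model, full_width))
w_out = rng.normal(size=(full_width, d_model))
x = rng.normal(size=(2, d_model))

y_full = nested_ffn(x, w_in, w_out, full_width)        # full-size model
y_small = nested_ffn(x, w_in, w_out, full_width // 2)  # nested half-width sub-model
print(ffn_param_count(d_model, full_width), ffn_param_count(d_model, full_width // 2))  # 512 256
```

In MatFormer-style training the nested widths are trained jointly so that each prefix works on its own; this sketch shows only the weight sharing, not that training.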

• E2B: 3.14GB (int4). The lightest model, suited to small-scale applications; supports multimodality
• E4B: 4.41GB (int4). Balances performance and flexibility; supports multimodality

Source: Google Developers Blog https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/

Multimodal capabilities: text, image, video, audio

Demonstration #3

Gemma 3 Demo
Gemma App demo repository: https://github.com/KennethanCeyer/gemma-app

Use case 1: Gemma in the browser on a local computer

Model conversion for edge serving: Model → LiteRT conversion → LiteRT model → MediaPipe packaging → MediaPipe package (.bin / .task) → MediaPipe runtime

Gemma 3n Demo

Gemma 3n Audio Demo

Conclusion #4

Conclusion
• Low-cost, high-efficiency, secure Edge AI
• Frontier-class models available even at lightweight sizes
• Consider a hybrid setup that combines on-device and cloud models
• Think through the UX for asynchronous responses
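The hybrid and asynchronous-UX points can be sketched with Python's asyncio: prefer the on-device model, fall back to a cloud model for long prompts or when local inference is too slow, and keep the UI responsive while waiting. `run_on_device`, `run_in_cloud`, and `MAX_LOCAL_PROMPT_CHARS` are hypothetical placeholders (simulated with sleeps), not real APIs, and this routing policy is one possible design rather than the speaker's.

```python
import asyncio


# Hypothetical stand-ins for an on-device model and a cloud API (not real library calls)
async def run_on_device(prompt: str) -> str:
    await asyncio.sleep(0.01)  # simulate local inference latency
    return f"[on-device] {prompt[:20]}"


async def run_in_cloud(prompt: str) -> str:
    await asyncio.sleep(0.02)  # simulate a network round trip
    return f"[cloud] {prompt[:20]}"


MAX_LOCAL_PROMPT_CHARS = 2000  # hypothetical routing threshold


async def answer(prompt: str, local_timeout_s: float = 1.0) -> str:
    """Prefer the on-device model; fall back to the cloud for long prompts or slow local runs."""
    if len(prompt) > MAX_LOCAL_PROMPT_CHARS:
        return await run_in_cloud(prompt)
    try:
        return await asyncio.wait_for(run_on_device(prompt), timeout=local_timeout_s)
    except asyncio.TimeoutError:
        return await run_in_cloud(prompt)


async def main() -> None:
    # Asynchronous UX: start generation in the background and keep the UI responsive
    task = asyncio.create_task(answer("Summarize my notes"))
    while not task.done():
        print("typing...")  # e.g. show a typing indicator instead of blocking
        await asyncio.sleep(0.005)
    print(task.result())


asyncio.run(main())
```

Keeping data on the device by default and sending only the requests it cannot handle to the cloud preserves most of the cost and privacy benefits listed earlier.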

Thank you!
linked.in/sungmin.han
