Designing local Generative AI inference with AWS IoT Greengrass | AWS re:Invent 2025

© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.

DEV316
Designing local Generative AI inference with AWS IoT Greengrass
Kohei “Max” MATSUSHITA (He / Him)
Tech. Evangelist | An AWS Hero
Soracom, Inc.

Agenda
• Real-world environments: Physical AI
• Local vs Cloud: Trade-offs and Design
• [DEMO] Teleoperation Latency comparison
• Operating and Updating Local AI model w/ AWS IoT Greengrass
• [DEMO] Model Update at the Local
• Designing Sustainable Physical AI

Physical AI
Physical AI enables machines to sense, decide, and physically interact with real-world environments.
Local (Edge) AI and Cloud AI are implementation layers: they determine where inference and updates happen.
• Physical AI: the concept of empowering the real world with AI
• Edge AI / Cloud AI: the architecture and placement of AI inference
• Edge AI: inference where data is produced
• Cloud AI: inference where data is collected

Impacts of Physical AI
• Responsiveness: AI can act quickly when it sees or senses something. It helps machines and places react fast to real-world changes.
• Autonomy: AI lets machines think and act by themselves. They are no longer just tools; they can learn and create new value.
• Collaboration: AI understands what people want and works together with us. People and machines can now reach goals as one team.

Case Study: Physical AI in Action
LINKWIZ (Japan) automates robot teaching by combining industrial robots with 3D scanners. The system recognizes part positions automatically, replacing alignment work that was once done by hand.
https://linkwiz.co.jp/en/

Teaching in Robotics
In the robotics industry, “teaching” means showing a robot how to do tasks with human guidance. This process allows the robot to repeat those tasks automatically during operation.

Limits of Rule-Based Control
Rule-based control works great as long as the world stays predictable. But in the real world, things change, and that’s where it starts to fall behind.

AI | REAL-WORLD
VLA: Vision-Language-Action model
Extends LLMs beyond text, connecting perception, reasoning, and physical interaction.
While LLMs understand and generate language, VLAs perceive the world through sensors, turning understanding into motion. They are a key enabler of Physical AI, where intelligence works not only in cyberspace but in the real world as well.
A model specialized in understanding the world through vision and language is called a VLM (Vision-Language Model).

• Raspberry Pi 5 (single-board computer) with UVC camera
• SO-101 (6-axis robotic arm)
• LeRobot (open-source framework for robot learning)
• ACT (imitation learning algorithm)
Camera is placed off screen.
Using a VLA model trained to detect a black object.

Roles of LLM and VLA
Model Type | Input | Output | Example
LLM (with Multimodal) | Text, Images, Code | Text, Reasoning, Multimodal Understanding | Amazon Nova, Claude, GPT-OSS
VLA (Vision-Language-Action) | Text, Images, Sensor / Actuator Data | Actions (Movement, Control) | RT-2, LeRobot, SmolVLA
VLA: a generative model that turns understanding into action in the real world.

Typical Architecture with VLA / VLM
Real-world Inputs (sensors, cameras, etc.) + Prompt (text) -> VLA [VLM (perception & reasoning) -> Action Generation] -> Outputs (signals, actuators, etc.)
VLA: Vision-Language-Action model
VLM: Vision-Language model
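The data flow in this architecture can be sketched in a few lines. This is a minimal illustration, not a real model: `Observation`, `Action`, `vla_step`, and the two stub functions are hypothetical names, and the stubs stand in for the VLM and the action-generation stage.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Observation:
    """Real-world inputs plus the text prompt (hypothetical structure)."""
    image: bytes                        # one camera frame
    joint_positions: Tuple[float, ...]  # sensor readings
    prompt: str                         # text instruction


@dataclass(frozen=True)
class Action:
    """Output sent to the actuators (hypothetical structure)."""
    joint_targets: Tuple[float, ...]


def vla_step(
    obs: Observation,
    perceive: Callable[[Observation], dict],
    generate_action: Callable[[dict, Observation], Action],
) -> Action:
    """One pass: perception and reasoning first, then action generation."""
    understanding = perceive(obs)
    return generate_action(understanding, obs)


def stub_perceive(obs: Observation) -> dict:
    # Stand-in for the VLM: only reports whether a frame is present.
    return {"target_visible": len(obs.image) > 0, "instruction": obs.prompt}


def stub_generate_action(understanding: dict, obs: Observation) -> Action:
    # Stand-in for action generation: hold position when nothing is visible,
    # otherwise nudge every joint by a fixed step.
    if not understanding["target_visible"]:
        return Action(tuple(obs.joint_positions))
    return Action(tuple(p + 0.5 for p in obs.joint_positions))


if __name__ == "__main__":
    obs = Observation(image=b"\x00", joint_positions=(0.0, 1.0), prompt="pick the black object")
    print(vla_step(obs, stub_perceive, stub_generate_action))
```

Keeping perception and action generation as separate callables mirrors the two stages in the diagram and lets either one be replaced independently.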

Forms of Inference Architecture: Local and Cloud
• Local Inference: Camera -> VLA -> Actuator, all on the device. Near, low latency.
• Cloud Inference: Camera (device) -> VLA (cloud) -> Actuator (device). Far, high latency.

[DEMO] Teleoperation Latency comparison
Teleoperation using `socat` for TCP tunneling with LTE network.
socat: a CLI tool for bidirectional data transfer between two data streams.
On Local
• Leader Arm (input): USB (ttyACM0)
• Follower Arm (output): USB (ttyACM1)
• Raspberry Pi: lerobot-teleoperate (py)
Via Cloud
• Leader Arm: USB (ttyACM0)
• Follower Arm: USB (ttyACM1)
• Raspberry Pi: lerobot-teleoperate (py), socat (ttyVACM0), socat (TCP)
• LTE network
• Instance on IaaS (US-East): socat (tcp to tcp)

[DEMO] Teleoperation Latency comparison
• On Local: USB to USB (directly). FPS = 60 Hz. Full performance achieved.
• Via Cloud: FPS = 3-4 Hz. 500–600 ms latency observed.
$ ping US-East
64 bytes from **: icmp_seq=1 ttl=62 time=132 ms
64 bytes from **: icmp_seq=2 ttl=62 time=129 ms
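To make the demo figures easier to compare, a control rate can be converted into the time between command updates. The sketch below is only arithmetic on the numbers shown on the slide; the function name `period_ms` is illustrative.

```python
def period_ms(rate_hz: float) -> float:
    """Time between two consecutive command updates, in milliseconds."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return 1000.0 / rate_hz


# Figures taken from the demo slide.
LOCAL_RATE_HZ = 60.0
CLOUD_RATE_HZ = (3.0, 4.0)
PING_RTT_MS = (132.0, 129.0)

mean_rtt_ms = sum(PING_RTT_MS) / len(PING_RTT_MS)

if __name__ == "__main__":
    print(f"local update period: {period_ms(LOCAL_RATE_HZ):.1f} ms")
    print(
        "cloud update period: "
        f"{period_ms(CLOUD_RATE_HZ[1]):.1f} to {period_ms(CLOUD_RATE_HZ[0]):.1f} ms"
    )
    print(f"mean ping round trip: {mean_rtt_ms:.1f} ms")
```

At 60 Hz the follower arm receives a new command about every 16.7 ms; at 3-4 Hz it receives one only every 250 to 333 ms, which is longer than a single measured ping round trip of about 130 ms.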

RECAP
Latency Borderline: Cloud or Local
Decide where to infer based on latency tolerance and network reliability.
Human intention breaks beyond 100 ms (Nielsen Norman Group, 1993 / 2014)
• Above the border && stable connectivity -> choose Cloud
• Below the border || unstable connectivity -> choose Local
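The two rules above can be written as a small decision function. This is a minimal sketch: the `Workload` type and `choose_placement` name are hypothetical, and treating a tolerance of exactly 100 ms as "local" is an assumption, since the slide does not define the boundary case.

```python
from dataclasses import dataclass

LATENCY_BORDER_MS = 100.0  # the border quoted on the slide


@dataclass(frozen=True)
class Workload:
    latency_tolerance_ms: float  # how much delay the task can accept
    stable_connectivity: bool    # whether the network link is reliable


def choose_placement(workload: Workload, border_ms: float = LATENCY_BORDER_MS) -> str:
    """Return "cloud" or "local" following the recap rules.

    Cloud needs BOTH a tolerance above the border and stable connectivity.
    Anything else, including the exact border (assumption), stays local.
    """
    above_border = workload.latency_tolerance_ms > border_ms
    if above_border and workload.stable_connectivity:
        return "cloud"
    return "local"


if __name__ == "__main__":
    print(choose_placement(Workload(latency_tolerance_ms=500, stable_connectivity=True)))
    print(choose_placement(Workload(latency_tolerance_ms=50, stable_connectivity=True)))
```

Defaulting to "local" whenever either condition fails reflects the logical OR in the second rule.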

Even then… what should we consider for Local AI?

When Local AI Matters
• Camera + Robotics: Latency defines behavior. In camera and robotics systems, even small delays change how machines move and react. Keeping inference local ensures smooth, natural motion that stays aligned with intent.
• Safety + Control Systems: Decisions must be local. In safety and control environments, every millisecond counts. Critical reactions like emergency stops or anomaly detection can’t wait for the cloud.
• Privacy + Context Awareness: Sensitive data should stay on the device. When working with cameras, microphones, or personal sensors, privacy matters. Local inference lets you analyze data where it’s created, without sending it away.

We often consider… which VLA model should we choose?

THE ANSWER IS: Updatable
The power of local AI isn’t in the model itself, but in its ability to keep evolving.
• Models must learn to live with change
• Continuous delivery keeps intelligence alive
• “Updatable” is the foundation of Physical AI

UPDATABLE FOR DEVICES
AWS IoT Greengrass brings cloud capabilities to edge devices, enabling secure, scalable updates and deployments for software, firmware, and AI models.
Turning HARDware into SOFTware.

UPDATABLE FOR DEVICES
AWS IoT Greengrass
• Current version: V2
• Runs as a Java-based agent (GGCore) on the OS
• Applications are deployed as “components”
• Docker containers can be used as components
• Components communicate locally through IPC
IPC: Inter-Process Communication.

[DEMO] Local VLA Container Update
Device: Raspberry Pi 5 (Raspberry Pi OS) running AWS IoT Greengrass Core
• Docker container with runtime + model bundled
• Device export by Recipe: Camera (/dev/video0), Follower Arm (/dev/ttyACM1)
AWS Cloud: AWS IoT Greengrass, Amazon Elastic Container Registry (Amazon ECR)
Docker Container contents
• Runtime (LeRobot)
• Model from Hugging Face: White-object recognition model or Black-object recognition model
Flow (steps 1, 2, 3a, 3b in the diagram)
• docker build: build the container image with the runtime and model
• docker push: push the image to Amazon ECR
• Deploy: AWS IoT Greengrass Deployment to the device
• docker pull: the device pulls the image from Amazon ECR
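A component recipe for this kind of deployment could be generated as shown below. This is a sketch under stated assumptions, not the recipe used in the demo: the component name, publisher, account ID, region, repository, tags, and dependency version requirements are all placeholders, and exposing `/dev/video0` and `/dev/ttyACM1` through `docker run --device` is one plausible reading of "Device export by Recipe". Check the field names against the AWS IoT Greengrass recipe reference before use.

```python
import json
from typing import Sequence


def build_recipe(image_uri: str, version: str, devices: Sequence[str]) -> dict:
    """Build a Greengrass V2 component recipe that runs one Docker image.

    All names and version requirements here are illustrative placeholders.
    """
    device_flags = " ".join(f"--device {path}" for path in devices)
    return {
        "RecipeFormatVersion": "2020-01-25",
        "ComponentName": "com.example.LocalVla",  # hypothetical name
        "ComponentVersion": version,
        "ComponentDescription": "Runtime and model bundled in one container.",
        "ComponentPublisher": "Example",           # hypothetical publisher
        "ComponentDependencies": {
            # Lets Greengrass pull Docker image artifacts (placeholder versions).
            "aws.greengrass.DockerApplicationManager": {"VersionRequirement": "~2.0.0"},
            # Provides AWS credentials for a private Amazon ECR repository.
            "aws.greengrass.TokenExchangeService": {"VersionRequirement": "~2.0.0"},
        },
        "Manifests": [
            {
                "Platform": {"os": "linux"},
                "Lifecycle": {"run": f"docker run --rm {device_flags} {image_uri}"},
                "Artifacts": [{"URI": f"docker:{image_uri}"}],
            }
        ],
    }


# Placeholder registry and tags; a model update is a new tag plus a new version.
REGISTRY = "123456789012.dkr.ecr.us-east-1.amazonaws.com/local-vla"
DEVICES = ["/dev/video0", "/dev/ttyACM1"]

black_recipe = build_recipe(f"{REGISTRY}:black-object", "1.0.0", DEVICES)
white_recipe = build_recipe(f"{REGISTRY}:white-object", "1.1.0", DEVICES)

if __name__ == "__main__":
    print(json.dumps(white_recipe, indent=2))
```

In this sketch, switching from the black-object model to the white-object model changes only the image tag and the component version, which is what makes the update a routine deployment.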

INSIGHT
Why Update Matters
To stay valuable, AI must keep evolving, wherever it runs. AWS IoT Greengrass makes that evolution reliable at the edge.
• AI models lose value quickly without updates, regardless of where they run
• Continuous model delivery is now a key requirement for Physical AI
• AWS IoT Greengrass brings cloud-grade update capability to edge devices
• A managed framework makes updates simpler and more reliable than custom builds
* Model management from Amazon SageMaker AI Edge Manager is now unified under AWS IoT Greengrass.

Challenges to Make AI Updatable
• Connectivity: Updates need a high-speed network ANYWHERE. How do we ensure it?
• File Size: The model is HUGE. How can we distribute it robustly?

The Foundation Model is HUGE for IoT devices.
How should I deploy within the limitations?

Methods for Model Deployment Using AWS IoT Greengrass
A) As Artifact (basic): Store the model in Amazon Simple Storage Service (Amazon S3) as an AWS IoT Greengrass artifact. It is downloaded automatically during deployment. The limit is about 2 GB.
B) Bundled in Container: Embed the model inside the Docker image. This simplifies deployment but increases image size. Amazon Elastic Container Registry (Amazon ECR) limitations apply.
C) External (flexible): Host the model outside AWS IoT Greengrass components (e.g., in a Hugging Face repository). This provides full flexibility but requires custom access control and download logic.
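For option C, the "download logic" usually has to survive interrupted or corrupted transfers. The sketch below shows one common pattern, assuming the expected SHA-256 digest is published alongside the model: stream into a temporary file, verify the digest, then swap the file into place atomically. The function name `install_model` is hypothetical, and `source` can be any binary stream (an HTTP response body, for example); access control is out of scope here.

```python
import hashlib
import io
import os
import tempfile
from typing import BinaryIO


def install_model(
    source: BinaryIO,
    dest_path: str,
    expected_sha256: str,
    chunk_size: int = 1 << 20,
) -> None:
    """Stream a model to disk and activate it only if the checksum matches."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    os.makedirs(dest_dir, exist_ok=True)
    digest = hashlib.sha256()
    # The temp file lives in the destination directory so os.replace stays atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            while chunk := source.read(chunk_size):
                digest.update(chunk)
                tmp.write(chunk)
        if digest.hexdigest() != expected_sha256:
            raise ValueError("model checksum mismatch; keeping the current model")
        os.replace(tmp_path, dest_path)  # the old model is swapped out in one step
    except BaseException:
        # Never leave a partial download behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    payload = b"fake-model-weights"
    target = os.path.join(tempfile.mkdtemp(), "model.bin")
    install_model(io.BytesIO(payload), target, hashlib.sha256(payload).hexdigest())
    print("installed", os.path.getsize(target), "bytes")
```

Because the running model file is replaced only after verification, a failed transfer over an unstable link leaves the previous model in place.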

EXAMPLE: Bundled in Container
Embed the model inside the Docker image at build time. It’s very simple.
(Dockerfile: shown on the slide)

Conclusion: Designing Sustainable Physical AI
Building sustainable Physical AI isn’t about where the inference runs, but about how it keeps evolving.
• Local AI when instant response is required (around the 100 ms border)
• Cloud AI when latency is tolerable, simplifying devices and reducing cost
• But more importantly, sustainability comes from continuous updates
• AWS IoT Greengrass makes AI “Updatable”, keeping intelligence aligned with the real world
Physical AI becomes sustainable when it can keep evolving, wherever it lives.

Thank you!
Please complete the session survey in the mobile app.
Kohei “Max” MATSUSHITA (He / Him)
Tech. Evangelist | An AWS Hero
Soracom, Inc.
