Designing local Generative AI inference with AWS IoT Greengrass | AWS re:Invent 2025

Slides presented by Soracom's Matsushita (Max) at AWS re:Invent 2025, held in December 2025.

SORACOM(ソラコム)

December 03, 2025

Transcript

  1. © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    DEV316: Designing local Generative AI inference with AWS IoT Greengrass
    Kohei “Max” MATSUSHITA (He / Him), Tech. Evangelist | An AWS Hero, Soracom, Inc.
  2. Agenda
    • Real-world environments: Physical AI
    • Local vs Cloud: trade-offs and design
    • [DEMO] Teleoperation latency comparison
    • Operating and updating local AI models with AWS IoT Greengrass
    • [DEMO] Local model update
    • Designing Sustainable Physical AI
  3. Physical AI
    Physical AI enables machines to sense, decide, and physically interact with real-world environments. Local (Edge) AI and Cloud AI are implementation layers: they describe where inference and updates happen.
    • Physical AI: the concept of empowering the real world with AI
    • Edge AI / Cloud AI: the architecture and placement of AI inference
    • Edge AI: inference where data is produced
    • Cloud AI: inference where data is collected
  4. Impacts of Physical AI
    • Responsiveness: AI can act quickly when it sees or senses something. It helps machines and places react fast to real-world changes.
    • Autonomy: AI lets machines think and act by themselves. They are no longer just tools; they can learn and create new value.
    • Collaboration: AI understands what people want and works together with us. People and machines can now reach goals as one team.
  5. Case Study: Physical AI in Action
    LINKWIZ (Japan) automates robot teaching by combining industrial robots with 3D scanners. The system recognizes part positions automatically, an alignment task that was once done by hand. https://linkwiz.co.jp/en/
  6. Teaching in Robotics
    In the robotics industry, “teaching” means showing a robot how to do tasks with human guidance. This process allows the robot to repeat those tasks automatically during operation.
  7. Limits of Rule-Based Control
    Rule-based control works great, as long as the world stays predictable. But in the real world, things change, and that is where rule-based control starts to fall behind.
  8. AI | REAL-WORLD
    VLA: Vision-Language-Action model
    Extends LLMs beyond text, connecting perception, reasoning, and physical interaction.
    While LLMs understand and generate language, VLAs perceive the world through sensors and turn that understanding into motion. They are a key enabler of Physical AI, where intelligence works not only in cyberspace but in the real world as well.
    A model specialized in understanding the world through vision and language is called a VLM (Vision-Language Model).
  9. Using a VLA model trained to detect a black object
    • Raspberry Pi 5 (single-board computer) with UVC camera
    • SO-101 (6-axis robotic arm)
    • LeRobot (open-source framework for robot learning)
    • ACT (imitation learning algorithm)
    (Image annotation: the camera is placed off screen.)
  10. Roles of LLM and VLA
    • LLM (with Multimodal)
      Input: Text, Images, Code
      Output: Text, Reasoning, Multimodal Understanding
      Example: Amazon Nova, Claude, GPT-OSS
    • VLA (Vision-Language-Action)
      Input: Text, Images, Sensor / Actuator Data
      Output: Actions (Movement, Control)
      Example: RT-2, LeRobot, SmolVLA
    VLA: a generative model that turns understanding into action in the real world.
  11. Typical Architecture with VLA / VLM
    • Real-world inputs: sensors, cameras, etc.
    • Prompt (text)
    • VLA stages: VLM (perception & reasoning), then action generation
    • Outputs: signals, actuators, etc.
    VLA: Vision-Language-Action model. VLM: Vision-Language model.
  12. Where Should AI Inference Run?
  13. Forms of Inference Architecture: Local and Cloud
    • Local Inference: Camera, VLA, and Actuator all on the Device. Near; low latency.
    • Cloud Inference: Camera and Actuator on the Device, VLA in the Cloud. Far; high latency.
  14. [DEMO] “Far?” Local and Cloud: latency comparison
  15. [DEMO] Teleoperation Latency comparison
    • On Local: Leader Arm (input) on USB (ttyACM0) and Follower Arm (output) on USB (ttyACM1), both attached to a Raspberry Pi running lerobot-teleoperate (py).
    • Via Cloud: Leader Arm on USB (ttyACM0) and Follower Arm on USB (ttyACM1) attached to the Raspberry Pi; lerobot-teleoperate (py) runs on an IaaS instance in US-East; socat (ttyVACM0), socat (TCP), and socat (tcp to tcp) bridge the two over the LTE network.
    Teleoperation using `socat` for TCP tunneling over an LTE network.
    socat: a CLI tool for bidirectional data transfer between two data streams.
  16. On Local
  17. Via Cloud
  18. [DEMO] Teleoperation Latency comparison
    • On Local: USB to USB (directly). FPS = 60 Hz; full performance achieved.
    • Via Cloud: FPS = 3-4 Hz; 500-600 ms latency observed.
      $ ping US-East
      64 bytes from **: icmp_seq=1 ttl=62 time=132 ms
      64 bytes from **: icmp_seq=2 ttl=62 time=129 ms
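The gap between 60 Hz locally and 3-4 Hz via the cloud can be sanity-checked with a back-of-envelope model. The sketch below assumes each control cycle blocks on a fixed number of sequential network round trips. That is an assumption for illustration: the slides do not state how many round trips `lerobot-teleoperate` performs per cycle, and `max_loop_rate_hz` is a hypothetical helper, not part of LeRobot.

```python
def max_loop_rate_hz(rtt_ms: float, round_trips_per_cycle: int,
                     local_work_ms: float = 0.0) -> float:
    """Upper bound on loop rate when every cycle waits for sequential round trips.

    rtt_ms: network round-trip time (the slide's ping shows about 130 ms).
    round_trips_per_cycle: assumed number of blocking request/response pairs.
    local_work_ms: assumed non-network time spent per cycle.
    """
    cycle_ms = rtt_ms * round_trips_per_cycle + local_work_ms
    if cycle_ms <= 0:
        raise ValueError("cycle time must be positive")
    return 1000.0 / cycle_ms


if __name__ == "__main__":
    # Sweep the assumed number of round trips at the measured ~130 ms RTT.
    for k in (1, 2, 3, 4):
        print(f"{k} round trip(s) per cycle: {max_loop_rate_hz(130.0, k):.2f} Hz")
```

Under this model, two blocking round trips per cycle at about 130 ms each would cap the loop below 4 Hz, which is in the range the demo observed. Other factors (LTE jitter, serial framing, buffering) are not modeled.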
  19. RECAP
    Latency Borderline: Cloud or Local
    Decide where to infer based on latency tolerance and network reliability.
    Human intention breaks beyond 100 ms (Nielsen Norman Group, 1993 / 2014)
    • Above the border && stable connectivity -> choose Cloud
    • Below the border || unstable connectivity -> choose Local
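The recap rule can be written down as a small function. This is a minimal sketch: the 100 ms border comes from the slide, while the names `Placement` and `choose_placement` are hypothetical, and treating a tolerance of exactly 100 ms as Local is an assumption the slide does not settle.

```python
from enum import Enum


class Placement(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


BORDER_MS = 100.0  # latency border quoted on the slide


def choose_placement(latency_tolerance_ms: float,
                     connectivity_stable: bool,
                     border_ms: float = BORDER_MS) -> Placement:
    """Apply the slide's rule: Cloud only if tolerance is above the border AND the link is stable."""
    if latency_tolerance_ms > border_ms and connectivity_stable:
        return Placement.CLOUD
    # Below (or, by assumption, exactly at) the border, or unstable connectivity.
    return Placement.LOCAL


if __name__ == "__main__":
    print(choose_placement(500.0, True))   # tolerant workload on a stable link
    print(choose_placement(500.0, False))  # tolerant workload on an unstable link
    print(choose_placement(20.0, True))    # tight control loop
```

Because the two bullets are logical complements, a single `and` test is enough; everything that fails it falls through to Local.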
  20. Even then… what should we consider for Local AI?
  21. When Local AI Matters
    • Camera + Robotics: Latency defines behavior. In camera and robotics systems, even small delays change how machines move and react. Keeping inference local ensures smooth, natural motion that stays aligned with intent.
    • Safety + Control Systems: Decisions must be local. In safety and control environments, every millisecond counts. Critical reactions like emergency stops or anomaly detection can’t wait for the cloud.
    • Privacy + Context Awareness: Sensitive data should stay on the device. When working with cameras, microphones, or personal sensors, privacy matters. Local inference lets you analyze data where it’s created, without sending it away.
  23. We often consider: which VLA model should we choose?
  24. “MODEL CHOICE IS CRITICAL”
  25. THE ANSWER IS: Updatable
    The power of local AI isn’t in the model itself, but in its ability to keep evolving.
    • Models must learn to live with change
    • Continuous delivery keeps intelligence alive
    • “Updatable” is the foundation of Physical AI
  26. UPDATABLE FOR DEVICES
    AWS IoT Greengrass brings cloud capabilities to edge devices, enabling secure, scalable updates and deployments for software, firmware, and AI models.
    Turning HARDware into SOFTware.
  27. UPDATABLE FOR DEVICES: AWS IoT Greengrass
    • Current version: V2
    • Runs as a Java-based agent (GGCore) on the OS
    • Applications are deployed as “components”
    • Docker containers can be used as components
    • Components communicate locally through IPC (Inter-Process Communication)
  28. [DEMO] Local VLA Container Update with AWS IoT Greengrass
  29. [DEMO] Local VLA Container Update
    • AWS Cloud: AWS IoT Greengrass (deployment), Amazon Elastic Container Registry (Amazon ECR)
    • Device: Raspberry Pi 5 (Raspberry Pi OS) running AWS IoT Greengrass Core, with Camera (/dev/video0) and Follower Arm (/dev/ttyACM1); device export by recipe
    • Docker container (runtime + model bundled): runtime (LeRobot) and a model from Hugging Face, either the white-object or the black-object recognition model
    • Steps (labeled 1, 2, 3a, 3b on the slide): docker build, docker push, Deploy, docker pull
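A component recipe for this kind of deployment could look roughly like the one generated below. It is a sketch, not the recipe used in the demo: the component name, version, account ID, repository, and tag are placeholders, and the dependency version ranges may need adjusting for your Greengrass nucleus. The `docker:` artifact URI and the two dependencies follow the pattern AWS documents for running a container from a private Amazon ECR repository, and `--device` is the standard `docker run` flag for exposing host devices such as the camera and arm.

```python
import json


def docker_component_recipe(name, version, image_uri, devices=()):
    """Build a Greengrass V2 recipe (as a dict) that runs one Docker image.

    name, version, image_uri: placeholders chosen by the caller.
    devices: host device paths to pass through with `docker run --device`.
    """
    run_parts = ["docker", "run", "--rm"]
    for dev in devices:
        run_parts += ["--device", dev]
    run_parts.append(image_uri)
    return {
        "RecipeFormatVersion": "2020-01-25",
        "ComponentName": name,
        "ComponentVersion": version,
        "ComponentDependencies": {
            # Pulls Docker artifacts; version range is an assumption.
            "aws.greengrass.DockerApplicationManager": {"VersionRequirement": "~2.0.0"},
            # Supplies AWS credentials for private ECR pulls.
            "aws.greengrass.TokenExchangeService": {"VersionRequirement": "~2.0.0"},
        },
        "Manifests": [{
            "Platform": {"os": "linux"},
            "Lifecycle": {"Run": " ".join(run_parts)},
            "Artifacts": [{"URI": f"docker:{image_uri}"}],
        }],
    }


if __name__ == "__main__":
    recipe = docker_component_recipe(
        "com.example.VlaRunner",  # hypothetical component name
        "1.0.1",
        "123456789012.dkr.ecr.us-east-1.amazonaws.com/vla-runner:white",  # placeholder image
        devices=["/dev/video0", "/dev/ttyACM1"],
    )
    print(json.dumps(recipe, indent=2))
```

With the model bundled in the image, switching from the black-object to the white-object model amounts to publishing a new component version that points at a different image tag and deploying it.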
  31. (Revisited) Using a VLA model trained to detect a black object
  33. After update: detecting a white object
  34. White-object model in action: ignores other colors
  35. INSIGHT: Why Update Matters
    To stay valuable, AI must keep evolving, wherever it runs. AWS IoT Greengrass makes that evolution reliable at the edge.
    • AI models lose value quickly without updates; where they run doesn’t matter
    • Continuous model delivery is now a key requirement for Physical AI
    • AWS IoT Greengrass brings cloud-grade update capability to edge devices
    • A managed framework makes updates simpler and more reliable than custom builds
    * Model management from Amazon SageMaker AI Edge Manager is now unified under AWS IoT Greengrass.
  36. Challenges to Make AI Updatable
    • Connectivity: Updates need a high-speed network ANYWHERE. How do we ensure it?
    • File Size: The model is HUGE. How can we distribute it robustly?
  37. The Foundation Model is HUGE for IoT devices. How should we deploy it within those limitations?
  38. Methods for Model Deployment Using AWS IoT Greengrass
    • A) As Artifact (basic): Store the model in Amazon Simple Storage Service (S3) as an AWS IoT Greengrass artifact. It is downloaded automatically during deployment. The limit is about 2 GB.
    • B) Bundled in Container: Embed the model inside the Docker image. Simplifies deployment, but increases image size. Amazon Elastic Container Registry (ECR) limitations apply.
    • C) External (flexible): Host the model outside AWS IoT Greengrass components (e.g., in a Hugging Face repository). Provides full flexibility, but requires custom access control and download logic.
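One way to read the three options is as a size-driven choice. The sketch below encodes that reading. The threshold is the approximate 2 GB figure from the slide (taken here as 2 GiB, which is an assumption), and `choose_model_delivery` and its return labels are hypothetical names for illustration, not an AWS API. Amazon ECR limits for option B are not modeled.

```python
APPROX_ARTIFACT_LIMIT_BYTES = 2 * 1024**3  # "~2GB" on the slide; exact unit is assumed


def choose_model_delivery(model_size_bytes: int,
                          needs_external_hosting: bool = False) -> str:
    """Pick one of the slide's delivery methods for a model of the given size.

    needs_external_hosting: True when the model must stay in an outside
    repository (e.g. Hugging Face), which implies custom download logic.
    """
    if model_size_bytes < 0:
        raise ValueError("model size cannot be negative")
    if needs_external_hosting:
        return "C: external"
    if model_size_bytes <= APPROX_ARTIFACT_LIMIT_BYTES:
        return "A: artifact"
    # Too large for a component artifact: bundle it in the container image.
    return "B: bundled in container"


if __name__ == "__main__":
    print(choose_model_delivery(500 * 1024**2))       # small model
    print(choose_model_delivery(6 * 1024**3))         # multi-gigabyte model
    print(choose_model_delivery(6 * 1024**3, True))   # must stay in an external repo
```

A small model can still be bundled in a container when a single deployable unit matters more than image size, as in the demo; the function only captures the size constraint.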
  39. EXAMPLE: Bundled in Container (Dockerfile)
    Embed the model inside the Docker image at build time. It’s very simple.
  40. Conclusion: Designing Sustainable Physical AI
    Building sustainable Physical AI isn’t about where the inference runs, but about how it keeps evolving.
    • Local AI when instant response is required (around the 100 ms border)
    • Cloud AI when latency is tolerable, simplifying devices and reducing cost
    • But more importantly, sustainability comes from continuous updates
    • AWS IoT Greengrass makes AI “Updatable”, keeping intelligence aligned with the real world
    Physical AI becomes sustainable when it can keep evolving, wherever it lives.
  41. Thank you!
    Please complete the session survey in the mobile app.
    Kohei “Max” MATSUSHITA (He / Him), Tech. Evangelist | An AWS Hero, Soracom, Inc.