Slide 1

Building the Computing System for Autonomous Micromobility Vehicles: Design Constraints and Architectural Optimizations
Official Website: https://perceptin.io
Bo Yu¹, Wei Hu¹, Leimeng Xu¹, Jie Tang², Shaoshan Liu¹, Yuhao Zhu³
1. PerceptIn Inc  2. South China University of Technology  3. University of Rochester

Slide 2

Introduction
• First thorough study of a commercial computing system for micromobility
  1. Complements previous academic research in this area
  2. Summarizes our R&D efforts over the past three years in building a suitable computing system for autonomous vehicles
• Objectives:
  1. Highlight design constraints unique to autonomous vehicles
  2. Identify new architecture and systems problems for autonomous machines
• Contributions:
  1. Introduce real autonomous vehicle workloads, and unobscured, unnormalized data from our deployed vehicles that future research can build on
  2. Present a detailed performance characterization of our vehicles
  3. Highlight that the computing system should not be optimized in isolation

Slide 3

Micromobility as a Service

Slide 4

No content

Slide 5

Business Story Behind this Paper
Reference: https://ieeexplore.ieee.org/document/9195123/
• Option 1: Optimize a commercial mobile SoC
• Option 2: Procure a specialized autonomous driving computing system
• Option 3: Develop a proprietary autonomous driving computing system

Slide 6

Autonomous Driving Infrastructure
• On-Vehicle Processing
  – Sensing
  – Perception
  – Planning
• Cloud Services
  – Map generation
  – Simulation
  – ML model training

Slide 7

Design Constraints
1. Latency and Throughput
   • Both computing latency and physical (control) latency matter
   • Throughput requirement: 10 Hz (see the sketch below)
2. Energy and Thermal
   • Computing and sensing drop operation time from 10 hours to 7.7 hours
   • Conventional cooling suffices for thermal management
3. Cost and Safety
   • Low cost to sustain a $1-per-trip price point
   • Proactive and reactive safety paths
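
A back-of-envelope sketch of what these numbers imply, in C++; the vehicle speed and the physical control latency below are illustrative assumptions, not measurements from this deck:

```cpp
// Relates the 10 Hz throughput requirement and end-to-end latency to the
// distance the vehicle travels before it can react. Only the 10 Hz figure
// and the 164 ms mean latency come from this deck; the rest is assumed.
#include <cstdio>

int main() {
    const double frame_period_s  = 1.0 / 10.0; // 10 Hz throughput -> 100 ms per-frame budget
    const double computing_lat_s = 0.164;      // mean end-to-end computing latency (164 ms)
    const double control_lat_s   = 0.100;      // assumed physical control latency
    const double speed_mps       = 3.0;        // assumed micromobility speed (~10.8 km/h)

    const double reaction_dist_m = (computing_lat_s + control_lat_s) * speed_mps;
    std::printf("per-frame budget: %.0f ms\n", frame_period_s * 1e3);
    std::printf("reaction distance: %.2f m\n", reaction_dist_m);
}
```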

Slide 8

Design Constraints: LiDAR vs. Camera
• Latency
  1. LiDAR-based localization algorithms take 100 ms to 1 s
  2. Our vision-based localization algorithm finishes in about 25 ms on an embedded FPGA
• Power
  – LiDAR is an order of magnitude more power-hungry than cameras
• Cost
  – LiDAR is an order of magnitude more expensive than cameras
• Depth Quality
  – LiDARs directly provide depth information at a precision of 2 cm

Slide 9

Software Pipeline
• Task-Level Parallelism
  – Sensing, perception, and planning are serialized
  – Processing of different sensors is independent (see the sketch below)
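
A minimal sketch of this task-level parallelism, assuming a simplified two-sensor setup; the names and workloads are placeholders:

```cpp
// Per-sensor processing runs in parallel, while sensing -> perception ->
// planning stay serialized within each frame.
#include <future>
#include <cstdio>

struct Frame { int camera_features; int imu_samples; };

Frame sense() {
    // Different sensors are independent, so process them concurrently.
    auto cam = std::async(std::launch::async, [] { return 42; /* feature count */ });
    auto imu = std::async(std::launch::async, [] { return 8;  /* sample count  */ });
    return Frame{cam.get(), imu.get()};
}

int perceive(const Frame& f) { return f.camera_features + f.imu_samples; } // stand-in
void plan(int world_state)   { std::printf("plan on state %d\n", world_state); }

int main() {
    for (int i = 0; i < 3; ++i) {   // one iteration per sensor frame (10 Hz in practice)
        Frame f = sense();          // parallel inside
        int s  = perceive(f);       // serialized after sensing
        plan(s);                    // serialized after perception
    }
}
```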

Slide 10

Systems-on-a-Vehicle (SoV)

Slide 11

Algorithm-Hardware Mapping
• Sensing
  – A Zynq FPGA performs sensor synchronization and feature extraction
• Planning
  – Executes on the CPU, which has strong CAN interface support
• Perception
  – Scene understanding
  – Localization
• Partial Reconfiguration
  – Time-shares part of the FPGA resources at runtime (see the sketch below)
  – Swap time < 3 ms
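
A hypothetical sketch of this time-sharing; load_bitstream is a stand-in for the vendor reconfiguration API (e.g., via PCAP/ICAP on a Zynq device), and only the < 3 ms swap figure comes from the slide:

```cpp
// Time-shares one FPGA region between two perception kernels via partial
// reconfiguration, modeled abstractly in software.
#include <chrono>
#include <cstdio>
#include <string>

void load_bitstream(const std::string& name) {
    // Placeholder: a real system would write a partial bitstream to the
    // device's reconfiguration port here.
    std::printf("reconfiguring region with %s\n", name.c_str());
}

int main() {
    using clock = std::chrono::steady_clock;
    const std::string kernels[] = {"scene_understanding.bit", "localization.bit"};
    for (const auto& k : kernels) {
        auto t0 = clock::now();
        load_bitstream(k);  // swap measured at < 3 ms on our platform
        auto swap_ms = std::chrono::duration<double, std::milli>(clock::now() - t0).count();
        std::printf("swap took %.3f ms\n", swap_ms);
        // ... run this kernel until the other one is needed ...
    }
}
```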

Slide 12

Performance Characterizations
• Average latency and variation
  – Mean end-to-end latency is 164 ms
  – The 99th percentile is over 300 ms (see the sketch below)
  – Variation is mostly caused by scene complexity
• Latency distribution
  – Sensing contributes significantly to the latency
  – Perception is the biggest contributor to latency
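
For reference, these statistics can be reproduced from per-frame latency samples as sketched below; the sample values are made up:

```cpp
// Computes the mean and 99th percentile of per-frame latencies.
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> lat_ms = {150, 160, 164, 170, 158, 305, 162, 168, 155, 166};
    double mean = std::accumulate(lat_ms.begin(), lat_ms.end(), 0.0) / lat_ms.size();

    std::sort(lat_ms.begin(), lat_ms.end());
    size_t idx = static_cast<size_t>(0.99 * (lat_ms.size() - 1)); // nearest-rank approximation
    std::printf("mean = %.1f ms, p99 = %.1f ms\n", mean, lat_ms[idx]);
}
```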

Slide 13

Sensor Synchronization
• Sensing has been overlooked
  – Most robotics research focuses on perception and planning
• Sensing is crucial to perception
  – e.g., in sensor fusion
• Sensor synchronization is a challenging problem in the real world!
  – Sensors must trigger at the same time
  – Processing latency varies across sensors

Slide 14

Sensing-Computing Co-Design
• Localization
  1. Requires synchronized sensor samples from both the camera and the IMU
  2. The processing pipeline introduces variable latency
• Camera-IMU sync
  1. The IMU has a short processing time
  2. The camera has a long processing time (see the sketch below)
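
A sketch of the principle this implies: pair camera and IMU samples by trigger time rather than arrival time, since the camera pipeline adds long, variable latency. The field names and latency values are illustrative:

```cpp
// Timestamp a sample at trigger time, not when it emerges from its
// (variable-latency) processing pipeline; then pair samples by trigger time.
#include <cstdint>
#include <cstdio>

struct Sample { uint64_t trigger_ns; uint64_t arrival_ns; };

// Pairing by arrival time would misalign camera and IMU, because the camera
// pipeline is much longer than the IMU's. Pair by trigger time instead.
bool aligned(const Sample& cam, const Sample& imu, uint64_t tol_ns) {
    uint64_t d = cam.trigger_ns > imu.trigger_ns ? cam.trigger_ns - imu.trigger_ns
                                                 : imu.trigger_ns - cam.trigger_ns;
    return d <= tol_ns;
}

int main() {
    Sample cam{1'000'000'000, 1'030'000'000}; // ~30 ms camera pipeline latency (assumed)
    Sample imu{1'000'000'000, 1'000'200'000}; // ~0.2 ms IMU latency (assumed)
    std::printf("aligned by trigger: %s\n",
                aligned(cam, imu, 1'000'000) ? "yes" : "no");
}
```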

Slide 15

Sensor Synchronization Architecture
• Software-Hardware Sync Design Principles
  1. A single time source triggers all sensors
  2. Obtain the timestamp as close to the sensor as possible
• Our Design
  1. GPS provides satellite atomic time as the universal time source
  2. Camera runs at 30 FPS; IMU runs at 240 Hz
  3. The camera trigger signal is down-sampled 8× from the IMU trigger signal (see the sketch below)
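
A software model of the trigger down-sampling (in the real design this is FPGA logic): the 240 Hz IMU pulse train drives a divide-by-8 counter, yielding the 30 FPS camera trigger:

```cpp
// Every 8th IMU pulse fires the camera trigger: 240 Hz / 8 = 30 FPS.
#include <cstdio>

int main() {
    int counter = 0;
    for (int imu_pulse = 0; imu_pulse < 24; ++imu_pulse) { // 0.1 s of pulses at 240 Hz
        bool camera_trigger = (counter == 0);               // fires every 8th pulse
        if (camera_trigger)
            std::printf("camera trigger at IMU pulse %d\n", imu_pulse);
        counter = (counter + 1) % 8;
    }
}
```

Because both trigger trains derive from the same GPS-disciplined time source, camera and IMU samples share a common timeline by construction.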

Slide 16

Concluding Remarks
• Holistic SoV Optimizations
  1. Move beyond optimizing only one part of the computing platform
  2. Understand the constraints and tradeoffs from an SoV perspective
• Horizontal Cross-Accelerator Optimization
  1. Most previous studies focus on a single accelerator
  2. Exploit interactions across accelerators
  3. We present the dataflow across different on-vehicle algorithms and their inherent task-level parallelism
• Architecture for Autonomous Machines
  1. Static dataflow pattern
  2. Each stage imposes additional constraints
• A "TCO" Model for Autonomous Machines
  1. We need a comprehensive cost model for autonomous machines
  2. Balance between cloud processing and on-vehicle processing

Slide 17

Thank you!