Slide 1

Slide 1 text

Low-Latency Proactive Continuous Vision
Yiming Gan, Department of Computer Science, University of Rochester
with
Yuxian Qiu, Shanghai Jiao Tong University
Lele Chen, University of Rochester
Jingwen Leng, Shanghai Jiao Tong University
Yuhao Zhu, University of Rochester

Slide 2

Slide 2 text

Continuous Vision: Long Frame Latency

Slide 3

Slide 3 text

Bottleneck: Serialization
Light → Sensor → Raw Pixels

Slide 4

Slide 4 text

Bottleneck: Serialization
Light → Sensor → Raw Pixels → Image Signal Processor → RGB

Slide 5

Slide 5 text

Bottleneck: Serialization
Light → Sensor → Raw Pixels → Image Signal Processor → RGB → DNN Accelerator → Results

Slide 6

Slide 6 text

Traditional Pipeline
Frame 1: Sensing → Imaging → Vision (Latency)
Frame 2: Sensing → Imaging → Vision (Latency)
Frame 3: Sensing → Imaging → Vision (Latency)
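The serialization shown on this slide can be put in numbers: in the traditional pipeline, a frame's result is ready only after sensing, imaging, and vision have run back to back. The sketch below is a minimal illustration; the `StageTimes` type and the stage durations are hypothetical placeholders, not measurements from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageTimes:
    """Per-frame stage durations in milliseconds (hypothetical values)."""
    sensing_ms: float
    imaging_ms: float
    vision_ms: float


def traditional_frame_latency(t: StageTimes) -> float:
    """The three stages run serially, so frame latency is their sum."""
    return t.sensing_ms + t.imaging_ms + t.vision_ms


# Placeholder numbers chosen only to make the arithmetic easy to follow.
stages = StageTimes(sensing_ms=10.0, imaging_ms=5.0, vision_ms=20.0)
print(traditional_frame_latency(stages))
```

Because every frame pays for all three stages in sequence, shortening any single stage helps only by that stage's share of the sum.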

Slide 7

Slide 7 text

Proactive Pipeline
[Timing diagram. Frame 1: Sensing, Imaging, Vision (Latency).]

Slide 8

Slide 8 text

Proactive Pipeline
[Timing diagram. Frame 1: Sensing, Imaging, Vision (Latency), Pred.]

Slide 9

Slide 9 text

Proactive Pipeline
[Timing diagram. Frame 1: Sensing, Imaging, Vision (Latency), Pred. Frame 2: Sensing, Imaging, Vision. Frame 3: Sensing, Imaging, Vision.]

Slide 10

Slide 10 text

Proactive Pipeline
[Timing diagram. Frame 1: Sensing, Imaging, Vision (Latency), Pred. Frame 2: Sensing, Imaging, Vision, Check. Frame 3: Sensing, Imaging, Vision, Check.]

Slide 11

Slide 11 text

Proactive Pipeline
[Timing diagram. Frame 1: Sensing, Imaging, Vision (Latency), Pred. Frame 2: Sensing, Imaging, Vision, Check. Frame 3: Sensing, Imaging, Vision, Check. Fail Check: Vision (Latency).]

Slide 12

Slide 12 text

Proactive Pipeline
[Timing diagram. Frame 1: Sensing, Imaging, Vision (Latency), Pred. Frame 2: Sensing, Imaging, Vision, Check. Frame 3: Sensing, Imaging, Vision, Check. Fail Check: Vision (Latency). Pass Check: (Latency).]
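The pass and fail cases on this slide can be sketched numerically. This is a minimal model under assumptions the slides do not state: latency is counted from the start of sensing, the check runs once the real frame has been sensed and imaged, and a failed check triggers a full re-run of vision. All function names and timings are hypothetical.

```python
def proactive_frame_latency(capture_ms: float, check_ms: float,
                            vision_ms: float, check_passed: bool) -> float:
    """Latency of one frame in the assumed proactive model.

    capture_ms is sensing plus imaging time for the real frame.
    On a passed check the precomputed vision result is reused;
    on a failed check vision runs again on the real frame.
    """
    latency = capture_ms + check_ms
    if not check_passed:
        latency += vision_ms
    return latency


def expected_latency(capture_ms: float, check_ms: float,
                     vision_ms: float, pass_rate: float) -> float:
    """Average latency when a fraction pass_rate of checks succeed."""
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    hit = proactive_frame_latency(capture_ms, check_ms, vision_ms, True)
    miss = proactive_frame_latency(capture_ms, check_ms, vision_ms, False)
    return pass_rate * hit + (1.0 - pass_rate) * miss


# Placeholder timings in milliseconds, not measurements from the talk.
print(proactive_frame_latency(15.0, 1.0, 20.0, check_passed=True))
print(proactive_frame_latency(15.0, 1.0, 20.0, check_passed=False))
print(expected_latency(15.0, 1.0, 20.0, pass_rate=0.5))
```

In this model a failed check costs slightly more than the traditional pipeline (the check time is added on top), so the benefit depends on how often predictions pass.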

Slide 13

Slide 13 text

Gains
[Timing diagram. Frame 1: Sensing, Imaging, Vision (Latency), Pred. Frame 2: Sensing, Imaging, Vision, Check. Frame 3: Sensing, Imaging, Vision, Check. Fail Check: Vision (Latency). Pass Check: (Latency).]

Slide 14

Slide 14 text

Challenges
[Timing diagram. Frame 1: Sensing, Imaging, Vision (Latency), Pred. Frame 2: Sensing, Imaging, Vision, Check. Frame 3: Sensing, Imaging, Vision, Check. Fail Check: Vision (Latency). Pass Check: (Latency).]
Resource Contention

Slide 15

Slide 15 text

Solutions

Slide 16

Slide 16 text

Solutions

Slide 17

Slide 17 text

Challenges
[Timing diagram. Frame 1: Sensing, Imaging, Vision (Latency), Pred. Frame 2: Sensing, Imaging, Vision, Check. Frame 3: Sensing, Imaging, Vision, Check. Fail Check: Vision (Latency). Pass Check: (Latency).]
Energy Wasting

Slide 18

Slide 18 text

Solutions
• Relaxing Checking Criterion (Threshold T)

Slide 19

Slide 19 text

Solutions
• Relaxing Checking Criterion (Threshold T)
• Relaxing Checking Frequency (Degree K)
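One way to read the two knobs above is as a check policy: T sets how similar a predicted frame must be to the real frame to pass, and K sets how often a predicted frame is checked at all. The sketch below is an illustration under assumptions, not the method from the talk: the similarity metric (one minus the mean absolute pixel difference, with pixels in [0, 1]) and the "check every K-th frame" rule are both hypothetical.

```python
from typing import Sequence


def similarity(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Hypothetical metric: 1 minus the mean absolute pixel difference.

    Pixels are assumed to be normalized to the range [0, 1].
    """
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("frames must be non-empty and the same size")
    total = sum(abs(p - a) for p, a in zip(predicted, actual))
    return 1.0 - total / len(predicted)


def should_check(frame_index: int, k: int) -> bool:
    """Assumed policy: check every k-th predicted frame (k = 1 checks all)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return frame_index % k == 0


def accept_prediction(predicted: Sequence[float], actual: Sequence[float],
                      frame_index: int, t: float, k: int) -> bool:
    """Return True if the precomputed result for this frame is kept."""
    if not should_check(frame_index, k):
        return True  # unchecked predicted frame: result is used as-is
    return similarity(predicted, actual) >= t


# Lowering t or raising k makes the check more permissive.
print(accept_prediction([0.0, 0.0], [1.0, 1.0], frame_index=0, t=0.9, k=2))
print(accept_prediction([0.0, 0.0], [1.0, 1.0], frame_index=1, t=0.9, k=2))
```

Under this reading, a lower T or a higher K trades accuracy for fewer failed checks and less checking work.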

Slide 20

Slide 20 text

Frame Sequence
[Diagram. Legend: Precise Frames, Unchecked-Predicted Frames, Checked-Predicted Frames. Axis: Time. Label: Predicted Sequence.]

Slide 21

Slide 21 text

PVF Framework
[Architecture diagram labels: Static, Dynamic; Vision Apps; SoC (Sensor, ISP, CPU, GPU, DSP, NPU, Memory, BUS).]

Slide 22

Slide 22 text

PVF Framework
[Architecture diagram labels: Static, Dynamic; Vision Apps; SoC (Sensor, ISP, CPU, GPU, DSP, NPU, Memory, BUS); Similarity T, Degree K; Accuracy Target, Similarity Metric, etc.]

Slide 23

Slide 23 text

PVF Framework
[Architecture diagram labels: Static, Dynamic; Vision Apps; SoC (Sensor, ISP, CPU, GPU, DSP, NPU, Memory, BUS); Similarity T, Degree K; Predictor.]

Slide 24

Slide 24 text

PVF Framework
[Architecture diagram labels: Static, Dynamic; Vision Apps; SoC (Sensor, ISP, CPU, GPU, DSP, NPU, Memory, BUS); Similarity T, Degree K; Predictor; Runtime.]

Slide 25

Slide 25 text

PVF Framework
[Architecture diagram labels: Static, Dynamic; Vision Apps; SoC (Sensor, ISP, CPU, GPU, DSP, NPU, Memory, BUS); Similarity T, Degree K; Predictor; Runtime; Checking.]

Slide 26

Slide 26 text

PVF Framework
[Architecture diagram labels: Static, Dynamic; Vision Apps; SoC (Sensor, ISP, CPU, GPU, DSP, NPU, Memory, BUS); Similarity T, Degree K; Predictor; Runtime; Checking; Scheduler.]

Slide 27

Slide 27 text

PVF Framework
[Architecture diagram labels: Static, Dynamic; Vision Apps; Similarity T, Degree K; SoC (Sensor, ISP, CPU, GPU, DSP, NPU, Memory, BUS); Predictor; Scheduler; Control; Checking; Runtime.]

Slide 28

Slide 28 text

Experimental Setup
I. In-house simulator modeling state-of-the-art SoCs
• Real measurements of latency and energy on different IPs.

Slide 29

Slide 29 text

Experimental Setup
I. In-house simulator modeling state-of-the-art SoCs
• Real measurements of latency and energy on different IPs.
II. RTL Implementations for NPU and Predictor
• 20x20 Systolic Array for NPU, 10x10 Systolic Array for Predictor

Slide 30

Slide 30 text

Experimental Setup
I. In-house simulator modeling state-of-the-art SoCs
• Real measurements of latency and energy on different IPs.
II. RTL Implementations for NPU and Predictor
• 20x20 Systolic Array for NPU, 10x10 Systolic Array for Predictor
III. Evaluate on Object Detection and Tracking
• KITTI dataset for object detection, VOT Challenge for tracking.

Slide 31

Slide 31 text

Experimental Setup
I. In-house simulator modeling state-of-the-art SoCs
• Real measurements of latency and energy on different IPs.
II. RTL Implementations for NPU and Predictor
• 20x20 Systolic Array for NPU, 10x10 Systolic Array for Predictor
III. Evaluate on Object Detection and Tracking
• KITTI dataset for object detection, VOT Challenge for tracking.
IV. Different Input Resolutions

Slide 32

Slide 32 text

Baselines
I. Base
• Baseline with traditional execution pipeline
II. BO
• Baseline with optimized back-end
III. FCFS
• Traditional pipeline with multiple hardware IPs

Slide 33

Slide 33 text

Results
[Plot. Axes: Energy Budget (mJ), Latency Reduction (%). Annotation: Better.]

Slide 34

Slide 34 text

Results
[Plot. Axes: Energy Budget (mJ), Latency Reduction (%). Series: PVF. Annotation: Better.]

Slide 35

Slide 35 text

Results
[Plot. Axes: Energy Budget (mJ), Latency Reduction (%). Series: Base, BO, FCFS, PVF. Annotation: Better.]

Slide 36

Slide 36 text

Conclusion
I. Long Latency Bottlenecks Continuous Vision
II. Proactive Execution Pipeline
1) Leveraging Heterogeneities in Mobile SoCs
2) Relaxed Checking
III. Non-mission-critical System

Slide 37

Slide 37 text

Collaborators
Yuxian Qiu, Jingwen Leng, Lele Chen, Yuhao Zhu

Slide 38

Slide 38 text

Questions