Slide 1

Algorithm-SoC Co-Design for Mobile Continuous Vision
Yuhao Zhu, Department of Computer Science, University of Rochester
with Anand Samajdar (Georgia Tech), Matthew Mattina (ARM Research), and Paul Whatmough (ARM Research)

Slides 2-5

Mobile Continuous Vision: Excessive Energy Consumption
720p, 30 FPS
Energy Budget (under 3 W TDP): 109 nJ/pixel
Object Detection Energy Consumption: 1400 nJ/pixel
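The budget figure follows directly from the power envelope and the pixel rate. A quick sanity check (the 3 W TDP, 720p, and 30 FPS figures are from the slides; the per-pixel arithmetic is ours):

```python
# Energy budget per pixel under a 3 W thermal design power (TDP),
# streaming 1280x720 frames at 30 FPS (figures from the slides).
TDP_W = 3.0
PIXELS_PER_SEC = 1280 * 720 * 30          # ~27.6 M pixels/s

budget_nj_per_pixel = TDP_W / PIXELS_PER_SEC * 1e9
print(f"budget: {budget_nj_per_pixel:.0f} nJ/pixel")   # ~109 nJ/pixel

# Object detection at 1400 nJ/pixel would instead need:
detect_power_w = 1400e-9 * PIXELS_PER_SEC
print(f"detection power: {detect_power_w:.1f} W")      # ~38.7 W, ~13x over budget
```

This is the gap the rest of the talk closes: detection as-is overshoots the mobile power budget by more than an order of magnitude.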

Slides 6-9

Application Drivers for Continuous Vision
▸ Autonomous Drones
▸ ADAS
▸ Augmented Reality
▸ Security Camera

Slides 10-19

Expanding the Scope
Conventional scope: RGB Frames → Vision Kernels → Semantic Results.
Our scope: Photons → Imaging → RGB Frames plus Motion Metadata → Vision Kernels → Semantic Results.

f(xt) = f(x1, …, xt-1) ⊕ (xt ⊖ xt-1)

where ⊖ is a cheap diff (motion) and ⊕ is the motion-based synthesis.

Slides 20-27

Getting Motion Data
The imaging stage already computes motion. The ISP pipeline (Conversion; Bayer Domain: Dead Pixel Correction, Demosaic, …; YUV Domain: Temporal Denoising, …) block-matches Frame k against Frame k-1 during temporal denoising, producing a Motion Vector per block. This motion info can be exposed alongside the RGB frames.
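Block matching is the standard way such motion vectors are computed: for each block in frame k, find the best-matching block in frame k-1; the displacement is the motion vector. A minimal sketch using a plain sum-of-absolute-differences search (block size, search range, and the toy frames are illustrative assumptions, not the ISP's actual parameters):

```python
import numpy as np

def motion_vector(prev, curr, by, bx, bs=8, search=4):
    """Motion vector for the bs x bs block of `curr` at (by, bx):
    the displacement into `prev` minimizing sum of absolute differences."""
    block = curr[by:by+bs, bx:bx+bs].astype(np.int32)
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bs > prev.shape[0] or x + bs > prev.shape[1]:
                continue
            sad = np.abs(prev[y:y+bs, x:x+bs].astype(np.int32) - block).sum()
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv

# Toy example: frame k is frame k-1 shifted right by 2 pixels.
prev = np.zeros((32, 32), dtype=np.uint8)
prev[8:16, 8:16] = 255
curr = np.roll(prev, 2, axis=1)
print(motion_vector(prev, curr, 8, 10))   # block came from 2 px to the left: (0, -2)
```

A hardware temporal-denoising stage does the same search per block, so the vectors come at no extra compute cost to the vision pipeline.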

Slides 28-41

Synthesis Operation

f(xt) = f(x1, …, xt-1) ⊕ (xt ⊖ xt-1), where ⊖ is a cheap diff (motion) and ⊕ is the motion-based synthesis.

▸ Synthesis operation: extrapolate based on motion vectors
▸ Frames alternate between full CNN inference (I-Frames) and motion-based extrapolation (E-Frames): with Extrapolation Window = 2, every other frame is extrapolated; with Extrapolation Window = 3, two of every three frames are.
▸ Address three challenges:
▹ Handle deformable parts
▹ Filter motion noise
▹ When to infer vs. extrapolate?
▹ See paper for details!
▸ Computationally efficient: extrapolation costs ~10K operations/frame vs. ~50B operations/frame for CNN inference.
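The inference/extrapolation cadence can be sketched as a simple schedule: run the full CNN once per extrapolation window, and shift the previous result by the filtered motion for the frames in between. This is a structural sketch, not the paper's exact algorithm; `run_cnn` and `mean_motion` are hypothetical stand-ins for the CNN accelerator and the averaged ISP motion vectors:

```python
def process_stream(frames, window, run_cnn, mean_motion):
    """Alternate CNN inference (I-Frames) with motion-based
    extrapolation (E-Frames). With window=2, every other frame is
    extrapolated; with window=3, two of every three; and so on."""
    results, boxes = [], None
    for t, frame in enumerate(frames):
        if t % window == 0:                      # I-Frame: full inference
            boxes = run_cnn(frame)
        else:                                    # E-Frame: ~10K ops vs ~50B
            dy, dx = mean_motion(frame)          # averaged motion vectors
            boxes = [(x0 + dx, y0 + dy, x1 + dx, y1 + dy)
                     for (x0, y0, x1, y1) in boxes]
        results.append(boxes)
    return results

# Toy run: one box moving right 1 px/frame, CNN fires every 3rd frame.
out = process_stream(list(range(6)), window=3,
                     run_cnn=lambda f: [(f, 0, f + 10, 10)],
                     mean_motion=lambda f: (0, 1))
print(out[2])   # frame 2 extrapolated from frame 0's inference: [(2, 0, 12, 10)]
```

The asymmetry in the two branches (thousands vs. tens of billions of operations) is what makes large extrapolation windows so profitable.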

Slides 42-46

Euphrates: An Algorithm-SoC Co-Designed System for Energy-Efficient Mobile Continuous Vision
▸ Algorithm: motion-based tracking and detection synthesis.
▸ SoC: exploits synergies across IP blocks; enables task autonomy.
▸ Results: 66% energy saving with ~1% accuracy loss, validated via RTL modeling and board measurement.

Slides 47-50

SoC Architecture
The pipeline maps onto SoC IP blocks: Camera Sensor → Sensor Interface → Image Signal Processor (imaging) → CNN Accelerator (vision kernels), connected over the on-chip interconnect.

Slides 51-63

SoC Architecture
The full SoC: CPU (host), Image Signal Processor, CNN Accelerator, Camera Sensor and Sensor Interface, DMA Engine, and Memory Controller on the on-chip interconnect, with the Frame Buffer in DRAM and a Display. Euphrates makes two changes: (1) the ISP is augmented to emit motion metadata alongside frames, and (2) a new Motion Controller IP consumes that metadata to synthesize results.

Slides 64-74

ISP Augmentation
▸ Expose motion vectors to the rest of the SoC
▸ Design decision: transfer MVs through DRAM
▹ One 1080p frame: 8 KB MV traffic vs. ~6 MB pixel data
▹ Easy to piggyback on the existing SoC communication scheme
▸ Light-weight modification to the ISP Sequencer
(Diagram: within the ISP pipeline — Demosaic, Color Balance, and a Temporal Denoising stage containing Motion Estimation, Motion Compensation, SRAM, and DMA — the ISP Sequencer is modified so that the MVs produced while denoising the noisy frame against the previous frame are also written out over the SoC interconnect to the Frame Buffer in DRAM.)
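The DRAM-traffic argument is easy to check: pixel data for one 1080p frame is MB-scale, while per-block motion vectors are KB-scale. The block size and bytes-per-vector below are our assumptions (the slide's exact 8 KB figure implies a somewhat different packing), but the roughly three-orders-of-magnitude gap is robust:

```python
# One 1080p frame: pixel traffic vs. motion-vector traffic.
W, H = 1920, 1080
pixel_bytes = W * H * 3                      # RGB, 1 byte/channel: ~5.9 MB

# Assume one motion vector per 16x16 block, 2 bytes per vector (dx, dy).
blocks = (W // 16) * ((H + 15) // 16)        # 120 * 68 = 8160 blocks
mv_bytes = blocks * 2                        # ~16 KB (slide cites 8 KB)

print(f"pixels: {pixel_bytes / 2**20:.1f} MB, MVs: {mv_bytes / 2**10:.1f} KB")
print(f"MV traffic is ~{pixel_bytes // mv_bytes}x smaller")
```

At this scale the MV write-back is negligible, which is why routing it through the existing DRAM path costs essentially nothing.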

Slides 75-82

Motion Controller IP
(Diagram: an Extrapolation Unit with a Motion Vector Buffer, DMA, Sequencer (FSM), ROI Selection, and a 4-way SIMD unit plus a scalar unit that produce the new ROI; memory-mapped registers hold the ROI, window size, base addresses, and configuration. The controller sits on the SoC interconnect between the ISP and the CNN accelerator.)
▸ Why a new IP rather than directly augmenting the CNN accelerator?
▹ Stays independent of the vision algorithm/accelerator implementation
▸ Why a new IP rather than synthesizing on the CPU?
▹ The CPU can be switched off, enabling "always-on" vision

Slide 83

Euphrates: An Algorithm-SoC Co-Designed System for Energy-Efficient Mobile Continuous Vision
▸ Algorithm: motion-based tracking and detection synthesis.
▸ SoC: exploits synergies across IP blocks; enables task autonomy.
▸ Results: 66% energy saving with ~1% accuracy loss, validated via RTL modeling and board measurement.

Slide 84

Slide 84 text

Experimental Setup ▸ In-house simulator modeling a commercial mobile SoC: Nvidia Tegra X2 ▹ Real board measurement 14

Slide 85

Slide 85 text

Experimental Setup ▸ In-house simulator modeling a commercial mobile SoC: Nvidia Tegra X2 ▹ Real board measurement ▸ Develop RTL models for IPs unavailable on TX2 ▹ CNN Accelerator (651 mW, 1.58 mm2) ▹ Motion Controller (2.2 mW, 0.035 mm2) 14

Slide 86

Slide 86 text

Experimental Setup ▸ In-house simulator modeling a commercial mobile SoC: Nvidia Tegra X2 ▹ Real board measurement ▸ Develop RTL models for IPs unavailable on TX2 ▹ CNN Accelerator (651 mW, 1.58 mm2) ▹ Motion Controller (2.2 mW, 0.035 mm2) 14 ▸ Evaluate on Object Tracking and Object Detection ▹Important domains that are building blocks for many vision applications ▹IP vendors have started shipping standalone tracking/detection IPs

Slide 87

Slide 87 text

Experimental Setup ▸ In-house simulator modeling a commercial mobile SoC: Nvidia Tegra X2 ▹ Real board measurement ▸ Develop RTL models for IPs unavailable on TX2 ▹ CNN Accelerator (651 mW, 1.58 mm2) ▹ Motion Controller (2.2 mW, 0.035 mm2) 14 ▸ Evaluate on Object Tracking and Object Detection ▹Important domains that are building blocks for many vision applications ▹IP vendors have started shipping standalone tracking/detection IPs

Slide 88

Slide 88 text

Experimental Setup ▸ In-house simulator modeling a commercial mobile SoC: Nvidia Tegra X2 ▹ Real board measurement ▸ Develop RTL models for IPs unavailable on TX2 ▹ CNN Accelerator (651 mW, 1.58 mm2) ▹ Motion Controller (2.2 mW, 0.035 mm2) 14 ▸ Evaluate on Object Tracking and Object Detection ▹Important domains that are building blocks for many vision applications ▹IP vendors have started shipping standalone tracking/detection IPs ▸ Object Detection ▹Baseline CNN: YOLOv2 (state-of-the-art detection results)

Slide 89

Slide 89 text

Experimental Setup ▸ In-house simulator modeling a commercial mobile SoC: Nvidia Tegra X2 ▹ Real board measurement ▸ Develop RTL models for IPs unavailable on TX2 ▹ CNN Accelerator (651 mW, 1.58 mm2) ▹ Motion Controller (2.2 mW, 0.035 mm2) 14 ▸ Evaluate on Object Tracking and Object Detection ▹Important domains that are building blocks for many vision applications ▹IP vendors have started shipping standalone tracking/detection IPs ▸ Object Detection ▹Baseline CNN: YOLOv2 (state-of-the-art detection results) ▸ SCALESim: A systolic array-based, cycle-accurate CNN accelerator simulator. https://github.com/ARM-software/SCALE-Sim.

Slide 90

Slide 90 text

0.1 0.2 0.3 0.4 0.5 0.6 0.7 YOLOv2 Evaluation Results 15 Accuracy

Slide 91

Slide 91 text

0.1 0.2 0.3 0.4 0.5 0.6 0.7 YOLOv2 0 0.25 0.5 0.75 1 YOLOv2 Evaluation Results 15 Accuracy Norm. Energy

Slide 92

Slide 92 text

0.1 0.2 0.3 0.4 0.5 0.6 0.7 YOLOv2 0 0.25 0.5 0.75 1 YOLOv2 YOLOv2 EW-2 EW-4 EW-8 EW-16 EW-32 Evaluation Results 15 Accuracy Norm. Energy EW = Extrapolation Window

Slide 93

Slide 93 text

0.1 0.2 0.3 0.4 0.5 0.6 0.7 YOLOv2 0 0.25 0.5 0.75 1 YOLOv2 YOLOv2 EW-2 EW-4 EW-8 EW-16 EW-32 YOLOv2 EW-2 EW-4 EW-8 EW-16 EW-32 Evaluation Results 15 Accuracy Norm. Energy EW = Extrapolation Window

Slide 94

Slide 94 text

0.1 0.2 0.3 0.4 0.5 0.6 0.7 YOLOv2 0 0.25 0.5 0.75 1 YOLOv2 YOLOv2 EW-2 EW-4 EW-8 EW-16 EW-32 YOLOv2 EW-2 EW-4 EW-8 EW-16 EW-32 Evaluation Results 15 Accuracy Norm. Energy 66% system energy saving with ~ 1% accuracy loss. EW = Extrapolation Window

Slide 95

Slide 95 text

Scale-down CNN 0.1 0.2 0.3 0.4 0.5 0.6 0.7 YOLOv2 0 0.25 0.5 0.75 1 YOLOv2 YOLOv2 EW-2 EW-4 EW-8 EW-16 EW-32 YOLOv2 EW-2 EW-4 EW-8 EW-16 EW-32 YOLOv2 EW-4 EW-16 TinyYOLO Evaluation Results 15 Accuracy Norm. Energy 66% system energy saving with ~ 1% accuracy loss. EW = Extrapolation Window

Slide 96

Slide 96 text

0.1 0.2 0.3 0.4 0.5 0.6 0.7 YOLOv2 0 0.25 0.5 0.75 1 YOLOv2 YOLOv2 EW-2 EW-4 EW-8 EW-16 EW-32 YOLOv2 EW-2 EW-4 EW-8 EW-16 EW-32 YOLOv2 EW-4 EW-16 TinyYOLO Evaluation Results 15 Accuracy Norm. Energy 66% system energy saving with ~ 1% accuracy loss. More efficient than simply scaling-down the CNN. EW = Extrapolation Window
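The energy trend tracks a simple first-order model: per-frame energy is one full inference per window plus cheap extrapolations for the rest. The split below uses the slides' ~10K vs. ~50B operation counts as an illustrative cost proxy; measured savings (66%) also fold in ISP, memory, and system energy, which this model ignores:

```python
# First-order model: normalized energy per frame with extrapolation window w.
# Costs use operation counts from the slides as a proxy.
E_INF, E_EXT = 50e9, 10e3   # CNN inference vs. motion extrapolation

def norm_energy(w):
    """One inference + (w-1) extrapolations, averaged over w frames,
    normalized to inference-every-frame."""
    return (E_INF + (w - 1) * E_EXT) / (w * E_INF)

for w in (2, 4, 8, 16, 32):
    print(f"EW-{w}: {norm_energy(w):.3f}")   # EW-2 -> ~0.500, EW-4 -> ~0.250, ...
```

Extrapolation is so cheap that compute energy falls almost exactly as 1/window; accuracy, not energy, is what bounds the usable window size.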

Slide 97

Slide 97 text

Conclusions 16

Slide 98

Slide 98 text

Conclusions 16 ▸ We must expand our focus from isolated accelerators to holistic SoC architecture.

Slide 99

Slide 99 text

Conclusions 16 ▸ We must expand our focus from isolated accelerators to holistic SoC architecture.

Slide 100

Slide 100 text

Conclusions 16 ▸ Euphrates co-designs the SoC with a motion-based synthesis algorithm. ▸ We must expand our focus from isolated accelerators to holistic SoC architecture.

Slide 101

Slide 101 text

Conclusions 16 ▸ Euphrates co-designs the SoC with a motion-based synthesis algorithm. ▸ We must expand our focus from isolated accelerators to holistic SoC architecture. ▸ 66% SoC energy savings with ~1% accuracy loss. More efficient than scaling-down CNNs.

Slide 102

Thank you!
Anand Samajdar (Georgia Tech), Paul Whatmough (ARM Research), Matt Mattina (ARM Research)