Slide 26
Slide 26 text
Challenges: Memory Accesses
▸ Irregular Access Pattern
▹Accesses are not sequential
▹Severely hurts the efficiency of hardware acceleration
▸ Large Footprint
▹An uncompressed 1080p frame is ~5.9 MB and a 4K frame is ~23.7 MB
▹Cannot be fully captured by a typical on-chip memory
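The footprint figures above can be reproduced with a short calculation. This is a minimal sketch that assumes one uncompressed frame at 3 bytes per pixel (for example 8-bit RGB) and reads "MB" as 2^20 bytes; the slides state neither assumption, and `frame_bytes` is a name chosen here for illustration.

```python
MIB = 1024 * 1024  # assumption: "MB" on the slide means 2**20 bytes


def frame_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size in bytes of one uncompressed frame (assumed 3 bytes per pixel)."""
    return width * height * bytes_per_pixel


resolutions = {"1080p": (1920, 1080), "4K": (3840, 2160)}
for name, (w, h) in resolutions.items():
    print(f"{name}: {frame_bytes(w, h) / MIB:.2f} MiB")
# prints:
# 1080p: 5.93 MiB
# 4K: 23.73 MiB
```

Under these assumptions the numbers match the slide, which suggests the footprint refers to a single raw frame.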
Slide 33
Slide 33 text
Our Design
▸ Enforce a streaming data access
▸ Reduce unnecessary computations
▹Perform boundary checking
▸ Fully pipeline pixel rendering
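One possible reading of the first two bullets is sketched below. The slides do not give the pixel mapping, so the code assumes an equirectangular input frame and a perspective viewport with a 90-degree field of view. `viewport_to_equirect`, `input_bounds`, and `stream_with_boundary_check` are hypothetical names, and the bounding-box test is a stand-in for whatever boundary check the actual design uses. The sketch first shows that walking output pixels (x', y') in order touches input pixels (x, y) at non-sequential addresses. It then shows the alternative of streaming the input in raster order and skipping pixels outside the region the viewport needs.

```python
import math


def viewport_to_equirect(xp, yp, out_w, out_h, in_w, in_h, fov_deg=90.0):
    """Map output pixel (x', y') to input pixel (x, y). Assumed model, not from the slides."""
    f = (out_w / 2) / math.tan(math.radians(fov_deg) / 2)  # focal length in pixels
    dx = xp + 0.5 - out_w / 2  # ray through the pixel centre
    dy = yp + 0.5 - out_h / 2
    lon = math.atan2(dx, f)                  # longitude in [-pi, pi]
    lat = math.atan2(dy, math.hypot(dx, f))  # latitude in [-pi/2, pi/2]
    x = math.floor((lon / (2 * math.pi) + 0.5) * in_w) % in_w
    y = min(in_h - 1, max(0, math.floor((lat / math.pi + 0.5) * in_h)))
    return x, y


def row_strides(yp, out_w, out_h, in_w, in_h):
    """Address differences between the input pixels read for one output row."""
    addrs = []
    for xp in range(out_w):
        x, y = viewport_to_equirect(xp, yp, out_w, out_h, in_w, in_h)
        addrs.append(y * in_w + x)  # row-major input address
    return [b - a for a, b in zip(addrs, addrs[1:])]


def input_bounds(out_w, out_h, in_w, in_h):
    """Bounding box of every input pixel the viewport reads."""
    pts = [viewport_to_equirect(xp, yp, out_w, out_h, in_w, in_h)
           for yp in range(out_h) for xp in range(out_w)]
    xs, ys = zip(*pts)
    return min(xs), max(xs), min(ys), max(ys)


def stream_with_boundary_check(out_w, out_h, in_w, in_h):
    """Stream the input in raster order; count pixels that pass the boundary check."""
    x0, x1, y0, y1 = input_bounds(out_w, out_h, in_w, in_h)
    processed = 0
    for y in range(in_h):      # sequential (streaming) access
        for x in range(in_w):
            if x0 <= x <= x1 and y0 <= y <= y1:  # boundary check
                processed += 1  # only these pixels would be rendered
    return processed, in_w * in_h


if __name__ == "__main__":
    print("strides for output row 0:", row_strides(0, 8, 8, 64, 32))
    done, total = stream_with_boundary_check(8, 8, 64, 32)
    print(f"processed {done} of {total} streamed input pixels")
```

In this toy setup the strides for a single output row are not all 1, which is the irregular pattern described earlier, while the streaming loop reads the input strictly in order and does work only inside the bounding box.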
Slide 41
Slide 41 text
Setup and Evaluation
[Figure: bar chart of energy savings (%), y-axis 0 to 60; x-axis: RC, Elephant, NYC, Rhino, Paris, Venice; series: savings over FPGA, savings over GPU]
▸ Xilinx Zynq UltraScale+ MPSoC ZCU104
▸ Pascal GPU on the Nvidia Jetson TX2
▸ Real User Trace Evaluation
▸ Baseline: Original algorithm implemented on GPU and FPGA
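The chart metric is presumably energy savings relative to a baseline. A minimal sketch of that calculation follows, assuming savings are defined as the relative reduction in energy. `savings_percent` is a hypothetical helper, and the joule values are made-up illustration inputs, not measurements from the evaluation.

```python
def savings_percent(baseline_joules: float, optimized_joules: float) -> float:
    """Energy saved relative to the baseline, in percent (assumed definition)."""
    if baseline_joules <= 0:
        raise ValueError("baseline energy must be positive")
    return 100.0 * (baseline_joules - optimized_joules) / baseline_joules


# Hypothetical numbers for illustration only.
print(f"{savings_percent(8.0, 6.0):.1f}%")  # prints 25.0%
```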
Slide 45
Slide 45 text
Conclusion
▸ Virtual reality popularity is growing rapidly
▸ 360-degree video rendering consumes excessive power
▸ Our co-design achieves average energy savings of 26.4% and 40.0% over the baselines