Energy-Efficient 360-Degree Video Rendering on FPGA via Algorithm-Architecture Co-Design

HorizonLab
February 24, 2020

FPGA 2020 presentation. Presented by Qiuyue Sun.

Transcript

  1. Energy-Efficient 360-Degree Video Rendering on FPGA via Algorithm-Architecture Co-Design

    Qiuyue Sun, Amir Taherin, Yawo Siatitse, Yuhao Zhu
  2. Virtual Reality

  10. 360-Degree Video Delivery Pipeline

    Original Frame → Rendering → Field of View (FOV)
    ▸ Rendering consumes over 4 W of power
    ▸ Exceeds the TDP of typical mobile devices
  15. Rendering: Current Rendering Algorithm

    ▸ Mapping, Perspective Update, Filtering
    ▸ Matrix Multiplication, Cartesian Coordinates, Linear Interpolation
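The rendering slide names three stages and three operations without spelling out how they connect. The sketch below is one plausible reading, not the authors' implementation: the perspective update is a matrix multiplication that rotates each viewing ray, the mapping converts the rotated Cartesian ray into source-frame coordinates (x', y'), and the filtering is a bilinear (linear in each axis) interpolation. It assumes an equirectangular source frame, a single-channel image stored as a list of rows, and a pinhole FOV camera; none of these are stated in the transcript, and the function names are hypothetical.

```python
import math


def rotation_matrix(yaw, pitch):
    """3x3 head-orientation rotation (radians): yaw about y times pitch about x."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    return [
        [cy, sy * sp, sy * cp],
        [0.0, cp, -sp],
        [-sy, cy * sp, cy * cp],
    ]


def bilinear(frame, sx, sy):
    """Filtering: interpolate between the four source pixels around (sx, sy)."""
    src_h, src_w = len(frame), len(frame[0])
    x0, y0 = math.floor(sx), math.floor(sy)
    fx, fy = sx - x0, sy - y0

    def px(xi, yi):
        # Wrap horizontally (360 degrees), clamp vertically (poles).
        return frame[min(max(yi, 0), src_h - 1)][xi % src_w]

    top = px(x0, y0) * (1 - fx) + px(x0 + 1, y0) * fx
    bot = px(x0, y0 + 1) * (1 - fx) + px(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bot * fy


def render_fov(frame, out_w, out_h, fov_deg, yaw, pitch):
    """Render an out_w x out_h FOV image by looping over output pixels (x, y)."""
    src_h, src_w = len(frame), len(frame[0])
    rot = rotation_matrix(yaw, pitch)
    focal = (out_w / 2) / math.tan(math.radians(fov_deg) / 2)
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # Cartesian ray through the centre of output pixel (x, y).
            ray = (x - out_w / 2 + 0.5, y - out_h / 2 + 0.5, focal)
            # Perspective update: rotate the ray (matrix multiplication).
            vx, vy, vz = (
                sum(rot[r][c] * ray[c] for c in range(3)) for r in range(3)
            )
            # Mapping: Cartesian -> longitude/latitude -> source (x', y').
            lon = math.atan2(vx, vz)
            lat = math.atan2(vy, math.hypot(vx, vz))
            sx = (lon / (2 * math.pi) + 0.5) * src_w - 0.5
            sy = (lat / math.pi + 0.5) * src_h - 0.5
            # Filtering: the source location depends on the ray, so
            # consecutive output pixels read scattered source addresses.
            out[y][x] = bilinear(frame, sx, sy)
    return out
```

Because the loop runs over output pixels, the source coordinates it reads are data-dependent rather than sequential, which is the access pattern the following slides identify as the problem.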
  18. Current Implementation

    Diagram labels: (x, y), (x’, y’), Field of View (FOV), Original Frame
  26. Challenges: Memory Accesses

    ▸ Irregular Access Pattern
      ▹ Accesses are not sequential
      ▹ Severely hurts the efficiency of hardware acceleration
    ▸ Large Footprint
      ▹ 1080P is ~5.9 MB and 4K is ~23.7 MB
      ▹ Cannot be fully captured by a typical on-chip memory
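The footprint figures can be reproduced with simple arithmetic. The sketch below assumes 3 bytes per pixel (24-bit colour), frame sizes of 1920x1080 and 3840x2160, and that "MB" on the slide means 2^20 bytes; the transcript states none of these, but together they match the quoted numbers.

```python
def frame_footprint_mib(width, height, bytes_per_pixel=3):
    """Size of one uncompressed frame in MiB (2**20 bytes)."""
    return width * height * bytes_per_pixel / (1024 * 1024)


for name, (w, h) in {"1080P": (1920, 1080), "4K": (3840, 2160)}.items():
    print(f"{name}: {frame_footprint_mib(w, h):.1f} MiB")
# 1080P: 5.9 MiB
# 4K: 23.7 MiB
```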
  33. Our Design

    Diagram labels: (x’, y’), (x, y)
    ▸ Enforce a streaming data access
    ▸ Reduce unnecessary computations
      ▹ Perform boundary checking
    ▸ Fully pipeline pixel rendering
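The transcript does not describe how the design enforces streaming access. One possible reading, sketched below as an illustration rather than as the authors' method, is to visit source pixels (x', y') in raster order, project each one into the FOV, and drop it early if it falls outside the FOV bounds. The sketch assumes an equirectangular source frame, a pinhole FOV camera, nearest-pixel writes, and a caller-supplied 3x3 rotation `rot`; all names are hypothetical. Hardware pipelining is not modelled.

```python
import math


def stream_render(frame, out_w, out_h, fov_deg, rot):
    """Scatter source pixels into the FOV image while reading them in order.

    rot is a 3x3 camera-to-world rotation; its transpose maps a world ray
    back into camera space. Returns (output image, skipped pixel count).
    """
    src_h, src_w = len(frame), len(frame[0])
    focal = (out_w / 2) / math.tan(math.radians(fov_deg) / 2)
    out = [[None] * out_w for _ in range(out_h)]
    skipped = 0
    for sy in range(src_h):  # sequential, row-by-row source reads
        lat = ((sy + 0.5) / src_h - 0.5) * math.pi
        for sx in range(src_w):
            lon = ((sx + 0.5) / src_w - 0.5) * 2 * math.pi
            # World-space ray for source pixel (x', y').
            w = (
                math.cos(lat) * math.sin(lon),
                math.sin(lat),
                math.cos(lat) * math.cos(lon),
            )
            # Inverse rotation (transpose) into camera space.
            cx, cy, cz = (
                sum(rot[r][c] * w[r] for r in range(3)) for c in range(3)
            )
            # Boundary check 1: the pixel is behind the viewer.
            if cz <= 0:
                skipped += 1
                continue
            x = math.floor(cx / cz * focal + out_w / 2)
            y = math.floor(cy / cz * focal + out_h / 2)
            # Boundary check 2: the pixel projects outside the FOV image.
            if not (0 <= x < out_w and 0 <= y < out_h):
                skipped += 1
                continue
            out[y][x] = frame[sy][sx]
    return out, skipped
```

A scatter like this can leave output pixels unset (`None`) when the source is sparse relative to the FOV image, so a real design would need a filtering step that this sketch omits.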
  41. Setup and Evaluation

    ▸ Xilinx Zynq UltraScale+ MPSoC ZCU104
    ▸ Pascal GPU on the Nvidia Jetson TX2
    ▸ Real User Trace Evaluation
    ▸ Baseline: Original algorithm implemented on GPU and FPGA
    [Chart: Energy Savings (%), axis 0 to 60; videos: RC, Elephant, NYC, Rhino, Paris, Venice; series: Savings over FPGA, Savings over GPU]
  45. Conclusion

    ▸ Virtual reality popularity is growing rapidly
    ▸ 360-degree video rendering consumes excessive power
    ▸ Our co-design achieves on average 26.4% and 40.0% energy savings over baselines