Energy-Efficient 360-Degree Video Rendering on FPGA via Algorithm-Architecture Co-Design

HorizonLab
February 24, 2020

FPGA 2020 presentation. Presented by Qiuyue Sun.

Transcript

  1. Energy-Efficient 360-Degree Video Rendering on FPGA via
    Algorithm-Architecture Co-Design
    Qiuyue Sun
    Amir Taherin
    Yawo Siatitse
    Yuhao Zhu

  2. Virtual Reality

  10. 360-Degree Video Delivery Pipeline
    Original Frame
    Rendering
    Field of View (FOV)
    Rendering consumes over 4 W of power, exceeding the TDP of typical mobile devices

  15. Rendering
    Current Rendering Algorithm
    Perspective Update: Matrix Multiplication
    Mapping: Cartesian Coordinates
    Filtering: Linear Interpolation
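
The three stages named on the slide above (perspective update, mapping, filtering) can be sketched in software as follows. This is a minimal illustration, not the authors' implementation. It assumes an equirectangular input frame, a pinhole camera model for the FOV, a yaw-then-pitch rotation convention, and a grayscale frame stored as a list of rows; none of these details are stated on the slides, and all function names are hypothetical.

```python
import math

def rotate(v, yaw, pitch):
    """Perspective update: rotate a ray by the head orientation (matrix multiply)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    # Assumed convention: R = R_yaw (about y) * R_pitch (about x), written out.
    r = [[cy, sy * sp, sy * cp],
         [0.0, cp, -sp],
         [-sy, cy * sp, cy * cp]]
    return [sum(r[i][j] * v[j] for j in range(3)) for i in range(3)]

def to_frame_coords(v, in_w, in_h):
    """Mapping: Cartesian ray -> (longitude, latitude) -> input pixel coordinates."""
    x, y, z = v
    lon = math.atan2(x, z)                  # in [-pi, pi]
    lat = math.atan2(y, math.hypot(x, z))   # in [-pi/2, pi/2]
    u = (lon / (2 * math.pi) + 0.5) * in_w - 0.5
    w = (lat / math.pi + 0.5) * in_h - 0.5
    return u, w

def bilinear(frame, u, w):
    """Filtering: linear interpolation between the four nearest input pixels."""
    in_h, in_w = len(frame), len(frame[0])
    u0, w0 = math.floor(u), math.floor(w)
    du, dw = u - u0, w - w0
    c0, c1 = u0 % in_w, (u0 + 1) % in_w     # wrap around horizontally
    r0 = min(max(w0, 0), in_h - 1)          # clamp at the poles
    r1 = min(max(w0 + 1, 0), in_h - 1)
    top = frame[r0][c0] * (1 - du) + frame[r0][c1] * du
    bot = frame[r1][c0] * (1 - du) + frame[r1][c1] * du
    return top * (1 - dw) + bot * dw

def render_fov(frame, yaw, pitch, fov_deg=90.0, out_w=5, out_h=5):
    """Render the FOV by visiting every output pixel and sampling the input frame."""
    in_h, in_w = len(frame), len(frame[0])
    f = 0.5 * out_w / math.tan(math.radians(fov_deg) / 2)  # focal length in pixels
    out = []
    for j in range(out_h):
        row = []
        for i in range(out_w):
            ray = [i - (out_w - 1) / 2, j - (out_h - 1) / 2, f]
            u, w = to_frame_coords(rotate(ray, yaw, pitch), in_w, in_h)
            row.append(bilinear(frame, u, w))
        out.append(row)
    return out
```

Because the loop runs over output pixels, the input locations it reads depend on the head orientation, so consecutive reads can land far apart in the input frame.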

  18. Current Implementation
    (x, y)
    (x’, y’)
    Field of View (FOV)
    Original Frame

  26. Challenges: Memory Accesses
    ▸ Irregular Access Pattern
    ▹ Accesses are not sequential
    ▹ Severely hurts the efficiency of hardware acceleration
    ▸ Large Footprint
    ▹ A 1080p frame is ~5.9 MB and a 4K frame is ~23.7 MB
    ▹ Cannot fit entirely in a typical on-chip memory
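
The footprint figures above can be reproduced with a short calculation. The sketch assumes 1920x1080 and 3840x2160 frames at 3 bytes per pixel (24-bit color) and binary megabytes (1 MB = 1024 x 1024 bytes); the slides do not state these parameters, but they yield the quoted numbers.

```python
def frame_bytes(width, height, bytes_per_pixel=3):
    """Size of one uncompressed frame in bytes (assumed 24-bit color)."""
    return width * height * bytes_per_pixel

MIB = 1024 * 1024  # assumed unit: binary megabytes

fhd = frame_bytes(1920, 1080) / MIB   # assumed 1080p resolution
uhd = frame_bytes(3840, 2160) / MIB   # assumed 4K resolution
print(f"1080p: {fhd:.1f} MiB, 4K: {uhd:.1f} MiB")
# prints: 1080p: 5.9 MiB, 4K: 23.7 MiB
```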

  33. Our Design
    ▸ Enforce streaming data access
    ▸ Reduce unnecessary computations
    ▹ Perform boundary checking
    ▸ Fully pipeline pixel rendering
    (x’, y’)
    (x, y)
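
One way to read the bullets above is as a loop that visits input pixels in raster order and writes each one to the FOV only if it falls inside it. The sketch below illustrates that access pattern under stated assumptions. It is not the authors' design: the slides do not spell out the algorithm, so the forward mapping from input pixel to FOV pixel, the equirectangular layout, the pinhole FOV model, the nearest-pixel write, and the name render_streaming are all hypothetical.

```python
import math

def render_streaming(frame, yaw, pitch, fov_deg=90.0, out_w=5, out_h=5):
    """Scatter input pixels into the FOV while reading the frame in raster order."""
    in_h, in_w = len(frame), len(frame[0])
    f = 0.5 * out_w / math.tan(math.radians(fov_deg) / 2)  # focal length in pixels
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    out = [[None] * out_w for _ in range(out_h)]  # None marks pixels never written
    kept = 0
    for r in range(in_h):                 # streaming: rows in order ...
        lat = ((r + 0.5) / in_h - 0.5) * math.pi
        for c in range(in_w):             # ... and columns in order
            lon = ((c + 0.5) / in_w - 0.5) * 2 * math.pi
            # Input pixel -> direction on the unit sphere.
            x = math.cos(lat) * math.sin(lon)
            y = math.sin(lat)
            z = math.cos(lat) * math.cos(lon)
            # Undo the head rotation: transpose of the assumed R_yaw * R_pitch.
            xv = cy * x - sy * z
            yv = sy * sp * x + cp * y + cy * sp * z
            zv = sy * cp * x - sp * y + cy * cp * z
            # Boundary check 1: skip pixels behind the viewer.
            if zv <= 0.0:
                continue
            i = math.floor(f * xv / zv + (out_w - 1) / 2 + 0.5)
            j = math.floor(f * yv / zv + (out_h - 1) / 2 + 0.5)
            # Boundary check 2: skip pixels that project outside the FOV.
            if 0 <= i < out_w and 0 <= j < out_h:
                out[j][i] = frame[r][c]
                kept += 1
    return out, kept
```

Each iteration reads exactly one input pixel, in order, and no iteration reads a value produced by another, which is the kind of loop structure that lends itself to a hardware pipeline. A nearest-pixel scatter like this can leave unwritten output pixels when the input resolution is low relative to the FOV; handling that is outside the scope of this sketch.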

  41. Setup and Evaluation
    ▸ Xilinx Zynq UltraScale+ MPSoC ZCU104
    ▸ Pascal GPU on the Nvidia Jetson TX2
    ▸ Real User Trace Evaluation
    ▸ Baseline: Original algorithm implemented on GPU and FPGA
    [Figure: bar chart of Energy Savings (%), y-axis 0 to 60; x-axis categories RC, Elephant, NYC, Rhino, Paris, Venice; series: Savings over FPGA, Savings over GPU]

  45. Conclusion
    ▸ Virtual reality is rapidly growing in popularity
    ▸ 360-degree video rendering consumes excessive power
    ▸ Our co-design achieves on average 26.4% and 40.0% energy savings over the baselines