
Energy-Efficient 360-Degree Video Rendering on FPGA via Algorithm-Architecture Co-Design

HorizonLab
February 24, 2020

FPGA 2020 presentation. Presented by Qiuyue Sun.

Transcript

  1. Energy-Efficient 360-Degree Video Rendering on FPGA via Algorithm-Architecture Co-Design
    Qiuyue Sun
    Amir Taherin
    Yawo Siatitse
    Yuhao Zhu

  2. Virtual Reality

  3. 360-Degree Video Delivery Pipeline

  4. 360-Degree Video Delivery Pipeline
    Original Frame

  5. 360-Degree Video Delivery Pipeline
    Rendering
    Original Frame

  7. 360-Degree Video Delivery Pipeline
    Rendering
    Field of View (FOV)
    Original Frame

  8. 360-Degree Video Delivery Pipeline
    Rendering
    Field of View (FOV)
    Consumes over 4 W of power
    Exceeds the TDP of typical mobile devices
    Original Frame

  9. Rendering
    Current Rendering Algorithm

  10. Rendering
    Current Rendering Algorithm
    Mapping
    Perspective Update
    Filtering

  11. Rendering
    Current Rendering Algorithm
    Mapping
    Perspective Update
    Filtering
    Matrix Multiplication

  12. Rendering
    Current Rendering Algorithm
    Mapping
    Perspective Update
    Filtering
    Matrix Multiplication
    Cartesian Coordinates

  13. Rendering
    Current Rendering Algorithm
    Mapping
    Perspective Update
    Filtering
    Matrix Multiplication
    Cartesian Coordinates
    Linear Interpolation
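The three stages above can be sketched for one output pixel as follows. This is a minimal Python sketch, not the authors' implementation. It assumes an equirectangular source frame stored as a list of rows of grayscale values, a pinhole FOV camera with a hypothetical `focal` length in pixels, and a head orientation given as a 3x3 rotation matrix. `yaw_matrix`, `render_pixel`, and `render_fov` are illustrative names. The transcript does not show exactly how work is split between stages, so the order used here (rotate with a matrix multiplication, map the Cartesian direction to source coordinates, then filter with bilinear, i.e. 2-D linear, interpolation) is an assumption.

```python
import math


def yaw_matrix(yaw):
    """3x3 rotation about the vertical (y) axis; a hypothetical stand-in for head orientation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def render_pixel(frame, x, y, fov_w, fov_h, focal, rot):
    """Output-driven rendering of FOV pixel (x, y) from an equirectangular frame."""
    src_h, src_w = len(frame), len(frame[0])
    # Ray through the centre of pixel (x, y) for a pinhole camera looking down +z.
    ray = (x - fov_w / 2 + 0.5, y - fov_h / 2 + 0.5, focal)
    # Perspective update: rotate the ray by the head orientation (3x3 matrix multiplication).
    d = [sum(rot[i][j] * ray[j] for j in range(3)) for i in range(3)]
    # Mapping: Cartesian direction -> longitude/latitude -> source coordinate (x', y').
    lon = math.atan2(d[0], d[2])                    # [-pi, pi]
    lat = math.atan2(d[1], math.hypot(d[0], d[2]))  # [-pi/2, pi/2]
    xs = (lon / (2 * math.pi) + 0.5) * src_w - 0.5
    ys = (lat / math.pi + 0.5) * src_h - 0.5
    # Filtering: bilinear interpolation between the four nearest source pixels.
    x0, y0 = math.floor(xs), math.floor(ys)
    fx, fy = xs - x0, ys - y0

    def src(u, v):
        # Wrap horizontally (longitude is periodic), clamp vertically at the poles.
        return frame[min(max(v, 0), src_h - 1)][u % src_w]

    top = (1 - fx) * src(x0, y0) + fx * src(x0 + 1, y0)
    bottom = (1 - fx) * src(x0, y0 + 1) + fx * src(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom


def render_fov(frame, fov_w, fov_h, focal, rot):
    """Visit every FOV pixel in raster order; each one gathers from its own (x', y')."""
    return [[render_pixel(frame, x, y, fov_w, fov_h, focal, rot)
             for x in range(fov_w)]
            for y in range(fov_h)]
```

Because every output pixel computes its own source coordinate, consecutive output pixels can read rows of the source frame that are far apart. That gather pattern is what the memory-access slides call irregular.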

  14. Current Implementation
    Field of View (FOV)
    Original Frame

  15. Current Implementation
    (x, y)
    (x’, y’)
    Field of View (FOV)
    Original Frame

  17. Challenges: Memory Accesses

  18. Challenges: Memory Accesses
    ▸ Irregular Access Pattern

  19. Challenges: Memory Accesses
    ▸ Irregular Access Pattern
    ▹Accesses are not sequential

  21. Challenges: Memory Accesses
    ▸ Irregular Access Pattern
    ▹Accesses are not sequential
    ▹Severely hurts the efficiency of hardware acceleration

  22. Challenges: Memory Accesses
    ▸ Irregular Access Pattern
    ▹Accesses are not sequential
    ▹Severely hurts the efficiency of hardware acceleration
    ▸ Large Footprint

  23. Challenges: Memory Accesses
    ▸ Irregular Access Pattern
    ▹Accesses are not sequential
    ▹Severely hurts the efficiency of hardware acceleration
    ▸ Large Footprint
    ▹A 1080p frame is ~5.9 MB and a 4K frame is ~23.7 MB

  24. Challenges: Memory Accesses
    ▸ Irregular Access Pattern
    ▹Accesses are not sequential
    ▹Severely hurts the efficiency of hardware acceleration
    ▸ Large Footprint
    ▹A 1080p frame is ~5.9 MB and a 4K frame is ~23.7 MB
    ▹Cannot fit entirely in typical on-chip memory
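The footprint figures above are consistent with uncompressed frames at 3 bytes per pixel (packed 24-bit RGB) measured in mebibytes (2^20 bytes). Decimal megabytes would give about 6.2 and 24.9 instead. The pixel format is an assumption, since the slides do not state it. A quick check:

```python
# Assumption: uncompressed, packed 24-bit RGB (3 bytes/pixel); sizes reported in MiB.
BYTES_PER_PIXEL = 3
MIB = 1024 * 1024


def frame_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size in bytes of one uncompressed frame."""
    return width * height * bytes_per_pixel


for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
    size = frame_bytes(w, h)
    print(f"{name}: {size} bytes = {size / MIB:.1f} MiB")
# 1080p: 6220800 bytes = 5.9 MiB
# 4K: 24883200 bytes = 23.7 MiB
```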

  25. Our Design
    (x’, y’)
    (x, y)

  26. Our Design
    ▸ Enforce a streaming data access
    (x’, y’)
    (x, y)

  29. Our Design
    ▸ Enforce a streaming data access
    ▸ Reduce unnecessary computations
    (x’, y’)
    (x, y)

  30. Our Design
    ▸ Enforce a streaming data access
    ▸ Reduce unnecessary computations
    ▹Perform boundary checking

  31. Our Design
    ▸ Enforce a streaming data access
    ▸ Reduce unnecessary computations
    ▹Perform boundary checking
    ▸ Fully pipeline pixel rendering
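One way to read these design points is as an input-driven loop. It streams the original frame in raster order, maps each source pixel (x', y') back to the FOV, and discards it early when it cannot be visible. The Python sketch below illustrates only that idea and is not the authors' design. It assumes an equirectangular source frame, a pinhole FOV camera with a hypothetical `focal` length in pixels, and a 3x3 head-rotation matrix `rot`. `project_to_fov` and `stream_frame` are illustrative names. Each visible source pixel is written to its nearest FOV pixel. This is a deliberate simplification that can leave holes, since the deck does not say how the co-design filters. The fully pipelined hardware is not modeled here.

```python
import math


def project_to_fov(xs, ys, src_w, src_h, fov_w, fov_h, focal, rot):
    """Map source pixel (x', y') to FOV coordinates (x, y), or None if it is not visible."""
    # Source pixel centre -> longitude/latitude (equirectangular layout).
    lon = ((xs + 0.5) / src_w - 0.5) * 2 * math.pi
    lat = ((ys + 0.5) / src_h - 0.5) * math.pi
    # Spherical -> Cartesian unit direction in world space.
    d = (math.cos(lat) * math.sin(lon), math.sin(lat), math.cos(lat) * math.cos(lon))
    # Undo the head rotation: the inverse of a rotation matrix is its transpose.
    v = [sum(rot[j][i] * d[j] for j in range(3)) for i in range(3)]
    # Boundary check 1: directions behind the viewer can never be in the FOV.
    if v[2] <= 0:
        return None
    # Pinhole projection onto the FOV image plane.
    x = v[0] / v[2] * focal + fov_w / 2 - 0.5
    y = v[1] / v[2] * focal + fov_h / 2 - 0.5
    # Boundary check 2: discard points that land outside the FOV rectangle.
    if not (0 <= x <= fov_w - 1 and 0 <= y <= fov_h - 1):
        return None
    return x, y


def stream_frame(frame, fov_w, fov_h, focal, rot):
    """Read the source frame strictly in raster order and scatter visible pixels."""
    src_h, src_w = len(frame), len(frame[0])
    out = [[None] * fov_w for _ in range(fov_h)]
    for ys in range(src_h):          # sequential rows ...
        for xs in range(src_w):      # ... and sequential columns: a streaming read
            hit = project_to_fov(xs, ys, src_w, src_h, fov_w, fov_h, focal, rot)
            if hit is None:
                continue             # out-of-view pixel: no further computation
            x, y = hit
            # Nearest-neighbour splat: a simplification that can leave holes.
            out[round(y)][round(x)] = frame[ys][xs]
    return out
```

Every read of `frame` happens in raster order, which is what makes the access pattern streaming. The two `None` returns act as the boundary checks that skip work for pixels outside the view.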

  32. Setup and Evaluation

  33. Setup and Evaluation
    ▸ Xilinx Zynq UltraScale+ MPSoC ZCU104

  34. Setup and Evaluation
    ▸ Xilinx Zynq UltraScale+ MPSoC ZCU104
    ▸ Pascal GPU on the Nvidia Jetson TX2

  35. Setup and Evaluation
    ▸ Xilinx Zynq UltraScale+ MPSoC ZCU104
    ▸ Pascal GPU on the Nvidia Jetson TX2
    ▸ Real User Trace Evaluation

  36. Setup and Evaluation
    ▸ Xilinx Zynq UltraScale+ MPSoC ZCU104
    ▸ Pascal GPU on the Nvidia Jetson TX2
    ▸ Real User Trace Evaluation
    ▸ Baselines: the original algorithm implemented on the GPU and on the FPGA

  37. Setup and Evaluation
    [Figure: energy savings (%) over the FPGA and GPU baselines for six videos: RC, Elephant, NYC, Rhino, Paris, Venice]
    ▸ Xilinx Zynq UltraScale+ MPSoC ZCU104
    ▸ Pascal GPU on the Nvidia Jetson TX2
    ▸ Real User Trace Evaluation
    ▸ Baselines: the original algorithm implemented on the GPU and on the FPGA

  40. Conclusion
    ▸ Virtual reality popularity is growing rapidly

  41. Conclusion
    ▸ Virtual reality popularity is growing rapidly
    ▸ 360-degree video rendering consumes excessive power

  42. Conclusion
    ▸ Virtual reality popularity is growing rapidly
    ▸ 360-degree video rendering consumes excessive power
    ▸ Our co-design achieves average energy savings of 26.4% and 40.0% over the baselines
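To read the savings figures as energy ratios, assume energy savings are defined as $s = 1 - E_{\text{co-design}}/E_{\text{baseline}}$. The slides do not state the definition, so this is an assumption. Under it:

```latex
\frac{E_{\text{co-design}}}{E_{\text{baseline}}} = 1 - s, \qquad
s = 0.264 \;\Rightarrow\; 0.736, \qquad
s = 0.400 \;\Rightarrow\; 0.600
```

Equivalently, the baselines would consume about $1/0.736 \approx 1.36\times$ and $1/0.600 \approx 1.67\times$ the energy of the co-design.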
