An FPGA-Based Fully Pipelined Bilateral Grid for Real-Time Image Denoising (FPL 2021)

Nobuho Hashimoto

August 15, 2021

Transcript

  1. An FPGA-Based Fully Pipelined Bilateral Grid
    for Real-Time Image Denoising
    Nobuho Hashimoto, Shinya Takamaeda-Yamazaki
    The University of Tokyo
    FPL 2021 (Sep. 2nd, 2021)
    Session 3A: Application Acceleration

  2. Outline
    ❖Bilateral Filter (BF)
    ❖Our Approach
    ➢ Algorithm-Level Contribution
    ● Enhanced Bilateral Grid (BG) for BF
    ➢ Hardware-Level Contribution
    ● Fully Pipelined Design
    ● Memory Access Optimization
    ❖Experiment on Actual FPGA
    ➢ Qualitative Evaluation
    ➢ Quantitative Evaluation
    ❖Conclusion

  3. What is BF?
    ❖Edge-preserving smoother
    ❖Wide variety of applications
    ➢ Denoising
    ➢ Tone mapping
    ➢ Stylization
    ➢ Upsampling
    ➢ Optical-flow estimation
    [Figure: horse image before and after filtering]

  4. Definition of BF
    ❖Pixels close to the pixel of interest in both space and range (intensity) receive larger weights
    ➜Edge-preserving characteristics
    [Figure: weights around the pixel of interest, formed as the pixel-wise product of the space and range kernels]
    C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” ICCV, 1998
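
The weighting described on this slide can be sketched as a brute-force bilateral filter. This is a minimal NumPy illustration of the standard Tomasi and Manduchi formulation, not the authors' implementation; the function name, the Gaussian kernels, the edge padding, and the parameters `sigma_s` and `sigma_r` are assumptions made for the example.

```python
import numpy as np

def bilateral_filter(img, r, sigma_s, sigma_r):
    """Brute-force bilateral filter on a 2D image (hypothetical helper).

    r       : window radius in pixels
    sigma_s : assumed standard deviation of the space kernel
    sigma_r : assumed standard deviation of the range (intensity) kernel
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    # Space kernel over the (2r+1) x (2r+1) window, computed once.
    ax = np.arange(-r, r + 1)
    dy, dx = np.meshgrid(ax, ax, indexing="ij")
    w_space = np.exp(-(dx**2 + dy**2) / (2.0 * sigma_s**2))
    padded = np.pad(img, r, mode="edge")  # assumed border handling
    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 2 * r + 1, x:x + 2 * r + 1]
            # Range kernel: small when a neighbor's intensity differs a lot.
            w_range = np.exp(-(window - img[y, x])**2 / (2.0 * sigma_r**2))
            weights = w_space * w_range  # pixel-wise product of both kernels
            out[y, x] = np.sum(weights * window) / np.sum(weights)
    return out
```

Because the range kernel suppresses neighbors on the other side of an intensity edge, a sharp step in the input stays sharp in the output, while flat regions are averaged.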

  5. Computational Complexity
    ❖Computation grows with the window radius r
    ➢ O(r^2) per pixel
    ❖Real-time processing of large-scale, high-resolution images is difficult
    ➢ Both the number of pixels and the window radius are large
    [Figure: high-resolution vs. low-resolution image]

  6. Our Approach
    ❖Goal
    ➢ Real-time processing of large-scale, high-resolution images on small-scale hardware
    ❖Method
    ➢ Suppression of resource usage when the window radius is large
    ➜ Enhanced BG
    ➢ High throughput and low latency
    ➜ II = 1 pipeline and sequential processing
    [Figure: pipeline timing diagram showing one pixel, the next pixel, the clock, and the II (Initiation Interval) between them]
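
To make the role of the initiation interval concrete, the sketch below uses the textbook cycle count for a pipeline: the first item needs the full pipeline depth, and each further item adds II cycles. The function name and the example depth are hypothetical; the slides do not state the depth of the actual design.

```python
def pipeline_cycles(n_items, depth, ii=1):
    """Cycles needed to push n_items through a pipeline.

    depth : pipeline latency in cycles (hypothetical value in the example)
    ii    : initiation interval, i.e. cycles between consecutive inputs
    """
    return depth + (n_items - 1) * ii

# With II = 1 the cost per extra pixel is one cycle; with II = 2 it doubles.
n = 1_000_000
print(pipeline_cycles(n, depth=50, ii=1))
print(pipeline_cycles(n, depth=50, ii=2))
```

For large inputs the depth term becomes negligible, so throughput is governed almost entirely by II, which is why an II of 1 is the target.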

  7. Bilateral Grid
    1. Grid Creation
    ➢ Store the input on a “grid” by discretizing it in the space and range directions
    2. Gaussian Filter
    ➢ Blur the grid using only a spatial kernel
    3. Trilinear Interpolation
    ➢ Calculate the output
    J. Chen, S. Paris, and F. Durand, “Real-time edge-aware image processing with the bilateral grid,” ACM Trans. Graph., 2007

  8. Enhanced BG
    ❖In the original BG, the window radius on the grid is variable
    ➢ Resources grow faster with the radius on the 3D grid than with the radius on the 2D input
    ➜Fix the radius on the grid at 1 and vary the radius on the input instead
    ➢ The radius on the input does not greatly affect resource usage

                   Radius on input   Radius on grid
    Original BG    Not considered    Variable
    Enhanced BG    Variable          1 (fixed)
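
The difference between growing a window on the 3D grid and on the 2D input can be illustrated by counting window elements. This is only a rough proxy under the assumption that resource usage scales with the number of elements in a window; the slides do not give such a model, and the function names are hypothetical.

```python
def taps_2d(radius):
    """Elements in a square window of the given radius on the 2D input."""
    return (2 * radius + 1) ** 2

def taps_3d(radius):
    """Elements in a cubic window of the given radius on the 3D grid."""
    return (2 * radius + 1) ** 3

# Radii taken from the evaluation slide; radius 1 is the fixed grid radius.
for r in (4, 8, 12, 16):
    print(r, taps_2d(r), taps_3d(r), taps_3d(1))
```

A cubic window grows with the third power of the radius, whereas a grid window fixed at radius 1 always has 27 elements regardless of the radius chosen on the input.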

  9. Proposed Algorithm
    1. Grid Creation: Project the input image onto the grid
    ➢ Per image pixel
    2. Gaussian Filter: Blur the grid using a Gaussian filter
    ➢ Per grid element
    3. Trilinear Interpolation: Interpolate values using the input image
    ➢ Per image pixel
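
The three steps above can be sketched in NumPy as follows. This is a software illustration of a generic bilateral grid with a radius-1 blur, not the authors' HLS design. It assumes intensities in [0, 1], a [1, 2, 1] / 4 kernel as the radius-1 Gaussian, and hypothetical parameters `s_space` (pixels per grid cell) and `s_range` (intensity per grid cell); how the input-side radius maps to these steps is not stated on the slides.

```python
import numpy as np

def blur_radius1(grid):
    """Separable [1, 2, 1] / 4 blur along every grid axis (window radius 1)."""
    for axis in range(grid.ndim):
        g = np.moveaxis(grid, axis, 0)
        p = np.pad(g, [(1, 1)] + [(0, 0)] * (g.ndim - 1))  # zero border
        g = 0.25 * p[:-2] + 0.5 * p[1:-1] + 0.25 * p[2:]
        grid = np.moveaxis(g, 0, axis)
    return grid

def trilinear(grid, fy, fx, fz):
    """Sample the grid at fractional coordinates with trilinear weights."""
    y0 = np.floor(fy).astype(int)
    x0 = np.floor(fx).astype(int)
    z0 = np.floor(fz).astype(int)
    ty, tx, tz = fy - y0, fx - x0, fz - z0
    out = np.zeros_like(fy, dtype=np.float64)
    for dy, wy in ((0, 1 - ty), (1, ty)):
        for dx, wx in ((0, 1 - tx), (1, tx)):
            for dz, wz in ((0, 1 - tz), (1, tz)):
                out += wy * wx * wz * grid[y0 + dy, x0 + dx, z0 + dz]
    return out

def bilateral_grid_filter(img, s_space, s_range):
    img = np.clip(np.asarray(img, dtype=np.float64), 0.0, 1.0)
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    fy, fx, fz = ys / s_space, xs / s_space, img / s_range
    shape = (int(fy.max()) + 2, int(fx.max()) + 2, int(1.0 / s_range) + 2)

    # 1. Grid creation (per image pixel): accumulate the intensity sum and
    #    the pixel count in the nearest grid cell.
    iy, ix, iz = (np.rint(f).astype(int) for f in (fy, fx, fz))
    g_val = np.zeros(shape)
    g_cnt = np.zeros(shape)
    np.add.at(g_val, (iy, ix, iz), img)
    np.add.at(g_cnt, (iy, ix, iz), 1.0)

    # 2. Gaussian filter (per grid element), radius fixed at 1.
    g_val, g_cnt = blur_radius1(g_val), blur_radius1(g_cnt)

    # 3. Trilinear interpolation (per image pixel), then normalization.
    return trilinear(g_val, fy, fx, fz) / trilinear(g_cnt, fy, fx, fz)
```

Pixels on opposite sides of a strong edge land in grid cells that are far apart along the range axis, so the radius-1 blur does not mix them and the edge survives.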

  10. Fully Pipelined Design
    ❖Macro Pipeline
    ➢ Pipeline between colored areas
    ❖Micro Pipeline
    ➢ Pipeline within colored areas

  11. Memory Access Optimization
    ❖Grid Creation performs a read-modify-write operation
    ➢ The blue area is projected onto the same grid element
    ➢ II = 1 is basically impossible because consecutive updates depend on each other
    ❖The r sequential accesses in the y direction (red area) are exploited
    ❖1.5 to 2 times faster
    [Figure: input image and the accesses to each grid element]
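
The read-modify-write problem can be illustrated in software by comparing one memory update per input with updates that are merged while consecutive inputs target the same element. This is a generic sketch of that idea and should not be read as the authors' circuit; the slides only state that the r sequential accesses in the y direction are exploited. All names are hypothetical.

```python
def naive_updates(addresses, values, mem_size):
    """One read-modify-write on memory for every input."""
    mem = [0.0] * mem_size
    rmw_ops = 0
    for addr, val in zip(addresses, values):
        mem[addr] = mem[addr] + val  # next input may need this result at once
        rmw_ops += 1
    return mem, rmw_ops

def coalesced_updates(addresses, values, mem_size):
    """Accumulate runs of equal addresses in a register, write once per run."""
    mem = [0.0] * mem_size
    rmw_ops = 0
    cur_addr, acc = None, 0.0
    for addr, val in zip(addresses, values):
        if addr == cur_addr:
            acc += val  # stays in the local accumulator, no memory access
        else:
            if cur_addr is not None:
                mem[cur_addr] += acc  # flush the finished run
                rmw_ops += 1
            cur_addr, acc = addr, val
    if cur_addr is not None:
        mem[cur_addr] += acc  # flush the last run
        rmw_ops += 1
    return mem, rmw_ops

addrs = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
vals = [1.0] * len(addrs)
print(naive_updates(addrs, vals, 2))
print(coalesced_updates(addrs, vals, 2))
```

Both versions produce the same memory contents, but the merged version touches memory once per run of equal addresses instead of once per input, which removes the back-to-back dependency on the same element.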

  12. Experiment
    ❖Implementation on a ZCU104 FPGA board
    ❖Tools
    ➢ Vivado HLS 2019.2
    ● Generate Verilog code (high-level synthesis)
    ➢ Vivado 2019.2
    ● Generate the bitstream
    ➢ PYNQ v2.6
    ● Exchange data with the board
    [Figure: ZCU104 FPGA board]

  13. Denoising Quality
    [Figure: four panels showing the original image, the image with Gaussian noise, the image processed by BF, and the image processed by BG]

  14. Comparison with Different Radius
    ❖No metric changes greatly when the window radius is enlarged

    Window radius     4                8                12               16
    Frequency (MHz)   214              214              214              214
    Speed (fps)       95.15            100.13           99.24            98.36
    Slice             1955 (6.79 %)    2214 (7.69 %)    1611 (5.59 %)    1986 (6.90 %)
    LUT               10449 (4.54 %)   11490 (4.99 %)   9013 (3.91 %)    9877 (4.29 %)
    FF                8682 (1.88 %)    7654 (1.66 %)    7438 (1.61 %)    6923 (1.50 %)
    DSP               19 (1.10 %)      15 (0.87 %)      15 (0.87 %)      15 (0.87 %)
    BRAM              22 (7.05 %)      23 (7.37 %)      26.5 (8.49 %)    28 (8.97 %)

    Comparison of the speed and resources of our design when changing the window radius
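
As a rough cross-check of the reported figures, the clock frequency divided by the frame rate gives the number of clock cycles available per frame. With an II = 1 pipeline that accepts one pixel per cycle, this is also an upper bound on the pixels per frame, assuming pipeline fill and data transfer are ignored. The slides do not state the image size, so no resolution is assumed here; the names are hypothetical.

```python
# Window radius -> (frequency in MHz, speed in fps), copied from the table.
MEASURED = {
    4: (214, 95.15),
    8: (214, 100.13),
    12: (214, 99.24),
    16: (214, 98.36),
}

def cycles_per_frame(clock_mhz, fps):
    """Clock cycles available for one frame at the given clock and frame rate."""
    return clock_mhz * 1e6 / fps

for radius, (mhz, fps) in MEASURED.items():
    print(radius, round(cycles_per_frame(mhz, fps)))
```

The cycle budget per frame stays within a few percent across all four radii, which matches the observation that the speed barely depends on the window radius.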

  15. Comparison with Other Designs
    ❖High-speed processing of large images with a large window radius while suppressing resource usage
    ➢ Faster than a GPU (A100 PCIe) implementation
    [Table: comparison of speed and resources between our design, a GPU implementation of the BF, and other existing designs]
    (2) A. Gabiger-Rose, M. Kube, R. Weigel, and R. Rose, “An FPGA-based fully synchronized design of a bilateral filter for real-time image denoising,” Transactions on Industrial Electronics, 2014
    (3) S. D. Dabhade, G. N. Rathna, and K. N. Chaudhury, “A reconfigurable and scalable FPGA architecture for bilateral filtering,” Transactions on Industrial Electronics, 2018

  16. Conclusion
    ❖Enhanced the Bilateral Grid (BG) so that the window size can be varied
    ➢ BG is used to accelerate the Bilateral Filter (BF)
    ❖Proposed a fully pipelined FPGA implementation of the BG
    ❖Verified on an actual FPGA that our design outperforms others in speed and resources