The Possibilities of FPGA for Deep Learning

k-mats
January 18, 2017


A survey on the use of FPGAs for Deep Learning





  1. The Possibilities of FPGA for Deep Learning Kohei Matsumoto @kmats_

  2. Challenges of Hardware for Deep Learning
     • Performance
     • Power efficiency
     • Hardware cost
     • Memory bandwidth
       • Data must be passed from layer to layer
     • Processing bandwidth
       • How much data can be processed simultaneously?
     • etc. (re-programmability, ease of use, …)
  3. FPGA?
     • Field Programmable Gate Array
     • Hardware that is "reconfigurable" via a Hardware Description Language
     • Pros
       • Can re-program any kind of logic
     • Cons
       • Limited resources (processing elements, memory, etc.)
       • Hardware cost (compared to mass-produced devices)
  4. Use-cases of GPU/FPGA
     • GPU: massively parallel operations
       • Graphics processing, certain kinds of scientific simulation, etc.
     • FPGA: prototyping of ASICs; cases where hardware-level speed is needed but the logic may still change
       • Search-engine accelerators, financial simulation, high-frequency trading, etc.

  6. GPU: the de facto standard for Deep Learning… why?
     • Deep Learning ~= a variation of the Convolutional Neural Network (CNN)
     • CNN ~= massively parallel multiply-accumulate operations → GPU! Yay!
     • The learning phase needs enormous computing resources (an FPGA cannot provide enough)
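A minimal sketch (not from the slides) of why CNNs map so well onto parallel hardware: a 2-D convolution is just nested multiply-accumulate (MAC) loops, and every output element can be computed independently.

```python
def conv2d(image, kernel):
    """Naive valid-mode 2-D convolution (strictly, cross-correlation)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):              # each (y, x) output is independent,
        for x in range(ow):          # so these loops parallelize trivially
            acc = 0.0
            for ky in range(kh):     # the inner multiply-accumulate loop
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(conv2d(image, kernel))  # -> [[6.0, 8.0], [12.0, 14.0]]
```

A GPU runs thousands of these independent MACs at once; an FPGA can lay out a custom pipeline of them, but with far fewer processing elements.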
  7. FPGA over GPU in terms of Deep Learning
     • Pros
       • Power efficiency (performance per watt)
     • Cons
       • Difficult implementation
       • Lack of memory bandwidth
       • Lack of processing elements for training
       • Most papers discuss only the inference phase
  8. Example: CNN Accelerator by Microsoft
     • "Single-node deep CNN accelerator on a mid-range FPGA" (inference phase only)
     • "Respectable performance relative to prior FPGA designs and high-end GPGPUs at a fraction of the power"

  10. Binarized Neural Network: highly optimized on FPGA?
     • Binarizes inputs, outputs, and weights deterministically
     • Stored/updated weights retain full precision
     • "At test phase, BDNNs are fully binarized and can be implemented in hardware with low circuit complexity"
       • Which means the learning phase is not yet fully binarized
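A hypothetical sketch of the scheme described on this slide: the forward pass uses deterministically binarized weights, sign(w) ∈ {-1, +1}, while gradient updates are applied to the stored full-precision weights (the function and variable names below are illustrative, not from any BNN paper).

```python
def binarize(ws):
    """Deterministic binarization: +1 if w >= 0, else -1."""
    return [1.0 if w >= 0 else -1.0 for w in ws]

def forward(xs, real_weights):
    """Dot product using binarized weights (what runs at test time)."""
    bws = binarize(real_weights)
    return sum(x * bw for x, bw in zip(xs, bws))

def update(real_weights, grads, lr=0.1):
    """Updates go to the full-precision weights, not the binary copies."""
    return [w - lr * g for w, g in zip(real_weights, grads)]

real_w = [0.3, -0.7, 0.05]
print(binarize(real_w))                  # -> [1.0, -1.0, 1.0]
print(forward([1.0, 2.0, 3.0], real_w))  # 1*1 + 2*(-1) + 3*1 -> 2.0
```

This is why the approach suits FPGAs: at test time the multiplies reduce to sign flips (XNOR/popcount logic), but training still needs the real-valued weights and gradients.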

  12. Wrap-up
     • FPGA: re-programmable hardware
       • Power-efficient with well-optimized logic
       • Lacks computing resources
     • A full CNN is too big to implement directly; it needs to be simplified
       • One approach: the Binarized Neural Network
       • It is still hard to fully binarize the learning phase
  13. References
     • Efficient Implementation of Neural Network Systems Built on FPGAs, and Programmed with OpenCL
     • FPGAs on Mars
     • FPGAs Challenge GPUs as a Platform for Deep Learning
     • Accelerating Deep Convolutional Neural Networks Using Specialized Hardware
     • Binarized Neural Networks