Estimating depth from stereo vision cameras, i.e., "depth from stereo", is critical to emerging intelligent applications deployed in energy- and performance-constrained devices, such as augmented reality headsets and mobile autonomous robots. While existing stereo vision systems make trade-offs between accuracy, performance and energy-efficiency, we describe ASV, an accelerated stereo vision system that simultaneously improves both performance and energy-efficiency while achieving high accuracy.
The key to ASV is to exploit unique characteristics inherent to stereo vision, and apply stereo-specific optimizations, both algorithmically and computationally. We make two contributions. Firstly, we propose a new stereo algorithm, invariant-based stereo matching (ISM), that achieves significant speedup while retaining high accuracy. The algorithm combines classic “hand-crafted” stereo algorithms with recent developments in Deep Neural Networks (DNNs), by leveraging the correspondence invariant unique to stereo vision systems. Secondly, we observe that the bottleneck of the ISM algorithm is the DNN inference, and in particular the deconvolution operations that introduce massive compute-inefficiencies. We propose a set of software optimizations that mitigate these inefficiencies. We show that with less than 0.5% hardware area overhead, these algorithmic and computational optimizations can be effectively integrated within a conventional DNN accelerator. Overall, ASV achieves 5× speedup and 85% energy saving with 0.02% accuracy loss compared to today’s DNN-based stereo vision systems.