PCL Developers blog

Jilliam Diaz

project: Stereo module improvement
mentor: Federico Tombari

About me

I am a master's student in computer vision at the University of Burgundy. I am currently a GSoC participant working on the stereo module, adding new algorithms to solve the stereo correspondence problem.

Project description

The main objective of this project is to implement additional stereo matching algorithms based on local methods for cost aggregation.

The algorithms are based on the following papers:

[1] Yoon et al., “Locally Adaptive Support-Weight Approach for Visual Correspondence Search”.
[2] Tombari et al., “Segmentation-based adaptive support for accurate stereo correspondence”.
[3] Min et al., “A revisit to cost aggregation in stereo matching: how far can we reduce its computational redundancy”.

Recent status updates

2. Implementation of local-based approach for stereo matching
Saturday, June 28, 2014


In this post, I will briefly describe the current state of the stereo module and the new features added.

Currently, the stereo module encompasses two local matching algorithms:

  1. Block-based algorithm, implemented using the Box-Filtering algorithm proposed in [McDonnell81].
  2. Adaptive Cost 2-pass Scanline Optimization, presented in [Wang06].

Both methods use the Sum of Absolute Differences (SAD) as the dissimilarity measure.

As mentioned in the previous post, the first objective of this project is to implement the local approach proposed in [Min1] for dense correspondence estimation in a pair of rectified grayscale images, with an efficient cost aggregation step. Additionally, the cost aggregation step is based on the method presented in [Yoon06], where the weighting function uses a similarity measure based on color and spatial distances.


To do so, a new class, CompactRepresentationStereoMatching, was created in the stereo module. This class inherits from GrayStereoMatching, which in turn inherits from StereoMatching, since some pre- and post-processing methods are re-implemented. The new class has five public member functions: setRadius, setFilterRadius, setNumDispCandidates, setGammaS and setGammaC. They set three private data members of type int (radius, filter_radius and num_disp_candidates) and two of type double (gamma_c and gamma_s). The class also implements the virtual method compute_impl.

radius corresponds to the radius of the cost aggregation window, with default value equal to 5.

filter_radius corresponds to the radius of the box filter used for the computation of the likelihood function. The default value is 5.

num_disp_candidates is the size of the subset of disparity hypotheses used for the cost aggregation. The default value is 60.

gamma_s is the spatial bandwidth used for cost aggregation based on adaptive weights. The default value is 15.

gamma_c is the color bandwidth used for cost aggregation based on adaptive weights. The default value is 25.
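
As a rough illustration of how the two bandwidths enter the weighting function, the support weight in [Yoon06] decays exponentially with both the color difference and the spatial distance between a pixel p and its neighbour q. The sketch below is written in Python for brevity and is not the module's C++ code. The function name, the argument names and the use of a grayscale intensity difference in place of a full color distance are assumptions made for this example.

```python
import math


def support_weight(intensity_p, intensity_q, pos_p, pos_q,
                   color_bandwidth=25.0, spatial_bandwidth=15.0):
    """Adaptive support weight of neighbour q with respect to pixel p.

    Hypothetical helper: intensities are grayscale values and positions
    are (x, y) tuples. The defaults mirror the bandwidths quoted above.
    """
    # Photometric dissimilarity (grayscale stand-in for a color distance).
    color_dist = abs(intensity_p - intensity_q)
    # Euclidean distance between the two pixel positions.
    spatial_dist = math.hypot(pos_p[0] - pos_q[0], pos_p[1] - pos_q[1])
    # Similar and nearby pixels get a weight close to 1, others decay to 0.
    return math.exp(-(color_dist / color_bandwidth
                      + spatial_dist / spatial_bandwidth))
```

A larger bandwidth makes the weight less sensitive to the corresponding distance, so a larger spatial bandwidth lets distant pixels contribute more to the aggregation.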

Similarly to the previous methods, the current class is based on the SAD matching function, and it estimates the per-pixel cost efficiently using the Box-Filtering algorithm.
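
To make the box-filtering idea concrete, here is a minimal Python sketch of a SAD cost computed with running sums, so that the work per pixel does not grow with the window size. It is an illustration only and not the code of the stereo module. The function names, the nested-list image layout and the choice to give out-of-range pixels a zero difference are all assumptions.

```python
def box_filter_1d(values, radius):
    """Sliding-window sums with a running total; border windows are clipped."""
    n = len(values)
    out = [0] * n
    # Sum of the first window, centred on index 0 and clipped at the border.
    running = sum(values[:radius + 1])
    for i in range(n):
        out[i] = running
        # Slide one step: add the entering sample, drop the leaving one.
        entering = i + radius + 1
        leaving = i - radius
        if entering < n:
            running += values[entering]
        if leaving >= 0:
            running -= values[leaving]
    return out


def box_filter_2d(image, radius):
    """Separable box filter: filter every row, then every column."""
    rows = [box_filter_1d(row, radius) for row in image]
    cols = [box_filter_1d(list(col), radius) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def sad_cost(left, right, d, radius):
    """SAD between left and the d-shifted right image over a square window."""
    width = len(left[0])
    # Assumption: pixels whose match falls outside the image contribute 0.
    abs_diff = [[abs(l_row[x] - r_row[x - d]) if x - d >= 0 else 0
                 for x in range(width)]
                for l_row, r_row in zip(left, right)]
    return box_filter_2d(abs_diff, radius)
```

Calling sad_cost once per disparity hypothesis d gives the full cost volume.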

To test the algorithm, the Middlebury stereo benchmark dataset (http://vision.middlebury.edu/stereo/) will be used.

[McDonnell81] McDonnell, M. J. “Box-filtering techniques”. Computer Graphics and Image Processing 17.1, 65-70, 1981.
[Wang06] Wang, Liang, et al. “High-quality real-time stereo using adaptive cost aggregation and dynamic programming”. 3D Data Processing, Visualization, and Transmission, Third International Symposium on. IEEE, 2006.
[Yoon06] K.-J. Yoon and I.-S. Kweon. “Locally Adaptive Support-Weight Approach for Visual Correspondence Search”. In Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR), 924–931, 2005.

1. Reduction of computational redundancy in cost aggregation in stereo matching.
Saturday, June 21, 2014


A stereo image pair can be used to estimate the depth of a scene. To do so, it is necessary to perform pixel matching and find the correspondences between both images. Different methods for stereo correspondence have been proposed, and they fall into two classes:

  • Correlation-based algorithms: Produce a dense set of correspondences.
  • Feature-based algorithms: Produce a sparse set of correspondences.

Additionally, correlation-based algorithms are usually classified into two main groups: local (window-based) and global algorithms. However, some methods do not fit into either group and fall in between them.

The current work is based on correlation-based algorithms, more specifically local, window-based methods, intended for applications where a dense and fast output is required.

The inputs to the algorithm are two calibrated images, i.e. the camera geometry is known. The images are also rectified in order to limit the correspondence to a 1D search.


The general methodology for stereo vision local approaches can be summarized as follows. An energy cost is computed for every pixel p by using the reference and d-shifted right images:

(1)  e(p,d) = \min\left( |I_{l}(x,y) - I_{r}(x-d,y)|, \sigma \right)
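
Equation (1) is a truncated absolute difference: the cost never exceeds the threshold sigma, which limits the influence of outliers. A minimal Python sketch follows. The function name and the nested-list image layout are assumptions, as is the choice to return sigma when the shifted pixel falls outside the right image.

```python
def pixel_cost(left, right, x, y, d, sigma):
    """Truncated absolute difference between I_l(x, y) and I_r(x - d, y)."""
    if x - d < 0:
        # Assumption: no valid match exists, so charge the maximum cost.
        return sigma
    return min(abs(left[y][x] - right[y][x - d]), sigma)
```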

Then, the aggregated cost is computed by an adaptive sum of the per-pixel cost:

(2)  E(p,d) = \dfrac{\displaystyle \sum_{q \in N(p)} w(p,q) \, e(q,d)}{\displaystyle \sum_{q \in N(p)} w(p,q)}
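
Equation (2) is a normalised weighted average of the per-pixel costs inside the support window N(p). The Python sketch below evaluates it for one pixel and one disparity. The function name and arguments are assumptions: cost_slice holds e(q,d) for a fixed d, and weight is any callable returning w(p,q) for two (x, y) positions.

```python
def aggregate_cost(cost_slice, weight, p, radius):
    """Weighted average of e(q, d) over the square window centred on p."""
    px, py = p
    height, width = len(cost_slice), len(cost_slice[0])
    numerator = 0.0
    denominator = 0.0
    # The window is clipped at the image borders.
    for qy in range(max(0, py - radius), min(height, py + radius + 1)):
        for qx in range(max(0, px - radius), min(width, px + radius + 1)):
            w = weight(p, (qx, qy))
            numerator += w * cost_slice[qy][qx]
            denominator += w
    return numerator / denominator
```

With a constant weight this reduces to a plain box average, which is the block-based case. The sketch assumes that the weights in the window do not sum to zero.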

Finally, a Winner-Takes-All method is used to find the best of all the disparity hypotheses:

(3)  d(p) = \arg\min_{d \in [0, \dots, D-1]} E(p,d)
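
In code, the Winner-Takes-All step of Equation (3) is an argmin over the aggregated costs of one pixel. A minimal Python sketch, assuming the costs are stored in a list indexed by disparity (the function name is hypothetical):

```python
def winner_takes_all(aggregated_costs):
    """Return the disparity index with the lowest aggregated cost E(p, d)."""
    # On ties, min() keeps the first (smallest) disparity.
    return min(range(len(aggregated_costs)), key=aggregated_costs.__getitem__)
```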

This whole process is computationally expensive, since the aggregation is repeated for every hypothesis d. A representation of the conventional approaches is shown in the next figure [Min1].


Min et al. [Min1] introduced a new methodology that reduces this complexity by finding a compact representation of the per-pixel likelihood, assuming that low likelihood values do not provide informative support. Only a pre-defined number of disparity candidates per pixel is selected for the cost aggregation step. This subset of disparity hypotheses corresponds to the local maxima in the profile of the likelihood function, which is pre-filtered to reduce noise, as shown in the following example:
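
The candidate selection can be sketched in Python as follows. This is an illustration of the idea and not the procedure of [Min1] in every detail: it assumes the likelihood profile of a pixel is already pre-filtered and stored as a list indexed by disparity, and it keeps the strongest local maxima up to the requested number. The function name and the tie-handling rule for plateaus are assumptions.

```python
def select_candidates(likelihood, num_candidates):
    """Return up to num_candidates disparities at local maxima of the profile."""
    last = len(likelihood) - 1
    peaks = []
    for d, value in enumerate(likelihood):
        left = likelihood[d - 1] if d > 0 else float("-inf")
        right = likelihood[d + 1] if d < last else float("-inf")
        # Assumption: on a flat plateau only its last sample counts as a peak.
        if value >= left and value > right:
            peaks.append(d)
    # Keep the peaks with the highest likelihood, then restore disparity order.
    peaks.sort(key=lambda d: likelihood[d], reverse=True)
    return sorted(peaks[:num_candidates])
```

For the profile [0.1, 0.5, 0.2, 0.9, 0.3, 0.4, 0.1] and two candidates, the sketch keeps disparities 1 and 3, so the cost aggregation would run for two hypotheses instead of seven.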


The disparity hypotheses estimation and cost aggregation processes proposed by Min et al. are depicted in the next figure, where Sc is the subset of disparity hypotheses with size Dc:

[Min1] Min, D., Lu, J., & Do, M. N. “A revisit to cost aggregation in stereo matching: How far can we reduce its computational redundancy?” In IEEE International Conference on Computer Vision (ICCV), 2011 (pp. 1567-1574).