## Saturday, May 24, 2014

### Fast Cost Volume Filtering for Stereo Matching

This is about a stereo matching procedure described in Fast Cost-Volume Filtering for Visual Correspondence and Beyond by Christoph Rhemann, Asmaa Hosni, Michael Bleyer, Carsten Rother, and Margrit Gelautz. Before talking about what's in the paper, let's try to understand how it (probably) came about.

The method described in Weighted Windows for Stereo Correspondence Search (it's called "Adaptive Support-Weight" or ASW in short) created quite a stir in the computer vision community because it was the first time somebody had implemented a local method that was on par with the best global methods. Unfortunately, the problem with ASW (and that's a big one) is that it can be terribly slow especially when the window radius gets large (needed when dealing with larger images). The crux of the ASW method is the use of a filter that's quite similar to the joint bilateral filter, a covolution filter that gives larger weights to the pixels that are similar in color and closer in space to the center pixel (around which the convolution is made). If only there existed a filter similar to the joint bilateral filter that could be implemented so that the running time is independent of the convolution window radius! Well, there is one, it's Guided Image Filtering. Now, how on Earth do you use the guided image filter for stereo correspondence? We'll get to this a bit later.

Usually, in local methods, for each pixel in the left image, one slides a window across the disparity range in the right image and compute some kind of aggregated matching cost at each disparity. When you aggregate the matching costs over a window, you are basically smoothing or filtering the matching costs. A Winner Takes All (WTA) approach is usually employed to get the most appropriate disparity. Another way to do this whole thing is to, for each disparity, compute the raw matching costs at each pixel (no aggregation) and then smooth the raw matching costs that you have just computed. Once you have smoothed all the slices of the cost volume (Each slice corresponds to a disparity.), the same WTA approach can be used to get the most appropriate disparity for each pixel. The advantage of the latter is that you can use any image filter you want to do the smoothing. Since we know that the guided filter can be implemented in such a way that its running time doesn't depend on the window radius, it is a very attractive way of doing local stereo correspondence. This is exactly what Christoph Rhemann et al. did with their fast cost volume filtering. Since the guided filter is quite similar to the joint bilateral filter (used in the adaptive support weights method), we can expect results that are similar and get there fast.

The following describes a bit better the duality between sliding a window across the right image and smoothing the cost volume:

The pseudo-code for the cost-volume filtering shown above is exactly what Christoph Rhemann et al. describe in their paper. Each slice of the cost volume is smoothed by the guided filter which behaves like the joint bilateral filter but is much faster (if the implementation is done right).

This is the raw matching cost Christoph Rhemann et al. use in the paper:

This shows how the weights for the guided filter look like (they look very similar to weights one would get with a bilateral filter):

A couple of disparity maps obtained by the fast cost-volume filtering procedure:

This methodology gets all the accuracy previously obtained by Kuk-Jin Yoon et al. in their adaptive support-weight (ASW) approach and the speed associated with the guided filter. This is, in my opinion, the best local stereo matching algorithm. It also competes quite well with the global stereo correspondence methods like those based on graph cuts.