## Sunday, September 15, 2013

### Stereo Matching and Graph Cuts (Part 2)

This is a follow-up to Stereo Matching and Graph Cuts.

We are gonna look at how to build the (proper) graph to solve the problem of stereo correspondence with occlusions. This is basically an explanation of the graph construction process in the source code that accompanies "Kolmogorov and Zabih's Graph Cuts Stereo Matching Algorithm" by Vladimir Kolmogorov, Pascal Monasse, and Pauline Tan. That code is a clever implementation of "Computing Visual Correspondence with Occlusions using Graph Cuts" by Vladimir Kolmogorov and Rabin Zabih. This is probably the best Graph Cuts stereo matching algorithm out there in terms of graph construction because it handles occlusions directly. As a reminder, occlusion maps are usually obtained by generating left and right disparity maps and performing a left-right consistency check between the disparities. Here, the occlusion map is obtained with the disparity map without doing a left-right consistency check. The paper that accompanies the code is great in its own right but the binary variable setup is completely different from what is actually in the code. I don't think it's a bad idea to go through the code and (try to) explain what's going on.

In the code, if p is a pixel in the left image with disparity d, the matching pixel in the right image is p+d. Here, I use a different convention for the disparity, so the matching pixel in the right image is p-d. Basically, I use a different sign for the disparity. It really doesn't make any difference whatsoever.

Let's first look at the energy that needs to be minimized (this corresponds to Match::ComputeEnergy() in the code):

The data_occlusion_penalty function looks something like this:

data_occlusion_penalty(p,d)= 0 if d is occluded
data_occlusion_penalty(p,d)= data_penalty(p,d)-K otherwise

where data_penalty(p,d) can be any data penalty energy function (=0 when p and p-d are considered exact matches in terms of intensity). Now, you might think it would probably make more sense to define the data_occlusion_penalty function as:

data_occlusion_penalty(p,d)= K if d is occluded
data_occlusion_penalty(p,d)= data_penalty(p,d) otherwise

but no, that wouldn't work unless you force var0=1 and varA=1 to be incompatible, which is not something you wanna do (trust me, I tried).

The smoothness_penalty function looks something like this:

The smoothness penalty is higher when the intensities are close to each other, which makes a lot of sense since one would expect two neighboring pixels of similar intensities to have the same disparities.

Now that we have clearly defined our energy to be minimized by Graph Cuts, we need to define the graph nodes (the binary variables) and construct the graph so that, whatever the states of the binary variables (0 or 1), the energy of the cuts in the graph is always equal to the energy computed with ComputeEnergy(). This is referred to as a sanity check in the code, for good reasons. We are gonna minimize the energy using the alpha expansion method which considers a switch to the disparity alpha, where alpha can be any possible disparity between the minimum and maximum disparities.

In the following, the function add_constant(E) adds the energy E to the graph. It doesn't depend on any binary variable. The function add_term1(var,E0,E1) adds the energy E0 if var=0, E1 if var=1. It depends on one binary variable (var). The function add_term2(varA,varB,E00,E01,E10,E11) adds the energy E00 if varA=0 and varB=0, E01 if varA=0 and varB=1, E10 if varA=1 and varB=0, E11 if varA=1 and varB=1. It depends on two binary variables (varA and varB).

We are gonna use two graph nodes per pixel:

Node (binary variable) var0 with var0=0 when the pixel p keeps its disparity d and var0=1 if the pixel becomes occluded (d=occluded)
Node (binary variable) varA with varA=0 when the pixel p keeps its disparity d and varA=1 if the pixel changes its disparity to alpha

The binary variable var0 can be active or inactive, present or non-present. Var0 is active when d is already alpha, inactive otherwise. Var0 is present if d is not occluded, non-present otherwise.
The binary variable varA can be active or inactive, present or non-present. VarA is active when d is already alpha, inactive otherwise. VarA is present if p-alpha is in bounds, non-present otherwise.

Let's build the graph nodes and take care of adding data_occlusion_penalty(p,d) (this corresponds to Match::build_nodes() in the code):

This is done for each pixel p. It's pretty clear we can't allow the pixel p to have a disparity d and a disparity alpha, therefore we need to prevent the case (var0=0,varA=1) from ever happening. This is enforced in Match::build_uniqueness_LR():

Now, the fun part is to make sure that whatever the state of the graph node variables (0 or 1), the graph energy always matches the energy computed with ComputeEnergy(). This is what we are looking at here for the most general case (other cases should be looked at in a similar fashion for good measure):

We can clearly see that the energy obtained with build_nodes() matches the energy that we would obtain with ComputeEnergy(), which is a really good thing.

Let's take care of the smoothness terms (this corresponds to Match::build_smoothness() in the code):

This is done for each pixel p and its neighboring pixel np. What's going on regarding the smoothness terms smoothness_penalty(p,np,d) and smoothness_penalty(p,np,nd) is not that clear, to say the least. Here's the explanation for smoothness_penalty(p,np,d):

Here's the explanation for smoothness_penalty(p,np,nd):

We need to force some things in order for the occlusions to show up properly. You don't want to allow a situation where you have, at the same time, a pixel p with disparity d (with matching pixel p-d in the right image) and another pixel p' switching to disparity alpha where p'-alpha happens to be equal to p-d. Indeed, the two pixels would end up having the same matching pixel in the right image, which violates the matching uniqueness. The uniqueness of matches is being enforced in Match::build_uniqueness_RL() in the code and it's reproduced here:

This is done for each pixel in the right image. Note that the code always keeps track of disparities in the right image as expansion moves are performed (if pixel p in the left image has disparity d, pixel p-d in the right image has disparity -d.)

That's about it, really. Now, if you think that having two sets of binary variables seems unnecessary and that having just one set of binary variables and letting expansion moves go to the occluded state would work just about the same, well, you'd be wrong (I know, I've tried). What would happen is that even though you would reach a lower energy state, a good number of pixels would be given a disparity even though they should really be occluded. The key to proper occlusion detection is enforcing the right to left uniqueness, and this can only be done with two sets of variables (I think.)