Sunday, September 1, 2013

Stereo Matching and Graph Cuts

The practical use of the "graph cuts" methodology to solve the stereo matching problem was pioneered by Vladimir Kolmogorov. We are gonna try to explain in this very post how graph cuts can be used effectively to solve the stereo matching problem.

If we associate each pixel of image 1 (left image) with a (variable) disparity, figure out an energy formulation for the stereo matching problem and minimize it, we have pretty much solved the stereo matching problem. Whether or not this is gonna be a good stereo match depends on the energy formulation, knowing that there's not one single energy formulation that's 100 percent satisfactory for stereo matching. The problem at hand here is not the energy formulation but the minimization of the energy. This is where the "graph cuts" methodology comes in.

Let's start with a quite generic energy formulation (comes from Kolmogorov):


That energy has the classic data and smoothness terms. One could add an occlusion term (to penalize half-occluded pixels) but this is not discussed here.

Assuming that each pixel has received an initial disparity assignment (could be anything, really), let's consider what is called (by Kolmogorov) an expansion move, that is, any pixel can either keep its current disparity or change its disparity to alpha (chosen among the possible disparities, that is, any disparity between the minimum and maximum disparities, which are known). For each pixel p, let's consider a binary variable var that can either take the value 0 (keep disparity) or 1 (change disparity to alpha).

Looking at the data term, for a given pixel p with binary variable var, the energy can either be E0 (var=0) or E1 (var=1). For the smoothness term, for a given pixel p with binary variable var and a neighboring pixel np with binary variable nvar, the energy can either be E00 (var=0,nvar=0), E01 (var=0,nvar=1), E10 (var=1,nvar=0), or E11 (var=1,nvar=1). This means that our energy can be put in the form:


The energy described above is graph-representable according to Kolmogorov. If you want to use a different energy formulation, you sure can but you have to make sure the smoothness term (or any other term that depends on two variables) verifies the inequality shown above.

Maybe it's time to talk about graph cuts in the context of energy functions of binary variables, which is exactly what we are dealing with, here. The graph we are gonna construct has for nodes all the pixels whose disparities are not already alpha plus two special nodes: the source (S) and the sink (T). The edges of the graph are directed and carry non-negative weights. They are built for each pixel looking at the data and smoothness terms. We'll get to the graph edges later.

When you cut a graph, you partition the nodes into two sets, the source set and the sink set. The cost of the cut is equal to the sum of the weights carried by the edges that have their start node in the source set and their end node in the sink set. Any node in the source set has its binary variable set to 0 (because the source's binary variable is always 0) and any node in the sink set has its binary variable set to 1 (because the sink's binary variable is always 1). This state of binary variables defines an energy (recall that 0 means "keep disparity", 1 means "change disparity to alpha"). The cost of the cut is equal to that very energy. The minimum cut is the cut with the smallest cost, therefore it corresponds to the minimum energy for the expansion move under consideration. Any min cut/max flow algorithm will give you the minimum cut (not discussed here).

All that's left is to figure out how to add edges to the graph. This is done by looking (carefully) at each energy term.

Graph edge construction for the data term:


I also explain why you need to add an extra energy to the energy of the cut. It is done so that the energy of the cut always matches the energy of the system defined by the binary variables (that's the energy you compute using the energy formulation given above with the data term and smoothness term).

Graph edge construction for the smoothness term:


Note that it's not the only way to decompose the smoothness energy term. This particular decomposition comes from "Kolmogorov and Zabih's Graph Cuts Stereo Matching Algorithm" by Vladimir Kolmogorov, Pascal Monasse, and Pauline Tan.

One could look at all the possible cases (there are 8 possible cases) and check to make sure that the energy of the cut is exactly equal to the energy of the system defined by the binary variables. In the following, I am considering just one case but they all can be worked out in a similar fashion.


All that's left to do is find the minimum cut. Once that's accomplished, any node in the graph is either in the source set (var=0) or in the sink set (var=1). For any node in the sink set (var=1), the disparity of the corresponding pixel is changed to alpha. If you were to compute the new energy of the system, you would find that the new energy is exactly equal to the cost of the minimum cut (no surprise there since we have made sure that would be the case when building the graph). This is actually how you check your graph construction.

Recommended reading: "What Energy Functions Can Be Minimized via Graph Cuts" by Vladimir Kolmogorov and Ramin Zabih. Also, "Kolmogorov and Zabih's Graph Cuts Stereo Matching Algorithm" by Vladimir Kolmogorv, Pascal Monasse, and Pauline Tan.

No comments:

Post a Comment