Wednesday, September 25, 2013

Frame Sequence Generator 3 (FSG3)

Frame Sequence Generator 3 (FSG3) generates synthesized frames given a left image (image 1), a disparity/depth map, and the min and max disparities (needed to compute the actual disparity from the depth map). The basic idea is, for each frame, to shift the pixels in the left image according to the disparity map. This shifted image contains "holes" (areas made up of pixels that have become disoccluded, or disoccluded areas for short) which need to be inpainted (inpainting is a common theme in computer vision). It's pretty easy to understand why "holes" show up when you shift a reference image according to a disparity/depth map. Consider a picture with an object in the foreground with disparity 0 and a background. When you shift the background, "holes" will form on the immediate right of the foreground object. FSG3 fills the "holes" by propagating the pixels to the immediate right of the disoccluded areas, using a methodology similar to Semi-Automatic 2D to 3D Image Conversion using Random Walks. This type of inpainting is quite a bit fancier than what's being done in Frame Sequence Generator 2 (FSG2) and therefore, FSG3 should be used in lieu of FSG2.
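The shifting step can be sketched as follows (a minimal Python/NumPy sketch of the idea, not FSG3's actual code; the forward-warping scheme and the foreground-wins rule are assumptions on my part):

```python
import numpy as np

def synthesize_frame(left, disp, t):
    """Forward-warp the left image by a fraction t of the disparity.

    left: (H, W, 3) uint8 image, disp: (H, W) float disparity in pixels.
    The matching pixel in the synthesized view sits at x - d (same
    disparity sign convention as in the Graph Cuts posts).
    Returns the warped frame and a boolean hole mask (disoccluded areas).
    """
    H, W = disp.shape
    frame = np.zeros_like(left)
    # Track the largest disparity written to each target pixel so that
    # foreground (larger disparity) wins over background.
    best = np.full((H, W), -np.inf)
    filled = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            d = disp[y, x] * t
            xt = int(round(x - d))  # target pixel in the synthesized view
            if 0 <= xt < W and d > best[y, xt]:
                best[y, xt] = d
                frame[y, xt] = left[y, x]
                filled[y, xt] = True
    holes = ~filled  # these are the areas that need inpainting
    return frame, holes
```

A zero-disparity foreground pixel keeps its place while the shifted background vacates the pixels next to it, which is exactly where the hole mask lights up.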

If imageL.xxx is the left image (image 1), the frames generated will be named frame1.xxx, frame2.xxx, etc. The last frame corresponds to the right image. Animated gifs or lenticular prints (see Lenticular Printing - Interlacing with SuperFlip) can be made with imageL.xxx, frame1.xxx, frame2.xxx, etc. (don't use the actual right image). The program automatically creates an animated gif of those images after generating and saving the frames.

The disparity/depth map should be a grayscale image where white indicates the closest foreground object and black indicates the furthest background object. The darkest value in the depth map corresponds to the minimum disparity and the brightest value corresponds to the maximum disparity. The disparity/depth map doesn't need to come from any of my programs.
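In code, going from a grayscale depth map value to an actual disparity is just a linear mapping (a sketch assuming an 8-bit grayscale depth map):

```python
def gray_to_disparity(gray, d_min, d_max):
    """Map an 8-bit grayscale depth value (0 = black = furthest,
    255 = white = closest) to a disparity in [d_min, d_max]."""
    return d_min + (gray / 255.0) * (d_max - d_min)
```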

The quality of the inpainting heavily depends on the accuracy of the depth map at object boundaries.

Here's an example:


Left image of the "Art" stereo pair (courtesy of Middlebury).


Depth/disparity map.


Disoccluded areas ("holes") which result from the shifting of the reference image (left image) to get the "right" image (generate 1 frame).


Right image obtained with FSG3.

The Windows executable (guaranteed to be virus free) is available for free via the 3D Software Page. Please refer to the 'Help->About' page in the actual program for how to use it.

Monday, September 23, 2013

Depth Map Automatic Generator 3 (DMAG3)

DMAG3 is an automatic depth/disparity map generator based on the Graph Cuts methodology. It is an implementation of the algorithm described in "Computing Visual Correspondence with Occlusions using Graph Cuts" by Vladimir Kolmogorov and Ramin Zabih. Please refer to Stereo Matching and Graph Cuts and Stereo Matching and Graph Cuts (Part 2) for details about the implementation of such an algorithm, which, by the way, would not have been possible without the help of "Kolmogorov and Zabih's Graph Cuts Stereo Matching Algorithm" by Vladimir Kolmogorov, Pascal Monasse, and Pauline Tan and the accompanying code. I didn't code the actual mincut/maxflow solver. It comes straight from Vladimir Kolmogorov's software library (big thanks to him for providing that).

I recommend having a look at Depth Map Generation using Graph Cuts to get a better grasp of what DMAG3 does and, if nothing else, the meaning of the parameters that control DMAG3's behavior.

The input to DMAG3 is a pair of rectified images as well as the minimum and maximum disparities. The minimum and maximum disparities can be obtained manually with DF2 or automatically with ER9b. I always recommend using ER9b because it rectifies the images but you can also use Stereo PhotoMaker to align the images (although it won't be as good). The output of DMAG3 is the left and right depth maps as well as the left and right occlusion maps.

I think the only parameters to worry about are K and lambda. K is the occlusion penalty. Lambda is the multiplier for the smoothness penalty. For starters, I recommend letting DMAG3 compute K and lambda by setting them both to 0. These 2 parameters should then be modified (if needed) based upon how the left depth map and left occlusion map look. If the occlusion maps look a bit too busy (occluded pixels should only show up at object boundaries), increase K. If the depth maps look a bit too noisy, increase lambda. If the images are large, I recommend using ds_factor (downsampling factor) = 2 (or more) in order to speed up DMAG3 (tremendously).

Here is an example:


Rectified left image of stereo pair called IMAG0013.


Left depth map obtained by DMAG3.


Left occlusion map obtained by DMAG3.

Here's another example:


Rectified left image of stereo pair called IMAG0020.


Left depth map obtained by DMAG3.


Left occlusion map obtained by DMAG3.

The Windows executable (guaranteed to be virus free) is available for free via the 3D Software Page. Please refer to the 'Help->About' page in the actual program for how to use it.

Saturday, September 21, 2013

Weighted Windows for Stereo Correspondence Search

We have previously looked into local methods to solve the stereo correspondence problem using windows (blocks), but we have kind of assumed that the contribution of a pixel around a reference pixel always carries the same weight; in other words, any pixel in the window contributes the same whether it's near or far from the reference pixel and whether its intensity is similar to or different from that of the reference pixel. This is something we are gonna delve into next thanks to "Adaptive Support-Weight Approach for Correspondence Search" by Kuk-Jin Yoon and In So Kweon. This is akin to adaptive-window methods but handled in a much simpler and more natural way.

The idea is to give pixels in a window centered on a reference pixel a weight that depends on spatial proximity and color similarity. This allows the use of a fixed window size unlike the usual adaptive-window methods. You can find the very same methodology in bilateral filtering which is commonly used as an edge preserving blurring/smoothing tool in image processing.

Given a reference pixel p and a neighboring pixel q, this is how the support weight w(p,q) is computed:


Clearly, support weights will vary between 0 and 1. A pixel's weight is closer to 1 (maximum weight) if its color is similar to that of the reference pixel and it is near the reference pixel. A pixel's weight is closer to 0 (minimum weight) if its color is not similar to that of the reference pixel and/or it is far from the reference pixel.
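The weight combines the two cues as w(p,q) = exp(-(delta_c/gamma_c + delta_g/gamma_p)), where delta_c is the color difference and delta_g the spatial distance. As a hedged sketch of how such a support weight could be computed (the paper measures color difference in the CIELab color space; plain Euclidean distance on whatever color triplets you have is used here for simplicity):

```python
import math

def support_weight(color_p, color_q, pos_p, pos_q, gamma_c, gamma_p):
    """Adaptive support weight of pixel q in a window centered on p.

    Color similarity and spatial proximity each contribute an exponential
    term; the weight lies in (0, 1], with 1 at the reference pixel itself.
    """
    dc = math.dist(color_p, color_q)  # color difference (Euclidean here)
    dg = math.dist(pos_p, pos_q)      # spatial distance to the reference pixel
    return math.exp(-(dc / gamma_c + dg / gamma_p))
```

Large gamma_c or gamma_p kills off the corresponding exponential term, which is exactly what the parameter-study pictures below exploit.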

Let's look at what the weight distribution looks like in real-life examples (those are from the paper):


As you can see, the brighter pixels are the pixels which are similar in color to the reference pixel (marked by a rectangle). As you move away from the reference pixel, pixels that are similar in color get darker. Recall that brighter pixels have the heaviest weight and contribute the most to the matching cost. It's like adapting the window contour to the object the reference pixel is a part of. As the color similarity parameter (gamma_c) increases, the bright part of the window gets larger, in other words, more of the window carries a significant weight. It's a bit like increasing the window size in the usual non weight-based methods.

Given a pixel p and a disparity d, the matching cost for a given window size is given by:


This matching cost based on weighted correlation windows can be used in local methods (Winner Takes All approach) but also in global methods (as the data term).
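Here is a sketch of how the aggregated cost could be computed for a pixel p = (y, x) at disparity d, using the support-weight windows of both the reference pixel and its tentative match (grayscale images are assumed, and the truncated absolute difference used as the per-pixel cost e is my choice, not necessarily the paper's):

```python
import numpy as np

def weighted_cost(left, right, weights_l, weights_r, y, x, d, radius, trunc=40.0):
    """Aggregate matching cost for pixel (y, x) at disparity d.

    left/right: 2-D grayscale images.  weights_l / weights_r are
    (2*radius+1, 2*radius+1) support-weight windows centered on p in the
    left image and on p-d in the right image.  The cost is the weighted
    sum of per-pixel dissimilarities divided by the sum of the weights.
    """
    num = den = 0.0
    H, W = left.shape
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            qy, qx = y + dy, x + dx
            if not (0 <= qy < H and 0 <= qx < W and 0 <= qx - d < W):
                continue  # skip out-of-bounds pixels
            # truncated absolute difference between q and its match q-d
            e = min(abs(float(left[qy, qx]) - float(right[qy, qx - d])), trunc)
            w = weights_l[dy + radius, dx + radius] * weights_r[dy + radius, dx + radius]
            num += w * e
            den += w
    return num / den if den > 0 else trunc
```

With all weights set to 1, this degenerates to the classic block-matching cost, which matches the remark further down about large gamma_c and gamma_p.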

As an update to this little write-up (updated 01/14/2014), I would like to go a bit deeper into the color similarity parameter gamma_c and the proximity parameter gamma_p. When looking at the neighborhood of a given pixel in the context of stereo matching, it is kind of assumed that the neighboring pixels should all be at the same depth; in other words, the pixels within a window/block should all be at the same depth. Obviously, we don't know that beforehand since that's what we are trying to determine (the depth). So, it is often assumed that neighboring pixels with different colors are likely to be at different depths. It is clearly not always the case (neighboring pixels with different colors could be at the same depth and neighboring pixels with similar colors could be at different depths). It is however not a bad assumption to make when you want to avoid the possible fattening effect ("foreground fattening") at boundaries between neighboring zones at different depths, a pretty common occurrence in stereo matching algorithms. So, if you look at a neighborhood or window, you want to see weights that are high (close to 1) when the neighboring pixel is similar in color to the center pixel and weights that are low (close to 0) when the neighboring pixel is not similar in color to the center pixel. This is controlled by the color similarity parameter gamma_c. What you also want is a reduction of the weight as you move away from the center pixel. This is controlled by the proximity parameter gamma_p.

Let's look at the following neighborhood window/block (the reference pixel is at the center), play around with gamma_c and gamma_p, and see what kind of effects it has on the pixel weights:


Let's first focus on the color similarity parameter gamma_c. If you set gamma_p to some very large value, the proximity term has no effect on the weights (the weights solely come from the color similarity term.) In the following series of pictures, I am only changing gamma_c to see the effect it has on the weights (black means weight equals 0, white means weight equals 1.) What you wanna see is white where there is red and various shades of gray anywhere else (the further away the color is from red, the blacker it should be):


radius=12, gamma_p=1,000,000, gamma_c=200


radius=12, gamma_p=1,000,000, gamma_c=100


radius=12, gamma_p=1,000,000, gamma_c=50


radius=12, gamma_p=1,000,000, gamma_c=20


radius=12, gamma_p=1,000,000, gamma_c=5

Let's now focus on the proximity parameter gamma_p. If you set gamma_c to some very large value, the color similarity term has no effect on the weights (the weights come solely from the proximity term.) In the following series of pictures, I am only changing gamma_p to see the effect it has on the weights (black means weight equals 0, white means weight equals 1.) What you wanna see is white at the reference pixel and darker shades of gray as you move away from the center:


radius=12, gamma_c=1,000,000, gamma_p=200


radius=12, gamma_c=1,000,000, gamma_p=100


radius=12, gamma_c=1,000,000, gamma_p=50


radius=12, gamma_c=1,000,000, gamma_p=20


radius=12, gamma_c=1,000,000, gamma_p=5

It is worth noting that if both gamma_c and gamma_p are set to large values, all weights are equal to 1 and the method is basically a classic window-based or block-based stereo matching method.

This last picture shows what happens when you combine the color similarity and proximity effects (color similarity weights multiplied by the proximity weights):


radius=12, gamma_c=100, gamma_p=20

Sunday, September 15, 2013

Stereo Matching and Graph Cuts (Part 2)

This is a follow-up to Stereo Matching and Graph Cuts.

We are gonna look at how to build the (proper) graph to solve the problem of stereo correspondence with occlusions. This is basically an explanation of the graph construction process in the source code that accompanies "Kolmogorov and Zabih's Graph Cuts Stereo Matching Algorithm" by Vladimir Kolmogorov, Pascal Monasse, and Pauline Tan. That code is a clever implementation of "Computing Visual Correspondence with Occlusions using Graph Cuts" by Vladimir Kolmogorov and Ramin Zabih. This is probably the best Graph Cuts stereo matching algorithm out there in terms of graph construction because it handles occlusions directly. As a reminder, occlusion maps are usually obtained by generating left and right disparity maps and performing a left-right consistency check between the disparities. Here, the occlusion map is obtained along with the disparity map without doing a left-right consistency check. The paper that accompanies the code is great in its own right but the binary variable setup is completely different from what is actually in the code. I don't think it's a bad idea to go through the code and (try to) explain what's going on.

In the code, if p is a pixel in the left image with disparity d, the matching pixel in the right image is p+d. Here, I use a different convention for the disparity, so the matching pixel in the right image is p-d. Basically, I use a different sign for the disparity. It really doesn't make any difference whatsoever.

Let's first look at the energy that needs to be minimized (this corresponds to Match::ComputeEnergy() in the code):


The data_occlusion_penalty function looks something like this:

data_occlusion_penalty(p,d)= 0 if d is occluded
data_occlusion_penalty(p,d)= data_penalty(p,d)-K otherwise

where data_penalty(p,d) can be any data penalty energy function (=0 when p and p-d are considered exact matches in terms of intensity). Now, you might think it would probably make more sense to define the data_occlusion_penalty function as:

data_occlusion_penalty(p,d)= K if d is occluded
data_occlusion_penalty(p,d)= data_penalty(p,d) otherwise

but no, that wouldn't work unless you force var0=1 and varA=1 to be incompatible, which is not something you wanna do (trust me, I tried).
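In code form, the convention that does work looks something like this (a sketch; OCCLUDED is a hypothetical sentinel and data_penalty stands for whatever per-pixel dissimilarity is used). Note that subtracting K from every non-occluded data term differs from adding K for every occluded pixel only by a constant (K times the total number of pixels), so the minimum is attained at the same configuration:

```python
OCCLUDED = None  # hypothetical sentinel for "pixel has no disparity"

def data_occlusion_penalty(p, d, data_penalty, K):
    """Data term as used in the code: occluded pixels cost nothing here,
    and K is instead folded (subtracted) into the non-occluded data term.
    Summing this over all pixels equals
        sum(data_penalty) + K * (#occluded) - K * (#pixels),
    i.e. the "intuitive" energy minus a constant, so the minimizer is
    unchanged."""
    if d is OCCLUDED:
        return 0.0
    return data_penalty(p, d) - K
```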

The smoothness_penalty function looks something like this:


The smoothness penalty is higher when the intensities are close to each other, which makes a lot of sense since one would expect two neighboring pixels of similar intensities to have the same disparities.

Now that we have clearly defined our energy to be minimized by Graph Cuts, we need to define the graph nodes (the binary variables) and construct the graph so that, whatever the states of the binary variables (0 or 1), the energy of the cuts in the graph is always equal to the energy computed with ComputeEnergy(). This is referred to as a sanity check in the code, for good reasons. We are gonna minimize the energy using the alpha expansion method which considers a switch to the disparity alpha, where alpha can be any possible disparity between the minimum and maximum disparities.

In the following, the function add_constant(E) adds the energy E to the graph. It doesn't depend on any binary variable. The function add_term1(var,E0,E1) adds the energy E0 if var=0, E1 if var=1. It depends on one binary variable (var). The function add_term2(varA,varB,E00,E01,E10,E11) adds the energy E00 if varA=0 and varB=0, E01 if varA=0 and varB=1, E10 if varA=1 and varB=0, E11 if varA=1 and varB=1. It depends on two binary variables (varA and varB).
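To make the sanity check concrete, here is a sketch (not code from the actual implementation) of an energy bookkeeper with the same interface: instead of building graph edges, it simply evaluates the total energy for a given 0/1 assignment of the binary variables, which is exactly what the cut cost must reproduce:

```python
class EnergyCheck:
    """Accumulates energy terms the way add_constant/add_term1/add_term2
    add them to the graph, then evaluates the total for a given 0/1
    assignment of the binary variables (a dict var -> 0 or 1)."""

    def __init__(self):
        self.terms = []

    def add_constant(self, E):
        # constant energy, independent of any binary variable
        self.terms.append(lambda a, E=E: E)

    def add_term1(self, var, E0, E1):
        # E0 if var=0, E1 if var=1
        self.terms.append(lambda a, v=var, E0=E0, E1=E1: E1 if a[v] else E0)

    def add_term2(self, varA, varB, E00, E01, E10, E11):
        # E00/E01/E10/E11 depending on the states of varA and varB
        self.terms.append(
            lambda a, vA=varA, vB=varB, E=(E00, E01, E10, E11): E[2 * a[vA] + a[vB]]
        )

    def energy(self, assignment):
        return sum(t(assignment) for t in self.terms)
```

Running the same add_* calls through this class and through the graph builder, then comparing energy(assignment) against the cut cost for a few assignments, is the spirit of the sanity check mentioned above.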

We are gonna use two graph nodes per pixel:

Node (binary variable) var0 with var0=0 when the pixel p keeps its disparity d and var0=1 if the pixel becomes occluded (d=occluded)
Node (binary variable) varA with varA=0 when the pixel p keeps its disparity d and varA=1 if the pixel changes its disparity to alpha

The binary variable var0 can be active or inactive, present or non-present. Var0 is active when d is already alpha, inactive otherwise. Var0 is present if d is not occluded, non-present otherwise.
The binary variable varA can be active or inactive, present or non-present. VarA is active when d is already alpha, inactive otherwise. VarA is present if p-alpha is in bounds, non-present otherwise.

Let's build the graph nodes and take care of adding data_occlusion_penalty(p,d) (this corresponds to Match::build_nodes() in the code):


This is done for each pixel p. It's pretty clear we can't allow the pixel p to have a disparity d and a disparity alpha, therefore we need to prevent the case (var0=0,varA=1) from ever happening. This is enforced in Match::build_uniqueness_LR():


Now, the fun part is to make sure that whatever the state of the graph node variables (0 or 1), the graph energy always matches the energy computed with ComputeEnergy(). This is what we are looking at here for the most general case (other cases should be looked at in a similar fashion for good measure):


We can clearly see that the energy obtained with build_nodes() matches the energy that we would obtain with ComputeEnergy(), which is a really good thing.

Let's take care of the smoothness terms (this corresponds to Match::build_smoothness() in the code):



This is done for each pixel p and its neighboring pixel np. What's going on regarding the smoothness terms smoothness_penalty(p,np,d) and smoothness_penalty(p,np,nd) is not that clear, to say the least. Here's the explanation for smoothness_penalty(p,np,d):


Here's the explanation for smoothness_penalty(p,np,nd):


We need to force some things in order for the occlusions to show up properly. You don't want to allow a situation where you have, at the same time, a pixel p with disparity d (with matching pixel p-d in the right image) and another pixel p' switching to disparity alpha where p'-alpha happens to be equal to p-d. Indeed, the two pixels would end up having the same matching pixel in the right image, which violates the matching uniqueness. The uniqueness of matches is being enforced in Match::build_uniqueness_RL() in the code and it's reproduced here:


This is done for each pixel in the right image. Note that the code always keeps track of disparities in the right image as expansion moves are performed (if pixel p in the left image has disparity d, pixel p-d in the right image has disparity -d.)

That's about it, really. Now, if you think that having two sets of binary variables seems unnecessary and that having just one set of binary variables and letting expansion moves go to the occluded state would work just about the same, well, you'd be wrong (I know, I've tried). What would happen is that even though you would reach a lower energy state, a good number of pixels would be given a disparity even though they should really be occluded. The key to proper occlusion detection is enforcing the right to left uniqueness, and this can only be done with two sets of variables (I think.)

Sunday, September 1, 2013

Stereo Matching and Graph Cuts

The practical use of the "graph cuts" methodology to solve the stereo matching problem was pioneered by Vladimir Kolmogorov. We are gonna try to explain in this very post how graph cuts can be used effectively to solve the stereo matching problem.

If we associate each pixel of image 1 (left image) with a (variable) disparity, figure out an energy formulation for the stereo matching problem and minimize it, we have pretty much solved the stereo matching problem. Whether or not this is gonna be a good stereo match depends on the energy formulation, knowing that there's not one single energy formulation that's 100 percent satisfactory for stereo matching. The problem at hand here is not the energy formulation but the minimization of the energy. This is where the "graph cuts" methodology comes in.

Let's start with a quite generic energy formulation (comes from Kolmogorov):


That energy has the classic data and smoothness terms. One could add an occlusion term (to penalize half-occluded pixels) but this is not discussed here.

Assuming that each pixel has received an initial disparity assignment (could be anything, really), let's consider what is called (by Kolmogorov) an expansion move, that is, any pixel can either keep its current disparity or change its disparity to alpha (chosen among the possible disparities, that is, any disparity between the minimum and maximum disparities, which are known). For each pixel p, let's consider a binary variable var that can either take the value 0 (keep disparity) or 1 (change disparity to alpha).

Looking at the data term, for a given pixel p with binary variable var, the energy can either be E0 (var=0) or E1 (var=1). For the smoothness term, for a given pixel p with binary variable var and a neighboring pixel np with binary variable nvar, the energy can either be E00 (var=0,nvar=0), E01 (var=0,nvar=1), E10 (var=1,nvar=0), or E11 (var=1,nvar=1). This means that our energy can be put in the form:


The energy described above is graph-representable according to Kolmogorov. If you want to use a different energy formulation, you sure can, but you have to make sure the smoothness term (or any other term that depends on two variables) satisfies the inequality shown above.
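For a pairwise term with energies E00, E01, E10, and E11, that inequality is E00 + E11 <= E01 + E10 (the regularity/submodularity condition from "What Energy Functions Can Be Minimized via Graph Cuts?"). Checking it is a one-liner:

```python
def is_graph_representable(E00, E01, E10, E11):
    """Kolmogorov-Zabih regularity condition for a pairwise term: the
    term can be represented with non-negative graph edge weights iff
    E00 + E11 <= E01 + E10 (submodularity)."""
    return E00 + E11 <= E01 + E10
```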

Maybe it's time to talk about graph cuts in the context of energy functions of binary variables, which is exactly what we are dealing with, here. The graph we are gonna construct has as nodes all the pixels whose disparities are not already alpha, plus two special nodes: the source (S) and the sink (T). The edges of the graph are directed and carry non-negative weights. They are built for each pixel looking at the data and smoothness terms. We'll get to the graph edges later.

When you cut a graph, you partition the nodes into two sets, the source set and the sink set. The cost of the cut is equal to the sum of the weights carried by the edges that have their start node in the source set and their end node in the sink set. Any node in the source set has its binary variable set to 0 (because the source's binary variable is always 0) and any node in the sink set has its binary variable set to 1 (because the sink's binary variable is always 1). This state of binary variables defines an energy (recall that 0 means "keep disparity", 1 means "change disparity to alpha"). The cost of the cut is equal to that very energy. The minimum cut is the cut with the smallest cost, therefore it corresponds to the minimum energy for the expansion move under consideration. Any min cut/max flow algorithm will give you the minimum cut (not discussed here).

All that's left is to figure out how to add edges to the graph. This is done by looking (carefully) at each energy term.

Graph edge construction for the data term:


I also explain why you need to add an extra energy to the energy of the cut. It is done so that the energy of the cut always matches the energy of the system defined by the binary variables (that's the energy you compute using the energy formulation given above with the data term and smoothness term).

Graph edge construction for the smoothness term:


Note that it's not the only way to decompose the smoothness energy term. This particular decomposition comes from "Kolmogorov and Zabih's Graph Cuts Stereo Matching Algorithm" by Vladimir Kolmogorov, Pascal Monasse, and Pauline Tan.

One could look at all the possible cases (there are 8 possible cases) and check to make sure that the energy of the cut is exactly equal to the energy of the system defined by the binary variables. In the following, I am considering just one case but they all can be worked out in a similar fashion.


All that's left to do is find the minimum cut. Once that's accomplished, any node in the graph is either in the source set (var=0) or in the sink set (var=1). For any node in the sink set (var=1), the disparity of the corresponding pixel is changed to alpha. If you were to compute the new energy of the system, you would find that the new energy is exactly equal to the cost of the minimum cut (no surprise there since we have made sure that would be the case when building the graph). This is actually how you check your graph construction.
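To illustrate that check, here is a sketch that builds the edges for a single pairwise term using one standard decomposition (not necessarily the exact decomposition pictured above) and verifies that, for every one of the four assignments, the cut cost equals the term's energy:

```python
def build_pairwise(E00, E01, E10, E11):
    """Return (constant, edges) representing one regular pairwise term.

    Edges are (u, v, weight) with non-negative weights; 's' and 't' are
    the source and sink, 'a' and 'b' the two binary variables.
    Decomposition used:
      E = E00 + (E10-E00)*a + (E11-E10)*b + (E01+E10-E00-E11)*(1-a)*b
    """
    assert E00 + E11 <= E01 + E10, "term is not graph-representable"
    const, edges = E00, []

    def add_unary(node, c):  # energy c incurred when node's variable is 1
        nonlocal const
        if c >= 0:
            edges.append(('s', node, c))   # cut when node lands in the sink set (=1)
        else:
            const += c                     # fold the negative part into the constant
            edges.append((node, 't', -c))  # cut when node stays in the source set (=0)

    add_unary('a', E10 - E00)
    add_unary('b', E11 - E10)
    # cut exactly when a=0 (source set) and b=1 (sink set)
    edges.append(('a', 'b', E01 + E10 - E00 - E11))
    return const, edges

def cut_cost(const, edges, assign):
    """Cost of the cut induced by a 0/1 assignment (s is always 0, t always 1):
    sum the weights of edges going from the source set to the sink set."""
    side = dict(assign, s=0, t=1)
    return const + sum(w for u, v, w in edges if side[u] == 0 and side[v] == 1)
```

Enumerating all assignments of a and b and comparing cut_cost against the energy table is the small-scale version of the sanity check: once every term passes it, the cost of any cut of the full graph equals the energy of the corresponding disparity configuration.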

Recommended reading: "What Energy Functions Can Be Minimized via Graph Cuts?" by Vladimir Kolmogorov and Ramin Zabih. Also, "Kolmogorov and Zabih's Graph Cuts Stereo Matching Algorithm" by Vladimir Kolmogorov, Pascal Monasse, and Pauline Tan.