Saturday, October 11, 2014

View Interpolation (forward and backward mapping/warping)

Given a stereo pair and two depth maps, the problem of getting an intermediate frame is known as view interpolation. The following blurb is very inspired by this academic paper: Fast View Interpolation from Stereo: Simpler can be Better by N. Martin and S. Roy. We are gonna look at two ways to get the interpolated image: forward mapping (warping) and backward mapping (warping). In both cases, an interpolated image using the left image and depth map and an interpolated image using the right image and depth map are built and they are combined to form the final interpolated image. The interpolated image is defined by the parameter alpha (alpha = 0 corresponds to the left image and alpha = 1 corresponds to the right image).

Let's look at how the left and right interpolated images are defined:

Maybe it's easier to grasp if you think about shifting pixels. To get the left interpolated image, you shift the left image's pixels to the left according to the left depth map. To get the right interpolated image, you shift the right image's pixels to the right according to the right depth map. It's as simple as that.

Forward Mapping (Warping)

Here's the typical pseudo code to get the left interpolated image using forward mapping:

Clearly, the term xL-alpha*dL(xL) is not an integer in most cases. The easiest way to deal with this problem is to round it to the nearest integer. The hardest way is probably to add color contributions to the 2 nearest pixels on either side of xM'. This is way beyond the scope of this blurb but it is known as "splatting" if you want to delve into it. The resulting interpolated image will have cracks and holes. The cracks come from that nearest integer business (no big deal) and the holes come from scenes in the image that are now revealed.

The right interpolated image can be obtained in a similar fashion. As for the left interpolated image, the right interpolated image will exhibit cracks and holes. When the two are combined, it is hoped that all holes will be filled. In reality, it can be a little bit more complicated than that as, for a a given pixel of the interpolated image, one can decide whether the color information should come from the left image only, the right image only, or both.

As a side note, if you don't have a right depth map and therefore there is no right interpolated image, holes (coming from a left image and a left depth map) are usually filled by getting the first non-empty pixel to the right and using its color to fill the hole, line by line. It's easy to spot as it produces very distinctive streaks. Another option is to propagate its color toward the left but considering the whole image (as opposed to line by line).

Backward Mapping (Warping)

The idea behind backward mapping is that, given a pixel xM in the intermediate image, you want to be able to get its color by using linear interpolation on the left (right) image. Because of this, the interpolated image will be guaranteed to have no cracks or holes. It doesn't mean the interpolated image will be perfect. The holes that you would get with a forward mapping won't be there but their filling (inpainting) might not be the best.

Here's some pseudo code to get the left interpolated image using backward mapping (warping):

This above can be done scanline by scanline (Scanline is a fancy way of saying horizontal line.) There might be more than one segment that contains xM. In that case, it's a good idea to consider the segment corresponding to the largest disparity (the most foreground object). Also, the segment search needs only be done within the minimum and maximum disparity (times alpha) that corresponds to the stereo pair.

The right interpolated image can be obtained in a similar fashion. The two interpolated images are then combined to give the final interpolated image.


  1. Can you elaborate on the backward mapping approach?

    I get the first for loop: I create a new picture by shifting the pixels with the disparity to the other camera and save it as xM'.

    Then, for my empty image IM I go through every pixel xM, and check if my pixel xM, which is empty, is in some kind of range?
    That does not make sense; I only know the index of the pixel.
    Also, what is xL doing down there?

    1. The 1st loop creates some kind of lookup table that's gonna be used in the 2nd loop. x'M[] is an array linking xL (pixel in left image) to corresponding pixel in intermediate view called x'M (same name as the array which was probably unfortunate). The xL in the 2nd loop is the index of the array x'M that was created in the 1st loop.