## Saturday, October 11, 2014

### View Interpolation (forward and backward mapping/warping)

Given a stereo pair and two depth maps, the problem of getting an intermediate frame is known as view interpolation. The following blurb is heavily inspired by the academic paper "Fast View Interpolation from Stereo: Simpler can be Better" by N. Martin and S. Roy. We are going to look at two ways to get the interpolated image: forward mapping (warping) and backward mapping (warping). In both cases, an interpolated image is built using the left image and depth map, another using the right image and depth map, and the two are combined to form the final interpolated image. The interpolated image is defined by the parameter alpha (alpha = 0 corresponds to the left image and alpha = 1 corresponds to the right image).

Let's look at how the left and right interpolated images are defined:

Maybe it's easier to grasp if you think about shifting pixels. To get the left interpolated image, you shift the left image's pixels to the left according to the left depth map. To get the right interpolated image, you shift the right image's pixels to the right according to the right depth map. It's as simple as that.

#### Forward Mapping (Warping)

Here's the typical pseudo code to get the left interpolated image using forward mapping:
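In Python, that forward mapping might look like the following sketch (the function name, the nearest-integer rounding policy, and the `filled` mask are my choices, not the article's; `dL` is assumed to hold disparities in pixels):

```python
import numpy as np

def forward_warp_left(IL, dL, alpha):
    """Forward-map the left image toward the intermediate view.

    IL    : left image, shape (H, W) or (H, W, 3), float
    dL    : left disparity map in pixels, shape (H, W)
    alpha : 0 -> left view, 1 -> right view
    """
    H, W = dL.shape
    IM = np.zeros_like(IL)            # left-based intermediate image
    filled = np.zeros((H, W), bool)   # which pixels received a color
    for y in range(H):
        for xL in range(W):
            # target position in the intermediate view;
            # rounding to the nearest integer is what causes cracks
            xM = int(round(xL - alpha * dL[y, xL]))
            if 0 <= xM < W:
                # note: no z-buffering here; a fuller version would keep
                # the source with the largest disparity when two collide
                IM[y, xM] = IL[y, xL]
                filled[y, xM] = True
    return IM, filled
```

Pixels of the intermediate image that never receive a color (the `filled` mask stays false there) are the cracks and holes discussed below.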

Clearly, the term `xL - alpha*dL(xL)` is not an integer in most cases. The easiest way to deal with this is to round it to the nearest integer. A more involved way is to distribute color contributions to the two nearest pixels on either side of xM'. That is beyond the scope of this blurb, but it is known as "splatting" if you want to delve into it. The resulting interpolated image will have cracks and holes. The cracks come from the nearest-integer rounding (no big deal) and the holes come from parts of the scene that are disoccluded, that is, newly revealed in the intermediate view.

The right interpolated image can be obtained in a similar fashion. Like the left interpolated image, it will exhibit cracks and holes. When the two are combined, it is hoped that all holes will be filled. In reality, it can be a bit more complicated than that since, for a given pixel of the interpolated image, one can decide whether the color information should come from the left image only, the right image only, or both.
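One possible per-pixel combination policy is sketched below (the alpha-dependent weighting and the mask handling are my assumptions; the article deliberately leaves the choice open):

```python
import numpy as np

def combine_forward_warps(IM_L, filled_L, IM_R, filled_R, alpha):
    """Combine the two forward-warped images pixel by pixel.

    Where both contributions exist, blend them with alpha-dependent
    weights; where only one exists, use it; where neither exists the
    pixel remains a hole to be filled later.
    """
    out = np.zeros_like(IM_L)
    both = filled_L & filled_R
    out[both] = (1 - alpha) * IM_L[both] + alpha * IM_R[both]
    only_L = filled_L & ~filled_R
    out[only_L] = IM_L[only_L]
    only_R = filled_R & ~filled_L
    out[only_R] = IM_R[only_R]
    return out, filled_L | filled_R
```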

As a side note, if you don't have a right depth map, and therefore no right interpolated image, holes (coming from the left image and left depth map) are usually filled by taking the first non-empty pixel to the right and using its color, line by line. It's easy to spot as it produces very distinctive streaks. Another option is to propagate colors toward the left while considering the whole image (as opposed to line by line).
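The line-by-line fill described above might look like this (a sketch; sweeping right to left propagates the first non-empty pixel's color into each hole):

```python
import numpy as np

def fill_holes_scanline(IM, filled):
    """Fill holes line by line with the first non-empty pixel to the
    right. This simple fill produces the telltale horizontal streaks
    when used on its own.
    """
    H, W = filled.shape
    out = IM.copy()
    for y in range(H):
        # sweep right to left, copying from the right neighbor;
        # a hole at the very right edge of a line stays unfilled
        for x in range(W - 2, -1, -1):
            if not filled[y, x]:
                out[y, x] = out[y, x + 1]
    return out
```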

#### Backward Mapping (Warping)

The idea behind backward mapping is that, given a pixel xM in the intermediate image, you want to be able to get its color by linear interpolation on the left (right) image. Because of this, the interpolated image is guaranteed to have no cracks or holes. That doesn't mean it will be perfect: the holes you would get with forward mapping won't be there, but their filling (inpainting) might not be the best.

Here's some pseudo code to get the left interpolated image using backward mapping (warping):
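One way to write those two loops in Python (a sketch based on the description here and the clarification in the comments; the names are mine, and the inner segment search is brute force for clarity):

```python
import numpy as np

def backward_warp_left(IL, dL, alpha):
    """Backward-map: for each intermediate pixel xM, find the warped
    left-image segment that contains it and linearly interpolate the
    color from the left image. No cracks or holes by construction.

    IL    : left image, shape (H, W) or (H, W, 3), float
    dL    : left disparity map in pixels, shape (H, W)
    alpha : 0 -> left view, 1 -> right view
    """
    H, W = dL.shape
    IM = np.zeros_like(IL)
    for y in range(H):
        # 1st loop: where each left pixel lands in the intermediate view
        xMp = np.arange(W) - alpha * dL[y]
        # 2nd loop: for each intermediate pixel, search the segments
        for xM in range(W):
            best_d = -np.inf
            for xL in range(W - 1):
                a, b = xMp[xL], xMp[xL + 1]
                if min(a, b) <= xM <= max(a, b):
                    # prefer the segment with the largest disparity
                    # (the most foreground object)
                    d = max(dL[y, xL], dL[y, xL + 1])
                    if d > best_d:
                        best_d = d
                        t = 0.0 if a == b else (xM - a) / (b - a)
                        IM[y, xM] = (1 - t) * IL[y, xL] + t * IL[y, xL + 1]
    return IM
```

As the next paragraph notes, the inner search doesn't actually need to scan the whole scanline; it only needs to cover alpha times the stereo pair's disparity range.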

The above can be done scanline by scanline (scanline is a fancy way of saying horizontal line). There might be more than one segment that contains xM; in that case, it's a good idea to consider the segment corresponding to the largest disparity (the most foreground object). Also, the segment search only needs to be done within the minimum and maximum disparity (times alpha) that corresponds to the stereo pair.

The right interpolated image can be obtained in a similar fashion. The two interpolated images are then combined to give the final interpolated image.

1. Can you elaborate on the backward mapping approach?

I get the first for loop: I create a new picture by shifting the pixels with the disparity to the other camera and save it as xM'.

Then, for my empty image IM I go through every pixel xM, and check if my pixel xM, which is empty, is in some kind of range?
That does not make sense; I only know the index of the pixel.
Also, what is xL doing down there?

1. The 1st loop creates a lookup table that is used in the 2nd loop. x'M[] is an array linking xL (a pixel in the left image) to the corresponding pixel in the intermediate view, called x'M (same name as the array, which was probably an unfortunate choice). The xL in the 2nd loop is the index into the array x'M that was created in the 1st loop.

2. Thank you for your articles, very informative!

I'm wondering what you think of the optical flow approach (for example, used by slowmo - https://github.com/slowmoVideo/slowmoVideo/wiki). Is that a good approach for generating intermediate views? How does it compare with the approach described in this article?

2. This is an answer to the question above about the optical flow approach:
For generating intermediate frames between the left and right images of a stereo pair, and assuming you have the left and right depth maps, I don't think you can do much better than what is presented here. Optical flow is a much more general approach and a much more difficult problem, which means a lot of errors can be generated in the various processes involved. I have used optical flow to create a depth map from a stereo pair and it's not exactly easy (see DMAG).

A separate question. Does the algorithm discussed in this article lend itself to parallel processing? Say I have the left/right depth maps generated ahead of time. Can the algorithm be run at 20-30fps so intermediate views can be dynamically generated in real time?

2. It looks easy to parallelize. I have implemented the sequential version under the name FSG6 if you want to try it.

3. Hi Ugo:

I tried both DMAG6 and FSG6. The intermediate frames generated by FSG6 seem to be cross-dissolving between the left/right views rather than a view from a novel point. This is quite apparent near the center (alpha = 0.5), where you can see the "ghosting" of both original views. I'm wondering if I'm doing something wrong, or maybe I'm not setting parameters correctly? The last sentence of your article says "The right interpolated image can be obtained in a similar fashion. The two interpolated images are then combined to give the final interpolated image." How are the interpolated images combined? For each pixel, are you picking it from one of the interpolated images (based on some criteria), or are you literally mixing/combining them based on alpha? If you are doing the latter, it might explain the "ghosting" behavior?

3. Dear Anonymous:
It would probably be better to continue the conversation via email. If you are using DMAG6, the right depth map that you are getting needs to be color inverted for FSG6 to work correctly; in other words, white should be near and black should be far for FSG6 to work.

1. Anonymous:
With backward mapping (I think that's what I use in FSG6), you compute an intermediate image considering the left image and one considering the right image, so you get 2 intermediate images. You fuse them using weights: if you are closer to the left image than the right, you put more weight on the left image. I don't remember the details of whether you always fuse the 2 images. Also, keep in mind that it is assumed the depth maps are perfect, which is never the case in practice. So no matter how sophisticated the algorithm, you'll never get perfect intermediate images. In any case, I recommend improving the depth maps using FMAG9b after DMAG6. See http://3dstereophoto.blogspot.com/2016/11/3d-photos-pumpkins.html