Wednesday, September 6, 2017

3D Image Conversion - Top Gun

In this post, we are gonna look at 2d to 3d image conversion using DMAG4.


That's our good friend Tom Cruise in Top Gun. That's the 2d reference image we are going to "3d-fy".

As you probably know, the input to DMAG4 is the 2d image, a sparse depth map, and possibly what I call an "edge image". The purpose of the edge image is to separate different objects in the scene. DMAG4 will not propagate depths past an edge in the edge image, which means you don't have to worry about the beta parameter.
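To make the role of the edge image concrete, here is a toy Python sketch of edge-aware propagation (my own illustration, not DMAG4's actual solver): depths are repeatedly averaged into neighboring pixels, but a contribution is blocked whenever either pixel lies on a traced edge, so depths can never leak from one object into the next.

```python
import numpy as np

def shifted(a, dy, dx):
    """Shift array by (dy, dx), filling the border with zeros (no wrap)."""
    out = np.zeros_like(a)
    h, w = a.shape
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        a[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def propagate_depths(sparse, known, edges, n_iter=300):
    """Toy edge-aware propagation: average each pixel with its 4
    neighbors, never across an edge pixel, and re-pin the drawn
    depth clues after every pass.
    sparse: float depth clues; known: bool mask of drawn pixels;
    edges: bool mask of traced object boundaries."""
    depth = np.where(known, sparse, 0.0)
    weight = known.astype(float)
    for _ in range(n_iter):
        num = depth * weight
        den = weight.copy()
        for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            # block the contribution if either endpoint is an edge pixel
            ok = ~edges & ~shifted(edges, dy, dx)
            num += ok * shifted(depth * weight, dy, dx)
            den += ok * shifted(weight, dy, dx)
        depth = np.where(den > 0, num / np.maximum(den, 1e-9), depth)
        weight = np.minimum(den, 1.0)
        depth[known] = sparse[known]   # re-pin the original depth clues
        weight[known] = 1.0
    return depth
```

With a vertical edge splitting the image in two, a clue drawn on one side fills that side only; the other side is untouched, exactly the behavior you want from the edge image.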


This is the "edge image". White is actually transparent background.


"Edge image" shown on top of the 2d image.

I drew the "edge image" in Gimp using the "Paths" tool, stroking the path once it was done. Use the "Stroke Line" option, turn the anti-aliasing off, and choose 1 pixel for the width when stroking the path. It takes about 5 minutes to get the "edge image". Because I use an "edge image", DMAG4 can be allowed to propagate depths regardless of color similarity, so it is ok to use a small beta (which makes the bilateral filter in DMAG4 behave like a regular Gaussian filter). Having an "edge image" makes things much easier when it is time to draw the sparse depth map and give the depth clues to DMAG4.


This is the sparse depth map. White is actually transparent background.


Sparse depth map shown on top of the "edge image" and 2d image.

I took a pretty minimalist approach when drawing the sparse depth map, which means it is very sparse. To draw the sparse depth map in Gimp, I use the "Pencil" tool with a hard-edged brush (no anti-aliasing). It takes another 5 minutes to draw the sparse depth map. When creating the sparse depth map, the important thing is the relationship between the various depths, not the actual depths. Make sure that when you zoom in on the brushed areas of the sparse depth map, the edges look jagged: that means no anti-aliasing was applied, which is what we want.
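Since a stray anti-aliased pixel introduces depths you never intended, it is worth checking the brushed areas programmatically. Here is a small Python helper of my own (hypothetical, not part of the DMAG toolchain) that flags blended gray values:

```python
import numpy as np

def has_antialiasing(gray, painted_levels, alpha=None):
    """Check a sparse depth map for anti-aliased (blended) pixels.
    gray: 2-D uint8 array; painted_levels: the exact gray values you
    brushed with; alpha: optional alpha channel, 0 = transparent
    background. Returns True if any painted pixel holds a value you
    never brushed, i.e. the brush was anti-aliased."""
    mask = np.ones(gray.shape, bool) if alpha is None else alpha > 0
    used = np.unique(gray[mask])
    return not set(used.tolist()).issubset(set(painted_levels))
```

If this returns True, zoom in and re-brush the offending spots with a hard-edged pencil.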

We now have everything we need to launch DMAG4 and get the dense depth map.


Dense depth map obtained by DMAG4.

I used the following parameters in DMAG4:
beta = 10
maxiter = 5000
scale_nbr = 1
con_level = 1
con_level2 = 1

Again, because I used an "edge image", I really don't have to worry about depths propagating across objects. This means that I can use a low beta (here, equal to 10). When beta is low, DMAG4 behaves pretty much like a classic Gaussian filter. When beta is large, DMAG4 behaves like a bilateral filter, in other words, it propagates depths only along similar colors, which can be a real problem in some cases. The idea behind using a bilateral filter is that things that are not of the same color probably don't belong to the same object and are probably at different depths. Of course, this is not ideal because you can clearly have different colors within an object. Because of that, if you don't use an "edge image", the sparse depth map may need to be not so sparse after all, and a lot of time is wasted drawing the sparse depth map, running DMAG4, and fixing the sparse depth map before going through another iteration. Conclusion: use an "edge image" and a low beta!
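Here is a tiny Python illustration of the effect. The actual formula and the numerical scale of beta inside DMAG4 are surely different (treat the numbers below as arbitrary), but with a color weight of the common bilateral form exp(-beta * ||color difference||), a low beta gives every neighbor a weight near 1 (Gaussian-like behavior), while a high beta shuts down propagation across color edges.

```python
import math

def color_weight(c1, c2, beta):
    """Propagation weight between two neighboring pixels, assuming a
    color term of the form exp(-beta * ||c1 - c2||) with RGB colors
    in [0, 1]. Purely illustrative: DMAG4's actual formula and the
    scale of its beta parameter may differ."""
    dist = math.dist(c1, c2) / math.sqrt(3.0)   # normalized to [0, 1]
    return math.exp(-beta * dist)
```

Between a red pixel and a blue pixel, a tiny beta leaves the weight near 1 (depths flow freely, as through a Gaussian filter), while a large beta drives it toward 0 (depths stop at the color edge).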


Wiggle/wobble created with wigglemaker.

Thursday, August 24, 2017

Case Study - DMAG5+DMAG9b vs DMAG5b+DMAG9b

In this post, I am gonna try to show that one can use DMAG5b instead of DMAG5 when the baseline is relatively small. I will also show the effects of selected parameters in DMAG9b, which is used to smooth (and sharpen) depth maps.


Left image after rectification by ER9b.


Right image after rectification by ER9b.

I took the stereo pair with an HTC Evo 3d cell phone which has a baseline of 35mm, I believe. The stereo pair is 1920x1080 pixels.


Left depth map obtained with DMAG5.

I used the following parameters for DMAG5:
min disparity for image 1 = -23
max disparity for image 1 = 27
disparity map for image 1 = depthmap_l.png
disparity map for image 2 = depthmap_r.png
occluded pixel map for image 1 = occmap_l.png
occluded pixel map for image 2 = occmap_r.png
radius = 16
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
disparity tolerance = 0
radius to smooth occlusions = 9
sigma_space = 9
sigma_color = 25.5
downsampling factor = 1

Since I am gonna use DMAG9b to smooth and sharpen the depth maps obtained by DMAG5b, it makes sense to also smooth and sharpen the depth map obtained by DMAG5 with DMAG9b.


Left depth map generated by DMAG5 and sharpened by DMAG9b.

I used the following parameters for DMAG9b:
sample_rate_spatial = 32
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 32

Now that we have a point of reference, we can go ahead and see what happens when we use DMAG5b instead of DMAG5. As a reminder, DMAG5b is a very simplistic depth map generator that uses basic SAD (Sum of Absolute Differences) to find matches. This type of depth map generator always fattens object boundaries, and the effect gets more intense as the baseline and/or the radius increases.


Left depth map generated by DMAG5b.

I used the following parameters for DMAG5b:
min disparity for image 1 = -23
max disparity for image 1 = 27
disparity map for image 1 = depthmap_l.tiff
disparity map for image 2 = depthmap_r.tiff
occluded pixel map for image 1 = occmap_l.tiff
occluded pixel map for image 2 = occmap_r.tiff
radius = 16
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
disparity tolerance = 0
radius to smooth occlusions = 9
sigma_space = 9
sigma_color = 25.5

The important parameter here is the radius. DMAG5b smooths the disparities over a (2*radius+1)x(2*radius+1) block window. To reduce the fattening of object boundaries, the only thing one can do is reduce the radius, at the expense of increased noise.
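For reference, here is what basic SAD block matching boils down to, as a Python sketch (in the spirit of DMAG5b, not its actual code): for each pixel, try every disparity in the allowed range and keep the one whose block window of absolute differences is smallest.

```python
import numpy as np

def box_sum(a, radius):
    """Sum of `a` over each (2*radius+1)^2 window, via an integral
    image built on an edge-padded copy."""
    r, k = radius, 2 * radius + 1
    p = np.pad(a, r, mode="edge")
    i = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    i[1:, 1:] = p.cumsum(0).cumsum(1)
    return i[k:, k:] - i[:-k, k:] - i[k:, :-k] + i[:-k, :-k]

def sad_disparity(left, right, d_min, d_max, radius):
    """For each left-image pixel, keep the disparity with the lowest
    sum of absolute gray-level differences over the block window.
    (np.roll gives crude wrap-around border handling.)"""
    h, w = left.shape
    best_cost = np.full((h, w), np.inf)
    disp = np.zeros((h, w), int)
    for d in range(d_min, d_max + 1):
        cost = box_sum(np.abs(left - np.roll(right, d, axis=1)), radius)
        better = cost < best_cost
        disp[better] = d
        best_cost[better] = cost[better]
    return disp
```

Because every pixel inside the window votes for the same disparity, a window that straddles an object boundary drags the foreground disparity onto background pixels, which is exactly the fattening described above, and why shrinking the radius tightens the boundaries.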


Left depth map generated by DMAG5b.

I used the same parameters as before except:
radius = 8

I think it's tighter (than with radius = 16) so we are gonna use that depth map and sharpen it with DMAG9b.


Left depth map generated by DMAG5b and sharpened by DMAG9b.

I used the following parameters for DMAG9b:
sample_rate_spatial = 32
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 32

Now, compare that depth map with the depth map we got with the combo DMAG5+DMAG9b and you will see that there's not a whole lot of difference between the two. So, I think it's fair to say that DMAG5b can safely be used in lieu of DMAG5 as long as the obtained depth map is post-processed by DMAG9b.

Let's see what happens in the depth map produced by DMAG9b when we change the spatial sample rate and the range (color) sample rate.


Left depth map generated by DMAG5b and sharpened by DMAG9b.

I used the same parameters as before for DMAG9b except:
sample_rate_spatial = 32
sample_rate_range = 4


Left depth map generated by DMAG5b and sharpened by DMAG9b.

I used the same parameters as before for DMAG9b except:
sample_rate_spatial = 16
sample_rate_range = 8


Left depth map generated by DMAG5b and sharpened by DMAG9b.

I used the same parameters as before for DMAG9b except:
sample_rate_spatial = 16
sample_rate_range = 4
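Assuming DMAG9b splats the depth map onto a regular spatial/range grid (an assumption on my part; the actual data structure is hashed, hence the hash_table_size parameter), the sample rates simply set the grid resolution, which this little Python helper makes explicit:

```python
import math

def grid_cells(width, height, levels, s_spatial, s_range):
    """Rough resolution of a bilateral-grid style smoother: one cell
    per s_spatial pixels in x and y, one per s_range gray levels.
    Coarser sample rates mean fewer cells and stronger smoothing.
    (Illustrative only; DMAG9b's internals may differ.)"""
    return (math.ceil(width / s_spatial),
            math.ceil(height / s_spatial),
            math.ceil(levels / s_range))
```

On a 1920x1080 pair with 256 gray levels, sample rates of 32/8 give a 60x34x32 grid while 16/4 give 120x68x64, roughly 8 times more cells, hence less smoothing per cell.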

Now, it might be relatively fun to see what happens when we change the smoothing multiplier lambda. I am going back to sample_rate_spatial = 32 and sample_rate_range = 8, changing only lambda. We are gonna make the depth map less smooth first and then more smooth.


Left depth map generated by DMAG5b and sharpened by DMAG9b.

I used the following parameters for DMAG9b:
sample_rate_spatial = 32
sample_rate_range = 8
lambda = 0.025
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 32


Same parameters as before except:
lambda = 2.5

Yeah, clearly, that's oversmoothed. We now know that lambda = 2.5 is way too aggressive and should be dialed back.

Tuesday, June 27, 2017

Interstellar (2d to 3d image conversion)

In this example, I tried to be as minimalist as possible when drawing the sparse depth map used by DMAG4. This is possible only if beta is set quite low. The problem is that when beta is low, bleeding is very likely to occur across object boundaries, which is what we don't want. To circumvent that, you really have to draw an "edge image" which is a trace over the object boundaries.


Reference image.


Sparse depth map.

For the sparse depth map, I used a gradient over the whole image, which I then erased to reveal the main subject. Then, I drew as little as possible to suggest the depths on the main subject.


Edge image.

To draw the edge image, I simply used the "Paths Tool" left-clicking all along the main subject.


Dense depth map produced by DMAG4.

I used:
beta = 10
number of iterations = 5000
scale number = 1
connection level = 1
connection level 2 = 1

Because I used an "edge image" which prevents bleeding across object boundaries, I was able to use a relatively low value for beta to facilitate depth propagation.


Wobble gif created with wigglemaker.

To create a wobble/wiggle, you may also want to use FSG11. The cool thing about FSG11 is that you can provide the background that's gonna be revealed when creating the frames in the form of a second reference image and depth map.


Second reference image.

To create the second reference image, I just took the reference image and used the clone tool to extend the background into the main subject.


Second reference depth map.

The second reference depth map is the gradient that I used to create the sparse depth map.


Wobble/wiggle gif created with FSG11.

Here's a video tutorial for the whole thing:

Tuesday, March 14, 2017

Frame Sequence Generator 11 (FSG11)

FSG11 generates synthetic views given input images and depth maps. If a single image and depth map is provided, it behaves very much like FSG4 but with a subtle difference: FSG11 gets rid of outliers prior to inpainting. If more than one image and depth map is provided, the additional images and depth maps are used directly to inpaint, avoiding the all too familiar blurring effect of most frame sequence generators (FSG4 included).
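The layered idea can be sketched in a few lines of Python (a toy forward warp of my own, not FSG11's actual algorithm; the function name and the depth-proportional shift are simplifications): warp every layer, let the nearest depth win, and disoccluded pixels automatically pick up the warped background instead of staying empty.

```python
import numpy as np

def synth_view(layers, shift_scale):
    """Toy forward warp: each pixel moves horizontally by
    shift_scale * depth (depth in [0, 1], 1 = near) and the nearest
    depth wins (a z-buffer). Pass [(background, bg_depth),
    (main, main_depth)] and disoccluded pixels get filled from the
    warped background instead of being left empty."""
    h, w = layers[0][0].shape
    out = np.zeros((h, w))
    out_z = np.full((h, w), -np.inf)
    for img, z in layers:
        for y in range(h):
            for x in range(w):
                tx = x + int(round(shift_scale * z[y, x]))
                if 0 <= tx < w and z[y, x] > out_z[y, tx]:
                    out[y, tx] = img[y, x]
                    out_z[y, tx] = z[y, x]
    return out
```

With only the main layer, the pixels uncovered by the shifted foreground stay empty and must be inpainted (blurrily); with a background layer, they are simply revealed.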

Generating the synthetic views using a single (main) image and depth map:


First (main) image


First (main) depth map.


Animated gif showing the synthetic views generated by FSG11.

Yep, it's definitely blurry but I actually do think it looks pretty good (because it's silky smooth). Does inpainting the actual background give better results? To do that, you need a background image and its associated depth map, which can be fed to FSG11.

Generating the synthetic views using an additional image and depth map (for the background):


Second image.


Second depth map.


Animated gif showing the synthetic views produced by FSG11.

I felt the need to write FSG11 in the context of 2d to 3d image conversion. Since you have to spend time generating the (main) depth map, why not spend a tad more time generating a second image and depth map for the background? I use the clone tool and eat away at the foreground to generate the background image. The associated background depth map is usually quite simple since the foreground objects are supposed to be gone. In some cases, it can even be pure black for a background at infinity.

The windows executable (guaranteed to be virus free) is available for free via the 3D Software Page.

Friday, March 10, 2017

Lenticular Creation From Stereo Pairs Using Free Software

I have written a technical report which explains how to create a lenticular (assuming you already have the lenticular lenses at your disposal) when the starting point is either a stereo pair taken by a stereo camera, a couple of images of a static scene taken with a regular camera (using the very cool cha-cha method, for example), or an image and a depth map (perhaps resulting from a 2d to 3d image conversion).

Here's the link: Lenticular Creation From Stereo Pairs Using Free Software.

Bonus gif that goes with the paper:


Animated gif consisting of 10 frames produced by FSG4.

Sunday, March 5, 2017

3D Photos - Posing in front of the big column

The original stereo pair was 3603x2736 pixels (provided by my good friend Mike). I chose to reduce it by 50% (for convenience) to end up with a stereo pair of size 1802x1368 pixels. The first step is to rectify the images in order to end up with matching pixels on horizontal lines, a requirement for most automatic depth map generators. Here, I am using ER9b but it's probably ok to rectify/align with StereoPhoto Maker.


Left image of stereo pair rectified by ER9b.


Right image of stereo pair rectified by ER9b.

ER9b gives:
min disparity = -53
max disparity = 1

We are gonna use those as input to the automatic depth map generator. The min and max disparities may also be obtained manually with DF2.

We are gonna use DMAG5 (first using a large radius and then using a small radius) followed by DMAG9b to get the depth map. I could have used other automatic depth map generators but I kinda like DMAG5 because it's fast and usually pretty good.

Let's start by using a large radius (equal to 32). Parameters used in DMAG5 (Note that I use a downsampling factor equal to 2 instead of 1 to speed things up.):

radius = 32
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
disparity tolerance = 0
radius to smooth occlusions = 9
sigma_space = 9
sigma_color = 25.5
downsampling factor = 2


Left depth map generated by DMAG5.

Let's follow up with DMAG9b to improve the depth map. Parameters used in DMAG9b:

sample_rate_spatial = 16
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 4


Left depth map generated by DMAG9b.

It's time now to use a small radius in DMAG5 (equal to 4). Parameters used in DMAG5:

radius = 4
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
disparity tolerance = 0
radius to smooth occlusions = 9
sigma_space = 9
sigma_color = 25.5
downsampling factor = 2


Left depth map generated by DMAG5.

Let's follow up with DMAG9b to improve the depth map. Parameters used in DMAG9b (same as before):

sample_rate_spatial = 16
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 4


Left depth map generated by DMAG9b.

I am gonna go with the depth map obtained using the small radius. Is it the best depth map that could be obtained automatically? Probably not, because one could have tweaked the parameters used in DMAG5 and DMAG9b further. Also, one could have tried using DMAG2, DMAG3, DMAG5b, DMAG5c, DMAG6, or DMAG7 instead of DMAG5 to get the initial depth map. That's a whole lot of variables to worry about. Anyways, now it is time to generate synthetic frames with FSG4 using the left image and the left depth map (and going on either side).

Parameters used for FSG4:

stereo window (grayscale value) = 128
stereo effect = 5
number of frames = 12
radius = 2
gamma proximity = 12
maximum number iterations = 200


Synthetic frames generated by FSG4 (in animated gif form).

Inpainting is typically done by applying a Gaussian blur, which explains why inpainted areas look blurry. FSG6 produces synthetic frames of better quality because the right image and depth map are also used to inpaint. However, with FSG6, the synthetic frames are limited to being between the left and right images.
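A minimal Python sketch of that kind of inpainting (my own toy version, not FSG4's code): hole pixels repeatedly take the average of their known neighbors, which diffuses, i.e. blurs, the surrounding colors into the hole.

```python
import numpy as np

def blur_fill(img, hole, n_iter=100):
    """Fill hole pixels by repeatedly averaging whatever known or
    already-filled 4-neighbors they have: the surrounding colors
    diffuse (blur) into the hole, converging to a smooth fill.
    (np.roll wraps at the image border; good enough for a toy.)"""
    out = img.astype(float).copy()
    filled = ~hole
    for _ in range(n_iter):
        num = np.zeros(out.shape)
        den = np.zeros(out.shape)
        for shift in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            num += np.roll(np.where(filled, out, 0.0), shift, axis=(0, 1))
            den += np.roll(filled.astype(float), shift, axis=(0, 1))
        update = hole & (den > 0)
        out[update] = num[update] / den[update]
        filled = filled | update
    return out
```

The filled values are smooth blends of the hole's surroundings, which is precisely why inpainted disocclusions look soft instead of showing real background texture.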

Now, if the object of the game was to create a lenticular, those synthetic views would be fed to either SuperFlip or LIC (Lenticular Image Creator) to create an interlaced image. The fun would not stop there, however, as this interlaced image would have to be printed on paper and then glued to a lenticular lens. Yes, it is indeed a whole lot of work!
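The basic interlacing step can be sketched in Python (a bare-bones illustration; SuperFlip and LIC also reverse the view order under each lens and resample to match the lens pitch and printer DPI, none of which is done here):

```python
import numpy as np

def interlace(views):
    """Column-interlace N same-shaped (h, w) views: output column x
    is taken from view x mod N, so each lenticule covers one column
    from every view. Lens-pitch resampling and per-lens view-order
    reversal are deliberately omitted from this sketch."""
    n = len(views)
    h, w = views[0].shape
    out = np.empty((h, w * n), dtype=views[0].dtype)
    for i, v in enumerate(views):
        out[:, i::n] = v   # view i supplies columns i, i+n, i+2n, ...
    return out
```

Each strip of N consecutive output columns then sits under one lenticule, which shows a different column (and thus a different view) depending on the viewing angle.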

3D Photos - Summer Palace

In this post, we are gonna try to get the best possible depth map for a stereo pair provided by my good friend Gordon. The size of the images is 1200x917 pixels, so about 1 megapixel.


Left image (after rectification by ER9b).


Right image (after rectification by ER9b).

ER9b gives us:
min disparity = -82
max disparity = 7

Let's turn to our favorite automatic depth map generator, DMAG5, to get the depth map. Here, we are gonna use a downsampling factor of 2 to speed things up.

Let's start with the following parameters for DMAG5:

radius = 16
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
disparity tolerance = 0
radius to smooth occlusions = 9
sigma_space = 9
sigma_color = 25.5
downsampling factor = 2


Left depth map generated by DMAG5.


Left occlusion map generated by DMAG5.

Not a very good depth map! Unfortunately, we have occluded pixels to the right of Gordon and at the top of his head. The occluded pixels on the left are totally expected.

Let's call on DMAG9b to shake things up and improve the depth map.

Parameters we are gonna use in DMAG9b:

sample_rate_spatial = 16
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 4


Left depth map generated by DMAG9b.


Confidence map generated and used by DMAG9b. Black is low confidence and white is high confidence.

Better, but it looks like it is gonna be a tough one. Let's try something else by reducing the radius used in DMAG5 and post-processing again with DMAG9b.

Let's use the following parameters for DMAG5:

radius = 4
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
disparity tolerance = 0
radius to smooth occlusions = 9
sigma_space = 9
sigma_color = 25.5
downsampling factor = 2


Left depth map generated by DMAG5.


Left occlusion map.

Clearly, there is a lot more noise but we are hoping the less smoothed and more accurate depths will give better results in DMAG9b.

Parameters we are gonna use in DMAG9b (same as before):

sample_rate_spatial = 16
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 4


Depth map generated by DMAG9b.


Confidence map used and generated by DMAG9b.

I think it might be possible to improve the depth map further, either by tweaking the parameters used in DMAG5 or by using another automatic depth map generator like DMAG2, DMAG3, DMAG5b, DMAG5c, or DMAG6.