Sunday, May 12, 2019

Case Study - DMAG5/DMAG9b vs DMAG5b/DMAG9b

This post compares the depth map produced by the DMAG5/DMAG9b combo with the one produced by the DMAG5b/DMAG9b combo. Thanks to my good friend Katsuhiko Inoue for providing the stereo pair (taken in portrait mode with an iPhone X).


Left image of stereo pair after rectification by ER9b.


Right image of stereo pair after rectification by ER9b.

I do not know how the right image was obtained. It certainly was not obtained from a portrait mode photo taken with the dual lens, because it's not possible to extract the right image from an iPhone X portrait mode photo. Even if you could extract it, it would not have the same focal length as the left image (the two back cameras are a wide-angle lens and a telephoto lens), meaning you would need specialized depth map generation software to get the depth map. Here, I am talking about the dual-lens iPhone X (back-facing camera system), not the TrueDepth sensor (front-facing camera system). I think the depth map produced by the iPhone X was used to create a synthetic right image with 3DSteroid Pro or StereoPhoto Maker. Basically, what I am gonna do here is see if I can recover the original depth map from the left image and a synthetic right image.
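If that guess is right, the synthetic right image comes from shifting each left-image pixel horizontally by an amount derived from the depth map (depth-image-based rendering). The sketch below is only a generic illustration of that idea on a single scanline, not what 3DSteroid Pro or StereoPhoto Maker actually do. The function name is hypothetical, shifts are assumed to be integers, and when two pixels land on the same spot, the one with the larger shift is assumed to be the nearer one and wins. Spots that nothing lands on are left as holes (None), which a real tool has to fill somehow.

```python
def synthesize_right_scanline(left, shift):
    """Forward-warp one scanline of the left image into a synthetic right view.

    Left pixel x lands at x + shift[x]. If two pixels land on the same spot,
    the one with the larger shift wins (assumed here to be the nearer one).
    Spots nothing lands on stay None: these are the disoccluded holes.
    """
    n = len(left)
    right = [None] * n
    winning_shift = [None] * n
    for x in range(n):
        xr = x + shift[x]
        if 0 <= xr < n and (winning_shift[xr] is None or shift[x] > winning_shift[xr]):
            right[xr] = left[x]
            winning_shift[xr] = shift[x]
    return right


print(synthesize_right_scanline([1, 2, 3, 4, 5], [0, 0, 1, 1, 0]))
# [1, 2, None, 3, 4]: the value 5 is covered by the value 4, index 2 is a hole
```

Holes like the one at index 2 are one reason a synthetic right image is not quite the same thing as a real one.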

The original stereo pair is 3024x4032. I reduced it to 1800x2400 so that DMAG9b would run faster. The only reason I ran ER9b was to get the min and max disparities; the original stereo pair looks like it was already very well aligned. Because the baseline is so small, you don't want to reduce the image size too much; otherwise, as far as DMAG5 and DMAG5b are concerned, you are going to get a depth map with few depth levels (shades of gray). Note that the number of depth levels is equal to the difference between the max and min disparities plus one. So, for example, if the min disparity is -44 and the max disparity is 10, you are gonna get 10 - (-44) + 1 = 55 depth levels (shades of gray) in the depth map produced by DMAG5 or DMAG5b. Something to consider.
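The depth-level arithmetic above fits in a couple of lines. Since disparities are measured in pixels, they shrink in proportion to the image width when the pair is resized. The helper names below are hypothetical, and the 900-pixel width in the second call is just an illustrative value showing what halving the 1800-pixel pair again would do to the -44/10 range.

```python
def depth_levels(dmin, dmax):
    """Number of integer disparities (gray levels) from dmin to dmax, inclusive."""
    return dmax - dmin + 1


def rescale_disparity_range(dmin, dmax, old_width, new_width):
    """Disparities are in pixels, so they scale with the image width."""
    scale = new_width / old_width
    return round(dmin * scale), round(dmax * scale)


print(depth_levels(-44, 10))  # 55 at 1800 pixels wide
dmin, dmax = rescale_disparity_range(-44, 10, 1800, 900)
print(dmin, dmax, depth_levels(dmin, dmax))  # -22 5 28
```

Halving the width roughly halves the number of gray levels, which is why shrinking a small-baseline pair too much hurts.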

Now, let's run DMAG5 using the following input file:

image 1 = ../er9b/image_l.png
image 2 = ../er9b/image_r.png
min disparity for image 1 = -44
max disparity for image 1 = 10
disparity map for image 1 = depthmap_l.png
disparity map for image 2 = depthmap_r.png
occluded pixel map for image 1 = occmap_l.png
occluded pixel map for image 2 = occmap_r.png
radius = 16
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
disparity tolerance = 0
radius to smooth occlusions = 9
sigma_space = 9
sigma_color = 25.5
downsampling factor = 2

I believe those are the default values in StereoPhoto Maker.
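The DMAG5 and DMAG9b input files in this post share a plain `key = value` layout, and the DMAG9b runs below change only one or two values at a time. A minimal Python sketch for reading such a file, overriding a value, and writing it back might look like this. The function names are hypothetical, and reading `^` as exponentiation in a value like `255^2*10^-4` is an assumption about how the epsilon line is meant (under that reading it evaluates to 6.5025, i.e. 10^-4 on a 0-255 intensity scale).

```python
import ast
import operator

# Arithmetic operators allowed in a value such as "255^2*10^-4".
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Pow: operator.pow}


def _eval_node(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval_node(node.operand)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    raise ValueError("not plain arithmetic")


def evaluate(value):
    """Return the number a value denotes (treating ^ as a power), or the string itself."""
    try:
        return _eval_node(ast.parse(value.replace("^", "**"), mode="eval").body)
    except (SyntaxError, ValueError):
        return value  # file names and other non-numeric values


def parse_input_file(text):
    """Read 'key = value' lines into an insertion-ordered dict of raw strings."""
    params = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            params[key.strip()] = value.strip()
    return params


def render_input_file(params):
    """Write the parameters back in the same 'key = value' layout."""
    return "".join(f"{key} = {value}\n" for key, value in params.items())


# Usage: derive a variant with a different sample_rate_spatial.
dmag9b = parse_input_file("sample_rate_spatial = 32\nsigma (confidence map) = 2\n")
variant = {**dmag9b, "sample_rate_spatial": "16"}
print(render_input_file(variant))
print(evaluate("255^2*10^-4"))  # epsilon, about 6.5025
```

Keeping the values as raw strings means a file can be read and written back unchanged; numbers are only evaluated when you ask for them.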


Left depth map obtained by DMAG5.

If you want to experiment, you could change the radius. Maybe try 8 or 32 instead of 16 and see what happens. You may also want to set the downsampling factor to 1 instead of 2. It will take longer, but you will get more depth levels (shades of gray) in the depth map.

Let's run DMAG9b using the following input file:

reference image = ../../er9b/image_l.png
input disparity map = ../depthmap_l.png
sample_rate_spatial = 32
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 2
output depth map image = depthmap_l_dmag9b.png

I believe those are the defaults in StereoPhoto Maker except for sigma. Here, I am using sigma = 2.0, whereas SPM uses 32.0. To be honest, I don't think it matters much. Recall that the lower the sigma, the less confidence is given to the depth in the input depth map.
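The input file does not say exactly how the confidence map is computed, so the following is only a hypothetical model that uses the same four parameters. Neighbors within `radius` are weighted by spatial proximity (`gamma proximity`) and color similarity (`gamma color similarity`), the weighted variance of their input depths is measured, and a Gaussian of width `sigma` turns that variance into a confidence between 0 and 1. The function name and the exact weighting formula are assumptions, not DMAG9b's code. Under this model, a smaller sigma gives less confidence to the same depth variation.

```python
import math


def confidence_at(depth, gray, x, y, radius=12, gamma_p=12.0, gamma_c=12.0, sigma=2.0):
    """Hypothetical confidence for pixel (x, y) of an input depth map.

    Neighbors within `radius` are weighted by proximity (gamma_p) and color
    similarity (gamma_c); the weighted variance of their depths is mapped to
    a confidence in (0, 1] with a Gaussian of width sigma.
    """
    h, w = len(depth), len(depth[0])
    samples = []
    for j in range(max(0, y - radius), min(h, y + radius + 1)):
        for i in range(max(0, x - radius), min(w, x + radius + 1)):
            dist = math.hypot(i - x, j - y)
            color_diff = abs(gray[j][i] - gray[y][x])
            weight = math.exp(-dist / gamma_p - color_diff / gamma_c)
            samples.append((weight, depth[j][i]))
    total = sum(wt for wt, _ in samples)
    mean = sum(wt * d for wt, d in samples) / total
    var = sum(wt * (d - mean) ** 2 for wt, d in samples) / total
    return math.exp(-var / (2.0 * sigma ** 2))


# A vertical depth edge on a flat-colored 5x5 patch: the same edge pixel gets
# a much lower confidence with sigma = 2 than with sigma = 32.
gray = [[128] * 5 for _ in range(5)]
depth = [[0, 0, 10, 10, 10] for _ in range(5)]
print(confidence_at(depth, gray, 2, 2, radius=2, sigma=2.0))
print(confidence_at(depth, gray, 2, 2, radius=2, sigma=32.0))
```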


Confidence map. White means high confidence in the input depth, black means little confidence. Since sigma is relatively low, the black streaks (poor confidence) are quite prominent.


Depth map produced by DMAG9b.

Let's change sigma from 2.0 to 32.0 and run DMAG9b using the following input file:

reference image = ../../er9b/image_l.png
input disparity map = ../depthmap_l.png
sample_rate_spatial = 32
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 32
output depth map image = depthmap_l_dmag9b.png


Confidence map. Since sigma is relatively high, the black streaks (poor confidence) are pretty narrow.


Depth map produced by DMAG9b.

Not a whole lot of difference, so I am gonna continue with sigma = 2.0. Let's change sample_rate_spatial from 32 to 16 and run DMAG9b using the following input file:

reference image = ../../er9b/image_l.png
input disparity map = ../depthmap_l.png
sample_rate_spatial = 16
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 2
output depth map image = depthmap_l_dmag9b.png


Depth map produced by DMAG9b.

I think it's a bit better, so let's continue the trend, change sample_rate_spatial from 16 to 8, and run DMAG9b using the following input file:

reference image = ../../er9b/image_l.png
input disparity map = ../depthmap_l.png
sample_rate_spatial = 8
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 2
output depth map image = depthmap_l_dmag9b.png


Depth map produced by DMAG9b.

I think I hit the sweet spot, so I am gonna stop here. Note that as sample_rate_spatial goes down, the CPU time for DMAG9b goes up.
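DMAG9b's parameters (sample_rate_spatial, sample_rate_range, hash_table_size) suggest a bilateral-grid solver in the style of the fast bilateral solver, where pixels are binned into grid cells and the smoothing problem is solved on the cells. One rough way to see why CPU time goes up as sample_rate_spatial goes down is to count those cells. This sketch is not DMAG9b's code: it simplifies the grid to three dimensions (x, y, and grayscale intensity, whereas the fast bilateral solver uses color), and the function names are hypothetical. For an 1800x2400 image, each halving of sample_rate_spatial roughly quadruples the number of spatial cells, and so roughly the size of the system to solve.

```python
import math


def spatial_cells(width, height, sample_rate_spatial):
    """Number of (x, y) grid cells covering the image at a given spatial sample rate."""
    return math.ceil(width / sample_rate_spatial) * math.ceil(height / sample_rate_spatial)


def occupied_vertices(gray, sample_rate_spatial, sample_rate_range):
    """Distinct (x, y, intensity) cells that a simplified 3D bilateral grid would use."""
    cells = set()
    for y, row in enumerate(gray):
        for x, g in enumerate(row):
            cells.add((x // sample_rate_spatial, y // sample_rate_spatial,
                       g // sample_rate_range))
    return len(cells)


for s in (32, 16, 8):
    print(s, spatial_cells(1800, 2400, s))  # 32 -> 4275, 16 -> 16950, 8 -> 67500
```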

Because the interocular distance (baseline) is small, it can be worthwhile to use DMAG5b instead of DMAG5 to get the initial depth map. DMAG5b is a very simple algorithm, and it does not perform well at object boundaries when the baseline used to take the stereo pair is (relatively) large. Here, it should perform OK since the pair was taken with an iPhone with dual cameras.
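The post does not describe DMAG5b's algorithm, so here, purely as a generic stand-in, is the simplest kind of local stereo matcher: compare a small window around each pixel with shifted windows in the other scanline (sum of absolute differences) and keep the shift with the lowest cost (winner takes all). With a large baseline, windows near object boundaries mix foreground and background and include pixels that are occluded in the other view, which is where such simple matchers go wrong. The function name and the convention that left pixel x matches right pixel x + d are assumptions.

```python
import random


def sad_scanline(left, right, dmin, dmax, radius=2):
    """Winner-takes-all disparity for each pixel of one scanline.

    Convention (an assumption): left pixel x matches right pixel x + d,
    for integer d in [dmin, dmax]. The cost is the mean absolute difference
    over the overlapping part of a window of half-width `radius`.
    """
    n = len(left)
    disparities = []
    for x in range(n):
        best_d, best_cost = None, float("inf")
        for d in range(dmin, dmax + 1):
            diffs = [abs(left[x + k] - right[x + k + d])
                     for k in range(-radius, radius + 1)
                     if 0 <= x + k < n and 0 <= x + k + d < n]
            if diffs:
                cost = sum(diffs) / len(diffs)
                if cost < best_cost:
                    best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


# Usage: a random-textured scanline shifted right by 3 pixels.
rng = random.Random(42)
left = [rng.randint(0, 255) for _ in range(60)]
right = [0, 0, 0] + left[:-3]  # right[i] == left[i - 3]
print(sad_scanline(left, right, -5, 5)[8:52])  # expect 3 for these interior pixels
```

Pixels near the ends of the scanline, where part of the true match falls outside the image, are the one-dimensional analogue of occlusions and are the ones most likely to come out wrong.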

Let's run DMAG5b using the following input file:


Depth map produced by DMAG5b.

In this particular case, the depth map produced by DMAG5b is actually better (I think) than the one produced by DMAG5. Personally, I would stop here and not even bother with DMAG9b, but let's see how the best DMAG5/DMAG9b combo (the one above with sample_rate_spatial = 8) compares with DMAG5b/DMAG9b.

Let's try to improve this depth map using DMAG9b and the following input file (same as the last DMAG9b input file above):

reference image = ../../er9b/image_l.png
input disparity map = ../depthmap_l.png
sample_rate_spatial = 8
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 2
output depth map image = depthmap_l_dmag9b.png


Depth map produced by DMAG9b.

Here, it does not really matter how the initial depth map was obtained, since DMAG9b is quite aggressive. To make DMAG9b less aggressive, lambda is probably the parameter to change: the lower lambda is, the less aggressive DMAG9b is going to be.
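To get a feel for what lambda does, here is a one-dimensional toy version of the kind of energy a bilateral-solver-style smoother minimizes: lambda times the sum of squared differences between neighbors, plus the confidence-weighted squared distance to the input depth. Real bilateral solvers weight neighbor pairs by spatial and color affinity inside a bilateral grid (and DMAG9b also has a robust IRLS step); this sketch uses uniform neighbor weights on a single scanline and solves the resulting tridiagonal system directly, so it is an illustration, not DMAG9b's solver. The function name is hypothetical.

```python
def smooth_1d(target, conf, lam):
    """Minimize  lam * sum_i (x[i] - x[i+1])**2 + sum_i conf[i] * (x[i] - target[i])**2.

    Setting the gradient to zero gives a tridiagonal linear system
    (conf[i] + lam * degree[i]) * x[i] - lam * x[i-1] - lam * x[i+1] = conf[i] * target[i],
    solved here with the Thomas algorithm. All conf values must be > 0.
    """
    n = len(target)
    sub = [-lam] * n  # sub-diagonal (sub[0] unused)
    diag = [conf[i] + lam * ((i > 0) + (i < n - 1)) for i in range(n)]
    sup = [-lam] * n  # super-diagonal (sup[n-1] unused)
    rhs = [conf[i] * target[i] for i in range(n)]

    # Forward sweep: eliminate the sub-diagonal.
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = sup[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * c_prime[i - 1]
        c_prime[i] = sup[i] / m if i < n - 1 else 0.0
        d_prime[i] = (rhs[i] - sub[i] * d_prime[i - 1]) / m

    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


depth = [10, 10, 10, 50, 10, 10, 10]  # one outlier in an otherwise flat row
conf = [1.0] * len(depth)
for lam in (0.0, 0.25, 4.0):
    print(lam, [round(v, 2) for v in smooth_1d(depth, conf, lam)])
```

With lambda = 0 the output is the input; as lambda grows, the outlier gets flattened more and more, which is the sense in which a lower lambda makes the smoothing less aggressive.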