Tuesday, June 21, 2016

Structure from Motion (SfM) + Multi View Stereo (MVS) vs Stereo


Set of 11 images extracted from a video taken with an iPhone 4s. Each image is 1080x1920 pixels.

We are going to use Multi View Stereo 10 (MVS10) to reconstruct the dense 3D scene. Before that, we need to run Structure from Motion 10 (SfM10) to get the camera poses and the feature matches between the views.

Input to MVS10 ("mvs10_input.txt"):

duh.nvm = name of nvm file (generated by SfM10)
100 = minimum number of matches (camera pair selection)
0.5 = minimum average separation angle (camera pair selection)
32 = window radius
0.9 = alpha
20.0 = truncation cost (color)
10.0 = truncation cost (gradient)
4 = epsilon
0 = disparity tolerance
4 = downsampling factor
1 = sampling step
0.5 = minimum separation angle (removal of low-confidence 3d points)
3 = minimum number of image points (removal of low-confidence 3d points)
2.0 = maximum reprojection error (removal of low-confidence 3d points)
1 = radius (animated gif files)
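The three "removal of low-confidence 3d points" parameters can be read as a simple filter on each reconstructed point. Here is a minimal Python sketch of that idea. It is not MVS10's code: the function names and the data layout are made up for illustration, and I am assuming the separation angle is the largest angle between any two viewing rays and that the reprojection test uses the worst error over the views.

```python
import numpy as np

def max_separation_angle_deg(point, camera_centers):
    """Largest angle (in degrees) between any two rays going from the
    3D point to the centers of the cameras that see it."""
    rays = np.asarray(camera_centers, dtype=float) - np.asarray(point, dtype=float)
    rays /= np.linalg.norm(rays, axis=1, keepdims=True)
    cosines = np.clip(rays @ rays.T, -1.0, 1.0)
    # The smallest cosine corresponds to the largest angle.
    return float(np.degrees(np.arccos(cosines.min())))

def keep_point(point, camera_centers, reprojection_errors,
               min_angle_deg=0.5, min_image_points=3, max_reproj_error=2.0):
    """Hypothetical confidence test; defaults mirror mvs10_input.txt."""
    if len(camera_centers) < min_image_points:
        return False  # seen in too few images
    if max(reprojection_errors) > max_reproj_error:
        return False  # assumption: the worst error decides
    return max_separation_angle_deg(point, camera_centers) >= min_angle_deg

cams = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
print(keep_point((0, 0, 10), cams, [0.5, 0.3, 0.8]))
```

A point seen from cameras that are almost on top of each other has a tiny separation angle, so its depth is poorly constrained and it gets dropped.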


Animated gif of 3d dense reconstruction.

Using Structure from Motion (SfM) and Multi View Stereo (MVS) may seem like overkill just to produce an animated gif of the reconstructed 3D scene (recall that the primary output of MVS10 is a ply file of the 3D scene, which can be used for whatever purpose). So, it might be a good idea to see if we can produce a good animated gif from just two frames (say, the first two of the sequence). The tools for that are Epipolar Rectification 9b (ER9b), then Depth Map Automatic Generator 5 (DMAG5) or Depth Map Automatic Generator 6 (DMAG6) (we'll try both), and finally Wiggle Maker.

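Before DMAG5 or DMAG6 can be run on the first two frames of the sequence, the frames must be rectified (aligned). Rectification restricts the search for matches to the x direction (along the width): after rectification, a point in the left image and its match in the right image sit on the same row.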
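To make the "matches only along x" point concrete, here is a minimal Python sketch of a one-row disparity search on a rectified pair. It uses a plain sum of absolute differences over a square window, which is a stand-in and not what DMAG5 or DMAG6 actually do. The function name is hypothetical, and the sign convention (left pixel x matches right pixel x - d) is an assumption.

```python
import numpy as np

def row_disparity(left, right, row, x, radius, min_disp, max_disp):
    """Return the disparity d in [min_disp, max_disp] whose window in the
    right image, centered at (row, x - d), best matches the window in the
    left image centered at (row, x). Only x changes: same row in both."""
    patch = left[row - radius:row + radius + 1, x - radius:x + radius + 1]
    best_d, best_cost = None, np.inf
    for d in range(min_disp, max_disp + 1):
        xr = x - d
        if xr - radius < 0 or xr + radius + 1 > right.shape[1]:
            continue  # window would fall outside the right image
        cand = right[row - radius:row + radius + 1, xr - radius:xr + radius + 1]
        cost = np.abs(patch - cand).sum()  # sum of absolute differences
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Synthetic rectified pair: the right image is the left one shifted by 5 pixels.
rng = np.random.default_rng(0)
left = rng.random((40, 80))
right = np.roll(left, -5, axis=1)
print(row_disparity(left, right, row=20, x=40, radius=3, min_disp=-10, max_disp=10))
```

Without rectification, the search would have to cover a 2D neighborhood (or follow a slanted epipolar line) instead of a single row.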


Rectified image 00 in the 11 image sequence (used as left image in stereo pair).


Rectified image 01 in the 11 image sequence (used as right image in stereo pair).

Input to DMAG5:

-89 = min disparity
68 = max disparity
16 = window radius
0.9 = alpha
20.0 = truncation cost (color)
10.0 = truncation cost (gradient)
4 = epsilon
0 = disparity tolerance
9 = occlusion smoothing radius
9.0 = occlusion smoothing sigma space
25.5 = occlusion smoothing sigma color
4 = downsampling ratio
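The alpha and truncation cost parameters suggest a per-pixel matching cost that blends a color term and a gradient term, each capped at its truncation value. Here is a minimal Python sketch of that blend, in the form commonly used by cost-volume filtering stereo methods. Whether DMAG5 combines the terms exactly this way is an assumption on my part, and the function name is made up.

```python
def matching_cost(color_diff, gradient_diff,
                  alpha=0.9, trunc_color=20.0, trunc_gradient=10.0):
    """Blend of truncated color and gradient differences.
    Assumed form: (1 - alpha) * min(color, Tc) + alpha * min(gradient, Tg).
    Defaults mirror the DMAG5 input listed above."""
    color_term = min(color_diff, trunc_color)
    gradient_term = min(gradient_diff, trunc_gradient)
    return (1.0 - alpha) * color_term + alpha * gradient_term

# A small mismatch is scored as is; a huge one is capped by the truncation.
print(matching_cost(5.0, 2.0))
print(matching_cost(100.0, 100.0))
```

With this form, truncation keeps a few badly mismatched pixels (occlusions, specular highlights) from dominating the cost, and alpha = 0.9 puts most of the weight on the gradient term.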


Depth/disparity map produced by DMAG5.


Depth/disparity map smoothed out by EPS9.


Animated gif produced by "Wiggle Maker".

Input to DMAG6:

-89 = min disparity
68 = max disparity
0.9 = alpha
20. = truncation cost (color)
10. = truncation cost (gradient)
10000. = truncation cost (discontinuity)
5 = level number
5 = iteration number
0.5 = I don't remember what that is and I am too lazy to look it up
0 = disparity tolerance
9 = occlusion smoothing radius
9.0 = occlusion smoothing sigma space
25.5 = occlusion smoothing sigma color
4 = downsampling ratio


Depth/disparity map produced by DMAG6.


Depth/disparity map smoothed out by EPS9.


Animated gif produced by "Wiggle Maker".

Using a downsampling ratio of 4, it takes seconds (instead of minutes) to generate depth maps with DMAG5 or DMAG6, and the depth map quality is quite acceptable. In "Wiggle Maker", I used "Inpainting Method = None" so that the animation looks similar to the one produced by MVS10. Try not to pay any attention to what's happening at the borders (I know it's distracting and I should have cropped the animated gifs).

Clearly, there's not much of a difference between the depth maps produced by DMAG5 and DMAG6 (maybe DMAG6 is a bit better, especially after the smoothing step), so I don't think it makes a whole lot of difference which automatic depth map generator is used.

The animation produced from the output of MVS10 (3d dense reconstruction) is much better than the animation produced from the output of either DMAG5 or DMAG6. Hmmm, it had better be, since MVS10 uses 11 views while DMAG5 and DMAG6 only use 2, and it takes a boatload more time to run SfM10+MVS10 than DMAG5 or DMAG6 (something like 1 hour vs seconds).
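The seconds-instead-of-minutes effect of the downsampling ratio can be estimated with a bit of arithmetic. Here is a rough Python sketch that counts (pixel, disparity) candidates. It assumes the rectified images are still about 1080x1920, that the disparity range shrinks by the same ratio as the image, and it ignores the window radius; the function name is hypothetical.

```python
def matching_workload(width, height, min_disp, max_disp, ratio):
    """Number of (pixel, disparity) candidates at a given downsampling ratio.
    Assumes the disparity range scales down with the image."""
    w, h = width // ratio, height // ratio
    n_disp = (max_disp - min_disp) // ratio + 1
    return w * h * n_disp

full = matching_workload(1080, 1920, -89, 68, ratio=1)
quarter = matching_workload(1080, 1920, -89, 68, ratio=4)
print(full, quarter, full / quarter)
```

Under these assumptions, a ratio of 4 cuts the number of candidates by a factor of roughly 60 (16 times fewer pixels, about 4 times fewer disparities), which is in line with minutes turning into seconds.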
