Thursday, April 28, 2016

Multi View Stereo 10 (MVS10)

MVS10 builds a dense reconstruction of a 3D scene from a set of pictures (taken with a camera in photo mode or extracted from a video). Its input is the images plus an nvm file, which comes from either Structure from Motion 10 (SfM10) or VisualSFM (A Visual Structure from Motion System). The nvm file contains the camera poses (location and orientation) as well as a sparse 3D reconstruction of the scene. MVS10 is loosely based upon "Bundled Depth-Map Merging for Multi-View Stereo" by Jianguo Li, Eric Li, Yurong Chen, Lin Xu, Yimin Zhang. MVS10 pairs selected images/cameras, computes the corresponding disparity maps, and merges them as well as it can.

MVS10's workflow is as follows:
- Initialize the dense 3D reconstruction with the sparse 3D reconstruction contained in the nvm file
- For each pair of images/cameras
-- Rectify the images using code similar to Epipolar Rectification 9b (ER9b)
-- Compute the disparity map using code similar to Depth Map Automatic Generator 5 (DMAG5)
-- Add the matches (pairs of image points) coming from the disparity map to the 3D reconstruction
-- Remove image points for which the reprojection error is too large
- Remove 3D points (also known as tracks in the literature) for which the number of image points is too low
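The two removal steps at the end of the workflow can be sketched in a few lines of Python. This is only an illustration and not MVS10's actual code: the function names, the track layout (a 3D point plus a list of camera/image-point observations), and the use of a full 3x4 projection matrix per camera are all assumptions made for the example. For brevity, the sketch also applies both filters in one pass, whereas MVS10 removes image points after each pair and removes 3D points at the end.

```python
import math

def project(P, X):
    """Project the 3D point X with a 3x4 projection matrix P (list of 3 rows)."""
    xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(p * c for p, c in zip(row, xh)) for row in P)
    return (u / w, v / w)

def filter_tracks(tracks, cameras, max_reproj_error, min_image_points):
    """Drop image points with a large reprojection error, then drop short tracks.

    tracks: list of (X, observations), observations being (camera_id, (x, y)).
    cameras: dict mapping camera_id to a 3x4 projection matrix (hypothetical layout).
    """
    kept = []
    for X, observations in tracks:
        # Keep only the image points that reproject close to where they were observed.
        good = [
            (cam, xy)
            for cam, xy in observations
            if math.dist(project(cameras[cam], X), xy) <= max_reproj_error
        ]
        # A 3D point supported by too few image points is considered low-confidence.
        if len(good) >= min_image_points:
            kept.append((X, good))
    return kept
```

Lowering the reprojection error threshold shrinks each track, which in turn makes more tracks fall below the minimum number of image points.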

The output of MVS10 is:
- A set of frames named duh_XX.jpg where XX varies from 01 to 11 that can be used to create an animated gif of the reconstructed 3D scene.
- A point cloud in ply format named duh.ply that describes the 3D scene. It can be loaded and possibly meshed (I recommend using the Poisson method) in MeshLab.

MVS10 requires the presence of an input file called "mvs10_input.txt" containing the following parameters:
- Filename for nvm file. The nvm file is obtained from SfM10 or VisualSFM. The images referenced by the nvm file must of course also be present in the directory where MVS10 is run from.
- Number of trials used in rectification. The higher the number, the more accurate the rectification is and the more accurate the (good) matches are.
- Minimum average separation angle (camera pair selection). The separation angle for a given 3D point and a pair of images/cameras is the angle between the rays that go from the 3D point to the camera centers. If the average separation angle is too small, the disparity map is likely to be poor because there's not enough parallax between the two rectified images. If the average separation angle is below the given threshold, the disparity map is not computed and will obviously not participate in the 3D reconstruction.
- Radius used to smooth the cost. This parameter is used to compute the disparity map. It is explained in Depth Map Automatic Generator 5 (DMAG5).
- Alpha. This parameter is used to compute the disparity map. It is explained in Depth Map Automatic Generator 5 (DMAG5).
- Truncation value for color cost. This parameter is used to compute the disparity map. It is explained in Depth Map Automatic Generator 5 (DMAG5).
- Truncation value for gradient cost. This parameter is used to compute the disparity map. It is explained in Depth Map Automatic Generator 5 (DMAG5).
- Epsilon. This parameter is used to compute the disparity map. It is explained in Depth Map Automatic Generator 5 (DMAG5).
- Disparity tolerance used to detect occlusions. This parameter is used to compute the disparity map. It is explained in Depth Map Automatic Generator 5 (DMAG5).
- Downsampling factor. Because computing disparity maps takes time, downsampling images, that is, reducing the size of the images, is quite attractive because it can considerably reduce the computation time, at the expense of accuracy, of course. If the images are in the one megapixel range, downsampling is probably not needed, but beyond that, downsampling by a factor of two or more can be very beneficial in terms of computation time.
- Sampling step. Each time a disparity map is computed, each and every pixel in one image can be matched to a pixel in the other image. Well, that can be very taxing computationally. If the sampling step is set to two, every other pixel is considered. If the sampling step is set to three, one out of every three pixels is considered. The smaller the sampling step, the denser the 3D reconstruction is going to be.
- Minimum separation angle. This parameter is used to remove 3D points (aka tracks) that are low-confidence. If the separation angle is too low (below the given threshold), the 3D point is probably not accurate and is discarded. The lower the number, the more 3D points will remain in the 3D reconstruction and the denser it will be.
- Minimum number of image points per 3D point. This parameter is used to remove 3D points (aka tracks) that are low-confidence. The lower the number, the more 3D points will remain in the 3D reconstruction and the denser it will be.
- Maximum reprojection error. This parameter is used to remove, for a given 3D point, the image points that are low-confidence. The lower the reprojection error threshold, the more image points are going to be discarded and the more 3D points will be discarded as a result.
- Radius for the animated gif frames. Instead of using a single pixel for each 3D point when computing the frames, a square is used whose dimensions are two times the radius plus one. The denser the 3D reconstruction, the lower the radius should be. Note that if the radius is zero, a single pixel is used for each 3D point in the frames.
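To make the separation angle used by two of the parameters above more concrete, here is a minimal Python sketch. It is not taken from MVS10: the function names are made up for the example, and returning the angle in degrees is an assumption (the unit of the thresholds is not stated here).

```python
import math

def separation_angle_deg(point, center_a, center_b):
    """Angle at the 3D point between the rays going to the two camera centers."""
    ray_a = [c - p for c, p in zip(center_a, point)]
    ray_b = [c - p for c, p in zip(center_b, point)]
    dot = sum(a * b for a, b in zip(ray_a, ray_b))
    norm = math.hypot(*ray_a) * math.hypot(*ray_b)
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

def average_separation_angle_deg(points, center_a, center_b):
    """Average separation angle over the 3D points seen by a pair of cameras."""
    angles = [separation_angle_deg(p, center_a, center_b) for p in points]
    return sum(angles) / len(angles)
```

A camera pair would then be skipped when the average falls below the camera pair selection threshold, and a single 3D point would be discarded when its own angle falls below the minimum separation angle.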

MVS10 writes a fair number of files to disk, in particular files with suffix mvs (referred to as mvs files). One has to know whether or not to delete the mvs files before relaunching MVS10 after having changed parameters in "mvs10_input.txt".

If the following parameters
- Filename for nvm file
- Number of trials used in rectification
- Minimum average separation angle (camera pair selection)
- Radius used to smooth the cost
- Alpha
- Truncation value for color cost
- Truncation value for gradient cost
- Epsilon
- Disparity tolerance used to detect occlusions
- Downsampling factor
have been changed, the mvs files (files with suffix mvs) must be deleted because the disparity maps must be recomputed. If any other parameter is changed, the mvs files should not be deleted as the disparity maps do not need to be recomputed.
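The rule above is easy to get wrong when tweaking parameters repeatedly, so here is a small Python helper that encodes it. This is a sketch, not something that ships with MVS10: the function names are hypothetical, and it assumes that "suffix mvs" means the file extension ".mvs".

```python
from pathlib import Path

# Parameters that affect the disparity maps (taken from the list above).
DISPARITY_PARAMETERS = {
    "Filename for nvm file",
    "Number of trials used in rectification",
    "Minimum average separation angle (camera pair selection)",
    "Radius used to smooth the cost",
    "Alpha",
    "Truncation value for color cost",
    "Truncation value for gradient cost",
    "Epsilon",
    "Disparity tolerance used to detect occlusions",
    "Downsampling factor",
}

def must_delete_mvs_files(changed_parameters):
    """Return True if any changed parameter forces the disparity maps to be recomputed."""
    return any(name in DISPARITY_PARAMETERS for name in changed_parameters)

def delete_mvs_files(directory):
    """Delete the mvs files in the directory (assumes the extension is '.mvs')."""
    removed = []
    for path in Path(directory).glob("*.mvs"):
        path.unlink()
        removed.append(path.name)
    return sorted(removed)
```

For instance, must_delete_mvs_files({"Sampling step"}) is False, so the existing mvs files can be reused, while changing "Alpha" requires deleting them.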

Here are a few examples:


Set of 3 images (1280x720) extracted from a video taken with an iPhone 4s.


Sparse 3D reconstruction obtained with SfM10.


Dense 3D reconstruction obtained with MVS10.

The parameters used for MVS10 were:
- Number of trials used in rectification = 10000
- Minimum average separation angle (camera pair selection) = 1.5
- Radius used to smooth the cost = 32
- Alpha = 0.9
- Truncation value for color cost = 30.0
- Truncation value for gradient cost = 10.0
- Epsilon = 4
- Disparity tolerance used to detect occlusions = 0
- Downsampling factor = 2
- Sampling step = 4
- Minimum separation angle = 1.5
- Minimum number of image points per 3D point = 3
- Maximum reprojection error = 2.0
- Radius for the animated gif frames = 2


Set of 8 images (1280x720) extracted from a video taken with an iPhone 4s.


Sparse 3D reconstruction obtained with SfM10.


Dense 3D reconstruction obtained with MVS10.

The parameters used for MVS10 were:
- Number of trials used in rectification = 10000
- Minimum average separation angle (camera pair selection) = 1.5
- Radius used to smooth the cost = 32
- Alpha = 0.9
- Truncation value for color cost = 30.0
- Truncation value for gradient cost = 10.0
- Epsilon = 4
- Disparity tolerance used to detect occlusions = 0
- Downsampling factor = 2
- Sampling step = 4
- Minimum separation angle = 1.5
- Minimum number of image points per 3D point = 3
- Maximum reprojection error = 2.0
- Radius for the animated gif frames = 2


Set of 8 images (1080x1920) extracted from a video taken with an iPhone 4s.


Sparse 3D reconstruction obtained with SfM10.


Dense 3D reconstruction obtained with MVS10.

The parameters used for MVS10 were:
- Number of trials used in rectification = 10000
- Minimum average separation angle (camera pair selection) = 0.5
- Radius used to smooth the cost = 32
- Alpha = 0.9
- Truncation value for color cost = 30.0
- Truncation value for gradient cost = 10.0
- Epsilon = 4
- Disparity tolerance used to detect occlusions = 0
- Downsampling factor = 4
- Sampling step = 4
- Minimum separation angle = 0.5
- Minimum number of image points per 3D point = 3
- Maximum reprojection error = 2.0
- Radius for the animated gif frames = 2

The Windows executable (guaranteed to be virus free) is available for free via the 3D Software Page. In the directory where you have extracted the archive, there should be a manual for MVS10 called "mvs10_manual.pdf" and a sub-directory called "mvs10_test" that contains all the data needed to run MVS10 on a sample test case.