Saturday, March 24, 2018

Multi View Stereo - Dino


Set of 5 images for which we want to build a 3d reconstruction. Images courtesy of Bernd.

The first thing to do is get the camera extrinsics, that is, the camera positions and orientations. For this, I am gonna use sfm10.

Input to sfm10:

Number of cameras = 5
Image name for camera 0 = D1kl.JPG
Image name for camera 1 = D2kl.JPG
Image name for camera 2 = D3kl.JPG
Image name for camera 3 = D4kl.JPG
Image name for camera 4 = D5kl.JPG
Focal length = 1440
initial camera pair = 2 4
Number of trials (good matches) = 10000
Max number of iterations (Bundle Adjustment) = 1000
Min separation angle (low-confidence 3D points) = 0
Max reprojection error (low-confidence 3D points) = 10000
Radius (animated gif frames) = 5
Angle amplitude (animated gif frames) = 10

For the focal length, I didn't really think about it too much and simply used the width of the images (1440 pixels). It should be close enough. For the initial camera pair, I also didn't think too much about it and chose two images that seemed to have been taken from viewpoints not too far apart. I am sure choosing "0 1" as the initial stereo pair would have been just fine too. I don't care about the min separation angle allowed (giving it a value of 0), nor about the max reprojection error allowed (giving it a value of 10000). I usually worry about those later, after looking at the output of sfm10.
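A quick sanity check on the "image width as focal length" guess: for a pinhole camera, a focal length of f pixels on an image W pixels wide gives a horizontal field of view of 2·atan(W/(2f)), so f = W means roughly 53 degrees, a fairly ordinary lens. If the EXIF focal length and the sensor width are known, the focal length in pixels is f_mm · W / sensor_width_mm. Below is a minimal Python sketch of both formulas; the function names are mine, not part of sfm10.

```python
import math

def horizontal_fov_deg(focal_px: float, width_px: float) -> float:
    """Horizontal field of view implied by a pinhole focal length given in pixels."""
    return math.degrees(2.0 * math.atan(width_px / (2.0 * focal_px)))

def focal_px_from_mm(focal_mm: float, sensor_width_mm: float, width_px: float) -> float:
    """Convert a lens focal length in mm to pixels, given the sensor width in mm."""
    return focal_mm * width_px / sensor_width_mm

# Using the image width as the focal length (f = W):
print(round(horizontal_fov_deg(1440, 1440), 2))  # 53.13
```

For example, a 28 mm lens on a 36 mm wide sensor would give focal_px_from_mm(28, 36, 1440) = 1120 pixels; those lens and sensor numbers are only an illustration, not the camera used for these shots.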

I have got to admit that sfm10 is quite verbose. The important part is at the end: in particular, the last computed average reprojection error (I like to have it under 1.0 and, if possible, under 0.5) and the min and max depths (I do not like to see a max depth that's very large). Had those been out of spec, I would have gone back to the input file and tweaked things, like the min separation angle allowed and the max reprojection error allowed.

These are the last bits of info in the output from sfm10:

Number of 3D points = 1506
Average reprojection error = 0.432055
Max reprojection error = 4.94969
Adding camera 1 to the 3D reconstruction ... done.
Looking for next camera to add to 3D reconstruction ...
Looking for next camera to add to 3D reconstruction ... done.
No more cameras to add to the 3D reconstruction
Average depth = 18.8279 min depth = 8.99441 max depth = 40.4922
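The checks I do by eye on that output can be sketched in Python. This is a hypothetical helper, not something that ships with sfm10: it grabs the last "Average reprojection error" and the depth line from the console output (saved to a text file), then applies my rules of thumb (under 1.0, ideally under 0.5; no huge max depth). The max/min depth ratio of 10 used to flag a "very large" max depth is an arbitrary threshold of my choosing.

```python
import re

# A decimal number such as 0.432055, 18.8279 or 1e-3.
NUM = r"([-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)"

def parse_sfm10_tail(log: str) -> dict:
    """Extract the summary numbers from the end of an sfm10 run."""
    stats = {}
    # The average reprojection error is reported several times; keep the last one.
    errors = re.findall(r"Average reprojection error = " + NUM, log)
    if errors:
        stats["avg_error"] = float(errors[-1])
    m = re.search(r"Average depth = " + NUM + r" min depth = " + NUM
                  + r" max depth = " + NUM, log)
    if m:
        stats["avg_depth"], stats["min_depth"], stats["max_depth"] = map(float, m.groups())
    return stats

def assess(stats: dict, max_depth_ratio: float = 10.0) -> list:
    """Return a list of complaints; an empty list means the run looks fine."""
    notes = []
    err = stats.get("avg_error")
    if err is None or err >= 1.0:
        notes.append("average reprojection error missing or >= 1.0")
    elif err >= 0.5:
        notes.append("average reprojection error OK but not under 0.5")
    zmin, zmax = stats.get("min_depth"), stats.get("max_depth")
    if zmin is None or zmax is None or zmin <= 0:
        notes.append("depth range missing or invalid")
    elif zmax / zmin > max_depth_ratio:
        notes.append("max depth looks very large compared to min depth")
    return notes
```

On the output above, the error is 0.432055 and the depth ratio is about 4.5, so assess returns an empty list.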

We are ready to build the dense reconstruction using mvs10.

This is the input to mvs10:

nvm file = duh.nvm
Min match number (camera pair selection) = 100
Max mean vertical disparity error (camera pair selection) = 1
Min average separation angle (camera pair selection) = 0.1
radius (disparity map) = 32
alpha (disparity map) = 0.9
color truncation (disparity map) = 30
gradient truncation (disparity map) = 10
epsilon = 255^2*10^-4 (disparity map)
disparity tolerance (disparity map)= 0
downsampling factor (disparity map)= 4
sampling step (dense reconstruction)= 1
Min separation angle (low-confidence 3D points) = 0.1
Max reprojection error (low-confidence image points) = 10
Min image point number (low-confidence 3D points) = 3
Radius (animated gif frames) = 1
Angle amplitude (animated gif frames) = 1

The min match number, the max mean vertical disparity error, and the min average separation angle are used to determine whether or not a camera/image pair should be added to the 3d reconstruction. Obviously, you don't want to add a camera/image pair if its 3d points are not gonna be accurate. But you don't want to be too picky either, otherwise you won't have many points in the 3d reconstruction.
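To make the three camera pair selection criteria concrete, here is a sketch of how such a test could be written. I'm assuming the vertical disparity error is measured on matched features after rectification (in a perfectly rectified pair, matches lie on the same row) and that the separation angle is in degrees, measured at each triangulated point between the two viewing rays. I don't know mvs10's exact definitions, so treat the names and details as hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class PairStats:
    n_matches: int
    mean_vertical_error: float   # pixels, measured in the rectified images
    mean_separation_deg: float   # average triangulation angle over the pair's 3D points

def separation_angle_deg(c1, c2, x):
    """Angle at 3D point x between the rays coming from camera centers c1 and c2."""
    a = [ci - xi for ci, xi in zip(c1, x)]
    b = [ci - xi for ci, xi in zip(c2, x)]
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def pair_stats(rect_matches, points3d, c1, c2):
    """rect_matches: ((x1, y1), (x2, y2)) pairs in rectified image coordinates."""
    n = len(rect_matches)
    if n == 0 or not points3d:
        return PairStats(n, math.inf, 0.0)
    vert = sum(abs(p1[1] - p2[1]) for p1, p2 in rect_matches) / n
    sep = sum(separation_angle_deg(c1, c2, x) for x in points3d) / len(points3d)
    return PairStats(n, vert, sep)

def accept_pair(s, min_matches=100, max_vertical_error=1.0, min_separation_deg=0.1):
    """Defaults mirror the mvs10 input above."""
    return (s.n_matches >= min_matches
            and s.mean_vertical_error <= max_vertical_error
            and s.mean_separation_deg >= min_separation_deg)
```

Raising any of the three thresholds rejects more pairs, which is the "too picky" trade-off described above.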

The radius, alpha, the color truncation, the gradient truncation, epsilon, the disparity tolerance, and the downsampling factor are used to generate the depth map for each camera/image pair. See dmag5 if you want to know more about these parameters.
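For readers who haven't used dmag5: as far as I know, these parameters come from the cost-volume filtering approach (Hosni et al.). The per-pixel matching cost for a disparity mixes a truncated color difference and a truncated gradient difference, weighted by alpha, and the cost volume is then smoothed with a guided filter whose window radius and regularization are radius and epsilon. The epsilon written as 255^2*10^-4 is simply 10^-4 (for intensities in [0,1]) rescaled to intensities in [0,255], i.e. 6.5025. The sketch below shows only the per-pixel cost; using the mean absolute RGB difference for the color term is my assumption, not necessarily what dmag5 does.

```python
def matching_cost(color_ref, color_tgt, grad_ref, grad_tgt,
                  alpha=0.9, trunc_color=30.0, trunc_grad=10.0):
    """Cost of matching one reference pixel to one target pixel (RGB in 0..255).

    alpha = 0 uses color only; alpha close to 1 relies mostly on the gradient.
    """
    color_diff = sum(abs(a - b) for a, b in zip(color_ref, color_tgt)) / len(color_ref)
    grad_diff = abs(grad_ref - grad_tgt)
    # Truncation caps the penalty so outliers (e.g. occlusions) don't dominate.
    return ((1.0 - alpha) * min(color_diff, trunc_color)
            + alpha * min(grad_diff, trunc_grad))

# Guided-filter regularization: 1e-4 for intensities in [0, 1],
# rescaled to intensities in [0, 255].
EPSILON = 255 ** 2 * 1e-4   # about 6.5025
```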

The min separation angle, the max reprojection error, and the min image point number are used to determine whether a 3d point should remain in the 3d reconstruction. The min image point number is probably the most important. If set to 3, like here, it means that a 3d point must appear in at least 3 views to be accepted, so a whole lot of points that only appear in 2 images get rejected.
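Here is a sketch of that filtering, assuming each 3D point carries its list of image observations with their reprojection errors, plus the largest angle between any two of its viewing rays. Following the labels in the input deck, the max reprojection error discards individual image points, and the min image point number is then applied to what's left. The data layout is hypothetical; mvs10's internals may differ.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    camera: int
    reprojection_error: float  # pixels

@dataclass
class Point3D:
    observations: list          # list of Observation
    max_separation_deg: float   # widest angle between any two viewing rays

def keep_point(p, min_separation_deg=0.1, max_reprojection_error=10.0, min_image_points=3):
    """Defaults mirror the mvs10 input above."""
    # Drop low-confidence image points first...
    good = [o for o in p.observations if o.reprojection_error <= max_reprojection_error]
    # ...then require enough surviving views and a wide enough triangulation angle.
    return len(good) >= min_image_points and p.max_separation_deg >= min_separation_deg
```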

Just like sfm10, mvs10 spews out a lot of stuff as it rectifies the stereo pairs, computes the depth maps, and adds/removes points to/from the 3d scene. At the end, you get the final number of points generated and the average reprojection error. You want a good number of points and a relatively low average reprojection error.

This is the end of the mvs10 output:

Number of 3D points = 716633
Average reprojection error = 0.858998
Max reprojection error = 9.97793

What I do next is load the duh_xx.jpg images into the Gimp and make an animated gif out of them. That's probably the best way to check whether the 3d reconstruction is decent. You could also load the ply file in Meshlab or CloudCompare to actually see the point cloud in 3d.
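Before opening the ply file in a viewer, it's easy to check that it holds the expected number of points (716633 here) by reading the "element vertex" line of its header, which works for both ASCII and binary PLY files. A minimal sketch; the file name duh.ply is a guess on my part, so use whatever name mvs10 actually wrote.

```python
def vertex_count_from_header(lines):
    """Return N from the 'element vertex N' line of a PLY header (iterable of byte lines)."""
    it = iter(lines)
    if next(it, b"").strip() != b"ply":
        raise ValueError("not a PLY file")
    for raw in it:
        parts = raw.split()
        if parts == [b"end_header"]:
            break
        if len(parts) == 3 and parts[0] == b"element" and parts[1] == b"vertex":
            return int(parts[2])
    raise ValueError("no 'element vertex' line in PLY header")

def ply_vertex_count(path="duh.ply"):  # hypothetical file name
    # Binary mode: the header is ASCII text even when the body is binary.
    with open(path, "rb") as f:
        return vertex_count_from_header(f)
```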


Animated gif showing the dense 3d reconstruction.

Now, is it possible to simply create a depth map from that sequence of images instead of building a dense 3d reconstruction? Yes, it's possible, but it's not easier than building a dense 3d reconstruction. I have a tool to do that called dmag8b (dmag8 can also be used if you want). To use dmag8b, you need to run sfm10 first to get the camera extrinsics. We have already done that, so let's go straight to dmag8b.
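Since dmag8b gets the camera extrinsics from the nvm file, it may help to see what's in it. Assuming sfm10 writes the standard VisualSFM NVM_V3 layout (one line per camera: file name, focal length, rotation quaternion WXYZ, camera center XYZ, radial distortion, 0), the cameras can be read as below. The rotation matrix uses the usual unit-quaternion formula and the translation is t = -R c, for the convention x_cam = R X + t; both are my assumptions about the convention, so double-check them against your own nvm files.

```python
from dataclasses import dataclass

@dataclass
class NvmCamera:
    name: str
    focal: float
    quat_wxyz: tuple
    center: tuple
    radial: float

def read_nvm_cameras(text):
    """Parse the camera block of an NVM_V3 file (3D points are ignored)."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not rows or not rows[0][0].startswith("NVM_V3"):
        raise ValueError("expected an NVM_V3 header")
    n = int(rows[1][0])
    cams = []
    for t in rows[2:2 + n]:
        cams.append(NvmCamera(t[0], float(t[1]), tuple(map(float, t[2:6])),
                              tuple(map(float, t[6:9])), float(t[9])))
    return cams

def rotation(q):
    """Rotation matrix (list of rows) from a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [[1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]]

def translation(cam):
    """t = -R c, so that a world point X maps to R X + t in camera coordinates."""
    r = rotation(cam.quat_wxyz)
    return tuple(-sum(r[i][j] * cam.center[j] for j in range(3)) for i in range(3))
```

The first camera returned by read_nvm_cameras is the "first in the nvm file" candidate for the reference image.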

Input to dmag8b:

nvm file = duh.nvm
image ref = ?
near plane = 8.99
far plane = 40.49
nbr of planes = 200
radius = 16
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
grayscale depth map = depth.png

If you don't want the program to automatically determine which image should be the reference image, I suggest putting the name of the image that's first in the nvm file. Since sfm10 outputs the min and max depths, that's what I use for the near and far planes. I believe that if you put 0 for the near and far planes, dmag8b will compute them from the nvm file and you'll end up with the exact same values that sfm10 came up with. For the number of planes, I usually start with something small and then increase it. Here, I started at 20 planes and finished with 200.
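How the planes are laid out between the near and far planes matters when choosing their number. Plane-sweep stereo commonly spaces the planes uniformly in inverse depth (i.e. in disparity), so consecutive planes are close together near the camera and far apart in the background. I don't know whether dmag8b does this or spaces them uniformly in depth, so the sketch below supports both. It also shows that going from 20 to 200 planes makes the depth steps roughly 10 times finer.

```python
def plane_depths(near, far, n_planes, inverse_depth=True):
    """Depths of n_planes sweep planes from near to far (both included)."""
    if n_planes < 2 or not 0 < near < far:
        raise ValueError("need n_planes >= 2 and 0 < near < far")
    if inverse_depth:
        # Equal steps in 1/depth, i.e. roughly equal steps in disparity.
        inv_near, inv_far = 1.0 / near, 1.0 / far
        step = (inv_near - inv_far) / (n_planes - 1)
        return [1.0 / (inv_near - i * step) for i in range(n_planes)]
    step = (far - near) / (n_planes - 1)
    return [near + i * step for i in range(n_planes)]

coarse = plane_depths(8.99, 40.49, 20)
fine = plane_depths(8.99, 40.49, 200)
# Gap between the two farthest planes (the largest gap with inverse-depth spacing).
print(coarse[-1] - coarse[-2], fine[-1] - fine[-2])
```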

The radius, alpha, the truncation color, the truncation gradient, and epsilon are the dmag5 parameters you should now be familiar with.


Depth map produced by dmag8b.

Yes, not a good depth map. Because dmag8b is not exactly a race horse, tweaking the parameters is not really an option. Maybe the radius is too small. Maybe the viewpoints are too different. I don't know. Anyway, I think one would get a better depth map from just 2 images, using er9b (or er9c) as the rectifier, dmag5 (or dmag5b or dmag6 or whatever) as the depth map generator, and then dmag9b as the depth map optimizer.

When you have more than 2 views, I think it's better to either build the 3d reconstruction or get a depth map from just 2 views. Building a depth map from more than 2 views is quite difficult. It's usually easier to build the 3d scene. It makes more sense too. This is probably why you don't see too many programs that can do depth map generation from multiple views. Then again, Google's "Lens Blur" does just that and pretty well too. Will have to investigate ... if there's enough interest.

Let's see if we can get a better depth map with dmag8b ...

From the original set of 5 views, I am gonna remove 2 views so that the remaining views have similar viewpoints.


Sequence of 3 views (instead of 5).

This is my new input to sfm10:

3
D3kl.JPG
D4kl.JPG
D5kl.JPG
1440.
0 2
10000
1000
0.0
10000.0
5
10.
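The deck above is just the values of the labeled sfm10 input from the first run, one per line and in the same order. Note that the initial pair "0 2" now indexes into the new list, i.e. D3kl.JPG and D5kl.JPG. If you try several subsets of views, a tiny generator helps avoid index mistakes. A sketch: the function is hypothetical, and writing the text to whatever file sfm10 reads is left to you.

```python
def sfm10_input(images, focal, initial_pair, trials=10000, ba_iterations=1000,
                min_sep_angle=0.0, max_reproj_error=10000.0,
                gif_radius=5, gif_angle=10.0):
    """Build the sfm10 input text: bare values, one per line, in the order of the
    labeled deck (number of cameras, image names, focal length, initial pair, ...)."""
    i, j = initial_pair
    if i == j or not (0 <= i < len(images) and 0 <= j < len(images)):
        raise ValueError("initial pair must be two distinct indices into images")
    values = [len(images), *images, focal, f"{i} {j}", trials, ba_iterations,
              min_sep_angle, max_reproj_error, gif_radius, gif_angle]
    return "\n".join(str(v) for v in values) + "\n"

sfm10_deck = sfm10_input(["D3kl.JPG", "D4kl.JPG", "D5kl.JPG"], 1440.0, (0, 2))
print(sfm10_deck)
```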

Now, let's run dmag8b using different input decks and see if the depth map gets better.

Input to dmag8b:

nvm file = duh.nvm
image ref = D3kl.JPG
near plane = 10.247
far plane = 32.037
nbr of planes = 200
radius = 16
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
grayscale depth map = depth.png

Apart from the reference image, the only thing I changed from the previous run is the near and far planes: I used the min and max depths that sfm10 gave me in its output.


Depth map obtained by dmag8b.

I think it's much better. I still think the image viewpoints are a bit too far apart, as the occlusions give rise to wrong depths, especially around the dinosaur's boundary. Unlike dmag5 and the like, dmag8b doesn't explicitly deal with occlusions, so it's very likely you will get wrong depths at object boundaries (in the foreground). You can reduce this problem by not moving the camera too much when shooting the sequence.
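For comparison, the usual explicit occlusion handling in two-view matchers is a left-right consistency check: compute a disparity map for each image of the pair and keep only the pixels whose two disparities agree within a tolerance. As I understand it, that's what the disparity tolerance parameter of dmag5 refers to, but the sketch below is a generic illustration on a single scanline, not dmag5's code. A plane sweep from a single reference view (which is what the near/far planes and number of planes suggest dmag8b does) has no second disparity map to check against.

```python
def left_right_consistent(disp_left, disp_right, tolerance=0):
    """Flag pixels of one scanline whose left and right disparities agree.

    disp_left[x]:  disparity of left-image pixel x (it matches right pixel x - d)
    disp_right[x]: disparity of right-image pixel x (it matches left pixel x + d)
    Pixels that fail are treated as occluded or unreliable and filled in later.
    """
    ok = []
    width = len(disp_right)
    for x, d in enumerate(disp_left):
        xr = x - int(round(d))
        ok.append(0 <= xr < width and abs(d - disp_right[xr]) <= tolerance)
    return ok
```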

Let's change alpha from 0.9 to 0 ...

Input to dmag8b:

nvm file = duh.nvm
image ref = D3kl.JPG
near plane = 10.247
far plane = 32.037
nbr of planes = 200
radius = 16
alpha = 0
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
grayscale depth map = depth.png


Depth map obtained by dmag8b.

Not much of a change. Let's increase the radius keeping alpha = 0 ...

Input to dmag8b:

nvm file = duh.nvm
image ref = D3kl.JPG
near plane = 10.247
far plane = 32.037
nbr of planes = 200
radius = 32
alpha = 0
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
grayscale depth map = depth.png


Depth map obtained by dmag8b.

Not much of a change either. Let's keep the radius at 32 and set alpha to 0.9 ...

Input to dmag8b:

nvm file = duh.nvm
image ref = D3kl.JPG
near plane = 10.247
far plane = 32.037
nbr of planes = 200
radius = 32
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
grayscale depth map = depth.png


Depth map obtained by dmag8b.

Again, not much of a change. Let's see what happens when we reduce the number of planes from 200 to 100.

Input to dmag8b:

nvm file = duh.nvm
image ref = D3kl.JPG
near plane = 10.247
far plane = 32.037
nbr of planes = 100
radius = 32
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
grayscale depth map = depth.png


Depth map obtained by dmag8b.

Again, not much of a change. Let's stop here.
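The five dmag8b runs above differ only in alpha, the radius, and the number of planes, so the decks can be generated from one base. A sketch: it prints the decks in the labeled "name = value" form used in this post, which may not be the exact file format dmag8b reads, and it gives each run its own (made-up) depth map name so the results don't overwrite each other.

```python
BASE = {
    "nvm file": "duh.nvm",
    "image ref": "D3kl.JPG",
    "near plane": 10.247,
    "far plane": 32.037,
    "nbr of planes": 200,
    "radius": 16,
    "alpha": 0.9,
    "truncation (color)": 30,
    "truncation (gradient)": 10,
    "epsilon": "255^2*10^-4",
    "grayscale depth map": "depth.png",
}

# Parameter changes for each run, in the order they were tried.
RUNS = [
    {},
    {"alpha": 0},
    {"radius": 32, "alpha": 0},
    {"radius": 32, "alpha": 0.9},
    {"radius": 32, "alpha": 0.9, "nbr of planes": 100},
]

def dmag8b_deck(overrides, base=BASE):
    """Merge overrides into the base deck; dicts keep insertion order (Python 3.7+)."""
    params = {**base, **overrides}
    return "\n".join(f"{name} = {value}" for name, value in params.items()) + "\n"

for k, changes in enumerate(RUNS, start=1):
    print(dmag8b_deck({**changes, "grayscale depth map": f"depth_run{k}.png"}))
```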

So, I think that to get the best results with dmag8b, the viewpoints should be close to each other, in other words, the camera should not be moved too much between shots. Note that this requirement makes the job of sfm10 a little bit harder.
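The trade-off in that last sentence can be put in numbers with the textbook rectified-stereo approximation, where the disparity is d = f·B/Z for baseline B and depth Z. Next to a foreground edge at depth Z_fg in front of a background at Z_bg, the strip of background visible in only one view is about f·B·(1/Z_fg - 1/Z_bg) pixels wide, and a one-pixel disparity error turns into a depth error of about Z²/(f·B). Shrinking B narrows the occluded strips (good for dmag8b) but inflates the depth uncertainty and shrinks the triangulation angles (bad for sfm10). The sketch uses f = 1440 and the near/far depths from the 3-view run; the baselines are made-up values in sfm10's arbitrary units.

```python
def occlusion_band_px(f_px, baseline, z_fg, z_bg):
    """Approximate width (pixels) of the half-occluded strip beside a foreground edge."""
    return f_px * baseline * (1.0 / z_fg - 1.0 / z_bg)

def depth_error_per_px(f_px, baseline, z):
    """Approximate depth change caused by a one-pixel disparity error at depth z."""
    return z * z / (f_px * baseline)

F, Z_FG, Z_BG = 1440.0, 10.247, 32.037
for b in (0.5, 1.0, 2.0):  # hypothetical baselines, same units as the depths
    print(f"B={b}: occluded strip ~{occlusion_band_px(F, b, Z_FG, Z_BG):.1f} px, "
          f"depth error at Z_FG ~{depth_error_per_px(F, b, Z_FG):.3f} per px")
```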