Thursday, May 24, 2018

Non Photorealistic Rendering - Watercolor rendering (watercolorization)

Bousseau et al. pioneered the process of watercolorizing a photograph in "Interactive watercolor rendering with temporal coherence and abstraction" by Adrien Bousseau, Matthew Kaplan, Joëlle Thollot, and François X. Sillion. The idea of darkening/lightening a color image given a grayscale texture image makes it possible to simulate several watercolor effects, including paper texture, turbulent flow, pigment dispersion, and edge darkening. Another paper I found quite useful when implementing my own version of Bousseau's watercolorizer is "Expressive Rendering with Watercolor" by Patrick J. Doran and John Hughes, as it discusses Bousseau's algorithm.
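To make the darkening/lightening idea concrete, here is a minimal numpy sketch of the color-modification formula from the Bousseau paper, c' = c - (c - c^2)(d - 1), where d is a pigment density derived from the grayscale texture (d > 1 darkens, d < 1 lightens). The mapping of the texture to a density via a strength factor beta is my own assumption, not something from the paper:

```python
import numpy as np

def modify_color(c, d):
    """Darken (d > 1) or lighten (d < 1) a color channel c in [0, 1]
    using the pigment density modification c' = c - (c - c^2)(d - 1)."""
    return np.clip(c - (c - c * c) * (d - 1.0), 0.0, 1.0)

def apply_texture(image, texture, beta=1.0):
    """Turn a grayscale texture in [0, 1] into a density d around 1,
    then modify the image. beta scales the strength of the effect."""
    d = 1.0 + beta * (texture - 0.5)
    return modify_color(image, d[..., None] if image.ndim == 3 else d)
```

All the effects below (paper texture, turbulent flow, edge darkening) reduce to this one operation with different grayscale textures.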


Input image. Comes from the NPR benchmark database.

The first thing to do is to abstract the image in order to reduce the amount of detail. Bousseau et al. use Mean Shift to color-segment the image, followed by the application of morphological smoothing operators like dilation and erosion. Since I have already implemented software to abstract and stylize images (for cartoon rendering) in Non Photorealistic Rendering - Image Abstraction by Structure Adaptive Filtering, I simply use that to get the abstracted image. I think it works quite well too.


Abstracted image.

Let's apply a watercolor paper texture to the abstracted image to simulate the grain of the watercolor paper.


Image after having applied a paper texture.

Let's apply a turbulent flow texture to the current image to simulate watercolor color variation due to how water moves and carries pigments. The turbulent flow texture comes from the sum of Perlin noise at various frequencies. It's mostly a low frequency coherent noise.
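For the record, here is one way to build such a texture. True Perlin gradient noise is a bit long to reproduce here, so this sketch uses value noise (smoothly interpolated random grid values) as a stand-in; the octave-summing loop is the part that matters:

```python
import numpy as np

def value_noise(shape, freq, rng):
    """Smoothly interpolated random grid values at a given frequency;
    a stand-in for true Perlin gradient noise."""
    grid = rng.random((freq + 1, freq + 1))
    ys = np.linspace(0, freq, shape[0], endpoint=False)
    xs = np.linspace(0, freq, shape[1], endpoint=False)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    ty = (ys - y0)[:, None]
    tx = (xs - x0)[None, :]
    fy = ty * ty * (3.0 - 2.0 * ty)        # smoothstep fade
    fx = tx * tx * (3.0 - 2.0 * tx)
    # bilinear interpolation of the four surrounding grid values
    c00 = grid[np.ix_(y0, x0)]
    c01 = grid[np.ix_(y0, x0 + 1)]
    c10 = grid[np.ix_(y0 + 1, x0)]
    c11 = grid[np.ix_(y0 + 1, x0 + 1)]
    top = c00 * (1.0 - fx) + c01 * fx
    bot = c10 * (1.0 - fx) + c11 * fx
    return top * (1.0 - fy) + bot * fy

def turbulence(shape, octaves=4, base_freq=2, seed=0):
    """Sum the noise over octaves: frequency doubles, amplitude halves."""
    rng = np.random.default_rng(seed)
    out = np.zeros(shape)
    amp, total = 1.0, 0.0
    for i in range(octaves):
        out += amp * value_noise(shape, base_freq * 2 ** i, rng)
        total += amp
        amp *= 0.5
    return out / total                      # roughly normalized to [0, 1]
```

With only a few low base frequencies, this gives exactly the kind of low frequency coherent noise described above.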


Image after having applied a turbulent flow texture.

Let's apply an edge darkening texture to the current image to simulate how pigments accumulate at the boundaries of washes. The edge darkening texture is obtained by computing the gradient magnitude of the original abstracted image.
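This step is easy to sketch: compute the gradient magnitude of the abstracted image, normalize it, and turn it into a density texture that is 1 in flat areas and greater than 1 at edges, so the same darkening operation used for the other textures darkens the edges. The strength factor is my own knob:

```python
import numpy as np

def edge_darkening_texture(gray, strength=1.0):
    """Gradient magnitude of the (abstracted) image, mapped to a density
    texture: 1 in flat areas, up to 1 + strength at edges."""
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    if mag.max() > 0:
        mag = mag / mag.max()
    return 1.0 + strength * mag

def darken(c, d):
    """Same pigment-density modification used for the other textures."""
    return np.clip(c - (c - c * c) * (d - 1.0), 0.0, 1.0)
```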


Image after having applied an edge darkening texture.

Here's a video:


Bousseau et al. also use a grayscale texture to simulate pigment dispersion, the high frequency version of turbulent flow. It's supposed to be implemented as a sum of Gaussian noises. I don't really like that effect, so I simply did not implement it.

Clearly, this simulates the wet-on-dry watercolor technique, not the wet-on-wet technique. "Towards Photo Watercolorization with Artistic Similitude" by Wang et al. proposes a wet-on-wet effect which I will probably implement at some point.

Wednesday, May 16, 2018

Case Study - How to improve depth map quality with DMAG9b and DMAG4

Another stereo pair from the Peter Simcoe collection. As usual, the first thing to do is to rectify the stereo pair using either er9b or er9c. This time, I chose er9c.


Left image after rectification.


Right image after rectification.

Output from er9c:
Mean vertical disparity error = 0.362055
Min disp = -17 Max disp = 38
Min disp = -21 Max disp = 43

Time to generate the depth map. I am liking dmag6 more and more even though it is a memory hog and quite a bit slower than dmag5. For the min and max disparities, I will use min disp = -21 and max disp = 43.


Depth map obtained by dmag6.

Input to dmag6:
min disparity for image 1 = -21
max disparity for image 1 = 43
disparity map for image 1 = depthmap_l.png
disparity map for image 2 = depthmap_r.png
occluded pixel map for image 1 = occmap_l.png
occluded pixel map for image 2 = occmap_r.png
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
truncation (discontinuity) = 10000
iteration number = 5
level number = 5
data cost weight = 0.5
disparity tolerance = 0
radius to smooth occlusions = 9
sigma_space = 9
sigma_color = 25.5
downsampling factor = 1

Let's improve the depth map by calling on our good friend dmag9b. For sure, dmag9b will sharpen the depth map at the object boundaries.


Depth map obtained by dmag9b.

Input to dmag9b:
sample_rate_spatial = 16
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 32

I don't like the depth map around the bags in the foreground and I really want the 2 thin straps near the bags to be included in the depth map. So, time to do some manual labor before calling on dmag4.


Sparse/scribbled depth map to be fed to dmag4. White areas are actually transparent.

If, when you start erasing with the eraser tool, you get white instead of the checkerboard pattern, it's because you need to add the alpha channel to the depth map.


Edge image to be fed to dmag4. White areas are actually transparent.

I use the paths tool to generate the edge image. It's easy as pie.


Depth map obtained by dmag4.

Input to dmag4:
beta = 0
maxiter = 5000
scale_nbr = 1
con_level = 1
con_level2 = 1


3d wiggle produced by wigglemaker.

Tuesday, May 15, 2018

Non Photorealistic Rendering - Stroke-Based Rendering (SBR)

I have combined elements from two academic papers to create my own Stroke-Based Rendering (SBR) software. I used "Painterly Rendering with Curved Brush Strokes of Multiple Sizes" by Aaron Hertzmann for the general framework but I didn't particularly like his curved brush strokes. I prefer straight strokes with an oil paint texture. So, I turned to "An Algorithm For Automatic Painterly Rendering Based On Local Source Image Approximation" by Michio Shiraishi and Yasushi Yamaguchi to handle the brush strokes.

The following 2 images show the pseudo-code for the framework. They come straight from the Hertzmann paper.


The input is an RGB image (sourceImage) and a sequence of brush radii of decreasing size (R1 to Rn). The output is an RGB image (canvas) which is initialized to some middle gray color. For each brush radius, the source image is convolved with a Gaussian blur of variance f_sigma * Ri where Ri is the current brush radius. The parameter f_sigma is a constant factor that makes it possible to increase or decrease the blurring. If f_sigma is set to a very small value, no blurring takes place. A layer of paint is then laid on the reference image (blurred source image) using the paintLayer function described below.


A grid is virtually constructed with a grid cell size equal to f_grid * R where f_grid is some constant factor and R is the current brush radius. Then, for each grid cell, the average error within the grid cell is computed, the error being defined as the difference in color between the current canvas color and the reference image color. If the average difference in color is greater than T (error_threshold), the pixel with the largest difference in color is chosen as the center of the brush stroke and that brush stroke is added to the list of brush strokes (the color for the brush stroke comes from the color of the pixel in the reference image). Once all the brush strokes have been created, they are applied to the canvas in random order.
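Here is a rough numpy transcription of that stroke-seeding loop (the names and the default threshold are mine, not Hertzmann's):

```python
import numpy as np
import random

def paint_layer_strokes(canvas, reference, radius, f_grid=1.0, threshold=25.0):
    """For each grid cell, if the mean color error between canvas and
    reference exceeds the threshold T, seed one stroke at the cell's
    worst pixel, using the reference color at that pixel."""
    step = max(1, int(f_grid * radius))
    h, w = reference.shape[:2]
    # per-pixel Euclidean color error between canvas and reference
    err = np.sqrt(((canvas.astype(float) - reference.astype(float)) ** 2).sum(axis=-1))
    strokes = []
    for y in range(0, h, step):
        for x in range(0, w, step):
            cell = err[y:y + step, x:x + step]
            if cell.mean() > threshold:
                dy, dx = np.unravel_index(np.argmax(cell), cell.shape)
                sy, sx = y + dy, x + dx
                strokes.append((sx, sy, radius, tuple(reference[sy, sx])))
    random.Random(0).shuffle(strokes)   # strokes are applied in random order
    return strokes
```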

As mentioned previously, I don't like curved brush strokes as I don't think they reflect particularly well what most fine art painters do. Straight brush strokes are, in my opinion, better, and they are much easier to implement. Let's now turn our attention to the Shiraishi paper to make and paint the brush strokes.

The pixel with the largest error in the grid cell (its color is the brush stroke color) and the radius define a square window in the reference image. What Shiraishi does is create a grayscale difference image considering the brush stroke color as the reference color. He then uses image moments to define the equivalent rectangle of that square difference image. The center of the equivalent rectangle defines the brush stroke center. The angle theta between the longer edge of the equivalent rectangle and the x-axis defines the angle of the brush stroke. The width and length of the equivalent rectangle define the width and length of the brush stroke. This completely defines the brush stroke. Recall though that all brush strokes are made before they are applied onto the canvas.
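Here is a sketch of the moment computation as I understand it. The width/length recovery uses the fact that the normalized second moment of a uniform-density rectangle of extent e is e^2/12, so extent = sqrt(12 * eigenvalue):

```python
import numpy as np

def equivalent_rectangle(diff):
    """Given a grayscale difference image w(x, y), return the center
    (cx, cy), angle theta, and width/length of the equivalent rectangle,
    computed from the first and second image moments."""
    h, w = diff.shape
    ys, xs = np.mgrid[0:h, 0:w]
    m00 = diff.sum()
    cx, cy = (diff * xs).sum() / m00, (diff * ys).sum() / m00
    # normalized central second moments
    mu20 = (diff * (xs - cx) ** 2).sum() / m00
    mu02 = (diff * (ys - cy) ** 2).sum() / m00
    mu11 = (diff * (xs - cx) * (ys - cy)).sum() / m00
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    # eigenvalues of the second moment matrix are (mu20+mu02 +/- common)/2
    common = np.sqrt((mu20 - mu02) ** 2 + 4.0 * mu11 ** 2)
    length = np.sqrt(6.0 * (mu20 + mu02 + common))   # sqrt(12 * lam_max)
    width = np.sqrt(6.0 * (mu20 + mu02 - common))    # sqrt(12 * lam_min)
    return cx, cy, theta, width, length
```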

To paint a given brush stroke, a rectangular grayscale texture image (where white means fully opaque and black means fully transparent) is scaled so that it matches the equivalent rectangle in terms of width and height, rotated by theta, translated so that its center matches the brush stroke center, and then painted onto the canvas using alpha blending. If you want to be real fancy and somehow simulate the impasto technique where thick layers of oil paint are applied, you may also use a rectangular grayscale bump map image alongside the texture image and a bump map alongside the canvas.
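The compositing itself is plain alpha blending; here is a minimal sketch (ignoring the scaling/rotation of the texture and any bump map handling):

```python
import numpy as np

def composite_stroke(canvas, stroke_rgb, alpha, x0, y0):
    """Alpha-blend one stroke onto the canvas at (x0, y0). `alpha` is the
    stroke's grayscale texture in [0, 1] (white = opaque, black =
    transparent), assumed already scaled and rotated to its footprint."""
    h, w = alpha.shape
    region = canvas[y0:y0 + h, x0:x0 + w].astype(float)
    a = alpha[..., None]
    blended = a * np.asarray(stroke_rgb, dtype=float) + (1.0 - a) * region
    canvas[y0:y0 + h, x0:x0 + w] = blended.astype(canvas.dtype)
    return canvas
```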

Here's an example:


Input RGB image.


Output canvas without bump mapping.


Output canvas image with bump mapping.

Parameters used:
brush radius = 32 16 8 4 2
f_sigma = 1e-05
error_threshold = 60

A few notes:
- I use a very small f_sigma so that the reference image never gets blurred. Because of that, the input image needs to be slightly blurred as high frequency artifacts could be a problem when evaluating the image moments.
- I always use f_grid = 1.0. That's why it's not a parameter.
- To render the bump mapping, I use Gimp as it has a very convenient bump map filter.

Here's a quick video:


At the moment, the software is sitting on my linux box but it is not available for download. If you like this type of painterly rendering, feel free to send me your photographs and it will be my pleasure to "paint" them for you.

Non Photorealistic Rendering - Image Abstraction by Structure Adaptive Filtering

This post describes all the parameters that impact the rendered image in "Image Abstraction by Structure Adaptive Filtering" by Jan Eric Kyprianidis and Jürgen Döllner. Note that this paper was seriously influenced by "Real-Time Video Abstraction" by Holger Winnemöller, Sven C. Olsen, and Bruce Gooch. If you are a bit confused by the title of the paper, "Image Abstraction by Structure Adaptive Filtering", you are not the only one. In layman's terms, it's simply a cartoon filter.


Overview of the method (picture comes from the paper cited above).

There are 2 parameters to control the number of iterations in "Separated OABF":
- n_e. That's the number of iterations before edges are detected. Kyprianidis uses n_e = 1 while Winnemöller uses n_e = 1 or 2.

- n_a. That's the total number of iterations.

Step 1: Local Orientation Estimation

This is to establish the Edge Tangent Flow (ETF) vector field. See Non Photorealistic Rendering - Edge Tangent Flow (ETF).

Parameters used:
- Variance of the Gaussian used to blur the structure tensors. What value should be used is not really discussed in the paper.

Step 2: Separated Orientation-aligned Bilateral Filter

See Non Photorealistic Rendering - Separated Orientation-Aligned Bilateral Filter (Separated OABF).

Parameters used:
- sigma_d. That's the variance of the spatial Gaussian function that's part of the bilateral filter. Both Kyprianidis and Winnemöller use sigma_d = 3.0.
- sigma_r. That's the variance of the color Gaussian function that's part of the bilateral filter. Both Kyprianidis and Winnemöller use sigma_r = 4.25.

For this step, the number of iterations used is the number of iterations before the edges are detected, that is, n_e.

Step 3: Separated Flow-based Difference-of-Gaussians Filter (Separated FDoG)

See Non Photorealistic Rendering - Separated Flow-based Difference of Gaussians (Separated FDoG).

Parameters used for the DoG filter that is applied in the gradient direction:
- sigma_e. That's the variance of the spatial Gaussian. The variance of the other spatial Gaussian is set to 1.6*sigma_e so that DoG approximates LoG (Laplacian of Gaussians). Kyprianidis uses sigma_e = 1.0. Don't know about Winnemöller.
- tau. That's the sensitivity of edge detection. Kyprianidis uses tau = 0.99 while Winnemöller uses tau = 0.98.

Parameters used to smooth the edges in the direction of the flow curves:
- sigma_m. That's the variance of the Gaussian used to smooth the edges. Kyprianidis uses sigma_m = 3.0. Don't know about Winnemöller.

Parameters used to threshold the edges:
- phi_e. Kyprianidis uses phi_e = 2.0 while Winnemöller uses phi_e between 0.75 and 5.0.

There is another parameter that can be used:
- n. That's the number of iterations of "Separated FDoG". Kyprianidis uses n = 1 most of the time. Don't know about Winnemöller.

Step 4: Separated Orientation-aligned Bilateral Filter

Parameters used:
- sigma_d. Same as before.
- sigma_r. Same as before.

For this step, the number of iterations used is the number of iterations that remain, that is, n_a-n_e.

Step 5: Color Quantization

See Non Photorealistic Rendering - Pseudo-Quantization.

Parameters used:
- quant_levels. That's the number of levels used to quantize the luminance. Winnemöller uses nbins = 8 to 10.
- phi_q. Controls the softness of the quantization. Winnemöller uses phi_q = 3.0 to 14.0.

Here's an example:


Input RGB image.


Output blended image.

Parameters used:
tensor_sigma = 3
n_e = 1
n_a = 1
sigma_d = 3
sigma_r = 4.25
fdog_n = 2
fdog_sigma_e = 1
fdog_tau = 0.99
fdog_sigma_m = 3
fdog_phi = 2
phi_q = 3
quant_levels = 8

Here's a quick video:


At the moment, the software is sitting on my linux box but it is not available for download. If you like this cartoon rendering, feel free to send me your photographs and it will be my pleasure to cartoonify them for you.

Non Photorealistic Rendering - Pseudo-Quantization

This is called pseudo-quantization instead of quantization because it is based solely upon the luminance channel (of the CIE-Lab color space). The goal here is to simulate cel-shading (just like it is done in cartoons). Of course, you had better not be too picky; otherwise, you are going to be quite disappointed by the results. The main issue is how to handle the discontinuities between the quantized values: you don't want big color jumps, especially in areas where the luminance gradient is small. If the quantized values are not smoothed in any way, then you have in your hands a hard quantization. If some effort is made to smooth the transitions, it is soft quantization we are talking about.

A pioneer in this technique is Holger Winnemöller. So, let's take a look at "Real-Time Video Abstraction" and see how he handles quantization.


Quantization formula used by Winnemöller.


Quantized values for the luminance using a very small phi_q, a relatively small phi_q, and a relatively large phi_q.

Note that if phi_q is set to 0, the quantized values are q0, q1, q2, etc. As phi_q increases though, the quantized values change to q0+delta_q/2, q1+delta_q/2, q2+delta_q/2, etc. Winnemöller suggests using 8 to 10 bins and a phi_q between 3.0 and 14.0. The transitions between quantized values are much smoother when phi_q is smaller (good!) but the (horizontal) steps are much shorter (not so good!).
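In code, the quantization formula (as I read it) boils down to a tanh ramp around the nearest quantization level:

```python
import numpy as np

def soft_quantize(lum, nbins=8, phi_q=3.0):
    """Winnemöller-style pseudo-quantization of luminance in [0, 1]:
        Q(L) = q_nearest + (dq / 2) * tanh(phi_q * (L - q_nearest))
    phi_q = 0 snaps to the quantization levels; a large phi_q gives
    hard plateaus halfway between levels."""
    dq = 1.0 / nbins
    q = np.round(lum / dq) * dq          # nearest quantization level
    return q + (dq / 2.0) * np.tanh(phi_q * (lum - q))
```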

Let's see the results of Winnemöller's soft quantization on a real image ...


Image we want to color quantize.


Image quantized using 8 bins and phi_q = 3.0.


Image quantized using 8 bins and phi_q = 14.0.

Yeah, there's definitely quantization happening, but I am not sure you are gonna be able to see the difference between the quantized images. Let's zoom in on the upper right!


Image quantized using 8 bins and phi_q = 3.0 (zoomed in the upper right).


Image quantized using 8 bins and phi_q = 14.0 (zoomed in the upper right).

Clearly, the quantized image with phi_q = 3.0 has a smoother transition between flat areas of color than the quantized image with phi_q = 14.0. Kinda looks like the quantized image with phi_q = 3.0 is an anti-aliased version of the quantized image with phi_q = 14.0.

In the paper, Winnemöller is not satisfied with a uniform phi_q. Clearly, phi_q should be a function of the luminance gradient. Indeed, you kinda want phi_q to be relatively small when the luminance gradient is small and you want phi_q to be relatively large when the luminance gradient is large. All you have to do is compute the magnitude of the luminance gradient, clamp it on either end (min and max), and linearly interpolate phi_q between phi_q_min = 3.0 (corresponds to the min luminance gradient magnitude) and phi_q_max = 14.0 (corresponds to the max luminance gradient magnitude).
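That interpolation is nearly a one-liner; the clamp bounds g_min and g_max below are illustrative values of my own, not taken from the paper:

```python
import numpy as np

def phi_q_map(lum, g_min=0.02, g_max=0.2, phi_min=3.0, phi_max=14.0):
    """Per-pixel phi_q from the clamped luminance gradient magnitude:
    small gradient -> soft steps (phi_min), large -> hard steps (phi_max)."""
    gy, gx = np.gradient(lum)
    g = np.clip(np.hypot(gx, gy), g_min, g_max)
    t = (g - g_min) / (g_max - g_min)    # linear interpolation factor
    return phi_min + t * (phi_max - phi_min)
```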

Non Photorealistic Rendering - Separated Flow-based Difference of Gaussians (Separated FDoG)

One can use the Difference of Gaussians (DoG) to extract edges from an intensity image. As you know, the Gaussian filter is a low-pass filter, meaning that it removes high frequency artifacts in an image (e.g. noise). When you take the Gaussian of an image and you subtract a larger sigma Gaussian from it, you essentially have a band-pass filter that rejects high frequency as well as low frequency intensities. That's perfect for detecting edges! It should be noted that the edges we want when stylizing images are not necessarily the same edges one might want in other Computer Vision fields. Indeed, we want relatively thick, well-defined edges, not skinny edges like those you usually get with the Canny edge detector, for example.

To get started with computing the edge image, a one-dimensional DoG filter is applied along the gradient direction for every pixel in the input image (in CIE-Lab color space). It is customary to choose the variance of the 2nd Gaussian filter to be 1.6 times the variance of the 1st Gaussian. This is done so that the Difference of Gaussians (DoG) approximates the Laplacian of Gaussians (LoG).
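Here is what that looks like in one dimension, for a sampled line of intensities (in the real filter, the line follows the gradient direction at each pixel); tau weights the second Gaussian, as in the FDoG papers:

```python
import numpy as np

def gauss1d(sigma, radius):
    """Normalized 1D Gaussian kernel of the given radius."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x * x / (2.0 * sigma * sigma))
    return k / k.sum()

def dog1d(signal, sigma_e=1.0, tau=0.99):
    """1D DoG response along a sampled line: G_sigma_e - tau * G_{1.6 sigma_e}.
    The 1.6 ratio makes the DoG approximate the LoG."""
    r = int(3 * 1.6 * sigma_e)          # one radius so both kernels align
    g1 = np.convolve(signal, gauss1d(sigma_e, r), mode="same")
    g2 = np.convolve(signal, gauss1d(1.6 * sigma_e, r), mode="same")
    return g1 - tau * g2
```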


Difference of Gaussians (DoG) filter applied in the gradient direction.

Now, in order to have nice flowing edges that would not be out of place in the Sunday morning paper's funnies section, you need to smooth the edges along ... well, the edges. The problem is that it is those very nice flowing edges that we are trying to define. A chicken and egg problem, no doubt. Kang et al. in "Flow-based Image Abstraction" were the first to come up with a solution. They use the Edge Tangent Flow (ETF) vector field to smooth the edges. What you do is take the edge image coming from the DoG filter and convolve it along the flow curves of the ETF. It is very similar to the convolution used to visualize the ETF in Line Integral Convolution (LIC). For more info about ETF and LIC, check Non Photorealistic Rendering - Edge Tangent Flow (ETF).

We are not quite done yet, as thresholding is applied to the edge image so that the grayscale values are (pretty much) either 0 (black) or 1 (white). A smoothed step function is used to control how sharp the edges are going to be.
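The exact form of the smoothed step varies a little between papers; a common variant maps positive DoG responses to white and lets negative responses fall off with a tanh:

```python
import numpy as np

def threshold_edges(dog, phi_e=2.0):
    """Smoothed step used with (F)DoG filtering: responses above zero map
    to white, negative responses fall off smoothly toward black, with
    phi_e controlling how sharp the falloff (the edge) is."""
    return np.where(dog > 0.0, 1.0, 1.0 + np.tanh(phi_e * dog))
```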


Thresholding via smoothed step function.


Input rgb image.


Edge image obtained with Separated Flow-based Difference of Gaussians (Separated FDoG).

Non Photorealistic Rendering - Separated Orientation-Aligned Bilateral Filter (Separated OABF)

Bilateral filtering is a staple in the Computer Vision diet. It is used a lot to smooth images while preserving edges. The problem is that it is, in its naive implementation at least, quite slow especially if it needs to be iterated. As you probably know, the bilateral filter is a mix of two Gaussian filters, one that considers the spatial distance between the pixel under consideration and the neighboring pixel (in the convolution window) as the argument to the Gaussian function and one that considers the color distance. In contrast, the Gaussian filter is quite fast because it is separable, that is, you can apply a one-dimensional Gaussian filter along the horizontal and then apply another one-dimensional Gaussian filter along the vertical instead of applying a true two-dimensional Gaussian filter.

So, the bilateral filter is not separable. What can we do about that? Well, you can still separate the bilateral filter if you want to, but it won't be the same filter. For some applications, it doesn't really matter that the implementation of the bilateral filter is not exact. So, you can separate the two-dimensional bilateral filter by applying a one-dimensional bilateral filter along the horizontal and then a one-dimensional bilateral filter along the vertical. If you have an Edge Tangent Flow (ETF) vector field handy, you can instead separate the bilateral filter by applying the one-dimensional bilateral filter alternately along the gradient direction and the tangent direction. Recall that the gradient direction is perpendicular to the tangent direction and goes across the edge. This separation along the gradient and tangent is referred to as "Separated Orientation-Aligned Bilateral Filter" (Separated OABF) in "Image Abstraction by Structure Adaptive Filtering" by Jan Eric Kyprianidis and Jürgen Döllner. It is supposed to behave better at preserving shape boundaries than the non-oriented version (see "Flow-Based Image Abstraction" by Henry Kang et al.).

Before showing some examples of Separated OABF, it is probably a good idea to give the formula for the bilateral filter. You control the way the bilateral filter behaves by the variance "sigma_d" of the spatial Gaussian function and the variance "sigma_r" of the color (aka tonal or range) Gaussian function. The larger sigma_d, the larger the influence of far away (in the spatial sense) sampling points. The larger sigma_r, the less edge-preserving the filter is going to be.
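To make the separation concrete, here is a sketch of a one-dimensional bilateral pass and its xy-separated application (the orientation-aligned version would run the same pass along the ETF's gradient and tangent directions instead of the image axes). Note that sigma_r here is relative to intensities in [0, 1], not the 0-255 scale:

```python
import numpy as np

def bilateral_1d(row, sigma_d=3.0, sigma_r=0.1):
    """1D bilateral pass: each output sample is a spatially AND tonally
    weighted average of its neighbors."""
    r = int(3 * sigma_d)
    out = np.empty_like(row)
    for i in range(len(row)):
        lo, hi = max(0, i - r), min(len(row), i + r + 1)
        x = np.arange(lo, hi)
        w = (np.exp(-((x - i) ** 2) / (2.0 * sigma_d ** 2)) *
             np.exp(-((row[x] - row[i]) ** 2) / (2.0 * sigma_r ** 2)))
        out[i] = (w * row[x]).sum() / w.sum()
    return out

def separable_bilateral(img, sigma_d=3.0, sigma_r=0.1):
    """xy-separated approximation: a 1D pass along rows, then columns."""
    tmp = np.apply_along_axis(bilateral_1d, 1, img, sigma_d, sigma_r)
    return np.apply_along_axis(bilateral_1d, 0, tmp, sigma_d, sigma_r)
```

On a hard step edge, the tonal weight across the edge is essentially zero, which is why the edge survives the smoothing.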


The bilateral filter weights and formula.


Input rgb image (640x422).


Output from the "Separated Orientation-Aligned Bilateral Filter" (Separated OABF) using sigma_d = 16 and sigma_r = 16.

Non Photorealistic Rendering - Edge Tangent Flow (ETF)

Given a pixel of an input rgb image, the eigenvectors of the so-called "structure tensor" (aka "2nd moment tensor") define the directions of extreme (minimum and maximum) rate of change in intensity. The eigenvector associated with the smallest eigenvalue defines the direction of minimum rate of change; the eigenvector associated with the largest eigenvalue defines the direction of maximum rate of change. If there is an edge at a given pixel, it is the eigenvector associated with the smallest eigenvalue that gives the direction of the edge. Indeed, if you go along the edge, the rate of change (in intensity) is minimum. So, for each pixel of the image, if you consider the eigenvector associated with the smallest eigenvalue, you have got yourself a vector field, which is referred to as the Edge Tangent Flow (ETF) vector field. The name Edge Tangent Flow (ETF) comes from "Flow-Based Image Abstraction" by Henry Kang, Seungyong Lee, and Charles K. Chui. To get good results, the structure tensor should be smoothed a little using the usual Gaussian filter.
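A compact numpy sketch of the pipeline described above (gradients, structure tensor, Gaussian smoothing of the tensor components, minor eigenvector); the degenerate-case handling is my own:

```python
import numpy as np

def etf_field(gray, rho=2.0):
    """Build the per-pixel structure tensor from the image gradients,
    smooth its components with a Gaussian, and take the eigenvector of
    the SMALLEST eigenvalue as the edge tangent (unit vectors tx, ty)."""
    gy, gx = np.gradient(gray)
    r = int(3 * rho)
    x = np.arange(-r, r + 1)
    k = np.exp(-x * x / (2.0 * rho * rho))
    k /= k.sum()
    def smooth(a):
        a = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, a)
        return np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, a)
    e, f, g = smooth(gx * gx), smooth(gx * gy), smooth(gy * gy)
    # smallest eigenvalue of [[e, f], [f, g]] and its eigenvector, closed form
    lam_min = 0.5 * (e + g - np.sqrt((e - g) ** 2 + 4.0 * f ** 2))
    tx, ty = lam_min - g, f
    # f == 0: the tensor is already diagonal, pick the axis of the smaller entry
    degenerate = (np.abs(tx) < 1e-12) & (np.abs(ty) < 1e-12)
    tx = np.where(degenerate, np.where(e <= g, 1.0, 0.0), tx)
    ty = np.where(degenerate, np.where(e <= g, 0.0, 1.0), ty)
    norm = np.hypot(tx, ty)
    return tx / norm, ty / norm
```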

To visualize an Edge Tangent Flow (ETF) vector field, you can use Line Integral Convolution (LIC). What you do is start with a noise image where each pixel is given a random grayscale value. Then, for each pixel of the noise image, you follow the flow/stream line that passes through the pixel going forward and backward so that the pixel is in the middle of the flow/stream line under consideration. You convolve (with a one-dimensional bilateral filter) the values along that stream line and assign the resulting average value to the pixel. In the output grayscale image, the actual grayscale values have no meaning, that is, whiter does not mean a stronger edge.


Input rgb image.


Edge Tangent Flow (ETF) visualized by Line Integral Convolution (LIC).

With a good imagination, one can kinda see a Vincent Van Gogh or Edvard Munch style in the LIC visualization of an ETF. ETFs are used quite a bit in Non-Photorealistic Rendering (NPR), in particular, in Stroke-Based Rendering (SBR) or Painterly Rendering as they provide a simple way to orient brush strokes. ETFs are also used in image abstraction and stylization.

Sunday, May 13, 2018

Case Study - How to improve depth map quality with DMAG4

The original mpo file comes from Peter Simcoe at Design-Design. It was taken with a Fuji W3. The original dimensions are 3477x2016 pixels but I reduced the width to 1200 pixels mostly for my own convenience.


Left image (1200 pixels wide).


Right image (1200 pixels wide).

It is always a good idea to rectify the stereo pair before attempting to create depth maps. I use er9b (or er9c if er9b is too aggressive) to do so, mainly because it also outputs the min and max disparities, which are usually needed by the automatic depth map generators. You can use StereoPhoto Maker to align, but you will then have to use df2 to manually get the min and max disparities.


Left image after rectification by er9b.


Right image after rectification by er9b.

Output from er9b (there's a whole lot more verbose output from er9b but that's the important bit):
Mean vertical disparity error = 0.367249
Min disp = -21 Max disp = 16
Min disp = -23 Max disp = 15

Don't worry about the white areas in the left and right images as they will be cropped out after the depth map is obtained. They result from the camera rotations needed to align the stereo pair.

I am gonna use dmag6 to get the depth map using the min and max disparities coming from er9b. I could have used dmag5 with very similar results. We are only gonna consider the left depth map (associated with the left image) even though dmag6 outputs the left and right depth maps.

Input to dmag6:
min disparity for image 1 = -23
max disparity for image 1 = 16
disparity map for image 1 = depthmap_l.png
disparity map for image 2 = depthmap_r.png
occluded pixel map for image 1 = occmap_l.png
occluded pixel map for image 2 = occmap_r.png
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
truncation (discontinuity) = 10000
iteration number = 5
level number = 5
data cost weight = 0.5
disparity tolerance = 0
radius to smooth occlusions = 9
sigma_space = 9
sigma_color = 25.5
downsampling factor = 1


Depth map obtained by dmag6.

As promised, let's get rid of the artifacts coming from the rectification process by cropping the depth map and the reference image (left image).


Depth map obtained by dmag6 after cropping.


Reference image (left image) after cropping.

There are some areas in the foreground where the depth map is not that great. I am gonna use dmag4 to semi-automatically improve those areas. The input to dmag4 is the reference image which we obviously already have (that's the left image after cropping), the sparse/scribbled depth map, and the edge image.

To get the sparse/scribble depth map, you simply start with the cropped depth map and use the eraser tool (with anti-aliasing off, that is, hard edge on) to remove the areas you don't like, revealing the checkerboard pattern underneath. Of course, it's not the usual sparse/scribble depth map dmag4 usually takes for 2d to 3d conversion but it works just the same.


Input to dmag4 (sparse/scribbled depth map). The pure white areas are the areas that I erased. In gimp, they are transparent (checkerboard pattern) but blogger shows them as white.

To get the edge image, you simply trace the object boundaries in the areas that you erased in the sparse/scribbled depth map using the paths tool. The methodology in gimp is quite simple: create a path with the paths tool and then stroke it with anti-aliasing checked off and choosing a width of 1 pixel. Here, I use red for the color but you can use whatever color you want.


Input to dmag4 (edge image). What is shown as white in blogger is actually fully transparent (checkerboard pattern) in gimp.

Input to dmag4:
beta = 10
maxiter = 5000
scale_nbr = 1
con_level = 1
con_level2 = 1


Output from dmag4.

As you can see, dmag4 fills the areas that were erased without spilling over the segmentation in the edge image. Because an edge image is used, beta needs to be relatively low (10 is a good value).


Wiggle 3d gif created by wigglemaker.

Any questions? Feel free to email me using the email address in the "About Me" box.

Saturday, March 24, 2018

Multi View Stereo - Dino


Set of 5 images for which we want to build a 3d reconstruction. Images courtesy of Bernd.

First thing to do is to get the camera extrinsics, that is, the camera positions and orientations. For this, I am gonna use sfm10.

Input to sfm10:

Number of cameras = 5
Image name for camera 0 = D1kl.JPG
Image name for camera 1 = D2kl.JPG
Image name for camera 2 = D3kl.JPG
Image name for camera 3 = D4kl.JPG
Image name for camera 4 = D5kl.JPG
Focal length = 1440
initial camera pair = 2 4
Number of trials (good matches) = 10000
Max number of iterations (Bundle Adjustment) = 1000
Min separation angle (low-confidence 3D points) = 0
Max reprojection error (low-confidence 3D points) = 10000
Radius (animated gif frames) = 5
Angle amplitude (animated gif frames) = 10

For the focal length, I didn't really think about it too much and simply used the width of the images. It should be close enough. For the initial camera pair, I also didn't think too much about it and chose two images that seemed to have been taken from not too far apart viewpoints. I am sure choosing "0 1" as the initial camera pair would have been just fine too. I don't care about the min separation angle allowed (giving it a value of 0). I don't care either about the max reprojection error allowed (giving it a value of 10000). I usually worry about those later, looking at the output of sfm10.

I have got to admit that sfm10 outputs a lot of verbose text. The important part is at the end, in particular, the last computed average reprojection error (I like to have it under 1.0 and, if possible, under 0.5) and the min and max depths (I do not like to see a max depth that's very large). Had those been out of spec, I would have gone back to the input file and tweaked things, like the min separation angle allowed and the max reprojection error allowed.

This is the last bits of info in the output from sfm10:

Number of 3D points = 1506
Average reprojection error = 0.432055
Max reprojection error = 4.94969
Adding camera 1 to the 3D reconstruction ... done.
Looking for next camera to add to 3D reconstruction ...
Looking for next camera to add to 3D reconstruction ... done.
No more cameras to add to the 3D reconstruction
Average depth = 18.8279 min depth = 8.99441 max depth = 40.4922

We are ready to build the dense reconstruction using mvs10.

This is the input to mvs10:

nvm file = duh.nvm
Min match number (camera pair selection) = 100
Max mean vertical disparity error (camera pair selection) = 1
Min average separation angle (camera pair selection) = 0.1
radius (disparity map) = 32
alpha (disparity map) = 0.9
color truncation (disparity map) = 30
gradient truncation (disparity map) = 10
epsilon = 255^2*10^-4 (disparity map)
disparity tolerance (disparity map)= 0
downsampling factor (disparity map)= 4
sampling step (dense reconstruction)= 1
Min separation angle (low-confidence 3D points) = 0.1
Max reprojection error (low-confidence image points) = 10
Min image point number (low-confidence 3D points) = 3
Radius (animated gif frames) = 1
Angle amplitude (animated gif frames) = 1

The min match number, the max mean vertical disparity error, and the min average separation angle are used to determine whether or not a camera/image pair should be added to the 3d reconstruction. Obviously, you don't want to add a camera/image pair if the 3d points are not gonna be accurate. But you don't want to be too picky either otherwise you won't have too many points in the 3d reconstruction.

The radius, alpha, the color truncation, the gradient truncation, epsilon, the disparity tolerance, and the downsampling factor are used to generate the depth map for each camera/image pair. See dmag5 if you want to know more about these parameters.

The min separation angle, the max reprojection error, and the min image point number are used to determine whether a 3d point should remain in the 3d reconstruction. The min image point number is probably the most important. If set to 3, like here, it means that a 3d point should appear in at least 3 views to be accepted. A whole lot of points get rejected if they only appear in 2 images.

Just like sfm10, mvs10 spews out a lot of stuff as it rectifies the stereo pairs, computes the depth maps, and adds/removes points to/from the 3d scene. At the end, you have the final number of points generated and the average reprojection error. You want a good number of points and a relatively low average reprojection error.
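For those wondering what the reprojection error actually measures: project the 3d point through the camera and take the pixel distance to where the point was observed. This is a generic pinhole-camera illustration, not mvs10 code.

```python
# Generic reprojection error illustration (not mvs10 code): project a 3d
# point with a 3x4 camera matrix and compare with the observed pixel.
import math

def project(P, X):
    """P: 3x4 camera matrix (list of rows), X: 3d point -> pixel (u, v)."""
    Xh = X + [1.0]  # homogeneous coordinates
    x = [sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3)]
    return (x[0] / x[2], x[1] / x[2])

def reprojection_error(P, X, observed_uv):
    u, v = project(P, X)
    return math.hypot(u - observed_uv[0], v - observed_uv[1])

# Identity camera: the point (1, 2, 4) projects to (0.25, 0.5), so the
# error against that exact observation is zero.
P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
print(reprojection_error(P, [1.0, 2.0, 4.0], (0.25, 0.5)))  # 0.0
```

An average reprojection error below a pixel, like the 0.858998 reported below, means the reconstructed points are on the whole consistent with where they were actually seen in the images.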

This is the end of the mvs10 output:

Number of 3D points = 716633
Average reprojection error = 0.858998
Max reprojection error = 9.97793

What I do next is load up the duh_xx.jpg images in the Gimp and make an animated gif out of them. That's probably the best way to check if the 3d reconstruction is decent. You could also load the ply file in Meshlab or CloudCompare to actually see the point cloud in 3d.


Animated gif showing the dense 3d reconstruction.

Now, is it possible to simply create a depth map out of that sequence of images instead of building a dense 3d reconstruction? Yes, it's possible, but it's not any easier than building a dense 3d reconstruction. I have a tool to do that called dmag8b (dmag8 can also be used if you want). To use dmag8b, you need to run sfm10 first to get the camera extrinsics. We have already done that, so let's go straight to dmag8b.

Input to dmag8b:

nvm file = duh.nvm
image ref = ?
near plane = 8.99
far plane = 40.49
nbr of planes = 200
radius = 16
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
grayscale depth map = depth.png

If you don't want the program to automatically determine which image should be the reference image, I suggest putting the name of the image that's first in the nvm file. Since sfm10 outputs the min and max depths, that's what I use for near and far planes. I believe that if you put 0 for the min and max depths, dmag8b will compute them from the nvm file and you'll end up with the exact same values (that sfm10 came up with). For the number of planes, I usually start with something small and then increase it. Here, I started at 20 planes and finished with 200.
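The near plane, far plane, and number of planes together define the set of depth hypotheses a plane-sweep method tests at every pixel. Whether dmag8b samples uniformly in depth or in inverse depth is an assumption on my part; the sketch below samples uniformly in inverse depth, which is the common choice since it corresponds to uniform steps in pixel disparity.

```python
# How near/far planes and the plane count define depth hypotheses in a
# plane sweep. Uniform-in-inverse-depth sampling is an assumption; dmag8b
# may sample differently.

def depth_planes(near, far, n):
    """Return n depth values between near and far, uniform in inverse depth."""
    inv_near, inv_far = 1.0 / near, 1.0 / far
    return [1.0 / (inv_near + (inv_far - inv_near) * i / (n - 1))
            for i in range(n)]

# Using the min/max depths sfm10 reported for this sequence:
planes = depth_planes(8.99, 40.49, 200)
print(round(planes[0], 2), round(planes[-1], 2))  # 8.99 40.49
print(len(planes))                                # 200
```

This also shows why more planes help: with 200 planes the depth resolution near the camera is much finer than with 20, at the cost of a proportionally bigger cost volume to evaluate.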

The radius, alpha, color truncation, gradient truncation, and epsilon are the dmag5 parameters you should by now be familiar with.


Depth map produced by dmag8b.

Yes, not a good depth map. Because dmag8b is not exactly a race horse, tweaking the parameters is not an option. Maybe the radius is too small. Maybe the viewpoints are too different. I don't know. Anyways, I think one would get a better depth map from just 2 images using er9b (or er9c) as the rectifier, dmag5 (or dmag5b or dmag6 or whatever) as the depth map generator, and then dmag9b as the depth map optimizer.

When you have more than 2 views, I think it's better to either build the 3d reconstruction or get a depth map from just 2 views. Building a depth map from more than 2 views is quite difficult. It's usually easier to build the 3d scene. It makes more sense too. This is probably why you don't see too many programs that can do depth map generation from multiple views. Then again, Google's "Lens Blur" does just that and pretty well too. Will have to investigate ... if there's enough interest.

Let's see if we can get a better depth map with dmag8b ...

From the original set of 5 views, I am gonna remove 2 views so that the remaining views have similar viewpoints.


Sequence of 3 views (instead of 5).

This is my new input to sfm10:

3
D3kl.JPG
D4kl.JPG
D5kl.JPG
1440.
0 2
10000
1000
0.0
10000.0
5
10.

Now, let's run dmag8b using different input decks and see if the depth map gets better.

Input to dmag8b:

nvm file = duh.nvm
image ref = D3kl.JPG
near plane = 10.247
far plane = 32.037
nbr of planes = 200
radius = 16
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
grayscale depth map = depth.png

The only thing I changed from the previous run is the near and far planes: I used what sfm10 gave me in the output.


Depth map obtained by dmag8b.

I think it's much better. I still think the image viewpoints are a bit too far apart, as the occlusions give rise to wrong depths, especially around the dinosaur's boundary. Unlike dmag5 and the likes, dmag8b doesn't explicitly deal with occlusions, so it's very likely you will get wrong depths at object boundaries (in the foreground). You can reduce this problem by not moving the camera too much when shooting the sequence.

Let's change alpha from 0.9 to 0 ...

Input to dmag8b:

nvm file = duh.nvm
image ref = D3kl.JPG
near plane = 10.247
far plane = 32.037
nbr of planes = 200
radius = 16
alpha = 0
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
grayscale depth map = depth.png


Depth map obtained by dmag8b.

Not much of a change. Let's increase the radius keeping alpha = 0 ...

Input to dmag8b:

nvm file = duh.nvm
image ref = D3kl.JPG
near plane = 10.247
far plane = 32.037
nbr of planes = 200
radius = 32
alpha = 0
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
grayscale depth map = depth.png


Depth map obtained by dmag8b.

Not much of a change either. Let's keep the radius at 32 and set alpha to 0.9 ...

Input to dmag8b:

nvm file = duh.nvm
image ref = D3kl.JPG
near plane = 10.247
far plane = 32.037
nbr of planes = 200
radius = 32
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
grayscale depth map = depth.png


Depth map obtained by dmag8b.

Again, not much of a change. Let's see what happens when we reduce the number of planes from 200 to 100.

Input to dmag8b:

nvm file = duh.nvm
image ref = D3kl.JPG
near plane = 10.247
far plane = 32.037
nbr of planes = 100
radius = 32
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
grayscale depth map = depth.png


Depth map obtained by dmag8b.

Again, not much of a change. Let's stop here.

So, I think that to get the best results with dmag8b, the viewpoints should be close to each other, in other words, the camera should not be moved too much between shots. Note that this requirement makes the job of sfm10 a little bit harder.