Friday, September 6, 2019

Justin Johnson's Neural Style Torch Implementation Explained

If you have found yourself scratching your head trying to understand Justin Johnson's Torch implementation (on GitHub) of the paper "A Neural Algorithm of Artistic Style" by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge, help is on the way in the form of the following technical note:

Justin Johnson's Neural Style Torch Implementation Explained

It assumes you have read (a few times) either "A neural algorithm of artistic style" or "Image style transfer using convolutional neural networks", both by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. Note that the latter is actually the better paper.

By the way, here is an example of what Neural Style can do to transfer the texture from style image to content image:


This is the content image.


This is the style image.


This is the generated image.

Note that I am mostly interested in texture transfer without color transfer, in other words, I always want the colors of the content image to remain. The only thing that I want to be transferred is the texture of the style image. This is personal preference as I really like to see the brush strokes.
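
If you also want to keep the content colors, a simple way to do it as a post-process is luminance transfer: take the luminance from the stylized output and the color channels from the content image. If I remember correctly, Justin's neural-style has an -original_colors option that does something along these lines. Here's a minimal Python sketch using Pillow (file names are placeholders):

# Color-preserving style transfer post-process: keep the content image's
# colors (Cb/Cr) and take only the luminance (Y) from the stylized output.
# A minimal sketch using Pillow; file names are placeholders.
from PIL import Image

content = Image.open("content.jpg").convert("YCbCr")
stylized = Image.open("stylized.jpg").convert("YCbCr")
stylized = stylized.resize(content.size)  # make sure the sizes match

y, _, _ = stylized.split()    # luminance (texture/brush strokes)
_, cb, cr = content.split()   # chrominance (original colors)

result = Image.merge("YCbCr", (y, cb, cr)).convert("RGB")
result.save("stylized_original_colors.jpg")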

Wednesday, August 7, 2019

Photogrammetry - Gordon's family in front of garage

My good friend Gordon sent me a set of 5 pictures taken with an iPhone 4S and asked for a 3D reconstruction with as many points as possible using Structure from Motion 10 (SfM10) and Multi View Stereo 10 (MVS10).


IMG_0071.JPG


IMG_0076.JPG


IMG_0078.JPG


IMG_0083.JPG


IMG_0100.JPG

Step 1 is structure from motion using sfm10. The purpose of sfm10 is to compute the camera positions and orientations corresponding to each input image. It also outputs a very coarse 3D reconstruction.

Input file for sfm10:

Number of cameras = 5
Image name for camera 0 = IMG_0071.JPG
Image name for camera 1 = IMG_0076.JPG
Image name for camera 2 = IMG_0078.JPG
Image name for camera 3 = IMG_0083.JPG
Image name for camera 4 = IMG_0100.JPG
Focal length = 2800
initial camera pair = 0 2
Number of trials (good matches) = 10000
Max number of iterations (Bundle Adjustment) = 1000
Min separation angle (low-confidence 3D points) = 0.1
Max reprojection error (low-confidence 3D points) = 10
Radius (animated gif frames) = 5
Angle amplitude (animated gif frames) = 1

It's the same as the input file that's in the sfm10_test directory except for the focal length. The focal length needs to be adjusted depending on the size of the images, here 2448x3264. I chose a focal length equal to 2800 (something that's about the same as the width/height). You could use different focal lengths in the same ballpark and it wouldn't make much of a difference (until you go too far and end up with sfm10 issuing errors about the initial camera pair).
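
If you'd rather compute a ballpark focal length than eyeball it, the standard pinhole relation is focal_px = focal_mm * image_width_px / sensor_width_mm. A quick Python sketch (the iPhone 4S numbers are approximate, just for illustration):

# Rough focal length (in pixels) from EXIF focal length and sensor width.
# focal_px = focal_mm * image_width_px / sensor_width_mm
def focal_in_pixels(focal_mm, sensor_width_mm, image_width_px):
    return focal_mm * image_width_px / sensor_width_mm

# Approximate iPhone 4S numbers, just for illustration:
print(focal_in_pixels(4.28, 4.54, 2448))  # ~2300, same ballpark as 2800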

Output from sfm10 (the last few bits):

Number of 3D points = 2547
Average reprojection error = 1.13272
Max reprojection error = 13.0476
Adding camera 3 to the 3D reconstruction ... done.
Looking for next camera to add to 3D reconstruction ...
Looking for next camera to add to 3D reconstruction ... done.
No more cameras to add to the 3D reconstruction
Average depth = 8.197 min depth = 1.43428 max depth = 44.7613


3d reconstruction after sfm10.

Step 2 is multi-view stereo with mvs10 to actually build the dense 3D reconstruction using the results from sfm10.

Input for mvs10:

nvm file = duh.nvm
Min match number (camera pair selection) = 100
Max mean vertical disparity error (camera pair selection) = 1
Min average separation angle (camera pair selection) = 0.1
radius (disparity map) = 16
alpha (disparity map) = 0.9
color truncation (disparity map) = 30
gradient truncation (disparity map) = 10
epsilon = 255^2*10^-4 (disparity map)
disparity tolerance (disparity map)= 0
downsampling factor (disparity map)= 4
sampling step (dense reconstruction)= 1
Min separation angle (low-confidence 3D points) = 0.1
Max reprojection error (low-confidence image points) = 10
Min image point number (low-confidence 3D points) = 3
Radius (animated gif frames) = 1
Angle amplitude (animated gif frames) = 1

This is pretty much the same as the input file in mvs10_test directory. I did change a few things though. For the radius (disparity map), I used 16 instead of 32 for no good reason really. For the sampling step (dense reconstruction), I used 1 instead of 2 so that the number of 3d points would be as large as possible.

Output from mvs10 (the last bits):

Number of 3D points = 1626906
Average reprojection error = 1.59341
Max reprojection error = 12.6571


3d wobble of the dense 3d reconstruction.

Now, having a 3D wobble that looks alright doesn't mean the 3D reconstruction is as good as it could be, since you may have bad 3D points, mostly outliers that stretch the depth of the 3D scene way too much. You can see that when loading the point cloud (duh.ply) into sketchfab (make sure to zip the file before uploading to sketchfab in order not to hit the max upload limit, which is 50 MB), meshlab, or cloudcompare. If you can't even zoom onto what you think the 3D scene should look like, you have outliers and it's time to tighten the parameters that relate to the low-confidence 3D points or image points. First, you need to make sure "Min image point number (low-confidence 3D points)" is greater than or equal to 3. I made sure of that, so let's move on. You may want to increase "Min separation angle (low-confidence 3D points)", say, from 0.1 to 0.5. You may also want to decrease "Max reprojection error (low-confidence image points)", say, from 10.0 to 2.0. Let's do just that, rerun mvs10, and see what happens. Note that mvs10 should run much faster as it doesn't have to recompute the depth maps (saved in the .mvs files).
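
To make the low-confidence filters a bit more concrete, here's a little numpy sketch of what a reprojection error test does conceptually (an illustration only, not mvs10's actual code):

import numpy as np

def reprojection_error(P, X, x_obs):
    # P: 3x4 camera projection matrix, X: 3D point, x_obs: observed 2D point.
    x = P @ np.append(X, 1.0)          # project to homogeneous image coords
    x = x[:2] / x[2]                   # perspective divide
    return np.linalg.norm(x - x_obs)   # pixel distance

def keep_point(P_list, X, x_obs_list, max_err=2.0, min_views=3):
    # Keep a 3D point only if enough cameras see it with a small error,
    # mimicking "Max reprojection error" and "Min image point number".
    errs = [reprojection_error(P, X, x) for P, x in zip(P_list, x_obs_list)]
    good = [e for e in errs if e <= max_err]
    return len(good) >= min_views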

Input to mvs10:

nvm file = duh.nvm
Min match number (camera pair selection) = 100
Max mean vertical disparity error (camera pair selection) = 1
Min average separation angle (camera pair selection) = 0.1
radius (disparity map) = 16
alpha (disparity map) = 0.9
color truncation (disparity map) = 30
gradient truncation (disparity map) = 10
epsilon = 255^2*10^-4 (disparity map)
disparity tolerance (disparity map)= 0
downsampling factor (disparity map)= 4
sampling step (dense reconstruction)= 1
Min separation angle (low-confidence 3D points) = 0.5
Max reprojection error (low-confidence image points) = 2
Min image point number (low-confidence 3D points) = 3
Radius (animated gif frames) = 1
Angle amplitude (animated gif frames) = 1

Output to mvs10 (the last bits):

Number of 3D points = 1114293
Average reprojection error = 0.837404
Max reprojection error = 2.58606

Of course, the number of 3d points has decreased but the 3d reconstruction should be more accurate.


View of the model in sketchfab.

If you see stepping in the 3D reconstruction (depth jumps), it's probably because the "downsampling factor (disparity map)" is too large. You can clearly see that when you load up the point cloud that's in the mvs10_test directory (assuming the "downsampling factor (disparity map)" was not changed and is still equal to 4). If you change it from 4 to 2, the depth maps will have more grayscale values (and therefore there will be more possible depth values for the 3D points) but it's gonna take 4 times longer to run mvs10. Note that if you change the "downsampling factor (disparity map)", you need to delete the *.mvs files from your directory to force mvs10 to recompute the depth maps. Once mvs10 completes, if you later change parameters related to the low-confidence 3D points or image points, mvs10 will run much faster as it doesn't have to recompute the depth maps. Anyways, let's rerun mvs10 using "downsampling factor (disparity map) = 2" and see what happens (remember to remove the .mvs files prior).
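
As a quick sanity check on that "4 times longer" claim, here's a back-of-the-envelope sketch (assuming the depth maps are computed on images downsampled by the factor):

# Halving the downsampling factor quadruples the number of pixels in the
# downsampled images the disparity maps are computed on, hence the roughly
# 4x runtime. Assumed relation, for illustration only.
w, h = 2448, 3264
for factor in (4, 2):
    pixels = (w // factor) * (h // factor)
    print(factor, pixels)   # factor 2 has 4x the pixels of factor 4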

Input to mvs10:

nvm file = duh.nvm
Min match number (camera pair selection) = 100
Max mean vertical disparity error (camera pair selection) = 1
Min average separation angle (camera pair selection) = 0.1
radius (disparity map) = 16
alpha (disparity map) = 0.9
color truncation (disparity map) = 30
gradient truncation (disparity map) = 10
epsilon = 255^2*10^-4 (disparity map)
disparity tolerance (disparity map)= 0
downsampling factor (disparity map)= 2
sampling step (dense reconstruction)= 1
Min separation angle (low-confidence 3D points) = 0.5
Max reprojection error (low-confidence image points) = 2
Min image point number (low-confidence 3D points) = 3
Radius (animated gif frames) = 1
Angle amplitude (animated gif frames) = 1

Output of mvs10 (the last bits):

Number of 3D points = 1346440
Average reprojection error = 0.789732
Max reprojection error = 2.7325


View of the model in sketchfab.

Just for fun, let's increase "Min separation angle (low-confidence 3D points)" from 0.5 to 1.0 and rerun mvs10 (without deleting the .mvs files, of course).

Input to mvs10:

nvm file = duh.nvm
Min match number (camera pair selection) = 100
Max mean vertical disparity error (camera pair selection) = 1
Min average separation angle (camera pair selection) = 0.1
radius (disparity map) = 16
alpha (disparity map) = 0.9
color truncation (disparity map) = 30
gradient truncation (disparity map) = 10
epsilon = 255^2*10^-4 (disparity map)
disparity tolerance (disparity map)= 0
downsampling factor (disparity map)= 2
sampling step (dense reconstruction)= 1
Min separation angle (low-confidence 3D points) = 1
Max reprojection error (low-confidence image points) = 2
Min image point number (low-confidence 3D points) = 3
Radius (animated gif frames) = 1
Angle amplitude (animated gif frames) = 1

Output of mvs10 (the last bits):

Number of 3D points = 1333404
Average reprojection error = 0.78909
Max reprojection error = 2.73256


View of the model in sketchfab.


3d wobble of the dense 3d reconstruction.


Note that the parameters for sfm10 and mvs10 are explained in sfm10_manual.pdf and mvs10_manual.pdf, respectively, which should be sitting in the directory where you extracted the archive ugosoft3d-10-x64.rar.

Tuesday, May 21, 2019

A quick guide on using StereoPhoto Maker (SPM) to generate depth maps

This post describes how to automatically generate depth maps with StereoPhoto Maker. Masuji Suto, the author of StereoPhoto Maker, has integrated two of my tools, DMAG5 and DMAG9b, into StereoPhoto Maker in order for StereoPhoto Maker to be able to generate depth maps. If you don't want to use StereoPhoto Maker at all to generate depth maps, you can certainly use my tools directly: Epipolar Rectification 9b (ER9b) or Epipolar Rectification 9c (ER9c) to rectify/align the stereo pair, Depth Map Automatic Generator 5 (DMAG5) or Depth Map Automatic Generator 5b (DMAG5b) to generate the (initial) depth map, and Depth Map Automatic Generator 9b (DMAG9b) to improve the depth map. All those programs can be downloaded through the 3D Software Page.

Alrighty then, let's get back to the business at hand, that is, generating depth maps with StereoPhoto Maker. I am assuming that you have installed StereoPhoto Maker and the combo DMAG5/DMAG9b on your computer. If you haven't done so yet, follow the instructions in How to make Facebook 3D Photo from Stereo pair. I am assuming you have downloaded and extracted the DMAG5/DMAG9b stuff in a directory called dmag5_9. That directory should look like so:


Contents of the dmag5_9 directory.

Because alignment of the stereo pair is of prime importance in depth map generation, I recommend going into "Edit->Preferences->Adjustment" and checking the box that says "Better Precision (slow)". As I don't particularly like large images, the first thing I do after loading the stereo pair is to resize the images to something smaller. In this guide, the stereo pair is an mpo coming from a Fuji W3 camera. The initial dimensions are 3441x2016. You could generate the depth maps using those dimensions but everything is gonna take longer. Clicking on "Edit->Resize", I change the width to 1200. I guess I could have resized to something larger but I wouldn't resize to anything larger than 3000 pixels. Once the stereo pair has been resized, I click on "Adjust->Auto-alignment" to align/rectify the images. As an alternative to SPM's alignment tool, you can use my rectification tools: Epipolar Rectification 9b (ER9b) or Epipolar Rectification 9c (ER9c).

To generate the depth map, I click on "Edit->Depth Map->Create depth map from stereo pair" where I am presented with this window:


Default "Create depth map from stereo pair" window.

What I recommend doing is getting the background and foreground disparity values manually first. To get the background value, use the arrow keys so that the two red/cyan views come into focus (merge) for a background point. Same idea for the foreground value. I keep track of those two values and then click on "Get Values (automatic)". If the automatic values make sense when compared to the ones obtained manually, I just leave them be. If they don't, I edit them and put back the values obtained manually. Since my image width is less than 3000 pixels, I don't bother with the "maximum image width" box. I don't like the idea of SPM resizing the images automatically, so I always make sure that my image width is less than what's in the "maximum image width" box. The reason I don't like the idea of SPM resizing my images is that, if you change the image dimensions, you are supposed to also change the min and max disparities, and I am not sure SPM does it. So, beware! (See the little sketch after this paragraph for how the rescaling works.) As an alternative to having SPM compute the min and max disparities, you can use my rectification tools: Epipolar Rectification 9b (ER9b) or Epipolar Rectification 9c (ER9c). They both give you the min and max disparities in their output.

I only want the left depth map so I keep the "Create left/right depth maps" box unchecked. Since I want white to be for the foreground and black for the background (I always use white for the foreground in my depth maps), I click on "Create depth map (front: white, back: black)". Note that because I clicked on "Create depth map (front: white, back: black)", when it is time to save the depth map by going into "Edit->Depth map->Save as Facebook 3d photo (2d photo+depth)", I will need to select "white" for "Displayed depth map front side" so that the depth map does not get inverted when saved by SPM. If you don't mind a depth map where the foreground is black, then leave the "Create depth map (front: white, back: black)" radio button alone and don't click on it.
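
About that rescaling: disparities scale linearly with the image width, so if you ever do resize after getting the min/max values, rescaling them yourself is trivial (a quick sketch):

# Disparities scale linearly with image width. If you measured min/max
# disparity at one size and then resize, rescale them like this (sketch):
def rescale_disparity(disp, old_width, new_width):
    return disp * new_width / old_width

print(rescale_disparity(-40, 2400, 1200))   # -20.0
print(rescale_disparity(8, 2400, 1200))     # 4.0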

I recommend using the default parameters in DMAG5 (on the left) and DMAG9b (on the right) to get the first depth map (as you may need to tweak the parameters to get the best possible depth map). To be sure I have the default settings (SPM stores the latest used settings), I click on "Default settings". To get the depth map, I click on "Create depth map". This is the result:


Left image and depth map produced by SPM.

To get the left image on the left and the depth map on the right, click on the "Side-by-side" icon in the taskbar. For more info on DMAG5, check Depth Map Automatic Generator 5 (DMAG5). For more info on DMAG9b, check Depth Map Automatic Generator 9b (DMAG9b) and/or have a look at dmag9b_manual.pdf in the dmag5_9 directory.

Now, if you go into the dmag5_9 directory, you will find some very interesting intermediate images:
- 000_l.tif. That's the left image as used by DMAG5.
- 000_r.tif. That's the right image as used by DMAG5.
- con_map.tiff. That's the confidence map computed by DMAG9b. White means high confidence, black means low confidence. If you clicked on "Create left/right depth maps", the confidence map gets overwritten when DMAG9b is called to optimize the right depth map.
- dps_l.tif. That's the left depth map produced by DMAG5.
- dps_r.tif. That's the right depth map produced by DMAG5. Even if the "Create left/right depth maps" box is unchecked, the right depth map is always generated by DMAG5 in order to detect left/right inconsistencies in the left depth map.
- out.tif. That's the depth map produced by DMAG9b. DMAG9b uses the depth map generated by DMAG5 as input and improves it. If you clicked on "Create left/right depth maps", there should be out.tif (left depth map) and out_r.tif (right depth map).

The depth map "out.tif" is basically the same as the depth map that you get by clicking on "Edit->Depth map->Save as Facebook 3D photo (image+depth)".

It should be noted that once you have created a depth map, you cannot re-click on "Edit->Depth map->Create depth map from stereo pair", possibly change parameters, and create another depth map. If you do that, the new depth map is basically going to be garbage because the right image has been replaced by the previously created depth map (take a look at 000_r.tif in the dmag5_9 directory to convince yourself). To re-generate the depth map, you need to undo or reload the stereo pair.

At this point, you may want to play with the DMAG5/DMAG9b parameters to see if you can improve the depth map. I think it's a good idea but be aware there may be areas in your image where the depth map cannot be improved upon. For instance, if you have an area that has no texture (think of a blue sky or a white wall) or an area that has a repeated texture, it is quite likely the depth map is gonna be wrong no matter what you do. So, improving the depth map is not always easy as, usually, some areas will get better while others will get worse. Some parameters like DMAG5 radius or DMAG9b sample rate spatial depend on the image dimensions and should kind of be tailored to the image width. I think it's best to change one parameter at a time and see the effect it has on the depth map. If things get better, keep changing that parameter (in the same direction) until things get worse. Then, you tweak the next parameter. Of course, it is up to you whether or not you want to spend the time tinkering with parameters. If you always shoot with the same camera setup, this tinkering may only need to be done once.

Let's see which parameters used by DMAG5 and DMAG9b are worth tinkering with in the quest for a better depth map. I believe changing the sample rate spatial used by DMAG9b from the default 32 to 16 or even 8 should be the first thing to change when trying to improve the depth map. I think the default 32 is probably too large especially if the image width is not that large (like here for our test case).

What DMAG5 parameters (on the left side of the window) give you the most bang for your buck when trying to improve the depth map quality?

- radius. The larger the radius, the smoother the depth map generated by DMAG5 is going to be but the less accurate. As the radius goes down, more noise gets introduced into the depth map.
- downsampling factor. The larger the downsampling factor, the faster DMAG5 will run but the less accurate. Running DMAG5 using a downsampling factor of 2 is four times faster than running DMAG5 using a downsampling factor of 1 (no downsampling).

Note that those observations concern DMAG5 only. So, you should look at the dps_l.tif file to see the effects of changing DMAG5 parameters. Also, you may find that sometimes the depth map generated by DMAG5 is actually better than the one generated by the combo DMAG5/DMAG9b. Note that DMAG9b can be so aggressive that variations in the depth map produced by DMAG5 do not matter much.


Left depth map generated by DMAG5 (dps_l.tif) using default values.

What happens to the left depth map generated by DMAG5 if you change the radius from 16 to 8 (every other parameter set to default)? Let's find out!


"Create depth map from stereo pair" window. Changed DMAG5 radius from 16 to 8 (every other parameter set to default).


Left depth map generated by DMAG5 (dps_l.tif). Changed DMAG5 radius from 16 to 8 (every other parameter set to default).

What happens to the left depth map generated by DMAG5 if you change the downsampling factor from 2 to 1 (every other parameter set to default)? Let's find out!


"Create depth map from stereo pair" window. Changed DMAG5 downsampling factor from 2 to 1 (every other parameter set to default).


Left depth map generated by DMAG5 (dps_l.tif). Changed DMAG5 downsampling factor from 2 to 1 (every other parameter set to default).

What DMAG9b parameters (on the right side of the window) give you the most bang for your buck when trying to improve the depth map quality?

- sample rate spatial. The larger the sample rate spatial, the more aggressive DMAG9b will be. I recommend going from 32 (default value) to 16, 8, and even 4. If you can clearly see "blocks" in your depth map, the sample rate spatial is probably too large and should be reduced (by a factor of 2).
- sample rate range. The larger the sample rate range, the more aggressive DMAG9b will be. The default value is 8 but you can try 4 or 16 and see if it's any better.
- lambda. The larger the lambda, the smoother the depth map is going to be (in other words, the more aggressive DMAG9b will be). The default value is 0.25 but you can certainly try larger or smaller values. If you don't want the output depth map to be too different from the depth map generated by DMAG5 (dps_l.tif), use smaller values for lambda (you will probably also need to use smaller values for the sample rate spatial and the sample rate range).


Depth map generated by DMAG9b (out.tif) using default values.

What happens to the depth map generated by DMAG9b if you change the sample rate spatial from 32 to 16 (every other parameter set to default)? Let's find out!


"Create depth map from stereo pair" window. Changed DMAG9b sample rate spatial from 32 to 16 (every other parameter set to default).


Depth map generated by DMAG9b (out.tif). Changed DMAG9b sample rate spatial from 32 to 16 (every other parameter set to default).

What happens to the depth map generated by DMAG9b if you change the sample rate range from 8 to 4 (every other parameter set to default)? Let's find out!


"Create depth map from stereo pair" window. Changed DMAG9b sample rate range from 8 to 4 (every other parameter set to default).


Depth map generated by DMAG9b (out.tif). Changed DMAG9b sample rate range from 8 to 4 (every other parameter set to default).

What happens to the depth map generated by DMAG9b if you change lambda from 0.25 to 0.5 (every other parameter set to default)? Let's find out!


"Create depth map from stereo pair" window. Changed DMAG9b lambda from 0.25 to 0.5 (every other parameter set to default).


Depth map generated by DMAG9b (out.tif). Changed DMAG9b lambda from 0.25 to 0.5 (every other parameter set to default).

Now, if you want to manually edit the generated depth map, you can do so in SPM by clicking on "Edit->Depth map->Correct depth map". If you want to edit the generated depth map semi-automatically, you can use the techniques centered around DMAG11 or DMAG4 that are described in Case Study - How to get depth maps from old stereocards using ER9c, DMAG5, DMAG9b, and DMAG11 and Case Study - How to improve depth map quality with DMAG9b and DMAG4.

For the ultimate experience in editing depth maps semi-automatically, I recommend using 2d to 3d Image Conversion Software - The 3d Converter: load up the left image and the depth map (add an alpha channel if it does not have one), use the eraser tool to delete the parts of the depth map you do not like (that becomes sparse_depthmap_rgba_image.png needed by the3dconverter), create a new layer and trace where you don't want the depths to bleed through (that becomes edge_rgba_image.png needed by the3dconverter), and run the3dconverter to get the new depth map called dense_depthmap_image.png. You do not need to worry about gimp_paths.svg, ignored_gradient_rgba_image.png, and emphasized_gradient_rgba_image.png as they are not needed.

Sunday, May 12, 2019

Case Study - DMAG5/DMAG9b vs DMAG5b/DMAG9b

This post kinda compares a depth map produced by the combo DMAG5/DMAG9b vs the combo DMAG5b/DMAG9b. Thanks to my good friend Katsuhiko Inoue for providing the stereo pair (taken in portrait mode with an iPhone X).


Left image of stereo pair after rectification by ER9b.


Right image of stereo pair after rectification by ER9b.

I do not know how the right image was obtained. It certainly was not obtained from a portrait mode stereo photo using the dual lens, as it's not possible to extract the right image from an iPhone X stereo photo. Even if you could extract the right image, it would not have the same focal length as the left image, meaning you would need specialized depth map generation software to get the depth map. Here, I am talking about the dual lens iPhone X (back-facing camera system), not the TrueDepth sensor (front-facing camera system). I think the depth map produced by the iPhone X was used here to create a synthetic right image using 3DSteroid Pro or StereoPhoto Maker. Basically, what I am gonna be doing here is see if I can recover the original depth map from the left image and a synthetic right image.

The dimensions of the original stereo pair are 3024x4032. I reduced the dimensions to 1800x2400 so that DMAG9b would run faster. The only reason I ran ER9b was to get the min and max disparities. It looks like the original stereo pair was very well aligned. Note that because the baseline is so small, you don't want to reduce the image size too much, otherwise you are going to get a depth map with few depth levels (shades of gray) as far as DMAG5 and DMAG5b are concerned. Note that the number of depth levels is equal to the difference between the max and min disparities (plus one). So, for example, if the min disparity is -44 and the max disparity is 10, you are gonna get 55 depth levels (shades of gray) in the depth map produced by DMAG5 or DMAG5b. Something to consider.
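
Here's that arithmetic as a quick sketch, together with the kind of linear disparity-to-grayscale mapping you'd expect (the 0-255 mapping is my assumption, for illustration):

# Number of depth levels and a typical disparity-to-grayscale mapping.
# The linear 0..255 mapping is an assumption, for illustration.
d_min, d_max = -44, 10
levels = d_max - d_min + 1
print(levels)   # 55 shades of gray

def disparity_to_gray(d, d_min=-44, d_max=10):
    return round(255 * (d - d_min) / (d_max - d_min))

print(disparity_to_gray(-44), disparity_to_gray(10))   # 0 255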

Now, let's run DMAG5 using the following input file:

image 1 = ../er9b/image_l.png
image 2 = ../er9b/image_r.png
min disparity for image 1 = -44
max disparity for image 1 = 10
disparity map for image 1 = depthmap_l.png
disparity map for image 2 = depthmap_r.png
occluded pixel map for image 1 = occmap_l.png
occluded pixel map for image 2 = occmap_r.png
radius = 16
alpha = 0.9
truncation (color) = 30
truncation (gradient) = 10
epsilon = 255^2*10^-4
disparity tolerance = 0
radius to smooth occlusions = 9
sigma_space = 9
sigma_color = 25.5
downsampling factor = 2

I believe those are the default values in StereoPhoto Maker.
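
The radius/alpha/truncation/epsilon parameters look like the usual cost-volume filtering knobs, so here's a hedged sketch of the kind of per-pixel matching cost they typically control (an illustration, not DMAG5's actual code):

def matching_cost(color_diff, grad_diff, alpha=0.9,
                  trunc_color=30.0, trunc_grad=10.0):
    # Blend of truncated color and gradient differences, as in typical
    # cost-volume filtering; an illustration, not DMAG5's actual code.
    c = min(abs(color_diff), trunc_color)
    g = min(abs(grad_diff), trunc_grad)
    return (1.0 - alpha) * c + alpha * g

print(matching_cost(100.0, 3.0))   # 5.7, the color term saturates at 30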


Left depth map obtained by DMAG5.

If you want to experiment, you could change the value for the radius. Maybe try 8 or 32 instead of 16 and see what happens. Also, you may want to change the downsampling factor to 1 instead of 2. It will take longer but you will get more levels of depth in the depth map (shades of gray).

Let's run DMAG9b using the following input file:

reference image = ../../er9b/image_l.png
input disparity map = ../depthmap_l.png
sample_rate_spatial = 32
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 2
output depth map image = depthmap_l_dmag9b.png

I believe those are the defaults in StereoPhoto Maker except for sigma. Here, I am using sigma = 2.0 while SPM uses 32.0. I don't think it matters much, to be honest. Recall that the lower the sigma, the less confidence is given to the depth in the input depth map.
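
If you picture the confidence as a Gaussian falloff controlled by sigma (my assumption about the general shape, not necessarily DMAG9b's exact formula), the effect of sigma is easy to see:

import math

def confidence(err, sigma):
    # Gaussian falloff: my assumption about the general shape, not
    # necessarily DMAG9b's exact formula.
    return math.exp(-err * err / (2.0 * sigma * sigma))

for sigma in (2.0, 32.0):
    print(sigma, round(confidence(5.0, sigma), 4))
# sigma = 2  -> 0.0439 (an error of 5 kills the confidence)
# sigma = 32 -> 0.9879 (the same error barely matters)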


Confidence map. White means very confident in the input depth, black means little confidence. Since sigma is relatively low, the black streaks (poor confidence) are quite prominent.


Depth map produced by DMAG9b.

Let's change sigma from 2.0 to 32.0 and run DMAG9b using the following input file:

reference image = ../../er9b/image_l.png
input disparity map = ../depthmap_l.png
sample_rate_spatial = 32
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 32
output depth map image = depthmap_l_dmag9b.png


Confidence map. Since sigma is relatively high, the black streaks (poor confidence) are pretty narrow.


Depth map produced by DMAG9b.

Not a whole lot of difference so I am gonna continue with sigma = 2.0. Let's change sample_rate_spatial from 32 to 16 and run DMAG9b using the following input file:

reference image = ../../er9b/image_l.png
input disparity map = ../depthmap_l.png
sample_rate_spatial = 16
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 2
output depth map image = depthmap_l_dmag9b.png


Depth map produced by DMAG9b.

I think it's a bit better so let's continue the trend and change sample_rate_spatial from 16 to 8. Let's run DMAG9b using the following input file:

reference image = ../../er9b/image_l.png
input disparity map = ../depthmap_l.png
sample_rate_spatial = 8
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 2
output depth map image = depthmap_l_dmag9b.png


Depth map produced by DMAG9b.

I think I hit the sweet spot so I am gonna stop here. Note that as sample_rate_spatial goes down, the cpu time for DMAG9b goes up.

Because the interocular distance is small, it can be worthwhile to use DMAG5b instead of DMAG5 to get the initial depth map. DMAG5b is a very simple algorithm, but it will not perform well at object boundaries if the baseline used to take the stereo pair was (relatively) large. Here, it should perform ok since the pair was taken with an iPhone with dual cameras.
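
For what it's worth, here's what a "very simple" stereo matcher typically looks like: a winner-takes-all SAD block matcher (a generic sketch, not necessarily DMAG5b's actual algorithm):

import numpy as np

def block_match(left, right, d_min, d_max, radius=8):
    # Generic winner-takes-all SAD block matcher on grayscale images;
    # an illustration of "simple" stereo, not DMAG5b's actual algorithm.
    left = np.asarray(left, dtype=np.float32)
    right = np.asarray(right, dtype=np.float32)
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            best, best_d = np.inf, 0
            for d in range(d_min, d_max + 1):
                if x + d < radius or x + d >= w - radius:
                    continue
                patch_l = left[y-radius:y+radius+1, x-radius:x+radius+1]
                patch_r = right[y-radius:y+radius+1, x+d-radius:x+d+radius+1]
                cost = np.abs(patch_l - patch_r).sum()
                if cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp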

Let's run DMAG5b using the following input file:


Depth map produced by DMAG5b.

The depth map produced by DMAG5b is actually better (I think) than the depth map produced by DMAG5, in this particular case. Personally, I would stop here and not even bother with DMAG9b, but let's see how the best DMAG5/DMAG9b combo (as seen right above) compares with DMAG5b/DMAG9b.

Let's try to improve this depth map using DMAG9b and the following input file (same as the one right above):

reference image = ../../er9b/image_l.png
input disparity map = ../depthmap_l.png
sample_rate_spatial = 8
sample_rate_range = 8
lambda = 0.25
hash_table_size = 100000
nbr of iterations (linear solver) = 25
sigma_gm = 1
nbr of iterations (irls) = 32
radius (confidence map) = 12
gamma proximity (confidence map) = 12
gamma color similarity (confidence map) = 12
sigma (confidence map) = 2
output depth map image = depthmap_l_dmag9b.png


Depth map produced by DMAG9b.

Here, it does not really matter how the initial depth map was obtained as DMAG9b is quite aggressive. To make DMAG9b less aggressive, lambda is probably the parameter to change. The lower lambda is, the less aggressive DMAG9b is going to be.

Thursday, May 9, 2019

2d to 3d conversion - Great White

This post is an example that shows how to use 2d to 3d Image Conversion - The 3d Converter to create a depth map semi-automatically. I have uploaded on dropbox the gimp file which contains all the layers and the paths: great_white.xcf. See 2d to 3d Image Conversion - The 3d Converter for how to use "The 3d Converter".

This is what the3dconverter_input.txt looks like:
reference_rgb_image.png
sparse_depthmap_rgba_image.png
dense_depthmap_image.png
gimp_paths.svg
ignored_gradient_rgba_image.png
emphasized_gradient_rgba_image.png
edge_rgba_image.png
0.0

All you need is a reference image (layer reference_rgb_image in great_white.xcf saved as reference_rgb_image.png), an "edge image" (layer edge_rgba_image in great_white.xcf saved as edge_rgba_image.png), a sparse depth map (layer sparse_depthmap_rgba_image in great_white.xcf saved as sparse_depthmap_rgba_image.png), and a bunch of equal_depth and relative_depth paths (saved as gimp_paths.svg). This is all done in Gimp and you can get to those by downloading great_white.xcf.

Don't worry about ignored_gradient_rgba_image.png and emphasized_gradient_rgba_image.png as those are not used and therefore don't need to exist.


Reference image aka reference_rgb_image.png.


Reference image, sparse depth map, edge image, and gimp paths as seen in great_white.xcf.

The sparse depth map consists of a white blob for the tip of the nose and a black scribble to denote the background. I know it is a bit weird to relegate the water to the background but I don't see any other way to do it. Still, it kinda places the shark in front of the water. Weird, right? For the edge image, I simply traced the outline of the shark using the pencil tool (keeping the shift key pressed so that the line segments are always straight) with the smallest possible hard brush. The purpose of the edge image is to prevent the depths from bleeding across object boundaries. The gimp paths are shown in blue. The ones that kinda look like half circles are the equal_depth paths. The ones that kinda connect the equal_depth paths together are the relative_depth paths. What is cool about using gimp paths is that it is very easy to modify them; in particular, it is very easy to change the relative depths between equal_depth paths as all you have to do is rename the relative_depth paths.


Gimp paths: equal_depth and relative_depth paths.

The +XX in the name of a relative_depth path indicates the relative depth between the beginning and end of the path. So, if you need to change the relative depth, you just need to change the XX in the name of the path.
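
If you ever need to read those relative depths programmatically, pulling the +XX out of a path name is a one-liner (the path names below are hypothetical, just to show the idea):

import re

def relative_depth(path_name):
    # Pull the +XX (or -XX) suffix out of a relative_depth path name.
    # Path names here are hypothetical, just to show the idea.
    m = re.search(r'([+-]\d+)\s*$', path_name)
    return int(m.group(1)) if m else None

print(relative_depth("relative_depth+20"))   # 20
print(relative_depth("relative_depth-35"))   # -35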


Dense depth map produced by The 3d Converter.


Wiggle/wobble created using depthy.me.


Wiggle/wobble created using wigglemaker.

Check the 3D effect on Facebook (no need to be registered or logged in to Facebook): Facebook 3D photo. I have got to say that Facebook did an excellent job rendering depth maps with their 3D photos. Clap clap!

This is kinda a trial and error process. So, to help me in visualizing the dense depth map, I use Depth Player. I think it is a great tool. Note that the depth map doesn't have to be too accurate if you just want to post it as a Facebook 3D photo.

Video that kinda explains how to get the input files needed by The3dConverter:

Tuesday, April 23, 2019

Epipolar Rectification 9c (ER9c)

ER9c is a variant of Epipolar Rectification 9b (ER9b). ER9c is far less aggressive than ER9b in rectifying images, so if you use ER9b and see a lot of distortion in the rectified images (when compared to the original images), I invite you to use ER9c instead.

In ugosoft3d-9-x64.rar, which you can download at the 3D Software page, you will find the manual for ER9c (er9c_manual.pdf) and a test case for ER9c under er9c_test.

Here is an example:


Input left and right images.

You can clearly see in this animated gif that switches between left and right image that the pair is not aligned. There is quite a bit of rotation, which is not good for automatic depth map generation.


Output left and right images.

The output left and right images are now aligned. This is good news for the automatic depth map generator that will generate the depth maps.

It should be noted that ER9c, like ER9b, gives the minimum and maximum disparities of the rectified stereo pair in the console window printout. These disparities can be used as input to the automatic depth map generators that are available here for download. Unlike with ER9, there's no need to manipulate those values.

The windows executable (guaranteed to be virus free) is available for free via the 3D Software Page.

Monday, April 22, 2019

2d to 3d conversion - Bruce Lee

This post is an example that shows how to use 2d to 3d Image Conversion - The 3d Converter to create a depth map semi-automatically. I have uploaded on dropbox the gimp file which contains all the layers and the paths: bruce_lee.xcf. See 2d to 3d Image Conversion - The 3d Converter for how to use "The 3d Converter".

This is what the3dconverter_input.txt looks like:
reference_rgb_image.png
sparse_depthmap_rgba_image.png
dense_depthmap_image.png
gimp_paths.svg
ignored_gradient_rgba_image.png
emphasized_gradient_rgba_image.png
edge_rgba_image.png
0.0

All you need is a reference image (layer reference_rgb_image saved as reference_rgb_image.png), an "edge image" (layer edge_rgba_image saved as edge_rgba_image.png), a sparse depth map (layer sparse_depthmap_rgba_image saved as sparse_depthmap_rgba_image.png), and a bunch of equal_depth and relative_depth paths (saved as gimp_paths.svg).

Don't worry about ignored_gradient_rgba_image.png and emphasized_gradient_rgba_image.png as those are not used and therefore don't need to exist.


Reference image.


Dense depth map produced by The 3d Converter.


Wiggle/wobble created by Wiggle Maker.