Wednesday, January 13, 2021

Getting depth maps from single images using Artificial Intelligence (AI)

Is it possible in this day and age to get good, usable depth maps from monocular images using Artificial Intelligence (AI)?

Well, I think it's getting there today. And as the number of data sets used to train neural networks increases, the results can only get better.

First, there was Google's "Mannequin Challenge" where the training data set was taken from thousands of YouTube videos featuring the now largely forgotten "mannequin challenge", in which people freeze in place while the camera moves around them. Google must have spent a lot of time creating the data set since they had to reconstruct the full 3d scene for each video (to get the depth maps).

Here's a video I made on how to use Google's "Mannequin Challenge" 2d to 3d conversion software on google colab:



I think the results are ok in terms of segmentation, that is, objects are properly detected. But the results are not that great in terms of depth, meaning that objects often end up at incorrect depths. Good effort though.

Well, that was then and this is now. My good friend waniah alerted me to the fact that there was a new kid in town called MiDaS v2.1. The originating paper is: "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer" by René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun. You can find the code on github: MiDaS on github and it's also available on google colab ready to run: MiDaS on google colab.
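If you don't want to use the colab notebook, MiDaS v2.1 can also be loaded directly through PyTorch hub. Here's a minimal sketch based on the usage shown in the MiDaS readme (the model and transform names are taken from there, and "photo.jpg" is just a placeholder):

import cv2
import torch

# Load MiDaS v2.1 (large) and its matching input transform from PyTorch hub.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS")
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.default_transform

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas.to(device)
midas.eval()

# Read the input image ("photo.jpg" is a placeholder) and convert BGR -> RGB.
img = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img).to(device))
    # Resize the prediction back to the size of the input image.
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

output = prediction.cpu().numpy()  # relative inverse depth (bigger = closer)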

Here's a video where I put MiDaS to the test on google colab:



In my opinion, the results are very impressive, much better than google's "Mannequin Challenge", probably because the data sets used to train the neural network were much larger.

Let's dig a little deeper into MiDaS v2.1. The paper is available here: "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer" by Ranftl et al. They say that they used 3d movies to train their model. Cool, but how did they get the depth maps for those 3d movie dual screenshots? The answer is: "PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume" by Sun et al.

Now, I have been a little bit out of the loop regarding the latest trends in automatic depth map generation from a stereo pair, but this is quite interesting. The main idea behind this particular depth map generation technique is optical flow, which is exactly what is used in the first automatic depth map generator I wrote, the original DMAG (Depth Map Automatic Generator). How cool is that! Of course, DMAG doesn't involve any CNN (Convolutional Neural Network), but it is interesting nonetheless. Back in the day, I (and other early adopters of my software) quickly dismissed DMAG as a viable depth map generator because it could not handle small and/or fast moving objects. The good thing about DMAG, though, is that you don't need to input the disparity range and the produced depth maps are very smooth. I should probably spend some time figuring out how the CNN plays a role in that new implementation of optical flow, but that's gonna be for another post, I think.
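To make the link between optical flow and depth a bit more concrete: for a properly rectified stereo pair, the flow between the left and right images is essentially horizontal, and that horizontal component is the disparity, which is inversely proportional to depth. Here's a rough sketch of the idea using OpenCV's classic Farneback optical flow (just the plain old algorithm, not PWC-Net, FlowNet2 or DMAG; the file names are placeholders):

import cv2
import numpy as np

# Load a rectified stereo pair as grayscale (file names are placeholders).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Dense optical flow from left to right.
# Arguments: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags.
flow = cv2.calcOpticalFlowFarneback(left, right, None, 0.5, 5, 21, 3, 7, 1.5, 0)

# For a left/right pair, nearby objects shift more between the two views,
# so the horizontal flow component acts as a disparity map (the sign depends
# on which image is used as the reference).
disparity = -flow[..., 0]

# Disparity is inversely proportional to depth, so a normalized disparity map
# can be used directly as an inverse depth map (near = bright, far = dark).
disparity = disparity - disparity.min()
depthmap = (255.0 * disparity / (disparity.max() + 1e-6)).astype(np.uint8)
cv2.imwrite("depthmap.png", depthmap)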

Now, I came across this idea of coupling optical flow and CNNs somewhere else, on this website: KeystoneDepth. What they did is take thousands of old stereocards and compute their depth maps. Now, guess what was used to compute the depth maps? Yes, an optical flow method coupled with a CNN. The paper that explains the process is here: "KeystoneDepth: History in 3D". We learn that they use FlowNet2 to compute the depth maps. FlowNet2 is described in this paper: "FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks" by Ilg et al. It's cool to see that automatic depth map generation using optical flow is not dead, far from it actually. Makes me wanna go back to the original DMAG, the one that started it all. We'll see.

To save the depth map when you run MiDaS v2.1, you need to add the following lines:

# Display the depth map without axes and save it to disk.
fig = plt.imshow(output)
plt.axis('off')
fig.axes.get_xaxis().set_visible(False)
fig.axes.get_yaxis().set_visible(False)
# bbox_inches='tight' and pad_inches=0 get rid of the white border around the image.
plt.savefig('depthmap.png', bbox_inches='tight', pad_inches=0)

right after:

plt.imshow(output)
# plt.show()

Use Ctrl-C / Ctrl-V to copy and paste inside the cell. When you click on the "Files" icon on the left, the saved depth map should be there. If it's not, click on the "Refresh" tab just above. To download it, simply click on the dots to the right of the file name. Be aware that the depth map MiDaS generates is small (but at the right aspect ratio). This is because MiDaS downsamples the input image to match the size of the images in the training sets. So, you will need to resize it yourself so that it matches the size of the input image, which can be done quite easily in Gimp or Photoshop. To make a "Facebook 3d" post, assuming your photo is called "photo.jpg", you need to rename the depth map "photo_depth.jpg", but you probably know that. You can feed the depth map to Facebook as is, that is, as an rgb heat map. If you prefer grayscale depth maps, you can use Gimp or Photoshop to switch the mode from rgb to grayscale with no ill effect (I think).
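If you'd rather do the resizing and the grayscale conversion in Python instead of Gimp or Photoshop, here is a small sketch using Pillow ("photo.jpg", "depthmap.png" and "photo_depth.jpg" are just the placeholder names used above):

from PIL import Image

# Open the original photo and the small depth map saved from the colab notebook.
photo = Image.open("photo.jpg")
depth = Image.open("depthmap.png")

# Resize the depth map to the size of the photo and convert it to grayscale.
depth = depth.resize(photo.size, Image.LANCZOS).convert("L")

# Facebook 3d expects the depth map to be named <photo name>_depth.jpg.
depth.save("photo_depth.jpg")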

MiDaS is getting better with the release of version 3. In the following video, I compare depth maps obtained with MiDaS v3 and with Adobe Photoshop's "Depth Blur" neural filter:

Tuesday, January 12, 2021

From stereo pair to facebook 3d via depth map using SPM/DMAG

It's relatively easy to create a depth map in SPM if you have a stereo pair. Once you have the depth map, it's child's play to make a facebook 3d post. The stereo pair can be an MPO file coming from a Fuji W3 camera, a scanned stereocard from the 1900s, a scanned 1950s Realist format stereo slide, etc. Here I am focusing on scanned Realist slides mainly because they are amazing, usually.

First thing to do is to separate the left and right images. That's easy enough to do in Gimp using the "Rectangle Select" tool. Once you have the left and right chips, they can be loaded into StereoPhoto Maker (SPM), and this is where the magic should happen.

The first step in SPM is to align the two images using "Adjust->Auto alignment". Make sure you have selected "Better Precision (Slow)" in "Edit->Preferences" (tab: Adjust) beforehand. Then it's time to generate the depth map by clicking on "Edit->Depth Map->Create depth map from stereo pair". Do not click on "Get values automatic" to get the background and foreground values; use the arrow keys instead. Do not let SPM resize your images: make sure your image size is smaller than the "maximum image width" (default = 3000). If unsure, just put a super large number in the box for "maximum image width". If your image is about 1,000 pixels wide, the default values should be good, except maybe the radius for DMAG5, which can be changed from 16 to 32.

The depth map you get should be ok but most likely far from perfect. Click on "Edit->Depth Map->Correct depth map" to access the very useful depth map correction tool. It's very simple to use: keep "Ctrl" pressed and use right-click to pick a color in the depth map and left-click to paint with it.

To get rid of outliers, you may want to go into Gimp and use Filters->Blur->Median Blur. You can also use Filters->Blur->Selective Gaussian Blur to smooth the depth map without losing too much definition.
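If you'd rather script this cleanup instead of doing it in Gimp, something similar can be done with OpenCV (this is just an alternative sketch, not what SPM, DMAG or Gimp do internally; the file names are placeholders):

import cv2

# Load the depth map produced by SPM/DMAG (file name is a placeholder).
depth = cv2.imread("depthmap.png", cv2.IMREAD_GRAYSCALE)

# Median blur removes isolated outliers (salt-and-pepper style errors).
depth = cv2.medianBlur(depth, 5)

# A bilateral filter smooths flat regions while preserving depth edges,
# roughly in the spirit of Gimp's Selective Gaussian Blur.
depth = cv2.bilateralFilter(depth, 9, 50, 50)

cv2.imwrite("depthmap_clean.png", depth)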

Here's a video that shows how I get a proper depth map in SPM from a scanned stereo slide:
 

Here's another video where I spend much more time on depth map correction using the SPM depth map correction tool: