Wednesday, January 13, 2021

Getting depth maps from single images using Artificial Intelligence (AI)

Is it possible in this day and age to get good, usable depth maps from monocular images using Artificial Intelligence (AI)?

Well, I think it's getting there. And as the number of data sets used to train neural networks increases, the results can only get better.

First, there was Google's "Mannequin Challenge" work, whose data set was taken from thousands of YouTube videos featuring the now-forgotten "mannequin challenge", in which people freeze while the camera moves around them. Google must have spent a lot of time creating the data set, since they had to reconstruct the full 3D scene for each video (to get the depth map).

Here's a video I made on how to use Google's "Mannequin Challenge" 2D-to-3D conversion software on Google Colab:



I think the results are OK in terms of segmentation, that is, objects are properly detected. But the results are not that great in terms of depth: objects are often placed at incorrect depths. Good effort though.

Well, that was then and this is now. My good friend waniah alerted me to a new kid in town called MiDaS v2.1. The originating paper is "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer" by René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler and Vladlen Koltun. The code is on GitHub (MiDaS on GitHub), and it's also available, ready to run, on Google Colab (MiDaS on Google Colab).

Here's a video where I put MiDaS to the test on Google Colab:



In my opinion, the results are very impressive, much better than Google's "Mannequin Challenge", probably because the data set used to train the MiDaS neural network was much larger.

Let's dig a little deeper into MiDaS v2.1. The paper is available here: "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer" by Ranftl et al. The authors say they used 3D movies to train their model. Cool, but how did they get the depth maps for the stereo frames of those 3D movies? The answer is: "PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume" by Sun et al. I have been a bit out of the loop regarding the latest trends in automatic depth map generation from a stereo pair, but this is quite interesting. The main idea behind this particular technique is optical flow, which is exactly what the first automatic depth map generator I wrote, the original DMAG (Depth Map Automatic Generator), uses. How cool is that! Of course, DMAG doesn't involve any CNN (Convolutional Neural Network), but it's interesting nonetheless. Back in the day, I (and other early adopters of my software) quickly dismissed DMAG as a viable depth map generator because it could not handle small and/or fast-moving objects. The good thing about DMAG, though, is that you don't need to input the disparity range, and the depth maps it produces are very smooth. I should probably spend some time figuring out what role the CNN plays in this new implementation of optical flow, but that's going to be for another post.
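To make the optical flow idea a bit more concrete, here is a minimal textbook sketch. It is not PWC-Net and not DMAG's actual algorithm: it is a one-dimensional Lucas-Kanade estimate of the horizontal shift between the same scanline in the left and right views of a stereo pair. It linearizes brightness constancy, I_x * u + I_t = 0, and solves for u by least squares over a small sliding window. The function name horizontal_flow_1d and the 15-pixel window are illustrative choices, not taken from any of the papers. Because of the linearization it only works for small shifts; PWC-Net, as its title says, relies on a pyramid, warping and a cost volume to cope with larger motions.

```python
import numpy as np


def horizontal_flow_1d(left_row, right_row, window=15):
    """Estimate the horizontal shift u (in pixels) at every position of a scanline.

    One-dimensional Lucas-Kanade: linearize brightness constancy,
    I_x * u + I_t = 0, and solve for u by least squares over a sliding window.
    Only meaningful for small shifts (a pixel or so).
    """
    left_row = np.asarray(left_row, dtype=float)
    right_row = np.asarray(right_row, dtype=float)

    # Spatial derivative, taken on the average of both views.
    i_x = np.gradient(0.5 * (left_row + right_row))
    # "Temporal" derivative: here, the change from the left view to the right view.
    i_t = right_row - left_row

    # Windowed sums of the least-squares terms.
    kernel = np.ones(window)
    num = np.convolve(i_x * i_t, kernel, mode="same")
    den = np.convolve(i_x * i_x, kernel, mode="same")

    # u = -sum(I_x * I_t) / sum(I_x^2), guarding against flat (textureless) windows.
    return -num / np.maximum(den, 1e-12)
```

Feeding it a sine wave and the same wave shifted right by half a pixel gives values very close to 0.5 away from the borders. In a real stereo pair, the sign of u depends on which view is taken as the reference, and its magnitude is the disparity that ends up being turned into depth.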

I came across this idea of coupling optical flow and CNNs somewhere else too: on the KeystoneDepth website. What they did was take thousands of old stereocards and compute their depth maps. Now, guess what was used to compute the depth maps? Yes, an optical flow method coupled with a CNN. The paper that explains the process is "KeystoneDepth: History in 3D". We learn that they use FlowNet2 to compute the depth maps. FlowNet2 is described in this paper: "Flownet 2.0: Evolution of optical flow estimation with deep networks" by Ilg et al. It's cool to see that automatic depth map generation using optical flow is not dead; actually, far from it. It makes me want to go back to the original DMAG, the one that started it all. We'll see.

To save the depth map when you run MiDaS v2.1 on Google Colab, add the following lines:

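import matplotlib.pyplot as plt  # already imported in the MiDaS notebook; harmless to repeat
fig = plt.imshow(output)  # 'output' is the depth array computed earlier in the cell
plt.axis('off')  # no axes, ticks or labels around the picture
fig.axes.get_xaxis().set_visible(False)
fig.axes.get_yaxis().set_visible(False)
# Crop the white margins and write the figure to depthmap.png
plt.savefig('depthmap.png', bbox_inches='tight', pad_inches=0)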

right after:

plt.imshow(output)
# plt.show()
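If you would rather save the depth values themselves than a picture of the plot, here is a minimal sketch using NumPy and Pillow, assuming output is the 2-D NumPy array that the cell displays with plt.imshow(output). Note that plt.savefig writes the rendered figure, whose pixel size comes from the figure size and dpi (then cropped by bbox_inches='tight'), not from the shape of the array; this sketch writes exactly one pixel per array element. The helper name save_depth_png is hypothetical. Values are stretched to 0-255, and since MiDaS predicts relative inverse depth (larger means nearer), near objects come out white.

```python
import numpy as np
from PIL import Image


def save_depth_png(depth, path="depthmap.png"):
    """Save a 2-D depth array as an 8-bit grayscale PNG, one pixel per element.

    Values are min-max stretched to 0..255; a constant array becomes all black.
    """
    depth = np.asarray(depth, dtype=np.float64)
    if depth.ndim != 2:
        raise ValueError("expected a 2-D depth array")

    lo, hi = depth.min(), depth.max()
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    gray = np.round((depth - lo) * scale).astype(np.uint8)

    # A 2-D uint8 array is stored by Pillow as a grayscale ("L") image.
    Image.fromarray(gray).save(path)
    return path
```

In the Colab cell, calling save_depth_png(output) right after plt.imshow(output) should put a grayscale depthmap.png in the Files panel, with exactly the width and height of output.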

Use Ctrl-C and Ctrl-V to copy and paste the lines into the cell. Once the cell has run, click the "Files" icon on the left and depthmap.png should be there. If it's not, click the "Refresh" button just above. To download it, click the dots to the right of the file name and choose "Download".

Be aware that the depth map that MiDaS generates is small (but at the right aspect ratio). This is because MiDaS downsamples the input image to match the sizes of the images in its training sets. So, you will need to resize it yourself to match the size of the input image. You can do that in GIMP or Photoshop quite easily.

To make a "Facebook 3D" post, assuming your photo is called "photo.jpg", you need to name the depth map "photo_depth.jpg", but you probably know that. You can feed the depth map to Facebook as is, that is, as an RGB heat map. If you prefer grayscale depth maps, you can use GIMP or Photoshop to switch the mode from RGB to grayscale with no ill effect (I think).
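The resize, grayscale and renaming steps can also be scripted instead of done by hand. Here is a minimal sketch with Pillow (an assumption; the post itself uses GIMP or Photoshop): it opens the photo only to read its size, converts the downloaded depth map to grayscale, resizes it with bicubic interpolation to the photo's size and saves it next to the photo as <name>_depth.jpg. The function name make_facebook_depth and the JPEG quality of 95 are illustrative choices.

```python
from pathlib import Path

from PIL import Image


def make_facebook_depth(photo_path, depth_path):
    """Resize a depth map to the photo's size, convert it to grayscale and
    save it next to the photo as '<name>_depth.jpg' (e.g. photo_depth.jpg)."""
    photo_path = Path(photo_path)
    with Image.open(photo_path) as photo:
        size = photo.size  # (width, height) of the original photo

    with Image.open(depth_path) as depth:
        gray = depth.convert("L")  # RGB(A) heat map -> 8-bit grayscale
        resized = gray.resize(size, Image.BICUBIC)

    out = photo_path.with_name(photo_path.stem + "_depth.jpg")
    resized.save(out, "JPEG", quality=95)
    return out
```

For example, make_facebook_depth("photo.jpg", "depthmap.png") would write photo_depth.jpg, and since it is saved straight to JPEG, the file really is a JPEG rather than a renamed PNG. Pillow's convert("L") takes a weighted sum of R, G and B, so for a heat map the gray levels keep the near/far ordering only if the colormap's brightness increases steadily from far to near.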

3 comments:

  1. Ugo
    This is off topic for your post here, but
    I wish to thank you for your depth viewer software.
    You really should make it into a desktop application.
    You may not realize it, but many people are into fractal programs like MB3D, which output depth maps.
    I have been looking for a viewer and couldn't find one.
    Can you make it run locally?
    I can easily run a web server on my machine to run it locally.
    It's useful.
    Check "Mandelbulb Maniacs" on facecrack and you'll see what I mean.
    They would use it for sure.
    I just did, with the program's output.

  2. Dear Ugo,
    I really appreciate your work.
    The results of DMAG are pretty awesome.
    Actually, I have a question about using DMAG together with the Google AI depth map from a single image.
    I created depth maps from single images with the Google AI in SPM.
    But they have quite wrong depth values in some cases.
    I want to use them as input to your DMAG11 to refine and correct the depth.
    But the depth map SPM produces is in JPEG format, which has no alpha channel.
    So, I can't use it as the input depth for your DMAG11.
    I've tried adding an alpha channel, setting the alpha threshold (Layer->Transparency->Threshold) and saving a ".png" file in GIMP.
    But it didn't work.
    Do you have any ideas on how to refine depth using your DMAG?

    Thanks

    1. You can use the depth map you got from SPM (or any other tool) as the sparse depth map in DMAG11 or the3dconverter2. Add an alpha channel to the depth map and use the eraser tool to erase the areas where you don't like the depth. Make sure the eraser tool uses a hard brush with no anti-aliasing. I have done it before with the3dconverter2 here: http://3dstereophoto.blogspot.com/2020/05/fact-the3dconverter2-can-be-used-to.html. Did you know that you can also correct the depth map directly in SPM? I do it here: https://youtu.be/nBllbUbEIqM. You can also check my YouTube channel.
