3D Stereoscopic Photography: Getting depth maps from single images using Artificial Intelligence (AI)

Wednesday, January 13, 2021

Getting depth maps from single images using Artificial Intelligence (AI)

Is it possible in this day and age to get good, usable depth maps from monocular images using Artificial Intelligence (AI)?

Well, I think it's getting there. And as the number of data sets used to train neural networks increases, the results can only get better.

First, there was Google's "Mannequin Challenge", whose data set was taken from thousands of YouTube videos of the now-forgotten "mannequin challenge", in which people freeze while the camera moves around them. Google must have spent a lot of time creating the data set, since they had to reconstruct the full 3D scene for each video to get the depth maps.

Here's a video I made on how to use Google's "Mannequin Challenge" 2D-to-3D conversion software on Google Colab:

I think the results are OK in terms of segmentation, that is, objects are properly detected. But the results are not that great in terms of depth: objects are often placed at incorrect depths. Good effort, though.

Well, that was then and this is now. My good friend waniah alerted me to a new kid in town called MiDaS v2.1. The originating paper is "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer" by René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. You can find the code on GitHub: MiDaS on GitHub. It's also available on Google Colab, ready to run: MiDaS on Google Colab.

Here's a video where I put MiDaS to the test on Google Colab:

In my opinion, the results are very impressive, much better than Google's "Mannequin Challenge". This is probably because the data used to train the neural network was much larger and more varied: as the paper's title says, MiDaS mixes several data sets.

Let's dig a little deeper into MiDaS v2.1. The paper is available here: "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer" by Ranftl et al. The authors say they used 3D movies to train their model. Cool, but how did they get the depth maps for the stereo frame pairs taken from those 3D movies? The answer is "PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume" by Sun et al. I have been a little out of the loop regarding the latest trends in automatic depth map generation from stereo pairs, but this is quite interesting. The main idea behind this depth map generation technique is optical flow, which is exactly what the first automatic depth map generator I wrote, the original DMAG (Depth Map Automatic Generator), uses. How cool is that! Of course, DMAG doesn't involve any CNN (Convolutional Neural Network), but the connection is interesting nonetheless. Back in the day, I (and other early adopters of my software) quickly dismissed DMAG as a viable depth map generator because it could not handle small and/or fast-moving objects. The good thing about DMAG, though, is that you don't need to input the disparity range, and the depth maps it produces are very smooth. I should probably spend some time figuring out what role the CNN plays in this new take on optical flow, but that's going to be for another post.
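To make the flow-to-depth idea a bit more concrete, here is a minimal sketch of the last step only: turning an already-computed flow field into a depth map. It is not the MiDaS, KeystoneDepth or DMAG code. It assumes a rectified stereo pair and a flow field of shape (H, W, 2) computed from the left image to the right image by some estimator (PWC-Net, FlowNet2, or a classical method such as OpenCV's Farneback); the function name `flow_to_depth_map` and the `max_vertical` threshold are hypothetical choices for illustration.

```python
import numpy as np

def flow_to_depth_map(flow, max_vertical=1.0):
    """Turn a dense left->right optical flow field into an 8-bit depth map.

    flow: array of shape (H, W, 2) holding (u, v) displacements in pixels.
          Assumption: the stereo pair is rectified.
    max_vertical: hypothetical threshold; pixels with |v| above it are
          treated as unreliable, since a rectified pair should only show
          horizontal motion.
    Returns (depth, valid): a uint8 map (255 = nearest) and a boolean mask.
    """
    flow = np.asarray(flow, dtype=np.float64)
    u, v = flow[..., 0], flow[..., 1]
    # A point at column x in the left image appears at x - d in the right
    # image, so the horizontal flow is u = -d and the disparity is d = -u.
    disparity = -u
    valid = np.abs(v) <= max_vertical
    depth = np.zeros(u.shape, dtype=np.uint8)  # invalid pixels stay black
    if not valid.any():
        return depth, valid
    lo, hi = disparity[valid].min(), disparity[valid].max()
    if hi > lo:
        # Larger disparity = closer = brighter, the usual depth map convention.
        scaled = 255.0 * (disparity[valid] - lo) / (hi - lo)
        depth[valid] = np.round(scaled).astype(np.uint8)
    return depth, valid
```

Masking pixels with large vertical flow is only a simple stand-in for the more careful filtering a real pipeline would apply; in a rectified pair, true correspondences move horizontally, so vertical motion is a cheap sign that the flow went wrong.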

I came across this idea of coupling optical flow and CNNs somewhere else, on this website: KeystoneDepth. The people behind it took thousands of old stereocards and computed their depth maps. Guess what they used to compute the depth maps? Yes, an optical flow method coupled with a CNN. The paper that explains the process is here: "KeystoneDepth: History in 3D". We learn that they use FlowNet2 to compute the depth maps. FlowNet2 is described in "FlowNet 2.0: Evolution of optical flow estimation with deep networks" by Ilg et al. It's cool to see that automatic depth map generation using optical flow is not dead; far from it. It makes me want to go back to the original DMAG, the one that started it all. We'll see.

To save the depth map when you run MiDaS v2.1 on Google Colab, add the following:

# Display the depth map with no axes, ticks or padding, then save it
fig = plt.imshow(output)
plt.axis('off')
fig.axes.get_xaxis().set_visible(False)
fig.axes.get_yaxis().set_visible(False)
plt.savefig('depthmap.png', bbox_inches='tight', pad_inches=0)

right after:

plt.imshow(output)
# plt.show()

Use Ctrl-C and Ctrl-V to paste the code into the cell. When you click the "Files" icon on the left, the saved depth map should be there; if it's not, click the "Refresh" button just above. To download it, click the three dots to the right of the file name.

Be aware that the depth map saved this way is small (but at the right aspect ratio). MiDaS does work on a downsampled copy of the input image, sized to match the images it was trained on, but the Colab example resizes the prediction back to the input size, so the shrinking most likely comes from plt.savefig, which renders a matplotlib figure at its default size rather than writing the array pixel for pixel. Either way, you will need to resize the saved depth map so that it matches the size of the input image, which is easy to do in GIMP or Photoshop.

To make a "Facebook 3D" post from a photo called "photo.jpg", rename the depth map "photo_depth.jpg" (but you probably know that). You can feed the depth map to Facebook as is, that is, as an RGB heat map. If you prefer grayscale depth maps, you can switch the mode from RGB to grayscale in GIMP or Photoshop with no ill effect: matplotlib's default colormap, viridis, gets steadily lighter from dark purple to yellow, so the gray levels keep the near/far ordering.
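If you'd rather skip the resizing step, a sketch like the following saves `output` at its own resolution instead of going through a matplotlib figure, and writes it directly as a grayscale map named after the photo. It assumes the notebook is the torch.hub MiDaS example, where `output` is the NumPy prediction and `filename` holds the input photo's path; `depth_to_uint8`, `depth_filename` and `save_depth_map` are hypothetical helper names, not part of MiDaS.

```python
import os
import numpy as np

def depth_to_uint8(depth):
    """Min-max scale a MiDaS prediction to 0..255.

    MiDaS predicts relative inverse depth, so larger values are nearer;
    after scaling, near objects are bright and far ones are dark.
    """
    d = np.asarray(depth, dtype=np.float64)
    lo, hi = d.min(), d.max()
    if hi - lo < 1e-12:  # flat map: avoid dividing by zero
        return np.zeros(d.shape, dtype=np.uint8)
    return np.round(255.0 * (d - lo) / (hi - lo)).astype(np.uint8)

def depth_filename(photo_path):
    """'photo.jpg' -> 'photo_depth.png', next to the original photo."""
    base, _ = os.path.splitext(photo_path)
    return base + "_depth.png"

def save_depth_map(depth, photo_path):
    """Write the array pixel for pixel as a grayscale PNG; return its path."""
    # Imported here so the two helpers above also work without matplotlib.
    import matplotlib.pyplot as plt
    out_path = depth_filename(photo_path)
    # Unlike plt.savefig, plt.imsave writes one image pixel per array element.
    plt.imsave(out_path, depth_to_uint8(depth), cmap="gray", vmin=0, vmax=255)
    return out_path
```

In the notebook, calling `save_depth_map(output, filename)` right after `plt.imshow(output)` should produce "dog_depth.png" for the example's "dog.jpg", at the same size as the photo. Min-max scaling keeps only relative depth, which is all MiDaS provides anyway.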

MiDaS keeps getting better with the release of version 3. In the following video, I compare depth maps obtained with MiDaS v3 and with Adobe Photoshop's Depth Blur neural filter:

15 comments:

Christuus Gnosis (C.Taylor) January 14, 2021 at 12:13 AM
Ugo,
I am off topic with this comment.
I wish to thank you for your depth viewer software.
You really should make it into a desktop application.
You may not realize it, but many people use fractal programs like MB3D, which output depth maps.
I have been looking for a viewer and couldn't find one.
Can you make it run locally?
I can easily run a web server on my machine to run it locally.
It's useful.
Check "Mandelbulb Maniacs" on facecrack and you'll see what I mean.
They would use it for sure.
I just did, with the program's output.
neotango January 25, 2021 at 9:39 PM
Dear Ugo,
I really appreciate your work.
The results of DMAG are pretty awesome.
Actually, I have a question about using DMAG together with Google AI depth maps from single images.
I created depth maps from single images with SPM Google AI.
But they have quite wrong depth values in some cases.
I want to use them as input to your DMAG11 to refine and correct the depth.
But the depth map that SPM produces is in JPEG format, which has no alpha channel.
So, I can't use it as the input depth for your DMAG11.
I've tried adding an alpha channel, setting the alpha threshold (Layer->Transparency->Threshold), and saving a ".png" file in GIMP.
But it didn't work.
Do you have any ideas on how to refine depth using your DMAG?

Thanks
waniah January 29, 2021 at 7:20 AM
This page lets you easily use MiDaS v2 and other AI utilities:

https://app.runwayml.com/home
waniah January 29, 2021 at 7:24 AM
I have a question. Do you intend to work on the AI method, for example on an alternative notebook/code, or to improve the existing one? Could you also use your existing software to improve depth maps made with AI?
waniah January 29, 2021 at 7:26 AM
Are you thinking about using your software to improve depth maps made by AI? If so, will there be a post and a tutorial about it?
gunturbabu May 14, 2021 at 3:36 AM
At present, this link is not working:
Downloading: "https://github.com/intel-isl/MiDaS/archive/master.zip" to /root/.cache/torch/hub/master.zip
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
in <module>()
1 import torch
----> 2 midas = torch.hub.load("intel-isl/MiDaS", "MiDaS")
3 midas.eval()

/root/.cache/torch/hub/intel-isl_MiDaS_master/midas/vit.py in <module>()
1 import torch
2 import torch.nn as nn
----> 3 import timm
4 import types
5 import math

ModuleNotFoundError: No module named 'timm'

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

Unknown June 2, 2021 at 7:19 AM
The following code automatically adds the suffix "_depth" to the input file name, so you do not need to rename the depth map every time.

Add:

import os

Right after:

import cv2
import torch
import urllib.request

Add the following:

base = os.path.splitext(filename)[0]
fig = plt.imshow(output)
plt.axis('off')
fig.axes.get_xaxis().set_visible(False)
fig.axes.get_yaxis().set_visible(False)
plt.savefig(base + '_depth.png', bbox_inches='tight', pad_inches = 0)

Right after:

plt.imshow(output)
# plt.show()
