As you probably know, when you have an image and its associated depth map, whenever the point of view changes, areas in the background get disoccluded, that is, they become visible. If you are a fan of Facebook 3d photos, you may have observed that these disoccluded areas get blurred. Some people (not I) are not too keen on this effect and would prefer to see the background magically appear out of thin air. Well, apparently, AI (Artificial Intelligence) can take care of that. So, not only can AI generate depth maps from single images, it can also fill the disoccluded areas. Pretty neat, I must say, if the results live up to the hype.
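Just to make the disocclusion problem concrete, here's a tiny sketch (nothing to do with Facebook's actual renderer) that shifts every pixel sideways in proportion to its depth; wherever no pixel lands in the new view, you get a hole that something has to fill. The file names and the shift amount are placeholders.

import cv2
import numpy as np

# reference image and its grayscale depth map (white = near), placeholder file names
image = cv2.imread("photo.jpg")
depth = cv2.imread("photo_depth.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

h, w = depth.shape
max_shift = 30  # maximum horizontal shift in pixels (arbitrary)

# forward-warp: shift each pixel left by an amount proportional to its depth,
# keeping only the nearest pixel when several land on the same spot
novel = np.zeros_like(image)
best = np.full((h, w), -1.0)
for y in range(h):
    for x in range(w):
        new_x = x - int(round(depth[y, x] * max_shift))
        if 0 <= new_x < w and depth[y, x] > best[y, new_x]:
            novel[y, new_x] = image[y, x]
            best[y, new_x] = depth[y, x]

# pixels that never got written are the disoccluded areas; paint them magenta
novel[best < 0] = (255, 0, 255)
cv2.imwrite("novel_view.jpg", novel)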
This paper, "3D Photography using Context-aware Layered Depth Inpainting" by Meng-Li Shih et al., promises that inpainting can be done realistically with AI. There's a Google Colab for it, which means we can check it out right there in the browser thanks to Google, without installing anything and without the need for a gpu card. In the google colab implementation, they use MiDaS to get a depth map from a given reference image and then do extreme inpainting using AI. The output of 3d photo inpainting is the MiDaS depth map, a point cloud of the 3d scene, and four videos that kinda show off the inpainting (two of the zoom type a la Ken Burns and two of the wiggle/wobble type). To visualize the point cloud, which is in the ply format, you can use Meshlab or CloudCompare (preferred). Note that the depth map doesn't need to come from MiDaS; you can certainly use your own depth map (although you may have to blur it).
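If you'd rather view the ply from Python instead, here's a minimal sketch using the open3d library (not something the colab itself uses, as far as I know); the file name is a placeholder for whatever the notebook actually wrote out.

import open3d as o3d

# load the .ply point cloud produced by the 3d photo inpainting colab
pcd = o3d.io.read_point_cloud("photo.ply")  # placeholder name, use your actual file
print(pcd)  # prints the number of points

# open an interactive viewer: drag to rotate, scroll to zoom
o3d.visualization.draw_geometries([pcd])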
Here's a video that explains how to run the Google Colab python notebook. First, I let the software use MiDaS to create the depth map. Then, I bypass MiDaS and use my own depth map which I created with SPM:
If you use your own depth map, make sure that it is grayscale and that it is smooth enough. If your depth map is not smooth, it's going to take forever and google colab might disconnect you before the videos are created. I explain all that in the video.
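If you want to do that preprocessing outside of an image editor, here's a minimal sketch with OpenCV; the file name and the blur kernel size are just starting points to experiment with.

import cv2

# force the depth map to single-channel grayscale
depth = cv2.imread("depth.png", cv2.IMREAD_GRAYSCALE)

# smooth it so the inpainting step doesn't choke on noisy depth edges
# (kernel size must be odd; make it bigger for a smoother result)
smoothed = cv2.GaussianBlur(depth, (11, 11), 0)

cv2.imwrite("depth_smoothed.png", smoothed)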
We all know that MiDaS can create great depth maps from single images. Check this post if you are not yet convinced: Getting depth maps from single images using Artificial Intelligence (AI). It's the inpainting we were not too sure about... until now. I've gotta say that the filling of the disoccluded areas looks quite realistic even when the point of view changes drastically. That AI is really doing wonders, and it will only get better as the data sets used to train the neural networks get bigger.
Is it possible in this day and age to get good, usable depth maps from monocular images using Artificial Intelligence (AI)?
Well, I think it's getting there today. And as the number of data sets used to train neural networks increases, results can only get better.
First, there was Google's "Mannequin Challenge", whose data set was taken from thousands of YouTube videos featuring the now forgotten "mannequin challenge", in which people freeze while the camera moves around them. Google must have spent a lot of time creating the data set since they had to recreate the full 3d scene for each video (to get the depth map).
Here's a video I made on how to use Google's "Mannequin Challenge" 2d to 3d conversion software on google colab:
I think the results are ok in terms of segmentation, that is, objects are properly detected. But the results are not that great in terms of depth, meaning that objects often end up at incorrect depths. Good effort though.
Here's a video where I put MiDaS to the test on google colab:
In my opinion, the results are very impressive, much better than Google's "Mannequin Challenge", probably because the data set used to train the neural network was much larger.
Let's dig a little deeper into MiDaS v2.1. The paper is available here: "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer" by Ranftl et al. So, they are saying that they have used 3d movies to train their model. Cool, but how did they get the depth maps for those 3d movie dual screenshots? The answer is: "PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume" by Sun et al. Now, I have been a little bit out of the loop regarding the latest trends in automatic depth map generation from a stereo pair, but this is quite interesting. The main idea behind this particular depth map generation technique is optical flow, which is exactly what is used in the first automatic depth map generator I wrote, the original DMAG (Depth Map Automatic Generator). How cool is that! Of course, DMAG doesn't have any CNN (Convolutional Neural Network) involved, but it is interesting nonetheless. Back in the day, I (and other early adopters of my software) quickly dismissed DMAG as a viable depth map generator because it could not handle small and/or fast moving objects. The good thing about DMAG, though, is that you don't need to input the disparity range and the produced depth maps are very smooth. I should probably spend some time figuring out how the CNN plays a role in that new implementation of optical flow, but that's gonna be for another post, I think.
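To make the optical-flow-to-depth connection concrete, here's a minimal sketch using OpenCV's classical Farneback optical flow (not PWC-Net and not DMAG) on a rectified stereo pair: the horizontal component of the flow is essentially the disparity, which is inversely proportional to depth. File names and parameters are placeholders.

import cv2
import numpy as np

# rectified left and right views of a stereo pair, as grayscale
left = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

# dense optical flow from left to right (classical Farneback, no CNN);
# arguments: flow init, pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
flow = cv2.calcOpticalFlowFarneback(left, right, None, 0.5, 5, 21, 3, 7, 1.5, 0)

# for a rectified pair the vertical flow should be near zero; the horizontal
# component is the disparity (near objects shift more than far ones)
disparity = -flow[..., 0]

# rescale to 0-255 so it can be saved as a grayscale depth-ish map
disparity = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite("disparity.png", disparity.astype(np.uint8))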
Now, I came across this idea of coupling optical flow and CNNs somewhere else, on the KeystoneDepth website. What they did is take thousands of old stereocards and compute their depth maps. Guess what was used to compute the depth maps? Yes, an optical flow method coupled with a CNN. The paper that explains the process is "KeystoneDepth: History in 3D". We learn that they use FlowNet2 to compute the depth maps. FlowNet2 is described in this paper: "FlowNet 2.0: Evolution of optical flow estimation with deep networks" by Ilg et al. It's cool to see that automatic depth map generation using optical flow is not dead, far from it actually. Makes me wanna go back to the original DMAG, the one that started it all. We'll see.
To save the obtained depth map when you run MiDaS v2.1, you need to put the following:
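Something along these lines should do it (this is a sketch, not the exact cell: it assumes the prediction ended up in a numpy array called output, which may be named differently in your copy of the colab):

import matplotlib.pyplot as plt

# "output" is assumed to be the numpy array holding the MiDaS prediction
plt.imsave("depthmap.png", output)  # saved as an rgb heat map under "Files"
# use plt.imsave("depthmap.png", output, cmap="gray") if you want grayscale right away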
Use Ctrl-C Ctrl-V to copy and paste inside the cell. When you click on the "Files" icon on the left, the saved depth map should be there. If it's not there, click on the "Refresh" tab just above. To download it, simply click on the dots to the right of the name. Be aware that the depth map MiDaS generates is small (but at the right aspect ratio). This is because MiDaS downsamples the input image to match the size of the images in the training sets. So, you will need to resize it yourself so that it matches the size of the input image. You can do that in Gimp or Photoshop quite easily. To make a "Facebook 3d" post, assuming your photo is called "photo.jpg", you need to rename the depth map "photo_depth.jpg", but you probably know that. You can feed the depth map to Facebook as is, that is, as an rgb heat map. If you prefer grayscale depth maps, you can use Gimp or Photoshop to switch the mode from rgb to grayscale with no ill effect (I think).
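If you'd rather not do the resizing by hand, here's a minimal sketch with Pillow, assuming your photo is photo.jpg and the small MiDaS depth map was saved as depthmap.png (both names are placeholders):

from PIL import Image

photo = Image.open("photo.jpg")
depth = Image.open("depthmap.png")

# upscale the small MiDaS depth map back to the size of the original photo
depth = depth.resize(photo.size, Image.LANCZOS)

# optional: convert the rgb heat map to grayscale
depth = depth.convert("L")

# name it photo_depth.jpg so Facebook picks it up alongside photo.jpg
depth.save("photo_depth.jpg")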
MiDaS is getting better with the release of version 3. In the following video, I compare depth maps obtained by MiDaS v3 and Adobe Photoshop Neural Depth Blur filter:
It's relatively easy to create a depth map in SPM if you have a stereo pair. Once you have the depth map, it's child's play to make a facebook 3d post. The stereo pair can be an MPO file coming from a Fuji W3 camera, a scanned stereocard from the 1900s, a scanned 1950s Realist format stereo slide, etc. Here I am focusing on scanned Realist slides mainly because they are amazing, usually.
First thing to do is to separate the left and right images. That's easy enough to do in Gimp using the "Rectangle Select" tool.
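If you'd rather script that step than use Gimp, here's a minimal sketch with Pillow that simply cuts a side-by-side scan in half; real scans usually need the crop boxes adjusted by hand since the two chips are never perfectly centered, and the file names are placeholders.

from PIL import Image

scan = Image.open("stereo_scan.jpg")
w, h = scan.size

# naive split down the middle; tweak the boxes to match your actual scan
left = scan.crop((0, 0, w // 2, h))
right = scan.crop((w // 2, 0, w, h))

left.save("left.jpg")
right.save("right.jpg")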
Once you have the left and right chips, they can be loaded into StereoPhoto Maker (SPM) and this is where the magic should happen. First thing to do is to align the two images using "Adjust->Auto alignment". Make sure that you have selected "Better Precision (Slow)" in "Edit->Preferences" (tab: Adjust) beforehand. It's then time to generate the depth map by clicking on "Edit->Depth Map->Create depth map from stereo pair". Do not click on "Get values automatic" to get the background and foreground values; use the arrow keys instead. Do not let SPM resize your images: make sure your image width is smaller than the "maximum image width" (default = 3000). If unsure, just put a very large number in the "maximum image width" box. If your image is about 1,000 pixels wide, the default values should be good, except maybe the radius for DMAG5, which can be changed from 16 to 32.
The depth map you get should be ok but most likely far from perfect. Click on "Edit->Depth Map->Correct depth map" to access the very useful depth map correction tool. It's very simple to use: keep "Ctrl" pressed, right-click to pick a color in the depth map, and left-click to paint with it.
To get rid of outliers, you may want to go into Gimp and use Filters->Blur->Median Blur. You can also use Filters->Blur->Selective Gaussian Blur to smooth the depth map without losing too much definition.
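The same cleanup can be done in Python with OpenCV; this is just a sketch, not an exact equivalent of Gimp's filters (the bilateral filter plays roughly the same role as Selective Gaussian Blur), and the file names and parameters are placeholders.

import cv2

depth = cv2.imread("depthmap.png", cv2.IMREAD_GRAYSCALE)

# median blur knocks out isolated outliers (speckles) in the depth map
depth = cv2.medianBlur(depth, 5)

# bilateral filter smooths the depth map while preserving depth edges,
# roughly what Gimp's Selective Gaussian Blur does
depth = cv2.bilateralFilter(depth, 9, 50, 50)

cv2.imwrite("depthmap_clean.png", depth)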
Here's a video that shows how I get a proper depth map in SPM from a scanned stereo slide:
Here's another video where I spend much more time on depth map correction using the SPM depth map correction tool: