Saturday, February 6, 2021

3d photo inpainting using Artificial Intelligence (AI)

As you probably know, when you have an image and its associated depth map and the point of view changes, areas of the background get disoccluded, that is, they become visible. If you are a fan of Facebook 3d photos, you may have observed that these disoccluded areas get blurred. Some people (not I) are not too keen on this effect and would prefer to see the background magically appear out of thin air. Well, apparently, AI (Artificial Intelligence) can take care of that. So, not only can AI generate depth maps from single images, it can also fill the disoccluded areas. Pretty neat, I must say, if the results live up to the hype.
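To make the disocclusion idea concrete, here is a minimal Python sketch on a single image row. It is only a toy illustration, not how Facebook or the paper does it: the function name `shift_scanline` and the `baseline` parameter are made up for this example, and the disparity is simply assumed to be `baseline / depth`.

```python
def shift_scanline(colors, depths, baseline=4.0):
    """Forward-warp one image row to a new viewpoint.

    Each pixel moves horizontally by a disparity of baseline / depth,
    so near pixels move more than far ones. Returns the warped row,
    with None marking pixels that no source pixel lands on.
    """
    width = len(colors)
    warped = [None] * width
    warped_depth = [float("inf")] * width
    for x, (color, depth) in enumerate(zip(colors, depths)):
        target = x + round(baseline / depth)
        if 0 <= target < width and depth < warped_depth[target]:
            # Keep the nearest surface when two pixels land on the same spot.
            warped[target] = color
            warped_depth[target] = depth
    return warped


# A far background ("b", depth 4) with a near object ("F", depth 1) in front.
colors = list("bbbFFFbbbbbb")
depths = [4, 4, 4, 1, 1, 1, 4, 4, 4, 4, 4, 4]
warped = shift_scanline(colors, depths)
print("".join(c if c is not None else "?" for c in warped))
# prints ?bbb???FFFbb
```

The three `?` in the middle are the disoccluded pixels: background that was hidden behind the object in the original view and for which the image has no color. The leading `?` is just the image border coming into the frame. Those holes are what gets blurred or, with inpainting, filled in.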

The paper "3D Photography using Context-aware Layered Depth Inpainting" by Meng-Li Shih et al. promises that inpainting can be done realistically with AI. There's a Google Colab for it, which means we can check it out right there in the browser, thanks to Google, without installing anything and without needing a GPU card. In the Google Colab implementation, they use MiDaS to get a depth map from a given reference image and then do extreme inpainting using AI. The output of 3d photo inpainting is the MiDaS depth map, a point cloud of the 3d scene, and four videos that kinda show off the inpainting (two of the zoom type a la Ken Burns and two of the wiggle/wobble type). To visualize the point cloud, which is in the ply format, you can use Meshlab or CloudCompare (preferred). Note that the depth map doesn't need to come from MiDaS: you can certainly use your own depth map (although you may have to blur it).
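Before opening a ply file in Meshlab or CloudCompare, it can be handy to check how many points it contains. Below is a small sketch that reads only the text header that every ply file starts with. The function name `read_ply_header` and the sample file are made up for illustration; I am not assuming anything about the exact header the Colab notebook writes beyond the standard `format`, `element` and `end_header` lines.

```python
import os
import tempfile


def read_ply_header(path):
    """Return (format string, {element name: count}) from a PLY header."""
    elements = {}
    fmt = None
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError("not a PLY file")
        for raw in f:
            parts = raw.decode("ascii", errors="replace").split()
            if not parts:
                continue
            if parts[0] == "format":
                fmt = " ".join(parts[1:])
            elif parts[0] == "element":
                elements[parts[1]] = int(parts[2])
            elif parts[0] == "end_header":
                break
        else:
            raise ValueError("PLY header has no end_header line")
    return fmt, elements


# Demo on a tiny hand-written PLY file (a stand-in for the real output).
sample = (
    "ply\n"
    "format ascii 1.0\n"
    "comment made-up example\n"
    "element vertex 3\n"
    "property float x\n"
    "property float y\n"
    "property float z\n"
    "element face 1\n"
    "property list uchar int vertex_indices\n"
    "end_header\n"
    "0 0 0\n1 0 0\n0 1 0\n"
    "3 0 1 2\n"
)
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.ply")
    with open(sample_path, "w", newline="\n") as f:
        f.write(sample)
    fmt, elements = read_ply_header(sample_path)
print(fmt, elements)
```

The header is plain text even when the point data that follows is binary, which is why the sketch stops reading at `end_header`.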

Here's a video that explains how to run the Google Colab Python notebook. First, I let the software use MiDaS to create the depth map. Then, I bypass MiDaS and use my own depth map, which I created with SPM:



If you use your own depth map, make sure that it is grayscale and that it is smooth enough. If your depth map is not smooth, it's going to take forever and Google Colab might disconnect you before the videos are created. I explain all that in the video.
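For what "grayscale and smooth" means in practice, here is a rough Python sketch. It is not what I do in the video and not part of the notebook. The helper names `to_grayscale` and `box_blur` are made up, and a plain box filter stands in for whatever blur you prefer. It uses only the standard library to stay self-contained, so it is slow on real images. With an image library such as Pillow, `convert("L")` followed by `ImageFilter.GaussianBlur` would do the same job much faster.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luma values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_rows]


def box_blur(gray_rows, radius=1):
    """Smooth a grayscale image with a (2 * radius + 1) squared box filter.

    Pixels near the border average over the part of the window that
    falls inside the image.
    """
    height, width = len(gray_rows), len(gray_rows[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [gray_rows[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            row.append(round(sum(window) / len(window)))
        out.append(row)
    return out


# A 4x4 depth map with a hard edge: far (black) on the left, near (white) on the right.
black, white = (0, 0, 0), (255, 255, 255)
depth_rgb = [[black, black, white, white] for _ in range(4)]
gray = to_grayscale(depth_rgb)
blurred = box_blur(gray, radius=1)
for row in blurred:
    print(row)
# each row prints [0, 85, 170, 255]
```

The hard jump from 0 to 255 becomes a ramp, which is the kind of smoothing meant here. A larger `radius` gives a softer depth map.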

We all know that MiDaS can create great depth maps from single images. Check this post if you are not yet convinced: Getting depth maps from single images using Artificial Intelligence (AI). It's the inpainting we were not too sure about... until now. I've gotta say that the filling of occlusions looks quite realistic even when the point of view changes drastically. That AI is really doing wonders and it will only get better as the data sets used to train the neural networks get bigger.

4 comments:

  1. I like your post. It looks very interesting, so keep posting in the future.

  2. Hi, we love your work. I was suddenly wondering: what if we extract a stereo pair from a video? Micheal Brown from YouTube made me realise how cheap AI depth estimation is compared to a real stereo pair. Meanwhile, most of what we have in pictures we now also have in video, so wouldn't it be possible to extract the stereo pair from a video somehow?

    Replies
    1. If the subject is static, you can extract pretty much any 2 images from a video as long as it is pointed at the subject. You align/rectify the 2 images using something like ER9b and that gives you a stereo pair. It's a bit like photogrammetry but restricted to just 2 images.

  3. This is really an awesome blog. Thank you for your time and effort. Keep posting.
