Monday, November 25, 2019

Thoughts about how to keep the original colors of the content image when using Gatys' Neural Style

In this post, I am going to present three ways to preserve the colors of the content image when using Gatys' Neural Style algorithm.

Here are the content and style images:


Content image.


Style image.

Hmm, I didn't even realize there was a watermark on the style image. It should really have been cropped out, but it doesn't matter much here.

Experiment 1:

Let's use the content and style images as is and ask Gatys' Neural Style (Justin Johnson's implementation on github) to keep the original colors of the content image (the -original_colors 1 option). Note that I am using a content weight of 1 and a style weight of 5. The rest is pretty standard.

Here are the Neural Style parameters used:

#!/bin/bash
th ../../neural_style.lua \
-style_image style_image.jpg \
-style_blend_weights nil \
-content_image content_image.jpg \
-image_size 512 \
-gpu -1 \
-content_weight 1. \
-style_weight 5. \
-tv_weight 1e-5 \
-num_iterations 1000 \
-normalize_gradients \
-init image \
-init_image content_image.jpg \
-optimizer lbfgs \
-learning_rate 1e1 \
-lbfgs_num_correction 0 \
-print_iter 50 \
-save_iter 100 \
-output_image new_content_image.jpg \
-style_scale 1.0 \
-original_colors 1 \
-pooling max \
-proto_file ../../models/VGG_ILSVRC_19_layers_deploy.prototxt \
-model_file ../../models/VGG_ILSVRC_19_layers.caffemodel \
-backend nn \
-seed -1 \
-content_layers relu4_2 \
-style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1


Resulting image (512 pixels).

Yeah, it's not that great. I think we can do better.

Experiment 2:

Let's color match the style image to the content image and ask Gatys' Neural Style not to restore the original colors (-original_colors 0). Note that I am using a content weight of 1 and a style weight of 5. The rest is pretty standard.

To color match the style image to the content image, I use my own software called thecolormatcher. It's very basic stuff.
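thecolormatcher isn't public, but this kind of color matching doesn't need much. Here is a minimal Python sketch of one common approach (it is not thecolormatcher itself, and the filenames are just the ones used in the scripts): each RGB channel of the style image is shifted and scaled so that its mean and standard deviation match those of the content image. Fancier color matchers work in a decorrelated color space like Lab, but this crude version gives the general idea.

#!/usr/bin/env python3
# Minimal color matching sketch (not thecolormatcher itself): match the mean and
# standard deviation of each RGB channel of the style image to the content image.
import numpy as np
from PIL import Image

content = np.asarray(Image.open("content_image.jpg").convert("RGB"), dtype=np.float64)
style = np.asarray(Image.open("style_image.jpg").convert("RGB"), dtype=np.float64)

matched = np.empty_like(style)
for c in range(3):
    s_mean, s_std = style[..., c].mean(), style[..., c].std()
    c_mean, c_std = content[..., c].mean(), content[..., c].std()
    # shift/scale the style channel so its statistics match the content channel
    matched[..., c] = (style[..., c] - s_mean) * (c_std / (s_std + 1e-8)) + c_mean

Image.fromarray(np.clip(matched, 0, 255).astype(np.uint8)).save("new_style_image.jpg")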


New style image (color matched to the content image).

Here are the Neural Style parameters used:

#!/bin/bash
th ../../neural_style.lua \
-style_image new_style_image.jpg \
-style_blend_weights nil \
-content_image content_image.jpg \
-image_size 512 \
-gpu -1 \
-content_weight 1. \
-style_weight 5. \
-tv_weight 1e-5 \
-num_iterations 1000 \
-normalize_gradients \
-init image \
-init_image content_image.jpg \
-optimizer lbfgs \
-learning_rate 1e1 \
-lbfgs_num_correction 0 \
-print_iter 50 \
-save_iter 100 \
-output_image new_content_image.jpg \
-style_scale 1.0 \
-original_colors 0 \
-pooling max \
-proto_file ../../models/VGG_ILSVRC_19_layers_deploy.prototxt \
-model_file ../../models/VGG_ILSVRC_19_layers.caffemodel \
-backend nn \
-seed -1 \
-content_layers relu4_2 \
-style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1


Resulting image (512 pixels).


Resulting image (768 pixels).

Let's change the style weight from 5 to 20 in order to get more of the style image's texture into the resulting image.

Here are the Neural Style parameters used:

#!/bin/bash
th ../../neural_style.lua \
-style_image new_style_image.jpg \
-style_blend_weights nil \
-content_image content_image.jpg \
-image_size 512 \
-gpu -1 \
-content_weight 1. \
-style_weight 20. \
-tv_weight 1e-5 \
-num_iterations 1000 \
-normalize_gradients \
-init image \
-init_image content_image.jpg \
-optimizer lbfgs \
-learning_rate 1e1 \
-lbfgs_num_correction 0 \
-print_iter 50 \
-save_iter 100 \
-output_image new_content_image.jpg \
-style_scale 1.0 \
-original_colors 0 \
-pooling max \
-proto_file ../../models/VGG_ILSVRC_19_layers_deploy.prototxt \
-model_file ../../models/VGG_ILSVRC_19_layers.caffemodel \
-backend nn \
-seed -1 \
-content_layers relu4_2 \
-style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1


Resulting image (512 pixels).


Resulting image (768 pixels).

Yeah, this method is pretty good at keeping the original colors, but some colors get lost along the way, like the red on the lips.

Experiment 3:

Let's create luminance images for both the content image and the style image and ask Gatys' Neural Style not to restore the original colors (-original_colors 0). Note that I am using a content weight of 1 and a style weight of 5. The rest is pretty standard.

To extract the luminance of the content image and the style image, I use my own software called theluminanceextracter. To inject the resulting luminance image back into the content image, I use my own software called theluminanceswapper. It's very basic stuff.
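Neither theluminanceextracter nor theluminanceswapper does anything fancy. As a rough Python equivalent (not the actual tools, and the filenames are just the ones used in the scripts): the luminance image is the Y channel of a YCbCr conversion, and the swap consists of merging the stylized luminance back with the Cb/Cr channels of the original content image.

#!/usr/bin/env python3
# Rough equivalent of theluminanceextracter / theluminanceswapper (not the actual tools).
from PIL import Image

def extract_luminance(path, out_path):
    # the luminance image is just the Y channel of a YCbCr conversion
    y, _, _ = Image.open(path).convert("YCbCr").split()
    y.save(out_path)

def swap_luminance(color_path, new_luma_path, out_path):
    # keep Cb/Cr from the original color image, replace Y with the stylized luminance
    _, cb, cr = Image.open(color_path).convert("YCbCr").split()
    new_y = Image.open(new_luma_path).convert("L").resize(cb.size)
    Image.merge("YCbCr", (new_y, cb, cr)).convert("RGB").save(out_path)

extract_luminance("content_image.jpg", "content_luminance_image.jpg")
extract_luminance("style_image.jpg", "style_luminance_image.jpg")
# once Neural Style has produced new_content_luminance_image.jpg:
swap_luminance("content_image.jpg", "new_content_luminance_image.jpg", "new_content_image.jpg")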


Content luminance image.


Style luminance image.

Here are the Neural Style parameters used:

#!/bin/bash
th ../../neural_style.lua \
-style_image style_luminance_image.jpg \
-style_blend_weights nil \
-content_image content_luminance_image.jpg \
-image_size 512 \
-gpu -1 \
-content_weight 1. \
-style_weight 5. \
-tv_weight 1e-5 \
-num_iterations 1000 \
-normalize_gradients \
-init image \
-init_image content_luminance_image.jpg \
-optimizer lbfgs \
-learning_rate 1e1 \
-lbfgs_num_correction 0 \
-print_iter 50 \
-save_iter 100 \
-output_image new_content_luminance_image.jpg \
-style_scale 1.0 \
-original_colors 0 \
-pooling max \
-proto_file ../../models/VGG_ILSVRC_19_layers_deploy.prototxt \
-model_file ../../models/VGG_ILSVRC_19_layers.caffemodel \
-backend nn \
-seed -1 \
-content_layers relu4_2 \
-style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1


Resulting luminance image (512 pixels).


Resulting image (512 pixels).


Resulting luminance image (768 pixels).


Resulting image (768 pixels).

Let's change the style weight from 5 to 20 in order to get more of the style image's texture into the resulting image.

Here are the Neural Style parameters used:

#!/bin/bash
th ../../neural_style.lua \
-style_image style_luminance_image.jpg \
-style_blend_weights nil \
-content_image content_luminance_image.jpg \
-image_size 512 \
-gpu -1 \
-content_weight 1. \
-style_weight 20. \
-tv_weight 1e-5 \
-num_iterations 1000 \
-normalize_gradients \
-init image \
-init_image content_luminance_image.jpg \
-optimizer lbfgs \
-learning_rate 1e1 \
-lbfgs_num_correction 0 \
-print_iter 50 \
-save_iter 100 \
-output_image new_content_luminance_image.jpg \
-style_scale 1.0 \
-original_colors 0 \
-pooling max \
-proto_file ../../models/VGG_ILSVRC_19_layers_deploy.prototxt \
-model_file ../../models/VGG_ILSVRC_19_layers.caffemodel \
-backend nn \
-seed -1 \
-content_layers relu4_2 \
-style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1


Resulting luminance image (512 pixels).


Resulting image (512 pixels).


Resulting luminance image (768 pixels).


Resulting image (768 pixels).

Yeah, this is probably the best method to preserve the original colors of the image. Method 2 is pretty good as well, and probably easier to use since you don't have to bother with extracting and swapping the luminance images.

If you are interested in trying to understand how Justin Johnson's Torch implementation of Gatys' Neural Style on github works, I made a post about it: Justin Johnson's Neural Style Torch Implementation Explained.

Monday, November 4, 2019

2d to 3d Image Conversion Software - the3dconverter2

The3dconverter2 is an attempt at making the conversion of a 2d image into a 3d image easier via the creation of a dense depthmap. I know there is already 2d to 3d Image Conversion Software - The 3d Converter but I think the idea of relative depth constraints is probably unnecessary, which is why the3dconverter2 does not have them. Unlike DMAG4 and DMAG11, which limit the propagation of assigned depths to areas of similar colors, the3dconverter2 allows for totally free propagation of assigned depths. This means that the only way to prevent depths from propagating across object boundaries (between objects at different depths) is to use a so-called "edge image" (also used by DMAG4).
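To make "free propagation of assigned depths" a bit more concrete, here is a toy Python sketch (this is not the solver inside the3dconverter2, just the general idea): pixels with an assigned depth keep it, every other pixel is repeatedly replaced by the average of its neighbors, and pixels marked in the edge image are ignored as neighbors so depths cannot leak across object boundaries.

#!/usr/bin/env python3
# Toy depth diffusion illustrating the idea, not the3dconverter2's actual solver.
import numpy as np

def diffuse_depths(seed_depth, seed_mask, edge_mask, iters=2000):
    # seed_depth: assigned depths (0 = furthest, 1 = closest), valid where seed_mask is True
    # edge_mask: True on traced object boundaries
    d = np.where(seed_mask, seed_depth, 0.5).astype(np.float64)
    w = (~edge_mask).astype(np.float64)  # edge pixels get weight 0, so they never transmit depth
    shifts = ((1, 0), (-1, 0), (1, 1), (-1, 1))  # down, up, right, left neighbors
    for _ in range(iters):
        # weighted average of the 4 neighbors (np.roll wraps at borders; fine for a sketch)
        num = sum(np.roll(d * w, s, axis=a) for s, a in shifts)
        den = sum(np.roll(w, s, axis=a) for s, a in shifts)
        avg = np.where(den > 0, num / np.maximum(den, 1e-9), d)
        # assigned depths stay fixed, everything else moves toward its neighbor average
        d = np.where(seed_mask, seed_depth, avg)
    return d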

After you have extracted the3dconverter2-x64.rar, you should see a directory called "joker". It contains everything you need to check that the software is running correctly on your machine (windows 64 bit). The file called "the3dconverter2.bat" is the one you double-click to run the software. It tells windows where the executable is. You need to edit the file so that the location of the executable is the correct one for your installation. I use notepad++ to edit text files but I am sure you can use notepad or wordpad with equal success. If you want to use the3dconverter2 on your own photograph, create a directory alongside "joker" (in other words, at the same level as "joker"), copy "the3dconverter2.bat" and "the3dconverter2_input.txt" from "joker" to that new directory, and double-click "the3dconverter2.bat" after you have properly created all the input files the3dconverter2 needs.

The file "joker.xcf" is a gimp file that contains (as layers) content_image.png (the reference 2d image) and edge_rgba_image.png (the so-called edge image). It also contains the gimp paths for the depth constraints.

The input file "the3dconverter2_input.txt" contains the following lines:
content_image.png
sparse_depthmap_rgba_image.png
dense_depthmap_image.png
gimp_paths.svg
diffusion_direction_rgba_image.png
edge_rgba_image.png

Line 1 contains the name of the content rgb image. The file must exist.

Line 2 contains the name of the sparse depthmap rgba image. The file may or may not exist.

Line 3 contains the name of the dense depthmap image. The file will be created by the3dconverter2.

Line 4 contains the name of the gimp paths giving depth constraints. If the file does not exist, the3dconverter2 assumes there are no depth constraints given as gimp paths. In this case, there should be a sparse depthmap rgba image given.

Line 5 contains the name of the diffusion direction rgba image. If the file doesn't exist, the3dconverter2 assumes there are no regions where the diffusion direction is given. This is something I was working on but you can safely ignore this capability.

Line 6 contains the name of the edge rgba image. If the file doesn't exist, the3dconverter2 assumes there is no edge image.

The content rgb image is of course the image for which a dense depthmap is to be created. It is recommended not to use mega-large images as they may put a heavy strain on the solver (memory-wise). I always recommend starting small. Here, the reference rgb image is only 600 pixels wide. If the goal is to create 3d wiggles/wobbles or facebook 3d photos, it is more than enough. If you see that your machine can handle it (it doesn't take too long to solve and the memory usage is reasonable), then you can switch to larger reference rgb images.


Content rgb image.

The sparse depthmap rgba image is created just like with Depth Map Automatic Generator 4 (DMAG4) or Depth Map Automatic Generator 11 (DMAG11). You use the "Pencil Tool" (no anti-aliasing) to draw depth clues on a transparent layer. Depths can vary from pure black (deepest background) to pure white (closest foreground). When you zoom in, you should see either fully transparent pixels or fully opaque pixels because, hopefully, none of the tools you used create anti-aliasing. If, for some reason, there are semi-transparent pixels in your sparse depthmap, click on Layer->Transparency->Threshold Alpha and then press Ok to remove them. Note that you absolutely do not need to create a sparse depthmap rgba image to get the3dconverter2 going.
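If you want to double-check that the sparse depthmap layer really ended up free of semi-transparent pixels, a few lines of Python can do the same job as Threshold Alpha (the filename is the one from the input file; this is just a convenience check, not part of the3dconverter2):

#!/usr/bin/env python3
# Report semi-transparent pixels in the sparse depthmap and snap alpha to 0 or 255,
# which is roughly what GIMP's Layer->Transparency->Threshold Alpha does.
import numpy as np
from PIL import Image

img = np.array(Image.open("sparse_depthmap_rgba_image.png").convert("RGBA"))
alpha = img[..., 3]
semi = np.count_nonzero((alpha > 0) & (alpha < 255))
print(f"{semi} semi-transparent pixels found")

img[..., 3] = np.where(alpha >= 128, 255, 0).astype(np.uint8)
Image.fromarray(img).save("sparse_depthmap_rgba_image.png")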

Gimp paths are used to impose constraints on pixel depths. You use gimp paths to indicate that the pixels along a path should be at a specified depth. If you don't give gimp paths (no gimp paths file present), the3dconverter2 relies solely on the sparse depth map (and possibly the edge image if you provide one) to generate the dense depth map and behaves just like Depth Map Automatic Generator 4 (DMAG4). The name of a gimp path stores the depth at which the pixels making up the gimp path should be. The name of a gimp path must start with 3 digits ranging from 000 (pure black - furthest) to 255 (pure white - closest). For example, if the name of a gimp path is "146-blah blah blah", the pixels that make up the gimp path are supposed to be at a depth equal to 146. What is cool about this system is that it is quite simple to change the depth for a given gimp path: simply change the first 3 digits. To be perfectly clear, all pixels that make up a gimp path receive the depth assigned to the gimp path, not just the starting and ending pixel of each gimp path component.

When you are working with gimp paths, you should always have the Paths dialog window within easy reach. To get to it, click on Windows->Dockable Dialogs->Paths. After you create a gimp path (using the "Paths Tool"), gimp automatically names it "Unnamed" and it shows up in the Paths dialog window, but it is not visible. To make it visible, just do the same thing you would do for layers (click to the left of the name until the eye shows up). You can assign a depth to the gimp path by double-clicking on the path name and changing it to whatever depth you want the path to be.

To save the gimp paths, right-click the active path and select "Export Gimp Path". Make sure to select "Export all paths from this image" in the drop-down menu before saving. This will save all gimp paths into a single SVG file, e.g., "gimp_paths.svg".
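As a sanity check, you can also peek inside the exported SVG to make sure every path carries a valid depth. In the exports I have seen, GIMP writes the path name into the id attribute of each <path> element (treat that as an assumption), so the depths are easy to read back:

#!/usr/bin/env python3
# List every gimp path in the exported SVG along with the depth encoded in the
# first 3 characters of its name (000 = furthest, 255 = closest).
import xml.etree.ElementTree as ET

tree = ET.parse("gimp_paths.svg")
for elem in tree.iter():
    if elem.tag.endswith("path"):
        name = elem.get("id", "")
        if len(name) >= 3 and name[:3].isdigit():
            print(f"path '{name}' -> depth {int(name[:3])}")
        else:
            print(f"path '{name}' has no leading 3-digit depth!")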

It's probably a good idea at this point to explain how to create/edit paths in gimp:

To create a new path, click on "Paths Tool". In the "Tool Options", check the "Polygonal" box as we will only need to create paths that are polygonal (no Bezier curves needed here). Click (left) to create the first anchor.

To add an anchor, click (left). To close a path (you really don't need to do that here but it doesn't hurt to know), press CTRL, hover the cursor over the starting anchor, and click (left).

To start a new path component (you really don't need to do that here but it doesn't hurt to know), press SHIFT and click (left).

To move an anchor, hover the cursor over an existing anchor, click (left) and drag.

To insert an anchor, press CTRL and click (left) anywhere on the segment between two existing anchors.

To delete an anchor, press SHIFT+CTRL and click (left) on the anchor you want to delete.

The edge rgba image tells the3dconverter2 not to diffuse the depths past the pixels that are not transparent in the edge image. It is similar to the edge image used in DMAG4. To create an edge rgba image, create a new layer and trace along the object boundaries in the reference image. An object boundary is basically a boundary between two objects at different depths. To trace the edge image, I recommend using the "Pencil Tool" with a hard brush (no anti-aliasing) of size 1 (smallest possible). Press SHIFT and click (left) to draw line segments. I usually use the color "red" for my edge images but you can use whatever color you want. Make sure to always use tools that are not anti-aliased! If you need to use the "Eraser Tool", check the "Hard Edge" box in the tool options. When you zoom in on your edge image, it should look like crisp staircases.


Edge rgba image.

The white pixels are actually transparent (checkerboard pattern in gimp).

The following screen grab shows all the layers contained in content_image.xcf:


The following screen grab shows all the gimp paths contained in content_image.xcf:


As you can see, the names of the gimp paths all start with 3 digits ranging from 000 to 255. The name of the gimp path indicates its depth.

Once you are done with the editing in gimp, simply save the layer containing the sparse depthmap as sparse_depthmap_rgba_image.png (if you have one), the layer containing the edge image as edge_rgba_image.png, and the gimp paths as gimp_paths.svg. Double-click on "the3dconverter2.bat" and the3dconverter2 should spit out the dense depthmap in no time as dense_depthmap_image.png assuming nothing went wrong.

This is the dense depth map generated by the3dconverter2:


Dense depthmap image created by the3dconverter2.

To see if the dense depthmap is good, you can use Depth Player. To create a wiggle/wobble, you can use Wiggle Maker. If your goal is solely to create wiggles, the depthmap really doesn't need to be very accurate, especially away from the immediate foreground (only worry about the foreground objects).
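Wiggle Maker does the heavy lifting for you, but if you are curious how a depthmap turns into a wiggle, here is a crude Python sketch of the parallax idea (filenames assumed, and deliberately naive: pixels uncovered by the warp just stay black): each pixel is shifted horizontally in proportion to its depth, far pixels are drawn first so the foreground wins, and two opposite shifts are saved as a looping GIF.

#!/usr/bin/env python3
# Crude two-frame wiggle from an image and its depthmap (not Wiggle Maker).
import numpy as np
from PIL import Image

rgb = np.asarray(Image.open("content_image.png").convert("RGB"))
depth = np.asarray(Image.open("dense_depthmap_image.png").convert("L")) / 255.0

def shift_view(rgb, depth, amplitude):
    h, w, _ = rgb.shape
    out = np.zeros_like(rgb)
    # paint pixels from far (dark) to near (bright) so near pixels overwrite far ones
    for y, x in sorted(np.ndindex(h, w), key=lambda p: depth[p]):
        nx = int(round(x + amplitude * (depth[y, x] - 0.5)))
        if 0 <= nx < w:
            out[y, nx] = rgb[y, x]
    return Image.fromarray(out)

frames = [shift_view(rgb, depth, a) for a in (-6.0, 6.0)]
frames[0].save("wiggle.gif", save_all=True, append_images=frames[1:], duration=150, loop=0)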


Wiggle/wobble created by Wiggle Maker.

Any problems? Send me your .xcf gimp file (zipped)! If the file is big, please use something like wetransfer to send it to me or make it available via google drive.

I made a video on youtube that explains the whole 2d to 3d image conversion process from start to finish, kinda:


Here's another video tutorial that's much more in-depth but kinda assumes you already know how to install and run the3dconverter2:


Tips and tricks:
- I think it's a good idea to order the paths from foreground to background in the "Paths" dockable window. It's a little bit less confusing, especially if you have tons of them.
- Use Depth Player to check your work often.

Download link (for windows 64 bit): the3dconverter2-x64.rar

Source code: the3dconverter2 on github.