3D Stereoscopic Photography: Non Photorealistic Rendering - Image Abstraction by Structure Adaptive Filtering

This post describes all the parameters that affect the rendered image in "Image Abstraction by Structure Adaptive Filtering" by Jan Eric Kyprianidis and Jürgen Döllner. Note that this paper was heavily influenced by "Real-Time Video Abstraction" by Holger Winnemöller, Sven C. Olsen, and Bruce Gooch. If you are a bit confused by the title of the paper, you are not the only one. In layman's terms, it's simply a cartoon filter.

Overview of the method (picture comes from the paper cited above).

I know a picture is worth a thousand words, but I think a picture and a thousand words are worth even more, so let's try to explain in words what the method does. The first step estimates the local orientation, aka the Edge Tangent Flow (ETF). Then the input photograph is abstracted using a bilateral filter that is separated into two passes: pass 1 runs along the gradient and pass 2 runs along the normal to the gradient (that is, along the tangent). This filter, which is applied iteratively, is called "Separated Orientation Aligned Bilateral Filter", "Separated OABF", or just OABF. Once the input photo has been abstracted a bit, the edges are extracted using Difference of Gaussians (DoG): the DoG is computed along the gradient and then smoothed with a one-dimensional Gaussian along the flow curves of the ETF. This edge extraction method is called "Separated Flow-based Difference of Gaussians", "Separated FDoG", or just FDoG. The photo is then abstracted some more using the same OABF that was used before the edge detection. To give it that cartoonish cel-shading look, the image is then color quantized (using the luminance). Finally, the edges that were detected earlier are blended into the abstracted/quantized image to give the fully rendered image.

There are 2 parameters to control the number of iterations in "Separated OABF":
- n_e. That's the number of iterations before edges are detected. Kyprianidis uses n_e = 1 while Winnemöller uses n_e = 1 or 2.

- n_a. That's the total number of iterations (before and after edges are detected).
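
The way n_e and n_a split the OABF iterations around the edge detection can be sketched as follows. This is only a minimal sketch in Python: the function name abstract_image and the four callables passed to it (oabf, fdog, quantize, blend) are hypothetical placeholders for the steps described in this post, not code from the paper.

```python
def abstract_image(image, n_e, n_a, oabf, fdog, quantize, blend):
    """Run the pipeline schedule: n_e OABF passes, edges, then n_a - n_e more passes.

    oabf, fdog, quantize and blend are caller-supplied placeholders
    (hypothetical names) standing in for the real filters.
    """
    if not 0 <= n_e <= n_a:
        raise ValueError("expected 0 <= n_e <= n_a")

    current = image
    # Abstraction before edge detection.
    for _ in range(n_e):
        current = oabf(current)

    # Edges are extracted from the lightly abstracted image.
    edges = fdog(current)

    # Remaining abstraction iterations.
    for _ in range(n_a - n_e):
        current = oabf(current)

    # Quantize the abstracted image, then blend the edges in.
    return blend(quantize(current), edges)
```

With n_e = n_a, the second loop does not run at all, which is the case for the example parameters given at the end of this post.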

Step 1: Local Orientation Estimation

This is to establish the Edge Tangent Flow (ETF) vector field. See Non Photorealistic Rendering - Edge Tangent Flow (ETF).

Parameters used:
- Standard deviation (sigma) of the Gaussian used to blur the structure tensors. The paper does not really discuss what value should be used.
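
To make the role of the structure tensor a bit more concrete, here is a minimal sketch of how a tangent direction can be derived from it at a single pixel. It assumes the three tensor components have already been blurred with the Gaussian mentioned above. The function names are hypothetical, and the angle-based formula is a standard way of getting the eigenvector directions of a symmetric 2x2 matrix, not necessarily how the paper implements it.

```python
import math


def structure_tensor(gx, gy):
    """Structure tensor components (e, f, g) for a gradient (gx, gy).

    The tensor is [[e, f], [f, g]] = [[gx*gx, gx*gy], [gx*gy, gy*gy]].
    In a real pipeline each component is Gaussian-blurred over the image
    before the orientation is extracted.
    """
    return gx * gx, gx * gy, gy * gy


def tangent_from_tensor(e, f, g):
    """Unit tangent (direction of least change) for the tensor [[e, f], [f, g]]."""
    # Angle of the major eigenvector, i.e. the dominant gradient direction.
    theta = 0.5 * math.atan2(2.0 * f, e - g)
    # The tangent is perpendicular to the dominant gradient direction.
    return -math.sin(theta), math.cos(theta)
```

For a purely horizontal gradient such as (1, 0), the tangent comes out vertical, which is what you expect for a vertical edge.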

Step 2: Separated Orientation-aligned Bilateral Filter

See Non Photorealistic Rendering - Separated Orientation-Aligned Bilateral Filter (Separated OABF).

Parameters used:
- sigma_d. That's the standard deviation of the spatial Gaussian function that's part of the bilateral filter. Both Kyprianidis and Winnemöller use sigma_d = 3.0.
- sigma_r. That's the standard deviation of the color (range) Gaussian function that's part of the bilateral filter. Both Kyprianidis and Winnemöller use sigma_r = 4.25.
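
To show how sigma_d and sigma_r interact, here is a minimal sketch of a one-dimensional bilateral filter applied to a list of scalar samples. It is a simplification: the real filter samples a color image along the gradient or tangent direction and compares colors rather than scalars. The function name, the kernel radius of 2*sigma_d, and the clamping at the borders are assumptions made for this illustration.

```python
import math


def bilateral_1d(samples, sigma_d, sigma_r):
    """One pass of a 1D bilateral filter over a list of scalar samples."""
    radius = int(math.ceil(2.0 * sigma_d))  # assumed kernel extent
    last = len(samples) - 1
    out = []
    for i, center in enumerate(samples):
        total = 0.0
        weight_sum = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), last)  # clamp to the borders
            # Spatial weight: falls off with distance from the center sample.
            w_d = math.exp(-(k * k) / (2.0 * sigma_d * sigma_d))
            # Range weight: falls off with difference in value.
            diff = samples[j] - center
            w_r = math.exp(-(diff * diff) / (2.0 * sigma_r * sigma_r))
            total += w_d * w_r * samples[j]
            weight_sum += w_d * w_r
        out.append(total / weight_sum)
    return out
```

With sigma_r = 4.25, a step from 0 to 100 is left essentially untouched because the range weight across the step is practically zero, while small fluctuations of a few units get averaged out.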

For this step, the number of iterations used is the number of iterations before the edges are detected, that is, n_e.

Step 3: Separated Flow-based Difference-of-Gaussians Filter (Separated FDoG)

See Non Photorealistic Rendering - Separated Flow-based Difference of Gaussians (Separated FDoG).

Parameters used for the DoG filter that is applied in the gradient direction:
- sigma_e. That's the standard deviation of the narrower spatial Gaussian. The standard deviation of the other spatial Gaussian is set to 1.6*sigma_e so that the DoG approximates the LoG (Laplacian of Gaussian). Larger values of sigma_e produce thicker edges. Smaller values produce thinner edges. Kyprianidis uses sigma_e = 1.0. I don't know what Winnemöller uses.
- tau. That's the sensitivity of the edge detection. Smaller values of tau pick up less noise, but important edges may be missed. Kyprianidis uses tau = 0.99 while Winnemöller uses tau = 0.98.
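
Here is a minimal sketch of how sigma_e and tau enter the DoG response, using a one-dimensional list of luminance samples taken across a potential edge. It assumes the usual form "narrow blur minus tau times wide blur" with the wide Gaussian at 1.6*sigma_e. The function name, the kernel radius, and the border clamping are choices made for this illustration only.

```python
import math


def dog_response(samples, index, sigma_e, tau):
    """DoG response at samples[index]: blur(sigma_e) - tau * blur(1.6 * sigma_e)."""
    sigma_wide = 1.6 * sigma_e
    radius = int(math.ceil(2.0 * sigma_wide))  # assumed kernel extent
    last = len(samples) - 1
    narrow_sum = narrow_w = 0.0
    wide_sum = wide_w = 0.0
    for k in range(-radius, radius + 1):
        value = samples[min(max(index + k, 0), last)]  # clamp to the borders
        w_n = math.exp(-(k * k) / (2.0 * sigma_e * sigma_e))
        w_w = math.exp(-(k * k) / (2.0 * sigma_wide * sigma_wide))
        narrow_sum += w_n * value
        narrow_w += w_n
        wide_sum += w_w * value
        wide_w += w_w
    # Each blur is normalized so that a flat region keeps its value.
    return narrow_sum / narrow_w - tau * (wide_sum / wide_w)
```

On a flat region of value c, the response is c * (1 - tau), which is slightly positive for tau just below 1. On a thin dark line the response turns negative, and that is what later gets marked as an edge.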

Parameters used to smooth the edges in the direction of the flow curves:
- sigma_m. That's the standard deviation of the Gaussian used to smooth the edges. The larger sigma_m is, the smoother the edges will be. Kyprianidis uses sigma_m = 3.0. I don't know what Winnemöller uses.

Parameters used to threshold the edges:
- phi_e. Controls the sharpness of the edge output. Kyprianidis uses phi_e = 2.0 while Winnemöller uses phi_e between 0.75 and 5.0.
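
The role of phi_e can be sketched with the soft threshold below, which follows the formulation in Winnemöller's paper as I understand it: positive DoG responses map to white (1.0), and negative responses are pushed toward black (0.0) through a tanh ramp whose steepness is phi_e. The function name is hypothetical.

```python
import math


def edge_value(dog, phi_e):
    """Map a (smoothed) DoG response to an edge intensity in (0, 1]."""
    if dog > 0.0:
        return 1.0  # no edge: white
    # Negative responses darken smoothly; larger phi_e gives a sharper ramp.
    return 1.0 + math.tanh(phi_e * dog)
```

A larger phi_e makes even slightly negative responses go almost fully black, which gives crisper edges, while a smaller phi_e leaves more gray in the edge image.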

There is another parameter that can be used:
- n. That's the number of iterations of "Separated FDoG". Kyprianidis uses n = 1 most of the time. I don't know what Winnemöller uses.

In my humble opinion, the most important parameter when detecting the edges is sigma_e.

Step 4: Separated Orientation-aligned Bilateral Filter (2nd pass after the edges have been detected)

Parameters used:
- sigma_d. Same as before.
- sigma_r. Same as before.

For this step, the number of iterations used is the number of iterations that remain, that is, n_a-n_e.

Step 5: Color Quantization

See Non Photorealistic Rendering - Pseudo-Quantization.

Parameters used:
- phi_q. Controls the softness of the quantization. Winnemöller uses phi_q = 3.0 to 14.0.
- quant_levels. That's the number of levels used to quantize the luminance. Winnemöller uses quant_levels = 8 to 10.
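
Here is a minimal sketch of how phi_q and quant_levels can work together on a single luminance value. It follows the soft quantization formula from Winnemöller's paper as I understand it: snap to the nearest bin boundary, then add a tanh ramp scaled by half the bin width. The function name and the assumption that luminance ranges from 0 to 100 (as in CIELAB) are choices made for this illustration.

```python
import math


def soft_quantize(lum, quant_levels, phi_q, lum_max=100.0):
    """Soft-quantize a luminance value in [0, lum_max]."""
    step = lum_max / quant_levels
    # Nearest bin boundary (floor of x + 0.5 avoids Python's banker's rounding).
    nearest = step * math.floor(lum / step + 0.5)
    # tanh ramp around the boundary; larger phi_q gives harder transitions.
    return nearest + 0.5 * step * math.tanh(phi_q * (lum - nearest))
```

With quant_levels = 8 the bin width is 12.5. A luminance sitting exactly on a boundary stays there, while values in between are pulled toward the flat part of the nearest step, and the pull gets stronger as phi_q increases.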

Here's an example:

Input RGB image.

Output blended image.

Parameters used:
tensor_sigma = 3
n_e = 1
n_a = 1
sigma_d = 3
sigma_r = 4.25
fdog_n = 2
fdog_sigma_e = 1
fdog_tau = 0.99
fdog_sigma_m = 3
fdog_phi = 2
phi_q = 3
quant_levels = 8
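
If you want to keep such a parameter set in code, one possible way is a small dataclass like the sketch below. The class name AbstractionParams and the helper property are hypothetical and do not come from my software or from the paper. The field names and default values simply mirror the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractionParams:
    """Parameter set for the cartoon filter (defaults mirror the example above)."""
    tensor_sigma: float = 3.0
    n_e: int = 1
    n_a: int = 1
    sigma_d: float = 3.0
    sigma_r: float = 4.25
    fdog_n: int = 2
    fdog_sigma_e: float = 1.0
    fdog_tau: float = 0.99
    fdog_sigma_m: float = 3.0
    fdog_phi: float = 2.0
    phi_q: float = 3.0
    quant_levels: int = 8

    def __post_init__(self):
        # n_a counts all OABF iterations, so it cannot be smaller than n_e.
        if not 0 <= self.n_e <= self.n_a:
            raise ValueError("expected 0 <= n_e <= n_a")

    @property
    def remaining_oabf_iterations(self):
        """OABF iterations left after the edges are detected (n_a - n_e)."""
        return self.n_a - self.n_e
```

With the defaults, remaining_oabf_iterations is 0, meaning that no further abstraction takes place after the edges are detected.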

Here's a quick video:

At the moment, the software is sitting on my Linux box and is not available for download. If you like this cartoon rendering, feel free to send me your photographs and it will be my pleasure to cartoonify them for you.

Tuesday, May 15, 2018
