Artistic style transfer

(Disclaimer: Aside from the picture of myself, I don’t own any of the images or artwork shown in this post.)

I recently implemented this paper on artistic style transfer in Python, using Caffe to perform the neural net operations. Dylan Paiton, who worked with the Flickr Vision team as a summer intern, benchmarked the code and committed several nice optimizations as well.

The premise behind the paper is quite simple: it aims to take the style of one image (preferably a sketch, painting, or drawing) and transfer it to another. I was pleasantly surprised by the quality of the results, especially given how difficult a problem this is. This page contains a few more artistic style transfer examples and shows some of the features of the code. To fully understand the paper (and the rest of this post), it helps to have some background in deep learning, specifically convolutional neural networks. If you’re interested in trying the code out for yourself, you can download it here, along with instructions on how to use it and a small set of initial examples.
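To give a flavor of what the paper does under the hood: the content of an image is represented by the feature maps of one of the convnet's deeper layers, while style is captured by the Gram matrices (filter correlations) of several layers, and the optimizer searches for an image whose features match both. Below is a minimal numpy sketch of those two loss terms. It's an illustration of the idea from the paper, not code from my repository, and the function and variable names are my own.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a layer's activations.

    `features` has shape (num_filters, height * width); the Gram matrix
    measures which filters fire together, which is what captures "style".
    """
    return features.dot(features.T)

def content_loss(gen_feats, content_feats):
    """Squared-error distance between generated and content feature maps."""
    return 0.5 * np.sum((gen_feats - content_feats) ** 2)

def style_loss(gen_feats, style_feats):
    """Squared-error distance between Gram matrices, normalized by layer size."""
    N, M = gen_feats.shape  # num_filters, spatial locations
    G, A = gram_matrix(gen_feats), gram_matrix(style_feats)
    return np.sum((G - A) ** 2) / (4.0 * N ** 2 * M ** 2)
```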

Changing the style image
Let’s start off with a few artistic styles. The target artwork is (in order from top to bottom) 1) American Gothic by Grant Wood, 2) Impression, Sunrise by Claude Monet, 3) Rain Princess by Leonid Afremov, 4) The Scream by Edvard Munch, and 5) The Persistence of Memory by Salvador Dali. I used the VGG model with a style-to-content ratio of 1e5 and cut off the scalar loss optimization at 500 iterations for each of these examples.

[Images: the content photograph, followed by outputs styled with American Gothic, Impression, Sunrise, Rain Princess, The Scream, and The Persistence of Memory.]

As expected, different style images generate vastly different results.
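The style-to-content ratio mentioned above (1e5 for these examples) is simply the relative weight given to the style term versus the content term in the combined objective. Here is a sketch of that weighting, reusing the loss functions from the earlier snippet; again, this is a simplified illustration rather than the repository's exact code, and the layer-dictionary interface is my own invention.

```python
def total_loss(gen_layers, content_layers, style_layers, ratio=1e5):
    """Weighted sum of content and style losses across layers.

    Each argument is a dict mapping layer names to (num_filters,
    height * width) feature arrays. A larger `ratio` pushes the result
    toward the style image's texture and color.
    """
    l_content = sum(content_loss(gen_layers[l], content_layers[l])
                    for l in content_layers)
    l_style = sum(style_loss(gen_layers[l], style_layers[l])
                  for l in style_layers) / len(style_layers)
    return l_content + ratio * l_style
```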

Number of iterations
Next, let’s take a look at the effect of modifying the maximum number of iterations performed. If you’re concerned about runtime, this option could be very important for you, since each iteration under the VGG model is quite expensive (especially when run on the CPU). For each of the output images below, I used 50, 100, 200, 400, and 800 iterations respectively. The style image is Picasso’s 1907 self-portrait.

[Images: the style and content images, followed by outputs after 50, 100, 200, 400, and 800 iterations.]

As with nearly all iterative minimization algorithms, you'll see diminishing returns as the number of iterations goes up, and that effect is evident here.
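The iteration cap is just the limit handed to the optimizer. My code uses L-BFGS; the snippet below shows the general pattern with SciPy's implementation, where `loss_and_grad` is a hypothetical stand-in for a function that runs the network and returns the scalar loss and its gradient with respect to the image pixels (not a function from the repository).

```python
import numpy as np
from scipy.optimize import minimize

def run_style_transfer(loss_and_grad, init_img, max_iters=500):
    """Minimize a style-transfer loss over the image pixels with L-BFGS.

    `loss_and_grad(x)` must return (loss, gradient) for a flattened image x;
    `max_iters` is the cap discussed above -- more iterations give
    diminishing returns.
    """
    result = minimize(loss_and_grad,
                      init_img.ravel().astype(np.float64),
                      method="L-BFGS-B",
                      jac=True,  # loss_and_grad returns the gradient too
                      options={"maxiter": max_iters})
    return result.x.reshape(init_img.shape)
```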

Model used
The model used can drastically change the way the results look as well. Currently, the style-transfer code supports three different models: AlexNet1, GoogLeNet, and VGG. I let the L-BFGS optimizer run to completion in each case.

[Images: the style and content images, followed by outputs generated with AlexNet, GoogLeNet, and VGG.]

VGG is an extremely wide and deep model. Each convolutional layer has relatively small kernels and a small stride, enabling the convnet to capture an extremely thorough style representation. As a result, VGG-based output images are arguably the “best” in terms of look. The results generated using CaffeNet appear quite awful in comparison, but its runtime is extremely low; optimization took a total of around 30 seconds on my NVIDIA GeForce GT 750M GPU with AlexNet. GoogLeNet’s results probably lie somewhere in the middle. Although it has fewer total weights than AlexNet, it is nonetheless an extremely deep network.
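For those curious how the different models slot in: in pycaffe, each backbone is just a network definition plus pretrained weights, and style features come from whichever convolutional layers you choose to read. The sketch below shows the general pattern; the file paths and layer name are placeholders I made up, and the actual layer selections in my code differ per model.

```python
import numpy as np
import caffe

caffe.set_mode_gpu()  # or caffe.set_mode_cpu() if no GPU is available

# Placeholder paths -- substitute the deploy prototxt and weights for
# whichever backbone (CaffeNet, GoogLeNet, VGG) you want to try.
net = caffe.Net("models/vgg16/deploy.prototxt",
                "models/vgg16/vgg16.caffemodel",
                caffe.TEST)

def layer_features(net, img, layer):
    """Forward a preprocessed (channels, height, width) image and return one
    layer's activations, reshaped to (num_filters, height * width) so a
    Gram matrix can be computed from them."""
    net.blobs["data"].reshape(1, *img.shape)
    net.blobs["data"].data[...] = img
    net.forward(end=layer)
    feats = net.blobs[layer].data[0]
    return feats.reshape(feats.shape[0], -1)

# Example (hypothetical layer name for a VGG-style network):
# feats = layer_features(net, preprocessed_img, "conv1_1")
```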

More examples
I’ll continue to update this page with examples as I add functionality to the code.

1Here, AlexNet actually refers to CaffeNet, which is a slightly modified version of AlexNet used by BVLC as a reference convnet in many of their public examples and benchmarks.