We decided to implement A Neural Algorithm of Artistic Style by Gatys et al. This paper provides a method for extracting the “style” from one image and the “content” from another, and forming a new image that combines the extracted content and style. Previously, it was not known whether the representations of content and style of images were truly separable using Convolutional Neural Networks.
The paper treats style transfer as an optimisation problem. A white-noise image is the starting point, and it is made progressively more similar to the desired merged image by minimising a cost function. This was a novel way of tackling the problem of style transfer: instead of updating the weights and biases (as in training a traditional CNN), the network parameters are kept constant and the pixels of the image are optimised.
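A minimal sketch of this pixel-optimisation loop, assuming a PyTorch implementation; the feature extractor and the content/style losses are placeholders for the components described in the following paragraphs, and the weights and step count are illustrative:

```python
import torch

def optimise_image(content_img, style_img, extract_features,
                   content_loss, style_loss,
                   alpha=1.0, beta=1e3, steps=300):
    """Sketch of Gatys-style transfer: optimise the pixels, not network weights."""
    # Start from white noise; the pixels themselves are the parameters.
    generated = torch.randn_like(content_img, requires_grad=True)
    optimiser = torch.optim.LBFGS([generated])

    content_targets = extract_features(content_img)
    style_targets = extract_features(style_img)

    for _ in range(steps):
        def closure():
            optimiser.zero_grad()
            feats = extract_features(generated)
            # The CNN stays frozen; only `generated` receives gradients.
            loss = (alpha * content_loss(feats, content_targets)
                    + beta * style_loss(feats, style_targets))
            loss.backward()
            return loss
        optimiser.step(closure)

    return generated.detach()
```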
Neural style transfer can generate images in the styles of famous painters. In the past, creating images in the style of another painter required a highly skilled artist and a lot of time. Many commercial applications such as the mobile application Prisma and the web application Ostagram have monetized the appeal of neural style transfer as a simple way of allowing users to create art. There are several potential applications for this technology. For example, neural style transfer could be used in the production of animated movies. Creating an animation requires about 24 frames per second, which are usually painted or drawn. Neural style transfer could be used to automatically stylise the frames into a specific animation style quickly, as was the case in the production of the short film Come Swim in 2017.
Gatys et al.’s work in 2015 led to a series of advances in Neural Style Transfer algorithms. These algorithms can be broadly divided into Image-Optimisation-Based Online Neural Methods (IOB-NST) and Model-Optimisation-Based Offline Neural Methods (MOB-NST). Gatys et al.’s method does not perform well at preserving the fine structure and details of the input content image. It also does not take into account the depth information and low-level information in the input content image.
IOB-NST algorithms (such as Gatys et al.’s) first model and extract the style and content information, recombine them as the target representation, and then iteratively reconstruct a stylised image that closely matches that target. These algorithms are very computationally expensive because of the iterative image-optimisation procedure.
Risser et al. found that using a Gram matrix introduces instabilities during optimisation, because feature activations with different means and variances can still have the same Gram matrix. They introduce an additional histogram loss that forces the optimisation to also match the full histogram of feature activations. This results in more stable Neural Style Transfer with less parameter tuning and fewer iterations. It does not, however, address the other weaknesses of Gatys et al.’s work, namely the lack of consideration of the depth and low-level information of the input content image.
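As a concrete illustration of that instability (our own sketch in PyTorch, not Risser et al.’s code): the Gram matrix only captures correlations between feature channels, so two feature maps with clearly different statistics, for example a map and its negation, which have different means, produce exactly the same Gram matrix.

```python
import torch

def gram_matrix(features):
    """Channel-wise correlation matrix of a (C, H, W) feature map."""
    c, h, w = features.shape
    flat = features.view(c, h * w)
    # Normalise by the number of spatial positions so matrices from
    # layers of different resolution are comparable.
    return flat @ flat.t() / (h * w)

f = torch.randn(3, 8, 8)
# A feature map and its negation have different means but the same Gram matrix.
assert torch.allclose(gram_matrix(f), gram_matrix(-f))
```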
Li et al. introduce an additional Laplacian loss (Laplacian operators are commonly used for edge detection) to impose constraints on low-level features in pixel space. This algorithm performs better at preserving fine details and structure, but does not improve on Gatys et al.’s algorithm when it comes to semantics and depth variations.
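A rough sketch of how such a constraint could be computed (our interpretation in PyTorch; Li et al.’s exact formulation may differ): both images are filtered with a Laplacian kernel and the squared difference between the resulting edge maps is penalised.

```python
import torch
import torch.nn.functional as F

# Standard 3x3 Laplacian kernel used for edge detection.
LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

def laplacian_loss(generated, content):
    """Penalise differences between the edge maps of two (N, 3, H, W) images."""
    def edges(img):
        grey = img.mean(dim=1, keepdim=True)  # greyscale (N, 1, H, W)
        return F.conv2d(grey, LAPLACIAN, padding=1)
    return ((edges(generated) - edges(content)) ** 2).mean()
```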
Li and Wand use Markov Random Fields to better preserve fine structure and arrangement. This algorithm works well for photorealistic styles but fails when the content and style images differ strongly. MOB-NST methods instead use model-optimisation-based offline image reconstruction: a feed-forward network is optimised offline over a large set of images, and the stylised result is then reconstructed with a forward pass.
Ulyanov et al. use a per-style-per-model approach, pre-training a feed-forward, style-specific network to produce a stylised result with a single forward pass (enabling real-time transfer). They also apply normalisation to every single image (instance normalisation) to improve stylisation quality.
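This per-image normalisation is available directly in PyTorch; a minimal sketch of how it could appear in one block of such a feed-forward network (the layer sizes are illustrative, not Ulyanov et al.’s exact architecture):

```python
import torch.nn as nn

# One convolutional block of a feed-forward style network, using instance
# normalisation (statistics computed per image) instead of batch normalisation.
block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=9, stride=1, padding=4),
    nn.InstanceNorm2d(32, affine=True),
    nn.ReLU(inplace=True),
)
```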
There are thus many algorithms that try to improve Neural Style Transfer. Different algorithms produce different results (different stroke sizes, different levels of abstractness, etc.). However, the quality of the output image is ultimately subjective, and one should choose the algorithm that best suits one’s intended use case.
We plan to re-implement the existing solution proposed by Gatys et al., with some changes and additions, in the following way:
The output images are subjective, so there is no definitive criterion for evaluating how good they are. However, we plan to present a wide range of output images for a given input, produced with varying parameters such as the number of iterations, content weight, and style weight. These options would be built into our GUI so that the user can choose the output image subjectively.
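As an illustration, the parameter sweep behind these GUI options could be as simple as the following sketch (the `stylise` callable stands for the optimisation routine sketched earlier, and the parameter grids are illustrative rather than final values):

```python
from itertools import product

def sweep(stylise, content_img, style_img):
    """Generate candidate outputs over an illustrative grid of parameters."""
    iterations = [100, 300, 500]          # number of optimisation steps
    content_weights = [1.0, 5.0]          # alpha
    style_weights = [1e2, 1e3, 1e4]       # beta
    results = {}
    for steps, alpha, beta in product(iterations, content_weights, style_weights):
        results[(steps, alpha, beta)] = stylise(content_img, style_img,
                                                steps=steps, alpha=alpha,
                                                beta=beta)
    return results
```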
If we manage to produce an alternative model with a different image classifier, as mentioned above, we could compare the output images produced by both models with the same parameters. We would further present standardised benchmark comparisons, such as memory footprint and the time taken for each model to produce an acceptable/similar image, to evaluate both models. However, what constitutes an acceptable image from either model is also subjective.
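A simple way to gather these benchmarks, assuming a PyTorch implementation running on a GPU where available (the `stylise` callable is the same placeholder as above):

```python
import time
import torch

def benchmark(stylise, content_img, style_img, **params):
    """Wall-clock time and (on GPU) peak memory for one stylisation run."""
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    output = stylise(content_img, style_img, **params)
    elapsed = time.perf_counter() - start
    peak_memory = (torch.cuda.max_memory_allocated()
                   if torch.cuda.is_available() else None)
    return output, elapsed, peak_memory
```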
Our current progress involves the following:
Start of November: Choose between Kivy and PyGUI for our GUI library
Mid November: Finish writing most of the Python code and start writing code for the GUI
End of November: See if it is possible to extend the algorithm to videos, and compare the performance of the VGG-19 classifier with Inception and ResNet