Cameron Smith - Neural Style Transfer for Images and Video in TensorFlow

Convolutional Neural Networks and Graphics Processing Units (GPUs) have been at the core of an exciting paradigm shift in computer vision research that some researchers have called “the algorithmic perception revolution.” This talk presents a TensorFlow (Google's machine learning framework) implementation and a description of several algorithmic techniques for performing artistic style transfer using a neural network architecture trained for large-scale image recognition tasks. The neural algorithm separates and recombines the style and content of arbitrary images. During this talk we will see some strange images and gain insights into the deep learning algorithms behind the popular mobile app Prisma.

GDG South DevFest 2017

gdg_rnd

September 14, 2017


Transcript

  1. Style Transfer for Images and Video with TensorFlow

    An introduction to neural style transfer using TensorFlow, Google's open-source software library for machine learning.
  2. Talk Topics

    Cameron Smith, Research Assistant, Artomatix. Cameron Smith is a research assistant at the Irish start-up Artomatix, where he uses AI (deep learning), computer vision, and computer graphics to automate the creation of 3D assets for gaming, virtual reality, and filmmaking. He completed a Bachelor of Science and a Master of Science in Computer Science at the University of New Mexico, where he also took classes in art and graphic design.
  3. Outline

    Hello! About Me:
    • Research Assistant at Artomatix (artomatix.com)
    • BS/MS in Computer Science
    • Interested in the burgeoning relationships between computer graphics, computer vision, and machine learning
    Road Map:
    • Brief Introduction to Deep Learning for Computer Vision
    • Neural Algorithm for Artistic Style
    • Methods and Theory
    • Results
    • Current Work
    • Question & Answer Session
  4. Motivations for Deep Neural Networks

    A limitation of using “hand-crafted” features (e.g. SIFT, SURF, HOG) and machine learning classifiers (K-Nearest Neighbor, Support Vector Machine, and shallow neural networks) is that these approaches do not scale well to large data sets.
  5. Computer Vision and Deep Learning

    In 2012, a team of researchers at the University of Toronto won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) by achieving a top-5 error rate of 15.3%. The result catalyzed research into convolutional neural networks for image recognition.
  6. Deep Neural Network Models

    There are several well-known deep neural networks trained on large data sets for state-of-the-art image recognition.
    • LeNet: The best-known convolutional neural network for classifying handwritten digits, developed in the 1990s.
    • AlexNet: The winner of the ILSVRC challenge in 2012. The network popularized the use of convolutional neural networks in image recognition.
    • GoogLeNet: The winner of the ILSVRC challenge in 2014. The network substantially reduced the number of parameters.
    • ResNet: Very deep networks with residual blocks. Winner of the ILSVRC challenge in 2015 in several categories: classification, detection, localization, and segmentation.
  7. VGG Network

    • Architecture built and trained by the Visual Geometry Group (VGG) at the University of Oxford.
    • The runner-up of the ILSVRC classification challenge in 2014.
    • VGG-19 contains 19 weighted layers and roughly 144 million parameters.
  8. Methods and Implementations

    This talk presents the following:
    • Single-Image Style Transfer [Gatys et al.]
    • Multiple Style Transfer [Siegel et al.]
    • Texture and Photorealistic Style Transfer [Gatys et al.]
    • Weighted Style Transfer [Chan et al.]
    • Video Style Transfer [Ruder et al.]
    • Preservation of Color in Style Transfer [Gatys et al.]
    Additionally, we present an improved Weighted Style Transfer technique and an extension of the Preservation of Color technique.
  9. Methods: A Neural Algorithm of Artistic Style

    In September 2015, Gatys et al. published the paper “A Neural Algorithm of Artistic Style” [3]. The paper presents a system that uses neural representations to separate and recombine the content and style of arbitrary images.
    [Table 1: Style and Content Representations at layers conv1, conv2, conv4, and conv5]
  10. Methods: Key Insights

    The style transfer problem was formulated as an energy minimization problem consisting of a content loss and a style loss. The key insights behind the algorithm are:
    • Content Representation: The features extracted from the hidden layers in a CNN carry information about the content of an input image $\vec{x}$.
    • Style Representation: The correlations between the features extracted from the hidden layers in a CNN carry information about the style of an input image $\vec{x}$.
  11. Methods: Content Loss

    An image $\vec{x}$ is encoded in each layer of the CNN by the filter responses to the image. A layer with $N_l$ distinct filters has $N_l$ feature maps, each of size $M_l$, where $M_l$ is the height times the width of the feature map.
  12. Methods: Content Loss

    The responses in a layer $l$ can be stored in a matrix $F^l \in \mathbb{R}^{N_l \times M_l}$, where $F^l_{ij}$ is the activation of the $i$-th filter at position $j$ in layer $l$. Let $\vec{p}$ and $\vec{x}$ be the original image and the generated image, and let $P^l$ and $F^l$ be their respective feature representations in layer $l$. The squared-error loss between the two feature representations is:

    $$\mathcal{L}_{\text{content}}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

    The derivative of this loss with respect to the activations in layer $l$ equals:

    $$\frac{\partial \mathcal{L}_{\text{content}}}{\partial F^l_{ij}} = \begin{cases} \left( F^l - P^l \right)_{ij} & \text{if } F^l_{ij} > 0 \\ 0 & \text{if } F^l_{ij} < 0 \end{cases}$$
  13. Methods: Content Loss TensorFlow Code

    Math equation:

    $$\mathcal{L}_{\text{content}}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

    TensorFlow code:

        def content_layer_loss(p, x):
            # p: content-image features; x: generated-image features,
            # both of shape (1, h, w, d) from the same VGG layer.
            _, h, w, d = p.get_shape()
            M = h.value * w.value   # M_l: height times width of the feature map
            N = d.value             # N_l: number of filters
            K = 1. / 2.
            loss = K * tf.reduce_sum(tf.pow((x - p), 2))
            return loss
  14. Methods: Style Loss

    The style representation computes the correlations between the different filter responses. The feature correlations are given by the Gram matrix $G^l \in \mathbb{R}^{N_l \times N_l}$, where $G^l_{ij}$ is the inner product between the vectorized feature maps $i$ and $j$ in layer $l$:

    $$G^l_{ij} = \sum_{k} F^l_{ik} F^l_{jk}$$

    TensorFlow code:

        def gram_matrix(x, area, depth):
            # x: feature maps of shape (1, h, w, d); area = h * w (M_l),
            # depth = d (N_l).
            F = tf.reshape(x, (area, depth))
            G = tf.matmul(tf.transpose(F), F)   # (depth, depth) matrix F^T F
            return G
  15. Methods: Style Loss

    Let $\vec{a}$ and $\vec{x}$ be the original image and the generated image, and let $A^l$ and $G^l$ be their respective style representations in layer $l$. The contribution of that layer to the total loss is:

    $$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$

    and the total style loss is:

    $$\mathcal{L}_{\text{style}}(\vec{a}, \vec{x}) = \sum_{l=0}^{L} w_l E_l$$

    The derivative of $E_l$ with respect to the activations in layer $l$ equals:

    $$\frac{\partial E_l}{\partial F^l_{ij}} = \begin{cases} \frac{1}{N_l^2 M_l^2} \left( (F^l)^{\mathsf{T}} (G^l - A^l) \right)_{ji} & \text{if } F^l_{ij} > 0 \\ 0 & \text{if } F^l_{ij} < 0 \end{cases}$$
  16. Methods: Style Loss TensorFlow Code

    Math equation:

    $$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$

    TensorFlow code:

        def style_layer_loss(a, x):
            # a: style-image features; x: generated-image features,
            # both of shape (1, h, w, d) from the same VGG layer.
            _, h, w, d = a.get_shape()
            M = h.value * w.value   # M_l
            N = d.value             # N_l
            A = gram_matrix(a, M, N)
            G = gram_matrix(x, M, N)
            K = 1. / (4 * N**2 * M**2)
            loss = K * tf.reduce_sum(tf.pow((G - A), 2))
            return loss
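    The per-layer losses $E_l$ are then combined into the total style loss $\mathcal{L}_{\text{style}} = \sum_l w_l E_l$. A minimal sketch of that weighted sum, assuming hypothetical lists a_layers and x_layers of per-layer feature tensors and layer_weights holding the $w_l$ factors:

        def total_style_loss(a_layers, x_layers, layer_weights):
            # Weighted sum of per-layer style losses: sum_l w_l * E_l.
            loss = 0.
            for a, x, w in zip(a_layers, x_layers, layer_weights):
                loss += w * style_layer_loss(a, x)
            return loss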
  17. Methods: Total Loss

    The new ‘stylized’ image is synthesized by jointly minimizing the distance of a white noise image from the content representation of one image and the style representation of another. The total loss is a linear combination of the style and content losses:

    $$\mathcal{L}_{\text{total}}(\vec{p}, \vec{a}, \vec{x}, L_C, L_S) = \alpha \mathcal{L}_{\text{content}}(\vec{p}, \vec{x}, L_C) + \beta \mathcal{L}_{\text{style}}(\vec{a}, \vec{x}, L_S)$$

    where $\alpha$ is the weighting of the content reconstruction, $\beta$ is the weighting of the style reconstruction, $L_C$ is the set of layers used in the content representation, and $L_S$ is the set of layers used in the style representation.
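    In code this combination is a one-liner. A minimal sketch, assuming content_loss and style_loss tensors built with the functions above; the values of alpha and beta here are illustrative, not the talk's settings:

        alpha = 5e0    # content weight (illustrative value)
        beta = 1e4     # style weight (illustrative value)
        total_loss = alpha * content_loss + beta * style_loss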
  18. Methods: Total Loss

    Algorithm 1: Neural Algorithm for Artistic Style

        procedure NEURAL-STYLE(p, a, x, L_C, L_S, α, β)
            L_style(a, x) ← Σ_{l ∈ L_S} w_l · (1 / (4 N_l² M_l²)) · Σ_{i,j} (G^l_ij − A^l_ij)²
            L_content(p, x) ← Σ_{l ∈ L_C} w_l · (1/2) · Σ_{i,j} (F^l_ij − P^l_ij)²
            L_total(p, a, x, L_C, L_S) ← α · L_content(p, x, L_C) + β · L_style(a, x, L_S)
            while convergence criterion is not met do
                x ← MINIMIZE(L_total)
            return x
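    A hedged TF1-style sketch of the minimization loop, matching the snippets above: the generated image is a tf.Variable optimized directly, and Adam is used here for brevity (the convergence plots later also compare L-BFGS). The white-noise scaling and iteration budget are illustrative assumptions:

        # x is the generated image; h, w are the content image's dimensions.
        x = tf.Variable(tf.random_normal((1, h, w, 3)) * 0.256)
        train_op = tf.train.AdamOptimizer(learning_rate=1.0).minimize(
            total_loss, var_list=[x])
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for step in range(1000):   # fixed iteration budget as the criterion
                sess.run(train_op)
            result = sess.run(x)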
  19. Methods: Neural Style Transfer in Video

    There are several different approaches:
    • Naive: Initialize gradient descent with white noise or the current frame.
    • Better: Initialize gradient descent with the previous stylized frame. The first frame must be initialized with white noise.
    • Best: Initialize gradient descent with the previous stylized frame warped to the current frame using the optical flow between the pair of frames (sketched below). Additionally, use loss functions for temporal consistency.
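    A minimal sketch of the warped initialization, assuming an OpenCV-style dense flow field between the two frames and a previously stylized frame; the flow direction convention here (backward mapping) is an assumption for illustration:

        import numpy as np
        import cv2

        def warp_previous_frame(prev_stylized, flow):
            # Warp the previous stylized frame toward the current frame
            # so it can seed the current frame's gradient descent.
            h, w = flow.shape[:2]
            grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
            map_x = (grid_x - flow[..., 0]).astype(np.float32)
            map_y = (grid_y - flow[..., 1]).astype(np.float32)
            return cv2.remap(prev_stylized, map_x, map_y,
                             interpolation=cv2.INTER_LINEAR)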
  20. Methods: Temporal Loss Functions

    The total loss function is the same as before, but there are additional temporal loss terms [5].

    Short-term temporal loss:

    $$\mathcal{L}_{\text{shortterm}}(\vec{p}^{(i)}, \vec{a}, \vec{x}^{(i)}, L_C, L_S) = \alpha \mathcal{L}_{\text{content}}(\vec{p}^{(i)}, \vec{x}^{(i)}, L_C) + \beta \mathcal{L}_{\text{style}}(\vec{a}, \vec{x}^{(i)}, L_S) + \gamma \mathcal{L}_{\text{temporal}}\left(\vec{x}^{(i)}, \omega^{i}_{i-1}(\vec{x}^{(i-1)}), \vec{c}^{(i-1,i)}\right)$$

    Long-term temporal loss:

    $$\mathcal{L}_{\text{longterm}}(\vec{p}^{(i)}, \vec{a}, \vec{x}^{(i)}, L_C, L_S) = \alpha \mathcal{L}_{\text{content}}(\vec{p}^{(i)}, \vec{x}^{(i)}, L_C) + \beta \mathcal{L}_{\text{style}}(\vec{a}, \vec{x}^{(i)}, L_S) + \gamma \sum_{j \in J : i - j \geq 1} \mathcal{L}_{\text{temporal}}\left(\vec{x}^{(i)}, \omega^{i}_{i-j}(\vec{x}^{(i-j)}), \vec{c}^{(i-j,i)}_{\text{long}}\right)$$

    where:

    $$\vec{c}^{(i-j,i)}_{\text{long}} = \max\left(\vec{c}^{(i-j,i)} - \sum_{k \in J : i - k > i - j} \vec{c}^{(i-k,i)},\ 0\right)$$
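    The temporal term penalizes deviation from the warped previous stylized frame wherever the optical flow is reliable. A minimal TensorFlow sketch following the per-pixel weighting in Ruder et al. [5], assuming c holds consistency weights (near 1 where the flow is reliable, 0 at disocclusions) broadcastable to the frame's shape:

        def temporal_loss(x, w, c):
            # x: current stylized frame; w: previous stylized frame warped
            # to the current frame; both of shape (1, h, w_, d).
            D = tf.cast(tf.size(x), tf.float32)   # number of pixels times depth
            return (1. / D) * tf.reduce_sum(c * tf.pow(x - w, 2))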
  21. Methods: Video Style Transfer

    [Figure: (a) Original frame; (b) Ground-truth optical flow; (c) Temporal consistency weights; (d) Previous frame warped to the current frame]
  22. Methods: Preservation of Original Colors

    The original color scheme of the content image can be preserved using the following image processing algorithm before or after the style transfer [2]:

    Algorithm 2: Convert stylized image to original colors

        procedure CONVERT-TO-ORIGINAL-COLORS(x_rgb, p_rgb)
            x_yuv ← RGB-TO-YUV(x_rgb)
            p_yuv ← RGB-TO-YUV(p_rgb)
            x_y, x_u, x_v ← SPLIT-CHANNELS(x_yuv)
            p_y, p_u, p_v ← SPLIT-CHANNELS(p_yuv)
            x′_yuv ← MERGE-CHANNELS(x_y, p_u, p_v)
            x′_rgb ← YUV-TO-RGB(x′_yuv)
            return x′_rgb
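    A minimal Python sketch of Algorithm 2; OpenCV's color conversions are an assumed choice here, and any RGB↔YUV routine would work equally well:

        import cv2

        def convert_to_original_colors(x_rgb, p_rgb):
            # Keep the stylized luminance (Y) but restore the content
            # image's chrominance (U, V) channels.
            x_yuv = cv2.cvtColor(x_rgb, cv2.COLOR_RGB2YUV)
            p_yuv = cv2.cvtColor(p_rgb, cv2.COLOR_RGB2YUV)
            x_yuv[..., 1:] = p_yuv[..., 1:]   # swap in the content U, V
            return cv2.cvtColor(x_yuv, cv2.COLOR_YUV2RGB)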
  23. Methods: Preservation of Original Colors

    [Figure: (a) Content image; (b) Color (UV) channels of the content image; (c) Stylized image; (d) Luminance (Y) channel of the stylized image]
  24. Methods: Weighted Style Transfer

    To perform weighted style transfer we define a masking function as:

    $$\text{mask}(G, m) = G \otimes \text{expand}(m, N_l)$$

    where $\otimes$ denotes element-wise multiplication. When including content masks to weight the style transfer, we define the style loss function as:

    $$\mathcal{L}_{\text{style}}(\vec{a}, \vec{x}, m, L_S) = \sum_{l \in L_S} w_l \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( \text{mask}(G^l, m)_{ij} - \text{mask}(A^l, m)_{ij} \right)^2$$

    where $\vec{a}$ is the style image, $\vec{x}$ is the stylized image, $m$ is the content mask, and $L_S$ is the set of layers used in the masked style representation.
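    One natural way to realize the expand-and-multiply step is to tile the mask (resized to the layer's spatial resolution) across all $N_l$ feature channels, multiply it into the feature maps, and build the Gram matrices from the masked features. A sketch under that assumption, reusing gram_matrix from above:

        def masked_gram(x, mask, area, depth):
            # x: feature maps of shape (1, h, w, d); mask: weights in
            # [0, 1] of shape (1, h, w, 1) at this layer's resolution.
            m = tf.tile(mask, [1, 1, 1, depth])   # expand(m, N_l)
            return gram_matrix(x * m, area, depth)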
  25. Results

    Quantitative results:
    • Plots of Loss Functions
    Qualitative results:
    • Single-Image Style Transfer
    • Multiple Style Transfer
    • Style Interpolation
    • Gradient Descent Initialization
    • Masked Style Transfer
    • Color Preservation
  26. Quantitative Results: Loss Functions

    [Plot: total loss $\mathcal{L}_{\text{total}}$ on a log scale over 1,000 iterations, comparing the convergence of L-BFGS against Adam with learning rates r = 0.1, 1, and 10]
  27. Results: Single Style Transfer

    Transferring the style of various artworks to the same content image produces qualitatively convincing results.
    [Figure: (a) Content image: Neckarfront in Tübingen, Germany; (b) Gatys et al. [3], Johnson [4], and our result]
  28. Results: Multiple Style Transfer

    More than one style image can be used to blend multiple artistic styles.
  29. Results: Style Interpolation

    When using multiple style images, the degree of blending between the images can be controlled.
    [Figure: (a) 0.3 The Scream + 0.7 Starry Night; (b) 0.7 The Scream + 0.3 Starry Night]
  30. Results: Weighted Style Transfer

    Style can be transferred to segmented regions of the content image.
    [Figure: (a) Content image, content mask, and stylized image]
  31. Results: Weighted Style Transfer

    [Figure: (a) Chan et al. [1]; (b) Our result; (c) Chan et al. [1]; (d) Our result; (e) Foreground style]
  32. Results: Weighted Style Transfer

    We experimented with multiple masks to perform weighted multiple-style transfer.
    [Figure 11: Content image, style image 1, style image 2, content mask 1, content mask 2, and stylized image]
  33. Results: Weighted Style Transfer

    The masking is not done as a post-processing step; the masks are inserted into the network.
    [Figure: (a) Content mask; (b) Stylized image]
  34. Results: Preservation of Color

    The color scheme of the original image can be preserved.
    [Figure: (a) Content image; (b) Style image; (c) Style image colors; (d) Content image colors]
  35. Results: Comparisons of Color Space Conversions

    [Figure: (a) Gatys et al. [3]; (b) L*a*b* color space; (c) YCbCr color space; (d) L*u*v* color space]
  36. Style Transfer in Video

    Video frames rendered using optical-flow warping and temporal loss functions exhibit less flickering than the naive approaches.
    [Figure: initial frames and stylized results at frames 1, 10, and 20]
  37. Style Transfer in Video

    Professional artist Julius Horsthuis (julius-horsthuis.com) used my TensorFlow program to add the style of Amsterdam to his entry for the Ryuichi Sakamoto | async International short film competition. [Vimeo]
  38. Style Transfer in VR/AR

    In July 2017, I joined the Irish start-up Artomatix. We are applying machine learning to computer graphics in hopes of automating the creation of 3D assets. [YouTube]
  39. Future: Style Transfer in VR/AR

    Machine learning techniques will allow artists and designers to create hybrid or photo-surrealistic assets.
  40. Limitations

    We identified several limitations of the algorithm which could be researched:
    • Style Structure: The style representations lose structural relationships, yet some artistic styles maintain symmetry and geometry.
    • Performance: It may be possible to down-sample to a small image, compute a stylized image, up-sample the stylized image, initialize the gradient descent with the larger image, and repeat this process (see the sketch after this list).
    • VGG Dependence: Other researchers have tried using other networks, but they produced inferior results. VGG is trained for classification, so it may not learn ideal features for texture synthesis.
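    A minimal sketch of that coarse-to-fine idea, assuming a hypothetical stylize(content, style, init) wrapper around the optimization loop shown earlier; this illustrates the speculative bullet above, not an implemented feature:

        import cv2

        def coarse_to_fine_stylize(content, style, scales=(0.25, 0.5, 1.0)):
            # Stylize at increasing resolutions, seeding each scale's
            # gradient descent with the up-sampled previous result.
            x = None
            for s in scales:
                h, w = int(content.shape[0] * s), int(content.shape[1] * s)
                c = cv2.resize(content, (w, h))
                init = cv2.resize(x, (w, h)) if x is not None else None
                x = stylize(c, style, init=init)   # hypothetical wrapper
            return x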
  41. References

    [1] E. Chan and R. Bhargava. Show, divide and neural: Weighted style transfer. Technical report, 2016.
    [2] L. A. Gatys, M. Bethge, A. Hertzmann, and E. Shechtman. Preserving color in neural artistic style transfer. 2016.
    [3] L. A. Gatys, A. S. Ecker, and M. Bethge. A neural algorithm of artistic style. 2015.
    [4] J. Johnson. neural-style, 2015.
    [5] M. Ruder, A. Dosovitskiy, and T. Brox. Artistic style transfer for videos. 2016.
  42. Computer Vision and Object Recognition

    From the 1950s until the 2010s, machine learning and computer vision research focused on the development of techniques to extract “hand-crafted” features for object recognition tasks.
    [Diagram: traditional model of object recognition]