Cameron Smith - Neural Style Transfer for Images and Video in TensorFlow

Convolutional Neural Networks and Graphics Processing Units (GPUs) have been at the core of an exciting paradigm shift in computer vision research that some researchers have called “the algorithmic perception revolution.” This talk presents a TensorFlow (Google's machine learning framework) implementation and a description of several algorithmic techniques for performing artistic style transfer using a neural network architecture trained for large-scale image recognition tasks. The neural algorithm separates and recombines the style and content of arbitrary images. During this talk we will see some strange images and gain insights into the deep learning algorithms behind the popular mobile app Prisma.

GDG South DevFest 2017

gdg_rnd

September 14, 2017


Transcript

  1. Style Transfer for Images and Video with TensorFlow

    An introduction to neural style transfer using TensorFlow, Google's open-source software library for machine learning.
  2. Talk Topics

    Cameron Smith, Research Assistant, Artomatix. Cameron Smith is a research assistant at the Irish start-up Artomatix, where he uses AI (deep learning), computer vision, and computer graphics to automate the creation of 3D assets for gaming, virtual reality, and filmmaking. He completed a Bachelor of Science and a Master of Science in Computer Science at the University of New Mexico, where he also took classes in art and graphic design.
  3. Outline

    Hello! About Me:
    • Research Assistant at Artomatix (artomatix.com)
    • BS/MS in Computer Science
    • Interested in the burgeoning relationships between computer graphics, computer vision, and machine learning
    Road Map:
    • Brief Introduction to Deep Learning for Computer Vision
    • Neural Algorithm for Artistic Style
    • Methods and Theory
    • Results
    • Current Work
    • Question & Answer Session
  4. Motivations for Deep Neural Networks

    A limitation of using “hand-crafted” features (e.g. SIFT, SURF, HOG) and machine learning classifiers (K-Nearest Neighbor, Support Vector Machine, and shallow neural networks) is that these approaches do not scale well to large data sets.
  5. Computer Vision and Deep Learning

    In 2012, a team of researchers at the University of Toronto won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) by achieving a top-5 error rate of 15.3%. The result catalyzed research into convolutional neural networks for image recognition.
  6. Deep Neural Network Models

    There are several well-known deep neural networks trained on large data sets for state-of-the-art image recognition.
    • LeNet: The best-known convolutional neural network for classifying handwritten digits, developed in the 1990s.
    • AlexNet: The winner of the ILSVRC challenge in 2012. The network popularized the use of convolutional neural networks in image recognition.
    • GoogLeNet: The winner of the ILSVRC challenge in 2014. The network substantially reduced the number of parameters.
    • ResNet: Very deep networks with residual blocks. Winner of the ILSVRC challenge in 2015 in several categories: classification, detection, localization, and segmentation.
  7. VGG Network

    • Architecture built and trained by the Visual Geometry Group (VGG) at the University of Oxford.
    • The runner-up of the ILSVRC classification challenge in 2014.
    • VGG-19 contains 19 weighted layers and roughly 144 million parameters.
  8. Methods and Implementations

    This talk presents the following:
    • Single-Image Style Transfer [Gatys et al.]
    • Multiple Style Transfer [Siegel et al.]
    • Texture and Photorealistic Style Transfer [Gatys et al.]
    • Weighted Style Transfer [Chan et al.]
    • Video Style Transfer [Ruder et al.]
    • Preservation of Color in Style Transfer [Gatys et al.]
    Additionally, we present an improved Weighted Style Transfer technique and an extension of the Preservation of Color technique.
  9. Methods: A Neural Algorithm of Artistic Style

    In September 2015, Gatys et al. published the paper “A Neural Algorithm of Artistic Style” [3]. The paper presents a system that uses neural representations to separate and recombine the content and style of arbitrary images.
    [Table 1: Style and Content Representations at layers conv1, conv2, conv4, and conv5]
  10. Methods: Key Insights

    The style transfer problem was formulated as an energy minimization problem consisting of a content loss and a style loss. The key insights behind the algorithm are:
    • Content Representation: The features extracted from the hidden layers in a CNN carry information about the content of an input image $\vec{x}$.
    • Style Representation: The correlations between the features extracted from the hidden layers in a CNN carry information about the style of an input image $\vec{x}$.
  11. Methods: Content Loss

    An image $\vec{x}$ is encoded in each layer of the CNN by the filter responses to the image. A layer with $N_l$ distinct filters has $N_l$ feature maps, each of size $M_l$, where $M_l$ is the height times the width of the feature map.
  12. Methods: Content Loss

    The responses in a layer $l$ can be stored in a matrix $F^l \in \mathbb{R}^{N_l \times M_l}$, where $F^l_{ij}$ is the activation of the $i$-th filter at position $j$ in layer $l$. Let $\vec{p}$ and $\vec{x}$ be the original image and the generated image, and let $P^l$ and $F^l$ be their respective feature representations in layer $l$. The squared-error loss between the two feature representations is:

    $$\mathcal{L}_{\text{content}}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

    The derivative of this loss with respect to the activations in layer $l$ equals:

    $$\frac{\partial \mathcal{L}_{\text{content}}}{\partial F^l_{ij}} = \begin{cases} \left( F^l - P^l \right)_{ij} & \text{if } F^l_{ij} > 0 \\ 0 & \text{if } F^l_{ij} < 0 \end{cases}$$
  13. Methods: Content Loss TensorFlow Code

    Math equation:

    $$\mathcal{L}_{\text{content}}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

    TensorFlow code:

        def content_layer_loss(p, x):
            # p: content-image features; x: generated-image features,
            # both of shape (1, h, w, d) from the same VGG layer.
            _, h, w, d = p.get_shape()
            M = h.value * w.value   # M_l: height times width of the feature map
            N = d.value             # N_l: number of filters
            K = 1. / 2.
            loss = K * tf.reduce_sum(tf.pow((x - p), 2))
            return loss
  14. Methods: Style Loss

    The style representation computes the correlations between the different filter responses. The feature correlations are given by the Gram matrix $G^l \in \mathbb{R}^{N_l \times N_l}$, where $G^l_{ij}$ is the inner product between the vectorized feature maps $i$ and $j$ in layer $l$:

    $$G^l_{ij} = \sum_{k} F^l_{ik} F^l_{jk}$$

    TensorFlow code:

        def gram_matrix(x, area, depth):
            # x: feature maps of shape (1, h, w, d); area = h * w (M_l),
            # depth = d (N_l).
            F = tf.reshape(x, (area, depth))
            G = tf.matmul(tf.transpose(F), F)   # (depth, depth) matrix F^T F
            return G
  15. Methods: Style Loss

    Let $\vec{a}$ and $\vec{x}$ be the original image and the generated image, and let $A^l$ and $G^l$ be their respective style representations in layer $l$. The contribution of that layer to the total loss is:

    $$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$

    and the total style loss is:

    $$\mathcal{L}_{\text{style}}(\vec{a}, \vec{x}) = \sum_{l=0}^{L} w_l E_l$$

    The derivative of $E_l$ with respect to the activations in layer $l$ equals:

    $$\frac{\partial E_l}{\partial F^l_{ij}} = \begin{cases} \frac{1}{N_l^2 M_l^2} \left( (F^l)^{\mathsf{T}} (G^l - A^l) \right)_{ji} & \text{if } F^l_{ij} > 0 \\ 0 & \text{if } F^l_{ij} < 0 \end{cases}$$
  16. Methods: Style Loss TensorFlow Code

    Math equation:

    $$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$

    TensorFlow code:

        def style_layer_loss(a, x):
            # a: style-image features; x: generated-image features,
            # both of shape (1, h, w, d) from the same VGG layer.
            _, h, w, d = a.get_shape()
            M = h.value * w.value   # M_l
            N = d.value             # N_l
            A = gram_matrix(a, M, N)
            G = gram_matrix(x, M, N)
            K = 1. / (4 * N**2 * M**2)
            loss = K * tf.reduce_sum(tf.pow((G - A), 2))
            return loss
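    The per-layer losses $E_l$ are then combined into the total style loss $\mathcal{L}_{\text{style}} = \sum_l w_l E_l$. A minimal sketch of that weighted sum, assuming hypothetical lists a_layers and x_layers of per-layer feature tensors and layer_weights holding the $w_l$ factors:

        def total_style_loss(a_layers, x_layers, layer_weights):
            # Weighted sum of per-layer style losses: sum_l w_l * E_l.
            loss = 0.
            for a, x, w in zip(a_layers, x_layers, layer_weights):
                loss += w * style_layer_loss(a, x)
            return loss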
  17. Methods: Total Loss

    The new ‘stylized’ image is synthesized by jointly minimizing the distance of a white noise image from the content representation of one image and the style representation of another. The total loss is a linear combination of the style and content losses:

    $$\mathcal{L}_{\text{total}}(\vec{p}, \vec{a}, \vec{x}, L_C, L_S) = \alpha \mathcal{L}_{\text{content}}(\vec{p}, \vec{x}, L_C) + \beta \mathcal{L}_{\text{style}}(\vec{a}, \vec{x}, L_S)$$

    where $\alpha$ is the weighting of the content reconstruction, $\beta$ is the weighting of the style reconstruction, $L_C$ is the set of layers used in the content representation, and $L_S$ is the set of layers used in the style representation.
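    In code this combination is a one-liner. A minimal sketch, assuming content_loss and style_loss tensors built with the functions above; the values of alpha and beta here are illustrative, not the talk's settings:

        alpha = 5e0    # content weight (illustrative value)
        beta = 1e4     # style weight (illustrative value)
        total_loss = alpha * content_loss + beta * style_loss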
  18. Methods: Total Loss

    Algorithm 1: Neural Algorithm for Artistic Style

        procedure NEURAL-STYLE(p, a, x, L_C, L_S, α, β)
            L_style(a, x) ← Σ_{l ∈ L_S} w_l · (1 / (4 N_l² M_l²)) · Σ_{i,j} (G^l_ij − A^l_ij)²
            L_content(p, x) ← Σ_{l ∈ L_C} w_l · (1/2) · Σ_{i,j} (F^l_ij − P^l_ij)²
            L_total(p, a, x, L_C, L_S) ← α · L_content(p, x, L_C) + β · L_style(a, x, L_S)
            while convergence criterion is not met do
                x ← MINIMIZE(L_total)
            return x
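    A hedged TF1-style sketch of the minimization loop, matching the snippets above: the generated image is a tf.Variable optimized directly, and Adam is used here for brevity (the convergence plots later also compare L-BFGS). The white-noise scaling and iteration budget are illustrative assumptions:

        # x is the generated image; h, w are the content image's dimensions.
        x = tf.Variable(tf.random_normal((1, h, w, 3)) * 0.256)
        train_op = tf.train.AdamOptimizer(learning_rate=1.0).minimize(
            total_loss, var_list=[x])
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for step in range(1000):   # fixed iteration budget as the criterion
                sess.run(train_op)
            result = sess.run(x)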
  19. Methods: Neural Style Transfer in Video

    There are several different approaches:
    • Naive: Initialize gradient descent with white noise or the current frame.
    • Better: Initialize gradient descent with the previous stylized frame. The first frame must be initialized with white noise.
    • Best: Initialize gradient descent with the previous stylized frame warped to the current frame using the optical flow between the pair of frames (sketched below). Additionally, use loss functions for temporal consistency.
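    A minimal sketch of the warped initialization, assuming an OpenCV-style dense flow field between the two frames and a previously stylized frame; the flow direction convention here (backward mapping) is an assumption for illustration:

        import numpy as np
        import cv2

        def warp_previous_frame(prev_stylized, flow):
            # Warp the previous stylized frame toward the current frame
            # so it can seed the current frame's gradient descent.
            h, w = flow.shape[:2]
            grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
            map_x = (grid_x - flow[..., 0]).astype(np.float32)
            map_y = (grid_y - flow[..., 1]).astype(np.float32)
            return cv2.remap(prev_stylized, map_x, map_y,
                             interpolation=cv2.INTER_LINEAR)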
  20. Methods: Temporal Loss Functions

    The total loss function is the same as before, but there are additional temporal loss terms [5].

    Short-term temporal loss:

    $$\mathcal{L}_{\text{shortterm}}(\vec{p}^{(i)}, \vec{a}, \vec{x}^{(i)}, L_C, L_S) = \alpha \mathcal{L}_{\text{content}}(\vec{p}^{(i)}, \vec{x}^{(i)}, L_C) + \beta \mathcal{L}_{\text{style}}(\vec{a}, \vec{x}^{(i)}, L_S) + \gamma \mathcal{L}_{\text{temporal}}\left(\vec{x}^{(i)}, \omega^{i}_{i-1}(\vec{x}^{(i-1)}), \vec{c}^{(i-1,i)}\right)$$

    Long-term temporal loss:

    $$\mathcal{L}_{\text{longterm}}(\vec{p}^{(i)}, \vec{a}, \vec{x}^{(i)}, L_C, L_S) = \alpha \mathcal{L}_{\text{content}}(\vec{p}^{(i)}, \vec{x}^{(i)}, L_C) + \beta \mathcal{L}_{\text{style}}(\vec{a}, \vec{x}^{(i)}, L_S) + \gamma \sum_{j \in J : i - j \geq 1} \mathcal{L}_{\text{temporal}}\left(\vec{x}^{(i)}, \omega^{i}_{i-j}(\vec{x}^{(i-j)}), \vec{c}^{(i-j,i)}_{\text{long}}\right)$$

    where:

    $$\vec{c}^{(i-j,i)}_{\text{long}} = \max\left(\vec{c}^{(i-j,i)} - \sum_{k \in J : i - k > i - j} \vec{c}^{(i-k,i)},\ 0\right)$$
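    The temporal term penalizes deviation from the warped previous stylized frame wherever the optical flow is reliable. A minimal TensorFlow sketch following the per-pixel weighting in Ruder et al. [5], assuming c holds consistency weights (near 1 where the flow is reliable, 0 at disocclusions) broadcastable to the frame's shape:

        def temporal_loss(x, w, c):
            # x: current stylized frame; w: previous stylized frame warped
            # to the current frame; both of shape (1, h, w_, d).
            D = tf.cast(tf.size(x), tf.float32)   # number of pixels times depth
            return (1. / D) * tf.reduce_sum(c * tf.pow(x - w, 2))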
  21. Methods: Video Style Transfer

    [Figure: (a) Original frame; (b) Ground-truth optical flow; (c) Temporal consistency weights; (d) Previous frame warped to the current frame]
  22. Methods: Preservation of Original Colors

    The original color scheme of the content image can be preserved using the following image processing algorithm before or after the style transfer [2]:

    Algorithm 2: Convert stylized image to original colors

        procedure CONVERT-TO-ORIGINAL-COLORS(x_rgb, p_rgb)
            x_yuv ← RGB-TO-YUV(x_rgb)
            p_yuv ← RGB-TO-YUV(p_rgb)
            x_y, x_u, x_v ← SPLIT-CHANNELS(x_yuv)
            p_y, p_u, p_v ← SPLIT-CHANNELS(p_yuv)
            x′_yuv ← MERGE-CHANNELS(x_y, p_u, p_v)
            x′_rgb ← YUV-TO-RGB(x′_yuv)
            return x′_rgb
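    A minimal Python sketch of Algorithm 2; OpenCV's color conversions are an assumed choice here, and any RGB↔YUV routine would work equally well:

        import cv2

        def convert_to_original_colors(x_rgb, p_rgb):
            # Keep the stylized luminance (Y) but restore the content
            # image's chrominance (U, V) channels.
            x_yuv = cv2.cvtColor(x_rgb, cv2.COLOR_RGB2YUV)
            p_yuv = cv2.cvtColor(p_rgb, cv2.COLOR_RGB2YUV)
            x_yuv[..., 1:] = p_yuv[..., 1:]   # swap in the content U, V
            return cv2.cvtColor(x_yuv, cv2.COLOR_YUV2RGB)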
  23. Methods: Preservation of Original Colors

    [Figure: (a) Content image; (b) Color (UV) channels of the content image; (c) Stylized image; (d) Luminance (Y) channel of the stylized image]
  24. Methods: Weighted Style Transfer

    To perform weighted style transfer we define a masking function as:

    $$\text{mask}(G, m) = G \otimes \text{expand}(m, N_l)$$

    where $\otimes$ denotes element-wise multiplication. When including content masks to weight the style transfer, we define the style loss function as:

    $$\mathcal{L}_{\text{style}}(\vec{a}, \vec{x}, m, L_S) = \sum_{l \in L_S} w_l \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( \text{mask}(G^l, m)_{ij} - \text{mask}(A^l, m)_{ij} \right)^2$$

    where $\vec{a}$ is the style image, $\vec{x}$ is the stylized image, $m$ is the content mask, and $L_S$ is the set of layers used in the masked style representation.
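    One natural way to realize the expand-and-multiply step is to tile the mask (resized to the layer's spatial resolution) across all $N_l$ feature channels, multiply it into the feature maps, and build the Gram matrices from the masked features. A sketch under that assumption, reusing gram_matrix from above:

        def masked_gram(x, mask, area, depth):
            # x: feature maps of shape (1, h, w, d); mask: weights in
            # [0, 1] of shape (1, h, w, 1) at this layer's resolution.
            m = tf.tile(mask, [1, 1, 1, depth])   # expand(m, N_l)
            return gram_matrix(x * m, area, depth)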
  25. Results

    Quantitative results:
    • Plots of Loss Functions
    Qualitative results:
    • Single-Image Style Transfer
    • Multiple Style Transfer
    • Style Interpolation
    • Gradient Descent Initialization
    • Masked Style Transfer
    • Color Preservation
  26. Quantitative Results: Loss Functions

    [Plot: total loss $\mathcal{L}_{\text{total}}$ on a log scale over 1,000 iterations, comparing the convergence of L-BFGS against Adam with learning rates r = 0.1, 1, and 10]
  27. Results: Single Style Transfer

    Transferring the style of various artworks to the same content image produces qualitatively convincing results.
    [Figure: (a) Content image: Neckarfront in Tübingen, Germany; (b) Gatys et al. [3], Johnson [4], and our result]
  28. Results: Multiple Style Transfer

    More than one style image can be used to blend multiple artistic styles.
  29. Results: Style Interpolation

    When using multiple style images, the degree of blending between the images can be controlled.
    [Figure: (a) 0.3 The Scream + 0.7 Starry Night; (b) 0.7 The Scream + 0.3 Starry Night]
  30. Results: Weighted Style Transfer

    Style can be transferred to segmented regions of the content image.
    [Figure: (a) Content image, content mask, and stylized image]
  31. Results: Weighted Style Transfer

    [Figure: (a) Chan et al. [1]; (b) Our result; (c) Chan et al. [1]; (d) Our result; (e) Foreground style]
  32. Results: Weighted Style Transfer

    We experimented with multiple masks to perform weighted multiple-style transfer.
    [Figure 11: Content image, style image 1, style image 2, content mask 1, content mask 2, and stylized image]
  33. Results: Weighted Style Transfer

    The masking is not done as a post-processing step; the masks are inserted into the network.
    [Figure: (a) Content mask; (b) Stylized image]
  34. Results: Preservation of Color

    The color scheme of the original image can be preserved.
    [Figure: (a) Content image; (b) Style image; (c) Style image colors; (d) Content image colors]
  35. Results: Comparisons of Color Space Conversions

    [Figure: (a) Gatys et al. [3]; (b) L*a*b* color space; (c) YCbCr color space; (d) L*u*v* color space]
  36. Style Transfer in Video

    Video frames rendered using optical-flow warping and temporal loss functions exhibit less flickering than the naive approaches.
    [Figure: initial frames and stylized results at frames 1, 10, and 20]
  37. Style Transfer in Video

    Professional artist Julius Horsthuis (julius-horsthuis.com) used my TensorFlow program to add the style of Amsterdam to his entry for the Ryuichi Sakamoto | async International short film competition. [Vimeo]
  38. Style Transfer in VR/AR

    In July 2017, I joined the Irish start-up Artomatix. We are applying machine learning to computer graphics in hopes of automating the creation of 3D assets. [YouTube]
  39. Future: Style Transfer in VR/AR

    Machine learning techniques will allow artists and designers to create hybrid or photo-surrealistic assets.
  40. Limitations

    We identified several limitations of the algorithm which could be researched:
    • Style Structure: The style representations lose structural relationships, yet some artistic styles maintain symmetry and geometry.
    • Performance: It may be possible to down-sample to a small image, compute a stylized image, up-sample the stylized image, initialize the gradient descent with the larger image, and repeat this process (see the sketch after this list).
    • VGG Dependence: Other researchers have tried using other networks, but they produced inferior results. VGG is trained for classification, so it may not learn ideal features for texture synthesis.
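    A minimal sketch of that coarse-to-fine idea, assuming a hypothetical stylize(content, style, init) wrapper around the optimization loop shown earlier; this illustrates the speculative bullet above, not an implemented feature:

        import cv2

        def coarse_to_fine_stylize(content, style, scales=(0.25, 0.5, 1.0)):
            # Stylize at increasing resolutions, seeding each scale's
            # gradient descent with the up-sampled previous result.
            x = None
            for s in scales:
                h, w = int(content.shape[0] * s), int(content.shape[1] * s)
                c = cv2.resize(content, (w, h))
                init = cv2.resize(x, (w, h)) if x is not None else None
                x = stylize(c, style, init=init)   # hypothetical wrapper
            return x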
  41. References

    [1] E. Chan and R. Bhargava. Show, divide and neural: Weighted style transfer. Technical report, 2016.
    [2] L. A. Gatys, M. Bethge, A. Hertzmann, and E. Shechtman. Preserving color in neural artistic style transfer. 2016.
    [3] L. A. Gatys, A. S. Ecker, and M. Bethge. A neural algorithm of artistic style. 2015.
    [4] J. Johnson. neural-style, 2015.
    [5] M. Ruder, A. Dosovitskiy, and T. Brox. Artistic style transfer for videos. 2016.
  42. Computer Vision and Object Recognition

    From the 1950s until the 2010s, machine learning and computer vision research focused on the development of techniques to extract “hand-crafted” features for object recognition tasks.
    [Diagram: traditional model of object recognition]