Convolutional Neural Networks
for Artistic Style Transfer
Harish Narayanan
@copingbear
harishnarayanan.org/writing/artistic-style-transfer
github.com/hnarayanan/artistic-style-transfer
Slides 2–5
Artistic style transfer is gorgeous and popular
Slide 6
Content image c
Style image s

Slides 7–9
Content image c
Style image s
Style-transferred image x
Slides 10–12
We formally pose it as an optimisation problem!

$L_{\text{content}}(c, x) \approx 0$
$L_{\text{style}}(s, x) \approx 0$
Slides 13–14
We formally pose it as an optimisation problem!

$L(c, s, x) = \alpha \, L_{\text{content}}(c, x) + \beta \, L_{\text{style}}(s, x)$
$x^* = \arg\min_x L(c, s, x)$

⭐ Content and style losses are not defined in terms of per-pixel differences, but in terms of higher-level semantic differences!
Slides 15–20
But then how does one write a program to perceive semantic differences?!?
⭐ We don’t! We turn to machine learning.
• Image classification problem (linear ➞ neural network ➞ convnet)
• Break
• Download a pre-trained convnet classifier, and repurpose it for style transfer
• Concluding thoughts
Slides 21–22
Let’s start with a more basic problem to motivate our approach: The image classification problem

f(input image) = 99% Baby, 0.8% Dog, 0.1% Car, 0.1% Toothbrush

The input is a vector of D = W × H × 3 pixel values; the output is K class scores.
Slides 23–24
Image classification is a challenging problem

Slide 25
⭐ There is a semantic gap between the input representation and the task at hand
Slides 26–30
The pieces that make up a supervised learning solution to the image classification problem
Slides 31–32
The simplest learning-based image classifier: The linear classifier

Linear score function: $f(x; W, b) = Wx + b$
Softmax: $s_j = \sigma(f)_j = e^{f_j} / \sum_{k=1}^{K} e^{f_k}$
Parameters to learn: $\theta = (W, b)$
Cross-entropy loss function: $L_y(s) = -\sum_i y_i \log(s_i)$
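To make these formulas concrete, here is a minimal NumPy sketch of the linear score function, softmax and cross-entropy loss. The dimensions, variable names and toy data are illustrative assumptions; the real notebook code lives in the linked repo.

```python
import numpy as np

D, K = 784, 10                        # input dimension and number of classes (MNIST-like assumption)

def scores(x, W, b):
    """Linear score function f(x; W, b) = Wx + b."""
    return W @ x + b                  # shape (K,)

def softmax(f):
    """s_j = e^{f_j} / sum_k e^{f_k}, shifted by max(f) for numerical stability."""
    e = np.exp(f - f.max())
    return e / e.sum()

def cross_entropy(s, y):
    """L_y(s) = -sum_i y_i log(s_i), with y a one-hot label vector."""
    return -np.sum(y * np.log(s + 1e-12))

W, b = np.zeros((K, D)), np.zeros(K)  # parameters theta = (W, b)
x = np.random.rand(D)                 # a flattened 28 x 28 image
y = np.eye(K)[3]                      # one-hot label for class 3
loss = cross_entropy(softmax(scores(x, W, b)), y)
```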
Slides 33–37
A simplified look at gradient descent

(Figure: the loss $L(w)$ plotted against $w$, with iterates $w_0, w_1, w_2, \dots$ stepping down towards $w_{\text{optimal}}$.)

Starting from an initial guess $w_0$, we repeatedly step downhill along the gradient:
$w_1 = w_0 - \eta \frac{dL}{dw}(w_0)$
$w_2 = w_1 - \eta \frac{dL}{dw}(w_1)$
$\dots$
until we approach $w_{\text{optimal}}$, where $\frac{dL}{dw} = 0$.
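A toy one-dimensional version of this loop, with a made-up quadratic loss and learning rate purely to illustrate the update rule:

```python
def L(w):    return (w - 3.0) ** 2    # example loss with its minimum at w = 3
def dLdw(w): return 2.0 * (w - 3.0)   # its derivative

eta = 0.1                             # learning rate (step size)
w = 0.0                               # initial guess w_0
for step in range(100):
    w = w - eta * dLdw(w)             # w_{k+1} = w_k - eta * dL/dw(w_k)
print(w)                              # close to the optimum, where dL/dw = 0
```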
Slide 38
The linear image classifier in TensorFlow
github.com/hnarayanan/artistic-style-transfer
Slides 39–40
No content
Slide 41
“Getting 92% accuracy on MNIST is bad. It’s almost embarrassingly bad.”
– TensorFlow Docs Authors
Slides 42–43
Moving to a nonlinear score function: Introducing the neuron
Slides 44–46
Moving to a nonlinear score function: Stacking neurons into a first neural network

(Network diagram: input x → hidden layer → output)

$y_1 = W_1 x + b_1$
$h_1 = \max(0, y_1)$
$y_2 = W_2 h_1 + b_2$
$s = \sigma(y_2)$
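A minimal sketch of this two-layer network using the present-day tf.keras API; the layer sizes are assumptions for illustration, and the talk’s own notebook (linked below) builds its version differently.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation="relu", input_shape=(784,)),  # h1 = max(0, W1 x + b1)
    tf.keras.layers.Dense(10, activation="softmax"),                    # s = softmax(W2 h1 + b2)
])
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```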
Slide 47
A first neural network-based image classifier in TensorFlow
github.com/hnarayanan/artistic-style-transfer
Slides 48–50
An improved neural network-based classifier in TensorFlow
⭐ Just because we can fit anything doesn’t mean our learning algorithm will find that fit!
github.com/hnarayanan/artistic-style-transfer
Slide 51
Tinkering with neural network architectures to get a feeling for approximation capabilities
Example 1 Example 2 Example 3
playground.tensorflow.org
Slide 52
⭐ Neural networks can learn features we’d otherwise
need to hand-engineer with domain knowledge.
Slide 53
Standard neural networks are not the best option when it comes to dealing with image data
They disregard the structure of the image: a 28 px × 28 px image gets flattened into a 784-dimensional vector
Slides 54–56
Standard neural networks are not the best option when it comes to dealing with image data
The number of parameters they need to learn grows rapidly:
• Linear: 784×10 + 10 = 7,850
• Neural network (1 hidden layer): 784×100 + 100 + 100×10 + 10 = 79,510
• Neural network (2 hidden layers): 784×400 + 400 + 400×100 + 100 + 100×10 + 10 = 355,110
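As a quick sanity check of that growth, one could let Keras count the parameters of the same fully connected architectures (the helper below is illustrative, not from the talk’s notebooks):

```python
import tensorflow as tf

def dense_net(layer_sizes):
    """A fully connected network on flattened 28 x 28 (784-dimensional) inputs."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(layer_sizes[0], input_shape=(784,)))
    for n in layer_sizes[1:]:
        model.add(tf.keras.layers.Dense(n))
    return model

print(dense_net([10]).count_params())            # 7,850   (linear)
print(dense_net([100, 10]).count_params())       # 79,510  (1 hidden layer)
print(dense_net([400, 100, 10]).count_params())  # 355,110 (2 hidden layers)
```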
Slide 57
Convolutional neural networks to the rescue!
(Comparison figure: a regular, fully connected neural network vs. a convolutional neural network)
Slides 58–59
Core pieces of a convolutional neural network: The convolutional layer
K (filters) = 2, F (extent) = 3, S (stride) = 2, P (padding) = 1
Slides 60–61
Core pieces of a convolutional neural network: The pooling layer
F (extent) = 2, S (stride) = 2
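A hedged sketch of these two layer types in Keras, using the slide’s example hyperparameters (with "same" zero-padding standing in for the explicit P = 1):

```python
import tensorflow as tf

conv = tf.keras.layers.Conv2D(
    filters=2,        # K: number of filters
    kernel_size=3,    # F: spatial extent of each filter
    strides=2,        # S: stride
    padding="same",   # zero-padding (stands in for the slide's P = 1)
    activation="relu",
)
pool = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)  # F = 2, S = 2

x = tf.random.normal((1, 28, 28, 1))  # a batch holding one 28 x 28 grayscale image
print(conv(x).shape)                  # (1, 14, 14, 2)
print(pool(conv(x)).shape)            # (1, 7, 7, 2)
```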
Slide 62
An accurate convnet-based image classifier in TensorFlow
github.com/hnarayanan/artistic-style-transfer
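For orientation, a small MNIST-sized convnet assembled from these pieces might look like the sketch below; the exact architecture is an assumption, and the accurate classifier from the talk is in the linked notebooks.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```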
Slide 63
Better understanding what a convnet-based classifier does with the MNIST data
transcranial.github.io/keras-js/#/mnist-cnn
Slide 64
⭐ Deep learning (and convnets in particular) is all about learning representations
Slides 65–66
No content
Slides 67–68
Introducing a powerful convnet-based classifier at the heart of the Gatys style transfer paper
VGG Net: Networks systematically composed of 3×3 CONV layers. (ReLU not shown for brevity.)
Slides 69–70
Let’s start with a pre-trained VGG Net in Keras
138 million parameters (VGG16) trained on ImageNet
Keras coming to TensorFlow core in 1.2!

Slide 71
Fetching and playing with a pre-trained VGG Net in Keras
github.com/hnarayanan/artistic-style-transfer
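A minimal sketch of fetching the pre-trained network with the standalone Keras package (the image path is a placeholder; the linked notebook does this properly):

```python
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing import image

model = VGG16(weights="imagenet")    # downloads the ~138M pre-trained parameters
img = image.load_img("some_image.jpg", target_size=(224, 224))  # placeholder path
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
print(decode_predictions(model.predict(x), top=5))   # top ImageNet class guesses
```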
Slide 72
Content image c
Style image s
Style-transferred image x
Slide 73
Recall the style transfer optimisation problem

$L_{\text{content}}(c, x) \approx 0$
$L_{\text{style}}(s, x) \approx 0$
$L(c, s, x) = \alpha \, L_{\text{content}}(c, x) + \beta \, L_{\text{style}}(s, x)$
$x^* = \arg\min_x L(c, s, x)$
Slides 74–75
VGG Net has already learnt to encode the perceptual and semantic information that we need to measure our losses!
(Figure: the images c, s and x each passed through the VGG Net layers)
Slides 76–77
How we explicitly calculate the style and content losses

Content loss at layer l:
$L^l_{\text{content}}(c, x) = \frac{1}{2} \sum_{i,j} \left(C^l_{ij} - X^l_{ij}\right)^2$

Gram matrix of a layer’s feature maps:
$G_{ij}(A) = \sum_k A_{ik} A_{jk}$

Style loss, per layer and in total:
$E_l(s, x) = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left(G_{ij}(S^l) - G_{ij}(X^l)\right)^2$
$L_{\text{style}}(s, x) = \sum_{l=0}^{L} w_l E_l(s, x)$
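A hedged sketch of these losses in Keras backend operations. The function names, the (height, width, channels) feature-map layout, and the arguments n_l (number of filters) and m_l (feature-map size) are assumptions for illustration; the linked notebook contains the version actually used.

```python
from keras import backend as K

def content_loss(content_feats, combination_feats):
    """(1/2) * sum over (C_ij - X_ij)^2 for one layer's feature maps."""
    return 0.5 * K.sum(K.square(combination_feats - content_feats))

def gram_matrix(feats):
    """G_ij = sum_k A_ik A_jk, after flattening the spatial dimensions."""
    a = K.batch_flatten(K.permute_dimensions(feats, (2, 0, 1)))
    return K.dot(a, K.transpose(a))

def style_loss(style_feats, combination_feats, n_l, m_l):
    """E_l = sum over (G(S) - G(X))^2 / (4 * N_l^2 * M_l^2)."""
    diff = gram_matrix(style_feats) - gram_matrix(combination_feats)
    return K.sum(K.square(diff)) / (4.0 * (n_l ** 2) * (m_l ** 2))
```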
Slides 78–79
The last remaining technical bits and bobs

Total variation loss to control smoothness of the generated image:
$L_{\text{TV}}(x) = \sum_{i,j} \left[(x_{i,j+1} - x_{i,j})^2 + (x_{i+1,j} - x_{i,j})^2\right]$

L-BFGS used as the optimisation algorithm, since we’re only generating one image.
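One way to write that total variation term for a (1, height, width, 3) image tensor, again as an illustrative Keras-backend sketch rather than the notebook’s exact code:

```python
from keras import backend as K

def total_variation_loss(x, height, width):
    """Sum of squared differences between each pixel and its down/right neighbours."""
    a = K.square(x[:, :height - 1, :width - 1, :] - x[:, 1:, :width - 1, :])
    b = K.square(x[:, :height - 1, :width - 1, :] - x[:, :height - 1, 1:, :])
    return K.sum(a + b)
```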
Slide 80
Concrete implementation of the artistic style transfer algorithm in Keras
github.com/hnarayanan/artistic-style-transfer
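The optimisation itself leans on scipy’s L-BFGS-B routine, which expects a loss function and its gradient over a flat vector of pixels. A self-contained toy stand-in that shows only the calling pattern (the quadratic below replaces the real content + style + total variation loss):

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def loss(x_flat):
    """Toy stand-in for the combined content + style + total variation loss."""
    return float(np.sum((x_flat - 3.0) ** 2))

def grad(x_flat):
    """Its gradient with respect to the flattened 'image'."""
    return (2.0 * (x_flat - 3.0)).astype("float64")

x0 = np.random.uniform(0, 255, size=28 * 28)   # the pixels being optimised
x_opt, final_loss, info = fmin_l_bfgs_b(loss, x0, fprime=grad, maxfun=200)
print(final_loss)
```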
Slides 81–82
No content
Slide 83
Let’s look at some examples over a range of styles
c_w = 0.025, s_w = 5, t_v_w = 0.1
c_w = 0.025, s_w = 5, t_v_w = 5
c_w = 0.025, s_w = 5, t_v_w = 0.5
c_w = 0.025, s_w = 5, t_v_w = 1

Slide 84
Let’s look at some examples over a range of styles
c_w = 0.025, s_w = 5, t_v_w = 1
c_w = 0.025, s_w = 5, t_v_w = 0.1
c_w = 0.025, s_w = 5, t_v_w = 1
c_w = 0.025, s_w = 5, t_v_w = 0.5
Slides 85–87
And over a range of hyperparameters
c_w = 0.025, s_w = 0.1–10, t_v_w = 0.1
Slides 88–89
(Comparison panels: Prisma, Us, Style)
Slides 90–93
Some broad concluding thoughts
• Turn to machine learning when you have general problems that seem intuitive to state, but where it’s hard to explicitly write down all the solution steps
• Note that this difficulty often stems from a semantic gap between the input representation and the task at hand
• Just because a function can fit something doesn’t mean the learning algorithm will always find that fit
• Deep learning is all about representation learning: deep networks can learn features we’d otherwise need to hand-engineer with domain knowledge
Slides 94–98
… and closer to this evening’s workshop
• In studying the problem of cat vs. baby deeply, you’ve learnt how to see. You can repurpose this knowledge!
• Convnets are really good at computer vision tasks, but they’re not infallible
• TensorFlow is great, but Keras is what you likely want to be using to experiment quickly
• Instead of solving an optimisation problem, train a network to approximate solutions to it for a 1000x speedup